adaptively perform within progressively more challenging distractor
environments. Neuroplasticity selective to distractor processing was
evidenced in this study at both the microscale, i.e., at the resolution of single-neuron spiking in sensory cortex, and the macroscale, i.e., in electroencephalography (EEG)-based event-related potential recordings.
Video games have also shown promise in the treatment of visual
deficits such as amblyopia, and in cognitive remediation in neuropsychiatric disorders such as schizophrenia. However, while the evidence base has been encouraging in small-sample randomized controlled trials (RCTs), larger RCTs are needed to demonstrate definitive therapeutic benefit. This is especially necessary as the commercial brain
training industry continues to make unsubstantiated claims of the benefits of neurogaming; such claims have been formally dismissed by the
scientific community. Like any other pharmacologic or device-based
therapy, neurogames need to be systematically validated in multiphase
RCTs establishing neural target engagement and documenting cognitive and behavioral outcomes in specific disorder populations.
Generalizability of training benefits from task-specific cognitive
outcomes to more broad-based functional improvements remains
the holy grail of neurogaming. Next-generation neurogames will aim
to integrate physiologic measures such as heart rate variability (an
index of physical exertion), galvanic skin responses, and respiration
rate (indices of stress response), and even EEG-based neural measures. The objectives of such multimodal biosensor integration are to
enhance the “closed-loop mechanics” that drive game adaptation and
hence improve therapeutic outcomes and perhaps result in greater
generalizability. These complex, yet potentially more effective, neurogames of the future will need rigorous clinical study for demonstration of validity and efficacy.

FIGURE 487-2 Augmented reality (AR) for phantom limb pain. A. A patient is shown a live AR video. B. EMG electrodes placed over the stump record muscle activation during training. C. The patient matches target postures during rehabilitation. D. Patient playing a game in which a car is controlled by "phantom movements." (M Ortiz-Catalan et al: Phantom motor execution facilitated by machine learning and augmented reality as treatment for phantom limb pain: A single group, clinical trial in patients with chronic intractable phantom limb pain. Lancet 388:2885, 2016.)

FIGURE 487-3 Neurofeedback using functional MRI: real-time fMRI signals acquired on a 3T scanner are reconstructed and presented to the subject as a feedback display (e.g., a thermometer), and the subject's task is to lower the displayed temperature. (From T Fovet et al: Translating neurocognitive models of auditory-verbal hallucinations into therapy. Front Psychiatry 7:103, 2016.)
NEUROIMAGING
■ NEUROIMAGING OF
CONNECTIVITY
Multimodal neuroimaging methods
including functional magnetic resonance
imaging (fMRI), EEG, and magnetoencephalography (MEG) are now being
investigated as tools to study functional
connectivity between brain regions, i.e.,
extent of correlated activity between
brain regions of interest. Snapshots of
functional connectivity can be analyzed while an individual is engaged in
specific cognitive tasks or during rest.
Resting-state functional connectivity
(rsFC) is especially attractive as a robust,
task-independent measure of brain function that can be evaluated in diverse
neurologic and neuropsychiatric disorders. In fact, methodologic research has
shown that rs-fMRI can provide more
reliable brain signals of energy consumption than specific task-based fMRI approaches.
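To make the notion of "correlated activity between brain regions of interest" concrete, the following is a minimal sketch (not any particular published pipeline) of how an rsFC matrix might be computed from region-of-interest (ROI) time series; the ROI count, scan length, and random data are illustrative assumptions.

```python
import numpy as np

# Hypothetical resting-state data: 200 time points x 10 regions of interest (ROIs).
# In practice these would be preprocessed BOLD time series averaged within each ROI.
rng = np.random.default_rng(0)
n_timepoints, n_rois = 200, 10
bold = rng.standard_normal((n_timepoints, n_rois))

# Resting-state functional connectivity: pairwise Pearson correlation of ROI time series.
rsfc = np.corrcoef(bold, rowvar=False)          # shape: (n_rois, n_rois)

# The upper triangle (excluding the diagonal) is often vectorized and used as a
# feature vector for downstream analyses such as the biotype clustering described below.
iu = np.triu_indices(n_rois, k=1)
connectivity_features = rsfc[iu]
print(connectivity_features.shape)              # (45,) unique ROI pairs
```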
In recent years, there has been a surge of research to identify robust
rsFC-based biomarkers for specific neurologic and neuropsychiatric
disorders and thereby inform diagnoses, and even predict specific
treatment outcomes. For many such disorders, the network-level
neurobiologic substrates that correspond to the clinical symptoms
are not known. Furthermore, many are not unitary diseases, but
rather heterogeneous syndromes composed of varied co-occurring
symptoms. Hence, the research quest for robust network biomarkers
for complex neuropsychologic disorders is challenging and still in its
infancy; yet some studies have made significant headway in this domain.
For example, in a large multisite cohort of ~1000 depressed patients,
Drysdale et al. (2017) showed that rsFC measures can subdivide
patients into four neurophysiologic “biotypes” with distinct patterns
of dysfunctional connectivity in limbic and frontostriatal networks.
These biotypes were associated with different clinical-symptom profiles (combinations of anhedonia, anxiety, insomnia, anergia, etc.) and
had high (>80%) diagnostic sensitivity and specificity. Moreover, these
biotypes could also predict responsiveness to transcranial magnetic
stimulation (TMS) therapy. Another recent study demonstrated the utility of rsFC measures in predicting the diagnosis of mild traumatic brain injury (mTBI), which is clinically challenging to make by conventional means.
Apart from fMRI-based measures of rsFC, EEG- and MEG-based
rsFC measures are also being actively investigated, as these provide a
relatively lower-cost alternative to fMRI. While EEG is of lowest cost,
it compromises on spatial resolution. The major strength of MEG is its
ability to provide more accurate source-space estimates of functional
oscillatory coupling than EEG, as well as to provide measures at various physiologically relevant frequencies (up to 50 Hz has been shown to be clinically useful). In this regard, EEG/MEG are complementary to fMRI,
which can only be used to study slow activity fluctuations (i.e., <0.1 Hz);
the potential for EEG/MEG modalities to provide valid diagnostic
biomarkers is currently underexploited and requires further study.
■ CLOSED-LOOP NEUROIMAGING
Neuroscientific studies to date are predominantly designed as "open-loop experiments," interpreting the neurobiologic substrates of human
behavior via correlation with simultaneously occurring neural activity.
In recent years, advances in real-time signal processing have paved
the way for “closed-loop neuroimaging,” wherein humans can directly
manipulate experiment parameters in real-time based on specific
brain signals (Fig. 487-3). Closed-loop imaging methods can not only
advance our understanding of dynamic brain function but also have
therapeutic potential. Humans can learn to modulate their neural
dynamics in specific ways when they are able to perceive (i.e., see/hear)
their brain signals in real-time using closed-loop neuroimaging-based
neurofeedback. Early studies showed that such neurofeedback learning
and resulting neuromodulation could be applied as therapy for patients
suffering from chronic pain, motor rehabilitation in Parkinson’s and
stroke patients, modulation of aberrant oscillatory activity in epilepsy,
and improvement of cognitive abilities such as sustained attention
in healthy individuals and patients with attention-deficit hyperactivity disorder (ADHD). Closed-loop neuroimaging has also shown potential for deciphering the state of consciousness in comatose patients, wherein a proportion of vegetative/minimally conscious patients can communicate awareness via neuroimaging-based mental imagery.
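For readers who want to see the closed-loop idea in code, below is a minimal sketch of an EEG-style neurofeedback loop in which alpha-band power drives a "thermometer" display; the sampling rate, band limits, and feedback mapping are illustrative assumptions rather than a description of any specific clinical system.

```python
import numpy as np
from scipy.signal import welch

FS = 256                      # assumed EEG sampling rate (Hz)
ALPHA_BAND = (8.0, 12.0)      # frequency band used for feedback (assumption)

def alpha_power(eeg_chunk, fs=FS):
    """Approximate band power in the alpha range for one EEG channel."""
    freqs, psd = welch(eeg_chunk, fs=fs, nperseg=fs)
    band = (freqs >= ALPHA_BAND[0]) & (freqs <= ALPHA_BAND[1])
    df = freqs[1] - freqs[0]
    return psd[band].sum() * df

def feedback_value(power, baseline):
    """Map current alpha power to a 0-1 'thermometer' level relative to baseline."""
    return float(np.clip(power / (2.0 * baseline), 0.0, 1.0))

# Simulated session: each iteration stands in for one second of acquired EEG.
rng = np.random.default_rng(1)
baseline = alpha_power(rng.standard_normal(FS))
for second in range(5):
    chunk = rng.standard_normal(FS)              # placeholder for a real-time EEG buffer
    level = feedback_value(alpha_power(chunk), baseline)
    print(f"t={second}s  feedback thermometer level: {level:.2f}")
```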
Closed-loop neuroimaging therapeutic studies have utilized real-time fMRI, EEG, and MEG methods. It is common for neural signals
to be extracted from specific target brain regions for neuromodulation.
However, given that distributed neural networks underlie behavioral
deficits, new studies have also explored neurofeedback on combinatorial brain signals from multiple brain regions extracted using multivariate pattern analysis (MVPA). While early studies indicate therapeutic
potential, clinical RCTs of closed-loop neuroimaging neurofeedback
have shown mixed results. This may largely be because of the individual heterogeneity in neuropsychiatric disorders such that there is no
one-size-fits-all therapy. Closed-loop neuroimaging-based therapies
need to be better personalized to the pre-intervention cognitive and
neurophysiologic states of the individual, and a better understanding
needs to be developed regarding learning principles and mechanisms
of self-regulation underlying neurofeedback. Clinical practitioners
applying these methods also need better education on the hardware/
software capabilities of these brain–computer interfaces to maximize
patient outcomes.
NONINVASIVE BRAIN STIMULATION
Noninvasive brain stimulation (NIBS) is widely recognized as having
great potential to modulate brain networks in a range of neurologic and
psychiatric diseases; it is currently approved by the U.S. Food and Drug
Administration (FDA) as a treatment for depression. Importantly, there
is a very large body of basic research indicating that neuromodulation
of the nervous system with electrical stimulation can have both shortterm and long-term effects. While transcranial magnetic stimulation
(TMS) uses magnetic fields to generate electrical currents, transcranial
direct current stimulation (tDCS), in contrast, is based on direct stimulation using electrical currents applied at the scalp (Fig. 487-4). TMS
induces small electrical currents in the brain by magnetic fields that
pass through the skull; it is known to be painless and therefore widely
used for NIBS. Animal research suggests that anodal tDCS causes a
generalized reduction in resting membrane potential over large cortical areas, whereas cathodal stimulation causes hyperpolarization.
Prolonged stimulation with tDCS can cause an enduring change in
cortical excitability under the stimulated regions. Further, changes in
resting-state fMRI-based activity and functional connectivity have also
been observed post-tDCS. Notably, there is uncertainty regarding precisely how much electrical current is able to penetrate through the skull
and modulate neural networks. Indeed, recent work has found that typical stimulation paradigms may not generate sufficient electrical fields
to modulate neural activity; an alternate possibility is that peripheral
nerves may be modulated and thus affect neural activity.
Neuromodulation via stimulation techniques such as tDCS and TMS has shown promise as a method to improve motor function after stroke, and there are a growing number of studies demonstrating functional benefits of combining physical therapy with brain stimulation. Two commonly utilized TMS paradigms are low-frequency "inhibitory" stimulation of the healthy (contralesional) hemisphere and high-frequency "excitatory" stimulation of the injured hemisphere. Each of these two approaches aims to modify the balance of reciprocal inhibition between the two hemispheres after stroke. A meta-analysis of randomized controlled
trials published over the past decade found a significant beneficial
effect on motor outcomes. Unfortunately, a recent large multicenter
trial to assess the long-term benefits of TMS on motor recovery after
stroke (NICHE trial) did not find a benefit at the population level.
Ongoing research aims to better understand how stimulation can
directly affect neural patterns and thus allow more customization
of stimulation—past trials did not record the neural responses to
stimulation.
TMS and tDCS interventions are also being applied in psychiatric
disorders. A substantial body of evidence supports the use of TMS as an
antidepressant in major depressive disorder (MDD). TMS is also being
investigated for its potential efficacy in posttraumatic stress disorder
(PTSD), obsessive compulsive disorder (OCD), and treatment of auditory hallucinations in schizophrenia. Various repetitive TMS (rTMS)
protocols have shown efficacy in major depression. These include both
low-frequency (≤1 Hz) and high-frequency (10–20 Hz) rTMS stimulation over the dorsolateral prefrontal cortex (DLPFC). Mechanistically,
low-frequency rTMS is associated with decreased regional cerebral
blood flow while high-frequency rTMS elicits increased blood flow, not
only over the prefrontal region where the TMS is applied, but also in
associated basal ganglia and amygdala circuits. Notably, the differential
mechanisms of the low- vs. high-frequency rTMS protocols are associated with mood improvements in different sets of MDD patients, and
patients showing benefits with one protocol may even show worsening
with the other, again pointing to individual heterogeneity in network
function. EEG-guided TMS is also being investigated in psychiatric disorders, for instance, using an individual's resting alpha-band (8–12 Hz) peak frequency to determine TMS stimulation rates. With respect
to transcranial electrical stimulation in psychiatry, tDCS is the most
commonly used protocol. In major depression, there is a documented
imbalance in left vs. right DLPFC activity; hence, differential anodal vs.
cathodal tDCS in the left vs. right prefrontal cortex may be a potentially
efficacious approach. Interestingly, while meta-analyses show promise for NIBS methods in psychiatric illness, large RCTs have failed to demonstrate effects superior to placebo treatment. Future success may
require careful personalized targeting based on network dynamics and
refinement of protocols to accommodate combinatorial treatments.
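As an illustration of the EEG-guided targeting idea mentioned above (using an individual's resting alpha peak to set the rTMS rate), the following minimal sketch estimates the individual alpha peak frequency from a resting EEG recording; the synthetic signal, sampling rate, and parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import welch

FS = 250                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(2)
t = np.arange(60 * FS) / FS                # one minute of resting EEG (synthetic)
eeg = np.sin(2 * np.pi * 10.2 * t) + 0.5 * rng.standard_normal(t.size)

# Power spectral density of the resting recording.
freqs, psd = welch(eeg, fs=FS, nperseg=4 * FS)

# Individual alpha peak frequency: the frequency with maximal power in the 8-12 Hz band.
alpha = (freqs >= 8) & (freqs <= 12)
iaf = freqs[alpha][np.argmax(psd[alpha])]
print(f"Individual alpha peak frequency: {iaf:.2f} Hz")
# One proposed strategy is to set the rTMS pulse rate at or near this frequency.
```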
IMPLANTABLE NEURAL INTERFACES
INCLUDING BRAIN–MACHINE INTERFACES
Fully implantable clinically relevant neural interfaces that can improve
function already exist. Cochlear implants, for example, are sensory
prostheses that can restore hearing in deaf patients. Environmental
sounds are processed in real-time and then converted into patterned
stimulation delivered to the cochlear nerve. Importantly, even while
the patterned stimulation remains the same, there are gradual improvements in the perception of speech and other complex sounds over a
period of several months after device implantation. Activity-dependent
sculpting of neural circuits is hypothesized to underlie the observed
perceptual improvements. Similarly, the development of deep-brain
stimulation (DBS) was based on decades of work showing that surgical lesions to specific nuclei could alleviate tremor and bradykinesia
symptoms in animal models. DBS involves chronic implantation of a
stimulating electrode that targets specific neural structures (e.g., subthalamic nuclei or the globus pallidus in Parkinson’s disease). At least
for movement disorders, it is commonly thought that targeted areas are
functionally inhibited by the chronic electrical stimulation.
■ IMPLANTABLE DEVICES FOR
NEUROMODULATION
There has been recent progress in the development of implantable
neural interfaces to treat neurologic and psychiatric illnesses. For
example, for patients with refractory focal epilepsy and clearly identified seizure foci, invasive “responsive stimulation” has now been FDA
approved. Responsive stimulation is grounded in the principles of closed-loop stimulation based on real-time monitoring of brain oscillations;
specifically, the device aims to detect the earliest signatures of the
onset of a seizure, usually at a stage that is not symptomatic, and then
deliver focused electrical stimulation to prevent further progression
and generalization. A large randomized controlled trial of this device
was performed in patients with intractable focal epilepsy; they were
assigned to either sham or active stimulation in response to seizure
detection. There was a significant reduction in seizure frequency in the
stimulation group, but it was rare for patients to become seizure free. There were also modest improvements in quality of life. Notably, there was a small, elevated risk of hemorrhage associated with the device. In addition to providing clinicians with another treatment option, this device has offered important avenues for research and further optimization. For example, it is now possible to monitor subclinical and clinical seizures and intracranial EEG in patients with chronic epilepsy. This has resulted in new knowledge about the association of seizures with circadian rhythms and sleep. It is also anticipated that a better understanding of the triggers of seizures and the development of better stimulation algorithms, based on real-world data, can ultimately lead to more effective treatments.

FIGURE 487-4 Illustration of TMS and tDCS setups. The upper panels show a TMS setup: coils generate brief (microsecond-scale) magnetic fields that can in turn generate electrical fields in the cortical tissue. The lower panels show a tDCS setup: the electrical current, applied over minutes, is believed to flow from the anode (+) to the cathode (–) through the superficial cortical areas, leading to polarization. (Reproduced with permission from R Sparing, FM Mottaghy: Noninvasive brain stimulation with transcranial magnetic or direct current stimulation [TMS/tDCS]—From insights into human memory to therapy of its dysfunction. Methods 44:329, 2008.)
There is also great interest in the development of treatments for
refractory depression. One area of focus has been on the development
of DBS to treat depression. While early smaller studies were quite
promising, a larger study failed to find benefits at the population level.
Subsequent analysis has suggested the possibility that more precise
tailoring of stimulation to each individual is warranted, both at the
level of specific pathways identified through neuroimaging as well as
network activity biomarkers. This approach is based on the hypothesis
that tailoring stimulation parameters to each individual may be more
promising. Recent studies have, in fact, supported the notion that
individualized patterns of network activity are predictive of a patient’s
symptoms and how he or she might respond to stimulation. There are
now planned studies that aim to tailor stimulation to each individual
with severe depression.
■ BRAIN–MACHINE INTERFACES FOR PARALYSIS
Brain–machine interfaces (BMIs) represent a more advanced neural interface that aims to restore motor function. Multiple neurologic disorders (e.g., traumatic and nontraumatic spinal cord injury,
motor-neuron disease, neuromuscular disorders, and strokes) can
result in severe and devastating paralysis. Patients cannot perform
simple activities and remain fully dependent for care. In patients with
high cervical injuries, advanced amyotrophic lateral sclerosis (ALS),
or brainstem strokes, the effects are especially devastating and often
leave patients unable to communicate. While there has been extensive
research into each disorder, little has proven to be clinically effective for
rehabilitation of long-term disability. BMIs offer a promising means to
restore function. In the patients described above, while the pathways
for transmission of signals to muscles are disrupted, the brain itself is
largely functional. Thus, BMIs can restore function by communicating
directly with the brain. For example, in a “motor” BMI, a subject’s
intention to move is translated in real-time to control a device. As
illustrated in Fig. 487-5, the components of a motor BMI include: (1)
recordings of neural activity, (2) algorithms to transform the neural
activity into control signals, (3) an external device driven by these control signals, and (4) feedback regarding the current state of the device.
Many sources of neural signals can be used in a BMI. While EEG
signals can be obtained noninvasively, other neural signals require
invasive placement of electrodes. Three invasive sources of neural signals include electrocorticography (ECoG), action potentials or spikes,
and local field potentials (LFP). Spikes and LFPs are recorded with
electrodes that penetrate the cortex. Spikes represent high-bandwidth
signals (300–25,000 Hz) that are recorded from either single neurons
(“single-unit”) or multiple neurons (“multiunit” or MUA). LFPs are
the low-frequency (~0.1–300 Hz) components. In contrast, ECoG
is recorded from electrodes that are placed on the cortical surface.
ECoG signals may be viewed as an intermediate-resolution signal in
comparison with spikes/LFPs and EEG. It is worth noting that there
is still considerable research into the specific neural underpinning of
each signal source and what information can be ultimately extracted
regarding neural processes.
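To make the frequency ranges above concrete, here is a minimal sketch of separating a wideband penetrating-electrode recording into an LFP band and a spike band with standard Butterworth filters; the sampling rate, filter order, and threshold rule are illustrative assumptions, and the data are synthetic.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 30_000                                     # assumed wideband sampling rate (Hz)
rng = np.random.default_rng(3)
wideband = rng.standard_normal(FS)              # one second of synthetic extracellular signal

# LFP: low-frequency component (roughly <300 Hz, as quoted in the text).
lfp_sos = butter(4, 300, btype="lowpass", fs=FS, output="sos")
lfp = sosfiltfilt(lfp_sos, wideband)

# Spike band: high-bandwidth component (roughly 300 Hz and above); spikes are then
# typically detected by thresholding and sorted into single- or multiunit activity.
spike_sos = butter(4, 300, btype="highpass", fs=FS, output="sos")
spike_band = sosfiltfilt(spike_sos, wideband)

threshold = -4 * np.median(np.abs(spike_band)) / 0.6745   # a common robust noise estimate
crossings = np.flatnonzero(spike_band < threshold)
print(f"Putative threshold crossings in 1 s: {crossings.size}")
```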
A critical component of a BMI is the transform of neural activity into
a reliable control signal. The decoder is an algorithm that converts the
neural signals into control signals. One important distinction between
classes of decoders is biomimetic versus nonbiomimetic. In the case
of biomimetic decoders, the transform attempts to capture the natural
relationship between neural activity and a movement parameter. In
contrast, nonbiomimetic decoders can be more arbitrary transforms
between neural activity and prosthetic control. It had been hypothesized that learning prosthetic control with a biomimetic decoder is
more intuitive. Recent evidence, however, reveals that learning may be
important for achieving improvements in the level of control over an
external device (e.g., a computer cursor, a robotic limb) for either type
of decoder. This may be similar to learning a new motor skill.
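As an illustration of the decoder concept (not of any specific BMI system), the sketch below fits a simple biomimetic-style linear decoder that maps binned firing rates to a two-dimensional cursor velocity using ridge regression; the neuron count, bin structure, and simulated tuning are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n_bins, n_neurons = 2000, 64                    # e.g., 100-ms bins, 64 recorded units (assumed)

# Simulate 2D intended velocity and linearly tuned firing rates plus noise.
velocity = rng.standard_normal((n_bins, 2))
tuning = rng.standard_normal((2, n_neurons))
rates = velocity @ tuning + 0.5 * rng.standard_normal((n_bins, n_neurons))

# Fit the decoder on a calibration block, then evaluate on held-out bins.
X_train, X_test, y_train, y_test = train_test_split(rates, velocity, test_size=0.25, random_state=0)
decoder = Ridge(alpha=1.0).fit(X_train, y_train)
print(f"Held-out R^2 of decoded velocity: {decoder.score(X_test, y_test):.2f}")

# At run time, each new bin of firing rates is mapped to a cursor velocity command.
vx, vy = decoder.predict(rates[:1])[0]
```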
A central goal of the field of BMIs is to improve function in patients
with permanent disability. This can consist of a range of communication and assistive devices such as a computer cursor, keyboard control,
wheelchair, or robotic limb. In the ideal scenario, the least invasive
method of recording neural signals would allow the most complex
level of control. Moreover, control should be possible in an intuitive manner that resembles the neural control of our natural limbs. There is currently active research into developing and refining techniques to achieve the most complex control possible using each signal source. One measure of complexity is the number of degrees of freedom that are controlled. For example, control of a computer cursor on the screen (i.e., along the x and y axes) represents two degrees of freedom (DOF). Control
of a fully functional prosthetic upper arm that approaches our natural
range of motion would require >7 DOF. If the functionality of the hand
and fingers is included, then an even more complex level of control
would be required. There has been a large body of research on the use
of noninvasive recording of EEG signals. Studies suggest that two-DOF control using EEG is feasible. There
are also promising reports of patients with advanced
ALS communicating via email using EEG-based BCIs.
Known limitations of EEG-based BCIs include their low signal-to-noise ratio (due to filtering of neural signals by bone and skin) and contamination by muscle activity. Ongoing research aims to test usability in a more
general nonresearch setting as well as targeted use in
patients with disability.
Numerous studies now also indicate that BMIs using
invasive recording of neural signals can allow rapid control over devices with multiple DOF. The clear majority
of this research has been conducted using recordings
of spiking activity via implanted microelectrode arrays.
Initial preclinical studies were performed in able-bodied nonhuman primates. More recently, there have
been numerous examples of human subjects with a
range of neurologic illnesses (e.g., brainstem stroke,
ALS, spinal cord injury) who have demonstrated the
actual use of implantable neural interfaces. This includes
demonstrations of both the control of communication
interfaces as well as robotic limbs. Pilot clinical trials of
BCIs based on invasive recordings of neural signals have further shown that significantly greater rates of communication are possible (e.g., >30 characters per minute). Notably, these BCI devices required a percutaneous connection and were always tested in the presence of research staff. A case study additionally demonstrated that a fully implantable BCI system could allow communication in a locked-in ALS patient (Fig. 487-6). At the time of the study, the patient required mechanical ventilation and could only communicate using eye movements. She was implanted with multiple subdural cortical electrodes; the neural signals were then processed and sent wirelessly to an external augmentative and alternative communication (AAC) device. Importantly, she could use the interface with no supervision from research staff.

FIGURE 487-5 Components of a brain–machine interface (BMI): (1) neural signals (action potentials and field potentials) recorded from implanted electrodes, (2) signal processing and a decoding algorithm that converts the neural signals into control signals, (3) control of an external device such as a computer cursor or prosthetic limb, and (4) feedback to the user about the current state of the device. (Reproduced with permission from A Tsu et al: Cortical neuroprosthetics from a clinical perspective. Neurobiol Dis 83:154, 2015.)
BMIs have the potential to revolutionize the care of neurologically
impaired patients. While the field is in its infancy, multiple proof-of-principle studies already highlight the possibilities. Combined basic
and clinical efforts will ultimately lead to the development of products
that are designed for patients with specific disabilities. As outlined earlier, each signal source has strengths (e.g., noninvasive versus invasive,
recording stability) and weaknesses (e.g., bandwidth or the amount of
information that can be extracted). With additional research, a more
precise delineation of these strengths and weaknesses should occur. For
example, one hypothesis is that control of complex devices with high
DOF will only be possible using invasive recordings of high-resolution
neural activity such as spikes from small clusters of neurons. However,
recent trials using ECoG suggest that its stability might also allow higher
DOF control. As these limits become increasingly clear it should allow
targeted clinical translational efforts that are geared to specific patient
needs and preferences (e.g., extent of disability, medical condition,
noninvasive versus invasive). For example, patients with high cervical
injuries (i.e., above C4, where the arm and the hand are affected) have
rehabilitation needs different from those of patients with lower cervical injuries (i.e., below C5–C6, where the primary deficits involve the hand and fingers).
■ FURTHER READING
Baniqued PDE et al: Brain-computer interface robotics for hand
rehabilitation after stroke: A systematic review. J Neuroeng Rehabil
18:15, 2021.
Bassett DS et al: Emerging frontiers of neuroengineering: A network
science of brain connectivity. Annu Rev Biomed Eng 19:327, 2017.
Drysdale AT et al: Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med 23:28, 2017.
Khanna P et al: Low-frequency stimulation enhances ensemble co-firing and dexterity after stroke. Cell 184:912, 2021.
Liu A et al: Immediate neurophysiological effects of transcranial electrical stimulation. Nat Commun 9:5092, 2018.
Mishra J et al: Video games for neuro-cognitive optimization. Neuron
90:214, 2016.
Reinkensmeyer DJ et al: Computational neurorehabilitation: Modeling plasticity and learning to predict recovery. J Neuroeng Rehabil
13:42, 2016.
Scangos K et al: State-dependent responses to intracranial brain stimulation in a patient with depression. Nat Med 27:229, 2021.
FIGURE 487-6 Illustration of an ALS patient with a fully implanted communication interface. A. Location of the implanted electrodes (subdural electrode strip, e1–e4) on the brain. B. X-ray of the chest showing the wireless module. C. X-ray of the leads and wire routing. D. Schematic of the ventilated subject performing a typing task on a tablet via the implanted transmitter and an external antenna/receiver. (From MJ Vansteensel et al: Fully implanted brain–computer interface in a locked-in patient with ALS. N Engl J Med 375:2060, 2016. Copyright © 2016 Massachusetts Medical Society. Reprinted with permission from Massachusetts Medical Society.)
488 Machine Learning and Augmented Intelligence in Clinical Medicine
Arjun K. Manrai, Isaac S. Kohane
Machine learning has reshaped our consumer lives, with self-driving
vehicles, conversant digital assistants, and machine translation services
so ubiquitous that they are at risk of not being considered particularly
intelligent for much longer. Will the algorithms underlying these technologies similarly transform the art and practice of medicine? There
is hope that modern machine-learning techniques—especially the
resurgence of artificial neural networks in deep learning—will foster
a sea change in clinical practice that augments both the sensory and
diagnostic powers of physicians while, perhaps paradoxically, freeing physicians to spend more time with their patients by taking over laborious tasks.
From the birth of artificial intelligence at the Dartmouth Summer
Research Project on Artificial Intelligence in 1956 to self-driving
vehicles today, machine-learning methods and theory have developed
in symbiosis with growing datasets and computational power. In this
chapter, we discuss the foundations of modern machine-learning algorithms and their emerging applications in clinical practice. Modern
machine-learning techniques are sufficiently capacious to learn flexible and rich representations of clinical data and are remarkably adept at exploiting spatial and temporal structure in raw data. The newest machine-learning models perform on par with expert physicians or prior state-of-the-art models on a variety of tasks, such as
the interpretation of images (e.g., grading retinal fundus photographs
for diabetic retinopathy), analysis of unstructured text (e.g., predicting hospital readmission from electronic health record notes), and
processing of speech (e.g., detecting depression from patient speech).
However, many evaluations of machine-learning models occur on tasks
that are narrow and unrealistic, and further lack the clinical context
that a physician would incorporate. The models themselves are also
often divorced from considerations of patient utility. To help ensure
these models benefit patients, this chapter aims to bring more physicians into the design and evaluation of machine-learning models by
providing an understanding of how modern machine-learning models
are developed and how they relate to more familiar methods from the
epidemiological literature.
Today, the terms machine learning and artificial intelligence evoke
images distinct from those conjured up by the same terms in the 1950s
and the 1980s, and they likely will mean something different in a
decade. Computer scientist John McCarthy originally defined artificial intelligence in 1956 as "the broad science and engineering of creating intelligent machines," most often embodied today as computer software. Machine learning can be viewed as the subfield of artificial intelligence encompassing algorithms that extract generalizable patterns
from data. This stands in contrast to approaches to create intelligence
from human-engineered and explicitly programmed rules that characterized many early applications of artificial intelligence to medicine
during the 1970s and 1980s (e.g., expert systems such as INTERNIST-I
and MYCIN).
This chapter covers machine-learning methods and applications
that may augment physician expertise at the point of care. Many applications of machine learning in health care are therefore not reviewed
here, for example algorithms to improve hospital planning, detect
insurance fraud, and monitor new drugs for adverse events. Throughout this chapter, we discuss how new machine-learning methods can
learn rich representations of both clinical state and patient identity by
discovering how to represent raw data. At the same time, models largely
reflect the data on which they are trained and, thus, may encode and
amplify biased prior practices; they may also be brittle in unfamiliar
and evolving settings. If machine-learning methods are positioned
to tackle problems based on physician needs and are continuously
re-assessed, we envisage a future with clinics instrumented with
machine-learning tools that augment the ability of physicians to reason
precisely and probabilistically over populations at the point of care.
CONCEPTS OF MACHINE LEARNING
■ TYPES OF MACHINE LEARNING
Many physicians will be familiar with the major types of machine-learning
methods from methodologic counterparts discussed in the context of
“traditional” statistical and epidemiological modeling. In the current
machine-learning and epidemiology literature, much confusion arises
over whether a method “belongs” in one camp or the other. More is
gained by focusing on the computational and statistical connections,
particularly in understanding how new machine-learning methods
compare with familiar clinical risk-stratification approaches.
Broadly, there are four major types of machine learning with applications to clinical medicine: (1) supervised learning, (2) unsupervised
learning, (3) semi-supervised learning, and (4) reinforcement learning.
The four subfields of machine learning differ from one another in both
their objectives and the degree to which algorithms have access to
labeled examples from which to learn (Fig. 488-1). All four subfields
have roots tracing back decades with classical examples and modern
counterparts (Table 488-1).
FIGURE 488-1 The subfields of machine learning differ in their access to labeled examples. In supervised learning (left), the learning algorithm uses labeled output examples (red, blue) to learn a classifier for determining whether a new unlabeled data point is red or blue. Semi-supervised learning methods (center) have access to both unlabeled and labeled examples; the unlabeled data help learn a classifier with fewer labeled examples. Unsupervised learning methods (right) do not use labels but, rather, identify structure present in the data. (Reproduced with permission from Luke Melas-Kyriasi.)

To date, supervised machine-learning approaches have dominated the medical literature and recent deep-learning applications. In supervised learning, paired input and labeled output examples are provided together to a machine-learning algorithm that learns what combination (potentially millions) of parameters optimizes a function that predicts output from input. The goal is to learn robust functions that work
well with unseen data. If this setting is familiar, it is because clinical
researchers often use well-known traditional statistical approaches like
linear and logistic regression to achieve the same goal. For example,
clinical risk scores, such as the American College of Cardiology (ACC)/
American Heart Association (AHA) Atherosclerotic Cardiovascular
Disease (ASCVD) Pooled Cohort Risk Equations or the Framingham
Risk Score, are based on fitting models with paired input data (e.g.,
age, sex, LDL cholesterol, smoking history) and labeled output data
(e.g., first occurrence of nonfatal myocardial infarction, coronary
heart disease [CHD] death, or fatal or nonfatal stroke). Contemporary
deep-learning methods can learn flexible representations of raw input
data as opposed to relying on expert-identified features (Table 488-1).
A contemporary clinical example of supervised machine learning with
convolutional neural networks is the histopathological detection of
lymph node metastases in breast cancer patients (Table 488-1).
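To connect this supervised-learning framing to familiar clinical risk scores, the sketch below fits a logistic regression on synthetic paired inputs (age, LDL cholesterol, smoking status) and labeled outcomes; the variables and coefficients are illustrative assumptions and are not the ASCVD Pooled Cohort or Framingham equations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 5000
age = rng.uniform(40, 79, n)
ldl = rng.normal(130, 30, n)
smoker = rng.integers(0, 2, n)

# Synthetic "ground truth" risk used only to generate labeled outcomes for the example.
logit = -9.0 + 0.08 * age + 0.01 * ldl + 0.7 * smoker
event = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

# Paired inputs (risk factors) and labeled outputs (event occurred or not).
X = np.column_stack([age, ldl, smoker])
model = LogisticRegression().fit(X, event)

# Predicted probability of an event for a new patient: the learned input-to-output function.
new_patient = np.array([[62, 150, 1]])
print(f"Predicted event probability: {model.predict_proba(new_patient)[0, 1]:.2f}")
```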
The three remaining types of machine learning have not been
as widely applied to clinical problems to date, but we believe this is
likely to shift in coming years. These include unsupervised learning,
semi-supervised learning, and reinforcement learning. Unsupervised
learning, in contrast to supervised learning, encompasses methods that
use unlabeled input data, where the goal is to discover the “structure”
present in the data. A researcher may use unsupervised methods to
determine whether or not the data lie on a low-dimensional “manifold”
that is “embedded” in a higher-dimensional space. For example, a
researcher may obtain gene-expression measurements from more than
20,000 protein-coding genes for a large group of asthma patients and
then “project” each patient into a lower-dimensional space to visualize
and understand structure present in the dataset, or may group asthma
patients by similarity across all gene-expression values. Classical linear methods include principal component analysis (PCA), and contemporary nonlinear approaches include Uniform Manifold Approximation and Projection (UMAP) (Table 488-1). Semi-supervised learning
is a hybrid between supervised and unsupervised learning, with methods that use both labeled data and unlabeled data. These algorithms
exploit the (often low-dimensional) structure of unlabeled data to
learn better models than may be possible in a purely supervised setting
where labeled data may be scarce. Finally, reinforcement learning is a distinct subfield of machine learning that focuses on optimizing the iterative decision-making of an "agent" equipped with a cumulative "reward" function; the agent must therefore navigate a trade-off between exploration and exploitation of its environment. Unlike the other three subfields of machine learning, in which the entire learning signal (i.e., the dataset) is presented at once, the reward signal accumulates as the agent interacts with its environment. This approach to learning has been
successful in teaching computers to play games at world-expert levels
(e.g., Google’s AlphaGo Zero).
■ MODERN MEDICAL MACHINE LEARNING
The modern machine-learning toolkit includes methods that differ
extensively in their complexity and ability to learn directly from raw
data (Table 488-2). “Traditional” statistical methods, such as linear
and logistic regression, remain vital and often serve at minimum as
useful and interpretable baselines, but often much more. Generalizations of these approaches, as well as many other methods, have been developed to learn complicated functions, including highly nonlinear relationships. For example, models such as gradient-boosted trees (Table 488-2) often achieve excellent performance with tabular data that lack the spatial or temporal structure that newer deep-learning methods can exploit.

TABLE 488-1 Types of Machine Learning and Clinical Examples

Supervised learning. Definition: Methods that use paired input and labeled output examples to learn a generalizable function that predicts output labels from input. Classic example: logistic regression. Contemporary example: convolutional neural network (CNN). Clinical example: histopathological detection of lymph node metastases in breast cancer patients.

Unsupervised learning. Definition: Methods that use unlabeled input data to discover data structure and learn efficient representations for data (e.g., clustering, dimensionality reduction). Classic example: principal component analysis (PCA). Contemporary example: Uniform Manifold Approximation and Projection (UMAP). Clinical example: visualizing structure in gene-expression levels and grouping asthma patients into distinct molecular clusters.

Semi-supervised learning. Definition: Methods that use both unlabeled and labeled examples to learn functions better than is possible in the supervised setting alone. Classic example: self-training. Contemporary example: consistency regularization. Clinical example: use of unlabeled cardiac magnetic resonance images alongside a small dataset of labeled examples to detect hypertrophic cardiomyopathy.

Reinforcement learning. Definition: Methods to teach an "agent" that iteratively interacts with its environment how to optimize a numerical reward. Classic example: optimal control. Contemporary example: deep reinforcement learning (e.g., AlphaGo Zero). Clinical example: selecting fluids and vasopressor dosing iteratively to manage sepsis for patients in the intensive care unit.

TABLE 488-2 Select Tools in the Modern Medical Machine-Learning Toolkit

Linear and logistic regression. Definition: Models a linear relationship between predictors and either a continuous or binary outcome variable; "traditional" statistical modeling. Notes: Necessary baseline; in small, carefully curated clinical datasets, these methods often perform on par with more sophisticated methods.

Gradient-boosted trees. Definition: Ensemble of "decision trees" with parameters optimized to efficiently learn accurate nonlinear functions in the supervised setting. Notes: Efficient to train and often performs well on machine-learning tasks with tabular data.

Convolutional neural network (CNN). Definition: Specialized deep-learning architecture with groups of neurons ("convolutional filters") that exploit spatial structure. Notes: State-of-the-art in computer vision; de facto standard for medical imaging tasks (e.g., U-Net architecture for biomedical image segmentation).

Transformer models. Definition: Deep-learning architecture designed for mapping input sequences to output sequences (e.g., text, speech). Notes: Variants include Bi-directional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 3 (GPT-3); state-of-the-art for tasks in natural language processing and machine translation.

Generative adversarial network (GAN). Definition: Deep-learning framework consisting of two networks that compete to better learn the "generative model" underlying training examples. Notes: Performs well on image-to-image translation tasks; can create realistic synthetic data, art, and style transfer.

Uniform Manifold Approximation and Projection (UMAP). Definition: Dimensionality-reduction technique to visualize and identify low-dimensional structure of a high-dimensional dataset while preserving global structure. Notes: Nonlinear technique; many other techniques exist, e.g., principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE).

Transfer learning. Definition: Family of approaches to adapt models trained for one task and apply them to another task (across domains). Notes: Useful for "jump-starting" a model for a new problem; e.g., many medical computer-vision models start with a network trained for a separate (often nonmedical) task such as ImageNet and may be "fine-tuned" for a particular medical application.
Modern deep-learning models can learn rich and flexible representations of raw clinical data. The building blocks of these models
are simple artificial “neurons,” often arranged into layers (Fig. 488-2).
Each neuron accepts input from neurons in a preceding layer, computes a weighted sum of these inputs, and applies a nonlinear function
to the weighted sum (Fig. 488-2).
The values of the adjustable weights between neurons are learned
during the training process. Neurons may be arranged into hierarchical layers that build an increasingly rich representation of input data.
For example, convolutional neural networks (CNNs) are specialized
architectures that combine groups of neurons with the mathematical
operation of convolution (“convolutional filters”) to exploit spatial
structure (Table 488-2). Initial layers learn low-level features (e.g.,
edges) and then build to higher layers that learn motifs and objects,
creating a powerful representation for discriminating between inputs
using output labels (Fig. 488-3).
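A minimal NumPy sketch of the artificial neuron computation shown in Fig. 488-2 (a weighted sum of inputs passed through a ReLU nonlinearity), extended to a small layer; the sizes and random weights are illustrative.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: ReLU(x) = max(0, x)."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(7)
inputs = rng.standard_normal(5)          # activations j_1 ... j_5 from the preceding layer
weights = rng.standard_normal(5)         # adjustable weights w_1 ... w_5
bias = 0.1                               # bias term (not shown in the figure)

# Output of a single neuron k: a nonlinear function of the weighted sum of its inputs.
neuron_output = relu(weights @ inputs + bias)

# A whole layer is many such neurons computed together as a matrix product; stacking
# layers yields the hierarchical representations described in the text.
layer_weights = rng.standard_normal((16, 5))
layer_bias = np.zeros(16)
layer_output = relu(layer_weights @ inputs + layer_bias)
print(neuron_output, layer_output.shape)
```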
Modern neural networks may have many millions and even billions
of weight parameters. For example, the VGG-16 network by Simonyan
and Zisserman, a path-breaking architecture and one of the top models
in the ImageNet Large Scale Visual Recognition Challenge in 2014,
had approximately 140 million parameters. The U-Net architecture
by Ronneberger and colleagues, a convolutional neural network, is
used frequently for biomedical image segmentation and other medical
imaging tasks. Other deep-learning models have architectures tailored
for distinct tasks and data types including sequence-to-sequence
and Transformer models, designed for sequential data, including
unstructured text notes from electronic health records (Table 488-2).
Generative adversarial networks (GANs) feature an architecture with
two co-trained competing (“adversarial”) networks and exhibit what
many would describe as artistic talent or creativity. Finally, many
unsupervised dimensionality reduction techniques have recently been
introduced to discover nonlinear structure including, for example,
Uniform Manifold Approximation and Projection (UMAP), though
linear (e.g., PCA) and nonlinear (e.g., t-SNE) alternatives exist as well
(Table 488-2).

FIGURE 488-2 The artificial neuron, the building block of deep-learning models. Neuron k accepts weighted inputs j1 ... j5 from neurons in the preceding layer and applies a nonlinear function, e.g., the rectified linear function ReLU(x) = max(0, x), to the weighted sum of its inputs, ReLU(w1 × j1 + ... + w5 × j5), often with a bias term (not shown). During training, the weights are iteratively refined to better fit the data.

FIGURE 488-3 Deep-learning models learn rich hierarchical representations. Visualization of what a convolutional neural network "sees" as it processes images of common objects from the ImageNet dataset: the initial layers learn low-level features such as edges, intermediate layers learn textures and patterns, and the higher layers learn object parts and whole objects. (Reproduced from distill.pub by C Olah et al: Feature visualization. Distill, 2017. https://distill.pub/2017/feature-visualization/.)

In many deep-learning applications, it is important to note that practitioners often do not train networks tabula rasa but
instead benefit substantively from transfer learning (Table 488-2),
where the weights may be initialized or “pretrained” from another task.
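A minimal sketch of this transfer-learning pattern, assuming the PyTorch and torchvision libraries are available: a network pretrained on ImageNet is reused, and only its final layer is replaced and trained for a hypothetical two-class medical-imaging task.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet-18 pretrained on ImageNet (a separate, nonmedical task).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Optionally freeze the pretrained feature extractor...
for param in backbone.parameters():
    param.requires_grad = False

# ...and replace the final classification layer for a hypothetical 2-class task
# (e.g., disease present vs. absent). Only this new layer is trained ("fine-tuned").
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```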
■ PRACTICAL CONCEPTS IN TRAINING A MODERN
MACHINE-LEARNING MODEL
In this section, we provide a brief overview of practical concepts
that emerge when training a modern machine-learning model to
help readers understand the constraints and choices that confront
machine-learning practitioners.
Often the first task is to define what a “good” prediction is by specifying a “loss function,” which quantifies the error between a model’s
prediction and the true label (Table 488-3). There are many choices
for this function. Linear regression uses quadratic loss; other examples
include the cross-entropy and 0–1 loss. Given a loss function, the
stochastic gradient descent method coupled with the backpropagation algorithm quantifies how to alter the adjustable weights in the
network to optimize the loss function iteratively as labeled examples
are provided to the network, either one by one or in batches. The
weights themselves are often initialized carefully and frequently transferred over from another task. Most machine-learning practitioners
use specialized hardware called graphics processing units (GPUs) to
perform these calculations. As with most computer hardware, GPUs
range dramatically in performance and cost. Software frameworks like
PyTorch and TensorFlow automate much of the otherwise-cumbersome training process, abstracting away in a dozen lines what might
have previously taken a team of machine-learning engineers months
to build. The machine-learning community places great emphasis on generalization through near-universal practices such as widely available benchmark datasets and the splitting of training and testing data, with performance measures such as the area under the receiver operating characteristic curve (AUC) computed in held-out test data to obtain fair estimates of model performance (Table 488-3).

TABLE 488-3 Practical Concepts for Training a Deep-Learning Model

Loss function. Definition: Mathematical function that quantifies the discrepancy between the predicted label and the true label. Examples: cross-entropy, quadratic, 0–1 loss.

Back-propagation. Definition: Algorithm to compute how the loss function changes with respect to the adjustable weights (the "gradient").

Graphics processing unit (GPU). Definition: Specialized computer hardware to speed up the many matrix calculations involved in training a neural network. Example: NVIDIA Tesla V100.

Train-test split. Definition: How data are divided to ensure fair estimates of model performance after training. Example: 70% training, 30% test.

Area under the receiver operating characteristic curve (AUC). Definition: Common performance metric for evaluating binary classification models; 0.5 = random, 1.0 = perfect.

Deep-learning framework. Definition: Computational framework for efficiently performing the matrix (tensor) calculations needed to train deep-learning models. Examples: TensorFlow, PyTorch, Keras.
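The concepts in this section (loss function, stochastic gradient descent with backpropagation, train/test splitting, and AUC on held-out data) fit together as in the minimal PyTorch sketch below; the synthetic tabular dataset and small network are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic tabular dataset: 20 features, binary label.
rng = np.random.default_rng(8)
X = rng.standard_normal((2000, 20)).astype(np.float32)
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(2000) > 0).astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()                       # a cross-entropy loss for binary labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X_tr, y_tr = torch.from_numpy(X_train), torch.from_numpy(y_train)
for epoch in range(20):                                # stochastic gradient descent in mini-batches
    for start in range(0, len(X_tr), 64):
        xb, yb = X_tr[start:start + 64], y_tr[start:start + 64]
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)       # quantify error of predictions vs. labels
        loss.backward()                                # backpropagation computes the gradient
        optimizer.step()                               # update the adjustable weights

# Evaluate generalization with AUC on the held-out test split.
with torch.no_grad():
    scores = torch.sigmoid(model(torch.from_numpy(X_test)).squeeze(1)).numpy()
print(f"Held-out AUC: {roc_auc_score(y_test, scores):.2f}")
```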
APPLICATIONS OF MODERN MACHINE LEARNING TO CLINICAL MEDICINE
Two major classes of applications have dominated recent machine-learning work in medicine: computer vision and natural language processing. We review some of the recent applications below, highlighting
the breadth of challenges across clinical specialties and some emerging
new directions.
■ COMPUTER VISION
Medical computer-vision applications, particularly those employing convolutional neural networks, have dominated the medical
machine-learning literature over the past decade. Convolutional neural
networks are well suited to exploit the spatial structure in medical-imaging data and are able to learn detailed representations of input
images, yielding systems that often perform on par with or better than
expert physicians at select tasks. One of the most well-known medical
computer-vision applications during the past decade was published in
a paper by Gulshan and colleagues in the Journal of the American Medical Association (JAMA) in 2016. The authors trained a convolutional
neural network using 128,175 ophthalmologist-labeled retinal fundus
photographs to develop an automated diabetic retinopathy detection
system, achieving performance on par with expert ophthalmologists,
with an AUC of 0.99 in two separate validation datasets. In a separate
study by De Fauw and colleagues, 14,884 three-dimensional retinal
optical coherence tomography (OCT) scans were used to train a deep-learning model that could make referral suggestions and performed
with an accuracy at or superior to eight clinical experts who graded the
same scans, with an AUC in test data over 0.99 for urgent referral vs.
other referrals. Some machine-learning applications in ophthalmology
have already received approval from the U.S. Food and Drug Administration (FDA) including the IDx-DR “device” to classify more than
mild diabetic retinopathy.
Outside of ophthalmology, computer-vision applications have been
numerous across the many other specialties that rely on imaging
data. For example, dermatologist-level classification of skin cancer
was achieved in a study by Esteva and colleagues published in Nature
during 2017. The authors trained a convolutional neural network to
distinguish between keratinocyte carcinomas and benign seborrheic
keratoses as well as between malignant melanomas and benign nevi.
The authors concluded that the model performed at the level of the 21
board-certified dermatologists against which it was tested.
The uses of machine-learning models in radiology are numerous
as well, with applications including the detection of pneumonia from
chest x-rays, identification of pancreatic cancer from CT scans, and
fast, automated segmentation of cardiac structures from cardiac MRI,
as well as echocardiography.
Specialized deep-learning architectures such as the U-Net architecture by Ronneberger and colleagues have become especially popular
in the medical computer-vision community. Architectures are often
designed for specific imaging tasks (e.g., image segmentation) or specialized data types (e.g., three-dimensional images or videos). New frontiers
of computer-vision research in medicine include semi-supervised learning approaches to take advantage of extensive unlabeled data available
at hospitals, particularly given the practical difficulty and cost for an
individual researcher to obtain large expert-labeled datasets.
■ NATURAL LANGUAGE PROCESSING
Like computer vision, natural language processing (NLP) has been
transformed by modern machine-learning approaches, particularly
deep learning. Deep-learning approaches include recurrent neural
networks, newer sequence-to-sequence models, and the recently developed Transformer model (Table 488-2), which are well suited to exploit
the structure of text and natural language in both supervised and
unsupervised settings. These models have been successfully applied to
analyze physician notes in the electronic health record, detect depression symptom severity from spoken language, and scribe patient-physician visits. For example, a study by Rajkomar and colleagues
analyzed electronic health record data from 216,221 adult patients
to predict in-hospital mortality, 30-day unplanned readmission, and
discharge diagnoses amongst other outcomes, performing at high
accuracy, with an AUC of 0.93–0.94 for predicting in-hospital mortality. It is important to note that much of the progress in medical natural
language processing has stemmed from the widespread availability of
datasets, for example the Medical Information Mart for Intensive Care
(MIMIC) database.
Many specialized deep-learning architectures have been developed
for natural language processing applications, including the analysis of
electronic health record data, using both supervised (e.g., recurrent
neural network) and unsupervised (e.g., variational autoencoder)
approaches. Domain-specific language representation models have
been developed for the purpose of biomedical text mining, serving as
a substrate for many downstream natural language processing tasks.
These include, for example, the BioBERT model by Lee and colleagues,
published in 2019, which adapts the Bi-directional Encoder Representations from Transformers (BERT) model (Table 488-2) for biomedical
text mining.
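As an illustration of how such language models are typically exercised, the sketch below uses the Hugging Face transformers pipeline to fill in a masked token; the general-domain bert-base-uncased checkpoint is used so the example runs as written, and substituting a biomedical checkpoint such as BioBERT (exact model identifier assumed, to be verified on the model hub) follows the same pattern.

```python
# Assumes the Hugging Face `transformers` library is installed. The general-domain
# "bert-base-uncased" checkpoint is used here; a biomedical checkpoint such as BioBERT
# could be substituted via the `model` argument (its exact identifier is an assumption).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "the patient was started on metformin for type 2 [MASK]."
for prediction in fill_mask(sentence, top_k=3):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```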
■ OTHER APPLICATIONS
While medical computer vision and natural language processing tasks
have been the focus of newer deep-learning models due to the extensive structure of imaging and text data, many other application classes
exist. For example, cardiologist-level performance has been achieved in
deep-learning approaches for detecting arrhythmias from ambulatory
electrocardiograms, standing in contrast to the rule-based algorithms
used traditionally to interpret electrocardiogram signals. In genomics,
investigators have analyzed tumor genomes with machine-learning
methods to better predict survival using both deep-learning and other
machine-learning approaches. Machine-learning methods have also
been used to characterize the deleteriousness of single-nucleotide
variants in DNA. Many other applications of machine learning to
new patient data streams are emerging, for example machine learning
applied to wearables (e.g., smartwatches).
MACHINE LEARNING AND PRECISION
MEDICINE: DEEPER REPRESENTATIONS
OF PATIENT STATE
The modern machine-learning methods described in this chapter have much in common with the concept of precision medicine.
As described in a report published by the National Academies of Sciences, Engineering, and Medicine in 2011, precision medicine
refers to “the ability to classify individuals into subpopulations that
differ in their susceptibility to a particular disease, in the biology and/
or prognosis of those diseases they may develop, or in their response
to a specific treatment.” This vision calls for the development of an
“Information Commons,” a patient-centric view of heterogeneous
data streams (e.g., genome, environmental exposures, clinical signs
and symptoms, transcriptome) that together paint a complete picture
of an individual’s health.
The machine-learning methods described in this chapter similarly
operate on a heterogeneous and rich set of data to improve both the
predictive abilities of physicians as well as the understanding of disease
structure within and across data types. Modern machine-learning
methods are especially well suited to improve the representation of a
patient’s clinical state, identity, and environmental context in order
to improve individualized medical decision-making, learning data-driven "embeddings" of a patient's clinical state and identity in the
process (Fig. 488-4). Machine learning and precision medicine can
thus be seen as aligned disciplines where flexible and powerful learning
algorithms combine with rich and detailed data to augment clinical
decision-making.
CONCLUSION
Modern machine learning offers a powerful set of techniques to learn
feature representations directly from data, already performing on par
with expert physicians on select tasks. If carefully trained and judiciously applied to key areas of clinician workflow, new machine-learning methods have the representational power to touch every area of clinical practice.
FIGURE 488-4 Machine learning and precision medicine: deeper representations of clinical state and identity. Diverse data streams (e.g., genome; blood and urine laboratory biomarkers, including triglycerides and LDL cholesterol; and exposures) are combined alongside gold-standard labeled output diagnoses in a deep-learning model trained with a cross-entropy loss. In the process of training this model, a lower-dimensional representation of the high-dimensional input data (an "embedding" of identity and clinical state) is learned.
■ FURTHER READING
Gulshan V et al: Development and validation of a deep learning
algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316:2402, 2016.
Krizhevsky A et al: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (NIPS 2012), 2012.
Lecun YA et al: Deep learning. Nature 521:436, 2015.
Obermeyer Z, Emanuel EJ: Predicting the future - big data, machine
learning, and clinical medicine. N Engl J Med 375:1216, 2016.
Olah C et al: Feature visualization. Distill, 2017. https://distill.
pub/2017/feature-visualization/.
Rajkomar A et al: Machine Learning in Medicine. N Engl J Med
380:1347, 2019.
Ronneberger O et al: U-Net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing, 2015, pp. 234–241.
Silver D et al: Mastering the game of Go without human knowledge.
Nature 550:354, 2017.
Szolovits P, Pauker SG: Categorical and Probabilistic Reasoning in
Medical Diagnosis. Artif Intell 11:115, 1978.
Topol EJ: High-performance medicine: the convergence of human and
artificial intelligence. Nat Med 25:44, 2019.
489 Metabolomics
Jared R. Mayers, Matthew G. Vander Heiden

Metabolism, loosely defined, represents the sum of all biochemical
reactions involving small molecules with a molecular mass of ≤1000
Daltons within a given tissue, cell, or fluid. These small molecules are
collectively referred to as metabolites and are involved in the biochemical processes used to create macromolecules and fulfill the evolving
energy needs of a cell or organism. Metabolomics, then, represents
the measurement of metabolites, either qualitatively or quantitatively,
often as a way to gain insight into the metabolism of a cell, tissue, or
organism. No one experimental approach can characterize metabolism
in its entirety; metabolomics instead strives to measure a portion of
the metabolome, which consists of all metabolites in a given biological
sample at a given time.
A link to a time-specific context is common to all “-omics” techniques, but is particularly important in metabolomics. As metabolic
processes are highly connected and interdependent, with individual
metabolites often being involved in multiple pathways, levels of a
specific metabolite can vary in response to an alteration in either the
production or the consumption of that metabolite. Because significant
changes in metabolite levels can occur over a very short time frame,
the levels measured can be sensitive to perturbations either upstream
or downstream of a metabolite in a pathway. This sensitivity can make
measurement challenging, but it also makes metabolomics a powerful
tool with which to assess either acute or chronic changes in cells or
tissues. Indeed, the metabolome can be quite dynamic and reflective
of the current condition of the material being assessed, as it ultimately
represents an integration of outputs from the genome, epigenome,
transcriptome, and proteome (Fig. 489-1).
APPROACHES AND SAMPLING
CONSIDERATIONS
■ UNTARGETED AND TARGETED METABOLOMICS
There are two distinct approaches to measuring metabolites in biological materials: untargeted and targeted metabolomics. These strategies
differ in whether a predetermined subset of metabolites is intentionally sought in a sample, with the choice of approach dictated by the
question under investigation. Regardless of the method utilized, it
is important to recognize that no single metabolomics technique is
comprehensive. Technical considerations heavily influence metabolite
measurement, and no one method is able to capture the entire metabolome. In this respect, metabolomics contrasts with some other -omics
techniques, such as genomics or transcriptomics: in metabolomics, failure to
detect a metabolite does not necessarily mean that it is absent from the sample.
Untargeted Metabolomics Untargeted metabolomics is the comprehensive analysis of all the measurable analytes in a sample, irrespective of their identity (Fig. 489-2). Among the benefits of this
approach is that it is agnostic in its measurement of the metabolome.
Thus, it allows for the discovery of novel or unexpected molecules for
further study. Coverage of the metabolome in an untargeted approach
is influenced by the techniques used for sample preparation, metabolite separation prior to detection, and the inherent sensitivity and
specificity of the analytical technique(s) employed (see “Metabolomics
Technologies,” below).
A major drawback of untargeted metabolomics is that molecules
of interest can be measured with less confidence or missed entirely
because this approach carries an inherent bias toward the detection
of high-abundance molecules. Handling and interpretation of data
also represent a major challenge, as each sample run generates large
amounts of data whose analysis can be both complicated and time
consuming. Identifying each metabolite measured requires database
searching, and further experimental investigation is often needed to
confirm the exact identity of a signal of interest. Finally, in most cases,
this technique yields only relative metabolite quantification, thereby
rendering it most useful for comparisons between biological samples.
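To illustrate what "relative quantification" means in practice, the short sketch below compares the raw peak intensity of a single untargeted feature between two sample groups as a fold change; the intensities are invented numbers, and no attempt is made to convert them to absolute concentrations.

```python
# Minimal sketch (invented intensities): relative quantification of one untargeted
# feature, reported as a fold change between sample groups rather than a concentration.
import numpy as np

# Hypothetical peak intensities for an unidentified feature ("m/z 180.06 at 7.2 min")
control = np.array([1.1e6, 0.9e6, 1.0e6])
treated = np.array([2.3e6, 2.0e6, 2.6e6])

log2_fold_change = np.log2(treated.mean() / control.mean())
print(f"log2 fold change: {log2_fold_change:.2f}")  # ~1.2, i.e., about 2.3-fold higher
```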
Targeted Metabolomics Targeted metabolomics involves the
measurement of a predefined group of chemically characterized
metabolites—typically dictated by a hypothesis or predetermined
platform—with the aim of covering a select portion of the metabolome.
The metabolites measured represent only a subset of those that would
be measured by an untargeted approach; thus a targeted approach
generates a much smaller data set in which individual metabolites
are detected with higher confidence (Fig. 489-2). Because the identity of each signal is known in advance, standards can be added to
provide absolute quantification of each metabolite measured in the
sample, although the use of targeted metabolomics to compare relative metabolite levels across samples is common. In addition, sample
preparation and chromatographic separation before measurement can
be optimized to improve detection of specific metabolites, enabling
assessment of less abundant molecules.
The key downside of targeted metabolomics is that information is
gained about only those metabolites targeted by the analytical method.
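By contrast, the sketch below shows the arithmetic behind absolute quantification in a targeted assay: the analyte's peak area is referenced to that of a spiked, stable-isotope-labeled internal standard of known concentration. The response factor of 1 and all numerical values are illustrative assumptions, not a validated protocol.

```python
# Minimal sketch (illustrative numbers, assumed response factor of 1): absolute
# quantification in targeted metabolomics using a spiked internal standard.
def absolute_concentration_uM(analyte_area: float,
                              standard_area: float,
                              standard_conc_uM: float) -> float:
    """Analyte concentration estimated from the peak-area ratio to a co-eluting,
    stable-isotope-labeled internal standard of known concentration."""
    return (analyte_area / standard_area) * standard_conc_uM

# Example: a glutamine peak referenced to a 13C-labeled glutamine standard spiked at 50 uM
print(absolute_concentration_uM(analyte_area=8.4e5,
                                standard_area=4.2e5,
                                standard_conc_uM=50.0))  # 100.0 uM
```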
■ SAMPLING CONSIDERATIONS
Regardless of the approach used, it is critical to consider potential
sources of error that can influence the conclusions drawn from a
metabolomic analysis. Because of the dynamic nature of the metabolome, numerous biological confounders inherent to the samples themselves can affect levels of the metabolites measured. For this reason,
Genome Epigenome Transcriptome Proteome Metabolome Phenotype
FIGURE 489-1 The metabolome is downstream of the outputs measured by other “-omics” technologies. Thus, the state of the metabolome can more closely reflect clinical
and experimental phenotypes.
the inclusion of controls or reference populations to account for these
confounders can be critical for interpretation of the data. Established
biological confounders for patient-derived material include age, sex,
body mass index, fasting status and/or dietary differences, comorbid conditions such as diabetes, and behaviors such as smoking. For example, metabolites
commonly altered with respect to aging are those in antioxidant and
redox pathways as well as breakdown products of macromolecules. Sex
differences influence a number of different metabolites, most prominently those involved in steroid and lipid metabolism. Perhaps it is
not surprising that diet can also affect the metabolome, and fasting has
been shown to impact almost every category of metabolite frequently
measured in biological fluids.
Differences in sample handling and processing also influence
metabolite measurements. Work using metabolomics to analyze
material from large prospective cohort studies has shown that changes
in metabolite levels introduced by sample handling can lead to falsely
positive associations between specific metabolite changes and disease
risk. Specific considerations include the large geographic area of distribution from which patients are drawn—e.g., a sample, such as blood, is
collected locally and then exposed to variable conditions before being
sent to a central lab for further processing. Moreover, because of the
costs associated with obtaining and storing samples, often only one
sample is available for each individual.
Time is a key variable in metabolite measurements, and efforts to assess
the impact of sample handling and processing have led to improved
analysis pipelines. For example, comparison of metabolites measured
in samples undergoing immediate versus delayed processing can provide
insight into those metabolites most affected by pre-processing
storage under varying conditions. More specifically, because metabolism occurs on a very rapid time scale, some metabolite levels will
continue to change after collection even if the sample is stored on ice.
Therefore, metabolism is ideally halted or “quenched” immediately
via rapid freezing or chemical extraction, but practical considerations
involved in the collection of material from patients can sometimes
make quenching impossible.
Sequential metabolomic analyses of the same type of biological
material from a patient can explore how metabolite levels vary over
time in individuals. It is interesting that, when measured, many metabolites are found to be relatively stable. However, the extensive variability exhibited by some metabolites indicates that findings involving
those metabolites should be interpreted with caution.
Finally, the method of sample processing can affect which metabolites are extracted from the material and thus influence what is
measured.
FIGURE 489-2 Untargeted metabolomics strives to measure as much of the metabolome as possible within a given biological sample, whereas targeted metabolomics
focuses on measuring a predetermined subset of the metabolome. In untargeted metabolomics, a large number of signals corresponding to metabolites is generated, and
further investigation is often necessary to assign a particular signal to a specific metabolite. Targeted metabolomics allows investigators to definitively measure signals
that correspond to specific metabolites of interest.
METABOLOMICS TECHNOLOGIES
Metabolomics relies heavily on the intersection of instrumentation,
software, and statistical and computational approaches for measurement of metabolite levels and downstream data analysis. While the
development of new and emerging techniques to assess the metabolome is ongoing, the current, clinically applicable approaches can
be separated into two broad categories: nuclear magnetic resonance
(NMR)–based approaches and chromatography/mass spectrometry
(MS)–based approaches. Each of these two approaches has its own set
of advantages and disadvantages.
■ NUCLEAR MAGNETIC RESONANCE
NMR is a technique that, at its core, exploits intrinsic magnetic properties of atomic nuclei to generate data. Nuclei with an odd total number
of protons and neutrons (such as 1H, 13C, 15N, and 31P) have a non-zero
spin, and this spin generates a magnetic field that can interact with
externally applied electromagnetic fields. NMR places compounds
into a magnetic field that induces the smaller magnetic fields to align
with the larger one. Samples are then exposed to a perpendicular electromagnetic field; the frequency of electromagnetic radiation needed
to flip the spin of a nucleus in the exact opposite direction represents
the frequency at which an atom “resonates” and can be measured. The
resonance frequency of a given atom is affected by adjacent atoms
and is ultimately unique for a given arrangement of atoms (i.e., each
metabolite). This distribution or “spectrum” of signals is measured and
recorded in an NMR experiment.
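For readers who want the underlying relations, the standard expressions below (a supplementary sketch, not taken from the chapter) connect what is described in words above: a nucleus with gyromagnetic ratio γ in a static field B0 resonates at the Larmor frequency, and the chemical shift δ expresses the small, chemically informative offset of a metabolite's resonance from a reference compound.

```latex
% Standard NMR relations (supplementary sketch): Larmor frequency and chemical shift.
\[
  \nu_0 = \frac{\gamma}{2\pi}\,B_0,
  \qquad
  \delta\ [\mathrm{ppm}] = \frac{\nu_{\text{metabolite}} - \nu_{\text{reference}}}{\nu_{\text{reference}}} \times 10^{6}
\]
```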
With respect to clinical applications, the primary benefits of
NMR-based approaches are that they are nondestructive and can be
performed on living samples, such as patients, cells, or tissues. They
are also highly reproducible and require minimal sample preparation.
Measurements are inherently quantitative, as the signal measured
directly reflects concentration. These features ensure that multiple,
comparable measurements can be made in a given sample either at
a single point in time or across time. In addition, given that spins of
different elements require sufficiently disparate resonance-inducing
radio frequencies in order to be entirely distinguishable, multiple
elements can be assessed in a sample; this feature allows multidimensional cross-referencing of signals such as hydrogen and carbon. In an
untargeted analysis, these multidimensional data can then be used
for definitive metabolite identification, with comparison of results to
known databases where spectra for many metabolites in the human
metabolome have been systematically recorded.
Despite all these benefits, the primary challenge of NMR-based
approaches is a lack of sensitivity. Because the time required to detect
a signal rises steeply as concentration falls, assessment of less abundant
species is impossible or impractical. For example, while a typical NMR-based metabolomics analysis will return data on up to a couple of hundred metabolites at concentrations of >1 μM, the MS-based approaches
discussed below can distinguish more than 1000 metabolites at concentrations one to two orders of magnitude lower (Table 489-1).
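The sensitivity limitation can be made quantitative with the usual signal-averaging argument (a supplementary sketch, not from the chapter): signal grows linearly with concentration, while noise falls only with the square root of the number of averaged scans, so the acquisition time needed to reach a target signal-to-noise ratio grows with the inverse square of concentration.

```latex
% Supplementary sketch: signal-averaging argument for NMR sensitivity.
\[
  \mathrm{SNR} \propto c\,\sqrt{N}
  \quad\Longrightarrow\quad
  N_{\text{required}} \propto \left(\frac{\mathrm{SNR}_{\text{target}}}{c}\right)^{2}
\]
% Halving the concentration therefore roughly quadruples the acquisition time
% needed to reach the same SNR.
```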
■ CHROMATOGRAPHY/MASS SPECTROMETRY
A distinguishing feature of chromatography/MS–based approaches is
that a multistep process that destroys the material is necessary to generate a sample for analysis. In addition, each step of the sample preparation
process involves decisions that influence the metabolites measured at
the time of analysis. In general, once a sample to be analyzed is prepared,
that material is subjected to a combined chemical and temporal separation of compounds via chromatography, with the output delivered to a
device for performance of mass-based detection (technically, measurement of a mass-to-charge [m/z] ratio)—i.e., mass spectrometry. Finally,
data collected by the mass spectrometer are analyzed (Fig. 489-3).
Sample Preparation Although it is occasionally part of NMR-based metabolite detection protocols, an initial sample-preparation phase called extraction is almost uniformly required by MS-based approaches.
This technique destroys the original sample by partitioning metabolites into distinct immiscible phases, such as polar and nonpolar.
These phases are then mechanically separated and processed further
for analysis. Given the nature of this extraction process, it is critical to
determine in advance the general class of metabolites to be measured.
This information will help to determine the optimal extraction protocol for specific types of metabolites of interest and to shape further
downstream decisions regarding the chromatography/MS technique
that also influence metabolite detection. In addition, depending on
the metabolites to be analyzed and the method of separation and/
or analysis used, extracted samples sometimes are processed further
in a preparative step called derivatization: extracted metabolites are
chemically modified by the addition or substitution of distinct, known
chemical moieties that facilitate separation or detection of types of
metabolites. By changing the chemical properties of metabolites, derivatization may improve stability, solubility, or volatility or facilitate
separation from closely related compounds, enhancing measurement
of specific metabolites.

TABLE 489-1 Comparison of Nuclear Magnetic Resonance (NMR)–Based and Mass Spectrometry (MS)–Based Approaches to Metabolomic Analyses
FEATURE | NMR | MS
Reproducibility | High | Lower
Sensitivity | Low (low μM) | High (low nM)
Selectivity | Untargeted | Targeted >> untargeted
Sample preparation | Minimal | Complex
Sample measurement | Simple: single prep | Multiple preps
Metabolites per sample | 50–200 | >1000
Identification | Easy, using 1D or 2D databases | Complex; need standards and additional analyses
Quantitation | Inherently quantitative; intensity proportional to concentration | Requires standards because of varying ionization efficiency
Sample recovery | Easy, nondestructive | No
Living samples | Yes | No
Chromatography Chromatography is a ubiquitous approach used
in chemistry for the separation of complex mixtures. The mixture of
interest in a mobile phase is passed over a stationary phase such that
compounds in the mixture interact with the stationary phase and
transit through that stationary phase at different speeds, allowing their
consequent separation. Two general types of chromatography are typically used in metabolomics.
LIQUID CHROMATOGRAPHY Liquid chromatography–mass spectrometry (LC-MS) is the most commonly used approach in MS-based
metabolomics. In this case, chromatography is characterized by a
mobile phase that is a liquid and a stationary phase that is a solid. In
liquid chromatography in particular, the choice of the solid and liquid
phases can dramatically influence the types of compounds separated
for input into the mass spectrometer. In general, LC-MS metabolomics
is highly sensitive and versatile in allowing detection of a broad range
of metabolites. A downside, however, is variability in exact separation
timing, especially between different instruments; which metabolites
are measured is impacted by the chromatography used and how well
molecules are separated.
GAS CHROMATOGRAPHY Gas chromatography–mass spectrometry
(GC-MS) involves chromatography in which the mobile phase is a gas.
In contrast to LC-MS, GC-based approaches have a narrower range
of applications because only volatile metabolites that enter a gaseous
phase are separated. When combined with appropriate derivatization,
GC-MS is a robust way to detect many organic acids, including amino
acids, and molecules of low polarity, such as lipids. GC-MS is more
reproducible than LC-MS across platforms and requires less expensive instrumentation and less specialized training, but it also typically
measures a much more restricted range of metabolites in a sample than
does LC-MS.
Mass Spectrometry Once the metabolites in a sample have been
separated by chromatography, they are sent into the mass spectrometer
for analysis and measurement. The first step in this stage of the process
is to generate charged ions, as mass spectrometers measure compounds
on the basis of their m/z ratio. Charge can be imparted through various
techniques, although most commonly it is attained by either applying a
high voltage to a sample or striking it with a laser.
A number of different types of mass spectrometer can be employed
for metabolomics. Three of the most commonly available types are
discussed below.
TANDEM MASS SPECTROMETRY Tandem MS relies on three quadrupoles arranged in series (a triple-quadrupole configuration). The power of this arrangement
[Figure 489-3 workflow: extraction → derivatization → chromatography → mass spectrometry → data analysis]
FIGURE 489-3 Metabolite measurement by chromatography/mass spectrometry–based approaches involves multiple steps, and decisions made at each step influence
what is measured. First, metabolites are extracted from a biological sample in a manner that is destructive of the original sample. This process stops biochemical activity
and creates metabolite samples that can be analyzed, sometimes after a chemical derivatization step that alters a subset of metabolites in a manner that facilitates their
downstream analysis. Second, metabolites in the sample are separated via chromatography. Finally, the chromatographically separated compounds are analyzed by mass
spectrometry. Each signal detected corresponds to a metabolite’s characteristic mass per unit charge while the amplitude of that signal reflects the abundance.
lies in its specificity through two sequential mass analyses of the same
starting compound. In the first quadrupole, the “parent” or full ion
is measured before being bombarded by an inert gas in the second
quadrupole; this process fragments the compound into characteristic
smaller “daughter” ions. The third quadrupole then measures these
daughter ions.
TIME-OF-FLIGHT MASS SPECTROMETRY While there are multiple
types of time-of-flight (TOF) mass spectrometers, they all operate on
similar principles. Ions are accelerated to the same kinetic energy and then travel down a flight tube; for a given charge, lighter ions reach the detector sooner than heavier ones, so the measured flight time reveals the mass-to-charge ratio. TOF machines have high mass
accuracy and sensitivity while also acquiring data quickly.
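The flight-time relation that underlies this principle is the standard one below (a supplementary sketch, not from the chapter): ions accelerated through a potential V acquire kinetic energy zeV, so the time to traverse a drift length L scales with the square root of m/z.

```latex
% Supplementary sketch: the basic time-of-flight relation.
\[
  \tfrac{1}{2}\, m v^{2} = z e V
  \quad\Longrightarrow\quad
  t = \frac{L}{v} = L\sqrt{\frac{m}{2\,z e V}} \;\propto\; \sqrt{m/z}
\]
```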
ION TRAP MASS SPECTROMETRY Ion trap mass spectrometers, of
which the orbital trap is a subtype, offer perhaps the highest degree of
flexibility when it comes to MS-based metabolomics. In general, these
machines can select for a specific mass range of metabolites at multiple
levels, first by filtering with a single quadrupole and then by trapping
and accumulating metabolites of a particular mass or range of masses.
This accumulation can be applied to low-abundance compounds,
allowing increased sensitivity. It also allows repeated fragmentation
of metabolites (called MSn) to produce characteristic “daughter” ions,
increasing the specificity of the analysis. Given this versatility coupled
with high mass accuracy, the development of these machines is advancing rapidly; however, access to the latest versions can often be limited
by cost.
CURRENT CLINICAL APPLICATIONS
Tests to assess small molecules are ubiquitous and well established
throughout medicine. These include assays to measure select metabolites of known clinical relevance, such as glucose, lactate, and ammonia.
Of note, many standard tests assess these metabolites one at a time;
however, metabolomics can allow the assessment of many metabolites
in a sample and provide more information on metabolic state at a given
point in time. In some cases, metabolomics is used to detect molecules
for which there is not a robust single analyte test or when multiple
species measured in a sample might provide new information. Here
we will focus specifically on several applications of metabolomics techniques in current clinical practice.
■ MAGNETIC RESONANCE SPECTROSCOPY
Magnetic resonance spectroscopy (MRS) is an adaptation of magnetic
resonance imaging (MRI), a widely used technology in clinical practice. MRI, at its core, is essentially proton (1H) NMR with the resulting
data rendered spatially to generate an image. Recall that NMR is nondestructive and can be applied to living samples. MRS, then, is a capability built into almost every MRI machine. In practice, radiologists
can focus on specific volumes of interest within a patient’s imaging
and perform additional sequences to obtain an NMR spectrum that
allows identification and quantification of specific
metabolites within that volume. With this approach, a number of different
metabolites across diverse classes, including lipids, sugars, and amino
acids, can be measured at a given time.
Extensive work has correlated different biological processes with
altered levels and/or ratios of metabolites measured via MRS. One
well-established application is in the diagnosis of brain masses. More
specifically, N-acetylaspartate (NAA) is an amino acid derivative that
is abundant in neurons, whereas choline is a metabolite whose level,
as measured by MRS, correlates with cellularity and/or proliferation.
Thus, an increase in the ratio of choline to NAA (and even loss of NAA
signal entirely) correlates with cancer; tumors are characterized by increased cellularity from proliferating cells and
the concurrent exclusion of normal neurons. A different process—for
example, a brain abscess—does not result in increased choline levels
(which instead may actually decrease), but does exclude neurons,
resulting in an isolated NAA decrease. Metabolites such as lactate can
also be helpful, depending on the clinical context, in providing insight
into the metabolism of a tumor or identifying areas of early hypoxic
brain injury after a stroke. Finally, among the several amino acids that
can be measured, high levels of glutamine/glutamate can be helpful in
a patient with altered mental status as changes in these amino acids
are associated with hyperammonemia. (Glutamate serves as the central nervous system sink for ammonia, generating glutamine in the
process.)
■ NEWBORN SCREENING PROGRAMS
Newborn screening programs are used to identify diseases within the
first few days of life such that they can be treated or managed with early
intervention. Among the classes of disease targeted by newborn screening programs are many inborn errors of metabolism, which often lead
to changes in the levels of specific metabolites in blood or urine. One
of the first newborn screening programs tested for phenylketonuria,
which results from the inability to metabolize phenylalanine and causes
high levels of phenylalanine in serum and of phenylketones in urine. Since that time,
the panel used by programs throughout the United States and around
the world has expanded dramatically. The general protocol is to collect
a blood sample from infants in the first few days of life (often by heel
prick, with blood spotted onto filter paper). These samples are sent to a central lab for
analysis, which typically includes metabolomics measurements with
targeted LC–tandem MS. Specific inborn errors of metabolism are suggested by abnormal levels of a given metabolite or set of metabolites.
■ METABOLITE MEASUREMENTS IN CHILDREN
AND ADULTS
Outside the window of newborn screening, direct clinical measurement of metabolite levels is also used in pediatric and adult patients.
In these cases, clinical samples such as serum, cerebrospinal fluid, or
urine are typically subjected to targeted LC–tandem MS to measure
metabolites such as amino acids, acylcarnitines, and fatty acids. These
measurements can help diagnose milder cases of inborn errors of
metabolism that may have been missed by newborn screening. They
can also help identify secondary metabolic defects, such as those that
are related to nutritional deficiencies or are acquired in the setting of
additional pathology. For example, these measurements are useful in
determining the etiology of noncirrhotic hyperammonemia unmasked
by a catabolic stressor such as sepsis in a patient with a previously
unknown subclinical or acquired urea-cycle defect.
MS-based metabolomics is used by various athletic organizations for
detection of metabolites associated with banned substances and by the
pharmaceutical industry for assessment of levels of pharmaceuticals
and their metabolites in both blood and tissues. Such analyses can
provide key pharmacokinetic information to guide drug dosing and
illuminate toxicology. These approaches can also be useful in clinical practice. For example, chronic pain and its management remain
a challenge, and the sequelae of opiate/opioid use and abuse are of
concern to many providers, their patients, and their patients’ families.
Therefore, many electronic medical records systems strive to ensure
appropriate and consistent patient access to pain medications, while
providers may need a means to ensure that patients are adhering to
their prescribed regimens. One way to monitor drug use is to perform
targeted LC–tandem MS for detection of specific drug metabolites in
patients’ urine. This approach is more sensitive than first-generation
immunoassays and can detect a range of metabolites associated with
other drugs beyond the one prescribed. Given that the first-generation
immunoassays also often rely on confirmatory MS testing, upfront
metabolomics reduces lab turnaround time and may also reduce costs
by limiting multiple tests on the same sample.
EMERGING AND EXPERIMENTAL CLINICAL
APPLICATIONS
The current clinical applications of metabolomics are largely limited
to the indications described above. However, many ongoing efforts are
aimed at expanding the use of metabolomics for detection of biomarkers that can help with disease diagnosis or prognostication.
■ METABOLITES AS BIOMARKERS OF DISEASE
There has been increasing work in prospective human cohort studies
on the use of metabolomics, primarily MS-based approaches, to empirically identify small groups of metabolites whose altered levels are