11/1/25

 


The term predictive value has often been used as a synonym for the

posttest probability. Unfortunately, clinicians commonly misinterpret

reported predictive values as intrinsic measures of test accuracy rather

than calculated probabilities. Studies of diagnostic test performance

compound the confusion by calculating predictive values from the

same sample used to measure sensitivity and specificity. Such calculations are misleading unless the test is applied subsequently to populations with exactly the same disease prevalence. For these reasons, the

term predictive value is best avoided in favor of the more descriptive

posttest probability following a positive or a negative test result.
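To see why a reported predictive value cannot be carried over to a population with a different prevalence, the following minimal sketch applies Bayes' rule to one test at three prevalences. The function name and the sensitivity and specificity of 0.90 are illustrative assumptions, not values from any particular study.

```python
def positive_posttest_probability(sensitivity, specificity, prevalence):
    """Probability of disease given a positive result (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (sensitivity 0.90, specificity 0.90) at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    p = positive_posttest_probability(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: posttest probability {p:.1%}")
```

With test accuracy held fixed, the probability of disease after a positive result moves from about 8% to 50% to 90% as prevalence rises from 1% to 10% to 50%.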

The nomogram version of Bayes’ rule (Fig. 4-2) helps us to understand at a conceptual level how it estimates the posttest probability of

disease. In this nomogram, the impact of the diagnostic test result is

summarized by the likelihood ratio, which is defined as the ratio of

the probability of a given test result (e.g., “positive” or “negative”) in a

patient with disease to the probability of that result in a patient without

disease, thereby providing a measure of how well the test distinguishes

those with from those without disease.

The likelihood ratio for a positive test is calculated as the ratio of the

true-positive rate to the false-positive rate (or sensitivity/[1 – specificity]).

For example, a test with a sensitivity of 0.90 and a specificity of 0.90

has a likelihood ratio of 0.90/(1 – 0.90), or 9. Thus, for this hypothetical test, a “positive” result is 9 times more likely in a patient with the

disease than in a patient without it. Most tests in medicine have likelihood ratios for a positive result between 1.5 and 20. Higher values

are associated with tests that more substantially increase the posttest

likelihood of disease. A very high likelihood ratio positive (>10) usually

implies high specificity, so a positive high specificity test helps “rule

in” disease (the “SpPin” mnemonic introduced earlier). If sensitivity is

excellent but specificity is less so, the likelihood ratio positive will be

reduced substantially (e.g., with a 90% sensitivity but a 55% specificity,

the likelihood ratio positive is 2.0).
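The two worked values above can be checked with a short sketch; the function name is a hypothetical helper chosen for illustration.

```python
def likelihood_ratio_positive(sensitivity, specificity):
    """LR+ = true-positive rate / false-positive rate."""
    false_positive_rate = 1.0 - specificity
    if false_positive_rate <= 0.0:
        raise ValueError("LR+ is undefined when specificity is 1")
    return sensitivity / false_positive_rate

# Sensitivity 0.90, specificity 0.90 (the hypothetical test in the text).
print(round(likelihood_ratio_positive(0.90, 0.90), 1))  # 9.0
# Sensitivity 0.90, specificity 0.55.
print(round(likelihood_ratio_positive(0.90, 0.55), 1))  # 2.0
```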

The corresponding likelihood ratio for a negative test is the ratio of the

false-negative rate to the true-negative rate (or [1 – sensitivity]/specificity).

[Figure 4-1: ROC curves labeled Good, Fair, and No predictive value. Vertical axis: true-positive rate (0 to 1). Horizontal axis: false-positive rate (0 to 1).]

FIGURE 4-1 Each receiver operating characteristic (ROC) curve illustrates a tradeoff that occurs between improved test sensitivity (accurate detection of patients

with disease) and improved test specificity (accurate detection of patients without

disease), as the test value defining when the test turns from “negative” to “positive”

is varied. A 45° line would indicate a test with no predictive value (sensitivity =

1 – specificity, i.e., true-positive rate = false-positive rate, at every test value). The area under each ROC curve is a measure of

the information content of the test. Thus, a larger ROC area signifies increased

diagnostic accuracy.
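As a rough illustration of how an ROC curve and its area are obtained, the sketch below sweeps the cutoff over a small set of test values and integrates with the trapezoidal rule. The scores, labels, and function names are hypothetical, made up for this example.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per cutoff.

    A result counts as "positive" when score >= cutoff; labels are 1 for
    disease and 0 for no disease.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical test values for 4 patients with disease and 4 without.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(area_under_curve(roc_points(scores, labels)))  # 0.875
```

A test whose values carry no information (for example, the same value for every patient) yields only the points (0, 0) and (1, 1), which is the 45° line with an area of 0.5.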


PART 1 The Profession of Medicine

[Figure 4-2: nomogram, shown twice (left and right panels). Left scale: Pretest Probability, %. Middle scale: Likelihood Ratio. Right scale: Posttest Probability, %.]

FIGURE 4-2 Nomogram version of Bayes’ theorem used to predict the posttest probability of disease (right-hand scale)

using the pretest probability of disease (left-hand scale) and the likelihood ratio for a positive or a negative test (middle

scale). See text for information on calculation of likelihood ratios. To use, place a straightedge connecting the pretest

probability and the likelihood ratio and read off the posttest probability. The right-hand part of the figure illustrates the

value of a positive exercise treadmill test (likelihood ratio 4, green line) and a positive exercise thallium single-photon

emission CT perfusion study (likelihood ratio 9, broken yellow line) in a patient with a pretest probability of coronary

artery disease of 50%. (Adapted from Centre for Evidence-Based Medicine: Likelihood ratios. Available at http://www.cebm.net/likelihood-ratios/.)

Lower likelihood ratio negative values more substantially lower the

posttest likelihood of disease. A very low likelihood ratio negative

(falling below 0.10) usually implies high sensitivity, so a negative

high sensitivity test helps “rule out” disease (the SnNout mnemonic).

The hypothetical test considered above with a sensitivity of 0.9 and a

specificity of 0.9 would have a likelihood ratio for a negative test result

of (1 – 0.9)/0.9, or 0.11, meaning that a negative result is about one-tenth as likely in patients with disease as in those without disease

(or about 10 times more likely in those without disease than in those

with disease).
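The negative likelihood ratio for the same hypothetical test can be checked in the same way; the function name is an illustrative assumption.

```python
def likelihood_ratio_negative(sensitivity, specificity):
    """LR- = false-negative rate / true-negative rate."""
    if specificity <= 0.0:
        raise ValueError("LR- is undefined when specificity is 0")
    return (1.0 - sensitivity) / specificity

lr_negative = likelihood_ratio_negative(0.90, 0.90)
print(round(lr_negative, 2))      # 0.11
print(round(1 / lr_negative, 1))  # 9.0
```

The reciprocal, 9, is the "about 10 times more likely in those without disease" figure quoted above.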

■ APPLICATIONS TO DIAGNOSTIC TESTING IN CAD

Consider two tests commonly used in the diagnosis of CAD: an exercise treadmill test and an exercise single-photon emission CT (SPECT)

myocardial perfusion imaging test (Chap. 241). A positive treadmill

ST-segment response has an average sensitivity of ~60% and an average

specificity of ~75%, yielding a likelihood ratio positive of 2.4 (0.60/

[1 – 0.75]) (consistent with modest discriminatory ability because it

falls between 2 and 5). For a 41-year-old man with nonanginal pain and

a 10% pretest probability of CAD, the posttest probability of disease

after a positive result rises to only ~30%. For a 60-year-old woman with

typical angina and a pretest probability of CAD of 80%, a positive test

result raises the posttest probability of disease to ~95%.

In contrast, the exercise SPECT myocardial perfusion test is more accurate for diagnosis of CAD. For simplicity, assume that the finding of a

reversible exercise-induced perfusion defect has both a sensitivity and

a specificity of 90% (a bit higher than

reported), yielding a likelihood ratio for

a positive test of 9.0 (0.90/[1 – 0.90])

(consistent with intermediate discriminatory ability because it falls between

5 and 10). For the same 10% pretest

probability patient, a positive test raises

the probability of CAD to 50% (Fig.

4-2). However, despite the differences in

posttest probabilities between these two

tests (30 vs 50%), the more accurate test

may not improve diagnostic likelihood

enough to change patient management

(e.g., decision to refer to cardiac catheterization) because the more accurate

test has only moved the physician from

being fairly certain that the patient

did not have CAD to a 50:50 chance

of disease. In a patient with a pretest

probability of 80%, exercise SPECT test

raises the posttest probability to 97%

(compared with 95% for the exercise

treadmill). Again, the more accurate test

does not provide enough improvement

in posttest confidence to alter management, and neither test has improved

much on what was known from clinical

data alone.
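The posttest probabilities in this section can be reproduced with the odds form of Bayes' rule (posttest odds = pretest odds × likelihood ratio), which is what the nomogram in Fig. 4-2 encodes. The sketch below is a minimal illustration; the function name is an assumption.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Odds form of Bayes' rule: posttest odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

tests = (
    ("treadmill, LR+ 2.4 (computed in the text)", 2.4),
    ("treadmill, LR+ 4 (value shown in Fig. 4-2)", 4.0),
    ("exercise SPECT, LR+ 9", 9.0),
)
for label, lr in tests:
    for pretest in (0.10, 0.80):
        post = posttest_probability(pretest, lr)
        print(f"{label}: pretest {pretest:.0%} -> posttest {post:.0%}")
```

For exercise SPECT this gives 50% and 97%, matching the text. For the treadmill test, an LR+ of 2.4 gives about 21% and 91%; the ~30% and ~95% quoted above correspond to the likelihood ratio of 4 used for the treadmill test in Fig. 4-2 (about 31% and 94%).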

In general, positive results with an

accurate test (e.g., likelihood ratio for

a positive test of 10) when the pretest

probability is low (e.g., 20%) do not

move the posttest probability to a range

high enough to rule in disease (e.g.,

80%). In screening situations, pretest

probabilities are often particularly low

because patients are asymptomatic. In

such cases, specificity becomes especially important. For example, in screening first-time female blood donors

without risk factors for HIV, a positive

test raised the likelihood of HIV to only

67% despite a specificity of 99.995%

because the prevalence was 0.01%. Conversely, with a high pretest

probability, a negative test may not rule out disease adequately if it is

not sufficiently sensitive. Thus, the largest change in diagnostic likelihood following a test result occurs when the clinician is most uncertain

(i.e., pretest probability between 30 and 70%). For example, in patients

with a pretest probability for CAD of 50%, a positive exercise treadmill test moves the posttest probability to 80% and a positive exercise

SPECT perfusion test moves it to 90% (Fig. 4-2).
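The screening example can be checked with the same odds-form calculation. The text does not state the sensitivity of the HIV test, so the sketch below assumes 100%; the function name is likewise an illustrative assumption.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Odds form of Bayes' rule: posttest odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Screening example from the text: prevalence 0.01%, specificity 99.995%.
# Sensitivity is not stated in the text; 100% is assumed here.
sensitivity, specificity, prevalence = 1.0, 0.99995, 0.0001
lr_positive = sensitivity / (1.0 - specificity)
hiv_posttest = posttest_probability(prevalence, lr_positive)
print(f"{hiv_posttest:.0%}")  # 67%

# Low pretest probability (20%) with an accurate test (LR+ 10).
print(f"{posttest_probability(0.20, 10.0):.0%}")  # 71%
```

Even a likelihood ratio of roughly 20,000 leaves one positive result in three false when only 1 in 10,000 people screened has the disease.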

As presented above, Bayes’ rule employs a number of important

simplifications that should be considered. First, few tests provide only

“positive” or “negative” results. Many tests have multidimensional outcomes (e.g., extent of ST-segment depression, exercise duration, and

exercise-induced symptoms with exercise testing). Although Bayes’

theorem can be adapted to this more detailed test result format, it

is computationally more complex to do so. Similarly, when multiple

sequential tests are performed, the posttest probability may be used

as the pretest probability to interpret the second test. However, this

simplification assumes conditional independence—that is, that the

results of the first test do not affect the likelihood of the second test

result—and this is often not true.
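Sequential updating under the conditional independence assumption can be sketched as follows. The function names are hypothetical, and the likelihood ratios of 4 and 9 are the values shown for the two tests in Fig. 4-2.

```python
def update(probability, likelihood_ratio):
    """One Bayes update in odds form."""
    odds = probability / (1.0 - probability) * likelihood_ratio
    return odds / (1.0 + odds)

def sequential_posttest(pretest_probability, likelihood_ratios):
    """Chain updates, using each posttest probability as the next pretest.

    Valid only if the tests are conditionally independent given disease
    status; correlated tests make this estimate too extreme.
    """
    probability = pretest_probability
    for lr in likelihood_ratios:
        probability = update(probability, lr)
    return probability

# 10% pretest, positive treadmill (LR+ 4) followed by positive SPECT (LR+ 9).
print(round(sequential_posttest(0.10, [4.0, 9.0]), 2))  # 0.8
```

Because both tests detect exercise-induced ischemia, their results are likely to be correlated, so the chained value should be read as an upper bound on what the second test adds rather than as a calibrated estimate.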

Finally, many texts assert that sensitivity and specificity are

prevalence-independent parameters of test accuracy. This statistically

useful assumption, however, is often incorrect. A treadmill exercise

test, for example, has a sensitivity of ~30% in a population of patients

with one-vessel CAD, whereas its sensitivity in patients with severe

three-vessel CAD approaches 80%. Thus, the best estimate of sensitivity


CHAPTER 4 Decision-Making in Clinical Medicine

to use in a particular decision may vary, depending on the severity of

disease in the local population. A hospitalized, symptomatic, or referral

population typically has a higher prevalence of disease and, in particular, a higher prevalence of more advanced disease than does an outpatient population. Consequently, test sensitivity will likely be higher in

hospitalized patients and test specificity higher in outpatients.

■ STATISTICAL PREDICTION MODELS

Bayes’ rule, when used as presented above, is useful in studying diagnostic testing concepts, but predictions based on multivariable statistical models can more accurately address these more complex problems

by simultaneously accounting for additional relevant patient characteristics. In particular, these models explicitly account for multiple, even

possibly overlapping, pieces of patient-specific information and assign

a relative weight to each on the basis of its unique independent contribution to the prediction in question. For example, a logistic regression

model to predict the probability of CAD ideally considers all the relevant independent factors from the clinical examination and diagnostic

testing and their relative importance instead of the limited data that

clinicians can manage in their heads or with Bayes’ rule. However,

despite this strength, prediction models are usually too complex computationally to use without a calculator or computer. Guideline-driven

treatment recommendations based on statistical prediction models

available online, e.g., the American College of Cardiology/American

Heart Association risk calculator for primary prevention with statins

and the CHA₂DS₂-VASc calculator for anticoagulation for atrial fibrillation, have generated more widespread usage. It remains uncertain when electronic health

records (EHRs) will provide sufficient platform support to allow for

routine use of predictive models in clinical practice and increase their

impact on clinical encounters and outcomes.
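The way a logistic regression model weights several patient characteristics at once can be sketched as below. Every coefficient and feature name here is hypothetical, invented purely to show the mechanics; this is not a validated model and must not be used clinically.

```python
import math

# Hypothetical coefficients for illustration only; NOT a validated model.
INTERCEPT = -4.0
WEIGHTS = {
    "age_decades": 0.5,
    "male": 0.8,
    "typical_angina": 1.5,
    "positive_treadmill": 1.0,
}

def predicted_probability(features):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum of w*x)))."""
    linear = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical patient: 60-year-old woman, typical angina, positive treadmill test.
patient = {"age_decades": 6, "male": 0, "typical_angina": 1, "positive_treadmill": 1}
print(round(predicted_probability(patient), 2))  # 0.82
```

Each weight is the feature's contribution to the log-odds after accounting for the others, which is how such a model handles overlapping pieces of information that a single likelihood ratio cannot.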

One reason for limited clinical use is that, to date, only a handful

of prediction models have been validated sufficiently (for example,

Wells criteria for pulmonary embolism; Table 4-2). The importance

of independent validation in a population separate from the one used

to develop the model cannot be overstated. An unvalidated prediction

model should be viewed with the skepticism appropriate for any new

drug or medical device that has not had rigorous clinical trial testing.

When statistical survival models in cancer and heart disease have

been compared directly with clinicians’ predictions, the survival models have been found to be more consistent, as would be expected, but

not always more accurate. On the other hand, comparison of clinicians

with websites and apps that generate lists of possible diagnoses to

help patients with self-diagnosis found that physicians outperformed

the currently available programs. For students and less-experienced

clinicians, the biggest value of diagnostic decision support may be in

extending diagnostic possibilities and triggering “rational override,”

but its impact on knowledge, information-seeking, and problem-solving needs additional research.

FORMAL DECISION SUPPORT TOOLS

■ DECISION SUPPORT SYSTEMS

Over the past 50 years, many attempts have been made to develop

computer systems to aid clinical decision-making and patient management. Conceptually, computers offer several levels of potentially

useful support for clinicians. At the most basic level, they provide

ready access to vast reservoirs of information, which may, however, be

quite difficult to sort through to find what is needed. At higher levels,

computers can support care management decisions by making accurate

predictions of outcome, or can simulate the whole decision process,

and provide algorithmic guidance. Computer-based predictions using

Bayesian or statistical regression models inform a clinical decision but

do not actually reach a “conclusion” or “recommendation.” Machine

learning methods are being applied to pattern recognition tasks such

as the examination of skin lesions and the interpretation of x-rays.

Artificial intelligence (AI) systems attempt to simulate or replace

human reasoning with a computer-based analogue. Natural language

processing allows the system to access and process large amounts of

data, both from the EHR and from the medical literature. To date, such

approaches have achieved only limited success. The most prominent

example, IBM’s Watson program, introduced publicly in 2011, has

yet to produce persuasive evidence of clinical decision support utility.

Reminder or protocol-directed systems do not make predictions but

use existing algorithms, such as guidelines or appropriate utilization criteria, to direct clinical practice. In general, however, decision

support systems have so far had little impact on practice. Reminder

systems built into EHRs have shown the most promise, particularly in

correcting drug dosing and promoting adherence to guidelines. Checklists may also help avoid or reduce errors.

■ DECISION ANALYSIS

Compared with the decision support methods discussed earlier,

decision analysis represents a normative prescriptive approach to

decision-making in the face of uncertainty. Its principal application

is in complex decisions. For example, public health policy decisions

often involve trade-offs in length versus quality of life, benefits versus

resource use, population versus individual health, and uncertainty

regarding efficacy, effectiveness, and adverse events as well as values or

preferences regarding mortality and morbidity outcomes.

One recent analysis using this approach involved the optimal

screening strategy for breast cancer, which has remained controversial,

in part because a randomized controlled trial to determine when to

begin screening and how often to repeat screening mammography is

impractical. In 2016, the National Cancer Institute–sponsored Cancer

Intervention and Surveillance Network (CISNET) examined eight

strategies differing by whether to initiate mammography screening at

age 40, 45, or 50 years and whether to screen annually, biennially, or

annually for women in their forties and biennially thereafter (hybrid).

The six simulation models found biennial strategies to be the most

efficient for average-risk women. Biennial screening for 1000 women

from age 50–74 years versus no screening avoided seven breast cancer

deaths. Screening annually from age 40–74 years avoided three additional deaths but required 20,000 additional mammograms and yielded

1988 more false-positive results. Factors that influenced the results

included patients with a 2–4-fold higher risk for developing breast

cancer in whom annual screening from age 40–74 years yielded similar

benefits as biennial screening from age 50–74. For average-risk patients

with moderate or severe comorbidities, screening could be stopped

earlier, at age 66–68 years.
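The trade-off between the two strategies can be expressed per additional death avoided, using only the figures quoted above (per 1000 women screened, as the text implies). The function name is an illustrative assumption.

```python
def per_death_avoided(extra_events, extra_deaths_avoided):
    """Additional events incurred for each additional death avoided."""
    return extra_events / extra_deaths_avoided

# Annual screening from age 40-74 versus biennial screening from age 50-74.
extra_deaths_avoided = 3
print(round(per_death_avoided(20_000, extra_deaths_avoided)))  # 6667 mammograms
print(round(per_death_avoided(1_988, extra_deaths_avoided)))   # 663 false positives
```

Roughly 6700 additional mammograms and 660 additional false-positive results accompany each additional death avoided, which is the kind of ratio a decision analysis puts before policymakers.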

This analysis involved six models that reproduced epidemiologic

trends and a screening trial result, accounted for digital technology and

treatment advances, and considered quality of life, risk factors, breast

density, and comorbidity. It provided novel insights into a public health

problem in the absence of a randomized clinical trial and helped weigh

the pros and cons of such a health policy recommendation. Although

such models have been developed for selected clinical problems, their

benefit and application to individual real-time clinical management

has yet to be demonstrated.

TABLE 4-2 Wells Clinical Prediction Rule for Pulmonary Embolism (PE)

CLINICAL FEATURE                                             POINTS
Clinical signs of deep-vein thrombosis                       3
Alternative diagnosis is less likely than PE                 3
Heart rate >100 beats/min                                    1.5
Immobilization ≥3 days or surgery in previous 4 weeks        1.5
History of deep-vein thrombosis or pulmonary embolism        1.5
Hemoptysis                                                   1
Malignancy (with treatment within 6 months) or palliative    1

INTERPRETATION
Score >6.0                                                   High
Score 2.0–6.0                                                Intermediate
Score <2.0                                                   Low
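The scoring in Table 4-2 can be sketched as a small lookup. The point values and cutoffs come from the table; the dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Point values from Table 4-2; key names are illustrative.
WELLS_POINTS = {
    "clinical_signs_of_dvt": 3.0,
    "alternative_diagnosis_less_likely_than_pe": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "history_of_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}

def wells_score(findings):
    """Sum the points for the clinical features that are present."""
    return sum(WELLS_POINTS[name] for name in findings)

def wells_category(score):
    """Map a score to the interpretation bands in Table 4-2."""
    if score > 6.0:
        return "High"
    if score >= 2.0:
        return "Intermediate"
    return "Low"

# Hypothetical patient with tachycardia and hemoptysis only.
score = wells_score(["heart_rate_over_100", "hemoptysis"])
print(score, wells_category(score))  # 2.5 Intermediate
```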



DIAGNOSIS AS AN ELEMENT OF QUALITY OF CARE

High-quality medical care begins with accurate diagnosis. The incidence of diagnostic errors has been estimated by a variety of methods

including postmortem examinations, medical record reviews, and

medical malpractice claims, with each yielding complementary but

different estimates of this quality-of-care and patient-safety problem. In the

past, diagnostic errors tended to be viewed as a failure of individual

clinicians. The modern view is that they mostly reflect deficiencies in the system of care. Current estimates suggest that nearly everyone will experience at least one diagnostic error in their lifetime, leading to mortality, morbidity, unnecessary tests and procedures, costs, and anxiety.

Solutions to the “diagnostic errors as a system of care” problem

have focused on system-level approaches, such as decision support

and other tools integrated into EHRs. The use of checklists has been

proposed as a means of reducing some of the cognitive errors discussed

earlier in the chapter, such as premature closure. While checklists have

been shown to be useful in certain medical contexts, such as operating

rooms and intensive care units, their value in preventing diagnostic

errors that lead to patient adverse events remains to be shown.

EVIDENCE-BASED MEDICINE

Clinical medicine is defined traditionally as a practice combining medical knowledge (including scientific evidence), intuition, and judgment

in the care of patients (Chap. 1). Evidence-based medicine (EBM)

updates this construct by placing much greater emphasis on the processes by which clinicians gain knowledge of the most up-to-date and

relevant clinical research to determine for themselves whether medical

interventions alter the disease course and improve the length or quality

of life. The phrase “evidence-based medicine” is now used so often and

in so many different contexts that many practitioners are unaware of

its original meaning. The intention of the EBM program, as described

in the early 1990s by its founding proponents at McMaster University,

becomes clearer through an examination of its four key steps:

1. Formulating the management question to be answered

2. Searching the literature and online databases for applicable research

data

3. Appraising the evidence gathered with regard to its validity and

relevance

4. Integrating this appraisal with knowledge about the unique aspects

of the patient (including the patient’s preferences about the possible

outcomes)

The process of searching the world’s research literature and appraising the quality and relevance of studies can be time-consuming and

requires skills and training that most clinicians do not possess. In a

busy clinical practice, the work required is also logistically not feasible.

This has led to a focus on finding recent systematic overviews of the

problem in question as a useful shortcut in the EBM process. Systematic reviews are regarded by some as the highest level of evidence in the

EBM hierarchy because they are intended to comprehensively summarize the available evidence on a particular topic. To avoid the potential

biases found in narrative review articles, predefined reproducible

explicit search strategies and inclusion and exclusion criteria seek to

find all of the relevant scientific research and grade its quality. The prototype for this kind of resource is the Cochrane Database of Systematic

Reviews. When appropriate, a meta-analysis is used to quantitatively

summarize the systematic review findings (discussed further below).

Unfortunately, systematic reviews are not uniformly the acme of

the EBM process they were initially envisioned to be. In select circumstances, they can provide a much clearer picture of the state of

the evidence than is available from any individual clinical report, but

their value is less clear when only a few trials are available, when trials

and observational studies are mixed, or when the evidence base is only

observational. They cannot compensate for deficiencies in the underlying research available, and many are created without the requisite

clinical insights. The medical literature is now flooded with systematic

reviews of varying quality and clinical utility. The peer review system

has, unfortunately, not proved to be an effective arbiter of quality of

these papers. Therefore, systematic reviews should be used with circumspection in conjunction with selective reading of some of the best

empirical studies.

■ SOURCES OF EVIDENCE: CLINICAL TRIALS AND REGISTRIES

The notion of learning from observation of patients is as old as medicine itself. Over the past 50 years, physicians’ understanding of how

best to turn raw observation into useful evidence has evolved considerably. Medicine has received a hard refresher lesson in this process

from the COVID-19 pandemic. Starting in the spring of 2020, case reports,

personal and institutional anecdotal experience, and small single-center case series started appearing in the peer-reviewed literature

and within months turned into a flood of confusing and often contradictory evidence. Observational reports of treatments for COVID-19

fueled the confusion. Despite >40,000 publications appearing in the

first 7 months of the pandemic, an enormous amount of uncertainty

around prevention, diagnosis, treatment, and prognosis of the disease remained. Many of the early 2020 publications were either small

observational series or reviews of published series, neither of which

can resolve the key uncertainties clinicians need to address in caring

for these patients. These small observational studies often have substantial limitations in validity and generalizability, and although they

may generate important hypotheses or be the first reports of adverse

events or therapeutic benefit, they have no role in formulating modern

standards of practice. The major tools used to develop reliable evidence

consist of randomized clinical trials supplemented strategically by large

(high-quality) observational registries. A registry or database typically

is focused on a disease or syndrome (e.g., different types of cancer,

acute or chronic CAD, pacemaker capture, or chronic heart failure), a

clinical procedure (e.g., bone marrow transplantation, coronary revascularization), or an administrative process (e.g., claims data used for

billing and reimbursement).

By definition, in observational data, the investigator does not control patient care. Carefully collected prospective observational data,

however, can at times achieve a level of evidence quality approaching

that of major clinical trial data. At the other end of the spectrum, data

collected retrospectively (e.g., chart review) are limited in form and

content to what previous observers recorded and may not include the

specific research data being sought (e.g., claims data). Advantages of

observational data include the inclusion of a broader population as

encountered in practice than is typically represented in clinical trials

because of their restrictive inclusion and exclusion criteria. In addition,

observational data provide primary evidence for research questions

when a randomized trial cannot be performed. For example, it would

be difficult to randomize patients to test diagnostic or therapeutic

strategies that are unproven but widely accepted in practice, and it

would be unethical to randomize based on sex, racial/ethnic group,

socioeconomic status, or country of residence or to randomize patients

to a potentially harmful intervention, such as smoking or deliberately

overeating to develop obesity.

A well-done prospective observational study of a particular management strategy differs from a well-done randomized clinical trial most

importantly by its lack of protection from treatment selection bias.

The use of observational data to compare diagnostic or therapeutic

strategies assumes that sufficient uncertainty and heterogeneity exist

in clinical practice to ensure that similar patients will be managed

differently by diverse physicians. In short, the analysis assumes that a

sufficient element of randomness (in the sense of disorder rather than

in the formal statistical sense) exists in clinical management. In such

cases, statistical models attempt to adjust for important imbalances

to “level the playing field” so that a fair comparison among treatment

options can be made. When management is clearly not random (e.g.,

all eligible left main CAD patients are referred for coronary bypass

surgery), the problem may be too confounded (biased) for statistical

correction, and observational data may not provide reliable evidence.

