19

At the time of writing, it's quite common to see headlines about so-and-so having tested positive for Covid-19, e.g. "Canadian PM Justin Trudeau's wife, Sophie, tests positive for coronavirus, officials say".

However, these articles don't usually say how reliable the testing is. I couldn't find any recent information on it via Google, either (there are some results, e.g. this, but they are old).

How accurate are the tests? What are the odds that Sophie Trudeau's results are a false positive, or that Justin Trudeau's negative result is a false negative?

Allure
  • 2
    "What are the odds that Sophie Trudeau's results are a false positive" This is a nearly impossible question to answer, because it depends on her specific circumstances. I think what you want to know are the sensitivity and specificity of available COVID-19 tests. I also am interested in these statistics. – Him Mar 13 '20 at 23:39
  • 1
    @Scott: Allure probably asks for the positive predictive value rather than only sensitivity and specificity - but you are right that the required incidence for the "people like Trudeau's wife" cohort may not be known with sufficient precision and accuracy to allow a meaningful calculation - particularly not with the current dynamics of the situation. (I haven't seen sens & spec for the current tests, neither, and am curious, too) – cbeleites unhappy with SX Mar 14 '20 at 00:16
  • Related: https://medicalsciences.stackexchange.com/q/21355/7951 – Chris Rogers Mar 14 '20 at 11:38
  • @cbeleitesunhappywithSX, you are correct that Sophie's and Justin's results are probably better interpreted in terms of ppv and npv than sensitivity and specificity. I think with the sensitivity, specificity and ppv you could recover the full confusion matrix modulo sample size, which would really be the best thing to have... – Him Mar 14 '20 at 18:30
  • @Scott: while I did not find validation results for the tests, I found emergency validation specifications of the FDA that allow "worst-test-passing" calculations. – cbeleites unhappy with SX Mar 14 '20 at 22:57
  • 2
  • Does "How accurate are coronavirus tests?" relate to the test of the RT-PCR method or to every type of COVID-19 test? – gotwo May 23 '20 at 13:52

3 Answers

15

Short answer: Sophie Trudeau's positive test may still leave odds of 3 : 1 against her having contracted Covid-19, but the odds could also lie much further towards her having Covid-19.
Justin Trudeau's negative test almost certainly means he was negative when tested. Of course, should Sophie be positive, that may have changed by now.


Update: I found a web page of the FDA listing tests that have this Emergency Use Approval. Each of them has manufacturer instructions that list their test results towards the end. Some of the submitted test results use ≈100 negative samples in the clinical evaluation. But there is also one that has only 13 positive cases (all of which were correctly found) in their test - but that one got the emergency approval already early in February.
The latest 2 (by Thermo and Roche) used 60/60 and 50/100 cases for their clinical evaluation (all results correct). That puts the lower end of the confidence intervals for sensitivity and specificity at 94 - 97 %. With the lower end of the c.i. we'd thus have LR+ >≈ 20 or 17 (expected values would be 61 or 51, respectively) and LR- <≈ 1/20 or 1/32 (expected 1/61 or 1/100).


Long version: Let's do some number juggling and see whether we can extract something useful. The following calculations are based on a somewhat worst-case scenario: the FDA emergency validation guidelines specify outcomes that such a new test in the current emergency situation must meet, and I calculate from the low end of the performance that could be expected to meet these criteria under somewhat unlucky circumstances.

So we know minimum performance requirements, but I do not know [yet] how good (= how much better than minimum performance) the tests are.

Sensitivity and Specificity

The starting point would be sensitivity and specificity of the respective test. I haven't found any published data on these, but the FDA has a Policy for Diagnostics Testing for Coronavirus Disease-2019 that says how labs that develop a test should do an "emergency validation". They can then get Emergency Use Approval.

  • Sensitivity tells us: of all truly Covid-19-positive specimens, which percentage is correctly recognized as positive by the test; I'm going to use 87.5 % (see below)
  • Specificity tells us: of all specimens that are truly negative for Covid-19, which percentage is correctly recognized as negative by the test.

Sensitivity and specificity can be measured by standardized protocols, and they characterize the performance of the test.

From a patient's or doctor's point of view, however, they are not very useful numbers as they (we) need the answer to the inverse questions:

Predictive values

  • positive predictive value PPV: given that the test yielded positive, what is the probability that the patient truly has the virus?
  • negative predictive value NPV: given that the test yielded negative, what is the probability that the patient truly does not have the virus?

Predictive values can be calculated from sensitivity and specificity together with the prevalence (or, as we are talking about newly infected patients, the incidence) of virus among the tested population. Here's how they go for the assumed sensitivity and specificity:

predictive values
red for positive test outcomes, dark green for negative test outcomes. posterior probability = after the test said something, what is the probability that the test result is correct = NPV for negative test result, PPV for positive test result.

Good estimates of prevalence are often difficult to obtain, because we need the prevalence among those who are tested and that is (and should be) quite different from the prevalence of the disease in the general population.

For some countries such as Italy, we have numbers of performed tests (Wiki page giving numbers for more countries): as of today (March 14th), they ran 109170 tests and have a total of 21157 cases. Not all tests are for initial diagnosis (AFAIK, a patient is considered cured only after a bunch of tests are negative), but as most cases are still "fresh", we may take the ratio positive cases : tests run as one surrogate for the prevalence in the tested population. This would be around 20 % for Italy.

From the diagram we can then read for a new patient being tested the first time:

  • if the test outcome is positive, chances are ≈ 75 % (or greater, as I'm calculating with the lower end of the possible range from the emergency validation) that the patient really has Covid-19.
    So, up to 25 % false positives.
  • if the test outcome is negative, chances are ≈ 95 % (or greater...) that the patient really does not have Covid-19.
    So, up to 5 % false negatives.

Canada currently reports 25 positive out of a total of 796 tests, so about 3 % prevalence in the tested population.

For the USA, the CDC reports case numbers and test numbers, currently 1629 cases with (3995 + 15749) tests - but it is not entirely clear to me whether the populations completely coincide. Anyways, I'll use 8 % as guesstimate for prevalence.
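The surrogate used above (positive cases divided by tests run) is plain arithmetic. A minimal sketch with the counts quoted in this answer; the function name is purely illustrative, and, as noted, it is not certain that cases and tests refer to exactly the same populations:

```python
# Positive cases and tests run, as quoted in the answer (mid-March 2020).
counts = {
    "Italy": (21157, 109170),
    "Canada": (25, 796),
    "USA": (1629, 3995 + 15749),
}


def positive_fraction(cases, tests):
    """Share of tests with a positive result: a rough surrogate for the
    prevalence among the people who were tested (not the general population)."""
    return cases / tests


for country, (cases, tests) in counts.items():
    print(f"{country}: {positive_fraction(cases, tests):.1%}")
```

This gives roughly 19 %, 3 % and 8 % for Italy, Canada and the USA, which are the rounded prevalences used here.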

| prevalence |  PPV |    NPV | Country/Population | 
+------------+------+--------+--------------------+
| 3 %        | 26 % | 99.6 % | Canada             |
| 8-9 % (?)  | 50 % | 98.8 % | USA, Germany       |
| 20 %       | 75 % | 95   % | Italy              |
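The predictive values in the table follow from Bayes' rule. Below is a minimal sketch, assuming a sensitivity of 87.5 % (the value stated above) and a specificity of 92 % (an assumption on my part: it is the lower credible bound derived further down, since the specificity used for the table is not stated explicitly). The function name is illustrative only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' rule for a given prevalence among the tested."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.875  # value the answer says it uses
SPECIFICITY = 0.92   # assumption: lower credible bound quoted later in the answer

for prevalence in (0.03, 0.08, 0.20):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
```

The results should land close to the tabulated values, though not necessarily identical to them, as the exact inputs and rounding behind the table are not given.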

Now that is the population of "everyone who is tested". At least for Germany (March 14th), but probably also for Canada and the US, tests are done on people who show symptoms and on people who are known to have had contact with coronavirus cases: if they are found to be positive, they are sent home to quarantine and wait to see whether they actually caught the disease. But we may still say that the "people who are tested" population has subpopulations with and without symptoms.
So if there are further points of suspicion, say, the patient does cough, we'd say they belong to a high-risk subpopulation with a higher prevalence of Covid-19 than the overall prevalence among the tested, so the chance of a false positive would be lower for them.

This will change as tests are performed only on people who show symptoms (by now, March 16th, everyone is already told to stay at home in Germany and self-isolate as well as possible - the test result will not change this in any way).

Handy for quick calculation: likelihood ratio positive (LR+) and negative (LR-)

If we express the prevalence as odds (1 : 99 instead of 1 %), LR+ and LR- allow easy back-of-the-envelope calculations.

LR+ = sensitivity / (1 - specificity) ≈ 11 or better for our test and LR- = (1 - sensitivity) / specificity ≈ 0.05 = 1/20 or better for our test.

They are independent of the prevalence and they tell us how the odds change due to the test: the post-test odds (which correspond to the predictive values) are the pre-test odds multiplied by the likelihood ratio. In that sense, they tell us how much information we gain from the test (for positive and negative outcomes).

So, Canada reported odds of 25 positive : 771 negative tests. So for Sophie Trudeau, the odds went from 25 : 771 (or 3 %) to 25 : 771 * 11 = 273 : 771 ≈ 1 : 3 or 25 %. For Justin Trudeau 25 : 771 / 20 = 25 : 15420 ≈ 1 : 617 or 0.16 % of being Covid-19 positive (slight discrepancy to above due to rounding).
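The odds update for the two Trudeau results can be sketched as follows, taking the rounded likelihood ratios quoted above (LR+ = 11, LR- = 1/20) as given. The function names are illustrative only. With the rounded LR+ the intermediate odds come out as 275 : 771 rather than 273 : 771, which does not change the conclusion.

```python
from fractions import Fraction


def post_test_odds(pre_test_odds, likelihood_ratio):
    """Post-test odds = pre-test odds multiplied by the likelihood ratio."""
    return pre_test_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds (positive : negative) into a probability of being positive."""
    return odds / (1 + odds)


LR_POS = Fraction(11)     # rounded value taken from the answer
LR_NEG = Fraction(1, 20)  # rounded value taken from the answer

pre_test = Fraction(25, 771)  # Canada: 25 positive : 771 negative tests

after_positive = post_test_odds(pre_test, LR_POS)
after_negative = post_test_odds(pre_test, LR_NEG)

print(f"after positive test: {float(odds_to_probability(after_positive)):.1%}")
print(f"after negative test: {float(odds_to_probability(after_negative)):.2%}")
```

Working in odds keeps the update a single multiplication; the conversion to a probability is only needed at the end.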

On the other hand, if we start testing more and more people for whom the risk of actually having contracted the virus is lower and lower, we soon won't be able to draw meaningful conclusions from the test results any more: I'm in Germany, where the federal info page currently lists 3800 confirmed cases. Even if we assume that there is a huge dark figure and in reality 20 x as many people are infected*, that would be a prevalence in the general population of 0.1 %, i.e. odds of about 1 infected : 1000 non-infected. If someone who does not belong to any particular risk group were tested positive, their odds would increase by a factor of 11, i.e. to roughly 1 : 90. A negative test result would decrease the odds by a factor of 20, to about 1 : 20000. However, both results are of no practical use as they don't change the situation: the pre-test situation is "very unlikely to be a Covid-19 case", and the post-test situation is still "very unlikely to be a Covid-19 case".

This is why, even if there were testing capacities for the whole population, tests don't make sense unless we know there are some risk factors such as some kind of respiratory disease or contact with someone who is known to have the virus. And why it is sensible to say that under slightly suspicious circumstances (= low, but not extremely low prevalence), one should self-quarantine/avoid contact but there's no point in testing [yet].


Details on how I estimate sensitivity and specificity

  • The lab first determines the limit of detection (LoD) for virus RNA. The LoD here is the lowest concentration at which 19 out of 20 replicate tests are positive (that would be 95 % sensitivity, though only for samples spiked in the lab, albeit in a relevant and as difficult as possible matrix such as sputum).
  • Next, [leftover] clinical samples are used:

    • As there may not be any positive samples available, the lab can spike negative samples with virus RNA; a minimum of 20 samples are spiked within a range of 1 - 2 LoD plus 10 samples to cover the remaining clinical range.
      Of these, at most 1 is allowed to have negative outcome (and that must be one in the lower concentration range). Thus, 29 positive out of 30 true positive.

    • Also 30 non-reactive specimens are tested. The policy doesn't say a number of false positives that is permissible, so I'm going to assume that must be zero.

  • The lab can then start with real patient samples. The first 5 positive and the first 5 negative cases must be confirmed by an approved test (and must all match).

Pooling the 2 "rounds" of testing clinical samples, we have:

  • 34 (or 35) positive tests out of 35 truly positive specimens -> sensitivity at least 97 % with 95 % credible interval 87.5 - 100 %,
  • specificity 100 % with 95 % credible interval 92 - 100 %
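How the credible intervals were computed is not stated. One reading that reproduces both quoted lower ends is a uniform Beta(1, 1) prior with a one-sided 95 % lower bound; that reading is my assumption, and the sketch below (standard library only, illustrative function names) simply checks it.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b,
    using the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def lower_credible_bound(successes, trials, level=0.95):
    """One-sided lower credible bound for a proportion, assuming a uniform
    Beta(1, 1) prior, i.e. a Beta(successes + 1, failures + 1) posterior."""
    a = successes + 1
    b = trials - successes + 1
    target = 1 - level
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotonically increasing CDF
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"sensitivity, 34/35 correct: >= {lower_credible_bound(34, 35):.1%}")
print(f"specificity, 35/35 correct: >= {lower_credible_bound(35, 35):.1%}")
```

Under this assumption the bounds come out at about 87.5 % and 92 %, matching the lower ends given above.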

  • There are additional checks required but they don't help us here.

  • Sampling errors are not included in this validation.
    I'm not a clinical chemist, but I'm an analytical chemist, and in general analytical chemistry sampling can easily be the dominating source of error. In that case, the above numbers would be useless.

  • Particularly the sensitivity may drop over time as the virus mutates.

  • To deal with this, in some cases multiple tests are done (I read that in a newspaper article which I can't find at the moment).
    When doing multiple tests, tests by different providers are used if possible: as they are developed from different virus samples, this gives a better coverage for mutations in the virus than doing the same test in replicate.


* Very handwavy scenario I derived from the development of case numbers in China after their quarantine/shutdown on Jan 26, assuming an incubation period of 2 weeks and that no one got infected after quarantine started.


Update Jun 3rd, 2020:

  • Results of a round robin test/ring trial in April in Germany on detection of SARS-CoV2 RNA (retrieved from https://www.instand-ev.de/en/eqas-online/service-for-eqa-tests.html#rvp//340/2003/)

    • 7 samples: 4 positive for SARS-CoV2 covering a factor of 1000 in concentration of virus RNA; 3 negative for SARS-CoV2: 1 negative for any coronavirus, 1 positive for HCoV OC43 but not SARS-CoV2 and 1 positive for HCoV 229E but not SARS-CoV2.
    • 463 laboratories participated, submitting a total of 983 results per sample (1 lab can submit results for several methods)

    • The final evaluation comprises only 4 out of the 7 samples: while the ring trial was still ongoing, the ground truth and a preliminary evaluation were published for 3 samples in order to allow labs to use the results to adjust their procedure.

    • There was a problem with 24 labs (59 lab×methods) where results for two samples (the positive 1:100000 dilution and the negative with HCoV 229E) were interchanged.

    I'll disregard these two samples for now.

    • Specificity was 969/983 = 98.6 % for the negative and 961/983 = 97.8% for the negative with HCoV OC43 (sample was unblinded).
      The report notes that it is not clear whether the false positives are a result of the specificity of the method itself or whether they stem from contamination of the test samples with SARS-CoV2 in the labs.

    • Sensitivity was 980/983 = 99.7 % for the highest (sample was unblinded) and 914/983 = 93.0% for the lowest virus RNA concentration.

    Such a ring trial measures the performance of the diagnostic method and the lab handling of the samples together.

  • Wenling Wang et al.: Detection of SARS-CoV-2 in Different Types of Clinical Specimens, JAMA, 2020 May 12; 323(18): 1843–1844, reports on the sensitivity of RT-PCR tests for various sampling procedures/locations, based on 1070 specimens collected from 205 confirmed Covid-19 patients.
    They report 5 of 8 nasal swabs (95 % c.i. 30 - 90 %) to be positive and 126 of 398 pharyngeal swabs (95 % c.i. 27 - 36 %). This other study, while not primarily about PCR sensitivity, reports about 60 %.

    The paper has very few details, e.g. on the point in time when samples were taken.

    This study measures the sensitivity of method, lab handling and sampling procedure together.

    Experts in radio interviews I heard over the last months were talking about sensitivity in the order of magnitude of 70 - 80 % due to difficult sampling (requiring trained staff, preferably taking 2 swabs [but swabs were scarce at some point], the correct procedure for nasopharyngeal sampling being painful, the correct point in time wrt. the course of the disease).

    In any case, the overall sensitivity seems to be dominated by sampling error rather than lab handling or the sensitivity of the RT-PCR.

So my initial guesstimates were too optimistic for overall sensitivity, but specificity looks better than the initial worst-case scenario: we have an LR⁺ of around 50, but LR⁻ is only about 1/4.
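The updated likelihood ratios can be checked with a small sketch. The inputs are my assumptions about which figures feed into them: an overall sensitivity of 75 % (the midpoint of the 70 - 80 % range mentioned above) and the ring-trial specificity of 969/983 for the coronavirus-free sample. The function name is illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


sensitivity = 0.75       # assumption: midpoint of the 70 - 80 % range quoted above
specificity = 969 / 983  # ring-trial result for the coronavirus-free sample

lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
print(f"LR+ is about {lr_pos:.0f}, LR- is about 1/{1 / lr_neg:.1f}")
```

With these inputs LR+ comes out a little above 50 and LR- close to 1/4, in line with the figures above.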

Important Note: all this is about the real time PCR tests for virus RNA. Antibody tests have their own and actually quite different characteristics.

  • 2
    "25 positive out of a total of 796 tests, so about 3 % prevalence in the tested population." Are they testing randos or people who seem to be displaying symptoms? If the latter, then this is going to produce a prevalence estimate that is off by an order of magnitude. (since it will be the prevalence in the population of symptomatic ppl, rather than the population of "all Canadians") – Him Mar 15 '20 at 16:07
  • Ah, hence your comment about only testing symptomatic people! – Him Mar 15 '20 at 16:11
  • 1
    @Scott: to clarify they don't test random people and they should not as the tests are not meant for screening (i.e. searching for rare cases) - to be useful for screening, a test needs much higher specificity and sensitivity (which the tests may have, but we don't know that as proving that would need far more extensive testing). Using the tests on the general population would probably mean an avalanche of false positives hiding the true positives. – cbeleites unhappy with SX Mar 15 '20 at 23:27
  • 1
    I agree. However, I think that I see why I was confused by the two cohorts: You say "If there are further point of suspicion, say, the patient does cough, we'd say they belong to a high risk group with even higher prevalence of Covid-19, so he the chance of a false positive would be lower for them.", but the numbers that you calculate are already for those with symptoms. So their chance of false positive is not lower, it is what you have presented. – Him Mar 16 '20 at 00:06
  • 1
    @Scott: over here, they also test people without symptoms who have been in contact with confirmed cases. So the population of "people who are tested" has subgroups. Among them, I expect the prevalences to go "without symptoms" < "overall tested population" < "with symptoms". This is changing right now towards testing only people with symptoms (because everyone else has already been told to avoid contact), and then the prevalence may be more like, e.g., the 15 % among the tested in Italy right now. – cbeleites unhappy with SX Mar 16 '20 at 15:22
  • 4
    Yes, I am merely pointing out that it is quite difficult to determine which statements in your answer apply to which cohorts. You seem to switch between talking about these various populations without any sort of segue, and it is somewhat confusing which one you are referring to at any time. To be clear, we are on the same page at this point, but I worry that future readers may find themselves scratching their heads over some bits. – Him Mar 16 '20 at 15:52
  • 1
    @Scott: I tried to clarify - please have a look whether you think I succeeded. – cbeleites unhappy with SX Mar 16 '20 at 20:00
  • 1
    I just updated again to correct LR+ and LR- for the Roche test since I mixed the positive (50) and negative (100) validation test numbers - sorry. (Doesn't lead to practical changes in conclusions, though) – cbeleites unhappy with SX Mar 19 '20 at 22:04
  • So... it's much more likely that she didn't actually have the virus? That would also explain why Prince Charles or Boris Johnson recovered so quickly... they might never have had the virus in the first place. – JonathanReez Apr 03 '20 at 08:13
  • 1
    @JonathanReez: I'm not saying it is that unlikely that she had it, I'm only saying we cannot be very sure right now. The difference is that right now, we are not certain how well the test performs (as opposed to knowing with certainty that the test is not that sensitive). Whether someone may never have had the virus: if we really want to know, we could increase the overall certainty by doing more independent tests (e.g. did they have symptoms? or use another test by another manufacturer that targets a different piece of the virus RNA) – cbeleites unhappy with SX Apr 03 '20 at 19:16
  • This is an absolutely brilliant answer! This is why, even if there were testing capacities for the whole population, tests don't make sense unless we know there are some risk factors [...] This goes for any screening tool whatsoever.. Unless they are really really good, population-wide screening does more damage than good. This changes drastically when you only test risk (sub-)groups! – Narusan Jun 01 '20 at 10:15
  • 1
    @Narusan: thank you! Wrt. general requirements for screening and the linked news article, 2 % incidence/prevalence in a screening population would be considered a high-risk group for many diseases - that would correspond roughly e.g. to women age 50+ that do carry the BRCA1 breast cancer gene (think Angelina Jolie in 5+ years without mastectomy). – cbeleites unhappy with SX Jun 03 '20 at 15:20
  • Phenomenal answer, thank you. – Andrew Jul 25 '20 at 08:13
4

Is there any figure about the accuracy of COVID-19 tests?

A thorough review of prior research found: Using "rRT-PCR; First-line screening tool: E gene assay; Confirmatory testing: RdRp gene assay." one can obtain "95% detection probability, 100% specificity".

Reference:

The Journal of Clinical Medicine review: "Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel Coronavirus (2019-nCoV): A Systematic Review", J. Clin. Med. 2020, 9(3), 623 explains:

"A systematic search was carried out in three major electronic databases (PubMed, Embase and Cochrane Library) to identify published studies examining the diagnosis, therapeutic drugs and vaccines for Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS) and the 2019 novel coronavirus (2019-nCoV), in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

...

An initial search identified a total of 1,065 articles from PubMed, Embase and Cochrane Library. There were 236, 236 and 593 articles related to diagnostics, therapeutics and vaccines, respectively. After reviewing for inclusion and exclusion and the removal of duplications, a total of 27 studies were used for the full review.

...

The first validated diagnostic test was designed in Germany. Corman et al. had initially designed a candidate diagnostic RT-PCR assay based on the SARS or SARS-related coronavirus as it was suggested that circulating virus was SARS-like. Upon the release of the sequence, assays were selected based on the match against 2019-nCoV upon inspection of the sequence alignment. Two assays were used for the RNA dependent RNA polymerase (RdRP) gene and E gene where E gene assay acts as the first-line screening tool and RdRp gene assay as the confirmatory testing. All assays were highly sensitive and specific in that they did not cross-react with other coronavirus and also human clinical samples that contained respiratory viruses."

Sources:

  • Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR, Euro Surveill. 2020 Jan 23; 25(3): 2000045.

    "All assays were highly sensitive, with best results obtained for the E gene and RdRp gene assays (5.2 and 3.8 copies per reaction at 95% detection probability, respectively). These two assays were chosen for further evaluation. One of the laboratories participating in the external evaluation used other basic RT-PCR reagents (TaqMan Fast Virus 1-Step Master Mix) and repeated the sensitivity study, with equivalent results (E gene: 3.2 RNA copies/reaction (95% CI: 2.2–6.8); RdRP: 3.7 RNA copies/reaction (95% CI: 2.8–8.0).

    ...

    In total, this testing yielded no false positive outcomes. In four individual test reactions, weak initial reactivity was seen but they were negative upon retesting with the same assay. These signals were not associated with any particular virus, and for each virus with which initial positive reactivity occurred, there were other samples that contained the same virus at a higher concentration but did not test positive. Given the results from the extensive technical qualification described above, it was concluded that this initial reactivity was not due to chemical instability of real-time PCR probes but most probably to handling issues caused by the rapid introduction of new diagnostic tests and controls during this evaluation study.".

  • COVID-19 testing:

    "Testing for the respiratory illness named coronavirus disease 2019 (COVID-19) and the associated SARS-CoV-2 virus is possible with two main methods: molecular recognition and serology testing.

    Molecular methods leverage polymerase chain reaction (PCR) along with nucleic acid tests, and other advanced analytical techniques, to detect the genetic material of the virus using real-time reverse transcription polymerase chain reaction for diagnostic purposes.

    Serology testing, leverages ELISA antibody test kits to detect the presence of antibodies produced by the host immune system against the virus.

    ...

    Serology antibody testing is being used both for surveillance and investigational purposes including, in China, confirmation of recovery only, while the molecular test methodologies are used to diagnosis active infections.".

    "There are several serology techniques that can be used depending on the antibodies being studied. These include: ELISA, agglutination, precipitation, complement-fixation, and fluorescent antibodies and more recently chemiluminescence.".

Rob
2

First off, it depends on which test - blood draw for antibodies (past infection), or saliva/nasal swab for current infection.

Here is an easy-to-understand answer from Harvard Health:

"If you get the nasal/throat swab or saliva test, you will get a false negative test result:

  • 100% of the time on the day you are exposed to the virus. (There are so few viral particles in your nose or saliva so soon after infection that the test cannot detect them.)
  • About 40% of the time if you are tested four days after exposure to the virus.
  • About 20% of the time if you develop symptoms and are tested three days after those symptoms started. This possibility of a false negative test result is why anyone who has symptoms that could be due to COVID-19, or has been exposed to someone known to be infected, must isolate even if they test negative for coronavirus."
Dan Dascalescu