The Case Fatality Ratio of the Novel Coronavirus (2019-nCoV)

Estimating the case fatality ratio (CFR) of the 2019 novel coronavirus is crucial to inform policy makers and help them make the best decisions.

As of 7 February 2020, mainland China has reported 31 161 confirmed cases, 637 deaths, and 1 540 recoveries. Some therefore claim the case fatality ratio is 637/31161 = 2%. But this is as naïve as claiming that the recovery ratio is 1540/31161 = 5% and thus that the fatality ratio is 95%.

28 984 cases (93%) are not yet resolved: these patients could either die or recover. So in reality the CFR could be anywhere between 2% and 95%.
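The bounds above follow from two extreme assumptions about the unresolved cases: either all of them recover (lower bound) or all of them die (upper bound). Here is a minimal Python sketch of that arithmetic, using the 7 February 2020 figures quoted above. The function name cfr_bounds is mine and only serves as an illustration:

```python
def cfr_bounds(cases, deaths, recoveries):
    """Return (lower, upper) bounds on the CFR, in percent.

    lower: every unresolved case eventually recovers
    upper: every unresolved case eventually dies
    """
    unresolved = cases - deaths - recoveries
    lower = 100 * deaths / cases
    upper = 100 * (deaths + unresolved) / cases
    return lower, upper


# Mainland China, 7 February 2020 (figures from the text above)
lower, upper = cfr_bounds(cases=31161, deaths=637, recoveries=1540)
print(f"unresolved cases: {31161 - 637 - 1540}")
print(f"CFR between {lower:.0f}% and {upper:.0f}%")
```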

Naïve CFR
Resolved CFR
Lack of awareness
Simulating naïve vs. resolved CFR
CFR of 2019-nCoV
Updates & Validation

Naïve CFR

After the SARS epidemic, Ghani et al. published a paper in the American Journal of Epidemiology, Methods for Estimating the Case Fatality Ratio for a Novel, Emerging Infectious Disease, in which they demonstrated that this common method of estimating the CFR is severely flawed. The authors call it a “naïve estimate”:

naïve CFR = deaths / cases

They noted this method is “clearly easier to describe to policy makers and the public” but that it exhibits “considerable bias.” In the case of SARS in Hong Kong in 2003, between 2 April and 21 May it “falsely suggested a rise in the case fatality ratio” by 5x, from 2% to 11%. See the “naïve CFR” curve labelled “simple estimate 1” in their figure 3a:

Ghani et al. figure 3a

In reality the true observed CFR was about 13% during this period. The naïve CFR was severely inaccurate because of what the authors call “simply an artifact,” namely a lag: “the final outcome for patients [death or recovery] lagged behind their identification by approximately 3 weeks.”

The same false rise in the naïve CFR has been observed in other outbreaks, for example Ebola (source: Case fatality rate for Ebola virus disease in west Africa):

Resolved CFR

Ghani et al. recommend two other methods to better estimate the CFR. One eliminates the lag between identification and death or recovery by taking into account only cases whose outcome is known, or resolved. Ghani et al. refer to it as the “simple estimate 2” (e₂) but for clarity I suggest calling it the resolved CFR:

resolved CFR = deaths / (deaths + recoveries)
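As a minimal sketch, the two estimators can be written as Python functions (the function names are illustrative and not taken from Ghani et al.). Applied to the 7 February 2020 figures for mainland China quoted in the introduction, they give very different answers:

```python
def naive_cfr(deaths, cases):
    """Naive CFR in percent: deaths divided by all confirmed cases."""
    return 100 * deaths / cases


def resolved_cfr(deaths, recoveries):
    """Resolved CFR in percent: deaths divided by cases with a known outcome."""
    return 100 * deaths / (deaths + recoveries)


# Mainland China, 7 February 2020
cases, deaths, recoveries = 31161, 637, 1540
print(f"naive CFR:    {naive_cfr(deaths, cases):.1f}%")
print(f"resolved CFR: {resolved_cfr(deaths, recoveries):.1f}%")
```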

Another method is based on the Kaplan-Meier survival procedure, which I will not describe here.

The authors conclude that both methods, the resolved CFR and Kaplan-Meier, “adequately estimated the case fatality ratio during the SARS epidemic.” As their figure 3a shows, the resolved CFR tracks the true observed CFR much more closely than the naïve CFR:

Ghani et al. figure 3a

Of course these methods have caveats, but it is very clear from their data that the naïve CFR is the least accurate estimate of all methods, severely underestimates the true CFR, and only converges toward it near the end of the outbreak.

In the 15 years since its publication, Ghani et al.’s paper has been cited by at least 59 others. For example Estimating Absolute and Relative Case Fatality Ratios from Infectious Disease Surveillance Data supports it and concludes that “the naïve estimator is virtually always biased, often severely so.”

Since the start of the 2019-nCoV outbreak, another expert has specifically come forward to emphasize the resolved CFR is the right method: see this article in Issues in Science and Technology by medical doctor Maimuna Majumder, PhD.

Epidemiological methods overwhelmingly support the resolved CFR as a better estimate than the naïve CFR, period.

[Edit: 6 months later, in August 2020, the WHO published a scientific brief that explains the issues with the naïve CFR method, and suggests the resolved CFR as a simple solution.]

Lack of awareness

Unfortunately, 15 years after Ghani et al.’s paper, the public, journalists, and even many in the medical field have learned nothing from epidemiologists. They continue to use the naïve CFR. They appear unaware it often underestimates the true CFR and is expected to rise over time (“false rise”.)

For example the scientific director of the World Health Organization’s SARS investigation (of all people!) appeared unaware. A 2003 New York Times article wrote: “the death rate has also steadily risen, leaving health officials worried. Lacking a precise explanation for the rise, health officials have generated a number of theories. In outbreaks of other new infections, the death rate has usually fallen with time. ‘It’s worrying, and we hope it is not an indication of a continuing trend,’ said Dr. Klaus Stöhr, scientific director of the W.H.O.’s SARS investigation.”

The WHO was subsequently criticized in A comparison study of realtime fatality rates: severe acute respiratory syndrome in Hong Kong, Singapore, Taiwan, Toronto and Beijing, China (PDF version,) where the authors write: “While the outbreak was on going and there were patients still in hospitals over the course of the epidemic, the WHO estimate assumed implicitly that all remaining SARS in-patients would eventually recover. It therefore led to an underestimation of the true case fatality rate. For example, in the midst of the SARS outbreak at April 15th, 2003, the fatality rate in Hong Kong was 4.5% according to the WHO estimate, but it hit a record high of 17.0% at the end of the epidemic.”

Has the WHO learned anything since 2003? No! Another WHO representative recently used the naïve CFR method, which yields the 2% figure: “WHO representative to the Philippines Dr. Rabindra Abeyasinghe noted that the 2019-nCoV’s death rate fell to about 2 percent.”

Simulating naïve vs. resolved CFR

To demonstrate how inaccurate the naïve CFR is compared to the resolved CFR, I wrote a Python script that simulates an outbreak infecting a population of 100k individuals over 200 days. Cumulative cases follow a logistic growth curve, which increases gradually at first, more rapidly in the middle growth period, and slowly at the end, eventually leveling off. Each case has a 50% probability of ending in death 21 days after infection; otherwise the patient recovers, also after 21 days.

The results are obvious: the naïve CFR underestimates the true CFR by 5x (it starts at 9%) and only converges toward the true CFR (50%) near the end of the epidemic. By comparison the resolved CFR is a perfect estimate at any point:

In a second simulation, I changed the parameter time_to_heal to 28 days in order to simulate recovery taking longer than death: deaths still happen 21 days after the infection, but recoveries happen 28 days after. The only difference this creates is that at the beginning and middle of the epidemic the resolved CFR slightly overestimates the true CFR by a factor of about 1.26x (63%/50%):

Three things are very clear:

The naïve CFR is always a severe underestimate in the beginning and middle phase of an outbreak.
The naïve CFR is bound to increase over time.
The resolved CFR is always much more accurate than the naïve CFR. At worst the resolved CFR is off by 1.26x while the naïve CFR is off by 5x.
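The 9% and 63% figures can be cross-checked without running the full simulation. The sketch below assumes that, early in the outbreak, the logistic curve is approximately exponential with the same growth rate as in the script (0.08 per day). This approximation is mine and is not part of the script. Under it, cumulative deaths and recoveries are simply delayed, scaled copies of cumulative cases, so both ratios become constants:

```python
import math


def early_phase_cfrs(death_prob, time_to_death, time_to_heal, growth_rate=0.08):
    """Approximate (naive, resolved) CFR in percent during exponential growth.

    Assumes cumulative cases grow like exp(growth_rate * t), so a count
    delayed by T days is smaller by a factor exp(-growth_rate * T).
    Cumulative cases are normalized to 1.
    """
    deaths = death_prob * math.exp(-growth_rate * time_to_death)
    recovs = (1 - death_prob) * math.exp(-growth_rate * time_to_heal)
    naive = 100 * deaths
    resolved = 100 * deaths / (deaths + recovs)
    return naive, resolved


# First simulation (recovery after 21 days), then second (recovery after 28 days)
for heal in (21, 28):
    naive, resolved = early_phase_cfrs(0.50, 21, heal)
    print(f"time_to_heal={heal}: naive {naive:.1f}%, resolved {resolved:.1f}%")
```

The approximation only holds while growth is close to exponential. As the curve levels off, both estimates converge toward the true 50%, as the simulation shows.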

Want to play with my simulation? Here is the source code:

#!/usr/bin/python3

import math

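population = 100e3      # number of individuals infected by the end of the outbreak
days = 200              # duration of the simulated outbreak
death_prob = 0.50       # probability that a case ends in death
time_to_death = 21      # days from infection to death
time_to_heal = 21       # days from infection to recovery
hist = []               # cumulative case counts, most recent day first
deaths = recovs = naive_cfr = resolved_cfr = 0
print('day,cases,deaths,recoveries,naive_cfr,resolved_cfr')
for d in range(0, days):
    # cumulative cases follow a logistic curve centered on the middle day
    if d == 0:
        cases = 0
    else:
        cases = round(population / (1 + math.e**(-0.08*(d - days/2))))
    hist.insert(0, cases)
    # hist[n] - hist[n + 1] is the number of new cases n days ago
    # a share death_prob of the cases from time_to_death days ago dies today
    if len(hist) >= time_to_death + 2:
        deaths += round((hist[time_to_death] - hist[time_to_death + 1]) * \
                death_prob)
    # the remaining share of the cases from time_to_heal days ago recovers today
    if len(hist) >= time_to_heal + 2:
        recovs += round((hist[time_to_heal] - hist[time_to_heal + 1]) * \
                (1 - death_prob))
    # compute the CFRs only once the first outcomes can have occurred
    if d > max(time_to_death, time_to_heal):
        if cases:
            naive_cfr = 100 * deaths / cases
        if deaths + recovs:
            resolved_cfr = 100 * deaths / (deaths + recovs)
    print('{day},{cases},{deaths},{recovs},{naive_cfr},{resolved_cfr}'.\
            format(day=d, cases=cases, deaths=deaths, recovs=recovs,
            naive_cfr=naive_cfr, resolved_cfr=resolved_cfr))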

CFR of 2019-nCoV

We know the naïve CFR (2%) is inaccurate and will increase, as per previous outbreaks (SARS, Ebola) and as per simulations.

We know the resolved CFR is much more accurate, and as of 7 February 2020 it stands at deaths/(deaths+recoveries) = 637/(637+1540) = 29%, which is rather alarming.

But we should interpret any CFR estimate with caution:

We are very early in the outbreak, with very few resolved cases (a few thousand), so the resolved CFR fluctuates a lot from day to day.
China’s official statistics may be underestimating the number of deaths (e.g. patients who are suspected of having 2019-nCoV but not yet confirmed, and who die, are not counted toward deaths caused by the virus).
There may be many undetected or underreported mild cases that heal on their own and thus are not counted in the number of recoveries (these cases could be detected when/if serosurveys are performed).

Nonetheless we can take a stab at guessing a best case scenario for a low fatality ratio. Let’s assume there are 50x more cases than reported, all mild, so 50x more recoveries. And let’s assume there are only 2x more deaths than reported.

With these parameters the CFR would be 637*2/(637*2+1540*50) = 1.6%, which is still concerning because if there are 50x more cases than reported, then the virus is spreading far faster than we think. So a pandemic would be unavoidable, and 1.6% would be 16x deadlier than the seasonal flu (0.1%.) The flu kills half a million worldwide every year, so 2019-nCoV would kill a few million.

You can play with the parameters, but either way 2019-nCoV is not looking good at all.
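To make those parameters easy to vary, the best-case calculation can be sketched as a small Python function. The name adjusted_cfr and its multiplier arguments are hypothetical, introduced here only for illustration:

```python
def adjusted_cfr(deaths, recoveries, death_mult=1, recovery_mult=1):
    """Resolved CFR in percent after scaling reported deaths and recoveries.

    death_mult:    assumed true deaths / reported deaths
    recovery_mult: assumed true recoveries / reported recoveries
    """
    d = deaths * death_mult
    r = recoveries * recovery_mult
    return 100 * d / (d + r)


# 7 February 2020 figures with the best-case assumptions from the text
print(f"{adjusted_cfr(637, 1540, death_mult=2, recovery_mult=50):.1f}%")
# With no correction the function gives back the plain resolved CFR
print(f"{adjusted_cfr(637, 1540):.1f}%")
```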

My parameters of 50x more cases than reported and 2x more deaths than reported are equivalent to assuming the same number of deaths but 25x more cases than reported, as of 7 February 2020. This assumption is roughly consistent with other estimates:

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study estimates 75 815 cases as of 25 January 2020, which is 38x more than the 1 975 cases officially reported by China on that date.
The Imperial College London MRC Centre for Global Infectious Disease Analysis estimates in Report 4: Severity of 2019-novel coronavirus as of 3 February 2020 that there are 19x or 26x more infections than reported, according to 2 scenarios.
The Rate of Underascertainment of Novel Coronavirus (2019‐nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights estimates 11x more infections than reported (“the ascertainment rate of infection was estimated at 9.2%.”)
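Two of the multipliers quoted above are simple ratios, which can be checked in a few lines of Python (the variable names are mine):

```python
# Nowcasting study: estimated vs. officially reported cases as of 25 January 2020
estimated_cases = 75815
reported_cases = 1975
case_multiplier = estimated_cases / reported_cases
print(f"{case_multiplier:.0f}x more cases than reported")

# Evacuation flights study: an ascertainment rate of 9.2% means that only
# 9.2% of infections are detected, i.e. 1 / 0.092 infections per reported case
ascertainment_rate = 0.092
infection_multiplier = 1 / ascertainment_rate
print(f"{infection_multiplier:.0f}x more infections than reported")
```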

Updates & Validation

As of 16 February 2020 the resolved CFR based on China’s official statistics stands at 1863/(1863+10844) = 15%. And when accounting for underreported mild cases, I still believe my guess of 1.6%, made on 7 February 2020, is reasonable.

On 10 February 2020, the Imperial College London MRC Centre for Global Infectious Disease Analysis published Report 4: Severity of 2019-novel coronavirus that estimates the CFR based on China’s statistics at 18%, or 0.8-0.9% when accounting for underreported mild cases. Both figures approximately match my estimates (respectively 15% and 1.6%.)

On 14 February 2020, Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases estimated the infection fatality ratio (IFR) at 0.5-0.8%. The IFR is equivalent to a CFR estimate that corrects for underreported cases. It approximately matches my estimate (1.6%.)

On 19 February 2020, the Institute for Disease Modeling estimated the IFR at 0.94%. It approximately matches my estimate (1.6%.)

The 3 reports above are referenced by the WHO in Coronavirus disease situation report - 30 (references 10, 11 and 12) and in situation report 31.

As of 21 February 2020, a researcher from the Institute of Social and Preventive Medicine, University of Bern estimated the CFR using a methodology to correct for possible underreporting of mild cases, and found 1.6%. It exactly matches my estimate for this scenario (1.6%.)

As of 30 March 2020, a peer-reviewed paper in The Lancet, Estimates of the severity of coronavirus disease 2019: a model-based analysis, estimated the IFR at 0.66%. It approximately matches my estimate (1.6%) within a two-fold factor.