The number of confirmed COVID-19 cases in the US surpassed 1.1 million on May 2. However, the actual number of cases is probably significantly larger: on that same day, we learned that 12.3% of people sampled in New York State had tested positive for coronavirus antibodies. Extrapolated to the whole state, that is about 2.4 million people, in one state alone. The truth is, no one really knows how many have been infected.
How is such a disparity possible? What are its implications?
The answer to these questions starts with understanding the role of testing policies.
Let's consider a simple example. The graph opposite represents the population of a fictional city. It has about 1,000 inhabitants, 10 of whom are initially infected by a mysterious virus.
Some disease carriers are asymptomatic, but not very contagious. Others exhibit potentially deadly symptoms, and are highly contagious as a result. Most of them will recover and become immune.
As time goes by, the disease spreads, until there are no active cases left.
Reorganizing these dots provides a clearer picture of what the city experienced: the majority of the population contracted the disease.
We can then visualize the spread of the disease over time: the virus's propagation slows as the number of uninfected people decreases.
Since this is a simulation, we have full visibility into the dynamics at every point in time. In particular, we know how many people were infected. Let's focus on this number: this new curve represents the total number of people who have been infected as a function of time.
This includes both current infections—symptomatic and asymptomatic—as well as recoveries and deaths.
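For readers who prefer code, the curve can be thought of as a running total of daily new infections, starting from the initial cases. The following is only a minimal sketch: the function name and the daily counts are hypothetical and are not taken from our simulation.

```python
from itertools import accumulate


def cumulative_infections(initial_cases, daily_new_infections):
    """Total number of people ever infected, day by day.

    The first value is the number of initial cases; each later value adds
    that day's new infections. Recoveries and deaths never lower the total,
    because those people were infected at some point.
    """
    return list(accumulate(daily_new_infections, initial=initial_cases))


# Hypothetical daily counts, for illustration only.
curve = cumulative_infections(10, [3, 5, 0, 2])
```

At any point in time, this total equals the sum of current infections (symptomatic and asymptomatic), recoveries and deaths.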
Let's now consider the first testing scenario.
In this scenario, we have access to a limited number of tests per day. This mirrors the situation in many countries today, including the US.
We assume that most symptomatic carriers seek testing, as long as they have not already been confirmed positive. A smaller fraction of asymptomatic carriers also try to get tested.
However, the city can only run 5 tests per day. It may not sound like much, but it's actually not that bad considering our population size.
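One way to picture this policy is as a daily queue that is cut off once capacity is reached. The sketch below is a simplified illustration, not our actual implementation: it assumes that only carriers seek tests, that tests are perfectly accurate, and that seekers are served in random order. The function name and the default probabilities are hypothetical.

```python
import random


def run_viral_tests(symptomatic, asymptomatic, confirmed, rng,
                    capacity=5, p_symptomatic=0.8, p_asymptomatic=0.1):
    """Return the ids newly confirmed by one day of viral testing.

    capacity=None means there is no daily limit.
    The default probabilities are illustrative assumptions.
    """
    # Carriers who are already confirmed positive never seek another test.
    seekers = [p for p in symptomatic
               if p not in confirmed and rng.random() < p_symptomatic]
    seekers += [p for p in asymptomatic
                if p not in confirmed and rng.random() < p_asymptomatic]
    rng.shuffle(seekers)  # no priority between seekers (assumption)
    tested = seekers if capacity is None else seekers[:capacity]
    # Every seeker is a carrier and the test is assumed perfect,
    # so each test that is run comes back positive.
    return set(tested)
```

With more seekers than capacity, the remaining carriers simply go undetected that day, which is where the gap between confirmed and actual cases comes from.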
By the end of the simulation, only a fraction of the total actual cases have been detected.
Now, let's consider the second testing scenario.
We are making the same assumptions about who wants to be tested, but in this scenario, there are no limits on the number of tests: every individual who wants a test gets tested.
Furthermore, a small, random sample of the population is also tested for antibodies every day: anyone previously infected will be counted.
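The random antibody survey could be sketched as follows. This is a hedged illustration under simplifying assumptions: the test is perfectly accurate, antibodies are detectable as soon as someone has been infected, and people who are already confirmed are left out of the sample. All names are hypothetical.

```python
import random


def antibody_survey(population, ever_infected, confirmed, sample_size, rng):
    """Return the ids newly detected by one day of random antibody tests.

    population is a list of ids; ever_infected and confirmed are sets.
    """
    # Individuals who already tested positive are never tested again.
    eligible = [p for p in population if p not in confirmed]
    sample = rng.sample(eligible, min(sample_size, len(eligible)))
    # Anyone in the sample who was ever infected is counted,
    # whether they are still carrying the virus or have recovered.
    return {p for p in sample if p in ever_infected}
```

Unlike viral tests, this survey can pick up past infections that never led anyone to seek a test.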
This testing policy is significantly more effective: a much larger share of the actual cases is detected.
You might think that real-world testing for COVID-19 had no such gaps. It did, and they remain an ongoing issue.
These numbers are critically important, because our ability to adapt public policy and limit the spread of the virus is directly dependent upon the quality of our data and models.
Epidemiologists and health officials are painfully aware of these issues, and try to adapt to the situation as best they can despite these challenges.
However, it is difficult to adapt when we do not even know how deadly the virus is.
To illustrate the difference that testing policies can have on crucial data points, let's focus on the fatality rate.
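In our simulation, the actual fatality rate is known, because every infection and every death is recorded.
However, the rate estimated from detected cases depends on the testing policy: the first policy, which detects fewer cases, yields a higher estimated rate than the second.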
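The arithmetic behind this effect is simple: the number of deaths stays the same, but the denominator changes with the number of detected cases. The figures below are hypothetical and chosen only to illustrate the mechanism; they are not results from our simulation.

```python
def fatality_rate(deaths, cases):
    """Deaths divided by the number of cases used as the denominator."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical figures, for illustration only.
deaths = 12              # all deaths are assumed to be identified
actual_cases = 600       # known only because this is a simulation
detected_limited = 150   # cases detected under a limited-capacity policy
detected_broad = 450     # cases detected under a broader policy

actual = fatality_rate(deaths, actual_cases)
estimate_limited = fatality_rate(deaths, detected_limited)
estimate_broad = fatality_rate(deaths, detected_broad)
```

Because detected cases never exceed actual cases, both estimates overstate the actual rate, and the policy that misses more cases overstates it more.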
All testing policies underestimate the actual number of cases, but the size of the underestimate varies greatly depending on the specific testing parameters. And that is without going into additional factors that also play out in reality, such as false positives and false negatives in test results.
All of this results in a situation where we do not know precisely what is happening, and we cannot rely on external data to inform our decisions and public health policy.
These decisions are impacting our lives, and understanding their limitations is crucial.
This article revolves around a simulated disease. We use a stochastic variant of the SEIR model, which is a classical compartmental model in epidemiology. You can experiment with our model below.
Initial population:
- Total population
- Number of initial cases
Testing policy:
- Maximum number of viral tests per day
- Proportion of symptomatic carriers seeking a test per day
- Proportion of asymptomatic carriers seeking a test per day
- Number of random antibody tests per day
Note: individuals who have already tested positive will never be tested again. Casualties are also automatically identified.
Disease parameters:
- Daily probability of an asymptomatic carrier becoming symptomatic
- Daily probability of an asymptomatic carrier recovering
- Daily probability of a symptomatic carrier recovering
- Daily probability of a symptomatic individual dying
- Average number of daily infections per asymptomatic carrier
- Average number of daily infections per symptomatic carrier
Note: Daily infections per carrier are measured assuming an infinitely susceptible population. The real number of infections decreases as the proportion of the population having been infected increases.
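To give a sense of how these parameters could drive a stochastic model, here is a minimal sketch of a single simulated day. It is not our actual implementation: it tracks counts rather than individuals, assumes new infections start out asymptomatic, and scales infections by the fraction of the population that is still susceptible. All class, field and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class State:
    susceptible: int
    asymptomatic: int
    symptomatic: int
    recovered: int = 0
    dead: int = 0


@dataclass
class Params:
    p_become_symptomatic: float
    p_asymptomatic_recovery: float
    p_symptomatic_recovery: float
    p_symptomatic_death: float
    infections_per_asymptomatic: float
    infections_per_symptomatic: float


def step(state: State, params: Params, rng: random.Random) -> State:
    """Advance the epidemic by one day and return the new state."""
    total = (state.susceptible + state.asymptomatic + state.symptomatic
             + state.recovered + state.dead)
    # Infections per carrier are defined for a fully susceptible population,
    # so scale them by the fraction that can still be infected.
    susceptible_fraction = state.susceptible / total if total else 0.0
    expected = (state.asymptomatic * params.infections_per_asymptomatic
                + state.symptomatic * params.infections_per_symptomatic
                ) * susceptible_fraction
    p_infect = min(1.0, expected / state.susceptible) if state.susceptible else 0.0
    new_infections = sum(rng.random() < p_infect
                         for _ in range(state.susceptible))

    # Each asymptomatic carrier becomes symptomatic, recovers, or stays as is.
    to_symptomatic = a_recovered = 0
    for _ in range(state.asymptomatic):
        u = rng.random()
        if u < params.p_become_symptomatic:
            to_symptomatic += 1
        elif u < params.p_become_symptomatic + params.p_asymptomatic_recovery:
            a_recovered += 1

    # Each symptomatic carrier dies, recovers, or stays as is.
    deaths = s_recovered = 0
    for _ in range(state.symptomatic):
        u = rng.random()
        if u < params.p_symptomatic_death:
            deaths += 1
        elif u < params.p_symptomatic_death + params.p_symptomatic_recovery:
            s_recovered += 1

    return State(
        susceptible=state.susceptible - new_infections,
        asymptomatic=(state.asymptomatic + new_infections
                      - to_symptomatic - a_recovered),
        symptomatic=state.symptomatic + to_symptomatic - deaths - s_recovered,
        recovered=state.recovered + a_recovered + s_recovered,
        dead=state.dead + deaths,
    )
```

Calling `step` repeatedly with a seeded `random.Random` gives one possible trajectory of the epidemic; different seeds give different trajectories, which is what makes the model stochastic.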