Visualizing testing

Testing policies have a huge influence on reported case numbers.

Comparisons of the spread of the disease between different countries are often flawed.

Here’s why.



The number of confirmed COVID-19 cases in the US surpassed 1.1 million on May 2. However, the number of actual cases is probably far larger: on that same day, we learned that 12.3% of people tested in a New York State antibody survey had coronavirus antibodies. Applied to the whole state, that's about 2.4 million people, in a single state. The truth is, no one really knows how many people have been infected.

How is such a disparity possible? What are its implications?

The answer to these questions starts with understanding the role of testing policies.

Simulating a disease

Let's consider a simple example. The graph opposite represents the population of a fictional city. It has about 1,000 inhabitants, 10 of whom are initially infected by a mysterious virus.

Some disease carriers are asymptomatic and not very contagious. Others develop potentially deadly symptoms and, as a result, are highly contagious. Most carriers eventually recover and become immune.

As time goes by, the disease spreads, until there are no active cases left.

Reorganizing these dots provides a clearer picture of what the city experienced: the majority of the population contracted the disease.

We can then visualize the spread of the disease over time: the virus's propagation slows as the number of uninfected decreases.
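The slowdown follows from how contagion is usually modeled: a carrier's daily contacts only produce a new case when they reach someone who is still susceptible. The sketch below assumes, as the note on daily infections in the sandbox section suggests, that each carrier's infection rate is scaled by the fraction of the population still uninfected. The function names and the exponential conversion from a rate to a per-person probability are illustrative choices, not the authors' code.

```python
import math
import random

def infection_probability(n_asym, n_sym, population, rate_asym, rate_sym):
    """Daily probability that one susceptible person gets infected (hypothetical sketch).

    rate_asym / rate_sym: average daily infections per carrier in a fully
    susceptible population (hypothetical parameter names).
    """
    force = (n_asym * rate_asym + n_sym * rate_sym) / population
    return 1.0 - math.exp(-force)

def expected_new_infections(n_asym, n_sym, n_susceptible, population, rate_asym, rate_sym):
    # Proportional to the number of people still uninfected: fewer of them, slower spread.
    return n_susceptible * infection_probability(n_asym, n_sym, population, rate_asym, rate_sym)

def new_infections(n_asym, n_sym, n_susceptible, population, rate_asym, rate_sym, rng):
    p = infection_probability(n_asym, n_sym, population, rate_asym, rate_sym)
    # One independent Bernoulli trial per susceptible person (a binomial draw).
    return sum(rng.random() < p for _ in range(n_susceptible))

# Same 10 carriers, shrinking pool of uninfected people:
print(expected_new_infections(5, 5, 990, 1000, 0.2, 0.5))  # about 3.46
print(expected_new_infections(5, 5, 100, 1000, 0.2, 0.5))  # about 0.35
```

With the carriers held fixed, expected new infections drop tenfold when only a tenth as many people remain uninfected, which produces the flattening curve.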

What does testing do?

Since this is a simulation, we have full visibility into the dynamics at every point in time. In particular, we know how many people were infected. Let's focus on this number: this new curve represents the total number of people who have been infected as a function of time.

This includes both current infections—symptomatic and asymptomatic—as well as recoveries and deaths.

Let's now consider the first testing scenario.

In this scenario, we have access to a limited number of tests per day. This mirrors the situation in many countries today, including the US.

We assume that most symptomatic carriers seek testing, as long as they have not already been confirmed positive. A smaller fraction of asymptomatic carriers also try to get tested.

However, access to tests is limited: the city can perform at most 5 tests per day. That may not sound like much, but for a population of about 1,000 it is actually not bad.
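A capacity-limited testing day like this one can be sketched as follows. The sketch assumes that each not-yet-confirmed carrier independently seeks a test with a fixed daily probability, that a random subset of seekers is served when demand exceeds capacity, that uninfected people do not compete for tests, and that tests are perfect. `viral_tests_today` and its parameters are hypothetical names.

```python
import random

def viral_tests_today(unconfirmed_sym, unconfirmed_asym, p_seek_sym, p_seek_asym,
                      max_tests, rng):
    """Return the IDs of carriers confirmed positive today (hypothetical sketch).

    Only carriers not yet confirmed are passed in, since people who already
    tested positive are never retested. max_tests=None means unlimited tests.
    """
    # Each carrier independently decides whether to seek a test today.
    seekers = [i for i in unconfirmed_sym if rng.random() < p_seek_sym]
    seekers += [i for i in unconfirmed_asym if rng.random() < p_seek_asym]
    # If demand exceeds capacity, serve a random subset (assumed, not prioritized).
    if max_tests is not None and len(seekers) > max_tests:
        seekers = rng.sample(seekers, max_tests)
    # Assumes perfect tests: every tested carrier is confirmed.
    return set(seekers)

# 30 carriers all want a test, but only 5 tests are available.
print(len(viral_tests_today(range(20), range(100, 110), 1.0, 1.0, 5, random.Random(0))))  # 5
```

Passing `max_tests=None` gives the unlimited-testing variant used in the second scenario.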

Out of the total actual cases by the end of the simulation, only are detected.

Now, let's consider the second testing scenario.

We are making the same assumptions about who wants to be tested, but in this scenario, there are no limits on the number of tests: every individual who wants a test gets tested.

Furthermore, a small random sample of the population is tested for antibodies every day: anyone in that sample who was previously infected is counted.
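The daily antibody sample can be sketched in the same spirit. This assumes a perfect antibody test, a uniform sample drawn without replacement from living residents, and that people already confirmed are not counted twice. `antibody_survey` is a hypothetical name.

```python
import random

def antibody_survey(living_ids, ever_infected, confirmed, sample_size, rng):
    """Return past infections newly identified by a random antibody sample.

    Hypothetical sketch: living_ids is a list of resident IDs, ever_infected
    and confirmed are sets of IDs. Assumes a perfect antibody test.
    """
    # Uniform random sample without replacement from the whole living population.
    sample = rng.sample(living_ids, sample_size)
    # Anyone in the sample who was ever infected and not yet confirmed is counted.
    return {i for i in sample if i in ever_infected and i not in confirmed}
```

Unlike symptom-driven testing, this sample also catches asymptomatic and already-recovered people who would never have sought a test.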

This testing policy is significantly more effective: out of actual cases, are detected.

You might think that there were no such gaps in testing for COVID-19. But there were, and they remain an ongoing issue.

These numbers are critically important, because our ability to adapt public policy and limit the spread of the virus is directly dependent upon the quality of our data and models.

Epidemiologists and health officials are painfully aware of these issues, and try to adapt to the situation as best they can despite these challenges.

However, it is difficult to adapt when we do not even know how deadly the virus is.

To illustrate the effect testing policies can have on crucial data points, let's focus on the fatality rate.

In our simulation, the actual fatality rate was .

However, based on the first policy, the estimated rate would be , while the second policy yields .
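The gap comes down to the denominator. In this simulation, casualties are automatically identified, so every death is counted, but only confirmed cases enter the estimate: actual rate = deaths / total infections, estimated rate = deaths / confirmed cases. As long as tests produce no false positives, confirmed cases cannot exceed total infections, so the estimate can only overstate the true rate. The numbers below are made up for illustration and are not the simulation's results.

```python
def fatality_rates(deaths, total_infected, confirmed_cases):
    """Compare the true fatality rate with the one implied by confirmed cases."""
    actual = deaths / total_infected       # uses every infection, detected or not
    estimated = deaths / confirmed_cases   # uses only what testing revealed
    return actual, estimated

# Hypothetical numbers: only a quarter of infections are ever confirmed.
actual, estimated = fatality_rates(deaths=20, total_infected=800, confirmed_cases=200)
print(actual, estimated)  # 0.025 0.1
```

Detecting a quarter of the cases inflates the apparent fatality rate fourfold; the better the testing policy, the closer the two numbers get.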

All testing policies underestimate the actual number of cases, but the size of the gap varies greatly with the specific testing parameters. And that is without accounting for additional factors at play in reality, such as false positives and false negatives in test results.

All of this results in a situation where we do not know precisely what is happening, and we cannot rely on external data to inform our decisions and public health policy.

These decisions are impacting our lives, and understanding their limitations is crucial.

Ombeline Lagé



Haris Sahovic



Sandbox model

This article revolves around a simulated disease. We use a stochastic variant of the SEIR model, a classical compartmental model in epidemiology. Its parameters are listed below.

Initial population

Total population
Number of initial cases

Testing policy

Maximum number of viral tests per day
Proportion of symptomatic carriers seeking a test per day
Proportion of asymptomatic carriers seeking a test per day
Number of random antibody tests per day

Note: individuals who have already tested positive will never be tested again. Casualties are also automatically identified.
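The four testing parameters can be bundled into a small configuration object, one per scenario. The field names are hypothetical. Only the 5-test cap of the first scenario and the absence of a cap in the second come from the article; the seeking probabilities and antibody sample size below are placeholder values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TestingPolicy:
    """Testing-policy parameters from the sandbox (hypothetical field names)."""
    max_viral_tests_per_day: Optional[int]  # None means unlimited
    p_seek_symptomatic: float               # daily probability a symptomatic carrier seeks a test
    p_seek_asymptomatic: float              # daily probability an asymptomatic carrier seeks a test
    antibody_tests_per_day: int             # size of the daily random antibody sample

# Placeholder values except for the test caps described in the article.
SCENARIO_1 = TestingPolicy(max_viral_tests_per_day=5, p_seek_symptomatic=0.8,
                           p_seek_asymptomatic=0.1, antibody_tests_per_day=0)
SCENARIO_2 = TestingPolicy(max_viral_tests_per_day=None, p_seek_symptomatic=0.8,
                           p_seek_asymptomatic=0.1, antibody_tests_per_day=10)
```

Keeping both scenarios identical except for the cap and the antibody sample isolates the effect of testing capacity, which mirrors how the article compares them.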

Disease parameters

Daily probability of an asymptomatic carrier becoming symptomatic
Daily probability of an asymptomatic carrier recovering
Daily probability of a symptomatic carrier recovering
Daily probability of a symptomatic individual dying
Average number of daily infections per asymptomatic carrier
Average number of daily infections per symptomatic carrier

Note: Daily infections per carrier are measured assuming a fully susceptible population. The real number of infections decreases as the proportion of the population that has been infected increases.
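The disease parameters above map naturally onto a day-by-day stochastic update. The following is a minimal sketch, not the authors' implementation, built on several assumptions: five compartments (susceptible, asymptomatic, symptomatic, recovered, dead), all initial cases starting out asymptomatic, and each carrier's daily outcomes being mutually exclusive. The class, field and function names are hypothetical, and testing is left out.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class DiseaseParams:
    p_asym_to_sym: float   # daily probability an asymptomatic carrier becomes symptomatic
    p_asym_recover: float  # daily probability an asymptomatic carrier recovers
    p_sym_recover: float   # daily probability a symptomatic carrier recovers
    p_sym_die: float       # daily probability a symptomatic carrier dies
    rate_asym: float       # daily infections per asymptomatic carrier (fully susceptible population)
    rate_sym: float        # daily infections per symptomatic carrier (fully susceptible population)

    def __post_init__(self):
        # Competing outcomes for one carrier on one day must not exceed probability 1.
        if self.p_asym_to_sym + self.p_asym_recover > 1 or self.p_sym_recover + self.p_sym_die > 1:
            raise ValueError("competing daily probabilities must sum to at most 1")

def step(counts, params, rng):
    """Advance (S, A, I, R, D) counts by one day (hypothetical sketch)."""
    s, a, i, r, d = counts
    n = s + a + i + r + d  # total population, constant over time
    # New infections: contact rates scaled by the susceptible fraction.
    p_inf = 1.0 - math.exp(-(a * params.rate_asym + i * params.rate_sym) / n)
    new_a = sum(rng.random() < p_inf for _ in range(s))
    # Asymptomatic carriers: become symptomatic, recover, or stay as they are.
    a_to_i = a_to_r = 0
    for _ in range(a):
        u = rng.random()
        if u < params.p_asym_to_sym:
            a_to_i += 1
        elif u < params.p_asym_to_sym + params.p_asym_recover:
            a_to_r += 1
    # Symptomatic carriers: recover, die, or stay as they are.
    i_to_r = i_to_d = 0
    for _ in range(i):
        u = rng.random()
        if u < params.p_sym_recover:
            i_to_r += 1
        elif u < params.p_sym_recover + params.p_sym_die:
            i_to_d += 1
    return (s - new_a,
            a + new_a - a_to_i - a_to_r,
            i + a_to_i - i_to_r - i_to_d,
            r + a_to_r + i_to_r,
            d + i_to_d)

def simulate(population, initial_cases, params, seed=0, max_days=10_000):
    """Run until no active cases remain; returns the daily history of counts."""
    rng = random.Random(seed)
    counts = (population - initial_cases, initial_cases, 0, 0, 0)
    history = [counts]
    while counts[1] + counts[2] > 0 and len(history) <= max_days:
        counts = step(counts, params, rng)
        history.append(counts)
    return history

params = DiseaseParams(0.1, 0.1, 0.1, 0.01, 0.3, 0.6)  # placeholder values
final = simulate(1000, 10, params)[-1]
print("ever infected:", 1000 - final[0], "deaths:", final[4])
```

Drawing one uniform number per carrier and splitting it into ranges keeps a carrier from, say, both recovering and dying on the same day.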

Appendix

The final write-up associated with this project can be found here.