Covid Lectures Part 1: What is epidemiology?

by Suranya Aiyar
13 July 2020, New Delhi

"Panic" Pixabay. Edited by author.

1.1 How Little We Know of Science

The lockdown, and all the other responses of the world to Covid-19, have been wrong. They have been unscientific and unethical. They have been self-defeating. They are standing in the way of our finding a real solution to the pandemic.

We have been failed by the experts. The epidemiologists, the World Health Organisation (WHO) and the public health experts have all been operating on weak science, and a partial and biased view of the crisis. Our political representatives, in every country, have not been able to intelligently assess the advice of the experts, or to place it in its proper context. By acting on the basis of a science which they did not understand, they have failed the people and the science.

A society that is beholden to science without understanding it 

Above all, this is a crisis brought about by a society that is, on the one hand, excessively dependent on science and, on the other, has a very weak grasp of it among the lay public, political leaders and commentators, at least of the kind of science that is in vogue among public health experts today.

If we, as the lay public, want to get out of this crisis, then we have to start thinking for ourselves. We have to think as if our life depends on it, because it does. This means that we have to dig much deeper than we usually do with a scientific subject, to understand for ourselves what the science is saying.

1.2 The Basics of Mathematical Modelling

Everything began with the epidemiologists. But how many of us know anything about epidemiology even today? What do we know about the “mathematical modelling” that has ruled our lives since March? Let us try to find out.


Mathematical modelling essentially uses different types of equations that were developed in the field of statistical mathematics. The equations enable you to calculate the total number or size of a phenomenon by taking into account the variables that affect it. They are supposed to help you to answer questions like how many cars on an assembly line will be defective, or how big is a black hole. If the phenomenon you are looking at varies in quantity depending on what you start with, say, the number of defective nuts (for cars), or the size of a collapsing star (for black holes); or it varies with a time-related factor, such as the age of the assembly-line equipment; or whatever other variables that might be relevant, you can put them into your equation, and get a result that takes them all into account. These calculations are used in various fields like physics, economics, business, sociology, poll surveys and, our area of interest, epidemiology. The results of these calculations can be plotted on a graph to give you the curves with which we have all now become so familiar.

The actual arithmetic of this exercise is carried out by feeding the numbers for different variables into a computer that has been programmed to run calculations using the chosen equation. But a computer can only run the variables according to the equation, it cannot tell you what those variables should be, and herein lies the rub.
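The mechanics described above can be sketched with a toy example. Everything here is invented purely for illustration: the formula, the parameter names and the numbers have no basis in real manufacturing, and the point is only to show a computer "running the variables according to the equation" it has been given.

```python
# Toy sketch: a computer can only evaluate the equation it is given.
# This invented formula estimates defective cars from the defect rate of
# the nuts supplied and the age of the assembly-line equipment. The model
# and all numbers are made up for illustration, not real engineering.

def expected_defects(cars_built, defect_rate_per_nut, nuts_per_car, equipment_age_years):
    # Assumed relationships: each defective nut spoils one car, and ageing
    # equipment adds 2% more defects per year of age. Whether these are the
    # right variables and relationships is a theoretical question the
    # computer cannot answer for us.
    base = cars_built * defect_rate_per_nut * nuts_per_car
    return base * (1 + 0.02 * equipment_age_years)

print(expected_defects(cars_built=10_000, defect_rate_per_nut=0.001,
                       nuts_per_car=50, equipment_age_years=5))
```

The arithmetic is trivial; the contested part is entirely in the choice of variables and the assumed relationships between them.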

1.3 The Lack of Science in Epidemiology

Deciding which variables are relevant to the phenomenon you are studying is not a mathematical exercise, but a theoretical one. Ideally, you should have a solid theoretical understanding of the phenomenon under study on the basis of which you can identify, in a rigorous, stable and complete way, the set of variables that apply. An estimation using mathematical modelling is not merely about putting a number on different elements in your equation. It is, in essence, a theory of what elements to include in the equation, and how they relate to each other.

Epidemiologists are not big on theory

But epidemiologists are not big on theory. They don’t spend much time thinking about whether they have taken into account all the factors that drive a disease outbreak, or their relative importance. There is no great understanding, in principle, of any disease or any population. They prefer to run with working assumptions, which they keep changing as things unfold in the real world with whatever disease they are modelling.  


To avoid getting caught up in questions of the biology of the disease, epidemiologists start with a simple theory: the spread of disease in a population at any point in time is a function of the number susceptible to it (S), the number infected by it (I) and the number recovered from it (R); this is the "SIR model".
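The SIR idea can be sketched in a few lines of code. This is a generic, textbook-style discrete-time simulation, not any particular epidemiologist's model, and the parameter values (population, beta, gamma) are invented for demonstration.

```python
# A minimal discrete-time SIR simulation (illustrative sketch only;
# all parameter values below are invented for demonstration).

def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step the classic SIR equations forward one day at a time.

    beta  : assumed infectious contacts per person per day
    gamma : assumed daily recovery rate (1 / infectious period)
    """
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
peak_infected = max(i for _, i, _ in history)
print(f"Peak simultaneous infections: {peak_infected:,.0f}")
```

Notice that the entire simulation hangs on the assumed values of beta and gamma; the code itself says nothing about where those numbers should come from.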

Sounds like we have made some progress but, really, we are where we started, because we don’t know the numbers of the susceptible, infected or recovered. Each of these variables needs its own theory of the further variables that determine it: who is susceptible, who gets exposed, and so on. Since Covid-19 is a contact disease, epidemiologists took the degree of contact with others as their key working assumption for determining who could be infected. But this in turn required further working assumptions, such as: what kind of contact results in infection, and on what basis to estimate how much contact a person has. Since you cannot do a direct count of the number of contacts each person in a population has, you look for something from which you can estimate average contact rates, such as travel statistics or cell phone data, which act as an indicator or “proxy” for contact. Then there are other assumptions that need to be made: how to take into account the period of infectiousness; how to factor in the time to onset of symptoms; whether people can be asymptomatic but infectious; and so on.

In this way, you can see how, at each step in deciding what variables go into our equation, we are relying on multiple levels of estimates within estimates, and assumptions within assumptions. Each element of the epidemiological model is itself a cascading series of estimates and rough working assumptions, and any one of them turning out to be wrong could throw the whole result off.

1.4 The Circularity of "Fitting"

What makes all of this even more unreliable is that epidemiologists do not spend much time identifying the underlying estimates or assumptions when assembling the variables for their equation. So very often, it is not even a question of the assumptions that have consciously been made, but of the things that have been unconsciously assumed in the model. In other words, assumptions will be built into the epidemiologists’ models that they are not even aware of. Sometimes epidemiologists try to correct for this by applying yet more models to their base models to “adjust” for the over- or under-estimation of different variables. But each adjustment comes with its own assumptions and uncertainties, adding to the already cascading series of assumptions and uncertainties in the base model.

The lack of science behind the maths

What’s wrong here is not the maths, but the science, or rather the lack thereof. It is said of modelling that your prediction is only as good as your data: “garbage in, garbage out”. But this really obscures the uncertainty, incompleteness and messiness that is embedded in epidemiological thinking. Your model is really only as good as the theory on which it is based, and epidemiologists have very poor theories, if any at all, behind their models. Often it is a case of garbage all the way down.

There is no understanding of how the virus works in principle

After putting together their back-of-the-envelope variables, epidemiologists then start the exercise of “fitting” their models to the data. As information and data for the disease come in, the quantities assigned to different variables in the model are changed so as to produce the outcome that is observed. On this basis, epidemiologists will work backwards to tell you, for instance, the “Reproduction Number” or “R”, which is the number of people who can be infected by one ill person; this is a key estimate that epidemiologists use to predict the rate of transmission of a disease. After back-calculating to infer the R, they then use this R value to work forwards to predict the number of cases. Where this keeps going wrong is that, because there is no understanding in principle of how the virus behaves, or of why some people fall ill and others don’t, your prediction based on inference from present behaviour is only as good as your assumption about whether and how the R will change over time.
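The backwards-then-forwards exercise can be sketched roughly as follows. The daily case counts and the five-day serial interval are invented for illustration, and the conversion R ≈ 1 + growth rate × serial interval is one simple textbook approximation among several, not the method of any particular group.

```python
# Illustrative sketch of "working backwards" to an R value from reported
# cases, then "working forwards" to project future cases. The case counts
# and the 5-day serial interval are invented for demonstration.
import math

reported_cases = [10, 14, 20, 28, 40, 57, 80]   # hypothetical daily counts
serial_interval = 5.0                           # assumed days between infection generations

# Back-calculate: estimate the daily exponential growth rate from the
# counts, then convert to R with the simple approximation R ~ 1 + r * T.
growth_rate = math.log(reported_cases[-1] / reported_cases[0]) / (len(reported_cases) - 1)
r_estimate = 1 + growth_rate * serial_interval

# Work forwards: project a week ahead assuming the growth rate (hence R)
# stays fixed -- exactly the assumption the text warns may not hold.
projected = [reported_cases[-1] * math.exp(growth_rate * d) for d in range(1, 8)]
print(f"Inferred R ~ {r_estimate:.2f}; projected day-7 cases ~ {projected[-1]:.0f}")
```

The projection inherits every flaw in the reported counts and every doubt about whether R will stay constant.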

1.5 There is No Such Thing as Good Data

To calculate the R for Covid, epidemiologists are using the daily case data put out by countries. But what is popularly called the “daily” data is not really “daily” in any coherent or stable sense. It is merely the data that have been reported for the day. This may include cases from previous days, and leave out as yet unreported cases for the day.

A Covid graph

This makes a big difference if you are trying to calculate the rate of case growth using daily reported data. A key element that gripped the popular attention during the Covid-19 pandemic was the so-called “doubling rate”, which was said to be “exponential”. At the start of an outbreak, you are still guessing where the cases might be. Diseases don’t have a very wide range of symptoms – fever, cold, cough and diarrhoea about sums it up – so a new disease is easily mistaken for a familiar one.

So initially you spot a few cases here and there, which gives you a flattish line, if you’re plotting daily cases over time. As you start getting better at identifying cases, and people begin to realise that they might not have an ordinary ‘flu, but this new disease, more cases begin to be detected, and so you will inevitably see a dramatic and exponential rise in cases. This may or may not mean that cases are in fact rising exponentially. In hindsight, you may find that there were many more undetected cases at the time that you thought the graph was flat. If this is the case, then you would be wrong to have predicted soaring exponential growth into the future. What your model tells you is an “exponential” growth in cases based on daily reported case data, could well be an exponential growth in cases being reported because of increased testing; increased awareness of the disease, leading to more people reporting to hospitals for diagnosis; or faster tests.
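The detection artifact described above is easy to demonstrate with invented numbers: hold the true number of infections flat and let only the fraction of cases detected rise, and the reported curve still looks "exponential".

```python
# Sketch: flat true infections can look "exponential" in reported counts
# if the fraction of cases actually detected is rising. All numbers are
# invented purely to illustrate the point.

true_daily_infections = [1000] * 10                         # suppose true incidence is flat
detection_fraction = [0.01 * 1.5 ** d for d in range(10)]   # testing/awareness ramps up
reported = [int(n * min(f, 1.0))
            for n, f in zip(true_daily_infections, detection_fraction)]
print(reported)   # rises steeply even though nothing about the disease changed
```

A model fitted to the reported series alone would infer rapid growth where, by construction, there is none.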

There is also a time lag between infection and diagnosis or laboratory confirmation of the infection, so the cases reported on a given day actually represent infections that occurred many days back, depending on the incubation period. This means that when cases are reported to be peaking, the actual infections, or case onsets, date from several days or weeks before. You may still have many cases, and face a serious challenge caring for all those who fall ill, but your prediction of exponential growth based on cases detected in the first weeks of the outbreak will be wrong.
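The effect of the reporting lag can be sketched in a couple of lines. The seven-day lag and the little epidemic curve are invented for illustration; real lags vary case by case.

```python
# Sketch of reporting lag: cases reported on day d reflect infections from
# roughly `lag` days earlier. The lag and the case numbers are invented.

lag = 7
infections = [5, 10, 20, 40, 80, 80, 40, 20, 10, 5]   # hypothetical true daily onsets
reported = [0] * lag + infections                     # what the public sees, shifted

peak_report_day = reported.index(max(reported))
peak_infection_day = infections.index(max(infections))
print(peak_report_day - peak_infection_day)   # the reported peak arrives `lag` days late
```

By the time the reported curve peaks, the true epidemic has, in this toy picture, already been declining for a week.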

If, like epidemiologists, you have no real theory, just some roughly thought out, tentative and incomplete ideas about the disease in question, then fitting can end up reinforcing your wrong assumptions about it. We have already seen that if your reported cases are not accurate, then the R derived from them will not be correct. This starting mistake gets compounded if, at that point, instead of looking for another more dependable basis on which to assess the R, you do “fitting” by increasing or decreasing the R, depending on the cases that you see over the next few days. If you keep doing this, you will never get to a point where you can have a reliable R. It is also dangerously circular reasoning.  Essentially you are saying that the number of cases depends on the R, which in turn depends on the number of cases!

We need to dig deeper....

This type of analysis may be helpful as an adjunct to a more rigorous theoretical understanding of a disease, but it should not be used to frame and lead thinking about it. The number of cases that we see for a disease is merely its outward manifestation. To truly understand a disease, we need to dig much deeper into the biology of the pathogen itself, as well as the way in which the human body responds to it. If we have a correct understanding of these things, then we can, perhaps, accurately model the disease. But without this knowledge, modelling is a terrible way to evaluate anything. Even if it turns out to be right, it is so only by chance.

Even at more sophisticated levels of analysis than the one used by epidemiologists, for instance, in theoretical mathematics and physics, there is legitimate debate over the validity and reliability of modelling. There are questions, not just about the data and choice of variables, but about the equations themselves, and whether it is always correct to draw your analysis based on outcomes for which these equations give you higher probabilities. This is not the place to go into these issues in any detail, but we need to be aware of them to give the context for how public health experts might have gone wrong in conceptualising pandemic disease. 

To add to all these structural contradictions and weaknesses in the epidemiologist’s approach are the problems with data, with which anyone following the Covid-19 numbers must by now be familiar. There are inconsistencies in the way a disease is tested or clinically diagnosed, and in the way a death is attributed to a given disease. There are uncertainties over whether testing results are accurate, and further uncertainties over whether results are accurately reported. Then there are time lags in reporting.

Not all the problems with the data are of a nature that better reporting can iron out. For example, the debate over dying “with” or “by” Covid-19 is not a reporting issue, but a scientific one, and will probably stay alive among scientists for decades to come.

So, to add to all the gaps in what a model can tell us, there is the inevitability that the data on which it depends will not be very good. Epidemiologists try to adjust for this with yet more models, but again, this is an estimation. Adjustment does not clean up the data in any absolute sense.

Suranya Aiyar is trained in mathematics at St. Stephen’s College, India and law at Oxford University, UK and New York University, USA. She lives in New Delhi, India, with her husband and two children.

This was presented live on Facebook on July 13th, 2020. Watch the video here. 



Listen to the podcast here.

Follow the lectures every day at 7pm India time (2.30pm London/9.30am New York) on Facebook Live @ Suranya Aiyar.

Read the full paper here.


