Research Article

Predicting COVID-19 Cases and Deaths Utilizing Hygiene Hypothesis Surrogate Factors: A Global Analysis

by Boban A¹, Boban S², George B², DiTomasso RA³, Roberts MB³, Cadet VE²*

¹School of Osteopathic Medicine, Philadelphia College of Osteopathic Medicine Georgia, 625 Old Peachtree Rd NW, Suwanee, Georgia 30024, United States.

²Department of Biomedical Sciences, School of Osteopathic Medicine, Philadelphia College of Osteopathic Medicine Georgia, 625 Old Peachtree Rd NW, Suwanee, Georgia 30024, United States.

³Department of Clinical Psychology, School of Professional and Applied Psychology, Philadelphia College of Osteopathic Medicine, 4170 City Ave, Philadelphia, Pennsylvania 19131, United States.

*Corresponding author: Cadet VE, Department of Biomedical Sciences, School of Osteopathic Medicine, Philadelphia College of Osteopathic Medicine Georgia, 625 Old Peachtree Rd NW, Suwanee, Georgia 30024, United States.

Received Date: 17 April, 2024

Accepted Date: 07 May, 2024

Published Date: 10 May, 2024

Citation: Boban A, Boban S, George B, DiTomasso RA, Roberts MB, et al. (2024) Predicting COVID-19 Cases and Deaths Utilizing Hygiene Hypothesis Surrogate Factors: A Global Analysis. Rep Glob Health Res 7: 197.


At the beginning of the pandemic, it was difficult to determine which factor or set of factors could be analyzed to predict where coronavirus disease 2019 (COVID-19) cases and deaths would surge globally. Using surrogate factors representing the hygiene hypothesis, this study sought to examine whether these factors were correlated with COVID-19 cases and deaths.

Publicly available data from 190 countries were collected. These data included total COVID-19 cases and deaths through December 28, 2020; water, sanitation, and hygiene (WaSH) metrics; mortality due to various types of air pollution; and additional factors such as control of solid waste, emission growth rates of methane and carbon dioxide, and disability-adjusted life years (DALYs) lost to unsafe drinking water and sanitation. These variables were analyzed with multiple regression in IBM SPSS 27.0 to determine the combination of factors most predictive of total COVID-19 cases and deaths, with separate regressions conducted for each of the two criterion variables.

The analyses revealed that two predictor variables, a nation's mortality due to air pollution (MDAP) and its level of control of solid waste (CSW), were positively correlated with the total number of COVID-19 cases. Together, these predictors accounted for approximately 28% of the variance in total cases. Using MDAP and CSW, a predictive equation for the number of COVID-19 cases with a 90% confidence interval was derived: Estimated COVID-19 total cases = 10.534(MDAP) + 498321.18(CSW) − 57370.23 ± 716905.12. For COVID-19 deaths, MDAP accounted for 9.6% of the variance. These findings support prior studies identifying air pollution as a potential catalyst for COVID-19 spread and, to a lesser extent, mortality.
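The reported predictive equation can be evaluated directly. The Python sketch below plugs MDAP and CSW into the published coefficients and returns the point estimate together with the reported ± term, treated here as a symmetric 90% interval. It assumes the inputs are on the same scales as the study's raw data (Supplementary Table 2); the function name and the example inputs are hypothetical and are not values from the study.

```python
# Coefficients reported for the study's predictive equation.
MDAP_COEF = 10.534       # per unit of mortality due to air pollution (MDAP)
CSW_COEF = 498321.18     # per unit of control of solid waste (CSW)
INTERCEPT = -57370.23
MARGIN_90 = 716905.12    # reported +/- term, treated here as a symmetric 90% interval


def estimate_total_cases(mdap: float, csw: float) -> tuple:
    """Return (estimate, lower, upper) for total COVID-19 cases.

    Assumes mdap and csw are on the same scales as the study's raw data.
    """
    estimate = MDAP_COEF * mdap + CSW_COEF * csw + INTERCEPT
    return estimate, estimate - MARGIN_90, estimate + MARGIN_90


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not data from the study.
    est, low, high = estimate_total_cases(mdap=50000.0, csw=0.5)
    print(f"estimate={est:,.0f}, 90% interval=({low:,.0f}, {high:,.0f})")
```

Because the ± term is large (about 717,000 cases), the lower bound falls below zero for nations with low MDAP and CSW values and is only meaningful when floored at zero.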

One essential strategy for mitigating respiratory viruses is the abatement of air pollution. Lower air pollution is associated with less time for the virus to circulate on the denser particles of polluted air, as well as with less aggravation of the respiratory system. Thus, MDAP is an effective predictor of COVID-19 cases and, to a lesser degree, deaths. The positive correlation between CSW and the number of cases suggests that lockdowns throughout the world likely disrupted solid waste disposal systems, most notably in nations with previously effective CSW mechanisms. These conclusions support implementing procedures that minimize air pollution and strengthen systems for CSW. Additionally, the predictive equation could be used to anticipate areas where case numbers of novel respiratory viruses may increase, thus helping to prevent a descent into another global pandemic.

Keywords: hygiene hypothesis; air pollution; solid waste; COVID-19; Environmental Performance Index; WaSH mortality


Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was initially identified in Wuhan, China, in December 2019. It has since wreaked havoc on a worldwide scale, causing a disproportionate number of deaths in wealthier nations [1]. Previous experience with related coronavirus illnesses, notably Severe Acute Respiratory Syndrome (SARS) [2] of 2002-2004 and the ongoing Middle East Respiratory Syndrome (MERS) [3], allowed scientists to rapidly identify and begin to understand the highly contagious and infectious nature of the newly discovered SARS-CoV-2 [4,5]. As countries around the world put stringent measures in place to limit the morbidity and mortality caused by COVID-19, scientists across the globe raced to find therapies for those infected, along with vaccines to prevent spread. By the end of 2020, Pfizer-BioNTech and Moderna had secured FDA Emergency Use Authorization (EUA) for the first two COVID-19 vaccines available in the US [6]. Shortly thereafter, several other pharmaceutical companies released positive clinical trial data for additional COVID-19 vaccines, and Johnson & Johnson's Janssen COVID-19 vaccine also received EUA in the US [6]. However, COVID-19 cases and deaths continued to increase owing to a lack of social distancing, particularly during holidays, as well as to what some called 'pandemic fatigue' [7-9]. Meanwhile, more highly transmissible variants with increased virulence emerged at a fast pace, first identified in locations such as the United Kingdom, South Africa, and Brazil [10]. Studies indicate reduced vaccine efficacy against these newly emergent variants [10-12], underscoring how essential it is to identify factors potentially implicated in reduced COVID-19 mortality.

In a recently published commentary, Sehrawat et al. raised the question of whether the hygiene hypothesis applies to COVID-19 susceptibility, arguing that frequent exposure to pathogens and infectious agents may prepare the immune system to battle newer infections such as those caused by SARS-CoV-2 [1]. The hygiene hypothesis centers on the theory that repeated exposure to pathogens beginning in early childhood allows the immune system to become more robust in combating newly acquired infections. It was first proposed by Strachan in a study of 17,414 British children born in 1958, which found an inverse association between the prevalence of hay fever and the number of older siblings [13]. This work was further expanded upon in numerous studies, such as a case-control study on early social mixing and childhood type 1 diabetes mellitus (type 1 DM) [14] and a cross-sectional study on the age of starting nursery school and the occurrence of childhood allergies [15].

The latter study concluded that early infections protected against the development of allergies later in life, in concordance with the hygiene hypothesis [15]. At the immunological level, the hygiene hypothesis postulates that CD4+ T helper 1 (Th1) and CD4+ T helper 2 (Th2) cells must be in balance for proper functioning of the immune system [9]. In developed nations, decreased pathogen exposure has been associated with a dysregulated immune system prone to aberrant reactions against self-antigens and otherwise harmless allergens. This is hypothesized to be tied to the increased prevalence of autoimmune diseases, allergies, and asthma in the US [16]. Improved hygiene, use of antibiotics, and vaccinations are among the factors implicated in decreased stimulation of Th1 cells, which in turn increases activation of the Th2 response. This response includes release of the cytokines interleukin (IL)-4, IL-5, and IL-13, which are associated with increased IgE and eosinophilic responses in atopy, along with IL-10, which has anti-inflammatory effects [17]. The Th1/Th2 concept is illustrated in Figure 1.


Figure 1: CD4+ Th1 and Th2 cells and their roles in immune responses. Unique characteristics of Th1 and Th2 cells are shown, including *aberrant responses when the delicate balance is disrupted.

Although hygiene cannot be represented by a single numeric measure, it can be approximated using WaSH mortality rates along with select factors from the Environmental Performance Index (EPI). The EPI is derived from 32 environmental and hygiene indicators, as defined by Yale University and Columbia University in collaboration with the World Economic Forum (Supplementary Table 1) [18]. Taken together, EPI rankings indicate how well nations address common environmental challenges. Examining the individual variables makes clear that many of the factors pertaining directly to each nation's hygienic conditions would likely affect the health and immune status of its inhabitants and, by extension, may play a role in whether an individual develops COVID-19 and/or succumbs to the disease. Because the raw data for these variables are available on the EPI website, and additional factors representative of hygiene could be obtained from various public databases, we sought to explore that assumption by analyzing available global data to determine whether the hygiene hypothesis correlated with the differences in COVID-19 cases and deaths observed globally during the first phase of the pandemic. During this phase, before the widespread emergence of variants of concern, it was difficult to predict accurately where surges would occur worldwide; therefore, we additionally used these surrogate factors to derive a predictive equation, allowing a multidisciplinary approach to assessing the role of the hygiene hypothesis in COVID-19 cases and deaths.


Data representing 190 nations were collected from various publicly accessible sources. Specifically, the data included the total number of COVID-19 cases and deaths through December 28, 2020, along with the total number of COVID-19 tests conducted, from the Johns Hopkins Coronavirus Resource Center (CRC) [19]. The average stringency index was calculated as the mean of each country's daily stringency index values [19]. The COVID-19 case fatality rate for each nation was calculated as its total number of deaths divided by its total number of cases. Metrics pertaining to WaSH mortality were retrieved from the WHO [20]. Data on mortality due to unsafe water sources, unsafe sanitation, and lack of access to handwashing facilities, as well as the proportions of the population using limited drinking water services, using limited and basic sanitation services, and practicing open defecation, were retrieved from UNICEF [21]. Additional hygiene surrogate factors were also collected, including mortality due to various types of air pollution, such as general air pollution and solid fuels (OurWorldInData) [22], and household and ambient air pollution (Data.worldbank) [23]. Finally, the EPI itself was obtained along with data representing 11 of its variables: emission growth rates of carbon dioxide and methane; control of solid waste; air pollution from exposure to household solid fuels, fine particulate matter (PM2.5), and ozone; the proportion of the population connected to wastewater treatment and the proportion of collected wastewater that is treated; and disability-adjusted life years (DALYs) lost to unsafe sanitation, unsafe drinking water, and lead exposure [18]. All raw data were organized in MS Excel version 2106 and can be found in Supplementary Table 2.
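The two derived per-country quantities described here, the case fatality rate and the average stringency index, can be sketched in Python as follows. This is an illustrative sketch, not the authors' workflow (which used MS Excel); skipping days with no recorded stringency value and returning NaN for a country with zero reported cases are assumptions, and the function names and sample figures are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional


def case_fatality_rate(total_deaths: int, total_cases: int) -> float:
    """Case fatality rate: total deaths divided by total confirmed cases.

    Returns NaN when a country reports no cases (assumption: CFR is undefined).
    """
    if total_cases <= 0:
        return math.nan
    return total_deaths / total_cases


def average_stringency(daily_index: List[Optional[float]]) -> float:
    """Mean of a country's daily stringency index values.

    Days with no recorded value (None) are skipped; this is an assumption.
    """
    observed = [v for v in daily_index if v is not None]
    if not observed:
        raise ValueError("no stringency values recorded")
    return mean(observed)


if __name__ == "__main__":
    # Hypothetical country figures for illustration only.
    print(case_fatality_rate(total_deaths=20, total_cases=1000))
    print(average_stringency([50.0, 60.0, None, 70.0]))
```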

Data were analyzed in IBM SPSS version 27.0. Specifically, Spearman correlation analysis was used to identify correlations among the 29 factors. The correlation matrix was then converted into a heatmap in RStudio (R version 4.1) to make the factors with significant Spearman correlations easy to interpret. SPSS was further used to conduct multiple regression analyses to determine the combination of factors most predictive of total COVID-19 cases and deaths. Outliers, defined as values more than 3 standard deviations from the mean, were removed from the analysis. Beta coefficients from the multiple regression analysis were used to derive a predictive equation for total COVID-19 cases.
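A minimal Python sketch of the outlier-removal and regression steps described here, assuming a simultaneous (all predictors entered) ordinary least squares model: rows with any value more than 3 sample standard deviations from its column mean are dropped, an intercept-plus-slopes model is fitted via the normal equations, and R² (the proportion of variance explained) is reported. The text does not specify the SPSS settings (entry method, missing-data handling), so this is an approximation rather than the authors' procedure, and the helper names are hypothetical. Solving the normal equations is adequate for a handful of predictors but less numerically stable than orthogonal-decomposition (e.g., QR) approaches.

```python
from statistics import mean, stdev


def drop_outliers(rows, columns, k=3.0):
    """Keep rows whose value in every listed column lies within k sample SDs of that column's mean."""
    limits = {}
    for c in columns:
        values = [r[c] for r in rows]
        limits[c] = (mean(values), stdev(values))
    return [
        r for r in rows
        if all(sd == 0 or abs(r[c] - mu) <= k * sd for c, (mu, sd) in limits.items())
    ]


def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X is a list of predictor rows and y a list of responses.
    Returns [intercept, b1, ..., bp].
    """
    A = [[1.0, *row] for row in X]                    # prepend the intercept column
    p = len(A[0])
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    M = [row + [v] for row, v in zip(xtx, xty)]      # augmented matrix [X'X | X'y]
    for col in range(p):                              # Gauss-Jordan with partial pivoting
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def r_squared(X, y, coefs):
    """Proportion of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in X]
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

In this setup, each country would be one row, the predictors would be columns such as MDAP and CSW, and the response would be total cases; the fitted `[intercept, b1, b2]` list has the same shape as the predictive equation reported in the abstract.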


As a preliminary analysis to determine the number of potential predictors to include in the multiple regression, Spearman correlations were computed. Correlation strengths were then classified as follows: very high positive (0.9 to 1.0), high positive (0.7 to 0.9), moderate positive (0.5 to 0.7), low positive (0.3 to 0.5), and negligible (0 to 0.3). Negative correlations were classified using the same magnitude ranges. All results were considered statistically significant at p ≤ 0.05. The heatmap translates the Spearman correlations into an easily visualized display of the magnitude and direction of each correlation (Figure 2).
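For readers reproducing the correlation step outside SPSS, Spearman's rho can be computed as the Pearson correlation of average ranks (tied values share the mean of their positions), and the strength classification can then be applied to its absolute value. In this Python sketch, assigning boundary values such as 0.7 to the higher class is an assumption, since the stated ranges share endpoints; significance testing is omitted, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)


def classify_strength(rho):
    """Label a correlation using the study's magnitude bands; the sign gives direction."""
    r = abs(rho)
    if r < 0.3:
        return "negligible"
    if r >= 0.9:
        band = "very high"
    elif r >= 0.7:
        band = "high"
    elif r >= 0.5:
        band = "moderate"
    else:
        band = "low"
    direction = "positive" if rho > 0 else "negative"
    return f"{band} {direction}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(round(rho, 2), classify_strength(rho))
```

Applying `spearman_rho` to every pair of factors yields the correlation matrix that a heatmap would display.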