Abstract Volume: 1 Issue: 1 ISSN:

 Statistical and Mathematical Analysis on COVID-19

Bin Zhao*, Jinming Cao1

1School of Science, Hubei University of Technology, Wuhan, Hubei, China.

2School of Information and Mathematics, Yangtze University, Jingzhou, Hubei, China.

*Corresponding author: Dr. Bin Zhao, School of Science, Hubei University of Technology, Wuhan, Hubei, China.

Received Date: June 20, 2020
Published Date: July 07, 2020


COVID-19, an infectious disease caused by a novel coronavirus, has raged across the world since December 2019. The novel coronavirus first appeared in Wuhan, China, and quickly spread across Asia; many countries around the world are now affected by the epidemic. The deaths of many patients, including medical staff, caused social panic and drew close attention from the media, governments, and world organizations. Today, with the joint efforts of the government, doctors, and all walks of life, the epidemic in Hubei Province has been brought under control, preventing its spread from further affecting people's lives. Because of its rapid spread and serious consequences, this sudden novel coronavirus pneumonia epidemic has become an important social hot-spot event. By analyzing the novel coronavirus pneumonia epidemic, we can better understand sudden infectious diseases in the future, take more effective response measures, and establish a model that is truly predictive and provides reliable and sufficient information for prevention and control.

Keywords: COVID-19; Pneumonia; Virus; Coronary; Differential equation; Infectious disease model



We establish different models according to the different developments of the epidemic, different time points, and different response measures taken by the government. Specifically, for January 23 to February 7, 2020, we adopt the traditional SIR model. For February 8 to March 30, 2020, research had shown that novel coronavirus pneumonia has a latent period, and in this later phase the government effectively isolated patients, so we adopt the SEIQR model. For March 31 to May 16, 2020, because more asymptomatic infected people were found, we fit the data with the SEIQLR model. Finally, using an SEIR simulator that accounts for the numbers of susceptible, latent, infected, cured, and dead people, among other factors, we simulate how these numbers change from the beginning of the novel coronavirus pneumonia outbreak through the following 180 days.


The results, based on the analysis of differential equations and dynamic models, show that the model established in the first phase predicts that the novel coronavirus pneumonia epidemic in Hubei Province would be controlled by the end of March, which is in line with the actual situation: the rest of Hubei Province, except for Wuhan, lifted outbound travel restrictions at 00:00 on March 25, and the lockdown of Wuhan was lifted on April 8. The second-phase model shows that the epidemic reaches its peak in mid-February. For example, the number of patients admitted to hospital quarantine declined after mid-February, which is inseparable from the construction of square cabin (Fangcang shelter) hospitals in early February, which allowed more and more patients to be admitted. The third-phase model shows that the epidemic had been completely controlled by the end of May, which is also in line with reality, because in mid-May the Wuhan government conducted nucleic acid tests on all citizens to screen for asymptomatic infected persons and fundamentally control the spread of novel coronavirus pneumonia.

In Hubei Province, the center of the initial outbreak of novel coronavirus pneumonia, people were forced to isolate at home during the Spring Festival, the most important Chinese holiday, and work and study were suspended across society. The Chinese government took many measures in response to the epidemic, such as locking down cities, rapidly building square cabin (Fangcang shelter) hospitals, and prohibiting gatherings. At the beginning of May this year, the epidemic in Hubei Province was finally effectively controlled. Ordinary citizens should not panic unnecessarily about the unknown novel coronavirus; instead, we should come to understand this virus fully. In addition to the relevant medical knowledge, we should also understand the spread of infectious diseases through appropriate mathematical models. With mathematical models, we can understand how harmful an infectious disease is, when to control it, and how to stop it, and we can use a scientific view to show the public the true nature of the novel coronavirus without causing social panic.


The coronavirus disease 2019 (COVID-19) was first reported in December 2019 in Wuhan, China. It quickly spread to other districts in the country and, a month later, to other countries across the world, affecting over 200 countries and territories [1]. On March 11, 2020, Tedros, the Director-General of the World Health Organization, announced that, based on its assessment, the World Health Organization believed that the current novel coronavirus pneumonia outbreak could be described as a global pandemic [2]. COVID-19 is a highly contagious respiratory infection caused by a coronavirus. It is transmitted primarily through respiratory droplets at close range and through close contact with a patient's respiratory secretions, and it may also be transmitted through objects contaminated by a patient's droplets (e.g., hands, clothing, food, water, or the environment). The incubation period of most patients is within 7 days. Common clinical findings in COVID-19 patients include fever, respiratory symptoms, fatigue, normal or decreased peripheral blood lymphocyte count, and multiple bilateral patchy ground-glass opacities in the lung periphery on computed tomography (CT) [3]. Although the exact source of COVID-19 is still unknown, patients with COVID-19 are by far the most certain source of infection.

As of June 26, 2020, the reported cumulative number of confirmed COVID-19 cases worldwide had reached 9,690,148, and cumulative deaths had reached 488,971 [4]. At that time, Hubei Province, China, had 68,135 cumulative confirmed cases of COVID-19 and 4,512 cumulative deaths [5]. The outbreak of COVID-19 has had a great impact on people's lives and on the development of the national economy.

Since March, novel coronavirus pneumonia has been basically controlled in China. People's normal lives and the economy affected by the epidemic are recovering.

However, in many areas outside China, the epidemic is still very serious, and the number of infected people remains high. We analyze Hubei Province, the initial epidemic center of the novel coronavirus pneumonia outbreak, and, in light of the actual situation there, use different models to provide the world with valuable experience and effective measures in the fight against the epidemic.

The Chinese government adopted different policies over time during the fight against the novel coronavirus. In our modeling, we therefore use different models for different time periods, so as to follow the development trend of the epidemic more closely and to reflect the changes brought about by policies. We then use software that simulates the spread of the novel coronavirus to obtain results under theoretical conditions.



The data on Hubei Province in this paper are authoritative data published by the Hubei Provincial Health Planning Commission on its official platform from January 23, 2020, to May 16, 2020 [6]. The data include cumulative diagnosed cases, cumulative deaths, cumulative cures, suspected cases, and asymptomatic infections. We obtain Hubei Province's 2019 total population from official sources [7].

The amount of data we collect is large, so it must be processed before analysis. We first use Excel to determine the data categories, filtering the known data and applying basic operations to obtain the data we actually need. We then bring these data into MATLAB and obtain the optimized parameter values with the fmincon function.

The model

Based on the transmission characteristics of novel coronavirus pneumonia, we use differential equations to establish dynamic infectious disease models and analyze the whole process in three time periods, chosen according to the course of transmission and the studies that scientists published on the novel coronavirus pneumonia epidemic at different times.

Taking January 23, 2020, to February 7, 2020, as the first phase, the SIR model [8, 9] is established. Because this was the early phase of the novel coronavirus pneumonia outbreak, research in all aspects was insufficient, and it had not yet been recognized that novel coronavirus pneumonia has an incubation period and asymptomatic infections. Therefore, the data selected are the daily number of confirmed diagnoses, the cumulative number of deaths, and the cumulative number of cures.

Taking February 8, 2020, to March 30, 2020, as the second phase, the SEIQR model [10, 11] is established. According to the data, the number of suspected cases was released for the first time on February 8, and, under state control measures, most diagnosed patients were able to receive effective isolation and treatment. Therefore, we take into account patients in the incubation period and quarantined patients; that is, we select the daily number of confirmed diagnoses, cumulative deaths, cumulative cures, the number in centralized isolation, and the number of suspected cases.

Taking March 31, 2020, to May 16, 2020, as the third phase, the SEIQLR model [12, 13] is established. According to the data, officials released information on asymptomatic infections for the first time on March 31. Therefore, we also consider asymptomatic infections; that is, we select the daily number of confirmed diagnoses, cumulative deaths, cumulative cures, the number in centralized isolation, the number of suspected cases, and the daily number of asymptomatic infections.

SEIQLR-based method for estimation

Based on the known data, we set the 2019 population of Hubei Province as N and divide it into six categories: people who are not infected with the novel coronavirus are denoted S(t); daily suspected cases are denoted E(t); existing diagnosed cases on each day are denoted I(t); those who are quarantined after diagnosis are denoted Q(t); asymptomatic infected people are treated as latent and denoted L(t); and cumulatively cured and dead patients are denoted R(t).

Therefore, we make the following assumptions.

  1. The population is evenly distributed.
  2. The cured people will be permanently immune to the virus and will not be re-infected.
  3. The quarantined and the diagnosed have the same infectious power.
  4. The latent patients, the diagnosed, and the suspected have different infectious power.

However, not all data for the above six categories are directly available, and some must be derived by merging known data. Specifically, for the susceptible (S), we subtract all the people infected with the virus from the total population N. For the infectious (I), we subtract the number of people quarantined (Q) and the number of people exposed to the virus (E) from the number of people diagnosed. For the removed (R), we add up the number of people cured and the number of people who have died of COVID-19.
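The merging rules above can be written out as simple bookkeeping. The following Python sketch is illustrative only: the function name, the argument names, and the one-day record are hypothetical, and it assumes that "all the people infected" means everyone who is in any compartment other than S, so that the six categories add up to N.

```python
def derive_compartments(n_total, diagnosed, quarantined, exposed,
                        asymptomatic, cured, deaths):
    """Merge reported daily counts into the six model categories."""
    removed = cured + deaths                        # R: cured plus died
    infectious = diagnosed - quarantined - exposed  # I: diagnosed minus Q and E
    # Assumption: "all the people infected" is read as everyone outside S.
    non_susceptible = (exposed + infectious + quarantined
                       + asymptomatic + removed)
    susceptible = n_total - non_susceptible         # S: the rest of N
    return {"S": susceptible, "E": exposed, "I": infectious,
            "Q": quarantined, "L": asymptomatic, "R": removed}


# Hypothetical one-day record, for illustration only (not real data).
day = derive_compartments(n_total=1_000_000, diagnosed=5_000,
                          quarantined=3_000, exposed=800,
                          asymptomatic=50, cured=400, deaths=100)
```

With this reading, the six values in `day` always sum to `n_total`, which is a convenient consistency check when preparing the daily series.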

For the SIR model, assume that the total population is N and that the numbers of healthy people, patients, and removed people at time t are denoted S(t), I(t), and R(t), respectively. Then S(t) + I(t) + R(t) = N holds. The natural birth rate and mortality of the population are not considered during the epidemic.

It is assumed that each patient makes β effective contacts per day, where β is called the contagion rate, and that a healthy person who is effectively contacted by a patient is immediately infected and becomes ill. If each patient effectively exposes βS(t) healthy people per day, then all I(t) patients together expose βS(t)I(t) healthy people per day, and these healthy individuals are immediately infected. Consequently, S(t) decreases monotonically. Among patients, the rate at which diagnosed cases are transferred per day is ν, where ν = 1. Patients are transferred to inpatient care and removed at rate α, where α includes both the cure rate and mortality; that is, the number of daily removals is ανI(t) [13, 14].
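The SIR dynamics described above correspond to dS/dt = -(infection term), dI/dt = (infection term) - ανI, and dR/dt = ανI. A minimal Python sketch of these equations follows. It is not the authors' MATLAB implementation. The text writes the infection term as βS(t)I(t); the sketch divides it by N so that β acts as a per-day rate when S and I are head counts, and that normalization is an assumption. The function name, the forward-Euler step, and the initial values are likewise hypothetical.

```python
def simulate_sir(beta, alpha, s0, i0, r0, days, nu=1.0, steps_per_day=100):
    """Integrate the SIR equations with forward Euler.

    dS/dt = -beta*S*I/N
    dI/dt =  beta*S*I/N - alpha*nu*I
    dR/dt =  alpha*nu*I
    Assumption: the contact term is normalised by N = S + I + R.
    """
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]            # one (S, I, R) tuple per day
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = alpha * nu * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical population and seed; 16 days matches the length of Phase 1.
history = simulate_sir(beta=0.5, alpha=0.08,
                       s0=999_900, i0=100, r0=0, days=16)
```

Because every person leaving one compartment enters another, S + I + R stays equal to N throughout, and S can only decrease, as stated in the text.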

The model used here is the SEIHD model, which is equivalent to the SEIR model: (H) and (D) represent the number of people cured and the number of people who died from the disease, respectively, and adding the two together gives (R). To study the long-term effects of the infectious disease on society, we set the number of simulation days to 180, which is about six months.
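To make the relationship between SEIHD and SEIR concrete, here is a minimal Python sketch of a basic SEIHD model run for 180 days. It is not the simulator's implementation, whose internals are not described here. The transition structure, the function name, and every parameter value (including the latent rate 1/7 and the death fraction 0.05) are hypothetical placeholders chosen for illustration.

```python
def simulate_seihd(n, e0, i0, beta, sigma, gamma, mu,
                   days=180, steps_per_day=50):
    """Forward-Euler integration of a basic SEIHD model.

    S -> E at rate beta*S*I/N, E -> I at rate sigma*E,
    I -> H (cured) at rate (1 - mu)*gamma*I,
    I -> D (died)  at rate mu*gamma*I.
    """
    dt = 1.0 / steps_per_day
    s, e, i, h, d = float(n - e0 - i0), float(e0), float(i0), 0.0, 0.0
    rows = [(s, e, i, h, d)]         # one (S, E, I, H, D) tuple per day
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt
            onsets = sigma * e * dt
            outcomes = gamma * i * dt
            s -= infections
            e += infections - onsets
            i += onsets - outcomes
            h += (1.0 - mu) * outcomes
            d += mu * outcomes
        rows.append((s, e, i, h, d))
    return rows


# All parameter values below are hypothetical placeholders.
rows = simulate_seihd(n=1_000_000, e0=50, i0=10, beta=0.5,
                      sigma=1 / 7, gamma=0.08, mu=0.05)
removed = [h + d for (_, _, _, h, d) in rows]   # R(t) = H(t) + D(t)
```

Summing H and D at each step recovers the R compartment of an SEIR model with the same rates, which is the equivalence the text refers to.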


The result of SIR-based method in Phase 1

In MATLAB, optimizing the parameters with the fmincon function [17] yields α = 0.08 and β = 0.5 for the first phase. From the fitted curves in Figure 5, we can see that in the first phase the curves fit the observed values very well.
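Parameter estimation of this kind can be illustrated with a short Python sketch. It is not the authors' MATLAB code: instead of fmincon it uses a coarse bounded grid search that minimizes the sum of squared errors, which keeps the example free of dependencies (in Python, `scipy.optimize.minimize` with bounds would be a closer analogue). The "observations" are synthetic values generated inside the script, and the SIR contact term normalized by N, the function names, the initial values, and the search bounds are all assumptions.

```python
def sir_infected(beta, alpha, s0, i0, r0, days, steps_per_day=20):
    """Return I(t) for t = 0..days from an SIR model (forward Euler)."""
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i = float(s0), float(i0)
    series = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt
            new_rem = alpha * i * dt
            s -= new_inf
            i += new_inf - new_rem
        series.append(i)
    return series


def sse(params, observed, s0, i0, r0):
    """Sum of squared errors between simulated and observed I(t)."""
    beta, alpha = params
    simulated = sir_infected(beta, alpha, s0, i0, r0, len(observed) - 1)
    return sum((m - o) ** 2 for m, o in zip(simulated, observed))


def grid_fit(observed, s0, i0, r0):
    """Bounded grid search: beta in [0.10, 1.00], alpha in [0.01, 0.20]."""
    best = None
    for b in range(10, 101, 5):          # beta candidates, step 0.05
        for a in range(1, 21):           # alpha candidates, step 0.01
            candidate = (b / 100, a / 100)
            err = sse(candidate, observed, s0, i0, r0)
            if best is None or err < best[0]:
                best = (err, candidate)
    return best[1]


# Synthetic "observations" generated from known parameters (not real data).
S0, I0, REMOVED0 = 999_900, 100, 0
observed = sir_infected(0.5, 0.08, S0, I0, REMOVED0, days=15)
beta_hat, alpha_hat = grid_fit(observed, S0, I0, REMOVED0)
```

A grid search only finds the best point on its grid; a gradient-based constrained optimizer refines the estimate continuously, which is why it is preferable for real data.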


As can be seen from the above, we use two different software tools to analyze the data, namely MATLAB and the SEIR simulator. In comparison, MATLAB is more powerful: it lets us modify the differential equations according to our needs, but setting parameters and drawing figures is relatively complicated. The SEIR simulator is more convenient, since only a few parameters need to be set to generate a figure, but it is limited in how far the equations can be optimized. We combine the two to achieve more accurate results.

In addition, we have established three different models based on different phases, namely the SIR model, the SEIQR model, and the SEIQLR model, which are gradually improved to better fit the actual situation of the epidemic.

It can be seen from the results that the quality of the curve fit differs between models. Although we considered more factors in Phase 2 and Phase 3, the curve fitting is not ideal. This may be because what happened in reality was partly accidental, and such phenomena cannot be explained by traditional mathematical models.

On the other hand, the factors taken into account may not accurately reflect reality. In general, however, the three models we establish can effectively reflect the trends in reality.
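Differences in fit quality between models can be quantified rather than judged by eye. The Python sketch below computes two common measures, the root-mean-square error and the coefficient of determination. These metrics are a suggestion, not something the analysis above reports, and the three series are hypothetical numbers rather than the paper's data.

```python
import math


def rmse(observed, fitted):
    """Root-mean-square error between two equally long series."""
    if len(observed) != len(fitted):
        raise ValueError("series must have the same length")
    squared = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical series, for illustration only (not the paper's data).
observed = [100.0, 150.0, 220.0, 330.0, 480.0]
fit_a = [105.0, 145.0, 225.0, 320.0, 490.0]
fit_b = [100.0, 180.0, 260.0, 300.0, 400.0]

scores = {name: (rmse(observed, fit), r_squared(observed, fit))
          for name, fit in (("fit_a", fit_a), ("fit_b", fit_b))}
```

A lower RMSE and an R² closer to 1 indicate a closer fit, so in this toy example `fit_a` would be preferred over `fit_b`.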

In summary, the traditional mathematical model cannot fully explain reality, but this does not negate its value. Although the SEIQLR model we establish does not perform well in curve fitting, it takes into account more factors than the SIR model, and reality does involve more influencing factors.

Therefore, for an event such as this that involves many factors, we should consider using improved traditional models, such as the SEIQLR model, or more advanced methods, such as time series analysis and neural networks.


Novel coronavirus pneumonia is influenced by many factors, so we use a time-phased approach and establish different models for different periods. In the case of COVID-19, an unprecedented and severe epidemic, inexperience in the early phase made it difficult to make sound judgments.

Therefore, we initially establish the SIR model based on officially published data and previous information on infectious disease models. Over time, latent patients with novel coronavirus were also counted in the data, and government control was further increased with vigorous efforts to isolate and treat patients, so we establish the SEIQR model.

As experience accumulated, studies found that novel coronavirus pneumonia could involve asymptomatic infections, so we establish the SEIQLR model. This approach to modeling provides a better simulation of the actual situation. Finally, by setting the relevant parameters for a closed environment in the SEIR simulator, we obtain the transmission of novel coronavirus pneumonia from the initial phase to 180 days afterward, which also provides some reference value for combating novel coronavirus pneumonia [18].

Our infectious disease model, which is established with differential equations, has broad application prospects. Beyond the prediction, prevention, and control of infectious diseases themselves (e.g., COVID-19 and SARS), many social behaviors and incidents in our lives follow rules similar to those of the spread of infectious disease.

The infectious disease model can be widely used in social science research on the diffusion of innovation, the spread of online public opinion, the spread of financial risk, and other areas [19, 20]. The diffusion process of management accounting practices, which is shown in Table 6 and Figure 12 below, uses the SIR model for analysis [21].


When we establish the models, we do not consider the impact of natural births and deaths. Because data on the mobile population and on infections among the mobile population are lacking, we also ignore the impact of population movement between provinces and districts on the epidemic in Hubei Province in the pre-lockdown period.


The model we have established covers only Hubei Province, but modeling at the national level, and the spread of the novel coronavirus to rural and pastoral areas, are also worth discussing. In addition, open modeling questions include how to group the total population and characterize random phenomena, and how to stratify population subgroups that affect the predictive control mechanisms of infectious diseases by epidemiological characteristics such as age, behavior, geographic distribution, and mobility. The models established are influenced by many factors, such as differences in patient infectiousness, individual susceptibility, differences in morbidity between local districts, differences in the intensity of prevention and control between regions, and errors in statistical data [22].

We can also see from the resulting figures that as the complexity of the model increases, the fit does not improve correspondingly and can even be worse than that of the simple model. This is not only because of the discrepancy between reality and theory but, more importantly, because the factors taken into account in the differential equations do not necessarily reflect reality effectively [23].

This also tells us that theoretical mathematical models alone are not enough if we want to better reflect reality, because there are many unknown factors in reality that mathematical models cannot accurately represent.


Conflict of interest

We have no conflict of interests to disclose and the manuscript has been read and approved by all named authors.


This work was supported by the Philosophical and Social Sciences Research Project of Hubei Education Department (19Y049), and the Starting Research Foundation for the Ph.D. of Hubei University of Technology (BSQD2019054), Hubei Province, China.


1. Ahmed Syed Faraz, Quadeer Ahmed A, McKay Matthew R. “Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies”. Pubmed.

2. 2019 novel coronavirus, E%8B%E5%86%A0%E7%8A%B6%E7%97%85%E6%AF%92/24267858?fr=aladdi n (2020).

3. Qin Zhiqiang, Ma Gang, Zhong Xiaogang. “Diagnosis and antiviral treatment of novel coronavirus pneumonia”. Chinese Journal of Clinical New Medicine, 2020, (in Chinese).

4. Global outbreak of novel coronary pneumonia (COVID-19) briefing on 26 June, (2020).

5. Outbreak of pneumonia with novel coronavirus infection in Hubei province, June 26, 2020, (2020).

6. National Bureau of Statistics, (2020).

7. Luo Ronggui, Jiang Tao. “A study of technology diffusion models based on the SIR infectious disease model”. Journal of Management Engineering, 2006 (in Chinese).

8. Zhu Gui, Li Weide, Zhu Lingfeng. “Comparison of different control strategies based on the SIR infectious disease model”. Journal of North China University: Nature Science, 2011, (in Chinese).

9. Li Jianquan, Wang Feng, Ma Zhien. “A global analysis of a class of infectious disease models with quarantine”. Journal of Engineering Mathematics, 2005, (in Chinese).

10. Zhang Shuangde, Hao Hai, Zhang Xihong. “A class of models of infectious disease dynamics containing latency”. Journal of Mathematical Medicine, 2002, (in Chinese).

11. Chen Huilin, Dong Huiru, Zheng Yinan, et al. “SLICAR model of transmission considering both latency and onset of infection in cryptogenic populations”. China Health Statistics, 2015, (in Chinese).

12. A. A. King, E. L. Ionides, M. Pascual, et al. “Inapparent infections and cholera dynamics”. Nature, 2008.

13. Kang Qiyuan. “Mathematical models”. Higher Education Press, 2019, (in Chinese).

14. Frank R. Giordano. “Mathematical modeling”. Mechanical Industry Press, 2014.

15. Yao Yong. “Uniqueness of the overall existence of solutions to groups of equations of infectious disease dynamics”. Annals of Mathematics: Chinese Edition, 1991, (in Chinese).

16. “An outbreak development simulation software based on the K-SEIRD mathematical model”, (2020).

17. Pan Wei. “Simulation modeling and MATLAB practical tutorial”. Tsinghua University Press, 2019, (in Chinese).

18. Ye Jianli, Luo Juhua, Jin Shuigao, et al. “Mathematical modeling of infectious diseases and SARS prediction”. Health Research, 2005, (in Chinese).

19. Research on Management Accounting Practice Diffusion Mechanism Based on Infectious Disease Model, u0tx330vx0pt4u0e8043639929&site=xueshu_se (2020).

20. Guozhong Zhao, Yan Zhong. “Management accounting practice”. Tsinghua University Press, 2007, (in Chinese).

21. Wang Z, et al. “Mathematical analysis on the Epidemic of Coronavirus Disease 2019”, J Pharmacol Pharmaceut Pharmacovig 2020.

22. Zhou Yicang, Tang Yun. “Mathematical models for SARS transmission prediction”. Journal of Engineering Mathematics, 2003, (in Chinese).

23. Cruz-Rodriguez L, et al. “How to Evaluate Viral Transmission in Enclosed Areas”, Journal of Bioscience & Biomedical Engineering, 2020.


©All rights reserved by Bin Zhao.