The Polio Risk Map Project lets you upload case data and run it through our workflow to generate a risk map.
The workflow will go through the following steps:
Below you will find example files for each of the supported countries. A hierarchy table is also provided for your convenience; it lists all the location names expected for each country.
The system expects a specific format for the file you upload. The requirements are:
admin0, admin1, admin2
Case_Date
For example, the following is a correctly formatted file for Nigeria:
PolIS Case ID, Case_Date, admin0, admin1, admin2
NGA10-353, 9/10/2010, NIGERIA, BORNO, MAIDUGURI
NGA10-4312, 27/09/2010, NIGERIA, KANO, DAMBATTA
NGA11-1372, 29/11/2011, NIGERIA, JIGAWA, BABURA
NGA11-1387, 29/10/2011, NIGERIA, JIGAWA, BIRNIN KUDU
NGA11-1564, 28/07/2011, NIGERIA, KANO, DAWAKIN KUDU
NGA11-1641, 8/6/2011, NIGERIA, KEBBI, BIRNIN KEBBI
NGA11-1787, 29/11/2011, NIGERIA, KATSINA, MANI
NGA11-1796, 2/10/2011, NIGERIA, KATSINA, MASHI
NGA11-1733, 27/08/2011, NIGERIA, KANO, NASSARAWA
NGA12-6291, 27/03/2012, NIGERIA, KATSINA, BATSARI
NGA11-3897, 25/08/2011, NIGERIA, JIGAWA, RINGIM
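Before uploading, it can help to check a file against the requirements above. The following is a minimal Python sketch, not part of the project itself: the function name validate_case_file is hypothetical, and the day/month/year date format is an assumption inferred from the Nigeria example rather than a documented rule.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("Case_Date", "admin0", "admin1", "admin2")
DATE_FORMAT = "%d/%m/%Y"  # assumption: day/month/year, as in the Nigeria example


def validate_case_file(text):
    """Return a list of problems found in the file text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in (reader.fieldnames or [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for row in reader:
        # Normalize keys and values; rows with too few fields yield None values.
        clean = {k.strip(): (v or "").strip()
                 for k, v in row.items() if k is not None}
        for col in REQUIRED_COLUMNS:
            if not clean.get(col):
                problems.append(f"line {reader.line_num}: empty {col}")
        date_text = clean.get("Case_Date", "")
        if date_text:
            try:
                datetime.strptime(date_text, DATE_FORMAT)
            except ValueError:
                problems.append(f"line {reader.line_num}: bad Case_Date")
    return problems
```

An empty result means the required columns are present and every date parses. The sketch does not check location names against the hierarchy table.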
The AUC (area under the curve) is a common evaluation metric for binary classification problems.
Consider a plot of the true positive rate against the false positive rate as the threshold value for classifying an item as “True” or “False” is varied from 0 to 1. The AUC is the area under this curve.
If the classifier is very good, the true positive rate will rise quickly while the false positive rate is still low, and the area under this curve will be close to 1.
If the classifier is no better than random guessing, the true positive rate will increase linearly with the false positive rate and the area under this curve will be around 0.5.
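The calculation described above can be sketched in a few lines of Python. This is an illustrative implementation using the trapezoidal rule, not the code used by the project; the function name roc_auc is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule.

    labels: 1 for a true positive case, 0 otherwise.
    scores: predicted probabilities, one per label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Lowering the threshold admits items in order of descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(ranked):
        # Items with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / positives, fp / negatives
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc
```

A classifier that ranks every positive above every negative scores 1.0, and one that assigns the same score to every item scores 0.5.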
The probability of at least one case in a district during a 6-month period is modeled as a function of an overall level of risk as well as a set of independent and spatially structured random effects, also known as the convolution model [1]. In the first stage of this hierarchical model, we assume the presence or absence of cases in district $i$ and period $t$, $X_{it}$, is distributed $X_{it} \sim \mathrm{Bern}(q_{it})$, where $q_{it}$ is the underlying rate of interest. We consider the logit-linear model:
$$\mathrm{logit}(q_{it}) = \mu + \beta_1 X_{i,t-1} + \beta_2 Z_{i,t-1} + \theta_i + \phi_i$$
where $\mu$ is the overall risk level; $\beta_1$ is the coefficient for at least one case in district $i$ in the previous period $t-1$; $\beta_2$ is the coefficient for the indicator $Z_{i,t-1} = I\left[0 < \sum_{j \sim i} X_{j,t-1}\right]$, a binary variable describing whether any district neighboring district $i$ had at least one case in $t-1$, where $j \sim i$ denotes the districts that share a boundary with district $i$; $\theta_i$ is the spatially structured effect of district $i$; and $\phi_i$ is the independent (spatially unstructured) effect of district $i$.
At the second stage of the hierarchical model, we assign priors to the random effects. The independent effects are assigned the prior $\phi_i \mid \sigma_\phi^2 \sim N(0, \sigma_\phi^2)$ for $i = 1, \ldots, I$. The spatially structured effect is assigned the intrinsic conditional autoregressive (ICAR) prior [2], where $\theta_i \mid \theta_{-i}, \sigma_\theta^2 \sim N\left(\sum_{j \sim i} \theta_j / m_i,\ \sigma_\theta^2 / m_i\right)$, $\theta_{-i}$ is the vector of $\theta$s excluding $\theta_i$, and $m_i$ is the number of districts that share a boundary with district $i$. Additional details about the spatially structured priors can be found in Gaussian Markov Random Fields [3].
This model was fit in R [4] using the Integrated Nested Laplace Approximation (INLA) [5,6] as implemented in the INLA package [7].
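To make the pieces of the convolution model concrete, here is a small Python sketch of the neighbor indicator, the ICAR conditional prior, and the resulting probability for one district. It is illustrative only: the actual model was fit in R with INLA, and the function names, the adjacency dictionary, and any parameter values passed in are hypothetical.

```python
import math


def neighbor_indicator(i, neighbors, x_prev):
    """Z_{i,t-1}: 1 if any district bordering i had a case last period."""
    return 1 if any(x_prev[j] for j in neighbors[i]) else 0


def icar_conditional(i, neighbors, theta, sigma_theta2):
    """Mean and variance of theta_i given the other thetas (ICAR prior)."""
    m_i = len(neighbors[i])  # number of districts bordering district i
    mean = sum(theta[j] for j in neighbors[i]) / m_i
    return mean, sigma_theta2 / m_i


def convolution_probability(i, neighbors, x_prev, mu, beta1, beta2, theta, phi):
    """q_it from the logit-linear predictor with both random effects."""
    eta = (mu
           + beta1 * x_prev[i]
           + beta2 * neighbor_indicator(i, neighbors, x_prev)
           + theta[i]
           + phi[i])
    return 1.0 / (1.0 + math.exp(-eta))
```

Here neighbors maps each district index to the list of districts it shares a boundary with, and x_prev holds the 0/1 presence of cases in the previous period. The sketch evaluates the model for given parameter values; it does not estimate them.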
The probability of at least one case in a district during a 6-month period is modeled as a function of an overall level of risk as well as the presence of cases in the previous period. The presence of a case in district $i$ and period $t$, $X_{it}$, is distributed $X_{it} \sim \mathrm{Bern}(q_{it})$, where $q_{it}$ is the underlying rate of interest. We consider the logit-linear model:
$$\mathrm{logit}(q_{it}) = \mu + \beta_1 X_{i,t-1} + \beta_2 Z_{i,t-1}$$
where $\mu$ is the overall risk level, $\beta_1$ is the coefficient for at least one case in district $i$ in the previous period $t-1$, and $\beta_2$ is the coefficient for $Z_{i,t-1} = \sum_{j \sim i} X_{j,t-1}$, the total number of districts neighboring district $i$ with at least one case in $t-1$. Here $j \sim i$ denotes the districts that share a boundary with district $i$.
This model was fit in R [1] using the Integrated Nested Laplace Approximation (INLA) [2,3] as implemented in the INLA package [4].
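Once the coefficients are known, the model above reduces to a short calculation per district. The Python sketch below shows that calculation for every district at once. It is illustrative only: the function name predict_risk and the coefficient values in the usage note are hypothetical, not fitted estimates.

```python
import math


def predict_risk(x_prev, neighbors, mu, beta1, beta2):
    """Probability of at least one case per district in the next period.

    x_prev[i] is 1 if district i had a case in the previous period.
    neighbors[i] lists the districts sharing a boundary with district i.
    """
    risks = []
    for i, x in enumerate(x_prev):
        # Z_{i,t-1}: number of neighboring districts with a case last period.
        z = sum(x_prev[j] for j in neighbors[i])
        eta = mu + beta1 * x + beta2 * z
        risks.append(1.0 / (1.0 + math.exp(-eta)))
    return risks
```

For example, with three districts in a row, predict_risk([1, 0, 1], {0: [1], 1: [0, 2], 2: [1]}, -2.0, 1.0, 0.5) gives the first two districts the same risk: one had a case itself, the other had two affected neighbors, and both linear predictors equal -1.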
The probability of at least one case in a district during the upcoming 6-month period is modeled using a random forest classifier [1]. Seven covariates are available to the ensemble: the total case count in the previous time period in the district and in its neighbors, the total and average historical case counts in the district and in its neighbors, and a dummy variable for whether the time period is the first or second half of the year as a proxy for seasonality. The model was fit in R [2], using the randomForest package [3].
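The seven covariates can be assembled from per-district case histories as in the Python sketch below. Several details are assumptions, since the text does not specify them: the historical totals here include the previous period, the neighbor average is the per-period mean of the summed neighbor counts, and the function and key names are hypothetical. The sketch only builds the covariates; it does not fit the random forest.

```python
def build_covariates(history, neighbors, district, second_half):
    """Seven covariates for one district and one upcoming period.

    history[d] lists case counts per past 6-month period, oldest first.
    neighbors[d] lists the districts sharing a boundary with district d.
    second_half is True if the upcoming period is July to December.
    """
    own = history[district]
    n_periods = len(own)
    # Case counts summed over all neighbors, one value per past period.
    neighbor_series = [
        sum(history[j][t] for j in neighbors[district])
        for t in range(n_periods)
    ]
    return {
        "prev_cases": own[-1],
        "prev_neighbor_cases": neighbor_series[-1],
        "total_cases": sum(own),
        "mean_cases": sum(own) / n_periods,
        "total_neighbor_cases": sum(neighbor_series),
        "mean_neighbor_cases": sum(neighbor_series) / n_periods,
        "second_half": 1 if second_half else 0,
    }
```

Each call returns one row of the design matrix; stacking the rows for all districts and periods gives the input a classifier would be trained on.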
This software is distributed as is, completely without warranty or service support. Institute for Disease Modeling and its employees are not liable for the condition or performance of the software.
This software uses the following technologies and libraries: