You are free to choose the group’s composition;
groups must have a minimum of three members and a maximum of four members;
the group’s composition needs to be sent to the professors via mail;
each group member must present and has 10 minutes within the group’s overall presentation;
although the list of points/questions to be answered is the same (see below), each group will work with different data and variables. Group-specific data and variables will be communicated via mail (check your mail!) once you have communicated your group composition to us;
the groups are not asked to produce a thorough report for the final project; however, they are asked to prepare a PDF/HTML presentation (using PowerPoint, Beamer, or any other software) containing any useful analysis/plot/formula and, if possible, to send the PDF presentation to the professors via email (sending it the day before the exam is sufficient);
remember to enroll in the Appello you intend to sit (for the winter session, 27 January or 23 February).
Download the data on the Covid-19 outbreak from the official website of Protezione Civile, using the following command:
read.csv("https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv")
The dataset contains the following variables:
data: Date of notification
stato: Country of reference
codice_regione: Code of the Region (ISTAT 2019)
denominazione_regione: Name of the Region
lat: Latitude
long: Longitude
ricoverati_con_sintomi: Hospitalised patients with symptoms
terapia_intensiva: Intensive care
ingressi_terapia_intensiva: Daily admissions to intensive care
totale_ospedalizzati: Total hospitalised patients
isolamento_domiciliare: Home confinement
totale_positivi: Total amount of current positive cases (Hospitalised patients + Home confinement)
variazione_totale_positivi: New amount of current positive cases (totale_positivi current day - totale_positivi previous day)
nuovi_positivi: New amount of current positive cases (totale_casi current day - totale_casi previous day)
dimessi_guariti: Recovered
deceduti: Deaths (cumulated values)
totale_casi: Total amount of positive cases
tamponi: Tests performed
casi_testati: Total number of people tested
Each group will be assigned a specific region or pair of regions to deal with and a response variable \(Y\) among the following: terapia_intensiva, totale_ospedalizzati, nuovi_positivi, the positivity ratio.
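Unlike the other candidate responses, the positivity ratio is not a column of the dataset, so it has to be derived. The following is a minimal sketch in R, assuming the ratio is defined as daily new positives divided by daily swabs, where daily swabs are the first differences of the cumulative `tamponi`. This definition is an assumption and is not stated in the assignment; the helper `prepare_region` and the toy numbers are hypothetical and serve only as illustration.

```r
# Hypothetical helper: keep one region and a date window, then derive the
# positivity ratio. ASSUMPTION: ratio = nuovi_positivi / daily swabs, with
# daily swabs taken as first differences of the cumulative `tamponi`.
prepare_region <- function(df, region, from, to) {
  # `data` is an ISO timestamp such as "2020-10-01T17:00:00"; keep the date part
  df$date <- as.Date(substr(as.character(df$data), 1, 10))
  df <- df[df$denominazione_regione == region, ]
  df <- df[order(df$date), ]
  df$daily_swabs <- c(NA, diff(df$tamponi))  # first day has no previous value
  # Non-positive differences (e.g. data corrections) give NA rather than a ratio
  ok <- !is.na(df$daily_swabs) & df$daily_swabs > 0
  df$positivity <- ifelse(ok, df$nuovi_positivi / df$daily_swabs, NA)
  df[df$date >= as.Date(from) & df$date <= as.Date(to), ]
}

# Toy data with made-up numbers (NOT the real Protezione Civile figures)
toy <- data.frame(
  data = c("2020-09-30T17:00:00", "2020-10-01T17:00:00",
           "2020-10-02T17:00:00", "2020-10-01T17:00:00"),
  denominazione_regione = c("Veneto", "Veneto", "Veneto", "Lazio"),
  nuovi_positivi = c(100, 150, 120, 90),
  tamponi = c(10000, 11500, 13500, 8000)
)

veneto <- prepare_region(toy, "Veneto", "2020-10-01", "2020-10-02")
print(veneto[, c("date", "nuovi_positivi", "daily_swabs", "positivity")])
```

With the real data, `df` would be the output of the `read.csv` call above, and `from` and `to` would be the limits of the study period given in the assignment. The difference is computed before the date filter so that the first day of the window still has a valid ratio.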
Consider your dataset from 1 October 2020 to 1 February 2021 (basically, the second wave) and prepare your presentation by considering the following points:
Perform some exploratory analysis of your data, especially using graphical tools.
Describe the quality of the data and discuss whether and how a plausible statistical model could be posed.
Build a model for your response variable \(Y\). To this end, you may adopt any of the regression techniques covered during the course. Comment on the estimates from the best model obtained.
While building your model, evaluate the inclusion of some covariates and their effect on the response variable. Possible covariates include: the regional colors (yellow, orange, red), the partial lockdown regime, some region-specific laws and rules, etc.
Check the model fit using the proper tools, such as residual plots.
Provide predictions 10 to 15 days ahead and check their accuracy (say, for the period 2 February to 15 February 2021).
Compare alternative models in terms of predictive information criteria and comment.
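The modelling points above (fit, residual check, out-of-sample prediction, model comparison) can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the counts are made up, Poisson regression with a time trend is just one possible choice among the regression techniques a course might cover, AIC stands in for whichever predictive information criterion is intended, and the helper `rmse` is hypothetical.

```r
# Simulated daily counts (made up; NOT the Protezione Civile data)
set.seed(1)
n <- 60
toy <- data.frame(t = 1:n)
toy$y <- rpois(n, lambda = exp(3 + 0.08 * toy$t - 0.001 * toy$t^2))

train <- toy[toy$t <= 45, ]  # fitting window
test  <- toy[toy$t >  45, ]  # 15 held-out days used to check the forecasts

# Two candidate Poisson regressions (log link): linear vs quadratic time trend
m1 <- glm(y ~ t,          family = poisson, data = train)
m2 <- glm(y ~ t + I(t^2), family = poisson, data = train)

# Fit check: Pearson residuals and a rough dispersion statistic;
# values well above 1 would suggest overdispersion
res <- residuals(m2, type = "pearson")
dispersion <- sum(res^2) / df.residual(m2)
# plot(fitted(m2), res) would draw the usual residuals-vs-fitted plot

# Hypothetical helper: root mean squared error of forecasts on new data
rmse <- function(model, newdata) {
  pred <- predict(model, newdata = newdata, type = "response")
  sqrt(mean((newdata$y - pred)^2))
}

# In-sample criterion (AIC) next to out-of-sample accuracy (RMSE)
comparison <- data.frame(
  model = c("linear", "quadratic"),
  AIC   = c(AIC(m1), AIC(m2)),
  RMSE  = c(rmse(m1, test), rmse(m2, test))
)
print(comparison)
print(dispersion)
```

Holding out the last days mirrors the request to check predictions on a period that follows the fitting window. A covariate such as the regional color could enter the formula as an additional term, provided it is coded as a factor in the data frame.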
And remember: all models are wrong, but some are useful. Feel free to use any model (preferably one of those covered in the course).