Esercitazione 10

Esercizio 1

*Fill a histogram randomly (n=~10,000) with a Landau distribution with a most probable value at 20 and a “width” of 5 
(use the ROOT website to find out about the Landau function)
*Fill the same histogram randomly (n=~5,000) with a Gaussian distribution centered at 5 with a “width” of 3 
*Write a compiled script with a fit function that describes the total histogram nicely (it might be a good idea to fit both peaks individually first and use the fit parameters for a combined fit)
*Add titles to x- and y-axis
*Include a legend of the histogram with number of entries, mean, and RMS values
*Add text to the canvas with the fitted function parameters
*Draw everything on a square-size canvas (histogram in blue, fit in red)
*Save as png, eps, and root file

Esercizio 2

The exercise is divided into 3 part:
Processing data from a TTree, filling a histogram, and writing the results to an output file
Reading a file that contains multiple histograms and interpreting the results, writing the final plots to a pdf file
Reading a file that contains a histogram and fitting the histogram in different ways, writing the results to a pdf file
*Exercise 1:
Input files: Data.root, MC.root
Thanks to ATLAS and CERN Open Data for the data and simulation and Steven Schramm (Universite’ de Geneve) for filter the data and prepare the exercise
Output ROOT file: selection.root
Solution code (ROOT): selection.cpp
*Exercise 2:
Input file: selection.root
Output plot file: analysis.pdf
Solution code (ROOT): analysis.cpp
*Exercise 3:
Input file: selection.root
Output plot file: fit.pdf
Solution code (ROOT): fit.cpp

Part 1 

Create a macro the read the “.root” files (you case use the MakeClass function of the TTree object)
Inside the macro create the histogram that we want to fill:
We want to look at the di-lepton invariant mass to reconstruct the Z boson invariant mass (M=90 MeV/c2)
Loop over the TTree entries. Calculate the pairs of lepton four-vectors for each event and the di-lepton invariant mass:
Each time GetEntry is called, all of the variables we defined are overwritten with new values for the new event. We can now check that there are two leptons in the event and make their four-vectors, then use that to calculate the di-lepton invariant mass. We have to check that there are actually two leptons, as we can’t calculate a di-lepton mass if there is only one lepton in the event. 
If there are not two leptons, then we should skip this event (continue to the next event).
After we have confirmed that there are two leptons, we can build their four-vectors and store them as TLorentzVector objects, which contain lots of useful functions.

TLorentzVector lepton0 , lepton1 ;
lepton0.SetPtEtaPhiE ( pT [0] , eta [0] , phi [0] , nrg [0]);
lepton1.SetPtEtaPhiE ( pT [1] , eta [1] , phi [1] , nrg [1]);

We now have four-vector representations of the two leptons. However, we want the di-lepton system, which is the four-vector sum of the two leptons. Then, we want to store the mass of that system.

// Previous code in the loop is here
TLorentzVector dilepton = lepton0 + lepton1 ;
double dileptonMass = dilepton . M ();
Fill the histogram
Write the histogram to the output file and then close the output file, which is what triggers it to be written to disk rather than living in memory:
outHistFile -> Close();
Check the output file using the TBrowser
NOTE - this is the end of the main part of this exercise, while the below is adding more complex functionality. If you are short on time, you can download the file named “selection.root” and use that for the next two exercises.
Extend the program to handle both data and MC
Normally in physics, we want to compare our data with our simulation, representing the expectation
For example, is the peak we saw in the last part representative of the Z, or is it too small or too large?
To do this, we need our program to be able to run over both data and MC
When you run over MC, some special scaling factors are needed:
the MC TTree has some more extra information that can be used: 

tree -> SetBranchAddress ( "mcWeight "           ,& mcWeight );
tree -> SetBranchAddress ( "scaleFactor_PILEUP"  ,& sf_PILEUP );
tree -> SetBranchAddress ( "scaleFactor_MUON"    ,& sf_MUON );
tree -> SetBranchAddress ( "scaleFactor_TRIGGER" ,& sf_TRIGGER );

We then have to calculate the total weight to apply, which is just the product of these four values.

weight = mcWeight * sf_PILEUP * sf_MUON * sf_TRIGGER ;

Then, when we are filling the histogram, apply this weight:

mll->Fill( dileptonMass , weight );

Now your MC has been corrected to more correctly model the expected data distribution.


Part 2

Open the input histogram file(s) for reading
Retrieve the two histograms from the file

TH1D * dataHisto = ( TH1D *) histFile -> Get( "data" );
TH1D * mcHisto = ( TH1D *) histFile -> Get( "MC" );

Plot the two histograms in two pads of a canvas
Normalize the MC histogram: If you look at the plots, you will see that the y-axis has a very different range. The y-axis of the MC plot seems to be roughly 7 orders of magnitude larger than the data y-axis!
There are several reasons for this, but for now, the main point is that this is an artificial difference. As such, We want to normalize the MC histogram to data to remove this offset.
We don’t want to normalize it in a way that makes the distributions agree bin-by-bin, as they we could never search for new physics or measure the standard model properties. Instead, we want to just get the overall scale correct. 
One way to do this is to scale by the total number of events in data vs the number of events in MC. This is done using the integral of the histograms (which is the total number of events, as the y-axis is just the number of events in a given bin).

mcHisto -> Scale ( dataHisto -> Integral()/ mcHisto -> Integral());

Alternatively, you can decide to normalize the two histograms is a specific mass region
After the normalization, draw the data and MC on the same plot in a new canvas

Make a split-panel plot:
First, let’s prepare by creating a histogram which is the ratio of the data divided by the MC. Do this by creating a copy of the data histogram (to avoid modifying the original one) and dividing that histogram by the MC.

TH1D * ratio = ( TH1D *) dataHisto -> Clone ();
ratio ->Divide ( mcHisto );
ratio -> SetLineColor ( kRed );

On a canvas create a TPad (a sub-canvas) with a horizontal extent of the full range (0, y, 1, y) and a vertical extend of the upper 70% of the range (x, 0.3, x, 1). That is, this sub-canvas should be the upper 70% of the full canvas.
Set the sub-canvas to have a logarithmic y-axis, draw the sub-canvas on the canvas, set the sub-canvas to be the owner of whatever is drawn next, and then draw the MC and data histograms (not their ratios) on this upper sub-canvas.

TPad pad1 ( " pad1 " ," pad1 " ,0 ,0.3 ,1 ,1);
pad1.SetLogy( true );
pad1.Draw();
pad1.cd();
mcHisto -> Draw( " h " );
dataHisto -> Draw( " pe , same " );

Now, go back to the full canvas and make a second sub-canvas covering the bottom 30% of the plot, and then draw the ratio on that sub-canvas. Note that this sub-canvas is a linear scale, not logarithmic, as this is a ratio.

canvas.cd ();
TPad pad2( "pad2" ,"pad2" ,0 ,0.05 ,1 ,0.3);
pad2.Draw();
pad2.cd();
ratio->Draw( "pe" );
canvas.SaveAs( "plot.pdf");

Change the pad spacing to better use canvas space
First, remove the border spacing on the bottom of the top pad 

pad1.SetBottomMargin (0);

Now, remove the border spacing on the top of the bottom pad.
Also add a bit of extra space on the bottom of the top pad so that we have some space for later.

pad2.SetTopMargin(0);
pad2.SetBottomMargin(0.25);

Once this is done, the upper and lower plots touch, which makes sense as you only need to have one x-axis label for the two plots (they are the same axis).
Adjust the size and contents of axis labels and markers: First, get rid of the x-axis labels on the top plot, this removing the random ×10 3 on the far right of the plot

mcHisto -> SetTitle( " " );
mcHisto -> GetXaxis() -> SetLabelSize(0);
mcHisto -> GetXaxis() -> SetTitleSize(0);

Then increase the size of the y-axis label, which is necessary because the pad is only 70% of the full vertical range, so we need to increase the text size to get back to a reasonable value. 

mcHisto -> GetYaxis () -> SetTitleSize (0.05);

Now do the similar things to the bottom panel, also increasing the text sizes everywhere as this is now only 30% of the full vertical range. Note that the title offset is changed at one point to bring the label closer to the axis (so it doesn’t go off the canvas).

ratio -> SetTitle( "" );
ratio -> GetXaxis() -> SetLabelSize(0.12);
ratio -> GetXaxis() -> SetTitleSize(0.12);
ratio -> GetYaxis() -> SetLabelSize(0.1);
ratio -> GetYaxis() -> SetTitleSize(0.15);
ratio -> GetYaxis() -> SetTitle( " Data / MC " );
ratio -> GetYaxis() -> SetTitleOffset(0.3);

To guide the eye, it is useful to add a flat line at 1 on ratio plots. This is done using the TLine object, where you specify the (x start , y start , x end , and y end ) coordinates of the line.
Note that this should be created after you draw the ratio histogram. That is, we are adding a line on top of the ratio plot.	

TLine line (50. e3 ,1 ,200. e3 ,1);
line.SetLineColor ( kBlack );
line.SetLineWidth (2);
line.Draw ( " same " );

Now when you look at the data, you can see that there is generally agreement for di-lepton masses above roughly 70 GeV (70 × 103 MeV) within the statistical precision. However, there is a significant difference at lower mass values. This is because the MC sample we have used is actually only for one process, Z → ll, and is thus ignoring the Standard Model γ → ll process. That is why data is above MC, namely we are neglecting a real process present in data when calculating our MC expectation.
Add a legend to the plot:
Right now, we know that the data is black and the MC is red. However, if we show this plot to someone else, they won’t know this. It is thus very important to add legends to all of your plots.
Legends should be added after drawing all of the relevant histograms, adding a legend on top of the plot.
We also set the width of the line around the box to 0 to remove it.
Note that the coordinates that you use to specify the legend location are pad-global coordinates. That means that 0 is the bottom/left of the pad, 1 is the top/right of the pad, and any point in between is a fractional value. These numbers are relevant to the pad it is being drawn on, not the full canvas.
Add other labels to the plot:
It is often useful to add additional labels to plots. This can be done using TLatex, which allows you to write text at arbitrary locations on the plot.
Note that we use the SetNDC() function to specify that we want to use global coordinates, not specific points of a given plot. 
In this case, you only need to specify the starting x and y coordinates, not also the ending coordinates.
Labels should be added after everything else, before writing the final plot to the output pdf file.


Part 3

Create the basic structure of a ROOT macro, open the file, read the data histogram, and close the file
Create the TCanvas and prepare the output file
Plot the basic histogram as our starting point
Fit a Gaussian to the peak: fitting Gaussians in ROOT is remarkably easy, and this is a very common task.
The Gaussian function is pre-defined in ROOT. Create a TF1 (T for ROOT, F for function, 1 for 1-dimensional) as below. 
TF1 gaussFit ( " gaussfit " ," gaus " ,81. e3 ,101. e3 );
Fit a Breit-Wigner to the peak
If you have managed to complete everything so far, use what you learned from exercise 2 to make a split-panel plot of the data and its fit in the top panel, and the ratio of the data to its fit in the bottom panel.

//Breit-Wigner function
Double_t mybw(Double_t* x, Double_t* par)
{
  Double_t arg1 = 14.0/22.0; // 2 over pi
  Double_t arg2 = par[1]*par[1]*par[2]*par[2]; //Gamma=par[1]  M=par[2]
  Double_t arg3 = ((x[0]*x[0]) - (par[2]*par[2]))*((x[0]*x[0]) - (par[2]*par[2]));
  Double_t arg4 = x[0]*x[0]*x[0]*x[0]*((par[1]*par[1])/(par[2]*par[2]));
  return par[0]*arg1*arg2/(arg3 + arg4);
}


Esercizio 3

*Fill a histogram randomly (n=~10,000) with a Landau distribution with a most probable value at 20 and a “width” of 5 
(use the ROOT website to find out about the Landau function)
*Fill the same histogram randomly (n=~5,000) with a Gaussian distribution centered at 5 with a “width” of 3 
*Write a compiled script with a fit function that describes the total histogram nicely (it might be a good idea to fit both peaks individually first and use the fit parameters for a combined fit)
*Add titles to x- and y-axis
*Include a legend of the histogram with number of entries, mean, and RMS values
*Add text to the canvas with the fitted function parameters
*Draw everything on a square-size canvas (histogram in blue, fit in red)
*Save as png, eps, and root file