- Count data can be modeled as realizations of a random variable that follows a Poisson distribution, which has a single parameter \(\lambda\) that is both the mean and the variance of the series
\[Y_1, \ldots, Y_n \stackrel{\text{i.i.d.}}{\sim} \text{Poisson}(\lambda), \quad \lambda > 0\]
\[P(Y_i=y) = \frac{e^{-\lambda}\lambda^y}{y!}, \quad y\in \{0,1,2, \ldots\}\]
\[\mathbf{E}(Y_i)=\lambda, \quad \mathbf{V}(Y_i)=\lambda\]
The ML estimator of this parameter is the sample mean: \(\hat \lambda_{ML} = \bar Y\)
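As a short sketch of why the sample mean maximizes the likelihood, using only the probability function given above (with \(\ell\) denoting the log-likelihood, a symbol introduced here for illustration):
\[\ell(\lambda) = \sum_{i=1}^n \left(-\lambda + y_i \ln \lambda - \ln y_i!\right) = -n\lambda + \ln \lambda \sum_{i=1}^n y_i - \sum_{i=1}^n \ln y_i!\]
\[\frac{\partial \ell}{\partial \lambda} = -n + \frac{1}{\lambda}\sum_{i=1}^n y_i = 0 \quad \Longrightarrow \quad \hat\lambda = \frac{1}{n}\sum_{i=1}^n y_i = \bar y\]
\[\frac{\partial^2 \ell}{\partial \lambda^2} = -\frac{1}{\lambda^2}\sum_{i=1}^n y_i < 0 \quad \text{whenever } \sum_{i=1}^n y_i > 0\]
The negative second derivative indicates that the stationary point is a maximum, provided at least one observation is positive.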
Count data often take small values
When comparing real count data to the Poisson distribution, one often faces two problems:
- overdispersion, which means that the sample variance is much greater than the sample mean
- excess of zeros, which means that the probability of a zero value implied by the Poisson distribution, \(e^{-\lambda}\), is often much smaller than the observed share of zeros in the sample
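The two checks above can be sketched in a few lines of Python. This is a minimal illustration, not a formal test: the function name `poisson_diagnostics` and the sample are made up for the example, and the Poisson parameter is replaced by its ML estimate, the sample mean.

```python
import math
from statistics import mean, variance


def poisson_diagnostics(counts):
    """Compare a sample of counts with what a Poisson distribution implies.

    Hypothetical helper for illustration: lambda is estimated by the
    sample mean, and the variance is the usual (n - 1) sample variance.
    """
    lam_hat = mean(counts)
    var_hat = variance(counts)
    return {
        "mean": lam_hat,
        "variance": var_hat,
        # Close to 1 under a Poisson distribution, well above 1 if overdispersed
        "dispersion_ratio": var_hat / lam_hat,
        # Share of zeros actually observed in the sample
        "observed_zero_share": sum(1 for y in counts if y == 0) / len(counts),
        # P(Y = 0) = exp(-lambda) under a Poisson distribution
        "poisson_zero_prob": math.exp(-lam_hat),
    }


# Made-up sample, chosen only to show both problems at once
sample = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 12]
diag = poisson_diagnostics(sample)
for name, value in diag.items():
    print(f"{name}: {value:.3f}")
```

For this made-up sample the mean is 2 while the sample variance is about 12.4, and half of the observations are zeros, whereas a Poisson distribution with mean 2 implies a zero probability of about 0.135.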