5th/7th November 2025 - Vincenzo Gioia
Advertising Data Set
Goal
Reading the Data
The working directory
Explore the Structure of the data set
[1] 200 5
[1] "X" "TV" "Radio" "Newspaper" "Sales"
'data.frame': 200 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ TV : num 230.1 44.5 17.2 151.5 180.8 ...
$ Radio : num 37.8 39.3 45.9 41.3 10.8 48.9 32.8 19.6 2.1 2.6 ...
$ Newspaper: num 69.2 45.1 69.3 58.5 58.4 75 23.5 11.6 1 21.2 ...
$ Sales : num 22.1 10.4 9.3 18.5 12.9 7.2 11.8 13.2 4.8 10.6 ...
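The output above is what `dim()`, `names()` and `str()` print for the data set. A minimal sketch of the reading and inspection step follows, assuming the data are stored as a CSV file. The temporary file written here is only a stand-in that holds the first five rows printed above, and the object name `Advertising` is borrowed from the `lm()` calls later in the slides.

```r
# Stand-in for the real file: the first five rows as printed by str() above.
first_rows <- data.frame(
  X         = 1:5,
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8),
  Radio     = c(37.8, 39.3, 45.9, 41.3, 10.8),
  Newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4),
  Sales     = c(22.1, 10.4, 9.3, 18.5, 12.9)
)
csv_path <- tempfile(fileext = ".csv")  # hypothetical location of the file
write.csv(first_rows, csv_path, row.names = FALSE)

# Read the data and explore its structure
Advertising <- read.csv(csv_path)
dim(Advertising)     # number of rows and columns
names(Advertising)   # variable names
str(Advertising)     # type and first values of each variable

# X is only a row counter, so it is dropped before summarising
summary(Advertising[, -1])
```

With the full file in the working directory, only the `read.csv()` path would change.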
Univariate Exploratory Analysis
       TV             Radio          Newspaper          Sales
 Min.   :  0.70   Min.   : 0.000   Min.   :  0.30   Min.   : 1.60
 1st Qu.: 74.38   1st Qu.: 9.975   1st Qu.: 12.75   1st Qu.:10.38
 Median :149.75   Median :22.900   Median : 25.75   Median :12.90
 Mean   :147.04   Mean   :23.264   Mean   : 30.55   Mean   :14.02
 3rd Qu.:218.82   3rd Qu.:36.525   3rd Qu.: 45.10   3rd Qu.:17.40
 Max.   :296.40   Max.   :49.600   Max.   :114.00   Max.   :27.00
Problems and questions
1. Is there a relationship between advertising budget and sales?
\[\hat y = 7.033 + 0.0475 x_2\]
Confidence interval: \[IC^{1-\alpha}_{\beta_r}=(\hat \beta_r - t_{n-p; 1- \alpha/2} \sqrt{S^2((X^\top X)^{-1}_{rr})}, \hat \beta_r + t_{n-p; 1- \alpha/2} \sqrt{S^2((X^\top X)^{-1}_{rr})})\]
For instance, for \(\beta_1\) (\(1-\alpha=0.95\)) a realization is \[IC^{0.95}_{\beta_1}=(7.033 - t_{198;0.975} \times 0.458, 7.033 + t_{198;0.975} \times 0.458) \approx (6.13, 7.94)\]
In the absence of any TV advertising, sales will, on average, fall somewhere between 6.13 and 7.94 on the scale of Sales (thousands of units), that is, between 6,130 and 7,940 units
The corresponding interval for the slope \(\beta_2\) is approximately \((0.042, 0.053)\): for each \(1000\$\) increase in television advertising, there will be an average increase in sales of between 42 and 53 units
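The two intervals can be reproduced from the estimates and standard errors reported for the regression of Sales on TV. The sketch below hard-codes those printed values (with \(n = 200\) and \(p = 2\)) rather than refitting the model, so it is an illustration of the formula, not a replacement for the fitted object.

```r
# Estimates and standard errors as printed for the Sales ~ TV regression
n <- 200   # observations
p <- 2     # coefficients (intercept and slope)
est <- c(intercept = 7.03259355, TV = 0.04753664)
se  <- c(intercept = 0.457842940, TV = 0.002690607)

# Quantile t_{n-p; 1-alpha/2} with alpha = 0.05
t_quant <- qt(0.975, df = n - p)

# Lower and upper bounds: estimate -/+ quantile * standard error
ci <- cbind(lower = est - t_quant * se,
            upper = est + t_quant * se)
round(ci, 3)
```

When the fitted `lm` object is available, `confint()` applied to it returns the same kind of interval directly.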
Assessing the accuracy of the model via \(R^2\) coefficient (proportion of variability explained by the model) \[R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2}\]
It does not depend on the scale of \(y\): a value close to 1 means that a large proportion of the variability is explained by the regression, while a value close to 0 means that the regression line does not explain much of the variability of \(y\) (this might occur because the linear model is wrong and/or the error variance is high)
Here, \(R^2=0.61\) means that less than 2/3 of the variability in sales is explained by regressing sales on TV
However, also in this case it is still challenging to determine what is a good \(R^2\) value and it depends on the application (in typical applications in biology, psychology, marketing … the linear model is at best an extremely rough approximation to the data, and residual errors due to other unmeasured factors are often very large)
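To make the definition concrete, the sketch below computes \(R^2\) from the residual and total sums of squares and compares it with the value reported by `summary()`. The data are simulated (they are not the Advertising data), and all object names are illustrative.

```r
set.seed(1)

# Simulated data, used only to illustrate the formula
n <- 200
x <- runif(n, 0, 300)
y <- 7 + 0.05 * x + rnorm(n, sd = 3)

fit <- lm(y ~ x)

# R^2 = 1 - RSS / TSS
rss <- sum((y - fitted(fit))^2)   # residual sum of squares
tss <- sum((y - mean(y))^2)       # total sum of squares
r2_manual <- 1 - rss / tss
r2_lm <- summary(fit)$r.squared

# Rescaling the response leaves R^2 unchanged
fit_scaled <- lm(I(1000 * y) ~ x)
r2_scaled <- summary(fit_scaled)$r.squared

c(manual = r2_manual, lm = r2_lm, rescaled = r2_scaled)
```

The three values coincide, which is the scale invariance mentioned above.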
Simple linear regression of Sales on Radio:
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 9.3116381 0.56290050 16.542245 3.561071e-39
Radio       0.2024958 0.02041131  9.920765 4.354966e-19
Simple linear regression of Sales on Newspaper:
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 12.3514071 0.62142019 19.876096 4.713507e-49
Newspaper    0.0546931 0.01657572  3.299591 1.148196e-03
The Newspaper slope differs between the simple and the multiple regression because the two slopes measure different things: in the simple linear regression model, the slope represents the average increase in sales associated with an additional \(1000\$\) in Newspaper, ignoring TV and Radio, while in the multiple linear regression it represents the average increase in sales associated with an additional \(1000\$\) in Newspaper, holding Radio and TV fixed
This is due to the correlation between Radio and Newspaper (0.35): markets with high Newspaper advertising tend to have high Radio advertising
Indeed, in markets where we spend more on radio our sales will tend to be higher, and as the correlation shows, we also tend to spend more on newspaper advertising in those same markets
In other words, Newspaper is a surrogate for Radio Advertising: newspaper gets credit for the association between radio and sales
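The surrogate effect can be reproduced on artificial data. In the sketch below, which is a simulation and not the Advertising data, sales depend on radio only, while newspaper is merely correlated with radio. All names and numeric settings are assumptions chosen for illustration.

```r
set.seed(42)
n <- 2000

# Newspaper spending is correlated with radio spending ...
radio     <- runif(n, 0, 50)
newspaper <- 0.5 * radio + rnorm(n, mean = 20, sd = 15)

# ... but sales are generated from radio only (no direct newspaper effect)
sales <- 9 + 0.2 * radio + rnorm(n, sd = 3)

# Simple regression: newspaper picks up the effect of the omitted radio
slope_simple <- unname(coef(lm(sales ~ newspaper))["newspaper"])

# Multiple regression: holding radio fixed, the newspaper slope is near zero
slope_multiple <- unname(coef(lm(sales ~ radio + newspaper))["newspaper"])

cor(radio, newspaper)
c(simple = slope_simple, multiple = slope_multiple)
```

The simple-regression slope is clearly positive even though newspaper has no direct effect in the simulation, while the slope in the multiple regression is close to zero.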
2. How strong is the relationship between advertising budget and sales?
3. Which media are associated with sales?
4. How large is the association between each medium and sales?
Simple linear regression of Sales on TV:
              Estimate  Std. Error  t value    Pr(>|t|)
(Intercept) 7.03259355 0.457842940 15.36028 1.40630e-35
TV          0.04753664 0.002690607 17.66763 1.46739e-42
Simple linear regression of Sales on Radio:
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 9.3116381 0.56290050 16.542245 3.561071e-39
Radio       0.2024958 0.02041131  9.920765 4.354966e-19
Simple linear regression of Sales on Newspaper:
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 12.3514071 0.62142019 19.876096 4.713507e-49
Newspaper    0.0546931 0.01657572  3.299591 1.148196e-03
5. How well can we predict future sales?
6. Is the relationship linear?
Call:
lm(formula = Sales ~ TV + Radio + I(TV^2), data = Advertising)
Residuals:
Min 1Q Median 3Q Max
-7.3860 -0.8822 -0.0498 0.9613 3.5725
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.288e+00 3.588e-01 3.588 0.000421 ***
TV 7.844e-02 4.985e-03 15.736 < 2e-16 ***
Radio 1.930e-01 7.293e-03 26.465 < 2e-16 ***
I(TV^2) -1.136e-04 1.677e-05 -6.775 1.42e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.517 on 196 degrees of freedom
Multiple R-squared: 0.9167, Adjusted R-squared: 0.9154
F-statistic: 719 on 3 and 196 DF, p-value: < 2.2e-16
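One way to check whether a quadratic term is needed is to compare the nested models with and without it. The sketch below does this on simulated data whose true coefficients are loosely inspired by the output above; it is not a refit of the Advertising data, and the object names are illustrative.

```r
set.seed(7)
n <- 200

# Simulated data with a genuinely curved effect of tv
tv    <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
sales <- 1.3 + 0.08 * tv - 1e-4 * tv^2 + 0.19 * radio + rnorm(n, sd = 1.5)
sim   <- data.frame(sales, tv, radio)

# Nested models: linear in tv versus linear plus quadratic in tv
fit_lin  <- lm(sales ~ tv + radio, data = sim)
fit_quad <- lm(sales ~ tv + radio + I(tv^2), data = sim)

# F test for the added term
cmp <- anova(fit_lin, fit_quad)
p_value <- cmp[["Pr(>F)"]][2]

# With a single added term, F equals the squared t statistic of that term
t_stat <- summary(fit_quad)$coefficients["I(tv^2)", "t value"]
c(F = cmp[["F"]][2], t_squared = t_stat^2, p_value = p_value)
```

A small p-value indicates that the quadratic term improves the fit, which is the same conclusion one reads from the t test on `I(TV^2)` in the summary.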
7. Is there any synergy among the advertising media?
fitLMint <- lm(Sales ~ TV*Radio, data = Advertising)
# Equivalently
fitLMint <- lm(Sales ~ TV + Radio + TV:Radio, data = Advertising)
round(summary(fitLMint)$coefficients, 4)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.7502     0.2479 27.2328   0.0000
TV            0.0191     0.0015 12.6990   0.0000
Radio         0.0289     0.0089  3.2408   0.0014
TV:Radio      0.0011     0.0001 20.7266   0.0000
[1] 0.9677905
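With an interaction term, the effect of TV is no longer a single number: the change in expected Sales for a one-unit increase in TV is \(\hat\beta_{TV} + \hat\beta_{TV:Radio} \times \text{Radio}\). The sketch below evaluates this using the rounded coefficients printed above, at the minimum, median and maximum of Radio from the univariate summary. The object names are illustrative.

```r
# Rounded estimates from the interaction model above
b_tv       <- 0.0191   # TV main effect
b_tv_radio <- 0.0011   # TV:Radio interaction

# Radio levels taken from the univariate summary (min, median, max)
radio_levels <- c(min = 0, median = 22.9, max = 49.6)

# Effect on Sales of a one-unit increase in TV at each Radio level
tv_effect <- b_tv + b_tv_radio * radio_levels
round(tv_effect, 4)
```

The effect of TV grows with the Radio budget, which is what a positive synergy between the two media means.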
Transforming the outcome
fitLMtrans <- lm(I(log(Sales)) ~ TV + I(TV^2) + Radio + TV:Radio, data = Advertising)
summary(fitLMtrans)
Call:
lm(formula = I(log(Sales)) ~ TV + I(TV^2) + Radio + TV:Radio,
data = Advertising)
Residuals:
Min 1Q Median 3Q Max
-1.42334 -0.03854 -0.00468 0.06355 0.19313
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.651e+00 4.111e-02 40.157 < 2e-16 ***
TV 7.703e-03 4.763e-04 16.174 < 2e-16 ***
I(TV^2) -1.798e-05 1.471e-06 -12.223 < 2e-16 ***
Radio 5.960e-03 1.259e-03 4.735 4.21e-06 ***
TV:Radio 4.656e-05 7.395e-06 6.297 1.97e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1331 on 195 degrees of freedom
Multiple R-squared: 0.8989, Adjusted R-squared: 0.8968
F-statistic: 433.5 on 4 and 195 DF, p-value: < 2.2e-16
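Because the response is now log(Sales), predictions must be transformed back to be read on the original scale. The sketch below uses the rounded coefficients printed above and a hypothetical market with TV = 100 and Radio = 20 (values chosen only for illustration). Under the additional assumption of normal errors, `exp()` of the predicted log gives the predicted median of Sales, and multiplying by \(\exp(\hat\sigma^2/2)\) gives an estimate of the mean.

```r
# Rounded estimates from the log(Sales) model above
b <- c(intercept = 1.651, tv = 7.703e-3, tv2 = -1.798e-5,
       radio = 5.960e-3, tv_radio = 4.656e-5)
sigma_hat <- 0.1331   # residual standard error from the summary

# Hypothetical market (illustrative values, not taken from the slides)
tv    <- 100
radio <- 20

# Prediction on the log scale
log_pred <- unname(b["intercept"] + b["tv"] * tv + b["tv2"] * tv^2 +
                   b["radio"] * radio + b["tv_radio"] * tv * radio)

# Back to the original scale
sales_median <- exp(log_pred)                       # median under normal errors
sales_mean   <- sales_median * exp(sigma_hat^2 / 2) # mean under normal errors

c(log_scale = log_pred, median = sales_median, mean = sales_mean)
```

With the fitted object, `predict(fitLMtrans, newdata = ...)` would return the log-scale prediction without retyping the coefficients.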
What will we explore next week?