# Exercises for 2.3: Simple inferential statistics

## Preliminaries

• load the driving.Rdata file.
• type head( driving ) to look at the first few observations
• load the following packages: lsr
load( "~/Work/Research/Rbook/workshop_dsto/datasets/driving.Rdata")
head( driving )
##    id gender age distractor peak.hour errors_time1 errors_time2 rt_time1
## 1 s.1   male  19      radio       yes            7            7      346
## 2 s.2 female  42    toddler        no           15           16      424
## 3 s.3   male  27       none        no           10            7      415
## 4 s.4 female  22      radio       yes            5            1      266
## 5 s.5 female  33       none        no            4            9      302
## 6 s.6 female  35    toddler       yes           15           12      423
##   rt_time2
## 1      636
## 2      787
## 3      580
## 4      459
## 5      513
## 6      767
library(lsr)

## Exercise 2.3.1: Confidence intervals

• use ciMean() to calculate the 95% confidence interval for the mean age
• use ciMean() to calculate the 99% confidence interval for the mean age
• use ciMean() to calculate 95% confidence intervals for all variables
• use aggregate and ciMean to calculate 95% confidence intervals for the mean age separately for each distractor type

## Solution 2.3.1:

There’s currently a tiny bug in ciMean that stops it from reading variable names properly in some cases. You can see it in the output below, which differs slightly from the output shown in the slides, and it’s part of why the “aggregated” version doesn’t display labels properly. I’ll fix this in an update of lsr:

ciMean( driving$age )
##          2.5%    97.5%
## [1,] 25.75776 33.06577
ciMean( driving$age, conf=.99 )
##          0.5%    99.5%
## [1,] 24.37732 34.44621
ciMean( driving )
##                    2.5%     97.5%
## id*                  NA        NA
## gender*              NA        NA
## age           25.757758  33.06577
## distractor*          NA        NA
## peak.hour*           NA        NA
## errors_time1   7.661346  12.92689
## errors_time2   4.752800  10.77661
## rt_time1     338.891704 411.57888
## rt_time2     498.663583 640.51289
aggregate( age ~ distractor, driving, ciMean )
##   distractor    age.1    age.2
## 1       none 24.42609 32.77391
## 3    toddler 32.69136 39.64197
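
The intervals above follow the usual t-based recipe: mean ± t-quantile × standard error. As a rough check that needs only the printed numbers, the 99% interval for age can be rebuilt from the 95% one. This sketch assumes n = 17 (the sample size reported for this data set) and a t-based interval; the variable names are illustrative and not part of lsr.

```r
# Bounds of the 95% interval for mean age, copied from the output above
lower95 <- 25.75776
upper95 <- 33.06577
n <- 17                                   # assumed sample size (17 participants)

# The sample mean sits at the midpoint of a symmetric interval
m <- (lower95 + upper95) / 2

# Half-width = t-quantile * standard error, so solve for the standard error
se <- (upper95 - m) / qt(0.975, df = n - 1)

# Rebuild the interval at the 99% level with the wider t-quantile
ci99 <- m + c(-1, 1) * qt(0.995, df = n - 1) * se
print(ci99)                               # compare with the conf=.99 output above
```

Only the t-quantile changes between confidence levels; the mean and standard error stay the same.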

## Exercise 2.3.2: Independent samples t-test

• Run a t-test comparing the number of errors (at time 1) for the peak hour group versus the non peak hour group.
• Calculate Cohen’s d.

## Solution 2.3.2:

t.test( errors_time1 ~ peak.hour, driving )
##
##  Welch Two Sample t-test
##
## data:  errors_time1 by peak.hour
## t = -0.96656, df = 13.254, p-value = 0.3511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.897431  3.008542
## sample estimates:
## mean in group yes  mean in group no
##           9.00000          11.44444
cohensD( errors_time1 ~ peak.hour, driving )
## [1] 0.4768209
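
Cohen's d is the difference between the two group means divided by a standard deviation. The sketch below assumes the pooled-standard-deviation version (the method = "pooled" option in cohensD()). The helper name cohens_d_pooled and the two toy vectors are made up for illustration; they are not the driving data.

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d_pooled <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled variance: the group variances weighted by their degrees of freedom
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  abs(mean(x) - mean(y)) / sqrt(pooled_var)
}

# Toy data (not from driving): means 4 and 3, both variances equal to 4
x <- c(2, 4, 6)
y <- c(1, 3, 5)
d <- cohens_d_pooled(x, y)
print(d)   # 0.5: a mean difference of 1 over a pooled SD of 2
```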

## Exercise 2.3.3: Paired samples t-test

• Run a paired samples t-test to see if the number of errors made at time 2 differs from the number of errors people made at time 1

## Solution 2.3.3:

t.test( driving$errors_time1, driving$errors_time2, paired=TRUE )
##
##  Paired t-test
##
## data:  driving$errors_time1 and driving$errors_time2
## t = 2.4075, df = 16, p-value = 0.02849
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.302193 4.756631
## sample estimates:
## mean of the differences
##                2.529412
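
A paired samples t-test is a one-sample t-test on the within-person differences. As a sanity check, the p-value and confidence interval can be recovered from the printed mean difference, t statistic and degrees of freedom alone. This is a minimal sketch; the variable names are illustrative, and small discrepancies come from the rounding of the printed t.

```r
# Values copied from the paired t-test output above
mean_diff <- 2.529412
t_stat    <- 2.4075
df        <- 16                      # n - 1, with n = 17 pairs

# t = mean difference / standard error, so the standard error is:
se <- mean_diff / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval for the mean difference
ci95 <- mean_diff + c(-1, 1) * qt(0.975, df = df) * se

print(p_value)                       # compare with p-value = 0.02849
print(ci95)                          # compare with 0.302193 4.756631
```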

## Exercise 2.3.4: Chi-square goodness of fit tests

• Run a chi-square goodness of fit test to see if the number of females vs males differs significantly from chance

## Solution 2.3.4:

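The goodness-of-fit statistic is simple enough to compute by hand: sum (observed − expected)² / expected over the categories, then refer the result to a chi-square distribution with (number of categories − 1) degrees of freedom. A minimal sketch using the gender counts in this data set (12 female, 5 male); the variable names are illustrative.

```r
# Observed counts, as reported by table( driving$gender )
observed <- c(female = 12, male = 5)

# Expected counts under the null hypothesis of equal probabilities
expected <- sum(observed) * c(0.5, 0.5)

# Pearson chi-square statistic and its upper-tail p-value
x_squared <- sum((observed - expected)^2 / expected)
p_value   <- pchisq(x_squared, df = length(observed) - 1, lower.tail = FALSE)

print(c(X_squared = x_squared, p = p_value))   # compare with chisq.test()
```
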
(counts <- table( driving$gender ))
##
## female   male
##     12      5
chisq.test( counts, p=c(.5,.5) )
##
##  Chi-squared test for given probabilities
##
## data:  counts
## X-squared = 2.8824, df = 1, p-value = 0.08956

## Exercise 2.3.5: Chi-square tests of association

• Run a chi-square test of association to see if there is a significant association between gender and distractor type
• Calculate Cramer’s V for the association

## Solution 2.3.5:

(counts2 <- table( driving$gender, driving$distractor ))
##
##          none radio toddler
##   female    3     4       5
##   male      2     2       1
chisq.test( counts2 )
## Warning in chisq.test(counts2): Chi-squared approximation may be incorrect
##
##  Pearson's Chi-squared test
##
## data:  counts2
## X-squared = 0.78389, df = 2, p-value = 0.6757
cramersV( counts2 )
## Warning in chisq.test(...): Chi-squared approximation may be incorrect
## [1] 0.214735

## Bonus 2.3.5:

Here’s the Fisher exact test, just in case:

fisher.test( counts2 )
##
##  Fisher's Exact Test for Count Data
##
## data:  counts2
## p-value = 0.8182
## alternative hypothesis: two.sided

## Exercise 2.3.6: Testing the significance of a single correlation

• use cor.test to see if the Pearson correlation between errors and RT at time 1 is significant
• again using cor.test, see if the Spearman correlation between errors and RT at time 1 is significant

## Solution 2.3.6:

cor.test( driving$errors_time1, driving$rt_time1 )
##
##  Pearson's product-moment correlation
##
## data:  driving$errors_time1 and driving$rt_time1
## t = 3.8782, df = 15, p-value = 0.001486
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3438925 0.8866728
## sample estimates:
##       cor
## 0.7075811
cor.test( driving$errors_time1, driving$rt_time1, method="spearman" )
## Warning in cor.test.default(driving$errors_time1, driving$rt_time1, method
## = "spearman"): Cannot compute exact p-value with ties
##
##  Spearman's rank correlation rho
##
## data:  driving$errors_time1 and driving$rt_time1
## S = 217.03, p-value = 0.0007948
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho
## 0.7340295
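
For the Pearson test, the t statistic, p-value and confidence interval all follow from the correlation and the sample size. The sketch below recomputes them from r = 0.7075811 and n = 17, using the standard t transformation for the test and the Fisher z transformation for the interval. The variable names are illustrative.

```r
r <- 0.7075811   # Pearson correlation between errors and RT at time 1
n <- 17          # number of participants

# Test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 df
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% interval via Fisher's z: atanh(r) is roughly normal with SE 1 / sqrt(n - 3)
z    <- atanh(r)
ci95 <- tanh(z + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

print(c(t = t_stat, p = p_value))   # compare with t = 3.8782, p-value = 0.001486
print(ci95)                         # compare with 0.3438925 0.8866728
```

The Spearman test is not reproduced here because its p-value depends on the ranks of the raw data.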

## Exercise 2.3.7: Testing all pairwise correlations

• use correlate to test the significance of all pairwise correlations among (numeric) variables in driving

## Solution 2.3.7:

correlate( driving, test=TRUE )
##
## CORRELATIONS
## ============
## - correlation type:  pearson
## - correlations shown only when both variables are numeric
##
##              id    gender      age    distractor    peak.hour
## id            .         .        .             .            .
## gender        .         .        .             .            .
## age           .         .        .             .            .
## distractor    .         .        .             .            .
## peak.hour     .         .        .             .            .
## errors_time1  .         .    0.407             .            .
## errors_time2  .         .    0.282             .            .
## rt_time1      .         .    0.521             .            .
## rt_time2      .         .    0.428             .            .
##              errors_time1    errors_time2    rt_time1    rt_time2
## id                      .               .           .           .
## gender                  .               .           .           .
## age                 0.407           0.282       0.521       0.428
## distractor              .               .           .           .
## peak.hour               .               .           .           .
## errors_time1            .           0.696*      0.708*      0.605.
## errors_time2        0.696*              .       0.404       0.466
## rt_time1            0.708*          0.404           .        0.789**
## rt_time2            0.605.          0.466        0.789**        .
##
## ---
## Signif. codes: . = p < .1, * = p<.05, ** = p<.01, *** = p<.001
##
##
## p-VALUES
## ========
## - total number of tests run:  10
## - correction for multiple testing:  holm
##
##              id gender   age distractor peak.hour errors_time1
## id            .      .     .          .         .            .
## gender        .      .     .          .         .            .
## age           .      .     .          .         .        0.347
## distractor    .      .     .          .         .            .
## peak.hour     .      .     .          .         .            .
## errors_time1  .      . 0.347          .         .            .
## errors_time2  .      . 0.347          .         .        0.015
## rt_time1      .      . 0.191          .         .        0.013
## rt_time2      .      . 0.347          .         .        0.070
##              errors_time2 rt_time1 rt_time2
## id                      .        .        .
## gender                  .        .        .
## age                 0.347    0.191    0.347
## distractor              .        .        .
## peak.hour               .        .        .
## errors_time1        0.015    0.013    0.070
## errors_time2            .    0.347    0.296
## rt_time1            0.347        .    0.002
## rt_time2            0.296    0.002        .
##
##
## SAMPLE SIZES
## ============
##
##              id gender age distractor peak.hour errors_time1 errors_time2
## id           17     17  17         17        17           17           17
## gender       17     17  17         17        17           17           17
## age          17     17  17         17        17           17           17
## distractor   17     17  17         17        17           17           17
## peak.hour    17     17  17         17        17           17           17
## errors_time1 17     17  17         17        17           17           17
## errors_time2 17     17  17         17        17           17           17
## rt_time1     17     17  17         17        17           17           17
## rt_time2     17     17  17         17        17           17           17
##              rt_time1 rt_time2
## id                 17       17
## gender             17       17
## age                17       17
## distractor         17       17
## peak.hour          17       17
## errors_time1       17       17
## errors_time2       17       17
## rt_time1           17       17
## rt_time2           17       17
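
The p-values above are Holm-corrected across the 10 pairwise tests. As a rough illustration of what that correction does, the sketch below recomputes uncorrected p-values from the rounded correlations printed above (assuming n = 17 for every pair) and then applies p.adjust() with method = "holm". The vector names are made up for readability, and because the inputs are rounded to three decimals the results should match the table only to about two or three decimal places.

```r
# Pairwise Pearson correlations, copied (rounded) from the CORRELATIONS table
r <- c(age_errors1     = 0.407, age_errors2 = 0.282,
       age_rt1         = 0.521, age_rt2     = 0.428,
       errors1_errors2 = 0.696, errors1_rt1 = 0.708,
       errors1_rt2     = 0.605, errors2_rt1 = 0.404,
       errors2_rt2     = 0.466, rt1_rt2     = 0.789)
n <- 17

# Uncorrected two-sided p-values from the t transformation of r
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_raw  <- 2 * pt(-abs(t_stat), df = n - 2)

# Holm: multiply the k-th smallest p-value by (m - k + 1), then enforce
# monotonicity so that adjusted p-values never decrease with rank
p_holm <- p.adjust(p_raw, method = "holm")

print(round(cbind(raw = p_raw, holm = p_holm), 3))
```

The monotonicity step is why several of the weaker correlations share the same adjusted p-value in the table.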