Exercises for 2.3: Simple inferential statistics

Preliminaries

  • load the driving.Rdata file.
  • type head( driving ) to look at the first few observations
  • load the following packages: lsr
load( "~/Work/Research/Rbook/workshop_dsto/datasets/driving.Rdata")
head( driving )
##    id gender age distractor peak.hour errors_time1 errors_time2 rt_time1
## 1 s.1   male  19      radio       yes            7            7      346
## 2 s.2 female  42    toddler        no           15           16      424
## 3 s.3   male  27       none        no           10            7      415
## 4 s.4 female  22      radio       yes            5            1      266
## 5 s.5 female  33       none        no            4            9      302
## 6 s.6 female  35    toddler       yes           15           12      423
##   rt_time2
## 1      636
## 2      787
## 3      580
## 4      459
## 5      513
## 6      767
library(lsr)
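The preliminaries come down to `load()` followed by `head()`. This sketch shows what `load()` actually does. It recreates the saved objects under their original names and invisibly returns those names. So it can run without the workshop data, it writes a throwaway two-row stand-in for driving.Rdata to a temporary file. The stand-in uses values from the first two rows printed above and is not the real dataset.

```r
# Build a tiny stand-in for driving.Rdata (NOT the real data) and save it
driving <- data.frame(id = c("s.1", "s.2"), age = c(19, 42))
path <- tempfile(fileext = ".Rdata")
save(driving, file = path)
rm(driving)                # remove it so load() has to recreate it

loaded <- load(path)       # restores `driving` into the current environment
loaded                     # names of the objects that were restored
head(driving)
```

With your own copy of the file, `loaded <- load("path/to/driving.Rdata")` tells you which objects it contained.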

Exercise 2.3.1: Confidence intervals

  • use ciMean() to calculate the 95% confidence interval for the mean age
  • use ciMean() to calculate the 99% confidence interval for the mean age
  • use ciMean() to calculate 95% confidence intervals for all variables
  • use aggregate and ciMean to calculate 95% confidence intervals for the mean age separately for each distractor type

Solution 2.3.1:

There’s currently a small bug in ciMean that stops it from reading variable names properly in some cases. You can see it in the output below, which differs slightly from the output shown in the slides. It’s also part of why the “aggregated” version doesn’t display labels properly: the columns come out as age.1 and age.2. I’ll fix this in an update of lsr:

ciMean( driving$age )
##          2.5%    97.5%
## [1,] 25.75776 33.06577
ciMean( driving$age, conf=.99 )
##          0.5%    99.5%
## [1,] 24.37732 34.44621
ciMean( driving )
##                    2.5%     97.5%
## id*                  NA        NA
## gender*              NA        NA
## age           25.757758  33.06577
## distractor*          NA        NA
## peak.hour*           NA        NA
## errors_time1   7.661346  12.92689
## errors_time2   4.752800  10.77661
## rt_time1     338.891704 411.57888
## rt_time2     498.663583 640.51289
aggregate( age ~ distractor, driving, ciMean )
##   distractor    age.1    age.2
## 1       none 24.42609 32.77391
## 2      radio 16.54302 30.12365
## 3    toddler 32.69136 39.64197
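To see where these numbers come from, here's a sketch of the usual t-based interval: mean ± t(n−1) × sd/√n. I'm assuming this is what ciMean() computes. `ci_mean_manual()` is a hypothetical helper, not part of lsr. The `ages` and `groups` vectors are just the six rows printed by head() above, used for illustration. The last line shows one way to get per-group intervals with readable column labels: use split() and sapply() instead of aggregate().

```r
# Hypothetical helper: t-based confidence interval for a mean
ci_mean_manual <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                              # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)    # two-sided critical t value
  c(lower = mean(x) - tcrit * se, upper = mean(x) + tcrit * se)
}

# Ages and distractor types from the six rows shown by head(driving)
ages   <- c(19, 42, 27, 22, 33, 35)
groups <- c("radio", "toddler", "none", "radio", "none", "toddler")

ci_mean_manual(ages)               # 95% interval
ci_mean_manual(ages, conf = 0.99)  # 99% interval

# One row per group, columns labelled lower/upper
t(sapply(split(ages, groups), ci_mean_manual))
```

With the workshop data loaded, `t(sapply(split(driving$age, driving$distractor), ci_mean_manual))` applies the same idea to the full sample.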

Exercise 2.3.2: Independent samples t-test

  • Run a t-test comparing the number of errors (at time 1) for the peak-hour group versus the non-peak-hour group.
  • Calculate Cohen’s d.

Solution 2.3.2:

t.test( errors_time1 ~ peak.hour, driving )
## 
##  Welch Two Sample t-test
## 
## data:  errors_time1 by peak.hour
## t = -0.96656, df = 13.254, p-value = 0.3511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.897431  3.008542
## sample estimates:
## mean in group yes  mean in group no 
##           9.00000          11.44444
cohensD( errors_time1 ~ peak.hour, driving )
## [1] 0.4768209
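This sketch shows Cohen's d computed by hand with a pooled standard deviation. I'm assuming that matches cohensD()'s default method. `cohens_d_pooled()` is a hypothetical helper. The two vectors are errors_time1 for the six rows printed by head(), split by peak.hour, so the result is only illustrative. Note that t.test() runs a Welch test by default, as in the output above. Setting `var.equal = TRUE` gives the Student test, whose t statistic is tied to the pooled d by d = |t| × √(1/n1 + 1/n2).

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d_pooled <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  abs(mean(x) - mean(y)) / sqrt(pooled_var)
}

# errors_time1 for the six rows shown by head(driving)
peak    <- c(7, 5, 15)   # peak.hour == "yes"
offpeak <- c(15, 10, 4)  # peak.hour == "no"

d <- cohens_d_pooled(peak, offpeak)
d

# Student (equal-variance) t-test for comparison
t.test(peak, offpeak, var.equal = TRUE)
```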

Exercise 2.3.3: Paired samples t-test

  • Run a paired samples t-test to see if the number of errors made at time 2 differs from the number of errors people made at time 1

Solution 2.3.3:

t.test( driving$errors_time1, driving$errors_time2, paired=TRUE )
## 
##  Paired t-test
## 
## data:  driving$errors_time1 and driving$errors_time2
## t = 2.4075, df = 16, p-value = 0.02849
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.302193 4.756631
## sample estimates:
## mean of the differences 
##                2.529412
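A paired t-test is a one-sample t-test on the within-person differences. The sketch below checks this with errors_time1 and errors_time2 for the six rows printed by head(). These are illustrative values, not the full sample.

```r
# errors_time1 and errors_time2 for the six rows shown by head(driving)
time1 <- c(7, 15, 10, 5, 4, 15)
time2 <- c(7, 16, 7, 1, 9, 12)

diffs <- time1 - time2                                        # per-person change
t_manual <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))   # one-sample t on the differences
t_manual

# Both of these report the same t statistic
t.test(time1, time2, paired = TRUE)$statistic
t.test(diffs, mu = 0)$statistic
```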

Exercise 2.3.4: Chi-square goodness of fit tests

  • Run a chi-square goodness-of-fit test to see if the number of females vs males differs significantly from chance

Solution 2.3.4:

(counts <- table( driving$gender ))
## 
## female   male 
##     12      5
chisq.test( counts, p=c(.5,.5) )
## 
##  Chi-squared test for given probabilities
## 
## data:  counts
## X-squared = 2.8824, df = 1, p-value = 0.08956
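The statistic above can be rebuilt by hand as X² = Σ (O − E)² / E, with df = (number of categories − 1). This sketch uses the counts from `table(driving$gender)` shown above. With p = c(.5, .5), each expected count is 17 × 0.5 = 8.5.

```r
observed <- c(female = 12, male = 5)     # counts from table(driving$gender)
expected <- sum(observed) * c(0.5, 0.5)  # 8.5 each under equal probabilities
x2 <- sum((observed - expected)^2 / expected)
p  <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)
c(X2 = x2, p = p)
```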

Exercise 2.3.5: Chi-square tests of association

  • Run a chi-square test of association to see if there is a significant association between gender and distractor type
  • Calculate Cramer’s V for the association

Solution 2.3.5:

(counts2 <- table( driving$gender, driving$distractor ))
##         
##          none radio toddler
##   female    3     4       5
##   male      2     2       1
chisq.test( counts2 )
## Warning in chisq.test(counts2): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  counts2
## X-squared = 0.78389, df = 2, p-value = 0.6757
cramersV( counts2 )
## Warning in chisq.test(...): Chi-squared approximation may be incorrect
## [1] 0.214735
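Here's a sketch of what sits behind both numbers. The expected counts are E = row total × column total / N. Several of them fall below 5, which is why chisq.test() warns that the approximation may be incorrect. Cramér's V is then √(X² / (N × (min(rows, cols) − 1))). The matrix is `counts2` typed in from the output above.

```r
# counts2 as printed above (gender by distractor)
counts2 <- matrix(c(3, 4, 5,
                    2, 2, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("female", "male"), c("none", "radio", "toddler")))

n <- sum(counts2)
expected <- outer(rowSums(counts2), colSums(counts2)) / n   # E_ij = row_i * col_j / N
round(expected, 2)                                          # several cells are below 5

x2 <- sum((counts2 - expected)^2 / expected)                # Pearson chi-square
v  <- sqrt(x2 / (n * (min(dim(counts2)) - 1)))              # Cramer's V
c(X2 = x2, V = v)
```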

Bonus 2.3.5:

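Several expected cell counts are small, which is what the “Chi-squared approximation may be incorrect” warning is about. So here’s Fisher’s exact test as a check: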

fisher.test( counts2 )
## 
##  Fisher's Exact Test for Count Data
## 
## data:  counts2
## p-value = 0.8182
## alternative hypothesis: two.sided

Exercise 2.3.6: Testing the significance of a single correlation

  • use cor.test to see if the Pearson correlation between errors and RT at time 1 is significant
  • again using cor.test, see if the Spearman correlation between errors and RT at time 1 is significant

Solution 2.3.6:

cor.test( driving$errors_time1, driving$rt_time1 )
## 
##  Pearson's product-moment correlation
## 
## data:  driving$errors_time1 and driving$rt_time1
## t = 3.8782, df = 15, p-value = 0.001486
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3438925 0.8866728
## sample estimates:
##       cor 
## 0.7075811
cor.test( driving$errors_time1, driving$rt_time1, method="spearman" )
## Warning in cor.test.default(driving$errors_time1, driving$rt_time1, method
## = "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  driving$errors_time1 and driving$rt_time1
## S = 217.03, p-value = 0.0007948
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7340295
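Two facts help when reading these outputs. First, the Pearson test uses t = r √((n − 2) / (1 − r²)) on n − 2 degrees of freedom. The sketch reproduces the reported t and p from r = 0.7075811 and n = 17. Second, Spearman's rho is Pearson's r computed on ranks, with tied values given average ranks; that tie handling is also why cor.test() can't compute an exact p-value here. The rank check uses only the six rows printed by head(), as an illustration.

```r
# Rebuild the Pearson test from the r and n reported above
r <- 0.7075811
n <- 17
t_stat <- r * sqrt((n - 2) / (1 - r^2))                 # t with n - 2 df
p_two  <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
c(t = t_stat, p = p_two)

# Spearman's rho equals Pearson's r on (average) ranks
x <- c(7, 15, 10, 5, 4, 15)             # errors_time1, first six rows
y <- c(346, 424, 415, 266, 302, 423)    # rt_time1, first six rows
c(spearman = cor(x, y, method = "spearman"),
  pearson_on_ranks = cor(rank(x), rank(y)))
```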

Exercise 2.3.7: Testing all pairwise correlations

  • use correlate to test the significance of all pairwise correlations among (numeric) variables in driving

Solution 2.3.7:

correlate( driving, test=TRUE )
## 
## CORRELATIONS
## ============
## - correlation type:  pearson 
## - correlations shown only when both variables are numeric
## 
##              id    gender      age    distractor    peak.hour   
## id            .         .        .             .            .   
## gender        .         .        .             .            .   
## age           .         .        .             .            .   
## distractor    .         .        .             .            .   
## peak.hour     .         .        .             .            .   
## errors_time1  .         .    0.407             .            .   
## errors_time2  .         .    0.282             .            .   
## rt_time1      .         .    0.521             .            .   
## rt_time2      .         .    0.428             .            .   
##              errors_time1    errors_time2    rt_time1    rt_time2   
## id                      .               .           .           .   
## gender                  .               .           .           .   
## age                 0.407           0.282       0.521       0.428   
## distractor              .               .           .           .   
## peak.hour               .               .           .           .   
## errors_time1            .           0.696*      0.708*      0.605.  
## errors_time2        0.696*              .       0.404       0.466   
## rt_time1            0.708*          0.404           .        0.789**
## rt_time2            0.605.          0.466        0.789**        .   
## 
## ---
## Signif. codes: . = p < .1, * = p<.05, ** = p<.01, *** = p<.001
## 
## 
## p-VALUES
## ========
## - total number of tests run:  10 
## - correction for multiple testing:  holm 
## 
##              id gender   age distractor peak.hour errors_time1
## id            .      .     .          .         .            .
## gender        .      .     .          .         .            .
## age           .      .     .          .         .        0.347
## distractor    .      .     .          .         .            .
## peak.hour     .      .     .          .         .            .
## errors_time1  .      . 0.347          .         .            .
## errors_time2  .      . 0.347          .         .        0.015
## rt_time1      .      . 0.191          .         .        0.013
## rt_time2      .      . 0.347          .         .        0.070
##              errors_time2 rt_time1 rt_time2
## id                      .        .        .
## gender                  .        .        .
## age                 0.347    0.191    0.347
## distractor              .        .        .
## peak.hour               .        .        .
## errors_time1        0.015    0.013    0.070
## errors_time2            .    0.347    0.296
## rt_time1            0.347        .    0.002
## rt_time2            0.296    0.002        .
## 
## 
## SAMPLE SIZES
## ============
## 
##              id gender age distractor peak.hour errors_time1 errors_time2
## id           17     17  17         17        17           17           17
## gender       17     17  17         17        17           17           17
## age          17     17  17         17        17           17           17
## distractor   17     17  17         17        17           17           17
## peak.hour    17     17  17         17        17           17           17
## errors_time1 17     17  17         17        17           17           17
## errors_time2 17     17  17         17        17           17           17
## rt_time1     17     17  17         17        17           17           17
## rt_time2     17     17  17         17        17           17           17
##              rt_time1 rt_time2
## id                 17       17
## gender             17       17
## age                17       17
## distractor         17       17
## peak.hour          17       17
## errors_time1       17       17
## errors_time2       17       17
## rt_time1           17       17
## rt_time2           17       17
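correlate() runs one test per pair of numeric variables (10 pairs here) and applies a Holm correction. This sketch shows those mechanics with a hypothetical helper, `pairwise_cor_tests()`. It is not lsr's implementation. It runs cor.test() on every pair of numeric columns and adjusts with base R's `p.adjust(method = "holm")`. I'd expect its adjusted p-values to be comparable to correlate()'s, but I haven't confirmed they match exactly. The example uses the built-in iris data so it runs anywhere, and the non-numeric Species column is skipped just as gender and distractor are above.

```r
# Hypothetical helper: all pairwise Pearson tests among numeric columns
pairwise_cor_tests <- function(df, method = "holm") {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  pairs <- combn(names(num), 2)                   # every unordered pair of columns
  res <- data.frame(
    var1 = pairs[1, ],
    var2 = pairs[2, ],
    r    = apply(pairs, 2, function(p) cor(num[[p[1]]], num[[p[2]]])),
    p    = apply(pairs, 2, function(p) cor.test(num[[p[1]]], num[[p[2]]])$p.value),
    stringsAsFactors = FALSE
  )
  res$p_adj <- p.adjust(res$p, method = method)   # multiple-comparison correction
  res
}

pairwise_cor_tests(iris)
```

With the workshop data loaded, `pairwise_cor_tests(driving)` would cover the same 10 pairs as correlate().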