Exercises for 2.5: Data manipulation

Preliminaries

  • load the driving.Rdata file.
  • type head( driving ) to look at the first few observations
  • load the following packages: lsr
load( "~/Work/Research/Rbook/workshop_dsto/datasets/driving.Rdata")
head( driving )
##    id gender age distractor peak.hour errors_time1 errors_time2 rt_time1
## 1 s.1   male  19      radio       yes            7            7      346
## 2 s.2 female  42    toddler        no           15           16      424
## 3 s.3   male  27       none        no           10            7      415
## 4 s.4 female  22      radio       yes            5            1      266
## 5 s.5 female  33       none        no            4            9      302
## 6 s.6 female  35    toddler       yes           15           12      423
##   rt_time2
## 1      636
## 2      787
## 3      580
## 4      459
## 5      513
## 6      767
library(lsr)

Exercise 2.5.1: Reshaping from wide to long

  • use wideToLong() to make a long-form version of the driving data frame. Save the results to driving.2.
  • if you didn’t already give the within-subjects factor a meaningful name, repeat the command, but this time use the within argument to specify a good name for it

Solution 2.5.1

driving.2 <- wideToLong( driving )
head(driving.2)
##    id gender age distractor peak.hour within errors  rt
## 1 s.1   male  19      radio       yes  time1      7 346
## 2 s.2 female  42    toddler        no  time1     15 424
## 3 s.3   male  27       none        no  time1     10 415
## 4 s.4 female  22      radio       yes  time1      5 266
## 5 s.5 female  33       none        no  time1      4 302
## 6 s.6 female  35    toddler       yes  time1     15 423
(driving.2 <- wideToLong( driving, within="time" ))
##      id gender age distractor peak.hour  time errors  rt
## 1   s.1   male  19      radio       yes time1      7 346
## 2   s.2 female  42    toddler        no time1     15 424
## 3   s.3   male  27       none        no time1     10 415
## 4   s.4 female  22      radio       yes time1      5 266
## 5   s.5 female  33       none        no time1      4 302
## 6   s.6 female  35    toddler       yes time1     15 423
## 7   s.7 female  24       none        no time1      7 374
## 8   s.8   male  29       none       yes time1      4 241
## 9   s.9 female  22      radio        no time1     12 370
## 10 s.10 female  32    toddler        no time1     14 463
## 11 s.11 female  36    toddler       yes time1     18 453
## 12 s.12   male  37    toddler        no time1     19 463
## 13 s.13 female  31      radio        no time1      9 346
## 14 s.14   male  15      radio        no time1     13 308
## 15 s.15 female  31      radio       yes time1      3 322
## 16 s.16 female  30       none       yes time1      6 459
## 17 s.17 female  35    toddler       yes time1     14 404
## 18  s.1   male  19      radio       yes time2      7 636
## 19  s.2 female  42    toddler        no time2     16 787
## 20  s.3   male  27       none        no time2      7 580
## 21  s.4 female  22      radio       yes time2      1 459
## 22  s.5 female  33       none        no time2      9 513
## 23  s.6 female  35    toddler       yes time2     12 767
## 24  s.7 female  24       none        no time2     10 651
## 25  s.8   male  29       none       yes time2      0 281
## 26  s.9 female  22      radio        no time2     13 558
## 27 s.10 female  32    toddler        no time2      4 700
## 28 s.11 female  36    toddler       yes time2      6 718
## 29 s.12   male  37    toddler        no time2     19 634
## 30 s.13 female  31      radio        no time2      4 506
## 31 s.14   male  15      radio        no time2     10 414
## 32 s.15 female  31      radio       yes time2      0 401
## 33 s.16 female  30       none       yes time2      0 596
## 34 s.17 female  35    toddler       yes time2     14 482
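
For comparison, the same kind of wide-to-long reshape can be sketched with base R's reshape() function. This is only an illustration, not part of the exercise: it uses the first three participants from the driving data, drops the between-subjects columns for brevity, and the names wide and long are made up for the example.

```r
# First three participants, repeated-measures columns only
wide <- data.frame(
  id           = c("s.1", "s.2", "s.3"),
  errors_time1 = c(7, 15, 10),
  errors_time2 = c(7, 16, 7),
  rt_time1     = c(346, 424, 415),
  rt_time2     = c(636, 787, 580)
)

# Stack the time1 and time2 columns on top of each other
long <- reshape(
  wide,
  direction = "long",
  idvar     = "id",
  varying   = list(c("errors_time1", "errors_time2"),
                   c("rt_time1", "rt_time2")),
  v.names   = c("errors", "rt"),       # names of the stacked columns
  timevar   = "time",                  # name of the within-subjects factor
  times     = c("time1", "time2")      # labels for its levels
)
rownames(long) <- NULL                 # drop the generated row names
long
```

The varying, v.names and times arguments have to be spelled out by hand here, whereas wideToLong() works them out from the variable names.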

Exercise 2.5.2: Reshaping from long to wide

  • use the longToWide() function to make a wide-form version of the driving.2 data frame that you created in 2.5.1. Save the results to driving.3.

Solution 2.5.2

driving.3 <- longToWide( 
  data = driving.2,
  formula = rt + errors ~ time
)
driving.3
##      id gender age distractor peak.hour rt_time1 errors_time1 rt_time2
## 1   s.1   male  19      radio       yes      346            7      636
## 2   s.2 female  42    toddler        no      424           15      787
## 3   s.3   male  27       none        no      415           10      580
## 4   s.4 female  22      radio       yes      266            5      459
## 5   s.5 female  33       none        no      302            4      513
## 6   s.6 female  35    toddler       yes      423           15      767
## 7   s.7 female  24       none        no      374            7      651
## 8   s.8   male  29       none       yes      241            4      281
## 9   s.9 female  22      radio        no      370           12      558
## 10 s.10 female  32    toddler        no      463           14      700
## 11 s.11 female  36    toddler       yes      453           18      718
## 12 s.12   male  37    toddler        no      463           19      634
## 13 s.13 female  31      radio        no      346            9      506
## 14 s.14   male  15      radio        no      308           13      414
## 15 s.15 female  31      radio       yes      322            3      401
## 16 s.16 female  30       none       yes      459            6      596
## 17 s.17 female  35    toddler       yes      404           14      482
##    errors_time2
## 1             7
## 2            16
## 3             7
## 4             1
## 5             9
## 6            12
## 7            10
## 8             0
## 9            13
## 10            4
## 11            6
## 12           19
## 13            4
## 14           10
## 15            0
## 16            0
## 17           14
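
The reverse operation can also be sketched in base R. Again this is an illustration under assumptions rather than part of the solution: the data are the first three participants only, and the names long and wide are made up for the example.

```r
# Long-form data: one row per participant per time point
long <- data.frame(
  id     = rep(c("s.1", "s.2", "s.3"), times = 2),
  time   = rep(c("time1", "time2"), each = 3),
  errors = c(7, 15, 10, 7, 16, 7),
  rt     = c(346, 424, 415, 636, 787, 580)
)

# Spread errors and rt into one column per time point
wide <- reshape(
  long,
  direction = "wide",
  idvar     = "id",                 # one row per id in the result
  timevar   = "time",               # the within-subjects factor
  v.names   = c("errors", "rt"),    # the measured variables
  sep       = "_"                   # gives names like rt_time1
)
wide
```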

Exercise 2.5.3: Cutting a continuous variable into categories

  • use cut() to cut driving$age into 3 bins of approximately equal width (i.e., similar age ranges). Save the result to age.group.1.
  • use table() to look at how many people fall in the different age groups, and look at the category names to see how wide each age group is.
  • use quantileCut() to cut driving$age into 3 bins of approximately equal frequency (i.e., similar number of people in each group). Save the result to age.group.2.
  • use table() to look at how many people fall in the different age groups, and look at the category names to see how wide each age group is.

Solution 2.5.3

(age.group.1 <- cut( driving$age, 3 ))
##  [1] (15,24] (33,42] (24,33] (15,24] (24,33] (33,42] (15,24] (24,33]
##  [9] (15,24] (24,33] (33,42] (33,42] (24,33] (15,24] (24,33] (24,33]
## [17] (33,42]
## Levels: (15,24] (24,33] (33,42]
table( age.group.1 )
## age.group.1
## (15,24] (24,33] (33,42] 
##       5       7       5
(age.group.2 <- quantileCut( driving$age, 3 ))
##  [1] (15,27.7]   (32.7,42]   (15,27.7]   (15,27.7]   (32.7,42]  
##  [6] (32.7,42]   (15,27.7]   (27.7,32.7] (15,27.7]   (27.7,32.7]
## [11] (32.7,42]   (32.7,42]   (27.7,32.7] (15,27.7]   (27.7,32.7]
## [16] (27.7,32.7] (32.7,42]  
## Levels: (15,27.7] (27.7,32.7] (32.7,42]
table( age.group.2 )
## age.group.2
##   (15,27.7] (27.7,32.7]   (32.7,42] 
##           6           5           6
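
The difference between the two results comes down to where the break points are placed. The sketch below reproduces both sets of counts using only base R, with the ages typed in from the driving data. It is not necessarily how quantileCut() works internally, and the variable names are made up for the example.

```r
# Ages of the 17 participants in the driving data
age <- c(19, 42, 27, 22, 33, 35, 24, 29, 22, 32, 36, 37, 31, 15, 31, 30, 35)

# Equal-width bins: the range of age is split into 3 intervals of the same width
equal.width <- cut(age, breaks = 3)

# Equal-frequency bins: the break points sit at the 0%, 33%, 67% and 100% quantiles
breaks <- quantile(age, probs = seq(0, 1, length.out = 4))
equal.freq <- cut(age, breaks = breaks, include.lowest = TRUE)

table(equal.width)   # counts of 5, 7 and 5
table(equal.freq)    # counts of 6, 5 and 6
```

The include.lowest = TRUE argument is needed in the second call because the lowest break point is the minimum age itself, which would otherwise fall outside the first interval.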

Exercise 2.5.4: Permuting factor levels

  • print out driving$distractor and take note of the ordering of factor levels
  • use bars() to plot means and confidence intervals for RT at time 1 for each distractor
  • use permuteLevels() to reorder the factor levels for distractor.
  • print out driving$distractor to check that you have successfully reordered the groups, and now use bars() to redraw the plot.

Solution 2.5.4

driving$distractor
##  [1] radio   toddler none    radio   none    toddler none    none   
##  [9] radio   toddler toddler toddler radio   radio   radio   none   
## [17] toddler
## Levels: none radio toddler
bars( rt_time1 ~ distractor, driving)
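
As background for the reordering step, here is a minimal base R sketch of what it means to permute factor levels. It uses the first six distractor values from the data, and the ordering in new.order is an arbitrary choice made up for this example rather than the ordering the exercise expects.

```r
# First six distractor values from the driving data
distractor <- factor(c("radio", "toddler", "none", "radio", "none", "toddler"))
levels(distractor)    # alphabetical by default: "none" "radio" "toddler"

# Hypothetical new ordering: old level 3 first, then old levels 1 and 2
new.order <- c(3, 1, 2)
reordered <- factor(distractor, levels = levels(distractor)[new.order])
levels(reordered)     # "toddler" "none" "radio"

# Only the level order changes; the data values stay the same
all(as.character(reordered) == as.character(distractor))
```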