# Exercises for 2.5: Data manipulation

## Preliminaries

• load the `driving.Rdata` file.
• type `head( driving )` to look at the first few observations
• load the following packages: `lsr`
```
load( "~/Work/Research/Rbook/workshop_dsto/datasets/driving.Rdata" )
head( driving )
```
```
##    id gender age distractor peak.hour errors_time1 errors_time2 rt_time1
## 1 s.1   male  19      radio       yes            7            7      346
## 2 s.2 female  42    toddler        no           15           16      424
## 3 s.3   male  27       none        no           10            7      415
## 4 s.4 female  22      radio       yes            5            1      266
## 5 s.5 female  33       none        no            4            9      302
## 6 s.6 female  35    toddler       yes           15           12      423
##   rt_time2
## 1      636
## 2      787
## 3      580
## 4      459
## 5      513
## 6      767
```
```
library(lsr)
```

## Exercise 2.5.1: Reshaping from wide to long

• use `wideToLong()` to make a long-form version of the `driving` data frame. Save the results to `driving.2`
• if you didn’t already give the within-subjects factor a meaningful name, repeat the command, this time using the `within` argument to specify a good name for it

## Solution 2.5.1

```
driving.2 <- wideToLong( driving )
head( driving.2 )
```
```
##    id gender age distractor peak.hour within errors  rt
## 1 s.1   male  19      radio       yes  time1      7 346
## 2 s.2 female  42    toddler        no  time1     15 424
## 3 s.3   male  27       none        no  time1     10 415
## 4 s.4 female  22      radio       yes  time1      5 266
## 5 s.5 female  33       none        no  time1      4 302
## 6 s.6 female  35    toddler       yes  time1     15 423
```
```
(driving.2 <- wideToLong( driving, within="time" ))
```
```
##      id gender age distractor peak.hour  time errors  rt
## 1   s.1   male  19      radio       yes time1      7 346
## 2   s.2 female  42    toddler        no time1     15 424
## 3   s.3   male  27       none        no time1     10 415
## 4   s.4 female  22      radio       yes time1      5 266
## 5   s.5 female  33       none        no time1      4 302
## 6   s.6 female  35    toddler       yes time1     15 423
## 7   s.7 female  24       none        no time1      7 374
## 8   s.8   male  29       none       yes time1      4 241
## 9   s.9 female  22      radio        no time1     12 370
## 10 s.10 female  32    toddler        no time1     14 463
## 11 s.11 female  36    toddler       yes time1     18 453
## 12 s.12   male  37    toddler        no time1     19 463
## 13 s.13 female  31      radio        no time1      9 346
## 14 s.14   male  15      radio        no time1     13 308
## 15 s.15 female  31      radio       yes time1      3 322
## 16 s.16 female  30       none       yes time1      6 459
## 17 s.17 female  35    toddler       yes time1     14 404
## 18  s.1   male  19      radio       yes time2      7 636
## 19  s.2 female  42    toddler        no time2     16 787
## 20  s.3   male  27       none        no time2      7 580
## 21  s.4 female  22      radio       yes time2      1 459
## 22  s.5 female  33       none        no time2      9 513
## 23  s.6 female  35    toddler       yes time2     12 767
## 24  s.7 female  24       none        no time2     10 651
## 25  s.8   male  29       none       yes time2      0 281
## 26  s.9 female  22      radio        no time2     13 558
## 27 s.10 female  32    toddler        no time2      4 700
## 28 s.11 female  36    toddler       yes time2      6 718
## 29 s.12   male  37    toddler        no time2     19 634
## 30 s.13 female  31      radio        no time2      4 506
## 31 s.14   male  15      radio        no time2     10 414
## 32 s.15 female  31      radio       yes time2      0 401
## 33 s.16 female  30       none       yes time2      0 596
## 34 s.17 female  35    toddler       yes time2     14 482
```
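
To see what the reshaping involves, the base R sketch below rebuilds the long form by hand for the first three participants in the output above. It only illustrates the idea and is not how `wideToLong()` is implemented; the names `wide` and `long` are made up for this example. The key point is that the wide-form column names follow a `measure_level` pattern (`errors_time1`, `rt_time2`), which is what lets the measure be separated from the level of the within-subjects factor.

```r
# First three participants from the driving data, repeated-measures columns only
wide <- data.frame(
  id           = c("s.1", "s.2", "s.3"),
  errors_time1 = c(7, 15, 10),
  errors_time2 = c(7, 16, 7),
  rt_time1     = c(346, 424, 415),
  rt_time2     = c(636, 787, 580)
)

# Stack one block of rows per level of the within-subjects factor
long <- rbind(
  data.frame(id = wide$id, time = "time1",
             errors = wide$errors_time1, rt = wide$rt_time1),
  data.frame(id = wide$id, time = "time2",
             errors = wide$errors_time2, rt = wide$rt_time2)
)

long   # each participant now appears once per time point
```

The long form has one row per participant per time point, so 17 participants measured twice give the 34 rows printed above.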

## Exercise 2.5.2: Reshaping from long to wide

• use the `longToWide()` function to make a wide-form version of the `driving.2` data frame that you created in 2.5.1. Save the results to `driving.3`

## Solution 2.5.2

```
driving.3 <- longToWide(
  data = driving.2,
  formula = rt + errors ~ time
)
driving.3
```
```
##      id gender age distractor peak.hour rt_time1 errors_time1 rt_time2
## 1   s.1   male  19      radio       yes      346            7      636
## 2   s.2 female  42    toddler        no      424           15      787
## 3   s.3   male  27       none        no      415           10      580
## 4   s.4 female  22      radio       yes      266            5      459
## 5   s.5 female  33       none        no      302            4      513
## 6   s.6 female  35    toddler       yes      423           15      767
## 7   s.7 female  24       none        no      374            7      651
## 8   s.8   male  29       none       yes      241            4      281
## 9   s.9 female  22      radio        no      370           12      558
## 10 s.10 female  32    toddler        no      463           14      700
## 11 s.11 female  36    toddler       yes      453           18      718
## 12 s.12   male  37    toddler        no      463           19      634
## 13 s.13 female  31      radio        no      346            9      506
## 14 s.14   male  15      radio        no      308           13      414
## 15 s.15 female  31      radio       yes      322            3      401
## 16 s.16 female  30       none       yes      459            6      596
## 17 s.17 female  35    toddler       yes      404           14      482
##    errors_time2
## 1             7
## 2            16
## 3             7
## 4             1
## 5             9
## 6            12
## 7            10
## 8             0
## 9            13
## 10            4
## 11            6
## 12           19
## 13            4
## 14           10
## 15            0
## 16            0
## 17           14
```
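
In the formula `rt + errors ~ time`, the measured variables go on the left and the within-subjects factor goes on the right. As a rough illustration of the reverse reshaping, the base R sketch below widens a three-participant subset by hand. It is not the `longToWide()` implementation; `long`, `wide` and the helper `one.level()` are hypothetical names used only here.

```r
# Long-form data for the first three participants (two time points each)
long <- data.frame(
  id     = rep(c("s.1", "s.2", "s.3"), times = 2),
  time   = rep(c("time1", "time2"), each = 3),
  errors = c(7, 15, 10, 7, 16, 7),
  rt     = c(346, 424, 415, 636, 787, 580)
)

# Pull out the rows for one time point and tag each measure with that level
one.level <- function(level) {
  rows <- long[long$time == level, c("id", "rt", "errors")]
  names(rows)[-1] <- paste(names(rows)[-1], level, sep = "_")
  rows
}

# Join the per-level pieces back together, one row per participant
wide <- merge(one.level("time1"), one.level("time2"), by = "id")
wide
```

The resulting columns are `rt_time1`, `errors_time1`, `rt_time2` and `errors_time2`, the same naming pattern that appears in `driving.3`.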

## Exercise 2.5.3: Cutting a continuous variable into categories

• use `cut()` to cut `driving$age` into 3 bins of approximately equal width (i.e. similar age ranges). Save the result to `age.group.1`.
• use `table()` to see how many people fall into each age group, and look at the category names to see how wide each age group is.
• use `quantileCut()` to cut `driving$age` into 3 bins of approximately equal frequency (i.e. a similar number of people in each group). Save the result to `age.group.2`.
• use `table()` again to see how many people fall into each of the new age groups, and look at the category names to see how wide each one is.

## Solution 2.5.3

```
(age.group.1 <- cut( driving$age, 3 ))
```
```
##  [1] (15,24] (33,42] (24,33] (15,24] (24,33] (33,42] (15,24] (24,33]
##  [9] (15,24] (24,33] (33,42] (33,42] (24,33] (15,24] (24,33] (24,33]
## [17] (33,42]
## Levels: (15,24] (24,33] (33,42]
```
```
table( age.group.1 )
```
```
## age.group.1
## (15,24] (24,33] (33,42]
##       5       7       5
```
```
(age.group.2 <- quantileCut( driving$age, 3 ))
```
```
##  [1] (15,27.7]   (32.7,42]   (15,27.7]   (15,27.7]   (32.7,42]
##  [6] (32.7,42]   (15,27.7]   (27.7,32.7] (15,27.7]   (27.7,32.7]
## [11] (32.7,42]   (32.7,42]   (27.7,32.7] (15,27.7]   (27.7,32.7]
## [16] (27.7,32.7] (32.7,42]
## Levels: (15,27.7] (27.7,32.7] (32.7,42]
```
```
table( age.group.2 )
```
```
## age.group.2
##   (15,27.7] (27.7,32.7]   (32.7,42]
##           6           5           6
```
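
The two sets of bins differ because the breakpoints are chosen differently: equal-width bins split the age range into thirds, whereas equal-frequency bins put the breakpoints at the tertiles of the data. The base R sketch below shows both using the 17 ages from the data set. It assumes that cutting at `quantile()` breakpoints is a fair stand-in for what `quantileCut()` does, and the variable names are chosen for this example only.

```r
# Ages of the 17 participants, in the order they appear in the data
age <- c(19, 42, 27, 22, 33, 35, 24, 29, 22, 32, 36, 37, 31, 15, 31, 30, 35)

# Equal width: the range 15 to 42 is split into three intervals of 9 years
equal.width <- cut(age, breaks = 3)

# Equal frequency: breakpoints at the minimum, the tertiles and the maximum
tertiles   <- quantile(age, probs = seq(0, 1, length.out = 4))
equal.freq <- cut(age, breaks = tertiles, include.lowest = TRUE)

table(equal.width)   # counts of 5, 7 and 5
table(equal.freq)    # counts of 6, 5 and 6
```

Here `include.lowest = TRUE` keeps the youngest participant in the first bin, so the first label is printed with a closed bracket, unlike the `quantileCut()` labels above.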

## Exercise 2.5.4: Permuting factor levels

• print out `driving$distractor` and take note of the ordering of factor levels
• use `bars()` to plot means and confidence intervals for RT at time 1 for each distractor
• use `permuteLevels()` to reorder the factor levels for `distractor`.
• print out `driving$distractor` to check that you have successfully reordered the groups, and now use `bars()` to redraw the plot.

## Solution 2.5.4

```
driving$distractor
```
```
##  [1] radio   toddler none    radio   none    toddler none    none
```
```
bars( rt_time1 ~ distractor, driving )
```
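
As a hedged illustration of what reordering factor levels means, the base R sketch below rebuilds a factor from the first eight `distractor` values shown above and changes the order of its levels by indexing the old level names. This is the generic base R idiom rather than the `permuteLevels()` implementation, and the level order in the loaded `driving` data may differ from the alphabetical default assumed here.

```r
# First eight distractor values from the data, as a fresh factor
distractor <- factor(c("radio", "toddler", "none", "radio",
                       "none", "toddler", "none", "none"))
levels(distractor)   # levels are sorted alphabetically by default

# New order: third level first, then the first, then the second
reordered <- factor(distractor, levels = levels(distractor)[c(3, 1, 2)])
levels(reordered)

# The observations themselves are unchanged; only the level order differs
all(as.character(reordered) == as.character(distractor))
```

Changing the level order affects the order in which groups appear in tables and plots, which is why the plot is redrawn after the levels have been permuted.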