getwd(), setwd()
Challenge 1.1 (see also Rcommands.R file for challenges and answers)
TASK: Use setwd() function to navigate to `SoftwareCarpentry_Spr16` folder.
Create Sunday_AM (dir.create()), go to Sunday_AM, save file “Rcommands.R”, make data folder
Esc to return to > when stuck with +
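The folder steps above can be sketched as follows. This is only an illustration: it builds the folders under R's temporary directory (an assumption, so the example does not touch your real files). In the workshop you would start from the `SoftwareCarpentry_Spr16` folder instead.

```r
# Remember where we started so we can come back at the end
start <- getwd()

# Assumption: use a temporary location instead of SoftwareCarpentry_Spr16
base <- file.path(tempdir(), "Sunday_AM")
dir.create(base, showWarnings = FALSE)    # make the Sunday_AM folder

setwd(base)                               # go to Sunday_AM
dir.create("data", showWarnings = FALSE)  # make the data folder inside it
dir.exists("data")                        # TRUE if the folder was created

setwd(start)                              # return to the starting folder
```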
cost <- 34; cost2 <- cost*2; cost <- cost + 2
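A quick sketch of what these three assignments leave behind. The point worth noticing is that `cost2` is computed once, from the value `cost` had at that moment, so changing `cost` afterwards does not update it.

```r
cost <- 34
cost2 <- cost * 2   # uses the current value of cost (34)
cost <- cost + 2    # updating cost does not change cost2

cost    # 36
cost2   # 68
```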
Challenge 1.2
TASK: What will be the value of each variable after each statement in the following code?
mass <- 47.5
age <- 122
mass <- mass * 2.3
age <- age - 20
ls() - list all objects in your environment
rm(objectName) - remove object objectName
rm(list=ls()) - remove all objects, clear environment
?plot
args(setwd)
help(mean)
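A small sketch of how `ls()` and `rm()` behave. The object names `tmpCost` and `tmpTax` are made up for illustration.

```r
tmpCost <- 34
tmpTax <- 2

"tmpCost" %in% ls()   # TRUE: the object is listed in the environment
rm(tmpCost)           # remove just this one object
"tmpCost" %in% ls()   # FALSE: it is gone
"tmpTax" %in% ls()    # TRUE: other objects are untouched
```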
install.packages("knitr") - get package
library(knitr) - load package into R
Let’s take a look at an example dataset, gapminderData.csv
Go to the Sunday_AM/data folder, then download the file.
curl -O https://annawilliford.github.io/2016-01-30-UTA/data/gapminderData.csv
This is a familiar, table-like dataset suitable for different statistical analyses. Much of the data you are working with can probably be represented in this format. Here we have a record of population size, life expectancy and other kinds of information for different countries.
Any ideas about how this kind of dataset can be represented in R? In R this whole dataset is a single object (or data structure) built out of smaller pieces. Think of a castle built of legos. We want to understand how to build a castle and how to take it apart. Our example of a castle is this dataset (in R it is known as a data frame). It is relatively simple, but this is as far as we will go today. Let’s start from the smallest lego pieces and build our dataset from them.
Let’s assign the value 45 to a variable age. We just created the smallest lego piece (smallest object) in R:
age <- 45
length(age)
## [1] 1
str(age)
## num 45
Variables can hold values of various types. Most common data types:
For example: What data type is stored in the score variable?
score<-79
is.integer(score)
## [1] FALSE
typeof(score)
## [1] "double"
typeof(is.integer(score))
## [1] "logical"
The last expression is an example of a nested function call. Nested functions are very common in R, but they can be difficult to read at first. You can always split a nested call into a series of single function calls. Remember that the variable inside the innermost parentheses is the argument (input) for the function that will be evaluated first.
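As a sketch of splitting a nested call, here is a different expression from the one used in the challenge that follows. The variable names `inner` and `result` are only illustrative.

```r
# Nested form: as.integer() runs first, then typeof() sees its result
typeof(as.integer(3.7))

# The same computation as two single calls
inner <- as.integer(3.7)   # drops the fractional part, giving 3
result <- typeof(inner)    # "integer"
result
```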
Challenge 2.1: Learn how to read the output of nested help functions
TASK: Break the following expression into multiple single function calls. You will need to assign the output of each function to a variable that will serve as an input(argument) for the next function. What is the value of each variable? What does each function do? Assign: `score<-79`
is.logical(is.numeric(typeof(is.integer(score))))
Challenge 2.1: Answer
score <- 79
step1 <- is.integer(score)
print(step1)
## [1] FALSE
step2 <- typeof(step1)
print(step2)
## [1] "logical"
step3 <- is.numeric(step2)
print(step3)
## [1] FALSE
step4 <- is.logical(step3)
print(step4)
## [1] TRUE
## Or as a single step:
print(is.logical(is.numeric(typeof(is.integer(score)))))
## [1] TRUE
Sometimes you will need to convert between data types. There are functions that do that: as.integer(), as.character(), and so on. Conversion between data types is not always possible - why? Let’s see what happens here:
score <- 79
typeof(score)
## [1] "double"
score <- as.integer(score)
typeof(score)
## [1] "integer"
#but can we convert character to integer?
name <- "Sasha"
typeof(name)
## [1] "character"
name <- as.integer(name)
## Warning: NAs introduced by coercion
# the data type will be changed, but no value will be assigned
typeof(name)
## [1] "integer"
print(name) # NA = missing value
## [1] NA
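A conversion succeeds when the value can be interpreted in the target type. The sketch below shows a few cases that work next to the one that does not. `suppressWarnings()` is used only to keep the output quiet.

```r
as.integer("42")      # a string that looks like a number converts: 42
as.integer(3.9)       # the fractional part is dropped: 3
as.character(79)      # numbers can always become text: "79"
as.numeric(TRUE)      # TRUE becomes 1

# Text that is not a number cannot be converted, so the result is NA
bad <- suppressWarnings(as.integer("Sasha"))
is.na(bad)            # TRUE
```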
Small objects can be combined to build larger objects. Look at the gapminder dataset. Our smallest objects can represent a single element in the dataset, like an individual year or an individual country, but what would be the simplest object that you can make with multiple elements?
c()
###let's make a vector
v<-c(1:3, 45)
v
## [1] 1 2 3 45
##examine object
typeof(v) # tells you the data type of vector elements
## [1] "double"
length(v) # what does this do?
## [1] 4
str(v) # tells you the structure of the object VERY USEFUL
## num [1:4] 1 2 3 45
##view
head(v, n=2) #look at the first 2 elements
## [1] 1 2
#what would `tail()` do?
tail(v, n=3) #look at the last 3 elements
## [1] 2 3 45
##manipulate
v <- c(v,56) #add element to vector
#vectorization: no loop is required to perform operation on each vector element
v1 <- 2*v # multiply each vector element by 2
v1
## [1] 2 4 6 90 112
# let's try to add vectors
v2<-c(1:5)
v3 <- v1+v2
v3
## [1] 3 6 9 94 117
# you can name vectors; find out what `names()` function does
# change data type
v3 <- as.character(v3) #also known as coercion
str(v3)
## chr [1:5] "3" "6" "9" "94" "117"
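One way to explore the `names()` function mentioned above. This is a minimal sketch; the vector `price` and its labels are made up for illustration.

```r
price <- c(4.99, 2.99, 1.89)
names(price)                                # NULL: no names yet

names(price) <- c("chicken", "soup", "tea") # attach a name to each element
names(price)                                # "chicken" "soup" "tea"
price["soup"]                               # look up an element by its name
```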
matrix() function
m <- matrix(c(1:18), 3,6)
m
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 4 7 10 13 16
## [2,] 2 5 8 11 14 17
## [3,] 3 6 9 12 15 18
# try functions that we used for vectors - do they work on matrices?
# new to 2D structures
dim(m) # tells you number of rows and columns in your matrix
## [1] 3 6
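A short sketch of how some vector functions behave on a matrix, plus indexing with two positions. It rebuilds the same matrix so that it runs on its own.

```r
m <- matrix(1:18, 3, 6)

length(m)   # 18: total number of elements, not rows or columns
typeof(m)   # "integer": like a vector, all elements share one type
m[2, 3]     # 8: element in row 2, column 3
nrow(m)     # 3
ncol(m)     # 6

# By default values fill column by column; byrow = TRUE fills row by row
m2 <- matrix(1:18, 3, 6, byrow = TRUE)
m2[2, 3]    # 9
```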
factor()
f <- factor(c("M","F","F","F")) #4 observations, the first one for male, other 3 for female
str(f) # what are these numbers in the output?
## Factor w/ 2 levels "F","M": 2 1 1 1
typeof(f) # factors are of integer data type! Levels are numbered in alphabetical order
## [1] "integer"
#sometimes it is important to reorder levels
f <- factor(f, levels=c("M","F"))
str(f)
## Factor w/ 2 levels "M","F": 1 2 2 2
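The numbers in the `str()` output are integer codes that point into the set of levels. A small sketch that pulls the two parts apart (the name `f2` is illustrative):

```r
f <- factor(c("M", "F", "F", "F"))

levels(f)        # "F" "M": the labels, in alphabetical order by default
as.integer(f)    # 2 1 1 1: the integer codes stored underneath
table(f)         # counts per level

f2 <- factor(f, levels = c("M", "F"))
as.integer(f2)   # 1 2 2 2: same data, different codes after reordering
```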
list() function
l<-list("Afghanistan", 1952, 8769855)
print(l)
## [[1]]
## [1] "Afghanistan"
##
## [[2]]
## [1] 1952
##
## [[3]]
## [1] 8769855
typeof(l)
## [1] "list"
str(l)
## List of 3
## $ : chr "Afghanistan"
## $ : num 1952
## $ : num 8769855
length(l)
## [1] 3
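Unlike a vector, a list keeps each element's own type. The sketch below also shows the two ways of indexing a list. The element names in `named` are made up for illustration.

```r
l <- list("Afghanistan", 1952, 8769855)

l[2]             # single brackets return a list of length 1
l[[2]]           # double brackets return the element itself: 1952
typeof(l[2])     # "list"
typeof(l[[2]])   # "double"

# Assumption for illustration: list elements can also be given names
named <- list(country = "Afghanistan", year = 1952)
named$year       # 1952
```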
CHALLENGE 2.2
TASK: Try to create a list named `myOrder` that contains the following
data structures as list elements:
-- Element 1 is a character vector of length 4 that lists the menu items
you ordered from the restaurant: chicken, soup, salad, tea.
-- Element 2 is a factor that describes menu items as "liquid" or "solid".
-- Element 3 is a vector that records the cost of each menu item:
4.99, 2.99, 3.29, 1.89.
*Hint: Define your elements first, then create a list with them.
CHALLENGE 2.2: Answer
menuItems<-c("chicken", "soup", "salad", "tea")
menuType<-factor(c("solid", "liquid", "solid", "liquid"))
menuCost<-c(4.99, 2.99, 3.29, 1.89)
myOrder<-list(menuItems, menuType, menuCost)
Now apply the following functions to the list you created. Try to predict the output before you run the command.
length(myOrder)
str(myOrder)
print(myOrder)
Let’s go back to the gapminder dataset. Could you make an informed guess about how this data structure can be represented in R?
Yes! It is a list of vectors of equal length. Let’s look at our myOrder list to see if we can make a data frame out of it. Is the list we just made suitable for a data frame? Yes, the elements of the list are vectors of equal length (although the vectors do not have to be list elements to be combined into a data frame).
Previously we used list() to combine our elements:
myOrder<-list(menuItems, menuType, menuCost)
Now let’s combine them with the data.frame() function. How? Give the result a different name, myOrder_df.
myOrder_df<-data.frame(menuItems, menuType, menuCost)
#now view it!
myOrder_df
## menuItems menuType menuCost
## 1 chicken solid 4.99
## 2 soup liquid 2.99
## 3 salad solid 3.29
## 4 tea liquid 1.89
#and check with `str()` - anything different compared to `str(myOrder)`
#output? What is happening with data types?
str(myOrder_df)
## 'data.frame': 4 obs. of 3 variables:
## $ menuItems: Factor w/ 4 levels "chicken","salad",..: 1 3 2 4
## $ menuType : Factor w/ 2 levels "liquid","solid": 2 1 2 1
## $ menuCost : num 4.99 2.99 3.29 1.89
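The `str()` output above shows `menuItems` as a factor. That matches R versions before 4.0.0, where `data.frame()` converted character vectors to factors by default. From R 4.0.0 on, the default is `stringsAsFactors = FALSE`, so you would see `chr` instead. A sketch that makes the choice explicit (the names `df_chr` and `df_fct` are illustrative):

```r
menuItems <- c("chicken", "soup", "salad", "tea")
menuType <- factor(c("solid", "liquid", "solid", "liquid"))
menuCost <- c(4.99, 2.99, 3.29, 1.89)

# Keep character vectors as characters
df_chr <- data.frame(menuItems, menuType, menuCost, stringsAsFactors = FALSE)
class(df_chr$menuItems)   # "character"

# Convert character vectors to factors, as in the output shown above
df_fct <- data.frame(menuItems, menuType, menuCost, stringsAsFactors = TRUE)
class(df_fct$menuItems)   # "factor"

# A vector that is already a factor stays a factor either way
class(df_chr$menuType)    # "factor"
```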
Let’s talk about how to take your dataset apart. In general, you can access every element of your data set. You must be able to do that to manipulate and analyze your data. There are three general ways to subset the data:
### 1. By position index
## 1a. Use `[]` operator
v<-c(1:10)
v
## [1] 1 2 3 4 5 6 7 8 9 10
## see what happens here
v[2]
## [1] 2
v[c(3:6)]
## [1] 3 4 5 6
v[-c(3:5)]
## [1] 1 2 6 7 8 9 10
## 1b. Use `which` function - extracts the position indices of the
## elements with a specified value:
v<-c(1,3,5,5,7,5)
v1<-v[which(v==5)] #get vector elements equal to 5
v1 #Can you explain the output? Try `which(v==5)`
## [1] 5 5 5
## the above works for lists too, notice that [] returns list, use [[]]to get vector
## try subsetting myOrder list we created above
##for 2D structures like matrices and dataframes provide 2 indices [row, column]
myOrder_df[1:3, ] #gets first 3 rows
## menuItems menuType menuCost
## 1 chicken solid 4.99
## 2 soup liquid 2.99
## 3 salad solid 3.29
### 2. By name:
## Use `$` operator to extract columns as vectors
myOrder_df$menuType
## [1] solid liquid solid liquid
## Levels: liquid solid
### 3. By logical vector index: selects elements corresponding to TRUE values
### of logical vector:
v
## [1] 1 3 5 5 7 5
v1<-v[v==5]
v1
## [1] 5 5 5
# how does the above work? Try only `v==5`
v==5 # returns logical vector
## [1] FALSE FALSE TRUE TRUE FALSE TRUE
##Use `myOrder_df` dataframe: select rows that satisfy various conditions
##Display logical vector to understand the output
df1<-myOrder_df[myOrder_df$menuType=="solid", ]
df1
## menuItems menuType menuCost
## 1 chicken solid 4.99
## 3 salad solid 3.29
df2<-myOrder_df[myOrder_df$menuCost>3, ]
df2
## menuItems menuType menuCost
## 1 chicken solid 4.99
## 3 salad solid 3.29
##Can you explain the output generated here?
df3<-myOrder_df[myOrder_df$menuType=="solid"]
df3
## menuItems menuCost
## 1 chicken 4.99
## 2 soup 2.99
## 3 salad 3.29
## 4 tea 1.89
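One way to read the last result: with a single index and no comma, a data frame is treated as a list of columns, so the logical vector picks columns instead of rows. A sketch comparing the two forms (the names `keep`, `byRow` and `byCol` are illustrative):

```r
menuItems <- c("chicken", "soup", "salad", "tea")
menuType <- factor(c("solid", "liquid", "solid", "liquid"))
menuCost <- c(4.99, 2.99, 3.29, 1.89)
myOrder_df <- data.frame(menuItems, menuType, menuCost)

keep <- myOrder_df$menuType == "solid"
keep                          # TRUE FALSE TRUE FALSE

byRow <- myOrder_df[keep, ]   # with the comma: keep selects rows 1 and 3
byCol <- myOrder_df[keep]     # without it: keep selects columns 1 and 3

nrow(byRow)                   # 2
names(byCol)                  # "menuItems" "menuCost"
```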
Let’s return to our gapminder dataset that you have downloaded to Sunday_AM/data/gapminderData.csv
Before we examine our data, let’s read(load) this dataset into R. There are multiple ways to read data into R. For table-like formats, here are 2 popular methods:
Method 1: Use read.csv() function
myData<-read.csv("data/gapminderData.csv")
myData
View only the first rows with the head() function:
head(myData)
Method 2: Use read.table() function
Challenge 3.1 Learn how to read data into R
TASK: Load our gapminder dataset into R using read.table() function
*Hints: 1. Use help functions to read about read.table():
`?read.table()` or `args(read.table)`
2. It might be helpful to compare `args(read.csv)` and `args(read.table)`.
3. Look at `dim(myData)` output after each try. Is it different? Why?
Challenge 3.1: Answer
myData<-read.table("data/gapminderData.csv", header=TRUE, sep = ",")
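To see why `header` and `sep` matter, here is a sketch on a tiny made-up table written to a temporary file. The file and its values are assumptions for illustration, not the gapminder data.

```r
# Assumption: a small toy CSV standing in for data/gapminderData.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,year,pop",
             "Xland,1952,100",
             "Yland,1957,200"), tmp)

d1 <- read.csv(tmp)                               # header and comma are defaults
d2 <- read.table(tmp, header = TRUE, sep = ",")   # same settings, stated explicitly
d3 <- read.table(tmp, header = TRUE)              # default separator is whitespace

dim(d1)   # 2 3
dim(d2)   # 2 3
dim(d3)   # 2 1: each line was read as a single value
```

Comparing `dim()` after each attempt, as the hints suggest, is a quick way to notice that a file was split on the wrong separator.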
Now we know how to read dataframes into R. Let’s use this dataset to go over what we talked about this morning. Some of the details were not covered in class, but it is good to know what else you can do with your dataset. Explore!
Challenge 3.2 Play with gapminder dataset
TASK: Answer the following questions about `myData` object
1. Overall object structure? What function will you use?
2. Can you tell what is the data type of elements in each column?
3. Can you extract 3rd and 5th column of the dataset?
4. Can you extract the list of countries in this dataset?
5. Can you get a part of this dataset that includes information about Sweden?
6. Can you extract all countries for which life expectancy is below 70?
7. Can you make a new column that contains population in units of millions of people?
Challenge 3.2: Answer
1. str()
2. typeof() # typeof() will give you "list" - lets you know that dataframe
   # is really a list of vectors; examine the output of str() for details about
   # column data types
3. myData[ , c(3,5)] # you can use head(myData[ , c(3,5)]) to view top 6 rows
   # of the output
4. names(table(myData$country)) # this is a nested function -> break it up
   # to see what each function does; use help(function) to get help
5. myData[myData$country=="Sweden", ] # rows are selected based on logical
   # vector TRUE values - can you view this vector?
6. myData[myData$lifeExp<70, ] # similar to Q5.
7. myData$PopM<-myData$pop/10^6 # simple way to add a column to a dataframe.
   # You can verify that you added a column with: `head(myData)`