Workshop website:
Check software installation (follow steps below AFTER you complete installation - see workshop website for how to install)
Bash: open a Git Bash terminal, type `bash --version`. You should get output indicating the version of the bash shell
Text Editor: open a terminal, type `npp new.txt` if using Notepad++; type `edit new.txt` if using TextWrangler
Git: open a Git Bash terminal, type `git --version`. You should get output indicating the version of git
R: open a Git Bash terminal, type `R --version`. You should get output indicating the version of R
RStudio: open the application; you should see 3 or 4 panes.
Shell Cheatsheet
History of Shell commands:
Challenge 1: Navigating
Note where you are in your directory hierarchy. Now aimlessly (randomly) move away from this location at least 3 times (i.e., 3 'cd' commands). Determine where you are and navigate back to your location using 1 (!) command. Half the room is using the relative paths and half is using the absolute paths.
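One possible way to work through this challenge, as a sketch only. The sandbox directories (`home/project`, `other/deep`) are made up for illustration; on your machine you would use your real directories.

```shell
# Hypothetical sandbox so the example does not touch real files
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/project" "$sandbox/other/deep"
cd "$sandbox/home/project"

start=$(pwd)        # note where you are (this is an absolute path)

cd ..               # wander away: 3 cd commands
cd ..
cd other/deep
pwd                 # determine where you are now

# Option 1: one command, absolute path
cd "$start"
back_abs=$(pwd)

# Wander away again, then Option 2: one command, relative path
cd "$sandbox/other/deep"
cd ../../home/project
back_rel=$(pwd)
```

Note that `cd -` only returns to the directory you were in just before the last `cd`, so it does not bring you back after three moves.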
Shell Data Files:
Commands you should definitely master!
- whoami # prints username
- pwd # print working directory path
- echo # print text to the terminal
- ls # list the contents of a directory
- cd # change directory
- mkdir # make directory
- touch # create empty file
- cat # view file/concatenate files
- less # controlled view of file
- mv # move/rename file
- cp # copy file
- rm # delete file
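A short practice session that strings several of these commands together might look like the sketch below. The directory and file names (`notes`, `draft.txt`, and so on) are invented for the example, and the `>` redirect, which is not in the list above, is used to put text into a file.

```shell
# Work in a throwaway directory so nothing real is changed
workdir=$(mktemp -d)
cd "$workdir"
pwd                              # confirm where we are

mkdir notes                      # make a directory
cd notes                         # move into it
touch draft.txt                  # create an empty file
echo "first line" > draft.txt    # write text into the file
cat draft.txt                    # view the file

cp draft.txt backup.txt          # copy the file
mv draft.txt final.txt           # rename the file
ls                               # list: backup.txt and final.txt
rm backup.txt                    # delete the copy (there is no undo!)

listing=$(ls)                    # what is left in the directory
```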
Commands I didn't get to in depth, but that you should also try to master!
- wc # word count
- head/tail # display start/end of file
- cut # extract fields (columns) from file
- sort # sort file
- uniq # collapse adjacent duplicate lines (sort first to get unique lines only)
- grep # select rows based on content
- ssh # connect to a remote computer
- scp # transfer files to another computer
- awk # software for manipulating piped data more powerfully in shell
- for loops
- environment variables
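To get a feel for how these tools combine with pipes, here is a small sketch. The file `sample.csv` and its contents are made up for this example and are not the workshop data.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Made-up data: a header line plus five rows
printf 'country,continent,year\nAlbania,Europe,2007\nKenya,Africa,2007\nFrance,Europe,2007\nChad,Africa,2007\nJapan,Asia,2007\n' > sample.csv

wc -l sample.csv          # count lines (including the header)
head -n 1 sample.csv      # show just the header

# Skip the header, pull out column 2, then count rows per continent
tail -n +2 sample.csv | cut -d, -f2 | sort | uniq -c

# How many distinct continents are there?
n_continents=$(tail -n +2 sample.csv | cut -d, -f2 | sort | uniq | wc -l)

# How many lines mention Europe?
europe=$(grep -c 'Europe' sample.csv)

echo "continents: $n_continents, Europe rows: $europe"
```

`sort` comes before `uniq` because `uniq` only collapses duplicate lines that are next to each other.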
Extra Challenge we didn't get to:
Challenge: Think of a couple questions you could answer using the data we provided you (see above). Answer one using the 'ByMeasure' data and the other using the 'ByCountry' data. You will write a script that loops through these files and extracts the answer from each, writing it to one output file. Repeat and master these tools!
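As a starting point, a script for this challenge could follow the pattern sketched below. The file names (`ByCountry_A.csv`, `ByCountry_B.csv`), their columns, and the question asked (which year has the highest value?) are all assumptions made for illustration; adapt them to the real files and to your own question.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Made-up stand-ins for the workshop files
printf 'year,value\n2000,10\n2005,30\n2010,20\n' > ByCountry_A.csv
printf 'year,value\n2000,7\n2005,5\n2010,9\n' > ByCountry_B.csv

: > answers.txt                  # start with an empty output file

for f in ByCountry_*.csv
do
    # Skip the header, sort by column 2 (numeric, largest first), keep the top row
    top=$(tail -n +2 "$f" | sort -t, -k2,2nr | head -n 1)
    echo "$f,$top" >> answers.txt    # append one answer per file
done

cat answers.txt
```

Using `>>` inside the loop appends to the output file, so every file contributes one line instead of overwriting the previous answer.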
Linux nano: $ git config --global core.editor "nano -w"
Linux Gedit: $ git config --global core.editor "gedit -s"
Mac TextWrangler: $ git config --global core.editor "edit -w"
Windows Notepad++ (Win) $ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
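git config color.ui "auto" # without --global: applies to the current repository only (run it inside a repository)
git config --global color.ui "auto" # with --global: applies to all your repositories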
git config --list
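The difference between a repository-level setting and a `--global` one can be checked with a sketch like the following. It assumes git is installed, points `HOME` at a throwaway directory so your real `~/.gitconfig` is untouched, and deliberately uses a different local value (`always`) purely to make the two levels distinguishable.

```shell
# Throwaway HOME and repository (assumptions for this demo only)
export HOME=$(mktemp -d)
repo=$(mktemp -d)
cd "$repo"
git init -q

git config --global color.ui "auto"    # written to $HOME/.gitconfig
git config color.ui "always"           # written to .git/config of this repository

global_val=$(git config --global --get color.ui)
local_val=$(git config --local --get color.ui)
effective=$(git config --get color.ui)  # the value git actually uses here

echo "global=$global_val local=$local_val effective=$effective"
```

The repository-level value takes precedence over the global one inside that repository.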
Link of the md file
Create another file called abc.txt and push it to Github.
## All the commands from the first session
mkdir swc
Morning: R basics and scripts
R data:
curl -O https://annawilliford.github.io/2016-01-30-UTA/data/gapminderData.csv
to download directly from R:
system("curl -O https://annawilliford.github.io/2016-01-30-UTA/data/gapminderData.csv")
or install and use the R curl package: https://cran.r-project.org/web/packages/curl/vignettes/intro.html
Clearing the R console with a command, rather than Ctrl-L, is not very intuitive. Here is the answer for how to do it.
'Data Type' is a fairly ambiguous term, so it is good to understand what it can mean. It is also important to understand these basic concepts in general.
- It can refer to the fundamental data types (numeric, string, etc.). These are the most basic ways a single piece of data is stored, and these terms are used across essentially all programming languages. See http://www.r-tutor.com/r-introduction/basic-data-types
- It can also refer to the basic ways that data of any type (numeric, string, etc.) are stored together for use. This is distinct and includes things like vectors, lists, matrices, etc. These are generally used in programming languages as well and apply outside of R. See https://www.tutorialspoint.com/r/r_data_types.htm
0- vs. 1-indexing
- 0-indexing (includes Python), the first element of a vector, list, etc. is indexed as 0, and last element is n-1
- 1-indexing (includes R), the first element of a vector, list, etc. is indexed as 1, and last element is n
- Here is a more graphical way of looking at this. In this application, it is referring to nucleotide strings but the principles are the same. https://www.biostars.org/p/84686/
menuItems<-c("chicken", "soup", "salad", "tea")
menuType<-factor(c("solid", "liquid", "solid", "liquid"))
menuCost<-c(4.99, 2.99, 3.29, 1.89)
myOrder<-list(menuItems, menuType, menuCost)
myOrder_DF<-data.frame(menuItems, menuType, menuCost)
You can subset dataframes or matrices by specifying the rows or columns that you want in brackets:
myOrder_DF[c(1,3), ] # This gives rows 1 and 3 (use the data frame; myOrder is a list and has no rows/columns)
myOrder_DF[, c(1,3)] # This gives columns 1 and 3
Inside the brackets it is always [rows, columns]
# My First R Script
# Location of file
filename <- "gapminderData.csv"
# Read in data file
gapminder <- read.csv(filename)
# View data
View(gapminder)
# Select the rows of the country Albania
albaniaData <- gapminder[gapminder$country == "Albania", ]
# GDP per cap of Albania
albaniaGDP <- albaniaData$gdpPercap
Lesson 1: https://www.dropbox.com/s/3ymvdg0fvbacxje/Sunday_R1.R?dl=0
First R Script: https://www.dropbox.com/s/2su6gcq4suovqkp/First_Script.R?dl=0
Afternoon: Plotting and data analysis!!!!
CRAN task views: https://cran.r-project.org/web/views/
1) start code with a header
2) run import statements directly after the header
3) setwd() breaks stuff
4) use # to section off code
5) if you have functions, put them at the top; don't bury them in your code
6) be consistent
7) if you have a script >100 lines you're doing it wrong
8) have a directory structure that all projects have
9) peer review
10) use git and version control for your code
foo <- faithful[ ,2]
Mode <- function(x) {
  ux <- unique(x)                          # distinct values in x
  ux[which.max(tabulate(match(x, ux)))]    # the value that occurs most often
}
dat <- read.csv(url("http://coleoguy.github.io/SWC/scores.csv"))
color brewer
Heath's website