class: left, middle, inverse, title-slide # An Introduction to R ## ED 217 B Week 2 ### Daniel Katz ### 1/12/2021 --- class: middle class: middle ### General Strategy + My goal is to get us up and running - doing is the best learning + Provide you with some quality resources for continuing to learn via links like [_**this**_](https://stackoverflow.com/tags/r/info) + I won't emphasize code-style best practices except in a few cases (but these will only be for cases that make life easier) + I think it's important to focus on ideas and functions + For more on style/best practices, [_**click here**_](http://adv-r.had.co.nz/Style.html) + Learning how to trouble shoot is important --- class: middle ## General Outline for Today .pull-left[ 1. What is the relationship between probability, odds, and log odd (logits)? 1. Intro: How should I think about and use R and Rstudio? 1. R Basics 1. Reading in data, subsetting, intro to dplyr (and piping) ] --- ## Next few weeks 1. Next week - more on plotting, more on working with data frames. 1. Using `Rmarkdown` 1. Getting introduced to TAM -- .pull-right[There's a **TON** to R. I don't expect expertise after this session, but I hope you know where to start! <img src="poppins_bag.gif" style="display: block; margin: auto;" /> ] --- class: middle # What is R? + R is a software and scripting language + Like any other language it has syntax (for now think, "rules" of language) + R comes with a variety of built-in functions + R is desiged for statistics, but you can do lots with it + build websites + write documents in Latex + Create slides for presentations, etc. --- class: middle # You can learn R! + It is quickly becoming the statistical language of choice in academic and non-academic circles -- + `R Packages` are making tasks more accessible to non-programmers like me -- + R allows for sharing code: we won't cover Github, but please [_**check it out**_](https://happygitwithr.com/rstudio-git-github.html) -- + Scripting forces us to slow down and think -- + We can do everything in one place (data cleaning, plotting, modeling, etc) --- # It makes really (fairly?) cool visualizations <img src="r_introslides_files/figure-html/unnamed-chunk-3-1.png" width="504" style="display: block; margin: auto;" /> --- <img src="r_introslides_files/figure-html/unnamed-chunk-6-1.png" width="648" style="display: block; margin: auto;" /> --- class: middle # Probability, odds, logits + In this class, we'll be operating a lot with probability, odds, and log odds -- + As mentioned in course slides, the link function in logistic regression (and the Rasch model is a special case) is the logit link -- - We still want to work in the linear model framework - we do this with the log of the odds `$$\log(\frac{p_i}{1-p_i}) = \beta_0 + \sum_{n=1}^N \beta_n x_{in}$$` --- class: middle # Solving for `\(p_i\)` Writing, `\(\beta_0 + \sum_{n=1}^N \beta_n x_{in}\)` as `\(\theta\)` so `$$\log(\frac{p_i}{1-p_i}) = \theta$$` .pull-left[ 1. Exponentiate both sides. 2. Multiply both sides by `\({1-p_i}\)` 3. Distribut the left side 4. Add `\(e^\theta*p_i\)` 5. Factor out `\(p_i\)` on the left 6. Divide both sides by `\(1+e^\theta\)` ] .pull-right[ 1. `\(\frac{p_i}{1-p_i} =e^\theta\)` 2. `\(p_i = e^\theta(1-p_i)\)` 3. `\(p_i = e^\theta-e^\theta*p_i\)` 4. `\(p_i + e^\theta*p_i = e^\theta\)` 5. `\(p_i(1+e^\theta)= e^\theta\)` 6. `\(p_i = \frac{e^\theta}{1+e^\theta}\)` ] --- class: middle Probability + Perhaps the most basic defintion (frequency/long run frequency) + proportion of times an event occurs given a number of trials + Often not that simple to interpret + Cannot be outside the range, [0,1] -- + `p` is the probability of an event occurring + `1-p` is the probability of an even not occurring -- + What does it mean to say a coin has a 50% chance of landing heads? Or a student has a 30% chance of answering an item correctly? -- + Usually invoking the long run --- <img src="r_introslides_files/figure-html/unnamed-chunk-7-1.png" width="648" style="display: block; margin: auto;" /> --- class: middle # Odds + Odds are basically the ratio of an event happening to an event not happening -- + Formally `odds = p/(1-p)` + If `p = .6` + Odds are `1.5 (.6/.4)` - the event (a candidate winning an election) is 1.5 times more likely to happen than not -- + What are the odds if the probability is .5? --- class: middle # Log of the odds -logits + Take the natural log (ln) of the odds + `\(log_{e}(odds)\)` or `\(ln(odds)\)` + Most commonly (and in R): `\(log(odds)\)` + Odds range from `\((0, \infty)\)` not great for linear modeling + log odds range from `\((-\infty, \infty)\)` --- class: middle # Formulas for going the other way: logits to probability 1. Exponentiate - `$$e^{logit} = e^{log_e(odds)} = odds$$` 2. Get the probability `$$\frac{odds}{1+odds}$$` --- ## Open Up RStudio --- background-image: url(introconsole.png) background-size: contain background-position: center --- background-image: url(scriptconsole.png) background-size: contain background-position: center --- class: middle ## Introducing "Scripts" + Scripts are (essentially) RStudio's text editor. **Work in Scripts**. They're saved text that you _can_ edit + Code in the console doesn't get saved and you _cannot_ edit -- + The primary aim of working in scripts is reproducability, transparency, being able to run code from start-to-finish without intervening -- + Rely on being able to re-run your script as if you have no memory of what you had done before when you return to the script the next day (or whatever) --- # Running the code in a script <img src="run script.jpg" width="1820" style="display: block; margin: auto;" /> --- ## Creating a new script in RStudio .pull-left[ 1. File > New File > R Script + Or, ctrl+shft+N + Or, below file, hit the icon that looks like: <img src="createscript.jpg" width="51" /> 2. To run a line of code in a script + put your cursor on that line, + run it, via ctrl/cmd + enter. + Or the `run` button in the top right of the script ] -- .pull-right[Make sure it works: 3. Type `1+1` in a script and run it. + What happens? Where does it output? 4. There are multiple ways to run more than one line of code at a time + Highlight the relevant code, run + Or select the various options under "Run"] -- .center[Write two lines of code (such as `1+1` and `2+2`)] -- --- class: middle ## Commenting *Commenting is essential.* + R will recognize any line after `#` as a comment For instance: ```r # I ran this to see what would happen factorial(4) ``` ``` ## [1] 24 ``` + Add more comments than you think you need --- # Vocabulary | R Word | Translation |----------|------------------ | Script | Document containing code to be run | Console | Output Window/Printer (but you can also run code here) | Environment| The place where things you create in R, live (metaphorically) | Working Directory| The folder where your current R session lives, where output files go --- class: middle ## Working Directory (We'll return to this in more detail later) + The notion of the <mark>working directory</mark> isn't unique to R + Directories are just folders on your computer with a certain `path` + For instance, your `desktop` is a directory + A folder within `Documents` is a directory + A `working directory` is just where you tell `R` that you'll be working + when you load files, it knows to look to a certain folder + When you export files, `R` exports automatically to the working directory --- class: middle ## File paths + A path to a Word doc on my desktop looks like: `C:/Users/katzd/Desktop/myfile.docx` + A path to a file in a folder called, `thesis` in the `Documents directory` looks like `C:/Users/katzd/Documents/thesis/myfile.docx` + But these are long! + Instead, we'll tell `R` where to read files from, where to export, etc, via `projects` --- class: middle ## File Paths: Getting the Wd... + To see the currect working directory, simply run the code `getwd()` + To set the working directory `setwd("filepath")` (the filepath has to be in quotes) + It's often useful to set the working directory at the top of your script **via menu options**: Session ➡️ Set Working Directory ➡️ Choose Directory (copy and paste outputted code to the top of your script) + Some people don't like this practice: [**Link**](https://whattheyforgot.org/safe-paths.html) + Run: `getwd()` --- .middle[### If you've made any comments or would like to save your current script, please go ahead an do-so now (file ➡️ save ➡️ where you want to save it ➡️ name)] --- ## Creating RStudio Projects ### Projects are an RStudio staple .pull-left[1. Once open, click File (top left) ➡️ New Project 2. New Directory ➡️ New Project ➡️ Create Project as a Subdirectory of: click `browse`. 3. Choose where you'd like to put this session's R work (for instance, on your desktop in a folder called, `ED217B Lab 1`) - or even one level down in a new folder within `ED217B Lab 1`) 4. `Directory Name` ➡️ Name this R project - `IntroR` (no spaces; It'll make your life easier) ➡️ Create Project.] .pull-right[ <img src="newproject.jpg" width="800px" style="display: block; margin: auto;" /> ] --- # Setting up a nice working directory -- 1. There are various philosophies about setting up your working directory. Within your working directory... 1. Set up a folder called "scripts" or similar -- 1. Set up a folder called "data" -- 1. Set up a folder called "output" or similar Finally...time to get our feet wet... --- # Getting set up. 1. Download the R script `Intro_to_R1.R` 1. Place `Intro_to_R1.R` in your `scripts` subdirectory. --- ## Variable Assignment 1. One of the most fundamental aspects of R (really any language) is assigning `variables` or `objects`, + giving something a name so you can + store it + work with it in R. + R uses `<-` (you can also use for Windows: "alt" and "-" or on a Mac: "option" and "-") 2. Now, we can store objects in R and work with the objects! 3. Note, none of this requires an external package, just through `base R`. For a nice base R cheat sheet, [_**Click**_](https://paulvanderlaken.files.wordpress.com/2017/08/base-r-cheetsheat.pdf) -- 4. When we start assigning variables, note what changes in your R session (where do you see the variables go?) --- class: middle ### For instance, run the code below: ```r x <- 2 ``` ```r #What's the value of y? y <- x+2 ``` 1. What is in your Global Environment, now? 2. What was outputted in the console? 3. Now run `x <- 3`. What is the value of x? What is the value of `x+2` 4. Just type `x` and run --- class: middle ## The Environment + In the right corner, under `Global Environment`, you should now see where R is "storing" `x` and `y` and their values. + The environment is also where we'll see what datasets, in R speak, `dataframes` that we have loaded or created, or other objects + If you don't assign something to a variable, it won't be stored --- class: middle --- # R and Packages + Think of `packages` as very specialized bundles of software. + They're collections of functions developed by the "community" -- + These packages have documentation if they're on `CRAN: The Comprehensive R Archive Network` (and even sometimes if they're not) -- + You don't need all the packages! -- + Packages add functions to base R -- + For instance, the `psych` package is built with tools commonly needed in the social sciences --- class: middle ## Packages, cont'd + When you `install.packages()`, R will search for a package on `CRAN` and download it to your computer (have to be connected to the internet) + To see where it's downloaded, run `.libpaths()` + Each time you start a new R session, you have to load a package via `library()` but you don't have to install each time --- ## Getting help + First and foremost, you can see installed packages in the bottom-right pane: <img src="packages.jpg" width="868" /> + Click on a package to see the documentation --- ## Other Help + You can also type `?packagename` such as `?dplyr` + To get help with a base R function, you can type ?function + To get help with a package function, you can type: `?packagename::function` such as `?dplyr::select` where select is a particular function in R. + Under the <mark>Help</mark> tab in R studio (next to `Tools` at the top), exists a few resources as well, including "cheat sheets" + Also, Google (we'll talk about this more)! --- # Run the examples in documentation, usually at the bottom of the section to see how something works! <img src="doc_examples.jpg" width="1569" style="display: block; margin: auto;" /> --- # Operators in R |Operation|Operator in R |Example| |---------|------|-------|-------| |Addition |`+` |`4+3` |Subtraction|`-`| `4-3` |Multiplication|`*`| `4*3` |Exponentiation | `**`| `4**3` |Division | `/`| `12/4` |Matrix multiplication|`%*%`| `a %*% b`| **Challenge: Aside from the matrix multiplication, make sure all of these work in `R`** (use a script!) --- ## Output + R `returns a value` in the `console` + It'll look like this: <img src="output_slides.jpg" width="1809" /> ***Challenge***: Order of operations: Run `(4+3)*(4-3)` and then `4+3*4-3` in the console (for experimentation). What happens? ### This is one of those necessary evils, so you know how R works - To select individual elements in a vector, use `[2]` and the numeric position that you would like to extract. - For instance, run: `v6 <- v5[c(2,3)]` - This is also how one may pull out individual columns, rows, or specific rows and columns in a dataframe in R using <mark>base R</mark> - We can also get descriptives or compute across a vector, or specific vector elements - Working with datasets or matrices, we can subset rows, columns or both --- class: middle #### Rows: `df[1, ]` (gets just the first row) #### Columnn `df[,1]` or `df[1]` (gets just the first column) #### Both `df[1,1]` #### Can Select multiple `df[1:5, 1:6]` or `df[c(1,3, 8), c(1, 9, 2)]` ---