2 - R startup
R, not unlike Linux, lets you customize and automate many tasks. Understanding R's startup is essential to becoming a power R user. Here, we only need the basics: enough to see how we can configure R for ease of computing and reproducibility on our clusters.
An exhaustive guide is here: https://rstats.wtf/r-startup.html/. I prefer the summary here: https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/
Figure: R's startup sequence (image: https://rviews.rstudio.com/post/2017-04-04-rs-quirky-and-powerful-startup_files/R_STARTUP.jpeg)
The diagram shows that there are many places where we can customize how our R/RStudio sessions look, run, and feel. For our purposes, we are most concerned with two files:
- ~/.Renviron
- ~/.Rprofile
.Renviron
The first of these files is your .Renviron file, which contains environment variables to be set in R sessions. R reads it when it starts. It is not run by a shell: R parses each name=value line itself and sets the corresponding environment variable, much as Sys.setenv(name = value) would from inside R (the base function readRenviron() does the same thing on demand). For example:
>cat ~/.Renviron
R_LIBS_USER=/home/rsettlag/R/OOD/Ubuntu-20.04-4.0.3
In effect, every key=value pair in the file becomes an environment variable in your session. What should go here? Settings that simplify our lives but don't enter into computations: where our user libraries live, keys for tools such as GPG, memory limits, vector-size limits; in short, environment variables. If there is a .Renviron file in the directory where R starts (e.g. your project directory), R reads that file instead of ~/.Renviron.
Avoid sharing your .Renviron file: this is where tokens and secret keys belong.
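To see the key=value mechanism without restarting R, here is a minimal sketch that writes a throwaway file in .Renviron format and loads it with readRenviron(). The variable name MY_SCRATCH_DIR and its value are hypothetical, chosen only for illustration.

```r
# Write a small file in .Renviron format to a temporary location
env_file <- tempfile(fileext = ".Renviron")
writeLines(c(
  "# Comment lines start with '#'",
  "MY_SCRATCH_DIR=/tmp/scratch"  # hypothetical variable, for illustration only
), env_file)

# Parse the file the way R parses ~/.Renviron at startup
readRenviron(env_file)

# The pair is now an ordinary environment variable in this session
Sys.getenv("MY_SCRATCH_DIR")
#> [1] "/tmp/scratch"

# At startup, R adds the R_LIBS_USER directory (if it exists) to this list
.libPaths()
```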
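One way to keep a secret out of your scripts is to store it in ~/.Renviron and have code refer to it only by name. The sketch below assumes a hypothetical variable MY_API_TOKEN and a hypothetical helper get_secret(); it relies on the fact that Sys.getenv() returns an empty string for an unset variable.

```r
# In ~/.Renviron (never committed or shared):
#   MY_API_TOKEN=replace-with-your-token

# Hypothetical helper: fetch a secret by name, failing loudly if it is missing
get_secret <- function(name) {
  value <- Sys.getenv(name)
  if (!nzchar(value)) {
    stop(name, " is not set; add it to ~/.Renviron and restart R", call. = FALSE)
  }
  value
}

# Usage: the script mentions only the variable name, never the token itself
# token <- get_secret("MY_API_TOKEN")
```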
.Rprofile
The second file is .Rprofile, which contains R code to be run at the start of each session. Ideally, it should be limited to setting options. To get a glimpse of what others put here, check out https://github.com/search?q=filename%3A.Rprofile+interactive&type=Code.
Here is a fun look at some of the customizing you can do: https://github.com/csgillespie/rprofile
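As a minimal sketch, assuming you want some code to run only when a person is at the console (the GitHub search above filters for files that mention interactive), an .Rprofile might look like this. The specific options and the message are illustrative choices, not recommendations from the linked examples.

```r
# ~/.Rprofile: plain R code, run at the start of every session

# Display-only options: these change how results print, not the results themselves
options(digits = 4, width = 100)

# Code that only makes sense in an interactive session (skipped by Rscript)
if (interactive()) {
  message("Loaded ~/.Rprofile")
}
```

Keeping computational settings out of this file means a script gives the same answer on a colleague's machine, which has a different (or no) .Rprofile.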
Here is Stephen Turner's .Rprofile: https://gist.github.com/stephenturner/5700920. I always enjoy reading through his blog posts; he is an actual R user and data scientist. Below are a few examples from Stephen's setup; note the use of an environment AND the warning about code portability. As with .Renviron, if there is an .Rprofile in your project directory, R uses it instead of ~/.Rprofile:
## Don't show those silly significance stars
options(show.signif.stars=FALSE)
## Do you want to automatically convert strings to factor variables in a data.frame?
## WARNING!!! This makes your code less portable/reproducible.
## (Since R 4.0.0, FALSE is already the default, so this only matters for older R versions.)
options(stringsAsFactors=FALSE)
## Don't ask me for my CRAN mirror every time
options("repos" = c(CRAN = "http://cran.rstudio.com/"))
## Create a new invisible environment for all the functions to go in so it doesn't clutter your workspace.
.env <- new.env()
## Returns a logical vector TRUE for elements of X not in Y
.env$"%nin%" <- function(x, y) !(x %in% y)
## Single character shortcuts for summary() and head().
.env$s <- base::summary
.env$h <- utils::head
## ht==headtail, i.e., show the first and last 10 rows of a data frame or matrix
.env$ht <- function(d) rbind(head(d,10),tail(d,10))
## Attach the environment so the functions above can be called by name
attach(.env)
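To see why the helpers go in a hidden environment, here is a minimal, self-contained sketch. It assumes the environment is attached to the search path with attach(.env), as Stephen's gist does; once attached, the helpers are found by name but do not show up in ls() and survive rm(list = ls()).

```r
# Recreate two of the helpers in a hidden environment
.env <- new.env()
.env$"%nin%" <- function(x, y) !(x %in% y)
.env$ht <- function(d) rbind(head(d, 10), tail(d, 10))

# attach() copies the environment onto the search path under the name ".env"
attach(.env)

c(1, 5) %nin% c(1, 2)
#> [1] FALSE  TRUE
nrow(ht(mtcars))  # first 10 + last 10 rows of the 32-row mtcars
#> [1] 20

# Clearing the workspace does not remove the helpers: ls() skips names
# starting with "." and the attached copies live outside the global environment
rm(list = ls())
exists("ht")
#> [1] TRUE
```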
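Avoidance maneuver
Sometimes you want none of this: when a startup file is misbehaving, or to check that a script runs without your personal customizations. The --vanilla flag tells R (and Rscript) to skip the site and user startup files, including ~/.Renviron and ~/.Rprofile, and not to save or restore a workspace. For example, this prints the library paths R uses without your ~/.Renviron settings:
Rscript --vanilla -e ".libPaths()"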
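If you are unsure which user file a session will pick up, here is a minimal sketch that mimics the lookup order described in R's ?Startup help page: a path given in R_PROFILE_USER or R_ENVIRON_USER wins outright, otherwise R looks in the current directory and then in your home directory. The helper name startup_file() is hypothetical, it only checks the directory you run it from (R uses the directory it starts in), and it ignores the site-wide files and flags such as --vanilla.

```r
# Hypothetical helper: which user startup file would R read if started here?
startup_file <- function(file = c(".Rprofile", ".Renviron")) {
  file <- match.arg(file)
  env_var <- if (file == ".Rprofile") "R_PROFILE_USER" else "R_ENVIRON_USER"

  override <- Sys.getenv(env_var)
  candidates <- if (nzchar(override)) {
    path.expand(override)                 # explicit path: no fallback
  } else {
    c(file.path(getwd(), file),           # current (project) directory first
      path.expand(file.path("~", file)))  # then the home directory
  }

  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

startup_file(".Renviron")
startup_file(".Rprofile")
```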