6 - HPC notes

ARC

Advanced Research Computing (ARC) at Virginia Tech maintains the university's on-premises high-performance computing (HPC) clusters. For more information, see <arc.vt.edu>.

Basic Linux Commands

There are plenty of cheat sheets online. Here is one I like: https://cheatography.com/davechild/cheat-sheets/linux-command-line/

A few to get you started, besides the scheduler and software ones listed below or in other notes:

ls -lah # list files in current dir: long format (-l), include hidden files (-a), human-readable sizes (-h)
mkdir -p ~/class_work/ # make a directory and any necessary parent dirs
rm file.txt # delete file.txt
cd ~/class_work/ # change to the dir ~/class_work
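
A few more that come up all the time; these are standard Linux commands, nothing ARC-specific (the file names below are just placeholders):

pwd # print the current working directory
cp file.txt backup.txt # copy file.txt to a new file named backup.txt
mv old.txt new.txt # rename (move) a file
cat file.txt # print a file's contents to the terminal
head -n 20 file.txt # show the first 20 lines of a file
man ls # read the manual page for a command (press q to quit)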

Scheduler

High-performance computing clusters are characterized by login nodes fronting a pool of compute resources. Access to the compute resources is controlled via a resource scheduler; in our case, this is Slurm, the Simple Linux Utility for Resource Management, developed by SchedMD (https://www.schedmd.com/). You can use the compute nodes either interactively or by submitting batch job scripts. Here are a few basic commands you will find useful:

get list of running and queued jobs
>squeue
get list of YOUR running and queued jobs (replace <pid> with your VT pid)
>squeue -u <pid>
check your storage and account quotas
>quota # works on Cascades/Dragonstooth/NewRiver, coming soon to TinkerCliffs
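cancel a running or queued job (scancel is standard Slurm; get the job id from squeue)
>scancel <jobid>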
start an interactive job using the class account in normal_q for 1 hour on 1 node with 2 tasks, each with 3 cores
>interact --partition=normal_q -N 1 -n 2 -c 3 --time=01:00:00 --account=stat5526-fall2020
start a script in batch mode, passing the R script to run in via an environment variable
>sbatch --export=FILE=matrix_mult.R generic_script.slrm
where the submitted script has everything it needs to run:
>cat generic_script.slrm
#!/bin/bash

## sbatch --export=FILE=matrix_mult.R generic_script.slrm

###########################################################################
####### job customization
#SBATCH --job-name=matrixmult
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -t 01:00:00
#SBATCH -p normal_q
#SBATCH -A stat5526-fall2020
#SBATCH --mail-user=<pid>@vt.edu  ## <----- change me
#SBATCH --mail-type=FAIL
####### end of job customization
###########################################################################
module reset
module load containers/singularity
singularity exec /projects/arcsingularity/ood-rstudio-basic_4.0.3.sif Rscript $FILE
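
For reference, the $FILE passed in above is just an ordinary R script. A minimal sketch of what a matrix_mult.R could contain (illustrative only, not a course file):

# matrix_mult.R -- illustrative example: multiply two random matrices and time it
set.seed(1)
n <- 2000
A <- matrix(rnorm(n * n), nrow = n)
B <- matrix(rnorm(n * n), nrow = n)
timing <- system.time(C <- A %*% B)
cat("dim(C):", dim(C), "\n")
print(timing)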

Software

Software on ARC clusters is either a) user installed or b) system managed. For many packages, user installation is difficult or impossible due to the use of system directories that users do not have privileges to write to. For system-managed software, ARC uses a module system. This allows multiple versions of software packages to be available to users. To access system-managed software, you must load the appropriate module. These modules may have dependencies that also need loading. A few commands/modules that could be useful:

# find, load, and make available R on TinkerCliffs
>module spider R
>module load R/4.0.2-foss-2020a

# load Singularity on TinkerCliffs
>module load containers/singularity
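
To check what a load pulled in (modules can load dependencies behind the scenes), these standard module commands help; the module names above are TinkerCliffs-specific:

# show everything currently loaded, including dependencies
>module list

# return to the default environment
>module reset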

Containers

Containers, e.g., Docker, Singularity, etc., are IMO a reproducible researcher's dream. Basically, you create an image (like burning a DVD) which is then a (mostly) portable copy of everything you put in it: operating system, software libraries, environment variables, any scripts you pushed in, perhaps even data. You can then instantiate the image as a running container. Inside the container, you should have a stable computing environment insulated from host changes. For R, I have done extensive testing on matrix operations and have found NO penalty to using a container vs using a module. For me, this makes installing and managing software super easy. For users, this means they can build and manage their own software stack from their local computer as well. Here, we will use containers I manage. To use R, the command looks a little (ok, a lot) more complicated, but there are advantages:

# get fresh environment, load singularity, start an R container 
#    and get sessionInfo() from R in the container
>module reset
>module load containers/singularity
>singularity exec --bind $TMPFS:/tmp \
    /projects/arcsingularity/ood-rstudio-basic_4.0.0-tc-amd.sif \
    Rscript -e "sessionInfo()"
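
If you want to poke around inside the image rather than run a single command, singularity shell drops you into a shell in the container (same image and bind option as the example above):

# open an interactive shell inside the container
>module load containers/singularity
>singularity shell --bind $TMPFS:/tmp \
    /projects/arcsingularity/ood-rstudio-basic_4.0.0-tc-amd.sif
# from the container's prompt you can start R, look around /usr/local/lib/R, etc.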

R

One of the elements of R that makes it powerful is the extensibility of the platform, i.e., community-developed and shared packages. To make use of packages not prepackaged in the R installation (module or container), users need to specify where R should install a package. As stated above, not all paths that R would like to install to are writable by the user; in fact, NONE of the paths within the container are writable. To deal with this, we need to specify a location with write privileges, ~/R for instance. The correct way to do this is by specifying it in an Renviron file. For the images I make, I change the name of this file to .Renviron.OOD to avoid conflicts. In that file, you can do something like:

>cat ~/.Renviron.OOD
R_LIBS_USER=/home/rsettlag/R/OOD/Ubuntu-20.04-4.0.3
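
One gotcha: R only adds R_LIBS_USER to .libPaths() if the directory already exists, so create it first (adjust the path to match whatever you put in your .Renviron.OOD):

>mkdir -p ~/R/OOD/Ubuntu-20.04-4.0.3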

Now, you can do the following:

>singularity exec --bind $TMPFS:/tmp /projects/arcsingularity/ood-rstudio-basic_4.0.3.sif Rscript -e ".libPaths()"

Which, for me gives the following output:

[1] "/usr/local/lib/R/site-library"
[2] "/usr/local/lib/R/library"
[3] "/home/rsettlag/R/OOD/Ubuntu-20.04-4.0.3"

NOTE: this order is backwards. I am currently working with the R developers to figure out why. I am told it should not be happening, and in fact, if you run RStudio in this same image, it is ordered the correct way. If you need to install a package, before you do the install you can run .libPaths(.libPaths()[c(3,1,2)]) at the R prompt, or specify the library in the install.packages call as install.packages(PACKAGE, lib=.libPaths()[3]). Alternatively, use the RStudio associated with this image at https://ood.arc.vt.edu to install and configure the environment before running your scripts.
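
Putting that together, an install from the R prompt inside the container could look like the sketch below (data.table is just an example package; the index 3 assumes your user library is listed third, as in my output above):

# inside the container's R session
.libPaths()                        # confirm where the user-writable library sits

# option 1: reorder so installs default to the user-writable library
.libPaths(.libPaths()[c(3, 1, 2)])
install.packages("data.table")

# option 2: leave the order alone and give install.packages the library explicitly
# install.packages("data.table", lib = .libPaths()[3])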