R
PDF
RStudio Server Admin
File:R-datatypes-and-syntax.pdf
R Environment info
- R.home() - path to R installation
- R.Version() - returns a list
- version - returns a simple list which prints nicely interactively
- .Platform - Platform Specific Variables
- sessionInfo() - Collect Information About the Current R Session
- Sys.*
- Sys.info() - Extract System and User Information
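- Sys.glob( c("/path1/*", "/path2/*.tiff") ) - wildcard expansion; pass several patterns as one character vector (a second argument would be taken as dirmark)
- list.files() - list the files in a directory
- list.dirs() - list the directories under a path
- file.*() - file helpers such as file.exists(), file.copy(), file.remove()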
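A minimal sketch that exercises several of these calls at once. It lists R's own installation directory (R.home()) only so that the example runs on any machine; substitute your own paths.

```r
# Where is R installed, and which version/platform is running?
r_home   <- R.home()
r_ver    <- R.Version()          # list: $major, $minor, $version.string, ...
os_type  <- .Platform$OS.type    # "unix" or "windows"
sys_info <- Sys.info()           # named character vector: sysname, nodename, user, ...

cat(r_ver$version.string, "on", os_type, "\n")
cat("R home:", r_home, "\n")

# Sys.glob() takes ONE character vector of patterns; combine several with c()
matches <- Sys.glob(c(file.path(r_home, "*"), file.path(r_home, "bin", "*")))

# list.files()/list.dirs() list directory contents without wildcard expansion
top_files <- list.files(r_home)
top_dirs  <- list.dirs(r_home, recursive = FALSE, full.names = FALSE)

# file.*() family: e.g. test whether a path exists
has_bin <- file.exists(file.path(r_home, "bin"))
```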
Package Management
- find.package( "BiocManager" ) - path to package installation directory
- See also: system.file( package="BiocManager" )
- .Library - global string of path to default library for R packages
- .Library.site - locations of site libraries. Can be set via the environment variable $R_LIBS_SITE (as a non-empty colon-separated list of library trees).
- .libPaths() - R library tree getter and setter function
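- install.packages() - install packages from CRAN (or the configured repositories)
- available.packages() - list every package available for download and installation from the configured repositories (e.g. a CRAN mirror)
- remove.packages() - uninstall packages from a library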
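A short sketch of inspecting the library trees and locating an installed package. The base package "stats" is used because it is always installed; the personal library path in the last comment is hypothetical.

```r
# Library trees searched by library(), in order
lib_trees <- .libPaths()

# .Library is the default (system) library; .libPaths() always includes it
default_lib <- .Library

# Path to an installed package
stats_dir <- find.package("stats")
same_dir  <- system.file(package = "stats")   # returns "" if not installed

# With quiet = TRUE, packages that are not installed are simply omitted
not_there <- find.package("noSuchPackage123", quiet = TRUE)

# Prepend a personal library tree for this session only (hypothetical path):
# .libPaths(c("~/R/my-library", .libPaths()))
```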
Bioconductor Package management
- BiocManager vignette
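- install.packages("BiocManager") - install BiocManager itself from CRAN
- BiocManager::install() - install core packages
- BiocManager::version() - Bioconductor version currently in use
- BiocManager::valid() - check that installed packages are consistent with that Bioconductor version
- BiocManager::available() - list packages available for installation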
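A common pattern is to install only what is missing. The helpers missing_packages() and install_bioc_if_missing() below are hypothetical, not part of BiocManager. They assume network access only when something actually needs installing.

```r
# Hypothetical helper: which of `pkgs` are not installed in any library tree?
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(found)]          # system.file() returns "" for absent packages
}

# Hypothetical helper: install missing packages via BiocManager
install_bioc_if_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))   # nothing to do
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")                        # bootstrap from CRAN
  BiocManager::install(todo, update = FALSE, ask = FALSE)
  invisible(todo)
}
```

Usage, for example: `install_bioc_if_missing(c("DESeq2", "edgeR"))`. Passing `update = FALSE` keeps the call from also updating unrelated installed packages.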
General
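- library() - with no arguments, list all installed packages
- message() is often preferable to print() and cat() for diagnostic output: it writes to stderr and the caller can silence it with suppressMessages()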
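A small sketch of why message() suits diagnostics; report_progress() is a hypothetical function.

```r
# Hypothetical progress reporter: diagnostics go through message()
report_progress <- function(i, n) {
  message(sprintf("processed %d of %d", i, n))
  invisible(i)
}

# message() writes to stderr, so captured stdout stays clean ...
stdout_lines <- capture.output(report_progress(1L, 10L), type = "output")
# ... the caller can silence it selectively ...
suppressMessages(report_progress(2L, 10L))
# ... or handle it as a condition (the text includes the trailing newline)
msg <- tryCatch(report_progress(3L, 10L),
                message = function(m) conditionMessage(m))
```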
Jupyter Notebook Tips
- Don't call dev.new() in a notebook, or your plots won't show up
Links
Functions
- distance = dissimilarity
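In base R, dist() computes the pairwise distances, and functions such as hclust() accept that object as their dissimilarity structure. A minimal sketch with three points whose distances are easy to check by hand:

```r
# Three points in the plane, one per row
pts <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))

# dist() stores the lower triangle of the pairwise distance matrix
d <- dist(pts, method = "euclidean")
d_mat <- as.matrix(d)
d_mat["a", "b"]   # 5
d_mat["a", "c"]   # 10

# Another metric: Manhattan distance
as.matrix(dist(pts, method = "manhattan"))["a", "b"]   # 7

# The same object can be fed to hierarchical clustering
hc <- hclust(d)
```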
ggplot
- piechart:
library(tidyverse)   # attaches ggplot2, dplyr, tidyr, ...
library(readxl)
var <- read_excel("GENES.xlsx")
# count genes per biotype, ignoring missing values
stats <- var %>%
  select(`Gene Biotype`) %>%
  drop_na() %>%
  group_by(`Gene Biotype`) %>%
  summarize(count = n())
names(stats) <- c("name", "count")
# a pie chart is one stacked bar drawn in polar coordinates;
# scale_fill_manual() needs at least as many colours as there are biotypes
ggplot(stats, aes(x = "", y = count, fill = factor(name))) +
  geom_col(width = 1) +
  scale_fill_manual(values = c('red', 'blue', 'green', 'darkorchid', 'sienna4', 'thistle', 'gray0', 'khaki4', 'seagreen', 'midnightblue', 'azure4', 'cornflowerblue', 'olivedrab', 'lightgreen', 'purple4', 'turquoise')) +
  coord_polar(theta = "y", start = 0)
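The counting step can also be done without tidyverse or readxl. The sketch below uses a small hypothetical data frame in place of GENES.xlsx. It builds the same pie chart only if ggplot2 happens to be installed.

```r
# Hypothetical stand-in for read_excel("GENES.xlsx")
genes <- data.frame(
  `Gene Biotype` = c("protein_coding", "lncRNA", "protein_coding", NA,
                     "miRNA", "lncRNA", "protein_coding"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Equivalent of select() %>% drop_na() %>% group_by() %>% summarize(count = n());
# table() drops NA by default and sorts the names
tab <- table(genes[["Gene Biotype"]])
stats <- data.frame(name = names(tab), count = as.vector(tab),
                    stringsAsFactors = FALSE)

# Same geom_col() + coord_polar() idea; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = "", y = count, fill = name)) +
    geom_col(width = 1) +
    coord_polar(theta = "y", start = 0) +
    theme_void()
}
```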
Data Types
parallel package
- ships with R (included since R 2.14.0); provides drop-in replacements for most of the 3rd-party multicore and snow packages
references
- package manual
- HPC and Parallel Computing in R
- Sept 2017 parallel foreach guide
- http://pablobarbera.com/POIR613/code/06-parallel-computing.html (ditto)
coarse-grained parallelization
- large chunks of computations in parallel
- chunks of computation are unrelated, do not need to communicate in any way
- great for bootstrapping
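A minimal coarse-grained bootstrap sketch with parallel::mclapply(). The sample x is hypothetical. By default mclapply() prescheduling splits the replicates into one chunk per core, so each worker gets a large independent block. Forking isn't supported on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# L'Ecuyer-CMRG gives each forked worker its own reproducible RNG stream
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

x <- rnorm(200, mean = 10)   # hypothetical sample

# One bootstrap replicate: resample with replacement, return the statistic
boot_mean <- function(i, data) mean(sample(data, replace = TRUE))

# mc.cores > 1 is not supported on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
reps <- mclapply(seq_len(1000), boot_mean, data = x, mc.cores = n_cores)
boot_est <- unlist(reps)

se_hat <- sd(boot_est)                      # bootstrap standard error of the mean
ci <- quantile(boot_est, c(0.025, 0.975))   # percentile interval
```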
load-balancing paradigm
- start up M worker processes
- allow worker processes access to data
- split task into N tasks (chunks)
- send the first M tasks to the M workers
- Implementation detail: data reaches the workers via serialization, which is not unlimited, so avoid shipping very large objects
- when a worker finishes a task send it another one
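The load-balancing paradigm above maps onto parLapplyLB(), which hands out work dynamically as workers become free. parLapply() instead pre-splits the input into one chunk per worker. The tasks and their sleep times below are hypothetical, chosen so a few are much slower than the rest. The lookup table reaches the workers by serialization as an extra argument.

```r
library(parallel)

cl <- makeCluster(2)          # start M = 2 worker processes (PSOCK)

# Hypothetical data and uneven tasks
lookup <- setNames(seq_len(8), letters[1:8])
slow_task <- function(i, tbl) {
  Sys.sleep(if (i %% 4 == 0) 0.4 else 0.05)   # every 4th task is slow
  tbl[[i]] * 10
}

# Extra arguments (tbl) are serialized and sent along with the tasks
res <- parLapplyLB(cl, seq_len(8), slow_task, tbl = lookup)

stopCluster(cl)
unlist(res)
```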
Worker paradigms
- "Cluster" - SAME MACHINE (PSOCK can also launch workers on other hosts), start new R processes ("snow" style)
- "fork" - SAME MACHINE, POSIX forks, copy on write, theoretically cheaper, not available on Windows
- MPI - MULTIPLE MACHINES
CPUs/Cores
- a physical CPU has 2 or more cores that run independently
- the OS reports "logical CPUs", which can outnumber physical cores when each core runs several hardware threads (hyper-threading)
- considerations
- we know how many there are, but how many are available TO YOU?
- hopefully the chunk of code you want to run in parallel does not itself use multiple cores (e.g. a multithreaded BLAS)
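One way to answer "how many are available to you" is to start from detectCores() and cap it with any limit the environment imposes. choose_worker_count() is a hypothetical helper. Reading SLURM_CPUS_PER_TASK assumes a Slurm scheduler. The helper does not limit threads started inside your own code, such as a multithreaded BLAS.

```r
library(parallel)

# Hypothetical helper: a worker count that respects what this job may use
choose_worker_count <- function(reserve = 1L) {
  n <- detectCores(logical = TRUE)         # logical CPUs; may be NA on some platforms
  if (is.na(n)) n <- 1L
  # Assumption: on a Slurm cluster the allocation is in SLURM_CPUS_PER_TASK
  slurm <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA)))
  if (!is.na(slurm)) n <- min(n, slurm)
  # Honour an explicit user limit (mclapply() reads the same option by default)
  opt <- getOption("mc.cores")
  if (!is.null(opt)) n <- min(n, as.integer(opt))
  # Leave `reserve` cores for the master process, but never go below 1
  max(1L, n - as.integer(reserve))
}

n_workers <- choose_worker_count()
```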
apply functions
- mclapply() - forks an ephemeral set of worker processes for this one call
- makeCluster() + parLapply() + stopCluster() - set up a pool of workers, reuse it, then call stopCluster() when done
- parRapply()/parCapply() - parallel per-row/per-column apply over a matrix
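A sketch contrasting the two styles on a toy matrix. An explicit pool is created once and reused for several calls. mclapply() needs no pool but relies on forking, so it uses one core on Windows.

```r
library(parallel)

m <- matrix(1:12, nrow = 3)   # 3 rows x 4 columns

# Long-lived pool: create once, reuse for several calls, then stop it
cl <- makeCluster(2)
row_sums <- parRapply(cl, m, sum)   # one result per row
col_max  <- parCapply(cl, m, max)   # one result per column
sq       <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

# Ephemeral workers: forked for this call only
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
sq2 <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```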
"Cluster", i.e., multiprocessing API
- makeCluster( n_cores, type="PSOCK" ) or makeCluster( n_cores, type="FORK" ) - calls down to one of two subfunctions
- makePSOCKcluster() - uses Rscript to launch further copies of R on this or other hosts
- makeForkCluster() - not available on Windows - workers automatically inherit the environment of the parent session
- workers' stdout and stderr are discarded by default, unless redirected with the outfile argument
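A sketch of the environment difference between the two cluster types. It picks FORK where available and PSOCK on Windows. Fresh PSOCK workers cannot see master-session variables until clusterExport() copies them. Exporting is harmless for forked workers, so it is done unconditionally. The variable threshold is hypothetical.

```r
library(parallel)

threshold <- 5   # a variable in the master session

# FORK workers are copies of this session; PSOCK workers are fresh R processes
type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"
cl <- makeCluster(2, type = type)

# Make `threshold` visible on every worker (required for PSOCK)
clusterExport(cl, "threshold")
# Run setup code on every worker, e.g. attaching packages
invisible(clusterEvalQ(cl, library(stats)))

above <- parSapply(cl, 1:10, function(v) v > threshold)
stopCluster(cl)
```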
library(doParallel)
- registerDoParallel(myCluster)
- stopCluster(myCluster)
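A sketch of the register/stop cycle with foreach. parallel_squares() is a hypothetical function. It falls back to a plain sequential loop if foreach or doParallel is not installed. After stopping the cluster it re-registers the sequential backend so later %dopar% calls do not target a dead cluster.

```r
# Hypothetical example: squares of 1..n computed with foreach + doParallel
parallel_squares <- function(n, workers = 2L) {
  if (!requireNamespace("foreach", quietly = TRUE) ||
      !requireNamespace("doParallel", quietly = TRUE)) {
    return(vapply(seq_len(n), function(i) i^2, numeric(1)))  # sequential fallback
  }
  `%dopar%` <- foreach::`%dopar%`
  myCluster <- parallel::makeCluster(workers)
  # Always stop the workers and reset foreach to sequential execution
  on.exit({ parallel::stopCluster(myCluster); foreach::registerDoSEQ() })
  doParallel::registerDoParallel(myCluster)
  foreach::foreach(i = seq_len(n), .combine = c) %dopar% i^2
}

squares <- parallel_squares(4)
```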