The main function in the seeker package is, well, seeker(). Currently seeker() is targeted at processing RNA-seq data. The main input is a list of parameters specifying which steps of RNA-seq data processing to perform and how to perform them. Depending on the parameters, seeker() will call other functions in the package.
A convenient way to construct the list of parameters is to make a yaml file and read it into R using yaml::read_yaml(). A template yaml file is reproduced below and available at system.file('extdata', 'params_template.yml', package = 'seeker').
study: '' # [string]
metadata:
  run: TRUE # [logical]
  bioproject: '' # [string]
  # include # [named list or NULL]
    # colname # [string]
    # values # [vector]
  # exclude # [named list or NULL]
    # colname # [string]
    # values # [vector]
fetch:
  run: TRUE # [logical]
  # overwrite # [logical or NULL]
  # ascpCmd # [string or NULL]
  # ascpArgs # [character vector or NULL]
  # ascpPrefix # [string or NULL]
trimgalore:
  run: TRUE # [logical]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
fastqc:
  run: TRUE # [logical]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
salmon:
  run: TRUE # [logical]
  indexDir: '' # [string]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
multiqc:
  run: TRUE # [logical]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
tximport:
  run: TRUE # [logical]
  tx2gene: # [named list or NULL]
    dataset: 'mmusculus_gene_ensembl' # [string]
    version: 104 # [number; latest version is 104 as of Oct 2021]
  countsFromAbundance: '' # [string]
  # ignoreTxVersion # [logical or NULL]

A convenient way to run seeker() is then using a script such as the one reproduced below and available at system.file('extdata', 'run_seeker.R', package = 'seeker').
doParallel::registerDoParallel()
cArgs = commandArgs(TRUE)
yamlPath = cArgs[1L]
parentDir = cArgs[2L]
params = yaml::read_yaml(yamlPath)
seeker::seeker(params, parentDir)

If you copy the script to your current working directory, you can run it using something like
Rscript run_seeker.R <path/to/study>.yml <path/to/parent/directory>

A fancier option, which saves stdout and stderr to a log file, would be something like
study="<study>" && \
parentDir="<path/to/parent/directory>" && \
mkdir -p "${parentDir}/${study}" && \
Rscript run_seeker.R "<path/to>/${study}.yml" "${parentDir}" &> \
"${parentDir}/${study}/progress.log"

This option assumes that the name of the yaml file (minus the file extension) is identical to the study variable within the yaml file, which we highly recommend.