Process RNA-seq data end to end

This function selectively performs various steps to process RNA-seq data.

seeker(params, parentDir = ".")

Arguments

params

Named list of parameters with components:

study: String used to name the output directory within parentDir.
metadata: Named list with components:
- run: Logical indicating whether to fetch metadata from ENA. See fetchMetadata(). If TRUE, saves a file parentDir/study/data/metadata.csv. If FALSE, expects that file to already exist. Following components are only checked if run is TRUE.
- bioproject: String indicating the study's bioproject accession.
- include: Optional named list for specifying which rows of metadata to include for further processing, with components:
  - colname: String indicating column in metadata
  - values: Vector indicating values within colname
- exclude: Optional named list for specifying which rows of metadata to exclude from further processing (superseding include), with components:
  - colname: String indicating column in metadata
  - values: Vector indicating values within colname
fetch: Named list with components:
- run: Logical indicating whether to fetch fastq(.gz) files using ascp. See fetch(). If TRUE, expects metadata to have a column 'fastq_aspera' containing remote paths, and saves files to parentDir/study/fetch_output. If FALSE, expects metadata to have a column 'fastq_aspera' containing names (or complete paths, local or remote) of fastq files. Whether TRUE or FALSE, updates metadata with column 'fastq_fetched' containing paths to files that should be in parentDir/study/fetch_output. Following components are only checked if run is TRUE.
- overwrite: Logical indicating whether to overwrite files that already exist. NULL indicates to use the default in fetch().
- ascpCmd: String indicating path to ascp. NULL indicates to use the default in fetch().
- ascpArgs: Character vector of arguments to pass to ascp. NULL indicates to use the default in fetch().
- ascpPrefix: String indicating prefix for fetching files. NULL indicates to use the default in fetch().
trimgalore: Named list with components:
- run: Logical indicating whether to perform quality/adapter trimming of reads. See trimgalore(). If TRUE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output, saves trimmed files to parentDir/study/trimgalore_output, and updates metadata with column 'fastq_trimmed'. If FALSE, expects nothing and does nothing. Following components are only checked if run is TRUE.
- cmd: Name or path of the command-line interface. NULL indicates to use the default in trimgalore().
- args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in trimgalore().
fastqc: Named list with components:
- run: Logical indicating whether to perform QC on reads. See fastqc(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column 'fastq_trimmed' containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/fastqc_output. If FALSE, expects nothing and does nothing. Following components are only checked if run is TRUE.
- cmd: Name or path of the command-line interface. NULL indicates to use the default in fastqc().
- args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in fastqc().
salmon: Named list with components:
- run: Logical indicating whether to quantify transcript abundances. See salmon(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column 'fastq_trimmed' containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output. If TRUE, also expects metadata to have a column 'sample_accession' containing sample ids, and saves results to parentDir/study/salmon_output and parentDir/study/data/salmon_meta_info.csv. If FALSE, expects nothing and does nothing. Following components are only checked if run is TRUE.
- indexDir: Directory that contains salmon index.
- cmd: Name or path of the command-line interface. NULL indicates to use the default in salmon().
- args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in salmon().
multiqc: Named list with components:
- run: Logical indicating whether to aggregate results of various processing steps. See multiqc(). If TRUE, saves results to parentDir/study/multiqc_output. If FALSE, expects nothing and does nothing. Following components are only checked if run is TRUE.
- cmd: Name or path of the command-line interface. NULL indicates to use the default in multiqc().
- args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in multiqc().
tximport: Named list with components:
- run: Logical indicating whether to summarize transcript- or gene-level estimates for downstream analysis. See tximport(). If TRUE, expects a directory parentDir/study/salmon_output containing directories of quantification results from salmon, and saves results to parentDir/study/data/tximport_output.qs. If FALSE, expects nothing and does nothing. Following components are only checked if run is TRUE.
- tx2gene: Optional named list with components:
  - dataset: String indicating ensembl gene dataset. See getTx2gene().
  - version: Number indicating ensembl version. See getTx2gene().
  If specified, saves a file parentDir/study/data/tx2gene.csv.
- countsFromAbundance: String indicating whether or how to estimate counts using estimated abundances. See tximport::tximport().
- ignoreTxVersion: Logical indicating whether to ignore the version suffix on transcript ids. NULL indicates to use TRUE. See tximport::tximport().
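
The include and exclude components of metadata describe a row filter in which exclude takes precedence. A minimal sketch of that logic in base R is given here; filterMetadata() is a hypothetical helper written for illustration (it is not a seeker function), and the metadata values are made up.

```r
# Hypothetical helper, not part of seeker: filter metadata rows using the
# optional include and exclude specifications (each a list with components
# colname and values).
filterMetadata = function(metadata, include = NULL, exclude = NULL) {
  keep = rep(TRUE, nrow(metadata))
  if (!is.null(include)) {
    keep = keep & metadata[[include$colname]] %in% include$values
  }
  if (!is.null(exclude)) {
    # Applied last, so exclude wins when a row matches both specifications.
    keep = keep & !(metadata[[exclude$colname]] %in% exclude$values)
  }
  metadata[keep, , drop = FALSE]
}

# Made-up metadata for illustration only.
metadata = data.frame(
  run_accession = c("run1", "run2", "run3"),
  library_strategy = c("RNA-Seq", "RNA-Seq", "ChIP-Seq"),
  stringsAsFactors = FALSE)

filtered = filterMetadata(
  metadata,
  include = list(colname = "library_strategy", values = "RNA-Seq"),
  exclude = list(colname = "run_accession", values = "run2"))
```

Here run3 is dropped because it does not match include, and run2 is dropped because exclude supersedes include.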

params can be derived from a yaml file; see vignette('introduction', package = 'seeker'). The yaml representation of params will be saved to parentDir/params$study/data/params.yml.
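
As a rough illustration of the structure described above, params can also be written directly as a nested list. The study name, bioproject accession, index path, and tx2gene values below are placeholders, not values from the package. Optional components are simply omitted, so they read as NULL in R; whether seeker() accepts omitted components this way is an assumption to check against the vignette.

```r
# Illustrative params list; every value here is a placeholder.
params = list(
  study = "my_study",
  metadata = list(run = TRUE, bioproject = "PRJNA000000"),
  fetch = list(run = TRUE),
  trimgalore = list(run = TRUE),
  fastqc = list(run = TRUE),
  salmon = list(run = TRUE, indexDir = "path/to/salmon_index"),
  multiqc = list(run = TRUE),
  tximport = list(
    run = TRUE,
    tx2gene = list(dataset = "mmusculus_gene_ensembl", version = 104),
    countsFromAbundance = "lengthScaledTPM",
    ignoreTxVersion = TRUE))

# Summarize which steps are switched on via their run components.
steps = c(
  "metadata", "fetch", "trimgalore", "fastqc", "salmon", "multiqc", "tximport")
enabled = vapply(params[steps], function(p) isTRUE(p$run), logical(1))

# If the yaml package is installed, print a yaml representation of the list.
if (requireNamespace("yaml", quietly = TRUE)) {
  cat(yaml::as.yaml(params))
}
```

Setting a step's run component to FALSE skips that step, provided the files it would have produced already exist where later steps expect them.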

parentDir

Directory in which to store the output, which will be a directory named according to params$study.
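
The output locations named in the params description all sit under parentDir/study. The sketch below collects them with file.path(); expectedPaths() is a hypothetical helper for illustration and is not exported by seeker.

```r
# Hypothetical helper, not part of seeker: build the output paths named in
# the documentation from parentDir and params$study.
expectedPaths = function(params, parentDir = ".") {
  outputDir = file.path(parentDir, params$study)
  dataDir = file.path(outputDir, "data")
  list(
    params = file.path(dataDir, "params.yml"),
    metadata = file.path(dataDir, "metadata.csv"),
    fetch = file.path(outputDir, "fetch_output"),
    trimgalore = file.path(outputDir, "trimgalore_output"),
    fastqc = file.path(outputDir, "fastqc_output"),
    salmon = file.path(outputDir, "salmon_output"),
    salmonMetaInfo = file.path(dataDir, "salmon_meta_info.csv"),
    multiqc = file.path(outputDir, "multiqc_output"),
    tx2gene = file.path(dataDir, "tx2gene.csv"),
    tximport = file.path(dataDir, "tximport_output.qs"))
}

# Placeholder study name and parent directory.
paths = expectedPaths(list(study = "my_study"), parentDir = "results")
```

A listing like this can help confirm that files from an earlier run are in place before a step is re-run with run set to FALSE.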

Value

NULL, invisibly.
