This function selectively performs various steps to process RNA-seq data.

seeker(params, parentDir = ".")

Arguments

params

Named list of parameters with components:

  • study: String used to name the output directory within parentDir.

  • metadata: Named list with components:

    • run: Logical indicating whether to fetch metadata from ENA. See fetchMetadata(). If TRUE, saves a file parentDir/study/data/metadata.csv. If FALSE, expects that file to already exist. The following components are only checked if run is TRUE.

    • bioproject: String indicating the study's bioproject accession.

    • include: Optional named list for specifying which rows of metadata to include for further processing, with components:

      • colname: String indicating column in metadata

      • values: Vector indicating values within colname

    • exclude: Optional named list for specifying which rows of metadata to exclude from further processing (superseding include), with components:

      • colname: String indicating column in metadata

      • values: Vector indicating values within colname

  • fetch: Named list with components:

    • run: Logical indicating whether to fetch fastq(.gz) files using ascp. See fetch(). If TRUE, expects metadata to have a column 'fastq_aspera' containing remote paths, and saves files to parentDir/study/fetch_output. If FALSE, expects metadata to have a column 'fastq_aspera' containing names (or complete paths, local or remote) of fastq files. Whether TRUE or FALSE, updates metadata with column 'fastq_fetched' containing paths to files that should be in parentDir/study/fetch_output. The following components are only checked if run is TRUE.

    • overwrite: Logical indicating whether to overwrite files that already exist. NULL indicates to use the default in fetch().

    • ascpCmd: String indicating path to ascp. NULL indicates to use the default in fetch().

    • ascpArgs: Character vector of arguments to pass to ascp. NULL indicates to use the default in fetch().

    • ascpPrefix: String indicating prefix for fetching files. NULL indicates to use the default in fetch().

  • trimgalore: Named list with components:

    • run: Logical indicating whether to perform quality/adapter trimming of reads. See trimgalore(). If TRUE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output, saves trimmed files to parentDir/study/trimgalore_output, and updates metadata with column 'fastq_trimmed'. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.

    • cmd: Name or path of the command-line interface. NULL indicates to use the default in trimgalore().

    • args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in trimgalore().

  • fastqc: Named list with components:

    • run: Logical indicating whether to perform QC on reads. See fastqc(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column 'fastq_trimmed' containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/fastqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.

    • cmd: Name or path of the command-line interface. NULL indicates to use the default in fastqc().

    • args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in fastqc().

  • salmon: Named list with components:

    • run: Logical indicating whether to quantify transcript abundances. See salmon(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column 'fastq_trimmed' containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column 'fastq_fetched' containing paths to fastq files in parentDir/study/fetch_output. If TRUE, also expects metadata to have a column 'sample_accession' containing sample ids, and saves results to parentDir/study/salmon_output and parentDir/study/data/salmon_meta_info.csv. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.

    • indexDir: Directory that contains salmon index.

    • cmd: Name or path of the command-line interface. NULL indicates to use the default in salmon().

    • args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in salmon().

  • multiqc: Named list with components:

    • run: Logical indicating whether to aggregate results of various processing steps. See multiqc(). If TRUE, saves results to parentDir/study/multiqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.

    • cmd: Name or path of the command-line interface. NULL indicates to use the default in multiqc().

    • args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in multiqc().

  • tximport: Named list with components:

    • run: Logical indicating whether to summarize transcript- or gene-level estimates for downstream analysis. See tximport(). If TRUE, expects a directory parentDir/study/salmon_output containing directories of quantification results from salmon, and saves results to parentDir/study/data/tximport_output.qs. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.

    • tx2gene: Optional named list with components:

      • dataset: String indicating Ensembl gene dataset. See getTx2gene().

      • version: Number indicating Ensembl version. See getTx2gene().

      If specified, saves a file parentDir/study/data/tx2gene.csv.

    • countsFromAbundance: String indicating whether or how to estimate counts using estimated abundances. See tximport::tximport().

    • ignoreTxVersion: Logical indicating whether to ignore the version suffix on transcript ids. NULL indicates to use TRUE. See tximport::tximport().
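
The include and exclude components of metadata can be read as a two-stage row filter. The following is a minimal sketch of that described behavior in base R, not the package's internal code. The metadata table, the 'tissue' column, and the helper filterMetadata() are hypothetical and exist only for illustration.

```r
# Hypothetical metadata table; only 'sample_accession' is a column name taken
# from the documentation, the rest is made up for illustration.
metadata = data.frame(
  sample_accession = c('S1', 'S2', 'S3', 'S4'),
  tissue = c('liver', 'liver', 'kidney', 'lung'),
  stringsAsFactors = FALSE)

include = list(colname = 'tissue', values = c('liver', 'kidney'))
exclude = list(colname = 'sample_accession', values = 'S2')

# Hypothetical helper: keep rows matched by include, then drop rows matched
# by exclude. Either argument may be NULL, since both are optional.
filterMetadata = function(metadata, include = NULL, exclude = NULL) {
  keep = rep(TRUE, nrow(metadata))
  if (!is.null(include)) {
    keep = keep & metadata[[include$colname]] %in% include$values
  }
  if (!is.null(exclude)) {
    # Applied last, so exclude supersedes include.
    keep = keep & !(metadata[[exclude$colname]] %in% exclude$values)
  }
  metadata[keep, , drop = FALSE]
}

filtered = filterMetadata(metadata, include, exclude)
```

Here sample S2 matches include but is dropped anyway, because exclude takes precedence.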

params can be derived from a yaml file; see vignette('introduction', package = 'seeker'). The yaml representation of params will be saved to parentDir/params$study/data/params.yml.
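
As an illustration, params could also be written directly as a nested list in R. This is a sketch under stated assumptions: the study name, bioproject accession, index directory, and Ensembl dataset and version are placeholders rather than values from a real study. Components documented as accepting NULL are left out here on the assumption that an absent component is treated like NULL; set them explicitly if in doubt.

```r
# All values below are placeholders for illustration.
params = list(
  study = 'my_study',
  metadata = list(
    run = TRUE,
    bioproject = 'PRJNA000000'),  # placeholder accession
  fetch = list(run = TRUE),
  trimgalore = list(run = TRUE),
  fastqc = list(run = TRUE),
  salmon = list(
    run = TRUE,
    indexDir = 'path/to/salmon_index'),  # placeholder path
  multiqc = list(run = TRUE),
  tximport = list(
    run = TRUE,
    tx2gene = list(
      dataset = 'mmusculus_gene_ensembl',  # assumed dataset name
      version = 104),                      # assumed version
    countsFromAbundance = 'lengthScaledTPM',
    ignoreTxVersion = TRUE))

# With the package and the external tools installed, the call would be:
# seeker::seeker(params, parentDir = '.')
```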

parentDir

Directory in which to store the output, which will be a directory named according to params$study.
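
To make the resulting layout concrete, the sketch below assembles the output paths named in the descriptions above, using base R only. The study name is a hypothetical value of params$study. Which paths actually exist after a run depends on which steps have run set to TRUE.

```r
parentDir = '.'
study = 'my_study'  # hypothetical value of params$study
outputDir = file.path(parentDir, study)

# Paths as described in the argument documentation above.
expectedPaths = c(
  params = file.path(outputDir, 'data', 'params.yml'),
  metadata = file.path(outputDir, 'data', 'metadata.csv'),
  fetch = file.path(outputDir, 'fetch_output'),
  trimgalore = file.path(outputDir, 'trimgalore_output'),
  fastqc = file.path(outputDir, 'fastqc_output'),
  salmon = file.path(outputDir, 'salmon_output'),
  salmonMetaInfo = file.path(outputDir, 'data', 'salmon_meta_info.csv'),
  multiqc = file.path(outputDir, 'multiqc_output'),
  tx2gene = file.path(outputDir, 'data', 'tx2gene.csv'),
  tximport = file.path(outputDir, 'data', 'tximport_output.qs'))

# After a run, file.exists(expectedPaths) would show which outputs are present.
```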

Value

NULL, invisibly.

See also