Transcriptomic Expression Quantification Workflow Walkthrough
Last updated
Copyright 2024 DNAnexus
Transcriptomic Expression Quantification (TEQ) is an end-to-end RNA-seq workflow which accepts unmapped transcript reads (FASTQ format) as input and generates a gene or transcript quantification matrix as output. A user may choose from five pipeline configurations when launching the workflow, each providing different combinations of tools for read mapping and quantification. These five pipelines give the user the freedom to perform genome- or transcriptome-based alignment and gene- or transcript-level quantification. The five configurations consist of the following tool combinations:
STAR
STAR + Salmon
STAR + RSEM
Kallisto
Salmon
The following walkthrough demonstrates an example run of the workflow using the “STAR + RSEM” pipeline, launched either through the DNAnexus Platform site (Graphical User Interface, GUI) or through the SDK (Command Line Interface, CLI).
From the GUI, the configurations described in this tutorial can be found in the Tools Library (https://platform.dnanexus.com/panx/tools) by searching for “TEQ.” To search by name, click the button labeled “Any name” in the upper left of the screen.
All five configurations have a mandatory analysis input field called "alignment reference". The "quantification reference" field is an additional requirement for both the “STAR + Salmon” and “STAR + RSEM” configurations. Note that alignment tools such as STAR or Kallisto refer to these references as a reference index in their own nomenclature.
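The reference requirements described above can be summarized as a small lookup. This is only an illustrative sketch: the pipeline names and field labels come from this walkthrough, while the function and variable names are hypothetical and are not part of the TEQ workflow or dx-toolkit.

```python
# Illustrative only: encodes which reference inputs each TEQ configuration
# needs, as described in this walkthrough. All names here are hypothetical.
PIPELINES = ["STAR", "STAR + Salmon", "STAR + RSEM", "Kallisto", "Salmon"]

# Configurations that need a "quantification reference" in addition to the
# mandatory "alignment reference".
NEEDS_QUANTIFICATION_REFERENCE = {"STAR + Salmon", "STAR + RSEM"}


def required_references(pipeline):
    """Return the reference input fields required for a given configuration."""
    if pipeline not in PIPELINES:
        raise ValueError(f"Unknown pipeline configuration: {pipeline!r}")
    references = ["alignment reference"]
    if pipeline in NEEDS_QUANTIFICATION_REFERENCE:
        references.append("quantification reference")
    return references


if __name__ == "__main__":
    for name in PIPELINES:
        print(f"{name}: {', '.join(required_references(name))}")
```

A lookup like this can be handy in a launch script to fail early when a required reference has not been prepared yet.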
The following DNAnexus Platform Apps may be used to generate the corresponding references (see table below). If the pipeline of choice requires an "alignment reference" as well as a "quantification reference", make sure to use the corresponding releases for any genome, transcriptome, and genome annotation (GTF) file combinations when generating references. For example, if the genome reference is from “GENCODE release 40”, then the transcriptome file should also be sourced from “GENCODE release 40”. Once the "alignment reference" and "quantification reference" inputs have been prepared, these files may be reused for future workflow runs.
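One way to guard against mixing releases is to record where each source file came from and compare the labels before generating references. The sketch below assumes you track a release label per file yourself; the function name and the dictionary keys are hypothetical, and nothing here inspects the files themselves.

```python
# Illustrative only: checks that every source file used to build references
# is recorded as coming from the same release. Labels are supplied by the
# user; this sketch does not read or validate the files.
def check_same_release(sources):
    """Return the shared release label, or raise if the labels differ.

    sources: dict mapping a file role (e.g. "genome") to a release label.
    """
    releases = set(sources.values())
    if len(releases) != 1:
        details = ", ".join(f"{role}={label}" for role, label in sorted(sources.items()))
        raise ValueError(f"Reference sources do not share one release: {details}")
    return releases.pop()


if __name__ == "__main__":
    release = check_same_release({
        "genome": "GENCODE release 40",
        "transcriptome": "GENCODE release 40",
        "annotation_gtf": "GENCODE release 40",
    })
    print(f"All reference sources come from {release}")
```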
Reference input requirements for each pipeline and the corresponding DNAnexus app to be used for generating the reference are listed below.
STAR + Salmon
Note: All transcripts are required in the Transcript FASTA file. See documentation in the App, Salmon Quantification.
STAR + RSEM
To run the STAR + RSEM pipeline, the star_generate_genome_index app is used to create the input for "alignment reference". If running STAR or STAR + Salmon pipelines, the same procedure is followed to generate the Alignment Reference. The Quantification Reference for the STAR + RSEM pipeline is prepared using the rsem_prepare_genome app.
To generate a compatible "alignment reference" for TEQ, prepare the genome index using a genome file and a genome annotation file in GTF format. Download the relevant files from Ensembl (https://www.ensembl.org/Homo_sapiens/Info/Index) for this guide. Other sources are fine, as long as the genome and annotation files are compatible. Use the URL Fetcher App (https://platform.dnanexus.com/app/url_fetcher) to download the genome and the corresponding GTF from Ensembl. You can use the GUI to run this app (see screenshot below), or simply upload the files to the project on the platform from a local machine. Genome and GTF files may be downloaded from here and here, respectively (Ensembl release 106).
Now the STAR Generate Genome Index app is ready to run. Simply map the previously downloaded files to the "reference genome" and "transcript annotations" inputs of the app (see below for the UI screenshot).
Next, prepare the Quantification Reference for the pipeline’s quantification engine, which is RSEM in this case. The procedure and inputs are the same as for STAR index generation; the only difference is that the RSEM Prepare Genome app (https://platform.dnanexus.com/app/rsem_prepare_genome) is used instead, with the same genome and GTF files downloaded earlier from Ensembl.
Now the necessary inputs for the STAR + RSEM pipeline have been generated and the analysis can begin. From the GUI, you can find the TEQ workflow in the Tools Library. The TEQ workflow is region-specific, so select the workflow matching the account region.
Next, provide the necessary inputs for running the STAR + RSEM pipeline, including the "alignment reference", "quantification reference", and "transcript annotation" files previously downloaded and prepared. Using a Transcript Annotation (GTF file) other than the one used to prepare the "alignment reference" and "quantification reference" is not recommended and may result in errors.
In this example, a paired-end analysis is being conducted, so both “reads” and “reads 2” are necessary inputs. All FASTQ files in both “reads” and “reads 2” should always come from a single sample. In the case of multi-FASTQ analysis, ensure that the FASTQs are from the same sample and not multiple different samples. For multi-sample analysis, please use Batch Job functionality.
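Before launching a multi-FASTQ paired-end run, it can help to check locally that the files split cleanly into “reads” and “reads 2” and that they all belong to one sample. The sketch below assumes a hypothetical naming convention of the form `<sample>_L<lane>_R1.fastq.gz` / `<sample>_L<lane>_R2.fastq.gz`; TEQ does not require this convention, so adjust the pattern to match your own file names.

```python
import re

# Hypothetical naming convention: <sample>_L<lane>_R<1|2>.fastq.gz
# (for example sampleA_L001_R1.fastq.gz). This is an assumption made for
# the sketch, not a requirement of the workflow.
FASTQ_NAME = re.compile(r"^(?P<sample>[^_]+)_L(?P<lane>\d+)_R(?P<mate>[12])\.fastq\.gz$")


def split_paired_fastqs(filenames):
    """Split file names into (reads, reads2) and verify they form one sample."""
    reads, reads2, samples = [], [], set()
    for name in sorted(filenames):
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Unrecognized FASTQ file name: {name}")
        samples.add(match.group("sample"))
        if match.group("mate") == "1":
            reads.append(name)
        else:
            reads2.append(name)
    if len(samples) != 1:
        raise ValueError(f"Expected exactly one sample, found: {sorted(samples)}")
    if len(reads) != len(reads2):
        raise ValueError("Unequal number of R1 and R2 files")
    return reads, reads2


if __name__ == "__main__":
    r1, r2 = split_paired_fastqs([
        "sampleA_L002_R2.fastq.gz",
        "sampleA_L001_R1.fastq.gz",
        "sampleA_L001_R2.fastq.gz",
        "sampleA_L002_R1.fastq.gz",
    ])
    print("reads:  ", r1)
    print("reads 2:", r2)
```

Sorting the names first keeps the two lists in matching lane order, so the first file in “reads” corresponds to the first file in “reads 2”.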
Select "STAR + RSEM" as input in the "Common" section. Other settings may be left “as is” (default). To generate a more comprehensive QC report, enable the “perform mapping QC” option at the bottom of the input section.
Below are the commands to run this analysis from the CLI using dx-toolkit.
Fetch reference data using the app URL Fetcher:
Next, generate an index file using the app STAR Generate Genome Index:
Prepare the genome for RSEM using the app RSEM Prepare Genome:
Finally, run the global workflow Transcriptomic Expression Quantification:
TEQ is available in the following regions. Note that the workflow name differs across regions, which matters especially when calling the workflow from the CLI.
transcriptomic_expression_quantification: AWS US East
transcriptomic_expression_quantification_ap: AWS Asia Pacific - Sydney
transcriptomic_expression_quantification_eu: AWS Europe - Frankfurt
transcriptomic_expression_quantification_eu_west_2_g: AWS Europe - London
transcriptomic_expression_quantification_azure_eu: Azure Europe
transcriptomic_expression_quantification_azure_us: Azure US
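When scripting launches across regions, the name lookup above can be encoded once and reused. The sketch below maps each region to its workflow name and assembles an argument list for `dx run`. The region-to-name pairs come from the list above; the `globalworkflow-` prefix and the input field names in the usage example are assumptions for illustration only, so confirm the real input names with `dx run <workflow> -h` before use.

```python
# Region-to-workflow-name pairs as listed in this walkthrough.
WORKFLOW_BY_REGION = {
    "AWS US East": "transcriptomic_expression_quantification",
    "AWS Asia Pacific - Sydney": "transcriptomic_expression_quantification_ap",
    "AWS Europe - Frankfurt": "transcriptomic_expression_quantification_eu",
    "AWS Europe - London": "transcriptomic_expression_quantification_eu_west_2_g",
    "Azure Europe": "transcriptomic_expression_quantification_azure_eu",
    "Azure US": "transcriptomic_expression_quantification_azure_us",
}


def teq_workflow_name(region):
    """Return the TEQ workflow name for a region listed above."""
    try:
        return WORKFLOW_BY_REGION[region]
    except KeyError:
        raise ValueError(f"TEQ is not listed for region: {region!r}") from None


def build_dx_run_command(region, inputs, prefix="globalworkflow-"):
    """Assemble (but do not execute) a `dx run` argument list.

    The default prefix is an assumption about how the global workflow is
    addressed; `inputs` maps input field names to values and is passed
    through unchanged as -i<name>=<value> arguments.
    """
    command = ["dx", "run", prefix + teq_workflow_name(region)]
    for name, value in inputs.items():
        command.append(f"-i{name}={value}")
    return command


if __name__ == "__main__":
    # The input names below are placeholders, not the workflow's real fields.
    example_inputs = {"example_input_a": "file-xxxx", "example_input_b": "file-yyyy"}
    print(" ".join(build_dx_run_command("AWS Europe - London", example_inputs)))
```

Building the command as a list and printing it first makes it easy to review the exact call before handing it to a shell or to `subprocess.run`.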