Analyzing Germline Variants

Analyze germline genomic variants, including filtering, visualization, and detailed variant annotation in the Cohort Browser.

An Apollo license is required to use the Cohort Browser on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

Explore and analyze datasets with germline data by opening them in the Cohort Browser and switching to the Germline Variants tab. You can create cohorts based on germline variants, visualize variant patterns, and examine detailed variant information.

Filtering by Germline Variants

You can define your cohort to include only samples with specific germline variants.

To apply a germline filter to your cohort:

  1. For the cohort you want to edit, click Add Filter.

  2. In Add Filter to Cohort > Assays > Genomic Sequencing, select a genomic filter.

  3. In Edit Filter: Variant (Germline), specify your filtering criteria:

    • For datasets with multiple germline variant assays, select the specific assay to filter by.

    • On the Genes / Effects tab, select variants of specific types and variant consequences within the specified genes and/or genomic ranges. You can specify up to 5 genes or genomic ranges in a comma-separated list.

    • On the Variant IDs tab, specify a list of variant IDs, with a maximum of 100 variants.

    • To enter multiple genes, genomic ranges, or variants, separate them with commas or place each on a new line.

  4. Click Apply Filter.

Adding a germline filter
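If you prepare gene, range, or variant ID lists outside the Cohort Browser before pasting them into the filter, a small helper can check them against the limits described above. The following is a minimal sketch, not part of the DNAnexus Platform: the function name and constants are hypothetical, and it only mirrors the stated rules (comma or newline separators, up to 5 genes or genomic ranges, up to 100 variant IDs).

```python
import re

# Limits taken from the filter description above.
MAX_GENES_OR_RANGES = 5
MAX_VARIANT_IDS = 100


def parse_filter_entries(raw, limit):
    """Split pasted text on commas or newlines and enforce an entry limit.

    Hypothetical helper for preparing input; it does not call any
    DNAnexus API.
    """
    # Split on any run of commas and newlines, then trim whitespace.
    entries = [part.strip() for part in re.split(r"[,\n]+", raw)]
    # Drop empty strings left over from trailing separators.
    entries = [entry for entry in entries if entry]
    if len(entries) > limit:
        raise ValueError(
            f"{len(entries)} entries supplied, but the limit is {limit}"
        )
    return entries


genes = parse_filter_entries("BRCA1, BRCA2\nTP53", MAX_GENES_OR_RANGES)
```

Pass MAX_VARIANT_IDS as the limit when checking a list intended for the Variant IDs tab.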

Exploring Variant Patterns in Your Cohort

The Germline Variants tab includes a lollipop plot that displays allele frequencies for variants in a specified genomic region. This visualization helps you identify patterns in germline variants across your cohort and understand how allele frequencies are distributed.

Genomic Variant Browser and Details

If your dataset contains multiple germline variant assays, such as WES and WGS assays, you can choose the assay to visualize at the top of the dashboard. The Cohort Browser displays data from only one assay at a time. When you switch between assays, your charts and their display settings are preserved.

Examining Variant Annotations

The allele table, located below the lollipop plot, shows the same variants in a tabular format with comprehensive annotation information. Use it to examine specific variant characteristics and to compare allele frequencies in your selected cohort, in the entire dataset, and in annotation databases such as gnomAD.

The annotation information includes:

  • Type: Whether the variant is an SNP, deletion, insertion, or mixed.

  • Consequences: The impact of the variant according to SnpEff. For variants with multiple gene annotations, this column displays the most severe consequence per gene.

  • Population Allele Frequency: Allele frequency calculated across the entire dataset from which the cohort is created.

  • Cohort Allele Frequency: Allele frequency calculated across the current cohort selection.

  • GnomAD Allele Frequency: Allele frequency of the specified allele in the public gnomAD dataset.
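To make the difference between the population and cohort frequencies concrete, the following minimal sketch computes both from a toy set of diploid genotypes. It assumes that allele frequency is the count of the allele divided by the number of called alleles, with no-calls excluded; the Cohort Browser's exact handling of no-calls is not described here, so treat that as an assumption. The sample names and genotypes are invented for illustration.

```python
def allele_frequency(genotypes, allele):
    """Return the fraction of called alleles equal to `allele`.

    `genotypes` is an iterable of (allele, allele) tuples; None marks
    a no-call. Excluding no-calls is an assumption of this sketch.
    """
    called = [a for gt in genotypes for a in gt if a is not None]
    if not called:
        return 0.0
    return called.count(allele) / len(called)


# Hypothetical dataset: sample ID -> genotype at one location.
dataset = {
    "s1": ("C", "A"),
    "s2": ("C", "C"),
    "s3": ("A", "A"),
    "s4": (None, None),  # no-call
}
cohort_samples = ["s1", "s3"]

# Population frequency uses every sample in the dataset.
population_af = allele_frequency(dataset.values(), "A")
# Cohort frequency uses only the samples in the current selection.
cohort_af = allele_frequency((dataset[s] for s in cohort_samples), "A")
```

In this toy example the A allele is more frequent in the cohort than in the full dataset, which is the kind of contrast the two columns are meant to surface.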

If canonical transcript information is available, the following three columns with additional annotation information appear in the table:

  • Consequences (Canonical Transcript): Canonical effects for each associated gene, according to SnpEff.

  • HGVS DNA (Canonical Transcript): HGVS (DNA) standard terminology for each gene associated with this variant.

  • HGVS Protein (Canonical Transcript): HGVS (Protein) standard terminology for each gene associated with this variant.

Exporting Variant Metadata

You can export the selected variants in the table as a list of variant IDs or a CSV file.

  • To copy a comma-separated list of variant IDs to your clipboard, select the set of IDs you want to copy, and click Copy.

  • To export variants as a CSV file, select the set of IDs you need, and click Download (.csv file).
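Once downloaded, the CSV file can be processed with standard tooling. The sketch below uses Python's built-in csv module to keep only the rows whose frequency falls below a threshold. The column headers and rows in SAMPLE_EXPORT are hypothetical placeholders, not the actual export layout; check the header row of your own file and adjust the column name accordingly.

```python
import csv
import io

# Hypothetical stand-in for a downloaded export. The real file's
# column headers and values may differ.
SAMPLE_EXPORT = """Location,Type,Cohort Allele Frequency
chr1:1000,SNP,0.12
chr1:2000,Deletion,0.004
"""


def rows_below_frequency(csv_text, freq_column, threshold):
    """Return the rows whose value in `freq_column` is below `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[freq_column]) < threshold]


rare = rows_below_frequency(SAMPLE_EXPORT, "Cohort Allele Frequency", 0.01)
```

To read a real file, replace io.StringIO(csv_text) with an open file handle, for example open(path, newline="").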

Accessing Detailed Variant Information

In the Location column of the allele table, click a location to open its locus details. The locus details page provides in-depth annotations and population genetics data for the selected genomic position.

When genomic data is ingested and made available in the Cohort Browser, variants are annotated using NCBI dbSNP and gnomAD. The specific version of each database is provided during the ingestion process, which creates a set of tables optimized for cohort creation through the Cohort Browser.

Viewing specific locus details

The locus details page displays three main sections of pre-calculated information from dataset ingestion: Location Summary, Genotype Distribution, and Allele Annotations. These sections provide a comprehensive view starting with a locus summary, including genotype frequencies, followed by detailed annotations for each allele.

  • Location Info provides a quick overview of the genomic locus in your dataset, including the chromosome and starting position, the frequency of both the reference allele and no-calls, and the total number of alleles available.

  • Genotypes shows a detailed breakdown of genotypes in the dataset at the specific location. Since allele order is not preserved, genotypes like C/A and A/C are counted in the same category, which is why only half of the comparison table is populated. These genotype frequencies represent the entire dataset at this location, not just your selected cohort.

  • Alleles displays detailed information for each allele, collected from dbSNP and gnomAD during data ingestion. When available, the rsID or AffyID appears with a direct link to the corresponding NCBI dbSNP page. The section provides the allele type, affected samples (dataset), and gnomAD frequency for quick reference, with additional details sorted by transcript ID. For canonical transcripts, a blue indicator appears next to the transcript ID, identifying the primary transcript annotations.
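The reason only half of the genotype comparison table is populated can be illustrated with a short sketch. It treats each genotype as an unordered pair, so C/A and A/C fall into the same category. The function names and the example genotypes are hypothetical and are not drawn from the platform.

```python
from collections import Counter


def count_unordered_genotypes(genotypes):
    """Count genotypes while ignoring allele order.

    Sorting each pair maps ("C", "A") and ("A", "C") to the same key.
    """
    return Counter(tuple(sorted(gt)) for gt in genotypes)


def unordered_category_count(n_alleles):
    """Number of distinct unordered diploid genotypes for n alleles."""
    return n_alleles * (n_alleles + 1) // 2


# Invented genotypes at a single location.
observed = [("C", "A"), ("A", "C"), ("C", "C"), ("A", "A")]
counts = count_unordered_genotypes(observed)
```

With two alleles there are three unordered categories (A/A, A/C, and C/C) rather than four ordered ones, so one triangle of the table plus its diagonal is enough to show every genotype.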

Integrating with Advanced Analysis Tools

For more sophisticated genomic analysis beyond the Cohort Browser's visualization capabilities, you can connect your variant data with other DNAnexus tools. Export variant lists for detailed analysis in JupyterLab, leverage Spark clusters for large-scale genomic computations, or connect to SQL Runner for complex queries across your dataset.
