Analyzing Somatic Variants

Analyze somatic variants, including cancer-specific filtering, visualization, and variant landscape exploration in the Cohort Browser.

An Apollo license is required to use Cohort Browser on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

Explore and analyze datasets with somatic variant assays by opening them in the Cohort Browser and switching to the Somatic Variants tab. You can create cohorts based on somatic variants, visualize variant patterns, and examine detailed variant information.

You can analyze somatic variants across four main categories: Single Nucleotide Variants (SNVs) & Indels for small genomic changes, Copy Number Variants (CNVs) for alterations in gene copy numbers, Fusions for structural rearrangements involving gene coding sequences, and Structural Variants (SVs) for larger genomic rearrangements.

Somatic assay datasets are created using the Somatic Variant Assay Loader.

Variant Classification

The somatic data model classifies all genomic variants into four main classes, defined by their size, structure, and representation in VCF files. Each variant type has specific criteria that must be met for classification.

Variant Type: SNV & Indel
Description: Single base substitutions and small insertions/deletions with precise allele sequences
Classification Criteria (all must match):
  • Variant size ≤ 50bp
  • ALT field contains a precise allele (NOT a symbolic one like <DEL>, <INS>, <DUP>, <CNV>)
Examples: A→G, ATCG→A, A→ATCG

Variant Type: Copy Number Variant (CNV)
Description: Changes in gene copy number
Classification Criteria (all must match):
  • ALT field contains a symbolic allele (<CNV>, <DEL>, <DUP>)
  • Explicit copy number value present in FORMAT field key CN
Examples: <CNV>, <DEL>, <DUP>

Variant Type: Fusion
Description: Structural rearrangements involving gene coding sequences
Classification Criteria (all must match):
  • ALT field contains breakend notation with square brackets ([ or ])
  • At least one breakpoint overlaps with an annotated gene or transcript
Examples: [chr2:123456[, ]chr5:789012]

Variant Type: Structural Variant (SV)
Description: Large or complex structural changes
Classification Criteria (either must match):
  • Variant length > 50bp
  • ALT field contains a symbolic allele (<DEL>, <INV>, <CNV>, <BND>)
Examples: <DEL>, <INV>, large insertions
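
The classification criteria above can be read as a small decision procedure. The following Python sketch illustrates that reading and is not the Platform's implementation. The function name classify_variant and its parameters are hypothetical. It assumes that the size of a precise allele is the longer of REF and ALT, that the length of a symbolic allele is supplied by the caller, and that breakend notation counts like the <BND> symbolic allele for the SV rule.

```python
import re

# Symbolic alleles are written in angle brackets, e.g. <DEL>.
SYMBOLIC_ALT = re.compile(r"^<[^<>]+>$")
CNV_ALTS = {"<CNV>", "<DEL>", "<DUP>"}
SMALL_VARIANT_MAX_BP = 50


def classify_variant(ref, alt, format_keys=(), overlaps_gene=False, length_bp=None):
    """Return the set of classes a single-ALT record falls into (illustrative only)."""
    is_symbolic = bool(SYMBOLIC_ALT.match(alt))
    is_breakend = "[" in alt or "]" in alt
    is_precise = not (is_symbolic or is_breakend)
    if length_bp is None:
        # Assumption: for precise alleles, size is the longer of REF and ALT.
        length_bp = max(len(ref), len(alt)) if is_precise else 0

    classes = set()
    # SNV & Indel: both criteria must match.
    if is_precise and length_bp <= SMALL_VARIANT_MAX_BP:
        classes.add("SNV & Indel")
    # CNV: symbolic allele and an explicit CN key in FORMAT.
    if alt in CNV_ALTS and "CN" in format_keys:
        classes.add("CNV")
    # Fusion: breakend notation and a breakpoint overlapping a gene or transcript.
    if is_breakend and overlaps_gene:
        classes.add("Fusion")
    # SV: either criterion is enough. Breakends are treated like <BND> (assumption).
    if length_bp > SMALL_VARIANT_MAX_BP or is_symbolic or is_breakend:
        classes.add("SV")
    return classes


print(classify_variant("A", "G"))
print(classify_variant("N", "<DUP>", format_keys=("GT", "CN"), length_bp=2000))
```

Returning a set rather than a single label reflects that one record can satisfy the criteria of more than one class, for example a <DUP> with a CN value.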

Somatic Variants in Cohort Browser

  • CNVs and Fusions are also classified as Structural Variants in the Cohort Browser because they use symbolic allele representations (<CNV>, <DEL>, <DUP>, <BND>). This dual classification ensures they are correctly distinguished from SNVs regardless of their physical length.

  • For optimal performance and annotation scalability, the Cohort Browser processes SVs and CNVs between 50bp and 10Mbp differently than larger variants:

    • SVs and CNVs ≤ 10Mbp: Fully annotated with gene symbols and consequences, appear in all visualizations including the Variant Frequency Matrix

    • SVs and CNVs > 10Mbp: Ingested and visible in the Variants & Events table but lack gene-level annotations. These larger variants do not appear in the Variant Frequency Matrix and cannot be filtered using gene symbols or consequence terms. Use genomic coordinates or variant IDs to filter for these variants (see Working with Large Structural Variants below).

    • Fusions are not affected by this size limit because they are treated as two single-position events.
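
As a quick illustration of the size rule above, the check below decides whether a variant would carry gene-level annotations. This is a sketch only: the function name has_gene_annotations is hypothetical, and 10Mbp is taken to mean 10,000,000 base pairs.

```python
ANNOTATION_LIMIT_BP = 10_000_000  # 10Mbp, read as 10,000,000 bp (assumption)


def has_gene_annotations(variant_class, length_bp):
    """Return True if the variant is expected to have gene-level annotations."""
    # Only SVs and CNVs are subject to the size limit; Fusions are exempt.
    if variant_class in {"SV", "CNV"}:
        return length_bp <= ANNOTATION_LIMIT_BP
    return True


print(has_gene_annotations("CNV", 12_500_000))  # a CNV above the limit
```

Variants for which this returns False are the ones to filter by genomic coordinates or variant IDs instead of gene symbols.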

Filtering by Somatic Variants

You can define your cohort to include only samples with specific somatic variants.

To apply a somatic filter to your cohort:

  1. For the cohort you want to edit, click Add Filter.

  2. In Add Filter to Cohort > Assays > Variant (Somatic), select a genomic filter.

  3. In Edit Filter: Variant (Somatic), specify the criteria:

    • For datasets with multiple somatic variant assays, select the specific assay to filter by.

    • Choose whether to include patients with at least one detected variant matching the specified criteria (WITH Variant), or include only patients who have no detected variants matching the criteria (WITHOUT Variant). By default, the filter includes those with matching variants. This choice applies to all specified filtering criteria.

    • On the Genes / Effects tab, select variants of specific types and variant consequences within specified genes and genomic ranges. You can specify up to 5 genes or genomic ranges in a comma-separated list.

    • On the HGVS tab, specify a particular HGVS DNA or HGVS protein notation, preceded by a gene symbol. Example: KRAS p.Arg1459Ter.

    • On the Variant IDs tab, specify variant IDs using the standard format chr_pos_ref_alt (for example, 17_7674257_A_G). You can enter up to 10 variant IDs in a comma-separated list.

    • To enter multiple genes, ranges, or variants, separate them with commas or place each on a new line.

  4. Click Apply Filter.

Adding a somatic variant filter

You can specify up to 10 somatic variant filters for each cohort.
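
If you assemble variant ID lists outside the browser, it can help to check them before pasting them into the Variant IDs tab. The sketch below is a hypothetical helper (parse_variant_ids is not a Platform function). It assumes chromosome names contain no underscores and applies the 10-ID limit and the chr_pos_ref_alt format described in the steps above.

```python
MAX_VARIANT_IDS = 10  # limit stated for the Variant IDs tab


def parse_variant_ids(text):
    """Split a comma- or newline-separated list and validate each chr_pos_ref_alt ID."""
    ids = [item.strip() for item in text.replace("\n", ",").split(",") if item.strip()]
    if len(ids) > MAX_VARIANT_IDS:
        raise ValueError(f"{len(ids)} IDs given; at most {MAX_VARIANT_IDS} are accepted")
    parsed = []
    for variant_id in ids:
        # Split on the first three underscores only, so ALT can be symbolic (e.g. <DEL>).
        parts = variant_id.split("_", 3)
        if len(parts) != 4 or not parts[1].isdigit():
            raise ValueError(f"not in chr_pos_ref_alt form: {variant_id!r}")
        chrom, pos, ref, alt = parts
        parsed.append((chrom, int(pos), ref, alt))
    return parsed


print(parse_variant_ids("17_7674257_A_G"))
```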

Working with Large Structural Variants (>10Mbp)

Structural variants larger than 10 megabases lack gene-level annotations, which limits how you can filter and visualize them. Use these alternative filtering approaches:

  • Filter by genomic coordinates: In the Genes / Effects filter, enter genomic coordinates in the format chr:start-end, for example, 17:7661779-7687538 for the TP53 gene region. Set the variant type scope to SV or CNV and leave consequence types blank. To find a gene's coordinates, click the search icon next to the Variants & Events table and type the gene symbol.

  • Filter by variant IDs: In the Variant IDs filter, enter up to 10 variant IDs in the format chr_pos_ref_alt, for example, 17_7674257_A_<DEL>. To get variant IDs, navigate to the gene region in the Variants & Events table, select variants of interest, and download the CSV file. The Location column contains the variant IDs.

Large structural variants are visible in the Variants & Events table with full details, but they do not appear in the Variant Frequency Matrix due to missing gene-level annotations.
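
To reason about which large variants a coordinate filter would capture, you can reproduce the chr:start-end format offline. The sketch below is illustrative: parse_region and overlaps are hypothetical names, and it assumes that a variant matches a region when the two intervals overlap, which the text above does not specify.

```python
def parse_region(region):
    """Parse 'chr:start-end' into (chrom, start, end) with integer coordinates."""
    chrom, _, span = region.partition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)  # raises ValueError if malformed
    if not chrom or start > end:
        raise ValueError(f"invalid region: {region!r}")
    return chrom, start, end


def overlaps(region, chrom, start, end):
    """Return True if the variant interval shares at least one base with the region."""
    r_chrom, r_start, r_end = parse_region(region)
    return chrom == r_chrom and start <= r_end and end >= r_start


# A hypothetical 19Mbp deletion spanning the TP53 region used in the example above.
print(overlaps("17:7661779-7687538", "17", 1_000_000, 20_000_000))
```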

Comparing Variant Patterns Across Your Cohort

The Variant Frequency Matrix provides a visual overview of how often somatic variants appear throughout your cohort. The matrix helps you identify variant patterns across tumor samples and discover which variants frequently occur together. You can also measure the mutation burden in different genes and compare how mutation profiles differ between two cohorts. This makes it easier to spot trends and relationships in your data that might not be apparent when examining individual variants.

Variant Frequency Matrix filtered to SNVs & Indels

In the Variant Frequency Matrix, rows represent genes and columns represent samples. Both are sorted by variant frequency.

  • Sorted gene list: Genes are ranked from most to least frequently affected by variants. A sample is considered "affected" by a gene if it is a tumor sample with at least one detected variant of high or moderate impact in that gene's canonical transcript. Matched normal samples are not included in this calculation.

  • Sorted sample list: Samples are ordered by the total number of genes that contain variants. This ranking is independent of how frequently each individual gene is affected.

The Variant Frequency Matrix displays up to the top 50 genes with the most variants and up to 500 samples for any given cohort. The samples shown are the 500 with the highest number of genes containing variants. If your cohort has fewer than 500 samples, the matrix shows all samples.
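
The ordering described above can be sketched in a few lines of Python. This is an illustration of the ranking rules, not the browser's code. The function name matrix_axes and the input shape (tumor sample ID mapped to the set of affected genes) are assumptions, as is the alphabetical tie-breaking.

```python
from collections import Counter

MAX_GENES, MAX_SAMPLES = 50, 500  # display limits stated above


def matrix_axes(affected, max_genes=MAX_GENES, max_samples=MAX_SAMPLES):
    """Return (genes, samples) in display order.

    affected maps each tumor sample ID to the set of genes in which it has at
    least one high or moderate impact variant.
    """
    # Rows: genes ranked by the number of affected samples.
    gene_counts = Counter(gene for genes in affected.values() for gene in genes)
    genes = sorted(gene_counts, key=lambda g: (-gene_counts[g], g))[:max_genes]
    # Columns: samples ranked by how many genes contain variants.
    samples = sorted(affected, key=lambda s: (-len(affected[s]), s))[:max_samples]
    return genes, samples


cohort = {"S1": {"TP53", "KRAS"}, "S2": {"TP53"}, "S3": {"TP53", "KRAS", "EGFR"}}
print(matrix_axes(cohort))
```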

Filtering by Genes and Consequences

By default, the Variant Frequency Matrix includes all genes and samples. To narrow your view, you can filter the matrix to specific classes of somatic variants, such as SNVs & Indels, Structural Variants, CNVs, or Fusions.

Show only a specific class of somatic variants

Using the legend in the bottom right, you can focus on specific variants, events, or consequences. This lets you explore particular areas of interest, such as high-impact mutations or specific consequences relevant to your research.

Show only a specific consequence for a class of somatic variants

When comparing cohorts, the matrix can display the top 200 samples (columns) from both the primary and secondary cohorts. The top genes are selected and sorted by their variant frequency within the primary cohort.

Viewing Gene and Sample Details

The Variant Frequency Matrix is highly interactive, allowing you to quickly access more details and apply filters.

When you hover over a cell, the matrix shows a unique identifier for the sample, along with a breakdown of the variants detected in that gene, organized by their consequence type. You can copy the sample ID to your clipboard to apply it to a cohort filter.

Hovering over a cell to view specific sample details

When you hover over a gene ID on the left axis, the matrix shows more information about that gene. This includes a unique identifier for the gene, along with a quick breakdown of available external annotations, with direct links to the CIViC and OncoKB databases (when available).

To create a filter, hover over the gene and click + Add to Filter, or copy the gene ID to your clipboard for use in a custom filter.

Hovering over a gene ID to view its details

Color Coding and Consequences

The Variant Frequency Matrix uses color coding to represent the consequences of detected variants, providing a quick visual assessment of variant types. Only high and moderate impact consequences, as defined by Ensembl VEP version 109, are included in this visualization.

A cell is color-coded as "Multi Hit" when the sample has two or more detected variants in that gene, indicating a complex variant profile.

Exploring Gene-Level Mutation Patterns

The Lollipop Plot is a visualization tool that shows the somatic variants of a cohort on a single gene's canonical protein. With the Lollipop Plot, you can identify mutation hotspots within a specific gene, understand the functional impact of variants in the context of protein domains, compare mutation patterns across different patient cohorts, and explore recurrent mutations in cancer driver genes.

Use the Go to Gene field to quickly navigate to a gene of interest, such as TP53.

Lollipop Plot for the TP53 gene

When you hover over a lollipop, you can see details about the amino acid change, such as the HGVS notation and the frequency of that change in the current cohort. The plot also shows the location of each mutation along the protein sequence, with color coding to indicate the consequence type.

The Lollipop Plot displays SNV & Indel data from the same genomic region as the Variants & Events table. When you change the genomic region in the table, the Lollipop Plot updates to reflect the change, and vice versa.

Reading the Lollipop Plot

  • Each lollipop on the plot represents amino acid changes at a specific location.

  • The horizontal position (X axis) indicates the location of the change, while the height (Y axis) represents the frequency of that change within the current cohort.

  • Lollipops are color-coded by consequence based on the canonical transcript.

  • If a lollipop represents multiple consequence types, it is coded as "Multi Hit".

  • You can identify mutation hotspots for a given gene and see protein changes in HGVS short form notation, such as T322A, and HGVS.p notation, such as p.Thr322Ala.
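
The two protein notations mentioned above are related by the standard three-letter to one-letter amino acid codes. The sketch below converts a simple substitution from HGVS.p notation to short form. The function name to_short_form is hypothetical, and the sketch deliberately handles only single substitutions, not frameshifts, deletions, or other HGVS forms.

```python
import re

# Standard amino acid codes; Ter (stop) is written as * in short form.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_short_form(hgvs_p):
    """Convert a substitution such as p.Thr322Ala to short form such as T322A."""
    match = SUBSTITUTION.match(hgvs_p)
    if not match:
        raise ValueError(f"not a simple substitution: {hgvs_p!r}")
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {hgvs_p!r}")
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


print(to_short_form("p.Thr322Ala"))  # T322A
```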

Examining Detailed Variant Information

The Variants & Events table displays details on the same genomic region as the Lollipop Plot. You can filter the table to focus on specific variant types, such as SNVs & Indels, SVs (Structural Variants), CNVs, or Fusions.

Unlike the Variant Frequency Matrix, the Variants & Events table displays all structural variants including those larger than 10Mbp. Use this table to examine large SVs that may not appear in other visualizations.

Information displayed in the Variants & Events table includes:

  • Location of variant, with a link to its locus details

  • Reference allele of variant

  • Alternate allele of variant

  • Type of variant, such as SNV, Indel, or Structural Variant

  • Variant consequences, with entries color-coded by level of severity

  • HGVS cDNA

  • HGVS Protein

  • COSMIC ID

  • RSID, with a link to the dbSNP entry for the variant

Variants & Events for the TP53 gene

Exporting Variant Information

You can export the selected variants in the Variants & Events table as a list of variant IDs or a CSV file.

  • To copy a comma-separated list of variant IDs to your clipboard, select the set of IDs you want to copy, and click Copy.

  • To export variants as a CSV file, select the set of IDs you need, and click Download (.csv file).
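
Once you have a downloaded CSV file, you can pull the variant IDs back out for reuse in a filter. The sketch below uses Python's standard csv module and relies on the Location column holding the variant IDs. The function name variant_ids_from_csv is hypothetical, and the sample rows, including the Type column header, are invented for illustration; the real export may use different headers for its other columns.

```python
import csv
import io


def variant_ids_from_csv(csv_file, column="Location"):
    """Return the non-empty values of the given column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]


# Invented sample content standing in for a downloaded export.
sample = io.StringIO("Location,Type\n17_7674257_A_G,SNV\n17_7674257_A_<DEL>,SV\n")
ids = variant_ids_from_csv(sample)
# Join the first 10 IDs into a comma-separated list for the Variant IDs tab.
print(", ".join(ids[:10]))
```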

Accessing External Annotations and Resources

In Variants & Events > Location column, you can click on the specific location to open the locus details.

The locus details show specific SNV & Indel variants as well as up to 200 structural variants overlapping with the specific location. For canonical transcripts, a blue indicator appears next to the transcript ID, identifying the primary transcript annotations.

Showing locus details for a specific somatic variant

The locus details include links to external annotation resources:

  • Gene-level links - Direct links to gene information in external databases

  • Variant-level links - Links to variant-specific annotation resources

These links allow you to quickly navigate to external annotation resources for further information about genes or variants of interest.
