Analyzing Somatic Variants
Analyze somatic variants, including cancer-specific filtering, visualization, and variant landscape exploration in the Cohort Browser.
Explore and analyze datasets with somatic variant assays by opening them in the Cohort Browser and switching to the Somatic Variants tab. You can create cohorts based on somatic variants, visualize variant patterns, and examine detailed variant information.
You can analyze somatic variants across four main categories: Single Nucleotide Variants (SNVs) & Indels for small genomic changes, Copy Number Variants (CNVs) for alterations in gene copy numbers, Fusions for structural rearrangements involving gene coding sequences, and Structural Variants (SVs) for larger genomic rearrangements.
Variant Classification
The somatic data model classifies all genomic variants into four main classes, defined by their size, structure, and representation in VCF files. Each variant type has specific criteria that must be met for classification.
SNV & Indel: Single base substitutions and small insertions/deletions with precise allele sequences.
All must match:
• Variant size ≤ 50bp
• ALT field contains a precise allele (not a symbolic allele such as <DEL>, <INS>, <DUP>, or <CNV>)
Examples: A→G, ATCG→A, A→ATCG

Copy Number Variant (CNV): Changes in gene copy number.
All must match:
• ALT field contains a symbolic allele (<CNV>, <DEL>, or <DUP>)
• An explicit copy number value is present in the FORMAT field under the key CN
Examples: <CNV>, <DEL>, <DUP>

Fusion: Structural rearrangements involving gene coding sequences.
All must match:
• ALT field contains breakend notation with square brackets ([ or ])
• At least one breakpoint overlaps an annotated gene or transcript
Examples: [chr2:123456[, ]chr5:789012]

Structural Variant (SV): Large or complex structural changes.
At least one must match:
• Variant length > 50bp
• ALT field contains a symbolic allele (<DEL>, <INV>, <CNV>, or <BND>)
Examples: <DEL>, <INV>, large insertions
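The classification rules above can be sketched as a small Python function to make the criteria concrete. This is a minimal sketch, not the Cohort Browser's implementation. Several details are assumptions made for illustration: the order in which the rules are checked (Fusion, then CNV, then SNV & Indel, with SV as the fallback), measuring small-variant size as the longer of REF and ALT, and the `overlaps_gene` flag, which stands in for a real gene/transcript annotation lookup.

```python
# Minimal sketch of the somatic variant classification rules described above.
# Assumptions (not taken from the Cohort Browser): the rule precedence,
# the size measure for precise alleles, and the overlaps_gene flag.

CNV_SYMBOLIC_ALLELES = {"<CNV>", "<DEL>", "<DUP>"}
MAX_SMALL_VARIANT_BP = 50


def is_symbolic(alt: str) -> bool:
    """True for angle-bracket symbolic alleles such as <DEL> or <INV>."""
    return alt.startswith("<") and alt.endswith(">")


def is_breakend(alt: str) -> bool:
    """True for VCF breakend notation, e.g. G[chr2:123456[ or ]chr5:789012]G."""
    return "[" in alt or "]" in alt


def classify(ref: str, alt: str, format_keys: set, overlaps_gene: bool) -> str:
    """Return 'Fusion', 'CNV', 'SNV & Indel', or 'SV' for one REF/ALT pair."""
    # Fusion: breakend ALT and at least one breakpoint in an annotated gene.
    if is_breakend(alt) and overlaps_gene:
        return "Fusion"
    # CNV: CNV-type symbolic ALT plus an explicit CN key in FORMAT.
    if alt in CNV_SYMBOLIC_ALLELES and "CN" in format_keys:
        return "CNV"
    # SNV & Indel: precise allele no longer than 50 bp.
    if not is_symbolic(alt) and not is_breakend(alt):
        if max(len(ref), len(alt)) <= MAX_SMALL_VARIANT_BP:
            return "SNV & Indel"
    # SV fallback: precise alleles over 50 bp, symbolic alleles without CN,
    # and (by assumption) breakends that do not overlap a gene.
    return "SV"


print(classify("A", "G", {"GT"}, overlaps_gene=False))            # SNV & Indel
print(classify("A", "<DEL>", {"GT", "CN"}, overlaps_gene=False))  # CNV
print(classify("A", "<INV>", {"GT"}, overlaps_gene=False))        # SV
```

Note that a <DEL> allele can qualify as either a CNV or an SV under the listed criteria; the sketch resolves this by checking for the CN key first, which is one plausible reading rather than documented behavior.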
Filtering by Somatic Variants
You can define your cohort to include only samples with specific somatic variants.
To apply a somatic filter to your cohort:
1. For the cohort you want to edit, click Add Filter.
2. In Add Filter to Cohort > Assays > Variant (Somatic), select a genomic filter.
3. In Edit Filter: Variant (Somatic), specify the criteria:
• For datasets with multiple somatic variant assays, select the specific assay to filter by.
• Choose whether to include patients with at least one detected variant matching the specified criteria (WITH Variant), or only patients with no detected variants matching the criteria (WITHOUT Variant). By default, the filter includes patients with matching variants. This choice applies to all specified filtering criteria.
• On the Genes / Effects tab, select variant types and variant consequences within specified genes and genomic ranges. You can specify up to 5 genes or genomic ranges.
• On the HGVS tab, specify an HGVS DNA or HGVS protein notation preceded by a gene symbol, for example, KRAS p.Gly12Asp.
• On the Variant IDs tab, specify variant IDs in the standard format chr_pos_ref_alt, for example, 17_7674257_A_G. You can enter up to 10 variant IDs.
To enter multiple genes, ranges, or variant IDs, separate them with commas or place each on a new line.
4. Click Apply Filter.
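If you prepare filter values outside the Cohort Browser (for example, from a spreadsheet), a quick local check against the limits above can save a round trip. The Python sketch below is a hypothetical helper, not part of the Cohort Browser: it splits input on commas or new lines, as the filter accepts, and checks the documented limits of 5 genes or ranges and 10 variant IDs. The regular expressions are assumptions about what valid values look like and may be stricter or looser than the product's own validation.

```python
import re

# Limits documented for the Variant (Somatic) filter.
MAX_GENES_OR_RANGES = 5
MAX_VARIANT_IDS = 10

# Assumed patterns: chr:start-end ranges and chr_pos_ref_alt variant IDs,
# where ALT may be a base sequence or a symbolic allele such as <DEL>.
RANGE_PATTERN = re.compile(r"[0-9A-Za-z]+:(\d+)-(\d+)")
VARIANT_ID_PATTERN = re.compile(r"[0-9A-Za-z]+_\d+_[ACGTN]+_(?:[ACGTN]+|<[A-Z0-9:]+>)")


def split_entries(text: str) -> list:
    """Split on commas and/or new lines, dropping empty entries."""
    return [part.strip() for part in re.split(r"[,\n]", text) if part.strip()]


def check_variant_ids(text: str) -> tuple:
    """Return (entries, problems) for the Variant IDs tab."""
    entries = split_entries(text)
    problems = []
    if len(entries) > MAX_VARIANT_IDS:
        problems.append(f"{len(entries)} IDs given; the limit is {MAX_VARIANT_IDS}")
    for entry in entries:
        if not VARIANT_ID_PATTERN.fullmatch(entry):
            problems.append(f"not in chr_pos_ref_alt format: {entry}")
    return entries, problems


def check_genes_or_ranges(text: str) -> tuple:
    """Return (entries, problems) for the Genes / Effects tab."""
    entries = split_entries(text)
    problems = []
    if len(entries) > MAX_GENES_OR_RANGES:
        problems.append(f"{len(entries)} entries given; the limit is {MAX_GENES_OR_RANGES}")
    for entry in entries:
        match = RANGE_PATTERN.fullmatch(entry)
        # Entries that are not ranges are treated as gene symbols.
        if match and int(match.group(1)) > int(match.group(2)):
            problems.append(f"range start is after end: {entry}")
    return entries, problems


ids, problems = check_variant_ids("17_7674257_A_G,\n17_7674257_A_<DEL>")
print(ids, problems)  # ['17_7674257_A_G', '17_7674257_A_<DEL>'] []
```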

Working with Large Structural Variants (>10Mbp)
Structural variants larger than 10 megabases lack gene-level annotations, which limits how you can filter and visualize them. Use these alternative filtering approaches:
• Filter by genomic coordinates: In the Genes / Effects filter, enter genomic coordinates in the format chr:start-end, for example, 17:7661779-7687538 for the TP53 gene region. Set the variant type scope to SV or CNV and leave consequence types blank. To find a gene's coordinates, click the search icon next to the Variants & Events table and type the gene symbol.
• Filter by variant IDs: In the Variant IDs filter, enter up to 10 variant IDs in the format chr_pos_ref_alt, for example, 17_7674257_A_<DEL>. To get variant IDs, navigate to the gene region in the Variants & Events table, select the variants of interest, and download the CSV file. The Location column of the CSV file contains the variant IDs.
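Because the Variant IDs filter accepts at most 10 IDs, a larger export from the Variants & Events table has to be split across several filters. The Python sketch below reads the Location column of the downloaded CSV and produces comma-separated batches of 10 that can be pasted into the filter. Only the Location column name comes from this page; the other column and the example rows are made up, and the helper name is hypothetical.

```python
import csv
import io

MAX_VARIANT_IDS = 10  # Variant IDs filter limit


def variant_id_batches(csv_text: str, batch_size: int = MAX_VARIANT_IDS) -> list:
    """Read the Location column and return comma-separated batches of IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        location = (row.get("Location") or "").strip()
        if location:
            ids.append(location)
    ids = list(dict.fromkeys(ids))  # drop duplicates, keep original order
    return [",".join(ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]


# Hypothetical export contents; a real file could be read with
# open(path, newline="").read().
example = "Location,Type\n17_7674257_A_<DEL>,SV\n17_7674257_A_G,SNV\n"
print(variant_id_batches(example))  # ['17_7674257_A_<DEL>,17_7674257_A_G']
```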
For comprehensive structural variant analysis, combine multiple filtering approaches. Use gene symbol filters to capture annotated structural variants ≤ 10Mbp, then add coordinate-based filters to include larger structural variants in the same genomic regions.
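The combined approach can be sketched as a check of which structural variants in a region need a coordinate-based filter because they exceed the 10 Mbp annotation limit. This Python sketch assumes 1-based, inclusive coordinates and treats a coordinate filter as matching any SV that overlaps the range; this page does not state the Cohort Browser's exact matching rule, so treat both as assumptions. The `StructuralVariant` type and the example SVs are hypothetical.

```python
from dataclasses import dataclass

LARGE_SV_BP = 10_000_000  # SVs larger than 10 Mbp lack gene-level annotations


@dataclass
class StructuralVariant:
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(sv: StructuralVariant, chrom: str, start: int, end: int) -> bool:
    """True if the SV shares at least one base with chrom:start-end."""
    return sv.chrom == chrom and sv.start <= end and sv.end >= start


def needs_coordinate_filter(sv: StructuralVariant) -> bool:
    """SVs above 10 Mbp are not reachable through gene symbol filters."""
    return sv.length > LARGE_SV_BP


tp53_region = ("17", 7661779, 7687538)  # TP53 region from the example above
svs = [
    StructuralVariant("17", 7_670_000, 7_680_000),   # ~10 kb: within the annotation limit
    StructuralVariant("17", 1_000_000, 20_000_000),  # ~19 Mbp: beyond the annotation limit
]
for sv in svs:
    print(overlaps(sv, *tp53_region), needs_coordinate_filter(sv))
```

Both example SVs overlap the TP53 region, but only the second one would need the coordinate-based filter to be included in the cohort.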
Large structural variants are visible in the Variants & Events table with full details, but they do not appear in the Variant Frequency Matrix due to missing gene-level annotations.
Comparing Variant Patterns Across Your Cohort
The Variant Frequency Matrix provides a visual overview of how often somatic variants appear throughout your cohort. The matrix helps you identify variant patterns across tumor samples and discover which variants frequently occur together. You can also measure the mutation burden in different genes and compare how mutation profiles differ between two cohorts. This makes it easier to spot trends and relationships in your data that might not be apparent when examining individual variants.

The Variant Frequency Matrix is interactive. You can filter by genes and consequences, view details of specific genes and samples, and zoom in on specific genes or regions.
In the Variant Frequency Matrix, rows represent genes and columns represent samples; both are sorted by variant frequency.
Sorted gene list: Genes are ranked from most to least frequently affected by variants. A gene counts as affected in a sample if the sample is a tumor sample with at least one detected high- or moderate-impact variant in that gene's canonical transcript. Matched normal samples are not included in this calculation.
Sorted sample list: Samples are ordered by the total number of genes that contain variants. This ranking is independent of how frequently each individual gene is affected.
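The two sorting rules can be sketched in Python as follows. Genes are ranked by the number of distinct tumor samples with at least one high- or moderate-impact variant in the canonical transcript, and samples by the number of such genes they carry. This is an illustrative sketch, not the product's code: the `Call` record is hypothetical, the sample ranking is assumed to use the same counted variants as the gene ranking, ties are broken alphabetically by assumption, and samples with no counted variants are simply omitted.

```python
from collections import defaultdict
from typing import NamedTuple

COUNTED_IMPACTS = {"HIGH", "MODERATE"}  # VEP impact levels counted in the matrix


class Call(NamedTuple):
    sample: str
    gene: str
    impact: str      # VEP IMPACT: HIGH, MODERATE, LOW, or MODIFIER
    canonical: bool  # variant lies in the gene's canonical transcript
    tumor: bool      # False for matched normal samples


def sort_matrix(calls):
    """Return (genes, samples) ordered as described for the matrix axes."""
    samples_per_gene = defaultdict(set)
    genes_per_sample = defaultdict(set)
    for call in calls:
        # Only tumor samples with high/moderate canonical-transcript variants count.
        if call.tumor and call.canonical and call.impact in COUNTED_IMPACTS:
            samples_per_gene[call.gene].add(call.sample)
            genes_per_sample[call.sample].add(call.gene)
    # Most affected first; ties broken alphabetically (an assumption).
    genes = sorted(samples_per_gene, key=lambda g: (-len(samples_per_gene[g]), g))
    samples = sorted(genes_per_sample, key=lambda s: (-len(genes_per_sample[s]), s))
    return genes, samples


calls = [
    Call("S1", "TP53", "HIGH", True, True),
    Call("S2", "TP53", "MODERATE", True, True),
    Call("S2", "KRAS", "HIGH", True, True),
    Call("S3", "KRAS", "LOW", True, True),    # LOW impact: not counted
    Call("N1", "TP53", "HIGH", True, False),  # matched normal: not counted
]
print(sort_matrix(calls))  # (['TP53', 'KRAS'], ['S2', 'S1'])
```

Because the two rankings are computed from separate counts, a sample's position does not depend on how frequently each of its genes is affected across the cohort, matching the description above.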
Filtering by Genes and Consequences
By default, the Variant Frequency Matrix includes all genes and samples. To narrow your view, you can filter the matrix to specific classes of somatic variants, such as SNVs & Indels, Structural Variants, CNVs, or Fusions.

Using the legend in the bottom right, you can focus on specific variants, events, or consequences. This allows you to better explore particular areas of interest, such as high-impact mutations or specific consequences relevant to your research.

Viewing Gene and Sample Details
The Variant Frequency Matrix is highly interactive, allowing you to quickly access more details and apply filters.
When you hover over a cell, the matrix shows a unique identifier for the sample, along with a breakdown of the variants detected in that gene, organized by their consequence type. You can copy the sample ID to your clipboard to apply it to a cohort filter.

When you hover over a gene ID on the left axis, the matrix shows more information about that gene. This includes a unique identifier for the gene, along with a quick breakdown of available external annotations, with direct links to the CIViC and OncoKB databases (when available).
To create a filter, hover over the gene and click + Add to Filter, or copy the gene ID to your clipboard for use in a custom filter.

Color Coding and Consequences
The Variant Frequency Matrix uses color coding to represent the consequences of detected variants, providing a quick visual assessment of variant types. Only high and moderate impact consequences, as defined by Ensembl VEP version 109, are included in this visualization.
Samples with two or more detected variants are color-coded as "Multi Hit", indicating a complex variant profile.
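The color-coding rule can be sketched as a function that picks the legend entry for a single matrix cell. This Python sketch assumes that "two or more detected variants" is counted per sample and gene cell, and only over the high- and moderate-impact variants that the matrix displays; both are interpretations of the text, not confirmed behavior. The function name and input shape are hypothetical.

```python
DISPLAYED_IMPACTS = {"HIGH", "MODERATE"}  # only these are colored in the matrix


def cell_label(variants):
    """Pick the legend entry for one sample/gene cell.

    variants: list of (consequence, impact) pairs,
    e.g. ("missense_variant", "MODERATE").
    Returns None when no high- or moderate-impact variant is present.
    """
    shown = [consequence for consequence, impact in variants if impact in DISPLAYED_IMPACTS]
    if not shown:
        return None
    # Assumed: two or more displayed variants in the cell become "Multi Hit".
    if len(shown) >= 2:
        return "Multi Hit"
    return shown[0]


print(cell_label([("missense_variant", "MODERATE"), ("stop_gained", "HIGH")]))  # Multi Hit
```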
Exploring Gene-Level Mutation Patterns
The Lollipop Plot is a visualization tool that shows the somatic variants of a cohort on a single gene's canonical protein. With the Lollipop Plot, you can identify mutation hotspots within a specific gene, understand the functional impact of variants in the context of protein domains, compare mutation patterns across different patient cohorts, and explore recurrent mutations in cancer driver genes.
Use the Go to Gene field to quickly navigate to a gene of interest, such as TP53.

When you hover over a lollipop, you can see details about the amino acid change, such as the HGVS notation and the frequency of that change in the current cohort. The plot also shows the location of each mutation along the protein sequence, with color coding to indicate the consequence type.
Reading the Lollipop Plot
Each lollipop on the plot represents amino acid changes at a specific location.
The horizontal position (X axis) indicates the location of the change, while the height (Y axis) represents the frequency of that change within the current cohort.
Lollipops are color-coded by consequence based on the canonical transcript.
If a lollipop represents multiple consequence types, it is color-coded as "Multi Hit".
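The way lollipops are built can be sketched as a grouping of amino acid changes by protein position. In this Python sketch the height is the number of distinct samples with a change at that position, and a position with more than one consequence type is labeled "Multi Hit". Using a sample count for "frequency" is an assumption (the plot may show a proportion of the cohort instead), and the input tuples and example values are made up for illustration.

```python
from collections import defaultdict


def build_lollipops(changes):
    """Group amino acid changes into lollipops, one per protein position.

    changes: iterable of (sample_id, protein_position, consequence) tuples
    taken from each sample's canonical-transcript annotation.
    Returns a list of (position, sample_count, label), sorted by position.
    """
    samples_at = defaultdict(set)
    consequences_at = defaultdict(set)
    for sample_id, position, consequence in changes:
        samples_at[position].add(sample_id)
        consequences_at[position].add(consequence)
    lollipops = []
    for position in sorted(samples_at):
        kinds = consequences_at[position]
        # More than one consequence type at a position is shown as "Multi Hit".
        label = "Multi Hit" if len(kinds) > 1 else next(iter(kinds))
        lollipops.append((position, len(samples_at[position]), label))
    return lollipops


example = [
    ("S1", 322, "missense_variant"),
    ("S2", 322, "missense_variant"),
    ("S3", 175, "missense_variant"),
    ("S4", 175, "stop_gained"),
]
print(build_lollipops(example))
# [(175, 2, 'Multi Hit'), (322, 2, 'missense_variant')]
```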
You can identify mutation hotspots for a given gene and see protein changes in HGVS short form notation, such as T322A, and HGVS.p notation, such as p.Thr322Ala.
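The relationship between the two notations is a substitution of the standard three-letter amino acid codes with their one-letter equivalents. The Python sketch below converts simple HGVS.p substitutions such as p.Thr322Ala into the short form T322A. It handles only single amino acid substitutions (frameshifts, deletions, and predicted notations in parentheses return None), and writing the stop codon Ter as "*" is one common convention; the Cohort Browser may display it differently.

```python
import re

# Standard three-letter to one-letter amino acid codes; Ter (stop) becomes "*".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matches simple substitutions only, e.g. p.Thr322Ala.
SUBSTITUTION = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def hgvs_p_to_short(hgvs_p: str):
    """Convert p.Thr322Ala to T322A; return None for anything else."""
    match = SUBSTITUTION.fullmatch(hgvs_p)
    if match is None:
        return None
    ref, position, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        return None
    return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"


print(hgvs_p_to_short("p.Thr322Ala"))  # T322A
```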
Examining Detailed Variant Information
The Variants & Events table displays details on the same genomic region as the Lollipop Plot. You can filter the table to focus on specific variant types, such as SNV & Indels, SV (Structural Variants), CNV, or Fusion.
Information displayed in the Variants & Events table includes:
• Location of variant, with a link to its locus details
• Reference allele of variant
• Alternate allele of variant
• Type of variant, such as SNV, Indel, or Structural Variant
• Variant consequences, with entries color-coded by level of severity
• HGVS cDNA
• HGVS Protein
• COSMIC ID
• RSID, with a link to the dbSNP entry for the variant

Exporting Variant Information
You can export the selected variants in the Variants & Events table as a list of variant IDs or a CSV file.
To copy a comma-separated list of variant IDs to your clipboard, select the set of IDs you want to copy, and click Copy.
To export variants as a CSV file, select the set of IDs you need, and click Download (.csv file).
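If you paste the copied list of variant IDs into a script, each chr_pos_ref_alt ID can be split back into its fields. This Python sketch is a hypothetical helper, not a Cohort Browser feature. It splits from the right so that a chromosome name containing underscores stays intact, which assumes that REF and ALT never contain underscores.

```python
def parse_variant_ids(copied: str):
    """Split a copied, comma-separated list of chr_pos_ref_alt IDs into fields."""
    parsed = []
    for variant_id in copied.split(","):
        variant_id = variant_id.strip()
        if not variant_id:
            continue
        # rsplit keeps any underscores inside the chromosome name intact.
        chrom, pos, ref, alt = variant_id.rsplit("_", 3)
        parsed.append((chrom, int(pos), ref, alt))
    return parsed


print(parse_variant_ids("17_7674257_A_G,17_7674257_A_<DEL>"))
# [('17', 7674257, 'A', 'G'), ('17', 7674257, 'A', '<DEL>')]
```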
Accessing External Annotations and Resources
In the Variants & Events table, click a location in the Location column to open its locus details.
The locus details show the SNV & Indel variants at that location, as well as up to 200 structural variants overlapping it. For canonical transcripts, a blue indicator appears next to the transcript ID to identify the primary transcript annotations.

The locus details include links to external annotation resources:
• Gene-level links: Direct links to gene information in external databases
• Variant-level links: Links to variant-specific annotation resources
These links allow you to quickly navigate to external annotation resources for further information about genes or variants of interest.