# Analyzing Somatic Variants

{% hint style="info" %}
An Apollo license is required to use Cohort Browser on the DNAnexus Platform. Org approval may also be required. [Contact DNAnexus Sales](mailto:sales@dnanexus.com) for more information.
{% endhint %}

Explore and analyze datasets with somatic variant assays by opening them in the Cohort Browser and switching to the Somatic Variants tab. You can create cohorts based on somatic variants, visualize variant patterns, and examine detailed variant information.

You can analyze somatic variants across four main categories: Single Nucleotide Variants (SNVs) & Indels for small genomic changes, Copy Number Variants (CNVs) for alterations in gene copy numbers, Fusions for structural rearrangements involving gene coding sequences, and Structural Variants (SVs) for larger genomic rearrangements.

{% hint style="info" %}
Somatic assay datasets are created using the [Somatic Variant Assay Loader](/developer/ingesting-data/somatic-variant-assay-loader.md).
{% endhint %}

## Variant Classification

The somatic data model classifies all genomic variants into four main classes, defined by their size, structure, and representation in VCF files. Each variant type has specific criteria that must be met for classification.

| Variant Type                                                                                                                  | Classification Criteria                                                                                                                                                                                                                               | Examples                           |
| ----------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
| <p><strong>SNV & Indel</strong><br>Single base substitutions and small insertions/deletions with precise allele sequences</p> | <p><strong>All must match:</strong><br>• Variant size ≤ 50 bp<br>• <code>ALT</code> field contains precise allele (NOT symbolic like <code>\<DEL></code>, <code>\<INS></code>, <code>\<DUP></code>, <code>\<CNV></code>)</p>                          | A→G, ATCG→A, A→ATCG                |
| <p><strong>Copy Number Variant (CNV)</strong><br>Changes in gene copy number</p>                                              | <p><strong>All must match:</strong><br>• <code>ALT</code> field contains symbolic allele (<code>\<CNV></code>, <code>\<DEL></code>, <code>\<DUP></code>)<br>• Explicit copy number value present in <code>FORMAT</code> field key <code>CN</code></p> | `<CNV>`, `<DEL>`, `<DUP>`          |
| <p><strong>Fusion</strong><br>Structural rearrangements involving gene coding sequences</p>                                   | <p><strong>All must match:</strong><br>• <code>ALT</code> field contains breakend notation with square brackets (<code>\[</code> or <code>]</code>)<br>• At least one breakpoint overlaps with annotated gene or transcript</p>                       | `[chr2:123456[`, `]chr5:789012]`   |
| <p><strong>Structural Variant (SV)</strong><br>Large or complex structural changes</p>                                        | <p><strong>Either must match:</strong><br>• Variant length > 50 bp<br>• <code>ALT</code> field contains symbolic allele (<code>\<DEL></code>, <code>\<INV></code>, <code>\<CNV></code>, <code>\<BND></code>)</p>                                      | `<DEL>`, `<INV>`, large insertions |
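
The table's criteria can be read as a small decision procedure. The Python sketch below applies them to one VCF record for illustration only; it is not the Somatic Variant Assay Loader's implementation. The function `classify_variant` and its parameters are hypothetical: `length` stands in for the event size of symbolic or breakend alleles (in a real VCF it would come from `INFO` fields such as `SVLEN` or `END`), `breakpoint_in_gene` stands in for a gene-annotation lookup, and the size of a precise allele is approximated by its longer allele. Tagging every symbolic or breakend allele as an SV is an assumption based on the dual-classification note below the table.

```python
from __future__ import annotations

import re

SYMBOLIC = re.compile(r"^<[^<>]+>$")  # e.g. <DEL>, <CNV>, <INV>
BREAKEND = re.compile(r"[\[\]]")      # e.g. G[2:123456[ or ]5:789012]G
CNV_ALLELES = {"<CNV>", "<DEL>", "<DUP>"}
MAX_SMALL_VARIANT_BP = 50


def classify_variant(
    ref: str,
    alt: str,
    length: int | None = None,
    format_keys: frozenset[str] = frozenset(),
    breakpoint_in_gene: bool = False,
) -> set[str]:
    """Return every somatic class a record falls into (hypothetical helper)."""
    symbolic = bool(SYMBOLIC.match(alt))
    breakend = not symbolic and bool(BREAKEND.search(alt))
    precise = not symbolic and not breakend

    # Precise alleles carry their own size (approximated by the longer allele);
    # symbolic and breakend alleles need the size supplied separately.
    size = max(len(ref), len(alt)) if precise else (length or 0)

    classes: set[str] = set()
    if precise and size <= MAX_SMALL_VARIANT_BP:
        classes.add("SNV & Indel")
    if symbolic and alt in CNV_ALLELES and "CN" in format_keys:
        classes.add("CNV")
    if breakend and breakpoint_in_gene:
        classes.add("Fusion")
    # Assumption: any symbolic or breakend allele also counts as an SV,
    # so CNVs and Fusions carry the SV class as well.
    if size > MAX_SMALL_VARIANT_BP or symbolic or breakend:
        classes.add("SV")
    return classes
```

For example, `classify_variant("A", "<DEL>", length=5000, format_keys=frozenset({"CN"}))` returns the set `{"CNV", "SV"}`, while the same record without a `CN` key is classified only as an SV.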

{% hint style="info" %}
**Somatic Variants in Cohort Browser**

* CNVs and Fusions are also classified as Structural Variants in the Cohort Browser because they use symbolic allele representations (`<CNV>`, `<DEL>`, `<DUP>`, `<BND>`). This dual classification ensures they are correctly distinguished from SNVs regardless of their physical length.
* For optimal performance and annotation scalability, the Cohort Browser processes SVs and CNVs between 50 bp and 10 Mbp differently from larger variants:
  * **SVs and CNVs ≤ 10 Mbp**: Fully annotated with gene symbols and consequences, and appear in all visualizations, including the Variant Frequency Matrix.
  * **SVs and CNVs > 10 Mbp**: Ingested and visible in the Variants & Events table, but lack gene-level annotations. These larger variants do not appear in the Variant Frequency Matrix and cannot be filtered using gene symbols or consequence terms. Use genomic coordinates or variant IDs to filter for these variants (see [Working with Large Structural Variants](#working-with-large-structural-variants-10-mbp) below).
  * Fusions are not affected by this size limit because they are treated as two single-position events.
{% endhint %}
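
The size rule in this note can be expressed as a single check. A minimal Python sketch, assuming 10 Mbp means 10,000,000 bp and that a variant exactly at the limit is still annotated (as "≤ 10 Mbp" suggests); `has_gene_annotations` is a hypothetical name, and the class labels match those in the table above:

```python
from __future__ import annotations

TEN_MBP = 10_000_000  # assumed: 10 Mbp expressed in base pairs


def has_gene_annotations(classes: set[str], length_bp: int) -> bool:
    """Sketch of the size rule: can this variant be filtered by gene or consequence?"""
    if "Fusion" in classes:
        return True  # fusions are two single-position events, so no size limit applies
    if classes & {"SV", "CNV"}:
        return length_bp <= TEN_MBP  # larger SVs/CNVs are ingested without gene annotations
    return True  # SNVs & Indels are at most 50 bp by definition
```

A variant for which this returns `False` is still listed in the Variants & Events table; filter for it by genomic coordinates or variant ID instead.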

## Filtering by Somatic Variants

You can [define your cohort](/user/cohort-browser/defining-cohorts.md#defining-cohort-criteria) to include only samples with specific somatic variants.

To apply a somatic filter to your cohort:

1. For the cohort you want to edit, click **Add Filter**.
2. In **Add Filter to Cohort** > **Assays** > **Variant (Somatic)**, select a genomic filter.
3. In **Edit Filter: Variant (Somatic)**, specify the criteria:
   * For datasets with multiple somatic variant assays, select the specific assay to filter by.
   * Choose whether to include patients with at least one detected variant matching the specified criteria (**WITH Variant**), or include only patients who have no detected variants matching the criteria (**WITHOUT Variant**). By default, the filter includes those with matching variants. This choice applies to all specified filtering criteria.
   * On the **Genes / Effects** tab, select variants of specific types and [variant consequences](https://feb2023.archive.ensembl.org/info/genome/variation/prediction/predicted_data.html) within specified genes and genomic ranges. You can specify up to 5 genes or genomic ranges in a comma-separated list.
   * On the **HGVS** tab, specify a particular [HGVS](https://hgvs-nomenclature.org/stable/) DNA or HGVS protein notation, preceded by a gene symbol. Example: `KRAS p.Gly12Cys`.
   * On the **Variant IDs** tab, specify variant IDs using the standard format `chr_pos_ref_alt` (for example, `17_7674257_A_G`). You can enter up to 10 variant IDs in a comma-separated list.
   * Enter multiple genes, ranges, or variants by separating them with commas or placing each on a new line.
4. Click **Apply Filter**.

You can specify up to 10 somatic variant filters for each cohort.

![Adding a somatic variant filter](/files/PQGgxVCwUpf8hpyccSEE)
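
Before pasting a long list into the **Variant IDs** or **Genes / Effects** tab, you may want to check it against the limits described above. The following Python sketch is a hypothetical client-side helper, not part of the Cohort Browser: it splits input on commas or line breaks, enforces the 10-ID and 5-entry limits, and checks the `chr_pos_ref_alt` shape. The accepted character set (a chromosome name without underscores, nucleotide or symbolic alleles such as `<DEL>`) is an assumption; the Cohort Browser's own validation may differ.

```python
import re

# Assumed shape of a variant ID: chrom_pos_ref_alt, where alt may be symbolic (e.g. <DEL>).
VARIANT_ID = re.compile(r"^[^_\s]+_\d+_[ACGTNacgtn]+_(?:[ACGTNacgtn]+|<[^<>\s]+>)$")
MAX_VARIANT_IDS = 10
MAX_GENES_OR_RANGES = 5


def split_entries(text: str) -> list[str]:
    """Split on commas and/or line breaks, dropping blank entries."""
    return [e.strip() for e in re.split(r"[,\r\n]+", text) if e.strip()]


def check_variant_ids(text: str) -> list[str]:
    ids = split_entries(text)
    if len(ids) > MAX_VARIANT_IDS:
        raise ValueError(f"{len(ids)} variant IDs given; the filter accepts at most {MAX_VARIANT_IDS}")
    malformed = [i for i in ids if not VARIANT_ID.match(i)]
    if malformed:
        raise ValueError(f"not in chr_pos_ref_alt form: {malformed}")
    return ids


def check_genes_or_ranges(text: str) -> list[str]:
    entries = split_entries(text)
    if len(entries) > MAX_GENES_OR_RANGES:
        raise ValueError(f"{len(entries)} genes/ranges given; the filter accepts at most {MAX_GENES_OR_RANGES}")
    return entries
```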

{% hint style="info" %}
After you apply or edit filters, the participant count updates immediately. However, visualization tiles do not automatically refresh. Click **Refresh Visualizations** at the top of the dashboard to update all tiles. Click **Refresh** on individual tiles to update specific charts.
{% endhint %}

### Working with Large Structural Variants (>10 Mbp)

Structural variants larger than 10 megabases lack gene-level annotations, which limits how you can filter and visualize them. Use these alternative filtering approaches:

* **Filter by genomic coordinates**: In the **Genes / Effects** filter, enter genomic coordinates in the format `chr:start-end`, for example, `17:7661779-7687538` for the *TP53* gene region. Set the variant type scope to **SV** or **CNV** and leave consequence types blank. To find gene coordinates, click the search icon next to the Variants & Events table and type the gene symbol.
* **Filter by variant IDs**: In the **Variant IDs** filter, enter up to 10 variant IDs in the format `chr_pos_ref_alt`, for example, `17_7674257_A_<DEL>`. To get variant IDs, navigate to the gene region in the Variants & Events table, select the variants of interest, and download the CSV file; the Location column contains the variant IDs.
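
The variant-ID workflow above can be scripted once you have the downloaded CSV. A minimal Python sketch, assuming the exported file has a header row with a column named exactly `Location` that holds IDs such as `17_7674257_A_<DEL>` (the function name and defaults are hypothetical): it collects the IDs, drops duplicates, and groups them into comma-separated batches so that each batch fits the 10-ID limit of a single **Variant IDs** filter.

```python
import csv
import io


def variant_id_batches(csv_text: str, column: str = "Location", batch_size: int = 10) -> list[str]:
    """Collect variant IDs from an exported CSV and join them in comma-separated batches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = [(row.get(column) or "").strip() for row in reader]
    # Drop blanks and duplicates while keeping the table's order.
    unique = list(dict.fromkeys(i for i in ids if i))
    return [",".join(unique[i:i + batch_size]) for i in range(0, len(unique), batch_size)]
```

For a file saved locally (the filename here is only an example), read it with `open("variants.csv", newline="")` and pass the contents to `variant_id_batches`.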

{% hint style="success" %}
For comprehensive structural variant analysis, combine multiple filtering approaches. Use gene symbol filters to capture annotated structural variants ≤ 10 Mbp, then add coordinate-based filters to include larger structural variants in the same genomic regions.
{% endhint %}
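
To see which structural variants a coordinate filter adds on top of a gene-symbol filter, it can help to think of the coordinate filter as an interval-overlap test. The Python sketch below makes that explicit under stated assumptions: coordinates are 1-based and inclusive, a variant matches a range when the two intervals overlap, and the `StructuralVariant` record is a hypothetical stand-in for a row of the Variants & Events table. It returns only the variants larger than 10 Mbp, which a gene-symbol filter would miss.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

TEN_MBP = 10_000_000
RANGE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")  # chr:start-end, e.g. 17:7661779-7687538


@dataclass(frozen=True)
class StructuralVariant:
    variant_id: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_range(text: str) -> tuple[str, int, int]:
    m = RANGE.match(text.strip())
    if not m:
        raise ValueError(f"expected chr:start-end, got {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def large_svs_in_range(svs: list[StructuralVariant], region: str) -> list[str]:
    """IDs of variants over 10 Mbp that overlap the region (missed by gene-symbol filters)."""
    chrom, start, end = parse_range(region)
    return [
        sv.variant_id
        for sv in svs
        if sv.chrom == chrom
        and sv.start <= end and sv.end >= start  # closed-interval overlap
        and sv.end - sv.start + 1 > TEN_MBP
    ]
```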

Large structural variants are visible in the [Variants & Events](#examining-detailed-variant-information) table with full details, but they do not appear in the [Variant Frequency Matrix](#comparing-variant-patterns-across-your-cohort) due to missing gene-level annotations.

## Comparing Variant Patterns Across Your Cohort

The **Variant Frequency Matrix** provides a visual overview of how often somatic variants appear throughout your cohort. Use it to identify variant patterns across tumor samples and discover which variants frequently occur together. You can also measure the mutation burden in different genes and compare how mutation profiles differ between two cohorts. This makes trends and relationships in your data easier to spot than when examining individual variants.

![Variant Frequency Matrix filtered to SNVs & Indels](/files/rFIZ7VDz54lQGlE2sFbN)

{% hint style="success" %}
The Variant Frequency Matrix is interactive. You can [filter by genes and consequences](#filtering-by-genes-and-consequences), [view details of specific genes and samples](#viewing-gene-and-sample-details), and zoom in on specific genes or regions.
{% endhint %}

In the Variant Frequency Matrix, the rows represent genes sorted by variant frequency, and columns represent samples sorted by the number of genes that contain variants.

* **Sorted gene list**: Genes are ranked from most to least frequently affected by variants. A gene counts as affecting a sample if the sample is a tumor sample with at least one detected high- or moderate-impact variant in that gene's canonical transcript. Matched normal samples are not included in this calculation.
* **Sorted sample list**: Samples are ordered by the total number of genes that contain variants. This ranking is independent of how frequently each individual gene is affected.

{% hint style="info" %}
The Variant Frequency Matrix displays at most 50 genes, those with the most variants, and at most 500 samples for any given cohort. The samples shown are the 500 with the highest number of genes containing variants. If your cohort has fewer than 500 samples, the matrix shows all of them.
{% endhint %}
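
The ranking rules above amount to two counts over the cohort's variant calls. The Python sketch below reproduces them for illustration only; it is not the Cohort Browser's implementation. Each input record is a hypothetical `Call` tuple of sample ID, gene, Ensembl VEP impact, whether the sample is a tumor sample, and whether the call is on the gene's canonical transcript. Applying the same tumor, canonical-transcript, and impact filter to the sample ranking, and breaking ties alphabetically, are assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    sample: str
    gene: str
    impact: str      # Ensembl VEP IMPACT: HIGH, MODERATE, LOW, or MODIFIER
    is_tumor: bool   # matched normal samples are excluded
    canonical: bool  # call is on the gene's canonical transcript


def rank_matrix(calls: Iterable[Call], max_genes: int = 50, max_samples: int = 500):
    samples_by_gene: dict[str, set[str]] = defaultdict(set)
    genes_by_sample: dict[str, set[str]] = defaultdict(set)
    for c in calls:
        if c.is_tumor and c.canonical and c.impact in {"HIGH", "MODERATE"}:
            samples_by_gene[c.gene].add(c.sample)
            genes_by_sample[c.sample].add(c.gene)

    # Rows: genes by number of affected samples, most frequent first.
    genes = sorted(samples_by_gene, key=lambda g: (-len(samples_by_gene[g]), g))
    # Columns: samples by number of genes with variants, independent of the gene order.
    samples = sorted(genes_by_sample, key=lambda s: (-len(genes_by_sample[s]), s))
    return genes[:max_genes], samples[:max_samples]
```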

### Filtering by Genes and Consequences

By default, the Variant Frequency Matrix includes all genes and samples. To narrow your view, you can filter the matrix to specific classes of somatic variants, such as SNVs & Indels, Structural Variants, CNVs, or Fusions.

![Show only a specific class of somatic variants](/files/WemJbPJFWGx9AGDvZAdS)

Use the legend in the bottom right to focus on specific variants, events, or consequences. This gives you a clearer view of areas of interest, such as high-impact mutations or consequences relevant to your research.

![Show only a specific consequence for a class of somatic variants](/files/iHNAMmnKNQN8cMw5RuJS)

{% hint style="info" %}
When [comparing cohorts](/user/cohort-browser/defining-cohorts.md#comparing-cohorts), the matrix can display the top 200 samples (columns) from both the primary and secondary cohorts. The top genes are selected and sorted by their variant frequency within the primary cohort.
{% endhint %}

### Viewing Gene and Sample Details

The Variant Frequency Matrix is highly interactive, allowing you to access more details and apply filters.

When you hover over a cell, the matrix shows a unique identifier for the sample, along with a breakdown of the variants detected in that gene, organized by their consequence type. You can copy the sample ID to your clipboard to apply it to a cohort filter.

![Hovering over a cell to view specific sample details](/files/z5UlLnbsT9lsq4ceAmCa)

When you hover over a gene ID on the y-axis, the matrix shows more information about that gene. This includes a unique identifier for the gene, along with a quick breakdown of available external annotations, with direct links to the [CIViC](https://civicdb.org) and [OncoKB](https://www.oncokb.org) databases (when available).

To create a filter, hover over the gene and click **+ Add to Filter**, or copy the gene ID to your clipboard for use in a custom filter.

![Hovering over a gene ID to view its details](/files/slfUQcg2y3XcYtdva360)

### Color Coding and Consequences

The Variant Frequency Matrix uses color coding to represent the consequences of detected variants, providing a quick visual assessment of variant types. Only high and moderate impact consequences, as defined by [Ensembl VEP version 109](https://feb2023.archive.ensembl.org/info/docs/tools/vep/script/index.html), are included in this visualization.

Cells in which a sample has two or more detected variants in the same gene are color-coded as "Multi Hit", indicating a complex variant profile.
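
How a single matrix cell gets its color can be sketched in a few lines of Python. This is an illustration under assumptions, not the Cohort Browser's code: each detected variant in the cell is represented by one consequence term, only a small subset of the Ensembl VEP consequence-to-impact table is included, only high- and moderate-impact variants count toward "Multi Hit", and the function name `cell_label` is hypothetical.

```python
from __future__ import annotations

# Subset of Ensembl VEP consequence terms and their IMPACT ratings.
VEP_IMPACT = {
    "transcript_ablation": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "stop_lost": "HIGH",
    "start_lost": "HIGH",
    "inframe_insertion": "MODERATE",
    "inframe_deletion": "MODERATE",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "intron_variant": "MODIFIER",
}


def cell_label(consequences: list[str]) -> str | None:
    """Color category for one sample/gene cell, or None if the cell stays empty."""
    shown = [c for c in consequences if VEP_IMPACT.get(c) in {"HIGH", "MODERATE"}]
    if not shown:
        return None  # only low-impact or modifier variants: not displayed
    if len(shown) >= 2:
        return "Multi Hit"  # two or more qualifying variants in the cell
    return shown[0]
```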

## Exploring Gene-Level Mutation Patterns

The **Lollipop Plot** is a visualization tool that shows the somatic variants of a cohort on a single gene's canonical protein. With the Lollipop Plot, you can identify mutation hotspots within a specific gene, understand the functional impact of variants in the context of protein domains, compare mutation patterns across different patient cohorts, and explore recurrent mutations in cancer driver genes.

Use the **Go to Gene** field to navigate to a gene of interest, such as *TP53*.

![Lollipop Plot for the TP53 gene](/files/0QdFDBPfNFa9miaPJFMh)

When you hover over a lollipop, you can see details about the amino acid change, such as the HGVS notation and the frequency of that change in the current cohort. The plot also shows the location of each mutation along the protein sequence, with color coding to indicate the consequence type.

{% hint style="info" %}
The Lollipop Plot displays SNV & Indel data from the same genomic region as the [Variants & Events table](#examining-detailed-variant-information). When you change the genomic region in the table, the Lollipop Plot updates to match, and vice versa.
{% endhint %}

### Reading the Lollipop Plot

* Each lollipop on the plot represents amino acid changes at a specific location.
* The horizontal position (x-axis) indicates the location of the change, while the height (y-axis) represents the frequency of that change within the current cohort.
* Lollipops are color-coded by consequence based on the canonical transcript.
* If a lollipop represents multiple consequence types, it is color-coded as "Multi Hit".
* You can identify mutation hotspots for a given gene and see protein changes in [HGVS](https://hgvs-nomenclature.org/stable/) short form notation, such as T322A, and HGVS.p notation, such as p.Thr322Ala.
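
The conversion between the two notations, and the way changes stack into lollipops, can be sketched in Python as follows. The sketch assumes simple substitutions in HGVS.p notation (for example `p.Thr322Ala` or `p.Arg213Ter`), one consequence term per observed change, and raw counts rather than cohort frequencies for lollipop height; the names `short_form` and `lollipops` are hypothetical. Other change types, such as frameshifts, are out of scope here.

```python
from __future__ import annotations

import re
from collections import defaultdict

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",  # stop codon
}
SUBSTITUTION = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")


def parse_substitution(hgvs_p: str) -> tuple[str, int, str]:
    m = SUBSTITUTION.match(hgvs_p)
    if not m or m.group(1) not in THREE_TO_ONE or m.group(3) not in THREE_TO_ONE:
        raise ValueError(f"not a simple HGVS.p substitution: {hgvs_p!r}")
    return THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]


def short_form(hgvs_p: str) -> str:
    """p.Thr322Ala -> T322A"""
    ref, pos, alt = parse_substitution(hgvs_p)
    return f"{ref}{pos}{alt}"


def lollipops(changes: list[tuple[str, str]]) -> dict[int, tuple[int, str, list[str]]]:
    """Group (hgvs_p, consequence) pairs by protein position: (height, color label, changes)."""
    by_pos: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for hgvs_p, consequence in changes:
        _, pos, _ = parse_substitution(hgvs_p)
        by_pos[pos].append((short_form(hgvs_p), consequence))

    result = {}
    for pos in sorted(by_pos):
        items = by_pos[pos]
        kinds = {consequence for _, consequence in items}
        # One consequence type keeps its own color; several become "Multi Hit".
        label = kinds.pop() if len(kinds) == 1 else "Multi Hit"
        result[pos] = (len(items), label, sorted({name for name, _ in items}))
    return result
```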

## Examining Detailed Variant Information

The **Variants & Events** table displays details on the same genomic region as the [Lollipop Plot](#exploring-gene-level-mutation-patterns). You can filter the table to focus on specific variant types, such as SNVs & Indels, SVs (Structural Variants), CNVs, or Fusions.

{% hint style="info" %}
Unlike the Variant Frequency Matrix, the Variants & Events table displays **all structural variants** including those larger than 10 Mbp. Use this table to examine large SVs that may not appear in other visualizations.
{% endhint %}

Information displayed in the Variants & Events table includes:

* Location of variant, with a link to its [locus details](#accessing-external-annotations-and-resources)
* Reference allele of variant
* Alternate allele of variant
* Type of variant, such as SNV, Indel, or Structural Variant
* Variant consequences, with entries color-coded by level of severity
* HGVS cDNA
* HGVS Protein
* COSMIC ID
* RSID, with a link to the [dbSNP](https://www.ncbi.nlm.nih.gov/snp/) entry for the variant

![Variants & Events for the TP53 gene](/files/s99ylpjzQFxOExzlvYYR)

### Exporting Variant Information

You can export the selected variants in the Variants & Events table as a list of variant IDs or a CSV file.

* To copy a comma-separated list of variant IDs to your clipboard, select the set of IDs you want to copy, and click **Copy**.
* To export variants as a CSV file, select the set of IDs you need, and click **Download (.csv file)**.
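
If you paste a copied list of variant IDs into a script, each ID can be split back into VCF-style fields. The Python sketch below assumes that the `chr_pos_ref_alt` fields map directly onto the VCF `CHROM`, `POS`, `REF`, and `ALT` columns and that chromosome names contain no underscores; the function name `ids_to_vcf_lines` is hypothetical. It emits minimal tab-separated VCF body lines with `.` placeholders for `ID`, `QUAL`, `FILTER`, and `INFO`, in the order the IDs were copied.

```python
def ids_to_vcf_lines(copied: str) -> list[str]:
    """Turn a comma-separated list of chr_pos_ref_alt IDs into minimal VCF body lines."""
    lines = []
    for variant_id in (v.strip() for v in copied.split(",")):
        if not variant_id:
            continue
        # maxsplit=3 keeps symbolic ALT alleles such as <DEL> intact.
        chrom, pos, ref, alt = variant_id.split("_", 3)
        if not pos.isdigit():
            raise ValueError(f"bad position in {variant_id!r}")
        # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
        lines.append("\t".join([chrom, pos, ".", ref, alt, ".", ".", "."]))
    return lines
```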

## Accessing External Annotations and Resources

In the **Variants & Events** > **Location** column, you can click on the specific location to open the locus details.

The locus details show specific SNV & Indel variants as well as up to 200 structural variants overlapping with the specific location. For canonical transcripts, a blue indicator appears next to the transcript ID, identifying the primary transcript annotations.

![Showing locus details for a specific somatic variant](/files/i2NmDug5Txp5x6BnqIQd)

The locus details include enhanced annotations to external resources:

* **Gene-level links** - Direct links to gene information in external databases
* **Variant-level links** - Links to variant-specific annotation resources

These links help you navigate to external annotation resources for further information about genes or variants of interest.

