Omics Data Assistant
Explore and analyze datasets using natural language queries with Omics Data Assistant, a GenAI-powered interface integrated into Cohort Browser.
A license is required to use Omics Data Assistant on the DNAnexus Platform. Contact DNAnexus Sales for more information.
Omics Data Assistant (the assistant) is a GenAI-powered conversational interface that helps you explore and analyze complex biomedical and clinical datasets using natural language. The assistant is integrated directly into Cohort Browser. This means you can combine conversational queries with powerful visualization tools for comprehensive data analysis.
Whether you're new to a dataset or an experienced bioinformatician, the assistant saves you time by understanding your questions in plain English. New users can quickly discover what data their datasets contain without browsing through fields and schemas. Experienced users can define cohorts in seconds by describing criteria in a few sentences, eliminating the need to manually configure multiple filters.
Omics Data Assistant uses generative AI to accelerate your analysis. While powerful, AI models can occasionally produce inaccurate or incomplete results. Always verify generated cohorts and insights against your underlying data. The assistant alone should not be used for clinical diagnosis or treatment decisions.
How It Works
Omics Data Assistant uses the latest Anthropic Claude model for natural language understanding and response generation. This model supports large context windows, allowing the assistant to handle complex queries and maintain context throughout conversations.
The assistant accesses only your Apollo dataset and does not connect to the internet or external data sources. Omics Data Assistant is deployed regionally to meet data residency requirements, keeping your data in your region throughout all operations. Conversations are stored securely and remain private to you.
Getting Started
Prerequisites
Your organization has active Omics Data Assistant and Cohort Browser licenses.
You have access to an Apollo dataset and its associated databases in a project.
Opening Omics Data Assistant
Omics Data Assistant works with datasets in Cohort Browser.
In the DNAnexus Platform, open a dataset in Cohort Browser.
Click ✨ Omics Data Assistant in the lower right corner.

By default, the assistant opens in a panel on the right side. You can enlarge the assistant panel by clicking Enter Full Screen.
In the assistant's input field, you can explore the data by asking questions.
Below the input, you can use two controls to get started quickly:
Dataset Overview: Opens an AI-generated overview of the open dataset. Use this to learn what the dataset contains without writing a prompt. You can still ask follow-up questions for specific details.
Help: Opens a guide about Omics Data Assistant, its capabilities, and example prompts to try.

First-Time Dataset Indexing
The first time Omics Data Assistant is used with a dataset, the dataset must be indexed. This one-time process enables natural language queries by creating vector representations of the dataset structure. Only one person needs to start this indexing process. Subsequent users can query the dataset immediately once indexing is complete.
After indexing starts, it runs in the background. Most datasets finish indexing within 15 minutes. Large datasets such as UK Biobank may take over an hour. You cannot ask questions until indexing completes, but you can monitor its progress in the assistant interface.
Index data is stored securely in the same AWS region as your data to meet data residency requirements.
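As a rough illustration of why an index of vector representations helps, the toy sketch below matches a question to the most relevant dataset field. Everything in it is an assumption made for illustration: the field names and descriptions are hypothetical, and simple word counts stand in for the learned embeddings a real system would use. It does not describe how Omics Data Assistant indexes data internally.

```python
import math
import re
from collections import Counter

# Hypothetical field descriptions standing in for a dataset's structure.
FIELDS = {
    "patient.age_at_enrollment": "age of the patient at enrollment in years",
    "lab.hemoglobin": "hemoglobin concentration measured in blood lab test",
    "diagnosis.icd10_code": "diagnosis code recorded for the patient",
}

def vectorize(text):
    """Turn text into a sparse word-count vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "Indexing": compute one vector per field once, before any question is asked.
INDEX = {name: vectorize(description) for name, description in FIELDS.items()}

def best_field(question):
    """Return the indexed field whose vector is most similar to the question."""
    q = vectorize(question)
    return max(INDEX, key=lambda name: cosine(q, INDEX[name]))

print(best_field("Which patients have low hemoglobin in their blood test?"))
```

Because the vectors are computed once and reused, every later question only needs a similarity lookup, which is consistent with indexing being a one-time step that later users benefit from.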
Using Omics Data Assistant
Asking Questions
Omics Data Assistant accepts questions written in plain English, translates them into structured database queries, and answers in plain English.
Example prompts:
"Find all patients diagnosed with IBD within 6 months of a diabetes diagnosis"
"Get patients with lower hemoglobin values than the laboratory's recommended value"
"Create cohort of all patients with exon loss variants in KIAA1109"

Understanding Responses
Each response includes the assistant's thinking process, which shows how it interpreted your question and the SQL queries it generated. You can verify the assistant understood your question correctly, review the SQL queries for accuracy, and learn how natural language translates to database queries.
If your question is unclear, the assistant asks clarifying questions to ensure accurate results.
For each response, you can:
Copy responses: Click Copy at the end of any response to copy the markdown-formatted text to your clipboard for use in documents or notes.
Provide feedback: Click the thumbs up or thumbs down buttons on responses to help improve the assistant's accuracy. When giving negative feedback, you can describe the problem in your own words. Your feedback helps the DNAnexus team enhance the assistant for all users.
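To show what reviewing a generated SQL query can involve, the sketch below runs a hand-written query for the example prompt about IBD and diabetes against a tiny in-memory SQLite table. The table name, its columns, the sample rows, and the choice of 183 days for "6 months" are all assumptions made for illustration. The queries the assistant generates depend on your dataset's schema and may look quite different.

```python
import sqlite3

# Hypothetical schema: one row per (patient, condition, diagnosis date).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnosis (patient_id TEXT, condition TEXT, diagnosed_on TEXT)"
)
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?, ?)",
    [
        ("P1", "diabetes", "2020-01-10"),
        ("P1", "IBD", "2020-04-02"),       # 83 days after diabetes: matches
        ("P2", "diabetes", "2019-03-01"),
        ("P2", "IBD", "2021-06-15"),       # more than two years later: no match
        ("P3", "IBD", "2020-02-20"),       # no diabetes diagnosis: no match
    ],
)

# One possible reading of the prompt: the two diagnoses are at most
# 183 days apart, in either order (an assumption, hence ABS).
QUERY = """
SELECT DISTINCT ibd.patient_id
FROM diagnosis AS ibd
JOIN diagnosis AS dm ON dm.patient_id = ibd.patient_id
WHERE ibd.condition = 'IBD'
  AND dm.condition = 'diabetes'
  AND ABS(julianday(ibd.diagnosed_on) - julianday(dm.diagnosed_on)) <= 183
ORDER BY ibd.patient_id
"""

matching_patients = [row[0] for row in conn.execute(QUERY)]
print(matching_patients)
```

The `ABS(...)` clause is the kind of detail worth checking in a real response: "within 6 months of" could mean before, after, or either side of the diabetes diagnosis, and the query shows which reading was applied.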
Creating Cohorts
The assistant excels at creating cohorts and generating demographic summaries. For complex visualizations, create your cohorts through the assistant, then use Cohort Browser's native visualization tools for detailed analysis.
To create a cohort directly from Omics Data Assistant, phrase your question to specify patient criteria. For example, "Create a cohort of patients with amplifications in HER2".
When the assistant returns the cohort results, click + Add to Dashboard to filter the dataset by the new cohort in Cohort Browser. You can add multiple cohorts from the assistant to your dashboard and switch between cohorts.

From Cohort Browser, you can save cohorts to your project as CohortBrowser records.
Managing Conversations
Your conversation history is stored separately for each dataset. Conversations remain private to you, and other users cannot access them.
To manage your conversations, click the three dots next to a conversation name to either rename or delete the conversation.
To view and search through your past conversations, click See All at the bottom of the panel.
