
# Clinical Dataset Merger

{% hint style="info" %}
An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. [Contact DNAnexus Sales](mailto:sales@dnanexus.com) for more information.
{% endhint %}

The [Clinical Dataset Merger](https://platform.dnanexus.com/app/clinical_dataset_merger) app links Entities in one Apollo Dataset (the source) to Entities in another Dataset (the target) and creates a new, merged Dataset. The inputs you provide determine how the source Entities are linked to the target Dataset. The new Dataset can be used on its own or integrated with existing Datasets and with downstream tools such as the Cohort Browser, JupyterLab, analysis apps, and custom developer-led initiatives.

To launch the Clinical Dataset Merger app via the CLI, enter the command:

`dx run clinical_dataset_merger`

## Overview

![](/files/aGTjb6rDcO0y5lVAP5UQ)

### Inputs

The Clinical Dataset Merger app takes the following inputs:

* **Source Dataset** - The Dataset containing the Entities to add to the target Dataset.
* **Target Dataset** - The core Dataset to extend. Its main Entity becomes the main Entity of the new Dataset. This Dataset may have zero, one, or more Assays.
* **Output Dataset Name** - The name of the new Dataset record to create as output.
* **Linking Information** - Provide linking information when the link is not between the global key of the source Dataset and the global key of the target Dataset.
  * **Source Entity Name** – The name of the Entity in the source Dataset to use for joining to the target Dataset. If left blank, the main Entity of the source Dataset is used.
  * **Source Field Name** – The field in the source Entity to use for the join. If left blank, the global key of the main Entity in the source Dataset is used. For one-to-one (1:1) or many-to-one (N:1) relationships, this field must be unique.
  * **Target Entity Name** – The name of the Entity in the target Dataset to use for joining. If left blank, the main Entity of the target Dataset is used.
  * **Target Field Name** – The field in the target Entity to use for the join. If left blank, the global key of the main Entity in the target Dataset is used. For one-to-many (1:N) relationships, this field must be unique.
  * **Join Relationship** – The relationship between the source and target Entities. Only one-to-one (1:1), many-to-one (N:1), and one-to-many (1:N) relationships are supported.

See the [Clinical Dataset Merger app documentation](https://platform.dnanexus.com/app/clinical_dataset_merger) on the Platform for more granular configuration options.
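
The linking defaults and uniqueness rules described above can be sketched in Python as follows. This is a minimal, hypothetical model: the `Entity`, `Dataset`, and `Link` classes and the `resolve_defaults` and `validate_link` functions are illustrative names, not part of the app or any DNAnexus API, and the app's own validation may work differently.

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical in-memory model for illustration only; not the app's internals
# or a DNAnexus API.
@dataclass
class Entity:
    fields: dict  # field name -> list of values in that column


@dataclass
class Dataset:
    main_entity: str  # name of the main Entity
    global_key: str   # global key field of the main Entity
    entities: dict    # Entity name -> Entity


@dataclass
class Link:
    source_entity: Optional[str] = None
    source_field: Optional[str] = None
    target_entity: Optional[str] = None
    target_field: Optional[str] = None
    relationship: str = "1:1"  # assumed default, for illustration only


SUPPORTED_RELATIONSHIPS = {"1:1", "N:1", "1:N"}


def resolve_defaults(link: Link, source: Dataset, target: Dataset) -> Link:
    """Fill blank linking inputs with the main Entity and its global key."""
    return replace(
        link,
        source_entity=link.source_entity or source.main_entity,
        source_field=link.source_field or source.global_key,
        target_entity=link.target_entity or target.main_entity,
        target_field=link.target_field or target.global_key,
    )


def column(ds: Dataset, entity: str, field: str) -> list:
    """Return the values of entity.field, or fail if the field does not exist."""
    if entity not in ds.entities or field not in ds.entities[entity].fields:
        raise ValueError(f"unknown join field {entity}.{field}")
    return ds.entities[entity].fields[field]


def is_unique(values: list) -> bool:
    return len(set(values)) == len(values)


def validate_link(link: Link, source: Dataset, target: Dataset) -> Link:
    """Resolve defaults, then apply the uniqueness rule for the relationship."""
    link = resolve_defaults(link, source, target)
    if link.relationship not in SUPPORTED_RELATIONSHIPS:
        raise ValueError(f"unsupported relationship {link.relationship!r}")
    src = column(source, link.source_entity, link.source_field)
    tgt = column(target, link.target_entity, link.target_field)
    # The source join field must be unique for 1:1 and N:1 joins.
    if link.relationship in {"1:1", "N:1"} and not is_unique(src):
        raise ValueError("source join field must be unique")
    # The target join field must be unique for 1:N joins.
    if link.relationship == "1:N" and not is_unique(tgt):
        raise ValueError("target join field must be unique")
    return link


# Usage: patient_id is unique in the target and repeats across source visits.
target = Dataset("patient", "patient_id",
                 {"patient": Entity({"patient_id": ["p1", "p2"]})})
source = Dataset("visit", "visit_id",
                 {"visit": Entity({"visit_id": ["v1", "v2", "v3"],
                                   "patient_id": ["p1", "p1", "p2"]})})
link = validate_link(Link(source_field="patient_id", relationship="1:N"),
                     source, target)
print(link.source_entity, link.target_field)  # visit patient_id
```

Only the source field is given here; the source and target Entity names and the target field fall back to the defaults. Requesting `N:1` for the same data would fail, because `patient_id` repeats in the source Entity.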

### Process

1. The app validates the inputs, checking that Entity names are unique across the two Datasets and that the linking information is valid.
2. The app creates a new Dataset containing the target Dataset's Entities, plus the source Dataset's Entities joined according to the linking information.
3. The app adds any other components of the target Dataset (such as Assays) to the new Dataset and returns it to the user.
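
The three steps above can be sketched in Python as follows. This is a hypothetical model that assumes each Dataset is a plain dict with `main_entity`, `entities` (Entity name to field names), optional `joins`, and other components such as `assays`. `merge_datasets` is an illustrative name, not a DNAnexus API, and the dict layout is not the actual Apollo Dataset record format.

```python
import copy


# Hypothetical, simplified Dataset layout for illustration only.
def merge_datasets(source: dict, target: dict, link: dict) -> dict:
    # Step 1: validate Entity name uniqueness and the linking information.
    clashes = set(source["entities"]) & set(target["entities"])
    if clashes:
        raise ValueError(f"Entity names must be unique, found: {sorted(clashes)}")
    if link["source_field"] not in source["entities"].get(link["source_entity"], []):
        raise ValueError("source linking information is invalid")
    if link["target_field"] not in target["entities"].get(link["target_entity"], []):
        raise ValueError("target linking information is invalid")

    # Step 2: create a new Dataset holding the target Entities plus the source
    # Entities; the target's main Entity remains the main Entity.
    merged = {
        "main_entity": target["main_entity"],
        "entities": {**copy.deepcopy(target["entities"]),
                     **copy.deepcopy(source["entities"])},
        # Assumed: keep each Dataset's existing joins and add the new link.
        "joins": copy.deepcopy(target.get("joins", []) + source.get("joins", []))
                 + [dict(link)],
    }
    # Step 3: carry over any other target components (for example, Assays).
    for key, value in target.items():
        merged.setdefault(key, copy.deepcopy(value))
    return merged


target = {"main_entity": "patient",
          "entities": {"patient": ["patient_id", "age"]},
          "assays": ["germline_variants"]}
source = {"main_entity": "visit",
          "entities": {"visit": ["visit_id", "patient_id"]}}
link = {"source_entity": "visit", "source_field": "patient_id",
        "target_entity": "patient", "target_field": "patient_id",
        "relationship": "1:N"}
merged = merge_datasets(source, target, link)
print(merged["main_entity"], sorted(merged["entities"]))  # patient ['patient', 'visit']
```

Taking the main Entity and the remaining components from the target, never from the source, mirrors the rule that the target's main Entity becomes the main Entity of the merged Dataset. Neither input Dataset is modified.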

### Outputs

* **Dataset record** - The new merged Dataset, containing the components of the target Dataset plus the linked Entities from the source Dataset.

### Best Practices

1. Ensure that the Entity names are unique between the source Dataset and the target Dataset.
2. Use this app to keep sensitive and non-sensitive data in separate databases. Ingest each set of data through its own ingestion process, then use this app to create a unified experience for users with permission to access both.
