Example Usage of Dataset Extender

Common usage patterns for the Dataset Extender application

Adding derived phenotypes to an existing entity

  1. Identify the dataset to extend. If you are using the command line, retrieve the dataset's record ID.

  2. To add data to an existing entity, ensure the following conditions are met:

    1. The data is related to the entity in a one-to-one relationship

    2. The data includes the unique keys of the entity you are extending, preferably in the first column

    3. Your column names do not overlap with any column names in the entity you are extending. The key column is the only exception; its name may match.

  3. Save the data as a file in your project. Comma-delimited is recommended; tab-delimited is also supported but requires setting the Source Data Delimiter input.

  4. Run the Dataset Extender application with the following inputs:

    1. Source Data - This should be set to your data file

    2. Target Dataset - This is the dataset you want to extend

    3. Target Entity Name - Only specify this if you are extending an entity that is not the main entity

    4. Source Data Delimiter - Select "/t" if you are using a TSV file. The default is "," (comma).

    5. When running through dx-toolkit, you can use a pattern as follows:

      dx run dataset-extender -isource_data=<file path> -itarget_dataset=<record id>

    6. For additional configuration guidance, refer to the Dataset Extender page.

  5. This process will generate:

    1. A new dataset with the original data plus your new data

    2. A new database if the original database cannot be written to
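
The conditions in step 2 can be checked locally before the file is uploaded. The following is a minimal sketch, not part of the Dataset Extender application: the function name `check_extension_columns` and its parameters are hypothetical, and it assumes you already know the column names and key column of the entity you are extending.

```python
import csv
import io


def check_extension_columns(csv_text, entity_columns, key_column):
    """Return a list of problems found in data meant to extend an existing entity.

    csv_text       -- contents of the comma-delimited source file
    entity_columns -- column names already in the target entity (assumed known)
    key_column     -- name of the entity's unique key column (assumed known)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    if key_column not in header:
        return [f"missing key column: {key_column}"]

    problems = []

    # Column names must not overlap with the entity, except for the key column.
    overlap = sorted((set(header) - {key_column}) & set(entity_columns))
    if overlap:
        problems.append(f"overlapping columns: {overlap}")

    # One-to-one relationship: each key may appear at most once in the new data.
    seen, duplicates = set(), set()
    for row in reader:
        key = row[key_column]
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    if duplicates:
        problems.append(f"duplicate keys: {sorted(duplicates)}")

    return problems
```

An empty list means the file passed these checks. For a tab-delimited file, passing `delimiter="\t"` to `csv.DictReader` would be the corresponding change.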

Adding a new entity related to your dataset

  1. Identify the dataset to extend. If you are using the command line, retrieve the dataset's record ID.

  2. To add data as a new entity, ensure the following conditions are met:

    1. The data is related to the entity in a one-to-one or many-to-one relationship

    2. The data has a column with values that correspond to the keys of the entity you are extending, preferably the first column

  3. Save the data as a file in your project. Comma-delimited is recommended; tab-delimited is also supported but requires setting the Source Data Delimiter input.

  4. Run the Dataset Extender application with the following inputs:

    1. Source Data - This should be set to your data file

    2. Target Dataset - This is the dataset you want to extend

    3. Build New Entity - Set this to true

    4. New Entity Name - The name of the new entity you are creating. This must not match any existing entity title in the Target Dataset

    5. Target Entity Name - Only specify this if you are extending an entity that is not the main entity

    6. Source Data Delimiter - Select "/t" if you are using a TSV file. The default is "," (comma).

    7. When running through dx-toolkit, you can use a pattern as follows:

      dx run dataset-extender -isource_data=<file path> -itarget_dataset=<record id> -ibuild_new_entity=true -inew_entity_name=<entity name> -itarget_entity_name=<entity title the data relates to>

    8. For additional configuration guidance, refer to the Dataset Extender page.

  5. This process will generate:

    1. A new dataset with the original data plus your new data

    2. A new database if the original database cannot be written to
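
The conditions for a new entity differ from the one-to-one case: keys may repeat (many-to-one), but each value in the linking column should correspond to a key in the target entity, and the new entity name must be unused. A minimal sketch of such a pre-upload check follows. The function name `check_new_entity` and its parameters are hypothetical and not part of the Dataset Extender application, and the sketch assumes you have the entity's key values and the dataset's existing entity titles at hand.

```python
import csv
import io


def check_new_entity(csv_text, link_column, entity_keys,
                     new_entity_name, existing_entity_names):
    """Return a list of problems found in data meant to become a new entity.

    csv_text              -- contents of the comma-delimited source file
    link_column           -- column whose values refer to the target entity's keys
    entity_keys           -- key values present in the target entity (assumed known)
    new_entity_name       -- proposed name for the new entity
    existing_entity_names -- entity titles already in the dataset (assumed known)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = []

    # The new entity name must not collide with an existing entity title.
    if new_entity_name in existing_entity_names:
        problems.append(f"entity name already in use: {new_entity_name}")

    if link_column not in header:
        problems.append(f"missing linking column: {link_column}")
        return problems

    # Many-to-one is allowed, so repeated values are fine; only values that
    # match no key in the target entity are reported.
    unmatched = sorted({row[link_column] for row in reader} - set(entity_keys))
    if unmatched:
        problems.append(f"values without a matching key: {unmatched}")

    return problems
```

Keys are compared as strings because that is how the `csv` module reads them, so `entity_keys` should be supplied as strings as well.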