DNAnexus Documentation

Copyright 2025 DNAnexus

Searching Data Objects

Last updated 2 years ago

You can use the dx ls command to list the objects in your current project. To see which project and folder you are currently in, use the dx pwd command. Glob patterns let you broaden a search by specifying filenames with the wildcard characters * and ?. An asterisk (*) matches zero or more characters in a string, and a question mark (?) matches exactly one character.

Searching Objects with Glob Patterns

Searching Objects in Your Current Folder

By listing objects in your current folder with the wildcard characters * and ?, you can search for objects whose filenames match a glob pattern. The examples below use the folder "C. Elegans - Ce10/" in the public project "Reference Genome Files" (platform login required to access this project):

Printing the Current Working Directory

$ dx select "Reference Genome Files"
$ dx cd "C. Elegans - Ce10/"
$ dx pwd # Print current working directory
Reference Genome Files:/C. Elegans - Ce10

Listing Folders and/or Objects in a Folder

$ dx ls
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
ce10.cw2-index.tar.gz
ce10.fasta.fai
ce10.fasta.gz
ce10.hisat2-index.tar.gz
ce10.star-index.tar.gz
ce10.tmap-index.tar.gz

Listing Objects Named Using a Pattern

$ dx ls '*.fa*' # List objects with filenames of the pattern "*.fa*"
ce10.fasta.fai
ce10.fasta.gz
$ dx ls 'ce10.???-index.tar.gz' # List objects with filenames of the pattern "ce10.???-index.tar.gz"
ce10.cw2-index.tar.gz
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
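
The wildcard rules can be tried locally, without a platform connection. The following is a minimal sketch that assumes the platform's * and ? behave like the shell's own pattern matching in a case statement; matches_glob is a hypothetical helper written for this illustration, not part of dx-toolkit.

```shell
#!/bin/sh
# matches_glob NAME PATTERN
# Hypothetical helper: succeeds when NAME matches the glob PATTERN.
# Assumption: shell `case` pattern matching stands in for dx name matching.
matches_glob() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Filenames taken from the `dx ls` listing above.
for name in ce10.bt2-index.tar.gz ce10.fasta.fai ce10.fasta.gz ce10.hisat2-index.tar.gz; do
    if matches_glob "$name" 'ce10.???-index.tar.gz'; then
        echo "$name matches ce10.???-index.tar.gz"
    fi
done
```

Only ce10.bt2-index.tar.gz is reported here: each ? must match exactly one character, so the six-character "hisat2" does not fit the three-character slot.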

Searching Across Objects in the Current Project

$ dx find data --name "*.fa*.gz"
closed  2014-10-09 09:50:51 776.72 MB /M. musculus - mm10/mm10.fasta.gz (file-BQbYQPj0Z05ZzPpb1xf000Xy)
closed  2014-10-09 09:50:30 767.47 MB /M. musculus - mm9/mm9.fasta.gz (file-BQbYK6801fFJ9Fj30kf003PB)
closed  2014-10-09 09:49:27 49.04 MB /D. melanogaster - Dm3/dm3.fasta.gz (file-BQbYVf80yf3J9Fj30kf00PPk)
closed  2014-10-09 09:48:55 29.21 MB /C. Elegans - Ce10/ce10.fasta.gz (file-BQbY9Bj015pB7JJVX0vQ7vj5)
closed  2014-10-08 13:52:26 818.96 MB /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.fa.gz (file-B6ZY7VG2J35Vfvpkj8y0KZ01)
closed  2014-10-08 13:51:31 876.79 MB /H. Sapiens - hg19 (UCSC)/ucsc_hg19.fa.gz (file-B6qq93v2J35fB53gZ5G0007K)
closed  2014-10-08 13:50:53 827.95 MB /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.fa.gz (file-B6ZYPQv2J35xX095VZyQBq2j)
closed  2014-10-08 13:50:17 818.88 MB /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.fa.gz (file-BFBv6J80634gkvZ6z100VGpp)
closed  2014-10-08 13:49:53 810.45 MB /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.fa.gz (file-B6ZXxfG2J35Vfvpkj8y0KXF5)

Escaping Special Characters

If an object's name contains special characters, escape those characters when searching. Because a colon (:) denotes project names and a slash (/) separates folder names on the platform, these two are also special characters and must be escaped when they appear in a data object's name. To escape a special character, precede it with a backslash (\).

Note that while dx-toolkit itself requires a single \ to escape a colon or a slash, the syntax conventions of some shells may require you to escape the \ character itself with an extra backslash, or to enclose the argument in single quotes.
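
The escaping rule for colons and slashes can be sketched as a small helper. This is an illustration under stated assumptions: escape_dx_name is a hypothetical function (not part of dx-toolkit), the object name is invented, and only ":" and "/" are handled.

```shell
#!/bin/sh
# escape_dx_name NAME
# Hypothetical helper: prefixes each colon and slash in NAME with a backslash.
# Assumption: only ":" and "/" are escaped; other characters pass through unchanged.
escape_dx_name() {
    printf '%s\n' "$1" | sed 's|[/:]|\\&|g'
}

# Hypothetical object name containing both special characters.
escaped=$(escape_dx_name 'run:42/reads.fastq.gz')
printf '%s\n' "$escaped"

# Double quotes keep the backslashes intact when the value is passed on, e.g.:
#   dx ls "$escaped"
```

Building the escaped name in a variable and passing it in double quotes avoids the second layer of shell escaping described above.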

Searching Objects with Other Criteria

dx find data also allows you to search data using metadata fields, such as when the data was created, the data's tags, or the project the data exists in.

Searching Objects Created Within a Certain Period of Time

Use the --created-after and --created-before flags to search for data objects created within a given period of time.

$ dx find data --created-after 2017-02-22 --created-before 2017-02-25
closed  2017-02-27 19:14:51 3.90 GB  /H. Sapiens - hg19 (UCSC)/ucsc_hg19.hisat2-index.tar.gz (file-F2pJvF80Vzx54f69K4J8K5xy)
closed  2017-02-27 19:14:21 3.55 GB  /M. musculus - mm10/mm10.hisat2-index.tar.gz (file-F2pJqk00Vq161bzq44Vjvpf5)
closed  2017-02-27 19:13:57 3.51 GB  /M. musculus - mm9/mm9.hisat2-index.tar.gz (file-F2pJpKj0G0JxZxBZ4KJq0Q6B)
closed  2017-02-27 19:13:41 3.85 GB  /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.hisat2-index.tar.gz (file-F2pJkp00BjBk99xz4Jk74V0y)
closed  2017-02-27 19:13:28 3.85 GB  /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.hisat2-index.tar.gz (file-F2pJpy007bGBzj7X446PzxJJ)
closed  2017-02-27 19:13:02 3.90 GB  /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.hisat2-index.tar.gz (file-F2pJpb000vFpzj7X446PzxF0)
closed  2017-02-27 19:12:31 3.91 GB  /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.hisat2-index.tar.gz (file-F2pK5y00F8Bp9BYk4KX7Qb4P)
closed  2017-02-27 19:12:18 224.54 MB /D. melanogaster - Dm3/dm3.hisat2-index.tar.gz (file-F2pJP7j0QkbQ3ZqG269589pj)
closed  2017-02-27 19:11:56 139.76 MB /C. Elegans - Ce10/ce10.hisat2-index.tar.gz (file-F2pJK300KKz8bx1126Ky5b3P)

Searching Objects by Their Metadata

$ dx find data --tag sampleABC --tag batch123
closed  2017-01-01 09:00:00 6.08 GB  /Input/SRR504516_1.fastq.gz (file-xxxx)
closed  2017-01-01 09:00:00 5.82 GB  /Input/SRR504516_2.fastq.gz (file-wwww)

To search by object properties, use the option --property. This option can be repeated if the search requires multiple properties.

$ dx find data --property sequencing_provider=CRO_XYZ
closed  2017-01-01 09:00:00 8.06 GB  /Input/SRR504555_1.fastq.gz (file-qqqq)
closed  2017-01-01 09:00:00 8.52 GB  /Input/SRR504555_2.fastq.gz (file-rrrr)
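
The tag search above passes --tag twice. When the tags come from a list, a script can assemble the repeated flags. The sketch below is a dry run that only prints the command; find_by_tags is a hypothetical wrapper, and it assumes the tags contain no whitespace.

```shell
#!/bin/sh
# find_by_tags TAG...
# Hypothetical wrapper: repeats --tag once per argument and prints the
# resulting command line (dry run; nothing is sent to the platform).
# Assumption: tags contain no whitespace or shell metacharacters.
find_by_tags() {
    args=""
    for tag in "$@"; do
        args="$args --tag $tag"
    done
    echo "dx find data$args"
}

# Tags taken from the example above.
find_by_tags sampleABC batch123
```

The same loop would work for --property by appending key=value pairs instead of tags.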

Searching Objects in Another Project

$ dx find data --name "*.fastq.gz" --path project-BQfgzV80bZ46kf6pBGy00J38:/Input
closed  2014-10-03 12:04:16 6.08 GB  /Input/SRR504516_1.fastq.gz (file-B40jg7v8KfPy38kjz1vQ001y)
closed  2014-10-03 12:04:16 5.82 GB  /Input/SRR504516_2.fastq.gz (file-B40jgYG8KfPy38kjz1vQ0020)

Searching Objects Across Projects with VIEW and Above Permissions

To search for data objects in all projects in which you have VIEW or higher permissions, use the --all-projects flag. Public projects are not included in this search.

$ dx find data --name "SRR*_1.fastq.gz" --all-projects
closed  2017-01-01 09:00:00 6.08 GB  /Exome Analysis Demo/Input/SRR504516_1.fastq.gz (project-xxxx:file-xxxx)
closed  2017-07-01 10:00:00 343.58 MB /input/SRR064287_1.fastq.gz (project-yyyy:file-yyyy)
closed  2017-01-01 09:00:00 6.08 GB  /data/exome_analysis_demo/SRR504516_1.fastq.gz (project-zzzz:file-xxxx)
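
Each result line above ends with the object's ID in parentheses, prefixed with the project ID when searching across projects. A script that needs only the IDs could extract them as sketched below. extract_ids is a hypothetical helper, and it assumes every line ends with "(<id>)" exactly as in the listing.

```shell
#!/bin/sh
# extract_ids: reads `dx find data` result lines on stdin and prints the
# text inside the final pair of parentheses on each line.
# Assumption: every result line ends with "(<id>)" as in the listing above.
extract_ids() {
    sed -n 's/.*(\([^()]*\))$/\1/p'
}

# Sample line copied from the listing above (placeholder IDs).
printf '%s\n' 'closed  2017-07-01 10:00:00 343.58 MB /input/SRR064287_1.fastq.gz (project-yyyy:file-yyyy)' | extract_ids
```

Matching only the last parenthesized group keeps folder names that themselves contain parentheses, such as "H. Sapiens - hg19 (UCSC)", from being mistaken for IDs.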

Scoping Within Projects

When describing a small number of files (typically fewer than 100), scope findDataObjects to the project level.

The following example scopes the search to a single project:

dx api system findDataObjects '{"scope": {"project": "project-xxxx"}, "describe":{"fields":{"state":true}}}'
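
When the project ID varies between runs, the JSON input can be built by a script. The following is a minimal sketch that reproduces the input shown above; project_scope_payload is a hypothetical helper, and it performs no JSON escaping, so it assumes a plain project ID.

```shell
#!/bin/sh
# project_scope_payload PROJECT_ID
# Hypothetical helper: prints the JSON input shown above with the project ID
# filled in. Assumption: PROJECT_ID is a plain ID needing no JSON escaping.
project_scope_payload() {
    printf '{"scope": {"project": "%s"}, "describe": {"fields": {"state": true}}}\n' "$1"
}

# project-xxxx is the placeholder ID used in the example above.
payload=$(project_scope_payload project-xxxx)
printf '%s\n' "$payload"

# With dx-toolkit installed and a logged-in session, the payload could be sent with:
#   dx api system findDataObjects "$payload"
```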

See the API method system/findDataObjects for more information about usage.

For the example under "Searching Across Objects in the Current Project": to search the entire project with a filename pattern, use the command dx find data --name with wildcard characters. Unless --path or --all-projects is specified, dx find data searches data in the current project. That example runs dx find data in the public project "Reference Genome Files" (platform login required), using the --name option to specify the filename pattern of the objects to find.

For the examples under "Searching Objects by Their Metadata": you can search for objects based on their metadata. Set an object's metadata with the dx tag command (to add tags) or the dx set_properties command (to set key-value pairs that describe the object). You can also set metadata while uploading data to the platform. To search by object tags, use the --tag option. This option can be repeated if the search requires multiple tags.

For the example under "Searching Objects in Another Project": to search for an object in a project other than your current working project, specify a project and folder path with the --path flag. That example uses the project ID (project-BQfgzV80bZ46kf6pBGy00J38) of the public project "Exome Analysis Demo" (platform login required).