Searching Data Objects
You can use the dx ls command to list the objects in your current project. You can learn which project and folder you are currently in by using the command dx pwd. Using glob patterns, you can broaden your search for objects by specifying filenames with wildcard characters such as * and ?. An asterisk (*) is used to represent zero or more characters in a string, and a question mark (?) represents exactly one character.
Searching Objects with Glob Patterns
Searching Objects in Your Current Folder
By listing objects in your current folder with the wildcard characters * and ?, you can search for objects whose filenames match a glob pattern. Here we take the folder "C. Elegans - Ce10/" in the public project "Reference Genome Files" (platform login required to access this link) and walk through these examples:
Printing the Current Working Directory
$ dx select "Reference Genome Files"
$ dx cd "C. Elegans - Ce10/"
$ dx pwd # Print current working directory
Reference Genome Files:/C. Elegans - Ce10
Listing Folders and/or Objects in a Folder
$ dx ls
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
ce10.cw2-index.tar.gz
ce10.fasta.fai
ce10.fasta.gz
ce10.hisat2-index.tar.gz
ce10.star-index.tar.gz
ce10.tmap-index.tar.gz
Listing Objects Named Using a Pattern
$ dx ls '*.fa*' # List objects with filenames of the pattern "*.fa*"
ce10.fasta.fai
ce10.fasta.gz
$ dx ls 'ce10.???-index.tar.gz' # List objects with filenames of the pattern "ce10.???-index.tar.gz"
ce10.cw2-index.tar.gz
ce10.bt2-index.tar.gz
ce10.bwa-index.tar.gz
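The two wildcard rules can be tried out locally without calling the platform. The sketch below is a minimal illustration that uses the shell's own case pattern matching, which applies the same meaning to * and ?. The function names glob_match and filter_names are made up for this example, and dx may differ from the shell in edge cases such as bracket expressions.

```shell
# Return success if NAME matches the glob PATTERN, using the shell's own
# pattern matching ('*' = zero or more characters, '?' = exactly one).
glob_match() {
    # $1 = pattern, $2 = name; $1 is left unquoted so it acts as a pattern
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Print every name among the remaining arguments that matches the pattern.
filter_names() {
    pattern=$1
    shift
    for name in "$@"; do
        if glob_match "$pattern" "$name"; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage:
#   filter_names 'ce10.???-index.tar.gz' ce10.bt2-index.tar.gz ce10.hisat2-index.tar.gz
```

With the arguments in the usage comment, only ce10.bt2-index.tar.gz is printed: "hisat2" has six characters where the pattern allows exactly three, which is why the hisat2, star, and tmap indexes are missing from the listing above.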
Searching Across Objects in the Current Project
If you wish to search the entire project with a filename pattern, you can use the command dx find data --name with the wildcard characters. Unless --path or --all-projects is specified, dx find data searches data under the current project. Below, we use the command dx find data in the public project "Reference Genome Files" (platform login required to access this link) using the --name option to specify the filename of objects that we're searching for.
$ dx find data --name "*.fa*.gz"
closed 2014-10-09 09:50:51 776.72 MB /M. musculus - mm10/mm10.fasta.gz (file-BQbYQPj0Z05ZzPpb1xf000Xy)
closed 2014-10-09 09:50:30 767.47 MB /M. musculus - mm9/mm9.fasta.gz (file-BQbYK6801fFJ9Fj30kf003PB)
closed 2014-10-09 09:49:27 49.04 MB /D. melanogaster - Dm3/dm3.fasta.gz (file-BQbYVf80yf3J9Fj30kf00PPk)
closed 2014-10-09 09:48:55 29.21 MB /C. Elegans - Ce10/ce10.fasta.gz (file-BQbY9Bj015pB7JJVX0vQ7vj5)
closed 2014-10-08 13:52:26 818.96 MB /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.fa.gz (file-B6ZY7VG2J35Vfvpkj8y0KZ01)
closed 2014-10-08 13:51:31 876.79 MB /H. Sapiens - hg19 (UCSC)/ucsc_hg19.fa.gz (file-B6qq93v2J35fB53gZ5G0007K)
closed 2014-10-08 13:50:53 827.95 MB /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.fa.gz (file-B6ZYPQv2J35xX095VZyQBq2j)
closed 2014-10-08 13:50:17 818.88 MB /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.fa.gz (file-BFBv6J80634gkvZ6z100VGpp)
closed 2014-10-08 13:49:53 810.45 MB /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.fa.gz (file-B6ZXxfG2J35Vfvpkj8y0KXF5)
Escaping Special Characters
When filenames contain special characters, escape these characters with a backslash (\) during searches. Characters requiring escaping include wildcards (* and ?) as well as colons (:) and slashes (/), which have special meaning in DNAnexus paths.
Shell behavior affects escaping rules. In many shells, you'll need to either double-escape (\\) or use single quotes to prevent the shell from interpreting the backslash.
The following examples show proper escaping techniques:
# Searching for a file with colons in the name
dx find data --name "sample\:123.txt"
# Or alternatively with single quotes
dx find data --name 'sample\:123.txt'
# Searching for a file with a literal asterisk
dx find data --name "experiment\*.fastq"
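When the literal filename comes from a variable or a script, adding the backslashes by hand is error-prone. The helper below is a minimal sketch, not part of the dx toolkit: dx_escape_name is a hypothetical name, and it escapes only the four characters listed above (*, ?, : and /). Whether other characters, such as the backslash itself, also need escaping is not covered here.

```shell
# Backslash-escape the four characters the text above lists as special
# ('*', '?', ':' and '/') so a literal filename can be passed to --name.
dx_escape_name() {
    # '|' is used as the sed delimiter so that '/' needs no extra escaping;
    # '\\&' in the replacement means: a backslash, then the matched character.
    printf '%s' "$1" | sed 's|[*?:/]|\\&|g'
}

# Usage:
#   dx find data --name "$(dx_escape_name 'sample:123.txt')"
```

Because the escaped string is produced by command substitution inside double quotes, the shell passes the backslashes through to dx unchanged.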
Searching Objects with Other Criteria
dx find data also allows you to search data using metadata fields, such as when the data was created, the data tags, or the project the data exists in.
Searching Objects Created Within a Certain Period of Time
You can use the flags --created-after and --created-before to search for data objects created within a specific time period.
$ dx find data --created-after 2017-02-22 --created-before 2017-02-25
closed 2017-02-27 19:14:51 3.90 GB /H. Sapiens - hg19 (UCSC)/ucsc_hg19.hisat2-index.tar.gz (file-F2pJvF80Vzx54f69K4J8K5xy)
closed 2017-02-27 19:14:21 3.55 GB /M. musculus - mm10/mm10.hisat2-index.tar.gz (file-F2pJqk00Vq161bzq44Vjvpf5)
closed 2017-02-27 19:13:57 3.51 GB /M. musculus - mm9/mm9.hisat2-index.tar.gz (file-F2pJpKj0G0JxZxBZ4KJq0Q6B)
closed 2017-02-27 19:13:41 3.85 GB /H. Sapiens - hg19 (Ion Torrent)/ion_hg19.hisat2-index.tar.gz (file-F2pJkp00BjBk99xz4Jk74V0y)
closed 2017-02-27 19:13:28 3.85 GB /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.hisat2-index.tar.gz (file-F2pJpy007bGBzj7X446PzxJJ)
closed 2017-02-27 19:13:02 3.90 GB /H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.hisat2-index.tar.gz (file-F2pJpb000vFpzj7X446PzxF0)
closed 2017-02-27 19:12:31 3.91 GB /H. Sapiens - GRCh38/GRCh38.no_alt_analysis_set.hisat2-index.tar.gz (file-F2pK5y00F8Bp9BYk4KX7Qb4P)
closed 2017-02-27 19:12:18 224.54 MB /D. melanogaster - Dm3/dm3.hisat2-index.tar.gz (file-F2pJP7j0QkbQ3ZqG269589pj)
closed 2017-02-27 19:11:56 139.76 MB /C. Elegans - Ce10/ce10.hisat2-index.tar.gz (file-F2pJK300KKz8bx1126Ky5b3P)
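If the two dates come from user input, it can help to check them before running the search. The wrapper below is a sketch under stated assumptions: find_created_between is a hypothetical helper, it accepts only the YYYY-MM-DD form used in the example above, and it rejects a range whose start is later than its end.

```shell
# Hypothetical helper: check that both dates look like YYYY-MM-DD and are in
# order, then run the same dx find data command shown above.
find_created_between() {
    after=$1
    before=$2
    for d in "$after" "$before"; do
        case "$d" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) echo "expected YYYY-MM-DD, got: $d" >&2
               return 2 ;;
        esac
    done
    # With the dashes removed, the dates compare correctly as integers.
    a=$(printf '%s' "$after" | tr -d '-')
    b=$(printf '%s' "$before" | tr -d '-')
    if [ "$a" -gt "$b" ]; then
        echo "start date $after is later than end date $before" >&2
        return 2
    fi
    dx find data --created-after "$after" --created-before "$before"
}

# Usage:
#   find_created_between 2017-02-22 2017-02-25
```

The check is purely syntactic: it does not verify that the month and day are valid calendar values.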
Searching Objects by Their Metadata
You can search for objects based on their metadata. An object's metadata can be set by running the command dx tag or dx set_properties to, respectively, tag the data object or set up key-value pairs that describe it. You can also set metadata while uploading data to the platform. To search by object tags, use the option --tag. This option can be repeated if the search requires multiple tags.
$ dx find data --tag sampleABC --tag batch123
closed 2017-01-01 09:00:00 6.08 GB /Input/SRR504516_1.fastq.gz (file-xxxx)
closed 2017-01-01 09:00:00 5.82 GB /Input/SRR504516_2.fastq.gz (file-wwww)
To search by object properties, use the option --property. This option can be repeated if the search requires multiple properties.
$ dx find data --property sequencing_providor=CRO_XYZ
closed 2017-01-01 09:00:00 8.06 GB /Input/SRR504555_1.fastq.gz (file-qqqq)
closed 2017-01-01 09:00:00 8.52 GB /Input/SRR504555_2.fastq.gz (file-rrrr)
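When the list of tags is not known in advance, the repeated --tag options can be generated rather than typed. The following is a minimal sketch: find_by_tags is a hypothetical helper that turns each of its arguments into one --tag option, and the same approach would apply to --property.

```shell
# Hypothetical helper: run dx find data with one --tag option per argument.
find_by_tags() {
    # Rebuild the positional parameters as: --tag t1 --tag t2 ...
    # Each pass appends "--tag <first>" and then drops the first argument.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" --tag "$1"
        shift
        n=$((n - 1))
    done
    dx find data "$@"
}

# Usage (equivalent to: dx find data --tag sampleABC --tag batch123):
#   find_by_tags sampleABC batch123
```

Rewriting the positional parameters in place keeps each tag as a single argument, so tags that contain spaces are passed through intact.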
Searching Objects in Another Project
You can search for an object in a project other than your current working project by specifying a project and folder path with the flag --path. Below, we specify the project ID (project-BQfgzV80bZ46kf6pBGy00J38) of the public project "Exome Analysis Demo" (platform login required to access this link) as an example.
$ dx find data --name "*.fastq.gz" --path project-BQfgzV80bZ46kf6pBGy00J38:/Input
closed 2014-10-03 12:04:16 6.08 GB /Input/SRR504516_1.fastq.gz (file-B40jg7v8KfPy38kjz1vQ001y)
closed 2014-10-03 12:04:16 5.82 GB /Input/SRR504516_2.fastq.gz (file-B40jgYG8KfPy38kjz1vQ0020)
Searching Objects Across Projects with VIEW and Above Permissions
If you would like to search for data objects in all projects in which you have VIEW or higher permissions, you can use the --all-projects flag. Public projects are not shown in this search.
$ dx find data --name "SRR*_1.fastq.gz" --all-projects
closed 2017-01-01 09:00:00 6.08 GB /Exome Analysis Demo/Input/SRR504516_1.fastq.gz (project-xxxx:file-xxxx)
closed 2017-07-01 10:00:00 343.58 MB /input/SRR064287_1.fastq.gz (project-yyyy:file-yyyy)
closed 2017-01-01 09:00:00 6.08 GB /data/exome_analysis_demo/SRR504516_1.fastq.gz (project-zzzz:file-xxxx)
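Each result above ends with an identifier of the form project-xxxx:file-xxxx. When post-processing such identifiers in a script, the two parts can be separated with plain parameter expansion. This is a sketch only: project_of and object_of are hypothetical names, and the usage comment assumes that the --brief option of dx find data prints one such identifier per line.

```shell
# Split "project-xxxx:file-yyyy" into its two parts with parameter expansion.
project_of() {
    # Remove the longest suffix starting at the first ':'.
    printf '%s\n' "${1%%:*}"
}

object_of() {
    # Remove the shortest prefix ending at the first ':'.
    printf '%s\n' "${1#*:}"
}

# Usage (assumes --brief prints one project-id:object-id pair per line):
#   dx find data --name "SRR*_1.fastq.gz" --all-projects --brief |
#   while read -r id; do
#       echo "$(object_of "$id") lives in $(project_of "$id")"
#   done
```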
Scoping Within Projects
To describe a small number of files (typically fewer than 100), scope findDataObjects to the project level only.
The following example scopes the search to a single project:
dx api system findDataObjects '{"scope": {"project": "project-xxxx"}, "describe":{"fields":{"state":true}}}'
See the API method system/findDataObjects for more information about usage.
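To reuse the same request for different projects, the JSON input can be generated from the project ID. The helper below is a minimal sketch: scope_payload is a hypothetical name, and it assumes the ID contains no characters that need JSON escaping, which holds for IDs of the form project-xxxx.

```shell
# Hypothetical helper: build the findDataObjects input for one project,
# with the same scope and describe fields as the example above.
scope_payload() {
    printf '{"scope": {"project": "%s"}, "describe": {"fields": {"state": true}}}\n' "$1"
}

# Usage:
#   dx api system findDataObjects "$(scope_payload project-xxxx)"
```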