Connect to Thrift

The DNAnexus Thrift server is a service that allows JDBC and ODBC clients to run Spark SQL queries. The Thrift server connects to a high availability Apache Spark cluster integrated with the platform. It leverages the same security, permissions, and sharing features built into DNAnexus.

Connecting to Thrift Server

In order to connect to the Thrift server, we need two pieces of information:

  1. The JDBC url

    jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true
  2. Your username

    We support the following format of the username:

    • TOKEN__PROJECTID : TOKEN is DNAnexus user generated token and PROJECTID is a DNAnexus project ID used as the project context (when you create databases). Note the double underscore between the token and the project ID.

Generate token for DNAnexus account

See the Authentication tokens page.

Getting the Project ID

  1. Navigate to https://platform.dnanexus.com and login using your username and password.

  2. Go to Projects -> your project -> Settings -> Project ID and click on Copy to Clipboard.

Project ID

Using Beeline

Beeline is a JDBC client bundled with Apache Spark that can be used to run interactive queries on the command line.

Installing Apache Spark

You can download Apache Spark here.

$ tar -zxvf spark-2.4.0-bin-hadoop2.7.tar

You need to have Java installed in your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

Running Beeline

The beeline client is located under $SPARK_HOME/bin/.

$ cd spark-2.4.0-bin-hadoop2.7/bin
$ ./beeline

Connect to beeline using the JDBC URL:

beeline> !connect jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true
Enter username: <TOKEN__PROJECTID>
Enter password: <empty - press RETURN>

Once successfully connected, you should see the message:

Connected to: Spark SQL (version 2.4.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ

You are now connected to the Thrift server using your credentials and will be able to see all databases to which you have access.

0: jdbc:hive2://query.us-east-1.apollo.dnanex> show databases;
+---------------------------------------------------------+--+
| databaseName |
+---------------------------------------------------------+--+
| database_fj7q18009xxzzzx0gjfk6vfz__genomics_180718_01 |
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01 |
| database_fj96qx00v10vj50j0gyfv00z__af_result2 |
| database_fjf3y28066y5jxj2b0gz4g85__metabric_data |
| database_fjj1jkj0v10p8pvx78vkkpz3__pchr1_test |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1 |
+---------------------------------------------------------+--+

You can use the unique database name including downcased database ID database_fjf3y28066y5jxj2b0gz4g85__metabric_data or just the proper name metabric_data. If there is more than one database sharing the name metabric_data, then you will need to use the unique name.

0: jdbc:hive2://query.us-east-1.apollo.dnanex> use metabric_data;

Now you can run SQL queries.

0: jdbc:hive2://query.us-east-1.apollo.dnanex> select * from cna limit 10;
+--------------+-----------------+------------+--------+--+
| hugo_symbol | entrez_gene_id | sample_id | value |
+--------------+-----------------+------------+--------+--+
| MIR3675 | NULL | MB-6179 | -1 |
| MIR3675 | NULL | MB-6181 | 0 |
| MIR3675 | NULL | MB-6182 | 0 |
| MIR3675 | NULL | MB-6183 | 0 |
| MIR3675 | NULL | MB-6184 | 0 |
| MIR3675 | NULL | MB-6185 | -1 |
| MIR3675 | NULL | MB-6187 | 0 |
| MIR3675 | NULL | MB-6188 | 0 |
| MIR3675 | NULL | MB-6189 | 0 |
| MIR3675 | NULL | MB-6190 | 0 |
+--------------+-----------------+------------+--------+--+