Connect to Thrift
Learn about the DNAnexus Thrift server, a service that allows JDBC and ODBC clients to run Spark SQL queries.
A license is required to access Spark functionality on the DNAnexus Platform. Contact DNAnexus Sales for more information.
The DNAnexus Thrift server connects to a high availability Apache Spark cluster integrated with the platform. It leverages the same security, permissions, and sharing features built into DNAnexus.
To connect to the Thrift server, you need the following:

1. The JDBC URL for your region:
   - AWS US (East): jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true
   - AWS London (General): jdbc:hive2://query.eu-west-2-g.apollo.dnanexus.com:10000/;ssl=true
   - AWS London (UKB): jdbc:hive2://query.eu-west-2.apollo.dnanexus.com:10000/;ssl=true
   - Azure US (West): jdbc:hive2://query.westus.apollo.dnanexus.com:10001/;ssl=true;transportMode=http;httpPath=cliservice
   - AWS Frankfurt (General): jdbc:hive2://query.eu-central-1.apollo.dnanexus.com:10000/;ssl=true

   Note: the Azure UK South (OFH) region does not support access to Thrift.
2. A username in the following format:
   - TOKEN__PROJECTID: TOKEN is a DNAnexus user-generated token and PROJECTID is a DNAnexus project ID used as the project context (when you create databases). Note the double underscore between the token and the project ID (see the sketch below).
   - Additionally, the Thrift server you connect to and the project must be in the same region.

To gather these credentials:

1. Generate a DNAnexus API token.
2. Go to Projects -> your project -> Settings -> Project ID and click Copy to Clipboard.
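As a minimal sketch (the variable names here are illustrative, not part of the Platform), you can keep the token and project ID in shell variables and compose the username that JDBC/ODBC clients expect:

```bash
# Illustrative variable names -- substitute your own token and project ID.
DX_TOKEN="yourToken"
DX_PROJECT="project-xxxx"

# The username is the token and the project ID joined by a double underscore.
echo "${DX_TOKEN}__${DX_PROJECT}"
# -> yourToken__project-xxxx
```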

Beeline is a JDBC client bundled with Apache Spark that can be used to run interactive queries on the command line.
Download the Apache Spark distribution (spark-3.2.3-bin-hadoop3.2.tgz in this example) and extract it:
$ tar -zxvf spark-3.2.3-bin-hadoop3.2.tgz
You need to have Java installed and on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation. If you already have Beeline installed and have your credentials, you can connect directly with the following command:
<beeline> -u <thrift path> -n <token>__<project-id>
In the following AWS example, note that the semicolons in the URL are escaped with a backslash (\;):
$SPARK_HOME/bin/beeline -u jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/\;ssl=true -n yourToken__project-xxxx
Note that the command for connecting to Thrift is different for Azure, as seen below:
$SPARK_HOME/bin/beeline -u jdbc:hive2://query.westus.apollo.dnanexus.com:10001/\;ssl=true\;transportMode=http\;httpPath=cliservice -n yourToken__project-xxxx
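As an alternative to escaping each semicolon, you can quote the whole JDBC URL; this is ordinary shell quoting, not DNAnexus-specific behavior:

```bash
# Single quotes stop the shell from interpreting the semicolons in the URL.
$SPARK_HOME/bin/beeline \
  -u 'jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true' \
  -n yourToken__project-xxxx
```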
The Beeline client is located under $SPARK_HOME/bin/:
$ cd spark-3.2.3-bin-hadoop3.2/bin
$ ./beeline
From the beeline prompt, connect using the JDBC URL:
beeline> !connect jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true
Enter username: <TOKEN__PROJECTID>
Enter password: <empty - press RETURN>
Once successfully connected, you should see the message:
Connected to: Spark SQL (version 3.2.3)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
You are now connected to the Thrift server using your credentials and can see all databases to which you have access within your current region.
0: jdbc:hive2://query.us-east-1.apollo.dnanex> show databases;
+---------------------------------------------------------+--+
| databaseName |
+---------------------------------------------------------+--+
| database_fj7q18009xxzzzx0gjfk6vfz__genomics_180718_01 |
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01 |
| database_fj96qx00v10vj50j0gyfv00z__af_result2 |
| database_fjf3y28066y5jxj2b0gz4g85__metabric_data |
| database_fjj1jkj0v10p8pvx78vkkpz3__pchr1_test |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1 |
+---------------------------------------------------------+--+
You can query using the unique database name, which includes the downcased database ID, for example database_fjf3y28066y5jxj2b0gz4g85__metabric_data. If the database is within the same username and project that you used to connect to the Thrift server, you can query using only the database name, for example metabric_data. If the database is located outside the project, you will need to use the unique database name.
0: jdbc:hive2://query.us-east-1.apollo.dnanex> use metabric_data;
You may also find databases stored in other projects if you specify the project context in the LIKE option of SHOW DATABASES, using the format '<project-id>:<database pattern>', like so:
0: jdbc:hive2://query.us-east-1.apollo.dnanex> SHOW DATABASES LIKE 'project-xxx:af*';
+---------------------------------------------------------+--+
| databaseName |
+---------------------------------------------------------+--+
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01 |
| database_fj96qx00v10vj50j0gyfv00z__af_result2 |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1 |
+---------------------------------------------------------+--+
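To switch to a database stored in another project, use its unique database name from the listing above (illustrative, reusing one of the names shown in the sample output):
0: jdbc:hive2://query.us-east-1.apollo.dnanex> use database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01;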
Now you can run SQL queries.
0: jdbc:hive2://query.us-east-1.apollo.dnanex> select * from cna limit 10;
+--------------+-----------------+------------+--------+--+
| hugo_symbol | entrez_gene_id | sample_id | value |
+--------------+-----------------+------------+--------+--+
| MIR3675 | NULL | MB-6179 | -1 |
| MIR3675 | NULL | MB-6181 | 0 |
| MIR3675 | NULL | MB-6182 | 0 |
| MIR3675 | NULL | MB-6183 | 0 |
| MIR3675 | NULL | MB-6184 | 0 |
| MIR3675 | NULL | MB-6185 | -1 |
| MIR3675 | NULL | MB-6187 | 0 |
| MIR3675 | NULL | MB-6188 | 0 |
| MIR3675 | NULL | MB-6189 | 0 |
| MIR3675 | NULL | MB-6190 | 0 |
+--------------+-----------------+------------+--------+--+
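For scripted use, Beeline can also run a query non-interactively; here is a sketch assuming the standard Beeline -e and --outputformat options, with the query and output file chosen for illustration:

```bash
# Run a single query without entering the interactive shell and save the result as CSV.
$SPARK_HOME/bin/beeline \
  -u 'jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true' \
  -n yourToken__project-xxxx \
  --outputformat=csv2 \
  -e 'SELECT * FROM metabric_data.cna LIMIT 10;' > cna_sample.csv
```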