Connect to Thrift
Learn about the DNAnexus Thrift server, a service that allows JDBC and ODBC clients to run Spark SQL queries.
The DNAnexus Thrift server connects to a high availability Apache Spark cluster integrated with the platform. It leverages the same security, permissions, and sharing features built into DNAnexus.
A license is required to use Spark on the DNAnexus Platform. Contact DNAnexus Sales for more information.
In order to connect to the Thrift server, we need the following:
- 1.The JDBC url:AWS US (East): jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=trueAWS London (General): jdbc:hive2://query.eu-west-2-g.apollo.dnanexus.com:10000/;ssl=trueAWS London (UKB): jdbc:hive2://query.eu-west-2.apollo.dnanexus.com:10000/;ssl=trueAzure US (West): jdbc:hive2://query.westus.apollo.dnanexus.com:10001/;ssl=true;transportMode=http;httpPath=cliserviceAWS Frankfurt (General): jdbc:hive2://query.eu-central-1.apollo.dnanexus.com:10000/;ssl=trueNote: Azure UK South (OFH) region does not support access to Thrift.
- 2.We support the following format of the username:
- TOKEN__PROJECTID : TOKEN is DNAnexus user generated token and PROJECTID is a DNAnexus project ID used as the project context (when you create databases). Note the double underscore between the token and the project ID.
- Additionally, both the Thrift server that the user wants to connect to and the project must be from the same region.
- 1.
- 2.Go to Projects -> your project -> Settings -> Project ID and click on Copy to Clipboard.

Beeline is a JDBC client bundled with Apache Spark that can be used to run interactive queries on the command line.
$ tar -zxvf spark-3.2.3-bin-hadoop3.2.tgz
You need to have Java installed in your system PATH, or the
JAVA_HOME
environment variable pointing to a Java installation.If you already have beeline installed and all of the credentials, you can quickly connect with the following command:
<beeline> -u <thrift path> -n <token>__<project-id>
In the following AWS example, note that some characters are escaped (
;
with \
)$SPARK_HOME/bin/beeline -u jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/\;ssl=true -n yourToken__project-xxxx
Note that the command for connecting to Thrift is different for Azure, as seen below:
$SPARK_HOME/bin/beeline -u jdbc:hive2://query.westus.apollo.dnanexus.com:10001/\;ssl=true\;transportMode=http\;httpPath=cliservice -n yourToken__project-xxxx
The beeline client is located under
$SPARK_HOME/bin/
.$ cd spark-3.2.3-bin-hadoop3.2/bin
$ ./beeline
Connect to beeline using the JDBC URL:
beeline> !connect jdbc:hive2://query.us-east-1.apollo.dnanexus.com:10000/;ssl=true
Enter username: <TOKEN__PROJECTID>
Enter password: <empty - press RETURN>
Once successfully connected, you should see the message:
Connected to: Spark SQL (version 3.2.3)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
You are now connected to the Thrift server using your credentials and will be able to see all databases to which you have access to within your current region.
0: jdbc:hive2://query.us-east-1.apollo.dnanex> show databases;
+---------------------------------------------------------+--+
| databaseName |
+---------------------------------------------------------+--+
| database_fj7q18009xxzzzx0gjfk6vfz__genomics_180718_01 |
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01 |
| database_fj96qx00v10vj50j0gyfv00z__af_result2 |
| database_fjf3y28066y5jxj2b0gz4g85__metabric_data |
| database_fjj1jkj0v10p8pvx78vkkpz3__pchr1_test |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1 |
+---------------------------------------------------------+--+
You can query using the unique database name including the downcased database ID, for example
database_fjf3y28066y5jxj2b0gz4g85__metabric_data.
If the database is within the same username and project you used to connect to the Thrift server, you can query using only the database name, for example metabric_data
. If the database is located outside the project, you will need to use the unique database name.0: jdbc:hive2://query.us-east-1.apollo.dnanex> use metabric_data;
You may also find databases stored in other projects if you specify the project context in the
LIKE
option of SHOW DATABASES
using the format '<project-id>:<database pattern>'
like so:0: jdbc:hive2://query.us-east-1.apollo.dnanex> SHOW DATABASES LIKE 'project-xxx:af*';
+---------------------------------------------------------+--+
| databaseName |
+---------------------------------------------------------+--+
| database_fj8gygj0v10vj50j0gyfqk1x__af_result_180719_01 |
| database_fj96qx00v10vj50j0gyfv00z__af_result2 |
| database_fjpz6fj0v10fjy3fjy282ybz__af_result1 |
+---------------------------------------------------------+--+
Now you can run SQL queries.
0: jdbc:hive2://query.us-east-1.apollo.dnanex> select * from cna limit 10;
+--------------+-----------------+------------+--------+--+
| hugo_symbol | entrez_gene_id | sample_id | value |
+--------------+-----------------+------------+--------+--+
| MIR3675 | NULL | MB-6179 | -1 |
| MIR3675 | NULL | MB-6181 | 0 |
| MIR3675 | NULL | MB-6182 | 0 |
| MIR3675 | NULL | MB-6183 | 0 |
| MIR3675 | NULL | MB-6184 | 0 |
| MIR3675 | NULL | MB-6185 | -1 |
| MIR3675 | NULL | MB-6187 | 0 |
| MIR3675 | NULL | MB-6188 | 0 |
| MIR3675 | NULL | MB-6189 | 0 |
| MIR3675 | NULL | MB-6190 | 0 |
+--------------+-----------------+------------+--------+--+
Last modified 1mo ago