A license is required to access Spark functionality on the DNAnexus Platform. Contact DNAnexus Sales for more information.
Spark apps are an extension of the existing app(let) framework. An app(let) already carries a specification for its VM (instance type, OS, packages); this specification has been extended to allow an additional, optional cluster specification with type=dxspark.
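As a rough illustration, the Python sketch below builds the kind of JSON fragment an app(let)'s dxapp.json might carry when it requests a Spark cluster. Only "type": "dxspark" comes from the text above. The surrounding field names (runSpec, systemRequirements, instanceType, clusterSpec, version, initialInstanceCount), the helper name spark_system_requirements, and all values are assumptions for illustration. Check them against the DNAnexus dxapp.json documentation before use.

```python
import json

# Hypothetical helper: builds a runSpec.systemRequirements fragment for a
# dxapp.json. Only "type": "dxspark" is taken from the documentation text;
# every other key and value here is an assumption to verify.
def spark_system_requirements(instance_type: str, node_count: int,
                              spark_version: str) -> dict:
    if node_count < 1:
        raise ValueError("a Spark cluster needs at least one node")
    return {
        "*": {  # assumed: "*" applies the requirements to every entry point
            "instanceType": instance_type,   # the usual per-VM specification
            "clusterSpec": {                 # the optional cluster specification
                "type": "dxspark",
                "version": spark_version,            # assumed field name
                "initialInstanceCount": node_count,  # assumed field name
            },
        }
    }


if __name__ == "__main__":
    # Placeholder instance type and Spark version, not recommendations.
    reqs = spark_system_requirements("mem1_ssd1_v2_x4", 3, "3.2.3")
    print(json.dumps({"runSpec": {"systemRequirements": reqs}}, indent=2))
```

Keeping the cluster specification next to instanceType reflects the point above: the cluster request is an optional addition to the existing VM specification, not a separate mechanism.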
Calling /app(let)-xxx/run for a Spark app creates a Spark cluster along with a master VM.
The master VM, where the app's shell code runs, acts as the Spark driver node.
Code running on the master VM uses the Spark cluster to execute its work.
Job mechanisms (monitoring, termination, etc.) work the same way for Spark apps as for any other app(let) on the Platform.
Spark apps use the same Platform "dx" communication between the master VM and the DNAnexus API servers as other app(let)s do.
A new log collection mechanism gathers logs from all nodes in the cluster.
You can monitor a running job in the Spark UI through SSH tunneling.
Spark apps can be launched over a distributed Spark cluster.
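The SSH tunneling mentioned above follows the standard OpenSSH local port-forward pattern. The sketch below builds such a command. Spark's web UI listens on port 4040 on the driver by default. The helper name spark_ui_tunnel_command and the host and user values are hypothetical placeholders for the master VM's SSH endpoint. The Platform's own SSH tooling (for example dx ssh) may handle host and key lookup for you, so check its documentation for the supported way to reach a job's VM.

```python
import shlex
from typing import List

# Spark's web UI listens on this port on the driver by default.
SPARK_UI_PORT = 4040


def spark_ui_tunnel_command(host: str, user: str,
                            local_port: int = SPARK_UI_PORT,
                            remote_port: int = SPARK_UI_PORT) -> List[str]:
    """Build an OpenSSH local port-forward to the driver's Spark UI.

    host and user are hypothetical placeholders for the master VM's
    SSH endpoint; they are not values defined by the Platform.
    """
    # -L forwards local_port on this machine to remote_port on the VM;
    # -N keeps the connection open without running a remote command.
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


if __name__ == "__main__":
    cmd = spark_ui_tunnel_command("master-vm.example", "example-user")
    print(shlex.join(cmd))
```

Run as shown, the script prints `ssh -N -L 4040:localhost:4040 example-user@master-vm.example`. While that tunnel is open, the Spark UI would be reachable locally at http://localhost:4040.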