sqlfile
: [Required] A SQL file which contains an ordered list of SQL queries.

substitutions
: A JSON file which contains the variable substitutions.

user_config
: User configuration JSON file, in case you want to set or override certain Spark configurations.

export
: (boolean) default false. Will export output files with the results of the queries in the sqlfile.

export_options
: A JSON file which contains the export configurations.

collect_logs
: (boolean) default false. Collects cluster logs from all nodes.

executor_memory
: (string) Amount of memory to use per executor process, in MiB unless otherwise specified (e.g. 2g, 8g). This is passed as `--executor-memory` to spark-submit.

executor_cores
: (integer) Number of cores to use per executor process. This is passed as `--executor-cores` to spark-submit.

driver_memory
: (string) Amount of memory to use for the driver process (e.g. 2g, 8g). This is passed as `--driver-memory` to spark-submit.

log_level
: (string) default INFO. Logging level for both driver and executors. One of [ALL, TRACE, DEBUG, INFO].

output_files
: Output files include the report SQL file and the query export files.

sqlfile
The script picks up the queries from `sqlfile` and runs them in sequential order. Each query must be terminated with a `;`. Any line starting with `--` is ignored (comments). Any comment within a command should be inside `/*...*/`. The following are examples of valid comments:
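```sql
-- This entire line is a comment and will be ignored.
select count(*) from ${srcdb}.${patient_table};

select * /* an inline comment inside a command */ from ${srcdb}.${patient_table};
```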
substitutions

Each key in the substitutions JSON file names a variable, and each value is the text that replaces it. Given a file that maps `srcdb` to `sskrdemo1`, every occurrence of `srcdb` in `sqlfile` written within `${...}` will be substituted with `sskrdemo1`. For example: `select * from ${srcdb}.${patient_table};`.
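A minimal sketch of such a substitutions file; the `srcdb` mapping comes from the example above, while `patient_table` and its value are assumed for illustration:

```json
{
  "srcdb": "sskrdemo1",
  "patient_table": "patients"
}
```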
The script adds the `set` command before executing any of the SQL commands in `sqlfile`. So `select * from ${srcdb}.${patient_table};` would translate to:
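```sql
-- set commands prepended by the script (exact form assumed here);
-- the patient_table value is the illustrative one from the sample file above
set srcdb=sskrdemo1;
set patient_table=patients;
select * from ${srcdb}.${patient_table};
```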
export_options

The export_options JSON file defines an export configuration.

num_files
: default 1. The maximum number of output files to generate. This generally depends on how many executors you are running in the cluster, as well as how many partitions of the underlying data exist in the system. Each output file corresponds to a part file in Parquet.

fileprefix
: The filename prefix for every SQL output file. By default, the output files are prefixed with the query_id, which is the order in which the queries are listed in `sqlfile` (starting with 1); for example, `1-out.csv`. If a prefix is specified, the output files are named like `<prefix>-1-out.csv`.

header
: Default is true. If true, a header row is added to each exported file.
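A minimal sketch of an export_options file using the keys above; the flat JSON shape and the specific values are assumptions:

```json
{
  "num_files": 4,
  "fileprefix": "demo",
  "header": true
}
```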
The job produces the following output files:

<JobId>-export.tar
: Contains all the query results. The output files inside are named with the `fileprefix` used. We have one folder for each query, and each folder has a `.sql` file containing the query executed and a `.csv` folder containing the result CSV.

<JobId>-outfile.sql
: SQL debug file. It records each query that was actually executed as a runnable `.sql` file. You can review this `.sql` file and even use this report file as an input for a subsequent run -- picking up where it left off.
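Based on the folder layout described above, an extracted export tar might look like the sketch below; every name here (the query folders, the `demo` prefix, the part file) is illustrative rather than exact:

```
export/
├── 1/
│   ├── demo-1-out.sql        # the query that was executed
│   └── demo-1-out.csv/       # folder holding the result part file(s)
│       └── part-00000.csv
└── 2/
    ├── demo-2-out.sql
    └── demo-2-out.csv/
        └── part-00000.csv
```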