2.6. Applets, Apps, Workflows, and Jobs

An executable (applet or app) defines application logic that is to be run in the DNAnexus Platform’s Execution Environment. In order to facilitate parallel processing, an executable may define multiple functions (or entry points) that can invoke each other; a running job can use new_dxjob() to invoke any function defined in the same executable, creating a new job that runs that function on a different machine (possibly even launching multiple such jobs in parallel).

To create an executable from scratch, we encourage you to use the command-line tools dx-app-wizard and dx build rather than using the API or bindings directly. The following handlers for applets, apps, and jobs are most useful for running preexisting executables and monitoring their resulting jobs.

Workflows created from the website UI can also be run using the DXWorkflow workflow handler.

2.6.1. DXApplet Handler

Applets are data objects that store application logic, including specifications for executing it, and (optionally) input and output signatures. They can be run by calling the DXApplet.run() method.

class dxpy.bindings.dxapplet.DXExecutable(*args, **kwargs)[source]

Methods in DXExecutable are used by DXApp, DXApplet, and DXWorkflow.

run(executable_input, project=None, folder=None, name=None, tags=None, properties=None, details=None, instance_type=None, stage_instance_types=None, stage_folders=None, rerun_stages=None, depends_on=None, allow_ssh=None, debug=None, delay_workspace_destruction=None, priority=None, extra_args=None, **kwargs)[source]
Parameters:
  • executable_input (dict) – Hash of the executable’s input arguments
  • project (string) – Project ID of the project context
  • folder (string) – Folder in which executable’s outputs will be placed in project
  • name (string) – Name for the new job (default is “<name of the executable>”)
  • tags (list of strings) – Tags to associate with the job
  • properties (dict with string values) – Properties to associate with the job
  • details (dict or list) – Details to set for the job
  • instance_type (string or dict) – Instance type on which the jobs will be run, or a dict mapping function names to instance type requests
  • depends_on (list) – List of data objects or jobs that must enter the “closed” or “done” state, respectively, before the new job will be run; each element in the list can be either a dxpy handler or a string ID
  • allow_ssh (list) – List of hostname or IP masks to allow SSH connections from
  • debug (dict) – Configuration options for job debugging
  • delay_workspace_destruction (boolean) – Whether to keep the job’s temporary workspace around for debugging purposes for 3 days after it succeeds or fails
  • priority (string) – Priority level to request for all jobs created in the execution tree, either “normal” or “high”
  • extra_args (dict) – If provided, a hash of options that will be merged into the underlying JSON given for the API call
Returns:

Object handler of the newly created job

Return type:

DXJob

Creates a new job that executes the function “main” of this executable with the given input executable_input.
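As a rough illustration of how the parameters above fit together, the sketch below collects a few run() keyword arguments and passes them to an applet handler. The helper names build_run_kwargs and launch are hypothetical, and any IDs or instance type names passed in are placeholders; only DXApplet and run() come from this section.

```python
def build_run_kwargs(project, folder="/", main_instance=None, wait_for=None):
    """Collect keyword arguments for run() (hypothetical helper, not part of dxpy)."""
    kwargs = {"project": project, "folder": folder}
    if main_instance is not None:
        # A dict maps function (entry point) names to instance type requests.
        kwargs["instance_type"] = {"main": main_instance}
    if wait_for:
        # Each element may be a dxpy handler or a string ID.
        kwargs["depends_on"] = list(wait_for)
    return kwargs


def launch(applet_id, executable_input, **run_kwargs):
    """Run an applet and return its DXJob handler.

    Assumes dxpy is installed and the caller is logged in to the platform.
    """
    import dxpy  # imported lazily so build_run_kwargs works without dxpy

    applet = dxpy.DXApplet(applet_id)
    return applet.run(executable_input, **run_kwargs)
```

A call might then look like launch("applet-xxxx", {"reads": ...}, **build_run_kwargs("project-xxxx", folder="/results")), with real IDs and inputs substituted for the placeholders.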

class dxpy.bindings.dxapplet.DXApplet(dxid=None, project=None)[source]

Bases: dxpy.bindings.DXDataObject, dxpy.bindings.dxapplet.DXExecutable

Remote applet object handler.

runSpec

The applet’s run specification (a dict indicating, among other things, how the code of the applet is to be interpreted). See the API docs for Run Specification for more information.

dxapi

String containing the version of the DNAnexus API that the applet should run against.

access

The applet’s access requirements hash (a dict indicating any nonstandard permissions, such as requiring access to the internet, that are needed by the applet). See the API docs for Access Requirements for more information.

title

String containing the (human-readable) title of the applet

summary

String containing a short, one-line summary of the applet’s purpose

description

String of free-form text (Markdown syntax is supported) containing a description of the applet. The description is presented to users to help them understand the purpose of the applet and how to invoke it.

developerNotes

String of free-form text (Markdown syntax is supported) containing information about the internals or implementation details of the applet, suitable for developers or advanced users.

_new(dx_hash, **kwargs)[source]
Parameters:
  • dx_hash (dict) – Standard hash populated in dxpy.bindings.DXDataObject.new() containing attributes common to all data object classes.
  • runSpec (dict) – Run specification
  • dxapi (string) – API version string
  • inputSpec (dict) – Input specification (optional)
  • outputSpec (dict) – Output specification (optional)
  • access (dict) – Access specification (optional)
  • title (string) – Title string (optional)
  • summary (string) – Summary string (optional)
  • description (string) – Description string (optional)

Note

It is highly recommended that the higher-level module dxpy.app_builder or (preferably) its frontend dx build be used instead for applet creation.

Creates an applet with the given parameters. See the API documentation for the /applet/new method for more info. The applet is not run until run() is called.

get(**kwargs)[source]
Returns: Full specification of the remote applet object
Return type: dict

Returns the contents of the applet. The result includes the key-value pairs as specified in the API documentation for the /applet-xxxx/get method.

run(applet_input, *args, **kwargs)[source]

Creates a new job that executes the function “main” of this applet with the given input applet_input.

See dxpy.bindings.dxapplet.DXExecutable.run() for the available args.

2.6.2. DXApp Handler

Apps allow for application logic to be distributed to users in the system, and they allow for analyses to be run in a reproducible and composable way.

Apps extend the functionality of applets to require input/output specifications as well as to allow for versioning, collaborative development, and policies for billing and data access. Similarly to applets, apps can be run by calling their run() method.

Unlike applets, apps are not data objects and do not live in projects. Instead, they share a single global namespace. An app may have multiple different versions (e.g. “1.0.0”, “1.0.1”, etc.) associated with a single name (which is of the form “app-APPNAME”). A particular version of an app may be identified in two ways, either by specifying a combination of its name and a version (or a tag), or by specifying its unique identifier.

Each app has a list of developers, which are the users authorized to publish new versions of the app; perform administrative tasks, such as assigning categories and attaching new tags to versions of the app; and add or remove other developers. When the first version of an app with a given name is created, the creating user initially becomes the sole developer of the app.

class dxpy.bindings.dxapp.DXApp(dxid=None, name=None, alias=None)[source]

Bases: dxpy.bindings.DXObject, dxpy.bindings.dxapplet.DXExecutable

Remote app object handler.

set_id(dxid=None, name=None, alias=None)[source]
Parameters:
  • dxid (string) – App ID
  • name (string) – App name
  • alias (string) – App version or tag
Raises:

DXError if dxid and some other input are both given or if neither dxid nor name are given

Discards the currently stored ID and associates the handler with the requested parameters. Note that if dxid is given, the other fields should not be given, and if name is given, alias has default value “default”.
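The identification rules above can be sketched as a small validation helper. This is only an illustration of the documented behavior (the real checks happen inside dxpy, which raises DXError rather than ValueError); app_handler_args and make_app_handler are hypothetical names.

```python
def app_handler_args(dxid=None, name=None, alias=None):
    """Normalize the arguments that identify an app (hypothetical helper).

    Mirrors the rules stated for set_id(): give dxid alone, or give name
    with an optional alias that defaults to "default".
    """
    if dxid is not None:
        if name is not None or alias is not None:
            raise ValueError("give either dxid alone, or name (plus optional alias)")
        return {"dxid": dxid}
    if name is None:
        raise ValueError("one of dxid or name is required")
    return {"name": name, "alias": alias if alias is not None else "default"}


def make_app_handler(**ident):
    """Create a DXApp handler; assumes dxpy is installed and configured."""
    import dxpy  # imported lazily so the helper above works without dxpy

    return dxpy.DXApp(**app_handler_args(**ident))
```

For example, make_app_handler(name="some_app", alias="1.0.0") would address one specific version, while omitting alias selects whatever version carries the "default" tag.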

get_id()[source]
Returns: Object ID of associated app
Return type: string

Returns the object ID of the app that the handler is currently associated with.

new(**kwargs)[source]
Parameters:
  • initializeFrom (string) – ID of an existing app object from which to initialize the app
  • applet (string) – ID of the applet that the app will be created from
  • name (string) – Name of the app (inherits from initializeFrom if possible)
  • title (string) – Title or brand name of the app (optional)
  • summary (string) – A short description of the app (optional)
  • description (string) – An extended description of the app (optional)
  • details (dict or list) – Arbitrary JSON to be associated with the app (optional)
  • version (string) – Version number
  • bill_to (string) – ID of the user or organization who will own the app and be billed for its space usage (optional if an app with this name already exists)
  • access (dict) – Access specification (optional)
  • resources (string or list) – Specifies what is to be put into the app’s resources container. Must be a string containing a project ID, or a list containing object IDs. (optional)

Note

It is highly recommended that the higher-level module dxpy.app_builder or (preferably) its frontend dx build --create-app be used instead for app creation.

Creates an app with the given parameters by using the specified applet or app as a base and overriding its attributes. See the API documentation for the /app/new method for more info.

Exactly one of initializeFrom and applet must be provided.

The app is only available to its developers until publish() is called, and is not run until run() is called.

describe(fields=None, **kwargs)[source]
Parameters: fields (dict) – Hash where the keys are field names that should be returned, and values should be set to True (default is that all fields are returned)
Returns: Description of the remote app object
Return type: dict

Returns a dict with a description of the app. The result includes the key-value pairs as specified in the API documentation for the /app-xxxx/describe method.

update(**kwargs)[source]
Parameters:
  • applet (string) – ID of the applet to replace the app’s contents with
  • details (dict or list) – Metadata to store with the app (optional)
  • access (dict) – Access specification (optional)
  • resources (string or list) – Specifies what is to be put into the app’s resources container. Must be a string containing a project ID, or a list containing object IDs. (optional)

Updates the parameters of an existing app. See the API documentation for the /app/update method for more info.

The current user must be a developer of the app.

add_tags(tags, **kwargs)[source]
Parameters: tags (array) – Tags to add to the app

Adds the specified application name tags (aliases) to this app.

The current user must be a developer of the app.

addTags(tags, **kwargs)[source]

Deprecated since version 0.72.0: Use add_tags() instead.

remove_tags(tags, **kwargs)[source]
Parameters: tags (array) – Tags to remove from the app

Removes the specified application name tags (aliases) from this app, so that it is no longer addressable by those aliases.

The current user must be a developer of the app.

removeTags(tags, **kwargs)[source]

Deprecated since version 0.72.0: Use remove_tags() instead.

install(**kwargs)[source]

Installs the app in the current user’s account.

uninstall(**kwargs)[source]

Uninstalls the app from the current user’s account.

get(**kwargs)[source]
Returns: Full specification of the remote app object
Return type: dict

Returns the contents of the app. The result includes the key-value pairs as specified in the API documentation for the /app-xxxx/get method.

publish(**kwargs)[source]

Publishes the app, so all users can find it on the platform.

The current user must be a developer of the app.

delete(**kwargs)[source]

Removes this app object from the platform.

The current user must be a developer of the app.

run(app_input, *args, **kwargs)[source]

Creates a new job that executes the function “main” of this app with the given input app_input.

See dxpy.bindings.dxapplet.DXExecutable.run() for the available args.

2.6.3. DXWorkflow Handler

Workflows are data objects which contain metadata for a set of jobs to be run together. They can be run by calling the DXWorkflow.run() method.

dxpy.bindings.dxworkflow.new_dxworkflow(title=None, summary=None, description=None, output_folder=None, init_from=None, **kwargs)[source]
Parameters:
  • title (string) – Workflow title (optional)
  • summary (string) – Workflow summary (optional)
  • description (string) – Workflow description (optional)
  • output_folder (string) – Default output folder of the workflow (optional)
  • init_from (DXWorkflow, DXAnalysis, or string (for analysis IDs only)) – A workflow object handler, or an analysis (string ID or handler), from which to initialize the metadata (optional)
Return type:

DXWorkflow

Additional optional parameters not listed: all those under dxpy.bindings.DXDataObject.new(), except details.

Creates a new remote workflow object with project set to project and returns the appropriate handler.

Example:

r = dxpy.new_dxworkflow(title="My Workflow", description="This workflow contains...")

Note that this function is shorthand for:

dxworkflow = DXWorkflow()
dxworkflow.new(**kwargs)

class dxpy.bindings.dxworkflow.DXWorkflow(dxid=None, project=None)[source]

Bases: dxpy.bindings.DXDataObject, dxpy.bindings.dxapplet.DXExecutable

Remote workflow object handler. This class is used for the workflow class data objects which produce an analysis when run.

add_stage(executable, stage_id=None, name=None, folder=None, stage_input=None, instance_type=None, edit_version=None, **kwargs)[source]
Parameters:
  • executable (string, DXApplet, or DXApp) – string or a handler for an app or applet
  • stage_id (string) – id for the stage (optional)
  • name (string) – name for the stage (optional)
  • folder (string) – default output folder for the stage; either a relative or absolute path (optional)
  • stage_input (dict) – input fields to bind as default inputs for the executable (optional)
  • instance_type (string or dict) – Default instance type on which all jobs will be run for this stage, or a dict mapping function names to instance type requests
  • edit_version (int) – if provided, the edit version of the workflow that should be modified; if not provided, the current edit version will be used (optional)
Returns:

ID of the added stage

Return type:

string

Raises:

DXError if executable is not an expected type; DXAPIError for errors thrown from the API call

Adds the specified executable as a new stage in the workflow.
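A minimal sketch of building a two-stage workflow with add_stage() follows. The function names, stage names, and the "mappings" field are hypothetical. The stage-output link format is taken from the platform's API documentation rather than from this section, so treat it as an assumption to verify.

```python
def stage_output_link(stage_id, output_field):
    """Build a reference to another stage's output.

    The dict layout is an assumption based on the platform API docs.
    """
    return {"$dnanexus_link": {"stage": stage_id, "outputField": output_field}}


def build_two_stage_workflow(project, mapper_id, caller_id):
    """Create a workflow whose second stage consumes the first stage's output.

    Assumes dxpy is installed; mapper_id and caller_id are app or applet IDs.
    """
    import dxpy  # imported lazily so stage_output_link works without dxpy

    workflow = dxpy.new_dxworkflow(title="Map and call", project=project)
    # add_stage() returns the ID of the stage it created.
    map_stage_id = workflow.add_stage(mapper_id, name="map")
    workflow.add_stage(
        caller_id,
        name="call",
        stage_input={"mappings": stage_output_link(map_stage_id, "mappings")},
    )
    return workflow
```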

get_stage(stage, **kwargs)[source]
Parameters: stage (int or string) – A number for the stage index (for the nth stage, starting from 0), or a string of the stage index, name, or ID
Returns: Hash of stage descriptor in workflow

remove_stage(stage, edit_version=None, **kwargs)[source]
Parameters:
  • stage (int or string) – A number for the stage index (for the nth stage, starting from 0), or a string of the stage index, name, or ID
  • edit_version (int) – if provided, the edit version of the workflow that should be modified; if not provided, the current edit version will be used (optional)
Returns:

Stage ID that was removed

Return type:

string

Removes the specified stage from the workflow

move_stage(stage, new_index, edit_version=None, **kwargs)[source]
Parameters:
  • stage (int or string) – A number for the stage index (for the nth stage, starting from 0), or a string of the stage index, name, or ID
  • new_index (int) – The new position in the order of stages that the specified stage should have (where 0 indicates the first stage)
  • edit_version (int) – if provided, the edit version of the workflow that should be modified; if not provided, the current edit version will be used (optional)

Moves the specified stage to the position given by new_index in the workflow's order of stages.

update(title=None, unset_title=False, summary=None, description=None, output_folder=None, unset_output_folder=False, stages=None, edit_version=None, **kwargs)[source]
Parameters:
  • title (string) – workflow title to set; cannot be provided with unset_title set to True
  • unset_title (boolean) – whether to unset the title; cannot be provided with string value for title
  • summary (string) – workflow summary to set
  • description (string) – workflow description to set
  • output_folder (string) – new default output folder for the workflow
  • unset_output_folder (boolean) – whether to unset the default output folder; cannot be True with string value for output_folder
  • stages (dict) – updates to the stages to make; see API documentation for /workflow-xxxx/update for syntax of this field; use update_stage() to update a single stage
  • edit_version (int) – if provided, the edit version of the workflow that should be modified; if not provided, the current edit version will be used (optional)

Make general metadata updates to the workflow

update_stage(stage, executable=None, force=False, name=None, unset_name=False, folder=None, unset_folder=False, stage_input=None, instance_type=None, edit_version=None, **kwargs)[source]
Parameters:
  • stage (int or string) – A number for the stage index (for the nth stage, starting from 0), or a string stage index, name, or ID
  • executable (string, DXApplet, or DXApp) – string or a handler for an app or applet
  • force (boolean) – whether to use executable even if it is incompatible with the previous executable’s spec
  • name (string) – new name for the stage; cannot be provided with unset_name set to True
  • unset_name (boolean) – whether to unset the stage name; cannot be True with string value for name
  • folder (string) – new default output folder for the stage; either a relative or absolute path (optional)
  • unset_folder (boolean) – whether to unset the stage folder; cannot be True with string value for folder
  • stage_input (dict) – input fields to bind as default inputs for the executable (optional)
  • instance_type (string or dict) – Default instance type on which all jobs will be run for this stage, or a dict mapping function names to instance type requests
  • edit_version (int) – if provided, the edit version of the workflow that should be modified; if not provided, the current edit version will be used (optional)

Updates the specified stage of the workflow with the given executable, name, folder, default inputs, and instance type settings.

run(workflow_input, *args, **kwargs)[source]
Parameters:
  • workflow_input (dict) – Dictionary of the workflow’s input arguments; see below for more details
  • instance_type (string or dict) – Instance type on which all stages’ jobs will be run, or a dict mapping function names to instance types. These may be overridden on a per-stage basis if stage_instance_types is specified.
  • stage_instance_types (dict) – A dict mapping stage IDs, names, or indices to either a string (representing an instance type to be used for all functions in that stage), or a dict mapping function names to instance types.
  • stage_folders (dict) – A dict mapping stage IDs, names, indices, and/or the string “*” to folder values to be used for the stages’ output folders (use “*” as the default for all unnamed stages)
  • rerun_stages (list of strings) – A list of stage IDs, names, indices, and/or the string “*” to indicate which stages should be run even if there are cached executions available
Returns:

Object handler of the newly created analysis

Return type:

DXAnalysis

Run the associated workflow. See dxpy.bindings.dxapplet.DXExecutable.run() for additional args.

When providing input for the workflow, keys should be of one of the following forms:

  • “N.name” where N is the stage number, and name is the name of the input, e.g. “0.reads” if the first stage takes in an input called “reads”
  • “stagename.name” where stagename is the stage name, and name is the name of the input within the stage
  • “stageID.name” where stageID is the stage ID, and name is the name of the input within the stage
  • “name” where name is the name of an input that has been exported for the workflow (this name will appear as a key in the “inputSpec” of this workflow’s description if it has been exported for this purpose)
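The key forms listed above can be assembled with a small helper. This is a sketch only: stage_input_key, build_workflow_input, and run_workflow are hypothetical names, and the stage names and field names in any call are placeholders.

```python
def stage_input_key(stage, field):
    """Return "<stage>.<field>", where stage is an index, a stage name, or a stage ID."""
    return "{}.{}".format(stage, field)


def build_workflow_input(per_stage, exported=None):
    """Flatten {stage: {field: value}} plus exported workflow-level inputs into one dict."""
    flat = dict(exported or {})
    for stage, fields in per_stage.items():
        for field, value in fields.items():
            flat[stage_input_key(stage, field)] = value
    return flat


def run_workflow(workflow_id, project, workflow_input, **run_kwargs):
    """Run a workflow and return the DXAnalysis handler; assumes dxpy is installed."""
    import dxpy  # imported lazily so the helpers above work without dxpy

    workflow = dxpy.DXWorkflow(workflow_id, project=project)
    return workflow.run(workflow_input, project=project, **run_kwargs)
```

For instance, build_workflow_input({0: {"reads": link}, "call": {"min_quality": 20}}) yields the keys "0.reads" and "call.min_quality", which can be passed directly as workflow_input.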

2.6.4. DXJob Handler

Jobs are DNAnexus entities that capture an instantiation of a running app or applet. They can be created from either dxpy.bindings.dxapplet.DXApplet.run() or dxpy.bindings.dxapp.DXApp.run() if running an applet or app, or via new_dxjob() or DXJob.new() in the case of an existing job creating a subjob.

dxpy.bindings.dxjob.new_dxjob(fn_input, fn_name, name=None, tags=None, properties=None, details=None, instance_type=None, depends_on=None, **kwargs)[source]
Parameters:
  • fn_input (dict) – Function input
  • fn_name (string) – Name of the function to be called
  • name (string) – Name for the new job (default is “<parent job name>:<fn_name>”)
  • tags (list of strings) – Tags to associate with the job
  • properties (dict with string values) – Properties to associate with the job
  • details (dict or list) – Details to set for the job
  • instance_type (string or dict) – Instance type on which the job will be run, or a dict mapping function names to instance type requests
  • depends_on (list) – List of data objects or jobs that must enter the “closed” or “done” state, respectively, before the new job will be run; each element in the list can be either a dxpy handler or a string ID
Return type:

DXJob

Creates and enqueues a new job that will execute a particular function (from the same app or applet as the one the current job is running). Returns the DXJob handle for the job.

Note that this function is shorthand for:

dxjob = DXJob()
dxjob.new(fn_input, fn_name, **kwargs)

Note

This method is intended for calls made from within already-executing jobs or apps. If it is called from outside of an Execution Environment, an exception will be thrown. To create new jobs from outside the Execution Environment, use dxpy.bindings.dxapplet.DXApplet.run() or dxpy.bindings.dxapp.DXApp.run().

Note

If the environment variable DX_JOB_ID is not set, this method assumes that it is running within the debug harness, executes the job in place, and provides a debug job handler object that does not have a corresponding remote API job object.
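One common use of new_dxjob() is a scatter/gather pattern, sketched below under several assumptions: the executable defines entry points named "process" and "gather", each "process" job emits an output field called "result", and the "gather" job emits "combined". None of these names come from this section. The function scatter is meant to be called from inside a running job.

```python
def chunk(items, size):
    """Split items into consecutive lists of at most size elements (size must be positive)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scatter(items, chunk_size, process_fn="process", gather_fn="gather"):
    """Launch one subjob per chunk, then a gather job over their outputs.

    Assumes dxpy is installed and that this runs inside the Execution
    Environment; the entry point and field names are hypothetical.
    """
    import dxpy  # imported lazily so chunk() works without dxpy

    subjobs = [
        dxpy.new_dxjob({"items": part}, process_fn)
        for part in chunk(items, chunk_size)
    ]
    # Job-based object references let the gather job consume outputs
    # that do not exist yet.
    gather_job = dxpy.new_dxjob(
        {"results": [job.get_output_ref("result") for job in subjobs]},
        gather_fn,
    )
    return {"combined": gather_job.get_output_ref("combined")}
```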

class dxpy.bindings.dxjob.DXJob(dxid=None)[source]

Bases: dxpy.bindings.DXObject

Remote job object handler.

new(fn_input, fn_name, name=None, tags=None, properties=None, details=None, instance_type=None, depends_on=None, **kwargs)[source]
Parameters:
  • fn_input (dict) – Function input
  • fn_name (string) – Name of the function to be called
  • name (string) – Name for the new job (default is “<parent job name>:<fn_name>”)
  • tags (list of strings) – Tags to associate with the job
  • properties (dict with string values) – Properties to associate with the job
  • details (dict or list) – Details to set for the job
  • instance_type (string or dict) – Instance type on which the job will be run, or a dict mapping function names to instance type requests
  • depends_on (list) – List of data objects or jobs that must enter the “closed” or “done” state, respectively, before the new job will be run; each element in the list can be either a dxpy handler or a string ID

Creates and enqueues a new job that will execute a particular function (from the same app or applet as the one the current job is running).

Note

This method is intended for calls made from within already-executing jobs or apps. If it is called from outside of an Execution Environment, an exception will be thrown. To create new jobs from outside the Execution Environment, use dxpy.bindings.dxapplet.DXApplet.run() or dxpy.bindings.dxapp.DXApp.run().

set_id(dxid)[source]
Parameters: dxid (string) – New job ID to be associated with the handler (localjob IDs also accepted for local runs)

Discards the currently stored ID and associates the handler with dxid

describe(fields=None, io=None, **kwargs)[source]
Parameters:
  • fields (dict) – dict where the keys are field names that should be returned, and values should be set to True (by default, all fields are returned)
  • io (bool) – Include input and output fields in description; cannot be provided with fields; default is True if fields is not provided (deprecated)
Returns:

Description of the job

Return type:

dict

Returns a hash with key-value pairs containing information about the job, including its state and (optionally) its inputs and outputs, as described in the API documentation for the /job-xxxx/describe method.
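To request only some fields, the fields argument takes a dict whose values are True. The sketch below shows one way to build it; only_fields and job_state are hypothetical helpers, and "state" is used because the description above says the job's state is part of the result.

```python
def only_fields(*names):
    """Build the fields dict expected by describe(): each requested name maps to True."""
    return {name: True for name in names}


def job_state(job_id):
    """Return the current state string of a job; assumes dxpy is installed."""
    import dxpy  # imported lazily so only_fields works without dxpy

    description = dxpy.DXJob(job_id).describe(fields=only_fields("state"))
    return description["state"]
```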

add_tags(tags, **kwargs)[source]
Parameters: tags (list of strings) – Tags to add to the job

Adds each of the specified tags to the job. Takes no action for tags that are already listed for the job.

remove_tags(tags, **kwargs)[source]
Parameters: tags (list of strings) – Tags to remove from the job

Removes each of the specified tags from the job. Takes no action for tags that the job does not currently have.

set_properties(properties, **kwargs)[source]
Parameters: properties (dict) – Property names and values given as key-value pairs of strings

Given key-value pairs in properties for property names and values, the properties are set on the job for the given property names. Any property with a value of None indicates the property will be deleted.

Note

Any existing properties not mentioned in properties are not modified by this method.
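The update semantics described above (set the given keys, delete keys whose value is None, leave everything else alone) can be modeled locally. The function below is a hypothetical stand-in for illustration; it does not call the platform.

```python
def apply_property_update(existing, update):
    """Return the properties that would result from set_properties(update).

    Local model only: None deletes a key, other values overwrite or add,
    and keys absent from update are left unchanged.
    """
    result = dict(existing)
    for key, value in update.items():
        if value is None:
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            result[key] = value
    return result
```

On a real handler, the equivalent call would be something like job.set_properties({"stage": "qc", "owner": None}), where the property names are placeholders.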

wait_on_done(interval=2, timeout=604800, **kwargs)[source]
Parameters:
  • interval (integer) – Number of seconds between queries to the job’s state
  • timeout (integer) – Maximum amount of time to wait, in seconds, until the job is done running
Raises:

DXError if the timeout is reached before the job has finished running, or dxpy.exceptions.DXJobFailureError if the job fails

Waits until the job has finished running.
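A caller usually wants to distinguish a failed job from a timeout. The sketch below does that using the two exceptions named in the Raises entry; wait_for_job and its return strings are hypothetical, and the fallback classes exist only so the snippet can be exercised without dxpy installed.

```python
try:
    from dxpy.exceptions import DXError, DXJobFailureError
except ImportError:
    # Stand-ins so the sketch can be tried without dxpy installed.
    class DXError(Exception):
        pass

    class DXJobFailureError(DXError):
        pass


def wait_for_job(job, interval=2, timeout=3600):
    """Block until the job finishes; return "done", "failed", or "timeout".

    job is any object with a wait_on_done(interval, timeout) method,
    such as a DXJob handler.
    """
    try:
        job.wait_on_done(interval=interval, timeout=timeout)
    except DXJobFailureError:
        return "failed"
    except DXError:
        # Per the Raises entry this signals a timeout; note that other
        # DXError subclasses would also be caught here.
        return "timeout"
    return "done"
```

Catching DXJobFailureError first matters because it is more specific than DXError, so reversing the order would report every failure as a timeout.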

terminate(**kwargs)[source]

Terminates the associated job.

get_output_ref(field, index=None, metadata=None)[source]
Parameters:
  • field (string) – Output field name of this job
  • index (int) – If the referenced field is an array, optionally specify an index (starting from 0) to indicate a particular member of the array
  • metadata (string) – If the referenced field is of a data object class, a string indicating the metadata that should be read, e.g. “name”, “properties.propkey”, “details.refgenome”

Returns a dict containing a valid job-based object reference to refer to an output of this job. This can be used directly in place of a DNAnexus link when used as a job output value. For example, after creating a subjob, the following app snippet uses a reference to the new job’s output as part of its own output:

mysubjob = dxpy.new_dxjob({}, "my_function")
return { "myfileoutput": mysubjob.get_output_ref("output_field_name"),
         "myotherfileoutput": mysubjob.get_output_ref("output_array",
                                                      index=1),
         "filename": mysubjob.get_output_ref("output_field_name",
                                             metadata="name") }

2.6.5. DXAnalysis Handler

Analyses are DNAnexus entities that capture an instantiation of a running workflow. They can be created from dxpy.bindings.dxworkflow.DXWorkflow.run() or from an existing analysis ID.

class dxpy.bindings.dxanalysis.DXAnalysis(dxid=None)[source]

Bases: dxpy.bindings.DXObject

Remote analysis object handler.

describe(fields=None, **kwargs)[source]
Parameters: fields (dict) – dict where the keys are field names that should be returned, and values should be set to True (by default, all fields are returned)
Returns: Description of the analysis
Return type: dict

Returns a hash with key-value pairs containing information about the analysis.

add_tags(tags, **kwargs)[source]
Parameters: tags (list of strings) – Tags to add to the analysis

Adds each of the specified tags to the analysis. Takes no action for tags that are already listed for the analysis.

remove_tags(tags, **kwargs)[source]
Parameters: tags (list of strings) – Tags to remove from the analysis

Removes each of the specified tags from the analysis. Takes no action for tags that the analysis does not currently have.

set_properties(properties, **kwargs)[source]
Parameters: properties (dict) – Property names and values given as key-value pairs of strings

Given key-value pairs in properties for property names and values, the properties are set on the analysis for the given property names. Any property with a value of None indicates the property will be deleted.

Note

Any existing properties not mentioned in properties are not modified by this method.

wait_on_done(interval=2, timeout=604800, **kwargs)[source]
Parameters:
  • interval (integer) – Number of seconds between queries to the analysis’s state
  • timeout (integer) – Maximum amount of time to wait, in seconds, until the analysis is done (or at least partially failed)
Raises:

DXError if the timeout is reached before the analysis has finished running, or DXJobFailureError if some job in the analysis has failed

Waits until the analysis has finished running.

terminate(**kwargs)[source]

Terminates the associated analysis.

get_output_ref(field, index=None, metadata=None)[source]
Parameters:
  • field (string) – Output field name of this analysis
  • index (int) – If the referenced field is an array, optionally specify an index (starting from 0) to indicate a particular member of the array
  • metadata (string) – If the referenced field is of a data object class, a string indicating the metadata that should be read, e.g. “name”, “properties.propkey”, “details.refgenome”

Returns a dict containing a valid reference to an output of this analysis.
