2.7. Search

This module provides methods for finding existing objects and entities in the DNAnexus Platform. The find_data_objects() function provides search functionality over all data objects in the system. The find_jobs() function can be used to find jobs (whether they are running, failed, or done).

dxpy.bindings.search.resolve_data_objects(objects, project=None, folder=None, batchsize=1000)
Parameters:
  • objects (list of dictionaries) – Data object specifications, each with fields “name” (required), “folder”, and “project”

  • project (string) – ID of project context; a data object’s project defaults to this if not specified for that object

  • folder (string) – Folder path within the project; a data object’s folder path defaults to this if not specified for that object

  • batchsize (int) – Number of objects to resolve in each batch call to system_resolve_data_objects; defaults to 1000 and is only used for testing (must be a positive integer not exceeding 1000)

Returns:

List of results parallel to input objects, where each entry is a list containing 0 or more dicts, each corresponding to a resolved object

Return type:

List of lists of dictionaries

Each returned element is a list of dictionaries with keys “project” and “id”. The number of dictionaries for each element may be 0, 1, or more.
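Because the result list is parallel to the input list, a caller will usually pair the two and branch on how many matches came back for each specification. The following is a minimal sketch that assumes results in the shape described above. The helper name classify_resolution, the sample names, and the sample IDs are hypothetical and are not part of dxpy.

```python
def classify_resolution(objects, results):
    """Pair each input spec with its matches and label the outcome."""
    if len(objects) != len(results):
        raise ValueError("results must be parallel to objects")
    report = []
    for spec, matches in zip(objects, results):
        if not matches:
            status = "unresolved"   # 0 dicts: nothing matched
        elif len(matches) == 1:
            status = "unique"       # exactly 1 dict
        else:
            status = "ambiguous"    # 2 or more dicts
        report.append((spec["name"], status, [m["id"] for m in matches]))
    return report


# Hypothetical specs and a hand-written result list in the documented shape.
specs = [{"name": "reads.fastq"}, {"name": "missing.txt"}, {"name": "dup.bam"}]
fake_results = [
    [{"project": "project-xxxx", "id": "file-aaaa"}],
    [],
    [{"project": "project-xxxx", "id": "file-bbbb"},
     {"project": "project-xxxx", "id": "file-cccc"}],
]
for name, status, ids in classify_resolution(specs, fake_results):
    print(name, status, ids)
```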

dxpy.bindings.search.find_data_objects(classname=None, state=None, visibility=None, name=None, name_mode='exact', properties=None, typename=None, tag=None, tags=None, link=None, project=None, folder=None, recurse=None, modified_after=None, modified_before=None, created_after=None, created_before=None, describe=False, limit=None, level=None, region=None, archival_state=None, return_handler=False, first_page_size=100, **kwargs)
Parameters:
  • classname (string) – Class with which to restrict the search, i.e. one of “record”, “file”, “applet”, “workflow”, “database”

  • state (string) – State of the object (“open”, “closing”, “closed”, “any”)

  • visibility (string) – Visibility of the object (“hidden”, “visible”, “either”)

  • name (string) – Name of the object (also see name_mode)

  • name_mode (string) – Method by which to interpret the name field (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • properties (dict) – Properties (key-value pairs) that each result must have (use value True to require the property key and allow any value)

  • typename (string or dict) – Type constraint that each result must conform to

  • tag (string) – Tag that each result must be tagged with (deprecated in favor of tags)

  • tags (list of strings) – List of tags that each result must have ALL of

  • link (string) – ID of an object that each result must link to

  • project (string) – ID of a project in which each result must appear

  • folder (string) – If project is given, full path to a folder in which each result must belong (default is the root folder)

  • recurse (boolean) – If project is given, whether to look in subfolders of folder as well (default is True)

  • modified_after (int or string) – Timestamp after which each result was last modified (see note below for interpretation)

  • modified_before (int or string) – Timestamp before which each result was last modified (see note below for interpretation)

  • created_after (int or string) – Timestamp after which each result was created (see note below for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note below for interpretation)

  • describe (bool or dict) – Controls whether to also return the output of calling describe() on each object. Supply False to omit describe output, True to obtain the default describe output, or a dict to be supplied as the describe call input (which may, among other things, be used to customize the set of fields that is returned)

  • level (string) – The minimum permissions level for which results should be returned (one of “VIEW”, “UPLOAD”, “CONTRIBUTE”, or “ADMINISTER”)

  • region (string or list of strings) – Filter on result set by the given region(s).

  • archival_state (string) – Filter by the given archival state (one of “archived”, “live”, “archival”, “unarchiving”, or “any”). Requires the classname="file", project, and folder arguments to be provided.

  • limit (int) – The maximum number of results to be returned (if not specified, the number of results is unlimited)

  • first_page_size (int) – The number of results that the initial API call will return. Subsequent calls will raise this by multiplying by 2 up to a maximum of 1000.

  • return_handler (boolean) – If True, yields results as dxpy object handlers (otherwise, yields each result as a dict with keys “id” and “project”)

Return type:

generator

Returns a generator that yields all data objects matching the query, up to limit objects. It transparently handles paging through the result set if necessary. For all parameters that are omitted, the search is not restricted by the corresponding field.
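The paging behaviour described for first_page_size and limit can be pictured with a small stand-alone sketch. It is not dxpy's implementation: page_sizes is a hypothetical helper that only reproduces the documented arithmetic (double each time, cap at 1000), and trimming the final page to fit limit is an assumption made for illustration.

```python
from itertools import islice


def page_sizes(first_page_size=100, limit=None, max_page_size=1000):
    """Yield successive page sizes: double each time, capped at max_page_size.

    If limit is given, stop once that many results have been requested
    (assumption: the last page is trimmed to fit the limit).
    """
    if first_page_size < 1:
        raise ValueError("first_page_size must be positive")
    size = min(first_page_size, max_page_size)
    requested = 0
    while limit is None or requested < limit:
        page = size if limit is None else min(size, limit - requested)
        yield page
        requested += page
        size = min(size * 2, max_page_size)


# Without a limit the sequence is endless, so take the first six sizes.
print(list(islice(page_sizes(100), 6)))   # [100, 200, 400, 800, 1000, 1000]
print(list(page_sizes(100, limit=450)))   # [100, 200, 150]
```

A small first page keeps short searches cheap, while the doubling keeps the number of round trips low for large result sets.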

Note

All timestamps must be supplied as one of the following:

  • A nonnegative integer, interpreted as milliseconds since the Epoch

  • A negative integer, interpreted as an offset in milliseconds relative to the current time

  • A string containing a negative integer with one of the suffixes “s”, “m”, “d”, “w”, or “y” (for seconds, minutes, days, weeks, or years), interpreted as an offset from the current time.

The following examples both find all items that were created more than 1 week ago:

items1 = list(find_data_objects(created_before="-1w"))
items2 = list(find_data_objects(created_before=-7*24*60*60*1000))
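To make the three accepted timestamp forms concrete, here is a rough sketch of how such a value could be reduced to an absolute time. It is an illustration only, not dxpy's own code: to_epoch_ms is a hypothetical helper, it accepts only the five suffixes listed in the note, and it assumes that a year means 365 days.

```python
import re
import time

# Milliseconds per unit for the suffixes listed in the note.
# Assumption: "y" is treated as 365 days.
_UNIT_MS = {
    "s": 1000,
    "m": 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
    "w": 7 * 24 * 60 * 60 * 1000,
    "y": 365 * 24 * 60 * 60 * 1000,
}


def to_epoch_ms(value, now_ms=None):
    """Convert a timestamp argument into milliseconds since the Epoch."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("timestamp must be an int or a string")
    if isinstance(value, int):
        # Nonnegative: absolute time. Negative: offset from the current time.
        return value if value >= 0 else now_ms + value
    if isinstance(value, str):
        match = re.fullmatch(r"(-\d+)([smdwy])", value)
        if match is None:
            raise ValueError("unsupported timestamp string: %r" % value)
        return now_ms + int(match.group(1)) * _UNIT_MS[match.group(2)]
    raise TypeError("timestamp must be an int or a string")


now = 1_700_000_000_000  # fixed "current time" so the comparison is reproducible
# The two forms used in the examples above resolve to the same instant.
print(to_epoch_ms("-1w", now) == to_epoch_ms(-7 * 24 * 60 * 60 * 1000, now))
```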

This example iterates through all files with property “project” set to “cancer project” and prints their object IDs:

for result in find_data_objects(classname="file", properties={"project": "cancer project"}):
    print("Found a file with object id " + result["id"])
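The properties constraint used in that example follows the rule given in the parameter list: a string value must match exactly, while the value True only requires that the key be present. The matching itself is performed by the platform. The local sketch below merely mimics the rule so it can be tried out, and matches_properties is a hypothetical name.

```python
def matches_properties(object_properties, required):
    """Return True if object_properties satisfies every required entry."""
    for key, wanted in required.items():
        if key not in object_properties:
            return False
        # True means "any value is acceptable as long as the key exists".
        if wanted is not True and object_properties[key] != wanted:
            return False
    return True


props = {"project": "cancer project", "sample": "S1"}
print(matches_properties(props, {"project": "cancer project"}))  # True
print(matches_properties(props, {"sample": True}))               # True
print(matches_properties(props, {"project": "other"}))           # False
```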

dxpy.bindings.search.find_executions(classname=None, launched_by=None, executable=None, project=None, state=None, origin_job=None, parent_job=None, no_parent_job=False, parent_analysis=None, no_parent_analysis=False, root_execution=None, created_after=None, created_before=None, describe=False, name=None, name_mode='exact', tags=None, properties=None, limit=None, first_page_size=100, return_handler=False, include_subjobs=True, include_restarted=None, **kwargs)
Parameters:
  • classname (string) – Class with which to restrict the search, i.e. one of “job”, “analysis”

  • launched_by (string) – User ID of the user who launched the execution’s origin execution

  • executable (string or a DXApp/DXApplet/DXWorkflow instance) – ID of the applet or app that spawned this execution, or a corresponding remote object handler

  • project (string) – ID of the project context for the execution

  • state (string) – State of the execution (e.g. “failed”, “done”)

  • origin_job (string) – ID of the original job that eventually spawned this execution (possibly by way of other executions)

  • parent_job (string) – ID of the parent job (passing the string ‘none’ to indicate that results should have no parent job is deprecated; use no_parent_job instead)

  • no_parent_job (boolean) – Indicate results should have no parent job; cannot be set to True with parent_job set to a string

  • parent_analysis (string) – ID of the parent analysis (passing the string ‘none’ to indicate that results should have no parent analysis is deprecated; use no_parent_analysis instead)

  • no_parent_analysis (boolean) – Indicate results should have no parent analysis; cannot be set to True with parent_analysis set to a string

  • root_execution (string) – ID of the top-level (user-initiated) execution (job or analysis) that eventually spawned this execution (possibly by way of other executions)

  • created_after (int or string) – Timestamp after which each result was created (see note accompanying find_data_objects() for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note accompanying find_data_objects() for interpretation)

  • describe (bool or dict) – Controls whether to also return the output of calling describe() on each execution. Supply False to omit describe output, True to obtain the default describe output, or a dict to be supplied as the describe call input (which may be used to customize the set of fields that is to be returned; for example, you can supply {“io”: False} to suppress detailed information about the execution’s inputs and outputs)

  • name (string) – Name of the job or analysis to search by (also see name_mode)

  • name_mode (string) – Method by which to interpret the name field (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • tags (list of strings) – List of tags that each result must have ALL of

  • properties (dict) – Properties (key-value pairs) that each result must have (use value True to require the property key and allow any value)

  • limit (int) – The maximum number of results to be returned (if not specified, the number of results is unlimited)

  • first_page_size (int) – The number of results that the initial API call will return. Subsequent calls will raise this by multiplying by 2 up to a maximum of 1000.

  • return_handler (boolean) – If True, yields results as dxpy object handlers (otherwise, yields each result as a dict with keys “id” and “project”)

  • include_subjobs (boolean) – If False, no subjobs will be returned by the API

  • include_restarted (boolean) – If True, API response will include restarted jobs and job trees rooted in restarted jobs

Return type:

generator

Returns a generator that yields all executions (jobs or analyses) that match the query. It transparently handles paging through the result set if necessary. For all parameters that are omitted, the search is not restricted by the corresponding field.
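The three name_mode values in the parameter list above can be approximated locally with the standard library, which helps when deciding which mode a query needs. This is only an analogy: name matching is evaluated by the platform, its regular-expression dialect and anchoring rules may differ from Python's, and name_matches is a hypothetical helper.

```python
import fnmatch
import re


def name_matches(candidate, name, name_mode="exact"):
    """Local approximation of the three documented name_mode behaviours."""
    if name_mode == "exact":
        return candidate == name
    if name_mode == "glob":
        # fnmatchcase treats "*" and "?" as wildcards, case-sensitively.
        # It also understands "[...]" character sets, which the
        # documentation above does not mention.
        return fnmatch.fnmatchcase(candidate, name)
    if name_mode == "regexp":
        # Assumption: an unanchored search; anchor with ^ and $ if needed.
        return re.search(name, candidate) is not None
    raise ValueError("unknown name_mode: %r" % name_mode)


print(name_matches("align_reads", "align_reads"))                # True
print(name_matches("align_reads", "align_*", "glob"))            # True
print(name_matches("align_reads", "align_?", "glob"))            # False
print(name_matches("align_reads", r"^align_\w+$", "regexp"))     # True
```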

The following example iterates through all finished jobs and analyses in a particular project that were launched in the last two days:

for result in find_executions(state="done", project=proj_id, created_after="-2d"):
    print("Found job or analysis with object id " + result["id"])
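When only part of the describe output is needed, the describe argument can carry a dict instead of True. The sketch below assembles such a query as plain keyword arguments so that it can be inspected before any API call is made. done_executions_query is a hypothetical helper, "project-xxxx" is a placeholder ID, and the commented-out call at the end assumes that dxpy is installed and that a session is authenticated.

```python
def done_executions_query(project_id, days=2, with_io=False):
    """Build keyword arguments for find_executions()."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "state": "done",
        "project": project_id,
        # Negative integer plus "d": an offset in days from the current time.
        "created_after": "-%dd" % days,
        # {"io": False} suppresses detailed input/output information.
        "describe": True if with_io else {"io": False},
    }


query = done_executions_query("project-xxxx")  # placeholder project ID
print(query)

# With dxpy available and a logged-in session, the query would be used as:
#   from dxpy.bindings.search import find_executions
#   for result in find_executions(**query):
#       print("Found job or analysis with object id " + result["id"])
```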
dxpy.bindings.search.find_jobs(*args, **kwargs)

This method is identical to find_executions() with the class constraint set to “job”.

dxpy.bindings.search.find_analyses(*args, **kwargs)

This method is identical to find_executions() with the class constraint set to “analysis”.
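The relationship between these two functions and find_executions() amounts to fixing one keyword argument. The sketch below shows that pattern with a stand-in function. fake_find_executions, the two wrappers, and the sample IDs are invented for illustration and do not call the platform.

```python
def fake_find_executions(classname=None, **kwargs):
    """Stand-in for find_executions(): filters a fixed list by class."""
    sample = [{"id": "job-aaaa"}, {"id": "analysis-bbbb"}, {"id": "job-cccc"}]
    for item in sample:
        if classname is None or item["id"].startswith(classname + "-"):
            yield item


def fake_find_jobs(*args, **kwargs):
    """Same search, with the class constraint fixed to "job"."""
    kwargs["classname"] = "job"
    return fake_find_executions(*args, **kwargs)


def fake_find_analyses(*args, **kwargs):
    """Same search, with the class constraint fixed to "analysis"."""
    kwargs["classname"] = "analysis"
    return fake_find_executions(*args, **kwargs)


print([r["id"] for r in fake_find_jobs()])      # ['job-aaaa', 'job-cccc']
print([r["id"] for r in fake_find_analyses()])  # ['analysis-bbbb']
```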

dxpy.bindings.search.find_projects(name=None, name_mode='exact', properties=None, tags=None, level=None, describe=False, explicit_perms=None, region=None, public=None, created_after=None, created_before=None, billed_to=None, limit=None, return_handler=False, first_page_size=100, containsPHI=None, externalUploadRestricted=None, **kwargs)
Parameters:
  • name (string) – Name of the project (also see name_mode)

  • name_mode (string) – Method by which to interpret the name field (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • properties (dict) – Properties (key-value pairs) that each result must have (use value True to require the property key and allow any value)

  • tags (list of strings) – Tags that each result must have

  • level (string) – One of “VIEW”, “UPLOAD”, “CONTRIBUTE”, or “ADMINISTER”. If specified, only returns projects where the current user has at least the specified permission level. If not specified, the default value is “CONTRIBUTE” for the API method /system/findProjects

  • describe (bool or dict) – Controls whether to also return the output of calling describe() on each project. Supply False to omit describe output, True to obtain the default describe output, or a dict to be supplied as the describe call input (which may be used to customize the set of fields that is returned)

  • explicit_perms (boolean or None) – Filter on presence of an explicit permission. If True, matching projects must have an explicit permission (any permission granted directly to the user or an organization to which the user belongs). If False, matching projects must not have any explicit permissions for the user. (default is None, for no filter)

  • region (string) – If specified, only returns projects where the project is in the given region.

  • public (boolean or None) – Filter on the project being public. If True, matching projects must be public. If False, matching projects must not be public. (default is None, for no filter)

  • created_after (int or string) – Timestamp after which each result was created (see note accompanying find_data_objects() for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note accompanying find_data_objects() for interpretation)

  • billed_to (string) – Entity ID (user or organization) that pays for the project’s storage costs

  • limit (int) – The maximum number of results to be returned (if not specified, the number of results is unlimited)

  • first_page_size (int) – The number of results that the initial API call will return. Subsequent calls will raise this by multiplying by 2 up to a maximum of 1000.

  • return_handler (boolean) – If True, yields results as dxpy object handlers (otherwise, yields each result as a dict with keys “id” and “project”)

  • containsPHI (boolean) – If set to true, only returns projects that contain PHI. If set to false, only returns projects that do not contain PHI.

  • externalUploadRestricted (boolean) – If set to true, only returns projects with externalUploadRestricted enabled. If set to false, only returns projects that do not have externalUploadRestricted enabled.

Return type:

generator

Returns a generator that yields all projects that match the query. It transparently handles paging through the result set if necessary. For all parameters that are omitted, the search is not restricted by the corresponding field.

You can use the level parameter to find projects that the user has at least a specific level of access to (e.g. “CONTRIBUTE”).
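The phrase "at least a specific level" implies an ordering of the four permission levels. The sketch below spells out that comparison, assuming the levels rank in the order in which the parameter description lists them (VIEW lowest, ADMINISTER highest). has_at_least is a hypothetical helper; the real check is performed by the platform.

```python
# Assumed ranking, lowest to highest, following the order in the docs.
LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]


def has_at_least(user_level, required_level):
    """Return True if user_level ranks at or above required_level."""
    for level in (user_level, required_level):
        if level not in LEVELS:
            raise ValueError("unknown permission level: %r" % level)
    return LEVELS.index(user_level) >= LEVELS.index(required_level)


print(has_at_least("ADMINISTER", "CONTRIBUTE"))  # True
print(has_at_least("UPLOAD", "CONTRIBUTE"))      # False
```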

dxpy.bindings.search.find_global_executables(method, name=None, name_mode='exact', category=None, all_versions=None, published=None, billed_to=None, created_by=None, developer=None, created_after=None, created_before=None, modified_after=None, modified_before=None, describe=False, limit=None, return_handler=False, first_page_size=100, **kwargs)
Parameters:
  • method – Name of the API method used to find the global executable (app or a global workflow).

  • name (string) – Name of the app or a global workflow (also see name_mode)

  • name_mode (string) – Method by which to interpret the name field (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • category (string) – If specified, only returns executables that are in the specified category

  • all_versions (boolean) – Whether to return all versions of each app/global workflow or just the version tagged “default”

  • published (boolean) – If specified, only returns results that have the specified publish status (True for published apps/global workflows, False for unpublished ones)

  • billed_to (string) – Entity ID (user or organization) that pays for the storage costs of the app/global workflow

  • created_by (string) – If specified, only returns versions that were created by the specified user (of the form “user-USERNAME”)

  • developer (string) – If specified, only returns apps or global workflows for which the specified user (of the form “user-USERNAME”) is a developer

  • created_after (int or string) – Timestamp after which each result was created (see note accompanying find_data_objects() for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note accompanying find_data_objects() for interpretation)

  • modified_after (int or string) – Timestamp after which each result was last modified (see note accompanying find_data_objects() for interpretation)

  • modified_before (int or string) – Timestamp before which each result was last modified (see note accompanying find_data_objects() for interpretation)

  • describe (bool or dict) – Controls whether to also return the output of calling describe() on each executable. Supply False to omit describe output, True to obtain the default describe output, or a dict to be supplied as the describe call input (which may be used to customize the set of fields that is returned)

  • limit (int) – The maximum number of results to be returned (if not specified, the number of results is unlimited)

  • first_page_size (int) – The number of results that the initial API call will return. Subsequent calls will raise this by multiplying by 2 up to a maximum of 1000.

  • return_handler (boolean) – If True, yields results as dxpy object handlers (otherwise, yields each result as a dict with keys “id” and “project”)

Return type:

generator

Returns a generator that yields all global executables (either apps or global workflows) that match the query. It transparently handles paging through the result set if necessary. For all parameters that are omitted, the search is not restricted by the corresponding field.

dxpy.bindings.search.find_apps(name=None, name_mode='exact', category=None, all_versions=None, published=None, billed_to=None, created_by=None, developer=None, created_after=None, created_before=None, modified_after=None, modified_before=None, describe=False, limit=None, return_handler=False, first_page_size=100, **kwargs)

This method is identical to find_global_executables() with system_find_apps() as the API method.

dxpy.bindings.search.find_global_workflows(name=None, name_mode='exact', category=None, all_versions=None, published=None, billed_to=None, created_by=None, developer=None, created_after=None, created_before=None, modified_after=None, modified_before=None, describe=False, limit=None, return_handler=False, first_page_size=100, **kwargs)

This method is identical to find_global_executables() with system_find_global_workflows() as the API method.

dxpy.bindings.search.find_one_data_object(zero_ok=False, more_ok=True, **kwargs)
Parameters:
  • zero_ok (bool) – If False (default), DXSearchError is raised if the search has 0 results; if True, returns None if the search has 0 results. If not boolean, DXError is raised

  • more_ok (bool) – If False, DXSearchError is raised if the search has 2 or more results

Returns one data object that satisfies the supplied constraints, or None if none exist (provided zero_ok is True). Supports all search constraint arguments supported by find_data_objects().
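The interaction of zero_ok and more_ok can be summarised in a few lines. This sketch is not the dxpy source: find_one is a hypothetical helper that works on any iterable of results, and the two exception classes are local stand-ins for the DXError and DXSearchError named above (modelled here as unrelated classes for simplicity).

```python
class DXError(Exception):
    """Local stand-in for the dxpy exception of the same name."""


class DXSearchError(Exception):
    """Local stand-in for the dxpy exception of the same name."""


_MISSING = object()  # sentinel distinguishing "no result" from any real value


def find_one(results, zero_ok=False, more_ok=True):
    """Return the first result, enforcing the zero_ok / more_ok rules."""
    if not isinstance(zero_ok, bool):
        raise DXError("zero_ok must be a boolean")
    iterator = iter(results)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        if zero_ok:
            return None
        raise DXSearchError("expected one result, but found none")
    if not more_ok and next(iterator, _MISSING) is not _MISSING:
        raise DXSearchError("expected one result, but found more than one")
    return first


hits = [{"id": "file-aaaa"}, {"id": "file-bbbb"}]  # made-up results
print(find_one(hits))                # {'id': 'file-aaaa'}
print(find_one([], zero_ok=True))    # None
```

With the defaults (zero_ok=False, more_ok=True) an empty search is an error while extra matches are silently ignored, so callers that need exactly one match should pass more_ok=False.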

dxpy.bindings.search.find_one_project(zero_ok=False, more_ok=True, **kwargs)
Parameters:
  • zero_ok (bool) – If False (default), DXSearchError is raised if the search has 0 results; if True, returns None if the search has 0 results. If not boolean, DXError is raised

  • more_ok (bool) – If False, DXSearchError is raised if the search has 2 or more results

Returns one project that satisfies the supplied constraints, or None if none exist (provided zero_ok is True). Supports all search constraint arguments supported by find_projects().

dxpy.bindings.search.find_one_app(zero_ok=False, more_ok=True, **kwargs)
Parameters:
  • zero_ok (bool) – If False (default), DXSearchError is raised if the search has 0 results; if True, returns None if the search has 0 results. If not boolean, DXError is raised

  • more_ok (bool) – If False, DXSearchError is raised if the search has 2 or more results

Returns one app that satisfies the supplied constraints, or None if none exist (provided zero_ok is True). Supports all search constraint arguments supported by find_apps().

dxpy.bindings.search.org_find_members(org_id=None, level=None, describe=False)
Parameters:
  • org_id (string) – ID of the organization

  • level (string) – The membership level in the org that each member in the result set must have (one of “MEMBER” or “ADMIN”)

  • describe (bool or dict) – Whether or not to return the response of dxpy.api.user_describe for each result. False omits the describe response; True includes it; a dict will be used as the input to dxpy.api.user_describe (to customize the desired set of fields in the describe response).

Returns a generator that yields all org members that match the query formed by intersecting all specified constraints. The search is not restricted by any parameters that were unspecified.
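"Intersecting all specified constraints" means that each argument that is given narrows the result set, while arguments left at None add no restriction. One common way to express this when assembling a request is to drop the unset values. The sketch below shows the idea with a hypothetical helper, build_query; the keys are simply the Python parameter names, not the actual API input format.

```python
def build_query(**constraints):
    """Keep only the constraints that were actually specified."""
    return {key: value for key, value in constraints.items() if value is not None}


# False is a real value and is kept; only None means "unspecified".
print(build_query(level="ADMIN", describe=False))  # {'level': 'ADMIN', 'describe': False}
print(build_query(level=None, describe=None))      # {}
```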

dxpy.bindings.search.org_find_projects(org_id=None, name=None, name_mode='exact', ids=None, properties=None, tags=None, describe=False, public=None, created_after=None, created_before=None, region=None, containsPHI=None)
Parameters:
  • org_id (string) – ID of the organization

  • name (string) – Name that each result must have (also see name_mode param)

  • name_mode (string) – Method by which to interpret the name param (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • ids (array of strings) – List of project IDs. Each result must have a project ID that was specified in this list.

  • properties (dict) – Properties (key-value pairs) that each result must have (use value True to require the property key and allow any value)

  • tags (list of strings) – Tags that each result must have

  • describe (bool or dict) – Whether or not to return the response of dxpy.api.project_describe for each result. False omits the describe response; True includes it; a dict will be used as the input to dxpy.api.project_describe (to customize the desired set of fields in the describe response).

  • public (boolean or None) – True indicates that each result must be public; False indicates that each result must be private; None indicates that both public and private projects will be returned in the result set.

  • created_after (int or string) – Timestamp after which each result was created (see note accompanying find_data_objects() for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note accompanying find_data_objects() for interpretation)

  • region (string) – If specified, only returns projects where the project is in the given region.

  • containsPHI (boolean) – If set to true, only returns projects that contain PHI. If set to false, only returns projects that do not contain PHI.

Return type:

generator

Returns a generator that yields all projects that match the query formed by intersecting all specified constraints. The search is not restricted by any parameters that were unspecified.

dxpy.bindings.search.org_find_apps(org_id, name=None, name_mode='exact', category=None, all_versions=None, published=None, created_by=None, developer=None, authorized_user=None, created_after=None, created_before=None, modified_after=None, modified_before=None, describe=False, limit=None, return_handler=False, first_page_size=100, **kwargs)
Parameters:
  • org_id (string) – ID of the organization

  • name (string) – Name of the app (also see name_mode)

  • name_mode (string) – Method by which to interpret the name field (“exact”: exact match, “glob”: use “*” and “?” as wildcards, “regexp”: interpret as a regular expression)

  • category (string) – If specified, only returns apps that are in the specified category

  • all_versions (boolean) – Whether to return all versions of each app or just the version tagged “default”

  • published (boolean) – If specified, only returns results that have the specified publish status (True for published apps, False for unpublished apps)

  • created_by (string) – If specified, only returns app versions that were created by the specified user (of the form “user-USERNAME”)

  • developer (string) – If specified, only returns apps for which the specified user (of the form “user-USERNAME”) is a developer

  • authorized_user (string) – If specified, only returns apps for which the specified user (either a user ID, org ID, or the string “PUBLIC”) appears in the app’s list of authorized users

  • created_after (int or string) – Timestamp after which each result was created (see note accompanying find_data_objects() for interpretation)

  • created_before (int or string) – Timestamp before which each result was created (see note accompanying find_data_objects() for interpretation)

  • modified_after (int or string) – Timestamp after which each result was last modified (see note accompanying find_data_objects() for interpretation)

  • modified_before (int or string) – Timestamp before which each result was last modified (see note accompanying find_data_objects() for interpretation)

  • describe (bool or dict) – Controls whether to also return the output of calling describe() on each app. Supply False to omit describe output, True to obtain the default describe output, or a dict to be supplied as the describe call input (which may be used to customize the set of fields that is returned)

  • limit (int) – The maximum number of results to be returned (if not specified, the number of results is unlimited)

  • first_page_size (int) – The number of results that the initial API call will return. Subsequent calls will raise this by multiplying by 2 up to a maximum of 1000.

  • return_handler (boolean) – If True, yields results as dxpy object handlers (otherwise, yields each result as a dict with keys “id” and “project”)

Return type:

generator

Returns a generator that yields all apps that match the query. It transparently handles paging through the result set if necessary. For all parameters that are omitted, the search is not restricted by the corresponding field.

dxpy.bindings.search.find_orgs(query, first_page_size=10)
Parameters:
  • query (dict) – The input to the /system/findOrgs API method.

  • first_page_size (int) – The number of results that the initial /system/findOrgs API call will return; default 10, max 1000. Subsequent calls will raise the number of returned results exponentially up to a max of 1000.

Return type:

generator

Returns a generator that yields all orgs matching the specified query. Will transparently handle pagination as necessary.