Files can be used to store an immutable opaque sequence of bytes.
You can obtain a handle to a new or existing file object with new_dxfile() or open_dxfile(), respectively. Both return a remote file handler, which is a Python file-like object. There are also helper functions (download_dxfile(), upload_local_file(), and upload_string()) for directly downloading and uploading existing files or strings in a single operation.
Files are tristate objects: a file is in one of the “open”, “closing”, or “closed” states.
Many methods that return a DXFile object take a mode parameter. In general the available modes are as follows (some methods create a new file and consequently do not support immediate reading from it with the “r” mode):
Note
The automatic flush and close operations implied by the “w” or “a” modes only happen if the DXFile object is used in a Python context-managed scope (see the following examples).
Here is an example of writing to a file object via a context-managed file handle:
# Open a file for writing
with open_dxfile('file-xxxx', mode='w') as fd:
    for line in input_file:
        fd.write(line)
The use of the context-managed file is optional for read-only objects; that is, you may use the object without a “with” block (and omit the mode parameter), for example:
# Open a file for reading
fd = open_dxfile('file-xxxx')
for line in fd:
    print(line)
Warning
If you write any data to a file and you choose to use a non context-managed file handle, you must call flush() or close() when you are done, for example:
# Open a file for writing; we will flush it explicitly ourselves
fd = open_dxfile('file-xxxx', mode='w')
for line in input_file:
    fd.write(line)
fd.flush()
If you do not do so, and there is still unflushed data when the DXFile object is garbage collected, the DXFile will attempt to flush it then, in the destructor. However, any errors in the resulting API calls (or, in general, any exception in a destructor) are not propagated back to your program! That is, your writes can silently fail if you rely on the destructor to flush your data.
DXFile will print a warning if it detects unflushed data as the destructor is running (but again, it will attempt to flush it anyway).
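The silent failure described above is a general property of Python destructors and can be reproduced without dxpy. The following is a minimal sketch under stated assumptions: FakeWriteHandle is a hypothetical stand-in (not part of dxpy) whose flush() always fails, the immediate destructor call on del relies on CPython's reference counting, and sys.unraisablehook requires Python 3.8 or later.

```python
import sys


class FakeWriteHandle(object):
    """Hypothetical stand-in for a write handle; NOT part of dxpy."""

    def __init__(self):
        self.buffer = ["unflushed data"]

    def flush(self):
        # Simulate an API call that fails while flushing.
        raise IOError("simulated upload failure")

    def __del__(self):
        # Last-chance flush, analogous to the behavior described above.
        if self.buffer:
            self.flush()


def rely_on_destructor():
    """Drop a handle without flushing; return the errors Python swallowed."""
    swallowed = []
    previous_hook = sys.unraisablehook
    # Capture what Python would otherwise only print to stderr.
    sys.unraisablehook = lambda info: swallowed.append(info.exc_value)
    try:
        handle = FakeWriteHandle()
        del handle  # CPython runs __del__ here; the IOError is not raised
        reached = True
    finally:
        sys.unraisablehook = previous_hook
    return reached, swallowed
```

Calling rely_on_destructor() returns normally even though flush() failed; the error is visible only through the hook (or on stderr by default), which is why an explicit flush() or close() is the safer choice.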
Note
Writing to a file with the “w” mode calls close() but does not wait for the file to finish closing. If the file you are writing is one of the outputs of your app or applet, you can use job-based object references, which will make downstream jobs wait for closing to finish before they can begin. However, if you intend to subsequently read from the file in the same process, you will need to call wait_on_close() to ensure the file is ready to be read.
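As an illustration of the last point, the following sketch writes a new file, waits for it to close, and then reads it back in the same process. The helper name write_then_read and its api parameter are assumptions introduced here so the flow can be exercised with a stand-in object; by default it uses dxpy itself, which requires an authenticated platform session. The text-mode behavior of write() and read() is also assumed.

```python
def write_then_read(text, api=None):
    """Write text to a new remote file, wait for it to close, read it back."""
    if api is None:
        import dxpy as api  # needs dxpy and a logged-in platform session

    # Leaving the "with" block flushes and requests a close, but does not
    # wait for the file to reach the closed state.
    with api.new_dxfile(mode="w") as fd:
        fd.write(text)

    # Block until the file has finished closing before reading from it.
    fd.wait_on_close()

    with api.open_dxfile(fd.get_id()) as reader:
        return reader.read()
```

If the file is only consumed by downstream jobs, the wait_on_close() call can be dropped in favor of job-based object references, as noted above.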
The following helper functions are useful shortcuts for interacting with File objects.
Downloads the remote file referenced by dxid and saves it to filename.
Example:
download_dxfile("file-xxxx", "localfilename.fastq")
Downloads the contents of the remote folder of the project into the local directory specified by destdir.
Example:
download_folder("project-xxxx", "/home/jsmith/input", folder="/input")
Returns a list of the subfolders of the remote path in the project; the path itself is included in the result.
Example:
list_subfolders("project-xxxx", folder="/input")
Parameters: mode (string) – One of “w” or “a” for write and append modes, respectively
Return type: DXFile
Additional optional parameters not listed: all those under dxpy.bindings.DXDataObject.new().
Creates a new remote file object that is ready to be written to; returns a DXFile object that is a writable file-like object.
Example:
with new_dxfile(media_type="application/json") as fd:
    fd.write("foo\n")
Note that this is shorthand for:
dxFile = DXFile()
dxFile.new(**kwargs)
Parameters: dxid (string) – file ID
Return type: DXFile
Given the object ID of an uploaded file, returns a remote file handler that is a Python file-like object.
Example:
with open_dxfile("file-xxxx") as fd:
    for line in fd:
        ...
Note that this is shorthand for:
DXFile(dxid)
Returns: Remote file handler
Additional optional parameters not listed: all those under dxpy.bindings.DXDataObject.new().
Exactly one of filename or file is required.
Uploads filename or reads from file into a new file object (with media type media_type if given) and returns the associated remote file handler. The “name” property of the newly created remote file is set to the basename of filename or to file.name (if it exists).
Examples:
# Upload from a path
dxpy.upload_local_file("/home/ubuntu/reads.fastq.gz")
# Upload from a file-like object
with open("reads.fastq") as fh:
    dxpy.upload_local_file(file=fh)
Returns: Remote file handler
Additional optional parameters not listed: all those under dxpy.bindings.DXDataObject.new().
Uploads the data in the string to_upload into a new file object (with media type media_type if given) and returns the associated remote file handler.
This remote file handler is a Python file-like object.
Bases: dxpy.bindings.DXDataObject
Remote file object handler.
Note
The attribute values below are current as of the last time describe() was run. (Access to any of the below attributes causes describe() to be called if it has never been called before.)
String containing the Internet Media Type (also known as MIME type or Content-type) of the file.
Creates a new remote file with media type media_type, if given.
Return the next item from the iterator. If default is given and the iterator is exhausted, it is returned instead of raising StopIteration.
Discards the currently stored ID and associates the handler with dxid. As a side effect, it also flushes the buffer for the previous file object if the buffer is nonempty.
Parameters: offset (integer) – Position in the file to seek to
Seeks to offset bytes from the beginning of the file. This is a no-op if the file is open for writing.
The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.
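The reference-point rule above can be written out as a small calculation. This is only a sketch of the arithmetic: resolve_seek is a hypothetical helper, not a dxpy function, and its current and size arguments (the cursor position and file length in bytes) are parameters of the illustration rather than of seek() itself.

```python
def resolve_seek(offset, from_what=0, current=0, size=0):
    """Return the absolute position implied by (offset, from_what)."""
    if from_what == 0:      # relative to the beginning of the file
        base = 0
    elif from_what == 1:    # relative to the current position
        base = current
    elif from_what == 2:    # relative to the end of the file
        base = size
    else:
        raise ValueError("from_what must be 0, 1, or 2")
    return base + offset
```

For example, resolve_seek(10) gives 10, resolve_seek(5, 1, current=20) gives 25, and resolve_seek(-4, 2, size=100) gives 96.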
Returns the current position of the file read cursor.
Warning: Because of buffering semantics, this value will not be accurate when using the line iterator form (for line in file).
Writes the contents of data to the file.
Returns: Whether the remote file is closed
Return type: boolean
Returns True if the remote file is closed and False otherwise. Note that if it is not closed, it can be in either the “open” or “closing” states.
Parameters: block (boolean) – If True, this function blocks until the remote file has closed.
Attempts to close the file.
Note
The remote file cannot be closed until all parts have been fully uploaded. An exception will be thrown if this is not the case.
Parameters: timeout (integer) – Maximum amount of time to wait (in seconds) until the file is closed.
Raises: dxpy.exceptions.DXFileError if the timeout is reached before the remote file has been closed
Waits until the remote file is closed.
Raises: dxpy.exceptions.DXFileError if index is given and is not in the correct range; requests.exceptions.HTTPError if upload fails
Uploads the data in data as part number index for the associated file. If no value for index is given, index defaults to 1, so omitting it generally makes sense only when this is the only part to be uploaded.
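When more than one part is uploaded, each call needs its own part number. The sketch below shows one way to number chunks starting from 1 and hand them to upload_part(data, index). The helpers iter_parts and upload_in_parts are hypothetical (not part of dxpy), fd is assumed to be a handle opened for writing, and any size limits the platform places on individual parts are not covered here.

```python
def iter_parts(data, part_size):
    """Yield (index, chunk) pairs, with part numbers starting at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for start in range(0, len(data), part_size):
        yield start // part_size + 1, data[start:start + part_size]


def upload_in_parts(fd, data, part_size):
    """Upload data through fd.upload_part(), one numbered part at a time."""
    count = 0
    for index, chunk in iter_parts(data, part_size):
        fd.upload_part(chunk, index)
        count += 1
    return count
```

As noted under close(), all parts must be fully uploaded before the file can be closed.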
Returns: download URL and dict containing HTTP headers to be supplied with the request
Return type: tuple (str, dict)
Raises: ResourceNotFound if a project context was given and the file was not found in that project context, or if no project context was given and the file was not found in any project.
Obtains a URL that can be used to directly download the associated file.
Parameters: all_copies (boolean) – Force the transition of files into the archived state. Requesting user must be the ADMIN of the project billTo org. If true, archive all the copies of files in projects with the same billTo org.
Raises: InvalidState if the file is not in a live state
Raises: PermissionDenied if the requesting user does not have CONTRIBUTE access or is not an ADMIN of the project billTo org with allCopies=True.
Parameters: dry_run (boolean) – If true, only display the output of the API call without executing the unarchival
Raises: InvalidState if the file is not in a closed or archived state
Raises: PermissionDenied if the requesting user does not have CONTRIBUTE access