A remote file handler.
#include <dxfile.h>
Public Member Functions | |
DXFile (const DXFile &to_copy) | |
DXFile (const char *dxid, const char *proj=NULL) | |
DXFile (const std::string &dxid, const std::string &proj=config::CURRENT_PROJECT()) | |
DXFile (const dx::JSON &dxlink) | |
DXFile & | operator= (const DXFile &to_copy) |
void | setIDs (const std::string &dxid, const std::string &proj=config::CURRENT_PROJECT()) |
void | setIDs (const char *dxid, const char *proj=NULL) |
void | setIDs (const dx::JSON &dxlink) |
void | create (const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT)) |
int64_t | getMaxBufferSize () const |
void | setMaxBufferSize (const int64_t buf_size) |
int | getNumWriteThreads () const |
void | setNumWriteThreads (const int numThreads) |
void | read (char *ptr, int64_t n) |
int64_t | gcount () const |
bool | eof () const |
void | seek (const int64_t pos) |
void | flush () |
void | write (const char *ptr, int64_t n) |
void | write (const std::string &data) |
void | uploadPart (const std::string &data, const int index=-1) |
void | uploadPart (const char *ptr, int64_t n, const int index=-1) |
bool | is_open () const |
bool | is_closed () const |
void | close (const bool block=false) |
void | waitOnClose () const |
void | startLinearQuery (const int64_t start_byte=0, const int64_t num_bytes=-1, const int64_t chunk_size=10 *1024 *1024, const unsigned max_chunks=20, const unsigned thread_count=5) const |
void | stopLinearQuery () const |
bool | getNextChunk (std::string &chunk) const |
DXFile | clone (const std::string &dest_proj_id, const std::string &dest_folder="/") const |
Public Member Functions inherited from dx::DXDataObject | |
DXDataObject (const DXDataObject &to_copy) | |
DXDataObject (const std::string &dxid) | |
DXDataObject (const std::string &dxid, const std::string &proj) | |
std::string | getID () const |
operator std::string () | |
std::string | getProjectID () const |
virtual void | setIDs (const JSON &dxlink) |
JSON | describe (bool incl_properties=false, bool incl_details=false) const |
void | addTypes (const JSON &types) const |
void | removeTypes (const JSON &types) const |
JSON | getDetails () const |
void | setDetails (const JSON &details) const |
void | hide () const |
void | unhide () const |
void | rename (const std::string &name) const |
void | setProperties (const JSON &properties) const |
JSON | getProperties () const |
void | addTags (const JSON &tags) const |
void | removeTags (const JSON &tags) const |
virtual void | close () const |
JSON | listProjects () const |
void | move (const std::string &dest_folder) const |
void | remove () |
Static Public Member Functions | |
static DXFile | openDXFile (const std::string &dxid) |
static DXFile | newDXFile (const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT)) |
static void | downloadDXFile (const std::string &dxid, const std::string &filename, int64_t chunksize=1048576) |
static DXFile | uploadLocalFile (const std::string &filename, const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT), bool waitForClose=false) |
Additional Inherited Members | |
Protected Member Functions inherited from dx::DXDataObject | |
void | waitOnState (const std::string &state="closed", const int timeout=std::numeric_limits< int >::max()) const |
void | clone_ (const std::string &dest_proj_id, const std::string &dest_folder) const |
Protected Attributes inherited from dx::DXDataObject | |
std::string | dxid_ |
std::string | proj_ |
A remote file handler.
A File represents an opaque array of bytes (see the API specification for more info). DXFile supports multithreaded uploading and downloading for high performance.
When a File object is initialized, it is empty, in the "open" state, and writable. In order to support reliable upload of large files, File objects in the DNAnexus Platform may be written in multiple parts (possibly in parallel). After you have written all the data you like to the File, you may close it. The File goes into the "closing" state for finalization. Some time later, the File goes into the "closed" state and can be used for reading.
There are three important rules to remember:
You can write a File object in the following ways, which you may not mix and match: append data sequentially with write(), or upload individually numbered parts with uploadPart().
When you are finished writing data to the File, call close(). If you wish to wait until the File has been closed before proceeding, you can supply the block=true parameter to close(), call waitOnClose(), or poll the File's status yourself using describe().
To read files, do one of the following (these may all be used concurrently and operate independently of each other): use read(), optionally with seek(); use startLinearQuery() with getNextChunk(); or use the static downloadDXFile().
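The lifecycle described above (open, then closing, then closed) can be pictured as a small state machine. The sketch below is only a model: `FileState`, `canWrite`, `canRead`, `nextState` and `stateName` are hypothetical names introduced for illustration and are not part of the DXFile API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the three File states described in the text.
enum class FileState { Open, Closing, Closed };

// Writing is only possible while the File is still "open".
inline bool canWrite(FileState s) { return s == FileState::Open; }

// Reading is only possible once the File has reached "closed".
inline bool canRead(FileState s) { return s == FileState::Closed; }

// Advances one step along open -> closing -> closed.
// A closed File stays closed.
inline FileState nextState(FileState s) {
  switch (s) {
    case FileState::Open:    return FileState::Closing;
    case FileState::Closing: return FileState::Closed;
    case FileState::Closed:  return FileState::Closed;
  }
  throw std::logic_error("unknown FileState");
}

// Maps a state to the name used in the text.
inline std::string stateName(FileState s) {
  switch (s) {
    case FileState::Open:    return "open";
    case FileState::Closing: return "closing";
    case FileState::Closed:  return "closed";
  }
  throw std::logic_error("unknown FileState");
}
```

In this model a File in the "closing" state is neither writable nor readable, which is why the text recommends waiting for "closed" before reading.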
dx::DXFile::DXFile | ( | const DXFile & | to_copy | ) | inline
Copy constructor.
dx::DXFile::DXFile | ( | const char * | dxid, | const char * | proj = NULL | ) | inline
Creates a DXFile handler for the specified File object.
dxid | File object ID. |
proj | ID of the project in which to access the object (if NULL, then default workspace will be used). |
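dx::DXFile::DXFile | ( | const std::string & | dxid, | const std::string & | proj = config::CURRENT_PROJECT() | ) | inline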
Creates a DXFile handler for the specified File object.
dxid | File object ID. |
proj | ID of the project in which the File should be accessed. |
dx::DXFile::DXFile | ( | const dx::JSON & | dxlink | ) | inline
Creates a DXFile handler for the specified File object.
dxlink | A JSON representing a DNAnexus link. You may also use the extended form: {"$dnanexus_link": {"project": proj-id, "id": obj-id}}. |
DXFile dx::DXFile::clone | ( | const std::string & | dest_proj_id, | const std::string & | dest_folder = "/" | ) | const
Clones the associated object into the specified project and folder.
dest_proj_id | ID of the project to which the object should be cloned. |
dest_folder | Folder route in which to put it in the destination project. |
void dx::DXFile::close | ( | const bool | block = false | ) |
Calls flush() and issues a request to close the remote File.
See the /file-xxxx/close API method for more info.
block | Boolean indicating whether the process should block until the remote File is in the "closed" state (true) or return immediately (false). |
void dx::DXFile::create | ( | const std::string & | media_type = "", | const dx::JSON & | data_obj_fields = dx::JSON(dx::JSON_OBJECT) | )
Creates a new remote file object and sets the object ID. Initially the object may be used for writing only.
media_type | String representing the media type of the file. |
data_obj_fields | JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method. |
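static void dx::DXFile::downloadDXFile | ( | const std::string & | dxid, | const std::string & | filename, | int64_t | chunksize = 1048576 | ) | static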
Shorthand for downloading a remote File to a local file.
The File is downloaded using startLinearQuery() and getNextChunk() semantics. Multiple threads with concurrent HTTP requests are used to fetch the data for higher throughput.
dxid | Object handler or id of the file to download. |
filename | Local path for writing the downloaded data. |
chunksize | Size, in bytes, of each chunk when downloading the file. |
bool dx::DXFile::eof | ( | ) | const |
void dx::DXFile::flush | ( | ) |
Ensures that all the data sent via previous write() calls has been flushed from the buffers and uploaded to the remote File. Finishes all pending uploads and terminates all write threads. This function blocks until the above has completed.
This function is idempotent.
int64_t dx::DXFile::gcount | ( | ) | const |
int64_t dx::DXFile::getMaxBufferSize | ( | ) | const | inline
Returns the buffer size (in bytes) that must be reached before data is flushed.
bool dx::DXFile::getNextChunk | ( | std::string & | chunk | ) | const |
Obtains the next chunk of bytes after a call to startLinearQuery(). Returns false if startLinearQuery() was not called, or if all the requested chunks from the last call to startLinearQuery() have been exhausted.
chunk | If this function returns true, this string is populated with the data from the next chunk. Otherwise, the string is left untouched. |
int dx::DXFile::getNumWriteThreads | ( | ) | const | inline
Returns the maximum number of write threads used by parallelized write() operations.
bool dx::DXFile::is_closed | ( | ) | const |
bool dx::DXFile::is_open | ( | ) | const |
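static DXFile dx::DXFile::newDXFile | ( | const std::string & | media_type = "", | const dx::JSON & | data_obj_fields = dx::JSON(dx::JSON_OBJECT) | ) | static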
Shorthand for creating a DXFile remote File handler for a new empty remote File. The newly initialized File is ready for writing.
media_type | String representing the media type of the file. |
data_obj_fields | JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method. |
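static DXFile dx::DXFile::openDXFile | ( | const std::string & | dxid | ) | static
DXFile & dx::DXFile::operator= | ( | const DXFile & | to_copy | )
Assignment operator.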
void dx::DXFile::read | ( | char * | ptr, | int64_t | n | )
Reads the next n bytes in the remote file object (or all the bytes up to the end of file if there are fewer than n), and stores the downloaded data at ptr. After read() is called, eof() will return whether the end of the file was reached, and gcount() will return how many bytes were actually read.
ptr | Location to which data should be written (user must ensure that sufficient amount of memory has been allocated for data to be written starting at "ptr", before calling this function) |
n | The maximum number of bytes to retrieve |
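The interplay of read(), gcount(), eof() and seek() can be illustrated with an in-memory model. This is a sketch under stated assumptions, not the library's implementation: `MemReader` is a hypothetical class, and it assumes that eof() becomes true as soon as the read cursor reaches the end of the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a closed File; not part of dxcpp.
class MemReader {
 public:
  explicit MemReader(std::string bytes) : data_(std::move(bytes)) {}

  // Copies up to n bytes to ptr, starting at the current read cursor.
  // The caller must provide at least n bytes of storage at ptr.
  void read(char* ptr, int64_t n) {
    const int64_t size = static_cast<int64_t>(data_.size());
    const int64_t remaining = size - pos_;
    gcount_ = std::max<int64_t>(0, std::min(n, remaining));
    std::memcpy(ptr, data_.data() + pos_, static_cast<std::size_t>(gcount_));
    pos_ += gcount_;
    eof_ = (pos_ >= size);  // assumption: eof is set once the end is reached
  }

  // Number of bytes stored by the most recent read().
  int64_t gcount() const { return gcount_; }

  // Whether the most recent read() reached the end of the data.
  bool eof() const { return eof_; }

  // Moves the read cursor; out-of-range positions are clamped in this model.
  void seek(int64_t pos) {
    const int64_t size = static_cast<int64_t>(data_.size());
    pos_ = std::min(std::max<int64_t>(pos, 0), size);
    eof_ = false;
  }

 private:
  std::string data_;
  int64_t pos_ = 0;
  int64_t gcount_ = 0;
  bool eof_ = false;
};
```

A typical read loop in this model calls read() with a fixed buffer, consumes gcount() bytes, and stops once eof() returns true.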
void dx::DXFile::seek | ( | const int64_t | pos | ) |
Changes the position of the reading cursor (for read()) to the specified byte offset.
Calling this function on a File that is not in the "closed" state will throw an object of class DXFileError.
DXFileError |
pos | New byte position of the read cursor |
void dx::DXFile::setIDs | ( | const char * | dxid, | const char * | proj = NULL | ) | virtual
Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.
dxid | new File object ID |
proj | ID of the project in which to access the File (if NULL, then default workspace will be used). |
Reimplemented from dx::DXDataObject.
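void dx::DXFile::setIDs | ( | const std::string & | dxid, | const std::string & | proj = config::CURRENT_PROJECT() | ) | virtual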
Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.
dxid | new File object ID |
proj | ID of project in which to access the File. |
Reimplemented from dx::DXDataObject.
void dx::DXFile::setIDs | ( | const dx::JSON & | dxlink | ) |
Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.
dxlink | A JSON representing a DNAnexus link. You may also use the extended form: {"$dnanexus_link": {"project": proj-id, "id": obj-id}}. |
void dx::DXFile::setMaxBufferSize | ( | const int64_t | buf_size | ) | inline
Sets the buffer size (in bytes) that must be reached before data is flushed.
buf_size | New buffer size, in bytes, to use (must be >= 5242880, i.e. 5 MiB) |
DXFileError() | if buf_size < 5242880 |
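The lower bound on the buffer size can be checked before calling the setter. The helper below is a minimal sketch: `kMinBufferSize` and `checkedBufferSize` are hypothetical names, and `std::invalid_argument` merely stands in for DXFileError.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// 5 * 1024 * 1024 bytes, the minimum stated in the parameter description.
constexpr int64_t kMinBufferSize = 5242880;

// Hypothetical helper: returns buf_size unchanged if it is acceptable,
// otherwise throws. std::invalid_argument stands in for DXFileError here.
inline int64_t checkedBufferSize(int64_t buf_size) {
  if (buf_size < kMinBufferSize) {
    throw std::invalid_argument("buffer size must be at least 5242880 bytes");
  }
  return buf_size;
}
```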
void dx::DXFile::setNumWriteThreads | ( | const int | numThreads | ) | inline
Sets the maximum number of threads used by parallelized write() operations.
numThreads | Number of threads |
void dx::DXFile::startLinearQuery | ( | const int64_t | start_byte = 0, | const int64_t | num_bytes = -1, | const int64_t | chunk_size = 10*1024*1024, | const unsigned | max_chunks = 20, | const unsigned | thread_count = 5 | ) | const
Starts fetching data in chunks (of the specified byte size) from the remote File in the background. After calling this function, getNextChunk() can be used to access the chunks in order.
start_byte | Starting byte offset (0-indexed) from which data will be fetched. Defaults to reading from the beginning of the file. |
num_bytes | Total number of bytes to be fetched. If not specified, all data to the end of the file is read. |
chunk_size | Number of bytes to be fetched in each chunk. (Each chunk will be this length, except possibly the last one, which may be shorter.) |
max_chunks | Number of fetched chunks to be kept in memory at any time. Note that the number of real chunks in memory could be as high as (max_chunks + thread_count). |
thread_count | Number of threads to be used for fetching data. |
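To make the parameters concrete, the following sketch computes the byte ranges that a linear query would cover, plus the worst-case amount of chunk data held in memory. It is an illustration only: `ChunkRange`, `planChunks` and `maxBufferedBytes` are hypothetical helpers, the file size is passed in explicitly, and clamping a request that runs past the end of the file is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One chunk of the query: where it starts and how many bytes it holds.
struct ChunkRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical helper: splits [start_byte, start_byte + num_bytes) into
// chunks of chunk_size bytes. A negative num_bytes means "to end of file".
inline std::vector<ChunkRange> planChunks(int64_t file_size,
                                          int64_t start_byte = 0,
                                          int64_t num_bytes = -1,
                                          int64_t chunk_size = 10 * 1024 * 1024) {
  std::vector<ChunkRange> chunks;
  if (chunk_size <= 0 || start_byte < 0 || start_byte >= file_size) {
    return chunks;  // nothing to fetch
  }
  int64_t end = file_size;
  if (num_bytes >= 0) {
    end = std::min(file_size, start_byte + num_bytes);  // assumption: clamp
  }
  for (int64_t off = start_byte; off < end; off += chunk_size) {
    // Every chunk is chunk_size long except possibly the last one.
    chunks.push_back({off, std::min(chunk_size, end - off)});
  }
  return chunks;
}

// Upper bound on buffered chunk data, following the note on max_chunks:
// up to (max_chunks + thread_count) chunks may be in memory at once.
inline int64_t maxBufferedBytes(int64_t chunk_size, unsigned max_chunks,
                                unsigned thread_count) {
  return chunk_size * (static_cast<int64_t>(max_chunks) + thread_count);
}
```

With the default arguments (10 MiB chunks, max_chunks = 20, thread_count = 5), this bound works out to 25 chunks of 10 MiB each, which is worth keeping in mind on memory-constrained workers.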
void dx::DXFile::stopLinearQuery | ( | ) | const |
Stops background fetching of all chunks. Terminates all read threads. Any previous call to startLinearQuery() is invalidated.
This function is idempotent.
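static DXFile dx::DXFile::uploadLocalFile | ( | const std::string & | filename, | const std::string & | media_type = "", | const dx::JSON & | data_obj_fields = dx::JSON(dx::JSON_OBJECT), | bool | waitForClose = false | ) | static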
Shorthand for uploading a local file and closing it when done. Sets the name to be equal to the filename if no name is provided in data_obj_fields.
filename | Local path for the file to upload. |
media_type | String representing the media type of the file. |
data_obj_fields | JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method. |
waitForClose | If set to true, then function returns only after uploaded file is in the "closed" state. Otherwise, returns directly after initiating the file close (the uploaded file will be in the "closing" or "closed" state). |
void dx::DXFile::uploadPart | ( | const std::string & | data, | const int | index = -1 | )
Uploads data as a part. Same functionality as uploadPart(const char*, int64_t, const int).
data | String containing the data to append. |
index | Number with which to label the uploaded part. |
void dx::DXFile::uploadPart | ( | const char * | ptr, | int64_t | n, | const int | index = -1 | )
Uploads the n bytes stored at ptr as a part to the remote File. Blocks until the request is completed.
If there are multiple requests to write to the same part, the last one to finish "wins".
If index is not provided (that is, left at its default value of -1), part 1 is written, possibly overwriting data from other uploadPart() calls that do not specify an index.
ptr | Pointer to the location of data to be written. |
n | The number of bytes to write. |
index | Number with which to label the part of the file to be uploaded. If not specified, part 1 is written. |
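The part-numbering behaviour can be modeled with a map from part index to data. This is a sketch, not the library's implementation: `PartStore` is a hypothetical class, "last one to finish" is simplified to "last call" because the model is single-threaded, and concatenating parts in ascending index order is an assumption about how the File is assembled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory model of a File's parts; not part of dxcpp.
class PartStore {
 public:
  // Stores data under the given part index. An index of -1 (the default)
  // is treated as part 1, matching the description in the text.
  void uploadPart(const std::string& data, int index = -1) {
    const int part = (index == -1) ? 1 : index;
    parts_[part] = data;  // a later upload to the same part replaces it
  }

  // Assumption: the File's contents are the parts in ascending index order.
  std::string assemble() const {
    std::string out;
    for (const auto& kv : parts_) {
      out += kv.second;
    }
    return out;
  }

  // Number of distinct part indices written so far.
  std::size_t partCount() const { return parts_.size(); }

 private:
  std::map<int, std::string> parts_;
};
```

The model makes the pitfall visible: two calls without an index both target part 1, so only the second survives. Passing explicit, distinct indices avoids this.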
void dx::DXFile::waitOnClose | ( | ) | const |
Waits until the remote File object is in the "closed" state.
void dx::DXFile::write | ( | const char * | ptr, | int64_t | n | )
Appends the n bytes stored at ptr to the remote File.
The data is written to an internal buffer that is uploaded to the remote file when full.
For increased throughput, this function uses multiple threads for uploading data in the background. It will block only if the internal buffer is full and all available workers (MAX_WRITE_THREADS threads) are already busy with HTTP requests. Otherwise, it returns immediately.
If any of the threads fails, std::terminate() is called.
ptr | Location of data to be written |
n | Number of bytes to write |
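The buffering behaviour of write() and flush() can be sketched with a single-threaded model. `BufferedWriter` is a hypothetical class: it omits the background upload threads entirely and records each full buffer as one "uploaded" part, which is an assumption about how buffers map to parts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single-threaded model of buffered writes; not part of dxcpp.
class BufferedWriter {
 public:
  // A zero size is bumped to 1 so that write() always makes progress.
  explicit BufferedWriter(std::size_t max_buffer_size)
      : max_(max_buffer_size == 0 ? 1 : max_buffer_size) {}

  // Appends n bytes to the buffer, emitting a part each time it fills up.
  void write(const char* ptr, std::size_t n) {
    while (n > 0) {
      const std::size_t room = max_ - buffer_.size();
      const std::size_t take = (n < room) ? n : room;
      buffer_.append(ptr, take);
      ptr += take;
      n -= take;
      if (buffer_.size() == max_) {
        flush();
      }
    }
  }

  // Convenience overload, mirroring the std::string form of write().
  void write(const std::string& data) { write(data.data(), data.size()); }

  // Emits whatever is buffered as a part. Calling it again with an empty
  // buffer does nothing, so the operation is idempotent.
  void flush() {
    if (buffer_.empty()) {
      return;
    }
    parts_.push_back(buffer_);
    buffer_.clear();
  }

  // Parts "uploaded" so far, in order.
  const std::vector<std::string>& parts() const { return parts_; }

 private:
  std::size_t max_;
  std::string buffer_;
  std::vector<std::string> parts_;
};
```

The tiny buffer sizes that make this model easy to follow would be rejected by the real setter, which requires at least 5242880 bytes.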
void dx::DXFile::write | ( | const std::string & | data | ) |
Appends the data in the specified string to the remote File.
Same functionality as write(const char*, int64_t).
data | String to write to the file |