dx::DXFile Class Reference

A remote file handler.

#include <dxfile.h>

Inherits dx::DXDataObject.

Public Member Functions

 DXFile (const DXFile &to_copy)
 
 DXFile (const char *dxid, const char *proj=NULL)
 
 DXFile (const std::string &dxid, const std::string &proj=config::CURRENT_PROJECT())
 
 DXFile (const dx::JSON &dxlink)
 
DXFile & operator= (const DXFile &to_copy)
 
void setIDs (const std::string &dxid, const std::string &proj=config::CURRENT_PROJECT())
 
void setIDs (const char *dxid, const char *proj=NULL)
 
void setIDs (const dx::JSON &dxlink)
 
void create (const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT))
 
int64_t getMaxBufferSize () const
 
void setMaxBufferSize (const int64_t buf_size)
 
int getNumWriteThreads () const
 
void setNumWriteThreads (const int numThreads)
 
void read (char *ptr, int64_t n)
 
int64_t gcount () const
 
bool eof () const
 
void seek (const int64_t pos)
 
void flush ()
 
void write (const char *ptr, int64_t n)
 
void write (const std::string &data)
 
void uploadPart (const std::string &data, const int index=-1)
 
void uploadPart (const char *ptr, int64_t n, const int index=-1)
 
bool is_open () const
 
bool is_closed () const
 
void close (const bool block=false)
 
void waitOnClose () const
 
void startLinearQuery (const int64_t start_byte=0, const int64_t num_bytes=-1, const int64_t chunk_size=10 *1024 *1024, const unsigned max_chunks=20, const unsigned thread_count=5) const
 
void stopLinearQuery () const
 
bool getNextChunk (std::string &chunk) const
 
DXFile clone (const std::string &dest_proj_id, const std::string &dest_folder="/") const
 
- Public Member Functions inherited from dx::DXDataObject
 DXDataObject (const DXDataObject &to_copy)
 
 DXDataObject (const std::string &dxid)
 
 DXDataObject (const std::string &dxid, const std::string &proj)
 
std::string getID () const
 
 operator std::string ()
 
std::string getProjectID () const
 
virtual void setIDs (const JSON &dxlink)
 
JSON describe (bool incl_properties=false, bool incl_details=false) const
 
void addTypes (const JSON &types) const
 
void removeTypes (const JSON &types) const
 
JSON getDetails () const
 
void setDetails (const JSON &details) const
 
void hide () const
 
void unhide () const
 
void rename (const std::string &name) const
 
void setProperties (const JSON &properties) const
 
JSON getProperties () const
 
void addTags (const JSON &tags) const
 
void removeTags (const JSON &tags) const
 
virtual void close () const
 
JSON listProjects () const
 
void move (const std::string &dest_folder) const
 
void remove ()
 

Static Public Member Functions

static DXFile openDXFile (const std::string &dxid)
 
static DXFile newDXFile (const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT))
 
static void downloadDXFile (const std::string &dxid, const std::string &filename, int64_t chunksize=1048576)
 
static DXFile uploadLocalFile (const std::string &filename, const std::string &media_type="", const dx::JSON &data_obj_fields=dx::JSON(dx::JSON_OBJECT), bool waitForClose=false)
 

Additional Inherited Members

- Protected Member Functions inherited from dx::DXDataObject
void waitOnState (const std::string &state="closed", const int timeout=std::numeric_limits< int >::max()) const
 
void clone_ (const std::string &dest_proj_id, const std::string &dest_folder) const
 
- Protected Attributes inherited from dx::DXDataObject
std::string dxid_
 
std::string proj_
 

Detailed Description

A remote file handler.

A File represents an opaque array of bytes (see the API specification for more info). DXFile supports multithreaded uploading and downloading for high performance.

When a File object is initialized, it is empty, in the "open" state, and writable. In order to support reliable upload of large files, File objects in the DNAnexus Platform may be written in multiple parts (possibly in parallel). After you have written all the data you like to the File, you may close it. The File goes into the "closing" state for finalization. Some time later, the File goes into the "closed" state and can be used for reading.
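
The life cycle described above can be pictured as a small state machine. The sketch below is a self-contained model, not the dxcpp implementation: `FileModel`, `finalize()`, `readAll()`, and `lifecycleDemo()` are hypothetical names introduced for illustration, and on the real platform finalization happens remotely at an unspecified later time.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical in-memory model of the File life cycle (not dx::DXFile itself).
enum class FileState { Open, Closing, Closed };

class FileModel {
 public:
  // Writing is only allowed while the File is "open".
  void write(const std::string &data) {
    if (state_ != FileState::Open) throw std::logic_error("File is not open for writing");
    bytes_ += data;
  }
  // close() starts finalization: "open" -> "closing".
  void close() {
    if (state_ == FileState::Open) state_ = FileState::Closing;
  }
  // Stand-in for the platform finishing finalization: "closing" -> "closed".
  void finalize() {
    if (state_ == FileState::Closing) state_ = FileState::Closed;
  }
  // Reading is only allowed once the File is "closed".
  std::string readAll() const {
    if (state_ != FileState::Closed) throw std::logic_error("File is not closed yet");
    return bytes_;
  }
  bool is_open() const { return state_ == FileState::Open; }
  bool is_closed() const { return state_ == FileState::Closed; }

 private:
  FileState state_ = FileState::Open;
  std::string bytes_;
};

// Walks one File through its whole life cycle and returns the bytes read back.
inline std::string lifecycleDemo() {
  FileModel f;
  assert(f.is_open());
  f.write("hello");
  f.close();     // now "closing": neither writable nor readable
  f.finalize();  // now "closed": readable
  assert(f.is_closed());
  return f.readAll();
}
```

In this model a read attempted while the File is still "closing" throws, which mirrors the requirement that reads wait for the "closed" state.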

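There are three important rules to remember:

- You can write a File object in the following ways, which you may not mix and match:
  - write(), which appends data through an internal buffer that is uploaded in the background.
  - uploadPart(), which uploads one explicitly indexed part per call.
- When you are finished writing data to the File, call close(). If you wish to wait until the File has been closed before proceeding, you can supply the block=true parameter to close(); call waitOnClose(); or poll the File's status yourself, using describe().
- To read files, do one of the following (these may all be used concurrently and operate independently of each other):
  - read(), together with seek(), eof(), and gcount().
  - startLinearQuery() and getNextChunk().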

Constructor & Destructor Documentation

dx::DXFile::DXFile ( const DXFile & to_copy)
inline

Copy constructor.

dx::DXFile::DXFile ( const char *  dxid,
const char *  proj = NULL 
)
inline

Creates a DXFile handler for the specified File object.

Parameters
dxid  File object ID.
proj  ID of the project in which to access the object (if NULL, then default workspace will be used).
dx::DXFile::DXFile ( const std::string &  dxid,
const std::string &  proj = config::CURRENT_PROJECT() 
)
inline

Creates a DXFile handler for the specified File object.

Parameters
dxid  File object ID.
proj  ID of the project in which the File should be accessed.
dx::DXFile::DXFile ( const dx::JSON &  dxlink)
inline

Creates a DXFile handler for the specified File object.

Parameters
dxlink  A JSON representing a DNAnexus link. You may also use the extended form: {"$dnanexus_link": {"project": proj-id, "id": obj-id}}.
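
To make the two link shapes concrete, the sketch below builds their textual form. `makeLink()` and `makeExtendedLink()` are hypothetical helpers, not part of dxcpp, and the simple form `{"$dnanexus_link": "file-xxxx"}` is assumed here from the general DNAnexus link convention rather than from this page. Real code would more likely build a dx::JSON value; plain strings are used only to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: textual form of a simple DNAnexus link.
// IDs are assumed to contain no characters that need JSON escaping.
inline std::string makeLink(const std::string &id) {
  assert(!id.empty());
  return "{\"$dnanexus_link\": \"" + id + "\"}";
}

// Hypothetical helper: textual form of the extended link, which also
// names the project in which the object should be accessed.
inline std::string makeExtendedLink(const std::string &proj, const std::string &id) {
  assert(!proj.empty() && !id.empty());
  return "{\"$dnanexus_link\": {\"project\": \"" + proj + "\", \"id\": \"" + id + "\"}}";
}
```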

Member Function Documentation

DXFile dx::DXFile::clone ( const std::string &  dest_proj_id,
const std::string &  dest_folder = "/" 
) const

Clones the associated object into the specified project and folder.

Parameters
dest_proj_id  ID of the project to which the object should be cloned.
dest_folder   Folder route in which to put it in the destination project.
Returns
New object handler with the associated project set to dest_proj_id.
void dx::DXFile::close ( const bool  block = false)

Calls flush() and issues a request to close the remote File.

See the /file-xxxx/close API method for more info.

Parameters
block  Boolean indicating whether the process should block until the remote file is in the "closed" state (true), or return immediately (false).
void dx::DXFile::create ( const std::string &  media_type = "",
const dx::JSON &  data_obj_fields = dx::JSON(dx::JSON_OBJECT) 
)

Creates a new remote file object and sets the object ID. Initially the object may be used for writing only.

Parameters
media_type       String representing the media type of the file.
data_obj_fields  JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method.
void dx::DXFile::downloadDXFile ( const std::string &  dxid,
const std::string &  filename,
int64_t  chunksize = 1048576 
)
static

Shorthand for downloading a remote File to a local file.

The File is downloaded using startLinearQuery() and getNextChunk() semantics. Multiple threads with concurrent HTTP requests are used to fetch the data for higher throughput.

Note
This should be called only after the remote File is in the "closed" state; otherwise, an error of type DXFileError will be thrown.
Parameters
dxid       Object handler or ID of the file to download.
filename   Local path for writing the downloaded data.
chunksize  Size, in bytes, of each chunk when downloading the file.
bool dx::DXFile::eof ( ) const

When reading a remote file using read(), returns whether the end of the file has been reached. Calling seek() to set the cursor to before the end of the file causes this flag to be unset.

Returns
Boolean: true if and only if the cursor is at the end of file.
void dx::DXFile::flush ( )

Ensures that all the data sent via previous write() calls has been flushed from the buffers and uploaded to the remote File. Finishes all pending uploads and terminates all write threads. This function blocks until the above has completed.

This function is idempotent.

Note
Since this function terminates the thread pool, use it sparingly (for example, only once you have finished all your write() requests, to force the data to be written).
See also
write(const char*, int64_t)
int64_t dx::DXFile::gcount ( ) const
Returns
The number of bytes read by the last call to read().
int64_t dx::DXFile::getMaxBufferSize ( ) const
inline

Returns the buffer size (in bytes) that must be reached before data is flushed.

Returns
Buffer size, in bytes.
bool dx::DXFile::getNextChunk ( std::string &  chunk) const

Obtains the next chunk of bytes after a call to startLinearQuery(). Returns false if startLinearQuery() was not called, or if all the requested chunks from the last call to startLinearQuery() have been exhausted.

Note
- The queries performed by this function and by startLinearQuery() will not update the eof() status.
- Calling seek() will not affect this function.
Parameters
chunk  If this function returns true, this string is populated with data from the next chunk. Otherwise, the string remains untouched.
Returns
true if another chunk is available for processing (in which case its data is copied into chunk). false if all chunks have been exhausted, or if no call to startLinearQuery() was made.
See also
startLinearQuery(), stopLinearQuery()
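
The usual consumption pattern is a loop that runs until getNextChunk() returns false. The sketch below writes that loop as a template so it can be exercised without the platform: `drainChunks()` and `FakeChunkSource` are hypothetical names, and the stand-in serves chunks from memory in a single thread. It is assumed, not verified here, that the same loop compiles against dx::DXFile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Generic drain loop for any handler offering startLinearQuery() and
// getNextChunk(std::string &) with the semantics documented above.
template <class Handler>
std::string drainChunks(Handler &file, int64_t start_byte, int64_t num_bytes,
                        int64_t chunk_size) {
  std::string out, chunk;
  file.startLinearQuery(start_byte, num_bytes, chunk_size);
  while (file.getNextChunk(chunk)) out += chunk;  // chunks arrive in order
  return out;
}

// Hypothetical in-memory stand-in, used only to exercise the loop.
class FakeChunkSource {
 public:
  explicit FakeChunkSource(std::string bytes) : bytes_(std::move(bytes)) {}

  void startLinearQuery(int64_t start_byte = 0, int64_t num_bytes = -1,
                        int64_t chunk_size = 10 * 1024 * 1024) {
    assert(start_byte >= 0 && chunk_size > 0);
    const int64_t size = static_cast<int64_t>(bytes_.size());
    pos_ = std::min(start_byte, size);
    end_ = (num_bytes < 0) ? size : std::min(size, pos_ + num_bytes);
    chunk_size_ = chunk_size;
    started_ = true;
  }

  // Returns false, leaving chunk untouched, when no query was started
  // or when all requested bytes have been handed out.
  bool getNextChunk(std::string &chunk) {
    if (!started_ || pos_ >= end_) return false;
    const int64_t len = std::min(chunk_size_, end_ - pos_);
    chunk = bytes_.substr(static_cast<std::size_t>(pos_), static_cast<std::size_t>(len));
    pos_ += len;
    return true;
  }

 private:
  std::string bytes_;
  int64_t pos_ = 0, end_ = 0, chunk_size_ = 1;
  bool started_ = false;
};
```
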
int dx::DXFile::getNumWriteThreads ( ) const
inline

Returns the maximum number of write threads used by the parallelized write() operation.

Returns
Number of threads
bool dx::DXFile::is_closed ( ) const
Returns
Boolean: true if and only if the remote file is in the "closed" state.
bool dx::DXFile::is_open ( ) const
Returns
Boolean: true if and only if the remote file is in the "open" state.
DXFile dx::DXFile::newDXFile ( const std::string &  media_type = "",
const dx::JSON &  data_obj_fields = dx::JSON(dx::JSON_OBJECT) 
)
static

Shorthand for creating a DXFile remote File handler for a new empty remote File. The newly initialized File is ready for writing.

Parameters
media_type       String representing the media type of the file.
data_obj_fields  JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method.
Returns
DXFile remote file handler for a new remote file.
DXFile dx::DXFile::openDXFile ( const std::string &  dxid)
static

Shorthand for creating a DXFile remote File handler with the given object id.

Parameters
dxid  Object ID of the file to open.
Returns
DXFile remote handler for the requested file object
DXFile& dx::DXFile::operator= ( const DXFile & to_copy)
inline

Assignment operator.

Note
Only the IDs of the file and project and the configuration parameters (maximum write threads, buffer size) are copied. No state information (such as the read pointer location or the next part ID to upload) is copied.
void dx::DXFile::read ( char *  ptr,
int64_t  n 
)

Reads the next n bytes in the remote file object (or all the bytes up to the end of file if there are fewer than n), and stores the downloaded data at ptr. After read() is called, eof() will return whether the end of the file was reached, and gcount() will return how many bytes were actually read.

Parameters
ptr  Location to which data should be written (the user must ensure that sufficient memory has been allocated, starting at ptr, before calling this function).
n    The maximum number of bytes to retrieve.
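
A typical way to combine read(), gcount(), and eof() is a fixed-size block loop. The sketch below is written as a template and exercised with a hypothetical in-memory `FakeReader`; `readToEnd()` is likewise an illustrative name. It assumes eof() becomes true once the cursor reaches the end of the data, as the eof() description states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Reads from the current cursor to the end of the file in blocks of
// block_size bytes, using only read(), gcount() and eof().
template <class Handler>
std::string readToEnd(Handler &file, int64_t block_size) {
  assert(block_size > 0);
  std::vector<char> buf(static_cast<std::size_t>(block_size));
  std::string out;
  while (!file.eof()) {
    file.read(buf.data(), block_size);
    if (file.gcount() == 0) break;  // defensive: nothing more was delivered
    out.append(buf.data(), static_cast<std::size_t>(file.gcount()));
  }
  return out;
}

// Hypothetical in-memory stand-in with the documented read semantics.
class FakeReader {
 public:
  explicit FakeReader(std::string bytes) : bytes_(std::move(bytes)) {}

  void read(char *ptr, int64_t n) {
    assert(n >= 0);
    last_ = std::min(n, size() - pos_);  // fewer than n near the end of file
    std::copy_n(bytes_.data() + pos_, last_, ptr);
    pos_ += last_;
    eof_ = (pos_ >= size());
  }
  int64_t gcount() const { return last_; }
  bool eof() const { return eof_; }
  // Moving the cursor to before the end of the file unsets the eof flag.
  void seek(int64_t pos) {
    pos_ = std::max<int64_t>(0, std::min(pos, size()));
    eof_ = (pos_ >= size());
  }

 private:
  int64_t size() const { return static_cast<int64_t>(bytes_.size()); }
  std::string bytes_;
  int64_t pos_ = 0, last_ = 0;
  bool eof_ = false;
};
```
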
void dx::DXFile::seek ( const int64_t  pos)

Changes the position of the reading cursor (for read()) to the specified byte offset.

Calling this function on a File that is not in the "closed" state will throw an object of class DXFileError.

Note
This function does not affect reading via startLinearQuery() or getNextChunk().
See also
read()
Exceptions
DXFileError
Parameters
pos  New byte position of the read cursor.
void dx::DXFile::setIDs ( const std::string &  dxid,
const std::string &  proj = config::CURRENT_PROJECT() 
)
virtual

Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.

Parameters
dxid  New File object ID.
proj  ID of the project in which to access the File.

Reimplemented from dx::DXDataObject.

void dx::DXFile::setIDs ( const char *  dxid,
const char *  proj = NULL 
)
virtual

Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.

Parameters
dxid  New File object ID.
proj  ID of the project in which to access the File (if NULL, then default workspace will be used).

Reimplemented from dx::DXDataObject.

void dx::DXFile::setIDs ( const dx::JSON &  dxlink)

Sets the remote File ID associated with this file handler. If the handler had data stored in its internal buffer to be written to the remote file, that data will be flushed.

Parameters
dxlink  A JSON representing a DNAnexus link. You may also use the extended form: {"$dnanexus_link": {"project": proj-id, "id": obj-id}}.
void dx::DXFile::setMaxBufferSize ( const int64_t  buf_size)
inline

Sets the buffer size (in bytes) that must be reached before data is flushed.

Parameters
buf_size  New buffer size, in bytes, to use (must be >= 5242880 (5 MB)).
Exceptions
DXFileError  if buf_size < 5242880
void dx::DXFile::setNumWriteThreads ( const int  numThreads)
inline

Sets the maximum number of threads used by the parallelized write() operation.

Parameters
numThreads  Number of threads.
void dx::DXFile::startLinearQuery ( const int64_t  start_byte = 0,
const int64_t  num_bytes = -1,
const int64_t  chunk_size = 10*1024*1024,
const unsigned  max_chunks = 20,
const unsigned  thread_count = 5 
) const

Starts fetching data in chunks (of the specified byte size) from the remote File in the background. After calling this function, getNextChunk() can be used to access the chunks in order.

Note
- Calling this function invalidates any previous call to the function (all previously started fetching of chunks is stopped).
- The queries performed by this function will not update the eof() status.
Parameters
start_byte    Starting byte offset (0-indexed) from which data will be fetched. Defaults to reading from the beginning of the file.
num_bytes     Total number of bytes to be fetched. If not specified, all data to the end of the file is read.
chunk_size    Number of bytes to be fetched in each chunk. (Each chunk will be this length, except possibly the last one, which may be shorter.)
max_chunks    Number of fetched chunks to be kept in memory at any time. Note that the number of real chunks in memory could be as high as (max_chunks + thread_count).
thread_count  Number of threads to be used for fetching data.
See also
stopLinearQuery(), getNextChunk()
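
The parameters above determine how a byte range is cut into chunks and how much memory the fetched chunks can occupy. The sketch below works through that arithmetic only. `ChunkRange`, `planChunks()`, and `peakChunkBytes()` are hypothetical names, and clamping a request that runs past the end of the file is an assumption rather than documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One chunk of the query: where it starts and how many bytes it holds.
struct ChunkRange {
  int64_t offset;
  int64_t length;
};

// Splits [start_byte, start_byte + num_bytes) into chunk_size pieces.
// num_bytes < 0 means "to the end of the file"; only the last chunk
// may be shorter than chunk_size.
inline std::vector<ChunkRange> planChunks(int64_t file_size, int64_t start_byte = 0,
                                         int64_t num_bytes = -1,
                                         int64_t chunk_size = 10 * 1024 * 1024) {
  assert(file_size >= 0 && start_byte >= 0 && chunk_size > 0);
  const int64_t begin = std::min(start_byte, file_size);
  const int64_t end = (num_bytes < 0) ? file_size : std::min(file_size, begin + num_bytes);
  std::vector<ChunkRange> ranges;
  for (int64_t off = begin; off < end; off += chunk_size)
    ranges.push_back({off, std::min(chunk_size, end - off)});
  return ranges;
}

// Upper bound on bytes held in fetched chunks: up to
// (max_chunks + thread_count) chunks may be in memory at once.
inline int64_t peakChunkBytes(int64_t chunk_size, unsigned max_chunks = 20,
                              unsigned thread_count = 5) {
  return chunk_size * (static_cast<int64_t>(max_chunks) + thread_count);
}
```

With the default arguments, the bound works out to 25 chunks of 10 MiB each, so lowering chunk_size or max_chunks is the direct way to reduce memory use.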
void dx::DXFile::stopLinearQuery ( ) const

Stops background fetching of all chunks. Terminates all read threads. Any previous call to startLinearQuery() is invalidated.

This function is idempotent.

See also
startLinearQuery(), getNextChunk()
DXFile dx::DXFile::uploadLocalFile ( const std::string &  filename,
const std::string &  media_type = "",
const dx::JSON &  data_obj_fields = dx::JSON(dx::JSON_OBJECT),
bool  waitForClose = false 
)
static

Shorthand for uploading a local file and closing it when done. Sets the name to be equal to the filename if no name is provided in data_obj_fields.

Parameters
filename         Local path for the file to upload.
media_type       String representing the media type of the file.
data_obj_fields  JSON hash containing the optional fields with which to create the object ("project", "types", "details", "hidden", "name", "properties", "tags"), as provided to the /file/new API method.
waitForClose     If set to true, then the function returns only after the uploaded file is in the "closed" state. Otherwise, it returns directly after initiating the file close (the uploaded file will be in the "closing" or "closed" state).
Returns
A remote File handler for the newly uploaded File.
void dx::DXFile::uploadPart ( const std::string &  data,
const int  index = -1 
)

Uploads data as a part. Same functionality as uploadPart(const char*, int64_t, const int).

Warning
Do not mix and match with write().
See also
uploadPart(const char*, int64_t, const int)
Parameters
data   String containing the data to append.
index  Number with which to label the uploaded part.
void dx::DXFile::uploadPart ( const char *  ptr,
int64_t  n,
const int  index = -1 
)

Uploads the n bytes stored at ptr as a part to the remote File. Blocks until the request is completed.

If there are multiple requests to write to the same part, the last one to finish "wins".

If index is not provided (that is, left at its default value of -1), part 1 is written, possibly overwriting data from other uploadPart() calls that do not specify an index.

Warning
Do not mix and match with write().
Parameters
ptr    Pointer to the location of data to be written.
n      The number of bytes to write.
index  Number with which to label the part of the file to be uploaded. If not specified, part 1 is written.
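
The indexing rules above are easy to get wrong, so here is a small model of them. `PartStore` and `assemble()` are hypothetical and run in a single thread, so "last one to finish" reduces to "last call". That the finished File is the parts concatenated in increasing index order, and that indices start at 1, are assumptions not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model of part-indexed uploads (not dx::DXFile itself).
class PartStore {
 public:
  // Mirrors the documented default: an index of -1 means "part 1".
  void uploadPart(const std::string &data, int index = -1) {
    const int part = (index == -1) ? 1 : index;
    assert(part >= 1);    // assumption: part indices start at 1
    parts_[part] = data;  // a later upload to the same index replaces the earlier one
  }

  // Assumption: the File content is the parts in increasing index order.
  std::string assemble() const {
    std::string out;
    for (const auto &p : parts_) out += p.second;
    return out;
  }

  std::size_t partCount() const { return parts_.size(); }

 private:
  std::map<int, std::string> parts_;  // ordered by part index
};
```

Two calls without an index both target part 1, so only the second payload survives; passing distinct indices avoids that.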
void dx::DXFile::waitOnClose ( ) const

Waits until the remote File object is in the "closed" state.

void dx::DXFile::write ( const char *  ptr,
int64_t  n 
)

Appends the data stored at ptr to the remote File.

The data is written to an internal buffer that is uploaded to the remote file when full.

For increased throughput, this function uses multiple threads for uploading data in the background. It will block only if the internal buffer is full and all available workers (MAX_WRITE_THREADS threads) are already busy with HTTP requests. Otherwise, it returns immediately.

If any of the threads fails then std::terminate() will be called.

Warning
Do not mix and match with uploadPart().
See also
flush()
Parameters
ptr  Location of data to be written.
n    Number of bytes to write.
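
The buffering described above can be sketched without threads or HTTP. In the model below, `BufferedWriterModel` and `parts()` are hypothetical, each "upload" is just an append to a vector, and splitting the data exactly at the buffer boundary is an assumption about behavior the page does not spell out. The model also accepts tiny buffer sizes for illustration, whereas setMaxBufferSize() requires at least 5242880 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical single-threaded model of buffered write() and flush().
class BufferedWriterModel {
 public:
  explicit BufferedWriterModel(int64_t max_buffer_size) : max_(max_buffer_size) {
    assert(max_ > 0);
  }

  // Appends n bytes; every time the buffer fills up, it is emitted as a part.
  void write(const char *ptr, int64_t n) {
    assert(n >= 0 && (ptr != nullptr || n == 0));
    while (n > 0) {
      const int64_t room = max_ - static_cast<int64_t>(buffer_.size());
      const int64_t take = std::min(room, n);
      buffer_.append(ptr, static_cast<std::size_t>(take));
      ptr += take;
      n -= take;
      if (static_cast<int64_t>(buffer_.size()) == max_) emitPart();
    }
  }

  void write(const std::string &data) {
    write(data.data(), static_cast<int64_t>(data.size()));
  }

  // Pushes out whatever is still buffered; calling it again is a no-op.
  void flush() {
    if (!buffer_.empty()) emitPart();
  }

  const std::vector<std::string> &parts() const { return parts_; }

 private:
  void emitPart() {
    parts_.push_back(buffer_);
    buffer_.clear();
  }

  int64_t max_;
  std::string buffer_;
  std::vector<std::string> parts_;
};
```

The trailing partial buffer is why flush() or close() must be called after the last write(): without it, the final bytes never leave the buffer.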
void dx::DXFile::write ( const std::string &  data)

Appends the data in the specified string to the remote File.

Same functionality as write(const char*, int64_t).

Warning
Do not mix and match with uploadPart().
See also
write(const char*, int64_t)
flush()
Parameters
data  String to write to the file.
