SIONlib 2.0 API changes
open functions
Current API
Some parts of the current API combine many functions into one. This makes it easy to write down advanced operations concisely, but arguably makes it harder than necessary for the user to get started with simple tasks. Take for example the following function:
int sion_paropen_mpi(
    const char *fname,
    const char *file_mode,
    int *numFiles,
    MPI_Comm gComm,
    const MPI_Comm *lComm,
    sion_int64 *chunksize,
    sion_int32 *fsblksize,
    int *globalrank,
    FILE **fileptr,
    char **newfname
);
For each of these 10 function arguments, the user has to find out:
- the argument intent:
  - some are in parameters,
  - some are out parameters,
  - some are both,
  - some are marked incorrectly;
- whether he has to supply a value or not:
  - some of the arguments are mandatory,
  - some are optional with different special values to signal the absence of a value (e.g. NULL vs -1);
- does it interact with other arguments:
  - some arguments have intricate relationships to one another, where they can only be used in certain combinations (e.g. numFiles and lComm).
Proposed Changes
In the branch DEV-BS-API2 I propose some structural changes to the API that I hope simplify the getting started scenario by deferring some of these questions to a later point during the development of a project that wants to make use of SIONlib. To that end, I have introduced a type sion_mpi_options (see https://trac.version.fz-juelich.de/SIONlib/browser/branches/DEV-BS-API2/src/parlib/sion_mpi.h#L94) that encapsulates the argument list of sion_paropen_mpi. It is initialized using a function that takes all arguments that are strictly necessary to open a file (except for rank, which I recently found out is an unnecessary out argument):
sion_mpi_options sion_mpi_options_new(
    const char *filename,
    sion_open_mode mode,
    MPI_Comm communicator,
    int rank // not strictly necessary, see ticket #153
); // ignoring rank, that is three parameters to fopen's two
All other members are initialised to sensible default values, i.e. fsblksize is detected from the file system, chunksize is chosen accordingly (a "correct" chunksize is no longer necessary with the hide-chunks changes) and the file_number is set to 1...
Setter functions are used to override default values, e.g. sion_mpi_options_set_chunksize or sion_mpi_options_set_multifile_number.
Once the user has modified the options object to his satisfaction, it is used as the sole argument of sion_mpi_paropen_from_options. This design is inspired by the Builder pattern (https://en.wikipedia.org/wiki/Builder_pattern).
Along with these changes to the open function, an exhaustive list of getter functions for file properties needs to be offered, to query things like newfname in cases where these are of interest to the user.
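The options-object design described above can be illustrated with a small mock. This is only a sketch under stated assumptions: the names demo_options, demo_comm and demo_open_mode are hypothetical stand-ins (demo_comm replaces MPI_Comm so that the code compiles without MPI), and the actual layout of sion_mpi_options in the branch may look different. It shows a constructor that takes only the mandatory arguments, one setter that overrides a default, and one getter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MPI_Comm so that the sketch compiles without MPI (assumption). */
typedef int demo_comm;

typedef enum { DEMO_MODE_READ, DEMO_MODE_WRITE } demo_open_mode;

/* Hypothetical options object; the real sion_mpi_options may look different. */
typedef struct {
    const char    *filename;
    demo_open_mode mode;
    demo_comm      communicator;
    int64_t        chunksize;   /* 0: let the library choose */
    int32_t        fsblksize;   /* 0: detect from the file system */
    int            file_number; /* number of physical files */
} demo_options;

/* Constructor: takes only the mandatory arguments and defaults the rest. */
demo_options demo_options_new(const char *filename, demo_open_mode mode,
                              demo_comm communicator)
{
    demo_options o;
    assert(filename != NULL);
    o.filename = filename;
    o.mode = mode;
    o.communicator = communicator;
    o.chunksize = 0;
    o.fsblksize = 0;
    o.file_number = 1;
    return o;
}

/* Setter: overrides one default; returns 0 on success, -1 on invalid input. */
int demo_options_set_chunksize(demo_options *o, int64_t chunksize)
{
    if (o == NULL || chunksize <= 0) {
        return -1;
    }
    o->chunksize = chunksize;
    return 0;
}

/* Getter: the caller queries a property instead of receiving an out argument. */
int64_t demo_options_get_chunksize(const demo_options *o)
{
    assert(o != NULL);
    return o->chunksize;
}
```

A caller who never touches the setters ends up with the defaults; a rejected setter call leaves the previous value in place.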
An Alternative
Alternatively, the responsibilities could be split slightly differently, encapsulating only the optional arguments in the sion_mpi_options type. The open function would then look like this:
int sion_paropen_mpi_with_options(
    const char *filename,
    sion_open_mode mode,
    MPI_Comm communicator,
    const sion_mpi_options *options
);
And the user can pass in NULL for options if he is fine with the defaults and thus does not have to come into contact with the sion_mpi_options type and associated functions until he needs them. There could also be an even simpler function to do just this:
int sion_paropen_mpi(const char *filename, sion_open_mode mode, MPI_Comm communicator) {
    return sion_paropen_mpi_with_options(filename, mode, communicator, NULL);
}
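How a NULL options pointer could be resolved to defaults inside the open function is sketched below. The names demo_optional, demo_open_with_options and demo_open are hypothetical, and the mock only returns the number of physical files it would use rather than opening anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle that holds only the optional settings. */
typedef struct {
    int64_t chunksize;   /* 0: let the library choose */
    int     file_number; /* number of physical files */
} demo_optional;

/* Mock open: returns the number of physical files it would use,
   or -1 if the supplied options are invalid. */
int demo_open_with_options(const char *filename, const demo_optional *options)
{
    const demo_optional defaults = { 0, 1 };
    /* NULL means "I am fine with the defaults". */
    const demo_optional *effective = (options != NULL) ? options : &defaults;

    assert(filename != NULL);
    if (effective->file_number < 1) {
        return -1;
    }
    return effective->file_number;
}

/* Convenience wrapper for callers who never need the options type. */
int demo_open(const char *filename)
{
    return demo_open_with_options(filename, NULL);
}
```

With this split, the defaults live in exactly one place, and the simple wrapper stays a one-line forwarder.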
Pro
All in all, these changes help answer or defer the questions mentioned earlier:
- Argument intent: When opening a file, all arguments have intent in. Dynamic properties of open files can later be queried by the user through getter functions (in case he is interested).
- Optional arguments: Mandatory arguments have to be provided. Optional arguments are hidden inside the sion_mpi_options type. Instead of having to find out what special value signals the absence of an argument, the user can simply ignore them.
- Argument interactions: The setter functions of the sion_mpi_options type can be used to enforce constraints between arguments, see sion_mpi_options_set_multifile_{number|communicator}.
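The last point, setters that enforce constraints between arguments, might look roughly like the following. The sketch assumes that an explicit file count and a local communicator are two mutually exclusive ways of choosing the multifile layout; that rule, as well as the names demo_multifile, demo_comm and DEMO_COMM_UNSET, is an assumption made for illustration and is not taken from the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MPI_Comm so that the sketch compiles without MPI (assumption). */
typedef int demo_comm;
#define DEMO_COMM_UNSET (-1)

typedef struct {
    int       multifile_number;       /* 0 when derived from the communicator */
    demo_comm multifile_communicator; /* DEMO_COMM_UNSET when not in use */
} demo_multifile;

/* Default: a single physical file, no local communicator. */
demo_multifile demo_multifile_default(void)
{
    demo_multifile m = { 1, DEMO_COMM_UNSET };
    return m;
}

/* Choosing an explicit file count discards a previously set communicator. */
int demo_set_multifile_number(demo_multifile *m, int number)
{
    assert(m != NULL);
    if (number < 1) {
        return -1;
    }
    m->multifile_number = number;
    m->multifile_communicator = DEMO_COMM_UNSET;
    return 0;
}

/* Choosing a communicator discards a previously set explicit file count. */
int demo_set_multifile_communicator(demo_multifile *m, demo_comm comm)
{
    assert(m != NULL);
    if (comm == DEMO_COMM_UNSET) {
        return -1;
    }
    m->multifile_communicator = comm;
    m->multifile_number = 0;
    return 0;
}
```

Because every change goes through a setter, the object can never hold an inconsistent combination, which is the property the single long argument list cannot guarantee.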
Contra
Although these changes can make it easier to get into working with SIONlib at first by hiding some of the complexity, they do in turn hurt the visibility of some advanced mechanisms. E.g., in the current API, the numFiles argument explicitly appears in the function signature of sion_paropen_mpi, alerting the user to the presence of this feature, whereas in the proposed API, a user can successfully open a file and use it without learning about the multifile mechanism and as a consequence miss potential performance improvements.
Open Questions
- Can this mechanism replace all arguments of open, especially those that are currently encoded in the mode string?
- Does this mechanism work for the other open flavors (serial, OpenMP, hybrid)?
- ...?
seek and file position in general
Current API
Currently, a location inside a SIONlib container is identified by a triple of (rank, blocknr, position_in_block).
int sion_seek(
    int sid,
    int rank,
    int blocknr,
    sion_int64 position_in_block
);
Once the continuous write work has been completed, the model of a logical file in a SIONlib container can be changed from a sequence of chunks each with individual fill amount to a sequence of n-1 chunks containing chunksize bytes and a last chunk containing the remaining size(logical file) - (n - 1) * chunksize bytes. In this picture the block numbers can be de-emphasized in favour of absolute offsets into the logical file (which are straightforward to calculate) or relative offsets from the current file pointer position. This would make SIONlib logical files and their API more similar to the naive task-local files they are trying to replace.
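The offset arithmetic implied by this model can be written down in a few lines. The sketch below assumes the n-1 full chunks plus one last chunk layout described above; the names demo_location, demo_to_absolute, demo_from_absolute, demo_chunk_count and demo_last_chunk_bytes are hypothetical helpers and not part of SIONlib.

```c
#include <assert.h>
#include <stdint.h>

/* A position expressed as (block number, offset inside that block). */
typedef struct {
    int64_t block;
    int64_t position_in_block;
} demo_location;

/* (block, position_in_block) -> absolute offset into the logical file. */
int64_t demo_to_absolute(demo_location loc, int64_t chunksize)
{
    assert(chunksize > 0);
    return loc.block * chunksize + loc.position_in_block;
}

/* Absolute offset -> (block, position_in_block). */
demo_location demo_from_absolute(int64_t offset, int64_t chunksize)
{
    demo_location loc;
    assert(chunksize > 0 && offset >= 0);
    loc.block = offset / chunksize;
    loc.position_in_block = offset % chunksize;
    return loc;
}

/* Number of chunks n needed for a logical file of `size` bytes. */
int64_t demo_chunk_count(int64_t size, int64_t chunksize)
{
    assert(chunksize > 0 && size >= 0);
    return (size + chunksize - 1) / chunksize;
}

/* Bytes in the last chunk: size - (n - 1) * chunksize, or 0 for an empty file. */
int64_t demo_last_chunk_bytes(int64_t size, int64_t chunksize)
{
    int64_t n = demo_chunk_count(size, chunksize);
    return (n == 0) ? 0 : size - (n - 1) * chunksize;
}
```

For example, a logical file of 250 bytes with a chunksize of 100 occupies three chunks, the last of which holds 50 bytes, and block 1 at position 50 corresponds to absolute offset 150.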
Proposed Changes
The names and documentation of the function arguments could be changed, renaming the blocknr argument to whence as found on conventional seek functions. More emphasis would be placed on the special values SION_ABSOLUTE_POS and SION_END_POS, as well as another special value SION_SEEK_RELATIVE that allows seeking relative to the current position, but across block boundaries. The position_in_block argument would be renamed to position.
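How a whence-style seek could resolve its target is sketched below. The enum values are hypothetical and do not reproduce SIONlib's constants, the end-relative case is assumed to behave like fseek with SEEK_END, and for simplicity the sketch rejects targets outside the range from 0 to the file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical whence values; they do not reproduce SIONlib's constants. */
typedef enum {
    DEMO_SEEK_ABSOLUTE, /* position counts from the start of the logical file */
    DEMO_SEEK_RELATIVE, /* position counts from the current file pointer */
    DEMO_SEEK_END       /* position counts from the end of the logical file */
} demo_whence;

/* Computes the absolute target offset.
   Returns 0 on success, -1 if the target lies outside [0, size];
   on failure *target is left unchanged. */
int demo_resolve_seek(int64_t current, int64_t size, demo_whence whence,
                      int64_t position, int64_t *target)
{
    int64_t base;
    assert(target != NULL);
    switch (whence) {
    case DEMO_SEEK_ABSOLUTE: base = 0;       break;
    case DEMO_SEEK_RELATIVE: base = current; break;
    case DEMO_SEEK_END:      base = size;    break;
    default:                 return -1;
    }
    if (base + position < 0 || base + position > size) {
        return -1;
    }
    *target = base + position;
    return 0;
}
```

With a chunksize of 100, a relative seek of 60 bytes from offset 150 lands at offset 210, that is, in the next block, without the caller having to mention block numbers at all.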
A step further
Alternatively, to further de-emphasize the concept of blocks, the option to seek by blocks could be removed from sion_seek altogether and be moved to a different function e.g. sion_seek_block. This simplifies sion_seek for use cases where block boundaries do not matter. Furthermore, this enables a more versatile family of seek by block operations, such as relative seek by block (e.g. seek to the beginning of the next block, after writing a header of metadata at the beginning of a file).
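A relative seek by block, such as the "beginning of the next block" case, reduces to a small calculation under the fixed-chunksize model. The helper demo_seek_block_relative below is a hypothetical illustration and says nothing about how sion_seek_block would actually be declared.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the absolute offset of the start of the block that lies `delta`
   blocks away from the block containing `current`, or -1 if that block
   would have a negative number. */
int64_t demo_seek_block_relative(int64_t current, int64_t chunksize,
                                 int64_t delta)
{
    int64_t block;
    assert(chunksize > 0 && current >= 0);
    block = current / chunksize + delta;
    if (block < 0) {
        return -1;
    }
    return block * chunksize;
}
```

After writing a 40 byte header with a chunksize of 100, a delta of 1 yields offset 100, the start of the next block, while a delta of 0 rewinds to the start of the current block.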
Yet further
The interface of sion_seek could further be simplified by removing the rank parameter (which, with the changes above, would make the interface as good as identical to a conventional seek function). This functionality could be moved to a separate function, e.g. sion_seek_logical_file. This simplifies common use cases where opening a SIONlib container means opening a single logical file inside of it, e.g. the (non-mapped) sion_paropen... functions or sion_open_rank, where seeking by rank is not possible anyway.
Open Questions
- Which alternative should we decide on?