John Mongan (jmongan-at-mccammon.ucsd.edu)
The file format described in this document was developed for storing data generated by molecular dynamics simulations. It was introduced in version 9 of the AMBER suite of programs (http://amber.scripps.edu).
The primary design goals of this format are:
Programs creating trajectory files (``creators'') shall adhere strictly to the requirements of this document. Programs reading trajectory files (``readers'') shall be as permissive as possible in applying the requirements of this document. Readers may emit warnings if out-of-spec files are encountered; these warnings should include information about the program that originally created the file (see Global attributes, section 4). Readers shall not fail to read a file unless the required information cannot be located or interpreted. In particular, to ensure forward compatibility with later extensions of the format, readers shall not fail or emit warnings if elements not described in this document are present in the file.
Those using NetCDF versions 4 or later should take care to ensure that files are read and written using this encoding, and not the HDF5 encoding.
This can be accomplished by setting a flag during file creation; refer to API docs for details.
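As a non-normative illustration, a reader can tell the NetCDF-3 encodings apart from the HDF5 encoding by inspecting the first bytes of a file. The Python sketch below relies on the standard file signatures (``CDF'' followed by a version byte for NetCDF-3, and the eight-byte HDF5 signature). The helper names are hypothetical and are not part of any NetCDF API, and the sketch assumes that the HDF5 signature sits at offset 0, which holds for files written without an HDF5 user block.

```python
# Leading bytes ("magic numbers") of the on-disk encodings.
NETCDF_CLASSIC = b"CDF\x01"            # NetCDF-3 classic format
NETCDF_64BIT_OFFSET = b"CDF\x02"       # NetCDF-3 with 64-bit offsets
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are HDF5 underneath


def detect_encoding(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(NETCDF_64BIT_OFFSET):
        return "netcdf3-64bit-offset"
    if header.startswith(NETCDF_CLASSIC):
        return "netcdf3-classic"
    if header.startswith(HDF5_SIGNATURE):
        return "hdf5"
    return "unknown"


def detect_file_encoding(path: str) -> str:
    """Read the first eight bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return detect_encoding(fh.read(8))
```

A reader might use the result to emit a warning, rather than fail, when a file turns out to be HDF5-encoded.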
Global attributes shall have type character string. Spelling and capitalization of attribute names shall be exactly as they appear below. Creators shall include all attributes marked required and may include attributes marked optional. Creators shall not write an attribute string having a length greater than 80 characters. Readers may warn about missing required attributes, but shall not fail, except in the case of a missing or unexpected Conventions or ConventionVersion attribute.
Contents of this attribute are a comma or space delimited list of tokens representing all of the conventions to which the file conforms. Creators shall include the string AMBER as one of the tokens in this list. In the usual case, where the file conforms only to this convention, the value of the attribute will simply be ``AMBER''. Readers may fail if this attribute is not present or none of the tokens in the list are AMBER. Optionally, if the reader does not expect NetCDF files other than those conforming to the AMBER convention, it may emit a warning and attempt to read the file even when the Conventions attribute is missing.
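The token test described above can be sketched as follows. This is a non-normative Python illustration; the function names are hypothetical, and the attribute value is assumed to have already been read into a string (or None when the attribute is missing).

```python
import re


def conventions_tokens(value):
    """Split a Conventions attribute on commas and/or whitespace."""
    return [tok for tok in re.split(r"[,\s]+", value.strip()) if tok]


def conforms_to_amber(value):
    """True if AMBER is one of the tokens; a missing attribute gives False."""
    if value is None:
        return False
    # The comparison is exact: the token must be spelled AMBER.
    return "AMBER" in conventions_tokens(value)
```

Matching whole tokens rather than substrings keeps a value such as ``AMBERRESTART'' from being mistaken for this convention.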
Contents are a string representation of the version number of this convention. Future revisions of this convention having the same version number may include definitions of additional variables, dimensions or attributes, but are guaranteed to have no incompatible changes to variables, dimensions or attributes specified in previous revisions. Creators shall set this attribute to ``1.0''. If this attribute is present and has a value other than ``1.0'', readers may fail or may emit a warning and continue. It is expected that the version of this convention will change rarely, if ever.
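A reader's handling of this attribute might look like the following non-normative Python sketch. The function name and the strict flag are hypothetical; stripping surrounding whitespace before comparing is an assumption made in the spirit of permissive reading, not a requirement of this document.

```python
def check_convention_version(value, strict=False):
    """Return 'ok', 'warn' or 'fail' for a ConventionVersion value.

    value is the attribute string, or None when the attribute is absent.
    strict selects between the two permitted reactions to an unexpected
    value: fail (True) or warn and continue (False).
    """
    if value is not None and value.strip() == "1.0":
        return "ok"
    return "fail" if strict else "warn"
```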
If the creator is part of a suite of programs or modules, this attribute shall be set to the name of the suite.
Creators shall set this attribute to the name of the creating program or module.
Creators shall set this attribute to the preferred textual formatting of the current version number of the creating program or module.
Creators may use this attribute to represent a user-defined title for the data represented in the file. Absence of a title may be indicated by omitting the attribute or by including it with an empty string value.
Coordinates along the frame dimension will generally represent data taken from different time steps, but may represent arbitrary conformation numbers when the trajectory file does not represent a true trajectory but rather a collection of conformations (e.g. from clustering).
This dimension represents the three spatial dimensions (X,Y,Z), in that order.
Coordinates along this dimension are the indices of particles for which data is stored in the file. The length of this dimension may differ from (and will generally be smaller than) the actual number of particles in the simulation if the user chooses to store data for only a subset of particles.
This dimension represents the three lengths (a,b,c) that define the size of the unit cell.
This dimension represents the three angles (alpha,beta,gamma) that define the shape of the unit cell.
This dimension is used for character strings in label variables where the label is longer than a single character. The length of this dimension is equal to the length of the longest label string.
Variables are described below as <type> <name>(<dimension> [,<dimension>..])
Note that the order of dimensions corresponds to the CDL and C APIs. When using the Fortran APIs, the order of dimensions should be reversed.
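The reason for the reversal is that C stores arrays in row-major order and Fortran in column-major order, so reversing the dimension list makes both address the same element in the file. The non-normative Python sketch below illustrates this with zero-based indices (Fortran code would normally use one-based indices); the function names are hypothetical.

```python
def c_offset(index, shape):
    """Row-major (C/CDL) flat offset: the last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def fortran_offset(index, shape):
    """Column-major (Fortran) flat offset: the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


# A variable declared as (frame, atom, spatial) in C/CDL order ...
c_shape = (10, 28, 3)
# ... is declared as (spatial, atom, frame) when using the Fortran API.
f_shape = tuple(reversed(c_shape))

# Frame 2, atom 5, spatial component 1 maps to the same element either way.
same = c_offset((2, 5, 1), c_shape) == fortran_offset((1, 5, 2), f_shape)
```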
Label variables shall be written by creators whenever their corresponding dimension is present. These variables are for self-description purposes, so readers may generally ignore them. Labels requiring more than one character per coordinate shall use the label dimension. Individual coordinate labels that are shorter than the length of the label dimension shall be space padded to the length of the label dimension.
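The space-padding rule can be sketched as follows. This non-normative Python illustration uses hypothetical function names and represents a two-dimensional character variable as a list of rows of single characters.

```python
def pad_labels(labels):
    """Space-pad labels to the longest one.

    Returns (label_length, rows), where each row is a list of single
    characters of length label_length.
    """
    width = max(len(s) for s in labels)
    rows = [list(s.ljust(width)) for s in labels]
    return width, rows


def unpad_label(row):
    """Reader side: join the characters and drop the trailing padding."""
    return "".join(row).rstrip(" ")


label_length, rows = pad_labels(["alpha", "beta", "gamma"])
```

Here label_length is 5, and the row for beta carries one trailing space.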
Creators shall write the string ``xyz'' to this variable, indicating the labels for coordinates along the spatial dimension.
Creators shall write the string ``abc'' to this variable, indicating the labels for the three lengths defining the size of the unit cell.
Creators shall write the strings ``alpha'', ``beta'', ``gamma'' to this variable, naming the angles defining the shape of the unit cell.
All data variables are optional. Some data variables have dependencies on other data variables, as described below. Creators shall define a units attribute of type character string for each variable as described below. Creators may define a scale_factor attribute of type float for each variable. Creators shall ensure that the units of data values, after being multiplied by the value of scale_factor (if it exists), are those described by the units attribute. If a scale_factor attribute exists for a variable, readers shall multiply data values by the value of the scale_factor attribute before interpreting the data. This scaling burden is placed on the reader rather than the creator, as writing data is expected to be a more time-sensitive operation than reading it.
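The reader-side scaling rule can be sketched as follows. This is a non-normative Python illustration: the function name is hypothetical, and the variable's attributes are assumed to have been read into a plain dictionary.

```python
def scaled_values(raw, attributes):
    """Apply scale_factor (if present) so values are in the stated units."""
    factor = attributes.get("scale_factor", 1.0)
    return [v * factor for v in raw]


# Hypothetical raw velocity components with a scale factor of 20.455;
# after scaling, the values are in the units named by the units attribute.
velocity_attrs = {"units": "angstrom/picosecond", "scale_factor": 20.455}
scaled = scaled_values([1.0, 2.0], velocity_attrs)

# Without a scale_factor attribute, values are used as stored.
unscaled = scaled_values([1.5, 2.5], {"units": "angstrom"})
```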
It is left as an implementation detail whether creators create a separate file for each variable grouping (e.g. coordinates and velocities) or a single file containing all variables. Some creators may allow the user to select the approach. Readers should support reading both styles, that is, combining data from multiple files or reading it all from a single file.
When coordinates on the frame dimension have a temporal sequence (e.g. they form a molecular dynamics trajectory), creators shall define this variable and write a float for each frame coordinate representing the simulated time value in picoseconds associated with the frame. Time zero is arbitrary, but typically will correspond to the start of the simulation. When the file stores a collection of conformations having no temporal sequence, creators shall omit this variable.
This variable shall contain the Cartesian coordinates of the specified particle for the specified frame.
When the coordinates variable is included and the data in the coordinates variable come from a simulation with periodic boundaries, creators shall include this variable. This variable shall represent the lengths (a,b,c) of the unit cell for each frame. The edge with length a lies along the x axis; the edge with length b lies in the x-y plane. The origin (point of invariance under scaling) of the unit cell is defined as (0,0,0). If the simulation has one or two dimensional periodicity, then the length(s) corresponding to spatial dimensions in which there is no periodicity shall be set to zero.
Creators shall include this variable if and only if they include the cell_lengths variable. This variable shall represent the angles (alpha,beta,gamma) defining the unit cell for each frame. alpha defines the angle between the b and c vectors, beta defines the angle between the a and c vectors and gamma defines the angle between the a and b vectors. Angles that are undefined due to less than three dimensional periodicity shall be set to zero.
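Taken together, the cell lengths and angles determine the three unit cell vectors. The non-normative Python sketch below uses the standard crystallographic construction, with the a vector along the x axis and the b vector in the x-y plane, alpha being the angle between b and c, beta the angle between a and c, and gamma the angle between a and b. The function name is hypothetical, and the sketch assumes full three dimensional periodicity (no zero lengths or angles).

```python
import math


def cell_vectors(lengths, angles_deg):
    """Return the (a, b, c) cell vectors as three (x, y, z) tuples.

    lengths is (a, b, c) in angstroms; angles_deg is (alpha, beta, gamma)
    in degrees. Assumes three dimensional periodicity.
    """
    a, b, c = lengths
    alpha, beta, gamma = (math.radians(x) for x in angles_deg)
    cos_a, cos_b, cos_g = math.cos(alpha), math.cos(beta), math.cos(gamma)
    sin_g = math.sin(gamma)
    # Components of the unit vector along c, fixed by its angles to a and b.
    cy = (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(max(0.0, 1.0 - cos_b ** 2 - cy ** 2))
    return (
        (a, 0.0, 0.0),
        (b * cos_g, b * sin_g, 0.0),
        (c * cos_b, c * cy, c * cz),
    )


# An orthorhombic cell gives vectors along the coordinate axes
# (up to floating-point rounding).
va, vb, vc = cell_vectors((10.0, 20.0, 30.0), (90.0, 90.0, 90.0))
```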
When the velocities variable is present, it shall represent the Cartesian components of the velocity for the specified particle and frame. It is recognized that, due to the nature of commonly used integrators in molecular dynamics, it may not be possible for the creator to write a set of velocities corresponding to exactly the same point in time as defined by the time variable and represented in the coordinates variable. In such cases, the creator shall write a set of velocities from the nearest point in time to that represented by the specified frame.
The following is an example of the CDL for a trajectory file conforming to the preceding specification and containing most of the elements described in this document. This CDL was generated using ncdump -h <trajectory file>.
netcdf mdtrj {
dimensions:
    frame = UNLIMITED ; // (10 currently)
    spatial = 3 ;
    atom = 28 ;
    cell_spatial = 3 ;
    cell_angular = 3 ;
    label = 5 ;
variables:
    char spatial(spatial) ;
    char cell_spatial(cell_spatial) ;
    char cell_angular(cell_angular, label) ;
    float time(frame) ;
        time:units = "picosecond" ;
    float coordinates(frame, atom, spatial) ;
        coordinates:units = "angstrom" ;
    double cell_lengths(frame, cell_spatial) ;
        cell_lengths:units = "angstrom" ;
    double cell_angles(frame, cell_angular) ;
        cell_angles:units = "degree" ;
    float velocities(frame, atom, spatial) ;
        velocities:units = "angstrom/picosecond" ;
        velocities:scale_factor = 20.455f ;

// global attributes:
        :title = "netCDF output test" ;
        :application = "AMBER" ;
        :program = "sander" ;
        :programVersion = "9.0" ;
        :Conventions = "AMBER" ;
        :ConventionVersion = "1.0" ;
}
Standards and formats are most useful when they are supported widely, and become less useful and more burdensome if they fragment into multiple dialects. If you plan to support additional variables, dimensions or attributes beyond those described here in a publicly released creator or reader program, please contact the author (jmongan@mccammon.ucsd.edu) for inclusion of these elements into a future revision of this document.