Tip: For information on the current HDF5 version, enter the following at the IDL prompt:
HELP, 'hdf5', /DLM
The Hierarchical Data Format (HDF) version 5 file format was designed for scientific data consisting of a hierarchy of datasets and attributes (or metadata). HDF is a product of the National Center for Supercomputing Applications (NCSA), which supplies the underlying C-language library; IDL provides access to this library via a set of procedures and functions contained in a dynamically loadable module (DLM).
IDL’s HDF5 routines all begin with the prefix "H5_" or "H5*_".
Programming Model
Hierarchical Data Format files are organized in a hierarchical structure. The two primary structures are:
- The HDF5 group: A grouping structure containing instances of zero or more groups or datasets, together with supporting metadata.
- The HDF5 dataset: A multidimensional array of data elements, together with supporting metadata.
HDF attributes are small named datasets that are attached to primary datasets, groups, or named datatypes.
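As a rough illustration of this hierarchy, the following sketch lists the members of the /images group in the hdf5_test.h5 sample file used by the examples in this section. It assumes the sample file is installed in the examples/data subdirectory of the IDL distribution; the variable names are illustrative only.

```idl
; Locate and open the sample file (assumed to ship with IDL).
file = FILEPATH('hdf5_test.h5', SUBDIRECTORY=['examples', 'data'])
file_id = H5F_OPEN(file)

; Count the objects (groups or datasets) contained in the /images group.
n = H5G_GET_NMEMBERS(file_id, '/images')

; Print the name of each member by its zero-based index.
FOR i = 0, n-1 DO PRINT, H5G_GET_MEMBER_NAME(file_id, '/images', i)

H5F_CLOSE, file_id
```

The names printed this way can be appended to the group path (for example '/images/Eskimo') and passed to H5D_OPEN.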
Code Examples
Reading an Image
The following example opens the hdf5_test.h5 file and reads in a sample image. The example assumes that you already know the dataset name, either from the output of the h5dump utility or from the H5G_GET_MEMBER_NAME function.
PRO ex_read_hdf5
; Locate the sample file in the IDL distribution.
file = FILEPATH('hdf5_test.h5', $
SUBDIRECTORY=['examples', 'data'])
; Open the file and read the image dataset.
file_id = H5F_OPEN(file)
dataset_id1 = H5D_OPEN(file_id, '/images/Eskimo')
image = H5D_READ(dataset_id1)
; Query the image dimensions through the dataset's dataspace.
dataspace_id = H5D_GET_SPACE(dataset_id1)
dimensions = H5S_GET_SIMPLE_EXTENT_DIMS(dataspace_id)
; Read the color palette that accompanies the image.
dataset_id2 = H5D_OPEN(file_id, '/images/Eskimo_palette')
palette = H5D_READ(dataset_id2)
; Release all HDF5 identifiers.
H5S_CLOSE, dataspace_id
H5D_CLOSE, dataset_id1
H5D_CLOSE, dataset_id2
H5F_CLOSE, file_id
; Display the image, using the palette as the color table.
DEVICE, DECOMPOSED=0
WINDOW, XSIZE=dimensions[0], YSIZE=dimensions[1]
TVLCT, palette[0,*], palette[1,*], palette[2,*]
TV, image, /ORDER
END
Reading a Subselection
The following example reads only a portion of the previous image, using the dataspace keywords to H5D_READ.
PRO ex_read_hdf5_select
file = FILEPATH('hdf5_test.h5', $
SUBDIRECTORY=['examples', 'data'])
file_id = H5F_OPEN(file)
dataset_id1 = H5D_OPEN(file_id, '/images/Eskimo')
dataspace_id = H5D_GET_SPACE(dataset_id1)
start = [100, 100]
count = [200, 200]
H5S_SELECT_HYPERSLAB, dataspace_id, start, count, $
STRIDE=[2, 2], /RESET
memory_space_id = H5S_CREATE_SIMPLE(count)
image = H5D_READ(dataset_id1, FILE_SPACE=dataspace_id, $
MEMORY_SPACE=memory_space_id)
dataset_id2 = H5D_OPEN(file_id, '/images/Eskimo_palette')
palette = H5D_READ(dataset_id2)
H5S_CLOSE, memory_space_id
H5S_CLOSE, dataspace_id
H5D_CLOSE, dataset_id1
H5D_CLOSE, dataset_id2
H5F_CLOSE, file_id
DEVICE, DECOMPOSED=0
WINDOW, XSIZE=count[0], YSIZE=count[1]
TVLCT, palette[0,*], palette[1,*], palette[2,*]
TV, image, /ORDER
END
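To see which part of the image the hyperslab in the example above covers, the arithmetic can be sketched on its own. This assumes the usual HDF5 hyperslab rule with a block size of one element, where the kth selected index along a dimension is start + k*stride; the variable names stride and last are introduced here for illustration.

```idl
; Same selection parameters as in the example above.
start = [100, 100]
count = [200, 200]
stride = [2, 2]

; Index of the last selected element along each dimension.
last = start + stride * (count - 1)

PRINT, 'First selected index: ', start
PRINT, 'Last selected index:  ', last
```

With these values the selection takes every second element from index 100 through index 498 in each dimension, so the dataset must extend at least that far, while the memory dataspace only needs to hold count elements per dimension.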
Creating a Data File
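The following example creates a simple HDF5 data file containing a single sample dataset. The file is created in the current working directory.
PRO ex_create_hdf5
; A relative file name places the file in the current working directory.
; (FILEPATH without ROOT_DIR would point into the IDL distribution instead.)
file = 'hdf5_out.h5'
fid = H5F_CREATE(file)
; Create some sample data.
data = hanning(100,150)
; Describe the type and the dimensions of the data.
datatype_id = H5T_IDL_CREATE(data)
dataspace_id = H5S_CREATE_SIMPLE(size(data,/DIMENSIONS))
; Create the dataset in the file and write the data to it.
dataset_id = H5D_CREATE(fid,$
'Sample data',datatype_id,dataspace_id)
H5D_WRITE,dataset_id,data
; Close all open identifiers.
H5D_CLOSE,dataset_id
H5S_CLOSE,dataspace_id
H5T_CLOSE,datatype_id
H5F_CLOSE,fid
END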
Reading Partial Datasets
To read a portion of a compound dataset or attribute, create a datatype that matches only the elements you wish to retrieve, and specify that datatype as the second argument to the H5D_READ function. The following example creates a simple HDF5 data file in the current directory, then opens the file and reads a portion of the data.
struct = {time:0.0, data:intarr(40)}
r = REPLICATE(struct,20)
r.time = RANDOMU(seed,20)*1000
r.data = INDGEN(40,20)
file = 'h5_test.h5'
fid = H5F_CREATE(file)
dt = H5T_IDL_CREATE(struct)
ds = H5S_CREATE_SIMPLE(N_ELEMENTS(r))
d = H5D_CREATE(fid, 'dataset', dt, ds)
H5D_WRITE, d, r
H5F_CLOSE, fid
fid = H5F_OPEN(file)
d = H5D_OPEN(fid, 'dataset')
struct = {data:intarr(40)}
dt = H5T_IDL_CREATE(struct)
result = H5D_READ(d, dt)
H5F_CLOSE, fid
The IDL HDF5 Library
The IDL HDF5 library consists of an almost direct mapping between the HDF5 library functions and the IDL functions and procedures. The relationship between the IDL routines and the HDF5 library is described in the following subsections.
Routine Names
The IDL routine names are typically identical to the HDF5 function names, with the exception that an underscore is added between the prefix and the actual function. For example, the C function H5get_libversion() is implemented by the IDL function H5_GET_LIBVERSION.
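As a small sketch of this naming rule, the routine mentioned above can be called directly from IDL. The variable name version is illustrative.

```idl
; H5get_libversion() in the C library corresponds to H5_GET_LIBVERSION in IDL.
version = H5_GET_LIBVERSION()
PRINT, 'HDF5 library version: ', version
```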
The IDL HDF5 library contains the following function categories:
Prefix | Category | Purpose
H5 | Library | General library tasks
H5A | Attribute | Manipulate attribute datasets
H5D | Dataset | Manipulate general datasets
H5F | File | Create, open, and close files
H5G | Group | Handle groups of other groups or datasets
H5I | Identifier | Query object identifiers
H5R | Reference | Reference identifiers
H5S | Dataspace | Handle dataspace dimensions and selection
H5T | Datatype | Handle dataset element information
Functions Versus Procedures
HDF5 functions that only return an error code are typically implemented as IDL procedures. An example is H5F_CLOSE, which takes a single file identifier number as the argument and closes the file. HDF5 functions that return values are implemented as IDL functions. An example is H5F_OPEN, which takes a filename as the argument and returns a file identifier number.
Error Handling
All HDF5 functions that return an error or status code are checked for failure. If an error occurs, the HDF5 error handling code is called to retrieve the internal HDF5 error message. This error message is printed to the output window, and program execution stops.
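If stopping execution is not desirable, one possible approach is to intercept the error with IDL's general CATCH mechanism. The following is a minimal sketch under the assumption that HDF5 failures are raised as ordinary IDL errors; the function name ex_h5_try_open is hypothetical and not part of the IDL HDF5 library.

```idl
; Hypothetical helper: returns a file identifier, or -1 if the open fails.
FUNCTION ex_h5_try_open, file
  COMPILE_OPT IDL2
  ; Install an error handler; err is nonzero when an error has been caught.
  CATCH, err
  IF err NE 0 THEN BEGIN
    CATCH, /CANCEL
    PRINT, 'Could not open file: ', !ERROR_STATE.MSG
    RETURN, -1L
  ENDIF
  ; This call raises an error if the file is missing or is not an HDF5 file.
  file_id = H5F_OPEN(file)
  CATCH, /CANCEL
  RETURN, file_id
END
```

A caller would then test the result, for example by checking whether ex_h5_try_open('no_such_file.h5') returns -1, before passing the identifier to other routines.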
Dimension Order
HDF5 uses C row-major ordering instead of IDL column-major ordering. For row major, the first listed dimension varies slowest, while for column major the first listed dimension varies fastest. The IDL HDF5 library handles this difference by automatically reversing the dimensions for all functions that accept lists of dimensions.
For example, an HDF5 file may be known to contain a dataset with dimensions [5][10][50], either as declared in the C code or as shown in the output of the h5dump utility. When this dataset is read into IDL, the IDL HELP procedure reports the array dimensions as [50, 10, 5].
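The reversal can be observed with a short sketch that writes an array and queries its dataspace. The file name h5_dims.h5 is a hypothetical output file created in the current working directory.

```idl
; An IDL array whose first (fastest varying) dimension is 50.
data = FLTARR(50, 10, 5)

fid = H5F_CREATE('h5_dims.h5')   ; hypothetical file name
dt = H5T_IDL_CREATE(data)
ds = H5S_CREATE_SIMPLE(SIZE(data, /DIMENSIONS))
d = H5D_CREATE(fid, 'dataset', dt, ds)
H5D_WRITE, d, data

; The dimensions come back in IDL order.
dims = H5S_GET_SIMPLE_EXTENT_DIMS(ds)
PRINT, dims

H5D_CLOSE, d
H5S_CLOSE, ds
H5T_CLOSE, dt
H5F_CLOSE, fid
```

Following the rule described above, a C program or the h5dump utility would be expected to report the same dataset with the dimensions in the opposite order.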
HDF5 Datatypes
In HDF5, a datatype is an object that describes the storage format of the individual data points of a dataset. There are two categories of datatypes, atomic and compound:
- Atomic datatypes cannot be decomposed into smaller units at the API level.
- Compound datatypes are a collection of one or more atomic types or small arrays of such types. Compound datatypes are similar to a struct in C or a common block in Fortran. See Compound Datatypes for additional details.
In addition, HDF5 uses the following terms for different datatype concepts:
- A named datatype is a datatype that is named and stored in a file. Naming is permanent; a datatype cannot be changed after being named. Named datatypes are created from in-memory datatypes using the H5T_COMMIT routine.
- An opaque datatype is a mechanism for describing data which cannot be otherwise described by HDF5. The only properties associated with opaque types are the size in bytes and an ASCII tag string. See Opaque Datatypes for additional details.
- An enumeration datatype is a one-to-one mapping between a set of symbols and an ordered set of integer values. The symbols are passed between IDL and the underlying HDF5 library as character strings. All the values for a particular enumeration datatype are of the same integer type. See Enumeration Datatypes for additional details.
- A variable length array datatype is a sequence of existing datatypes (atomic, variable length, or compound) which are not fixed in length from one dataset location to another. See Variable Length Array Datatypes for additional details.
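The named datatype concept in the list above can be sketched as follows. The file name h5_named_type.h5 and the datatype name sample_type are hypothetical, and the sketch assumes that H5T_COMMITTED reports whether a datatype has been committed to a file.

```idl
fid = H5F_CREATE('h5_named_type.h5')   ; hypothetical file name

; Create an in-memory compound datatype from an IDL structure.
dt = H5T_IDL_CREATE({time:0.0, data:INTARR(40)})

; Commit the datatype to the file under a name; it cannot be changed afterward.
H5T_COMMIT, fid, 'sample_type', dt

; Check whether the datatype is now a named datatype.
is_named = H5T_COMMITTED(dt)
PRINT, 'Committed: ', is_named

H5T_CLOSE, dt
H5F_CLOSE, fid
```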
Compound Datatypes
HDF5 compound datatypes can be compared to C structures, Fortran structures, or SQL records. Compound datatypes can be nested; there is no limitation to the complexity of a compound datatype. Each member of a compound datatype must have a descriptive name, which is the key used to uniquely identify the member within the compound datatype.
Use either the H5T_COMPOUND_CREATE routine or the H5T_IDL_CREATE routine to create compound datatypes. Use the following routines to work with compound datatypes:
Example
See H5F_CREATE for an extensive example using compound datatypes.
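For a shorter illustration, the following sketch creates a compound datatype from an IDL structure and lists its member names. It assumes that H5T_GET_NMEMBERS and H5T_GET_MEMBER_NAME accept compound datatypes, as the corresponding functions in the HDF5 C library do.

```idl
; An IDL structure with two fields becomes a compound datatype with two members.
struct = {time:0.0, data:INTARR(40)}
dt = H5T_IDL_CREATE(struct)

; Count the members and print the name that identifies each one.
n = H5T_GET_NMEMBERS(dt)
FOR i = 0, n-1 DO PRINT, i, '  ', H5T_GET_MEMBER_NAME(dt, i)

H5T_CLOSE, dt
```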
Opaque Datatypes
An opaque datatype contains a series of bytes. It always contains a single element, regardless of the length of the series of bytes it contains.
When an IDL variable is written to a dataset or attribute defined as an opaque datatype, it is written as a string of bytes with no demarcation. When data in an opaque datatype is read into an IDL variable, it is returned as a byte array. Use the FIX routine to convert the returned byte array to the appropriate IDL data type.
Use the H5T_IDL_CREATE routine with the OPAQUE keyword to create opaque datatypes. To create an opaque array, use an opaque datatype with the H5T_ARRAY_CREATE routine. A single string tag can be assigned to an opaque datatype to provide auxiliary information about what is contained therein. Create tags using the H5T_SET_TAG routine; retrieve tags using the H5T_GET_TAG routine. HDF5 limits the length of the tag to 255 characters.
Example
The following example creates an opaque datatype and stores within it a 20-element integer array.
file = 'h5_test.h5'
fid = H5F_CREATE(file)
data = INDGEN(20)
dt = H5T_IDL_CREATE(data, /OPAQUE)
ds = H5S_CREATE_SIMPLE(1)
d = H5D_CREATE(fid, 'dataset', dt, ds)
H5D_WRITE, d, data
H5F_CLOSE, fid
fid = H5F_OPEN(file)
d = H5D_OPEN(fid, 'dataset')
result = H5D_READ(d)
H5F_CLOSE, fid
HELP, result
IDL prints:
RESULT BYTE = Array[40]
Note that the result is a 40-element byte array, since each integer requires two bytes.
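The tagging and conversion steps described in this section can be sketched as follows. The tag text is illustrative, and the byte array is built with the BYTE function as a stand-in for the array returned by H5D_READ, so that the conversion can be shown without a file. The conversion assumes the bytes are in the native byte order of the machine.

```idl
data = INDGEN(20)

; Create an opaque datatype and attach a descriptive tag to it.
dt = H5T_IDL_CREATE(data, /OPAQUE)
H5T_SET_TAG, dt, '20 16-bit integers'   ; illustrative tag text
tag = H5T_GET_TAG(dt)
PRINT, tag
H5T_CLOSE, dt

; Stand-in for the 40-element byte array returned by H5D_READ.
result = BYTE(data, 0, 40)

; Reinterpret the bytes, starting at offset 0, as 20 integers.
ints = FIX(result, 0, 20)
HELP, ints
```

Because FIX is given an offset and a dimension, it treats the bytes as raw integer data instead of converting each byte to its own integer.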
Enumeration Datatypes
An enumeration datatype consists of a set of (Name, Value) pairs, where:
- Name is a scalar string that is unique within the datatype (a given name string can only be associated with a single value)
- Value is a scalar integer that is unique within the datatype
Note: Name/value pairs must be assigned to the datatype before it is used to create a dataset. The dataset stores the state of the datatype at the time the dataset is created; additional changes to the datatype will not be reflected in the dataset.
Create the enumeration datatype using the H5T_ENUM_CREATE function. Once you have created an enumeration datatype:
- use the H5T_ENUM_INSERT procedure to associate a single name/value pair with the datatype
- use the H5T_ENUM_VALUEOF function to retrieve the value associated with a single name
- use the H5T_ENUM_NAMEOF function to retrieve the name associated with a single value
These routines replicate the facilities provided by the underlying HDF5 library, which deals only with single name/value pairs. To make it easier to read and write entire enumerated lists, IDL provides two helper routines that package the name/value pairs in arrays of IDL_H5_ENUM structures, which have the following definition:
{IDL_H5_ENUM, NAME:'', VALUE:0}
The routines are:
- H5T_ENUM_SET_DATA associates multiple name/value pairs with an enumeration datatype in a single operation. Data can be provided either as a string array of names and an integer array of values or as a single array of IDL_H5_ENUM structures.
- H5T_ENUM_GET_DATA retrieves multiple name/value pairs from an enumeration datatype in a single operation. Data are returned in an array of IDL_H5_ENUM structures.
The H5T_ENUM_VALUES_TO_NAMES function is a helper routine that lets you retrieve the names associated with an array of values in a single operation.
The following routines may also be useful when working with enumeration datatypes:
H5T_GET_MEMBER_INDEX, H5T_GET_MEMBER_NAME, H5T_GET_MEMBER_VALUE
Example
The following example creates an enumeration datatype and saves it to a file. The example then reopens the file and reads the data, printing the names.
file = 'h5_test.h5'
fid = H5F_CREATE(file)
names = ['dog', 'pony', 'turtle', 'emu', 'wildebeest']
values = INDGEN(5)+1
dt = H5T_ENUM_CREATE()
H5T_ENUM_SET_DATA, dt, names, values
ds = H5S_CREATE_SIMPLE(N_ELEMENTS(values))
d = H5D_CREATE(fid, 'dataset', dt, ds)
H5D_WRITE, d, values
H5F_CLOSE, fid
fid = H5F_OPEN(file)
d = H5D_OPEN(fid, 'dataset')
dt = H5D_GET_TYPE(d)
result = H5D_READ(d)
H5F_CLOSE, fid
PRINT, H5T_ENUM_VALUEOF(dt, 'pony')
PRINT, H5T_ENUM_VALUES_TO_NAMES(dt, result)
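The single-pair routines described earlier in this section can be used in the same way. The following sketch builds a two-member enumeration without writing a file; the variable names are illustrative.

```idl
dt = H5T_ENUM_CREATE()

; Add name/value pairs one at a time.
H5T_ENUM_INSERT, dt, 'dog', 1
H5T_ENUM_INSERT, dt, 'pony', 2

; Look up the name that belongs to a single value.
name = H5T_ENUM_NAMEOF(dt, 2)
PRINT, name

; Retrieve all pairs at once as an array of IDL_H5_ENUM structures.
pairs = H5T_ENUM_GET_DATA(dt)
FOR i = 0, N_ELEMENTS(pairs)-1 DO PRINT, pairs[i].name, pairs[i].value

H5T_CLOSE, dt
```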
Variable Length Array Datatypes
HDF5 provides support for variable length arrays, but IDL itself does not. As a result, in order to store data in an HDF5 variable length array you must:
- Create a series of vectors of data in IDL, each with a potentially different length. All vectors must be of the same data type.
- Store a pointer to each data vector in the PDATA field of an IDL_H5_VLEN structure. The IDL_H5_VLEN structure is defined as follows:
{ IDL_H5_VLEN, pdata:PTR_NEW() }
- Create an array of IDL_H5_VLEN structures that will be stored as an HDF5 variable length array.
- Create a base HDF5 datatype from one of the data vectors.
- Create an HDF5 variable length datatype from the base datatype.
- Create an HDF5 dataspace of the appropriate size.
- Create an HDF5 dataset.
- Write the array of IDL_H5_VLEN structures to the HDF5 dataset.
Note: IDL string arrays are a special case: see Variable Length String Arrays for details.
Creating a variable length array datatype is a two-step process. First, you must create a base datatype using the H5T_IDL_CREATE function; all data in the variable length array must be of this datatype. Second, you create a variable length array datatype using the base datatype as an input to the H5T_VLEN_CREATE function.
Note: No explicit size is provided to the H5T_VLEN_CREATE function; sizes are determined as needed by the data being written.
Example: Writing a Variable Length Array
file = 'h5_test.h5'
fid = H5F_CREATE(file)
a = INDGEN(2)
b = INDGEN(3)
c = 3
sArray = REPLICATE({IDL_H5_VLEN},3)
sArray[0].pdata = PTR_NEW(a)
sArray[1].pdata = PTR_NEW(b)
sArray[2].pdata = PTR_NEW(c)
dt1 = H5T_IDL_CREATE(a)
dt = H5T_VLEN_CREATE(dt1)
ds = H5S_CREATE_SIMPLE(N_ELEMENTS(sArray))
d = H5D_CREATE(fid,'dataset', dt, ds)
H5D_WRITE, d, sArray
Examples: Reading a Variable Length Array
Using the H5D_READ function to read data written as a variable length array creates an array of IDL_H5_VLEN structures. The following examples show how to refer to individual data elements of various HDF5 datatypes.
Atomic HDF5 Datatypes
To read and access data stored in variable length arrays of atomic HDF5 datatypes, dereference the pointer stored in the PDATA field of the appropriate IDL_H5_VLEN structure. For example, to retrieve the variable b from the data written in the above example:
data = H5D_READ(d)
b = *data[1].pdata
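A complete round trip might look like the following sketch, which writes two vectors, reopens the file, and prints every vector that is read back. The file name h5_vlen_test.h5 is hypothetical. The sketch also frees the heap variables behind the pointers with PTR_FREE, on the assumption that the caller is responsible for releasing them.

```idl
; Write two integer vectors of different lengths.
fid = H5F_CREATE('h5_vlen_test.h5')   ; hypothetical file name
a = INDGEN(2)
b = INDGEN(3)
sArray = REPLICATE({IDL_H5_VLEN}, 2)
sArray[0].pdata = PTR_NEW(a)
sArray[1].pdata = PTR_NEW(b)
dt1 = H5T_IDL_CREATE(a)
dt = H5T_VLEN_CREATE(dt1)
ds = H5S_CREATE_SIMPLE(N_ELEMENTS(sArray))
d = H5D_CREATE(fid, 'dataset', dt, ds)
H5D_WRITE, d, sArray
H5D_CLOSE, d
H5S_CLOSE, ds
H5T_CLOSE, dt
H5T_CLOSE, dt1
H5F_CLOSE, fid
PTR_FREE, sArray.pdata

; Read the data back as an array of IDL_H5_VLEN structures.
fid = H5F_OPEN('h5_vlen_test.h5')
d = H5D_OPEN(fid, 'dataset')
data = H5D_READ(d)
H5D_CLOSE, d
H5F_CLOSE, fid

; Dereference each pointer to reach the stored vector.
n_read = N_ELEMENTS(data)
second = *data[1].pdata
FOR i = 0, n_read-1 DO PRINT, i, ':', *data[i].pdata
PTR_FREE, data.pdata
```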
Compound HDF5 Datatypes
If you have a variable length array of compound datatypes, the field named tag in the jth structure of the ith element of the variable length array would be accessed as follows:
data = H5D_READ(d)
a = (*data[i].pdata)[j].tag
Variable Length Arrays of Variable Length Arrays
If you have a variable length array of variable length arrays of integers, the kth integer of the jth element of a variable length array stored in the ith element of a variable length array would be accessed as follows:
data = H5D_READ(d)
a = (*(*data[i].pdata)[j].pdata)[k]
Compound Datatypes Containing Variable Length Arrays
If you have a compound datatype containing a variable length array, the kth data element of the jth variable length array in the ith compound datatype would be accessed as follows:
data = H5D_READ(d)
a = (*data[i].vl_array[j].pdata)[k]
Variable Length String Arrays
Because the data vectors referenced by the pointers stored in the PDATA field of the IDL_H5_VLEN structure must all have the same type and dimension, strings are handled as vectors of individual characters rather than as atomic units. This means that each element in a string array must be assigned to an individual IDL_H5_VLEN structure:
str = ['dog', 'dragon', 'duck']
sArray = REPLICATE({IDL_H5_VLEN},3)
sArray[0].pdata = ptr_new(str[0])
sArray[1].pdata = ptr_new(str[1])
sArray[2].pdata = ptr_new(str[2])
Use the H5T_STR_TO_VLEN function to assist in converting between an IDL string array and an HDF5 variable length string array. The following achieves the same result as the above five lines:
str = ['dog', 'dragon', 'duck']
sArray = H5T_STR_TO_VLEN(str)
Similarly, if you have an HDF5 variable length array containing string data, use the H5T_VLEN_TO_STR function to access the string data:
data = H5D_READ(d)
str = H5T_VLEN_TO_STR(data)
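The two helper functions can also be checked against each other without writing a file. The following sketch assumes that H5T_VLEN_TO_STR accepts the array of IDL_H5_VLEN structures produced by H5T_STR_TO_VLEN; the variable name back is illustrative.

```idl
str = ['dog', 'dragon', 'duck']

; Convert the string array to an array of IDL_H5_VLEN structures.
sArray = H5T_STR_TO_VLEN(str)

; Convert the structures back to an IDL string array.
back = H5T_VLEN_TO_STR(sArray)
PRINT, back

; Release the heap variables referenced by the structures.
PTR_FREE, sArray.pdata
```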