10211

Example of how to write large data to an HDF5 file piece by piece

Topic:

This article shows an example of how to write a large set of data to an HDF5 file by breaking it down into smaller pieces and writing it to the file piece by piece. 

Discussion:

The program "test_write_h5d" (shown below) creates an HDF5 file and generates a data array whose size is random*. The array is then broken into 20x40 data segments, and each segment is written to the file within a nested FOR loop, with the dataset extended as needed to hold each new segment.

NOTE*: The size is not entirely random. Each dimension is rounded down to a whole multiple of its step size (20 for the first dimension, 40 for the second).
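
The rounding described in the note can be sketched on its own, before the full program. The function name "round_down_to_step" is hypothetical and is not part of the original program; it only illustrates the integer-division arithmetic that the program applies to each dimension.

```idl
; Hypothetical helper (not part of the original program): trims a
; dimension down to the largest whole multiple of the step size.
function round_down_to_step, dim, step
  compile_opt idl2

  ; Convert to LONG so that the division is an integer division
  ; and any remainder is discarded
  nstep = long(dim) / long(step)

  return, nstep * long(step)
end
```

For example, round_down_to_step(437, 20) returns 420 and round_down_to_step(437, 40) returns 400. A dimension smaller than the step size is trimmed to 0, which is why the starting dimensions need to be at least one step wide.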

pro test_write_h5d
  compile_opt idl2

  ;Create a new HDF5 file
  file = 'mytest_h5_file.h5'
  fid = H5F_CREATE(file)

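  ;randomly choose the initial dimensions. SORT returns the
  ;indices that order the random values, so the first element
  ;is a random index between 0 and 999 (or 0 and 1999)
  random_number1 = sort(randomu(seed, 1000))
  random_number2 = sort(randomu(seed, 2000))
 
  ;make sure each dimension holds at least one full segment
  ;(20 and 40 are the step sizes used below)
  dim1 = random_number1[0] > 20
  dim2 = random_number2[0] > 40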
  
  ;create a large data array (recreated below with the trimmed dimensions)
  data = hanning(dim1,dim2)
  
  ;define the step size for 
  ;each dimension
  step1 = 20
  step2 = 40
  
  ;determine the number of steps
  ;needed to write the entire data
  ;set (integer division, so any
  ;remainder is dropped)
  nstep1 = dim1/step1
  nstep2 = dim2/step2
  
  ;trim the dimensions to a whole
  ;multiple of the step size
  dim1 = nstep1*step1
  dim2 = nstep2*step2
  
  ;create the large data array with the trimmed dimensions
  data = hanning(dim1,dim2)
  
  ; extract a small segment of the 
  ; data from the larger array
  data_segment = data[0:(step1-1),0:(step2-1)]

  ; create a datatype
  datatype_id = H5T_IDL_CREATE(data)


  ; create a dataspace the size of one segment and allow it to be
  ; extended (a max dimension of -1 leaves that dimension unlimited)
  dataspace_id = H5S_CREATE_SIMPLE([step1,step2],max_dimensions=[-1,-1])

  ; create the dataset
  dataset_id = H5D_CREATE(fid,'Hanning', datatype_id,dataspace_id, chunk_dimensions=[step1,step2])

  ; extend the size of the dataset to fit the first segment
  H5D_EXTEND,dataset_id,size(data_segment,/dimensions)

  ; write the first segment to the dataset
  H5D_WRITE,dataset_id,data_segment


  ;Now do the same thing with the rest of 
  ;the data in a piece by piece fashion
  for ind1 = 0L, nstep1-1 do begin

    for ind2 = 0L, nstep2-1 do begin
         
         ;if the file data space from the 
         ;previous iteration exists, close it
         if (isa(iter_data_space_id)) then begin
             H5S_CLOSE, iter_data_space_id
         endif 
         
         ;if the memory data space from the
         ;previous iteration exists, close it
         if (isa(iter_data_space_id2)) then begin
             H5S_CLOSE, iter_data_space_id2
         endif 
         
         ;Determine the indices of the data array
         ;at which the next segment starts 
         start1 = ind1 * step1
         start2 = ind2 * step2
      
         ;Define the data segment to be written 
         ;to the HDF file
         data_segment = data[start1:(start1+step1-1),start2:(start2+step2-1)]
         
         ;Extend the dataset so that it can hold the new segment. 
         ;The dimensions passed in are the new TOTAL number of
         ;elements in each dimension (not the change).  
         h5d_extend, dataset_id, [start1+step1, start2+step2]
         
         ;Get the file dataspace of the extended dataset
         iter_data_space_id = h5d_get_space(dataset_id)
         
         ;Select the hyperslab in the file that will receive the new data
         H5S_SELECT_HYPERSLAB, iter_data_space_id, [start1,start2], $
                               [step1,step2], /RESET

         
         ;Create the memory data space that describes the segment held in memory
         iter_data_space_id2 = h5s_create_simple([step1,step2])
         
         ;Write the data to the file using file data space 
         ;and memory data space generated in this loop  
         h5d_write, dataset_id, data_segment, $
                    FILE_SPACE_ID=iter_data_space_id,$
                    MEMORY_SPACE_ID=iter_data_space_id2
         
    endfor

  endfor

  ; close the remaining identifiers
  H5S_CLOSE, iter_data_space_id
  H5S_CLOSE, iter_data_space_id2
  H5S_CLOSE,dataspace_id
  H5D_CLOSE,dataset_id
  H5T_CLOSE,datatype_id
  H5F_CLOSE,fid

 
  help, data

  ;quickly read the data back from the file and plot it
  h5_list, file
  in_dat = H5_GETDATA(file, '/Hanning')
  s = surface(in_dat)
  
end
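
The same hyperslab selection used for writing can also be used to read a single segment back, which is a convenient way to check one piece without loading the whole dataset. The function below is a minimal sketch and is not part of the original program: the name "read_h5_segment" and its arguments are hypothetical, and the FILE_SPACE and MEMORY_SPACE keywords of H5D_READ should be checked against the documentation for your IDL version.

```idl
; Hypothetical helper (not part of the original program): reads one
; rectangular segment of a dataset using a hyperslab selection.
;   file         - name of the HDF5 file
;   dataset_name - name of the dataset, e.g. '/Hanning'
;   start        - starting index of the segment in each dimension
;   count        - number of elements to read in each dimension
function read_h5_segment, file, dataset_name, start, count
  compile_opt idl2

  ; Open the file and the dataset
  fid = H5F_OPEN(file)
  dataset_id = H5D_OPEN(fid, dataset_name)

  ; Select the requested slab in the file data space
  file_space_id = H5D_GET_SPACE(dataset_id)
  H5S_SELECT_HYPERSLAB, file_space_id, start, count, /RESET

  ; The memory data space describes the array that is returned
  memory_space_id = H5S_CREATE_SIMPLE(count)

  ; Read only the selected segment
  segment = H5D_READ(dataset_id, FILE_SPACE=file_space_id, $
                     MEMORY_SPACE=memory_space_id)

  ; Close the identifiers opened in this function
  H5S_CLOSE, memory_space_id
  H5S_CLOSE, file_space_id
  H5D_CLOSE, dataset_id
  H5F_CLOSE, fid

  return, segment
end
```

After running test_write_h5d, a call such as segment = read_h5_segment('mytest_h5_file.h5', '/Hanning', [20, 40], [20, 40]) should return the 20x40 segment that the loop wrote for ind1 = 1 and ind2 = 1, provided the randomly chosen dimensions were large enough to include it.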