Last Post 11 Sep 2006 09:08 AM by  anon
How to reduce the time to read data

anon



11 Sep 2006 09:08 AM
    Hello all: I have an HDF5 file containing 18306 groups; each group holds a two-dimensional array of size [5,500]. Thus I have 18306*500 (about 9 million) particles, and each particle carries 5 values. In my IDL program (http://163.23.210.1/~slchen/plot_ptcls.pro), I read them group by group and then combine them into a single two-dimensional array. It takes two hours to read the values from the HDF5 file, which is very time consuming. I would like to know whether there is any way to reduce the processing time. Please help me find the relevant information, such as whether I need to change my programming approach in IDL or whether clusterDL/mpiDL offers a solution. Thank you very much! Sincerely,

    Deleted User



    11 Sep 2006 09:08 AM
    In your program you have a loop that cycles 18,306 times executing this line:

        ptcls = [[ptcls],[ptcl]]

    That looks like a very inefficient approach to array-building. Every time it executes, it must allocate a new memory block, it must copy the array's current contents to the new memory location, and the operating system must eventually free the old memory location. The efficient way to provide memory for a large array is to allocate its memory space in the fewest possible calls, preferably just one. Another IDL trick is to reference the array's memory space via a pointer. Thus, what happens if you try code like this:

        nColumns = 5
        nRowsPerGroup = 500L   ; long, so that i * nRowsPerGroup cannot overflow a 16-bit integer
        nRows = nRowsPerGroup * n_grps
        ptcls_ptr = ptr_new(lonarr(nColumns, nRows))
        for i = 0L, n_grps-1 do begin
            ptcl = h5d_read(dataset_id)
            ; copy latest 'ptcl' starting at the memory address [0, i * nRowsPerGroup]
            (*ptcls_ptr)[0, i * nRowsPerGroup] = ptcl
        endfor

    This might be a little faster than the pointer-free algorithm:

        ptcls = lonarr(nColumns, nRows)
        for ...
            ptcls[*, (i * nRowsPerGroup):((i+1) * nRowsPerGroup - 1)] = ptcl

    because of some subscript expansion work that IDL does whenever it sees array subscripts addressed like this: [*, startIndex:endIndex].

    Anyhow, that is one thing. The other is: shouldn't you be adding a lot of HDF5 CLOSE-type calls to your code? It seems that every H5D_OPEN at the beginning of your 18,306-cycle FOR loop should have a matching H5D_CLOSE call at the end of the FOR loop, shouldn't it? And, eventually, shouldn't you have an H5F_CLOSE after you are done importing all the data?

    James Jones
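
The two suggestions in this reply (allocate the result once, and close every handle that is opened) could be combined roughly as in the sketch below. It is only an illustration under assumptions that the thread does not confirm: the procedure name `read_ptcls_sketch` is made up, every member of the root group is assumed to be a group, and the first member of each group is assumed to be its [5, 500] dataset. The original plot_ptcls.pro may be organized differently.

```idl
; Sketch only. Assumptions (not from the thread): every member of the root
; group is a group, and the first member of each group is its [5, 500] dataset.
pro read_ptcls_sketch, filename, ptcls
  compile_opt idl2            ; 32-bit default integers, [] for subscripts

  nColumns = 5
  nRowsPerGroup = 500

  file_id = h5f_open(filename)
  n_grps = h5g_get_nmembers(file_id, '/')

  ; One allocation for the whole result. LONARR follows the reply above;
  ; FLTARR or DBLARR may be needed if the file stores floating-point values.
  ptcls = lonarr(nColumns, nRowsPerGroup * n_grps)

  for i = 0, n_grps - 1 do begin
    grp_name  = h5g_get_member_name(file_id, '/', i)
    dset_name = h5g_get_member_name(file_id, grp_name, 0)

    dataset_id = h5d_open(file_id, grp_name + '/' + dset_name)
    ; Scalar-subscript assignment: the block is copied in starting at
    ; element [0, i * nRowsPerGroup] of the preallocated array.
    ptcls[0, i * nRowsPerGroup] = h5d_read(dataset_id)
    h5d_close, dataset_id     ; release the handle before the next iteration
  endfor

  h5f_close, file_id
end
```

It would be called as `read_ptcls_sketch, 'particles.h5', ptcls`, where the file name is a placeholder. Wrapping the loop in SYSTIME(/SECONDS) calls before and after is one simple way to check whether the change actually helps on a given file.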