/compress with openw and openr on large files failing

Jonathan Joseph
05 Feb 2019 11:31 AM
    I am working with multi-channel data and generating floating point products that are on the order of 15k x 5k pixels by 61 channels. I can write and read this data uncompressed using openw and openr without issue. The files are about 18GB. However, in each of those 61 channels (which contain about 75 million pixels each), only about a million pixels have real data, so these files compress dramatically with gzip (down to about 200MB).

    Unfortunately, the /compress keyword to openw and openr does not work properly for such large amounts of data. Nothing crashes and no error is reported - the write and read both complete happily - but the data stored in the file is compromised: it is all zeros after a certain point. I ran a few tests varying the number of channels, and the failure appears somewhere between an uncompressed data size of about 3GB and 6GB, so I assume the trigger is at 4GB = 2^32 bytes. Strangely, when it does fail, the correct data does not end at 4GB or 2GB, but always at ~1.3GB, after which everything is zeros. This is a little non-intuitive: the data size required to trigger the failure (over 4GB) is larger than the point in the file where the data goes bad (~1.3GB). I can write and read 3GB of data with no problem, but if I write and read 6GB of data I only get back about 1.3GB of actual data.
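    A minimal sketch of the failing round trip, using the dimensions of my 6GB test (the file name is made up):

        ; write ~6GB of floats through a compressing LUN
        data = findgen(14773, 5138, 20)
        openw, lun, 'test.gz', /compress, /get_lun
        writeu, lun, data
        free_lun, lun

        ; read it back the same way and compare
        copy = fltarr(14773, 5138, 20)
        openr, lun, 'test.gz', /compress, /get_lun
        readu, lun, copy
        free_lun, lun
        print, array_equal(data, copy)   ; prints 0 on 8.2.2 - the tail of copy is zeros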

    If I take a large uncompressed file that I can read/write with no issues and gzip it from the shell, reading the .gz file into IDL using openr, /compress also fails (it correctly reads only the first ~1.3GB of data and zeros after that). Conversely, if I take a .gz file created with openw, /compress from more than 4GB of data (I used 6GB in this test) and unzip it from the shell, gunzip does not complain, but the size of the unzipped file is wrong: it contains ~1.7GB of data instead of the full 6GB (with everything after ~1.3GB being zeros). Trying to read this file in IDL with openr then fails because IDL expects more data and hits the end of file.

    I can work around this by dealing with uncompressed files, but it would be nice if this worked properly.
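    For reference, the workaround looks roughly like this (file names made up) - do the compression and decompression from the shell instead of inside IDL:

        ; write uncompressed, then compress from the shell
        openw, lun, 'products.dat', /get_lun
        writeu, lun, data
        free_lun, lun
        spawn, 'gzip products.dat'

        ; decompress from the shell, then read uncompressed
        spawn, 'gunzip products.dat.gz'
        data = fltarr(14773, 5138, 61)
        openr, lun, 'products.dat', /get_lun
        readu, lun, data
        free_lun, lun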

    Thanks.

    -JJ

    Running IDL 8.2.2 on Ubuntu 12.04.5

Jonathan Joseph
06 Feb 2019 07:38 AM
    Upon further investigation, I suspect an internal error in the IDL routine that reads/writes compressed data, where a byte count is being computed as a 32-bit long integer instead of a 64-bit long integer. When writing a data array of 14773x5138x20 floating point values, the actual size should be 6072293920 bytes. Truncated to a 32-bit long integer, this becomes 1777326624, which is exactly (to the byte) the amount of data being saved (as shown by the size of the file when gunzipped from the shell, and by the number of actual non-zero values read back in with openr, /compress). Likewise, in the 61-channel case, 14773x5138x61 floating point values should be 18520496456 bytes; instead I am seeing 1340627272, which is what you get when doing the arithmetic in LONG instead of LONG64.
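    The wraparound is easy to reproduce at the IDL prompt, since 32-bit LONG arithmetic overflows silently:

        IDL> print, 14773LL * 5138LL * 20LL * 4LL   ; true byte count in LONG64
                6072293920
        IDL> print, 14773L * 5138L * 20L * 4L       ; same arithmetic in LONG
          1777326624
        IDL> print, 14773LL * 5138LL * 61LL * 4LL
               18520496456
        IDL> print, 14773L * 5138L * 61L * 4L
          1340627272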

    This is a 64-bit IDL install on a 64-bit system, so this looks like an IDL bug. I don't see any restriction on /compress documented for openr/openu/openw.

    > idl
    IDL Version 8.2.2 (linux x86_64 m64). (c) 2012, Exelis Visual Information Solutions, Inc.

    IDL> print, !version.memory_bits
    64
    IDL> print, !version.file_offset_bits
    64

MariM
06 Feb 2019 08:00 AM
    Hi Jonathan,
    Do you have access to IDL 8.5 or newer? I think the bug you are running into was fixed in that version. I found this in our bug database:

    IDL-69378: Numerical arrays of size >= 4GB saved to compressed IDL data Save file appear to be corrupted
    Fix: Process data in 4GB chunks to avoid 32-bit integer limits within the ZLIB library.
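    If upgrading is not an option right away, one thing that may be worth trying (a sketch only - I have not tested it against this particular bug) is to keep each individual transfer well under 4GB, for example writing one channel at a time:

        ; untested sketch: data is assumed to be FLOAT [nx, ny, nchan]
        openw, lun, 'products.gz', /compress, /get_lun
        for c = 0, nchan - 1 do writeu, lun, data[*, *, c]
        free_lun, lun

    Reading would need to be chunked the same way.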