Last Post 19 Apr 2012 01:05 PM by  anon
read_csv

anon
19 Apr 2012 01:05 PM
    Hi: I'm using IDL 8.0.1 (!VERSION: { x86_64 darwin unix Mac OS X 8.0.1 Oct 5 2010 64 64}).

    I tried the read_csv function

        res=read_csv(file,header=header,count=nnn)

    on a CSV file with one header line and columns like:

        18 Apr 2012 18:00:00.000,33.436869,-26.123210,396.398016,-0.025948,6.358175,4.300346

    The first column, which is a date/time string, gets cast as DOUBLE (!!!):

        IDL> help,res.field1[0]
        DOUBLE = 18.000000

    I looked at the read_csv code, and it seems like the type-detection logic could be cleaned up a bit. I'd say:

    - a column where all the values consist only of digits, with an optional leading + or -, is integer;
    - floats could look like [+-][ddddd].[dddddd][E|D][+-][ddddd];
    - if there are any characters other than digits and . + - D E, the column is string.

    This seems like a good use of regular expressions.
    -M
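    For illustration, the rule proposed above can be tried outside IDL. The following is a minimal Python sketch, not READ_CSV's code: `column_type`, `INT_RE` and `FLOAT_RE` are hypothetical names, and reading the bracketed parts of [+-][ddddd].[dddddd][E|D][+-][ddddd] as optional (with at least one mantissa digit required) is an assumption. Integer is tested first because every integer also matches the float pattern.

```python
import re

# Hypothetical helper, not READ_CSV's actual logic: classify one CSV column
# using the rule proposed in the post.
INT_RE = re.compile(r"[+-]?[0-9]+")  # optional sign, then digits only
# Optional sign; mantissa "ddd", "ddd.", "ddd.ddd" or ".ddd"; optional
# exponent introduced by E or D (Fortran/IDL double-precision style).
FLOAT_RE = re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eEdD][+-]?[0-9]+)?")


def column_type(values):
    """Return 'int', 'float' or 'string' for a list of raw field strings."""
    fields = [v.strip() for v in values]
    if all(INT_RE.fullmatch(f) for f in fields):
        return "int"
    if all(FLOAT_RE.fullmatch(f) for f in fields):
        return "float"
    # Anything else (letters, inner spaces, colons, empty fields, ...) is a string.
    return "string"


# Each field of the example row, treated as a one-row column.
row = "18 Apr 2012 18:00:00.000,33.436869,-26.123210,396.398016,-0.025948,6.358175,4.300346"
print([column_type([f]) for f in row.split(",")])
# ['string', 'float', 'float', 'float', 'float', 'float', 'float']
```

    Under this rule the date/time column can never become DOUBLE, because the embedded spaces and colons fail both numeric patterns.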

    Deleted User
    23 Apr 2012 08:58 AM
    It seems to me that if there are "characters not used in representing numbers", the column can be called a string; likewise, if there are no numerals, it should also automatically be a string. First define:

        like_not_num='[^-+.0123456789dDeE]' ; MJM [^....] means "not this list"; '-' goes first so it is literal, and upper-case D/E are allowed
        like_int='(^[+-]?[0123456789]+$)' ; "^" means something different here than above; here it anchors the start of the expression
        like_real='^([+-]?[0123456789]*)(\.)([0123456789]*)([eEdD]?)([+-]?[0-9]*)$'
        like_real1='^([+-]?[0123456789]*)(\.)([0123456789]*)$' ; non-exponents
        like_real2='^([+-]?[0123456789]+)(\.)([0123456789]*)([eEdD]{1})([+-]?[0-9]+)$' ; exponents, digits before the decimal point: 5.e5 and 5.5e5
        like_real3='^([+-]?[0123456789]+)([eEdD]{1})([+-]?[0-9]+)$' ; exponents, no decimal point, e.g. 5e-5
        like_real4='^([+-]?[0123456789]*)(\.)([0123456789]+)([eEdD]{1})([+-]?[0-9]+)$' ; exponents, digits after the decimal point: .5e5 and 5.5e5

    Then, in the section of read_csv.pro where the type determination is done, insert this:

        ; MJM test to see if there are any non-numerical chars, indicating the column must be string
        test_m=stregex(strtrim(subdata,2),like_not_num)
        if (max(test_m) ne -1) then continue ; this col must be string since at least one row has non-numeric chars
        test_n=stregex(strtrim(subdata,2),'[0-9]+') ; require at least one numeral in a non-string
        ; max(test_n) eq -1 means ALL rows in col have no digits
        ; min(test_n) eq -1 means at least one row in col has no digits
        if (min(test_n) eq -1) then continue ; no numbers, must be string (guards against combos of +-.de)
        ; end test
        ; OK, at this point there is at least one numeral in each row of the sub-column under consideration
        test_i=stregex(strtrim(subdata,2),like_int) ; if all >=0, consistent with integers
        test_r1=stregex(strtrim(subdata,2),like_real1) ; 5.5
        test_r2=stregex(strtrim(subdata,2),like_real2) ; 5.e5 and 5.5e5
        test_r3=stregex(strtrim(subdata,2),like_real3) ; 5e5
        test_r4=stregex(strtrim(subdata,2),like_real4) ; .5e5 and 5.5e5
        test_num=test_r1+test_r2+test_r3+test_r4+test_i
        if (min(test_num) eq -5) then continue ; all five patterns failed on at least one row, so it cannot be a valid number: col is string
        ;;;;; END MJM tests
    -M
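    To check the cascade outside IDL, here is a rough Python transliteration (a sketch, not a patch to read_csv.pro). The `stregex` helper is hypothetical and only mimics IDL's STREGEX by returning the match position or -1 per element; the patterns are the ones from the post, with [0-9] written for [0123456789]. It also keeps the not-a-number class exactly as first posted, '[^.+-0123456789de]', as `LIKE_NOT_NUM_ORIG`: inside a bracket expression `+-0` is a character range from '+' to '0' (so ',' and '/' count as numeric), and only lower-case d/e are listed, so a value such as 5.5E5 is flagged as non-numeric. `LIKE_NOT_NUM` puts '-' first so it is literal and adds D/E.

```python
import re

# Patterns transliterated from the post. IDL STREGEX uses POSIX extended
# regexes; these particular patterns behave the same under Python's re.
LIKE_NOT_NUM_ORIG = r"[^.+-0123456789de]"  # as first posted: '+-0' is a RANGE, no upper-case D/E
LIKE_NOT_NUM = r"[^-+.0-9dDeE]"             # '-' first so it is literal; D/E in both cases
LIKE_INT = r"(^[+-]?[0-9]+$)"
LIKE_REAL1 = r"^([+-]?[0-9]*)(\.)([0-9]*)$"                                # 5.5
LIKE_REAL2 = r"^([+-]?[0-9]+)(\.)([0-9]*)([eEdD]{1})([+-]?[0-9]+)$"       # 5.e5, 5.5e5
LIKE_REAL3 = r"^([+-]?[0-9]+)([eEdD]{1})([+-]?[0-9]+)$"                   # 5e-5
LIKE_REAL4 = r"^([+-]?[0-9]*)(\.)([0-9]+)([eEdD]{1})([+-]?[0-9]+)$"       # .5e5, 5.5e5


def stregex(values, pattern):
    """Hypothetical stand-in for IDL STREGEX on an array: match position, or -1."""
    out = []
    for v in values:
        m = re.search(pattern, v)
        out.append(m.start() if m else -1)
    return out


def column_is_string(subdata, not_num=LIKE_NOT_NUM):
    """Apply the post's three tests to a non-empty column; True means 'string'."""
    data = [s.strip() for s in subdata]  # strtrim(subdata, 2)
    # Test 1: any row containing a non-numeric character -> string.
    if max(stregex(data, not_num)) != -1:
        return True
    # Test 2: any row with no digit at all (e.g. '+', '.', 'e') -> string.
    if min(stregex(data, r"[0-9]+")) == -1:
        return True
    # Test 3: anchored patterns give 0 on a match and -1 otherwise,
    # so a per-row sum of -5 means no pattern matched that row.
    tests = [stregex(data, p)
             for p in (LIKE_INT, LIKE_REAL1, LIKE_REAL2, LIKE_REAL3, LIKE_REAL4)]
    test_num = [sum(row) for row in zip(*tests)]
    return min(test_num) == -5


print(column_is_string(["33.436869", "-26.123210"]))                       # False
print(column_is_string(["5.5E5", "1.0D-3"]))                               # False
print(column_is_string(["5.5E5", "1.0D-3"], not_num=LIKE_NOT_NUM_ORIG))    # True
print(column_is_string(["18 Apr 2012 18:00:00.000"]))                     # True
```

    Because each pattern is anchored with ^ and $, a match always starts at position 0, so the per-row sum reaches -5 only when no pattern matched; a single such row is enough to make the whole column a string.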