Tip: See also the IDL_String.Extract and IDL_String.Split methods, which provide similar functionality but with an object-oriented interface.
The STRSPLIT function splits its input String argument into separate substrings, according to the specified delimiter or regular expression. By default, an array of the positions of the substrings is returned. The EXTRACT keyword can be used to cause STRSPLIT to return an array containing the substrings instead.
STRSPLIT can also be used to split an array of strings. In this case a LIST is returned with the results.
Example
To split a string on spans of whitespace and replace them with hyphens:
Str = 'STRSPLIT chops up strings.'
print, STRJOIN(STRSPLIT(Str, /EXTRACT), '-')
IDL prints:
STRSPLIT-chops-up-strings.
Syntax
Result = STRSPLIT( String [, Pattern] [, COUNT=variable] [, ESCAPE=string | , /REGEX [, /FOLD_CASE]] [, /EXTRACT | , LENGTH=variable] [, /PRESERVE_NULL] )
Return Value
If String is a scalar string or 1-element string array, then STRSPLIT returns an array containing either the positions of the substrings or the substrings themselves (if the EXTRACT keyword is specified).
If String is an array of strings, then STRSPLIT returns a variable of type LIST, where each element of the list contains the result of calling STRSPLIT on a particular element of String.
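The return forms described above can be sketched as follows. This is a minimal illustration; the variable names (str, pos, lst) are chosen here for the example and are not part of the API.

```idl
; Scalar input, no EXTRACT: the result is an array of character offsets.
str = 'one two three'
pos = STRSPLIT(str)
PRINT, pos                                     ; offsets 0, 4 and 8

; Same input with EXTRACT: the result is an array of substrings.
PRINT, STRJOIN(STRSPLIT(str, /EXTRACT), '|')   ; one|two|three

; String-array input: the result is a LIST with one entry per input string.
lst = STRSPLIT(['a b', 'c d e'], /EXTRACT)
HELP, lst
```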
Arguments
String
A scalar string or a string array to be split into substrings.
Pattern
A scalar string or a string array that can contain one of two types of information:
- One or more single characters, each of which is considered to be a separator. String will be split when any of the characters is detected. For example, if Pattern is " ,", String will be split whenever either a space or a comma is detected. In this case, IDL performs a simple string search for the specified characters. This method is simple and fast.
- If the REGEX keyword is specified, Pattern is considered to be a single regular expression (as implemented by the STREGEX function). This method is slower and more complex, but can handle extremely complicated Pattern strings.
In either case, if the EXTRACT keyword is specified, the separator characters are not included in the result.
Note: Pattern is an optional argument. If it is not specified, STRSPLIT defaults to splitting on spans of whitespace (space or tab characters) in String.
Note: If String is a string array, and Pattern is a scalar, then the same pattern is used for each string. If Pattern is an array, then it must have the same number of elements as String. In this case each element in Pattern is matched up with its corresponding element in String.
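As a short sketch of the simple character-matching mode and of pairing a Pattern array with a String array (the input strings and variable names here are made up for illustration):

```idl
; Simple mode: each character in Pattern is a separate single-character separator.
PRINT, STRJOIN(STRSPLIT('a, b,c d', ' ,', /EXTRACT), '|')   ; a|b|c|d

; Array input with a Pattern array of the same size:
; each string is split using its own corresponding pattern.
strs = ['a,b,c', 'd;e;f']
pats = [',', ';']
result = STRSPLIT(strs, pats, /EXTRACT)
FOREACH item, result DO PRINT, STRJOIN(item, '|')
; a|b|c
; d|e|f
```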
Keywords
COUNT
Set this keyword to a named variable that will contain the number of matched substrings returned by STRSPLIT. This value will be 0 if either of the String or Pattern arguments is null. Otherwise, it will contain the number of elements in the Result array. If String is an array, then COUNT will be an array with the same number of elements.
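For instance, a small sketch of COUNT with scalar and array input (the variable names n, counts, parts, and lst are illustrative):

```idl
; Scalar input: COUNT receives a single value.
parts = STRSPLIT('alpha beta gamma', /EXTRACT, COUNT=n)
PRINT, n            ; 3

; Array input: COUNT receives one value per input string.
lst = STRSPLIT(['a b', 'c d e'], /EXTRACT, COUNT=counts)
PRINT, counts       ; 2 and 3
```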
ESCAPE
When doing simple pattern matching, the ESCAPE keyword can be used to specify any characters that should be considered to be “escape” characters. Preceding any character with an escape character prevents STRSPLIT from treating it as a separator character even if it is found in Pattern.
Note that if the EXTRACT keyword is set, STRSPLIT will automatically remove the escape characters from the resulting substrings. If EXTRACT is not specified, STRSPLIT cannot perform this editing, and the returned positions and lengths will include the escape characters.
For example:
print, STRSPLIT('a\,b,c', ',', ESCAPE='\', /EXTRACT)
IDL prints:
a,b c
ESCAPE cannot be specified with the FOLD_CASE or REGEX keywords.
EXTRACT
By default, STRSPLIT returns an array of character offsets into String that indicate where the substrings are located. These offsets, along with the lengths available from the LENGTH keyword, can be used later with STRMID to extract the substrings. Set EXTRACT to bypass this step and cause STRSPLIT to return the substrings directly.
FOLD_CASE
Indicates that the regular expression matching should be done in a case-insensitive fashion. FOLD_CASE can only be specified if the REGEX keyword is set, and cannot be used with the ESCAPE keyword.
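A brief sketch of the effect of FOLD_CASE, using a made-up input string in which the separator word appears in mixed case:

```idl
str = 'redANDblueandgreenAndgold'

; With FOLD_CASE, "AND", "and" and "And" all match the pattern.
PRINT, STRJOIN(STRSPLIT(str, 'and', /REGEX, /FOLD_CASE, /EXTRACT), '|')
; red|blue|green|gold

; Without FOLD_CASE, only the lowercase "and" matches.
PRINT, STRJOIN(STRSPLIT(str, 'and', /REGEX, /EXTRACT), '|')
; redANDblue|greenAndgold
```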
LENGTH
Set this keyword to a named variable to receive the lengths of the substrings. Together with the result of this function, LENGTH can be used with the STRMID function to extract the matched substrings.
If String is an array, then LENGTH will be a LIST, where each element of the list is an array of the lengths of the substrings for a particular element of String.
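The offset-and-length workflow might look like the following sketch, which recovers the same substrings that EXTRACT would return (the variable names pos and len are illustrative):

```idl
str = 'STRSPLIT chops up strings.'

; Default call: pos receives the offsets, len receives the matching lengths.
pos = STRSPLIT(str, LENGTH=len)

; Extract each substring with STRMID, one offset/length pair at a time.
FOR i = 0, N_ELEMENTS(pos) - 1 DO PRINT, STRMID(str, pos[i], len[i])
```

A simple loop is used here rather than passing the arrays to STRMID directly, to keep the pairing of each offset with its length explicit.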
PRESERVE_NULL
Normally, STRSPLIT will not return null-length substrings unless there are no non-null values to report, in which case STRSPLIT will return a single empty string. Set PRESERVE_NULL to cause all null substrings to be returned.
Note: With PRESERVE_NULL set, null strings are returned when the search pattern occurs at either end of the input string, or when the search pattern occurs multiple times sequentially.
REGEX
For complex splitting tasks, the REGEX keyword can be specified. In this case, Pattern is taken to be a regular expression to be matched against String to locate the separators. If REGEX is specified and Pattern is not, the default Pattern is the regular expression:
'[ ' + STRING(9B) + ']+'
which means “any series of one or more space or tab characters” (9B is the byte value of the ASCII TAB character).
Note that the default Pattern contains a space after the [ character.
The REGEX keyword cannot be used with the ESCAPE keyword.
For more information about regular expressions, see Regular Expressions.
Tip: To have STRSPLIT split on a multi-character separator pattern (instead of a list of two or more individual separator characters), use the REGEX keyword.
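As a sketch of the default regular expression described above, the following splits a made-up string containing a tab and a run of spaces, first with the implicit default and then with the same pattern written out:

```idl
; Test string: 'a', a TAB, 'b', two spaces, 'c'.
str = 'a' + STRING(9B) + 'b  c'

; REGEX with no Pattern: the default whitespace expression is used.
PRINT, STRJOIN(STRSPLIT(str, /REGEX, /EXTRACT), '|')          ; a|b|c

; The same expression supplied explicitly (note the space after the "[").
pat = '[ ' + STRING(9B) + ']+'
PRINT, STRJOIN(STRSPLIT(str, pat, /REGEX, /EXTRACT), '|')     ; a|b|c
```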
Additional Examples
1. As an example of a more complex splitting task that can be handled with the simple character-matching mode of STRSPLIT, consider a sentence describing differently colored ampersand characters. For unknown reasons, the author used commas to separate all the words, and used ampersands or backslashes to escape the commas that actually appear in the sentence (which therefore should not be treated as separators). The unprocessed string looks like:
Str = 'There,was,a,red,&&&,,a,yellow,&&\,,and,a,blue,\&&.'
We use STRSPLIT to break this line apart, and STRJOIN to reassemble it as a standard blank-separated sentence:
S = STRSPLIT(Str, ',', ESCAPE='&\', /EXTRACT)
PRINT, STRJOIN(S, ' ')
IDL prints:
There was a red &, a yellow &, and a blue &.
2. Strings separated by multi-character delimiters cannot be split using the simple character matching mode of STRSPLIT. Such delimiters require the use of a regular expression. For instance, consider splitting the following string on double ampersand boundaries.
str = 'red&&blue&&yellow&&odds&ends'
The desired result of such splitting would be four strings, with the values ‘red’, ‘blue’, ‘yellow’, and ‘odds&ends’. You might be tempted to use STRSPLIT as follows:
PRINT, STRSPLIT(str,'&&',/EXTRACT)
which causes IDL to print:
red blue yellow odds ends
IDL split the string on single ampersand boundaries, yielding 5 strings instead of the desired 4. When using the simple character matching mode of STRSPLIT, the characters in the Pattern argument specify a set of possible single character delimiters. The order of these characters is unimportant, and specifying a character more than once has no effect (the extras are ignored).
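A quick way to see that Pattern is treated as a set of characters in simple mode is to compare the results for '&' and '&&'. This is only a sketch; the variable names a and b are illustrative:

```idl
str = 'red&&blue&&yellow&&odds&ends'

; In simple mode, repeating a separator character in Pattern has no effect.
a = STRSPLIT(str, '&', /EXTRACT)
b = STRSPLIT(str, '&&', /EXTRACT)
PRINT, ARRAY_EQUAL(a, b)     ; 1 (the two results are identical)
```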
To properly split the above string using a regular expression:
print, strsplit(str,'&&',/EXTRACT, /REGEX)
producing the desired IDL output:
red blue yellow odds&ends
3. Suppose you had a complicated string, in which every token was preceded by the count of characters in that token, with the count enclosed in angle brackets:
str = '<4>What<1>a<7>tangled<3>web<2>we<6>weave.'
This is too complex to handle with simple character matching, but can be easily handled using the regular expression '<[0-9]+>' to match the separators. This regular expression can be read as “an opening angle bracket, followed by one or more numeric characters between 0 and 9, followed by a closing angle bracket.” The STRJOIN function is used to glue the resulting substrings back together:
S = STRSPLIT(str,'<[0-9]+>',/EXTRACT,/REGEX)
PRINT, STRJOIN(S, ' ')
IDL prints:
What a tangled web we weave.
4. Here we take a string array, and split each element. The result is a variable of type LIST, where each element is the result of STRSPLIT on the corresponding string:
str = ['Hwæt! We Gardena in geardagum,', $
'þeodcyninga, þrym gefrunon,', $
'hu ða æþelingas ellen fremedon.', $
'Oft Scyld Scefing sceaþena þreatum,', $
'monegum mægþum, meodosetla ofteah,', $
'egsode eorlas. Syððan ærest wearð', $
'feasceaft funden, he þæs frofre gebad,', $
'weox under wolcnum, weorðmyndum þah,', $
'oðþæt him æghwylc þara ymbsittendra', $
'ofer hronrade hyran scolde,', $
'gomban gyldan. þæt wæs god cyning!']
result = STRSPLIT(str, ' ', /EXTRACT, COUNT=count, LENGTH=length)
HELP, result, count, length
FOREACH item, result DO PRINT, STRJOIN(item,'|')
IDL prints:
RESULT LIST <ID=1 NELEMENTS=11>
COUNT LONG = Array[11]
LENGTH LIST <ID=2 NELEMENTS=11>
Hwæt!|We|Gardena|in|geardagum,
þeodcyninga,|þrym|gefrunon,
hu|ða|æþelingas|ellen|fremedon.
Oft|Scyld|Scefing|sceaþena|þreatum,
monegum|mægþum,|meodosetla|ofteah,
egsode|eorlas.|Syððan|ærest|wearð
feasceaft|funden,|he|þæs|frofre|gebad,
weox|under|wolcnum,|weorðmyndum|þah,
oðþæt|him|æghwylc|þara|ymbsittendra
ofer|hronrade|hyran|scolde,
gomban|gyldan.|þæt|wæs|god|cyning!
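To work with the per-string results, index into the returned LIST variables. The following sketch uses a shorter, made-up input array so that the values are easy to follow:

```idl
str = ['a b c', 'd e']
result = STRSPLIT(str, ' ', /EXTRACT, COUNT=count, LENGTH=length)

PRINT, count            ; 3 and 2: one count per input string

; Each list element is the string array for the corresponding input string.
first = result[0]
PRINT, first[1]         ; b

; LENGTH is also a LIST; this prints the substring lengths for 'd e'.
PRINT, length[1]        ; 1 and 1
```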
5. Using the PRESERVE_NULL keyword:
string1 = ",one,two,three,"
split1 = STRSPLIT(string1, ',', /PRESERVE_NULL, /EXTRACT)
HELP, split1
foreach s,split1 do print, strlen(s),':',s
IDL prints:
SPLIT1 STRING = Array[5]
0:
3:one
3:two
5:three
0:
Try a variation, showing how null strings are generated when the search pattern is repeated sequentially (i.e., twice in a row):
string2 = ",one,,two,,three,"
split2 = STRSPLIT(string2, ',', /PRESERVE_NULL, /EXTRACT)
HELP, split2
foreach s,split2 do print, strlen(s),':',s
IDL prints:
SPLIT2 STRING = Array[7]
0:
3:one
0:
3:two
0:
5:three
0:
IDL creates the null elements when it encounters the two commas together: ',,'
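For contrast, here is a sketch of the same kind of input split without PRESERVE_NULL, in which case the null substrings are dropped (split3 and n are illustrative names):

```idl
string2 = ",one,,two,,three,"

; Without PRESERVE_NULL, only the non-null substrings are returned.
split3 = STRSPLIT(string2, ',', /EXTRACT, COUNT=n)
PRINT, n                         ; 3
PRINT, STRJOIN(split3, '|')      ; one|two|three
```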
Version History
5.3 | Introduced
6.0 | Added COUNT keyword
8.0 | Allow string arrays
See Also
String Operations, String Processing, STRCMP, STRJOIN, STRMATCH, STREGEX, STRMID, STRPOS, LIST, Regular Expressions, IDL_String