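The IDLmlPartition function splits a set of features and their corresponding values into two or more named groups (for example, training and test sets) according to relative sizes that you specify.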

Examples


Example 1 splits two arrays into two groups, one with 80% of the elements, the other with 20%:

Features = randomu(seed, 3, 1000)
Values = randomu(seed, 1000)
 
Part = IDLmlPartition({train:80, test:20}, features, values)
 
; These should be 80% of the total
Print, n_elements(part.train.features)
Print, n_elements(part.train.values)
 
; These should be 20% of the total
Print, n_elements(part.test.features)
Print, n_elements(part.test.values)

 

Example 2 splits two arrays into three groups, one with 60% of the elements, one with 30%, and one with 10%:

Features = randomu(seed, 3, 1000)
Values = randomu(seed, 1000)
 
Part = IDLmlPartition({a:0.6, b:0.3, c:0.1}, features, values)
 
; This should be 60% of the total
Print, n_elements(part.a.features)
 
; This should be 30% of the total
Print, n_elements(part.b.features)
 
; This should be 10% of the total
Print, n_elements(part.c.features)
 
; This should be 60% of the total
Print, n_elements(part.a.values)
 
; This should be 30% of the total
Print, n_elements(part.b.values)
 
; This should be 10% of the total
Print, n_elements(part.c.values)

 

Example 3 splits two arrays into three groups of equal size:

Attributes = randomu(seed, 3, 1000)
Values = randomu(seed, 1000)
 
Part = IDLmlPartition({group1:1, group2:1, group3:1}, attributes, values)
 
; Each of these should be about one third (33.3%) of the total
Print, n_elements(part.group1.attributes)
Print, n_elements(part.group2.attributes)
Print, n_elements(part.group3.attributes)
Print, n_elements(part.group1.values)
Print, n_elements(part.group2.values)
Print, n_elements(part.group3.values)

Syntax


Result = IDLmlPartition(Partitions, Features, Values [, PARTITION_OFFSET=value])

Return Value


This function returns a dictionary of dictionaries. The first-level keys are the keys of the Partitions argument, and the second-level keys are the names of the variables passed as the Features and Values arguments. For example, partition = IDLmlPartition({a:60, b:30, c:10}, feats, vals) returns a nested dictionary with the following entries: partition.a.feats, partition.a.vals, partition.b.feats, partition.b.vals, partition.c.feats, and partition.c.vals.
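The nested layout described above can be walked programmatically. The following is a minimal sketch, assuming the returned object behaves like a standard IDL dictionary (that is, it supports the Keys() method and bracket indexing); the variable names name and group are illustrative only.

```idl
; Sketch: list each partition and the element counts it holds.
; Assumption: the result supports Keys() and bracket indexing,
; as a standard IDL DICTIONARY does.
feats = randomu(seed, 3, 1000)
vals = randomu(seed, 1000)
partition = IDLmlPartition({a:60, b:30, c:10}, feats, vals)

foreach name, partition.Keys() do begin
  ; Second-level keys are the names of the variables passed in
  group = partition[name]
  print, name, n_elements(group.feats), n_elements(group.vals)
endforeach
```

Because the FOREACH loop uses a multi-line BEGIN block, run this from a .pro file rather than entering it line by line at the IDL prompt.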

Arguments


Features

Specify an array of features of size n x m, where n is the number of attributes and m is the number of examples.

If you pass a scalar number for this argument, the function returns the partition indices instead of partitioned data, so that you can apply the partition yourself.

Partitions

Specify how to partition both features and values. You can use a structure, a dictionary, or an array of numbers. The number of keys in the definition determines the number of partitions, the keys determine the names of the partitions, and the values determine their relative sizes. For example, {train:0.8, test:0.2} splits the dataset into two groups, one named ‘train’ and the other named ‘test’, with relative sizes of 80% and 20%, respectively. Because the sizes are relative, they do not need to sum to 1 or to 100: {train:80, test:20} requests the same split, and {group1:1, group2:1, group3:1} requests three groups of equal size.
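The arithmetic behind relative sizes can be sketched as follows. This is only an illustration of how the weights translate into fractions, not the function's internal code; the variable names are hypothetical, and how IDLmlPartition rounds when the counts do not divide evenly is not stated here.

```idl
; Illustration only: convert relative sizes to fractions of the dataset.
; weights mirrors the values in {train:80, test:20}.
weights = [80.0, 20.0]
fractions = weights / total(weights)   ; 0.8 and 0.2

nExamples = 1000
print, fractions
print, fractions * nExamples           ; expected examples per partition
```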

Values (optional)

Specify an array of values of size m, where m is the number of examples.

If the Features argument is a scalar number, the Values argument is optional.
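A call with a scalar Features argument and no Values argument might look like the sketch below. It assumes the scalar is the number of examples to partition, which this page does not state explicitly, and it uses HELP to inspect the result rather than assuming a particular layout for the returned indices.

```idl
; Sketch, under the assumption that the scalar is the number of examples.
; Values is omitted, which is allowed when Features is a scalar.
idx = IDLmlPartition({train:80, test:20}, 1000)

; Inspect the result before relying on its layout
help, idx
```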

Keywords


PARTITION_OFFSET (optional)

Set this keyword to a named variable that receives an array of the indices where the partitions are made.
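A minimal usage sketch for this keyword follows. The variable name offsets is hypothetical, and the exact contents of the returned array are not described on this page, so the example only inspects it.

```idl
; Sketch: capture the partition offsets in a named variable.
features = randomu(seed, 3, 1000)
values = randomu(seed, 1000)

part = IDLmlPartition({train:80, test:20}, features, values, $
  PARTITION_OFFSET=offsets)

; offsets is a hypothetical variable name; examine what was returned
help, offsets
print, offsets
```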

Version History


8.7.1

Introduced

See Also


IDLmlShuffle