Question about 'MEDIAN' function - NV5 Geospatial

Last Post 18 May 2021 04:29 PM by Ben Castellani

3 Replies

James Lane

New Member

Posts:1

10 May 2021 09:11 AM
Hi all,

I have a quick question about the MEDIAN function in IDL that I can't find an answer to in the documentation (https://www.l3harrisgeospatial.com/docs/median.html). I've noticed that with MEDIAN you must specify the /EVEN keyword. Otherwise, when your dataset has an even number of elements, MEDIAN won't average the two middle values. E.g., for x = [1,2,3,4], MEDIAN(x) returns 3, while MEDIAN(x, /EVEN) returns 2.5.

As far as I'm aware, the median of an even number of values is supposed to be the average (mean) of the two middle values. Yet for some reason, IDL's default (unless you specify /EVEN) is to return what I'm calling the "upper median": MEDIAN([5,6,7,8]) = 7 and MEDIAN([10,11,12,13]) = 12, i.e., the element just past the halfway point of the sorted array.

My question is simple: does anybody know why IDL does this by default? Is it "wrong" not to average the two middle numbers when taking the median of a dataset? I processed all the data I was working with using MEDIAN without the /EVEN keyword, and I'm trying to figure out whether it's worth reprocessing.

Thanks,
James
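To make the two conventions concrete, here is a small Python sketch (not IDL). It assumes that IDL's default MEDIAN on a 1-D array sorts the data and returns the element at 0-based index N/2, which matches the examples above, and that /EVEN averages the two middle elements. The helper names upper_median, even_median and convention_gap are hypothetical. Python's standard-library statistics.median_high and statistics.median implement the same two conventions and serve as a cross-check. The last helper bears on the reprocessing question: the two results differ only when N is even, and then by exactly half the gap between the two middle values.

```python
import statistics

def upper_median(values):
    """Assumed IDL default: sort, then take the element at 0-based index N // 2.
    For odd N this is the ordinary median; for even N it is the upper
    of the two middle elements."""
    s = sorted(values)
    return s[len(s) // 2]

def even_median(values):
    """Assumed MEDIAN(x, /EVEN) behavior: average the two middle elements
    when N is even; return the middle element when N is odd."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def convention_gap(values):
    """Distance between the default result and the averaged median.
    Zero for odd N; half the gap between the two middle elements for even N."""
    return abs(upper_median(values) - even_median(values))

for x in ([1, 2, 3, 4], [5, 6, 7, 8], [10, 11, 12, 13], [1, 2, 3, 10]):
    # Cross-check against Python's built-in equivalents of the two conventions.
    assert upper_median(x) == statistics.median_high(x)
    assert even_median(x) == statistics.median(x)
    print(x, upper_median(x), even_median(x), convention_gap(x))
```

For x = [1,2,3,4] this gives 3 and 2.5, matching the numbers in the question, with a gap of 0.5. Whether a gap of that size matters depends on how closely spaced the middle values of your own data are.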

Ben Castellani

Basic Member

Posts:130

13 May 2021 10:01 AM
IDL was created in the 1980s. Believe it or not, back then, averaging the two middle values of a dataset with an even number of elements was not the standard way to compute the median. That convention seems to have taken hold in the early 1990s.

The Numerical Recipes books were used to formulate much of core IDL in the 1980s and 1990s. Numerical Recipes was cutting-edge for its time and described how to use computer code for science! Here is a funny quote about median calculations from the 2nd edition:

"One often wants to know the median element in an array, or the top and bottom quartile elements. When N is odd, the median is the kth element, with k = (N + 1)/2. When N is even, statistics books define the median as the arithmetic mean of the elements k = N/2 and k = N/2+1 (that is, N/2 from the bottom and N/2 from the top). If you accept such pedantry, you must perform two separate selections to find these elements. For N > 100 we usually define k = N/2 to be the median element, pedants be damned."

See Section 8.5 (page 333) of the linked PDF for more information: https://websites.pmc.ucsc.edu/~fnimmo/eart290c_17/NumericalRecipesinF77.pdf

So, in summary: yes, averaging the two middle elements is the accepted, technically correct method, and has been for roughly the last 30 years. However, to maintain backward compatibility, IDL added this "new" method only through the optional EVEN keyword, so the default behavior is unchanged.

Hope this helps!
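The Numerical Recipes remark about "two separate selections" can be illustrated with a selection algorithm. The Python sketch below uses a simple randomized quickselect to find the k-th smallest element in expected linear time. It is a stand-in chosen for illustration, not the routine from the book or IDL's internal implementation, and the function names are hypothetical. The default-style median needs one selection; the averaged median needs two when N is even. Note that in the book's 1-based indexing, k = N/2 is the lower of the two middle elements, whereas the examples in the question show IDL's default returning the upper one (0-based index N/2). Either choice needs only one selection.

```python
import random

def select(values, k):
    """Return the k-th smallest element (0-based) using quickselect.
    Expected O(N) time with a random pivot; the input is not modified."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        less = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(less):
            items = less                      # answer lies below the pivot
        elif k < len(less) + n_equal:
            return pivot                      # answer equals the pivot
        else:
            k -= len(less) + n_equal          # answer lies above the pivot
            items = [v for v in items if v > pivot]

def upper_median_one_selection(values):
    # Default-style median: a single selection at 0-based index N // 2.
    return select(values, len(values) // 2)

def averaged_median_two_selections(values):
    # Averaged median: one selection for odd N, two selections for even N.
    n = len(values)
    if n % 2 == 1:
        return select(values, n // 2)
    return (select(values, n // 2 - 1) + select(values, n // 2)) / 2

print(upper_median_one_selection([13, 10, 12, 11]),
      averaged_median_two_selections([13, 10, 12, 11]))  # 12 11.5
```

The sketch calls select twice to mirror the wording of the quote. Once the first middle element is known, the second is simply the smallest of the elements ranked above it, so the extra cost of the averaged median is modest in practice.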