
Big Data Science with Climate Analytics-as-a-Service

NASA Center for Climate Simulation Demonstrates On-demand Analytic Processing for Climate Change Research


Climate science is a Big Data domain, and its data volumes are growing at an unprecedented rate. At the NASA Center for Climate Simulation (NCCS), data scientists carefully curate the Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection, a synthesis of more than thirty years of observational data integrated with numerical models that is of increasing importance to climate change research.


NCCS has set up MERRA Analytic Services (MERRA/AS) to perform analyses using the MapReduce parallel computing approach running on Hadoop technology. To make the system practical to use, the Climate Model Data Services (CDS) API supports web service access for consumer applications, basic instructions from a command line interface, and advanced programmatic capabilities through Python development. The Climate Analytics-as-a-Service (CAaaS) technology stack can be deployed on local enterprise hardware or in the cloud.
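For readers unfamiliar with MapReduce, the pattern can be illustrated in a few lines of plain Python. This is only a minimal, single-process sketch of the map, shuffle, and reduce phases, not MERRA/AS code and not the CDS API: the record layout, the sample temperature values, and every function name here are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records: (year, month, mean temperature in kelvin).
# The values are made up for illustration; they are not MERRA data.
records = [
    (1980, 1, 287.0),
    (1980, 2, 288.0),
    (1981, 1, 287.5),
    (1981, 2, 288.5),
]

def map_record(record):
    """Map phase: emit a (key, value) pair keyed by year."""
    year, _month, temperature = record
    return year, (temperature, 1)

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_values(values):
    """Reduce phase: combine (sum, count) pairs into a mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count

def yearly_means(data):
    """Run the three phases in sequence on an in-memory list."""
    grouped = shuffle(map(map_record, data))
    return {year: reduce_values(values) for year, values in grouped.items()}

means = yearly_means(records)
print(means)
```

On a Hadoop cluster the map and reduce steps would run in parallel across many nodes, close to where the data is stored; the sketch above only shows the shape of the computation.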

 Climate models project 21st century global temperatures.

credit: NASA's Scientific Visualization Studio and NASA Center for Climate Simulation


MERRA spans 160 terabytes, so it makes perfect sense that the analytical services are backed by some serious computational horsepower. The Hadoop MapReduce operations run on a computing cluster of 36 Dell R710 servers, each with twelve 3-terabyte hard drives and an internal OS disk. Everything is connected through a 36-port InfiniBand switch and a 48-port Gigabit Ethernet switch. Overall, the cluster is capable of around 11 teraflops.
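To put those figures in perspective, the raw storage implied by the hardware description can be worked out with some back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted above, plus one labeled assumption: a replication factor of 3, which is the HDFS default but is not confirmed for this cluster.

```python
# Figures quoted in the article.
SERVERS = 36
DATA_DRIVES_PER_SERVER = 12
DRIVE_CAPACITY_TB = 3
MERRA_SIZE_TB = 160

# Assumption: HDFS default replication factor; the cluster's actual
# setting is not stated in the article.
REPLICATION_FACTOR = 3

raw_tb = SERVERS * DATA_DRIVES_PER_SERVER * DRIVE_CAPACITY_TB
usable_tb = raw_tb / REPLICATION_FACTOR
headroom = usable_tb / MERRA_SIZE_TB

print(f"Raw capacity: {raw_tb} TB")
print(f"Usable at {REPLICATION_FACTOR}x replication: {usable_tb:.0f} TB")
print(f"Usable capacity relative to MERRA: {headroom:.1f}x")
```

Under that assumption, the cluster would hold the full collection with room to spare for intermediate results, though real usable capacity also depends on file system overhead and reserved space.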


CAaaS is a climate research specialization of the business-process-as-a-service concept, which promises to keep gaining popularity as cloud computing evolves. Its capabilities demonstrate the power of the approach: high-performance and adaptive data-proximal analytics, scalable data management, software as a virtualized appliance, and a generalized API that exposes reusable data services. NCCS hopes it will serve as a useful resource for developing and evaluating the next generation of climate data analysis tools and capabilities. By promising to reduce the time spent preparing data for comparing different data models, a long-sought goal of the climate research community, MERRA/AS and CAaaS are a great real-world example of Hadoop and MapReduce being used to drive experimental development of high-performance analytical applications in the climate science domain.