
Why Hadoop is Kind of a Big Deal

Hadoop Brings Big Data Capabilities to the Enterprise

Anonym

It would seem that the topic of Big Data is everywhere you turn these days. As with any major disruptive new technology, it didn't take long for the hype cycle's generative buzz to eclipse the actual initial adoption of tools brought to market by the Big Data vanguard. Perhaps nothing illustrates this better than the example of Hadoop. With more than half of the Fortune 50 and prominent social media names like Facebook already using it, Hadoop has made a splash big enough to send waves well beyond IT-centric media, even as a solid understanding of what Hadoop is has largely eluded many in the mainstream.

Image: Blue Gene/P by Argonne National Laboratory, licensed under CC BY-SA 2.0

Chances are that by now many people in your organization have heard of Hadoop. There is an even higher probability that many of them aren't certain what Hadoop is exactly, or why it is so important. The siren song of extracting profit from today's overwhelming information glut is alluring, but realizing that potential has been a daunting challenge for many enterprises. It can be hard even to answer two basic questions in a straightforward manner: What is Hadoop, and why does it matter?

The simplest answer is that Hadoop is an open-source software framework for storing and processing Big Data, and Big Data is important. To understand Hadoop further, it helps to look at the basics of how it stores and processes data.

Image: Hadoop logo, Copyright © 2014 Apache Software Foundation, licensed under the Apache License, Version 2.0

The way Hadoop stores data files is key. Its file system, HDFS (the Hadoop Distributed File System), is distributed across multiple computers. When a file is written to Hadoop, it is split into blocks and multiple copies of each block are kept on different nodes, so the file survives the failure of any single node. It also means Hadoop can hold files far larger than would fit on any one computer's disks.
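
To make that concrete, here is a minimal sketch (not from the original post) of writing a file into HDFS with Hadoop's Java client, asking for three replicas of each block. The NameNode address and file paths are placeholders; in a real cluster they would come from your site configuration rather than being set in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; normally read from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            // Keep three copies of each block on different nodes, so the file
            // survives the loss of any single machine.
            conf.set("dfs.replication", "3");

            try (FileSystem fs = FileSystem.get(conf)) {
                // The file is split into blocks and spread across the cluster;
                // no single node has to be able to hold the whole thing.
                fs.copyFromLocalFile(new Path("/tmp/measurements.csv"),
                                     new Path("/data/measurements.csv"));
            }
        }
    }

Because replication is handled by the file system itself, client code never tracks where the copies live; HDFS decides block placement and re-replicates blocks when a node disappears.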

The other side of the equation is how Hadoop processes this distributed data, and it dovetails with the way Hadoop stores it through a principle called data locality. Traditionally, an enterprise moves data to the server that is intended to process it. While this seems logical, moving data over a network can be very time- and resource-intensive, especially as the data grows. Anyone who has waited impatiently for a very large file to download before it opens can appreciate this. Hadoop takes the opposite approach: it is better to process data where it resides than to move it somewhere else for processing. Essentially, every node that stores data is also a server capable of processing that data in place. All that needs to be sent over the wire are the processing instructions themselves, which are small and will not congest the network.

Because a large quantity of data is distributed over a multi-node cluster that can process it in place, Hadoop can operate on enormous amounts of data in parallel. It excels at data-processing tasks that can run independently on each constituent piece of a large data set, with the partial results brought together into an integrated whole at the end; this is the MapReduce programming model.
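
The classic illustration of this model is the word-count example from the Hadoop MapReduce tutorial, sketched below in Java. Each map task runs on (or near) the node holding its slice of the input and emits (word, 1) pairs independently of every other map task; the reduce tasks then merge those independent partial counts into a single result. Input and output paths are supplied on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Runs where the data lives and emits (word, 1) for every token it sees.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Brings the independent partial results back together into final counts.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, it would be launched with something like "hadoop jar wordcount.jar WordCount /data/books /data/wordcounts" (hypothetical paths); the framework takes care of splitting the input, scheduling map tasks near their data, and shuffling the intermediate pairs to the reducers.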

That may not sound particularly impressive, or it may sound like a lot of trouble or even an unnecessary over-complication, but it is everything. Hadoop distributes data and computational workloads across clusters of commodity hardware, offering linearly scalable computing power that can handle massive amounts of data while tolerating the (inevitable) hardware failures that come with large clusters. In other words, it works exactly the way more and more data will need to be processed in the years ahead.

And that is a very big deal.
