
Why Hadoop is Kind of a Big Deal

Hadoop Brings Big Data Capabilities to the Enterprise

Anonym

It would seem that the topic of Big Data is everywhere you turn these days. As with any major disruptive new technology, it didn't take long for the hype cycle's generative buzz to eclipse the actual initial adoption of tools brought to market by the Big Data vanguard. Perhaps nothing illustrates this better than the example of Hadoop. With more than half of the Fortune 50 and prominent social media names like Facebook already using it, Hadoop has made a splash big enough to send waves well beyond IT-centric media, even as a solid understanding of what Hadoop is has largely eluded many in the mainstream.

Image: Blue Gene/P by Argonne National Laboratory, licensed under CC BY-SA 2.0

Chances are that by now many people in your organization have heard of Hadoop. There is an even higher probability that many of them aren't certain what Hadoop is exactly, or why it is so important. The siren song of extracting profit from today's overwhelming information glut is alluring, but realizing that potential has been a daunting challenge for many enterprises. It can be hard even to answer two basic questions in a straightforward manner: What is Hadoop, and why does it matter?

The simplest answer is that Hadoop is an open-source software framework for storing and processing Big Data, and Big Data is important. To understand Hadoop further, it helps to look at the basics of how it stores and processes data.

Image: Hadoop logo, Copyright © 2014 Apache Software Foundation, licensed under the Apache License, Version 2.0

The way Hadoop stores data files is key. Its file system, HDFS (the Hadoop Distributed File System), is distributed across multiple computers. When a file is written to Hadoop, it is split into blocks and multiple copies of each block are kept on different nodes, so the file survives the failure of any single node. It also means Hadoop can hold files far larger than would fit on any one computer's disks.
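
To make that concrete, here is a minimal sketch (not from the original post) of writing a file into HDFS with Hadoop's Java client, asking for three replicas of each block. The NameNode address and file paths are placeholders; in a real cluster they would come from your site configuration rather than being set in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; normally read from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            // Keep three copies of each block on different nodes, so the file
            // survives the loss of any single machine.
            conf.set("dfs.replication", "3");

            try (FileSystem fs = FileSystem.get(conf)) {
                // The file is split into blocks and spread across the cluster;
                // no single node has to be able to hold the whole thing.
                fs.copyFromLocalFile(new Path("/tmp/measurements.csv"),
                                     new Path("/data/measurements.csv"));
            }
        }
    }

Because replication is handled by the file system itself, client code never tracks where the copies live; HDFS decides block placement and re-replicates blocks when a node disappears.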

The other side of the equation is how Hadoop processes this distributed data, and it dovetails with the way Hadoop stores it through a principle called data locality. Traditionally, an enterprise moves data to the server that is intended to process it. While this seems logical, moving data over a network can be very time- and resource-intensive, especially as the data grows. Anyone who has waited impatiently for a very large file to download before it opens can appreciate this. Hadoop takes the opposite approach: it is better to process data where it resides than to move it somewhere else for processing. Essentially, every node that stores data is also a server capable of processing that data in place. All that needs to be sent over the wire are the processing instructions themselves, which are small and will not congest the network.

Because a large quantity of data is distributed over a multi-node cluster that can process it in place, Hadoop can operate on enormous amounts of data in parallel. It excels at data-processing tasks that can run independently on each constituent piece of a large data set, with the partial results brought together into an integrated whole at the end; this is the MapReduce programming model.
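
The classic illustration of this model is the word-count example from the Hadoop MapReduce tutorial, sketched below in Java. Each map task runs on (or near) the node holding its slice of the input and emits (word, 1) pairs independently of every other map task; the reduce tasks then merge those independent partial counts into a single result. Input and output paths are supplied on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Runs where the data lives and emits (word, 1) for every token it sees.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Brings the independent partial results back together into final counts.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, it would be launched with something like "hadoop jar wordcount.jar WordCount /data/books /data/wordcounts" (hypothetical paths); the framework takes care of splitting the input, scheduling map tasks near their data, and shuffling the intermediate pairs to the reducers.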

That may not sound particularly impressive, or it may sound like a lot of trouble or even an unnecessary over-complication, but it is everything. Hadoop distributes data and computational workloads across clusters of commodity hardware, offering linearly scalable computing power that can handle massive amounts of data while tolerating the (inevitable) hardware failures that come with large clusters. In other words, it works exactly the way more and more data will need to be processed in the years ahead.

And that is a very big deal.
