
Why Hadoop is Kind of a Big Deal

Hadoop Brings Big Data Capabilities to the Enterprise

Anonym

It would seem that the topic of Big Data is everywhere you turn these days. As with any major disruptive technology, it didn't take long for the buzz of the hype cycle to eclipse the actual early adoption of tools brought to market by the Big Data vanguard. Perhaps nothing illustrates this better than Hadoop. With more than half of the Fortune 50 and prominent social media names like Facebook already using it, Hadoop has made a splash big enough to send waves well beyond IT-centric media, even as a solid understanding of what Hadoop actually is has eluded many in the mainstream.

Image: Blue Gene/P by Argonne National Laboratory, licensed under CC BY-SA 2.0

Chances are that by now many people in your organization have heard about Hadoop. There is an even higher probability that many of them aren't certain what exactly Hadoop is, or why it is so important. The siren song of extracting profit from today's overwhelming information glut is alluring, but realizing that potential has been a daunting challenge for many enterprises. It can be a challenge even to answer two basic questions in a straightforward manner: What is Hadoop, and why does it matter?

The simplest response is that Hadoop is a software framework for Big Data, and Big Data is important. To further understand Hadoop, one should understand the basics of how it stores and processes data.

Image: Hadoop logo, Copyright © 2014 Apache Software Foundation, licensed under the Apache License, Version 2.0

The way Hadoop stores data files is key. Hadoop's file system, the Hadoop Distributed File System (HDFS), spreads data across multiple computers. When a file is stored in Hadoop, multiple copies of it are saved on different nodes, so the file survives even if any one of those nodes fails. Furthermore, Hadoop can store really large files, with sizes far beyond what would fit on any single computer's disks.
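
To make that concrete, here is a minimal sketch of writing a file to HDFS through Hadoop's Java FileSystem API. The NameNode address and file path are hypothetical placeholders, and a real deployment would normally pick up its configuration from the cluster's config files rather than setting it in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster address; usually supplied by core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/sample.txt");

        // Ask HDFS to keep 3 copies of each block on different nodes,
        // so the file survives the failure of any single node.
        short replication = 3;
        try (FSDataOutputStream out = fs.create(path, replication)) {
            out.writeUTF("hello, distributed world");
        }
        fs.close();
    }
}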

The other side of the equation involves how Hadoop processes this distributed data, and it dovetails well with the way Hadoop stores it, because a main principle embraced by Hadoop is data locality. Traditionally, an enterprise moves data to the server that is intended to process it. While this seems logical, moving data over a network can be a very time- and resource-intensive process, especially as the data grows in size. Anyone who has waited impatiently for a very large file to finish downloading before it opens can appreciate this fact. Hadoop embraces the opposite principle: it is better to process data where it resides than to move it somewhere else for processing. Essentially, every node that stores data is also a server capable of processing that data in place. All that needs to be sent to the node over the wire are the processing instructions themselves, which are small and won't congest or tie up the network.
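
Hadoop keeps the bookkeeping this requires. The sketch below, again using the standard Java FileSystem API with a placeholder path, asks the NameNode which nodes hold each block of a file; the job scheduler consults the same information to run each task on (or near) a node that already holds its share of the data.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/sample.txt"));

        // List every block of the file and the DataNodes that hold a copy.
        for (BlockLocation block :
                fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset %d, length %d, hosts %s%n",
                    block.getOffset(), block.getLength(),
                    String.join(", ", block.getHosts()));
        }
    }
}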

The ability to distribute a large quantity of data over a multi-node network that can process it in place means Hadoop can operate on an enormous amount of data in parallel. Hadoop excels at data-processing tasks that can run independently on each constituent piece of a large data set, bringing the partial results together as an integrated whole at the end.
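
This split-then-combine pattern is Hadoop's MapReduce programming model, and its canonical illustration is word counting: each map task independently tallies words in its own slice of the input, and the reduce step merges those partial tallies into final counts. The sketch below follows the classic WordCount example from the Apache Hadoop MapReduce tutorial.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs independently on each split of the input,
    // on (or near) the node where that split is stored.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: merges the partial tallies into final counts.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}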

That may not sound particularly impressive, or it may even sound like an unnecessary over-complication, but it is everything. Hadoop is designed to distribute data and computational workloads across clusters of commodity hardware, offering linearly scalable computational power that can handle massive amounts of data while tolerating (inevitable) hardware failures. In other words, it works exactly the way that more and more data will need to be processed as we proceed into the future.

And that is a very big deal.
