
Why Hadoop is Kind of a Big Deal

Hadoop Brings Big Data Capabilities to the Enterprise

Anonym

It would seem that the topic of Big Data is everywhere you turn these days. As with any major disruptive new technology, it didn't take long for the hype to eclipse the actual early adoption of the tools brought to market by the Big Data vanguard. Perhaps nothing illustrates this better than Hadoop. With more than half of the Fortune 50 and prominent social media names like Facebook already using it, Hadoop has made a splash big enough to send waves well beyond IT-centric media, even as a solid understanding of what Hadoop is has largely eluded the mainstream.

Image: Blue Gene/P by Argonne National Laboratory, licensed under CC BY-SA 2.0

Chances are that by now many people in your organization have heard of Hadoop. There is an even higher probability that many of them aren't certain what exactly Hadoop is, or why it is so important. The prospect of extracting profit from today's overwhelming information glut is alluring, but realizing that potential has been a daunting challenge for many enterprises. It can be hard even to answer two basic questions in a straightforward manner: What is Hadoop, and why does it matter?

The simplest response is that Hadoop is a software framework for Big Data, and Big Data is important. To go beyond that, it helps to understand the basics of how Hadoop stores and processes data.

Image: Hadoop logo, Copyright © 2014 Apache Software Foundation, licensed under the Apache License, Version 2.0

The way Hadoop stores data files is key. Hadoop's file system is distributed across multiple computers. When a file is stored in Hadoop, multiple copies are saved on different nodes, so the file remains available even if any one of those nodes fails for any reason. Furthermore, Hadoop can store really large files, files far larger than would fit on any single computer's disks.
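To make this concrete, here is a minimal sketch of writing a file through the HDFS Java API and asking for three copies of its blocks. It assumes a reachable cluster whose configuration files are on the classpath; the path and the replication factor of three are illustrative choices, not anything Hadoop prescribes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // handle to the cluster's default file system

        Path file = new Path("/data/example/input.txt");   // hypothetical path

        // Write a file; behind the scenes HDFS splits large files into blocks
        // and spreads those blocks across the nodes of the cluster.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("sample contents");
        }

        // Ask HDFS to keep three copies of every block of this file on different nodes,
        // so the file survives the failure of any single node.
        fs.setReplication(file, (short) 3);

        System.out.println("Replication factor: " + fs.getFileStatus(file).getReplication());
    }
}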

The other side of the equation is how Hadoop processes this distributed data, and it dovetails with the way Hadoop stores it, because a central principle of Hadoop is something termed data locality. Traditionally, an enterprise moves data to the server that is intended to process it. While this seems logical, moving data over a network can be very time- and resource-intensive, especially as the data grows. Anyone who has waited impatiently for a very large file to finish downloading before it opens can appreciate this. Hadoop embraces the opposite principle: it is better to process data where it resides than to move it somewhere else for processing. Essentially, every node that stores data is also a server capable of processing that data in place. All that needs to be sent to a node over the wire are the processing instructions themselves, which are small and won't congest or tie up the network.
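The practical upshot shows in how a MapReduce job is submitted. In the sketch below, a word-count job built from Hadoop's stock TokenCounterMapper and IntSumReducer classes, the only things that leave the client machine are the small job definition and the job JAR; the framework then schedules map tasks on the nodes that already hold the input blocks. The input and output paths are assumed to arrive as command-line arguments.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");

        // Only this small job definition (plus the job JAR named here) travels over the network.
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class);   // built-in mapper: emits (word, 1) per token
        job.setCombinerClass(IntSumReducer.class);      // pre-sums counts on each node before the shuffle
        job.setReducerClass(IntSumReducer.class);       // built-in reducer: sums the counts per word
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. a large file already in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // results directory

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}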

Distributing a large quantity of data across a multi-node cluster that can process it in place means Hadoop can operate on enormous amounts of data in parallel. Hadoop excels at data-processing tasks that can run independently on each constituent piece of a large data set, with the partial results brought back together into an integrated whole at the end.
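A hand-written version of the same word count shows the pattern: the map function works on one split of the input at a time, wherever that split happens to live, and the reduce function stitches the partial counts from every node back together. This is only a sketch; the class names are illustrative, and they would be wired into a driver like the one above in place of the stock classes.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Runs independently on each input split, on the node that stores that split.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1) for every token in this split
        }
    }
}

// Pulls the partial results from every node together into one final count per word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}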

That may not sound particularly impressive, or it may sound like a lot of trouble to go to, even an unnecessary over-complication, but it is everything. Hadoop is designed to distribute data and computational workloads across clusters of commodity hardware, offering linearly scalable computing power that can handle massive amounts of data while tolerating the (inevitable) hardware failures. In other words, it works exactly the way more and more data will have to be processed as we move into the future.

And that is a very big deal.
