
Why Hadoop is Kind of a Big Deal

Hadoop Brings Big Data Capabilities to the Enterprise

Anonym

The topic of Big Data seems to be everywhere you turn these days. As with any major disruptive technology, it didn't take long for the buzz of the hype cycle to eclipse the actual early adoption of the tools brought to market by the Big Data vanguard. Perhaps nothing illustrates this better than Hadoop. With more than half of the Fortune 50 and prominent social media names like Facebook already using it, Hadoop has made a splash big enough to send waves well beyond IT-centric media, even as a solid understanding of what Hadoop actually is has largely eluded the mainstream.

Image: "Blue Gene / P" by Argonne National Laboratory, licensed under CC BY-SA 2.0

Chances are that by now many people in your organization have heard of Hadoop. There is an even higher probability that many of them aren't certain what Hadoop is exactly, or why it is so important. The promise of extracting value from today's overwhelming glut of information is alluring, but realizing that potential has been a daunting challenge for many enterprises. It can be a challenge even to answer two basic questions in a straightforward manner: What is Hadoop, and why does it matter?

The simplest response is that Hadoop is a software framework for Big Data, and Big Data is important. To further understand Hadoop, one should understand the basics of how it stores and processes data.

Image: Hadoop logo, copyright © 2014 Apache Software Foundation, licensed under the Apache License, Version 2.0

The way Hadoop stores data files is key. Its file system, the Hadoop Distributed File System (HDFS), is spread across multiple computers. When a file is stored in Hadoop, it is split into blocks, and multiple copies of each block are saved on different nodes, so the file remains available even if any one of those nodes fails. Furthermore, Hadoop can hold truly large files, far larger than would fit on any single computer's disks.
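
To make that concrete, here is a minimal sketch using Hadoop's Java FileSystem API to copy a local file into the cluster and then read back its block size and replication factor. The NameNode address and file paths are placeholders, not something from the original article, so adjust them to match your own environment.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address; point this at your own NameNode.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into the distributed file system.
        Path local = new Path("/tmp/sensor-readings.csv");    // hypothetical local file
        Path remote = new Path("/data/sensor-readings.csv");  // hypothetical HDFS path
        fs.copyFromLocalFile(local, remote);

        // The file is stored as blocks, each replicated across nodes
        // (three copies by default).
        FileStatus status = fs.getFileStatus(remote);
        System.out.println("Size: " + status.getLen() + " bytes");
        System.out.println("Block size: " + status.getBlockSize() + " bytes");
        System.out.println("Replication factor: " + status.getReplication());
    }
}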

The other side of the equation is how Hadoop processes this distributed data, and it dovetails neatly with the way Hadoop stores it, because a core principle Hadoop embraces is data locality. Traditionally, an enterprise moves data to the server that is intended to process it. While this seems logical, moving data over a network can be a very time- and resource-intensive process, especially as the data grows in size; anyone who has waited impatiently for a very large file to finish downloading before it opens can appreciate this. Hadoop takes the opposite approach: it is better to process data where it resides than to move it somewhere else for processing. Essentially, every node that stores data is also a server capable of processing that data in place. All that needs to be sent over the wire is the processing instructions themselves, which are small and won't congest or tie up the network.
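
Continuing the hypothetical example above, the same API can report exactly where each block of a file lives. This is the information Hadoop's job scheduler draws on to run tasks on (or near) the nodes that already hold the data, so that only the instructions travel over the network. Again, the path is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path remote = new Path("/data/sensor-readings.csv"); // hypothetical HDFS path

        // List the nodes holding a copy of each block of the file.
        FileStatus status = fs.getFileStatus(remote);
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block at offset " + block.getOffset()
                    + " (" + block.getLength() + " bytes) lives on: "
                    + String.join(", ", block.getHosts()));
        }
    }
}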

Distributing a large quantity of data across a multi-node cluster that can process it in place means Hadoop can operate on an enormous amount of data in parallel. It excels at data-processing tasks that can be run independently on each constituent piece of a large data set, with the partial results brought together into an integrated whole at the end.
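
To see what "run independently on each piece, then combine" looks like in code, here is a version of the classic word-count job written against Hadoop's MapReduce API. The mapper runs in parallel on each split of the input, wherever those blocks are stored, and the reducer merges the per-word counts into a single result. Treat it as a sketch rather than a production job: the input and output paths come from the command line, and it would need to be packaged as a jar and submitted to a running cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Runs on each input split in place: emits (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Merges the partial counts produced by all the mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}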

That may not sound particularly impressive, or it may sound like a lot of trouble to go to, or even an unnecessary over-complication, but it is everything. Hadoop is designed to distribute data and computational workloads across clusters of commodity hardware, offering computational power that scales nearly linearly, handles massive amounts of data, and tolerates the (inevitable) hardware failures of such clusters. In other words, it works exactly the way more and more data will need to be processed as we move into the future.

And that is a very big deal.
