Apache Hadoop

The Apache Hadoop project develops open source software for reliable, scalable distributed computing.

The Hadoop software library is a framework that enables distributed processing of large datasets across clusters of computers using simple programming models.

Hadoop is designed to scale from a single server to thousands of machines.

Rather than relying on hardware to deliver high availability, the library detects and handles failures at the application layer. This lets it provide a highly available service on top of a cluster of machines, each of which may be prone to failure.
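The idea of handling failures in software rather than in hardware can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hadoop's actual mechanism or API: `NodeFailure`, `run_with_failover`, `make_task` and the node names are all hypothetical, invented for illustration.

```python
from typing import Callable, List, Set


class NodeFailure(Exception):
    """Raised when a (simulated) worker node fails while running a task."""


def run_with_failover(task: Callable[[str], int], nodes: List[str]) -> int:
    """Try the task on each node in turn and return the first successful result."""
    errors = []
    for node in nodes:
        try:
            return task(node)
        except NodeFailure as exc:
            # The failure is detected in software; move on to the next node.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("task failed on every node: " + "; ".join(errors))


def make_task(broken_nodes: Set[str]) -> Callable[[str], int]:
    """Build a toy task that fails on the given (hypothetical) broken nodes."""
    def task(node: str) -> int:
        if node in broken_nodes:
            raise NodeFailure("node unavailable")
        return sum(range(10))  # stand-in for real work
    return task
```

For example, `run_with_failover(make_task({"node-1"}), ["node-1", "node-2"])` returns `45`: the failure on `node-1` is caught and the same task is rerun on `node-2`, so the caller never sees the faulty machine.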

Hadoop Modules

The Hadoop project consists of four main modules:

Hadoop Common is a set of common utilities for all other modules.

HDFS, the Hadoop Distributed File System, is a distributed file system that provides high-throughput access to application data.

Hadoop YARN is a framework for scheduling tasks and managing cluster resources.

Hadoop MapReduce is a system for parallel processing of large datasets, based on YARN.
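The programming model behind the MapReduce module can be sketched with the classic word-count example. The code below is a single-process illustration in plain Python, assuming three phases (map, shuffle, reduce). It does not use the Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["big data", "big cluster"])` returns `{"big": 2, "data": 1, "cluster": 1}`. In a real cluster the map and reduce steps would run in parallel on different machines over data stored in HDFS; here everything runs in one process to keep the model visible.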

Hadoop Ecosystem

Beyond these modules, the complete platform, or Hadoop ecosystem, includes other related projects such as Apache Ambari, Apache Cassandra, Apache HBase, Apache Hive, Apache Mahout, Apache Pig and Apache Spark, among others.

Figure: The Apache Hadoop ecosystem

Resources about Apache Hadoop

Apache Hadoop Project Home Page

Tutorials and help on the official Apache Hadoop Wiki

Apache Hadoop download page

Posts about Apache projects on Dataprix


Other software products from the same vendor

Apache Spark

Spark is an open source framework from the Apache Software Foundation for distributed processing of large amounts of data on clusters of computers, designed for use in Big Data environments, and created to enhance the capabilities of its…

Apache Hive

Figure: Apache Hive SQL query editor

Hive is software that runs on Hadoop clusters and adds a layer that lets developers abstract away the management of HDFS files and MapReduce jobs by using SQL-based data query operations, with the HiveQL language…

Apache NiFi

Apache NiFi is a data integration platform designed to automate the flow of information between systems. Its visual approach allows users to design, manage and monitor data flows…
