Apache Spark

Spark is an open-source framework from the Apache Software Foundation for distributed processing of large volumes of data on clusters of computers. It was designed for Big Data environments and created to improve on the capabilities of its predecessor, MapReduce.

Spark inherits the scalability and fault tolerance of MapReduce, but far surpasses it in processing speed, ease of use, and analytical capabilities.

Apache Spark runs on the Java Virtual Machine (JVM) and supports several languages, including Java, Scala, Python, Clojure and R, for developing applications that perform Map and Reduce operations by interacting with the Spark core through its API.
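To make the Map and Reduce operations mentioned above concrete, here is a minimal sketch of the pattern in plain Python (not Spark's actual API): a word count, where the map step emits (word, 1) pairs and the reduce step sums counts per word. The sample lines are invented for illustration.

```python
from functools import reduce
from collections import Counter

# Toy input: in Spark these lines would be a distributed dataset.
lines = ["spark runs on the jvm", "spark supports several languages"]

# Map step: split each line into (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce step: sum the counts for each word.
def merge(acc, pair):
    word, count = pair
    acc[word] += count
    return acc

counts = reduce(merge, pairs, Counter())
print(counts["spark"])  # "spark" appears once in each line → 2
```

Spark distributes exactly this kind of computation across a cluster, with the map step running in parallel on data partitions and the reduce step combining the partial results.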

The Apache Spark Ecosystem

In addition to the Core API, at a higher level, the so-called Spark Ecosystem offers libraries that add Machine Learning and analytics capabilities for Big Data.

The most important Spark libraries are:

  • Spark Streaming: for real-time streaming data processing.
  • Spark SQL + DataFrames: provides a layer for connecting to Spark data through a JDBC API, allowing SQL-style queries to be run from traditional BI and data visualisation tools.
  • Spark MLlib: Spark's Machine Learning library, providing common Machine Learning algorithms and utilities.
  • Spark GraphX: an API for building graphs and performing parallel graph computation.