Python for Data Science

Python is a general-purpose, open source language. Thanks to the development of powerful libraries for analytics, data processing and predictive modelling, it has become, along with R, the main programming language used for Data Science projects.

Multilabel classification with Python's Scikit-learn library

Python is an interpreted, object-oriented programming language that is easy to install and use, and is supported by a large community.

Python libraries for Data Science

These are the most commonly used Python libraries for Data Science:

  • The SciPy ecosystem is a collection of open source packages for mathematics, science and engineering. Its core packages include:
    • NumPy, the base package for numerical computing. It provides an efficient multidimensional array type, which can also hold strings, records and objects, along with functions for manipulating large arrays.
    • Pandas, a library that provides data structures and tools for data analysis and manipulation, widely used in the data preparation phase.
    • Matplotlib, a library for creating 2D plots.
  • Scikit-learn is a machine learning and data mining library built on top of SciPy that implements regression, classification, clustering and dimensionality reduction algorithms.
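
As a rough illustration of how these libraries are typically combined, the sketch below builds a small NumPy array, wraps it in a Pandas table for preparation, and fits a Scikit-learn classifier on it. The data values, the column names `x1` and `x2`, and the choice of `LogisticRegression` are assumptions made for this example only; they do not come from any particular project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# NumPy: a small 2-D array of made-up measurements (one row per sample).
features = np.array([
    [1.0, 0.5],
    [1.5, 1.0],
    [2.0, 0.8],
    [6.0, 5.5],
    [6.5, 6.0],
    [7.0, 5.8],
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Pandas: put the array in a labelled table, the usual form for data preparation.
df = pd.DataFrame(features, columns=["x1", "x2"])
df["label"] = labels

# Per-class mean of each feature, a typical exploratory step.
summary = df.groupby("label")[["x1", "x2"]].mean()

# Scikit-learn: fit a simple classifier on the prepared table.
model = LogisticRegression()
model.fit(df[["x1", "x2"]], df["label"])

# Classify two new, hypothetical points.
new_points = pd.DataFrame([[1.2, 0.7], [6.8, 5.9]], columns=["x1", "x2"])
predictions = model.predict(new_points)
print(summary)
print(predictions)
```

The same pattern (array, table, estimator) applies to the other algorithm families mentioned above; only the estimator class would change.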

Resources on Python for Data Science

Course to learn Data Science with Python

Data Science with Python