Pandas is an open-source data manipulation and analysis library for the Python programming language. Wes McKinney began developing it in 2008 while working at AQR Capital Management.
The name “pandas” is derived from “panel data”, an econometrics term for data sets that record observations on multiple entities over multiple time periods, as is common in economics and finance. The library was designed to make manipulating such data easy and efficient, and it quickly became popular among data scientists and analysts.
Pandas was released as open-source software in 2009. Early versions centered on two core data structures, Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table), along with functions for handling missing data, merging and joining data sets, and aggregating data.
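The Series and DataFrame structures, together with the missing-data, merging and aggregation tools, can be illustrated with a minimal sketch. It is written against a recent pandas release rather than the early versions, so the exact API differs from what shipped in 2009; the ticker symbols, share counts and sector labels are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; NaN marks a missing value.
prices = pd.Series([101.5, np.nan, 99.0], index=["AAPL", "MSFT", "IBM"])

# Handle missing data: the MSFT price becomes 100.25,
# the mean of the two observed prices (mean() skips NaN).
filled = prices.fillna(prices.mean())

# A DataFrame is a two-dimensional table with labeled columns.
trades = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "IBM"],
    "shares": [10, 5, 20, 8],
})
sectors = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "IBM"],
    "sector": ["Tech", "Tech", "Services"],
})

# Merge/join: attach each trade's sector using the shared "ticker" column.
merged = trades.merge(sectors, on="ticker", how="left")

# Aggregate: total shares traded per sector.
shares_by_sector = merged.groupby("sector")["shares"].sum()
print(shares_by_sector)
```

Every step returns a new object rather than modifying its input, which is the usual pandas style for chaining transformations.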
Over the next few years, pandas continued to evolve and gain popularity among data professionals. It is distributed through the Python Package Index (PyPI), so it can be installed with a single pip install pandas command.
Version 0.12, released in 2013, brought further improvements to the library’s performance and functionality. During this period pandas also offered higher-dimensional containers, Panel and the experimental Panel4D, for three- and four-dimensional data; both were later deprecated and removed in favor of DataFrames with a hierarchical MultiIndex or the separate xarray library.
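Versions 0.13 and 0.14 followed in January and May 2014, bringing further performance improvements, new features, and bug fixes. Version 0.15, released in October 2014, added a dedicated categorical data type (dtype "category"). This was a significant improvement: pandas could already hold strings and other non-numeric values in general-purpose object columns, but a categorical column stores each distinct value only once and represents the data as integer codes, which saves memory and allows the categories to have a meaningful order.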
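A short sketch shows what the categorical type does in practice. It assumes a recent pandas release (the pd.CategoricalDtype spelling postdates 0.15), and the survey answers are made-up sample data.

```python
import pandas as pd

# Survey answers drawn from a small, fixed set of labels.
answers = pd.Series(["low", "high", "medium", "low", "high", "low"])

# Convert to an ordered categorical: each distinct label is stored once,
# and the column itself holds small integer codes pointing at the labels.
levels = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
cat_answers = answers.astype(levels)

print(cat_answers.cat.categories.tolist())  # ['low', 'medium', 'high']
print(cat_answers.cat.codes.tolist())       # [0, 2, 1, 0, 2, 0]

# Because the categories are ordered, comparisons follow the declared
# order (low < medium < high) rather than alphabetical order.
print((cat_answers > "low").tolist())       # [False, True, True, False, True, False]
```

Declaring the order explicitly is what makes the comparison meaningful; with plain strings, "high" would sort before "low" alphabetically.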
Pandas continues to be actively developed, with new versions released regularly. Version 1.2, released in December 2020, added new functionality for handling missing data and for working with arrays and strings, along with improved performance for several common data manipulation tasks. Development has continued since then; pandas 2.0 was released in April 2023.
Pandas has become one of the most widely used data manipulation and analysis libraries in the Python ecosystem. It is used in many fields including finance, economics, social sciences, and engineering, and is an essential tool for data analysts and scientists.