Introduction
This report compares the performance of the Polars library with that of the Pandas library, both of which are part of the Python ecosystem. The experiments measure the time taken to read data files in several formats: CSV, JSON, Parquet and Excel.
The experiments also measure the time taken for column selection, row filtering and sorting of data frames.
Each experiment was repeated multiple times on the same dataset, and the resulting timings were compared statistically.
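The measurement approach described above can be sketched as follows. This is a minimal illustration and not the harness used in the report: the helper names (time_repeated, benchmark_libraries), the default of 30 repetitions, and the column and threshold parameters are all assumptions introduced for the example.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_repeated(fn: Callable[[], object], repeats: int = 30) -> List[float]:
    """Run fn `repeats` times and return each run's wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def benchmark_libraries(path: str, column: str, threshold: float,
                        repeats: int = 30) -> Dict[str, float]:
    """Return mean seconds per operation for Pandas and Polars.

    `path`, `column` and `threshold` are hypothetical: a CSV file, a numeric
    column in it, and a cut-off value used for the row filter.
    """
    # Imported lazily so time_repeated works without either library installed.
    import pandas as pd
    import polars as pl

    pd_df = pd.read_csv(path)
    pl_df = pl.read_csv(path)

    operations = {
        "pandas_read": lambda: pd.read_csv(path),
        "polars_read": lambda: pl.read_csv(path),
        "pandas_select": lambda: pd_df[[column]],
        "polars_select": lambda: pl_df.select(column),
        "pandas_filter": lambda: pd_df[pd_df[column] > threshold],
        "polars_filter": lambda: pl_df.filter(pl.col(column) > threshold),
        "pandas_sort": lambda: pd_df.sort_values(column),
        "polars_sort": lambda: pl_df.sort(column),
    }
    return {name: statistics.mean(time_repeated(fn, repeats))
            for name, fn in operations.items()}
```

Keeping the raw per-run durations from time_repeated, rather than only the mean, is what makes a later significance test on the two sets of timings possible.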
Key Takeaways
1. Comparison of Polars and Pandas performance
Polars outperformed Pandas in execution time for reading data, writing data and other dataframe operations. This result was validated by a t-test on repeated experiments on the same dataset.
2. Enhanced reading and writing speed
Data reading - The average time to read data from CSV files was 0.0234 seconds for Polars versus 0.3506 seconds for Pandas. Similarly, the average time to read data from an Excel file was 1.5653 seconds for Polars versus 23.2387 seconds for Pandas. In both cases Polars was roughly 15 times faster.
Data writing - Polars took an average of 0.0327 seconds to write CSV data, while Pandas took 0.0472 seconds. The t-tests returned p-values below 0.05, confirming that the advantage of Polars is statistically significant.
3. Trustworthy testing and methodology
The dataset consists of 100,000 customer records across 12 columns. Performance was compared on a machine with 16.0 GB of RAM (15.7 GB usable) and a 12th Gen Intel(R) Core(TM) i5-1235U 1.30 GHz processor. Repeating each experiment and applying a t-test to the timings supports the reliability of the results.
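The statistical validation mentioned in the takeaways can be illustrated with a short sketch. The report does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the function name welch_t is hypothetical. The function computes only the test statistic and the degrees of freedom, using the standard library.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test.

    `a` and `b` are two sets of repeated timings, for example the per-run
    durations recorded for Polars and for Pandas on the same operation.
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two measurements")

    # Squared standard error contributed by each sample.
    se_a = statistics.variance(a) / n_a
    se_b = statistics.variance(b) / n_b
    se_sq = se_a + se_b
    if se_sq == 0:
        raise ValueError("both samples have zero variance")

    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_sq)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = se_sq ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof
```

A negative statistic means the first sample has the lower mean time. To obtain the p-value that is compared against 0.05, one option is scipy.stats.ttest_ind(a, b, equal_var=False), which performs the same Welch test and reports the p-value directly.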
Conclusion
The comparison shows that Polars consistently outperformed Pandas in data reading, writing, selecting, filtering and sorting operations on a dataset of 100,000 customer records. Polars can therefore be recommended over Pandas for tasks where execution time is crucial.
How Acuity Knowledge Partners can help
This document showcases the research undertaken by our Data and Technology Solutions team to examine new frameworks and Python libraries. Our decade-long experience with Python programming and experimentation enables us to provide a variety of technology-related solutions.