Pandas is one of the most popular data manipulation packages in Python.
I am sure you are familiar with how to use the package, and you probably already know what the .apply method in Pandas is.
But here is a simple refresher visualization for you.
As you can see, the .apply method is the Pandas way to map a custom function onto a DataFrame.
Internally, it loops over the columns or rows and calls the custom function on each one in turn.
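Here is a minimal sketch of that behavior. The DataFrame and the value_range function are made up for illustration; they are not taken from the original experiment.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def value_range(values):
    """Custom function: receives one column or one row as a Series."""
    return values.max() - values.min()


# axis=0 (the default): the function is called once per column.
per_column = df.apply(value_range)

# axis=1: the function is called once per row.
per_row = df.apply(value_range, axis=1)

print(per_column)
print(per_row)
```

Each call to value_range happens in a Python-level loop, which is why the cost grows quickly with the number of rows when axis=1 is used.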
What is the difference between .apply and vectorization? Let’s see the following visualization.
Vectorization is an operation on whole columns at once: the operation is applied to the entire column directly instead of looping over the values one by one.
So, the question becomes: which one is faster, and when should we use each one?
Let’s run an experiment to see which one is faster, starting with a simple case: column addition.
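A column-addition comparison along these lines could look like the sketch below. The column names, the data size, and the helper function names are my assumptions, so the timings you get will differ from the figures quoted in this article.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows of random integers (size chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, size=10_000),
    "b": rng.integers(0, 100, size=10_000),
})


def add_with_apply(frame):
    # Row-wise .apply: the lambda is called once per row.
    return frame.apply(lambda row: row["a"] + row["b"], axis=1)


def add_vectorized(frame):
    # Vectorized: the whole columns are added in one operation.
    return frame["a"] + frame["b"]


apply_seconds = timeit.timeit(lambda: add_with_apply(df), number=3)
vectorized_seconds = timeit.timeit(lambda: add_vectorized(df), number=3)

print(f"apply:      {apply_seconds:.4f}s for 3 runs")
print(f"vectorized: {vectorized_seconds:.4f}s for 3 runs")
```

The exact ratio depends on the machine, the Pandas version, and the number of rows, so treat any single number as indicative only.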
As we can see, the .apply method is much slower than the vectorized operation. Specifically, the experiment shows a 420x time difference between them.
Let’s try a more complex case where we want to categorize values based on a condition.
For a conditional function, vectorization can be done with np.where.
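As a rough sketch of the idea, the example below labels a score as "pass" or "fail". The column name, the threshold of 50, and the labels are hypothetical and only stand in for whatever condition you actually need.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, only for illustration.
df = pd.DataFrame({"score": [35, 50, 72, 49, 90]})


def categorize(score):
    # Plain Python condition, evaluated one value at a time by .apply.
    return "pass" if score >= 50 else "fail"


# .apply version: calls categorize once per value.
df["result_apply"] = df["score"].apply(categorize)

# Vectorized version: np.where(condition, value_if_true, value_if_false)
# evaluates the condition on the whole column at once.
df["result_vectorized"] = np.where(df["score"] >= 50, "pass", "fail")

print(df)
```

Both columns hold the same labels; only the way they are computed differs.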
As expected, vectorization is much faster than .apply. In this experiment, it is around 136x faster.
The experiments showed that vectorization was far faster than the .apply method in both cases. But when should we use each one?
Let me summarize it for you.
Basically, we should use vectorization when speed is critical, but only when the operation can actually be expressed in vectorized form.
That’s all for today! I hope this helps you speed up your Pandas performance!
Are there any more things you would love to discuss? Let’s talk about it together!