numpy datetime64 comparison slower than pandas Timestamp
I’ve been quite surprised to find that comparing scalar numpy
datetime64 objects is significantly slower than comparing pandas
Timestamp objects. My understanding is that internally pd.Timestamp is using
datetime64[ns] so I’m a bit baffled as to how
pd.Timestamp is faster in this case.
Here’s my simple attempt at comparing the performance of a less-than comparison for each type.
    import pandas as pd
    import numpy as np

    # create datetime64 and timestamp objects
    dt1 = np.datetime64("1900-01-01", "ns")
    dt2 = np.datetime64("2020-01-01", "ns")
    ts1 = pd.Timestamp("1900-01-01")
    ts2 = pd.Timestamp("2020-01-01")

    # time datetime64 comparisons
    %%timeit
    for _ in range(1000000):
        _ = dt1 < dt2
    # NOTE: 3.07 s +/- 796 ms per loop

    # time Timestamp comparisons
    %%timeit
    for _ in range(1000000):
        _ = ts1 < ts2
    # NOTE: 125 ms +/- 6.2 ms per loop
It seems that pandas is approximately 25x faster here. I’ve tried looking at the source code but am not sufficiently familiar with C or Cython to understand what pandas might be doing to achieve such an improvement. I did look at this somewhat related question, but it’s quite old and the timings there were not consistent with what I found (quite possibly due to updates to the libraries over the last 6 years).
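For anyone who wants to reproduce this outside IPython, here is a rough sketch of the same measurement using the standard-library timeit module instead of the %%timeit magic. The helper name bench and the smaller iteration count are my own choices for illustration, and the absolute numbers will almost certainly differ across machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Same scalar objects as in the question.
dt1 = np.datetime64("1900-01-01", "ns")
dt2 = np.datetime64("2020-01-01", "ns")
ts1 = pd.Timestamp("1900-01-01")
ts2 = pd.Timestamp("2020-01-01")

# Fewer iterations than the original 1,000,000 so the script finishes quickly
# (an arbitrary choice for this sketch).
N = 100_000


def bench(a, b, number=N):
    """Return the total seconds taken to evaluate `a < b` `number` times."""
    return timeit.timeit("a < b", globals={"a": a, "b": b}, number=number)


t_np = bench(dt1, dt2)
t_pd = bench(ts1, ts2)

print(f"np.datetime64: {t_np:.4f} s for {N} comparisons")
print(f"pd.Timestamp:  {t_pd:.4f} s for {N} comparisons")
print(f"ratio (numpy / pandas): {t_np / t_pd:.1f}x")
```

Because timeit runs the bare expression rather than a Python for loop, the loop overhead is lower than in the cells above, so the ratio it reports may not match the 25x figure exactly.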