NumPy matrix creation timing oddity


My application requires a starting matrix in which each column is offset by one sample from the previous column. It will contain millions of complex numbers representing a signal, but a small example is:

array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10]])
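
For reference, the small example above can be reproduced with index broadcasting. This is only a sketch: the names idx, N, and Np below are chosen for illustration, and the input is a plain integer ramp rather than a real signal.

```python
import numpy as np

N, Np = 8, 4             # rows and columns of the small example
x = np.arange(N + Np)    # stand-in signal: 0, 1, 2, ...

# idx[i, j] = i + j, so X[i, j] = x[i + j]: each column is the
# previous column shifted by one sample.
idx = np.arange(N)[:, None] + np.arange(Np)[None, :]
X = x[idx]

print(X)
```

Fancy indexing returns a new array in row-major order, so no transpose is involved.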

I tried two creation methods, one fast and one slow. I don’t understand why the fast creation method makes the subsequent calculations run slowly, while the slow creation method makes them run faster. The function calcs() simply takes FFTs; it is the minimal code that demonstrates the issue I see in my actual signal processing code. A sample run yields:

python ex.py 
Slow Create, Fast Math
   57.90 ms, create
   36.79 ms, calcs()
   94.69 ms, total
Fast Create, Slow Math
   15.13 ms, create
  355.38 ms, calcs()
  370.50 ms, total

Code follows. Any insight would be appreciated!

import numpy as np
import time

N = 65536
Np = 64

# Random signal for demo.
x = np.random.randint(-50,50,N+Np) + 1j*np.random.randint(-50,50,N+Np)

def calcs(sig):
    np.fft.fft(sig)

print('Slow Create, Fast Math')
t0 = time.time()
X = np.zeros((N, Np), dtype=complex)
for col in range(Np):
    X[:,col] = x[col:col+N]
t1 = time.time()
calcs(X)
t2 = time.time()
print('  %6.2f ms, create' % (1e3 * (t1 - t0)))
print('  %6.2f ms, calcs()' % (1e3 * (t2 - t1)))
print('  %6.2f ms, total' % (1e3 * (t2 - t0)))

print('Fast Create, Slow Math')
t0 = time.time()
X = np.array([x[i:i+N] for i in range(Np)]).transpose()
t1 = time.time()
calcs(X)
t2 = time.time()
print('  %6.2f ms, create' % (1e3 * (t1 - t0)))
print('  %6.2f ms, calcs()' % (1e3 * (t2 - t1)))
print('  %6.2f ms, total' % (1e3 * (t2 - t0)))

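user3483203’s comment, above, answers the question. transpose() does not move any data; it returns a view whose rows are no longer contiguous in memory, and that appears to be what slows the FFT taken along the last axis. If I avoid the transpose by creating the matrix with: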

X = np.array([x[i:i+Np] for i in range(N)], dtype=complex)

subsequent calcs() timing is as expected. Thank you, user3483203!
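
The layout difference can be checked directly. The sketch below uses a smaller N than the question so it runs quickly, and the names X_t and X_c are mine. It inspects the flags and strides of the transposed matrix, then makes a row-major copy with np.ascontiguousarray, which is one possible way to keep the fast creation and still hand the FFT contiguous rows. I have not timed this variant on the original sizes.

```python
import numpy as np

N, Np = 1024, 8                      # smaller than the question's sizes
x = np.arange(N + Np) + 1j * np.arange(N + Np)

# "Fast create": stack shifted slices as rows, then transpose.
X_t = np.array([x[i:i+N] for i in range(Np)]).transpose()

# transpose() returns a view onto the original (Np, N) row-major buffer,
# so consecutive elements of one row of X_t are N * 16 bytes apart.
print(X_t.flags['C_CONTIGUOUS'])     # False
print(X_t.flags['F_CONTIGUOUS'])     # True
print(X_t.strides)                   # (16, 16384)

# One explicit copy puts each row of the (N, Np) matrix in adjacent memory.
X_c = np.ascontiguousarray(X_t)
print(X_c.flags['C_CONTIGUOUS'])     # True
print(X_c.strides)                   # (128, 16)

# Same values either way; only the memory layout differs.
print(np.array_equal(X_c, X_t))      # True
```

The copy costs one extra pass over the data, so whether it beats building the matrix row by row depends on the sizes involved.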