PyTorch Custom Dataset Class and DataLoader for Multivariate Time Series

I want to create a custom Dataset class and DataLoader in PyTorch that preprocess data from a pandas DataFrame with n rows (observations) and m columns (features).

What I specifically want is a DataLoader that loads tensors such that tensor.shape = torch.Size([1, num_features, num_sequence]), where num_features corresponds to the number of features (m) and num_sequence is the time window size w, i.e. each feature row holds that feature's values over the window. Furthermore, if I choose batch_size to be some number x, the DataLoader should return consecutive, non-overlapping windows such that (see the sketch after this list):

BatchIndex 1, tensor.size([1, num_feat, time_window with rows 1 to w])
BatchIndex 2, tensor.size([1, num_feat, time_window with rows w+1 to 2w])
...
BatchIndex X, tensor.size([1, num_feat, time_window with rows n-w+1 to n])
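
To make the idea concrete, here is a rough, untested sketch of the kind of Dataset I imagine (the class name WindowedDataset and the window argument are names I made up; I assume df contains only the feature columns and that leftover rows at the end are dropped):

import torch
from torch.utils.data import Dataset

class WindowedDataset(Dataset):
    # hypothetical sketch: one sample = one non-overlapping window of `window` rows
    def __init__(self, df, window):
        self.window = window
        self.data = torch.tensor(df.values, dtype=torch.float32)  # shape (n, m)

    def __len__(self):
        # number of full windows; leftover rows at the end are dropped
        return len(self.data) // self.window

    def __getitem__(self, index):
        start = index * self.window
        chunk = self.data[start:start + self.window]  # shape (window, m)
        # transpose so features come first: (num_features, window);
        # a DataLoader then stacks these into (batch_size, num_features, window)
        return chunk.t()

With batch_size = 1, a DataLoader over this dataset would then yield tensors of shape torch.Size([1, num_features, w]), which is exactly the layout described above.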

So far, I have only managed to create a class that loads one feature at a time, and within each batch the window start is shifted by one row, such that:

BatchIndex 1: Tensor([1,2,3], [2,3,4], [3,4,5])
BatchIndex 2: Tensor([4,5,6], [5,6,7], [6,7,8])
etc.

by using the following code:

import torch
from torch.utils.data import Dataset, DataLoader

class Training_Prep(Dataset):
    def __init__(self, df_train):
        # keep only the single feature column
        self.mytraindata = df_train[["value"]]

    def __len__(self):
        # one sample per possible window start; window length is hard-coded to 60
        return len(self.mytraindata) - 60

    def __getitem__(self, index):
        # slice 60 consecutive rows in one call instead of a Python loop
        window = self.mytraindata.iloc[index:index + 60].values
        return torch.tensor(window, dtype=torch.float32)  # shape (60, 1)

def setup_data_loader(batch_size, use_cuda=False):
    # pass worker and memory-pinning options through to the DataLoader
    kwargs = {"num_workers": 0, "pin_memory": use_cuda}
    traindata = Training_Prep(df_train=trainset)  # trainset is a global DataFrame

    train_loader = DataLoader(traindata,
                              batch_size=batch_size,
                              shuffle=False,
                              drop_last=True,
                              **kwargs)

    for index, data in enumerate(train_loader):
        print('BatchIndex {}, data.shape {}'.format(index, data.shape))

    return train_loader
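
For reference, if I wire the WindowedDataset sketch from above into the same kind of loop (using made-up random data with 600 rows and 3 feature columns), I would expect non-overlapping windows like this:

import torch
import pandas as pd
from torch.utils.data import DataLoader

# made-up demo data: 600 observations, 3 features
df_demo = pd.DataFrame(torch.randn(600, 3).numpy(), columns=["f1", "f2", "f3"])
demo_loader = DataLoader(WindowedDataset(df_demo, window=60),
                         batch_size=1, shuffle=False, drop_last=True)
for index, data in enumerate(demo_loader):
    print('BatchIndex {}, data.shape {}'.format(index, data.shape))
# expected: BatchIndex 0 ... BatchIndex 9, each with data.shape torch.Size([1, 3, 60])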

Does anybody have an idea how to approach this?
