Regressor with LSTM layer keeps returning same value

If I run the following code, I get an array of identical predicted values.

Basically, my input to the regressor is an array of 100 consecutive numbers (0, 1, 2, …, 99), and I expect the output to be the next number, 100. I build many such windows in sequence, as you can see in the code.
The code should be runnable as-is. What am I doing wrong, and why are the expected result and the actual outcome different?


The code:

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

from keras.layers import Dense
from keras.layers import LSTM
from keras.models import Sequential
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
from datetime import timedelta
from time import mktime


my_data = []
for i in range(0, 1000):
    my_data.append(i)

X_train = []
y_train = []

np_data = np.array(my_data)

for i in range(0, np_data.size - 100 ):
    X_train.append(np_data[i : i+100])
    y_train.append(np_data[i+100])

X_train, y_train = np.array(X_train), np.array(y_train)

X_train = np.reshape(X_train, [X_train.shape[0], X_train.shape[1], 1])

regressor = Sequential()

regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))

regressor.add(Dropout(0.2))


regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))

regressor.compile(optimizer="adam", loss="mean_squared_error")

regressor.fit(X_train, y_train, epochs=5, batch_size=32)


X_test = []
y_test = []

my_data = []
for i in range(1000, 1500):
    my_data.append(i)

np_data = np.array(my_data)

for i in range(0, np_data.size - 100 ):
    X_test.append(np_data[i : i+100])
    y_test.append(np_data[i+100])

X_test = np.array(X_test)

X_test = np.reshape(X_test, [X_test.shape[0], X_test.shape[1], 1])

predicted = regressor.predict(X_test)


plt.plot(y_test, color="#ffd700", label = "Real Data")
plt.plot(predicted, color="#1fb864", label = "Predicted Data")

plt.title(" Price Prediction")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.legend()
plt.show()

As I explained in the comment, this is a simple linear problem, so you can use plain linear regression. If you want to use Keras/TensorFlow, you can build a model with a single Dense layer; here is code that works:

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import optimizers
from keras.layers import Dense
from keras.layers import LSTM
from keras.models import Sequential
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
from datetime import timedelta
from time import mktime

my_data = []
for i in range(0, 1000):
    my_data.append(i)

X_train = []
y_train = []

np_data = np.array(my_data)

for i in range(0, np_data.size - 100):
    X_train.append(np_data[i: i + 100])
    y_train.append(np_data[i + 100])

X_train, y_train = np.array(X_train), np.array(y_train)

X_train = np.reshape(X_train, [X_train.shape[0], X_train.shape[1]])

regressor = Sequential()

regressor.add(Dense(units=1, input_shape=(X_train.shape[1],)))

regressor.compile(optimizer=optimizers.Adam(learning_rate=0.1), loss="mean_squared_error")

regressor.fit(X_train, y_train, epochs=1000, batch_size=len(X_train))

X_test = []
y_test = []

my_data = []
for i in range(1000, 1500):
    my_data.append(i)

np_data = np.array(my_data)

for i in range(0, np_data.size - 100):
    X_test.append(np_data[i: i + 100])
    y_test.append(np_data[i + 100])

X_test = np.array(X_test)

X_test = np.reshape(X_test, [X_test.shape[0], X_test.shape[1]])

predicted = regressor.predict(X_test)

plt.plot(y_test, color="#ffd700", label="Real Data")
plt.plot(predicted, color="#1fb864", label="Predicted Data")

plt.title(" Price Prediction")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.legend()
plt.show()

The code above produces the desired prediction. Here are the changes I made:

  1. Changed the model to a single Dense layer; as explained, the relationship is linear (see the scikit-learn sketch after this list).
  2. Increased the batch size. This is just for faster training; you can reduce it if you want, but then you need to decrease the learning rate and increase the number of epochs at the same time.
  3. Increased the epochs to 1000. Each input window contains mostly redundant information (only the last value of each X is actually needed), so it takes relatively more epochs to learn. In fact, thousands or even tens of thousands of epochs are common when doing linear regression like this, since each epoch is very fast anyway.
  4. Reshaped the data to (num_samples, num_features), which is what a Dense layer expects.
  5. Increased the learning rate, again just to learn faster.
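For comparison, here is a minimal scikit-learn sketch (my addition, not part of the original answer) that solves the same problem in closed form, reusing the 2-D X_train, y_train, and X_test arrays built above; no epochs or learning-rate tuning needed:

from sklearn.linear_model import LinearRegression

# Ordinary least squares finds the exact linear fit in one shot.
lin = LinearRegression()
lin.fit(X_train, y_train)          # X_train shape: (900, 100)

print(lin.predict(X_test[:3]))     # expected: values close to [1100, 1101, 1102]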

I only modified enough to prove the point and didn't tune any other parameters. I'm sure you can add regularizers, change the learning rate, and so on to make training faster and easier, but honestly I don't think it's worth the time: predicting linear relationships is really not what deep learning is for.
Hope this helps; feel free to comment if anything is still unclear 🙂

Your model is absolutely overkill for this problem, but that is not the issue!
We want to predict a linear function, which can be represented with only 2 parameters (predicted = model(x) = param1 + param2 * x). A model with a single neuron (one weight plus a bias) should be enough.
Here your model has 91,251 parameters!
A model built from LSTM layers and a model built from Dense layers are both expressive enough for this task, so an LSTM model is able to achieve the same result as a Dense model and vice versa. (An LSTM can normally be trained to achieve the same result as a Dense model.)
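A quick way to verify that count: a Keras LSTM layer has 4 * units * (units + input_dim + 1) trainable parameters (four gates, each with an input kernel, a recurrent kernel, and a bias). A minimal sketch of the arithmetic for the model above:

def lstm_params(units, input_dim):
    # Four gates, each with input kernel, recurrent kernel and bias
    return 4 * units * (units + input_dim + 1)

total = (lstm_params(50, 1)         # first LSTM layer, 1 feature in: 10,400
         + 4 * lstm_params(50, 50)  # four stacked LSTM layers: 4 * 20,200
         + 50 + 1)                  # Dense(1) on 50 units: 50 weights + 1 bias

print(total)  # 91251, matching regressor.summary()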

There are several issues in your code, and some best practices are not followed.

This type of problem is called "time-series forecasting"; there are a lot of great articles on the internet if you want to investigate the topic further.

First of all, always scale your data!
Unscaled data makes training more difficult.
Typically, for regression problems, the dataset is scaled to between 0 and 1, so just divide your data by the maximum value in np_data.
Extremely high values of the loss function, such as of the mean_squared_error here, should be a hint that the data the model receives is not scaled.
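If you prefer a reusable transform over dividing by max_value by hand, sklearn's MinMaxScaler (already imported in the question's code) does the same job and remembers the training range so predictions can be mapped back; a minimal sketch:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                          # scales each column to [0, 1]
train = np.arange(1000, dtype=float).reshape(-1, 1)
scaled = scaler.fit_transform(train)             # fit on training data only

# ... build windows and train on `scaled` instead of the raw values ...

restored = scaler.inverse_transform(scaled)      # map model output back to the original range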

For a model using LSTM layers, reshape X_train and y_train:

  • X_train should have the shape (dataset_size, n_past, n_feature)
  • y_train should have the shape (dataset_size, n_future, n_feature)

Where:

  • n_feature: the number of different variables the model is given to make its prediction. For example, if you want to predict the average temperature of the next day given the average pressure, average temperature, and precipitation of the last N days, then n_feature is 3 ("multivariate time-series forecasting").
  • n_past: the number of past entries given to the model.
  • n_future: the number of future steps you want to predict ("multi-step time-series forecasting").

(Note: n_feature in X_train and y_train need not be the same.)

Here:

  • n_past: 100 (which is overkill; I reduce it to 4 in my code to speed up training)
  • n_future: 1, because you want to predict only one number; you could instead predict, say, the next 10 numbers, but then you obviously need to change the way you create y_train to match the shape (dataset_len, 10, 1) (see the windowing sketch below)
  • n_feature: 1
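To make those shapes concrete, here is a minimal windowing sketch (make_windows is a name I made up for illustration) that produces X with shape (dataset_size, n_past, n_feature) and y with shape (dataset_size, n_future, n_feature), covering the multi-step case as well:

import numpy as np

def make_windows(series, n_past, n_future, n_feature=1):
    # Slide a window over the series: n_past inputs predict the next n_future values.
    X, y = [], []
    for i in range(len(series) - n_past - n_future + 1):
        X.append(series[i : i + n_past])
        y.append(series[i + n_past : i + n_past + n_future])
    X = np.array(X).reshape(-1, n_past, n_feature)
    y = np.array(y).reshape(-1, n_future, n_feature)
    return X, y

X, y = make_windows(np.arange(1000), n_past=4, n_future=10)
print(X.shape, y.shape)  # (987, 4, 1) (987, 10, 1)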

Start with a simpler model:
The number of hidden layers, the number of neurons, and (for an LSTM) n_past are hyperparameters, just like the optimizer, learning rate, batch size, weight and bias initialization...
So start simple and increase the model's complexity only if it cannot reach your goal.

Increase the number of training epochs.

Watch the behavior of the loss function during training: the goal is for it to converge towards 0.

Set aside a validation set to monitor overfitting during training.

import numpy as np
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Dense, LSTM

my_data = []
for i in range(0, 1000):
    my_data.append(i)

X_train = []
y_train = []

np_data = np.array(my_data)

# last 4 values to predict the next one
n_past = 4
n_future = 1
n_feature = 1

for i in range(0, np_data.size - n_past):
    X_train.append(np_data[i : i + n_past])
    y_train.append(np_data[i + n_past])

X_train, y_train = np.array(X_train), np.array(y_train)

# Reshape
X_train = np.reshape(X_train, [X_train.shape[0], n_past, n_feature])
y_train = np.reshape(y_train, [y_train.shape[0], n_future, n_feature])

# Rescale dataset to (0, 1]
max_value = np.max(np_data)
X_train = X_train / max_value
y_train = y_train / max_value

# A simpler model (still overkill for a linear function anyway)
# No Dropout because we don't yet know whether the model is overfitting
regressor = Sequential()
regressor.add(LSTM(units=16, return_sequences=True, input_shape=(n_past, n_feature)))
regressor.add(LSTM(units=16, return_sequences=True))
regressor.add(LSTM(units=16))
regressor.add(Dense(units=1))

regressor.compile(optimizer="Adam", loss="mse")

# Summarize the model to check that the layers fit together correctly
regressor.summary()

# validation_split=0.2: 20% of X_train and y_train are held out to validate the model
history = regressor.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)

# Plot the training
plt.plot(history.history["loss"], color="red", label = "Traning loss")
plt.plot(history.history["val_loss"], color="green", label = "Validation loss")
plt.title("Training")
plt.xlabel("Epoch")
plt.ylabel("mse")
plt.legend()
plt.show()


# Make one test prediction
test_i = 12
# Indexing with [test_i] drops the first dimension, so reshape back to (batch_size=1, n_past, n_feature) before predicting
data = X_train[test_i].reshape(1, n_past, 1)
expected = y_train[test_i]
predicted = regressor.predict(data)
# Multiply by max_value to rescale back to the original range
print(f"data: {data.reshape(-1,) * max_value}\nExpected: {expected * max_value}\nPredicted: {predicted[0] * max_value}")


X_test = []
y_test = []
my_data = []
for i in range(1000, 1500):
    my_data.append(i)

np_data = np.array(my_data)

for i in range(0, np_data.size - n_past):
    X_test.append(np_data[i : i + n_past])
    y_test.append(np_data[i + n_past])

X_test = np.array(X_test)
X_test = np.reshape(X_test, [X_test.shape[0], n_past, n_feature])

# scale the data with the max_value of the training
X_test = X_test / max_value
predicted = regressor.predict(X_test)

# rescale the prediction
predicted = predicted * max_value

plt.plot(y_test, color="#ffd700", label = "Real Data")
plt.plot(predicted, color="#1fb864", label = "Predicted Data")

plt.title(" Price Prediction")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.legend()
plt.show()

[plot: training and validation loss over the epochs]
[plot: real vs. predicted data]