Questions tagged [lstm]

LSTM stands for Long Short-Term Memory. Most of the time, the term refers either to a type of recurrent neural network or to a block (layer) within a larger network.

LSTM (Long Short-Term Memory)

LSTM is a specialized type of Recurrent Neural Network (RNN) architecture designed to address the vanishing gradient problem that affects standard RNNs. Introduced by Hochreiter and Schmidhuber in 1997, LSTMs can learn long-term dependencies in sequential data.

Key Characteristics

  • Memory Cell: Contains a cell state that acts as a conveyor belt of information flowing through the network
  • Gating Mechanism: Uses three gates (input, forget, and output) to regulate information flow
  • Long-term Dependencies: Effectively captures relationships between elements separated by many time steps
  • Gradient Control: Special architecture prevents vanishing/exploding gradients common in vanilla RNNs

Applications

  • Natural Language Processing (text generation, machine translation)
  • Time Series Analysis and Prediction
  • Speech Recognition
  • Music Generation
  • Video Analysis
  • Anomaly Detection in sequential data

Technical Details

LSTMs process sequences through a chain of repeating modules. Each module contains:

  • Forget Gate: Decides what information to discard from cell state
  • Input Gate: Controls what new information enters the cell state
  • Output Gate: Determines what parts of the cell state are output

Their ability to selectively remember or forget information makes LSTMs particularly effective for sequential data with long-range dependencies.
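
As a worked illustration of the three gates, here is a minimal NumPy sketch of a single LSTM time step; stacking the four weight blocks into one matrix W is a common convention, not the only one.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        # W stacks the four gate weight blocks, shape (4*H, D+H); b has shape (4*H,).
        H = h_prev.shape[0]
        z = W @ np.concatenate([x_t, h_prev]) + b
        f = sigmoid(z[0:H])        # forget gate: what to drop from the old cell state
        i = sigmoid(z[H:2*H])      # input gate: what new information to write
        g = np.tanh(z[2*H:3*H])    # candidate values for the cell state
        o = sigmoid(z[3*H:4*H])    # output gate: what part of the cell state to emit
        c_t = f * c_prev + i * g   # the "conveyor belt": old memory kept + new memory added
        h_t = o * np.tanh(c_t)     # hidden state passed to the next step
        return h_t, c_t

    # Tiny demo with random weights (D=3 inputs, H=2 hidden units):
    rng = np.random.default_rng(0)
    D, H = 3, 2
    W, b = rng.normal(size=(4 * H, D + H)), np.zeros(4 * H)
    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)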

Learning Resources

Foundational Papers

  • Hochreiter, S., & Schmidhuber, J. (1997). "Long Short-Term Memory". Neural Computation, 9(8), 1735–1780.

Books

  • "Deep Learning" by Goodfellow, Bengio, and Courville (Chapter on Sequence Modeling)
  • "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron

1136 questions
184 votes · 6 answers

When to use GRU over LSTM?

The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates) whereas an LSTM has three gates (namely input, output and forget gates). Why do we make use of a GRU when we clearly have more control over the network…
asked by Sayali Sonawane (2,101)
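
The gate count shows up directly in the parameter count, which a quick Keras comparison makes concrete (the layer sizes below are arbitrary, chosen only for illustration):

    import tensorflow as tf

    units, features = 32, 8

    lstm = tf.keras.layers.LSTM(units)   # 3 gates + candidate -> 4 weight blocks
    gru = tf.keras.layers.GRU(units)     # 2 gates + candidate -> 3 weight blocks

    x = tf.zeros((1, 5, features))       # dummy batch just to build the layers
    lstm(x)
    gru(x)

    print("LSTM params:", lstm.count_params())  # 4 * units * (features + units + 1)
    print("GRU params:", gru.count_params())    # about 3/4 of the LSTM's count
                                                # (exact figure depends on the bias convention)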
38 votes · 1 answer

Time Series prediction using LSTMs: Importance of making time series stationary

In this link on Stationarity and differencing, it is mentioned that models like ARIMA require a stationarized time series for forecasting, as its statistical properties (mean, variance, autocorrelation, etc.) are constant over time. Since…
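
For reference, a small pandas sketch of first-order differencing, the usual way to stationarize a trending series for ARIMA-style models (the numbers are made up); for an LSTM, differencing is optional, but scaling usually still helps:

    import pandas as pd

    # Hypothetical series with a trend; first-order differencing removes it.
    s = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])
    diff = s.diff().dropna()   # y_t - y_{t-1}
    print(diff.head())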
35 votes · 6 answers

Validation loss is not decreasing

I am trying to train an LSTM model. Is this model suffering from overfitting? Here is the train and validation loss graph:
asked by DukeLover (601)
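
A common guard against the pattern the question describes (training loss falling while validation loss stalls or rises) is early stopping; a hedged Keras sketch, with hypothetical model and data names left commented out:

    import tensorflow as tf

    # Stop once validation loss has not improved for 5 epochs and restore
    # the best weights seen so far.
    early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                             restore_best_weights=True)
    # history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #                     epochs=100, callbacks=[early])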
31 votes · 2 answers

How to feed an LSTM with different input array sizes?

If I'd like to write an LSTM network and feed it inputs of different array sizes, how is that possible? For example, I want to take voice messages or text messages in a different language and translate them. So the first input may be "hello" but the…
asked by user3486308 (1,310)
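
The standard answer is padding plus masking, so the recurrent layer skips the padded steps; a minimal Keras sketch with a hypothetical integer-encoded vocabulary:

    import tensorflow as tf
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Hypothetical token sequences of different lengths.
    seqs = [[4, 7, 1], [9, 2], [5, 3, 8, 6, 2]]
    x = pad_sequences(seqs, padding="post")   # zero-pad to the longest sequence

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10, output_dim=8, mask_zero=True),
        tf.keras.layers.LSTM(16),             # masked (padded) steps are skipped
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    print(model(x).shape)   # (3, 1): one prediction per variable-length sequence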
26 votes · 2 answers

What's the difference between the cell and hidden state in LSTM?

LSTM cells maintain two types of state, the cell state and the hidden state. How do the cell and hidden states differ in terms of their functionality? What information do they carry?
asked by user105907
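
A quick way to see both states in Keras is return_state=True (sizes below are arbitrary):

    import tensorflow as tf

    x = tf.random.normal((2, 5, 8))   # (batch, timesteps, features)
    out, h, c = tf.keras.layers.LSTM(16, return_state=True)(x)
    print(out.shape, h.shape, c.shape)   # (2, 16) (2, 16) (2, 16)
    # `out` equals `h` here: the hidden state h is what the layer exposes,
    # while the cell state c is the internal memory carried along the sequence.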
25 votes · 4 answers

What does the output of the model.predict function from Keras mean?

I have built an LSTM model to predict duplicate questions on the official Quora dataset. The test labels are 0 or 1; 1 indicates the question pair is a duplicate. After building the model using model.fit, I test the model using model.predict on the…
asked by Dookoto_Sea (361)
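
For a binary classifier ending in a sigmoid unit, model.predict returns one probability per sample; a tiny sketch with made-up numbers standing in for the model output:

    import numpy as np

    # Hypothetical stand-in for model.predict(X_test): one probability of
    # "duplicate" per question pair.
    probs = np.array([[0.91], [0.07], [0.55]])
    labels = (probs > 0.5).astype(int)   # threshold probabilities into 0/1 labels
    print(labels.ravel())                # [1 0 1]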
24 votes · 2 answers

Sliding window leads to overfitting in LSTM?

Will I overfit my LSTM if I train it via the sliding-window approach? Why do people not seem to use it for LSTMs? For a simplified example, assume that we have to predict the sequence of characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y…
asked by Kari (2,756)
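
For concreteness, a NumPy sketch of how such overlapping windows are typically built (make_windows is a hypothetical helper):

    import numpy as np

    def make_windows(seq, window):
        # Turn a sequence into (input window, next element) training pairs.
        X = np.stack([seq[i:i + window] for i in range(len(seq) - window)])
        y = seq[window:]
        return X, y

    seq = np.arange(26)        # stand-in for the A..Z character sequence
    X, y = make_windows(seq, 5)
    print(X.shape, y.shape)    # (21, 5) (21,)
    # Overlapping windows show the network the same transitions many times,
    # which can encourage memorization if windows are short relative to the patterns.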
22 votes · 2 answers

What is the job of "RepeatVector" and "TimeDistributed"?

I read about them in the Keras documentation and on other websites, but I couldn't understand exactly what they do or how we should use them when designing many-to-many or encoder-decoder LSTM networks. I saw them used in the solution of this…
asked by user3486308 (1,310)
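
A minimal encoder-decoder sketch showing where the two layers usually sit (sizes are arbitrary): RepeatVector bridges a single encoder vector to a sequence, and TimeDistributed applies the same Dense layer at every output step.

    import tensorflow as tf

    steps_in, steps_out, feats = 6, 4, 3

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(steps_in, feats)),        # encoder: sequence -> one vector
        tf.keras.layers.RepeatVector(steps_out),                        # copy that vector steps_out times
        tf.keras.layers.LSTM(32, return_sequences=True),                # decoder: one state per output step
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(feats)),  # same Dense applied to every step
    ])
    model.summary()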
22 votes · 1 answer

Understanding timesteps and batch size of a Keras LSTM, considering hidden states and TBPTT

What I am trying to do is predict the next data point $x_t$ for each point in the time series $[x_0, x_1, x_2, \dots, x_T]$, in the context of a real-time data stream; in theory the series is infinite. If a new value $x$ is…
asked by KenMarsu
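
One common tf.keras pattern for such a stream is a stateful LSTM, sketched below under the assumption of a fixed batch size; consecutive batches must then contain consecutive chunks of the same streams:

    import tensorflow as tf

    # Hidden and cell states carry over between batches instead of being
    # reset, which is how truncated BPTT over a long stream is usually set up.
    batch, timesteps, feats = 4, 10, 1
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, stateful=True,
                             batch_input_shape=(batch, timesteps, feats)),
        tf.keras.layers.Dense(1),
    ])
    model.reset_states()   # call this between independent passes over the data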
21 votes · 3 answers

What are LSTM and BiLSTM, and when to use them?

I am very new to deep learning and I am particularly interested in knowing what LSTM and BiLSTM are and when to use them (major application areas). Why are LSTM and BiLSTM more popular than plain RNNs? Can we use these deep learning architectures in…
asked by Volka (731)
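
A one-line illustration of the BiLSTM idea in Keras (sizes arbitrary): the forward and backward passes over the sequence are concatenated, doubling the output width.

    import tensorflow as tf

    layer = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))
    x = tf.random.normal((2, 7, 5))   # (batch, timesteps, features)
    print(layer(x).shape)             # (2, 32): forward 16 + backward 16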
19 votes · 3 answers

Advantages of stacking LSTMs?

I'm wondering: in what situations is it advantageous to stack LSTMs?
asked by Vadim Smolyakov (656)
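
Mechanically, stacking in Keras just requires return_sequences=True on every LSTM except the last; a minimal sketch with arbitrary sizes:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, return_sequences=True, input_shape=(20, 8)),
        tf.keras.layers.LSTM(32),   # last layer returns only its final state
        tf.keras.layers.Dense(1),
    ])
    model.summary()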
17 votes · 1 answer

Multi-dimensional and multivariate time-series forecasting (RNN/LSTM) in Keras

I have been trying to understand how to represent and shape data to make a multidimensional and multivariate time-series forecast using Keras (or TensorFlow), but I am still very unclear after reading many blog posts/tutorials/documentation about how…
asked by Bastien (273)
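
The shape Keras recurrent layers expect is (samples, timesteps, features); a NumPy sketch with hypothetical sizes (1000 observations of 5 variables, 24-step windows, one-step-ahead targets):

    import numpy as np

    data = np.random.rand(1000, 5)
    window = 24
    X = np.stack([data[i:i + window] for i in range(len(data) - window)])
    y = data[window:]                 # predict all 5 variables one step ahead
    print(X.shape, y.shape)           # (976, 24, 5) (976, 5)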
17 votes · 5 answers

Prediction interval around LSTM time series forecast

Is there a method to calculate the prediction interval (probability distribution) around a time series forecast from an LSTM (or other recurrent) neural network? Say, for example, I am predicting 10 samples into the future (t+1 to t+10), based on…
asked by 4Oh4 (308)
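
One common approximation, though by no means the only method, is Monte Carlo dropout: keep dropout active at inference and read an interval off the spread of repeated forecasts. A sketch, assuming a hypothetical model that contains dropout layers:

    import numpy as np

    def mc_interval(model, x, n=100):
        # Calling with training=True keeps dropout on, so each forward
        # pass gives a different stochastic forecast.
        preds = np.stack([model(x, training=True).numpy() for _ in range(n)])
        lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
        return preds.mean(axis=0), lo, hi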
16 votes · 2 answers

Dropout on which layers of LSTM?

Using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as the output Dense layers? In Hinton's paper (which proposed Dropout) he only put Dropout on the Dense layers, but that was because the hidden inner…
asked by BigBadMe (760)
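
In Keras the two placements are separate arguments on each recurrent layer: dropout for the inputs and recurrent_dropout for the hidden-to-hidden connections (the RNN-specific variant due to Gal and Ghahramani). A sketch with arbitrary sizes:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, return_sequences=True,
                             dropout=0.2, recurrent_dropout=0.2,
                             input_shape=(30, 10)),
        tf.keras.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2),
        tf.keras.layers.Dense(1),
    ])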
14 votes · 2 answers

How to implement "one-to-many" and "many-to-many" sequence prediction in Keras?

I struggle to interpret the difference in Keras code between one-to-many (e.g. classification of single images) and many-to-many (e.g. classification of image sequences) sequence labeling. I frequently see two different kinds of code: Type 1 is where…
asked by Hendrik (8,767)
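
The switch between the two output shapes is return_sequences; a minimal sketch with arbitrary sizes:

    import tensorflow as tf

    x = tf.random.normal((2, 10, 8))   # (batch, timesteps, features)

    # many-to-one: only the final hidden state comes out
    print(tf.keras.layers.LSTM(16)(x).shape)                         # (2, 16)

    # many-to-many: return_sequences=True emits one output per time step
    print(tf.keras.layers.LSTM(16, return_sequences=True)(x).shape)  # (2, 10, 16)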