Recurrent Networks

Definition

Neural architectures that process sequences by maintaining a hidden state that is updated at each time step; GRUs and LSTMs are gated variants that address the vanishing gradient problem in plain RNNs.

Intuition

A plain RNN passes a single state vector forward through time, but gradients vanish over long sequences; gates in GRU/LSTM selectively remember or forget information, allowing gradients to flow across hundreds of time steps.

Formal Description

RNN: a_t = g(W_aa a_{t-1} + W_ax x_t + b_a), ŷ_t = g(W_ya a_t + b_y); the same weights are shared across all time steps; types: many-to-many (machine translation via seq2seq), many-to-one (sentiment), one-to-many (text generation).
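The recurrence above can be sketched in a few lines of NumPy; the dimensions and the ×0.1 initialization scale are illustrative assumptions, not part of the definition.

```python
import numpy as np

def rnn_step(a_prev, x_t, W_aa, W_ax, W_ya, b_a, b_y):
    """One vanilla RNN step: a_t = tanh(W_aa a_{t-1} + W_ax x_t + b_a)."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # hidden-state update
    y_t = W_ya @ a_t + b_y                           # output (pre-softmax logits)
    return a_t, y_t

# Unroll over a toy sequence; the SAME weights are reused at every step.
rng = np.random.default_rng(0)
n_a, n_x, n_y = 4, 3, 2                     # illustrative sizes
W_aa = rng.standard_normal((n_a, n_a)) * 0.1
W_ax = rng.standard_normal((n_a, n_x)) * 0.1
W_ya = rng.standard_normal((n_y, n_a)) * 0.1
b_a, b_y = np.zeros(n_a), np.zeros(n_y)

a_t = np.zeros(n_a)                         # initial hidden state
for x_t in rng.standard_normal((5, n_x)):   # sequence of length 5
    a_t, y_t = rnn_step(a_t, x_t, W_aa, W_ax, W_ya, b_a, b_y)
```

Because the only path from step 1 to step T runs through repeated multiplications by W_aa inside tanh, gradients shrink as T grows, which is the vanishing-gradient problem the gated variants address.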

GRU: introduces update gate Γ_u = σ(W_u[c_{t-1}, x_t] + b_u) and relevance gate Γ_r = σ(W_r[c_{t-1}, x_t] + b_r); candidate: c̃_t = tanh(W_c[Γ_r ⊙ c_{t-1}, x_t] + b_c); update: c_t = Γ_u ⊙ c̃_t + (1 − Γ_u) ⊙ c_{t-1}; gate values near 0/1 allow exact copying of state across many steps (bypasses vanishing gradient).
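A minimal sketch of one GRU step, including a demonstration of the copying behaviour: driving the update gate toward 0 with a large negative bias (the −50 here is an artificial choice for the demo) makes c_t pass through numerically unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, W_u, W_r, W_c, b_u, b_r, b_c):
    """One GRU step; gate names follow the text (update gate, relevance gate)."""
    z = np.concatenate([c_prev, x_t])
    gamma_u = sigmoid(W_u @ z + b_u)                     # update gate in (0, 1)
    gamma_r = sigmoid(W_r @ z + b_r)                     # relevance gate in (0, 1)
    zr = np.concatenate([gamma_r * c_prev, x_t])
    c_tilde = np.tanh(W_c @ zr + b_c)                    # candidate state
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev  # interpolate old/new

# Demo: a saturated update gate (Γ_u ≈ 0) copies the state exactly.
rng = np.random.default_rng(0)
n_c, n_x = 3, 2                                  # illustrative sizes
W_u, W_r, W_c = (rng.standard_normal((n_c, n_c + n_x)) for _ in range(3))
b_u = np.full(n_c, -50.0)                        # force Γ_u ≈ 0 -> copy state
b_r, b_c = np.zeros(n_c), np.zeros(n_c)
c_prev, x_t = rng.standard_normal(n_c), rng.standard_normal(n_x)
c_t = gru_step(c_prev, x_t, W_u, W_r, W_c, b_u, b_r, b_c)
```

Since c_t ≈ c_prev whenever Γ_u ≈ 0, the gradient with respect to c_prev is ≈ 1 along that path, which is exactly the bypass of the vanishing gradient mentioned above.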

LSTM: three gates — forget Γ_f = σ(W_f[a_{t-1}, x_t] + b_f), update Γ_u = σ(W_u[a_{t-1}, x_t] + b_u), output Γ_o = σ(W_o[a_{t-1}, x_t] + b_o); separate cell state c_t and hidden state a_t; c_t = Γ_u ⊙ c̃_t + Γ_f ⊙ c_{t-1} with candidate c̃_t = tanh(W_c[a_{t-1}, x_t] + b_c); a_t = Γ_o ⊙ tanh(c_t); LSTM has more parameters than GRU but historically stronger on many tasks.
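The same kind of sketch for one LSTM step. The saturated biases in the demo (+50 forget, −50 update) are artificial values chosen to show the cell state carrying through unchanged; real models learn these gates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, W, b):
    """One LSTM step; W and b are dicts keyed by gate name ('f', 'u', 'o', 'c')."""
    z = np.concatenate([a_prev, x_t])
    gamma_f = sigmoid(W['f'] @ z + b['f'])       # forget gate
    gamma_u = sigmoid(W['u'] @ z + b['u'])       # update (input) gate
    gamma_o = sigmoid(W['o'] @ z + b['o'])       # output gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])       # candidate cell state
    c_t = gamma_u * c_tilde + gamma_f * c_prev   # separate cell state...
    a_t = gamma_o * np.tanh(c_t)                 # ...feeds the hidden state
    return a_t, c_t

# Demo: forget gate saturated open, update gate saturated shut
# -> the cell state is carried through this step unchanged.
rng = np.random.default_rng(0)
n_a, n_x = 3, 2                                  # illustrative sizes
W = {k: rng.standard_normal((n_a, n_a + n_x)) for k in 'fuoc'}
b = {'f': np.full(n_a, 50.0), 'u': np.full(n_a, -50.0),
     'o': np.zeros(n_a), 'c': np.zeros(n_a)}
a_prev, c_prev = rng.standard_normal(n_a), rng.standard_normal(n_a)
x_t = rng.standard_normal(n_x)
a_t, c_t = lstm_step(a_prev, c_prev, x_t, W, b)
```

Note the structural difference from the GRU: the forget and update gates are independent here (Γ_f and Γ_u), whereas the GRU ties them together as Γ_u and 1 − Γ_u.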

Applications

Time series forecasting, language modeling (token-level), speech recognition (before Transformers), NER, part-of-speech tagging; largely superseded by Transformers for NLP but still used in streaming/low-latency applications.

Trade-offs

Sequential processing limits parallelization (Transformer attention is fully parallel across positions); LSTMs have more parameters and are typically slower to train than GRUs; both still suffer from vanishing gradients at very long contexts (hundreds to thousands of steps); Transformers pay O(T²) attention cost but have O(1) maximum path length between positions vs. O(T) for RNNs.