# Representation Learning

## Core Idea
Representation learning discovers compact, informative encodings of raw data that capture structure useful for downstream tasks. Rather than hand-engineering features, the model learns what to extract. Autoencoders do this through reconstruction; contrastive methods do it by pulling similar examples together and pushing dissimilar examples apart.
## Mathematical Formulation

### Autoencoder
An autoencoder comprises an encoder $f_\theta$ and a decoder $g_\phi$, trained to minimise reconstruction error:

$$\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - g_\phi(f_\theta(x_i)) \right\|^2$$

The bottleneck layer $z = f_\theta(x)$ is the latent representation.
```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z
```

### Variational Autoencoder (VAE)
The VAE places a prior $p(z) = \mathcal{N}(0, I)$ on the latent space and trains an approximate posterior $q_\phi(z \mid x)$ alongside the decoder $p_\theta(x \mid z)$.

Evidence Lower Bound (ELBO):

$$\mathcal{L}_{\mathrm{ELBO}}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$

The KL term forces the posterior to match the prior, regularising the latent space. The reparameterisation trick enables backprop through the sampling step: $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.
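As a minimal sketch of the above, a PyTorch VAE with the reparameterisation trick and the negative ELBO as loss. Layer widths and the MSE reconstruction term are illustrative choices, not prescribed by the text:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)       # predicts mu
        self.fc_logvar = nn.Linear(256, latent_dim)   # predicts log sigma^2
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.decoder(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    # Negative ELBO: reconstruction error plus KL(q(z|x) || N(0, I)),
    # using the closed-form KL between two Gaussians.
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Predicting $\log \sigma^2$ rather than $\sigma$ keeps the variance positive without a constraint on the network output.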
### Contrastive Learning
Learns representations such that similar (positive) pairs have nearby embeddings and dissimilar (negative) pairs have distant embeddings.
InfoNCE / NT-Xent loss (SimCLR), for a positive pair $(i, j)$ in a batch of $N$ augmented pairs:

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$$

where $\mathrm{sim}(u, v) = u^\top v / (\|u\|\,\|v\|)$ is cosine similarity and $\tau$ is a temperature parameter.
Positive pairs are created via data augmentation (crop, flip, colour jitter for images; token masking for text).
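A batch-wise sketch of the NT-Xent loss, under the convention that the positive for row $i$ sits at index $i + N$ in the concatenated batch; the temperature default is illustrative:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    # z1[i] and z2[i] are embeddings of two augmented views of example i.
    n = z1.shape[0]
    # Normalise so the dot product equals cosine similarity.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d)
    sim = z @ z.T / tau                                 # pairwise similarities
    # Exclude self-similarity from the denominator.
    sim.fill_diagonal_(float("-inf"))
    # The positive for index i is i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```

Treating each row as a (2N − 1)-way classification problem, with the positive as the correct class, is exactly the softmax form of the loss above.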
## Inductive Bias
- Autoencoder: assumes the data lies on a low-dimensional manifold; all information must pass through the bottleneck.
- VAE: assumes a smooth, continuous latent space with a Gaussian prior; supports generation by sampling $z \sim p(z)$ and decoding.
- Contrastive: assumes that augmented views of the same instance are semantically equivalent; requires careful negative mining.
## Training Objective
| Method | Objective |
|---|---|
| Autoencoder | Minimise reconstruction loss (MSE or BCE) |
| VAE | Maximise ELBO = reconstruction − KL divergence |
| Contrastive (SimCLR) | Maximise agreement between augmented views |
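The autoencoder row of the table can be sketched as a single training step; `train_step` is a hypothetical helper name, and the tuple-returning model interface follows the `Autoencoder` class above:

```python
import torch
import torch.nn as nn

def train_step(model, x, optimizer):
    # One gradient step minimising MSE reconstruction loss.
    # Assumes model(x) returns (reconstruction, latent), as the
    # Autoencoder class above does.
    optimizer.zero_grad()
    x_hat, _ = model(x)
    loss = nn.functional.mse_loss(x_hat, x)
    loss.backward()
    optimizer.step()
    return loss.item()
```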
## Strengths
- Pre-trained representations transfer to downstream tasks with little labelled data.
- Autoencoders provide interpretable compression and can detect anomalies via high reconstruction error.
- VAEs enable controllable generation and latent-space interpolation.
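The anomaly-detection use mentioned above can be sketched as a scoring helper; the function name and the callable interface are illustrative assumptions:

```python
import torch

@torch.no_grad()
def anomaly_scores(reconstruct, x):
    # Per-example mean squared reconstruction error; higher = more anomalous.
    # `reconstruct` is any x -> x_hat map, e.g. lambda t: autoencoder(t)[0].
    x_hat = reconstruct(x)
    return ((x - x_hat) ** 2).mean(dim=1)
```

In practice a threshold is chosen from a high percentile of scores on held-out normal data; examples scoring above it are flagged as anomalies.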
## Weaknesses
- Representations may not capture task-relevant structure if the reconstruction/contrastive objective is misaligned with the downstream task.
- VAEs often produce blurry reconstructions, because the expected reconstruction objective averages over plausible outputs.
- Contrastive learning requires large batch sizes or memory banks for sufficient negatives.
## Variants
- Denoising autoencoder: corrupts inputs before encoding; forces the encoder to learn robust representations.
- Sparse autoencoder: adds L1 penalty on activations; encourages sparse, disentangled representations.
- β-VAE: scales the KL term by a factor $\beta > 1$; encourages disentanglement.
- BYOL / SimSiam: self-supervised methods that avoid the need for negative pairs by predicting one view's representation from another.
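The denoising-autoencoder corruption step can be sketched as follows; the noise level and masking probability are illustrative defaults:

```python
import torch

def corrupt(x, noise_std=0.3, mask_prob=0.0):
    # Corrupt inputs with Gaussian noise and optional random masking;
    # the denoising autoencoder is trained to reconstruct the clean x
    # from corrupt(x).
    noisy = x + noise_std * torch.randn_like(x)
    if mask_prob > 0:
        keep = (torch.rand_like(x) > mask_prob).float()
        noisy = noisy * keep
    return noisy
```

The loss compares the reconstruction against the clean input, not the corrupted one, which is what forces the encoder to learn robust structure.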
## References
- Kingma, D.P. & Welling, M. (2014). “Auto-Encoding Variational Bayes.” ICLR.
- Chen, T. et al. (2020). “A Simple Framework for Contrastive Learning of Visual Representations.” ICML.
- Bengio, Y. et al. (2013). “Representation Learning: A Review and New Perspectives.” IEEE TPAMI.