Style Cost Function

Definition

A loss that measures the stylistic difference between two images by comparing correlations between feature map channels (Gram matrices) at multiple layers of a pretrained CNN. Used in neural style transfer to optimize pixel values so a generated image matches the style of a reference image.

Intuition

Style is captured by co-activation patterns across channels (which texture patterns tend to occur together), not by exact spatial positions. The Gram matrix computes these second-order channel statistics by summing over spatial positions, which discards spatial layout. Under this definition, two images share the same style when their channel correlations match, regardless of where those textures appear.

Formal Description

Feature maps at layer $l$: reshape to $A^{[l]} \in \mathbb{R}^{n_{C}^{[l]} \times (n_{H}^{[l]} \cdot n_{W}^{[l]})}$ (channels × flattened spatial positions).

Gram matrix:

$G^{[l]} = A^{[l]} (A^{[l]})^{\top} \in \mathbb{R}^{n_{C}^{[l]} \times n_{C}^{[l]}}$

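Entry $G_{ij}^{[l]} = \sum_{k} A_{ik}^{[l]} A_{jk}^{[l]}$ is the inner product between the flattened feature maps of channels $i$ and $j$, where $k$ runs over spatial positions. It is an unnormalized, uncentered measure of how strongly those two channels co-activate across all spatial locations.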
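The reshape and Gram matrix steps can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `gram_matrix`, the channels-last layout `(n_H, n_W, n_C)`, and the random activations are assumptions made for the example. The second half shuffles the spatial positions to show that the Gram matrix does not change.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of activations with assumed shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = features.shape
    # Rows are channels, columns are flattened spatial positions.
    A = features.reshape(n_H * n_W, n_C).T
    return A @ A.T  # shape (n_C, n_C)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 5, 3))  # random stand-in for CNN activations
G = gram_matrix(feats)

# Shuffle the spatial positions: same activations, different locations.
perm = rng.permutation(4 * 5)
shuffled = feats.reshape(-1, 3)[perm].reshape(4, 5, 3)
G_shuffled = gram_matrix(shuffled)

print(G.shape)                     # (3, 3)
print(np.allclose(G, G_shuffled))  # True
```

Because each entry is a sum over positions, any rearrangement of positions leaves the sums unchanged, which is why the Gram matrix ignores where a texture appears.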

Style cost for layer $l$ :

$J_{style}^{[l]} = \frac{1}{\left( 2 n_{C}^{[l]} n_{H}^{[l]} n_{W}^{[l]} \right)^{2}} \left\| G_{S}^{[l]} - G_{G}^{[l]} \right\|_{F}^{2}$

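where the subscripts $S$ and $G$ denote the style image and the generated image, respectively (the subscript $G$ is not the Gram matrix symbol $G^{[l]}$), and $\| \cdot \|_{F}$ is the Frobenius norm.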
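A small worked example may help make the layer cost concrete: the squared Frobenius norm of the difference between the two Gram matrices, divided by $(2 n_C n_H n_W)^2$. The sketch below assumes channels-last activations of identical shape for both images; the names `gram_matrix` and `layer_style_cost` are illustrative.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of activations with assumed shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = features.shape
    A = features.reshape(n_H * n_W, n_C).T
    return A @ A.T

def layer_style_cost(style_feats, gen_feats):
    """Style cost for one layer; both inputs have shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = gen_feats.shape
    diff = gram_matrix(style_feats) - gram_matrix(gen_feats)
    # Squared Frobenius norm, scaled by 1 / (2 * n_C * n_H * n_W)^2.
    return np.sum(diff ** 2) / (2 * n_C * n_H * n_W) ** 2

# One spatial position, two channels: each image activates a different channel.
style = np.array([[[1.0, 0.0]]])
generated = np.array([[[0.0, 1.0]]])
cost = layer_style_cost(style, generated)
print(cost)  # 0.125
```

Here the Gram matrices are diag(1, 0) and diag(0, 1), so the squared difference sums to 2 and the normalizer is (2 · 2 · 1 · 1)² = 16.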

Total style cost (weighted sum over layers):

$J_{style} = \sum_{l} λ^{[l]} J_{style}^{[l]}$
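The weighted sum over layers can be sketched as below. The layer names (`conv1_1`, `conv2_1`), the weights, and the random activations are hypothetical placeholders; in practice the activations would come from a pretrained CNN, which is not included here.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of activations with assumed shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = features.shape
    A = features.reshape(n_H * n_W, n_C).T
    return A @ A.T

def layer_style_cost(style_feats, gen_feats):
    n_H, n_W, n_C = gen_feats.shape
    diff = gram_matrix(style_feats) - gram_matrix(gen_feats)
    return np.sum(diff ** 2) / (2 * n_C * n_H * n_W) ** 2

def total_style_cost(style_acts, gen_acts, layer_weights):
    """Sum of lambda^[l] * J_style^[l] over the layers named in layer_weights."""
    return sum(
        weight * layer_style_cost(style_acts[name], gen_acts[name])
        for name, weight in layer_weights.items()
    )

rng = np.random.default_rng(0)
# Hypothetical layers with different spatial sizes and channel counts.
shapes = {"conv1_1": (8, 8, 4), "conv2_1": (4, 4, 8)}
style_acts = {name: rng.standard_normal(s) for name, s in shapes.items()}
gen_acts = {name: rng.standard_normal(s) for name, s in shapes.items()}
layer_weights = {"conv1_1": 0.5, "conv2_1": 0.5}

total = total_style_cost(style_acts, gen_acts, layer_weights)
```

The per-layer normalizer keeps layers with different sizes on a comparable scale, so the weights mainly express how much each layer should influence the result.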

Combined NST objective:

$J = α J_{content} + β J_{style}$

The network weights stay frozen; gradient descent updates only the pixel values of the generated image.
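The following toy sketch shows what optimizing pixels rather than weights looks like. It makes strong simplifying assumptions, all of them invented for illustration: the pretrained CNN is replaced by a single frozen channel-mixing matrix `W` (a 1×1 convolution with no nonlinearity), only the style term is minimized, the gradient is derived by hand instead of by automatic differentiation, and the step size is halved whenever a step fails to reduce the loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_H, n_W, n_C = 8, 8, 4

# Toy stand-in for one frozen pretrained layer. W is never updated below.
W = rng.standard_normal((3, n_C))
scale = 1.0 / (2 * n_C * n_H * n_W) ** 2

def features(image):
    """Map an (n_H, n_W, 3) image to (n_H * n_W, n_C) flattened activations."""
    return image.reshape(-1, 3) @ W

def gram(F):
    # F is positions x channels, so F.T @ F equals A @ A.T with A = F.T.
    return F.T @ F

def style_loss_and_grad(image, G_S):
    """Layer style cost and its gradient with respect to the pixels."""
    F = features(image)
    D = gram(F) - G_S                 # symmetric difference of Gram matrices
    loss = scale * np.sum(D ** 2)
    grad_F = 4.0 * scale * (F @ D)    # d loss / d F, using D symmetric
    grad_image = (grad_F @ W.T).reshape(image.shape)
    return loss, grad_image

style_image = rng.uniform(0.0, 1.0, (n_H, n_W, 3))
generated = rng.uniform(0.0, 1.0, (n_H, n_W, 3))  # random initial image
G_S = gram(features(style_image))

lr = 10.0
loss, grad = style_loss_and_grad(generated, G_S)
initial_loss = loss
for _ in range(300):
    candidate = generated - lr * grad  # update pixels only
    cand_loss, cand_grad = style_loss_and_grad(candidate, G_S)
    if cand_loss < loss:
        generated, loss, grad = candidate, cand_loss, cand_grad
    else:
        lr *= 0.5  # step was too large; shrink it and retry
final_loss = loss
print(initial_loss, final_loss)
```

A real pipeline would use a deep pretrained network and an autodiff framework, and would add the content term, but the structure is the same: the loss is a function of the image, and the image is the only thing that changes.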

Applications

Neural style transfer; texture synthesis; measuring perceptual similarity between images. The Gram matrix approach has also influenced perceptual loss functions used in super-resolution and image-to-image translation.

Trade-offs

The Gram matrix discards all spatial information by design: good for style, bad if spatial structure matters. Each Gram matrix has $n_C^2$ entries and takes on the order of $n_C^2 \cdot n_H n_W$ operations to compute, so cost grows with the square of the number of channels. The choice of which layers to use for style vs. content significantly affects results (lower layers → fine textures; higher layers → larger-scale, more semantic patterns). Optimization in pixel space is slow compared to feed-forward style transfer networks, because every new image requires its own iterative optimization.
