Style Cost Function
Definition
A loss that measures the stylistic difference between two images by comparing correlations between feature map channels (Gram matrices) at multiple layers of a pretrained CNN. Used in neural style transfer to optimize pixel values so a generated image matches the style of a reference image.
Intuition
Style is captured by co-activation patterns across channels (which texture patterns tend to occur together), not by exact spatial positions. The Gram matrix computes these second-order channel statistics by summing products of channel activations over all spatial positions, which discards the spatial layout. Two images share the same style when their channel correlations match, regardless of where those textures appear.
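A small sketch of the "regardless of where" claim: shuffling the spatial positions of a feature map (the same shuffle for every channel) leaves its Gram matrix unchanged. Random numbers stand in for real CNN activations here, and the helper name `gram_matrix` and the sizes are illustrative assumptions, not part of any library.

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, positions) -> (channels, channels) Gram matrix."""
    return features @ features.T

rng = np.random.default_rng(0)

# Stand-in for one layer's activations: 4 channels on a 6x6 grid (hypothetical sizes).
activations = rng.standard_normal((4, 6, 6))
flat = activations.reshape(4, -1)  # channels x flattened spatial positions

# Apply the same random shuffle of spatial positions to every channel.
perm = rng.permutation(flat.shape[1])
shuffled = flat[:, perm]

# The Gram matrix sums over positions, so the order of positions does not matter
# (up to floating-point rounding, hence allclose rather than exact equality).
same_gram = np.allclose(gram_matrix(flat), gram_matrix(shuffled))
print("Gram matrices match after shuffling:", same_gram)
```

The shuffled feature map would look nothing like the original as an image, yet its channel statistics are identical, which is why the Gram matrix is a description of texture rather than of layout.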
Formal Description
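Feature maps at layer $l$: the activation volume $a^{[l]}$ of shape $n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}$ is reshaped to a matrix $F^{[l]}$ of shape $n_C^{[l]} \times (n_H^{[l]} n_W^{[l]})$ (channels × flattened spatial positions).
Gram matrix:
$$G^{[l]} = F^{[l]} \left(F^{[l]}\right)^{\top}, \qquad G^{[l]}_{kk'} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a^{[l]}_{ijk} \, a^{[l]}_{ijk'}$$
Entry $G^{[l]}_{kk'}$ is the inner product between feature maps $k$ and $k'$, capturing how correlated those two channel activations are across all spatial locations.
Style cost for layer $l$:
$$J^{[l]}_{style}(S, G) = \frac{1}{\left(2 \, n_H^{[l]} n_W^{[l]} n_C^{[l]}\right)^2} \sum_{k} \sum_{k'} \left(G^{[l](S)}_{kk'} - G^{[l](G)}_{kk'}\right)^2$$
where $S$ = style image, $G$ = generated image, and $G^{[l](S)}$, $G^{[l](G)}$ are the layer-$l$ Gram matrices computed from each.
Total style cost (weighted sum over layers):
$$J_{style}(S, G) = \sum_{l} \lambda^{[l]} \, J^{[l]}_{style}(S, G)$$
Combined NST objective:
$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$
where $C$ is the content image and $\alpha$, $\beta$ weight the content and style terms.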
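The per-layer and total style cost can be sketched in NumPy as follows. This is a minimal illustration, assuming the commonly used normalization by (2 · H · W · C)² and using random arrays in place of activations from a pretrained CNN; the function names, layer shapes, and layer weights are all hypothetical.

```python
import numpy as np

def gram_matrix(activations):
    """(H, W, C) activations -> (C, C) Gram matrix."""
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c).T  # (C, H*W): channels x positions
    return flat @ flat.T

def layer_style_cost(a_style, a_generated):
    """Squared Gram-matrix difference for one layer, normalized by (2*H*W*C)^2."""
    h, w, c = a_style.shape
    diff = gram_matrix(a_style) - gram_matrix(a_generated)
    return np.sum(diff ** 2) / (2 * h * w * c) ** 2

def total_style_cost(style_acts, generated_acts, layer_weights):
    """Weighted sum of per-layer style costs."""
    return sum(
        weight * layer_style_cost(a_s, a_g)
        for weight, a_s, a_g in zip(layer_weights, style_acts, generated_acts)
    )

rng = np.random.default_rng(0)
shapes = [(16, 16, 8), (8, 8, 16)]  # hypothetical (H, W, C) for two layers
style_acts = [rng.standard_normal(s) for s in shapes]
generated_acts = [rng.standard_normal(s) for s in shapes]
layer_weights = [0.5, 0.5]  # assumed equal weighting

cost = total_style_cost(style_acts, generated_acts, layer_weights)
self_cost = total_style_cost(style_acts, style_acts, layer_weights)
print("style cost between different activations:", cost)
print("style cost of activations against themselves:", self_cost)
```

In an actual style-transfer setup the two activation lists would come from running the style image and the generated image through the same frozen network and reading off the chosen layers.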
Optimize the pixel values of the generated image (not network weights) via gradient descent.
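To make "optimize pixels, not weights" concrete, here is a toy sketch in NumPy. It is not a real style-transfer pipeline: a single frozen random linear map `W` (equivalent to a 1x1 convolution) is a hypothetical stand-in for the pretrained CNN, only the style term is minimized, and the gradient is derived by hand, whereas practical implementations rely on an automatic differentiation framework. Sizes, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_channels = 64, 8  # hypothetical sizes

# Frozen "network" weights: created once and never updated below.
W = rng.standard_normal((n_channels, 3))

def features(pixels):
    """pixels: (3, n_pixels) RGB values -> (n_channels, n_pixels) feature maps."""
    return W @ pixels

def style_cost(f_style, f_gen):
    """Squared Gram-matrix difference, normalized by (2 * positions * channels)^2."""
    c, n = f_style.shape
    diff = f_gen @ f_gen.T - f_style @ f_style.T
    return np.sum(diff ** 2) / (2 * n * c) ** 2

def style_cost_grad_pixels(f_style, pixels):
    """Hand-derived gradient of style_cost with respect to the generated pixels."""
    f_gen = features(pixels)
    c, n = f_gen.shape
    diff = f_gen @ f_gen.T - f_style @ f_style.T  # symmetric (C, C) matrix
    grad_features = 4.0 * diff @ f_gen / (2 * n * c) ** 2
    return W.T @ grad_features  # chain rule back through the linear map

style_pixels = rng.uniform(0.0, 1.0, (3, n_pixels))
generated = rng.uniform(0.0, 1.0, (3, n_pixels))  # start from noise
f_style = features(style_pixels)

initial_cost = style_cost(f_style, features(generated))
learning_rate = 0.5
for _ in range(200):
    # Only the generated pixels change; W stays fixed throughout.
    generated -= learning_rate * style_cost_grad_pixels(f_style, generated)
final_cost = style_cost(f_style, features(generated))
print("style cost before:", initial_cost, "after:", final_cost)
```

The structure mirrors the real procedure: the feature extractor is fixed, the loss is computed from Gram matrices of its outputs, and the update is applied to the image itself.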
Applications
Neural style transfer; texture synthesis; measuring perceptual similarity between images. The Gram matrix approach has also influenced perceptual loss functions used in super-resolution and image-to-image translation.
Trade-offs
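The Gram matrix discards all spatial information by design: good for style, bad if spatial structure matters. Each Gram matrix has as many entries as the square of the number of channels, and computing it costs in proportion to that square times the number of spatial positions. The choice of which layers to use for style vs. content significantly affects results (lower layers → fine textures; higher layers → larger-scale, more abstract patterns). Optimization in pixel space is slow compared to feed-forward style transfer networks, since each new image needs its own iterative optimization rather than a single forward pass.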
Links
- neural_style_transfer (in 02_modeling/computer_vision)
- backpropagation