Object Detection

Definition

The task of predicting axis-aligned bounding boxes and class labels for all instances of target object categories in an image; requires solving localization and classification jointly.

Intuition

Unlike classification (one label per image), detection must find where each object is. YOLO-style models predict boxes directly from grid cells; anchor boxes provide shape priors; IoU measures localization quality; NMS removes duplicate detections.

Formal Description

Bounding box parameterization: (x, y, w, h) = center coordinates plus width/height, normalized to image or grid-cell dimensions.

Intersection over Union (IoU):

IoU(A, B) = area(A ∩ B) / area(A ∪ B)

The threshold is typically 0.5 for a “correct” detection; used in both the loss and NMS.
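The IoU formula above can be sketched directly for corner-format boxes (a minimal implementation; production code would vectorize this over many boxes):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Intersection rectangle (clamped to zero when the boxes don't overlap)
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```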

Anchor boxes: predefined box shapes (aspect ratios) at each grid cell position; each grid cell predicts multiple boxes (one per anchor); box prediction is a delta from the anchor shape; improves detection of elongated objects (cars, pedestrians).

YOLO (You Only Look Once): divide the image into an S × S grid; each cell predicts B boxes (one per anchor), each box has 4 coordinates + an objectness confidence + C class probabilities; output tensor: S × S × B × (5 + C); single forward pass — fast enough for real-time.
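The output tensor shape follows directly from the grid layout; for example, a 13 × 13 grid with 5 anchors and 20 classes (the YOLOv2/Pascal VOC configuration) gives a 13 × 13 × 125 tensor:

```python
def yolo_output_shape(S, B, C):
    """Shape of a YOLO-style output tensor: one (5 + C)-vector per anchor
    per grid cell, where 5 = 4 box coordinates + 1 objectness score."""
    return (S, S, B * (5 + C))
```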

Non-Maximum Suppression (NMS):

  1. Discard boxes with objectness confidence below a threshold
  2. For each class, sort remaining by confidence
  3. Greedily keep the highest-confidence box
  4. Remove all remaining boxes with IoU ≥ threshold against the kept box
  5. Repeat
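The five steps above can be sketched as a greedy loop (a minimal single-class version; per-class NMS runs this once per class, and the thresholds here are typical defaults, not fixed values):

```python
def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.25):
    """Greedy NMS over corner-format boxes (x1, y1, x2, y2); returns kept indices."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    # Steps 1-2: discard low-confidence boxes, sort the rest by confidence
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:                      # Step 5: repeat until no boxes remain
        best = order.pop(0)           # Step 3: keep the highest-confidence box
        keep.append(best)
        # Step 4: suppress boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```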

Loss function (YOLO-style): a sum of three terms — localization loss (MSE on box coordinates), confidence loss (BCE on objectness), and classification loss (BCE or CE).
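The three terms can be sketched for a single object-containing cell (a simplified, scalar version with a hypothetical signature; real implementations vectorize over the whole output tensor and weight the terms, e.g. up-weighting localization and down-weighting no-object confidence):

```python
import math

def yolo_loss(pred_box, true_box, pred_obj, true_obj,
              pred_cls, true_cls, lambda_coord=5.0):
    """YOLO-style loss for one cell: weighted MSE localization
    + BCE objectness + BCE classification."""
    eps = 1e-7
    def bce(p, t):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    # Localization: MSE on the 4 box coordinates, weighted by lambda_coord
    loc = lambda_coord * sum((p - t) ** 2 for p, t in zip(pred_box, true_box))
    # Confidence: BCE on the objectness score
    conf = bce(pred_obj, true_obj)
    # Classification: BCE on the per-class probabilities
    cls = sum(bce(p, t) for p, t in zip(pred_cls, true_cls))
    return loc + conf + cls
```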

Applications

Autonomous driving (pedestrian/vehicle detection), surveillance, medical imaging (lesion detection), retail (inventory counting).

Trade-offs

  • YOLO trades accuracy for speed vs. two-stage detectors (Faster R-CNN)
  • NMS is sequential and hard to parallelize
  • Anchor box design is domain-specific (sizes must match typical object scales)
  • Recent anchor-free methods (FCOS, DETR) avoid anchor design