Visual Inspection

Problem

Automatically detect defects, anomalies, or quality issues in physical products or environments using camera images or video — replacing or augmenting manual visual inspection on a production line, in a warehouse, or in field operations. Visual inspection ML must achieve human-level accuracy at machine speed, and the cost of a false negative (defective product shipped to customer) typically far exceeds the cost of a false positive (acceptable product rejected).

Users / Stakeholders

| Role | Decision |
| --- | --- |
| Quality control inspector | Review flagged items; approve borderline cases |
| Production line manager | Stop/continue production; escalate to maintenance |
| Quality engineer | Defect root cause analysis; process improvement |
| Customer quality team | Defect rate reporting; warranty claim analysis |

Domain Context

  • Real-time constraint: Products on a moving line pass the camera at fixed intervals. Inference must complete before the product exits the inspection zone (typically 50–200ms budget).
  • Limited labelled defects: Defects are rare by definition. Collecting labelled defect images requires running the line until defects occur. Semi-supervised and anomaly detection approaches are valuable.
  • Defect taxonomy: Cracks, scratches, chips, contamination, incorrect assembly, missing components. Each defect type may require a different model or threshold.
  • Lighting and sensor variation: Image quality depends on lighting, camera calibration, lens contamination. Distribution shift from sensor degradation is a significant production risk.
  • Edge deployment: Cameras and inference must often run at the edge (no cloud round-trip). Edge hardware: NVIDIA Jetson, Intel OpenVINO, Google Coral. Model compression (quantization, pruning) required.
  • ISO 9001 / IATF 16949: Quality management standards require documented inspection procedures. ML-based inspection requires validation records showing equivalence to manual inspection.

Inputs and Outputs

Input:

images:        High-resolution camera frames (1MP–12MP), greyscale or RGB
trigger:       Line encoder pulse triggers capture
metadata:      product_id, batch_id, station_id, timestamp, shift
reference:     Golden sample image for comparison (optional)

Output:

decision:         PASS / FAIL / MARGINAL (human review)
defect_class:     CRACK / SCRATCH / CONTAMINATION / MISSING_COMPONENT / ...
defect_location:  Bounding box or segmentation mask on image
confidence:       P(defect) ∈ [0, 1]
audit_record:     Image + annotation stored for traceability
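The output fields above map naturally onto a structured audit record. A minimal Python sketch (the dataclass, field types, and example values are assumptions for illustration; the field names follow the listing):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Decision(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    MARGINAL = "MARGINAL"   # routed to human review

@dataclass
class InspectionRecord:
    product_id: str
    batch_id: str
    station_id: str
    timestamp: float
    decision: Decision
    confidence: float                                        # P(defect) in [0, 1]
    defect_class: Optional[str] = None                       # e.g. "CRACK"
    defect_bbox: Optional[Tuple[int, int, int, int]] = None  # x, y, w, h
    image_path: str = ""                                     # stored frame for audit trail

record = InspectionRecord(
    product_id="P-0001", batch_id="B-17", station_id="ST-03",
    timestamp=1700000000.0, decision=Decision.FAIL, confidence=0.97,
    defect_class="CRACK", defect_bbox=(120, 44, 32, 18),
    image_path="/audit/P-0001.png",
)
```

Persisting one such record per inspection, keyed by product ID, is what makes the 3–10 year traceability requirement auditable.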

Decision or Workflow Role

Camera captures product image (triggered by conveyor encoder)
  ↓
Image preprocessing: normalisation, crop to ROI, white balance correction
  ↓
Inference on edge device (<50ms budget)
  ↓
PASS:     Product continues down line
FAIL:     Pneumatic reject gate activates
MARGINAL: Image queued for human review within 30s
  ↓
Review outcome logged → retraining dataset
  ↓
Weekly: retrain cycle with new confirmed defect examples
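The PASS/FAIL/MARGINAL branch in the workflow above is typically a two-threshold rule on the model's defect probability. A minimal sketch (the 0.9 and 0.5 thresholds are illustrative, not from the source; in practice they are tuned on a validation set against the recall and false-positive targets):

```python
def route(p_defect: float, fail_thresh: float = 0.9, marginal_thresh: float = 0.5) -> str:
    """Map P(defect) to a line action using two thresholds."""
    if p_defect >= fail_thresh:
        return "FAIL"       # activate pneumatic reject gate
    if p_defect >= marginal_thresh:
        return "MARGINAL"   # queue frame for human review (30s SLA)
    return "PASS"           # product continues down the line
```

Lowering `marginal_thresh` trades inspector workload for a lower miss rate, which is usually the right trade given the asymmetric cost of false negatives.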

Modeling / System Options

| Approach | Strength | Weakness | When to use |
| --- | --- | --- | --- |
| CNN classification (ResNet, EfficientNet) | High accuracy with sufficient labelled data | Requires many defect examples | Mature product line with rich defect history |
| Anomaly detection (PatchCore, PaDiM) | No defect labels required; detects novel defects | Higher FPR; harder to classify defect type | Few labelled defects; novel defect types |
| Object detection (YOLO) | Locates and classifies multiple defects in one pass | Requires bounding box annotations | Multiple defect types in one image |
| Semantic segmentation (U-Net) | Pixel-level defect mapping | Annotation cost; slower inference | Precise area measurement required |
| Traditional CV (template matching, edge detection) | No training needed; fast; interpretable | Fragile to variation; limited to simple defects | Simple go/no-go checks; quick deployment |

Recommended: PatchCore or PaDiM for initial deployment (no defect labels needed). Transition to fine-tuned EfficientNet-B4 as defect dataset grows. YOLO for multi-defect detection on complex assemblies.
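To make the anomaly-detection recommendation concrete: PatchCore scores a part by the distance of its patch features to a memory bank built from defect-free parts only. A toy NumPy sketch of that idea (random vectors stand in for pretrained CNN patch embeddings, and random subsampling stands in for PatchCore's greedy coreset selection):

```python
import numpy as np

rng = np.random.default_rng(0)

def build_memory_bank(nominal_feats: np.ndarray, keep: int) -> np.ndarray:
    """Subsample defect-free patch features into a memory bank."""
    idx = rng.choice(len(nominal_feats), size=min(keep, len(nominal_feats)), replace=False)
    return nominal_feats[idx]

def anomaly_score(patch_feats: np.ndarray, bank: np.ndarray) -> float:
    """Image score = max over patches of the distance to the nearest
    memory-bank entry, so a single novel-looking patch flags the part."""
    d = np.linalg.norm(patch_feats[:, None, :] - bank[None, :, :], axis=-1)
    return float(d.min(axis=1).max())

# Defect-free patches cluster tightly; stand-in for CNN features.
nominal = rng.normal(0.0, 0.1, size=(500, 8))
bank = build_memory_bank(nominal, keep=100)

good_part = rng.normal(0.0, 0.1, size=(64, 8))   # looks like training data
bad_part = good_part.copy()
bad_part[10] += 3.0                              # one anomalous patch

s_good = anomaly_score(good_part, bank)
s_bad = anomaly_score(bad_part, bank)
```

Because the bank contains only nominal data, a defect type never seen before still scores high, which is exactly the property the limited-labelled-defects constraint calls for.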

Deployment Constraints

  • Inference latency: < 50ms at the edge. Use TensorRT or OpenVINO for optimisation.
  • Edge hardware: NVIDIA Jetson AGX Orin (50–100ms), Intel NUC + OpenVINO (30–80ms), Google Coral (~5ms for small models).
  • Model size: Quantise to INT8 for edge. Accuracy drop typically <1% vs FP32.
  • Connectivity: Edge inference with cloud sync for retraining data. Must function offline.
  • Traceability: Every inspection decision must be stored with product ID for quality audits. Retention: typically 3–10 years.
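The INT8 quantisation mentioned above can be sketched in a few lines. This shows only the arithmetic and the 4x size saving; real toolchains (TensorRT, OpenVINO) add activation calibration and per-channel scales on top:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantisation: w ≈ scale * q, q in [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
# Round-trip error is bounded by half a quantisation step (scale / 2).
max_err = float(np.abs(dequantize(q, scale) - w).max())
```

The bounded per-weight error is why accuracy typically drops by less than 1% versus FP32, while the model shrinks to a quarter of its size and fits edge memory budgets.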

Risks and Failure Modes

| Risk | Description | Mitigation |
| --- | --- | --- |
| Lighting change | Sensor/bulb degradation changes image distribution → model fails | Automated lighting calibration; distribution shift monitoring |
| Novel defect type | New defect not in training data → classified as PASS | Anomaly detection as safety layer; low-confidence → MARGINAL |
| Edge device failure | Inference device crashes → line runs without inspection | Hardware watchdog; fallback to manual inspection alert |
| Overfit to training camera | Model doesn't generalise to new cameras after equipment upgrade | Domain adaptation; camera-specific fine-tuning |
| Class imbalance | Very few defect examples → model biased toward PASS | Oversampling; synthetic defect augmentation |
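The distribution-shift monitoring used against lighting change can be as simple as tracking a per-image statistic (e.g. mean brightness) against a commissioning-time baseline. A sketch using the Population Stability Index (the 0.2 alert threshold is a common rule of thumb, an assumption to tune per line, not from the source):

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 16) -> float:
    """Population Stability Index between two samples of a per-image
    statistic such as mean brightness. Higher = more drift."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    eps = 1e-6
    p = np.histogram(baseline, bins=edges)[0]
    q = np.histogram(current, bins=edges)[0]
    p = p / p.sum() + eps
    q = q / max(q.sum(), 1) + eps
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
baseline = rng.normal(128, 20, size=10_000)   # brightness at commissioning
today = rng.normal(128, 20, size=10_000)      # healthy line, same distribution
dimmed = rng.normal(100, 20, size=10_000)     # bulb degradation shifts brightness down
```

Alerting on this statistic catches sensor degradation before the model silently starts passing defects, independent of any labels.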

Success Metrics

| Metric | Target | Notes |
| --- | --- | --- |
| Defect detection recall | > 99% | Miss rate drives customer escapes |
| False positive rate | < 0.5% | Reject yield loss: economic cost |
| Inference latency P99 | < 50ms | Line speed constraint |
| Customer escape rate | < 10 PPM | Parts Per Million defective at customer |
| Inspector workload reduction | > 80% | Operational efficiency |
| Model uptime | > 99.5% | Reliability SLA |
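The first few metrics derive directly from inspection counts over an audit window. A small helper sketch (the counts below are illustrative, not from the source):

```python
def inspection_metrics(tp: int, fn: int, fp: int, tn: int,
                       shipped_defects: int, shipped_total: int):
    """Derive headline metrics from raw counts.
    tp/fn: defective parts caught / missed by the system.
    fp/tn: good parts wrongly rejected / correctly passed.
    Escapes: defects found later at the customer."""
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    escape_ppm = 1e6 * shipped_defects / shipped_total
    return recall, false_positive_rate, escape_ppm

# Illustrative counts for one week of production.
recall, fpr, ppm = inspection_metrics(
    tp=990, fn=10, fp=400, tn=99_600,
    shipped_defects=1, shipped_total=200_000,
)
```

Note that escape PPM is measured downstream (warranty claims, customer quality reports), not from the confusion matrix, since by definition escapes are defects the system did not catch.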

References

  • Bergmann, P. et al. (2020). Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings. CVPR.
  • Roth, K. et al. (2022). Towards Total Recall in Industrial Anomaly Detection (PatchCore). CVPR.
