CI/CD for ML Pipelines

Purpose

Implements continuous integration and continuous delivery for machine learning systems using GitHub Actions. Extends standard software CI/CD with ML-specific gates: data pipeline validation, model evaluation against performance thresholds, and automated model registration. Every pull request that touches training code, features, or model config is validated before merge.

Examples

Training pipeline CI: On every PR touching src/ or configs/, run the full training pipeline on a sample of data, check that validation AUC exceeds a threshold, and report metrics as a PR comment.

Model CD: On merge to main, retrain on full data, register the new model version in MLflow, and deploy to the staging environment as a canary if metrics exceed the champion’s baseline.
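
The "exceeds the champion's baseline" condition in the Model CD example can be expressed as a small comparison function. This is a minimal sketch under stated assumptions, not the repository's code: the name `beats_champion`, the `min_delta` margin, and the plain metric dictionaries are hypothetical, and the choice to let the first model pass when no champion exists is an assumption.

```python
from typing import Optional


def beats_champion(challenger: dict, champion: Optional[dict],
                   metric: str = "val_auc", min_delta: float = 0.0) -> bool:
    """Return True if the challenger's metric exceeds the champion's by more than min_delta."""
    if metric not in challenger:
        raise KeyError(f"challenger is missing metric {metric!r}")
    if champion is None or metric not in champion:
        # Assumption: with no champion to compare against, the first model passes.
        return True
    return challenger[metric] - champion[metric] > min_delta


if __name__ == "__main__":
    print(beats_champion({"val_auc": 0.94}, {"val_auc": 0.93}))
    print(beats_champion({"val_auc": 0.93}, {"val_auc": 0.93}))
```

A non-zero `min_delta` guards against promoting a model whose improvement is within run-to-run noise.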

Architecture

Repository Layout

.
├── .github/workflows/
│   ├── ci.yml              # lint, unit tests, data pipeline check
│   ├── train_validate.yml  # model training + evaluation gate
│   └── deploy.yml          # model registry update + deployment
├── configs/                # model configuration
├── scripts/
│   ├── register_model.py   # register a model version in MLflow
│   ├── monitor_canary.py   # watch the canary's error rate
│   └── promote_model.py    # move the champion alias
├── src/
│   ├── train.py
│   ├── evaluate.py
│   └── serve.py
├── tests/
│   ├── test_features.py    # unit tests for feature engineering
│   └── test_pipeline.py    # integration tests for pipeline stages
├── dvc.yaml                # data pipeline definition
└── params.yaml             # hyperparameters

CI Workflow — Lint, Test, and Data Validation

# .github/workflows/ci.yml
name: CI
 
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
 
env:
  PYTHON_VERSION: "3.11"
 
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
          cache: pip
 
      - name: Install dependencies
        run: pip install -r requirements-dev.txt
 
      - name: Lint (ruff)
        run: ruff check src/ tests/
 
      - name: Type check (mypy)
        run: mypy src/
 
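      - name: Unit and integration tests
        run: pytest tests/test_features.py tests/test_pipeline.py -v --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}   # v4 requires an upload token in most cases
          files: coverage.xml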
 
  data-pipeline-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up DVC
        run: pip install dvc dvc-s3
 
      - name: Pull reference data sample
        run: dvc pull data/sample/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
 
      - name: Check pipeline definition (dry run)
        run: dvc repro --dry   # prints the stages that would run without executing them
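
The data-pipeline-check job above inspects the DVC pipeline but not the contents of the pulled sample. A content check could look like the following sketch. It is illustrative only: the column names in `REQUIRED_COLUMNS`, the `validate_sample` function, and the 5% null-rate limit are all assumptions, not part of the original pipeline.

```python
import csv
import io

# Hypothetical schema; replace with the columns the real sample must contain.
REQUIRED_COLUMNS = {"transaction_id", "amount", "label"}


def validate_sample(csv_text: str, max_null_rate: float = 0.05) -> list:
    """Return a list of human-readable problems; an empty list means the sample passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []

    # Schema check: every required column must appear in the header.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if not rows:
        problems.append("sample is empty")
        return problems

    # Completeness check: empty cells count as nulls.
    for col in sorted(REQUIRED_COLUMNS - missing):
        nulls = sum(1 for row in rows if not (row[col] or "").strip())
        null_rate = nulls / len(rows)
        if null_rate > max_null_rate:
            problems.append(
                f"{col}: null rate {null_rate:.2%} exceeds {max_null_rate:.2%}"
            )
    return problems
```

A CI step could read a file under data/sample/, pass its text to `validate_sample`, print any problems, and exit non-zero when the list is not empty.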

Training and Evaluation Gate

# .github/workflows/train_validate.yml
name: Train and Validate
 
on:
  pull_request:
    branches: [main]
    paths: ["src/**", "configs/**", "params.yaml"]
  push:
    branches: [main]
    paths: ["src/**", "configs/**", "params.yaml"]
 
jobs:
  train:
    runs-on: self-hosted   # GPU runner or cloud runner with GPU
    steps:
      - uses: actions/checkout@v4
 
      - name: Pull training data
        run: dvc pull data/splits/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
 
      - name: Train model
        run: python src/train.py --experiment-name ci-${{ github.sha }} --run-name pr-${{ github.run_number }}
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
      - name: Evaluate and gate
        id: evaluate
        run: |
          python src/evaluate.py \
            --run-name pr-${{ github.run_number }} \
            --threshold-auc 0.92 \
            --output metrics.json
          echo "auc=$(jq .val_auc metrics.json)" >> $GITHUB_OUTPUT
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
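      - name: Comment metrics on PR
        # context.issue.number is only set for pull request events.
        # The job's GITHUB_TOKEN needs write access to pull requests.
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Model Evaluation Results\n**Val AUC**: ${{ steps.evaluate.outputs.auc }}\n✅ Passed threshold (0.92)`
            })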
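
The comment step above always prints the passing message, because it only runs after the gate has succeeded. If the comment should also report failures, the body can be built from metrics.json instead. The sketch below shows one way to do that; the `format_comment` function and its parameters are hypothetical and not part of the original workflow.

```python
def format_comment(metrics: dict, threshold: float, metric: str = "val_auc") -> str:
    """Build a Markdown PR comment that reflects the actual gate outcome."""
    value = metrics.get(metric)
    # A missing metric counts as a failure rather than a silent pass.
    passed = value is not None and value >= threshold
    status = "✅ Passed" if passed else "❌ Failed"
    shown = "n/a" if value is None else f"{value:.4f}"
    lines = [
        "## Model Evaluation Results",
        f"**{metric}**: {shown}",
        f"{status} threshold ({threshold})",
    ]
    return "\n".join(lines)
```

To use it, a step marked `if: always()` could load metrics.json with `json.load`, call `format_comment(metrics, threshold=0.92)`, and hand the resulting string to the comment step.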

Model Registration and Canary Deployment

# .github/workflows/deploy.yml
name: Deploy Model
 
on:
  workflow_run:
    workflows: ["Train and Validate"]
    types: [completed]
    branches: [main]
 
jobs:
  register-and-deploy:
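    # Deploy only after a successful run that was triggered by a push, never by a pull request
    if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'push' }}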
    runs-on: ubuntu-latest
    steps:
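      - uses: actions/checkout@v4
        with:
          # workflow_run checks out the default branch by default; pin to the commit that was trained
          ref: ${{ github.event.workflow_run.head_sha }}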
 
      - name: Register model in MLflow
        run: python scripts/register_model.py --alias challenger
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
 
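      - name: Deploy canary (10% traffic)
        run: |
          # KUBECONFIG must be a file path; this assumes the secret holds the kubeconfig contents
          printf '%s' "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
          export KUBECONFIG="$RUNNER_TEMP/kubeconfig"
          # In workflow_run events github.sha is the default branch head, so use the triggering commit
          kubectl set image deployment/fraud-api \
            api=ghcr.io/${{ github.repository }}:${{ github.event.workflow_run.head_sha }}
          # The weight only takes effect if a traffic-splitting controller reads this annotation
          kubectl annotate --overwrite deployment/fraud-api \
            "deploy.kubernetes.io/canary-weight=10"
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}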
 
      - name: Monitor canary (5 min)
        run: python scripts/monitor_canary.py --duration 300 --max-error-rate 0.02
 
      - name: Promote champion alias
        run: python scripts/promote_model.py --version latest --alias champion
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
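
The workflow calls scripts/monitor_canary.py with `--duration` and `--max-error-rate`, but the script itself is not shown. Its core loop might look like the sketch below. This is an assumption about how such a script could work, not the actual implementation: `fetch_error_rate` is a hypothetical callable (for example, a query against a metrics backend), and the 15-second polling interval is arbitrary.

```python
import time
from typing import Callable


def monitor_canary(fetch_error_rate: Callable[[], float], duration: float,
                   max_error_rate: float, interval: float = 15.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the canary's error rate; return False as soon as it exceeds the limit."""
    elapsed = 0.0
    while elapsed < duration:
        rate = fetch_error_rate()
        if rate > max_error_rate:
            print(f"FAIL: error rate {rate:.4f} > {max_error_rate}")
            return False
        # The sleep function is injectable so tests do not have to wait.
        sleep(interval)
        elapsed += interval
    print(f"PASS: error rate stayed at or below {max_error_rate} for {duration:.0f}s")
    return True
```

A command-line wrapper would parse the two flags with argparse and call `sys.exit(0 if monitor_canary(...) else 1)`, so that a failed canary stops the workflow before the champion alias is promoted.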

evaluate.py — Performance Gate Script

# src/evaluate.py
import argparse
import json
import sys

import mlflow


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-name", required=True)
    parser.add_argument("--threshold-auc", type=float, default=0.90)
    parser.add_argument("--output", default="metrics.json")
    args = parser.parse_args()

    # The workflow passes a per-commit experiment name to train.py, so search
    # all experiments by run name instead of hard-coding an experiment ID.
    runs = mlflow.search_runs(
        search_all_experiments=True,
        filter_string=f"tags.mlflow.runName = '{args.run_name}'",
        order_by=["start_time DESC"],
        max_results=1,
        output_format="list",
    )

    if not runs:
        print(f"No run found for {args.run_name}", file=sys.stderr)
        sys.exit(1)

    metrics = runs[0].data.metrics

    # Persist the metrics before gating so the file exists even if the gate fails.
    with open(args.output, "w") as f:
        json.dump(metrics, f)

    # A missing metric fails the gate explicitly instead of being scored as 0.0.
    auc = metrics.get("val_auc")
    if auc is None:
        print("FAIL: run has no val_auc metric", file=sys.stderr)
        sys.exit(1)

    if auc < args.threshold_auc:
        print(f"FAIL: val_auc={auc:.4f} < threshold={args.threshold_auc}", file=sys.stderr)
        sys.exit(1)

    print(f"PASS: val_auc={auc:.4f}")


if __name__ == "__main__":
    main()

References