Product Recommendation

Problem

Surface the products a user is most likely to purchase from a catalogue of thousands to millions of items. Recommendations appear on: homepage, product detail page (“you may also like”), cart page (“frequently bought together”), email campaigns, and search results. The system must balance relevance, diversity, novelty, and business objectives (margin, inventory clearance, promoted products).

Users / Stakeholders

| Role | Decision / Action |
| --- | --- |
| Shopper | Discovers products they would not have searched for |
| Merchandiser | Allocates promotional slots; monitors recommendation diversity |
| Category manager | Ensures recommendations support category strategy |
| Revenue team | Tracks incremental revenue attributed to recommendations |

Domain Context

  • Catalogue scale: 10K–100M items. Full pairwise scoring is infeasible. Two-stage architecture (retrieval → ranking) is standard.
  • Data sparsity: Most users interact with <0.01% of the catalogue. Collaborative filtering suffers from cold start for new users and items.
  • Position bias: Items shown in top positions receive more clicks regardless of quality. Training on raw click data encodes this bias — debiasing (inverse propensity scoring) is required.
  • Seasonality: Holiday periods, fashion seasons, promotional events shift preferences dramatically.
  • Regulatory: In some markets (EU Digital Markets Act), algorithmic ranking of third-party products must be disclosed. Personalisation based on sensitive attributes (health, religion) is restricted.
  • Business constraints: Contracts with brands may require minimum impressions for sponsored products. Margin rules may suppress low-margin recommendations.
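The two-stage retrieval → ranking architecture can be motivated with rough arithmetic (the per-item costs below are illustrative assumptions, not measurements):

```python
# Back-of-envelope: scoring the full catalogue with a feature-rich ranker vs
# two-stage retrieval. All costs are assumed for illustration.
CATALOGUE_SIZE = 10_000_000     # mid-range of the 10K-100M scale above
RANKER_COST_US = 50             # assumed per-item cost of the ranking model (µs)
ANN_BUDGET_MS = 20              # retrieval-stage latency budget
CANDIDATES = 500                # candidate set size after retrieval

full_scan_ms = CATALOGUE_SIZE * RANKER_COST_US / 1000
two_stage_ms = ANN_BUDGET_MS + CANDIDATES * RANKER_COST_US / 1000
print(full_scan_ms, two_stage_ms)  # 500000.0 45.0 -- minutes vs a latency budget
```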

Inputs and Outputs

Input signals:

  • User signals: purchase history, browse history, wishlist, ratings, returns
  • Item signals: category, price, brand, ratings, inventory level, launch date
  • Context: device, location, time of day, session context, referral source
  • Interaction signals: clicks, add-to-cart, purchases, dwell time, scroll depth
  • Business rules: promoted products, inventory thresholds, brand partnerships

Output:

  • Ranked item list: [item_id_1, item_id_2, ..., item_id_N] with relevance scores
  • Explanation (optional): "Because you bought X" or "Trending in your region"
  • Metadata: source (collaborative / content / trending / sponsored)
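A minimal sketch of the output payload as a typed record; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical response record matching the output fields above.
@dataclass
class Recommendation:
    item_id: str
    score: float                       # relevance score from the ranking stage
    source: str                        # "collaborative" | "content" | "trending" | "sponsored"
    explanation: Optional[str] = None  # e.g. "Because you bought X"

recs = [
    Recommendation("sku-123", 0.91, "collaborative", "Because you bought X"),
    Recommendation("sku-456", 0.72, "sponsored"),
]
# The list is returned already sorted by relevance, highest first.
assert recs == sorted(recs, key=lambda r: r.score, reverse=True)
```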

Decision or Workflow Role

[Retrieval stage] 
  → Candidate generation: ANN search over item embeddings (top 200–1000)
  → Collaborative filtering recall (user-item similarity)
  → Content-based recall (item attribute similarity)

[Ranking stage]
  → Feature assembly (user + item + context features)  
  → Learning-to-Rank model (LambdaMART / neural ranking)
  → Business rule overlay (suppress OOS, boost promoted)

[Presentation]
  → A/B test slot allocation
  → Diversity post-filter (MMR or intent-aware diversification)
  → Logging for feedback loop
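The diversity post-filter above can be sketched with Maximal Marginal Relevance; the similarity function and the λ trade-off value here are assumptions:

```python
def mmr(candidates, relevance, sim, k=5, lam=0.7):
    """Maximal Marginal Relevance re-ranking (sketch).
    candidates: ordered item ids; relevance: id -> score;
    sim: (id, id) -> similarity in [0, 1]; lam trades relevance vs diversity."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(c):
            # Penalise items too similar to anything already selected.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy example: category overlap as similarity.
rel = {"a": 0.9, "b": 0.85, "c": 0.5}
cat = {"a": "shoes", "b": "shoes", "c": "hats"}
same_cat = lambda x, y: 1.0 if cat[x] == cat[y] else 0.0
print(mmr(["a", "b", "c"], rel, same_cat, k=2))  # ['a', 'c']
```

With λ = 0.7 the second slot goes to the less relevant but novel "c" rather than a second pair of shoes, which is exactly the filter-bubble mitigation the pipeline step is for.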

Modeling / System Options

| Approach | Strength | Weakness | When to use |
| --- | --- | --- | --- |
| Matrix Factorisation (SVD, ALS) | Simple, well-understood, handles sparsity | No context features; cold start; static | Baseline collaborative filtering |
| Neural CF (NCF, LightFM) | Learns non-linear interactions; can incorporate features | Training cost; harder to explain | Medium-scale, some feature richness |
| Two-tower (DSSM) | Fast ANN retrieval; shared user + item embedding space | No user–item cross features (towers interact only via dot product) | Large-scale retrieval stage |
| LambdaMART (LTR) | Directly optimises ranking metrics (NDCG); feature-rich | Not end-to-end; requires good candidate set | Ranking stage after retrieval |
| Session-based (BERT4Rec, SASRec) | Captures in-session intent; no long-term profile needed | Cold start if no session | Short-session or anonymous users |
| GNN (PinSage) | Graph structure (bought together) improves retrieval | Complex to build and maintain | Pinterest/Amazon scale, explicit graph data |

Recommended: Two-tower retrieval → LambdaMART re-ranker for most e-commerce contexts. Session-based model for anonymous/new user fallback.
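A minimal sketch of the two-tower retrieval step, assuming precomputed unit-norm item embeddings; brute-force dot products stand in for the ANN index a production system would use:

```python
import math
import random

random.seed(0)
DIM, N_ITEMS, K = 16, 1000, 50

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Item tower outputs: precomputed offline and loaded into an ANN index in production.
item_emb = [unit([random.gauss(0, 1) for _ in range(DIM)]) for _ in range(N_ITEMS)]
# User tower output: computed per request from profile + context features.
user_emb = unit([random.gauss(0, 1) for _ in range(DIM)])

# Brute-force top-K by dot product (cosine, since vectors are unit-norm);
# swap in HNSW/ScaNN/Faiss at catalogue scale.
scores = [sum(u * e for u, e in zip(user_emb, emb)) for emb in item_emb]
candidates = sorted(range(N_ITEMS), key=lambda i: -scores[i])[:K]
```

The candidate list then feeds the LambdaMART re-ranker, which is free to use cross features the towers cannot express.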

Deployment Constraints

  • Latency: 50–150ms p99 for ranked results. Retrieval stage (ANN) must complete in <20ms.
  • Throughput: Homepage impressions for large retailers: 10K–100K QPS during peak.
  • Freshness: Item embeddings need nightly refresh for new items. User embeddings updated daily or real-time for session signals.
  • Cold start fallback: New users → trending + editorial curated lists. New items → content-based embedding from product attributes.
  • Explainability: “Customers who viewed X also viewed Y” — simple rule-based explanation overlaid on model results. Not legally mandated but improves user trust.
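The cold-start fallback can be expressed as a small routing function; the strategy names and branching conditions are illustrative assumptions, not part of the original design:

```python
def pick_strategy(purchase_history_len, session_events):
    """Route a request to the fallback described above (illustrative sketch)."""
    if purchase_history_len > 0:
        return "two_tower_plus_ltr"       # known user: full personalised pipeline
    if session_events:
        return "session_based"            # anonymous, but in-session intent exists
    return "trending_plus_editorial"      # true cold start: curated + trending

print(pick_strategy(0, ["view:sku-9"]))  # session_based
```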

Risks and Failure Modes

| Risk | Description | Mitigation |
| --- | --- | --- |
| Filter bubble | Recommending only items similar to past behaviour reduces discovery | Diversity constraints; novelty penalty in ranking objective |
| Popularity bias | Popular items recommended everywhere; the rich get richer | Debiasing; long-tail exposure metrics |
| Position bias in training | Model learns to recommend what was shown, not what was preferred | IPS weighting; counterfactual evaluation |
| Cold start | New users / items have no signal | Fallback rules; content-based embeddings; onboarding flow |
| Out-of-stock recommendations | Recommending unavailable items damages UX | Real-time inventory filter in ranking stage |
| Feedback loop collapse | Model trains on its own biased outputs and amplifies initial errors | Offline evaluation + holdout group; exploration policy |
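IPS weighting, the mitigation listed for position bias, can be sketched in a few lines. The 1/position examination model below is an assumption; real systems estimate propensities from randomised interleaving or an EM position-bias model:

```python
# Inverse propensity scoring (sketch): up-weight clicks from low positions so the
# ranker learns relevance rather than where items happened to be shown.

def ips_weight(position, clip=10.0):
    propensity = 1.0 / position          # assumed examination probability at this rank
    return min(1.0 / propensity, clip)   # clipping caps variance from rare positions

# A click at position 5 counts 5x a click at position 1 in the training loss.
training_examples = [
    {"item": "sku-1", "position": 1, "clicked": 1},
    {"item": "sku-2", "position": 5, "clicked": 1},
]
weights = [ips_weight(ex["position"]) for ex in training_examples]
print(weights)  # [1.0, 5.0]
```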

Success Metrics

| Metric | Target | Notes |
| --- | --- | --- |
| Click-through rate (CTR) | +10–30% vs random baseline | Primary engagement metric |
| Add-to-cart rate | +5–15% lift in experiment | Stronger purchase-intent signal |
| Revenue per session | +2–8% incremental | True business outcome; requires A/B test |
| Coverage (catalogue %) | >20% of items recommended per day | Prevents long-tail neglect |
| NDCG@10 | Offline ranking quality | Proxy for online performance |
| Diversity (ILD) | >0.6 intra-list diversity | User satisfaction; reduces abandonment |
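NDCG@10 and intra-list diversity can be computed offline in a few lines; the relevance grading and the pairwise distance function are assumptions to be filled in per catalogue:

```python
import math

def ndcg_at_k(ranked_rels, k=10):
    """NDCG@k from graded relevance labels in ranked order (sketch)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(ranked_rels, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0

def intra_list_diversity(items, dist):
    """Mean pairwise distance over the recommended list; higher = more diverse."""
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

print(ndcg_at_k([3, 2, 1]))             # 1.0 (already ideally ordered)
print(round(ndcg_at_k([0, 0, 3]), 3))   # 0.5 (relevant item buried at rank 3)
```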

References

  • Koren, Y. et al. (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer.
  • Yi, X. et al. (2019). Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations. RecSys.
