Anatomy of an eCommerce Search Engine by Mayur Datar


Transcript of Anatomy of an eCommerce Search Engine by Mayur Datar

Page 1: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 2: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 3: Anatomy of an eCommerce Search Engine by Mayur Datar

● Search is one of the most important discovery tools in E-commerce.

● Powers other features like merchandising (promotions), recommendations etc.

● Accounts for a big fraction of the units sold and GMV.

Page 4: Anatomy of an eCommerce Search Engine by Mayur Datar

● Important signals that affect search: Price, offers, popularity, availability, serviceability etc.

● Used in ranking of products.

● Exposed as filters and sorts to end users.

● These signals are very dynamic, particularly during sales.

Page 5: Anatomy of an eCommerce Search Engine by Mayur Datar

● E-commerce search != web search.
● Documents have a structure to them
● Queries have an implicit structure

● Challenges:
  ○ Large document collection with a long heavy tail
  ○ Extremely high rate of changes/updates (thousands per sec)
  ○ Geo-specific ranking
  ○ Multi-objective optimization (GMV, Units, Ads revenue, Long Term Value)

● Opportunities:
  ○ Broad queries: personalization can play a huge role

Page 6: Anatomy of an eCommerce Search Engine by Mayur Datar

● Queries per day: XXX Millions / week
● Latencies:
  ○ Average: ~ 100 ms
  ○ Median: ~ 50 ms
  ○ 90th percentile: ~ 500 ms

● Documents retrieved and scored from index:
  ○ Median: 1K to 10K
  ○ 95th percentile: 200K to 500K
  ○ 99th percentile: 500K to 3M+

● Search CTR: Around 50%

Page 7: Anatomy of an eCommerce Search Engine by Mayur Datar

● Architectural overview of the search platform
  ○ Serving and Ingestion
  ○ Serving functional view
  ○ Serving architectural view
  ○ Ingestion architectural view
  ○ Example ingestion topology

● Search quality
  ○ Challenges
  ○ Life of a query: Typical flow for query understanding
  ○ Illustrative problems

Page 8: Anatomy of an eCommerce Search Engine by Mayur Datar

● 1,000,000 Compute Cores
● 2.56 Petabytes RAM
● 120 Petabytes Disk Storage
● 1 Petabytes NVMe SSD
● 128 Tbps bisection bandwidth Clos network

Page 9: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 10: Anatomy of an eCommerce Search Engine by Mayur Datar

Query Rewriter (Spell Check, Concept, NLP, Intent, Augmentation, Retrieval/Scoring query formulation)

Reverse Proxy (Geo Coding, User Context, Caching, Isolation, Rate Limit, Tee-off test framework)

Search Broker (Distributed Search across shards, Blending of Results from shards)

Searcher (Matching, Scoring, Faceting, Top-K Retrieval (pass-1 ranking))

Text index NRT index

Metadata

Re-ranking (Pass-2 Ranking) - ML Model

Pluggable Ranking Models

Pluggable Rewriter Modules
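The broker, searcher and re-ranking boxes above can be read as a scatter-gather top-K followed by a second scoring pass. The following is a minimal sketch of that reading, not the platform's implementation: the shard layout, the feature names `text` and `pop`, and both scoring functions are hypothetical, and real shards would be remote index servers rather than in-memory dictionaries.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, float]   # hypothetical document: feature name -> value
Hit = Tuple[float, str]  # (score, doc_id)


def search_shard(shard: Dict[str, Doc], score: Callable[[Doc], float], k: int) -> List[Hit]:
    """Searcher: score every document in one shard, keep the local top-k (pass 1)."""
    return heapq.nlargest(k, ((score(doc), doc_id) for doc_id, doc in shard.items()))


def broker(shards: List[Dict[str, Doc]], score: Callable[[Doc], float], k: int) -> List[Hit]:
    """Search broker: query every shard, then blend the per-shard top-k lists."""
    per_shard = [search_shard(shard, score, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


def rerank(hits: List[Hit], docs: Dict[str, Doc], model: Callable[[Doc], float], n: int) -> List[str]:
    """Pass 2: re-score only the pass-1 survivors with a costlier model."""
    rescored = [(model(docs[doc_id]), doc_id) for _, doc_id in hits]
    return [doc_id for _, doc_id in sorted(rescored, reverse=True)[:n]]


# Made-up example data: two shards with two documents each.
shards = [
    {"a": {"text": 0.9, "pop": 0.1}, "b": {"text": 0.5, "pop": 0.9}},
    {"c": {"text": 0.8, "pop": 0.8}, "d": {"text": 0.1, "pop": 0.2}},
]
all_docs = {doc_id: doc for shard in shards for doc_id, doc in shard.items()}

pass1 = lambda d: d["text"]                          # cheap text-match score
pass2 = lambda d: 0.5 * d["text"] + 0.5 * d["pop"]   # stand-in for the ML model

top = broker(shards, pass1, k=3)
final = rerank(top, all_docs, pass2, n=2)
```

Because each shard returns only its own top-k, the broker merges at most k hits per shard, and the costlier pass-2 model sees only k documents regardless of how many were matched.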

Page 11: Anatomy of an eCommerce Search Engine by Mayur Datar

Serving: Arch View

Page 12: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 13: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 14: Anatomy of an eCommerce Search Engine by Mayur Datar

● Architectural overview of the search platform
  ○ Serving and Ingestion
  ○ Serving functional view
  ○ Serving architectural view
  ○ Ingestion architectural view
  ○ Example ingestion topology

● Search quality
  ○ Challenges
  ○ Life of a query: Typical flow for query understanding
  ○ Illustrative problems

Page 15: Anatomy of an eCommerce Search Engine by Mayur Datar

● Marketplace
  ○ Catalog entries vary in quality from seller to seller. Spam is rampant.
● Diversity of users
● Mobile heavy users: Real estate on UI
● Poor internet connectivity

Page 16: Anatomy of an eCommerce Search Engine by Mayur Datar

● Literacy/Internet awareness
● Language
● Economic power
● Regional preferences

Abstraction: City-tier

Query/Intent Solicitation

Result Presentation

Product Ranking

Page 17: Anatomy of an eCommerce Search Engine by Mayur Datar

40% increase in proportion of tier-3 customers vis-a-vis metro

Page 18: Anatomy of an eCommerce Search Engine by Mayur Datar

Query: samsang

Relative ratio of query Tier-3 Vs Metro: 1.8

Query: jins

Relative ratio of query Tier-3 Vs Metro: 2.2
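The slides do not define the relative ratio. One plausible reading, assumed here, is the query's share of tier-3 traffic divided by its share of metro traffic, so that a value above 1 means the query is over-represented among tier-3 users. The counts below are made up purely to illustrate the arithmetic.

```python
def relative_ratio(q_tier3: int, total_tier3: int, q_metro: int, total_metro: int) -> float:
    """Share of a query in tier-3 traffic divided by its share in metro traffic.

    Assumed definition; the slides only report the resulting number.
    """
    return (q_tier3 / total_tier3) / (q_metro / total_metro)


# Hypothetical counts: 180 of 100,000 tier-3 queries vs 100 of 100,000 metro queries.
example = relative_ratio(180, 100_000, 100, 100_000)
```

Normalising by each segment's total traffic keeps the ratio comparable even when one city tier issues far more queries than the other.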

Page 19: Anatomy of an eCommerce Search Engine by Mayur Datar
Page 20: Anatomy of an eCommerce Search Engine by Mayur Datar

Query Scoring

Normalisation (Index time as well)

- String clean-up
- lower

Spell Correction
- Resource-based
  - term->term
  - Query->query
- Online

Init Context

Phrasing (Index time as well)

- Frequent bi/tri grams

Stemming (Index time as well)

- Core e-commerce stemmer

- plurals

Common MetaData Store (Query Level)
- Raw Data: metrics (CTR, Impression, NDCG…)
- Derived Data: Store, LM score, Features

Synonyms
- Resource-based

Intent
- Deductions
- Tagging (CRF)

Query Rewrite
- Best query selection
- Partial match

SOLR interface

Query Understanding Output Generator

FDP

Retrieval ranking logic

Store Classifier

Query LM

Feature Store

Classification
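The first stages of this flow (normalisation, resource-based term->term spell correction, phrasing, stemming) can be sketched as a chain of small functions. This is an illustration only: the stage order is inferred from the order of the labels on the slide, the `SPELL` and `PHRASES` resources are hypothetical stand-ins, and the plural-stripping rule is far cruder than a real e-commerce stemmer.

```python
import re
from typing import List

SPELL = {"samsang": "samsung"}   # hypothetical term->term resource
PHRASES = {("red", "tape")}      # hypothetical frequent bigrams


def normalise(query: str) -> List[str]:
    """String clean-up and lower-casing, then whitespace tokenisation."""
    return re.sub(r"[^a-z0-9 ]+", " ", query.lower()).split()


def spell_correct(tokens: List[str]) -> List[str]:
    """Resource-based correction: replace terms found in the lookup table."""
    return [SPELL.get(t, t) for t in tokens]


def phrase(tokens: List[str]) -> List[str]:
    """Merge adjacent tokens that form a known frequent bigram."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def stem(token: str) -> str:
    """Naive plural stripping: drop a trailing 's' unless the word ends in 'ss'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def understand(query: str) -> List[str]:
    """Run the stages in order and return the processed units."""
    return [stem(t) for t in phrase(spell_correct(normalise(query)))]


example_1 = understand("Red-Tape SHOES!!")
example_2 = understand("samsang cases")
```

Since normalisation, phrasing and stemming are marked "Index time as well", the same functions would have to be applied to product text at indexing so that query units and index units match.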

Page 21: Anatomy of an eCommerce Search Engine by Mayur Datar

• Special patterns:
  – Segmented words: lgnexus5
• Counting: “samsang” & no-click followed by “samsung” & click a million times
  – Context aware counting
• Language modeling and edit distance
• Term to vector models in deep learning.

Specific

General
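The counting idea (a query with no click, followed by a close variant that does get a click) can be combined with edit distance to mine a correction table from query logs. The sketch below assumes a hypothetical log format, a list of sessions where each session is a list of `(query, clicked)` pairs, and hypothetical thresholds `max_dist` and `min_count`; it leaves out the language-model and vector-based approaches mentioned above.

```python
from collections import Counter
from typing import Dict, List, Tuple

Session = List[Tuple[str, bool]]  # hypothetical log format: (query, was_clicked)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def mine_corrections(sessions: List[Session], max_dist: int = 2, min_count: int = 2) -> Dict[str, str]:
    """Count 'no-click query -> clicked query' reformulations that are close in edit distance."""
    counts: Counter = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if not clicked1 and clicked2 and 0 < edit_distance(q1, q2) <= max_dist:
                counts[(q1, q2)] += 1
    best: Dict[str, str] = {}
    for (src, dst), n in counts.most_common():
        if n >= min_count and src not in best:  # keep the most frequent target per source
            best[src] = dst
    return best


# Made-up sessions: three corrections of the misspelling, one unrelated reformulation.
sessions = [[("samsang", False), ("samsung", True)]] * 3 + [[("samsang", False), ("shoes", True)]]
corrections = mine_corrections(sessions)
```

The edit-distance filter is what separates spelling reformulations from a user simply changing their mind, which is why the unrelated "shoes" follow-up is ignored.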

Page 22: Anatomy of an eCommerce Search Engine by Mayur Datar

● Intent: From query tokens to (implicit) attributes that are represented by those tokens

● Examples:
  ○ “red tape shoes” -> (brand) “red tape” (store) “shoes”
  ○ “kids party dress 4-5 years pack of 2” -> (ideal_for) “kids” (occasion) “party” (store) “dress” (size) “4-5 years” (pack_of) “pack of 2”
  ○ “samsung e6 cases” -> (compatible_with) “samsung e6” (store) “cases”

● Memorization, Language modeling, CRF
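Of the three techniques listed, memorization is the simplest to illustrate: look up query spans in a table of known attribute values, preferring the longest match. The sketch below assumes a hypothetical `LEXICON` built from the slide's own examples; a production tagger would fall back to language modeling or a CRF for spans the table does not cover, and would need context to decide between competing attributes.

```python
from typing import List, Tuple

# Hypothetical memorised phrase -> attribute table, taken from the examples above.
LEXICON = {
    "red tape": "brand",
    "shoes": "store",
    "kids": "ideal_for",
    "party": "occasion",
    "dress": "store",
    "4-5 years": "size",
    "pack of 2": "pack_of",
}
MAX_LEN = max(len(key.split()) for key in LEXICON)


def tag(query: str) -> List[Tuple[str, str]]:
    """Greedy longest-match tagging of query tokens against the lexicon."""
    tokens = query.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in LEXICON:
                tags.append((LEXICON[span], span))
                i += n
                break
        else:
            # No span starting at this token is known.
            tags.append(("unknown", tokens[i]))
            i += 1
    return tags


shoes_tags = tag("red tape shoes")
dress_tags = tag("kids party dress 4-5 years pack of 2")
```

Trying the longest span first is what keeps "red tape" together as a brand instead of tagging "red" as a colour, once colours are added to the table.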

Page 23: Anatomy of an eCommerce Search Engine by Mayur Datar

Past orders Product Views

Users’ activity on the platform

Customised Search Ranking for User-segment

Page 24: Anatomy of an eCommerce Search Engine by Mayur Datar

economical expensive

shoes

watches

Past orders Product Views

5 price ranges defined for each vertical.

1 2 3 4 5

User-Segments based on price affinities

Users’ past activity on the platform.

Customised Search Ranking for each User-segment

Price Personalization

# of users
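The price personalization idea on this slide can be sketched in three steps: map prices to one of five ranges per vertical, assign each user to the range they buy from most often, and bias ranking toward that range. Everything concrete below is an assumption made for illustration: the cut points in `CUTS`, the "most frequent bucket" segmentation rule, and the linear `penalty` on bucket distance. The slides do not say how the ranges or segments are actually derived.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical cut points: 4 boundaries define 5 price ranges per vertical.
CUTS: Dict[str, List[float]] = {
    "shoes": [500, 1000, 2000, 5000],
    "watches": [1000, 3000, 8000, 20000],
}


def bucket(vertical: str, price: float) -> int:
    """Map a price to its range, 1 (economical) through 5 (expensive)."""
    return bisect_right(CUTS[vertical], price) + 1


def user_segment(vertical: str, past_prices: List[float]) -> int:
    """Segment = the price range the user ordered or viewed most often."""
    return Counter(bucket(vertical, p) for p in past_prices).most_common(1)[0][0]


def rank(vertical: str, products: List[Tuple[str, float, float]], segment: int,
         penalty: float = 0.1) -> List[str]:
    """Demote products in proportion to their distance from the user's price range.

    products: (product_id, price, base_relevance_score)
    """
    scored = [(base - penalty * abs(bucket(vertical, price) - segment), pid)
              for pid, price, base in products]
    return [pid for _, pid in sorted(scored, reverse=True)]


# Made-up users and products.
budget_user = user_segment("shoes", [450, 300, 1200])
premium_user = user_segment("shoes", [2500, 7000, 6000])
products = [("p1", 400, 0.8), ("p2", 6500, 0.9)]
budget_order = rank("shoes", products, budget_user)
premium_order = rank("shoes", products, premium_user)
```

Defining the ranges per vertical matters because a price that counts as expensive for shoes may be economical for watches, so a single global scale would put most users in the wrong segment.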

Page 25: Anatomy of an eCommerce Search Engine by Mayur Datar