Shipment address classification in logistics, Ravindra Babu, Flipkart
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Shipment Address Classification in Logistics in the absence of Geolocation Information
Dr. T. Ravindra Babu, Data Scientist,
Flipkart
August 1, 2015
Dr. T. Ravindra Babu, Data Scientist, Flipkart Shipment Address Classification in Logistics in the absence of Geolocation Information
Presentation Plan
Motivation, Problem Definition and Solution Overview
Data Challenges, Modeling, Solutions and Deployment
Summary
Motivation and Problem Definition
- Motivation
- Problem Definition
- Typical operations scenario at a Delivery Hub without a model
  - Inscan of shipments received from the Mother Hub
  - Manual reading of the address; assignment to the Route/FE
  - Sorting and delivery
- Overview of the proposed solution
  - Capturing FEs' domain knowledge and modelling around it
  - Classifying an address as belonging to a pre-defined subarea
  - Allocation of the shipments to a Route/FE by a Machine Learning based classifier
  - Sorting and delivery
Delivery Hub and Subareas
Figure: Hub and Subareas
Insights into Address Data
- The number of words in an address ranges from 4 to 75, leaving aside a few outliers of more than 100 words.
- A word like "Apartments" is spelt in 263 different ways; "whitefield" in 24 ways, "industrial" in 25 ways, "Bangalore" in 161 ways, "karnataka" in 70 ways, etc.
- Structure in addresses is lacking even in a city like Bangalore. A few examples.
- Some words are specific to certain places/states. Examples: halli, hobli; bawdi, kuan; society; layout; etc.
- Addressing systems across the world: US, Europe, Korea, Japan; countries like Brazil and India
Proposed Model
Figure: example caption
Preprocessing
- An elaborate preprocessing model was necessary, accounting for the following:
  - Retaining only those terms that possibly help classification (discriminability)
  - Merging of terms by empirical statistical models as well as domain knowledge based rules, n-grams, abbreviating, etc.
  - Developing data-dependent dictionaries based on pattern clustering (Machine Learning) and forming equivalent sets
- Preprocessing reduces the vocabulary size by 65%, as measured on a large dataset
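The effect of such preprocessing on vocabulary size can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `EQUIVALENTS` mapping (built by hand from spelling variants shown later in the slides), the `STOP_TERMS` set, and both function names. It is not the preprocessing model from the talk, only a way to see how merging variants and dropping low-discriminability terms shrinks the vocabulary.

```python
# Hypothetical equivalence dictionary: spelling variant -> canonical term.
# The variants appear in the slides; the mapping itself is illustrative.
EQUIVALENTS = {
    "koramanagala": "koramangala",
    "koromangala": "koramangala",
    "electronice": "electronics",
    "eclectronic": "electronics",
}

# Hypothetical terms assumed not to discriminate between subareas of one hub.
STOP_TERMS = {"bangalore", "karnataka", "india"}


def tokenize(address):
    """Lowercase an address and split it on whitespace and commas."""
    return address.lower().replace(",", " ").split()


def preprocess(address):
    """Map variants to their canonical term and drop non-discriminating terms."""
    tokens = (EQUIVALENTS.get(tok, tok) for tok in tokenize(address))
    return [tok for tok in tokens if tok not in STOP_TERMS]


def vocabulary_reduction(addresses):
    """Fraction by which preprocessing shrinks the set of distinct terms."""
    raw = {tok for a in addresses for tok in tokenize(a)}
    cleaned = {tok for a in addresses for tok in preprocess(a)}
    return 1.0 - len(cleaned) / len(raw)
```

On a real dataset the dictionary would be derived from the data rather than written by hand, so the reduction measured by this sketch says nothing about the 65% figure reported above.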
Preprocessing for Data Compaction
Figure: Impact of Preprocessing
Models in Preprocessing :: Fraud Address Classification - Address Strings
Sl.No.  Address
1       adf6546s54f6sadfsd6dsa4f6sd54f6sd46fasd54sd6f
2       gasdfashagadfasmejastic
3       fdgdf
4       hjsdhaddsdsasdsa
5       dsfadafadsasdfsdafsda
6       hjsdhaddsdsasdsa
7       asd
8       lmflvml
9       assasfsafasfsasfsfsafashaphilomena
10      faskjbdasdlkjbsaasd
Models in Preprocessing :: Fraud Address Classification - Address Strings - Heatmap
Figure: MonkeyType Addresses
Models in Preprocessing :: Fraud Address Classification - Items Bought
Figure: Items bought by such people
Models in Preprocessing :: Probabilistic Separation of Compound Words
- To a large extent, addresses are not amenable to English dictionaries.
- While writing addresses, customers often inadvertently miss a space, or the space is removed during storage/retrieval.
- Separating such compound words
  - Compute empirical probabilities of words
  - Assuming conditional independence, if the joint probability of a compound word is less than the product of the probabilities of the individual words, separate the words
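The separation rule can be sketched as follows. This is a minimal illustration under stated assumptions: probabilities are plain unigram frequencies over whitespace tokens, the "joint probability" of a compound is taken to be the empirical frequency of the unsplit token, and the function names and the `min_len` parameter are hypothetical. The slides do not say how candidate split points are chosen; here every split point is tried and the one with the highest product wins.

```python
from collections import Counter


def word_probabilities(addresses):
    """Empirical unigram probabilities from whitespace-tokenised addresses."""
    counts = Counter(tok for a in addresses for tok in a.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def split_compound(token, probs, min_len=2):
    """Return [left, right] if the rule favours a split, otherwise [token].

    A split is made only when P(left) * P(right) exceeds the empirical
    probability of the unsplit token (0.0 if the token was never seen).
    """
    best_split = None
    best_score = probs.get(token, 0.0)
    for i in range(min_len, len(token) - min_len + 1):
        left, right = token[:i], token[i:]
        if left in probs and right in probs:
            score = probs[left] * probs[right]
            if score > best_score:
                best_split, best_score = [left, right], score
    return best_split if best_split else [token]
```

For example, in a corpus where "mg" and "road" are frequent and "mgroad" is rare, `split_compound("mgroad", probs)` yields `["mg", "road"]`, while "road" itself is left intact because none of its fragments are known words.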
Models in Preprocessing :: Frequent Pattern Tree for n-gram Generation
- The frequent pattern tree is a well-established approach for mining large datasets
- We implement a modified version of the tree to generate n-grams
- Conventional method
- New approach
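The slides do not detail the modification made to the frequent pattern tree, so the following is only a generic sketch of tree-based n-gram counting, not the approach from the talk. It assumes a plain prefix tree over consecutive words, with hypothetical names (`Node`, `build_tree`, `frequent_ngrams`) and hypothetical parameters (`max_n`, `min_support`, `min_n`). Each node stores how often the word sequence leading to it occurs, so frequent n-grams can be read off by walking the tree.

```python
class Node:
    """Prefix-tree node: occurrence count plus child nodes keyed by word."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(addresses, max_n=3):
    """Insert every run of up to max_n consecutive words into the tree."""
    root = Node()
    for address in addresses:
        tokens = address.lower().split()
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_n]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def frequent_ngrams(root, min_support=2, min_n=2):
    """Collect word sequences of length >= min_n seen at least min_support times."""
    result = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for tok, child in node.children.items():
            if child.count < min_support:
                continue  # an extension is never more frequent than its prefix
            new_path = path + (tok,)
            if len(new_path) >= min_n:
                result[" ".join(new_path)] = child.count
            stack.append((new_path, child))
    return result
```

Frequent sequences such as "outer ring road" found this way could then be merged into single terms before classification.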
Models in Preprocessing :: Clustering for equivalent sets of words with spelling variations - Ex. koramangala, electronics
koramanagala koromangala kormanagala koramnagala
koramangalato kanamangala koramanagla koremangala
koaramangala koramamgala karamangala tkoramangala
kormangalla koramongala koarmangala korammangala
koramangalla koramangale koramanagal
electronice eclectronic elelctronic eelectronic electronica electroincs
electronics electroninc electrinics electroncis electronincs
Models in Preprocessing :: Clustering for ... spell variations - Ex. Bannerghattaroad (61 variations)
bannerghattaroad, bannergattaroad, banerghattaroad, bannerghataroad,
bannerughattaroad, bannarghattaroad, banergattaroad,
banneraghattaroad, bannerghettaroad, bannerugattaroad,
bhannerghattaroad, bennerghattaroad, bannerghttaroad,
bannargattaroad, banarghattaroad, banneghattaroad, banneragattaroad,
bennarghattaroad, baneerghattaroad, bannergettaroad,
banngerghattaroad, banerghataroad, bannerghuttaroad, bannergatharoad,
benerghattaroad, bannerghattaroadto, bannergataroad,
bannergattharoad, banerghettaroad, bannerguttaroad, bannarghataroad,
bannnerghattaroad, bannarghettaroad, banerughattaroad,
bannergahttaroad, bhannerughattaroad, bennergattaroad,
bannerghattroad, bannaraghattaroad, bannerhattaroad,
bannerghatharoad, banneerghattaroad, bannaerghattaroad,
baneergattaroad, bhannergattaroad, bhanerghattaroad,
bannerughataroad, baneerghataroad, bannerghatroad, baneghattaroad,
bannerghtaroad, bannerghatttaroad, bannerghattharoad,
banneraghataroad, bannergahattaroad, bangerghattaroad,
banerghttaroad, bannegattaroad, baneraghattaroad, banngergattaroad,
bannerghatteroad
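Equivalent sets like the ones listed above can be approximated with a small clustering sketch. The slides only say that pattern clustering is used, so the specific choices here are assumptions: Levenshtein edit distance as the dissimilarity, single-pass leader clustering as the algorithm, and a hypothetical `max_dist` threshold of 2.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def leader_clusters(words, max_dist=2):
    """Assign each word to the first leader within max_dist, else make it a leader."""
    clusters = {}
    for word in words:
        for leader in clusters:
            if edit_distance(word, leader) <= max_dist:
                clusters[leader].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters
```

The result depends on the order of the input, since the first word seen in each group becomes its leader. Presenting words in decreasing order of frequency is one way to make the most common spelling the representative of each equivalent set.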
Models in Post-processing :: Semi-Supervised Methods
Discussion
Revisiting The Model
- Supervised Classification
Figure: example caption
Summary
- Novelty
  - The solution is novel and developed in-house
  - No similar solution was found in the literature