Deep Learning for Object Classification in Retail Stores

Idawati Bustan, Mervyn Wee, Timothy Chong
{idawati, wyrm, timchong}@stanford.edu
Stanford University

Introduction

Recognizing products from images would help automate inventory management in retail. We took pictures of products on store shelves and investigated how accurately a Convolutional Neural Network (CNN) could recognize them. By varying how we constructed the training data, we obtained a CNN with good accuracy not only on products it had seen before, but also on products it had never seen.

Model

The retail stores we photographed carried a limited variety of products, which led to a small training set. We used transfer learning to overcome this problem. Transfer learning allowed us to take a CNN fully trained on another set of classes and retrain just the final layer to suit our purposes (Torrey and Shavlik, 2009). As a result, the model took less time to train and could be trained on a small dataset (Soekhoe et al., 2016).

Data

Retail stores have thousands of products. To start, we targeted eight classes of products. We constructed three CNN models (M1, M2 and M3), each trained on a different training set. The rationale was to see how a mix of in-store and outside data could help the model generalize to future products while remaining specific enough to classify current products. All three models were evaluated on the same test set so that their performance could be compared directly. Judging from the Discussion, M1 appears to have been trained on in-store images, M2 on images from the Internet, and M3 on a mix of both.

Result

M3 made the best predictions on both seen and unseen products, followed by M1 and then M2.

Although M3 performed best, many predictions, even correct ones, had a probability of only 20% to 40%. Our first step would be to raise the confidence of these predictions so that the model is better trained. This could be done by training it on more data or by increasing the number of epochs when training the CNN.
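The final-layer retraining described under Model, and the suggestion that more epochs would raise prediction confidence, can be illustrated with a minimal sketch. This is not the authors' code: the poster names neither the framework nor the pretrained network, so `frozen_features` is a hypothetical stand-in for the fixed CNN body, the data are synthetic 2-D points rather than images, and every function name here is an assumption made for illustration. Only the final softmax layer is trained.

```python
import math
import random

N_CLASSES = 8  # the poster targets eight product classes


def frozen_features(x):
    """Hypothetical stand-in for the pretrained CNN body; it is never updated."""
    a, b = x
    return [a, b, 1.0]  # two toy features plus a constant bias term


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    feats = frozen_features(x)
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in weights])


def train_final_layer(data, epochs, lr=0.1, seed=0):
    """Fit only the final softmax layer with plain SGD on cross-entropy."""
    rng = random.Random(seed)
    n_feats = len(frozen_features(data[0][0]))
    weights = [[0.0] * n_feats for _ in range(N_CLASSES)]
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, label in order:
            feats = frozen_features(x)
            probs = predict(weights, x)
            for k in range(N_CLASSES):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_feats):
                    weights[k][j] -= lr * grad * feats[j]
    return weights


def mean_confidence(weights, data):
    """Average probability assigned to the correct class."""
    return sum(predict(weights, x)[label] for x, label in data) / len(data)


def make_toy_data(per_class, seed):
    """Synthetic 2-D points, one noisy cluster per class (not real images)."""
    rng = random.Random(seed)
    data = []
    for k in range(N_CLASSES):
        angle = 2 * math.pi * k / N_CLASSES
        for _ in range(per_class):
            x = (2 * math.cos(angle) + rng.gauss(0, 0.3),
                 2 * math.sin(angle) + rng.gauss(0, 0.3))
            data.append((x, k))
    return data


train = make_toy_data(per_class=10, seed=1)
test = make_toy_data(per_class=5, seed=2)
for epochs in (0, 1, 50):
    weights = train_final_layer(train, epochs)
    print(epochs, "epochs -> mean confidence", round(mean_confidence(weights, test), 3))
```

With all final-layer weights at zero, an eight-class softmax assigns 1/8 = 12.5% to every class, so correct predictions made at 20% to 40% sit only modestly above that starting point. On this toy data, further epochs move more probability onto the correct class; whether the same holds for the poster's models would depend on their data and training setup.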
Discussion

We had expected M3 to perform best, followed by M1 and then M2. We were worried that the lack of variety of products in retail stores would leave M1 with low accuracy on products it had not seen before. M3 showed that adding images of products from the Internet to the training set increases the accuracy of such predictions.

We also expected that M2 would give poor predictions, because images from the Internet were significantly different from images from the store. M3 was a balanced model that did well on both seen and unseen test images.

Future

For future consideration, M2 performed best on products that had a specific shape with little variation.

Confusion Matrix (25 seen & 25 unseen items per model)

Row labels are predicted classes and column headings are actual classes (each column sums to the 25 test items of that class). The panel labels are Seen items and Unseen items, for Model 1, Model 3 and Model 2; the six matrices are listed in the order they appear.

Matrix 1

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can            9      0      0      0      0      0      0      0
Tape          11     25      0      0      0      0      0      0
Drill          0      0     25      2      0      0      0      0
Hanger         0      0      0     23      0      0      0      0
Paint          4      0      0      0     25      0      0      0
Hammer         1      0      0      0      0     23      2      0
Axe            0      0      0      0      0      2     23      0
Shower         0      0      0      0      0      0      0     25

Matrix 2

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can           22      0      0      1      0      2      0      0
Tape           0     14      0      0      0      0      0      0
Drill          0      0     25      1      0      3      0      0
Hanger         0      0      0     18      0      0      0      0
Paint          3      0      0      0     25      0      0      1
Hammer         0      3      0      5      0     20      1      0
Axe            0      0      0      0      0      0     24      0
Shower         0      8      0      0      0      0      0     24

Matrix 3

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can           14      0      0      0      0      0      0      0
Tape           9     25      0      0      0      0      0      0
Drill          0      0     25      2      0      0      0      0
Hanger         0      0      0     23      0      0      0      0
Paint          2      0      0      0     25      0      0      0
Hammer         0      0      0      0      0     24      1      0
Axe            0      0      0      0      0      1     24      0
Shower         0      0      0      0      0      0      0     25

Matrix 4

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can           24      0      0      0      0      0      0      0
Tape           0     16      0      0      0      0      0      0
Drill          0      0     25      1      0      0      1      0
Hanger         0      1      0     21      0      0      0      0
Paint          1      0      0      0     25      0      0      1
Hammer         0      2      0      2      0     25     12      0
Axe            0      0      0      0      0      0     12      0
Shower         0      6      0      1      0      0      0     24

Matrix 5

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can           18      0      0      0      5      3      1      2
Tape           7     25      3      2      0      0      3      1
Drill          0      0     22      0      0      0      0      0
Hanger         0      0      0      8      0      0      0      0
Paint          0      0      0      0     20      0      0      0
Hammer         0      0      0      1      0     12      4      0
Axe            0      0      0      0      0      9     17      0
Shower         0      0      0     14      0      1      0     22

Matrix 6

Predicted    Can   Tape  Drill Hanger  Paint Hammer    Axe Shower
Can           25      0      0      3      2      1      0      2
Tape           0     24      0      0      0      2      0      6
Drill          0      0     25      0      0      0      1      0
Hanger         0      0      0      3      0      0      0      0
Paint          0      0      0      0     23      0      0      0
Hammer         0      1      0      1      0     19     11      0
Axe            0      0      0      1      0      3     13      0
Shower         0      0      0     17      0      0      0     17

Overall results (50 test images per model): panels for Model 1, Model 3 and Model 2 (figure not reproduced).

References

Soekhoe, D., van der Putten, P., and Plaat, A. (2016). On the impact of data set size in transfer learning using deep neural networks.

Torrey, L. and Shavlik, J. (2009). Transfer Learning. In E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, editors, Handbook of Research on Machine Learning Applications, IGI Global, 2009.

December 13th 2016, Machine Learning, Computer Science
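As a worked example of reading the confusion matrices above, the sketch below computes overall accuracy and per-class recall and precision for the first matrix listed. It assumes rows are predicted classes and columns are actual classes, an orientation chosen because every column of that matrix sums to 25. The names `LABELS`, `MATRIX`, `accuracy`, `recall` and `precision` are illustrative and do not come from the poster.

```python
LABELS = ["Can", "Tape", "Drill", "Hanger", "Paint", "Hammer", "Axe", "Shower"]

# First confusion matrix in the section.
# Assumed orientation: MATRIX[predicted][actual].
MATRIX = [
    [9, 0, 0, 0, 0, 0, 0, 0],
    [11, 25, 0, 0, 0, 0, 0, 0],
    [0, 0, 25, 2, 0, 0, 0, 0],
    [0, 0, 0, 23, 0, 0, 0, 0],
    [4, 0, 0, 0, 25, 0, 0, 0],
    [1, 0, 0, 0, 0, 23, 2, 0],
    [0, 0, 0, 0, 0, 2, 23, 0],
    [0, 0, 0, 0, 0, 0, 0, 25],
]


def accuracy(m):
    """Correct predictions (the diagonal) divided by all test items."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def recall(m, k):
    """Share of items whose actual class is k that were predicted as k."""
    actual = sum(row[k] for row in m)  # column sum
    return m[k][k] / actual if actual else 0.0


def precision(m, k):
    """Share of items predicted as k whose actual class is k."""
    predicted = sum(m[k])  # row sum
    return m[k][k] / predicted if predicted else 0.0


print(f"accuracy = {accuracy(MATRIX):.3f}")
for k, name in enumerate(LABELS):
    print(f"{name:7s} recall = {recall(MATRIX, k):.2f}  "
          f"precision = {precision(MATRIX, k):.2f}")
```

For this matrix the diagonal sums to 178 of 200 images, an accuracy of 89%. The weakest class is Can, with recall 9/25 = 36% and 11 of its 25 images predicted as Tape. The same functions apply unchanged to the other five matrices.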
