Deep Learning for Object Classification in Retail …cs229.stanford.edu › proj2016 › poster ›...
Deep Learning for Object Classification in Retail Stores
Idawati Bustan, Mervyn Wee, Timothy Chong
{idawati, wyrm, timchong}@stanford.edu
Stanford University
Introduction
Being able to recognize products from images would be useful in automating inventory management for the retail space. We took pictures of products on shelves in stores and investigated the use of a Convolutional Neural Network (CNN) in accurately recognizing these products. By examining the way in which we constructed the training data, we obtained a CNN with good accuracy not only on products that it had seen before, but also on products that it had never seen before.
Model
The retail stores we took images from had a limited variety of products, leading to a small training set. We used transfer learning to overcome this problem.
Transfer learning allowed us to take a CNN fully trained on another set of classes and retrain just the final layer to suit our purposes (Torrey and Shavlik, 2009). In effect, it took less time to train our model and allowed us to use a small dataset (Soekhoe et al., 2016).
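As a rough illustration of retraining only the final layer, the sketch below freezes a stand-in feature extractor and trains a softmax layer on top of it. It is a toy in plain Python, not the CNN used for this poster: `FROZEN_WEIGHTS`, the two-dimensional inputs, the labels, and the hyperparameters are all assumptions made up for the example.

```python
import math

# Hypothetical stand-in for the pretrained CNN body: a fixed projection plus ReLU.
# These weights are never updated, mirroring the frozen layers.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

def extract_features(x):
    """Map a raw input to features using the frozen weights only."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(W, b, f):
    """Class probabilities from the trainable final layer."""
    return softmax([sum(w * fi for w, fi in zip(W[k], f)) + b[k] for k in range(len(b))])

def train_final_layer(features, labels, n_classes, epochs=200, lr=0.5):
    """Full-batch gradient descent on cross-entropy; only W and b are learned."""
    d, n = len(features[0]), len(features)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    losses = []
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for f, y in zip(features, labels):
            p = forward(W, b, f)
            loss -= math.log(p[y])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(d):
                    grad_W[k][j] += err * f[j]
        losses.append(loss / n)
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(d):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b, losses

# Toy two-class dataset (made up for illustration).
inputs = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3], [0.1, 0.9], [0.2, 0.7], [0.3, 1.0]]
labels = [0, 0, 0, 1, 1, 1]

features = [extract_features(x) for x in inputs]
W, b, losses = train_final_layer(features, labels, n_classes=2)
predictions = [max(range(2), key=lambda k: forward(W, b, f)[k]) for f in features]
```

Only `W` and `b` change during training. Because the feature extractor stays fixed, there are few parameters to fit, which is consistent with the shorter training time and smaller dataset described above.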
Data
Retail stores have thousands of products. To start, we targeted classifying eight classes of products.
We constructed 3 CNN models that each received a different training set. The rationale behind this was to see how a mix of in-store and outside data could help in generalizing the model (for future products) while keeping it specific enough to classify current products.
All 3 models were tested on the same test set in order to compare which model performed better.
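One way to picture the three training sets is sketched below. The assignment of data to models is an assumption inferred from the Discussion (M1 from in-store images, M2 from Internet images, M3 from both); the file names and the helper `build_training_sets` are hypothetical.

```python
# Hypothetical (file name, class label) pairs; the poster does not list its image files.
in_store_images = [("store_can_01.jpg", "Can"), ("store_tape_01.jpg", "Tape")]
internet_images = [("web_can_01.jpg", "Can"), ("web_tape_01.jpg", "Tape")]

def build_training_sets(in_store, outside):
    """Return one training set per model (assumed split, see lead-in)."""
    return {
        "M1": list(in_store),                   # in-store images only
        "M2": list(outside),                    # outside (Internet) images only
        "M3": list(in_store) + list(outside),   # mix of both sources
    }

training_sets = build_training_sets(in_store_images, internet_images)
```

Keeping the test set fixed while only the training set varies means any difference in accuracy can be attributed to the training data.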
Result
M3 made the best predictions on seen and unseen products, followed by M1, and then M2.
Discussion
We had expected M3 to perform the best, followed by M1 and then M2. We were worried that the lack of variety of products in retail stores meant that M1 would have low accuracy on products it had not seen before. M3 showed that adding images of products from the Internet to the training set increases the accuracy of such predictions.
Future
Although M3 performed the best, we found that many predictions had a probability of only 20% to 40%, even when these predictions were correct. The first step we would take is to increase the confidence of these predictions, so that the model is better trained. This could be done by training it on more data or by increasing the number of epochs when training the CNN.
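To make the confidence issue concrete, the sketch below shows how a prediction can be correct while its softmax probability stays low. The logits are hypothetical values chosen for illustration, not outputs of the poster's models; with eight classes, chance level is 12.5%.

```python
import math

CLASSES = ["Can", "Tape", "Drill", "Hanger", "Paint", "Hammer", "Axe", "Shower"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits):
    """Return the most likely class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

# Hypothetical final-layer outputs for one image of a can.
logits = [1.0, 0.2, 0.1, 0.0, 0.3, 0.2, 0.1, 0.0]
label, confidence = top_prediction(logits)

# The top class is right, but its probability falls in the 20% to 40% band.
low_confidence = confidence < 0.4
```

When the winning logit is only slightly larger than the others, the probability mass is spread across classes. More training data or more epochs would be expected to widen that gap.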
Confusion Matrix (25 seen & 25 unseen items per model)
Panel labels: Seen items, Unseen items; Model 1, Model 3, Model 2. Rows are predicted classes and columns are actual classes.

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can              9       0       0       0       0       0       0       0
Tape            11      25       0       0       0       0       0       0
Drill            0       0      25       2       0       0       0       0
Hanger           0       0       0      23       0       0       0       0
Paint            4       0       0       0      25       0       0       0
Hammer           1       0       0       0       0      23       2       0
Axe              0       0       0       0       0       2      23       0
Shower           0       0       0       0       0       0       0      25

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can             22       0       0       1       0       2       0       0
Tape             0      14       0       0       0       0       0       0
Drill            0       0      25       1       0       3       0       0
Hanger           0       0       0      18       0       0       0       0
Paint            3       0       0       0      25       0       0       1
Hammer           0       3       0       5       0      20       1       0
Axe              0       0       0       0       0       0      24       0
Shower           0       8       0       0       0       0       0      24

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can             14       0       0       0       0       0       0       0
Tape             9      25       0       0       0       0       0       0
Drill            0       0      25       2       0       0       0       0
Hanger           0       0       0      23       0       0       0       0
Paint            2       0       0       0      25       0       0       0
Hammer           0       0       0       0       0      24       1       0
Axe              0       0       0       0       0       1      24       0
Shower           0       0       0       0       0       0       0      25

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can             24       0       0       0       0       0       0       0
Tape             0      16       0       0       0       0       0       0
Drill            0       0      25       1       0       0       1       0
Hanger           0       1       0      21       0       0       0       0
Paint            1       0       0       0      25       0       0       1
Hammer           0       2       0       2       0      25      12       0
Axe              0       0       0       0       0       0      12       0
Shower           0       6       0       1       0       0       0      24

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can             18       0       0       0       5       3       1       2
Tape             7      25       3       2       0       0       3       1
Drill            0       0      22       0       0       0       0       0
Hanger           0       0       0       8       0       0       0       0
Paint            0       0       0       0      20       0       0       0
Hammer           0       0       0       1       0      12       4       0
Axe              0       0       0       0       0       9      17       0
Shower           0       0       0      14       0       1       0      22

Predicted      Can    Tape   Drill  Hanger   Paint  Hammer     Axe  Shower
Can             25       0       0       3       2       1       0       2
Tape             0      24       0       0       0       2       0       6
Drill            0       0      25       0       0       0       1       0
Hanger           0       0       0       3       0       0       0       0
Paint            0       0       0       0      23       0       0       0
Hammer           0       1       0       1       0      19      11       0
Axe              0       0       0       1       0       3      13       0
Shower           0       0       0      17       0       0       0      17

Overall results (50 test images per model): Model 1, Model 3, Model 2
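For readers who want summary numbers from a confusion matrix, the following sketch computes overall accuracy and per-class recall and precision. It uses the first matrix in the transcript, read with rows as predicted classes and columns as actual classes; the function names are illustrative, and the sketch does not assume which model or panel that matrix belongs to.

```python
CLASSES = ["Can", "Tape", "Drill", "Hanger", "Paint", "Hammer", "Axe", "Shower"]

# First confusion matrix in the transcript: matrix[predicted][actual].
matrix = [
    [9, 0, 0, 0, 0, 0, 0, 0],
    [11, 25, 0, 0, 0, 0, 0, 0],
    [0, 0, 25, 2, 0, 0, 0, 0],
    [0, 0, 0, 23, 0, 0, 0, 0],
    [4, 0, 0, 0, 25, 0, 0, 0],
    [1, 0, 0, 0, 0, 23, 2, 0],
    [0, 0, 0, 0, 0, 2, 23, 0],
    [0, 0, 0, 0, 0, 0, 0, 25],
]

def accuracy(m):
    """Fraction of all test images on the diagonal (predicted == actual)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def recall_per_class(m):
    """Correct predictions divided by the column sum (actual items of that class)."""
    n = len(m)
    return {CLASSES[j]: m[j][j] / sum(m[i][j] for i in range(n)) for j in range(n)}

def precision_per_class(m):
    """Correct predictions divided by the row sum (items predicted as that class)."""
    return {CLASSES[i]: m[i][i] / sum(m[i]) for i in range(len(m))}

overall = accuracy(matrix)
recall = recall_per_class(matrix)
precision = precision_per_class(matrix)
```

In this matrix every column sums to 25, so recall shows directly how many of the 25 images of each class were recognized. For example, only 9 of 25 cans were, with 11 of the rest predicted as tape.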
Reference
Soekhoe, D., van der Putten, P., and Plaat, A. (2016). On the impact of data set size in transfer learning using deep neural networks.
Torrey, L. and Shavlik, J. (2009). Transfer Learning. In E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, editors, Handbook of Research on Machine Learning Applications, IGI Global, 2009.
December 13th 2016, Machine Learning, Computer Science
We also expected that M2 would give poor predictions on images, because images from the Internet were significantly different from images from the store. M3 was a balanced model which did well on seen and unseen test images.
For future consideration, M2 performed best on products that had a specific shape that was low in variation.