
Deep Learning for Object Classification in Retail Stores
Idawati Bustan, Mervyn Wee, Timothy Chong
{idawati, wyrm, timchong}@stanford.edu
Stanford University

Introduction
Being able to recognize products from images would be useful in automating inventory management for the retail space. We took pictures of products on shelves in stores and investigated the use of a Convolutional Neural Network (CNN) in accurately recognizing these products. By examining the way in which we constructed the training data, the CNN obtained good accuracy not only in recognizing products that it had seen before, but also in recognizing products that it had never seen before.

Model
The retail stores we took images from had a limited variety of products, leading to a small training set. We used transfer learning to overcome this problem.

Transfer learning allowed us to take a CNN fully trained on another set of classes and retrain just the final layer to suit our purposes (Torrey and Shavlik, 2009). In effect, it took less time to train our model and allowed us to use a small dataset (Soekhoe et al., 2016).
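As a rough illustration of retraining only the final layer, the sketch below freezes a small feature extractor and fits a softmax layer on top of it by gradient descent. It is not the authors' code: the fixed ReLU layer (`FROZEN_W`, `FROZEN_B`) is a hypothetical stand-in for a pretrained CNN, and the three-cluster toy dataset stands in for the product images.

```python
import math

# Hypothetical stand-in for a pretrained CNN body: one fixed ReLU layer.
# These weights are constants and are never updated during training.
FROZEN_W = ((1.0, 0.0), (0.0, 1.0), (-1.0, -1.0))
FROZEN_B = (0.0, 0.0, 1.5)


def frozen_features(x):
    """Map a raw input to a feature vector using the frozen weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(FROZEN_W, FROZEN_B)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(final_w, x):
    f = frozen_features(x)
    return softmax([sum(w * fi for w, fi in zip(row, f)) for row in final_w])


def predict(final_w, x):
    p = predict_proba(final_w, x)
    return max(range(len(p)), key=lambda k: p[k])


def mean_loss(final_w, data):
    """Average cross-entropy loss over (input, label) pairs."""
    return -sum(math.log(predict_proba(final_w, x)[y]) for x, y in data) / len(data)


def train_final_layer(data, num_classes, epochs=200, lr=0.1):
    """Full-batch gradient descent on the final softmax layer only."""
    num_features = len(FROZEN_W)
    final_w = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * num_features for _ in range(num_classes)]
        for x, y in data:
            f = frozen_features(x)
            p = predict_proba(final_w, x)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                for j in range(num_features):
                    grad[k][j] += err * f[j] / len(data)
        for k in range(num_classes):
            for j in range(num_features):
                final_w[k][j] -= lr * grad[k][j]
    return final_w


# Toy data (assumption): three well-separated clusters, one per class.
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
offsets = [(-0.2, -0.2), (-0.2, 0.2), (0.2, -0.2), (0.2, 0.2)]
data = [((cx + dx, cy + dy), label)
        for label, (cx, cy) in enumerate(centers) for dx, dy in offsets]

initial_w = [[0.0] * len(FROZEN_W) for _ in centers]
loss_before = mean_loss(initial_w, data)
final_w = train_final_layer(data, num_classes=len(centers))
loss_after = mean_loss(final_w, data)
accuracy = sum(predict(final_w, x) == y for x, y in data) / len(data)
```

Only `final_w` changes during training, which is why this kind of retraining is cheap and can work with few examples per class. A real pipeline would replace `frozen_features` with the activations of a pretrained network.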

Data
Retail stores have thousands of products. To start, we targeted classifying eight classes of products.

We constructed 3 CNN models that each received a different training set. The rationale behind this was to see how a mix of in-store and outside data could help in generalizing the model (for future products) and yet make it specific to classifying current products.

All 3 models were tested on the same test set in order to compare which model performed better.
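The three training sets could be assembled along the lines of the sketch below. The mapping of models to data sources is an inference from the Discussion (M1 on in-store images only, M2 on Internet images only, M3 on both) rather than something stated outright, and the file paths and the `build_training_sets` helper are hypothetical.

```python
def build_training_sets(store_images, web_images):
    """Return one labelled training set per model.

    Assumed mapping (inferred, not stated on the poster):
    M1 = in-store images, M2 = Internet images, M3 = both combined.
    """
    store = list(store_images)
    web = list(web_images)
    return {"M1": store, "M2": web, "M3": store + web}


# Hypothetical (path, label) pairs; the real image files are not available.
store_images = [("store/can_01.jpg", "Can"), ("store/tape_01.jpg", "Tape")]
web_images = [("web/can_01.jpg", "Can"), ("web/axe_01.jpg", "Axe")]

training_sets = build_training_sets(store_images, web_images)
```

Keeping the test set separate from all three training sets is what makes the comparison between the models fair.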

Result
M3 made the best predictions on seen and unseen products, followed by M1, and then M2.

Discussion
We had expected M3 to perform the best, followed by M1 and then M2. We were worried that the lack of variety of products in retail stores meant that M1 would have low accuracy on products it had not seen before. M3 showed that adding images of products from the Internet to the training set increased the accuracy of such predictions.

Future
Despite M3 performing the best, we found that many predictions had a probability of only 20% to 40%, even when these predictions were correct. The first step we would take would be to increase the confidence in these predictions so that the model would be better trained. This could be done by training it on more data or increasing the number of epochs when training the CNN.
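To put prediction probabilities of 20% to 40% in context, the sketch below assumes the model ends in a softmax over the eight product classes (an assumption; the poster does not describe the output layer). With eight classes, a model that knows nothing assigns 12.5% to each, so a top probability of about 28% can still be the winning class while leaving most of the probability mass on the other seven. The logits are made up for illustration.

```python
import math

CLASSES = ["Can", "Tape", "Drill", "Hanger", "Paint", "Hammer", "Axe", "Shower"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Baseline: identical logits give every class the same probability (1/8).
uniform = softmax([0.0] * len(CLASSES))

# Hypothetical logits: the first class is only slightly preferred.
probs = softmax([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
top_index = max(range(len(probs)), key=lambda i: probs[i])
top_class = CLASSES[top_index]
top_prob = probs[top_index]  # about 0.28
```

More training data or more epochs would be expected to widen the gap between the top logit and the rest, which is what raises the top probability.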

Confusion Matrix (25 seen & 25 unseen items per model)
Rows: Predicted. Columns: Actual.
Panel labels: Seen items, Unseen items; Model 1, Model 3, Model 2.

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can         9     0      0       0      0       0    0       0
Tape       11    25      0       0      0       0    0       0
Drill       0     0     25       2      0       0    0       0
Hanger      0     0      0      23      0       0    0       0
Paint       4     0      0       0     25       0    0       0
Hammer      1     0      0       0      0      23    2       0
Axe         0     0      0       0      0       2   23       0
Shower      0     0      0       0      0       0    0      25

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can        22     0      0       1      0       2    0       0
Tape        0    14      0       0      0       0    0       0
Drill       0     0     25       1      0       3    0       0
Hanger      0     0      0      18      0       0    0       0
Paint       3     0      0       0     25       0    0       1
Hammer      0     3      0       5      0      20    1       0
Axe         0     0      0       0      0       0   24       0
Shower      0     8      0       0      0       0    0      24

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can        14     0      0       0      0       0    0       0
Tape        9    25      0       0      0       0    0       0
Drill       0     0     25       2      0       0    0       0
Hanger      0     0      0      23      0       0    0       0
Paint       2     0      0       0     25       0    0       0
Hammer      0     0      0       0      0      24    1       0
Axe         0     0      0       0      0       1   24       0
Shower      0     0      0       0      0       0    0      25

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can        24     0      0       0      0       0    0       0
Tape        0    16      0       0      0       0    0       0
Drill       0     0     25       1      0       0    1       0
Hanger      0     1      0      21      0       0    0       0
Paint       1     0      0       0     25       0    0       1
Hammer      0     2      0       2      0      25   12       0
Axe         0     0      0       0      0       0   12       0
Shower      0     6      0       1      0       0    0      24

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can        18     0      0       0      5       3    1       2
Tape        7    25      3       2      0       0    3       1
Drill       0     0     22       0      0       0    0       0
Hanger      0     0      0       8      0       0    0       0
Paint       0     0      0       0     20       0    0       0
Hammer      0     0      0       1      0      12    4       0
Axe         0     0      0       0      0       9   17       0
Shower      0     0      0      14      0       1    0      22

          Can  Tape  Drill  Hanger  Paint  Hammer  Axe  Shower
Can        25     0      0       3      2       1    0       2
Tape        0    24      0       0      0       2    0       6
Drill       0     0     25       0      0       0    1       0
Hanger      0     0      0       3      0       0    0       0
Paint       0     0      0       0     23       0    0       0
Hammer      0     1      0       1      0      19   11       0
Axe         0     0      0       1      0       3   13       0
Shower      0     0      0      17      0       0    0      17

Overall results (50 test images per model)
Panel labels: Model 1, Model 3, Model 2.
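Overall accuracy and per-class recall can be read off a confusion matrix as in the sketch below, which uses the first matrix in this transcript. It assumes rows are predicted classes and columns are actual classes, which is consistent with the "Predicted" and "Actual" labels and with every column summing to 25. Which model and which test subset this matrix belongs to is not recoverable from the transcript, so the code does not name one.

```python
CLASSES = ["Can", "Tape", "Drill", "Hanger", "Paint", "Hammer", "Axe", "Shower"]

# First confusion matrix in the transcript.
# Assumed orientation: MATRIX[predicted][actual].
MATRIX = [
    [9, 0, 0, 0, 0, 0, 0, 0],
    [11, 25, 0, 0, 0, 0, 0, 0],
    [0, 0, 25, 2, 0, 0, 0, 0],
    [0, 0, 0, 23, 0, 0, 0, 0],
    [4, 0, 0, 0, 25, 0, 0, 0],
    [1, 0, 0, 0, 0, 23, 2, 0],
    [0, 0, 0, 0, 0, 2, 23, 0],
    [0, 0, 0, 0, 0, 0, 0, 25],
]


def overall_accuracy(matrix):
    """Fraction of all items that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, classes):
    """For each actual class, the fraction of its items predicted correctly."""
    recall = {}
    for j, name in enumerate(classes):
        actual_total = sum(row[j] for row in matrix)
        recall[name] = matrix[j][j] / actual_total
    return recall


accuracy = overall_accuracy(MATRIX)          # 178 correct out of 200
recall = per_class_recall(MATRIX, CLASSES)   # e.g. Can: 9 of 25 correct
```

For this matrix the weak spot is the Can class, where 11 of the 25 actual cans were predicted as Tape. The same two functions apply unchanged to the other five matrices.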

Reference

Soekhoe, D., van der Putten, P., and Plaat, A. (2016). On the impact of data set size in transfer learning using deep neural networks.

Torrey, L. and Shavlik, J. (2009). Transfer Learning. In E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, editors, Handbook of Research on Machine Learning Applications, IGI Global, 2009.

December 13th 2016, Machine Learning, Computer Science

We also expected that M2 would give poor predictions because images from the Internet were significantly different from images from the store. M3 was a balanced model which did well on seen and unseen test images.

For future consideration, M2 performed best on products that had a specific shape that was low in variation.