Page 1: Reco4J @ London Meetup (June 26th)

Alessandro Negro
Reco4J Project @ London Meetup - June 2013

Reco4J Project
Intelligent Recommendations for Your Business

Page 2: Reco4J @ London Meetup (June 26th)

Recommender Systems

• A system that can recommend or present items to the user based on the user's interests and interactions
• One of the best ways to provide a personalized customer experience
• Built by exploiting collective intelligence to perform predictions
• Examples: Amazon, YouTube, Netflix, Yahoo, Tripadvisor, Last.fm, IMDb

Page 3: Reco4J @ London Meetup (June 26th)

The Example: Netflix

• The world's largest online movie rental service, with 33 million members in 40 countries
• 60% of members select movies based on recommendations (September 2008)
• Netflix Prize: US$1,000,000 was awarded to the BellKor's Pragmatic Chaos team, which beat Netflix's own algorithm for predicting ratings by 10.06% (September 2009)
• 75% of the content watched on the service comes from its recommendation engine (April 2012)

Page 4: Reco4J @ London Meetup (June 26th)

Why Recommender Systems

• Standard uses:
  – Increase the number of items sold
  – Sell more diverse items
  – Increase user satisfaction
  – Increase user fidelity
  – Better understand what the user wants
• Advanced uses:
  – Create ad hoc campaigns (per geographic area, per type of user)
  – Optimize product distribution over a wide area for large retail chains

Page 5: Reco4J @ London Meetup (June 26th)

Problem

• There are no available software products for state-of-the-art recommender systems
• There is no "best solution"
• There is no "one solution fits all"
• The Netflix winner combined 104 different algorithms
• A high-end recommender engine can be built only through expensive custom projects
• Large-scale user/item datasets require a big data approach

Page 6: Reco4J @ London Meetup (June 26th)

Solution: Reco4J

A graph-based recommender engine

Page 7: Reco4J @ London Meetup (June 26th)

Reco4J Main Goals

• Implement state-of-the-art recommendation techniques on top of a graph model
• Ready-to-use framework
• Extend/improve existing software:
  – Neo4j
  – Elasticsearch
  – R
• Provide software / cloud services / consultancy
• Contribute to the RecSys research field

Page 8: Reco4J @ London Meetup (June 26th)

Reco4J Features

• Core
  – Based on a collaborative filtering approach
  – Independent of source knowledge datasets
  – Persistent models (multiple models supported)
  – Updatable models
  – Composable models/algorithms
• Algorithms
  – Commercial and research-oriented algorithms
  – Context-aware recommendations
  – Social recommendations
• Operations
  – Cluster and cloud-ready for Big Data analysis
  – Multitenant

Page 9: Reco4J @ London Meetup (June 26th)

Reco4J Under the Hood

• J is for Java
• Customized algorithm implementation based on a graph data model
• Terracotta® BigMemory integration
• Neo4j graph database:
  – Data source repository
  – Persistent model repository
• Apache Hadoop
  – Map/Reduce based model building
• Apache Mahout
  – Graph data model
  – Recommender
  – Alternating Least Squares algorithms (Hadoop version)
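To give a feel for what alternating least squares does, here is a minimal in-memory sketch. It is not Mahout's Hadoop implementation and not Reco4J code: it assumes a single latent factor per user and per item (real systems use several), marks missing ratings with NaN, and uses a hypothetical class name and a regularization parameter `lambda` chosen for illustration only.

```java
import java.util.Arrays;

// Illustrative sketch only: rank-one alternating least squares on a small dense array.
final class RankOneAlsSketch {

    /** Regularized squared error over observed ratings; NaN marks a missing rating. */
    static double loss(double[][] ratings, double[] user, double[] item, double lambda) {
        double total = 0.0;
        for (int u = 0; u < ratings.length; u++) {
            for (int i = 0; i < ratings[u].length; i++) {
                if (Double.isNaN(ratings[u][i])) continue;
                double err = ratings[u][i] - user[u] * item[i];
                total += err * err;
            }
        }
        for (double f : user) total += lambda * f * f;
        for (double f : item) total += lambda * f * f;
        return total;
    }

    /** Alternates closed-form updates of user and item factors; lambda must be > 0. */
    static double[][] fit(double[][] ratings, double lambda, int iterations) {
        int users = ratings.length;
        int items = ratings[0].length;
        double[] user = new double[users];
        double[] item = new double[items];
        Arrays.fill(user, 1.0);
        Arrays.fill(item, 1.0);
        for (int it = 0; it < iterations; it++) {
            // Hold item factors fixed and solve each user factor exactly.
            for (int u = 0; u < users; u++) {
                double num = 0.0, den = lambda;
                for (int i = 0; i < items; i++) {
                    if (Double.isNaN(ratings[u][i])) continue;
                    num += ratings[u][i] * item[i];
                    den += item[i] * item[i];
                }
                user[u] = num / den;
            }
            // Hold user factors fixed and solve each item factor exactly.
            for (int i = 0; i < items; i++) {
                double num = 0.0, den = lambda;
                for (int u = 0; u < users; u++) {
                    if (Double.isNaN(ratings[u][i])) continue;
                    num += ratings[u][i] * user[u];
                    den += user[u] * user[u];
                }
                item[i] = num / den;
            }
        }
        return new double[][] {user, item};
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] ratings = {{5, 3, nan}, {4, nan, 1}, {nan, 2, 1}};
        double[][] factors = fit(ratings, 0.1, 20);
        // Predicted rating of user 0 for the item they have not rated (item 2).
        System.out.println(factors[0][0] * factors[1][2]);
    }
}
```

Because each half-step solves its least-squares problem exactly while the other side is held fixed, the regularized error never increases from one iteration to the next, which is also what makes the per-user and per-item updates easy to distribute.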

Page 10: Reco4J @ London Meetup (June 26th)

Algorithms Roadmap

• Collaborative filtering
  – Memory based (Neighborhood)
    • User/Item based
      – Several distance algorithms (Cosine, Euclidean, Tanimoto, etc.)
    • Graph based
      – Path Based Similarity (Shortest Path, Number of Paths)
      – Random Walk Similarity (Item Rank, Average first-passage/commute time)
  – Model based (Latent factor)
    • Stochastic gradient descent
    • Alternating least squares
    • SVD++ (by Koren)
• Social recommendation
  – Trust based approach
  – Probabilistic approach
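The distance measures named for the neighborhood approach can be sketched in a few lines. The class and method names below are hypothetical and this is not Reco4J's API; the Euclidean variant is turned into a similarity with 1 / (1 + distance), which is one common convention among several.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: three similarity measures used in neighborhood methods.
final class SimilaritySketch {

    /** Cosine of the angle between two rating vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0; // undefined for a zero vector
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean distance mapped to (0, 1]; identical vectors score 1. */
    static double euclideanSimilarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    /** Tanimoto (Jaccard) coefficient on item sets: |A and B| / |A or B|. */
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        double[] alice = {5, 3, 0, 1};
        double[] bob = {4, 0, 0, 1};
        System.out.println(cosine(alice, bob));
        System.out.println(euclideanSimilarity(alice, bob));
    }
}
```

Cosine and Euclidean work on rating values, while Tanimoto only looks at which items two users have in common, so it suits implicit feedback such as purchases or views.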

Page 11: Reco4J @ London Meetup (June 26th)

Algorithms Roadmap (2)

• Cross-cutting features (all algorithms)
  – Context awareness
  – Composability
  – Real time
  – Parallelization

Page 12: Reco4J @ London Meetup (June 26th)

Context-Aware Recommendation

“The ability to reach out and touch customers anywhere means that companies must deliver not just competitive products but also unique, real-time customer experiences shaped by customer context”

C. K. Prahalad

• Incorporate contextual information in the recommendation process
• Modeling contextual information
  – From: User x Item -> Rating
  – To: User x Item x Context -> Rating
• Hierarchical structure
• Three approaches
  – Contextual pre-filtering
  – Contextual post-filtering
  – Contextual modeling
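Of the three approaches, contextual pre-filtering is the simplest to illustrate: ratings are first restricted to the target context, and an ordinary User x Item recommender is then run on what remains. The sketch below is an assumption-laden illustration, not Reco4J code: the `Rating` type, the flat string context, and the use of a plain item mean as the stand-in two-dimensional recommender are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: contextual pre-filtering in front of a trivial predictor.
final class ContextPreFilterSketch {

    /** One observation of User x Item x Context -> Rating. */
    static final class Rating {
        final String user;
        final String item;
        final String context;
        final double value;

        Rating(String user, String item, String context, double value) {
            this.user = user;
            this.item = item;
            this.context = context;
            this.value = value;
        }
    }

    /** Keeps only the ratings observed in the requested context. */
    static List<Rating> preFilter(List<Rating> all, String context) {
        List<Rating> kept = new ArrayList<>();
        for (Rating r : all) {
            if (r.context.equals(context)) kept.add(r);
        }
        return kept;
    }

    /** Stand-in for any User x Item recommender: the item's mean rating, NaN if unseen. */
    static double itemMean(List<Rating> ratings, String item) {
        double sum = 0.0;
        int count = 0;
        for (Rating r : ratings) {
            if (r.item.equals(item)) {
                sum += r.value;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Pre-filter by context, then predict on the reduced User x Item data. */
    static double predict(List<Rating> all, String item, String context) {
        return itemMean(preFilter(all, context), item);
    }

    public static void main(String[] args) {
        List<Rating> ratings = new ArrayList<>();
        ratings.add(new Rating("alice", "pizza", "weekend", 5.0));
        ratings.add(new Rating("bob", "pizza", "weekend", 4.0));
        ratings.add(new Rating("carol", "pizza", "weekday", 1.0));
        System.out.println(predict(ratings, "pizza", "weekend"));
    }
}
```

Post-filtering would instead run the recommender on all ratings and adjust or drop results by context afterwards, while contextual modeling feeds the context into the model itself.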

Page 13: Reco4J @ London Meetup (June 26th)

Advantage of graph database

• NoSQL database to handle Big Data
• Extensibility
• Not an aggregate-oriented database
• Minimal information needed
• Natural way of representing connections:
  – User-to-item
  – Item-to-item
  – User-to-user
• Graph-based/social algorithms
• Graph partitioning (sharding)
• Performance

Page 14: Reco4J @ London Meetup (June 26th)

Example: Find Neighbors
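The figure for this slide is not part of the transcript. As a rough illustration of what finding neighbors over user-to-item connections can look like, the sketch below walks two hops (user -> item -> other user) and counts shared items. Plain Java collections stand in for the graph here; the class name and data layout are hypothetical, and this is not the Neo4j traversal or the Reco4J code shown in the talk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: neighbors are users reachable via a shared item.
final class FindNeighborsSketch {

    /** Returns each neighbor of the given user with the number of items they share. */
    static Map<String, Integer> neighbors(Map<String, Set<String>> userToItems, String user) {
        // Build the reverse edges (item -> users) so the walk can go both ways.
        Map<String, Set<String>> itemToUsers = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : userToItems.entrySet()) {
            for (String item : entry.getValue()) {
                itemToUsers.computeIfAbsent(item, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        // Two-hop walk: user -> item -> other user, counting one path per shared item.
        Map<String, Integer> shared = new HashMap<>();
        for (String item : userToItems.getOrDefault(user, Collections.<String>emptySet())) {
            for (String other : itemToUsers.get(item)) {
                if (!other.equals(user)) {
                    shared.merge(other, 1, Integer::sum);
                }
            }
        }
        return shared;
    }

    private static Set<String> setOf(String... items) {
        Set<String> set = new HashSet<>();
        Collections.addAll(set, items);
        return set;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        graph.put("alice", setOf("matrix", "alien"));
        graph.put("bob", setOf("alien", "heat"));
        graph.put("carol", setOf("matrix", "alien"));
        System.out.println(neighbors(graph, "alice"));
    }
}
```

The count of shared items is the "number of paths" between two users of length two, so it can be ranked directly or fed into one of the similarity measures from the roadmap.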

Page 15: Reco4J @ London Meetup (June 26th)

Why Neo4j?

• Java based
• Embeddable/extensible
• Native graph storage with native graph processing engine
• Open source, with commercial version
• Property graph
• ACID support
• Scalability/HA
• Comprehensive query/traversal options

Page 16: Reco4J @ London Meetup (June 26th)

Recommendation Model

Page 17: Reco4J @ London Meetup (June 26th)

Persistence Model

Page 18: Reco4J @ London Meetup (June 26th)

Persistence Model

Page 19: Reco4J @ London Meetup (June 26th)

Persistence Model

Page 20: Reco4J @ London Meetup (June 26th)

A code example

Page 21: Reco4J @ London Meetup (June 26th)

Reco4J + Hadoop

• Queue-based process
• Operates both on cluster and cloud
• Each process downloads data from Neo4j/Reco4J before or during computation
• Stores data into the Reco4J model
• Scales by increasing the number of:
  – Neo4j nodes (only one master)
  – Hadoop nodes

Page 22: Reco4J @ London Meetup (June 26th)

Reco4J in the Cloud

• Recommendation as a service (RaaS)
• Reco4J cloud infrastructure offers:
  – Pay as you need
  – Pay as you grow
  – Support for bursts
  – Periodic analysis at lower cost
  – Test/evaluate several algorithms on a reduced dataset
  – Compose algorithms dynamically

Page 23: Reco4J @ London Meetup (June 26th)

Consultancy Goals

• Analysis
• Data Source
• Exploration
• Process Definition
• Import Data
• Test/Evaluation
• Deploy

Page 24: Reco4J @ London Meetup (June 26th)

Thank you

Alessandro Negro
LinkedIn: http://it.linkedin.com/in/alessandronegro/
Email: [email protected]
Reco4J Site: http://www.reco4j.org
Twitter: @reco4j
GitHub: https://github.com/reco4j