STARLET: Multi-document Summarization of Service and Product Reviews with Balanced Rating Distributions
ICDM 2011 – SENTIRE Workshop – Vancouver, CA, December 11, 2011
Giuseppe “Pino” Di Fabbrizio*^, Ahmet Aker^, and Rob Gaizauskas^
* AT&T Labs - Research   ^ The University of Sheffield

Transcript of STARLET: Multi-document Summarization of Service and Product Reviews with Balanced Rating Distributions

Page 1:

STARLET: Multi-document Summarization of Service and Product Reviews with Balanced Rating Distributions

ICDM 2011 – SENTIRE Workshop – Vancouver, CA, December 11, 2011

Giuseppe “Pino” Di Fabbrizio*^, Ahmet Aker^, and Rob Gaizauskas^

* AT&T Labs - Research   ^ The University of Sheffield

Page 2:

Outline

• Introduction
• Summarization as a search problem
  - A* search
  - Feature extraction
  - Star rating prediction model
  - Training
• Experiments
• Results and discussion

Dec 11, 2011  pino  2

Page 3:

Questions

• Summarization – What does it mean to summarize reviews?
• Star ratings – Does the number of stars provide enough information?
• Selection process – What is important to preserve?
• Learning from data – Can we learn what is relevant from data?
• Controversiality – What do we do about contradictory information?

Page 4:

A reasonable goal

• Given a set of reviews evaluating a specific entity (restaurant, hotel, digital camera, etc.) and the related aspects describing the entity (food, service, atmosphere, etc.)
  - Extract the sentences with relevant information about the evaluated aspects while preserving the average opinion distributions

Page 5:

Automatic summarization

“The process of distilling the most important information from a text to produce an abridged version for a particular task and users.” [Mani and Maybury, 1999]

• Methods
  - Extractive – text unit (phrase/sentence) selection
  - Compression – text simplification
  - Abstractive – natural language generation
• Evaluation metrics
  - Intrinsic – compared against a human-generated (gold) reference
  - Extrinsic – evaluated according to some utility function (e.g., document snippet accuracy in web search)
• Input/Output
  - Text, speech, graphics (any combination)

Page 6:

Multi-document summarization

• Traditional multi-document summarization (DUC, TAC)
  - Focuses on facts; usually coherent and non-contradictory
  - Edited, high-quality written text
  - Limited number of documents (<<100)
  - Typical approach: sentence clustering, selection, and ordering in a domain-independent way

Page 7:

Typical summarization tasks

• News articles [McKeown et al., 2002]
• Medical literature [Elhadad et al., 2005]
• Biographies [Copeck et al., 2002]
• Technical articles [Saggion and Guy, 2001]
• Blogs [Mithun and Kosseim, 2009]

Page 8:

Multi-document summarization (opinion)

• Multi-document summarization for evaluative text
  - Contradictory opinions
  - Poorly written (typos, misspellings, ungrammatical, jargon)
    • 20 different ways to misspell atmosphere: atmophere, atmopshere, atmoshere, atmoshpere, atmoshphere, atmosophere, atmospehere, atmospere, atmosphare, atmoslhere, atmospheric, atmosphire, atmosphre, atmostphere, atmousphere, atmsphere, atomosphere, atompospere, atomsphere, atsmosphere
  - Vast range of domains (restaurants, hotels, cars, books, toasters, etc.)
  - Number of documents can be large for popular products (>200)
  - Typical approach: sentence selection over sentiment-laden sentences, followed by template-based natural language generation

Page 9:

MEAD* [Carenini et al., 2006; Carenini et al., 2011]

• Based on MEAD [Radev et al., 2003], an open-source, Perl-based extractive summarizer
• Three-step process
  - Feature calculation – evaluate how informative each sentence is, using centroid and evaluative features
  - Classification – combine the features into one score
  - Reranking – adjust sentence scores based on the number of opinions in a sentence (regardless of polarity)
• Drawbacks
  - Sentence selection is based on the most frequently discussed aspects
  - Polarity of sentences is ignored (positive and negative sentences contribute equally)
  - Summarization features are based on expert knowledge

Page 10:

Summarization as a search problem

• Scoring function: a linear combination of summarization features
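The scoring function itself appeared only as an image on the slide; reconstructed here in the standard linear form (the symbol names are assumptions consistent with the later slides):

```latex
\mathrm{score}(\hat{s}) \;=\; \sum_{i=1}^{n} \lambda_i \, f_i(\hat{s})
```

where the f_i are the summarization features of a candidate summary and the λ_i are the weights learned during training.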

Page 11:

Summarization model

• Assuming that the features are independent
• Find the parameters such that the model score is similar to the score from a gold-standard summary
• Exponentially large search space, where S is the total number of sentences and L(W) is the number of sentences that best matches the required summary word length W
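The search-space formula was an image on the original slide; treating a summary as an unordered selection of L(W) of the S sentences, the number of candidate summaries is on the order of:

```latex
\binom{S}{L(W)} \;=\; \frac{S!}{L(W)!\,(S - L(W))!}
```

which grows exponentially with the summary length, motivating the A* search that follows.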

Page 12:

[Figure: search graph over review sentences. Nodes at Length = 1 are the single sentences (1–7); at Length = 2 each node is extended with another sentence (1,1 … 1,7; 2,1 … 2,7); the graph grows until candidate summaries reach Length = L(W), connecting a start node S to an end node E.] [Aker et al., 2010]

Goal: find the best-scoring path from S to E

Page 13:

A* search

• Sooo many stars …
• Informed search algorithm
• Best-first strategy
• Guaranteed to find the optimal solution if the heuristic function is monotonic or satisfies the admissible heuristic requirement:
  - The estimated cost from the current node to the goal node never overestimates the actual cost
  - For node n: f(n) = s(n) + h(n), where
    • s(n) – sum of the scores of the partial summary built so far
    • h(n) – heuristic estimate of how far the partial summary is from the final summary length [Aker et al., 2010]
• The heuristic takes global constraints, such as the summary length, into account
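A minimal sketch of this best-first setup in Python. The names are illustrative: the real STARLET system scores weighted feature vectors and uses the length-based heuristic of Aker et al. [2010], while here `score` and `estimate` are supplied by the caller, and `estimate` must be optimistic (never underestimate the score still attainable) for the maximization analogue of admissibility.

```python
import heapq

def astar_summarize(sentences, score, estimate, target_len):
    """Best-first search over partial summaries (illustrative sketch).

    sentences  -- list of (text, word_count) candidates
    score      -- s(n): score of a partial summary (tuple of sentence indices)
    estimate   -- h(n): optimistic estimate of the score still attainable
    target_len -- required summary length in words
    """
    # Nodes are partial summaries. heapq is a min-heap, so we push -f(n)
    # to always expand the most promising node first.
    heap = [(0.0, (), 0)]
    while heap:
        _, chosen, words = heapq.heappop(heap)
        if words >= target_len:          # goal: target length reached
            return list(chosen)
        last = chosen[-1] if chosen else -1
        # Extend only with later sentences, so each subset of sentences
        # is generated once (summaries are treated as sets).
        for i in range(last + 1, len(sentences)):
            _, wc = sentences[i]
            nxt = chosen + (i,)
            f = score(nxt) + estimate(nxt, target_len - (words + wc))
            heapq.heappush(heap, (-f, nxt, words + wc))
    return []
```

For example, with additive per-sentence scores and the trivial estimate h(n) = 0, the search reduces to best-first expansion on the accumulated score; a tighter, still-optimistic estimate prunes far more of the exponential space.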

Page 14:

Model parameter optimization

• Find the parameters such that the model score is similar to the score from a gold-standard summary
• The ROUGE metric measures the accuracy of the current summary against a gold reference summary
• Minimize the loss function
• Minimum error rate training (MERT) [Och, 2003]
  - First-order approximation method using Powell's search (the problem is not convex)
  - Iterative method; uses the n-best candidates from the A* search to find the parameters
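A toy sketch of one MERT-style line search along a single feature weight. The data structures are illustrative assumptions; real MERT minimizes 1 - ROUGE over n-best lists with Och's exact line search rather than a fixed grid of candidate values.

```python
def mert_line_search(nbest_lists, weights, k, grid):
    """Pick the value of weight k (from a candidate grid) that minimizes
    total loss over fixed n-best lists. Each candidate is a dict with
    'feats' (its feature vector) and 'loss' (e.g., 1 - ROUGE against the
    gold summary); the model "decodes" by taking its top-scoring candidate.
    """
    best_val, best_loss = weights[k], float("inf")
    for v in grid:
        w = list(weights)
        w[k] = v
        total = 0.0
        for nbest in nbest_lists:
            # The candidate the model would output under weights w
            top = max(nbest,
                      key=lambda c: sum(wi * fi for wi, fi in zip(w, c["feats"])))
            total += top["loss"]
        if total < best_loss:
            best_loss, best_val = total, v
    out = list(weights)
    out[k] = best_val
    return out
```

Iterating this search over each weight in turn, then re-running the A* search to refresh the n-best lists, gives the overall MERT loop described on the slide.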

Page 15:

Feature extraction

• Rating prediction model
• For each aspect (atmosphere, food, value, service, overall), estimate the ratings for any document
• MaxEnt classification algorithm trained on 6,823 restaurant reviews, with an average rank loss of 0.63
• Predicts rating distributions (after proper confidence score normalization)

[Figure: feature extraction and training pipeline: per-aspect predictive models (Atmosphere, Food, Value, Service, Overall) map each review to its aspect ratings.] [Gupta, Di Fabbrizio, Haffner, 2010]
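The slide does not show the normalization itself; one plausible sketch of turning per-star classifier confidence scores into a rating distribution is a softmax (the exact scheme used by Gupta, Di Fabbrizio, and Haffner [2010] may differ):

```python
import math

def normalize_scores(scores):
    """Map per-star confidence scores (stars 1-5) to a probability
    distribution with a numerically stable softmax. An illustrative
    stand-in for the "proper confidence score normalization"
    mentioned on the slide."""
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Whatever the exact mapping, the output must sum to one so that the per-aspect distributions can be averaged and compared, as on the next slides.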

Page 16:

Predicted and target ratings

[Figure: two bar charts over star ratings 1-5. Average (target) food ratings: 0.1, 0.05, 0.1, 0.35, 0.4. Predicted food ratings: 0.15, 0.05, 0, 0.25, 0.55.]

Page 17:

Review ratings as summarization features

• For each review document set
  - For each aspect i, average the ratings by aspect to create the target reference distribution
  - For each sentence j, calculate the aspect rating predictions
  - For each sentence, calculate the Kullback–Leibler divergence with the reference summary
• The KL divergence is then used during training to find the optimal parameters
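A minimal sketch of the per-aspect KL-divergence feature, using the food-rating distributions read off the Page 16 bar charts. The epsilon smoothing is an assumption added to handle the zero probability in the predicted distribution:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two star-rating distributions over stars 1-5.
    Adds a small epsilon and renormalizes so that zero bins do not
    make the divergence infinite."""
    p = [pi + eps for pi in p]
    q = [qi + eps for qi in q]
    zp, zq = sum(p), sum(q)
    return sum((pi / zp) * math.log((pi / zp) / (qi / zq))
               for pi, qi in zip(p, q))

# Target (average) and predicted food ratings from the Page 16 charts
target = [0.1, 0.05, 0.1, 0.35, 0.4]
predicted = [0.15, 0.05, 0.0, 0.25, 0.55]
divergence = kl_divergence(target, predicted)
```

The divergence is zero only when a candidate summary's predicted distribution matches the reference exactly, so minimizing it pushes the search toward summaries with balanced rating distributions.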

Page 18:

[Figure: the search graph from Page 12, with predicted aspect ratings computed at each candidate node and compared against the target aspect ratings at the summary goal.]

Page 19:

Data

• From the 3,866 restaurants available on we8there.com, selected 131 with more than five reviews
• Selected 60 of the 131 restaurants that had tripadvisor.com reviews highly voted by readers as useful
• Created the GOLD reference by selecting the 20 tripadvisor.com reviews with the highest number of “helpful votes” (same time frame as the we8there.com reviews)
• The remaining 40 restaurants were used as the training set

Page 20:

Experimental setup

• Target length: 100 words
• Baseline
  - Randomly selected sentences, without repetition, until the target length is reached
• MEAD
  - Traditional multi-document summarization
• Starlet
  - Uses only the rating distributions as features and the web-based GOLD reference

Page 21:

Output example

Random summary: We ended up waiting 45 minutes for a table 15 minutes for a waitress and by that time they had sold out of fish frys . This would be at least 4 visits in the last three years and the last visit was in March 2004 . During a recent business trip I ate at the Fireside Inn 3 times the food was so good I did n't care to try anyplace else . I always enjoy meeting friends here when I am in town . The food especially pasta calabria is delicious . I like eating at a resturant where I can not see the plate when my entry is served .

MEAD summary: During a recent business trip I ate at the Fireside Inn 3 times the food was so good I did n't care to try anyplace else . I have had the pleasure to visit the Fireside on every trip I make to the Buffalo area . The Fireside not only has great food it is one of the most comfortable places we have seen in a long time The service was as good as the meal from the time we walked in to the time we left we could have not had a better experience We most certainly will be back many times .

Starlet summary: Delicious . Can't wait for my next trip to Buffalo . GREAT WINGS . I have reorranged business trips so that I could stop in and have a helping or two of their wings . We were seated promptly and the staff was courteous . The service was not rushed and was very timely . The food especially pasta calabria is delicious . 2 thumbs UP . A great night for all . the food is very good and well presented . The price is more than competivite . It took 30 minutes to get our orders .

[Figure: bar chart of the service aspect rating distribution over stars 1-5 for the example.]

Page 22:

ROUGE evaluation
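The slide's results table was an image and is not recoverable. As background, here is a minimal sketch of ROUGE-1 recall, the kind of n-gram overlap ROUGE computes; the official ROUGE toolkit adds stemming, stopword handling, multiple references, and the ROUGE-2/L variants:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams also found in the candidate,
    with counts clipped to the reference frequency."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cnt, cand[w]) for w, cnt in ref.items())
    return overlap / max(1, sum(ref.values()))
```

For instance, scoring "the food was great" against the reference "the food was very great" covers four of the five reference unigrams.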

Page 23:

Manual evaluation

• Three judges (two native speakers)
• Rating scale: 5 (very good) to 1 (very poor)
• Evaluations
  - Grammaticality – grammatically correct and without artifacts
  - Redundancy – absence of unnecessary repetitions
  - Clarity – easy to read
  - Coverage – level of coverage of the aspects and the polarity expressed in the summary
  - Coherence – well structured and organized

Page 24:

Discussion

• Grammaticality – consistent across the three methods; depends only on the quality of the source sentences
  - Poorly written sentences could be penalized by introducing new training features that take the number of misspellings into account
• Redundancy – slightly better for Starlet. Sentence-similarity features could be added during training, using centroid-based clustering to demote sentences similar to those already in the summary
• Clarity and coherence – slightly better for Starlet, but more investigation is necessary
• Coverage – decidedly better than for the other approaches, showing that Starlet correctly selects information relevant to users

Page 25:

Conclusions

• Summarization – What does it mean to summarize reviews?
• Star ratings – Does the number of stars provide enough information?
• Selection process – What is important to preserve?
• Learning from data – Can we learn what is relevant from data?
• Controversiality – What do we do about contradictory information?