Schauer Spellman Paper - law.duke.edu · ! 2!...



Note  to  Duke  colleagues:    This  manuscript  is  in  many  ways  more  preliminary  than  the  typical  workshop  paper.    Not  only  does  it  need  refinement  of  analysis,  modification  of  argumentative  structure,  addition  of  references,  and  polishing  of  prose,  but  it  could  also  benefit  greatly  from  many  more  examples  (and  counterexamples).    We  think  there  is  a  genuine  idea  here,  but  much  more  needs  to  be  done,  and  we  are  grateful  for  your  assistance.    Fred  Schauer  &  Bobbie  Spellman    

 

10/26/2015  

 

CALIBRATING  LEGAL  JUDGMENTS  

 

Frederick  Schauer1  &  Barbara  A.  Spellman2  

 

  Legal decision-makers must frequently assess the judgments of other legal

decision-makers. The Supreme Court is required to evaluate the judgments of

federal  courts  of  appeals  and  state  supreme  courts,  and  those  courts  must  evaluate  

the  legal  and  sometimes  the  factual  rulings  of  trial  courts.    Courts  are  also  routinely  

called  upon  to  determine  the  legal  sufficiency  of  decisions  by  Presidents,  governors,  

members  of  Congress,  state  legislators,  heads  of  executive  departments,  police  

officers,  school  principals,  and  countless  administrative  officials.      So  too  must  

1 David and Mary Harrison Distinguished Professor of Law, University of Virginia.

2 Professor of Law, University of Virginia. Earlier versions of this Article have been presented at the University of Arizona College of Law, the University of Pennsylvania School of Law, and the 2015 Convention of the International Association of Legal and Social Philosophy. We are grateful for audience comments and questions on those occasions, and to Will Baude, Paul Mahoney, Stephen Morse, and Andrew Vollmer for valuable information and references.


judges  and  magistrates  in  issuing  search  warrants  assess  the  reliability  of  the  

representations  made  to  them  by  law  enforcement  officers  seeking  the  warrant.    

And  of  course  jurors  as  well  as  judges  must  evaluate  the  veracity  and  credibility  of  

the  witnesses  who  testify  in  court.      

  When  such  situations  arise  in  everyday  life,  the  evaluator  of  someone  else’s  

decision  or  judgment  often  seeks  to  inform  her  evaluation  by  considering,  explicitly  

or implicitly, the other decisions or judgments made by the decision-maker whose

decision  is  now  under  review.    When  a  friend  recommends  a  restaurant  or  movie  to  

us,  we  would  like  to  know  something  about  that  friend’s  reactions  to  other  

restaurants  or  movies,  preferably  in  the  context  of  restaurants  or  movies  we  have  

both  seen,  so  that  we  can  assess  whether  the  judgment  on  this  occasion  is  worth  

following.    If  another  friend  says  that  we  will  like  someone  we  have  yet  to  meet,  we  

want  to  have  some  idea  of  who  the  friend  likes  and  does  not  like,  in  order  that  we  

can  decide  what  to  make  of  the  endorsement  on  this  occasion.3    And  when  an  

applicant for admission to graduate school reports having an undergraduate GPA of

3.6,  the  graduate  school  wants  to  know  where  the  3.6  stands  in  relation  to  other  

students  (hence  the  value  of  class  rankings),  just  as  an  employer  of  a  student  wants  

to  know  much  the  same  thing,  and  just  as  the  recipient  of  a  recommendation  wants  

to  know  whether  the  recommender  tends  to  like  everyone,  or  no  one,  or  something  

in  between.      

3 Yes, we are thinking of blind dates, but the situation arises in other contexts as well.


  In  ordinary  talk,  this  process  of  attempting  to  gauge  another’s  decisions  is  

often  referred  to  as  calibration,  although,  as  we  will  explain,  this  ordinary  sense  of  

calibration  both  diverges  from  an  important  technical  sense  and  may  also  be  the  

covering  term  that  encompasses  a  number  of  diverse  processes,  which  it  will  be  

important  to  distinguish.    But  for  the  moment  we  can  describe  “calibration”  as  the  

process  by  which  a  user  of  some  measuring  device  (or  an  assessor  of  some  

measurer)  attempts  to  establish  the  relationship  between  the  indicated  measure  

and  some  relevant  standard.4    Thus,  we  calibrate  a  (weight)  scale  by  attempting  to  

establish  the  relationship  between  what  the  scale  reads  and  what  something  

actually  weighs.    If  the  scale  systematically  reads  five  pounds  low,  and  if  (perhaps  

counterfactually)  we  wish  to  know  what  we  actually  weigh,  we  calibrate  the  scale  

(or  calibrate  ourselves  to  the  scale)  by  adding  five  pounds  to  the  scale’s  reading  or  

by  adjusting  the  scale  to  take  account  of  the  error.    So  too  when  we  take  a  

measurement  knowing  that  some  ruler  gives  a  measure  slightly  longer  than  actual  

length,5  or  when  a  rifleman  aims  slightly  lower  than  where  the  gunsight  tells  him  to  

aim,  knowing  from  previous  experience  that  the  gunsight  leads  him  consistently  to  

miss  high.    

4 The Oxford English Dictionary, for example, tells us that to calibrate is to “make allowance for [the] irregularities” of a measuring instrument. 1 Compact Edition of the Oxford English Dictionary 318 (1971).

5 We understand, of course, that neither one foot nor one inch nor one pound has some sort of metaphysical reality. Feet, inches, and pounds, like meters and kilograms, are human-created standards. But although the standard meter, for example, is a creation of human beings, it is nevertheless the case that a ruler is accurate insofar as what it reads approaches the standard measure.


At  times  the  desire  for  this  kind  of  calibration  is  systematized.    Colleges  and  

universities  often  provide  information  so  that  employers  and  graduate  schools  can  

calibrate  whether  the  3.6  is  extraordinarily  good  or  barely  above  average.6    And,  

increasingly, the user of on-line restaurant and hotel reviews at sites such as

TripAdvisor  can  examine  the  review  history  of  particular  reviewers  in  order  to  

determine  whether  a  reviewer’s  rave  review  should  be  discounted  because  this  

particular  reviewer  says  nice  things  about  everyplace,  or  whether  some  reviewer’s  

brutally  negative  assessment  should  similarly  be  discounted  (or,  to  put  it  differently,  

inflated)  because  that  reviewer  has  something  bad  to  say  about  every  establishment  

he  patronizes.7      
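The reviewer example can be put in rough numerical terms. What follows is only an illustrative sketch, not a description of how TripAdvisor or any other site actually works: the function name `relative_rating`, the two review histories, and the choice of a standard-deviation scale are all our assumptions. The idea is simply to restate a rating in terms of how far it departs from the reviewer's own habitual ratings.

```python
from statistics import mean, pstdev

def relative_rating(rating, history):
    """Express a rating in units of the reviewer's own habits.

    Returns how many standard deviations `rating` lies above (positive)
    or below (negative) the reviewer's average past rating.
    """
    spread = pstdev(history)
    if spread == 0:
        raise ValueError("history shows no variation; cannot scale the rating")
    return (rating - mean(history)) / spread

# Hypothetical review histories on a 1-5 scale (invented for illustration).
enthusiast = [5, 5, 4, 5, 5, 5, 4, 5]   # says nice things about every place
curmudgeon = [1, 2, 1, 1, 2, 1, 3, 1]   # finds fault with every place

rave_from_enthusiast = relative_rating(5, enthusiast)
four_from_curmudgeon = relative_rating(4, curmudgeon)

# Once histories are considered, the curmudgeon's 4 is the stronger endorsement.
print(rave_from_enthusiast < four_from_curmudgeon)
```

On these invented histories, a four-star review from the habitual critic departs much further from that reviewer's norm than a five-star review does from the enthusiast's, which is the sense in which the rave "should be discounted."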

  Yet  although  such  calibration  is  ubiquitous  in  ordinary  life,  and  is  at  first  

glance  seemingly  desirable,  the  legal  system  appears  implicitly  to  resist  it.    A  court  

evaluating  an  administrative  law  judge’s  denial  of  Veterans  or  Medicare  or  Social  

Security  benefits  would  presumably  wish  to  know  whether  that  administrative  law  

judge  is  a  frequent  denier  or  instead  whether  this  denial  should  be  entitled  to  great  

deference because that judge denies benefits so rarely. Much the same desire for a

6 For example, prior to the emphasis on absolute grade point averages prompted by the formula used by the US News and World Report law school ranking system, some law schools would take into account in their admissions algorithm the previous grade point averages and law school performances of applicants from particular undergraduate colleges and universities. This knowledge would enable the law school, implicitly in its algorithm, to calibrate (or adjust) the undergraduate grade point average of a particular applicant according to the prior performances of prior applicants from that same school.

7 We have in mind TripAdvisor and Yelp, although there are other restaurant and hotel rating sites that do much the same thing. Indeed, the TripAdvisor model, in which a reviewer’s review history is easily accessible to the user, is one to which we will refer repeatedly in this Article.


decisional  history  would  seem  to  apply  as  well  to  the  decision  by  that  administrative  

law  judge  in  evaluating  the  grant  or  denial  by  the  administrative  official  who  made  

the  initial  decision.      Similarly,  the  judges  of  an  appellate  court  reviewing  a  trial  

judge’s  legal  or  factual  ruling  against  a  labor  union  might  want  to  know  whether  

they  are  assessing  the  decision  of  a  judge  who  typically  rules  for  (or  against)  unions,  

so  as  better  to  calibrate  their  assessment  of  the  trial  judge’s  ruling  on  this  occasion.    

In  the  same  way,  the  Supreme  Court’s  review  of  a  lower  court’s  decision  to  uphold  a  

death  sentence  might  usefully  be  informed  by  knowing  whether  the  court  under  

review  is  generally  sympathetic  or  hostile  to  capital  sentences.    And  a  jury  

attempting  to  determine  whether  a  witness  who  describes  a  person  as  drunk  is  

accurate  or  exaggerating  would  ideally  benefit  from  information  about  the  full  range  

of  instances  in  which  that  witness  has  described  someone  as  drunk,  or  not.  

  However  sensible  it  might  seem  for  judges,  jurors,  and  appellate  courts  to  

wish  to  calibrate  in  just  this  way,  and  however  much  the  TripAdvisor  model8  seems  

to  embody  a  larger  rationality,  the  legal  system  typically,  albeit  implicitly,  avoids  

this  form  of  calibration.    Whatever  the  judges  of  a  United  States  Court  of  Appeals  

might  actually  be  thinking  in  evaluating  a  decision  by  a  sentencing  District  Judge  to  

depart  upwards  from  the  sentencing  guidelines,  or  whatever  those  appellate  judges  

might  actually  know  from  informal  research,  personal  contact,  or  hallway  gossip,  it  

would  be  inappropriate  for  the  appellate  court  to  say  explicitly  in  an  opinion  that  it  

was  rejecting  the  upward  departure  because  this  judge  is  known  to  be  an  especially  

8 See note 7 and accompanying text, supra.


tough  sentencer,  or  has  a  history  of  upward  departures.9    Indeed,  it  would  even  be  

considered  inappropriate  for  an  appellate  court  openly  to  seek  such  information,  

however  much  the  appellate  judges  might  be  aware  of  the  information  from  

previous cases or word of mouth. And cross-examination of a witness about other

similar assessments she has made – of drunkenness, say – would likely be excluded

by rules limiting cross-examination to matters addressed on direct examination.10

  Our  goal  in  this  Article  is  to  examine  the  ways  in  which  just  this  kind  of  

calibration  might  be  useful  in  various  legal  settings,  to  trace  why  the  legal  system  

seems implicitly to be hostile to what so many other aspects of life and decision-making

have embraced, and to suggest ways in which calibration might valuably

become more accepted than it is now in various legal contexts. Our aim is less,

9 On departures from the Sentencing Guidelines generally, see Michael S. Gelacak, Ilene H. Nagel, & Barry L. Johnson, Departures Under the Federal Sentencing Guidelines: An Empirical and Jurisprudential Analysis, 81 Minn. L. Rev. 299 (1996). Upward departures are now constrained by, inter alia, United States v. Booker, 543 U.S. 220 (2005), but for present purposes we need not get into the details about departures or review of them, other than to note that the standard of review is a deferential one. See Gall v. United States, 552 U.S. 38, 45-52 (2007) (mandating an abuse-of-discretion standard in reviewing departures from the sentencing guidelines).

10 Fed. R. Evid. 611(b). This is a topic about which one of us has previously done experimental research. See Barbara A. Spellman & Elizabeth R. Tenney, Credible Testimony In and Out of Court, 17 Psychonomic Bull. & Rev. 168 (2010); Barbara A. Spellman, Elizabeth R. Tenney, & Margaret J. Scalia, Relying on Other People’s Metamemory, in Successful Remembering and Successful Forgetting: A Festschrift in Honor of Robert A. Bjork 387 (Aaron S. Benjamin ed., 2011); Elizabeth R. Tenney, Barbara A. Spellman, & Robert J. MacCoun, The Benefits of Knowing What You Know (and What You Don’t): How Calibration Affects Credibility, 44 J. Experiment. Soc. Psych. 1368 (2008); Elizabeth R. Tenney, Barbara A. Spellman, Robert J. MacCoun, & Reid Hastie, Calibration Trumps Confidence as a Basis for Witness Credibility, 18 Psych. Sci. 46 (2007); Elizabeth R. Tenney, Barbara A. Spellman, & Robert MacCoun, Expanding the Scope of Cross Examination So that Jurors Can Infer Witness Calibration, available at papers.ssrn.com/sol3/papers.cfm?abstract_id=998593.htm.


however,  to  make  recommendations  for  reform  than  it  is  to  open  up  for  analysis  and  

discussion a topic that seems for too long to have been unfortunately ignored in the

design  of  legal  institutions.                            

I. Three  Concepts  of  Calibration  

As  mentioned  above,  we  use  “calibration”  as  a  covering  term  to  describe  three  

quite  different  processes,  or  three  different  contexts  in  which  an  assessor  might  be  

faced  with  assessing  someone  else’s  assessment.    For  purposes  of  clarity,  and  also  to  

sharpen  the  focus  on  the  one  of  the  three  that  is  our  principal  concern  here,  we  will  

describe  each  of  the  three  processes  that  might  all  be  considered  as  calibration  in  

one  sense  or  another.      

  Before  turning  to  the  three  processes,  we  again  start,  broadly  speaking,  with  

the idea that calibration is the process of setting or assessing the relationship

between some measuring device (or measurer) and some relevant standard, in order

to conform the measurement to that standard. As in the examples of scales, rulers, and

gunsights,  we  treat  the  measuring  device  as  accurate  insofar  as  what  the  measuring  

device  reports  accords  with  the  relevant  standard,  and  we  treat  the  measuring  

device  as  well  calibrated  insofar  as  its  “judgments”  over  a  range  of  measurements  

demonstrate  a  consistent  and  therefore  predictable  difference  between  what  the  

device  indicates  and  what  the  underlying  “truth”  actually  is.    So  a  scale  that  reads  

150  pounds  when  the  actual  weight  is  145  is  inaccurate  by  five  pounds,  but  could  be  

considered  well  calibrated  to  the  extent  that  the  scale  is  always  five  pounds  high.    

Even  though  the  scale  is  inaccurate,  its  inaccuracy  would  be  reliable,  and  the  

calibration  is  effective  insofar  as  it  uses  reliability  to  compensate  for  inaccuracy.    


And  if  the  scale  were  inaccurate  but  not  in  any  reliable  way,  attempts  at  calibration  

would  be  less  effective,  making  the  scale,  or  our  use  of  it,  less  well  calibrated.        Thus,  

we    calibrate  our  use  of  a  reliably  inaccurate  scale  by  subtracting  five  pounds,  and  

insofar  as  this  calibration  produces  consistent  results  our  assessments  of  weight  

become  well  calibrated.    
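The distinction between accuracy and reliability in the scale example can be illustrated with a short sketch. The check weights, the two sets of readings, and the function names `fit_offset` and `correct` are hypothetical and chosen only for illustration. Both scales read five pounds high on average, but only the first does so consistently, and so only the first can be fully corrected by subtracting five pounds.

```python
from statistics import mean, pstdev

def fit_offset(readings, true_values):
    """Return (bias, spread) of a measuring device.

    bias   : average of (reading - true value); positive means the
             device reads high.
    spread : standard deviation of those errors; 0 means the error
             is perfectly consistent, so correction is exact.
    """
    errors = [r - t for r, t in zip(readings, true_values)]
    return mean(errors), pstdev(errors)

def correct(reading, bias):
    """Calibrate a single reading by removing the estimated bias."""
    return reading - bias

# Hypothetical check weights (pounds) and what two scales reported.
truth = [100, 145, 180, 210]
reliable_scale = [105, 150, 185, 215]   # always five pounds high
erratic_scale = [112, 142, 178, 223]    # errors of +12, -3, -2, +13

bias_r, spread_r = fit_offset(reliable_scale, truth)
bias_e, spread_e = fit_offset(erratic_scale, truth)

print(correct(150, bias_r))   # the reliable scale's 150 becomes 145
print(correct(142, bias_e))   # the erratic scale's 142 becomes 137, not 145
```

The spread figure is one simple way, among others, of expressing how far a device's inaccuracy is "reliable" in the sense used in the text.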

When  we  move  from  the  judgments  of  mechanical  devices  like  scales,  rulers,  and  

gunsights  to  the  judgments  of  human  beings,  however,  things  become  more  

complex.    Indeed,  there  are  multiple  ways  of  assessing  the  calibration  of  human  

judgment,  and  it  is  important  at  the  outset  to  get  clear  just  what  it  is  that  we  are  

talking  about.    Specifically,  disaggregating  the  multiple  ideas  encompassed  by  the  

covering  term  “calibration”  is  especially  important  because  much  of  the  

psychological  literature  on  calibration  turns  out  to  address  an  issue  related  to  but  

importantly  different  from  the  type  of  calibration  that  is  our  primary  focus  here.    

And  that  is  our  principal  justification  for  commencing  the  analysis  by  distinguishing  

three  concepts  of  calibration.  

A. Confidence  and  Calibration    

1. Calibration  in  Psychology  Research.  

The  first  concept  of  calibration,  and  the  one  that  dominates  the  psychological  

research, focuses on the relationship between the degree of confidence a

decision-maker has expressed in some judgment and the actual accuracy of that judgment.11

11 See, e.g., Linda Bol & Douglas J. Hacker, Calibration Research: Where Do We Go from Here?, 3 Frontiers Psych. 229 (2012) (“Calibration is the degree of fit between a person’s judgment of performance and his or her actual performance.”). See also Douglas J. Hacker, Linda Bol, & Matt C. Keener, Metacognition in Education: A Focus on Calibration, in Handbook of Metamemory and Memory 429 (John Dunlosky &


Let us label this confidence-accuracy calibration. Accuracy is of course itself a

relation between a judgment and reality (or “ground truth”), and thus when we talk

about confidence-accuracy calibration we are talking about two types of accuracy,

each  the  subject  of  considerable  research  by  psychologists.    One  is  absolute  

accuracy,  which  typically  is  described  as  “calibration”  in  a  narrow  sense,  and  the  

other  is  relative  accuracy,  also  known  as  “resolution”  or  “discrimination.”      

Decision  makers  may  be  more  or  less  confident  in  their  judgments,  and  those  

judgments  may  be  more  or  less  accurate.  Insofar  as  the  decision  maker’s  degree  of  

confidence  aligns  with  the  likelihood  that  the  judgment  is  correct,  the  decision  

maker is considered to be well-calibrated (in the absolute accuracy sense). And the

decision maker is, accordingly, understood as less well-calibrated to the extent that

the degree of confidence over- or under-predicts the likely accuracy of the

judgment.      

Suppose,  for  example,  that  someone  –  the  decision  maker,  or  exerciser  of  

judgment  –  judges  the  speed  of  a  passing  car  to  be  55  miles  per  hour,  plus  or  minus  

five  miles  per  hour.    And  suppose  that  the  decision  maker  has  a  degree  of  confidence  

such  that  she  is  80%  sure  that  her  judgment  is  correct.    And  if  it  turns  out,  over  

some  number  of  trials,  that  she  is  in  fact  correct  80%  of  the  time  in  assessing  speeds  

within this range, then we could conclude that she is well-calibrated – her

confidence level is a reliable predictor of the likelihood of her accuracy. But if she

Robert A. Bjork, eds., 2008); Kevin Krug, The Relationship between Confidence and Accuracy: Current Thoughts of the Literature and a New Area of Research, 3 Applied Psych. in Crim. Just. 7 (2007); Karlos Luna & Beatriz Martin-Luengo, Confidence-Accuracy Calibration with General Knowledge and Eyewitness Memory Cued Recall Questions, 26 Applied Cognitive Psych. 289 (2012).


were  accurate  only  40%  of  the  time  that  she  expressed  80%  confidence,  and  30%  of  

the  time  she  expressed  50%  confidence,  and  50%  of  the  time  she  expressed  30%  

confidence,  then  she  would  be  poorly  calibrated  because  although  it  looks  like  she  is  

overconfident  on  average  (i.e.,  her  confidence  overstates  her  accuracy),  there  is  still  

no  way  to  use  her  estimates  to  predict  much  of  use.      
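The two observers just described can be compared by tabulating, for each stated level of confidence, how often the observer turned out to be right. The sketch below assumes ten hypothetical trials at each confidence level, with the hit counts chosen to reproduce the percentages in the text. The function names and the trial counts are our own illustrative assumptions.

```python
from collections import defaultdict

def calibration_table(trials):
    """Map each stated confidence level to the observed hit rate.

    `trials` is an iterable of (confidence, was_correct) pairs.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for confidence, was_correct in trials:
        counts[confidence] += 1
        hits[confidence] += int(was_correct)
    return {c: hits[c] / counts[c] for c in sorted(counts)}

def make_trials(confidence, n_correct, n_total=10):
    """Build hypothetical trials: n_correct hits out of n_total."""
    return [(confidence, i < n_correct) for i in range(n_total)]

well = make_trials(0.8, 8)
poor = make_trials(0.8, 4) + make_trials(0.5, 3) + make_trials(0.3, 5)

well_table = calibration_table(well)   # {0.8: 0.8}
poor_table = calibration_table(poor)   # {0.3: 0.5, 0.5: 0.3, 0.8: 0.4}

# Average overconfidence: mean stated confidence minus overall accuracy.
mean_conf = sum(c for c, _ in poor) / len(poor)
accuracy = sum(ok for _, ok in poor) / len(poor)
overconfidence = mean_conf - accuracy

print(well_table, poor_table, round(overconfidence, 3))
```

For the second observer the table shows no usable pattern: she is right most often when she is least confident, so knowing her average overconfidence does not tell us how to adjust any particular judgment.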

On  the  other  hand,  consider  someone  who  is  generally  overconfident  but  is  so  in  

a  systematic  way;  that  is,  who  does  not  show  absolute  accuracy  but  does  show  good  

relative accuracy. Take the example from Scott Plous: “[S]uppose a decision maker

were  50  percent  accurate  when  70  percent  confident,  60  percent  accurate  when  80  

percent  confident,  and  70  percent  accurate  when  90  percent  confident.    In  such  a  

case  confidence  would  be  perfectly  correlated  with  accuracy,  even  though  the  

decision  maker  would  be  uniformly  overconfident  by  20  percent.”12    Even  though  

the  decision  maker  was  inaccurate  in  her  assessments,  and  even  though  the  decision  

maker  was  inaccurate  in  her  degree  of  confidence  in  her  assessments,  the  

uniformity  of  the  overconfidence,  her  reliability,  would  make  it  easy  to  calibrate  our  

use  of  that  decision  maker’s  conclusions.    
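The figures in the Plous example can be checked directly. The sketch below computes an ordinary Pearson correlation between stated confidence and observed accuracy, together with the gap between them, and then repeats the calculation for the erratic observer described earlier (40% accurate at 80% confidence, 30% at 50%, 50% at 30%). The helper `pearson` and the variable names are ours, and correlation is only one of several measures of relative accuracy that might be used.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Plous's decision maker (percentages).
confidence = [70, 80, 90]
accuracy = [50, 60, 70]

r = pearson(confidence, accuracy)                      # perfectly correlated
gaps = [c - a for c, a in zip(confidence, accuracy)]   # uniform overconfidence
recalibrated = [c - mean(gaps) for c in confidence]    # subtract the uniform gap

# The erratic observer from the earlier example, for contrast.
r_erratic = pearson([80, 50, 30], [40, 30, 50])

print(r, gaps, recalibrated, r_erratic < 0)
```

Subtracting the uniform twenty-point gap recovers the observed accuracy at every confidence level, which is what it means to say that her overconfidence is easy to calibrate. For the erratic observer the correlation is negative, and no single adjustment of this kind is available.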

2.   How  Legal  Judgments  are  Different  

Although these analyses of confidence-accuracy calibration dominate the

psychological  literature,  they  have  obvious  shortcomings  when  applied  to  legal  

contexts.    And  that  is  because,  first,  legal  decision  makers  at  all  levels  and  in  all  

contexts  rarely  articulate  the  degree  of  confidence  they  have  in  their  conclusions;  

12 Scott Plous, The Psychology of Judgment and Decision Making 225 (1993).


and,  second,  unlike  in  psychology  research,  legal  decision  makers  typically  do  not  

have access to the “ground truth.”

 Although witnesses at trial, and especially in response to cross-examination,

may  express  varying  degrees  of  certainty  about  the  facts  and  observations  that  they  

are  reporting,  such  expressions  of  less  than  complete  confidence,  or  indeed  even  

actual  expressions  of  complete  or  even  partial  confidence,  are  largely  absent  (or  at  

least  invisible)  in  the  context  of  judicial  judgments.13    Justice  Brandeis  thus  captured  

the  phenomenon  well  when  he  observed  that  he  ordinarily  convinced  himself  to  a  

lower degree of certainty (fifty-one percent, he said) than that with which he

expressed his judgment in writing an opinion.14 And under one understanding (or,

perhaps more accurately, our understanding) of Ronald Dworkin’s well-known “one

right  answer  thesis,”15  Dworkin  agrees  with  Brandeis.    It  is  not  as  if  judges  actually  

believe  that  there  is  no  plausible  alternative  answer,  Dworkin  is  best  understood  as  

claiming,  but  that  it  is  a  feature  of  the  phenomenology  of  judging  that  judges  believe  

that their answer is correct, and believe that any other answer is incorrect,

independent  of  the  actual  strengths  of  those  beliefs.16  

13 And so too, typically, with administrative decisions and judgments.

14 Brandeis made the statement in the context of comparing himself to Justice Cardozo, who, Brandeis believed, found it necessary to convince himself one hundred percent before reaching a judgment or writing an opinion. See Joseph L. Rauh, et al., A Personal View of Justice Benjamin N. Cardozo: Recollections of Four Cardozo Law Clerks, 1 Cardozo L. Rev. 5, 12, 18 (1979).

15 Ronald Dworkin, Justice in Robes 41-43 & 266 nn. 3-5 (2006); Ronald Dworkin, Is There Really No Right Answer in Hard Cases?, in A Matter of Principle 119 (1985).

16 See Dworkin, Justice in Robes, supra note 15, at 266 nn. 3, 5.


Because  judges  (as  well  as  police  officers  seeking  search  warrants  and  

administrative  officials  making  administrative  decisions)  are  thus  typically  loath  to  

describe  in  their  opinions  the  degree  of  confidence  they  have  in  their  judgments,17  

and because judges seem especially reluctant to admit to relatively low levels of

confidence,  the  principal  psychological  concept  of  calibration  described  above  may  

be  of  limited  value  in  most  legal  contexts.    It  would  be  nice  to  know,  in  theory,  

whether  a  given  legal  judgment  was  correct  or  incorrect  and  how  much  confidence  a  

judge  had  that  her  conclusion  was  correct,  thus  enabling  an  observer  to  determine  

the  degree  to  which  the  judge’s  confidence  was  calibrated  with  the  likelihood  that  

the  judge  reached  the  correct  conclusion.    But  with  ground  truths  rarely  accessible  

(or  existent)  for  such  decisions,  and  with  degrees  of  confidence  even  more  rarely  

expressed,  it  turns  out  that  this  precise  sense  of  calibration  is  of  limited  value  in  

thinking  about  the  nature  of  legal  judgment.  

B.     Leaving  Confidence  Behind  –  Calibration  for  Accuracy      

Because  explicit  expressions  of  degrees  of  confidence  are  so  rare  in  legal  and  

judicial contexts, a more relevant conception of calibration in the context of the

17 We can think of two possible but indirect exceptions to the statement in the text. One might arise in the context of civil actions against public officials under 42 U.S.C. § 1983 (or Bivens v. Six Unknown Named Agents, 403 U.S. 388 (1971)), where only violations of “clearly established law,” see Wilson v. Layne, 526 U.S. 603 (1999); Anderson v. Creighton, 483 U.S. 635 (1987); Harlow v. Fitzgerald, 457 U.S. 800 (1982), can produce liability in the face of a qualified immunity claim. And the other would arise in the contexts (legal malpractice actions being the most obvious) in which questions of what the law is are treated as questions of fact. Thus a judge in a civil rights action might rule (on a motion for summary judgment, for example) that the state of the law was sufficiently uncertain that there could be no violation of clearly established law, and we can imagine an expert witness in a legal malpractice action testifying that the state of the law was, for example, probably such-and-such, but not certainly such-and-such.

analysis  here  is  one  that  is  concerned  not  with  the  alignment  between  confidence  

and  accuracy,  but  rather  with  the  seemingly  simpler  question  of  the  alignment  

between  an  expressed  judgment  and  the  ground  truth  –  the  actual  fact  of  the  matter.  

As  noted  previously,  the  relation  between  judgment  and  truth  is  what  is  ordinarily  

understood  as  accuracy,18  and  so    we  can  think  of  the  effort  to  align  our  judgments  

with  the  ground  truth  as  accuracy  calibration.  

Accuracy  calibration  is  closer  to  the  examples  of  the  scale  that  reads  five  pounds  

high  or  the  ruler  whose  indication  is  an  eighth  of  an  inch  short.      So  if  we  substitute  a  

human  observer  –  a  witness  –  for  a  mechanical  scale,  we  could  imagine  a  human  

being  who,  like  the  “Guess  Your  Weight”  booths  at  carnivals,  estimated  the  weight  of  

the  people  she  observed.    And  if  the  estimate  were  consistently  five  pounds  over  the  

actual  weight,19  we  could  calibrate  her  judgments  by  subtracting  five  pounds  from  

each  of  her  estimates.    That  calibration  would  then  bring  an  increase  in  the  accuracy  

of the post-calibration determination.
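The weight-guesser example can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the function names `estimate_bias` and `calibrate` and the sample figures are hypothetical, and the sketch assumes that the observer's error is a constant offset that can be learned from past estimates whose true values are known.

```python
from statistics import mean


def estimate_bias(past_estimates, past_actuals):
    """Average signed error (estimate minus actual) over the observer's history."""
    if len(past_estimates) != len(past_actuals) or not past_estimates:
        raise ValueError("need equal-length, non-empty histories")
    return mean(e - a for e, a in zip(past_estimates, past_actuals))


def calibrate(new_estimate, bias):
    """Correct a new estimate by removing the observer's known bias."""
    return new_estimate - bias


# Hypothetical history: every past estimate ran five pounds over the actual weight.
past_estimates = [155, 185, 205, 145]
past_actuals = [150, 180, 200, 140]

bias = estimate_bias(past_estimates, past_actuals)
calibrated = calibrate(200, bias)  # the 200-pound report, adjusted for the bias
```

The sketch works only because the actual weights are available for the earlier estimates. Without such a ground truth there is nothing against which to compute the bias.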

So  now  suppose  that  we  are  dealing  with  a  witness  who  is  testifying  at  a  trial,  or  

a  bystander  who  has  witnessed  a  crime  and  is  reporting  what  she  saw  to  the  police.    

The  witness  or  bystander  reports  that  the  person  she  saw  running  out  of  the  bank  

waving  a  gun  and  wearing  a  ski  mask  appeared  to  weigh  about  200  pounds.    If  this  

were  a  report  to  the  police,  the  police  officer  might  (in  theory,  even  if  rarely  in  

practice)  ask  the  witness  if  her  estimates  of  weight  were  usually  high,  or  usually  

18 Or, more precisely, as absolute accuracy, in the sense described above, and as distinguished from relative accuracy.

19 And on “actual” weight, see also note 5, supra.

low,  or  usually  close  to  accurate.    And  if  the  estimate  of  200  pounds  were  part  of  a  

witness’s testimony at a trial, on cross-examination the witness might, in

theory, have been asked about the accuracy of her other estimates of weight on

other  occasions.    Alternatively,  opposing  counsel  might  have  offered  evidence  about  

previous  weight  estimates  by  this  witness  that  had  proved  to  be  inaccurate.    Under  

either  scenario,  the  idea  would  be  to  calibrate  the  accuracy  of  the  witness  on  this  

occasion  by  looking  at  the  degree  of  her  accuracy  on  other  occasions.    A  history  of  

inaccuracy  would  lead  the  rational  evaluator  of  the  testimony  to  discount  it,  just  as  a  

history  of  consistent  overestimates  would  lead  the  rational  evaluator  to  subtract  

from  the  estimate  provided  by  the  witness.        

If such matters had been raised on cross-examination, it is likely that the inquiry

would  have  been  excluded,  possibly  under  something  like  Federal  Rule  of  Evidence  

611,  which  as  applied  typically  excludes  matters  relating  to  events  other  than  the  

ones  being  litigated  in  the  case  at  hand.20    However  useful  it  might  be  to  the  trier  of  

fact to be able to calibrate the witness’s testimony in the way just described, the

legal  system  appears  resistant  to  allowing  a  trier  of  fact  to  calibrate  a  factual  report  

20 The statement in the text is an overstatement, partly because Rule 611 does allow inquiries into credibility, partly because of variation among jurisdictions with respect to whether they have wide or narrow scope limitations on cross-examination, see Christopher B. Mueller & Laird C. Kirkpatrick, Evidence §6.63 at 603-06 (5th ed. 2012), partly because so much is left to the trial judge’s discretion, and partly because trial judges vary in terms of how widely they understand Rule 611’s allowance in all cases of cross-examination on “matters affecting the witness’s credibility.” As a result, all we claim here is that the issues we offer in this Article might lead to a broader scope of cross-examination and rebuttal evidence than now generally exists in both the federal and state systems.

by  examining  the  accuracy  of  other  and  even  similar  reports  made  by  the  same  

observer.21  

Although  we  will  return  to  the  factual  witness  example  presently,  this  is  not  the  

place  to  pursue  it,  in  large  part  because  it  is  not  clear  that  even  this  type  of  

calibration  is  especially  relevant  to  the  kinds  of  legal,  as  opposed  to  factual,  

judgments  that  are  often  made  by  the  courts  and  other  legal  actors  whose  

judgments  are  being  reviewed.    Unlike  estimates  of  weight  and  other  factual  reports,  

locating  the  ground  truth  of  a  legal  judgment  is  more  elusive.    Such  a  task  is  not  

impossible,  of  course.    For  example,  a  reviewing  court  might  wish  to  know  how  

often  a  trial  judge  had  made  obvious  errors  of  law  occasioning  reversal.22    Especially  

under  circumstances  in  which  a  reviewing  appellate  court  would  perceive  itself  to  

be  highly  knowledgeable  about  the  area  of  law  at  issue,  that  court  might  engage  in  

rigorous  scrutiny  of  the  legal  judgments  of  a  trial  judge  known  to  be  frequently  

reversed for making obvious mistakes of law, while at the same time being highly

21 The testimony of expert witnesses represents an obvious exception, because here it is in fact common for cross-examination to focus on the other assessments made by that expert.

22 Implicit in this statement is the belief that reversal is some (admittedly imperfect) guide to the conformity of a judgment below with some notion of legal accuracy or legal correctness. One example is Chief Justice Marshall’s observation in Marbury v. Madison, 1 Cranch (5 U.S.) 137 (1803), that a law allowing a conviction for treason on the testimony of only one witness would be plainly unconstitutional in light of the two-witness rule in Article III, Section 3, of the Constitution. We believe that there are, in like fashion, other examples of legal decisions that are simply wrong independently of some court declaring them so, see H.L.A. Hart, The Concept of Law 124-47 (Penelope A. Bulloch, Joseph Raz, & Leslie Green, eds., 3d ed., 2012), but we also recognize that, especially at the appellate level, and especially in light of the selection effect (see George L. Priest & Benjamin Klein, The Selection of Disputes for Litigation, 13 J. Legal Stud. 1 (1984)), such examples of plain legal error or inaccuracy are rare.

deferential,  under  conditions  of  legal  uncertainty,  to  the  judgments  of  a  trial  judge  

whose  decisions  on  matters  of  law  were  routinely  upheld  on  appeal.    The  reviewing  

court would use the reversal rate as a way of calibrating its judgment of the

accuracy,  under  conditions  of  legal  uncertainty,  of  the  trial  judge’s  legal  conclusions.      

Although this kind of calibration is thus possible in theory, in practice, and

especially given the operation of a selection effect making cases involving clear

right or wrong answers disproportionately unlikely to be litigated,23 it is rare

that we are able to characterize decisions on matters of law, or even of

mixed  questions  of  law  and  fact,  as  simply  right  or  wrong.    Rather,  such  decisions  

are  more  likely  to  involve  questions  in  which  there  is  no  ground  truth  or  in  which  

we  do  not  know  what  the  ground  truth  is.    In  such  cases,  a  reviewing  body  may  be  

less  concerned  with  the  degree  of  accuracy  of  some  reviewed  decision  as  a  matter  of  

ground  truth  than  it  is  with  just  how  to  evaluate  the  evaluative  judgment  that  is  

being  reviewed.    And  it  is  this  concern  not  with  fact  but  with  evaluating  an  

evaluation  that  leads  us  to  our  third  conception  of  calibration.  

C.   The  Calibration  of  Evaluative  Judgments  

  Although  reviewing  courts  and  other  reviewing  institutions  are  sometimes  

required  to  review  factual  determinations  and  legal  determinations  that  have  

relatively  clear  right  or  wrong  answers,  the  assessment  of  the  judgments  of  others  

even  more  often  arises  in  contexts  in  which  the  judgments  being  assessed  are  far  

more evaluative than factual or otherwise straightforward. The question then is

23 See especially Priest & Klein, supra note 22. A good overview of the central issues is Leandra Lederman, Which Cases Go to Trial?: An Empirical Study of Predictions of Failure to Settle, 49 Case West. Res. L. Rev. 315 (1999).

how  do  courts  assess  an  assessment,  and  how  do  they  evaluate  an  evaluation?    

When  an  appellate  court  is  evaluating  a  lower  court’s  determination  that  a  

defendant  had  (or  did  not  have)  the  effective  assistance  of  counsel,  that  a  

warrantless  search  was  or  was  not  reasonable,  that  a  state  interest  was  or  was  not  

substantial,  that  a  regulatory  mechanism  was  or  was  not  the  least  restrictive  (of  

some constitutionally-recognized interest), or that a defendant should or should not

prevail  on  summary  judgment  because  of  the  insufficiency  of  the  plaintiff’s  potential  

evidence,  the  appellate  court  is  faced  with  the  task  of  evaluating  what  is  itself  an  

evaluative  judgment.      And  in  such  cases,  we  might  hypothesize  that  the  evaluator  

might usefully wish to know just what scale the original decision-maker was

employing  in  making  the  decision  now  under  review.      

  In  review  contexts  such  as  these,  calibration  takes  on  a  different  meaning,  

and  we  can  label  it  evaluative  calibration.    Just  as  the  graduate  school  or  employer  

wants  to  know  what  a  3.6  from  some  university  means,  and  just  as  the  potential  

hotel  or  restaurant  patron  wants  to  know  what  two  stars  means,24  so  too  might  

evaluators  often  wish  to  be  able  to  calibrate  the  kinds  of  earlier  evaluations  that  

have  no  intrinsic  meaning,  or  at  least  have  a  broad  enough  range  of  meaning  that  

there  is  no  particular  conception  of  a  clearly  right  or  clearly  wrong  answer.    And  

although one might be (and should be) a metaphysical realist – a believer in a

mind-independent reality – about water and gravity and gold, and maybe even about the

                                                                                                               24  Even  apart  from  questions  of  evaluative  calibration,  we  would  also,  of  course,  want  to  know  what  the  scale  is.    For  restaurants,  for  example,  three  stars  is  the  best  you  can  do  in  the  Michelin  Guide,  four  stars  is  the  maximum  for  the  New  York  Times,  and  other  guides  go  up  to  a  maximum  of  five  stars.        

rightness  of  altruism  and  the  wrongness  of  child  abuse,  there  are  few  metaphysical  

realists  about  the  star  ratings  for  hotels  and  restaurants,  the  wine  ratings  on  the  100  

point scale commonly used by wine experts, and even the idea of an A- or a B+ on a

grade  scale.    And  so  when  we  want  to  know  whether  an  88  point  wine  or  a  two  star  

review  or  rating  or  a  3.6  grade  point  average  is  good  or  mediocre,  we  would,  ideally,  

like  to  know  about  the  other  ratings  of  the  rater.25      If  the  rater  has  given  a  

restaurant  three  stars  out  of  a  possible  four  when  we  thought  the  restaurant  was  

terrible,  and  if  the  rater  consistently  gives  high  ratings  to  restaurants  we  believed  

on  the  basis  of  our  own  experiences  to  be  mediocre  or  worse,  we  might  then  well  

ignore  or  discount  the  rater’s  rating  of  an  establishment  we  were  considering  

patronizing  for  the  first  time.    Similarly,  when  some  law  schools  algorithmically  

lowered  (or  raised)  the  grade  point  averages  of  students  coming  from  particular  

institutions,26  their  view  was  based  on  having  seen  how  students  with  those  grade  

averages  from  those  schools  actually  performed  in  law  schools.    If  the  students  

consistently  underperformed  their  undergraduate  grade  point  averages  compared  

to  students  from  other  undergraduate  institutions,  then  this  differential  would  be  

reflected  in  an  adjustment  of  the  admission  index,  and  the  adjustment  can  be  

considered  a  form  of  calibration.      
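The kind of index adjustment described above might be sketched as follows. The sketch is ours and is not a description of any law school's actual formula: the function names, the record layout, and the sample figures are all hypothetical. It assumes that the adjustment for an undergraduate institution is simply how far that institution's students, on average, over- or underperform in law school relative to all students.

```python
from collections import defaultdict
from statistics import mean


def institution_offsets(records):
    """Map each institution to its average (law GPA - undergraduate GPA) gap,
    measured relative to the average gap across all students.

    `records` is an iterable of (institution, undergrad_gpa, law_gpa) tuples.
    """
    gaps = defaultdict(list)
    for school, undergrad_gpa, law_gpa in records:
        gaps[school].append(law_gpa - undergrad_gpa)
    overall = mean(gap for school_gaps in gaps.values() for gap in school_gaps)
    return {school: mean(school_gaps) - overall for school, school_gaps in gaps.items()}


def adjusted_gpa(school, undergrad_gpa, offsets):
    """Raise or lower an applicant's GPA by the offset for her institution.

    Institutions with no history get no adjustment (an assumption of this sketch).
    """
    return undergrad_gpa + offsets.get(school, 0.0)


# Hypothetical history: students from College A underperform their grades,
# while students from College B match theirs.
history = [
    ("College A", 3.6, 3.0),
    ("College A", 3.8, 3.2),
    ("College B", 3.2, 3.2),
    ("College B", 3.4, 3.4),
]
offsets = institution_offsets(history)
calibrated = adjusted_gpa("College A", 3.6, offsets)  # roughly 3.3
```

On these made-up figures a 3.6 from College A is read as roughly a 3.3, and a 3.6 from College B as roughly a 3.9. The same number means different things once the grader's scale is taken into account.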

25 The Michelin Guide says that three-star restaurants are “worth a journey.” But if half the restaurants in the Guide were worth a journey in the Guide’s opinion, we might be more reluctant to actually make the journey than if such a rating were given, as is actually the case, to less than one percent of the establishments rated.

26 See note 6, supra.

  Thus,  when  we  are  speaking  of  evaluative  calibration,  we  are  not  primarily  

interested  in  accuracy.    Rather,  the  concern  is  with  just  how  we  should  understand  

what  someone  else’s  judgment  means  in  light  of  the  other  decisions  or  judgments  

that  that  decision  maker  has  reached,  and  thus  in  light  of  what  we  can  infer  that  

decision  maker’s  evaluation  scale  to  be.    And  if  we  then  engage  in  our  subsequent  

evaluation  of  that  earlier  evaluation  in  light  of  this  knowledge,  we  can  be  said  to  

have  engaged  in  a  process  of  calibration.  
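One simple way to picture evaluative calibration is to ask where a given rating falls among the rater's own past ratings. The following sketch is merely illustrative: the function name `percentile_rank` and the two rating histories are hypothetical, and ranking within the rater's history is only one of many ways in which a scale might be inferred.

```python
def percentile_rank(rating, past_ratings):
    """Share of the rater's past ratings that fall below `rating`,
    counting ties as half."""
    if not past_ratings:
        raise ValueError("need at least one past rating")
    below = sum(r < rating for r in past_ratings)
    ties = sum(r == rating for r in past_ratings)
    return (below + 0.5 * ties) / len(past_ratings)


# Two hypothetical reviewers rating restaurants on a four-star scale.
generous_reviewer = [3, 3, 4, 3, 4, 3, 2, 3, 4, 3]
strict_reviewer = [1, 0, 1, 2, 1, 0, 1, 3, 1, 0]

generous_rank = percentile_rank(3, generous_reviewer)
strict_rank = percentile_rank(3, strict_reviewer)
```

Three stars from the generous reviewer is an ordinary rating, sitting below the middle of that reviewer's own history, while three stars from the strict reviewer is near the top of hers. Nothing in the sketch says which restaurant is actually better. It shows only what each rating means on the scale of the reviewer who gave it.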

II. Some  Potential  Applications  

A.   The Norm of Non-Calibration

We  have  offered  some  potential  applications  of  the  possibility  of  calibration,  and  

it  is  now  time  to  explore  several  of  them  in  greater  depth.    We  do  so  in  order  to  

hypothesize the existence of what appears to be a norm of non-calibration, the

seeming  norm  of  judicial  behavior  that  prohibits  or  discourages  courts  from  

officially  examining  the  previous  judgments  of  the  body  under  review  in  order  to  

calibrate  those  judgments,  or  from  officially  acknowledging  that  such  calibration  has  

occurred  even  if  it  has  taken  place  surreptitiously.27  

                                                                                                               27  In  saying  that  calibration  sometimes  occurs  “surreptitiously,”  we  do  not  mean  to  imply  anything  pernicious,  but  rather  to  suggest  that  judges  often  know  of  the  past  behavior  of  the  legislatures,  courts,  and  agencies  whose  judgments  they  are  reviewing,  and  might  well  be  influenced  by  that  knowledge  even  as  they  believe  that  it  is  officially  the  kind  of  information  that  they  should  not  take  into  account.    Cf.  Andrew  J.  Wistrich,  Chris  Guthrie,  &  Jeffrey  J.  Rachlinski,  Can  Judges  Ignore  Inadmissible  Information?  The  Difficulty  of  Deliberately  Disregarding,  153  U.  Pa.  L.  Rev.  1251  (2005)  (reporting  a  study  in  which  judges  were  often  unable  to  ignore  information  they  actually  had  but  knew  was  legally  unusable).    

To  start  with  a  relatively  straightforward  example,  consider  a  Supreme  Court  

Justice,  especially  one  with  no  extreme28  views  one  way  or  another  about  the  death  

penalty,  who  is  faced  with  evaluating29  a  decision  by  a  state  supreme  court  to  

uphold  the  death  penalty  as  against  defense  claims  of,  for  example,  procedural  

defects,  ineffective  assistance  of  counsel,  or  cruel  and  unusual  administration.    For  

that  Justice,  we  can  ask  whether  it  would  make  a  difference  that  the  state  supreme  

court  whose  judgment  is  under  review  almost  always  affirms  death  penalty  

sentences,  or  instead  almost  always  vacates  them.30    If  the  state  supreme  court  had  a  

long  and  persistent  history  of  affirming  capital  sentences  against  such  objections,  

then  the  reviewing  Justice  might  suppose  that  this  case  presented  a  more  or  less  

typical  case,  and  would  evaluate  the  decision  below  according  to  the  standard  that  

she  generally  applied  to  such  matters.    But  if  the  court  being  reviewed  was  a  court  

that  often  or  almost  always  vacated  capital  sentences,  then  the  reviewing  Justice  

might  calibrate  this  decision  accordingly,  concluding  that  here  the  causes  for  

potential  reversal  might  be  especially  absent.    And  as  a  result  she  might  be  inclined  

to  be  more  deferential  in  her  review.  Conversely,  if  the  case  arose  on  a  government  

appeal  from  a  lower  appellate  reversal,  the  reviewing  Justice  might  conclude  that  a  

reversal of a sentence or conviction by a court that routinely upholds convictions or

28 We use this term not as a pejorative, but just as a way of describing both tails of a distribution of views.

29 Possibly in deciding on the merits, possibly in deciding whether to vote to grant certiorari, and possibly in deciding whether, when sitting as a single Justice, to grant a request for a stay.

30 Cf. Charles Fried, Impudence, 1992 Sup. Ct. Rev. 155 (recounting the persistent anti-death penalty actions of the Ninth Circuit in the early 1990s).

sentences  is  an  action  that  is  very  high  on  the  reviewed  court’s  scale  of  error.31    

Insofar  as  the  reviewing  Justice  has  information  of  this  variety  about  other  decisions  

by  the  court  being  reviewed,  she  can  be  understood  to  be  calibrating  the  current  

decision,  or,  more  precisely,  to  be  calibrating  her  attitude  to  the  current  decision  in  

light  of  what  she  knows  from  other  cases  to  be  the  relevant  scale.      

If  we  posit  that  actually  knowing  about  these  other  results  would  enable  the  

reviewing  Justice  more  accurately  to  calibrate  the  decision  under  review,  or  to  

calibrate  her  degree  of  deference  to  the  court  being  reviewed,  the  question  then  

arises  as  to  whether  she  would  or  should  in  fact  be  permitted  to  examine  those  

other  decisions.    If  she  did  calibrate  on  the  basis  of  other  decisions  not  now  before  

her,  would  this  be  a  practice  that  could  be  openly  acknowledged,  for  example  by  

making  reference  to  these  other  decisions  in  an  opinion?    Could  a  judge  actually  say  

that  she  is  applying  especially  close  scrutiny  on  review,  for  example,  because  of  the  

31 The phenomenon here is related but not identical to the occasional practice of the Supreme Court in signaling extreme easiness by the assignment of the opinion to the Justice least likely on the basis of past performance to be perceived as sympathetic to the claim now being upheld. Obviously unanimity itself will sometimes send such a signal, and Brown v. Board of Education, 347 U.S. 483 (1954), Cooper v. Aaron, 358 U.S. 1 (1958), and United States v. Nixon, 418 U.S. 683 (1974), are well-known examples. But there is also a signal sent when Justice Rehnquist writes for a unanimous Supreme Court in Jenkins v. Georgia, 418 U.S. 153 (1974), making clear the limits of the “local standards” idea in obscenity law, and so too with Justice White writing for a unanimous Court in Sable Communications v. FCC, 492 U.S. 115 (1989), again dealing with the limits of obscenity and communications indecency law, here in the context of a ban on sexually explicit telephone services. In such cases, the assignment of the opinion to the Justice known to be least receptive to the kinds of claims now being upheld is perhaps a way of telling the audience for the opinion that the case is especially easy, and this form of signal is dependent on an implicit calibration by the audience for the opinion.

previous  decisions  by  the  particular  court  or  the  particular  judge  whose  judgment  

she  is  now  reviewing?  

     The  death  penalty  example  is  unrepresentative  in  some  ways,  but  

representative  in  others.    It  is  definitional  of  the  appellate  process  that  judges  are  

reviewing  the  decisions  of  other  judges,32  and  whether  it  be  a  death  penalty  appeal,  

a  grant  or  denial  of  a  motion  for  summary  judgment,  a  decision  to  support  or  reject  

a  constitutional  challenge  to  some  state  law  or  practice,  or  any  of  a  large  number  of  

other  contexts  in  which  there  is  an  appellate  review  of  an  evaluative  decision  below,  

the  basic  dynamic  remains  one  of  a  legal  evaluation  of  an  earlier  evaluative  legal  

judgment.    And  especially  when  the  governing  law  requires  that  the  evaluation  be  

something  other  than  de  novo,33  the  reviewing  judges  would  seem  to  benefit  from  

being  able  to  calibrate  the  judgments  they  are  being  asked  to  review.      And  those  

reviewing  judges  would  also  seem  to  benefit  by  being  able  to  know  as  much  about  

the  other  judgments  of  the  reviewed  judge  as  a  reader  of  a  restaurant  review  

32 We recognize that appellate courts often review jury verdicts, but even in such cases the appellate court is in the position of reviewing the decision of a trial judge to let the case go to the jury, or to refuse to set aside a jury verdict.

33 When review is genuinely de novo, we might imagine that the basis for the decision being reviewed makes little or no difference. But if the standard of review is anything above de novo, some degree of deference is required, and it is in those situations where the reviewing body would like, we suppose, to have some idea of where on some scale the particular decision lies for the body being reviewed. When the Supreme Court is engaged in genuine rational basis review, for example, as in cases like New Orleans v. Dukes, 427 U.S. 297 (1976), or Williamson v. Lee Optical of Oklahoma, Inc., 348 U.S. 483 (1955), it is engaged in extreme deference to the administrative or legislative judgment below, and in evaluating that judgment it might wish to know just what kinds of decisions that body makes, so as better to be able to calibrate its implicit standard of review to the decision and the decision-maker being reviewed on this occasion.

benefits  from  knowing  about  the  other  judgments  of  the  reviewer.    But  what  

appears to be a norm of non-calibration precludes knowing about such other

judgments.      Congress,  state  legislatures,  administrative  officials,  administrative  law  

judges,  and  lower  court  judges  all  make  decisions  that  are  part  of  a  large  collection  

of  decisions  by  that  institution  or  judge,  and  thus  any  particular  decision  being  

reviewed  lies  somewhere  on  a  scale  for  that  institution  or  that  judge.    But  under  a  

norm of non-calibration there is no way, at least officially and openly, for a

reviewing  court  or  other  institution  to  obtain  or  overtly  use  just  this  sort  of  

presumably  valuable  information.  

As  described  briefly  above,  a  similar  opportunity  for  calibration  also  arises  in  the  

context  of  jurors  (or  judges  operating  as  triers  of  fact)  in  evaluating  the  testimony  of  

witnesses.    But  again  calibration  by  reference  to  the  analog  of  a  decisional  history  is  

rarely  permitted.      Sometimes,  of  course,  witnesses  will  testify  as  to  matters  of  fact  

that  have  straightforward  answers  or  testify  about  questions  where  the  answer  is  a  

simple  yes  or  no.    But  witnesses  also  testify  about  speed,  height,  weight,  

temperature,  attitude  (“he  seemed  angry”),  condition  (“he  was  drunk”),  and  a  vast  

number  of  other  matters  that  are  as  much  evaluative  (even  if  not  normatively  

evaluative)  as  they  are  simply  factual,  and  that  in  important  ways  can  be  

characterized in terms of a scale. Speed is variable, as are height and weight and so

on, but there are also degrees of anger, degrees of drunkenness, and degrees of a

very large number of things about which witnesses routinely offer evidence.

When  witnesses  testify  about  such  scalar  matters,  the  issues  appear  to  be  similar  

to  those  arising  when  a  reviewing  court  is  evaluating  an  earlier  determination  by  a  

lower  court,  legislature,  administrative  official,  or  administrative  law  judge.    As  with  

these  latter  examples,  the  evaluator  of  a  witness’s  testimony  wants  to  know  where  

on  the  witness’s  scale  some  conclusion  lies,  so  as  to  be  able  better  to  understand  and  

use  it.    Indeed,  the  ideal  testimony,  in  theory,  would  be  testimony  in  which  the  

assessor  –  judge  or  jury  –  would  know  about  a  previous  judgment  by  the  witness  on  

some  question  whose  answer  is  already  known  by  the  assessor.    If  a  witness  testifies  

that  Susan  was  angry,  it  would  be  ideal,  in  theory,  if  the  assessor  knew  how  the  

witness  would  characterize  the  attitude  of  Harry,  known  by  the  assessor,  on  an  

occasion  also  known  by  the  assessor.      This  is  close  to  what  happens  when  we  hear  

or  read  a  restaurant  review,  for  what  we  would  really  like  to  know  is  how  the  

reviewer  rated  another  restaurant  about  which  we  have  already  formed  an  opinion.    

With  that  information  in  hand,  we  would  be  best  able  to  calibrate  the  review  of  the  

reviewer  on  this  occasion,  and  thus,  with  analogous  information  on  hand,  the  trier  of  

fact  would  be  best  able  to  calibrate  the  testimony  of  the  witness  on  this  occasion.  

Such  ideal  information  will,  of  course,  rarely  be  available.    Nevertheless,  a  

second-best solution would be for the trier of fact to be able to have information

about  other  judgments  made  by  the  witness,  even  if  not  about  things  already  known  

to  the  trier  of  fact.    But  at  least  if  the  trier  of  fact  knew  about  the  other  judgments,  

and  could  form  some  impression  about  the  alignment  of  those  judgments  with  some  

assumed reference standard, the trier of fact could engage in a better calibration of

the  testimony  than  might  be  possible  under  current  practice,  where  even  with  

respect to witnesses at trial a norm of non-calibration appears to make such matters largely inaccessible.34

B.   A  Large  Exception  and  Its  Implications  

We have suggested that there appears to exist a prevailing rule of non-calibration. That is, courts and other reviewing bodies will not typically calibrate

their  standard  of  review  or  degree  of  deference  to  what  they  know  or  might  find  out  

about  the  previous  decisions  of  the  body  being  reviewed.    Or,  if  the  reviewers  do  

engage  in  such  calibration,  whether  consciously  or  not,  they  typically  will  not,  again  

because  of  the  force  of  the  norm  of  unavailability,  admit  to  doing  it.      

A significant exception to the norm of non-calibration appears to exist with respect to judicial review of administrative agencies. In this context, there is some indication that the norm of non-calibration is weaker, and that the past decisions

and  behavior  of  an  agency  being  reviewed  are  thought  to  influence  the  attitude  of  

the  reviewing  court  on  a  particular  occasion.  

The practice of “agency-specific”35 standards of review appears to have

started  with  reactions  to  what  were  perceived  to  be  National  Labor  Relations  Board  

34 Witnesses can, of course, be cross-examined or impeached on issues going to their credibility. Fed. R. Evid. 607, 611(b). Moreover, cross-examination or impeachment going to witness credibility is typically not limited by the “scope of direct” rule. See, e.g., United States v. Moore, 917 F.2d 215, 222 (6th Cir. 1990); United States v. Sullivan, 803 F.2d 87, 90-91 (3d Cir. 1986). But credibility is rarely understood so broadly as to allow wide-ranging cross-examination or impeachment going to the kind of calibration we are discussing here, especially because credibility is widely understood to be focused, even if not exclusively, on issues of veracity and not on issues of perception or judgment.

35 See Richard E. Levy & Robert L. Glicksman, Agency-Specific Precedents, 89 Texas L. Rev. 499 (2011).


practices that diverged from those of other agencies, in particular the use by the NLRB of adjudication as a rule-making tool in order to avoid the structures and strictures imposed by the Administrative Procedure Act on agency rulemaking, and also a gap between articulated standards and adjudicative outcomes that was larger than at other agencies.36 As a result, it appeared to some observers

(although  not  acknowledged  by  the  reviewing  courts)  that  judicial  review  of  NLRB  

decisions  was  different  from  and  more  intrusive  than  judicial  review  of  other  

agencies,  a  consequence  of  the  unadmitted  judicial  knowledge  of  exactly  this  

differential  behavior.  

  More  recently,  others  have  noticed  the  more  widespread  existence  of  the  

same  phenomenon,37  and  have  gone  on  to  endorse  it,38  with  Richard  Pildes,  for  

example,  arguing  that  taking  differences  among  agencies  and  their  past  behavior  

into  account  in  evaluating  the  decisions  of  those  agencies  is  a  sensible  and  realistic  

reaction  to  the  differences  among  agencies  in  their  political  makeup,  their  structure,  

the  method  by  which  their  senior  members  are  appointed,  the  matters  they  are  

called upon to decide, and much else.39 To fail to do so, Pildes argues, is a formalistic insistence on treating all agencies the same when in fact they plainly are not.40

36 See Joan Flynn, The Costs and Benefits of “Hiding the Ball”: NLRB Policymaking and the Failure of Judicial Review, 75 B.U. L. Rev. 387 (1995), and earlier, and especially, Ralph Winter, Judicial Review of Agency Decisions: The Labor Board and the Court, 1968 Sup. Ct. Rev. 53.

37 E.g., Jennifer Nou, Sub-Regulating Elections, 2013 Sup. Ct. Rev. 135.

38 See Richard H. Pildes, Institutional Formalism and Realism in Constitutional and Public Law, 2013 Sup. Ct. Rev. 1, 21-30.

39 Id.


  We  do  not  in  this  Article  purport  to  make  a  contribution  to  administrative  

law.    But  the  fact  that  the  past  decisions  of  a  particular  agency  seem  often  to  be  

relevant  to  courts  reviewing  agency  decisions  suggests  a  possible  generalization.    If  

it  is  at  times  useful  and  appropriate  for  reviewing  courts  to  take  an  agency’s  past  

judgments  into  account  in  determining  the  degree  and  type  of  scrutiny  to  be  applied  

to  that  agency’s  judgment  on  a  particular  occasion,  then  might  it  also  at  times  be  

useful  and  appropriate  for  a  reviewing  appellate  court  to  do  the  same  with  different  

trial  courts  and  trial  judges?    Similarly,  might  it  be  useful  and  appropriate  for  a  

magistrate  to  do  the  same  with  different  police  officers  and  different  police  

departments  who  are  seeking  search  warrants?    And  might  it  be  also  useful  and  

appropriate  for  an  administrative  law  judge  to  do  much  the  same  thing  with  the  

different  officials  and  different  parts  of  the  agency  whose  judgments  she  is  asked  to  

review?      

At times such history-based calibration practices do exist when legal decision-makers are evaluating the judgments and actions of citizens. The Securities and Exchange Commission, for example, is frequently required to evaluate the accuracy or non-misleadingness of representations made in such filings as registration statements, proxy statements, and periodic reports. But in

some  of  these  areas  the  Commission  has  an  overt  process  of  allowing  some  

registrants the benefit of a fast-track or similarly cursory review process, a process

40 Id. Pildes is correct that treating different agencies, or different phenomena in general, in the same way, is formalistic, although for one of us such formalism is not necessarily always to be condemned. See Frederick Schauer, Formalism, 97 Yale L.J. 509 (1988).


that  is  available  only  to  registrants  with  proven  track  records  of  accuracy,  and  a  

process  that  is  revocable  upon  evidence  of  inaccuracy  on  some  occasion.41    In  

evaluating the accuracy of such filings, therefore, the Commission is openly

calibrating  its  degree  of  scrutiny  to  what  it  knows  from  the  past  practices  of  the  

individual  or  entity  being  reviewed.    The  audit  practices  of  the  Internal  Revenue  

Service  are  similar  even  if  less  overt  and  less  (publicly)  systematized,  where  again  

the  degree  of  scrutiny  at  the  audit  and  subsequent  stages  appears  to  take  account  of  

the  particular  history  of  the  particular  taxpayer.  

The practice of agency-specific review suggests that the form of calibration we describe here is hardly unknown to the law. At times the norm of unavailability therefore does not prevail, and reviewing bodies calibrate their assessments of the judgment being reviewed in light of the full array of judgments made over time by the subject of the review, just as the user of a TripAdvisor review calibrates her assessment of a review in light of the full array of reviews made by a particular reviewer. But agency-specific review is still more the exception than the rule, and the question is then presented about what might explain the rarity in law of a practice that characterizes not only TripAdvisor, but also the way that most people make most of their decisions in most aspects of their lives.

III. Barriers  to  Calibration  

41 See, e.g., the “safe harbor” provisions in Section 21E of the Securities Exchange Act, 15 U.S.C. §78u-5 (2014), the “seasoned issuer” provisions under SEC Rule 405, 17 C.F.R. §230.405 (2015), the “bad actor disqualifications” under Regulation D, Rule 506, 17 C.F.R. §230.506(d) (2015), and the streamlined disclosure procedures provided for in Rule 144(c)(1), 17 C.F.R. §230.144(c)(1).


If we are correct in believing, along with much of the non-legal world and some

of  the  legal  world,  that  calibration,  whether  of  witness  testimony  or  legal  judgments  

being  reviewed  by  legal  reviewers,  is  at  least  sometimes  potentially  valuable,  if  we  

are  correct  that  calibration  will  often  be  assisted  by  knowing  about  judgments  made  

by  the  witness  or  reviewee  body  on  other  occasions  (and  maybe  even  the  outcome  

of  those  judgments),  and  if  we  are  correct  that  such  information  is  routinely  

unavailable  in  the  legal  system,  then  we  might  usefully  think  about  why  this  is  so.  

One  possibility  is  that  law  imagines  itself  as  a  pervasively  particularistic  

institution.    Whether  it  be  legal  scholars  urging  that  matters  be  decided  “one  case  at  

a time,”42 or the fact-specific and particularistic orientation of the common law, or

the  general  impermissibility  of  using  evidence  of  acts  or  behavior  on  other  

occasions  to  prove  conformity  on  this  occasion,43  there  are  important  ways  in  which  

the  legal  system,  whether  for  reasons  of  supposed  particularistic  justice  or  for  

reasons  of  efficiency  or  just  because  of  a  pervasive  legal  ideology  of  particularism,44  

is persistently averse to spending too much time dealing with cases or issues other

42 Cass R. Sunstein, One Case at a Time: Judicial Minimalism on the Supreme Court (2001).

43 This is the so-called propensity or character rule, embodied in, for example, Rule 404 of the Federal Rules of Evidence.

44 Which one of us has already spent far too much time and ink challenging. See Frederick Schauer, Thinking Like a Lawyer: A New Introduction to Legal Reasoning (2009); Frederick Schauer, Profiles, Probabilities, and Stereotypes (2003); Frederick Schauer, Playing By the Rules: A Philosophical Examination of Rule-Based Decision-Making in Law and in Life (1991). The other one of us, however, has spent less time and less ink arguing that people can be generalizers, and is concerned with how often people are, depending on context, particularists. See Barbara A. Spellman, Individual Reasoning, in Intelligence Analysis: Behavioral and Social Scientific Foundations 117 (Baruch Fischhoff & Cherie Chauvin eds., 2011).


than  the  one  now  under  consideration.    Whether  this  is  actually  true  is  a  difficult  

question,  but  it  appears  as  if  the  law  believes  that  it  is  true.    And  whether  it  is  

desirable  is  also  a  difficult  question,  but,  again,  the  law  appears  to  believe  that  it  is  

desirable.    And  thus  it  may  be  that  the  aversion  to  calibrating  by  use  of  decisional  

history,  although  most  overtly  manifested  in  the  kinds  of  examples  we  have  been  

discussing,  is  in  fact  best  explained  by  something  far  more  pervasive  in  law  in  

general.  

Other obstacles to history-based calibration may be more pragmatic. How will

judges  (or  jurors)  get  the  kind  of  information  they  might  need  to  engage  in  the  

process  of  calibration?    How  will  they  find  out  about  other  decisions,  and  other  

cases,  and  about  how  those  other  decisions  turned  out?    There  is  a  serious  risk  that  

opening  the  door  to  this  kind  of  information  will  lead  to  such  an  expansion  of  the  

domain  of  usable  legal  information  that  the  disadvantages,  even  if  only  logistical  and  

pragmatic,  would  far  outweigh  the  advantages.      

Moreover, the aversion to calibration may also embody non-epistemic goals. Just as deference in general may often reflect a non-epistemic respect for the decision-making powers of others,45 the unwillingness of an appellate body to examine the decisional history of those it is reviewing may manifest a form of respect, even if not epistemically justified, for those it is reviewing. Relatedly, there may be non-epistemic institutional equality goals that would militate in favor of

treating  all  agencies  or  all  lower  courts  as  equivalent  even  if  they  are  not.    An  

appellate court that is willing, openly, to treat the decisions of some lower court

45 See Philip Soper, The Ethics of Deference: Learning from Law’s Morals (2002).


judges  with  less  deference  than  it  treats  others  is  displaying  a  lack  of  respect  which,  

even  when  epistemically  justified,  may  be  thought  inconsistent  with  some  of  the    

goals  of  the  legal  system.        

Yet  it  is  important  to  recognize  that  calibration,  even  under  existing  procedures,  

is  not  absent.    It  is  just  that  legal  decision  makers  calibrate  against  their  own  views,  

or  against  an  assumed  average,  or  an  assumption  that  the  witness  or  lower  court  

judge  is  more  or  less  like  them.    Or  it  may  be  that  legal  decision  makers,  especially  

judges,  do  exactly  what  we  are  suggesting  here,  but  do  it  in  the  halls  by  making  use  

of  gossip  in  the  country’s  courthouses,  or  do  it  by  reading  the  newspapers,  or  in  

other  ways  obtain  information  that  they  are,  in  theory,  not  supposed  to  have,  and  

use  information  that  they  are  not  permitted  to  acknowledge  publicly  that  they  have  

in  the  first  place.    Insofar  as  this  is  the  case,  and  thus  insofar  as  there  is  far  more  

calibration  from  other  events  and  other  decisions  than  legal  actors  are  willing  or  

allowed  to  admit,  then  it  is  possible  that  bringing  the  entire  practice  out  in  the  open,  

making it more systematic, and making it more legitimate, may have some salutary effects.

IV. Conclusion  

The  law  hovers  ambivalently  between  generality  and  particularity.    The  law’s  

generality  is  exemplified  by  its  heavy  reliance  on  rules,  on  precedent,  and  on  various  

principles,  maxims,  canons,  and  other  vehicles  of  generality.    But  the  law’s  

particularity  manifests  itself  in,  for  example,  the  very  idea  of  common  law  method,  

in  the  calls  to  make  decisions  one  case  at  a  time  or  on  the  facts  of  particular  

controversies,  and  in  the  reluctance  of  the  law  of  evidence  to  allow  evidence  of  past  


practices  to  be  used  as  proof  of  current  behavior,  however  epistemically  and  

probabilistically  rational  such  a  course  of  action  may  be.                

Law’s  pervasive  but  not  universal  reluctance  to  allow  reviewers  to  take  account  

of  the  past  decisions  of  the  individuals  and  institutions  it  is  reviewing  reflects  this  

ambivalence.    By  looking  only  at  the  particular  decision  under  review,  and  not  

calibrating  the  posture  of  review  on  the  basis  of  a  history  of  decisions,  reviewing  

courts  and  other  reviewing  institutions  embody  the  particularism  that  is  one  part  of  

the  American  and  common  law  legal  traditions.46    But  generality  is  also  a  part  of  the  

legal  tradition,  both  in  the  United  States  and  elsewhere.    In  exercising  review  

without  calibrating  in  light  of  the  reviewee’s  history,  reviewing  institutions  choose  a  

form  of  particularism  not  only  over  generality,  but  over  accuracy  as  well.    In  some  

review  environments  that  may  be  the  right  choice  to  make.    But  in  others  it  may  not  

be,  and  it  is  not  implausible  to  suggest  that,  at  times,  law  may  have  at  least  a  small  

bit  to  learn  from  institutions  such  as  TripAdvisor.                        

46 Of course distinguishing among different administrative agencies, or among different judges whose decisions are being reviewed, is itself a form of particularism, just as treating all agencies the same despite their differences, or treating all reviewee courts and judges the same despite their differences, is a form of generalization. This may suggest that the dimensions of generality and particularity, at least as applied to law, are themselves complex, but exploring this question would take us far beyond the focus of this article.