![Page 1: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/1.jpg)
Pattern Recognition Research Lab, D. Lopresti & H. S. Baird
Henry S. Baird
CSE Dept, Lehigh Univ.
(Joint work with: Richard Fateman, Allison Coates, Kris Popat, Monica Chew, Tom Breuel, Mark Luk, Terry Riopka, Michael Moll, Dan Lopresti, Sui-Yu Wang, Jon Bentley, and Colin Mallows)
Protecting eCommerce from Robots Impersonating Human Users
![Page 2: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/2.jpg)
A Pitfall of the World Wide Web
© Peter Steiner, The New Yorker, July 5, 1993, p. 61 (Vol.69, No. 20)
![Page 3: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/3.jpg)
Straws in the wind…
Mid 90’s: spammers trolling for email addresses
• in defense, people start disguising them, e.g.
“baird AT cse DOT lehigh DOT edu”
1997: abuse of ‘Add-URL’ feature at AltaVista
• some write programs to add their URL many times
• to skew search rankings in their favor
Andrei Broder et al (then at DEC SRC)
• a user action which is legitimate when performed once
becomes abusive when repeated many times
• no effective legal recourse
• how to block or slow down these programs …
![Page 4: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/4.jpg)
The first known instance…
AltaVista’s AddURL filter
1999: “ransom note filter”
• randomly pick letters, fonts, rotations – render as an image
• every user is required to read and type it in correctly
• reduced “spam add_URL” by “over 95%”
Weaknesses: isolated chars, filterable noise, affine deformations
M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.
An image of text, not ASCII
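The "ransom note" recipe above (random letters, fonts and rotations, then an exact-match check of what the user types) can be sketched as follows. This is only an illustration: the font names, the rotation range, and the function names `make_challenge` and `grade` are assumptions, and the rendering of the glyphs into an image is left out.

```python
import random
import string

# Hypothetical font names; the slide does not say which fonts were used.
FONTS = ["serif", "sans", "mono"]

def make_challenge(length=6, seed=None, max_rotation=30):
    """Pick random letters, fonts and rotations; return (glyph specs, answer)."""
    rng = random.Random(seed)
    glyphs = []
    for _ in range(length):
        glyphs.append({
            "char": rng.choice(string.ascii_uppercase),
            "font": rng.choice(FONTS),
            "rotation_deg": rng.randint(-max_rotation, max_rotation),
        })
    answer = "".join(g["char"] for g in glyphs)
    return glyphs, answer

def grade(answer, typed):
    """Accept only an exact (case-insensitive) transcription."""
    return typed.strip().upper() == answer

glyphs, answer = make_challenge(seed=1)
print(len(glyphs), grade(answer, answer.lower()))
```

A renderer would draw each glyph spec into one image, so that the server sends pixels rather than ASCII.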
![Page 5: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/5.jpg)
Alan Turing (1912-1954)
1936 a universal model of computation
1940s helped break Enigma (U-boat) cipher
1949 first serious uses of a working computer
including plans to read printed text
(he expected it would be easy)
1950 proposed a test for machine intelligence
![Page 6: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/6.jpg)
Turing’s Test for AI
How to judge that a machine can ‘think’:
• play an ‘imitation game’ conducted via teletypes
• a human judge & two invisible interlocutors:
  • a human
  • a machine ‘pretending’ to be human
• after asking any questions (challenges) he/she
wishes, the judge decides which is human
• failure to decide correctly would be convincing
evidence of machine intelligence
Modern GUIs invite richer challenges than teletypes….
A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950.
![Page 7: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/7.jpg)
“CAPTCHAs”: Completely Automated Public Turing Tests to Tell Computers & Humans Apart
• challenges can be generated & graded automatically (i.e. the judge is a machine)
• accepts virtually all humans, quickly & easily
• rejects virtually all machines
• resists automatic attack for many years (even assuming that its algorithms are known?)
NOTE: machines administer, but cannot pass the test!
L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003.
![Page 8: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/8.jpg)
Some Typical CAPTCHAs
Microsoft
eBay/PayPal
Yahoo!
PARC’s PessimalPrint
![Page 9: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/9.jpg)
Cropping up everywhere…
Used to defend against:
• skewing search-engine rankings (AltaVista, 1999)
• infesting chat rooms, etc (Yahoo!, 2000)
• gaming financial accounts (PayPal, 2001)
• robot spamming (MailBlocks, SpamArrest 2002)
• In the last two years: Overture, Chinese website, HotMail, CD-rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, Gay.com, …
… how many have you seen?
On the horizon:
• ballot stuffing, password guessing, denial-of-service attacks
• ‘blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)
• …many others
D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001.
![Page 10: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/10.jpg)
The Limitations of Image Understanding Technology
There remains a large gap in ability
between human and machine vision systems,
even when reading printed text
Performance of OCR machines has been systematically studied:
7 year olds can consistently do better!
This ability gap has been mapped quantitatively
S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999.
![Page 11: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/11.jpg)
Image Degradation Modeling
Effects of printing & imaging:
We can generate challenging
images pseudorandomly
H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992.
Figure panels: blur, thrs, sens, thrs x blur
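To make the idea of pseudorandomly generated degradations concrete, here is a minimal sketch of a blur-then-threshold pipeline on a small bilevel image. It is a simplification written for illustration, not the published defect model: the Gaussian kernel, the per-pixel noise standing in for `sens`, and the function name `degrade` are all assumptions.

```python
import math
import random

def gaussian_kernel(sigma):
    """1-D normalized Gaussian weights; sigma <= 0 means no blur."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(img, kernel):
    """Convolve every row with the kernel, treating outside pixels as 0."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([sum(w * row[x + k - r]
                        for k, w in enumerate(kernel) if 0 <= x + k - r < n)
                    for x in range(n)])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def degrade(img, blur, thrs, sens=0.0, seed=None):
    """Blur (separable Gaussian), add optional pixel noise, then binarize."""
    rng = random.Random(seed)
    k = gaussian_kernel(blur)
    blurred = transpose(convolve_rows(transpose(convolve_rows(img, k)), k))
    def noisy(v):
        return v + (rng.gauss(0.0, sens) if sens > 0 else 0.0)
    return [[1 if noisy(v) >= thrs else 0 for v in row] for row in blurred]

dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1
for row in degrade(dot, blur=1.0, thrs=0.02):
    print("".join("#" if p else "." for p in row))
```

With a low threshold the blurred dot thickens, and with a high threshold it vanishes, which is the kind of parameter-dependent behaviour the slide alludes to.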
![Page 12: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/12.jpg)
Machine Accuracy is Often a Nearly Monotonic Function of Parameters
T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,” IEEE Trans. on PAMI, Vol. 19, No. 10, p. 1067-1079, October 1997.
![Page 13: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/13.jpg)
Can You Read These Degraded Images?
Of course you can …. but OCR machines cannot!
![Page 14: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/14.jpg)
The PessimalPrint CAPTCHA
Three OCR machines fail when:
– blur = 0.0 & threshold 0.02 - 0.08
– threshold = 0.02 & any value of blur
OCR outputs: ~~~.I~~~, ~~i1~~, N/A, N/A, N/A, ~~I~~
A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.
… but people find all these easy to read
![Page 15: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/15.jpg)
Variations & Generalizations
CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans Apart
HUMANOID
Text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’)
PHONOID
Individual authentication using spoken language
Human Interactive Proof (HIP)
An automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training.
![Page 16: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/16.jpg)
1st Int’l Workshop on Human Interactive Proofs
PARC, Palo Alto, CA, January 9-11, 2002
![Page 17: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/17.jpg)
2nd Int’l Workshop on Human Interactive Proofs
Lehigh University, Bethlehem, PA, May 19-20, 2005
![Page 18: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/18.jpg)
Weaknesses of Existing CAPTCHAs
English lexicon is too predictable:
• dictionaries are too small
• only 1.2 bits of entropy per character (cf. Shannon)
Physics-based image degradations vulnerable
to well-studied image restoration attacks, e.g.
Complex images irritate people
• even when they can read them
• need user-tolerance experiments
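The entropy remark can be turned into rough arithmetic. The sketch below assumes an 8-character challenge and treats the 1.2 bits/character figure from the slide as if it applied uniformly to each character, which is a simplification; the function name is hypothetical.

```python
import math

def search_space_bits(length, bits_per_char):
    """Total entropy, in bits, of a string with the given per-character entropy."""
    return length * bits_per_char

english_bits = search_space_bits(8, 1.2)           # English-like text
random_bits = search_space_bits(8, math.log2(26))  # uniform random lowercase letters

print(f"English-like: {english_bits:.1f} bits, about {2 ** english_bits:.0f} guesses")
print(f"Random a-z:   {random_bits:.1f} bits, about {2 ** random_bits:.2e} guesses")
```

Under these assumptions an English word offers under 10 bits of uncertainty, while a random lowercase string of the same length offers over 37, which is why a lexicon makes guessing so much cheaper for an attacker.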
![Page 19: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/19.jpg)
Human Readers
Literature on the psychophysics of reading is helpful:
many kinds of familiarity help, not just English words
optimal word-image size is known:
0.3-2 degrees subtended angle
optimal contrast conditions known
other factors measured for the best performance:
to achieve and sustain “critical reading speed”
BUT gives no answer to:
where’s the optimal comfort zone?
G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. normal vision,” Vision Research 25(2), 1985.
J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,” Perception & Psychophysics 47, 1990.
![Page 20: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/20.jpg)
The BaffleText CAPTCHA
Nonsense words
• generate ‘pronounceable’ – not ‘spellable’ – words using a variable-length character n-gram Markov model
• they look familiar, but aren’t in any lexicon, e.g. ablithan wouquire quasis
Gestalt perception
• force inference of a whole word-image from fragmentary or occluded characters, e.g.
• using a single familiar typeface also helps
M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
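A character n-gram Markov generator of the kind mentioned above can be sketched in a few lines. Note the assumptions: this version uses a fixed order (the slide describes a variable-length model), a toy word list instead of a real training corpus, and hypothetical function names.

```python
import random
from collections import defaultdict

def train(words, order=2):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w.lower() + "$"   # ^ = start padding, $ = end of word
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, lexicon=(), min_len=5, max_len=9,
             seed=None, max_tries=1000):
    """Sample words until one has an acceptable length and is not in the lexicon."""
    rng = random.Random(seed)
    known = set(lexicon)
    for _ in range(max_tries):
        ctx, out = "^" * order, []
        while len(out) <= max_len:
            nxt = rng.choice(model[ctx])
            if nxt == "$":
                break
            out.append(nxt)
            ctx = (ctx + nxt)[-order:]
        word = "".join(out)
        if min_len <= len(word) <= max_len and word not in known:
            return word
    return None  # no acceptable word found within max_tries

# Toy training list, chosen only for illustration.
WORDS = ["require", "acquire", "inquire", "would", "should", "quasi", "blithe",
         "another", "bother", "gather", "rather", "without", "within", "than"]
model = train(WORDS)
print(generate(model, lexicon=WORDS, seed=0))
```

Rejecting anything found in the lexicon is what keeps the output "pronounceable" but not "spellable"; a larger training corpus would give more varied words.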
![Page 21: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/21.jpg)
Mask Degradations
Parameters of pseudorandom mask generator:
• shape type: square, circle, ellipse, mixed
• density: black-area / whole-area
• range of radii of shapes
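The three mask parameters listed above suggest a simple generator: keep stamping random shapes until the black fraction reaches the requested density. The sketch below is one plausible reading, not the authors' generator; the function name, defaults, and the requirement that radii be at least 1 are assumptions.

```python
import random

def make_mask(width, height, shape="circle", density=0.2,
              radius_range=(2, 5), seed=None, max_shapes=10000):
    """Return a 0/1 mask whose black-area / whole-area ratio is >= density.

    Assumes radius_range values are >= 1 and density is reachable.
    """
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    black, target = 0, density * width * height
    for _ in range(max_shapes):
        if black >= target:
            break
        kind = rng.choice(["square", "circle", "ellipse"]) if shape == "mixed" else shape
        cx, cy = rng.randrange(width), rng.randrange(height)
        rx = rng.randint(*radius_range)
        ry = rng.randint(*radius_range) if kind == "ellipse" else rx
        for y in range(max(0, cy - ry), min(height, cy + ry + 1)):
            for x in range(max(0, cx - rx), min(width, cx + rx + 1)):
                dx, dy = (x - cx) / rx, (y - cy) / ry
                inside = kind == "square" or dx * dx + dy * dy <= 1.0
                if inside and not mask[y][x]:
                    mask[y][x] = 1
                    black += 1
    return mask

mask = make_mask(40, 12, shape="mixed", density=0.25, seed=3)
for row in mask:
    print("".join("#" if p else "." for p in row))
```

Such a mask could then be combined with a word image (for example by subtracting or adding the black pixels) to fragment or occlude characters.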
![Page 22: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/22.jpg)
User Acceptance
% Subjects willing to solve a BaffleText…
17% every time they send email
39% … if it cut spam by 10x
89% every time they register for an e-commerce site
94% … if it led to more trustworthy recommendations
100% every time they register for an email account
Out of 18 responses to the exit survey.
![Page 23: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/23.jpg)
Many Are Vulnerable to Character-Segmentation Attack
Effective strategy of attack:
• Segment image into characters
• Apply aggressive OCR to isolated chars
• If it’s known (or guessed) that the word is ‘spellable’
(e.g. legal English), use the lexicon to constrain
interpretations
Patrice Simard (MS Research) reports that this
breaks many widely used CAPTCHAs
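The attack strategy above can be illustrated with two small pieces: splitting a bilevel image at empty columns, and using a lexicon to prune per-character OCR candidates. This is a toy sketch with hypothetical names; the OCR step itself is not implemented, and the candidate sets are made up for the example.

```python
def segment_columns(img):
    """Return (start, end) column spans of ink separated by fully empty columns."""
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            spans.append((start, x))       # a character ends at an empty column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

def constrain_to_lexicon(candidates, lexicon):
    """Keep lexicon words consistent with the OCR candidates at every position."""
    return [w for w in lexicon
            if len(w) == len(candidates)
            and all(c in options for c, options in zip(w, candidates))]

image = [[1, 1, 0, 0, 1, 0, 1, 1],
         [1, 0, 0, 0, 1, 0, 0, 1]]
print(segment_columns(image))

# Hypothetical OCR output: a set of plausible letters per segmented character.
ocr_candidates = [{"c", "e", "o"}, {"a", "o"}, {"t", "l"}]
print(constrain_to_lexicon(ocr_candidates, ["cat", "cot", "eat", "dog", "coat"]))
```

The segmentation step only works when characters are separated by blank columns, which is exactly the property that fragment-scattering defenses try to remove.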
![Page 24: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/24.jpg)
So, try to generate word-images that will be hard to segment into characters
Slice characters up:
– vertical cuts; then
– horizontal cuts
Set size of cuts to constant within a word
Choose positions of cuts randomly
Force pieces to drift apart: ‘scatter’ horiz. & vert.
Change intercharacter space
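The cutting and scattering steps can be sketched on a single glyph bitmap: choose random cut positions, split the glyph into rectangular pieces, and push each piece away from the glyph centre by a random amount. This is a simplified illustration under assumptions: it does not model the "size of cuts" or intercharacter spacing, and the parameter names (`h_mean`, `v_mean`, `sd`) are hypothetical.

```python
import random

def cut_points(length, pieces, rng):
    """Random interior cut positions splitting range(length) into `pieces` parts."""
    cuts = sorted(rng.sample(range(1, length), pieces - 1))
    return [0] + cuts + [length]

def scatter(glyph, v_pieces=2, h_pieces=2, h_mean=1.0, v_mean=1.0, sd=0.5, seed=None):
    """Cut a 0/1 glyph into pieces and displace each piece; returns piece records."""
    rng = random.Random(seed)
    height, width = len(glyph), len(glyph[0])
    xs = cut_points(width, v_pieces, rng)    # vertical cuts split along x
    ys = cut_points(height, h_pieces, rng)   # horizontal cuts split along y
    cx, cy = width / 2, height / 2
    pieces = []
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            # Direction: away from the glyph centre, so pieces drift apart.
            sx = 1 if (x0 + x1) / 2 >= cx else -1
            sy = 1 if (y0 + y1) / 2 >= cy else -1
            dx = sx * max(0, round(rng.gauss(h_mean, sd)))
            dy = sy * max(0, round(rng.gauss(v_mean, sd)))
            ink = [(x + dx, y + dy)
                   for y in range(y0, y1) for x in range(x0, x1) if glyph[y][x]]
            pieces.append({"offset": (dx, dy), "ink": ink})
    return pieces

block = [[1] * 6 for _ in range(6)]
for piece in scatter(block, seed=7):
    print(piece["offset"], len(piece["ink"]))
```

Every ink pixel survives the operation; only its position changes, so a reader still sees all the local shape detail while column-based segmentation loses its clean gaps.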
![Page 25: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/25.jpg)
Character fragments can interpenetrate
Not only is it hard to segment the word into characters, ….
… it can be hard to recombine characters’ fragments into characters
![Page 26: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/26.jpg)
How Well Can People Read These?
We carried out a human legibility trial with the help of ~60 volunteers: students, faculty, & staff at Lehigh Univ. plus colleagues at Avaya Labs Research
![Page 27: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/27.jpg)
Subjects were told they got it right/wrong – after they rated its ‘difficulty’
![Page 28: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/28.jpg)
Subjective difficulty ratings are correlated with illegibility
Figure legend: Right vs. Wrong answers; difficulty ratings from 1 (Easy) to 5 (Impossible)
![Page 29: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/29.jpg)
People Rated These “Easy” (1/5)
aferatic
memmari
heiwho
nampaign
![Page 30: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/30.jpg)
Rated “Medium Hard” (3/5)
overch / ovorch
wouwould
atlager / adager
weland / wejund
![Page 31: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/31.jpg)
Rated “Impossible” (5/5)
acchown / echaeva
gualing / gealthas
bothere / beadave
caquired / engaberse
![Page 32: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/32.jpg)
Why is ScatterType legible at all?
Should it surprise you that this is legible…?
We speculate that we can read it because:
• human readers exploit typeface consistency cues … evidence remains in small details of local shape
• this ability seems largely unconscious
![Page 33: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/33.jpg)
Mean Horizontal Scatter vs Mean Vertical Scatter
Mirage: data analysis tool, Tin Kam Ho, Bell Labs.
Figure legend: Right vs. Wrong answers; difficulty ratings from 1 (Easy) to 5 (Impossible)
![Page 34: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/34.jpg)
The Arms Race
When will serious technical attacks be launched?
• ‘spam kings’ make $$ millions
• two spam-blocking firms rely on CAPTCHAs
How long can a CAPTCHA withstand attack?
• especially if its algorithms are published or guessed
Strategy: keep a pipeline of defenses in reserve:
• continuing partnership between R&D & users
![Page 35: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/35.jpg)
Lots of Open Research Questions
What are the most intractable obstacles to machine vision?
segmentation, occlusion, degradations, …?
Under what conditions is human reading most robust?
linguistic & semantic context, Gestalt, style consistency…?
Where are ‘ability gaps’ located?
quantitatively, not just qualitatively
How to generate challenges strictly within ability gaps?
fully automatically
an indefinitely long sequence of distinct challenges
![Page 36: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/36.jpg)
Disguised CAPTCHAs
Note that many normal navigation aids are CAPTCHAs (though not designed for that purpose)
![Page 37: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/37.jpg)
Implicit CAPTCHAs
We are investigating design principles for “implicit CAPTCHAs” that relieve these drawbacks:
• Challenges disguised as necessary browsing links
• Challenges that can be answered with a single click while still providing several bits of confidence
• Challenges that can be answered only through experience of the context of the particular website
  • weave CAPTCHAs into a multi-page “story”
  • can’t be extracted and “farmed-out” to people
• Challenges that are so easy that failure indicates a failed robot attack
![Page 38: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/38.jpg)
Alan Turing might have enjoyed the irony …
A technical problem – machine reading –
which he thought would be easy,
has resisted attack for 50 years, and
now allows the first widespread
practical use of variants of
his test for artificial intelligence.
![Page 40: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/40.jpg)
Henry S. Baird
Michael A. Moll
Sui-Yu Wang
A Highly Legible CAPTCHA
that Resists Segmentation Attacks
![Page 41: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/41.jpg)
Some Typical CAPTCHAs
AltaVista
eBay/PayPal
Yahoo!
PARC’s PessimalPrint
![Page 42: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/42.jpg)
All These Are Vulnerable to Segment-then-Recognize Attack
Effective strategy of attack:
• Segment image into characters
• Apply aggressive OCR to isolated chars
• If it’s known (or guessed) that the word is ‘spellable’
(e.g. legal English), use the lexicon to constrain
interpretations
Patrice Simard (MS Research) et al report that this
breaks many widely used CAPTCHAs
![Page 43: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/43.jpg)
We try to generate word-images that will be hard to segment into characters
Slice characters up:
– vertical cuts; then
– horizontal cuts
Set size of cuts to constant within a word
Choose positions of cuts randomly
Force pieces to drift apart: ‘scatter’ horiz. & vert.
Change intercharacter space
![Page 44: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/44.jpg)
Character fragments can interpenetrate
Not only is it hard to segment the word into characters, ….
… it can be hard to recombine characters’ fragments into characters
![Page 45: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/45.jpg)
Nonsense Words
We use nonsense (but English-like) words (as in BaffleText):
• generated pseudorandomly by a stochastic variable-length character n-gram model
• trained on the Brown corpus … this protects against lexicon-driven attacks
Why not use random strings?
• We want to help human readers feel confident they have made a plausible choice, so they’ll put up with severe image degradations (Cf. research in psychophysics of reading.)
M. Chew & H. S. Baird, “BaffleText: a Human Interactive Proof,” Proc., 10th SPIE/IS&T Document Recognition and Retrieval Conf., (DRR2003), Santa Clara, CA, January 23-24, 2003.
![Page 46: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/46.jpg)
How Well Can People Read These?
We carried out a human legibility trial with the help of ~60 volunteers: students, faculty, & staff at Lehigh Univ. plus colleagues at Avaya Labs Research
![Page 47: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/47.jpg)
Subjects were told they got it right/wrong – after they rated its ‘difficulty’
![Page 48: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/48.jpg)
Subjective difficulty ratings were correlated with objective difficulty
• People often know when they’ve done well
• This can be used to ensure that challenges aren’t too hard (frustrating, angering)

| Subjective difficulty level | ALL | 1 (Easy) | 2 | 3 | 4 | 5 (Impossible) |
| --- | --- | --- | --- | --- | --- | --- |
| No. of Challenges | 4275 | 610 | 1056 | 1105 | 962 | 542 |
| Percent answered correctly | 52.6 | 81.3 | 73.5 | 56.0 | 32.8 | 7.7 |
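As a quick consistency check on the figures above, the per-level counts should add up to the ALL count, and the count-weighted average of the per-level percentages should reproduce the overall percentage. A short sketch, using only the numbers reported on the slide:

```python
# Per-level figures as reported on the slide (difficulty level -> value).
counts = {1: 610, 2: 1056, 3: 1105, 4: 962, 5: 542}
pct_correct = {1: 81.3, 2: 73.5, 3: 56.0, 4: 32.8, 5: 7.7}

total = sum(counts.values())
overall = sum(counts[k] * pct_correct[k] for k in counts) / total

print(total)               # compare with the ALL count, 4275
print(round(overall, 1))   # compare with the ALL percentage, 52.6
```

Accuracy falls steadily as the subjective rating rises, which supports using the ratings to screen out challenges that people find too hard.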
![Page 49: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/49.jpg)
The same data, graphically
Figure legend: Right vs. Wrong answers; difficulty ratings from 1 (Easy) to 5 (Impossible)
![Page 50: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/50.jpg)
People Rated These “Easy” (1/5)
aferatic
memmari
heiwho
nampaign
![Page 51: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/51.jpg)
Rated “Medium Hard” (3/5)
overch / ovorch
wouwould
atlager / adager
weland / wejund
![Page 52: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/52.jpg)
Rated “Impossible” (5/5)
acchown / echaeva
gualing / gealthas
bothere / beadave
caquired / engaberse
![Page 53: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/53.jpg)
Why is ScatterType legible?
Does it surprise you that this is legible…?
I speculate that we can read it because:
• we exploit typeface consistency … the evidence is small details of local shape
• this ability seems largely unconscious
![Page 54: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/54.jpg)
Ensuring that ScatterType is Legible
We mapped the domain of legibility as a function of engineering choices:
typefaces
characters in the alphabet
cutting & scattering parameters:
• cut fraction
• expansion fraction
• horizontal scatter mean
• vertical scatter mean
• h & v scatter variance
• character separation
![Page 55: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/55.jpg)
Some typefaces remain legible while others degrade quickly
![Page 56: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/56.jpg)
Some Characters Quickly Become Confusable
“overch”: ‘o’ ‘e’ ‘c’ confusions
![Page 57: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/57.jpg)
Mean Horizontal Scatter vs Mean Vertical Scatter
Mirage: data analysis tool, Tin Kam Ho, Bell Labs.
[Legend: Right vs. Wrong answers; difficulty ratings 1 (Easy) to 5 (Impossible)]
![Page 58: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/58.jpg)
Cut Fraction Histogram
[Legend: Right vs. Wrong answers; difficulty ratings 1 (Easy) to 5 (Impossible)]
![Page 59: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/59.jpg)
Character Separation Histogram
[Legend: Right vs. Wrong answers; difficulty ratings 1 (Easy) to 5 (Impossible)]
![Page 60: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/60.jpg)
Finding Parameter Ranges for High Legibility
d = Euclidean distance from the origin in the plot of Mean Horizontal Scatter vs Mean Vertical Scatter
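As a minimal sketch of the distance d defined above, assuming both scatter means are plain numbers on the same scale. The function name and the sample values are hypothetical and do not come from the slides:

```python
import math


def scatter_distance(mean_h_scatter: float, mean_v_scatter: float) -> float:
    """Euclidean distance d from the origin of the
    (mean horizontal scatter, mean vertical scatter) plane."""
    return math.hypot(mean_h_scatter, mean_v_scatter)


# Hypothetical parameter values, chosen only to illustrate the calculation.
d = scatter_distance(3.0, 4.0)
print(d)
```

Collapsing the two scatter means into one number lets a single threshold on d stand in for a region of the two-dimensional plot.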
![Page 61: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/61.jpg)
Guided by this Analysis, We Can Define Legibility Regimes
Trivial: large cut fraction and small expansion
Simple: character separation also decreases
Easy: in original trial, correct 81% of time
Medium Hard: larger scatter distances degrade legibility noticeably
![Page 62: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/62.jpg)
Other Examples - “Easy”
“wexped” - difficult to segment ‘e’, ‘x’ and ‘p’. Shows difficulty of achieving 100% legibility
“veral” - same parameters as above but different font. Not as difficult to segment
![Page 63: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/63.jpg)
Other Examples - “Too Hard”
“thern” - difficult to read, but easier than most with the same parameter values. Font makes a big difference.
“wezre” - satisfactorily illegible, though probably segmentable
![Page 64: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/64.jpg)
Future Work
We have exhausted the experimental data from the 1st trial
How can we automatically create images with given difficulty?
We have generated many images that seem difficult to segment automatically, but we don’t understand how to guarantee this
We need to understand the effects of typefaces on ScatterType legibility
We want to study character-confusion pairs more
Attacking ScatterType
• Testing on best OCR systems
• Invite attacks from other researchers
• Is it credible if we attack it ourselves, and fail?
![Page 65: Pattern Recognition Research Lab D. Lopresti & H. S. Baird Henry S. Baird CSE Dept, Lehigh Univ. (Joint work with : Richard Fateman, Allison Coates, Kris.](https://reader035.fdocuments.in/reader035/viewer/2022062712/56649c9b5503460f94959df1/html5/thumbnails/65.jpg)
Contacts
Henry S. Baird
Michael Moll