A Billion Words to Remember
![Page 1: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/1.jpg)
A Billion Words to Remember
The Lifetime Reader
George Nagy
Rensselaer Polytechnic Institute
![Page 2: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/2.jpg)
What would it take to
record, remember, and retrieve
all text read or seen or heard
during one’s lifetime?
2/1/2017 A billion words to remember 2
![Page 3: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/3.jpg)
read
![Page 4: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/4.jpg)
or heard
![Page 5: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/5.jpg)
Not a new idea
In 1945, Vannevar Bush proposed the Memex:
The camera hound of the future wears on his forehead a lump a little larger than a walnut. It takes pictures 3 millimeters square … only a factor of 10 beyond present practice. … Wholly new forms of encyclopedias … with a mesh of associative trails … The entire material of the Britannica in reduced microfilm form would go on a sheet eight and one-half by eleven inches. …
![Page 6: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/6.jpg)
What will it take today?
http://pngimg.com/upload/laptop_PNG5940.png
1. A Sensor Module
2. A Host Computer
![Page 7: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/7.jpg)
Sensor Module
Camera:
• 1 frame per second (FPS)
• 20 megapixels, RGB
• 60° field of view (FOV)
• Autofocus, 25 cm to ∞
• < 10 g
MIC
GPS (or link)
Onboard processor:
• Text detection
• Text-image compression
• Log (time and space stamp)
• Encryption?
20 GB memory (images)
Bluetooth or Wi-Fi
![Page 8: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/8.jpg)
Camera-based OCR in 1960 (20 x 20 pixel camera)
![Page 9: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/9.jpg)
Text Detection and Recognition in Imagery: A Survey
Qixiang Ye, Member, IEEE and David Doermann, Fellow, IEEE
7.3 Remaining Problems
• Processing multilingual text.
• Processing incidental text.
• Real-time detection and recognition.
• End-to-end recognition.
• Open vocabulary recognition.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 7, JULY 2015
Over 200 citations!
![Page 10: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/10.jpg)
Host module: laptop, tablet, or smartphone
Store
• Duplicate detection
• Reading order
• OCR
• Index
• Text compression
~10 GB (text)
Retrieve
• Browser & digilib search tools
• Inverted index
• Temporal and spatial proximity
• User model
• Pattern matching
• Vector-space model
• Perfect hashing
• Signature files
• Latent semantic indexing
• Graph algorithms
• Relevance feedback
• …
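Two of the retrieval tools listed above, the inverted index and temporal proximity, can be combined in a few lines. The following is only a minimal sketch under stated assumptions: the class name `PersonalIndex`, its methods, and the numeric timestamps are hypothetical and do not come from the slides, which name the techniques but not an implementation.

```python
import re
from collections import defaultdict


class PersonalIndex:
    """Toy inverted index (hypothetical): word -> ids of captured snippets."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of snippet ids
        self.snippets = {}                 # snippet id -> (timestamp, text)

    def add(self, timestamp, text):
        """Store one captured snippet and index each of its words."""
        snippet_id = len(self.snippets)
        self.snippets[snippet_id] = (timestamp, text)
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(snippet_id)
        return snippet_id

    def search(self, query, around=None, window=None):
        """Return (timestamp, text) pairs containing every query word.

        If `around` and `window` are both given, keep only snippets captured
        within `window` time units of `around` (temporal proximity).
        """
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        hits = [self.snippets[i] for i in ids]
        if around is not None and window is not None:
            hits = [h for h in hits if abs(h[0] - around) <= window]
        return sorted(hits)
```

A query such as `index.search("boarding gate", around=150, window=100)` would then return only the matching snippets captured near time 150, which is one way a remembered "when" could narrow a search over a personal collection.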
![Page 11: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/11.jpg)
![Page 12: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/12.jpg)
Volume Calculations
Text-image volume: 4K x 5K pixels x 3 B/pixel x 1 fps x 3600 s/h x 8 h / 100x compression ≈ 17 GB/day
Audio volume: 4 KB/s x 3600 s/h x 8 h ≈ 115 MB/day (estimates vary from 300 B/s for a vocoder to 1.4 Mb/s, about 176 KB/s, for high-fidelity stereo CD audio books)
Image text volume: 2 B/char x 5 chars/word x 300 words/min x 60 min/h x 8 h / 5x compression ≈ 300 KB/day; 300 KB x 365 days x 100 years ≈ 10 GB/lifetime
Audio text volume: same
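The arithmetic above is easy to re-run with different assumptions. The sketch below simply restates the slide's figures as default parameters; the function names are hypothetical, and decimal units (1 KB = 1000 B) are assumed.

```python
# Back-of-envelope storage estimates using the figures quoted on the slide.
# Decimal units are assumed: 1 KB = 1e3 B, 1 MB = 1e6 B, 1 GB = 1e9 B.

SECONDS_PER_DAY = 3600 * 8  # 8 hours of capture per day


def text_image_bytes_per_day(width=4000, height=5000, bytes_per_pixel=3,
                             fps=1, compression=100):
    """Compressed text-image volume captured in one day."""
    raw = width * height * bytes_per_pixel * fps * SECONDS_PER_DAY
    return raw / compression


def audio_bytes_per_day(bytes_per_second=4000):
    """Audio volume recorded in one day."""
    return bytes_per_second * SECONDS_PER_DAY


def text_bytes_per_day(bytes_per_char=2, chars_per_word=5,
                       words_per_minute=300, compression=5):
    """Compressed recognized-text volume for one day of reading."""
    raw = bytes_per_char * chars_per_word * words_per_minute * 60 * 8
    return raw / compression


def lifetime_text_bytes(years=100):
    """Recognized-text volume accumulated over a lifetime."""
    return text_bytes_per_day() * 365 * years


if __name__ == "__main__":
    print(f"text images: {text_image_bytes_per_day() / 1e9:.2f} GB/day")
    print(f"audio:       {audio_bytes_per_day() / 1e6:.1f} MB/day")
    print(f"text:        {text_bytes_per_day() / 1e3:.0f} KB/day")
    print(f"lifetime:    {lifetime_text_bytes() / 1e9:.1f} GB")
```

Without rounding, the daily text volume is 288 KB and the lifetime total about 10.5 GB, consistent with the rounded 300 KB/day and 10 GB figures.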
![Page 13: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/13.jpg)
Three advantages of searching a personal collection compared to web search
1. Total lifetime volume only 10 GB compared to millions of times as much on WWW
2. Desired items already familiar and therefore easier to identify from top returns
3. Fractured prose & OCR errors not bothersome because we won’t re-broadcast found items
![Page 14: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/14.jpg)
Some underlying research problems
Image acquisition
Text-image analysis
Information retrieval
Ethical and legal issues
![Page 15: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/15.jpg)
Image acquisition problems
• Text detection in spatial context: at home, at work, in local venues, in transit, abroad
• Mosaicking required by head and body motion
• Lazy compression of text images
• Optional hands-free annotation (via mic)
• Optional gestural annotation, e.g. by tracing a phrase on a printed page or computer screen with a designated finger
• Long-lasting or self-charging power supply
![Page 16: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/16.jpg)
Text-image analysis problems
• Perspective-invariant recognition instead of rectification
• Reading-order (no gaze tracking)
• Duplicate detection from consecutive frames and after interruptions
• Retention policy for undecipherable and unindexable fragments of text, and for near-duplicates
• Adaptation to predictable reading material like the newspaper, mags, the rest of the Jack Aubrey series, IJDAR, Python v2.7.6 documentation
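As one illustration of the duplicate-detection problem listed above, near-identical OCR output from consecutive frames (or from re-reading a page after an interruption) can be filtered by comparing sets of word shingles. This is only a sketch of one common approach, not a method named in the slides; the function names, the shingle length `k`, and the `threshold` value are assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word tuples) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(frames, threshold=0.6, k=3):
    """Keep a frame's text only if it is not too similar to any kept frame.

    Comparing against every kept frame (not just the previous one) also
    catches repeats after an interruption, at quadratic cost; a real system
    would need an index or sketching scheme to scale.
    """
    kept, kept_shingles = [], []
    for text in frames:
        s = shingles(text, k)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue
        kept.append(text)
        kept_shingles.append(s)
    return kept
```

Because similarity is measured over shingles rather than exact strings, a single OCR error in an otherwise repeated line still leaves most shingles shared, so the repeat is recognized as a near-duplicate.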
![Page 17: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/17.jpg)
Information retrieval problems
• Retrieval strategies that mesh with our own mental recall
• Personalization: scripts and languages; reading speed; reading postures; computer display settings; work, leisure, shopping, and napping habits
• Selective, topic-, time-, or location-specific summarization
• Logging queries, responses, and user reactions for improving retrieval
![Page 18: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/18.jpg)
Ethical and legal issues
• Security and privacy: what do these mean over a lifetime?
• What is the legal difference between deliberately acquired information, as with a smartphone or camera, and autonomously acquired information?
• Where must the owner of a Lifetime Reader not look (and record)?
• What responsibility does delayed discovery of a crime entail (for instance, reading an airplane seat neighbor’s laptop screen that one glanced at two years ago)?
• What are the social and marketing implications of lifetime text logging?
![Page 19: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/19.jpg)
Product announcement expected on or about April 1, 2021.
Thank you for your interest and support!
![Page 20: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/20.jpg)
![Page 21: A Billion Word to Remember](https://reader031.fdocuments.in/reader031/viewer/2022011913/61d76ddf3bf08642a10bb764/html5/thumbnails/21.jpg)
No, I won’t wear it!