Texts and Digital Objects What seems to have changed.
Texts and Digital Objects
What seems to have changed
The web as universal library
• Generation I: the ASCII text. A web of text nodes with documents at the nodes.
• Generation II: the XML text. A web where the documents retain deep structure, but the web is still the library.
• Generation III: the book as object. The library will be imported to the web. Page by page. Library by library. The web is simply a way of accessing the universal library of print objects.
But are we going backwards?
Some of the movement looks a trifle retrograde
Generation I
The primacy of texts
"Nodes can in principle also contain non-text information such as diagrams, pictures, sound, animation etc. The term hypermedia is simply the expansion of the hypertext idea to these other media." (Tim Berners-Lee, 1989 proposal for a WWW, written at CERN)
Texts: hypertext, HTTP, and ASCII will do
Generation I circa 1995
A forest of connected texts which frankly doesn’t look too great.
Project Gutenberg
• Texts are what matter
• Accuracy matters
• Page numbering doesn’t
• Typography doesn’t matter either
But a good deal is lost
• Typography may not matter, but good web design does
• Typography carries a lot of meta-data
• Meta-data and the formal structure of the text need to be kept
• Variety, flexibility, and machine-readability ... XML
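To make the contrast between flat ASCII and structured XML concrete, here is a minimal sketch. The markup, element names, and helper functions are invented for illustration and are not taken from any particular schema or from the slides. It shows that the same source can yield a Generation I style flat text and also answer structural questions that the flat text cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; the element names are assumptions.
SAMPLE = """<book>
  <title>An Example Book</title>
  <chapter n="1">
    <head>First Chapter</head>
    <p>Some text with an <emph>emphasised</emph> word.</p>
  </chapter>
  <chapter n="2">
    <head>Second Chapter</head>
    <p>More text.</p>
  </chapter>
</book>"""


def chapter_heads(xml_source):
    """Return (chapter number, heading) pairs using the retained structure."""
    root = ET.fromstring(xml_source)
    return [(ch.get("n"), ch.findtext("head")) for ch in root.iter("chapter")]


def flat_text(xml_source):
    """Throw the structure away and keep only the words, as flat ASCII would."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


heads = chapter_heads(SAMPLE)
flat = flat_text(SAMPLE)
print(heads)
print(flat)
```

The flat version still reads well enough, but nothing in it says where a chapter starts or which word was emphasised. That information survives only in the marked-up form.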
Generation II circa 2000
Books repurposed for the web look a lot better than flat ASCII.
But there is a big overhead.
Republished for the web
• Inevitable duplication
• Page numbers don’t matter
• Typography can be optimised for web browsers
• Structure and added value are preserved
• Links and HTTP connections are fine
• But this re-purposing is a hassle and ultimately confusing
So Google has a better idea
• Words matter
• Pages matter
• Books matter
• Libraries matter
• And they should be searched in the way that all other digital objects and collections can be searched
Generation III circa 2005
Put books on the web just as they are. Books, not texts, are the primary resource for a library.
Keep it simple
• Scan every page of every book
• OCR every word and symbol
• Store every word and symbol in a database
• Store an image of every page in the database
• Know precisely where every word is on every page
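The steps above can be sketched as a small in-memory data structure. This is an illustration under stated assumptions, not a description of any real system: `WordBox`, `PageIndex`, the pixel-coordinate fields, and the image file names are all hypothetical, and the OCR step is assumed to have already produced (word, x, y, width, height) tuples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCR'd word and where it sits on its page (pixel units assumed)."""
    book: str
    page: int
    word: str
    x: int
    y: int
    width: int
    height: int


class PageIndex:
    """Hypothetical store: every word with its position, plus one image per page."""

    def __init__(self):
        self._by_word = defaultdict(list)  # lowercased word -> list of WordBox
        self._images = {}                  # (book, page) -> image file name

    def add_page(self, book, page, image_name, ocr_words):
        """Record the page image and every (word, x, y, width, height) tuple."""
        self._images[(book, page)] = image_name
        for word, x, y, width, height in ocr_words:
            box = WordBox(book, page, word, x, y, width, height)
            self._by_word[word.lower()].append(box)

    def find(self, term):
        """Return every position at which the term occurs, ignoring case."""
        return list(self._by_word.get(term.lower(), []))

    def image_for(self, book, page):
        return self._images[(book, page)]


index = PageIndex()
index.add_page("example-book", 1, "example-book-0001.jpg",
               [("Texts", 40, 60, 90, 22), ("matter", 140, 60, 100, 22)])
index.add_page("example-book", 2, "example-book-0002.jpg",
               [("Pages", 40, 60, 95, 22), ("matter", 145, 60, 100, 22)])

hits = index.find("MATTER")
for hit in hits:
    print(hit.page, index.image_for(hit.book, hit.page), (hit.x, hit.y))
```

Note that the unit of storage is the word on a page, not the running text of the book. A search returns pages and coordinates rather than passages.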
How the Google system works
• The browser has a JPEG and some HTML around it
• The web page is an image with search terms highlighted
• The intelligence is in the database
• Search is precise and fast
• The Google database would be the universal library
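As a rough illustration of "a JPEG and some HTML around it", the sketch below builds a page fragment in which the page image is overlaid with highlight rectangles at the stored word positions. The function name `page_html`, the inline styles, and the assumption that the image is displayed at the same pixel size as the stored coordinates are all illustrative; this is not how Google's own pages are built.

```python
from html import escape


def page_html(image_url, boxes, alt="Scanned page"):
    """Wrap a page image in HTML and overlay one highlight per (x, y, w, h) box."""
    overlays = "".join(
        f'<div class="hit" style="position:absolute;left:{x}px;top:{y}px;'
        f'width:{w}px;height:{h}px;background:rgba(255,255,0,0.4)"></div>'
        for x, y, w, h in boxes
    )
    return (
        '<div style="position:relative;display:inline-block">'
        f'<img src="{escape(image_url, quote=True)}" alt="{escape(alt, quote=True)}">'
        f"{overlays}</div>"
    )


# One search hit at a hypothetical position on a hypothetical page image.
fragment = page_html("example-book-0001.jpg", [(140, 60, 100, 22)])
print(fragment)
```

The browser never receives the words themselves, only the picture of the page and the rectangles. This is why the text is searchable without being present as text.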
Pages really matter
• Every print page is a web page
• A book is just a collection of web pages
• The concept of a ‘union catalogue’ will now have its correlative, a ‘union library collection’ (i.e. what is a duplicate?)
• There is no such thing as a Google edition
• Are the Google standards of preservation good enough?
Simplicity and Conservatism
• Publishers should be flattered
• Book designers, editors and typographers should be more than flattered
• Authors are still authors
• Catalogues and references work with minimal adjustment
• Book warehouses become obsolete
So what is lost?
• Perhaps publishers and authors lose profits?
• The text is lost. The text is readable and searchable... but there is no text.
• A searchable text, but not an entire and complete text. A collection of pages (JPEGs).
• Certainly none of the deep structure of the XML is retained
• Linkages and references are absent
What is gained?
• Books: all texts, documents and libraries become fully searchable.
• Automation of reading and accessibility of rare editions.
• Incredibly cheap in relation to the enhanced availability
• Bibliographies and Catalogues and other systems of metadata are preserved
There is much left to do
• No fine structure in the pages
• Poor navigation within the books
• The commercial model has to be invented
• It will not all be advertising driven
Exact Editions uses a Google-style platform for magazines
Technology is similar but the sociology is different.
Similar to Google Book Search
• Platform for publishers of magazines
• Publishers can add web functionality (links and advertisements)
• PDF as input and automated production
• Subscription or free access
• Full web functionality (statistics and integration with web apps)