Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N....
![Page 1: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/1.jpg)
Million Books to the Web
An Example of Indo-US Collaboration
Lessons Learnt & The Road Ahead
Prof N. Balakrishnan
Indo-US Workshop on Open Digital Libraries & Interoperability
Washington, DC
June 23, 2003
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore India
School of Computer Science
Carnegie Mellon University
Pittsburgh USA
![Page 2: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/2.jpg)
Lessons from the past
• fires of Alexandria – irrevocably severed our access to any of the works of the ancients.
• introduction of printing technology – much Indian and Chinese knowledge disseminated by word of mouth and on palm leaves virtually disappeared or became inaccessible
• New cultural revolutions – edifices built by destroying the past irrevocably
– later revolutions seek solace in attempting to preserve what was destroyed
– we need to preserve our heritage independent of the political and social ups and downs
A single wanton act of destruction can destroy an entire line of heritage
![Page 3: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/3.jpg)
Lessons from Reality
In a thousand years:
only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction.
Many other works still in existence today in paper archives are rare
- only accessible to a small population of scholars and collectors at specific geographic locations
Contrary to popular belief, libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man
No one can afford to do this, unless the archive is digital
![Page 4: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/4.jpg)
The Approach
• Technology Driven Vision
• Decide on the stakeholders
  – Never make it exclusive
• Pilot Projects to perfect technology
• Bring in advanced management concepts
  – like People Maturity Models
  – Quality assurance
  – automate wherever possible
Continued…
![Page 5: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/5.jpg)
The Approach
• Lessons from the past
  – Too many Digital Library Projects
  – with half-life of less than 2 years from the date of “Launch” or a long incubation time
  – Follow Nike – JUST DO IT
• Digital Library must have two ingredients
  – A knowledge Amplifier
  – Free-access, giving avenues for everyone to make economic benefit
    • still contribute to multiplication of knowledge by circulation
• In India, it should be a test bed for our Language Technology Research
  – a showcase for our heritage
![Page 6: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/6.jpg)
Elements of Technology
• Microprocessors
• Memory
• Connectivity
• Software
All these technologies are growing exponentially
![Page 7: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/7.jpg)
Communication Revolution
If you are amazed at the drop in the cost of computing, wait till you see what is going to happen to bandwidth.
Network technology will increase 10-100 times faster than processor technology
-Andy Grove, Titan of Intel
Bandwidth will double every year
Network speeds become comparable to interconnect speeds
![Page 8: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/8.jpg)
Death of Time and Distance
Together, the Computer and Communications Revolutions aim at:
Anytime, Anyplace and Anyone
![Page 9: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/9.jpg)
The World of Computers & Communication
• Small fish eat the big fish
  – Microprocessors offer performance comparable to supercomputers
  – Paradigm shift from dinosaurs to mammals: from performance to functionality
• NETWORK is everywhere
  – Web is a preferred medium of communication for everyone, including the military & the terrorists
• Companies that make more and more software free capitalize more
  – Open archives
![Page 10: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/10.jpg)
Processor of Tomorrow
• Carbon Nano Tubes
  – 5 to 10 atoms wide
  – promise to replace silicon soon
• Flexible Transistors
  – made from plastic, organic materials
• Silicon will live for 15 years
• Moore’s law will live longer
• 1000 times growth in 10 years
The winner will be decided by: Material Convergence + Human-like interactions
![Page 11: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/11.jpg)
Processor of Tomorrow
• A billion transistors at 10 to 20 GHz clock rates by 2010
• 128 GBytes of main memory
• Terabyte of disk storage, maybe holographic
• Speech input/output, ASR
• Multilingual
• Terabit connectivity at the PC
• The DL plans of today must be sensitive to this
![Page 12: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/12.jpg)
The Road Ahead
[Figure. Knowledge Content levels: Poor, Medium, Rich, Brilliant. Stages: Scientific Calculations, Data Analysis, Expert Systems, Super Humans. Other labels: Emulating Human Performance: See, Hear, Talk, and “Think”; Bill Joy’s Nightmare; Evolution; Nanosystems]
![Page 13: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/13.jpg)
The future trends:
• Browser will be the only medium of communication.
• It will be active, with voice and video, and language independent.
• Mobility will be the key.
• Small form factor devices such as Palms, PDAs and Tablets would be the future.
• We would soon see TVPCT at the cost of a TV
• We will witness major convergence between ICT, Nano Technologies and Biological Sciences
![Page 14: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/14.jpg)
Electronic Resources and the Library of the Future
E-mags; E-books; E-music; E-Movies
![Page 15: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/15.jpg)
Dedicated E-book Readers
• Dedicated readers – about 20,000
• Palm devices – 6,000,000
• PCs – hundreds of millions
• “For people accustomed to reading text on a computer for hours at a time, e-book screen clarity is a non-issue.”
• A low-cost e-book reader design is under way in India
![Page 16: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/16.jpg)
http://www.eink.com/technology/index.htm
• E Ink is made up of millions of microcapsules
  – each the diameter of a human hair
• Each microcapsule contains
  – positively charged white particles &
  – negatively charged black particles
  – that float in a clear fluid
• A film of transistors supplies the voltage to the capsules
• A negative charge makes the white particles move to the top of the microcapsule
  – an opposite electric field pulls the black particles to the bottom of the microcapsules, mimicking the effect of print.
• Electronic ink is a real power miser
![Page 17: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/17.jpg)
E-ink/e-paper (Lucent)
The technology has been identified and development is well under way
By the year 2003, we envision electronic books
• that can display volumes of information as easily as flipping a page,
• permanent newspapers that update themselves daily via wireless broadcast
• Just as today's books give people easy access to everyday information, tomorrow's books will provide the same easy access to the dynamic data of the information age
The world of publishing will never be the same
![Page 18: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/18.jpg)
Indian Institute of Science’s Simputer
• A hand-held Linux box at around US$ 200
• Has a state-of-the-art browser
• Color screen
• Very good speech synthesizer
  – In English and many Indian languages
• A very powerful tool for access with wireless
• Soon to be modified as an E-book
www.simputer.org
www.picopeta.com
www.ncoretech.com
![Page 19: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/19.jpg)
The Challenges in Computing
Tomorrow’s computing needs are not in Mflops and Gflops
Computers need to process information, recognition and DM like a human
Small inexpensive Robots, swarms will be a reality
![Page 20: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/20.jpg)
Ray Kurzweil: The Age of Spiritual Machines
“A $1,000 PC (in 1999 dollars)…
  – 2009 = trillion calculations/second
  – 2019 = 20 million billion calculations/second (the human brain)
  – 2029 = 2 × 10^19 calculations/second (1,000 human brains)
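The three figures quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the function names are illustrative, and the "implied doubling time" is a derived quantity under a constant-growth assumption, not something the slide states.

```python
import math

# Calculations per second for a $1,000 PC, as quoted on the slide.
PC_2009 = 10**12                 # "trillion calculations/second"
HUMAN_BRAIN = 20 * 10**6 * 10**9 # "20 million billion" = 2 x 10^16
PC_2029 = 2 * 10**19             # "2 x 10^19"


def brains_equivalent(calcs_per_second: int) -> float:
    """Express a machine's speed as a multiple of the quoted brain figure."""
    return calcs_per_second / HUMAN_BRAIN


def implied_doubling_months(start: int, end: int, years: int) -> float:
    """Doubling period (months) implied by growing from start to end,
    assuming constant exponential growth over the interval."""
    return years * 12 / math.log2(end / start)


ratio_2029 = brains_equivalent(PC_2029)
doubling_2009_2019 = implied_doubling_months(PC_2009, HUMAN_BRAIN, 10)
print(f"2029 PC = {ratio_2029:.0f} human brains")
print(f"Implied doubling time 2009-2019: {doubling_2009_2019:.1f} months")
```

The 2029 figure is consistent with the "1,000 human brains" gloss, since 2 × 10^19 divided by 2 × 10^16 is 1,000.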
![Page 21: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/21.jpg)
![Page 22: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/22.jpg)
Ray Kurzweil: The Age of Spiritual Machines
• 2009: “Computer displays have all the display qualities of paper- high resolution, high contrast, large viewing angle, and no flicker. Books, magazines, and newspapers are now routinely read on displays that are the size of small books.”
• 2009: “At least half of all (business) transactions are conducted online.”
![Page 23: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/23.jpg)
• 2009: “There is effective convergence of all media, which exist as digital objects (that is, files) distributed by the ever-present high-bandwidth, wireless information web. Users can instantly download books, magazines, newspapers, television, radio, movies, and other forms of software to their highly portable personal communication devices.”
Ray Kurzweil: The Age of Spiritual Machines
![Page 24: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/24.jpg)
2009
• A $1,000 PC delivers Terahertz speeds
• PCs with high resolution visual displays come in a range of sizes
  – from those small enough to be embedded in clothing and jewelry
  – to the size of a thin book
• Cables are disappearing
  – Communication between components uses wireless technology, as does access to the Web
• The majority of text is created using continuous speech recognition
  – Also ubiquitous are language user interfaces.
• Most routine business transactions (purchases, travel, etc.) take place between a human and a virtual personality
  – Often the virtual personality includes an animated visual presence that looks like a human face
![Page 25: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/25.jpg)
• 2019: “Reading books, magazines, newspapers, and other Web documents; listening to music; watching three-dimensional moving images (for example, television, movies); engaging in three-dimensional visual phone calls; entering virtual environments (by yourself, or with others who may be geographically remote); and various combinations of these activities are all done through the ever-present communications Web and do not require any equipment, devices, or objects that are not worn or implanted.”
Ray Kurzweil: The Age of Spiritual Machines
![Page 26: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/26.jpg)
2029: “The ever learning Society”
• Learning now constitutes the primary focus of the human species.
• Human learning is accomplished using virtual teachers (and virtual libraries?).
• Learning is enhanced by widely available neural implants, which improve memory and perception but cannot yet download knowledge directly.
• Automated agents are learning on their own, without human assistance. Machines can now create significant new knowledge with little or no human intervention; unlike humans, machines easily share knowledge structures with one another.
Ray Kurzweil: The Age of Spiritual Machines
![Page 27: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/27.jpg)
And Then There Was Music
• RealJukeBox
• Win Amp
• MP3
• Napster
![Page 28: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/28.jpg)
The Growth rates
• The processor performance doubles every 18 Months
• The Network bandwidth doubles every year
• The storage capacity doubles every nine months
• Soon you will have processor bottleneck
• 1000 times growth in storage in 10 years
  – I already have 250 GB on a single disk
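The doubling periods above translate into ten-year growth factors by simple compounding. The sketch below is an illustration under the assumption of steady exponential growth; `growth_factor` is a hypothetical helper name, not something from the talk.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Total multiplication after `years` if capacity doubles
    every `doubling_months` months (steady exponential growth assumed)."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Doubling periods quoted on the slide.
processor_10y = growth_factor(18, 10)  # processor performance
bandwidth_10y = growth_factor(12, 10)  # network bandwidth
storage_10y = growth_factor(9, 10)     # storage capacity

for name, factor in [("processor", processor_10y),
                     ("bandwidth", bandwidth_10y),
                     ("storage", storage_10y)]:
    print(f"{name}: about {factor:,.0f}x in 10 years")
```

Under these assumptions, a round "1000 times in 10 years" corresponds to doubling every year (2^10 = 1024). Doubling every nine months compounds to roughly ten times that, and doubling every 18 months to roughly a hundredfold, so the slide's figures are best read as orders of magnitude.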
![Page 29: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/29.jpg)
Recognition versus Recall
• Recognition is like seeing your friend’s face in a sea of faces
  – even if he has changed since you last saw him
  – storage intensive and fast
• Recall is like figuring out how to repair your car’s carburetor using a manual when you have never done that before
  – applying knowledge to a new situation
  – processor intensive and less storage
• The brain works on recognition
• Present-day computers prefer recall – remember the Y2K
• Future computers would work like the brain: recognition
![Page 30: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/30.jpg)
Recognition versus Recall: what it does to our DL
• We will move away from quantitative search (keyword match) to “aboutness” and content-based retrieval
• In future, documents will be read more by computers than by humans – will it change the way we write? Would we think in HTML or in XML?
• From mere text data to 3D objects, voice and video
• Multilingual
• Every conceivable form of knowledge expression
![Page 31: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/31.jpg)
Technology Driven vision for The Digital Library
• We can store everything
  – all the knowledge of the human race
  – in all forms
  – that is the Universal Digital Library
• Cost of selection is stationary but storage cost is plummeting
It is not about contents alone. It is about networking of people
![Page 32: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/32.jpg)
Education
[Figure labels: Real-time; Engineering, Science, Business; Universities, Colleges, Schools]
3 Ls of Learning
1. Face-to-Face Lectures
2. Virtual Labs
3. Universal Digital Library
![Page 33: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/33.jpg)
Universal Library Vision
All recorded information online
• instantly available
  – To anyone
  – Anywhere in the world
  – In any language
  – searchable, browsable, navigable by humans and machines
![Page 34: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/34.jpg)
Digital Library Contents
• Books
• Periodicals (journals, newspapers)
• Art, photographs
• Databases, software
• Movies, video
• Music, opera, dance
Suppose all of this were on the Web
![Page 35: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/35.jpg)
Digital Library of the future
• Digital library
• Digital museum
• Digital tour guide
• Research assistant
• Knowledge amplifier
![Page 36: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/36.jpg)
Can we store all human knowledge in digital form?
• There are about 100 million books written by the human race
• Multiply by 10 for all other forms of knowledge
• 1 book = 500 pp. = 1 MB uncompressed
  – 10^9 books = 10^15 bytes = 1 petabyte
• 140 million computers on the Internet
  – At 20 GB free space each, > 2.8 exabytes now
• 1 GB of disk costs ~$1
  – 1 petabyte < $1 million
  – Our Peta Byte server Initiative
  – Storage is not the limitation, but creation and coordination are
  – So are avoiding duplication and connectivity
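The back-of-the-envelope estimate on this slide can be reproduced directly. The sketch below uses only the slide's inputs; decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) are an assumption, and the variable names are illustrative.

```python
# Inputs taken from the slide.
BOOKS_WRITTEN = 100_000_000     # about 100 million books
OTHER_FORMS_MULTIPLIER = 10     # "multiply by 10" for other knowledge forms
BYTES_PER_BOOK = 10**6          # 1 book = 1 MB uncompressed (decimal MB assumed)
COMPUTERS_ONLINE = 140_000_000  # computers on the Internet
FREE_BYTES_EACH = 20 * 10**9    # 20 GB free space each (decimal GB assumed)
DOLLARS_PER_GB = 1.0            # "1 GB of disk costs ~$1"

book_equivalents = BOOKS_WRITTEN * OTHER_FORMS_MULTIPLIER
total_bytes = book_equivalents * BYTES_PER_BOOK
free_bytes = COMPUTERS_ONLINE * FREE_BYTES_EACH
disk_cost_dollars = total_bytes / 10**9 * DOLLARS_PER_GB

print(f"Book equivalents: {book_equivalents:.0e}")
print(f"Storage needed:   {total_bytes / 10**15:.1f} PB")
print(f"Free space online: {free_bytes / 10**18:.1f} EB")
print(f"Disk cost at $1/GB: ${disk_cost_dollars:,.0f}")
```

With these inputs the free space on networked machines is 2.8 × 10^18 bytes, a few thousand times the one petabyte needed, which is the slide's point that storage is not the constraint.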
![Page 37: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/37.jpg)
Universal Digital Library
• More than 120 million PCs on the net
• Each having at least 20 GB of free space
• Peer-to-peer communication
• Can we store all human knowledge in the computers?
This is today
The time-consuming process is taking the printed books to the web. The technology is not an impediment
![Page 38: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/38.jpg)
Technology Driven Vision for the Universal Digital Library
• A vision to store everything that the human race ever produced
• A mission to digitize 1 Million Books and make them freely available
![Page 39: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/39.jpg)
The Strategy for Scanning of books
• A planetary scanner like the Minolta PS 7000
• Takes about two hours to scan a 500-page book, crop, OCR and convert it to TIFF, HTML and XML files
• About 10,000 pages to the web in a day
• Storage per book is around 60 MB
• 100 terabytes is not an issue
• Our partner, the Internet Archive, has 370 TB, adding 30 TB a day
• Distributed databases
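The per-book figures above imply a per-scanner rate and a total storage requirement for the million-book target. This is a rough sketch using only the slide's numbers; decimal units (1 TB = 10^6 MB) are an assumption and the names are illustrative.

```python
# Figures from the slide.
PAGES_PER_BOOK = 500
HOURS_PER_BOOK = 2.0        # scan, crop, OCR and convert one book
MB_PER_BOOK = 60            # storage per digitized book
TARGET_BOOKS = 1_000_000    # the million-book mission

pages_per_scanner_hour = PAGES_PER_BOOK / HOURS_PER_BOOK
total_storage_tb = TARGET_BOOKS * MB_PER_BOOK / 10**6  # MB -> TB (decimal)

print(f"Per-scanner rate: {pages_per_scanner_hour:.0f} pages/hour")
print(f"Storage for {TARGET_BOOKS:,} books: {total_storage_tb:.0f} TB")
```

At about 60 TB for a million books, the total sits comfortably under the 100 terabytes the slide describes as "not an issue".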
![Page 40: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/40.jpg)
Process Involved
[Flow diagram stages: Identification of Books; Pre-Scanning Process; Scanning Process; Image Processing Process; Conversion Process]
![Page 41: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/41.jpg)
Scanning
• 2 pages at a time
• Stored in TIFF format
![Page 42: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/42.jpg)
Post scanning operations
• Skew Correction
• Document Registration
• Dot Shading and Speck Removal
• Image Centering
• Image Cropping
• Smoothing and Completion
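To make one of these steps concrete, the sketch below illustrates the idea behind image cropping: trimming a scanned page to the bounding box of its ink. It is a toy example, not the project's cropping software. The binary page representation (0 for white, 1 for ink) and the `crop_to_ink` name are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of pixels: 0 = white, 1 = ink (assumed encoding)


def crop_to_ink(image: Image, margin: int = 0) -> Image:
    """Return the sub-image bounded by the outermost ink pixels,
    padded by `margin` pixels where the page allows it."""
    ink_rows = [r for r, row in enumerate(image) if any(row)]
    if not ink_rows:
        # Blank page: nothing to crop, return an unchanged copy.
        return [list(row) for row in image]
    width = len(image[0])
    ink_cols = [c for c in range(width) if any(row[c] for row in image)]
    top = max(ink_rows[0] - margin, 0)
    bottom = min(ink_rows[-1] + margin, len(image) - 1)
    left = max(ink_cols[0] - margin, 0)
    right = min(ink_cols[-1] + margin, width - 1)
    return [list(row[left:right + 1]) for row in image[top:bottom + 1]]


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
cropped = crop_to_ink(page)
padded = crop_to_ink(page, margin=1)
print(cropped)
```

A real pipeline would work on grayscale TIFF scans and would need speck removal first, since a single stray dark pixel near the page edge would otherwise defeat a bounding-box crop.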
![Page 43: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/43.jpg)
Image comparison
Original Image
![Page 44: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/44.jpg)
Processed Image (SW 1)
![Page 45: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/45.jpg)
OCR CONVERSION
![Page 46: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/46.jpg)
Performance evaluation for various fonts in Kannada language OCR
Series1: Average performance efficiency before using the cropping software.
Series2: Average performance efficiency after using cropping software.
![Page 47: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/47.jpg)
The Digitized book
![Page 48: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/48.jpg)
• Average book size ~ 500 pages
• Size of page as image ~ 50-150 KB
• Size of page as text file (rtf/htm) ~ 8-15 KB
• Average size of digitized book ~ 60 MB
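The 60 MB average can be sanity-checked against the per-page sizes. The sketch below assumes each page is stored both as an image and as a text file, and that 1 MB = 1000 KB; both are assumptions for illustration, as is the `book_size_mb` helper.

```python
def book_size_mb(pages: int, image_kb: float, text_kb: float) -> float:
    """Size of one digitized book in MB, assuming every page is kept
    as both an image and a text file (1 MB = 1000 KB assumed)."""
    return pages * (image_kb + text_kb) / 1000


PAGES = 500  # average book size from the slide

smallest_mb = book_size_mb(PAGES, image_kb=50, text_kb=8)
largest_mb = book_size_mb(PAGES, image_kb=150, text_kb=15)
print(f"Estimated range: {smallest_mb} to {largest_mb} MB per book")
```

The quoted average of about 60 MB falls inside this estimated range.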
![Page 49: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/49.jpg)
Brightness – Dark(1 in scale) and contrast – 9(in scale)
Original image
Cropped image
![Page 50: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/50.jpg)
Million Books to the Web: Stakeholders as Partners
• Academia: CS, IS and users
• Researchers and Language Technologists
• Cultural and Religious Organizations
• Public Libraries
• Government Agencies
• None too exclusive
![Page 51: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/51.jpg)
Background and Status
• Collaborative project between India and the US
• Lead roles by CMU and IISc
• Initiated by CMU sending scanners free of cost to India. NSF supported
• Initiated by the Office of the Principal Scientific Advisor to GOI by seed funding to IISc
• Fuelled by MCIT’s wholehearted support
• More than 16 centres in academic, religious and government institutions spread across the country
• 69 scanners in place
• China, Egypt (Alexandria Library), Sri Lanka, Australia joining in
• There is light on the other side of the tunnel
![Page 52: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/52.jpg)
Hubs of DL Activities in India
• Anna University, Chennai, Tamil Nadu
• Arulmigu Kalasalingam College of Engineering, Srivilliputur, Madurai, Tamil Nadu
• Goa University, Goa
• Indian Institute of Information Technology, Allahabad, Uttar Pradesh
• International Institute of Information Technology, Hyderabad, Andhra Pradesh
• City and State Central Library, Andhra Pradesh
• Shanmugha Art, Science, Technology & Research Academy, Thanjavore, Tamil Nadu
• Sringeri Mutt, Sringeri, Karnataka
• Tirumala Tirupathi Devasthanams, Tirupathi, Andhra Pradesh
• Maharashtra Industrial Development Corporation, Maharashtra
• University of Pune, Pune
• Kanchi University, Kanchi, Tamil Nadu
• Indian Institute of Astrophysics, Karnataka
![Page 53: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/53.jpg)
Scanner Operation at Hubs
[Bar chart: scanners in operation at each hub; vertical axis 0 to 45]
![Page 54: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/54.jpg)
Progress of Various Centres in Scanning
[Bar chart: No. of Books by Centre]
IISc: 1704
AKCE: 1031
SASTRA: 1097
TTD: 2000
MIDC: 504
PUNE: 465
AU: 273
Kanchi: 158
CCL: 6276
SCL: 3042
![Page 55: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/55.jpg)
Number of Pages Scanned
[Bar chart: No. of Pages by Centre; centres in the same order as the books-per-centre chart]
IISc: 837708
AKCE: 158933
SASTRA: 451452
TTD: 500000
MIDC: 134100
PUNE: 97334
AU: 152502
Kanchi: 39395
CCL: 1319001
SCL: 1080759
![Page 56: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/56.jpg)
Category of Books
[Bar chart: number of books by language]
English: 2962
Telugu: 5596
Tamil: 836
Sanskrit: 430
Kannada: 176
Others: 168
Urdu: 384
![Page 57: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/57.jpg)
Cumulative Status
• Books: 16,550
• Pages: 4,771,184
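The per-centre figures on the two preceding chart slides can be checked against these totals. A minimal sketch, assuming the split digit groups on the "Number of Pages Scanned" slide read as ten values in the order they were extracted (the variable names are illustrative, and no centre-to-value mapping is assumed):

```python
# Per-centre values as extracted from the two chart slides, in extraction order.
books_per_centre = [1704, 1031, 1097, 2000, 504, 465, 273, 158, 6276, 3042]
pages_per_centre = [837708, 158933, 451452, 500000, 134100,
                    97334, 152502, 39395, 1319001, 1080759]

total_books = sum(books_per_centre)
total_pages = sum(pages_per_centre)

# Average book length implied by the cumulative figures.
avg_pages_per_book = total_pages / total_books

print(total_books, total_pages, round(avg_pages_per_book))
```

Both sums agree with the cumulative figures on this slide, which supports the reading of the garbled chart values, and the implied average is roughly 288 pages per book.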
![Page 58: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/58.jpg)
More Centres and Initiatives: already 61 scanners in operation + 39 in the pipeline
• Rashtrapathi Bhavan
• Punjab Technical University
• IIIT Hyderabad and University of Hyderabad
![Page 59: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/59.jpg)
MCIT’s Initiatives
• Mobile Van with VSAT for the Book Mobile
• ERNET providing connectivity to all centres
• Many Centres supported with funds for computers and for scanning operations
• Total spending from Government support and from Scanning Centre’s resources is ten times more than the Scanning equipment cost and effectively 100 times more
• Support from all quarters of the government, religious leaders, academia and private agencies
• Universal Digital Library of India to be launched
![Page 60: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/60.jpg)
Some Observations and the Road Ahead
• More than 5 million pages have been scanned
• The highest average rate of sustained scanning was about 4,000 pages per day at Hyderabad during February.
• Our goal is to establish best practices to reach 6000 pages a day
• 3 years – 1 M Books
• By 2020 – 20 Million Books, 2 Million Songs, 200,000 Movies
• The most enviable content creation
![Page 61: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/61.jpg)
Road Ahead
• Establishing the Digital Library of India on the same lines as the E-Governance Initiative
• Under the MCIT
• Headquartered in AP
• A think tank for content selection, delivery, technology and policy directions for the country
• Creation of special funds for 4C
![Page 62: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/62.jpg)
Criteria for Selecting Mega Centres (5 of them planned)
• Geographical Distribution
• Availability of contents of interest to larger user base
• Local enthusiasm to support and sustain this activity
• Budget of US$ 200,000 initially and around 0.5 cent per page of output
• One single scanner can produce 2 Million pages a year
• We will have 300 scanners – a Million books a year
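The capacity figures on this slide can be cross-checked with simple arithmetic. A rough sketch, assuming an average of about 288 pages per book (the ratio of the cumulative pages and books reported earlier in the talk; actual book lengths vary) and reading "0.5 cent" as US$0.005:

```python
PAGES_PER_SCANNER_PER_YEAR = 2_000_000   # from the slide
SCANNERS = 300                           # from the slide
COST_PER_PAGE_USD = 0.005                # "0.5 cent per page" read as US$0.005
AVG_PAGES_PER_BOOK = 288                 # assumption: 4,771,184 pages / 16,550 books

# Full-capacity output of the planned scanner fleet.
pages_per_year = PAGES_PER_SCANNER_PER_YEAR * SCANNERS
books_per_year = pages_per_year / AVG_PAGES_PER_BOOK

# What a million books a year would require at the assumed book length.
pages_for_million_books = 1_000_000 * AVG_PAGES_PER_BOOK
scanners_needed = pages_for_million_books / PAGES_PER_SCANNER_PER_YEAR
output_cost_usd = pages_for_million_books * COST_PER_PAGE_USD

print(f"{pages_per_year:,} pages/year at full capacity")
print(f"{books_per_year:,.0f} books/year at full capacity")
print(f"{scanners_needed:.0f} scanners needed for a million books/year")
print(f"US${output_cost_usd:,.0f} per-page output cost for a million books")
```

Under these assumptions, 300 scanners running at the stated rate would exceed a million books a year, so the target leaves headroom of roughly a factor of two for downtime and lower sustained rates.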
![Page 63: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/63.jpg)
Road Ahead
• Mega Content Creation Centres
• New Delhi, Varanasi, Allahabad, Hyderabad, Far East (Tawang or Guwahati), Kolkata and Chennai
• Each Centre having around 40 scanners and 5 mobile scanners
• Content Creation Centres with up to 5 scanners in Gujarat, Rajasthan so as to cover the entire country
• Spearheading Language Technology Initiatives
• Adding voice and video of our heritage
![Page 64: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/64.jpg)
Universal Digital Library
• Goal — To have all public knowledge online, available for free to all, everywhere
• An achievable goal
– There are only some 100,000,000 books in the world
– A few billion dollars could bring these online
• Limitations
– Copyright and licensing issues
– Different language books and character recognition technologies
• We must ensure that English is not necessarily the de facto language
• Universal Library
![Page 65: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/65.jpg)
TECHNOLOGICAL CHALLENGES
• Input (scanning, digitizing, OCR)
• Data representation
– text, notations, images, web pages
• Navigation and Search
• Multilingual Issues
• Output (voice, pictures, virtual reality)
• Synthetic Documents
![Page 66: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/66.jpg)
SEARCH ENGINE of UDL
• Very powerful, lightweight and scalable CMU search engine
• Greenstone
• Both are working and are being evaluated for the choice
• Both have been modified for use as Indian Language search engines: language-independent search
• Future: Semantic web and content-based retrieval – Speech input and speech output
![Page 67: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/67.jpg)
COMPARATIVE ANALYSIS – GREENSTONE Vs UDL SEARCH ENGINES

| Search Engine | Time Taken | Boolean | Proximity | Case | Stemming |
|---|---|---|---|---|---|
| Greenstone | Not depending on the number of hits | OR & NOT; Default: AND | Phrase searching | User can select the option | Stemming allowed |
| UDL | Highly depending on the number of hits | OR; Default: AND | No | No Case Sensitivity | Not available |
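The Boolean and case options compared above can be illustrated with a toy in-memory search. This is only a sketch of default-AND, OR and NOT matching with optional case folding; it is not the Greenstone or UDL implementation, and the function names and sample titles are hypothetical:

```python
def tokenize(text, case_sensitive=False):
    """Split text into a set of words, folding case unless asked not to."""
    words = text.split()
    return set(words) if case_sensitive else {w.lower() for w in words}


def search(docs, terms, mode="AND", exclude=(), case_sensitive=False):
    """Return ids of docs matching all (AND) or any (OR) terms, minus excluded terms."""
    def norm(word):
        return word if case_sensitive else word.lower()

    wanted = [norm(t) for t in terms]
    banned = {norm(t) for t in exclude}
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text, case_sensitive)
        if words & banned:
            continue  # NOT: drop documents containing any excluded term
        matched = [t in words for t in wanted]
        if (mode == "AND" and all(matched)) or (mode == "OR" and any(matched)):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample collection of book titles
docs = {
    1: "History of Tamil Literature",
    2: "Telugu Grammar",
    3: "tamil grammar primer",
}

and_hits = search(docs, ["tamil", "grammar"])              # default AND
or_hits = search(docs, ["tamil", "grammar"], mode="OR")
not_hits = search(docs, ["grammar"], exclude=["telugu"])   # grammar NOT telugu
case_hits = search(docs, ["Tamil"], case_sensitive=True)

print(and_hits, or_hits, not_hits, case_hits)
```

Stemming and phrase (proximity) search are left out to keep the sketch short; both would need more than whole-word set membership.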
![Page 68: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/68.jpg)
Choice of Collection
• Use books from libraries that are beyond copyright
• Administrative metadata from OCLC, ISBN, and other sources
• Dublin Core for Indian Books
• A Copyright Metadata – aggressive attempts to obtain copyright; free copyright from many agencies including GoI
• Source Library Metadata
• Converge towards focussed collection
![Page 69: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/69.jpg)
Funding – Road Ahead
• Funding effort must be an organized activity
• Commercial funding unlikely for “public good” activity
– Must go to governments, NGOs
• World Bank
• Qatar (if CMU deal succeeds)
• Benefits of UDL:
– Digital Opportunity
– Use in distance education
– International involvement – cultural diversity
– Technology dissemination
– Low cost v. conventional libraries
• Funding is tied to Outreach (next slide)
![Page 70: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/70.jpg)
Outreach
• The UDL message must be disseminated
• Present at World Summit (WSIS) in Geneva (12/03)
• Pre-WSIS meeting at CERN (12/03)
• Establish liaison with UN Decade of Literacy (2003-2013)
• Points:
– Terabyte servers
– “Free to read” policy
– Universal Dictionary (applicability to other domains)
![Page 71: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/71.jpg)
Access by Public
• All content free to read, print one page at a time
• Restrictions imposed by donors will be respected
• Categories of use will be recognized, e.g. cannot print entire document
• Buttons, links to fulfillment houses and publishers are allowed- to take in “born Digital” copyrighted material
![Page 72: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/72.jpg)
Partner Relations – Future
• All material scanned or input as part of the UDL will be shared by all partners
• Preference for national umbrella organizations to simplify international partner relations
• Relationships between partners and their national DLs encouraged
• Online communication and collaboration tools needed to facilitate partner questions and interchanges
• Written partnership agreement will be made
![Page 73: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/73.jpg)
Standards
• Published standards within the UDL
• Quality control and testing standard
• Funding to be sought to support standards development
• Logo to be developed (graphic device without words). Must appear on all sites, all pages
• Logo should have a hot link to a gateway site that links all UDL sites
• Local variability in look and feel of sites is permitted so long as the logo is displayed
![Page 74: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/74.jpg)
Scanning/OCR Policy
• We scan what gives greatest impetus to continued funding
• Language: majority of content in English; otherwise no restriction
• Scans will be previewed for minimum quality; OCR will not be corrected unless local site desires
![Page 75: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/75.jpg)
Metadata
• All entries MUST have metadata according to MARC or Dublin Core
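As an illustration of what such a requirement could mean in practice, here is a minimal sketch that checks a record against the fifteen Dublin Core element names. The choice of which elements to treat as mandatory, the helper name and the sample records are assumptions for illustration, not project policy:

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
# Assumption: which elements a scanning centre must always supply.
REQUIRED = {"title", "language", "identifier", "rights"}


def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for key in sorted(set(record) - DUBLIN_CORE_ELEMENTS):
        problems.append(f"unknown element: {key}")
    for key in sorted(REQUIRED):
        if not record.get(key):
            problems.append(f"missing element: {key}")
    return problems


# Hypothetical records
sample = {
    "title": "Example Book",
    "creator": "Unknown",
    "language": "ta",
    "identifier": "example-0001",
    "rights": "Public domain",
}
incomplete = {"title": "Example Book", "pages": 288}

print(check_record(sample))
print(check_record(incomplete))
```

A check of this kind could run at intake so that records without usable metadata are caught before they reach the server.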
![Page 76: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/76.jpg)
Copyright
• Public domain materials: no restrictions, tools for printing entire document provided
• Works of uncertain copyright status:
– Good faith effort to determine status, locate owner
– Scan and index work
– After a waiting period (at least one month), make work viewable
• Archival material (old but unique)
– Allow resolution restriction to avoid devaluation of original
• Out-of-print in-copyright (OPIC)
– Seek blanket permissions from publishers
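The categories above can be read as a small decision rule. The following is a hedged sketch only: the status labels, the function name, the 30-day reading of "at least one month", and the treatment of OPIC works before permission is granted are all assumptions, not stated project policy:

```python
from datetime import date, timedelta

WAITING_PERIOD = timedelta(days=30)  # assumption: "at least one month" taken as 30 days


def access_level(status, scanned_on=None, today=None, permission=False):
    """Map a work's copyright status to a hypothetical access level."""
    today = today or date.today()
    if status == "public_domain":
        return "view and print whole document"
    if status == "uncertain":
        # Scanned and indexed at once; viewable only after the waiting period.
        if scanned_on is not None and today - scanned_on >= WAITING_PERIOD:
            return "view"
        return "index only"
    if status == "archival":
        return "view at restricted resolution"
    if status == "opic":
        # Assumption: not viewable until the publisher's permission is obtained.
        return "view" if permission else "index only"
    raise ValueError(f"unknown status: {status}")


early = access_level("uncertain", scanned_on=date(2003, 6, 1), today=date(2003, 6, 15))
later = access_level("uncertain", scanned_on=date(2003, 6, 1), today=date(2003, 7, 1))
print(early, "|", later)
```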
![Page 77: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/77.jpg)
Possible Intake Model
[Diagram. Box labels as extracted: CMU UL SERVER; INDIA CENTRAL MIRROR SITE; ENGLISH INTAKE, TAMIL INTAKE, GUJARATI INTAKE, HINDI INTAKE and ART INTAKE, shown alongside SCANNING CENTER and LOCAL MATERIALS boxes; CHINESE MIRROR SITE; AUSTRALIAN MIRROR SITE. The boxes are grouped under INDIA and OUTSIDE INDIA; the connections between them are not recoverable from the extracted text.]
![Page 78: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/78.jpg)
The Digital Library a Test Bed for language research
• Rich data in many languages from the Million Books to the Web Project: at least 10,000 books in any language
• Translations in many languages (Gita, NBT, NCERT etc.): an excellent tool for language translation
• Training data for the OCR
• The case insensitive ITRANS standard
![Page 79: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/79.jpg)
The Digital Library a Test Bed for language research
• Rich data makes the creation of OCRs in Indian languages easy: rapid prototyping in Tamil, Kannada and Malayalam
• Speech synthesis and recognition
• Indian Language Search Engines
• Example Based Machine Translation
• Universal Dictionary
![Page 80: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/80.jpg)
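The Universal Dictionary

| Word | English | POS | Pron | Use | Lang |
|---|---|---|---|---|---|
| danúbia | linen tape | | | | HUN |
| danum | water | | | | PMP |
| danun | early | | | | PMP |
| danup | hunger | | | | PMP |
| danup | hunger, starvation | | | | PMP |
| danupan | hungry, starving | | | | PMP |
| daný | existent | | | | SLO |
| daný | existing | | | | SLO |
| daný | given | | | | SLO |
| daný číslom | numerical | | | | SLO |
| daný na pospas | obnoxious | | | | SLO |
| danyag | landscape | n | | | HIL |
| daog | overturn | v | | | CEB |
| daog | prevail | v | | | CEB |
| daogdaog | manhandle | v | | | CEB |
| daong | boat with a covered cabin, ark | | | | TAG |
| daong | bring the ship to shore | | | | TAG |
| daot | harm | v | | | CEB |
| daot | mar | v | | | CEB |
| daotan | bad | adj | | | CEB |
| daotan'g buut | dislike | n | | | CEB |
| daotan'g hitabo | mishap | n | | | CEB |
| daotan'g tinguha | malice | n | | | CEB |
| daotan'g tuyo | malice | n | | | CEB |
| dapa | granary | n | | | CEB |
| dapa | lie flat on stomach or face down | | | | PMP |
| dapa | lie flat on stomach or face down | | | | TAG |
| dapače | on the contrary | adv | | | BOS |
| dapadnúť (na nohy) | to land | | | | SLO |
| d'apaiser | to appease | v | | | FRE |

Language labels on the slide, in order of first appearance of the codes: HUNGARIAN, KAMPAMPANGAN, SLOVAK, HILIGAYNON, CEBUANO, TAGALOG, BOSNIAN, FRENCH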
![Page 81: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/81.jpg)
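Aboutness Hierarchy (Dr Shamos)
[Diagram. Node labels as extracted: Universe, Word, Sentence, Paragraph, Section, Chapter, Collection, Book, Newspaper, Article, Photograph, Object, 3D Artifact, Glyph. Annotations: "KEYWORD SEARCHING OCCURS HERE" and "SUBJECT SEARCHING OCCURS HERE"; the levels they point to are not recoverable from the extracted text.]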
![Page 82: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/82.jpg)
Legal and Business Challenges
• Use of copyrighted material
• Economics (Who pays? Who gets?)
• Privacy
• Reliability of information
• Change in the nature of teaching
• Change in the nature of Information creation and use
![Page 83: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/83.jpg)
Philosophy of Copy Right Laws
• Protect the Inventor so that private investments in R & D would flow
• Disseminate the information so that society grows
• Protect the fair use
• Ensure you get what you paid for
![Page 84: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/84.jpg)
What can be copyrighted ?
• Must be tangible, e.g. a lecture can’t be copyrighted, a transcript of it can
• Work must be original
• Work must be creative - even minimal efforts usually count as creative
![Page 85: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/85.jpg)
Fair use doctrine
Authorizes any person to make fair use of a published or unpublished copyrighted work (including the making of unauthorized copies) in these contexts:
• In connection with criticism of or comment on the work
• In the course of news reporting
• For teaching purposes, or
• As part of scholarship or research activity
![Page 86: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/86.jpg)
Four basic Factors:
1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
2. The nature of the copyrighted work
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. The effect of the use upon the potential market for or value of the copyrighted work
![Page 87: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/87.jpg)
www.library.org principles
1. Scholarly and government information and knowledge is a public good
• that should be available, maintaining the balance of the rights of the individual creator vs. the needs of the public
2. The Library is the intellectual crossroads of the community.
3. Librarians will conceptualize and ensure
• implementation of innovative new systems
• for the creation and dissemination of information for succeeding generations.
![Page 88: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/88.jpg)
“This rule provides that the first sale of a copy of a work to a member of the public ‘exhausts’ the rights holder’s ability to control further distribution of that copy. A library is thus free to lend, or even rent or sell, its copies of books to patrons”
How does this work in the Digital World ?
![Page 89: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/89.jpg)
Music, Movie and Entertainment Industry
• Much larger part of most of the economies
• Large production costs
• Need to protect business interest
• Need for technology to protect
• NAPSTER – peer to peer communication
• DeCSS
• NAPSTER for video ??
• Consumer is different from the creator
![Page 90: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/90.jpg)
New paradigms in the Digital Library
• Should the laws used for protecting commercially attractive enterprise such as patents, music, entertainment be applied to DL
• The dissemination of information creates multiplication unlike in music etc
• Shorter life cycles for the information
![Page 91: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/91.jpg)
Copyright Conflicting requirements
Need to protect the financial interests of creators in order to encourage private investments to the economy
Need to create a framework for every human being to create
The 2nd principle should dominate in DL
The 1st principle should dominate the others
![Page 92: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/92.jpg)
The Concept of FourC
The scientific community is the only one that is creator and consumer of information
It pays for both
The SW Industry had shown the way for freeware
Can we do it in Scholarly communication, text books etc.?
![Page 93: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/93.jpg)
The Concept of FourC
In the 20th Century, in the interest of public good the Governments created BBC, PBS, AIR and also the Public Library System- provided compensation for artists and writers while providing free access to public
Total Global Expenditure in public broadcasting and public libraries exceeds 100 B$
Look at our kings who supported all the poets and scholars
We need to find the 21st Century equivalent of BBC, AIR and PBS.
![Page 94: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/94.jpg)
The Concept of FourC
Learn from NAPSTER- will we have a video equivalent of NAPSTER
It is impossible to police and protect IP Rights at gigabit rate connections
Some countries and WIPO under pressure from lobbying groups form the draconian Copy Right Laws
Remember the FAIR USE Doctrine- and what the creators want- recognition and compensation
![Page 95: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/95.jpg)
The Solution -FourC
Consortium for Compensation of Creative Contents - FourC
Set aside 25% of the current national expenditure on public broadcasting and PLs
Authors are encouraged to put the work on the web after a few years of commercial exploitation; many models are possible, and in return they get tax exemption etc.
India showing the way: IASc and INSA
Books out of print
Titanic effect
Authors can take back the copyright
![Page 96: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/96.jpg)
The Solution -FourC
Authors compensation based on the hits
Future versions of text books may be FAQs and XMLised
Many economic models; can work for Courseware as well
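One of the many possible models, compensation in proportion to hits, can be sketched as a pro-rata split of a fund. The fund size, author names and hit counts below are hypothetical, and the slides do not specify any particular formula:

```python
def allocate_by_hits(fund, hits):
    """Split a fund among authors in proportion to their hit counts."""
    total = sum(hits.values())
    if total == 0:
        # No hits recorded: nobody is paid in this simple model.
        return {author: 0.0 for author in hits}
    return {author: fund * count / total for author, count in hits.items()}


hits = {"author_a": 600, "author_b": 300, "author_c": 100}  # hypothetical counts
shares = allocate_by_hits(10_000, hits)
print(shares)
```

A real scheme would also need to guard against inflated hit counts, which this sketch ignores.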
![Page 97: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/97.jpg)
The Solution -FourC
The changing trend in publications: we want the documents to be readable by machines as well as humans
Born digital documents
Can we compensate those creating contents for the web?
Can we compensate those who create music and movies for the web (really small form factor, small screens)?
![Page 98: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/98.jpg)
• Knowledge multiplies whenever bits are circulated on the web
• Technology has a habit of creating a problem (by knowledge explosion) and spending the rest of its time in trying to solve it- through Digital Library
• The Universal Digital Library with 20 Million Books by 2020 – A year our President dreams India to become a developed nation
• A FourC Policy and a Digital Library Act are on the anvil in India to meet this mission
• If a billion people sneeze- together we can create a Hurricane
• With the technology of the two nations we will convert this hurricane into useful energy and light up the world of knowledge
Conclusion
![Page 99: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/99.jpg)
• If you are creating a digital library, it should be for access by anyone, anytime and from any place
• If Your Digital Library Is For Exclusive Use, Let Us Talk About Weather
• There Is Nothing Called, Your DL, My DL
– It Is Our DL– The Universal Digital Library
![Page 100: Million Books to the Web An Example of Indo-US Collaboration Lessons Learnt & The Road Ahead Prof N. Balakrishnan Indo-US Workshop on Open Digital Libraries.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649e155503460f94afef45/html5/thumbnails/100.jpg)
It happens only in India