19 January 2010 Kaiser: COMS E6125 1
COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2010
19 January 2010 Kaiser: COMS E6125 2
What is this course about?
• Information Management on/for the Web
• General hypertext and markup
• Web protocols and mechanics
• Structuring Web content
• Developing Web applications
Lectures will survey basics
19 January 2010 Kaiser: COMS E6125 3
What is this course NOT about?
• Internet services
• Network security
• Search engines
• User interfaces
• Multimedia
• Mobile computing
• The latest greatest technology from facebook, google, yahoo, <fill in here>
Lectures will not cover advanced topics
19 January 2010 Kaiser: COMS E6125 4
What is this course about?
• Internet services
• Network security
• Search engines
• User interfaces
• Multimedia
• Mobile computing
• The latest greatest technology from facebook, google, yahoo, <fill in here>
Students choose their own advanced topics
19 January 2010 Kaiser: COMS E6125 5
Website
• Course website: http://bank.cs.columbia.edu/classes/cs6125
– Syllabus, lecture slides (ppt and pdf), assignments, everything else you need to know about the course
• We will also use CourseWorks
– Assignment submission, optional discussion board (e.g., to find team members for project)
19 January 2010 Kaiser: COMS E6125 6
Teaching Staff
• Instructor: Prof. Gail Kaiser, [email protected] (note the +6125, it's important!)
• TAs: Mr. Swapneel Sheth, [email protected], and Mr. Suman Srinivasan, [email protected]
• Check website for office hours
19 January 2010 Kaiser: COMS E6125 7
Textbook
• None - some sources will be referenced, but generally you should find your own technical materials
19 January 2010 Kaiser: COMS E6125 8
Course Organization
• First half of class sessions will consist of overview/survey lectures (breadth)
• Second half of class sessions will consist of student presentations
• Students should choose one or more relevant areas of interest for their paper, project and presentation (depth)
19 January 2010 Kaiser: COMS E6125 9
Course Grading/Workload
• 45% individual research paper
• 45% individual or team project
• 10% individual presentation
19 January 2010 Kaiser: COMS E6125 10
First Assignment: Paper Proposal
• Sketch the topic you have in mind
• Include tentative reference list (specific background reading to learn about the topic)
• Long list of suggested topics at http://bank.cs.columbia.edu/classes/cs6125/topics.htm, or invent your own
19 January 2010 Kaiser: COMS E6125 11
First Assignment: “Goal” of Paper
• Do not simply survey some topic
• Compare this to that, argue a position in favor or against something, evaluate something according to some meaningful criteria, etc.
• Explain why your topic is relevant to this course (this may be obvious to you but may not be to me)
19 January 2010 Kaiser: COMS E6125 12
First Assignment: Background Reading
• List some specific materials you intend to read to learn about the topic
– Scholarly papers from conferences or journals
– White papers
– Third-party reviews or commentaries (blogs ok)
– System documentation
– Specifications of "standards" (or proposed standards)
– Not advertising or publicity brochures
– Not Wikipedia
• Should include materials from at least two different points of view (e.g., do not take all your references from the same website)
19 January 2010 Kaiser: COMS E6125 13
First Assignment: Logistics
• Due Monday February 1st by 5pm
• Maximum two pages (not including optional figures and required reference list)
• Submit by posting in Paper Proposals folder on CourseWorks
• Must be in a format I can read, which means pdf, word, html, plain ascii text (with all figures embedded or viewable in a browser without special “plugins”)
19 January 2010 Kaiser: COMS E6125 14
Upcoming Assignments: Paper
• Paper outline due Monday February 15th
• Full paper due Friday March 12th
19 January 2010 Kaiser: COMS E6125 15
Heads Up on Project
• Project Proposal due Monday March 22nd
• Optionally work in teams (see http://bank.cs.columbia.edu/classes/cs6125/team_advice)
• Build a new system or extend an existing system – submit code, demo system
• OR evaluate/compare one or more existing system(s) – submit procedures and findings, demo evaluation harness/process
• You may "continue" your paper topic towards the project, or do something entirely different
19 January 2010 Kaiser: COMS E6125 16
Heads Up on Presentation
• Individual ~10 minute talk in class during one of last few class sessions
• One paragraph proposal, probably due Monday March 29th (subject to change)
• May be based on paper, project, or some other topic (in the case of team members all presenting on the same project, please coordinate to avoid redundancy and discuss your plans with me in advance)
19 January 2010 Kaiser: COMS E6125 17
Today’s Topic
• History of Hypertext and the Web
Warning: The upcoming content is not very technical, intended just to introduce the historical context
19 January 2010 Kaiser: COMS E6125 18
In The Beginning…
• “As We May Think”, by Vannevar Bush, in The Atlantic Monthly, July 1945
• Recommended that scientists work on inventing machines for storing, organizing, retrieving and sharing the increasingly vast amounts of human knowledge
• He targeted physicists and electrical engineers - there were no computer scientists in 1945
19 January 2010 Kaiser: COMS E6125 19
“Memex” Proposal
• MEMEX = MEMory EXtension
• Create and follow “associative trails” (links) and annotations between microfilm documents
• Technically based on “rapid selectors” Bush built in the 1930s to search microfilm
• Conceptually based on human associative memory rather than indexing
19 January 2010 Kaiser: COMS E6125 20
“Memex” Design
19 January 2010 Kaiser: COMS E6125 21
Who was Vannevar Bush?
• MIT Prof, advisor to US President Roosevelt
• Developed devices intended to detect submarines during World War I
• Organized government support of academic research for military applications in World War II
• Called for continued federal support of academic research after the war, e.g., National Science Foundation (NSF)
• Memex (never implemented) was one possible suggestion for what scientists should “do” for the government now that the war was over
19 January 2010 Kaiser: COMS E6125 22
And then came “Hypertext”
• Ted Nelson coined term ~1965
• The prefix hyper- ("over" or "beyond") signifies the overcoming of the linear constraints of written text
19 January 2010 Kaiser: COMS E6125 23
NLS Too Early
• Doug Engelbart at SRI starting ~1962
• Developed oN Line System (NLS) to cross-reference research papers for sharing among geographically distributed researchers
• Invented the computer mouse
• Invented WYSIWYG word processing
• Invented windows-based desktop, including on-line help system
• Invented online teleconferencing
• All publicly demonstrated in 1968
19 January 2010 Kaiser: COMS E6125 24
"I don't know why we call it a mouse. It started that way and
we never changed it."
19 January 2010 Kaiser: COMS E6125 25
Xanadu Too Vaporware
• Ted Nelson defined sophisticated Xanadu all-encompassing hypermedia publishing system in 1967
• Bi-directional links, versioning (no deletion, no broken links)
• Excruciatingly concerned with copyrights, permissions and micropayments for linking and access (“transcopyright”)
• Rails against Web that “trivializes” Nelson’s hypertext model and states “We fight on.”
• Version 1.0 finally released in June 2007 (http://xanarama.net/)
19 January 2010 Kaiser: COMS E6125 26
19 January 2010 Kaiser: COMS E6125 27
19 January 2010 Kaiser: COMS E6125 28
Many Others Followed
• Numerous academic and a few commercial hypertext systems from ~1967 - 1980s
– Brown University Hypertext Editing System
– CMU ZOG
– Xerox PARC NoteCards
– Apple Hypercard
• Used for manuals/handbooks, museum exhibits, education, collaborative work
• First ACM Hypertext Conference in 1987
19 January 2010 Kaiser: COMS E6125 29
Problematic Issues
• Some systems “closed”, with links directly embedded in documents (markup)
• Others “open”, with separate linkbase (database of anchors and resource locators)
• Search (information retrieval)
• Cognitive overhead for authors
• Disorientation for users (“lost in hyperspace”)
• Scaling to large numbers of documents and links, and/or to large numbers of users
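The difference between "closed" and "open" link storage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document names, the `Link` record, and the `LINKBASE` list are hypothetical and do not reproduce any particular historical system.

```python
import re
from dataclasses import dataclass

# "Closed" style: the link lives inside the document text as markup.
EMBEDDED_DOC = 'See the <a href="memex.txt">Memex</a> proposal.'


def embedded_links(text: str) -> list[str]:
    """Return the link targets found in the document's own markup."""
    return re.findall(r'href="([^"]+)"', text)


# "Open" style: the document stays plain (and may be read-only);
# anchors and resource locators are kept in a separate linkbase.
PLAIN_DOC = "See the Memex proposal."


@dataclass(frozen=True)
class Link:
    source_doc: str  # document that contains the anchor
    start: int       # anchor start offset in the document text
    end: int         # anchor end offset (exclusive)
    target: str      # resource locator of the destination


# Hypothetical linkbase with a single entry.
LINKBASE = [Link("intro.txt", 8, 13, "memex.txt")]


def links_from(doc_name: str) -> list[Link]:
    """All links whose anchor sits in the given document."""
    return [link for link in LINKBASE if link.source_doc == doc_name]


def links_to(target: str) -> list[Link]:
    """A separate linkbase can also be queried in the reverse direction."""
    return [link for link in LINKBASE if link.target == target]
```

With embedded links, finding "what points here" means scanning every document; with a linkbase it is a lookup, at the cost of keeping the offsets in step with the documents.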
19 January 2010 Kaiser: COMS E6125 30
Proposed “Dexter” Standard
• Series of academic workshops 1988-1990
• For comparison and interchange
• 3 layers: run-time, storage and within-component (anchors)
• Computed links, multi-headed links, links to links, typed links, links as components
• Extended to support multimedia synchronization, link context
Still didn’t scale
19 January 2010 Kaiser: COMS E6125 31
Open Hypermedia Systems
• Early to Mid-1990s
• Standard “open hypermedia protocol” (link service) supporting client interoperability
• Anchors and links maintained in a linkbase separate from the [read-only] documents
• Typically “wrap” document editors and viewers to define anchors and follow links
• No distinction between authors and users
Still didn’t scale
19 January 2010 Kaiser: COMS E6125 32
And finally … the World Wide Web
• A big step backwards?
– Embedded links (markup), unidirectional, untyped, not application-independent, etc.
– Readers cannot easily be authors, no private or group annotations and links over read-only (to readers) documents
• But it scaled, perhaps because it indeed allowed dangling links (a hypertext no-no)
• And attracted authors and users like no other hypertext system before or since
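A minimal sketch of what "embedded, unidirectional links" and "dangling links" mean in practice, using Python's standard `html.parser`. The `PAGES` dictionary is a hypothetical stand-in for a handful of servers; nothing here is taken from an actual Web implementation.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags embedded in a page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.targets.append(value)


# Hypothetical tiny "web": URL -> page markup.
PAGES = {
    "/index.html": '<p><a href="/memex.html">Memex</a> '
                   '<a href="/gone.html">old notes</a></p>',
    "/memex.html": "<p>Associative trails.</p>",
}


def outgoing(url):
    """Links are stored only in the source page, so only it knows them."""
    parser = LinkExtractor()
    parser.feed(PAGES[url])
    return parser.targets


def follow(url):
    """Return the page text, or None when the target is missing (like a 404)."""
    return PAGES.get(url)


def dangling(url):
    """Targets that no longer resolve; discovered only when followed."""
    return [target for target in outgoing(url) if follow(target) is None]
```

Note that `/memex.html` has no record of being linked from `/index.html`, and deleting a page requires no coordination with the pages that point at it. That lack of global consistency checking is the property the slide credits for scaling.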
19 January 2010 Kaiser: COMS E6125 33
Information Management: A Proposal
• By Tim Berners-Lee, then a physicist at CERN (the European particle physics laboratory near Geneva, Switzerland)
• TBL had earlier (~1980) developed another hypertext system for CERN – Enquire - which was little-used and eventually “lost” (manual still available)
• Proposal written March 1989, more widely circulated May 1990
• Originally called “Mesh” or “Information Mesh”
19 January 2010 Kaiser: COMS E6125 34
Information Management: A Proposal
• Persuaded CERN management to fund development of a “global” hypertext system
• Goal to manage information about accelerators and physics experiments as projects evolved and staff turned over
• Development started October 1990, by TBL, Robert Cailliau and some visiting students, by now called “World Wide Web”
19 January 2010 Kaiser: COMS E6125 35
Problem: Information Loss
• CERN involved several thousand people, with very high turnover, organized into a multiply connected "web" whose interconnections evolve
• Information about what physics experiment facilities (including software) existed and how to find out about them traveled informally
• Much information never recorded, or too hard or time-consuming to find
19 January 2010 Kaiser: COMS E6125 36
Examples
• Where is this module used?
• Who wrote this code? Where does he/she work?
• What documents exist about that concept?
• Which laboratories are included in that project?
• Which systems depend on this device?
• What documents refer to this one?
19 January 2010 Kaiser: COMS E6125 37
Predictions
“CERN is a model in miniature of the rest of world in a few years time. CERN meets now some problems which the rest of the world will have to face soon. In 10 years, there may be many commercial solutions to the problems above, while today we need something to allow us to continue.”
19 January 2010 Kaiser: COMS E6125 38
Solution: Linked Information
• Pool of information that could grow and evolve with the organization and the projects it describes
• "web" of nodes with links between them is far more useful than a fixed hierarchical system
19 January 2010 Kaiser: COMS E6125 39
Example Nodes
• People
• Groups of people
• Software modules
• Projects
• Concepts
• Documents
• Types of hardware
• Specific hardware objects
19 January 2010 Kaiser: COMS E6125 40
Example Links
• A depends on B
• A is part of B
• A made B
• A refers to B
• A uses B
• A is an example of B
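The node and link types above amount to a small typed graph. The following sketch, with a hypothetical `InfoWeb` class and invented node names, shows how questions from the earlier "Examples" slide become simple lookups over typed links. It illustrates the idea only and is not the proposal's design.

```python
class InfoWeb:
    """A web of typed nodes joined by typed, directed links."""

    def __init__(self):
        self.node_type = {}  # node name -> kind, e.g. "person"
        self.links = []      # (source, link_type, target) triples

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_link(self, source, link_type, target):
        self.links.append((source, link_type, target))

    def targets(self, source, link_type):
        """Follow links of one type forwards from a node."""
        return [t for s, lt, t in self.links
                if s == source and lt == link_type]

    def sources(self, link_type, target):
        """Follow links of one type backwards to a node."""
        return [s for s, lt, t in self.links
                if lt == link_type and t == target]


# Hypothetical example data.
web = InfoWeb()
web.add_node("alice", "person")
web.add_node("tracker", "software module")
web.add_node("magnet-7", "specific hardware object")
web.add_node("design-note", "document")
web.add_link("alice", "made", "tracker")
web.add_link("tracker", "depends on", "magnet-7")
web.add_link("design-note", "refers to", "tracker")

who_wrote = web.sources("made", "tracker")            # Who wrote this code?
dependents = web.sources("depends on", "magnet-7")    # Which systems depend on this device?
referring = web.sources("refers to", "tracker")       # What documents refer to this one?
```

Several of the questions need the link followed backwards, which is easy when links are first-class records and harder when they exist only as markup inside the source document.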
19 January 2010 Kaiser: COMS E6125 41
General Requirements
• System must allow any sort of information to be entered
• Another user must be able to find the information, sometimes without knowing what he/she is looking for
• System should be aware of the generic types of the links between items (e.g., dependencies), and the types of nodes (people, things, documents…) without imposing any limitations
19 January 2010 Kaiser: COMS E6125 42
System Requirements
• Remote access across networks
• Platform heterogeneity
• Non-Centralization - allow existing systems to be linked together without requiring any central control or coordination
• Access to existing data and databases in hypertext form
• Private links - one must be able to add one's own private links to and from public information, and also annotate links as well as nodes privately
19 January 2010 Kaiser: COMS E6125 43
Bells and Whistles: Graphics
“Storage of ASCII text, and display on 24x80 screens, is in the short term sufficient, and essential. Addition of graphics would be an optional extra with very much less penetration for the moment.”
19 January 2010 Kaiser: COMS E6125 44
Bells and Whistles: Automatic Data Analysis
• Search for anomalies such as undocumented software or divisions which contain no people
• Generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes
• Look at the topology of an organization or a project, and draw conclusions about how it should be managed, and how it could evolve
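Once nodes and links carry types, the first two analyses above reduce to set operations. The sketch below assumes a hypothetical data set (`NODES`, `LINKS`) and assumes that "documented" means "some document refers to it"; the proposal does not spell out either.

```python
# Hypothetical typed nodes: name -> kind.
NODES = {
    "fitter": "software",
    "plotter": "software",
    "fitter-manual": "document",
    "div-A": "division",
    "div-B": "division",
    "ana": "person",
    "ben": "person",
}

# Hypothetical typed links: (source, link type, target).
LINKS = [
    ("fitter-manual", "refers to", "fitter"),
    ("ana", "is part of", "div-A"),
    ("ben", "is part of", "div-A"),
    ("ana", "uses", "fitter"),
    ("ben", "uses", "plotter"),
]


def nodes_of(kind):
    """All node names of one kind, in sorted order."""
    return sorted(name for name, k in NODES.items() if k == kind)


def undocumented_software():
    """Software that no document refers to (assumed meaning of 'undocumented')."""
    documented = {t for s, lt, t in LINKS
                  if lt == "refers to" and NODES[s] == "document"}
    return [n for n in nodes_of("software") if n not in documented]


def empty_divisions():
    """Divisions that no person is part of."""
    populated = {t for s, lt, t in LINKS
                 if lt == "is part of" and NODES[s] == "person"}
    return [d for d in nodes_of("division") if d not in populated]


def mailing_list(module):
    """People to inform when a module changes: everyone who uses it."""
    return sorted(s for s, lt, t in LINKS
                  if lt == "uses" and t == module and NODES[s] == "person")
```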
19 January 2010 Kaiser: COMS E6125 45
Bells and Whistles: Visualization
• “Imagine making a large three-dimensional model, with people represented by little spheres, and strings between people who have something in common at work. Now imagine picking up the structure and shaking it, until you make some sense of the tangle: perhaps, you see tightly knit groups in some places, and in some places weak areas of communication spanned by only a few people. Perhaps a linked information system will allow us to see the real structure of the organisation in which we work.”
19 January 2010 Kaiser: COMS E6125 46
Bells and Whistles: Live Links
• Allow documents to be linked into "live" data so that every time the link is followed, the information is retrieved
• The data to which a link (or a hot spot) refers may be very static, or it may be temporary
• [If one sacrifices portability], make following a link fire up a special application, so that diagnostic programs, for example, could be linked directly into the maintenance guide
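One way to picture a "live" link is a link whose target is computed each time it is followed rather than stored once. The sketch below is an assumption-laden illustration: the `Link` class and the simulated temperature readings are hypothetical.

```python
from typing import Callable, Union


class Link:
    """A link whose target is either stored text or a function that fetches it."""

    def __init__(self, label: str, target: Union[str, Callable[[], str]]):
        self.label = label
        self.target = target

    def follow(self) -> str:
        # A live link re-runs its retrieval function on every traversal;
        # a static link just returns the stored content.
        if callable(self.target):
            return self.target()
        return self.target


# Simulated data source standing in for a diagnostic program.
readings = iter(["12.1 K", "12.4 K"])

static_link = Link("specification", "Operating temperature: below 15 K")
live_link = Link("current temperature", lambda: next(readings))
```

Following `static_link` twice returns the same text both times, while each call to `live_link.follow()` retrieves the next reading from the simulated source.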
19 January 2010 Kaiser: COMS E6125 47
Non-Requirements
“Discussions on Hypertext have sometimes tackled the problem of copyright enforcement and data security. These are of secondary importance at CERN, where information exchange is still more important than secrecy. Authorisation and accounting systems for hypertext could conceivably be designed which are very sophisticated, but they are not proposed here. In cases where reference must be made to data which is in fact protected, existing file protection systems should be sufficient.”
19 January 2010 Kaiser: COMS E6125 48
Specific Applications
• Development Project Documentation
• Document Retrieval
• Personal Skills Inventory
19 January 2010 Kaiser: COMS E6125 49
Original Vision (CERN‘89)
19 January 2010 Kaiser: COMS E6125 50
Client/Server Architecture
19 January 2010 Kaiser: COMS E6125 51
Gateways to Existing Data
19 January 2010 Kaiser: COMS E6125 52
Implementation
• Originally combination browser and editor only on NeXT cubes
• Later line-mode browser, GUI browsers for X and Mac
• First web server was nxoc01.cern.ch, later called info.cern.ch
19 January 2010 Kaiser: COMS E6125 53
19 January 2010 Kaiser: COMS E6125 54
Deployment
• Line-mode browser released for use outside CERN in August 1991
• Submission to 1991 ACM Hypertext conference rejected
• Various GUI browsers released in 1992
• Mosaic released by NCSA in September 1993, developed by undergraduate Marc Andreessen (who later founded Netscape)
Load on info.cern.ch
19 January 2010 Kaiser: COMS E6125 56
But the Internet existed before the Web
• Internet != Web
• 1962 military packet switching network invented (on paper)
• 1969 ARPANET comes on line with 4 nodes
• 1976-1983 UUCP, BITNET, CSNET, etc.
• 1985 Merged Internet with 2k nodes
• 1988 56k nodes, 1992 1.1M nodes, 1996 15M nodes, …
19 January 2010 Kaiser: COMS E6125 57
Internet Information Access
• Lots of anonymous ftp resources available by mid 1970s, but had to know where to look
• 1989 McGill’s Archie (ARCHIvE) finds files by name using regular expressions
• 1990 Thinking Machines’ WAIS (Wide Area Information Servers) adds content indexing
• 1991 U. Minn.’s Gopher (“go for” and school mascot) adds friendly menu-based UI, augmented by U. Nevada’s VERONICA spider indexing
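The step from name search to content indexing can be sketched as follows. This is not how Archie or WAIS were implemented; the file listing is hypothetical, and the inverted index is a generic technique used here only to contrast the two kinds of lookup.

```python
import re
from collections import defaultdict

# Hypothetical archive listing: path -> file contents.
FILES = {
    "pub/gnu/emacs-18.59.tar": "editor source code",
    "pub/docs/hypertext-faq.txt": "questions about hypertext systems",
    "pub/docs/readme.txt": "mirror of hypertext papers",
}


def search_by_name(pattern):
    """Name search: match a regular expression against file paths only."""
    rx = re.compile(pattern)
    return sorted(path for path in FILES if rx.search(path))


def build_content_index(files):
    """Content indexing: map each word to the set of files containing it."""
    index = defaultdict(set)
    for path, text in files.items():
        for word in text.lower().split():
            index[word].add(path)
    return index


def search_by_content(index, word):
    """Look a single word up in the inverted index."""
    return sorted(index.get(word.lower(), ()))
```

Searching names for `hypertext` finds only the FAQ, while the content index also finds `readme.txt`, whose name gives no hint of its subject.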
19 January 2010 Kaiser: COMS E6125 58
Money and Politics
• University of Minnesota announced that they would begin to charge licensing fees for Gopher's use in February 1993
• US government’s Acceptable Use Policy previously prohibiting commercial use of the Internet “re-interpreted” in March 1993
• CERN's directors announce in April 1993 that WWW technology would be freely usable by anyone, with no fees payable to CERN
19 January 2010 Kaiser: COMS E6125 59
What happened to Tim Berners-Lee?
• Left CERN in 1994 for MIT to become the Director of the new World Wide Web Consortium (W3C)
• Technically a research staff member, not an MIT professor (only has a BA, in Physics)
• Knighted by Queen Elizabeth II in 2004, numerous other honors and awards
• Never got rich…
19 January 2010 Kaiser: COMS E6125 60
Summary
• Many people trace the Web’s origins to Vannevar Bush, although there were other early attempts to introduce something hypertext-like over microfiche and/or paper documents
• TBL’s World Wide Web “succeeded” whereas numerous earlier and contemporary hypertext systems “failed” because it was simple and scalable without trying to be perfect
• Many fancy ideas from other hypertext work are being re-introduced on top of Web (Web 2.0)
19 January 2010 Kaiser: COMS E6125 61
Reminders
• Paper proposal due February 1st
• Project proposal due March 22nd
• Paper must be individual, projects may optionally be done in teams
19 January 2010 Kaiser: COMS E6125 62
COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2010