1 Dr Alexiei Dingli Introduction to Web Science Web 1.0.
Date posted: 19-Dec-2015
1
Dr Alexiei Dingli
Introduction to Web Science
Web 1.0
2
• Packet switching network
• IP Addressing
• Internet Applications
• The WWW and markup
• Searching the WWW
• Intelligent Agents
• Internet Governance
Introducing Web 1.0
3
• Local area network (LAN)
– Network of computers located close together
• Wide area networks (WANs)
– Networks of computers connected over greater distances
• Circuit
– Combination of telephone lines and closed switches that connect them to each other
Packet-Switched Networks (1)
4
• Circuit switching is used in telephone communication
• The Internet uses packet switching
• Packet switching needs computers called ‘routers’, which run programs that implement ‘routing algorithms’
Packet-Switched Networks (2)
5
Packet-Switched Networks (3)
• Information is divided into packets
• It is passed from node to node
• It is recomposed as one chunk on the destination server
6
7
• Routing computers
– Computers that decide how best to forward packets
• Routing algorithms
– Rules contained in programs on router computers that determine the best path on which to send packets
– Programs apply their routing algorithms to information they have stored in routing tables
Routing Packets
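The slide does not name a particular routing algorithm. As an illustration only, the sketch below uses Dijkstra's shortest-path algorithm, one well-known way to pick a best path. The four-router network, its link costs and the function name `shortest_path` are assumptions made up for the example.

```python
import heapq

def shortest_path(links, source, target):
    """Dijkstra's algorithm over a dict {node: {neighbour: cost}}."""
    queue = [(0, source, [source])]   # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in links.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None   # no route to the target

# Hypothetical routing information: cost of each direct link between routers.
links = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
cost, path = shortest_path(links, "A", "D")
```

Here the cheapest route from A to D goes through B and C, even though shorter-looking routes with fewer hops exist.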
8
• Communications protocol suite
– Packet switched protocol
  • No end-to-end connection is required
  • Each message broken down into small pieces called packets
  • Packets possibly routed to destination over different paths
– Transmission Control Protocol (TCP)
  • Breaks messages into packets
  • Numbers packets in order
  • Reorders packets at the destination
– Internet Protocol (IP)
  • Routes packets to the proper destination
TCP/IP
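The three TCP steps listed above (break into packets, number them, reorder at the destination) can be sketched in a few lines. This is a toy model, not real TCP: the function names, the 8-byte packet size and the shuffle that stands in for out-of-order arrival are all assumptions for illustration.

```python
import random

def to_packets(message: bytes, size: int = 8) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join the payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Packets may take different paths to the destination."
packets = to_packets(message)
random.shuffle(packets)           # simulate packets arriving out of order
restored = reassemble(packets)    # the sequence numbers restore the order
```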
9
Open Systems Interconnections Model
OSI model layers (from the highest to the lowest), with example protocols from the TCP/IP suite:
7 Application    HTTP, SMTP, FTP, Telnet, SSH, Whois, etc. (these span layers 5 to 7)
6 Presentation
5 Session
4 Transport      TCP, UDP
3 Network        IP
2 Data Link      Ethernet
1 Physical       Wire, Radio, Fibre Optic
10
• Internet addresses are based on a 32-bit number called an IP address
• IP addresses are written as four decimal numbers (each from 0 to 255) separated by periods
• An address such as 126.204.89.56 uniquely identifies a computer connected to the Internet
• IP Subnetting conceptually divides a large network into smaller sub-networks
IP Address
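To make the "32-bit number" concrete, the sketch below converts the example address from the slide into its integer form using Python's standard `ipaddress` module, then rebuilds the same integer by hand from the four octets. The variable names are illustrative.

```python
import ipaddress

address = ipaddress.IPv4Address("126.204.89.56")
as_integer = int(address)                       # the underlying 32-bit number

# Rebuild the integer by hand: each of the four octets contributes 8 bits.
octets = [int(part) for part in str(address).split(".")]
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
```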
11
IP Classes (1)
12
IP Classes (2)
Class    Leading Value (bits)  Network Numbers  Addresses Per Network
Class A  0                     126              16,777,214
Class B  10                    16,384           65,534
Class C  110                   2,097,152        254
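The figures in the table follow from how the 32 bits are split in each class. The sketch below recomputes them, assuming the classic classful layout (7, 14 and 21 network bits after the leading bits; 24, 16 and 8 host bits). The names `CLASSES`, `class_sizes` and `address_class` are made up for the example.

```python
# class name -> (leading bits, network bits after the leading bits, host bits)
CLASSES = {"A": ("0", 7, 24), "B": ("10", 14, 16), "C": ("110", 21, 8)}

def class_sizes(name):
    """Return (number of networks, usable addresses per network)."""
    _, network_bits, host_bits = CLASSES[name]
    networks = 2 ** network_bits
    if name == "A":
        networks -= 2             # networks 0 and 127 are reserved
    hosts = 2 ** host_bits - 2    # all-zeros and all-ones host parts are reserved
    return networks, hosts

def address_class(address: str) -> str:
    """Classify a dotted address by the leading bits of its first octet."""
    bits = format(int(address.split(".")[0]), "08b")
    for name, (prefix, _, _) in CLASSES.items():
        if bits.startswith(prefix):
            return name
    return "D/E"                  # multicast and experimental ranges
```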
13
Subnetting
14
• Explosion in size of IP routing tables.
• Every time more address space was needed, the administrator would have to apply for a new block of addresses.
• Any changes to the internal structure of a company's network would potentially affect devices and sites outside the organization.
• Keeping track of all those different Class C networks would be a bit of a headache in its own right.
Without subnetting …
15
• Better Match to Physical Network Structure
• Flexibility
• Invisibility To Public Internet
• No Need To Request New IP Addresses
• No Routing Table Entry Proliferation
Benefits of Subnetting
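As a small illustration of subnetting, the sketch below splits one /24 network into four smaller subnets with the standard `ipaddress` module. The private block 192.168.10.0/24 is an arbitrary example, not an address taken from the slides.

```python
import ipaddress

network = ipaddress.ip_network("192.168.10.0/24")    # example block (assumed)

# Borrow 2 bits from the host part: 2**2 = 4 subnets, each a /26.
subnets = list(network.subnets(prefixlen_diff=2))

# Each subnet loses two addresses (network and broadcast) to reserved uses.
usable_hosts = [subnet.num_addresses - 2 for subnet in subnets]
```

From the outside, routers still need only the single /24 entry; the four /26 subnets are visible only inside the organization.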
16
• Network Layer
• Developed in 1994
• Will replace the IPv4 standard
– limits on network addresses will eventually lead to exhaustion of available addresses (by 2023)
– supports only 4,294,967,296 addresses (32 bits)
• Improvements include
– providing future cell phones and mobile devices their own unique & permanent addresses
– supports about 3.4 × 10^38 addresses (128 bits)
IPv6 (or IP Next Generation)
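The two address counts quoted above are simply powers of two, which a short check makes explicit. The variable names are illustrative.

```python
ipv4_total = 2 ** 32     # 32-bit addresses
ipv6_total = 2 ** 128    # 128-bit addresses

# How many IPv6 addresses exist for every single IPv4 address: 2**96.
ratio = ipv6_total // ipv4_total
```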
17
• A Uniform Resource Locator (URL) consists of names and abbreviations that are much easier to remember than IP addresses
• The HTTP protocol defines how an Internet resource is accessed
• An address such as www.microsoft.com is called a domain name
• Domain Name System (DNS)
– A database of Internet names
– DNS Servers convert Internet names to IP addresses
– Top level domains
Domain Names
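The sketch below shows the two steps described above: pulling the domain name out of a URL, then converting the name to an IP address. To stay runnable offline it uses a toy lookup table instead of a real DNS server. The table `TOY_DNS`, the function `resolve`, the example.com names and the addresses (taken from a range reserved for documentation) are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical name-to-address table; real DNS servers hold these records.
TOY_DNS = {"www.example.com": "192.0.2.10", "mail.example.com": "192.0.2.25"}

def resolve(url: str) -> str:
    """Extract the host name from a URL and look it up in the toy table."""
    host = urlsplit(url).hostname
    if host not in TOY_DNS:
        raise LookupError(f"no record for {host}")
    return TOY_DNS[host]

parts = urlsplit("http://www.example.com/index.html")
top_level_domain = parts.hostname.rsplit(".", 1)[-1]   # the part after the last dot
```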
18
Top-Level Domain Names
• Internet Corporation for Assigned Names and Numbers (ICANN)
– Responsible for managing domain names and coordinating them with IP address registrars
19
• The web was not an ‘open’ place
• Only one company, Network Solutions, could sell you a .com, .net or .org domain
• The price was 100 dollars, with a two-year minimum
• Back then, there was a big chance you would be able to buy a dictionary word as .com
• In 2000, Network Solutions lost its monopoly position and domain prices dropped over 95%
• Since then innovation halted and Network Solutions became one of thousands of anonymous domain registrars
Domain Name case study
20
• File transfers
• Instant messaging (IM)
• Newsgroups
• Streaming audio and video
• Internet telephony
• World Wide Web (WWW)
Internet Applications
21
• Most popular and widely used Internet application
• 30 billion e-mails sent every day
– Spam – junk e-mail messages
– Spam costs corporate America $9 billion per year
• Every e-mail message contains a header that describes the source and destination of the message
• E-mail messages are text, but may have attachments of many types of digital data
– Viruses often transmitted via e-mail
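The header and attachment structure described above can be seen by building a message with Python's standard `email` package. The addresses, subject and attachment bytes below are placeholders invented for the example.

```python
from email.message import EmailMessage

message = EmailMessage()
message["From"] = "alice@example.com"      # hypothetical sender
message["To"] = "bob@example.com"          # hypothetical recipient
message["Subject"] = "Lecture notes"
message.set_content("The notes are attached.")

# Attach some binary data; the message becomes a multipart message.
message.add_attachment(b"%PDF-1.4 placeholder bytes",
                       maintype="application", subtype="pdf",
                       filename="notes.pdf")

raw_text = message.as_string()   # headers first, then the encoded parts
attachments = list(message.iter_attachments())
```

Printing `raw_text` shows that even the binary attachment travels as encoded text under the header lines.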
22
• E-mail sent across the Internet is managed and stored by mail servers
• Simple Mail Transfer Protocol (SMTP) is the standard for sending mail to the server
• Post Office Protocol (POP) is the standard for retrieving mail from the server
• The Internet Message Access Protocol (IMAP), originally the Interactive Mail Access Protocol, is a newer e-mail protocol
SMTP, POP, and IMAP (1)
23
SMTP, POP, and IMAP (2)
24
• Use complex email addresses rather than a name and surname combination
– Why? Bots? Name directories?
• Control exposure of email address
– How? JavaScript? JPEG?
• Use multiple email addresses for different purposes
– On what occasions?
• Use content-filtering software
– black list spam filter
– white list spam filter
– challenge response using graphical challenges?
Controlling Spam
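The three filtering approaches in the last bullet can be combined in one decision function. This is a minimal sketch under assumed rules: trusted senders pass, known spammers are rejected, and everyone else is asked to solve a challenge. The lists, addresses and the function name `classify` are hypothetical.

```python
BLACKLIST = {"spammer@example.net"}      # hypothetical known-bad senders
WHITELIST = {"colleague@example.org"}    # hypothetical trusted senders

def classify(sender: str) -> str:
    """Decide what to do with a message based only on its sender."""
    sender = sender.strip().lower()
    if sender in WHITELIST:
        return "accept"
    if sender in BLACKLIST:
        return "reject"
    return "challenge"   # unknown sender: ask them to solve a challenge first
```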
25
• Hotmail (1995)
• First place to get a free email address, disconnected from an ISP
• 4 years later, 30 million people worldwide were exchanging @hotmail email addresses
• Bought by Microsoft in 1998 for just 400 million dollars
• 2007 the end of Hotmail
– transformation to “Live” mail to become an integrated part of Microsoft’s “Live” family
E-Mail Case Study
26
• File transfer protocol (FTP)
– Protocol providing for transmission of a file between an Internet server and a user’s computer
• Peer-to-peer (P2P) file sharing
– Share data from one computer to another
– Every user can be a server
– Napster
  • Kazaa
  • Gnutella
  • Torrent
– With P2P, every user on the network can make data available to every other user on the network
File Transfers
27
• Allows user to create a private chat session with another user
• IM started with AOL
• IM sneaking into corporate networks
• Many Web-based companies use IM technology for customer service
– eBay
Instant Messaging
28
29
• ICQ is a play on “I seek you”
• 1996: the first easy-to-use instant messenger program where you could add friends to your list and see if they were online
• Back then it was revolutionary for the masses and it became the ‘application’ everybody had installed
• Acquired by AOL in June 1998 for a whopping $287 million
• Eventually the program gained too many additional features, which made the application heavy and disorganized
• Competition from AOL IM, Yahoo IM, and MSN Messenger increased, and friends on your ICQ list left the application, eventually resulting in a mass abandonment of the network
ICQ case study
30
• Online, bulletin board discussion forums
• Users post and read messages
• More than 100,000 newsgroups
• Millions of newsgroup readers
• Important information resource, especially for technical issues and products
• Newsgroup messages are distributed using an open standard
– Many are uncensored
Usenet Newsgroups
31
• Creating and sending audio and video files
– Sports
  • Basketball at sports.yahoo.com
  • Major league baseball
– News
  • Fox News
  • CNN radio
– Business
  • ZDNet
– Education
  • Warriors of the Net
Streaming Audio and Video
32
• Voice-over Internet Protocol (VoIP)
• Use your computer like a telephone
• Software connects computers via the Internet and transmits voice data
• Savings come from eliminating toll charges between locations
Internet Telephony
33
Internet TV
34
• Collection of hyperlinked computer files on the Internet
• Client-server application
– Web servers
– Web browsers as clients
• WWW standards
– Hypertext markup language (HTML)
  • Current standard for writing Web pages
  • Tags in HTML instruct the client browser how to format and display the Web page content
– Hypertext transfer protocol (HTTP)
  • Establishes a connection between Web server and client
– Extensible markup language (XML)
  • A meta-markup language
  • Gives meaning to the data enclosed within XML tags
The World Wide Web
35
• Create your own free homepage on the web
• 1997: fifth most popular website, with over 500,000 homepages created
• Yahoo bought GeoCities two years later for $3.57 billion and started to actively commercialize the homepages with various types of advertising, which proved to be their death sentence
• As ‘real’ web hosting became affordable for anybody, the need for free homepages in this form vanished
Website case study
36
• SGML is a rich meta language that is useful for defining markup languages
• HTML is particularly useful for displaying Web pages
• XML defines data structures for electronic commerce (and much more …)
Overview of Markup Languages
37
Development of Markup Languages
http://www.w3.org/
38
• The ISO adopted SGML standard in 1986
• SGML is nonproprietary and platform-independent
• SGML supports user-defined tags and architecture to complement the required richness of documents
Standard Generalized Markup Language
39
• XML is a descendant of SGML
• XML allows designers to easily describe and deliver structured data from any application in a standard, consistent way
• XML can be embedded within an HTML document
• XML allows you to create your own customized markup language.
Extensible Markup Language
40
• Tag – a piece of markup
– An opening tag <name>
– A closing tag </name>
• Element – well formed usage of tags
– <name>Alexiei</name>
• Attribute – properties
– <name length="7">Alexiei</name>
• Rules to keep XML well formed
1. Can be nested but not overlapping
2. Case sensitivity
3. Quoted attributes
4. Required end tag
• Short hand
– <abc></abc> is equivalent to <abc/>
Learn XML in a slide
41
<book>E-Commerce</booK>
<book pages=100>E-Commerce</book>
<book pages="100"><title>E-Commerce</book></title>
<book pages="100"><title>E-Commerce</title></book>
<book pages="100"><title>E-Commerce</title><author>
<name>Gary</name><surname>Schneider</surname>
</author></book>
Some XML examples
42
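<book>E-Commerce</booK>
– Not well formed: tag names are case sensitive, so </booK> does not close <book>
<book pages=100>E-Commerce</book>
– Not well formed: the attribute value must be quoted
<book pages="100"><title>E-Commerce</book></title>
– Not well formed: the tags overlap instead of nesting
<book pages="100"><title>E-Commerce</title></book>
– Well formed
<book pages="100"><title>E-Commerce</title><author>
<name>Gary</name><surname>Schneider</surname>
</author></book>
– Well formed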
Some XML examples
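The well-formedness rules can be checked mechanically. The sketch below feeds the five examples to Python's standard XML parser, which raises `ParseError` on anything that is not well formed. The helper name `is_well_formed` is made up for the example, and the fragments use straight quotes, as XML requires.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

examples = [
    '<book>E-Commerce</booK>',                              # case mismatch
    '<book pages=100>E-Commerce</book>',                    # unquoted attribute
    '<book pages="100"><title>E-Commerce</book></title>',   # overlapping tags
    '<book pages="100"><title>E-Commerce</title></book>',
    '<book pages="100"><title>E-Commerce</title><author>'
    '<name>Gary</name><surname>Schneider</surname></author></book>',
]
results = [is_well_formed(example) for example in examples]
```

Only the last two examples parse; the first three each break one of the rules from the previous slide.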
43
Processing a Request for an XML Page
• Why go through all this hassle?
• How would you go about displaying HTML on a
– PC
– Handheld
– Mobile
44
• Tim Berners-Lee invented HTML
• HTML is a document production language that includes a set of tags that define the format and style of a document
• HTML is based on SGML
• HTML is an instance of one particular SGML document type – Document Type Definition (DTD)
Hypertext Markup Language
45
• An HTML document contains both document content and tags
• The tags are the HTML codes inserted in a document to specify the format on screen
• Each tag is enclosed in angle brackets (< >)
• Most tags are two-sided – opening and closing tags
• Well-formed tags, bots, meta tags: why are they important?
HTML Tags
46
• Hyperlinks are bits of text that connect the current document to:
– Another location in the same document
– Another document on the same host machine
– Another document on the Internet
– Can they link to a toaster at home?
• Hyperlinks are created using the HTML anchor tag
• Two popular link structures:
– Linear hyperlink structure
– Hierarchical hyperlink structure
HTML Links
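Since hyperlinks are created with the anchor tag, a program can list a page's links by looking for `<a>` tags and reading their `href` attribute. The sketch below does this with Python's standard `html.parser`. The class name `LinkCollector` and the sample page are invented for the example, with one link for each of the three destinations named above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """
<html><body>
<a href="#top">Same document</a>
<a href="notes.html">Same host</a>
<a href="http://www.w3.org/">Elsewhere on the Internet</a>
</body></html>
"""
collector = LinkCollector()
collector.feed(page)
```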
47
• HTML version 1.0 was introduced in 1991
• HTML 2.0 was published in Nov. 1995 (RFC 1866)
• HTML 3.2 was introduced in 1997
• HTML 4.0 was released by W3C in Dec 1997
• HTML 4.01 was released in Dec 1999
• XHTML 1.0 became a W3C recommendation in Jan 2000
HTML Version History
48
• Low-end editors display HTML code on the screen and allow you to insert HTML tag pairs by clicking selected buttons
• High-end editors are Web site builder programs; they provide a rich environment that displays the Web page, not the HTML code
• Microsoft FrontPage and Macromedia Dreamweaver are examples of Web site builders
HTML Editors (1)
49
HTML Editors (2)
50
• HTML and XML only display and exchange data
• No interactivity; no processing of data
• Scripting languages
– Provides basic interactivity
  • Rollovers
  • Crawling text
– JavaScript
– VBScript
• Full-featured Web programming
– Java
– Client side scripting or browser side scripting
– Applets
– J2EE
• Common Gateway Interface (CGI)
– Allows passing of data between a static HTML page and a computer program
Static versus Dynamic Pages
51
• Most data on the Internet is part of the WWW
• Search engines – large databases that index WWW content
• Building the search engine database
– Submit a site to the search engine administrator for listing
– Spiders
  • Metatags
– Google
– Yahoo
Searching the WWW
52
• A search engine is a special kind of Web page software that finds other Web pages that match a word or phrase you entered
• A Web directory is a listing of hyperlinks to Web pages that is organized into hierarchical categories, e.g. http://directory.google.com/
• Search engines contain three major parts: spider, index, and utility
Search Engines
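The index part of a search engine can be pictured as an inverted index: a map from each word to the pages that contain it. The sketch below builds one over three made-up pages standing in for what a spider would have fetched, with a small search function playing the role of the utility. All page URLs, texts and function names are assumptions.

```python
from collections import defaultdict

# Hypothetical pages a spider might have fetched (URL -> page text).
pages = {
    "http://example.com/a": "packet switching on the internet",
    "http://example.com/b": "markup languages on the web",
    "http://example.com/c": "internet governance and web standards",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs containing every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(pages)
```

Real engines add ranking on top of this lookup; the sketch only answers which pages match.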
53
Popular Search Engines
54
Spiders and Crawlers
55
Indexing
56
• Search engine AltaVista was the Google of the last millennium
• First real effort to index the World Wide Web
• One of the few search engines that actually came up with good search results
• Had a hard time fighting spam listings in their results
• While spam grew exponentially in AltaVista, a company named Google found a way to prioritize web pages more intelligently, and thus keep spam out better
Search Engine case study
57
• PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value
• Google interprets a link from page A to page B as a vote, by page A, for page B
• But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote
• Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
Case Study: Google’s PageRank
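The voting idea above can be sketched as a simple iterative calculation in which each page repeatedly shares its current score among the pages it links to. This is a simplified textbook version, not Google's production algorithm. The damping factor 0.85, the iteration count, the four-page link graph and the function name `pagerank` are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = list(links)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages   # a page with no links votes for everyone
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # a link is a vote weighted by the voter's rank
        rank = new_rank
    return rank

# Hypothetical link structure: C receives votes from A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
```

Page A ends up ranked above B even though each has a single incoming link, because A's vote comes from the highly ranked page C.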
58
• An intelligent agent is a program that performs functions such as information gathering, information filtering, or mediation, running in the background on behalf of a person or entity
• What agents can you think of?
Intelligent Agents
59
• Search Agents
– Improve your information retrieval on the Internet
– Used to find pages on the Web easily and quickly
  • Meta Agents, Specialised (MP3), etc
• Web Agents
– Improve browsing experience
  • Automate form filling, off-line browsing, etc
• Monitoring Agents
– Monitor web sites or specific themes
– Used to get automatic alerts about the latest news
Intelligent Agents (2)
60
• Virtual Assistants
– Artificial life
– Characters, plants, animals or people living on your desktop
• Shop Bots
– Allow users to compare prices on the Internet
– Find the best price for books, CDs, movies, etc.
• Webmastering Agents
– Make it easy to manage a Web site and make it more effective
– Monitor broken links, content gathering etc.
Intelligent Agents (3)
61
• Other agents …
– Development agents
  • Used to develop other agents
– Games agents
  • Used in games
Intelligent Agents (4)
62
Ms Dewey not your ordinary search agent!
63
• Internet Engineering Task Force (IETF)
– Works in groups to develop standards
• Internet Engineering Steering Group (IESG)
– Approves or disapproves standards developed by the IETF
• Internet Architecture Board (IAB)
– The oversight authority for the standards development process
• World Wide Web Consortium (W3C)
– Promotes the WWW and develops new web technologies and standards
Internet Governance
64
• We’re all very familiar with Web 1.0
• But what makes Web 2.0?
• Next lecture …
Conclusion
65
Questions?