Basic Internet and Networking Concepts Representation and Management of Data on the Internet.
Basic Internet and Networking Concepts
Representation and Management of Data on the Internet
The Internet
A worldwide network connecting millions of hosts
It interconnects many Local Area Networks (LANs), hence the name inter-network, or simply Internet
The LANs connected to the Internet can be of various types
A host is a computer that is connected to the Internet
History of the Internet
It started as a United States government project, sponsored by the Advanced Research Projects Agency (ARPA), and was originally called the ARPANET
The Internet grew quickly throughout the 1980s and 90s
Fewer than 600 computers were connected to the Internet in 1983; now there are over 10 million
The Web
The term World-Wide Web (Web or WWW) refers to pieces of information found on the Internet
These pieces of information can be reached by hosts connected to the Internet
The Web allows many different types of information to be accessed using a common interface (Web browser)
A Web document usually contains links to other Web documents, creating a hypermedia environment
The term Web comes from the fact that information is not organized in a linear fashion
Web Servers
These pieces of information are stored as files on particular hosts of the Internet
These hosts are called Web servers
Information Types on the Web
The pieces of information on the Web can be text, images, video, audio, programs or any other type of information
Each type of information can be stored as a file in several different formats
For example, some formats for storing images are jpeg, bmp, gif, ps, pdf
HTML
Much of the information that is found on the Web is stored as HTML files
HTML is a markup language for formatting text. In addition, HTML facilitates the inclusion of other types of information (such as images) in text documents
Here is an example of an HTML document
This is how it looks when displayed inside a browser
Browsers
We use a browser to display HTML documents
The browser is responsible for fetching the documents and displaying their contents according to the HTML rules
Browsing
HTML documents can also contain links to other HTML documents (or to files of other types, such as images). The user can follow these links (by clicking them) to view other related documents and files
Browsing/surfing refers to the activity of viewing documents on the Internet and following their links
URLs
Each piece of information on the Web has a unique identifying address, called a URL (Uniform Resource Locator)
A URL takes the following form: http://www.huji.ac.il/index.html
It has 3 parts: a protocol field (http), a hostname field (www.huji.ac.il) and a file field (/index.html)
URL Fields
The protocol field ("http" in the previous example) specifies the way in which the information should be accessed
The host field specifies the host on which the information is found
The file field specifies the particular location in the host's file system where the file is found
There could be more complex forms of URLs, but we do not discuss them
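The three fields described above can be pulled out of a simple URL with Python's standard library. This is a minimal sketch; the function name url_fields and the dictionary keys are chosen here for illustration and are not part of the original slides.

```python
from urllib.parse import urlsplit

def url_fields(url):
    """Split a simple URL into the protocol, hostname and file fields."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # how the information should be accessed
        "hostname": parts.hostname,  # the host on which the information is found
        "file": parts.path,          # the location of the file on that host
    }

print(url_fields("http://www.huji.ac.il/index.html"))
```

More complex URLs (with ports, queries or fragments) carry extra parts that this sketch simply ignores.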
Search Engines
What are search engines?
How do they work?
Shortcomings of search engines
Some popular search engines: Infoseek, HotBot, Altavista, Excite, Lycos, Yahoo!, Jeeves,...
HTTP Daemons
The information pieces of the Web are stored as files on Web servers
In order to make these information pieces available to other hosts, each server runs an HTTP-daemon
HTTP Daemons (continued)
An HTTP-daemon is an application that is constantly running on the server and waits for requests from remote hosts
A host can ask the daemon for a document (a file) that is located on the server
Technically, any host connected to the Internet can act as a Web server by running an HTTP-daemon application
Browser - HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends GET /index.html to the HTTPD application on host www.cs.huji.ac.il, which reads index.html from its disk and sends the content of index.html back to the browser]
Browser - HTTPD Interaction
The user requests http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon running on the host www.cs.huji.ac.il and requests the document /index.html
The HTTP-daemon translates the requested name to a specific file in its local file system
The HTTP-daemon reads the file index.html from the disk and sends the contents of the file to the browser
The browser receives the document, parses it according to the HTML rules and displays it
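The daemon's side of these steps can be sketched as follows: translate the requested name to a file under a document root, then read that file and return its contents. The names translate_path, serve and document_root are hypothetical, and a real HTTP-daemon does much more (networking, headers, error responses).

```python
import os
import tempfile

def translate_path(document_root, requested_name):
    """Map a requested name such as /index.html to a file under document_root."""
    relative = os.path.normpath(requested_name.lstrip("/"))
    # Refuse names that would climb out of the document root.
    if relative.startswith("..") or os.path.isabs(relative):
        raise ValueError("request escapes the document root")
    return os.path.join(document_root, relative)

def serve(document_root, request_line):
    """Handle a request line like 'GET /index.html' and return the file contents."""
    method, requested_name = request_line.split()[:2]
    if method != "GET":
        raise ValueError("only GET is handled in this sketch")
    with open(translate_path(document_root, requested_name), "rb") as f:
        return f.read()

# Usage: a throwaway document root holding a single file.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "wb") as f:
        f.write(b"<html><body>Hello</body></html>")
    page = serve(root, "GET /index.html")
    print(page.decode())
```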
IP (Internet-Protocol) Addresses
Hostnames are used by people. The network mechanism uses IP-addresses instead
Every host connected to the Internet has a unique IP address that identifies it
IP addresses are 32-bit numbers that are usually written as four decimal numbers separated by dots, e.g. 135.17.98.240, where the numbers refer to the four bytes composing this address
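The relation between the dotted notation and the underlying 32-bit number can be illustrated with a short sketch. The function names dotted_to_int and int_to_dotted are made up for this example.

```python
def dotted_to_int(address):
    """Pack four decimal numbers (one per byte) into a single 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("expected four numbers in the range 0-255")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift previous bytes left, append this one
    return value

def int_to_dotted(value):
    """Unpack a 32-bit integer back into dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

number = dotted_to_int("135.17.98.240")
print(number, int_to_dotted(number))
```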
IP Packets
Information that is sent over a network is often broken down into parts, called packets, which are sent to the receiving machine and then reassembled
On the Internet, data that is transferred from one host to another is divided into IP-packets
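The idea of breaking data into packets and reassembling it can be sketched in a few lines. This is a toy model only: the packet here is just a sequence number plus a chunk of bytes, whereas real IP-packets carry headers with addresses and other fields. The names to_packets and reassemble are illustrative.

```python
import random

def to_packets(data, size):
    """Break data into numbered packets of at most `size` bytes each."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Rebuild the original data, whatever order the packets arrived in."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Information sent over a network is broken into packets"
packets = to_packets(message, 8)
random.shuffle(packets)  # packets may arrive out of order
assert reassemble(packets) == message
```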
Routing IP Packets
The essential role of the Internet is to enable every host to send IP-packets to any other host
Each IP-packet contains source and target IP-addresses
There is a routing protocol that handles the transfer of packets to their target hosts, according to the target IP addresses
The sending host only needs to know the IP address of the target host it wishes to communicate with
Using IP Addresses
How does the browser know the IP address of the Web server?
One possibility is that the user explicitly specifies the IP address of the server in the host field of the URL, for example:
http://135.17.98.240/index.html
However, it is inconvenient for people to remember such addresses
Internet Addresses
Many hosts have, in addition to an IP address, a human-readable Internet Address (or hostname)
Here are some examples of Internet Addresses:
www.cs.huji.ac.il
www.cocacola.com
www.yellowpages.co.il
www.isdn.net.il
The first part is the name of a particular host (i.e., computer)
The rest is the domain name
Internet Addresses (continued)
Hostnames have a hierarchical structure
In www.cs.huji.ac.il, www is a computer in the Dept. of Computer Science (cs) at the Hebrew University of Jerusalem (huji), which is an academic institution (ac) in Israel (il)
The rightmost name describes the main domain of the host (il - Israel). To its left there is a sub-domain, and further to the left there are more specific sub-domains
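Reading a hostname from right to left, as described above, can be sketched like this. The function name domain_chain is made up for the example.

```python
def domain_chain(hostname):
    """List the domains of a hostname from the most general to the most specific."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for domain in domain_chain("www.cs.huji.ac.il"):
    print(domain)
```

The first entry is the main domain and the last entry is the full hostname of the particular computer.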
Generic Domains
There are 7 special domains that are called generic domains
• com - commercial organizations (www.cocacola.com)
• edu - educational institutions (www.berkeley.edu)
• gov - U.S. governmental organizations (www.cia.gov)
• int - international organizations
• mil - U.S. military
• net - networks (InterNIC)
• org - other organizations (www.w3.org)
Country Domains
Generic domains usually refer to hosts inside the U.S.
Other countries use two-letter country domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains usually have sub-domains that correspond to the generic domains. For example, co.il is the domain of all the commercial organizations in Israel, and ac.il is the domain of all the academic institutions inside Israel
Back to the Browser
When we address a host in the Internet, we usually use its hostname (e.g., using a hostname in a URL)
The browser needs to map this hostname into the corresponding IP address of the given host
There is no one-to-one correspondence between the sections of an IP address and the sections of a hostname
Translating Hostnames to IP Addresses
The translation of hostnames to IP addresses requires a lookup table
Since there are millions of hosts on the Internet, it is not feasible for the browser to hold a table which maps all hostnames to their IP-addresses
Moreover, new hosts are added to the Internet every day and hosts change their names
DNS
The browser (and other Internet applications) uses a DNS server to map hostnames to IP addresses
DNS (Domain Name System) is a hierarchical scheme for naming hosts
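A toy model can show how a hierarchical naming scheme supports lookups without one flat table. Everything in this sketch is an assumption made for illustration: the nested dictionary ROOT stands in for the hierarchy of name servers, the function resolve is hypothetical, and the address is simply the example address used earlier in these slides, not a real DNS record.

```python
# Hypothetical name data: each level either delegates a label to a sub-domain
# (a nested dict) or maps it to an address (a string).
ROOT = {"il": {"ac": {"huji": {"cs": {"www": "135.17.98.240"}}}}}

def resolve(hostname, zone=ROOT):
    """Walk the hierarchy from the rightmost label to the leftmost one."""
    node = zone
    for label in reversed(hostname.split(".")):
        if not isinstance(node, dict) or label not in node:
            raise KeyError(f"unknown host: {hostname}")
        node = node[label]
    if isinstance(node, dict):
        raise KeyError(f"{hostname} names a domain, not a host")
    return node

print(resolve("www.cs.huji.ac.il"))
```

In a real program the lookup itself is delegated to the system's resolver, for example through socket.gethostbyname in Python's standard library.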
Proxy Servers
A proxy server acts as a delegate of browsers for accessing the Web
The browser transfers the requests for a document to the Proxy
The Proxy contacts the suitable Web-server and fetches the document on behalf of the browser
Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy application on the proxy server; the proxy, which keeps a cache, asks the HTTPD for the document and sends the content of index.html back to the browser]
Advantages of Proxy Servers
Proxy servers have several advantages over direct access:
• They can be combined with a firewall to enable restricted access to the Internet
• They enable caching of popular documents
• They can extend the functionality of the browser by translating from one protocol to another (for example, from FTP to HTTP and vice-versa)
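The caching advantage can be sketched with a small class. This is an illustration under stated assumptions, not how any particular proxy is implemented: CachingProxy and fake_fetch are hypothetical names, and the fetch function stands in for the step where the proxy contacts the Web server.

```python
class CachingProxy:
    """Fetch documents on behalf of browsers and remember the ones already seen."""

    def __init__(self, fetch):
        self.fetch = fetch  # function that contacts the Web server
        self.cache = {}     # URL -> document

    def get(self, url):
        if url in self.cache:
            return self.cache[url]   # served from the cache, no server contact
        document = self.fetch(url)   # first request: go to the Web server
        self.cache[url] = document
        return document

calls = []

def fake_fetch(url):
    """Stand-in for a real network fetch; records each server contact."""
    calls.append(url)
    return f"<html>content of {url}</html>"

proxy = CachingProxy(fake_fetch)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")
print(len(calls))  # number of times the Web server was actually contacted
```

A real cache would also need to decide when a stored document is too old to reuse, which this sketch leaves out.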
Firewalls
A firewall imposes restrictions on the traffic into or out of a local-area network
Examples:
• Hides sensitive data from the outside world
• Prevents access of local users to specific sites outside the local-area network
How a Firewall Works
All the traffic (of IP-packets) into or out of the local-area network is forced to go through a single host
A firewall application is installed on this host
The firewall examines all incoming and outgoing IP-packets and discards illegal packets
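A packet filter of this kind can be sketched as a function that looks at the source and target addresses of each packet. The address range of the local network, the sensitive host and the blocked site below are all made-up values for illustration, and the two rules only mirror the two examples given for firewalls; real firewalls apply much richer rule sets.

```python
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("10.0.0.0/8")             # assumed address range of the LAN
SENSITIVE_HOSTS = {ip_address("10.0.0.5")}       # assumed local host to hide
BLOCKED_SITES = {ip_address("144.12.34.99")}     # assumed outside site to block

def is_legal(source, target):
    """Decide whether a packet with these addresses may pass the firewall."""
    src, dst = ip_address(source), ip_address(target)
    if src not in LOCAL_NET and dst in SENSITIVE_HOSTS:
        return False  # outside world may not reach sensitive local data
    if src in LOCAL_NET and dst in BLOCKED_SITES:
        return False  # local users may not reach specific outside sites
    return True

def filter_packets(packets):
    """Keep only the legal packets; the rest are discarded."""
    return [p for p in packets if is_legal(p["source"], p["target"])]

traffic = [
    {"source": "190.30.42.155", "target": "10.0.0.5"},
    {"source": "10.0.0.7", "target": "144.12.34.99"},
    {"source": "10.0.0.7", "target": "135.17.98.240"},
]
print(filter_packets(traffic))
```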
Dynamically Generated Documents
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends GET /search?what=something to the HTTPD application on host www.excite.com, which executes the search program and sends the resulting document back to the browser]
Local-Area Networks
A Local-Area Network (LAN) covers a small distance and a small number of computers
A LAN often connects the machines in a single room or building
LANs (Local-Area Networks)
• Limited size
• Privately owned
• Centrally managed
• Usually hosts physically connected via a cable
• Homogeneous devices & protocols
• Known features (latency, bandwidth,..)
Wide-Area Networks
A Wide-Area Network (WAN) connects two or more LANs, often over long distances
A LAN is usually owned by one organization, but a WAN often connects different groups in different countries
What is a protocol?
06 7647834
Welcome to Mount Hermon ski site. For ski conditions press 1, for reservation of ski packages press 5, ...
5
Please select the type of your credit card. For Visa press 1, ...
TCP/IP
A protocol is a set of rules that determine how things communicate with each other
The software which manages Internet communication follows a suite of protocols called TCP/IP
The Internet Protocol (IP) determines the format of the information as it is transferred
The Transmission Control Protocol (TCP) dictates how messages are reassembled and handles lost information
TCP/IP protocol suite
Layer        Protocols
Application  HTTP, FTP, TELNET,...
Transport    TCP, UDP
Internet     IP
Link         Ethernet, Token-Ring,...
IP Addresses
Class  From       Till             Net ID  Host ID
A      0.0.0.0    127.255.255.255  7 bit   24 bit
B      128.0.0.0  191.255.255.255  14 bit  16 bit
C      192.0.0.0  223.255.255.255  21 bit  8 bit
D      224.0.0.0  239.255.255.255  28 bit  -
E      240.0.0.0  247.255.255.255  27 bit  -
[Figure: a 32 bit address divided into Class, Network ID and Host ID fields; labeled InterNIC]
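The class of an address follows from its first byte, so the table can be turned into a small lookup. This sketch follows the ranges of the table, with class C taken to end at 223.255.255.255, directly below the start of class D. The names CLASS_RANGES and address_class are made up for the example.

```python
# First-byte ranges per class, following the table of address classes.
CLASS_RANGES = [
    ("A", 0, 127),
    ("B", 128, 191),
    ("C", 192, 223),
    ("D", 224, 239),
    ("E", 240, 247),
]

def address_class(address):
    """Return the class letter of a dotted IP address, or None if outside the table."""
    first_byte = int(address.split(".")[0])
    for name, low, high in CLASS_RANGES:
        if low <= first_byte <= high:
            return name
    return None

print(address_class("135.17.98.240"))
```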
Transport Layer
TCP
• Connection oriented
• Reliable, keeps order
UDP
• Connectionless
• Unreliable
• Fast
Client-Server Model
[Figure: a server application on the server machine (144.12.34.99) and a client application on the client machine (190.30.42.155), communicating through port 5746]
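The client-server model can be tried out on a single machine with TCP sockets. This is a minimal sketch under assumptions: both sides run on the loopback address 127.0.0.1 instead of two separate machines, the operating system picks a free port instead of a fixed one, and the upper-casing "service" is invented purely for the demonstration.

```python
import socket
import threading

def run_server(listener):
    """Accept one client, read its message, and answer in upper case."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def demo():
    # Server side: bind to the loopback address; port 0 lets the OS pick a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    thread = threading.Thread(target=run_server, args=(listener,))
    thread.start()

    # Client side: connect to the server's address and port, send, read the reply.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello server")
        reply = client.recv(1024)

    thread.join()
    listener.close()
    return reply

print(demo())
```

The client only needs the server's IP address and port number; the server waits for whichever client connects.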
HTTP Protocol
Hypertext Transfer Protocol
Used between Web-clients (e.g., browsers) and Web-servers (and proxies)
Text based
Built on top of TCP
Stateless protocol
HTTP Transaction -- Client
Client request:
• Sends a request
GET /index.html HTTP/1.0
• Sends optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• Sends a blank line (\r\n)
• Can send post data
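The client's side of the transaction can be sketched as a function that assembles the request text. The name build_request and the header values are illustrative; each line is terminated with a carriage return and line feed, as HTTP specifies, and an empty line closes the header section.

```python
def build_request(path, headers=None, version="HTTP/1.0"):
    """Assemble a GET request: request line, optional headers, then a blank line."""
    lines = [f"GET {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_request(
    "/index.html",
    {"User-Agent": "example-browser", "Accept": "text/html"},
)
print(request)
```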
HTTP Transaction -- Server
Server response:
• Sends status line
HTTP/1.0 200 OK
• Sends header information
Content-type: text/html
Content-length: 3022
...
• Sends a blank line (\r\n)
• Sends document data
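On the receiving end, a client has to take such a response apart again. The sketch below splits a raw response into status line, headers and document data; parse_response is a hypothetical name, the sample response is made up, and real responses need more careful handling than this.

```python
def parse_response(raw):
    """Split a raw HTTP response into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"], len(body))
```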
Reacting to Responses of Clients
HTML pages are static documents
To achieve interaction with the user, there is a need for Internet tools and techniques that get input from the user and react according to this input
Sometimes there is a need to produce output as a result of querying a database. The output in this case is not known in advance
Server Technologies
Some Web applications use online input to create pages on the fly (for example, search engines)
A request will include, in addition to the URL of the service provider, a list of parameters
For example, http://www.google.com/search?q=search-word
The creation of the pages may also require interaction with some applications (for example, database queries)
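How a server might read the parameters of such a request and create a page from them can be sketched as follows. The handle function and the page it produces are invented for illustration; a real search service would of course run an actual search before building the page.

```python
from html import escape
from urllib.parse import urlsplit, parse_qs

def handle(url):
    """Read the q parameter from the URL and build a page on the fly."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)    # e.g. {"q": ["search-word"]}
    word = params.get("q", [""])[0]
    # Escape the user's input before placing it inside HTML.
    return f"<html><body>Results for {escape(word)}</body></html>"

print(handle("http://www.google.com/search?q=search-word"))
```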
Creating Pages on the Fly in the Server
There are four common ways to serve page requests that include input parameters:
• CGI (Common Gateway Interface) programming
• Java Servlets
• JSP -- Java Server Pages, or
• Microsoft ASP -- Active Server Pages (similar to JSP)
CGI Programming
CGI is a standard interface between the Web server and external programs, not a language; CGI scripts can be written in any language
A CGI script is a program that runs on the server and creates HTML code
An early technology
Java Servlets
Servlets are Java applications that some Web servers can run
A Servlet creates pages on the fly and these pages are returned to the requesting browser
JSP and ASP
JSP (Java Server Pages)
• Create an HTML page that has Java code inside HTML tags
• This page is actually a template
• The code, for example, could issue a database query and create an HTML table for the result
• The Web server executes the code in the template and produces a pure HTML page that is returned to the client
Microsoft ASP (Active Server Pages)
• The code is VB (Visual Basic) scripts
• The Web server must be Microsoft IIS server
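The template idea behind JSP and ASP can be illustrated in Python, used here only as an analogy since neither technology is Python based. The template PAGE, the function render and the course records are made up; the records stand in for the result of a database query.

```python
from html import escape
from string import Template

# An HTML page with placeholders: the template.
PAGE = Template("<html><body><h1>$title</h1><table>$rows</table></body></html>")

def render(title, records):
    """Fill the template and return a pure HTML page."""
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in record) + "</tr>"
        for record in records
    )
    return PAGE.substitute(title=escape(title), rows=rows)

# Stand-in for a database query result (made-up data).
courses = [("databases", 4), ("operating systems", 5)]
print(render("Courses", courses))
```

The client receives only the finished HTML; the placeholders and the code that filled them stay on the server.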
Client Technologies
Some technologies interact with the user on the client level (Web browser)
JavaScript is a scripting language that can be added to HTML pages
Web browsers can run the script and change the output accordingly
There is a slight interaction of the script with the file system using cookies
Cookies are small files that store some personal information in the file system of the client
Separating Contents from Style
In HTML, the contents and the style of pages are inseparable
• HTML tags actually refer only to the style
XML (eXtensible Markup Language) is a new markup language for marking the semantics (meaning) of the data
XML tags describe the meaning of each portion of text in an XML document
XML Tags
XML tags are similar to attributes in a relation
However, the attributes are the same for all the records of the relation
In XML documents, each portion of text has its own tag
• <course> databases </course>
• <course> operating systems </course>
XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of documents according to their semantics
For example, the CS Department has many Web pages of courses
Can we write a program that reads all these pages and prints a list of the names of courses?
If XML tags are used, it is easy to do that
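A program of the kind asked about above can be quite short once the pages use XML tags. In this sketch the sample page and its enclosing department element are assumptions made for illustration; only the course tag comes from the slides.

```python
import xml.etree.ElementTree as ET

# A made-up page; the <department> wrapper is an assumption for this example.
PAGE = """
<department>
  <course> databases </course>
  <course> operating systems </course>
</department>
"""

def course_names(xml_text):
    """Return the text of every <course> element in the document."""
    root = ET.fromstring(xml_text)
    return [course.text.strip() for course in root.iter("course")]

for name in course_names(PAGE):
    print(name)
```

The program relies only on the meaning carried by the tag name, not on how the page is laid out.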
Using XML
XML is important in the context of data exchange between applications
It is possible to define a common set of tags that are suited for specific applications
For example, MathML is used for exchanging mathematical information
Showing XML Documents in Browsers
XML documents contain data with semantic tags
For a graphical representation, information about the style must be added
• For example, HTML tags provide information about the style
Style Sheets
Style is added to XML documents by means of style sheets
There are two style-sheet languages
• CSS -- Cascading Style Sheets
  • Describe how to graphically show the data
• XSL -- eXtensible Stylesheet Language
  • Can also transform the data
Putting it All Together
A common architecture for Web applications has several tiers
• DBMS (database management system) for storing and processing information
• A Web server for producing pages as a result of client requests
• A browser that supports JavaScript (for creating dynamic pages) and CSS (for creating the desired visual output)
How Should XML be Used?
How can we query XML documents easily and effectively?
How can we store XML documents efficiently?
What is the proper way to include other resources in XML documents (i.e., figures, sounds, etc.)?
How can we use a general style, and information that is semantically well defined, without making the process of creating documents too cumbersome?