Internet Applications: Unit 1


INTERNET APPLICATIONS

Internet applications (IA) are web applications that have the features and

functionality of traditional desktop applications. IAs typically transfer the processing

necessary for the user interface to the web client but keep the bulk of the data (i.e. maintaining the state of the program, the data, etc.) back on the application server.

IAS TYPICALLY:

run in a web browser, or do not require software installation

run locally in a secure environment called a sandbox

can be "occasionally connected" wandering in and out of hot-spots or from office to

office.

HISTORY OF IAS

The term " Internet application" was introduced in a Macromedia whitepaper in

March 2002, though the concept had been around for a number of years before that under

different names such as:

Remote Scripting, by Microsoft, circa 1998

X Internet, by Forrester Research in October 2000

Rich (Web) clients

Rich web application

Comparison to standard web applications

Traditional web applications centered all activity around a client-server

architecture with a thin client. Under this system all processing is done on the server, and

the client is only used to display static (in this case HTML) content. The biggest

drawback with this system is that all interaction with the application must pass through

the server, which requires data to be sent to the server, the server to respond, and the page

to be reloaded on the client with the response. By using a client side technology which

can execute instructions on the client's computer, IAs can circumvent this slow and

synchronous loop for many user interactions. This difference is somewhat analogous to

the difference between "terminal and mainframe" and Client-server/Fat client approaches.

Internet standards have evolved slowly and continually over time to accommodate these

techniques, so it is hard to draw a strict line between what constitutes an IA and what


does not. But all IAs share one characteristic: they introduce an intermediate layer of

code, often called a client engine, between the user and the server. This client engine is

usually downloaded at the beginning of the application, and may be supplemented by

further code downloads as the application progresses. The client engine acts as an

extension of the browser, and usually takes over responsibility for rendering the

application's user interface and for server communication.

What can be done in an IA may be limited by the capabilities of the system used

on the client. But in general, the client engine is programmed to perform application

functions that its designer believes will enhance some aspect of the user interface, or

improve its responsiveness when handling certain user interactions, compared to a

standard Web browser implementation. Also, while simply adding a client engine does

not force an application to depart from the normal synchronous pattern of interactions

between browser and server, in most IAs the client engine performs additional

asynchronous communications with servers.
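To make the client-engine idea concrete, here is a minimal sketch in TypeScript of a client engine that handles a user action locally and synchronizes with the server asynchronously. The /api/state endpoint, element ids and state shape are illustrative assumptions, not part of any particular IA framework.

```typescript
// A minimal "client engine" sketch: the UI updates locally and state is synced
// to a hypothetical /api/state endpoint in the background, so the user never
// waits on the slow synchronous request/response loop described above.

type AppState = { counter: number };

class ClientEngine {
  private state: AppState = { counter: 0 };

  // Handle a user interaction entirely on the client, then sync asynchronously.
  increment(): void {
    this.state.counter += 1;
    this.render();
    void this.syncWithServer(); // background call; the UI does not wait for it
  }

  private render(): void {
    const el = document.getElementById("counter");
    if (el) el.textContent = String(this.state.counter);
  }

  private async syncWithServer(): Promise<void> {
    try {
      await fetch("/api/state", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(this.state),
      });
    } catch {
      // "Occasionally connected": keep the local state and retry later.
    }
  }
}

const engine = new ClientEngine();
document.getElementById("increment")?.addEventListener("click", () => engine.increment());
```

The user-visible update happens immediately on the client; the server round trip runs in the background.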

BENEFITS

Because IAs employ a client engine to interact with the user, they are:

Richer. They can offer user-interface behaviors not obtainable using only the HTML widgets available to standard browser-based Web applications. This richer functionality may include anything that can be implemented in the technology being used on the client side, including drag and drop, using a slider to change data, calculations performed only by the client that do not need to be sent back to the server (e.g. an insurance rate calculator), etc.

More responsive. The interface behaviors are typically much more responsive

than those of a standard Web browser that must always interact with the server.

The most sophisticated examples of IAs exhibit a look and feel approaching that of a

desktop environment. Using a client engine can also produce other performance benefits:

Client/Server balance. The demand for client and server computing resources is

better balanced, so that the Web server need not be the workhorse that it is with a

traditional Web application. This frees server resources, allowing the same server

hardware to handle more client sessions concurrently.


Asynchronous communication. The client engine can interact with the server

asynchronously -- that is, without waiting for the user to perform an interface action like

clicking on a button or link. This option allows IA designers to move data between the

client and the server without making the user wait. Perhaps the most common application

of this is prefetching, in which an application anticipates a future need for certain data,

and downloads it to the client before the user requests it, thereby speeding up a

subsequent response. Google Maps uses this technique to move adjacent map segments to

the client before the user scrolls their view.
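A hedged sketch of this prefetching idea: after the current tile is shown, its neighbours are fetched in the background so a later scroll can be served from a local cache. The /tiles/{x}/{y}.png URL scheme and cache shape are assumptions for illustration, not Google Maps' actual mechanism.

```typescript
// Sketch of prefetching: after showing the current map tile, the engine quietly
// fetches neighbouring tiles so a later scroll feels instant.

const tileCache = new Map<string, Promise<Blob>>();

function tileUrl(x: number, y: number): string {
  return `/tiles/${x}/${y}.png`; // assumed URL scheme
}

function fetchTile(x: number, y: number): Promise<Blob> {
  const key = `${x},${y}`;
  if (!tileCache.has(key)) {
    tileCache.set(key, fetch(tileUrl(x, y)).then((r) => r.blob()));
  }
  return tileCache.get(key)!;
}

// Prefetch the eight neighbours of the tile the user is currently viewing.
function prefetchNeighbours(x: number, y: number): void {
  for (let dx = -1; dx <= 1; dx++) {
    for (let dy = -1; dy <= 1; dy++) {
      if (dx !== 0 || dy !== 0) void fetchTile(x + dx, y + dy);
    }
  }
}

void fetchTile(10, 20).then(() => prefetchNeighbours(10, 20));
```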

Network efficiency. The network traffic may also be significantly reduced because

an application-specific client engine can be more intelligent than a standard Web browser

when deciding what data needs to be exchanged with servers. This can speed up

individual requests or responses because less data is being transferred for each

interaction, and overall network load is reduced. However, use of asynchronous

prefetching techniques can neutralize or even reverse this potential benefit. Because the

code cannot anticipate exactly what every user will do next, it is common for such

techniques to download extra data, not all of which is actually needed, to many or all

clients.


SHORTCOMINGS AND RESTRICTIONS

Shortcomings and restrictions associated with IAs are:

Sandbox. Because IAs run within a sandbox, they have restricted access to system

resources. If assumptions about access to resources are incorrect, IAs may fail to operate

correctly.

Disabled scripting. JavaScript or another scripting language is often required. If

the user has disabled active scripting in their browser, the IA may not function properly,

if at all.

Client processing speed. To achieve platform independence, some IAs use client-

side scripts written in interpreted languages such as JavaScript, with a consequential loss

of performance. This is not an issue with compiled client languages such as Java, where

performance is comparable to that of traditional compiled languages, or with Flash

movies, in which the bulk of the operations are performed by the native code of the Flash

player.

Script download time. Although it does not have to be installed, the additional

client-side intelligence (or client engine) of IA applications needs to be delivered by the

server to the client. While much of this is usually automatically cached it needs to be

transferred at least once. Depending on the size and type of delivery, script download

time may be unpleasantly long. IA developers can lessen the impact of this delay by

compressing the scripts, and by staging their delivery over multiple pages of an

application.
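One way to stage delivery, sketched below under the assumption of a browser environment: load a small core engine up front and inject optional modules only when the user first needs them. The script URLs and element id are placeholders.

```typescript
// Sketch of staging client-engine delivery: load only the code a page needs,
// and pull optional modules on demand.

function loadScript(src: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const s = document.createElement("script");
    s.src = src;
    s.async = true;
    s.onload = () => resolve();
    s.onerror = () => reject(new Error(`failed to load ${src}`));
    document.head.appendChild(s);
  });
}

// Core engine loads up front; a heavier module only when it is requested.
void loadScript("/js/engine-core.js");
document.getElementById("show-chart")?.addEventListener("click", async () => {
  await loadScript("/js/engine-charts.js");
});
```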

Loss of integrity. If the application-base is X/HTML, conflicts arise between the

goal of an application (which naturally wants to be in control of its presentation and

behaviour) and the goals of X/HTML (which naturally wants to give away control). The

DOM interface for X/HTML makes it possible to create IAs, but by doing so makes it

impossible to guarantee correct function. Because an IA client can modify the IA's basic

structure and override presentation and behaviour, it can cause an irrecoverable client

failure or crash. Eventually, this problem could be solved by new client-side mechanisms

that granted an IA client more limited permission to modify only those resources within

the scope of its application. (Standard software running natively does not have this


problem because by definition a program automatically possesses all rights to all its

allocated resources).

Loss of visibility to search engines. Search engines may not be able to index the

text content of the application.

MANAGEMENT COMPLICATIONS

The advent of IA technologies has introduced considerable additional complexity

into Web applications. Traditional Web applications built using only standard HTML,

having a relatively simple software architecture and being constructed using a limited set

of development options, are relatively easy to design and manage. For the person or organization using IA technology to deliver a Web application, the additional complexity makes such applications harder to design, test, measure, and support.

Use of IA technology poses several new Service Level Management ("SLM")

challenges, not all of which are completely solved today. SLM concerns are not always

the focus of application developers, and are rarely if ever perceived by application users,

but they are vital to the successful delivery of an online application. Aspects of the IA

architecture that complicate management processes[1] are:

Greater complexity makes development harder. The ability to move code to the

client gives application designers and developers far more creative freedom. But this in

turn makes development harder, increases the likelihood of defects (bugs) being

introduced, and complicates software testing activities. These complications lengthen the

software development process, regardless of the particular methodology or process being

employed. Some of these issues may be mitigated through the use of a Web application

framework to standardize aspects of IA design and development. However, increasing

complexity in a software solution can complicate and lengthen the testing process, if it

increases the number of use cases to be tested. Incomplete testing lowers the application's

quality and its reliability during use.

IA architecture breaks the Web page paradigm. Traditional Web applications can

be viewed as a series of Web pages, each of which requires a distinct download, initiated

by an HTTP GET request. This model has been characterized as the Web page paradigm.

IAs invalidate this model, introducing additional asynchronous server communications to


support a more responsive user interface. In IAs, the time to complete a page download

may no longer correspond to something a user perceives as important, because (for

example) the client engine may be prefetching some of the downloaded content for future

use. New measurement techniques must be devised for IAs, to permit reporting of

response time quantities that reflect the user's experience. In the absence of standard tools

that do this, IA developers must instrument their application code to produce the

measurement data needed for SLM.
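As an illustration of such instrumentation, the sketch below wraps a user-visible operation, measures its duration with performance.now(), and reports it. The /metrics endpoint and the measured() helper are hypothetical.

```typescript
// Sketch of an IA producing its own response-time measurements, since a packet
// sniffer cannot see the user-perceived boundaries of an interaction.

async function measured<T>(label: string, work: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await work();
  } finally {
    const elapsedMs = performance.now() - start;
    // Fire-and-forget report of the user-perceived duration for this action.
    void fetch("/metrics", {
      method: "POST",
      body: JSON.stringify({ label, elapsedMs }),
    });
  }
}

// Usage: wrap a user-visible operation rather than a raw HTTP request.
void measured("open-report", async () => {
  const res = await fetch("/api/report/42");
  return res.json();
});
```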

Asynchronous communication makes it harder to isolate performance problems.

Paradoxically, actions taken to enhance application responsiveness also make it harder to

measure, understand, report on, and manage responsiveness. Some IAs do not issue any

further HTTP GET requests from the browser after their first page, using asynchronous

requests from the client engine to initiate all subsequent downloads. The IA client engine

may be programmed to continually download new content and refresh the display, or (in

applications using the Comet approach) a server-side engine can keep pushing new

content to the browser over a connection that never closes. In these cases, the concept of

a "page download" is no longer applicable. These complications make it harder to

measure and subdivide application response times, a fundamental requirement for

problem isolation and service level management. Tools designed to measure traditional

Web applications may -- depending on the details of the application and the tool -- report

such applications either as a single Web page per HTTP request, or as an unrelated

collection of server activities. Neither conclusion reflects what is really happening at the

application level.

The client engine makes it harder to measure response time. For traditional Web

applications, measurement software can reside either on the client machine or on a

machine that is close to the server, provided that it can observe the flow of network

traffic at the TCP and HTTP levels. Because these protocols are synchronous and

predictable, a packet sniffer can read and interpret packet-level data, and infer the user’s

experience of response time by tracking HTTP messages and the times of underlying

TCP packets and acknowledgments. But the IA architecture reduces the power of the

packet sniffing approach, because the client engine breaks the communication between

user and server into two separate cycles operating asynchronously -- a foreground (user-


to-engine) cycle, and a background (engine-to-server) cycle. Both cycles are important,

because neither stands alone; it is their relationship that defines application behavior. But

that relationship depends only on the application design, which cannot (in general) be

inferred by a measurement tool, especially one that can observe only one of the two

cycles. Therefore the most complete IA measurements can only be obtained using tools

that reside on the client and observe both cycles.

THE CURRENT STATUS OF IA DEVELOPMENT AND ADOPTION

IAs are still in the early stages of development and user adoption. There are a

number of restrictions and requirements that remain:

Browser adoption: Many IAs require modern web browsers in order to run. Advanced

JavaScript engines must be present in the browser as IAs use techniques such as

XMLHttpRequest for client-server communication, and DOM scripting and advanced

CSS techniques to enable the user interface.

Web standards: Differences between web browsers can make it difficult to write an IA

that will run across all major browsers. The consistency of the Java platform, particularly

after Java 1.1, makes this task much simpler for IAs written as Java applets.

Development tools: Some Ajax Frameworks and products like Adobe Flex provide an

integrated environment in which to build IA and B2B web applications.

Accessibility concerns: Additional interactivity may require technical approaches that

limit applications' accessibility.

User adoption: Users expecting standard web applications may find that some accepted

browser functionality (such as the "Back" button) may have somewhat different or even

undesired behaviour.


JUSTIFICATIONS

Although developing applications to run in a web browser is a much more

limiting, difficult, and intricate process than developing a regular desktop application,

the efforts are often justified because:

installation is not required -- updating and distributing the application is an

instant, automatically handled process

users can use the application from any computer with an internet connection, and

usually regardless of what operating system that computer is running

web-based applications are generally less prone to viral infection than running an

actual executable

as web usage increases, computer users are becoming less willing to go to the

trouble of installing new software if a browser-based alternative is available

This last point is often true even if this alternative is slower or not as feature-rich. A

good example of this phenomenon is webmail.

METHODS AND TECHNIQUES

JAVASCRIPT

The first major client-side language and technology with the ability to run code, and one installed on a majority of web clients, was JavaScript. Although its uses

were relatively limited at first, combined with layers and other developments in DHTML

it has become possible to piece together an IA system without the use of a unified client-

side solution. Ajax is a new term coined to refer to this combination of techniques and

has recently been used most prominently by Google for projects such as Gmail and

Google Maps. However, creating a large application in this framework is very difficult,

as many different technologies must interact to make it work, and browser compatibility

requires a lot of effort. In order to make the process easier, several AJAX Frameworks

have been developed.
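A minimal example of the underlying Ajax technique, using the browser's XMLHttpRequest to refresh part of a page without a full reload; the /api/messages endpoint and element id are assumptions.

```typescript
// Minimal Ajax sketch: XMLHttpRequest updates part of a page without a full reload.

function loadMessages(): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", "/api/messages", true); // true = asynchronous
  xhr.onreadystatechange = () => {
    if (xhr.readyState === XMLHttpRequest.DONE && xhr.status === 200) {
      const target = document.getElementById("messages");
      if (target) target.textContent = xhr.responseText;
    }
  };
  xhr.send();
}

loadMessages();
```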

The "" in " Internet applications" may also suffer from an all-JavaScript approach,

because you are still bound by the media types predictably supported by the world's

various deployed browsers -- video will display in different ways in different browsers

Page 9: Internet applications unit1

with an all-JavaScript approach, audio support will be unpredictable, realtime

communications, whiteboarding, outbound webcams, opacity compositing, socket

support, all of these are implemented in different ways in different browsers, so all-

JavaScript approaches tend to cluster their "ness" around text refreshes and image

refreshes.

ADOBE FLASH AND APOLLO

Adobe Flash is another way to build Internet applications. This technology is cross-platform and quite powerful for creating application UIs. Adobe Flex makes it possible to create Flash user interfaces by compiling MXML, an XML-based interface

description language. Adobe is currently working on providing a more powerful IA

platform with the product Adobe Apollo, a technology combining Flash and PDF.

WINDOWS PRESENTATION FOUNDATION

With the .NET 3.0 Framework, Microsoft introduced Windows Presentation

Foundation (WPF) which provides a way to build single-platform applications with some

similarities to IAs using XAML and languages like C# and Visual Basic. In addition,

Microsoft has announced Windows Presentation Foundation/Everywhere which may

eventually provide a subset of WPF functionality on devices and other platforms.

ACTIVEX CONTROLS

Embedding ActiveX controls into HTML is a very powerful way to develop Internet applications; however, they are only guaranteed to run properly in Internet

Explorer. Furthermore, since they can break the sandbox model, they are potential targets

for computer viruses and malware making them high security risks. At the time of this

writing, the Adobe Flash Player for Internet Explorer is implemented as an ActiveX

control for Microsoft environments, as well as in multi-platform Netscape Plugin

wrappers for the wider world. ActiveX as such is a good choice for building corporate applications only where an organization has standardized on Internet Explorer as its primary web browser.

JAVA APPLETS

Java applets run in standard HTML pages and generally start automatically when

their web page is opened with a modern web browser. Java applets have access to the screen (inside an area designated in their page's HTML), speakers, keyboard and mouse of

any computer their web page is opened on, as well as access to the Internet, and provide a

sophisticated environment capable of real time applications.

JAVA APPLICATIONS

Java based IAs can be launched from within the browser or as free standing

applications via Java Web Start. Java IAs can take advantage of the full power of the Java

platform to deliver functionality, 2D & 3D graphics, and off-line capabilities, but at the

cost of delayed startup.

Numerous frameworks for Java IAs exist, including XUL-like XML-based

frameworks such as XUI and Swixml.

USER INTERFACE LANGUAGES

As an alternative to HTML/XHTML, new user interface markup languages can be used in IAs. For instance, the Mozilla Foundation's XML-based user interface markup language XUL could be used in IAs, though it would be restricted to Mozilla-based browsers, since it is not a de facto or de jure standard. The W3C's Web Clients Activity[2] has initiated a Web Application Formats Working Group whose mission includes the development of such standards [3].

IA user interfaces can also become richer through the use of scriptable SVG (though not all browsers support native SVG rendering yet), as well as SMIL.


OTHER TECHNIQUES

IAs could use XForms to enhance their functionality.

XML and XSLT, along with some XHTML, CSS and JavaScript, can also be used to generate richer client-side UI components such as data tables that can be re-sorted locally on the client without going back to the server. Both the Mozilla and Internet Explorer browsers support this.

The Omnis Web Client is an ActiveX control or Netscape plug-in which can be embedded into an HTML page, providing a rich application interface in the end-user's web browser.

IA WITH REAL-TIME PUSH

Traditionally, web pages have been delivered to the client only when the client requests them. For every client request, the browser initiates an HTTP connection to the web server, which then returns the data, and the connection is closed. The drawback of this approach is that the displayed page is updated only when the user explicitly refreshes it or moves to a new page. Since transferring entire pages can take a long time, refreshing pages can introduce a long latency.
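One widely used workaround, sketched below, is long polling (one of the Comet techniques mentioned earlier): the client keeps a request open and the server replies only when new data is available, after which the client immediately reconnects. The /updates endpoint is an assumption.

```typescript
// Hedged sketch of long polling as a "push" substitute: the server holds the
// request open until it has something to send; the client then reconnects.

async function longPoll(): Promise<void> {
  for (;;) {
    try {
      const res = await fetch("/updates"); // held open by the server until data arrives
      if (res.ok) {
        const update = await res.text();
        const el = document.getElementById("live");
        if (el) el.textContent = update;
      }
    } catch {
      await new Promise((r) => setTimeout(r, 5000)); // back off, then reconnect
    }
  }
}

void longPoll();
```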

DEMAND FOR LOCALISED USAGE OF IA

With the increasing adoption and improvement in broadband technologies, fewer users experience poor performance caused by remote latency. Furthermore, one of the critical reasons for using an IA is that many developers are looking for a language to serve up desktop applications that is not only desktop-OS neutral but also free of installation and system issues.

An IA running in the ubiquitous web browser is a potential candidate even when used standalone or over a LAN, with the required web server functionality hosted locally.

CLIENT-SIDE FUNCTIONALITIES AND DEVELOPMENT TOOLS NEEDED FOR IA

With client-side functionalities like JavaScript and DHTML, IAs can operate on top of a range of OS and web server functionalities.


DIRECTORY SERVICE

A directory service is a software application — or a set of applications — that

stores and organizes information about a computer network's users and network

resources, and that allows network administrators to manage users' access to the

resources. Additionally, directory services act as an abstraction layer between users and

shared resources.

A directory service should not be confused with the directory repository itself, which is the database that holds information about named objects that are managed in the

directory service. In the case of the X.500 distributed directory services model, one or

more namespaces (trees of objects) are used to form the directory service. The directory

service provides the access interface to the data that is contained in one or more directory

namespaces. The directory service interface acts as a central/common authority that can

securely authenticate the system resources that manage the directory data.

Like a database, a directory service is highly optimized for reads and provides

advanced search on the many different attributes that can be associated with objects in a

directory. The data that is stored in the directory is defined by an extendible and

modifiable schema. Directory services use a distributed model for storing their

information and that information is usually replicated between directory servers. [1]

INTRODUCTION

A simple directory service called a naming service maps the names of network

resources to their respective network addresses. With the name service type of directory,

a user doesn't have to remember the physical address of a network resource; providing a

name will locate the resource. Each resource on the network is considered an object on

the directory server. Information about a particular resource is stored as attributes of that

object. Information within objects can be made secure so that only users with the appropriate permissions are able to access it. More sophisticated directories are designed with namespaces such as Subscribers, Services, Devices, Entitlements, Preferences, Content and so on. This design process is highly related to Identity management.

A directory service defines the namespace for the network. A namespace in this

context is the term that is used to hold one or more objects as named entries. The

directory design process normally has a set of rules that determine how network

resources are named and identified. The rules specify that the names be unique and

unambiguous. In X.500 (the directory service standards) and LDAP the name is called

the distinguished name (DN) and is used to refer to a collection of attributes (relative

distinguished names) which make up the name of a directory entry.
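As a small illustration of the DN/RDN idea (not tied to any particular LDAP library), a DN can be modelled as an ordered list of attribute=value RDNs joined by commas; the entry below is invented, and real DN syntax also has escaping rules this sketch ignores.

```typescript
// Sketch of how an LDAP distinguished name (DN) is built from relative
// distinguished names (RDNs). Example values are made up for illustration.

type Rdn = { attribute: string; value: string };

function buildDn(rdns: Rdn[]): string {
  return rdns.map((r) => `${r.attribute}=${r.value}`).join(",");
}

const dn = buildDn([
  { attribute: "cn", value: "Alice Example" }, // most specific RDN first
  { attribute: "ou", value: "Engineering" },
  { attribute: "dc", value: "example" },
  { attribute: "dc", value: "com" },
]);

console.log(dn); // cn=Alice Example,ou=Engineering,dc=example,dc=com
```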

A directory service is a shared information infrastructure for locating, managing,

administrating, and organizing common items and network resources, which can include

volumes, folders, files, printers, users, groups, devices, telephone numbers and other

objects. A directory service is an important component of a NOS (Network Operating

System). In the more complex cases a directory service is the central information

repository for a Service Delivery Platform. For example, looking up "computers" using a

directory service might yield a list of available computers and information for accessing

them.

Replication and Distribution have very distinct meanings in the design and

management of a directory service. The term replication is used to indicate that the same directory namespace (the same objects) is copied to another directory server for redundancy and throughput reasons. The replicated namespace is governed by the same authority. The term distribution is used to indicate that multiple directory servers that hold different namespaces are interconnected to form a distributed directory service.

Each distinct namespace can be governed by different authorities.

DIRECTORY SERVICES SOFTWARE

Directory services produced by different vendors and standards bodies include the

following offerings:

Windows NT Directory Services (NTDS) for Windows NT

Active Directory for Windows 2000, Server 2003


Apple Open Directory in Mac OS X Server

Novell eDirectory - formerly called Novell Directory Services (NDS) for Novell

NetWare version 4.x-5.x

OpenLDAP

Fedora Directory Server

Sun Directory Services

COMPARISON WITH RELATIONAL DATABASES

There are a number of things that distinguish a traditional directory service from a

relational database.

Depending on the directory application, the information is generally read more

often than it is written. Hence the usual database features of transactions and rollback are

not implemented in some directory systems. Data may be made redundant, but the

objective is to get a faster response time during searches.

Data can be organized in a strictly hierarchical manner which is sometimes seen

to be problematic. To overcome the issues of deep namespaces, some directories

dismantle the object namespace hierarchy in their storage mechanisms in order to

optimize navigation. That is, these directories find the item based on their data attributes

and then determine their namespace values as this is faster than navigating large

namespaces to find the item. In terms of cardinality, traditional directories do not have

many-to-many relations. Instead, such relations must be maintained explicitly using lists

of distinguished names or other identifiers (similar to the cross table identifiers used in

relational databases).

Originally X.500 type directory information hierarchies were considered

problematic against relational data designs. Today, Java-based object-oriented databases are being developed and XML document forms have adopted a hierarchical object model, indicating an evolution from traditional relational data engineering.

A schema is defined as object classes, attributes, name bindings and knowledge

(namespaces).

An objectClass has:


Must-attributes that each of its instances must have

May-attributes that can be defined for an instance, but could also be omitted when the

object is created. The lack of a certain attribute is somewhat like a NULL in relational

databases

Attributes are sometimes multi-valued in directories, allowing multiple naming attributes at one level (such as machine type and serial number concatenated) or multiple phone numbers for "work phone".

Attributes and objectClasses are standardized throughout the industry and

formally registered with the IANA for their object IDs. Therefore, directory applications seek to reuse many of the standard classes and attributes to maximize the benefit of

existing directory server software.

Object instances are slotted into namespaces. That is, each objectClass inherits

from its parent objectClass (and ultimately from the root of the hierarchy) adding

attributes to the must/may list.
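A rough sketch of the must/may validation idea, with invented class and attribute names; real directory servers enforce far richer schema rules.

```typescript
// An entry is valid only if every "must" attribute is present; "may" attributes
// are optional and may be multi-valued.

interface ObjectClass {
  name: string;
  must: string[];
  may: string[];
}

type Entry = Record<string, string[]>; // attribute -> values (multi-valued)

const personLike: ObjectClass = {
  name: "personLike",
  must: ["cn", "sn"],
  may: ["telephoneNumber", "mail"],
};

function isValid(entry: Entry, oc: ObjectClass): boolean {
  return oc.must.every((attr) => (entry[attr] ?? []).length > 0);
}

const entry: Entry = {
  cn: ["Alice Example"],
  sn: ["Example"],
  telephoneNumber: ["+1 555 0100", "+1 555 0101"], // multiple "work phone" values
};

console.log(isValid(entry, personLike)); // true
```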

Directory services are often a central component in the security design of an IT

system and have a correspondingly fine granularity regarding access control: who may

operate in which manner on what information. Also see: ACLs

Directory design is quite different from relational database design. With databases

one tends to design a data model for the business issues and process requirements,

sometimes with the online customer, service, user management, presence and system

scale issues omitted. With directories however, if one is placing information into a

common repository for many applications and users, then its information (and identity)

design and schema must be developed around what the objects are representing in real

life. In most cases, these objects represent users, address books, rosters, preferences,

entitlements, products and services, devices, profiles, policies, telephone numbers,

routing information, etc. In addition one must also consider the operational aspects of

design in regard to performance and scale. A quick check on the operational design is to take, e.g., 1 million users with 50 objects each, with users or applications accessing these objects up to 5,000 times a second, minute, or hour (to authorize and update their service environments), and check whether the server and network machinery being considered can support this.
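For example, a back-of-envelope version of that check (numbers taken from the text, purely illustrative):

```typescript
// Rough sizing of the operational example above.

const users = 1_000_000;
const objectsPerUser = 50;
const readsPerSecond = 5_000;

const totalObjects = users * objectsPerUser; // 50,000,000 objects to store
const readsPerHour = readsPerSecond * 3600;  // 18,000,000 lookups per hour at peak

console.log({ totalObjects, readsPerHour });
```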


The major difference between databases and directories is at the system level: a database is used to automate a process with a dedicated (relational) data model, while a directory is used to hold "identified" objects that can be used by many applications in random ways. A directory service is applied where "multi-governance" (many applications and users) requires, for integrity and efficiency reasons, use of the same information. This approach to system design gives greater scale and flexibility, so that larger-scale functions such as Service Delivery Platforms can be specified correctly.

SDPs now need to support hundreds of millions of objects (HSS/HLR, address books, user entitlements, VoIP telephone numbers, user and device information, etc.) in real time and in random ways, and to be managed from BSS/OSS/CRM-type systems as well as from customer self-care applications. See Service Delivery Platform.

Symptomatic of database designs is that the larger companies have hundreds (if

not thousands) of them for different processes and are now trying to converge their user

and service identity information and their online goods and services management, and

deliver these in real time, cost effectively. So a large scale directory service should be in

their solution architecture.

IMPLEMENTATIONS OF DIRECTORY SERVICES

Directory services were part of an Open Systems Interconnection (OSI) initiative

to get everyone in the industry to agree to common network standards to provide multi-

vendor interoperability. In the 1980s the ITU and ISO came up with a set of standards -

X.500, for directory services, initially to support the requirements of inter-carrier

electronic messaging and network name lookup. The Lightweight Directory Access Protocol, LDAP, is based on the directory information services of X.500, but uses the TCP/IP stack and a string encoding scheme of the X.500 Directory Access Protocol (DAP), giving it more relevance on the Internet.

There have been numerous forms of directory service implementations from

different vendors. Among them are:

NIS: The Network Information Service (NIS) protocol, originally named Yellow

Pages (YP), was Sun Microsystems' implementation of a directory service for Unix


network environments. (In the early 2000s, Sun merged the work of its iPlanet alliance with Netscape and developed its LDAP-based directory service to become part of Sun ONE, now called Sun Java Enterprise.)

eDirectory: This is Novell's implementation of directory services. It supports

multiple architectures including Windows, NetWare, Linux and several flavours of Unix

and has long been used for user administration, configuration management, and software

management. eDirectory has evolved into a central component in a broader range of

Identity management products. It was previously known as Novell Directory Services.

Red Hat Directory Server: Red Hat released a directory service that it acquired from AOL's Netscape Security Solutions unit[1], as a commercial product running on top of

Red Hat Enterprise Linux called Red Hat Directory Server and as part of Fedora Core

called Fedora Directory Server.

Active Directory: Microsoft's directory service is the Active Directory which is

included in the Windows 2000 and Windows Server 2003 operating system versions.

Open Directory: Apple's Mac OS X Server offers a directory service called Open

Directory which integrates with many open standard protocols such as LDAP and

Kerberos as well as proprietary directory solutions like Active Directory and eDirectory.

Apache Directory Server: Apache Software Foundation offers a directory service

called ApacheDS.

Oracle Internet Directory: (OID) is Oracle Corporation's directory service, which

is compatible with LDAP version 3.

Computer Associates eTrust Directory

Sun Java System Directory Server: Sun Microsystems' current directory service offering.

OpenDS: A next-generation, open source directory service backed by Sun Microsystems.

There are also plenty of open-source tools for creating directory services, including OpenLDAP, Kerberos, and Samba, which can act as a Domain Controller with Kerberos and LDAP back ends.


NEXT GENERATION DIRECTORY SYSTEMS

Databases have been with the IT industry since the dawn of the computer age and

traditional directories for the last 20-30 years, and they will be with us in the future.

However, with the larger-scale, converged services and event-driven (presence) systems now being developed worldwide (e.g. 3G IMS), information, identity and presence services engineering and the technologies that support it will require some evolution. This could take the form of CADS (Composite Adaptive Directory Services) and CADS-supported Service Delivery Platforms; see www.wwite.com and Service Delivery Platform. CADS is an advanced directory service that contains functions for managing identity, presence and content, and adaptation algorithms for self-tuning; with these functions it greatly simplifies and enhances the design of converged-services SDPs.


DOMAIN NAME SYSTEM

On the Internet, the domain name system (DNS) stores and

associates many types of information with domain names; most

importantly, it translates domain names (computer hostnames) to IP

addresses. It also lists mail exchange servers accepting e-mail for each

domain. In providing a worldwide keyword-based redirection service, DNS

is an essential component of contemporary Internet use.

Useful for several reasons, the DNS pre-eminently makes it

possible to attach easy-to-remember domain names (such as

"wikipedia.org") to hard-to-remember IP addresses (such as

66.230.200.100). People take advantage of this when they recite URLs and

e-mail addresses. In a subsidiary function, the domain name system makes

it possible for people to assign authoritative names without needing to

communicate with a central registrar each time.

HISTORY OF THE DNS

The practice of using a name as a more human-legible abstraction

of a machine's numerical address on the network predates even TCP/IP,

and goes all the way back to the ARPAnet era. Originally, each computer

on the network retrieved a file called HOSTS.TXT from SRI (now SRI

International) which mapped an address (such as 192.0.34.166) to a name

(such as www.example.net.) The Hosts file still exists on most modern

operating systems, either by default or through configuration, and

allows users to specify an IP address to use for a hostname without

checking the DNS. This file now serves primarily for troubleshooting DNS

errors or for mapping local addresses to more organic names. Systems

based on a HOSTS.TXT file have inherent limitations, because of the obvious requirement that every time a given computer's address changed, every computer that sought to communicate with it would need an update to its Hosts file.

The growth of networking called for a more scalable system: one

that recorded a change in a host's address in one place only. Other

hosts would learn about the change dynamically through a notification

system, thus completing a globally accessible network of all hosts'

names and their associated IP Addresses.

Paul Mockapetris invented the DNS in 1983 and wrote the first

implementation. The original specifications appear in RFC 882 and 883.

In 1987, the publication of RFC 1034 and RFC 1035 updated the DNS


specification and made RFC 882 and RFC 883 obsolete. Several more-recent

RFCs have proposed various extensions to the core DNS protocols.

In 1984, four Berkeley students — Douglas Terry, Mark Painter,

David Riggle and Songnian Zhou — wrote the first UNIX implementation,

which was maintained by Ralph Campbell thereafter. In 1985, Kevin Dunlap

of DEC significantly re-wrote the DNS implementation and renamed it

BIND. Mike Karels, Phil Almquist and Paul Vixie have maintained BIND

since then. BIND was ported to the Windows NT platform in the early

1990s.

Due to its long history of security issues, several alternative

nameserver/resolver programs have been written and distributed in recent

years.

HOW THE DNS WORKS IN THEORY

[Figure: Domain names, arranged in a tree, cut into zones, each served by a nameserver.]

The domain name space consists of a tree of domain names. Each node or

leaf in the tree has one or more resource records, which hold

information associated with the domain name. The tree sub-divides into

zones. A zone consists of a collection of connected nodes

authoritatively served by an authoritative DNS nameserver. (Note that a

single nameserver can host several zones.)


When a system administrator wants to let another administrator

control a part of the domain name space within his or her zone of

authority, he or she can delegate control to the other administrator.

This splits a part of the old zone off into a new zone, which comes

under the authority of the second administrator's nameservers. The old zone ceases to be authoritative for what comes under the authority of the new zone.

A resolver looks up the information associated with nodes. A

resolver knows how to communicate with name servers by sending DNS

requests, and heeding DNS responses. Resolving usually entails recursing

through several name servers to find the needed information.

Some resolvers function simplistically and can only communicate

with a single name server. These simple resolvers rely on a recursing

name server to perform the work of finding information for them.

UNDERSTANDING THE PARTS OF A DOMAIN NAME

A domain name usually consists of two or more parts (technically

labels), separated by dots. For example wikipedia.org.

The rightmost label conveys the top-level domain (for example, the

address en.wikipedia.org has the top-level domain org).

Each label to the left specifies a subdivision or subdomain of the

domain above it. Note that "subdomain" expresses relative dependence,

not absolute dependence: for example, wikipedia.org comprises a

subdomain of the org domain, and en.wikipedia.org comprises a subdomain

of the domain wikipedia.org. In theory, this subdivision can go down to

127 levels deep, and each label can contain up to 63 characters, as long

as the whole domain name does not exceed a total length of 255

characters. But in practice some domain registries have shorter limits

than that.
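A small sketch of just the length rules quoted above (63-character labels, 255-character total); real validation also involves allowed characters and registry policy.

```typescript
// Check only the label-length and total-length limits of a domain name.

function checkDomainName(name: string): boolean {
  if (name.length > 255) return false;
  const labels = name.split(".");
  return labels.every((label) => label.length >= 1 && label.length <= 63);
}

console.log(checkDomainName("en.wikipedia.org"));          // true
console.log(checkDomainName("a".repeat(64) + ".example")); // false: label too long
```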

A hostname refers to a domain name that has one or more associated

IP addresses. For example, the en.wikipedia.org and wikipedia.org

domains are both hostnames, but the org domain is not.

The DNS consists of a hierarchical set of DNS servers. Each domain

or subdomain has one or more authoritative DNS servers that publish

information about that domain and the name servers of any domains

"beneath" it. The hierarchy of authoritative DNS servers matches the

hierarchy of domains. At the top of the hierarchy stand the root


servers: the servers to query when looking up (resolving) a top-level

domain name (TLD).

THE ADDRESS RESOLUTION MECHANISM

In theory a full host name may have several name segments (e.g. ahost.ofasubnet.ofabiggernet.inadomain.example). In practice, in the

experience of the majority of public users of Internet services, full

host names will frequently consist of just three segments

(ahost.inadomain.example, and most often www.inadomain.example).

For querying purposes, software interprets the name segment by

segment, from right to left, using an iterative search procedure. At

each step along the way, the program queries a corresponding DNS server

to provide a pointer to the next server which it should consult.

[Figure: A DNS recurser consults three nameservers to resolve the address www.wikipedia.org.]

As originally envisaged, the process was as simple as the following. The local system is pre-configured with the known addresses of the root servers in a file of root hints, which needs to be updated periodically by the local administrator from a reliable source to keep it up to date with the changes which occur over time.

The resolver queries one of the root servers to find the server authoritative for the next level down (so in the case of our simple hostname, a root server would be asked for the address of a server with detailed knowledge of the example top-level domain).

It then queries this second server for the address of a DNS server with detailed knowledge of the second-level domain (inadomain.example in our example), repeating the previous step to progress down the name, until the final step which, rather than returning the address of the next DNS server, returns the final address sought.

The diagram illustrates this process for the real host

www.wikipedia.org.
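The sketch below walks the same referral chain conceptually, using a tiny invented in-memory referral table in place of real DNS packets; a real resolver would query servers over UDP/TCP port 53, and all server names and addresses here are made up.

```typescript
// Conceptual sketch of iterative resolution: follow referrals from the root
// down until a server returns the final answer.

type Reply = { answer?: string; referral?: string };

// "What would this server say about this name?" -- toy data, not real DNS.
const toyNetwork: Record<string, Record<string, Reply>> = {
  "root-server": { "www.inadomain.example": { referral: "example-tld-server" } },
  "example-tld-server": { "www.inadomain.example": { referral: "inadomain-ns" } },
  "inadomain-ns": { "www.inadomain.example": { answer: "192.0.2.10" } },
};

function resolveIteratively(name: string, server = "root-server"): string {
  const reply = toyNetwork[server]?.[name];
  if (!reply) throw new Error(`${server} has no information about ${name}`);
  if (reply.answer) return reply.answer;            // final address reached
  if (!reply.referral) throw new Error("no referral and no answer");
  return resolveIteratively(name, reply.referral);  // follow the referral one level down
}

console.log(resolveIteratively("www.inadomain.example")); // 192.0.2.10
```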


The mechanism in this simple form has a difficulty: it places a huge

operating burden on the collective of root servers, with each and every

search for an address starting by querying one of them. Being as critical as they are to the overall function of the system, such heavy use would create an insurmountable bottleneck for the trillions of queries

placed every day. In practice there are two key additions to the

mechanism.

Firstly, the DNS resolution process allows for local recording and

subsequent consultation of the results of a query (or caching) for a

period of time after a successful answer (the server providing the

answer initially dictates the period of validity, which may vary from

just seconds to days or even weeks). In our illustration, having found a

list of addresses of servers capable of answering queries about

the .example domain, the local resolver will not need to make the query

again until the validity of the currently known list expires, and so on

for all subsequent steps. Hence having successfully resolved the address

of ahost.inadomain.example it is not necessary to repeat the process for

some time since the address already reached will be deemed reliable for

a defined period, and resolution of anotherhost.anotherdomain.example

can commence with already knowing which servers can answer queries for

the .example domain. Caching significantly reduces the rate at which the

most critical name servers have to respond to queries, adding the extra

benefit that subsequent resolutions are not delayed by network transit

times for the queries and responses.

Secondly, most domestic and small-business clients "hand off"

address resolution to their ISP's DNS servers to perform the look-up

process, thus allowing for the greatest benefit from those same ISPs

having busy local caches serving a wide variety of queries and a large

number of users.

CIRCULAR DEPENDENCIES AND GLUE RECORDS

Name servers in delegations appear listed by name, rather than by

IP address. This means that a resolving name server must issue another

DNS request to find out the IP address of the server to which it has

been referred. Since this can introduce a circular dependency if the nameserver referred to is under the domain for which it is authoritative, it is occasionally necessary for the nameserver providing the delegation to also provide the IP address of the next nameserver. This record is

called a glue record.

For example, assume that the sub-domain en.wikipedia.org contains

further sub-domains (such as something.en.wikipedia.org) and that the

authoritative nameserver for these lives at ns1.en.wikipedia.org. A

computer trying to resolve something.en.wikipedia.org will thus first

have to resolve ns1.en.wikipedia.org. Since ns1 is also under the

en.wikipedia.org subdomain, resolving something.en.wikipedia.org

requires resolving ns1.en.wikipedia.org which is exactly the circular

dependency mentioned above. The dependency is broken by the glue record

in the nameserver of wikipedia.org that provides the IP address of

ns1.en.wikipedia.org directly to the requestor, enabling it to bootstrap

the process by figuring out where ns1.en.wikipedia.org is located.

DNS IN PRACTICE

When an application (such as a web browser) tries to find the IP

address of a domain name, it doesn't necessarily follow all of the steps

outlined in the Theory section above. We will first look at the concept

of caching, and then outline the operation of DNS in "the real world."

CACHING AND TIME TO LIVE

Because of the huge volume of requests generated by a system like

the DNS, the designers wished to provide a mechanism to reduce the load

on individual DNS servers. The mechanism devised provided that when a

DNS resolver (i.e. client) received a DNS response, it would cache that

response for a given period of time. A value (set by the administrator

of the DNS server handing out the response) called the time to live

(TTL), defines that period of time. Once a response goes into cache, the

resolver will consult its cached (stored) answer; only when the TTL

expires (or when an administrator manually flushes the response from the

resolver's memory) will the resolver contact the DNS server for the same

information.
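A minimal sketch of that cache-then-query behaviour; the record shape, the stand-in server function and the example address are assumptions.

```typescript
// Resolver-side caching with a TTL: answer from cache while it is valid,
// otherwise query the server and remember the answer until its TTL expires.

type CachedAnswer = { address: string; expiresAt: number };

const cache = new Map<string, CachedAnswer>();

async function lookup(
  name: string,
  queryServer: (n: string) => Promise<{ address: string; ttlSeconds: number }>,
): Promise<string> {
  const hit = cache.get(name);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.address; // still valid: no network traffic at all
  }
  const fresh = await queryServer(name); // only on a miss or after TTL expiry
  cache.set(name, {
    address: fresh.address,
    expiresAt: Date.now() + fresh.ttlSeconds * 1000,
  });
  return fresh.address;
}

// Usage with a stand-in server that always answers with a 6-hour TTL.
void lookup("www.wikipedia.org", async () => ({ address: "198.51.100.7", ttlSeconds: 6 * 3600 }));
```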

Generally, the Start of Authority (SOA) record specifies the time

to live. The SOA record has the parameters:


Serial — the zone serial number, incremented when the zone file is

modified, so the slave and secondary name servers know when the zone has

been changed and should be reloaded.

Refresh — the number of seconds between update requests from

secondary and slave name servers.

Retry — the number of seconds the secondary or slave will wait

before retrying when the last attempt has failed.

Expire — the number of seconds a master or slave will wait before

considering the data stale if it cannot reach the primary name server.

Minimum — previously used to determine the minimum TTL, this

offers negative caching.

CACHING TIME

As a noteworthy consequence of this distributed and caching

architecture, changes to the DNS do not always take effect immediately

and globally. This is best explained with an example: If an

administrator has set a TTL of 6 hours for the host www.wikipedia.org,

and then changes the IP address to which www.wikipedia.org resolves at

12:01pm, the administrator must consider that a person who cached a

response with the old IP address at 12:00pm will not consult the DNS

server again until 6:00pm. The period between 12:01pm and 6:00pm in this

example is called caching time, which is best defined as a period of

time that begins when you make a change to a DNS record and ends after

the maximum amount of time specified by the TTL expires. This

essentially leads to an important logistical consideration when making

changes to the DNS: not everyone is necessarily seeing the same thing

you're seeing. RFC 1537 helps to convey basic rules for how to set the

TTL.

Note that the term "propagation", although very widely used, does

not describe the effects of caching well. Specifically, it implies that

[1] when you make a DNS change, it somehow spreads to all other DNS

servers (instead, other DNS servers check in with yours as needed), and

[2] that you do not have control over the amount of time the record is

cached (you control the TTL values for all DNS records in your domain,

except your NS records and any authoritative DNS servers that use your

domain name).

Some resolvers may override TTL values, as the protocol supports caching

for up to 68 years or no caching at all. Negative caching (caching the non-existence of records) is determined by name servers authoritative for a zone, which MUST include the SOA record when reporting that no data of the requested type exists. The MINIMUM field of the SOA record and the TTL of the SOA itself are used to establish the TTL for the negative answer (RFC 2308).

Many people incorrectly refer to a mysterious 48 hour or 72 hour

propagation time when you make a DNS change. When one changes the NS

records for one's domain or the IP addresses for hostnames of

authoritative DNS servers using one's domain (if any), there can be a

lengthy period of time before all DNS servers use the new information.

This is because those records are handled by the zone parent DNS servers

(for example, the .com DNS servers if your domain is example.com), which

typically cache those records for 48 hours. However, those DNS changes

will be immediately available for any DNS servers that do not have them

cached. And, any DNS changes on your domain other than the NS records

and authoritative DNS server names can be nearly instantaneous, if you

choose for them to be (by lowering the TTL once or twice ahead of time,

and waiting until the old TTL expires before making the change).

DNS IN THE REAL WORLD

[Figure: DNS resolving from program to OS resolver to ISP resolver to the greater system.]

Users generally do not communicate directly with a DNS resolver. Instead

DNS resolution takes place transparently in client applications such as


web browsers (like Internet Explorer, Opera, Mozilla Firefox, Safari,

Netscape Navigator, etc), mail clients (Outlook Express, Mozilla

Thunderbird, etc), and other Internet applications. When a request is

made which necessitates a DNS lookup, such programs send a resolution

request to the local DNS resolver in the operating system which in turn

handles the communications required.

The DNS resolver will almost invariably have a cache (see above)

containing recent lookups. If the cache can provide the answer to the

request, the resolver will return the value in the cache to the program

that made the request. If the cache does not contain the answer, the

resolver will send the request to a designated DNS server or servers. In

the case of most home users, the Internet service provider to which the

machine connects will usually supply this DNS server: such a user will

either configure that server's address manually or allow DHCP to set it;

however, where systems administrators have configured systems to use

their own DNS servers, their DNS resolvers will generally point to their

own nameservers. This name server will then follow the process outlined

above in DNS in theory, until it either successfully finds a result, or

does not. It then returns its results to the DNS resolver; assuming it

has found a result, the resolver duly caches that result for future use,

and hands the result back to the software which initiated the request.

BROKEN RESOLVERS

An additional level of complexity emerges when resolvers violate

the rules of the DNS protocol. Some people have suggested that a number

of large ISPs have configured their DNS servers to violate rules

(presumably to allow them to run on less-expensive hardware than a fully

compliant resolver), such as by disobeying TTLs, or by indicating that a

domain name does not exist just because one of its name servers does not

respond.

As a final level of complexity, some applications such as Web

browsers also have their own DNS cache, in order to reduce the use of

the DNS resolver library itself. This practice can add extra difficulty to DNS debugging, as it obscures which data is fresh, or which cache it lies in. These caches typically have very short caching times, of the order

of one minute. A notable exception is Internet Explorer; recent versions

cache DNS records for half an hour.


OTHER DNS APPLICATIONS

The system outlined above provides a somewhat simplified scenario.

The DNS includes several other functions:

Hostnames and IP addresses do not necessarily match on a one-to-

one basis. Many hostnames may correspond to a single IP address:

combined with virtual hosting, this allows a single machine to serve

many web sites. Alternatively a single hostname may correspond to many

IP addresses: this can facilitate fault tolerance and load distribution,

and also allows a site to move physical location seamlessly.

There are many uses of DNS besides translating names to IP

addresses. For instance, Mail transfer agents use DNS to find out where

to deliver e-mail for a particular address. The domain to mail exchanger

mapping provided by MX records accommodates another layer of fault

tolerance and load distribution on top of the name to IP address

mapping.
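For instance, Node.js's built-in resolver can list the mail exchangers for a domain (this assumes a Node environment with network access; the domain is only an example):

```typescript
// Look up MX records to see which mail exchangers accept mail for a domain.

import { promises as dns } from "node:dns";

async function mailServersFor(domain: string): Promise<void> {
  const records = await dns.resolveMx(domain);
  // Lower priority value = preferred exchanger.
  records.sort((a, b) => a.priority - b.priority);
  for (const r of records) {
    console.log(`${domain} mail -> ${r.exchange} (priority ${r.priority})`);
  }
}

void mailServersFor("wikipedia.org");
```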

Sender Policy Framework and DomainKeys, instead of creating their own record types, were designed to take advantage of another DNS record type, the TXT record.

To provide resilience in the event of computer failure, multiple DNS

servers provide coverage of each domain. In particular, thirteen root

servers exist worldwide. DNS programs or operating systems have the IP

addresses of these servers built in. At least nominally, the USA hosts

all but three of the root servers. However, because many root servers

actually implement anycast, where many different computers can share the

same IP address to deliver a single service over a large geographic

region, most of the physical (rather than nominal) root servers now

operate outside the USA.

The DNS uses TCP and UDP on port 53 to serve requests. Almost all

DNS queries consist of a single UDP request from the client followed by

a single UDP reply from the server. TCP typically comes into play only

when the response data size exceeds 512 bytes, or for such tasks as zone

transfer. Some operating systems such as HP-UX are known to have

resolver implementations that use TCP for all queries, even when UDP

would suffice.
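
A small sketch of a query sent over UDP to port 53, again using the third-party dnspython package; the resolver address 198.51.100.53 is a documentation placeholder, not a real server:

import dns.message
import dns.query      # pip install dnspython

query = dns.message.make_query("example.org", "A")
# Ordinary queries travel as a single UDP request/reply on port 53;
# dns.query.tcp() could be used instead for large answers or zone transfers.
response = dns.query.udp(query, "198.51.100.53", port=53, timeout=5)
print(response.answer)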

EXTENSIONS TO DNS


EDNS is an extension of the DNS protocol which enhances the transport of DNS data in UDP packets, and adds support for expanding the space of request and response codes. It is described in RFC 2671.

IMPLEMENTATIONS OF DNS

For a commented list of DNS server-side implementations, see

Comparison of DNS server software.

STANDARDS

RFC 882 Concepts and Facilities (Deprecated by RFC 1034)

RFC 883 Domain Names: Implementation specification (Deprecated by RFC

1035)

RFC 1032 Domain administrators guide

RFC 1033 Domain administrators operations guide

RFC 1034 Domain Names - Concepts and Facilities.

RFC 1035 Domain Names - Implementation and Specification

RFC 1101 DNS Encodings of Network Names and Other Types

RFC 1123 Requirements for Internet Hosts -- Application and Support

RFC 1183 New DNS RR Definitions

RFC 1706 DNS NSAP Resource Records

RFC 1876 Location Information in the DNS (LOC)

RFC 1886 DNS Extensions to support IP version 6

RFC 1912 Common DNS Operational and Configuration Errors

RFC 1995 Incremental Zone Transfer in DNS

RFC 1996 A Mechanism for Prompt Notification of Zone Changes (DNS

NOTIFY)

RFC 2136 Dynamic Updates in the domain name system (DNS UPDATE)

RFC 2181 Clarifications to the DNS Specification

RFC 2182 Selection and Operation of Secondary DNS Servers

RFC 2308 Negative Caching of DNS Queries (DNS NCACHE)

RFC 2317 Classless IN-ADDR.ARPA delegation

RFC 2671 Extension Mechanisms for DNS (EDNS0)

RFC 2672 Non-Terminal DNS Name Redirection (DNAME record)

RFC 2782 A DNS RR for specifying the location of services (DNS SRV)

RFC 2845 Secret Key Transaction Authentication for DNS (TSIG)

RFC 2874 DNS Extensions to Support IPv6 Address Aggregation and

Renumbering


RFC 3403 Dynamic Delegation Discovery System (DDDS) (NAPTR records)

RFC 3696 Application Techniques for Checking and Transformation of Names

RFC 4398 Storing Certificates in the Domain Name System

RFC 4408 Sender Policy Framework (SPF) (SPF records)

TYPES OF DNS RECORDS

Important categories of data stored in the DNS include the following:

An A record or address record maps a hostname to a 32-bit IPv4 address.

An AAAA record or IPv6 address record maps a hostname to a 128-bit IPv6

address.

A CNAME record or canonical name record is an alias of one name to another. The A record that the alias points to can be either local or remote, i.e. on a foreign name server. This is useful when running multiple services from a single IP address, where each service has its own entry in the DNS.

An MX record or mail exchange record maps a domain name to a list

of mail exchange servers for that domain.

A PTR record or pointer record maps an IPv4 address to the

canonical name for that host. Setting up a PTR record for a hostname in

the in-addr.arpa domain that corresponds to an IP address implements

reverse DNS lookup for that address. For example (at the time of

writing), www.icann.net has the IP address 192.0.34.164, but a PTR

record maps 164.34.0.192.in-addr.arpa to its canonical name,

referrals.icann.org.
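
A reverse lookup of this kind can be sketched with the standard library, which builds the in-addr.arpa name and asks for its PTR record behind the scenes; the address mirrors the example in the text and may resolve differently today:

import socket

name, aliases, addresses = socket.gethostbyaddr("192.0.34.164")
print(name)       # canonical name returned by the PTR record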

An NS record or name server record maps a domain name to a list of

DNS servers authoritative for that domain. Delegations depend on NS

records.

An SOA record or start of authority record specifies the DNS

server providing authoritative information about an Internet domain, the

email of the domain administrator, the domain serial number, and several

timers relating to refreshing the zone.

An SRV record is a generalized service location record.

A TXT record allows an administrator to insert arbitrary text into

a DNS record. For example, this record is used to implement the Sender

Policy Framework and DomainKeys specifications.


NAPTR records ("Naming Authority Pointer") are a newer type of DNS

record that support regular expression based rewriting.

Other types of records simply provide information (for example, a

LOC record gives the physical location of a host), or experimental data

(for example, a WKS record gives a list of servers offering some well

known service such as HTTP or POP3 for a domain).

INTERNATIONALISED DOMAIN NAMES


While domain names in the DNS have no restrictions on the

characters they use and can include non-ASCII characters, the same is

not true for host names. Host names are the names most people see and

use for things like e-mail and web browsing. Host names are restricted

to a small subset of the ASCII character set that includes the Roman

alphabet in upper and lower case, the digits 0 through 9, the dot, and

the hyphen. (See RFC 3696 section 2 for details.) This prevented the

representation of names and words of many languages natively. ICANN has

approved the Punycode-based IDNA system, which maps Unicode strings into

the valid DNS character set, as a workaround to this issue. Some

registries have adopted IDNA.
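
Python's built-in "idna" codec applies this Punycode-based mapping (the older IDNA 2003 rules); the hostname is only an illustration:

label = "bücher.example"
encoded = label.encode("idna")       # ASCII form usable in the DNS
print(encoded)                       # b'xn--bcher-kva.example'
print(encoded.decode("idna"))        # back to the Unicode form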

SECURITY ISSUES IN DNS

DNS was not originally designed with security in mind, and thus

has a number of security issues. DNS responses are traditionally not

cryptographically signed, leading to many attack possibilities; DNSSEC

modifies DNS to add support for cryptographically signed responses.

There are various extensions to support securing zone transfer

information as well.

Some domain names can spoof other, similar-looking domain names.

For example, "paypal.com" and "paypa1.com" are different names, yet

users may be unable to tell the difference. This problem is much more

serious in systems that support internationalized domain names, since

many characters that are different (from the point of view of ISO 10646)

appear identical on typical computer screens.

LEGAL USERS OF DOMAINS

REGISTRANT


No one in the world really "owns" a domain name except the Network

Information Centre (NIC), or domain name registry. Most

of the NICs in the world receive an annual fee from a legal user in

order for the legal user to utilize the domain name (i.e. a sort of a

leasing agreement exists, subject to the registry's terms and

conditions). Depending on the various naming convention of the

registries, legal users become commonly known as "registrants" or as

"domain holders".

ICANN holds a complete list of domain registries in the world. One can

find the legal user of a domain name by looking in the WHOIS database

held by most domain registries.

For most of the more than 240 country code top-level domains

(ccTLDs), the domain registries hold the authoritative WHOIS

(Registrant, name servers, expiry dates, etc.). For instance, DENIC, the German NIC, holds the authoritative WHOIS for .DE domain names.

However, some domain registries, such as for .COM, .ORG, .INFO,

etc., use a registry-registrar model. There are hundreds of Domain Name

Registrars that actually perform the domain name registration with the

end-user (see lists at ICANN or VeriSign). By using this method of

distribution, the registry only has to manage the relationship with the

registrar, and the registrar maintains the relationship with the end-

users, or 'registrants'. For .COM and .NET domain names, the domain registry, VeriSign, holds only a basic WHOIS (registrar and name servers, etc.). One can find the detailed WHOIS (registrant, name servers, expiry dates, etc.) at the registrars.

Since about 2001, most gTLD registries (.ORG, .BIZ, .INFO) have adopted

a so-called "thick" registry approach, i.e. keeping the authoritative

WHOIS with the various registries instead of the registrars.

ADMINISTRATIVE CONTACT

A registrant usually designates an administrative contact to manage the

domain name. In practice, the administrative contact usually has the

most immediate power over a domain. Management functions delegated to

the administrative contacts may include (for example):

the obligation to conform to the requirements of the domain registry in

order to retain the right to use a domain name

authorization to update the physical address, e-mail address and

telephone number etc in WHOIS

TECHNICAL CONTACT


A technical contact manages the name servers of a domain name. The many

functions of a technical contact include:

making sure the configuration of the domain name conforms to the requirements of the domain registry

updating the domain zone

providing 24x7 operation of the name servers (which keeps the domain name accessible)

BILLING CONTACT

The party whom a NIC invoices.

NAME SERVERS

The authoritative name servers that host the DNS zone of a domain name.

POLITICS

Many investigators have voiced criticism of the methods currently used

to control ownership of domains. Critics commonly claim abuse by

monopolies or near-monopolies, such as VeriSign, Inc. Particularly

noteworthy was the VeriSign Site Finder system which redirected all

unregistered .com and .net domains to a VeriSign webpage. Despite

widespread criticism, VeriSign only reluctantly removed it after ICANN

threatened to revoke its contract to administer the root name servers.

There is also significant disquiet regarding United States political

influence over the Internet Corporation for Assigned Names and Numbers

(ICANN). This was a significant issue in the attempt to create a .xxx

Top-level domain and sparked greater interest in Alternative DNS roots

that would be beyond the control of any single country.

TRUTH IN DOMAIN NAMES ACT

In the United States, the "Truth in Domain Names Act", in combination

with the PROTECT Act, forbids the use of a misleading domain name with

the intention of attracting people into viewing a visual depiction of

sexually explicit conduct on the Internet.

ELECTRONIC MAIL


Electronic mail (abbreviated "e-mail" or, often, "email") is a store and forward

method of composing, sending, storing, and receiving messages over electronic

communication systems. The term "e-mail" (as a noun or verb) applies both to the

Internet e-mail system based on the Simple Mail Transfer Protocol (SMTP) and to

intranet systems allowing users within one organization to e-mail each other. Often these organizations use the Internet protocols for their internal e-mail service as well.

ORIGINS OF E-MAIL

E-mail predates the Internet; existing e-mail systems were a crucial tool in

creating the Internet. MIT first demonstrated the Compatible Time-Sharing System

(CTSS) in 1961. [1] It allowed multiple users to log into the IBM 7094 [2] from remote

dial-up terminals, and to store files online on disk. This new ability encouraged users to

share information in new ways. E-mail started in 1965 as a way for multiple users of a

time-sharing mainframe computer to communicate. Although the exact history is murky,

among the first systems to have such a facility were SDC's Q32 and MIT's CTSS.

E-mail was quickly extended to become network e-mail, allowing users to pass

messages between different computers. The messages could be transferred between users

on different computers by 1966, but it is possible the SAGE system had something

similar some time before.

The ARPANET computer network made a large contribution to the evolution of

e-mail. There is one report [1] which indicates experimental inter-system e-mail transfers

on it shortly after its creation, in 1969. Ray Tomlinson initiated the use of the @ sign to

separate the names of the user and their machine in 1971 [2]. The ARPANET

significantly increased the popularity of e-mail, and it became the killer app of the

ARPANET.

MODERN INTERNET E-MAIL

How Internet e-mail works


The following is the typical sequence of events that takes place when Alice composes a message using her mail user agent (MUA). She types in, or selects from an address book, the e-mail address of her correspondent. She hits the "send" button.

Her MUA formats the message in Internet e-mail format and uses the Simple Mail

Transfer Protocol (SMTP) to send the message to the local mail transfer agent (MTA), in

this case smtp.a.org, run by Alice's Internet Service Provider (ISP).

The MTA looks at the destination address provided in the SMTP protocol (not from the

message header), in this case bob@b.org. An Internet e-mail address is a string of the form localpart@domain, which is known as a Fully Qualified Domain Address (FQDA). The part before the @ sign is the local part of the address, often the username of

the recipient, and the part after the @ sign is a domain name. The MTA looks up this

domain name in the Domain Name System to find the mail exchange servers accepting

messages for that domain.

The DNS server for the b.org domain, ns.b.org, responds with an MX record

listing the mail exchange servers for that domain, in this case mx.b.org, a server run by

Bob's ISP.


smtp.a.org sends the message to mx.b.org using SMTP, which delivers it to the

mailbox of the user bob.

Bob presses the "get mail" button in his MUA, which picks up the message using

the Post Office Protocol (POP3).
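
A rough sketch of the same sequence using Python's standard library, with smtplib playing Alice's MUA submitting to her MTA and poplib playing Bob's "get mail" step; the hostnames and addresses are the example values from the text and the password is a placeholder, so this is not runnable against real servers:

import poplib
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@a.org"
msg["To"] = "bob@b.org"
msg["Subject"] = "Hello"
msg.set_content("Hi Bob")

# Step 1: the MUA hands the message to the local MTA over SMTP.
with smtplib.SMTP("smtp.a.org") as smtp:
    smtp.send_message(msg)

# Later: Bob's MUA collects the message from his mailbox over POP3.
pop = poplib.POP3("mx.b.org")
pop.user("bob")
pop.pass_("secret")                  # placeholder credential
count, _ = pop.stat()
for i in range(1, count + 1):
    for line in pop.retr(i)[1]:
        print(line.decode())
pop.quit()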

This sequence of events applies to the majority of e-mail users. However, there

are many alternative possibilities and complications to the e-mail system:

Alice or Bob may use a client connected to a corporate e-mail system, such as

IBM's Lotus Notes or Microsoft's Exchange. These systems often have their own internal

e-mail format and their clients typically communicate with the e-mail server using a

vendor-specific, proprietary protocol. The server sends or receives e-mail via the Internet

through the product's Internet mail gateway which also does any necessary reformatting.

If Alice and Bob work for the same company, the entire transaction may happen

completely within a single corporate e-mail system.

Alice may not have a MUA on her computer but instead may connect to a

webmail service.

Alice's computer may run its own MTA, so avoiding the transfer at step 1.

Bob may pick up his e-mail in many ways, for example using the Internet Message

Access Protocol, by logging into mx.b.org and reading it directly, or by using a webmail

service.

Domains usually have several mail exchange servers so that they can continue to

accept mail when the main mail exchange server is not available.

Emails are not secure if email encryption is not used correctly.

It used to be the case that many MTAs would accept messages for any recipient on the

Internet and do their best to deliver them. Such MTAs are called open mail relays. This

was important in the early days of the Internet when network connections were

unreliable. If an MTA couldn't reach the destination, it could at least deliver it to a relay

that was closer to the destination. The relay would have a better chance of delivering the

message at a later time. However, this mechanism proved to be exploitable by people

sending unsolicited bulk e-mail and as a consequence very few modern MTAs are open

mail relays, and many MTAs will not accept messages from open mail relays because

such messages are very likely to be spam.


INTERNET E-MAIL FORMAT

The format of Internet e-mail messages is defined in RFC 2822 and a series of

RFCs, RFC 2045 through RFC 2049, collectively called Multipurpose Internet Mail

Extensions (MIME). Although as of July 13, 2005 (see [3]) RFC 2822 is technically a

proposed IETF standard and the MIME RFCs are draft IETF standards, these documents

are the de facto standards for the format of Internet e-mail. Prior to the introduction of

RFC 2822 in 2001 the format described by RFC 822 was the de facto standard for

Internet e-mail for nearly two decades; it is still the official IETF standard. The IETF

reserved the numbers 2821 and 2822 for the updated versions of RFC 821 (SMTP) and

RFC 822, honoring the extreme importance of these two RFCs. RFC 822 was published

in 1982 and based on the earlier RFC 733.

Internet e-mail messages consist of two major sections:

Header - Structured into fields such as summary, sender, receiver, and other

information about the e-mail

Body - The message itself as unstructured text; sometimes containing a signature

block at the end

The header is separated from the body by a blank line.

INTERNET E-MAIL HEADER

The message header consists of fields, usually including at least the following:

From: The e-mail address, and optionally name, of the sender of the message

To: The e-mail address[es], and optionally name[s], of the receiver[s] of the message

Subject: A brief summary of the contents of the message

Date: The local time and date when the message was originally sent

Each header field has a name and a value. RFC 2822 specifies the precise syntax.

Informally, the field name starts in the first character of a line, followed by a ":",

followed by the value which is continued on non-null subsequent lines that have a space


or tab as their first character. Field names and values are restricted to 7-bit ASCII

characters. Non-ASCII values may be represented using MIME encoded words.
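
The header/body structure can be seen by parsing a small hand-written message with the standard email package; the message text is made up for illustration:

from email import message_from_string

raw = (
    "From: Alice <alice@a.org>\r\n"
    "To: Bob <bob@b.org>\r\n"
    "Subject: Lunch?\r\n"
    "Date: Thu, 01 Jan 2004 12:00:00 +0000\r\n"
    "\r\n"                      # the blank line separates header from body
    "See you at noon.\r\n"
)

msg = message_from_string(raw)
print(msg["Subject"])           # header fields are accessed by name
print(msg.get_payload())        # the unstructured body text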

Note that the "To" field in the header is not necessarily related to the addresses to which

the message is delivered. The actual delivery list is supplied in the SMTP protocol, not

extracted from the header content. The "To" field is similar to the greeting at the top of a

conventional letter which is delivered according to the address on the outer envelope.

Also note that the "From" field does not have to be the real sender of the e-mail message.

It is very easy to fake the "From" field and let a message seem to be from any mail

address. It is possible to digitally sign e-mail, which is much harder to fake. Some

Internet service providers do not relay e-mail claiming to come from a domain not hosted

by them, but very few (if any) check to make sure that the person or even e-mail address

named in the "From" field is the one associated with the connection. Some internet

service providers apply e-mail authentication systems to e-mail being sent through their

MTA, to allow other MTAs to detect forged spam that might appear to come from them.

Cc: carbon copy

Received: Tracking information generated by mail servers that have previously handled a

message

Content-Type: Information about how the message has to be displayed, usually a MIME

type

Many e-mail clients present "Bcc" (Blind carbon copy, recipients not visible in the "To"

field) as a header field. Since the entire header is visible to all recipients, "Bcc" is not

included in the message header. Addresses added as "Bcc" are only added to the SMTP

delivery list, and do not get included in the message data.

IANA maintains a list of standard header fields.

E-MAIL CONTENT ENCODING


E-mail was originally designed for 7-bit ASCII. Much e-mail software is 8-bit

clean but must assume it will be communicating with 7-bit servers and mail readers. The

MIME standard introduced character set specifiers and two content transfer encodings to encode 8-bit data for transmission: quoted-printable for mostly 7-bit content with a few characters outside that range, and base64 for arbitrary binary data. The 8BITMIME

extension was introduced to allow transmission of mail without the need for these

encodings but many mail transport agents still don't support it fully. For international

character sets, Unicode is growing in popularity.
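
The two encodings can be compared with the standard library; the sample string is illustrative:

import base64
import quopri

data = "naïve café".encode("utf-8")
print(quopri.encodestring(data))   # quoted-printable: mostly readable 7-bit output
print(base64.b64encode(data))      # base64: compact encoding for arbitrary binary data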

SAVED MESSAGE FILENAME EXTENSION

Most, but not all, e-mail clients save individual messages as separate files, or

allow users to do so. Different applications save e-mail files with different filename

extensions.

.eml - This is the default e-mail extension for Mozilla Thunderbird and is used by

Microsoft Outlook Express.

.emlx - Used by Apple Mail.

.msg - Used by Microsoft Office Outlook.

MESSAGES AND MAILBOXES

Messages are exchanged between hosts using the Simple Mail Transfer Protocol

with software like Sendmail. Users can download their messages from servers with

standard protocols such as the POP or IMAP protocols, or, as is more likely in a large

corporate environment, with a proprietary protocol specific to Lotus Notes or Microsoft

Exchange Servers.

Mail can be stored either on the client, on the server side, or in both places.

Standard formats for mailboxes include Maildir and mbox. Several prominent e-mail

clients use their own proprietary format and require conversion software to transfer e-

mail between them.


When a message cannot be delivered, the recipient MTA must send a bounce

message back to the sender, indicating the problem.

SPAMMING AND E-MAIL WORMS

The usefulness of e-mail is being threatened by three phenomena: spamming,

phishing and e-mail worms.

Spamming is unsolicited commercial e-mail. Because of the very low cost of

sending e-mail, spammers can send hundreds of millions of e-mail messages each day

over an inexpensive Internet connection. Hundreds of active spammers sending this

volume of mail results in information overload for many computer users who receive tens

or even hundreds of junk messages each day.

E-mail worms use e-mail as a way of replicating themselves into vulnerable

computers. Although the first e-mail worm affected UNIX computers, the problem is

most common today on the more popular Microsoft Windows operating system.

The combination of spam and worm programs results in users receiving a constant

drizzle of junk e-mail, which reduces the usefulness of e-mail as a practical tool.

A number of anti-spam techniques mitigate the impact of spam. In the United

States, the U.S. Congress has also passed a law, the CAN-SPAM Act of 2003, attempting to regulate such e-mail. Australia also has very strict spam laws restricting the sending of spam from an Australian ISP (http://www.aph.gov.au/library/pubs/bd/2003-04/04bd045.pdf), but its impact has been minimal since most spam comes from regimes that seem reluctant to regulate the sending of spam.

PRIVACY PROBLEMS REGARDING E-MAIL


E-mail privacy, without some security precautions, can be compromised because

e-mail messages are generally not encrypted;

E-mail messages have to go through intermediate computers before reaching their

destination, meaning it is relatively easy for others to intercept and read messages;


Many Internet Service Providers (ISP) store copies of your e-mail messages on

their mail servers before they are delivered. The backups of these can remain for up to several months on their servers, even if you delete the messages from your mailbox;

The Received: headers and other information in the email can often identify the

sender, preventing anonymous communication.

There are cryptography applications that can serve as a remedy to one or more of

the above. For example, Virtual Private Networks or the Tor anonymity network can be

used to encrypt traffic from the user machine to a safer network while GPG, PGP or

S/MIME can be used for end-to-end message encryption, and SMTP STARTTLS or

SMTP over Transport Layer Security/Secure Sockets Layer can be used to encrypt

communications for a single mail hop between the SMTP client and the SMTP server.

Another risk is that e-mail passwords might be intercepted during sign-in. One

may use encrypted authentication schemes such as SASL to help prevent this.

FINGER

In computer networking, the Name/Finger protocol and the Finger user

information protocol are simple network protocols for the exchange of human-oriented

status and user information.

Name/Finger protocol

The Name/Finger protocol is based on Request for comments document 742

(December 1977) as an interface to the name and finger programs that provide status

reports on a particular computer system or a particular person at network sites. The finger

program was written in 1971 by Les Earnest who created the program to solve the need

of users who wanted information on other users of the network. Information on who is

logged-in was useful to check the availability of a person to meet.


Prior to the finger program, the only way to get this information was with a who

program that showed IDs and terminal line numbers for logged-in users, and people used

to run their fingers down the who list. Earnest named his program after this concept.

Finger user information protocol

The Finger user information protocol is based on RFC 1288 (The Finger User

Information Protocol, December 1991). Typically the server side of the protocol is

implemented by a program fingerd (for finger daemon), while the client side is

implemented by the name and finger programs which are supposed to return a friendly,

human-oriented status report on either the system at the moment or a particular person in

depth. There is no required format, and the protocol consists mostly of specifying a single

command line. It is most often implemented on Unix or Unix-like systems.

The program would supply information such as whether a user is currently

logged-on, e-mail address, full name etc. As well as standard user information, finger

displays the contents of the .project and .plan files in the user's home directory. Often these files (maintained by the user) contain either useful information about the user's current activities, or alternatively all manner of humor.
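
Because the protocol is just "send a name, read the reply" over TCP port 79, a minimal finger client is easy to sketch in Python; the user and host below are placeholders:

import socket

def finger(user, host, port=79):
    # RFC 1288: open a TCP connection, send the query followed by CRLF,
    # then read the free-form, human-oriented reply until the server closes.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall((user + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")

print(finger("someuser", "finger.example.org"))   # placeholder user and host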

SECURITY CONCERNS

Supplying such detailed information as e-mail addresses and full names was

considered acceptable and convenient in the early days of the Internet, but later was

considered questionable for privacy and security reasons. Finger information has been

frequently used by crackers as a way to initiate a social engineering attack on a

company's computer security system. By using a finger client to get a list of a company's

employee names, email addresses, phone numbers, and so on, a cracker can telephone or

email someone at a company requesting information while posing as another employee.

The finger daemon has also had several exploitable security holes which crackers have

used to break into systems. The Morris worm exploited an overflow vulnerability in

fingerd (among others) to spread.


For these reasons, while finger was widely used during the early days of the Internet, by the 1990s the vast majority of sites on the Internet no longer offered the service.

Notable exceptions include John Carmack and Justin Frankel, who until recently still

updated their status information occasionally. In late 2005, John Carmack switched to

using a blog, instead of his old .plan site.

FILE TRANSFER PROTOCOL (FTP)

FTP or File Transfer Protocol is used to connect two computers over the Internet

so that the user of one computer can transfer files and perform file commands on the

other computer.

Specifically, FTP is a commonly used protocol for exchanging files over any

network that supports the TCP/IP protocol (such as the Internet or an intranet). There are

two computers involved in an FTP transfer: a server and a client. The FTP server,

running FTP server software, listens on the network for connection requests from other

computers. The client computer, running FTP client software, initiates a connection to the

server. Once connected, the client can perform a number of file manipulation operations such as uploading files to the server, downloading files from the server, and renaming or deleting files on the server. Any software company or individual programmer is able to create

FTP server or client software because the protocol is an open standard. Virtually every

computer platform supports the FTP protocol. This allows any computer connected to a

TCP/IP based network to manipulate files on another computer on that network

regardless of which operating systems are involved (if the computers permit FTP access).

OVERVIEW

FTP runs exclusively over TCP. FTP servers by default listen on port 21 for

incoming connections from FTP clients. A connection to this port from the FTP Client

forms the control stream on which commands are passed to the FTP server from the FTP

client and on occasion from the FTP server to the FTP client. For the actual file transfer


to take place, a different connection is required which is called the data stream.

Depending on the transfer mode, the process of setting up the data stream is different.

In active mode, the FTP client opens a random port (> 1023), sends the FTP server the

random port number on which it is listening over the control stream and waits for a

connection from the FTP server. When the FTP server initiates the data connection to the

FTP client it binds the source port to port 20 on the FTP server.

In passive mode, the FTP Server opens a random port (> 1023), sends the FTP

client the port on which it is listening over the control stream and waits for a connection

from the FTP client. In this case the FTP client binds the source port of the connection to

a random port greater than 1023.
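
For illustration, Python's ftplib defaults to passive mode and can be switched explicitly; the server name and guest credentials are placeholders:

from ftplib import FTP

ftp = FTP("ftp.example.org")              # control connection to port 21
ftp.login("anonymous", "guest@example.org")
ftp.set_pasv(True)                        # True = passive (PASV), False = active (PORT)
print(ftp.nlst())                         # each listing or transfer opens a separate data connection
ftp.quit()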

While data is being transferred via the data stream, the control stream sits idle.

This can cause problems with large data transfers through firewalls which time out

sessions after lengthy periods of idleness. While the file may well be successfully

transferred, the control session can be disconnected by the firewall, causing an error to be

generated.

When FTP is used in a UNIX environment, there is an often-ignored but valuable

command, "reget" (meaning "get again") that will cause an interrupted "get" command to

be continued, hopefully to completion, after a communications interruption. The principle

is obvious—the receiving station has a record of what it got, so it can spool through the

file at the sending station and re-start at the right place for a seamless splice. The

converse would be "reput" but is not available. Again, the principle is obvious: The

sending station does not know how much of the file was actually received, so it would

not know where to start.

The objectives of FTP, as outlined by its RFC, are:

To promote sharing of files (computer programs and/or data).

To encourage indirect or implicit use of remote computers.

To shield a user from variations in file storage systems among different hosts.

To transfer data reliably and efficiently.

CRITICISMS OF FTP


Passwords and file contents are sent in clear text, which can be intercepted by

eavesdroppers. There are protocol enhancements that circumvent this.

Multiple TCP/IP connections are used, one for the control connection, and one for each

download, upload, or directory listing. Firewall software needs additional logic to

account for these connections.

It is hard to filter active mode FTP traffic on the client side by using a firewall,

since the client must open an arbitrary port in order to receive the connection. This

problem is largely resolved by using passive mode FTP.

It is possible to abuse the protocol's built-in proxy features to tell a server to send

data to an arbitrary port of a third computer; see FXP.

FTP is a high latency protocol due to the number of commands needed to initiate

a transfer.

No integrity check on the receiver side. If transfer is interrupted the receiver has

no way to know if the received file is complete or not. It is necessary to manage this

externally for example with MD5 sums or cyclic redundancy checking.

No error detection. FTP relies on the underlying TCP layer for error control,

which uses a weak checksum by modern standards.

No date/timestamp attribute transfer. Uploaded files are given a new current

timestamp, unlike other file transfer protocols such as SFTP, which allow attributes to be

included. There is no way in the standard FTP protocol to set the time-last-modified (or

time-created) datestamp that most modern filesystems preserve. There is a draft of a

proposed extension that adds new commands for this, but as of yet, most of the popular

FTP servers do not support it.

SECURITY PROBLEMS

The original FTP specification is an inherently insecure method of transferring

files because there is no method specified for transferring data in an encrypted fashion.

This means that under most network configurations, user names, passwords, FTP

commands and transferred files can be "sniffed" or viewed by anyone on the same

network using a packet sniffer. This is a problem common to many Internet protocol

specifications written prior to the creation of SSL such as HTTP, SMTP and Telnet. The

common solution to this problem is to use either SFTP (SSH File Transfer Protocol), or


FTPS (FTP over SSL), which adds SSL or TLS encryption to FTP as specified in RFC

4217.

FTP RETURN CODES

FTP server return codes indicate their status by the digits within them. A brief explanation of the various digits' meanings is given below; a small interpretive sketch follows the list:

1yz: Positive Preliminary reply. The action requested is being initiated but there

will be another reply before it begins.

2yz: Positive Completion reply. The action requested has been completed. The

client may now issue a new command.

3yz: Positive Intermediate reply. The command was successful, but a further

command is required before the server can act upon the request.

4yz: Transient Negative Completion reply. The command was not successful, but

the client is free to try the command again as the failure is only temporary.

5yz: Permanent Negative Completion reply. The command was not successful

and the client should not attempt to repeat it again.

x0z: The failure was due to a syntax error.

x1z: This response is a reply to a request for information.

x2z: This response is a reply relating to connection information.

x3z: This response is a reply relating to accounting and authorization.

x4z: Unspecified as yet

x5z: These responses indicate the status of the Server file system vis-a-vis the

requested transfer or other file system action
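
A small, hypothetical helper that interprets a reply code along the two digits described above:

FIRST_DIGIT = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative completion",
    "5": "permanent negative completion",
}
SECOND_DIGIT = {
    "0": "syntax",
    "1": "information",
    "2": "connections",
    "3": "accounting and authorization",
    "4": "unspecified",
    "5": "file system",
}

def describe(code):
    code = str(code)
    return (FIRST_DIGIT.get(code[0], "unknown") + " reply concerning "
            + SECOND_DIGIT.get(code[1], "unknown"))

print(describe(226))   # positive completion reply concerning connections
print(describe(550))   # permanent negative completion reply concerning file system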

ANONYMOUS FTP

Many sites that run FTP servers enable so-called "anonymous ftp". Under this

arrangement, users do not need an account on the server. The user name for anonymous

access is typically 'anonymous' or 'ftp'. This account does not need a password. Although

users are commonly asked to send their email addresses as their passwords for

authentication, usually there is trivial or no verification, depending on the FTP server and

its configuration. Internet Gopher has been suggested as an alternative to anonymous

FTP, as well as Trivial File Transfer Protocol.


While transferring data over the network, several data representations can be

used. The two most common transfer modes are:

ASCII mode

Binary mode

The two types differ in the way they send the data. When a file is sent using an

ASCII-type transfer, the individual letters, numbers, and characters are sent using their

ASCII character codes. The receiving machine saves these in a text file in the appropriate

format (for example, a Unix machine saves it in a Unix format, a Macintosh saves it in a

Mac format). Hence if an ASCII transfer is used it can be assumed plain text is sent,

which is stored by the receiving computer in its own format. Translating between text

formats entails substituting the end of line and end of file characters used on the source

platform with those on the destination platform, e.g. a Windows machine receiving a file

from a Unix machine will replace the line feeds with carriage return-line feed pairs.

ASCII transfer is also marginally faster, as the highest-order bit is dropped from each

byte in the file.[1]

Sending a file in binary mode is different. The sending machine sends each file bit

for bit and as such the recipient stores the bitstream as it receives it. Any form of data that

is not plain text will be corrupted if this mode is not used.

By default, most FTP clients use ASCII mode. Some clients try to determine the required

transfer-mode by inspecting the file's name or contents.

The FTP specifications also list the following transfer modes:

EBCDIC mode

Local mode

In practice, these additional transfer modes are rarely used. They are however still

used by some legacy mainframe systems.

FTP AND WEB BROWSERS

Most recent web browsers and file managers can connect to FTP servers, although

they may lack the support for protocol extensions such as FTPS. This allows

manipulation of remote files over FTP through an interface similar to that used for local

files. This is done via an FTP URL, which takes the form


ftp(s)://<ftpserveraddress>. A password can optionally be given in the URL,

e.g.:   ftp(s)://<login>:<password>@<ftpserveraddress>:<port>. Most web-browsers

require the use of passive mode FTP, which not all FTP servers are capable of handling.

Some browsers allow only the downloading of files, but offer no way to upload files to

the server.

FTP OVER SSH

FTP over SSH refers to the practice of tunneling a normal FTP session over an

SSH connection.

Because FTP uses multiple TCP connections (unusual for a TCP/IP protocol that

is still in use), it is particularly difficult to tunnel over SSH. With many SSH clients,

attempting to set up a tunnel for the control channel (the initial client-to-server

connection on port 21) will only protect that channel; when data is transferred, the FTP

software at either end will set up new TCP connections (data channels) which will bypass

the SSH connection, and thus have no confidentiality, integrity protection, etc.

If the FTP client is configured to use passive mode and to connect to a SOCKS server

interface that many SSH clients can present for tunnelling, it is possible to run all the FTP

channels over the SSH connection.

Otherwise, it is necessary for the SSH client software to have specific knowledge

of the FTP protocol, and monitor and rewrite FTP control channel messages and

autonomously open new forwardings for FTP data channels. Version 3 of SSH

Communications Security's software suite, and the GPL licensed FONC are two software

packages that support this mode.

FTP over SSH is sometimes referred to as secure FTP; this should not be

confused with other methods of securing FTP, such as with SSL/TLS (FTPS). Other

methods of transferring files using SSH that are not related to FTP include SFTP and

SCP; in each of these, the entire conversation (credentials and data) is always protected

by the SSH protocol.

AN OVERVIEW OF THE FILE TRANSFER PROTOCOL


The File Transfer Protocol (FTP) was one of the first efforts to create a standard

means of exchanging files over a TCP/IP network, so FTP has been around since the 1970s.  FTP was designed with as much flexibility as possible, so that it could be used over networks other than TCP/IP, and it was engineered to be capable of exchanging files with a broad variety of machines.

The base specification is RFC 959 and is dated October 1985.  There are some

additional RFCs relating to FTP, but it should be noted that even as of this writing (December 2001), most of the newer additions are not in widespread use.  The purpose

of this document is to provide general information about how the protocol works without

getting into too many technical details.  RFC 959 should be consulted for details on the

protocol.

Control Connection -- the conversation channel

The protocol can be thought of as interactive, because clients and servers actually

have a conversation where they authenticate themselves and negotiate file transfers.  In

addition, the protocol specifies that the client and server do not exchange data on the

conversation channel.  Instead, clients and servers negotiate how to send data files on

separate connections, with one connection for each data transfer.  Note that a directory

listing is considered a file transfer.

To illustrate, we'll just present (an admittedly contrived) example of how the FTP

would work between human beings rather than computer systems.  For our example, we'll

assume we have a client, Carl Clinton, who wishes to transfer files from Acme Mail

Service that manages his post office box.  Below is a transcript of a phone call between

Carl Clinton and Acme Mail Service.

Clinton: (Dials the phone number for the mail service)

Service: "Hello, this is the Acme Mail Service.  How may I help you today?"

Clinton: "Hello, this is Carl Clinton.  I would like to access mailbox number

MB1234."

Service: "OK, Mr. Clinton, I need to verify that you may access mailbox

MB1234.  What is your password?"

Clinton: "My password is QXJ4Z2AF."

Service: "Thank you Mr. Clinton, you may proceed."


Clinton: "For now, I'm only interested in looking at the bills and invoices, so

look at the folder marked "bills" in my mailbox."

Service: "OK."

Clinton: "Please prepare to have your assistant call my secretary at +1 402 555

1234."

Service: "OK."

Clinton: "Now call my secretary and tell him the names of all the items in the

bills folder of my mailbox.  Tell me when you have finished."

Server: "My assistant is calling your secretary now."

Server: "My assistant has sent the names of the items."

Clinton: (Receives the list from his secretary and notices a bill from Yoyodyne

Systems.)

"Please prepare to have your assistant send to my fax machine +1 402

555 7777."

Service: "OK."

Clinton: "Now fax a copy of the bill from Yoyodyne Systems."

Server: "My assistant is calling your fax machine now."

Server: "My assistant has finished faxing the item."

Clinton: "Thank you, that is all.  Good bye."

Server: "Goodbye."

Now let's look at how this same conversation would appear between computer systems

communicating with the FTP protocol over a TCP/IP connection.

Client: (connects to the FTP service at port 21 on the IP address 172.16.62.36)

Server: 220 Hello, this is the Acme Mail Service.

Client: USER MB1234

Server: 331 Password required to access user account MB1234.

Client: PASS QXJ4Z2AF   (note that this password is not encrypted; FTP is susceptible to eavesdropping!)

Server: 230 Logged in.

Client: CWD Bills   (change directory to "Bills")

Server: 250 "/home/MB1234/Bills" is new working directory.

Client: PORT 192,168,1,2,7,138   (the client wants the server to send to port number 1930 on IP address 192.168.1.2; in this case, 192.168.1.2 is the IP address of the client machine)

Server: 200 PORT command successful.

Client: LIST   (send the list of files in "Bills")

Server: 150 Opening ASCII mode data connection for /bin/ls.   (the server now connects out from its port 20 on 172.16.62.36 to port 1930 on 192.168.1.2)

Server: 226 Listing completed.   (that succeeded, so the data is now sent over the established data connection)

Client: PORT 192,168,1,2,7,139   (the client wants the server to send to port number 1931 on the client machine)

Server: 200 PORT command successful.

Client: RETR Yoyodyne.TXT   (download "Yoyodyne.TXT")

Server: 150 Opening ASCII mode data connection for Yoyodyne.TXT.   (the server now connects out from its port 20 on 172.16.62.36 to port 1931 on 192.168.1.2)

Server: 226 Transfer completed.   (that succeeded, so the data is now sent over the established data connection)

Client: QUIT

Server: 221 Goodbye.

When using FTP, users use FTP client programs rather than directly communicating with

the FTP server.  Here's our same example using the stock "ftp" program which is usually

installed as /usr/bin/ftp on UNIX systems (and FTP.EXE on Windows).  The items the

user types are in bold.

ksh$ /usr/bin/ftp

ftp> open ftp.acmemail.example.com

Connected to ftp.acmemail.example.com (172.16.62.36).

220 Hello, this is the Acme Mail Service.

Name (ftp.acmemail.example.com:root): MB1234

331 Password required to access user account MB1234.

Password: QXJ4Z2AF

230 Logged in.

ftp> cd Bills

250 "/home/MB1234/Bills" is new working directory.

ftp> ls

200 PORT command successful.

150 Opening ASCII mode data connection for /bin/ls.

-rw-r--r-- 1 ftpuser ftpusers 14886 Dec 3 15:22 Acmemail.TXT

-rw-r--r-- 1 ftpuser ftpusers 317000 Dec 4 17:40 Yoyodyne.TXT

226 Listing completed.

ftp> get Yoyodyne.TXT

local: Yoyodyne.TXT remote: Yoyodyne.TXT

200 PORT command successful.


150 Opening ASCII mode data connection for Yoyodyne.TXT.

226 Transfer completed.

317000 bytes received in 0.0262 secs (1.2e+04 Kbytes/sec)

ftp> quit

221 Goodbye.

As you can see, FTP is designed to allow users to browse the filesystem much like

you would with a regular UNIX login shell or MS-DOS command prompt.  This differs from other protocols that are transactional (e.g. HTTP), where a connection is established,

clients issue a single message to a server that replies with a single reply, and the

connection is closed.  On the other hand, client programs can be constructed to simulate a

transactional environment if they know in advance what they need to do.  In effect, FTP

is a stateful sequence of one or more transactions.

Command primitives, result codes and textual responses

The client is always responsible for initiating requests.  These requests are issued

with FTP command primitives, which are typically 3 or 4 characters each.  For example,

the command primitive to change the working directory is CWD.

The server replies are specially formatted to contain a 3-digit result code first, followed

by a space character, followed by descriptive text (there is also a format for multi-line

responses).  The protocol specifies that clients must only rely upon the numeric result

code, since the descriptive text is allowed to vary (with a few exceptions).  In practice,

the result text is often helpful for debugging, but is generally no longer useful for end

users.

AUTHENTICATION

Although it is not required by the protocol, in effect clients must always log in to the

FTP server with a username and password before the server will allow the client to access

the service.  

There is also a de facto standard for guest access, where "anonymous" (or "ftp")

is used as the username and an e-mail address is customarily used as the password, as a way for a polite netizen to let the server administrator know who is using the guest login. 

Because users do not want to divulge their e-mail addresses to protect against unsolicited


bulk e-mail, this has subsequently evolved to the point where the password is just some

arbitrary text.

TYPES OF DATA CONNECTIONS

The protocol has built-in support for different types of data transfers.  The two

mandated types are ASCII for text (specified by the client sending "TYPE A" to the

server), and "image" for binary data (specified by "TYPE I").

ASCII transfers are useful when the server machine and client machine have different

standards for text.  For example, MS-DOS and Microsoft Windows use a carriage return

and linefeed sequence to denote an end-of-line, but UNIX systems use just a linefeed. 

When ASCII transfers are specified, this enables a client to always be able to translate the

data into its own native text format.

Binary transfers can be used for any type of raw data that requires no translation.  

Client programs should use binary transfers unless they know that the file in question is

text.

The protocol does not have any advanced support for character sets for pathnames

or file contents.  There is no way to specify Unicode, for example.  For ASCII, it is 7-bit ASCII only.

Unfortunately, the burden of deciding what transfer type to use is left to the client,

unlike HTTP, which can inform the client what type of data is being sent.  Clients often

simply choose to transfer everything in binary, and perform any necessary translation

after the file is downloaded.  Additionally, binary transfers are inherently more efficient

to send over the network since the client and server do not need to perform on-the-fly

translation of the data.

It should be noted that ASCII transfers are mandated by the protocol as the default

transfer type unless the client requests otherwise!

The PORT and PASV conundrum -- Active and Passive data connections

Although it was purposely designed into the protocol as a feature, FTP's use of

separate data connections causes numerous problems for firewalls, routers, and proxies which want to restrict or delegate TCP connections, as well as for IP stacks which want to do dynamic stateful inspection of TCP connections.  


The protocol does not mandate a particular port number or a direction that a data

connection uses.  For example, the easy way out would have been for the protocol's

designers to mandate that all data connections must originate from the client machine and

terminate at port 20 on the server machine.

Instead, for maximum flexibility, the protocol allows the client to choose one of

two methods.  The first method, which we'll call "Active", is where the client requests

that the server originate a data connection and terminate at an IP address and port number

of the client's choosing.  The important thing to note here is that the server connects out

to the client.

Client: "Please connect to me at port 1931 on IP address 192.168.1.2, then

send the data."

Server: "OK"

Or, the client can request that the server assign an IP address and port number on the

server side and have the client originate a connection to the server address.  We call this

method "Passive" and note that the client connects out to the server.

Client: "Please tell me where I can get the data."

Server: "Connect to me at port 4023 on 172.16.62.36."

The active method uses the FTP command primitive PORT, so the first example using

the actual FTP protocol would resemble this:

Client: PORT 192,168,1,2,7,139

Server: 200 PORT command successful.

The passive method uses the FTP command primitive PASV, so the second example

using the actual FTP protocol would resemble this:

Client: PASV

Server: 227 Entering Passive Mode (172,16,62,36,133,111)

It should be noted that FTP servers are required to implement PORT, but are not required

to implement PASV.  The default has traditionally been PORT for this reason, but in

practice it is now preferred to use PASV whenever possible because firewalls may be

present on the client side which often cause problems.
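
The PORT argument encodes the IPv4 address and the port as six decimal numbers h1,h2,h3,h4,p1,p2, where the port is p1*256 + p2. A small round-trip sketch of that arithmetic:

def to_port_arg(ip, port):
    # 192.168.1.2 and port 1931 become "192,168,1,2,7,139" (7*256 + 139 = 1931)
    return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])

def from_port_arg(arg):
    parts = arg.split(",")
    return ".".join(parts[:4]), int(parts[4]) * 256 + int(parts[5])

print(to_port_arg("192.168.1.2", 1931))     # -> 192,168,1,2,7,139
print(from_port_arg("192,168,1,2,7,138"))   # -> ('192.168.1.2', 1930)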

Partial data connections -- resuming downloads


The protocol provides a means to only transfer a portion of a file, by having a

client specify a starting offset into the file (using the REST primitive, i.e. "restart point"). 

If an FTP session fails while a data transfer is in progress and has to be reestablished, a

client can request that the server restart the transfer at the offset the client specifies.  Note

that not all FTP servers support this feature.
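
Python's ftplib exposes REST through the rest= argument of retrbinary; this sketch resumes a download from wherever a local partial file left off (host, credentials and filename are placeholders, and the server must support REST):

import os
from ftplib import FTP

local = "Yoyodyne.TXT"
offset = os.path.getsize(local) if os.path.exists(local) else 0

ftp = FTP("ftp.example.org")
ftp.login("anonymous", "guest@example.org")
with open(local, "ab") as f:                        # append to the partial file
    ftp.retrbinary("RETR Yoyodyne.TXT", f.write, rest=offset)
ftp.quit()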

DIRECTORY LISTINGS

The base standard of the FTP protocol provides two types of listings, a simple

name list (NLST) and a human-readable extended listing (LIST).  The name list consists

of lines of text, where each line contains exactly one file name and nothing else.

The extended listing is not intended to be machine-readable and the protocol does not

mandate any particular format.  The de facto standard is the UNIX "/bin/ls -l" format; most servers try to emulate that format even on non-UNIX systems, but it is still common for servers to provide their own proprietary format.  The

important thing to note here is that this listing can contain any type of data and cannot be

relied upon.  Additionally, even those that appear in "/bin/ls -l" format cannot be relied

upon for the validity of the fields.  For example the date and time could be in local time

or GMT.

Newer FTP server implementations support a machine-readable listing primitive

(MLSD) which is suitable for client programs to get reliable metadata information about

files, but this feature is still relatively rare.  That leaves the simple name list as the only

reliable way to get filenames, but it doesn't tell a client program anything else (such as if

the item is a file or a directory!).
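
The three listing styles can be compared from ftplib; the host and credentials are placeholders, and MLSD will simply fail on servers that do not implement it:

from ftplib import FTP, error_perm

ftp = FTP("ftp.example.org")
ftp.login("anonymous", "guest@example.org")

print(ftp.nlst())              # NLST: one bare filename per line, reliable but minimal
ftp.retrlines("LIST")          # LIST: human-readable, format not guaranteed
try:
    for name, facts in ftp.mlsd():   # MLSD: machine-readable name + "facts" dictionary
        print(name, facts.get("type"), facts.get("size"))
except error_perm:
    print("server does not support MLSD")
ftp.quit()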

FUNCTIONAL CONCERNS

Despite a rich feature set, there are some glaring omissions.  For example, the

base specification doesn't even provide for clients to query a file's size or modification

date.  However, most FTP servers in use now support a de facto extension to the

specification which provides the SIZE and MDTM primitives, and even newer servers

support the extremely useful MLSD and MLST primitives which can provide a wealth of

information in a standardized format.

There is also no 100% accurate way for a client to determine if a particular

pathname refers to a file or directory, unless MLSD or MLST is available.  Since the


protocol also does not provide a way to transfer an entire directory of items at once, the

consequence is that there is no 100% accurate way to download an entire directory tree.

The end result is that FTP is not particularly suited to "mirroring" files and directories,

although FTP client programs use heuristics to make calculated guesses when possible.

Despite the guesswork that clients can use for determining metadata for files to

download, there's little they can do for files that they upload.  There is no standard way to

preserve an uploaded file's modification time.  FTP is platform agnostic, so there aren't

standard ways to preserve platform-specific metadata such as UNIX permissions and user

IDs or Mac OS file type and creator codes.

Separate connections for data transfers are also a mixed blessing.  For high

performance it would be best to use a single connection and perform multiple data

transfers before closing it.  Even better would be for a method to use a single connection

for both the control connection conversation and data transfers.  Since each data

connection uses an ephemeral (random) port number, it is possible to "run out" of

connections.  For details on this phenomenon, a separate article is available.

SECURITY CONCERNS

It is important to note that the base specification, as implemented by the vast

majority of the world's FTP servers, does not have any special handling for encrypted

communication of any kind.  When clients log in to FTP servers, they are sending clear

text usernames and passwords!  This means that anyone with a packet sniffer between the

client and server could surreptitiously steal passwords.  

Besides passwords, potential attackers could not only monitor the entire

conversation on the FTP control connection, they could also monitor the contents of the

data transfers themselves. There have been proposals to make the FTP protocol more

secure, but these proposals have not seen widespread adoption.

Therefore, unless the IP protocol layer itself is secure (for example, encrypted

using IPsec), FTP should not be used if sensitive login information is to be exchanged

over an insecure network, or if the files containing sensitive material are being transferred

over an insecure network.


HISTORY OF INTERNET

The History of the Internet dates back to the early development of communication

networks. The idea of a computer network intended to allow general communication

among users of various computers has developed through a large number of stages. The

melting pot of developments brought together the network of networks that we know as

the Internet. This included both technological developments and the merging together of

existing network infrastructure and telecommunication systems.

The infrastructure of the Internet spread across the globe to create the world wide

network of computers we know today. It spread throughout the Western countries before

entering the developing countries, thus creating both unprecedented worldwide access to

information and communications and a digital divide in access to this new infrastructure.

The Internet went on to fundamentally alter the world economy, including the economic

implications of the dot-com bubble.


In the fifties and early sixties, prior to the widespread inter-networking that led to

the Internet, most communication networks were limited by their nature to only allow

communications between the stations on the network. Some networks had gateways or

bridges between them, but these bridges were often limited or built specifically for a


single use. One prevalent computer networking method was based on the central

mainframe method, simply allowing its terminals to be connected via long leased lines.

This method was used in the 1950s by Project RAND to support researchers such as

Herbert Simon, in Pittsburgh, Pennsylvania, when collaborating across the continent with

researchers in Santa Monica, California, on automated theorem proving and artificial

intelligence.

THREE TERMINALS AND AN ARPA

A fundamental pioneer in the call for a global network, J.C.R. Licklider,

articulated the idea in his January 1960 paper, Man-Computer Symbiosis.

"a network of such [computers], connected to one another by wide-band communication

lines" which provided "the functions of present-day libraries together with anticipated

advances in information storage and retrieval and [other] symbiotic functions. "—

J.C.R. Licklider

In October 1962, Licklider was appointed head of the United States Department

of Defense's DARPA information processing office, and formed an informal group

within DARPA to further computer research. As part of the information processing

office's role, three network terminals had been installed: one for System Development

Corporation in Santa Monica, one for Project Genie at the University of California,

Berkeley, and one for the Multics project at the Massachusetts Institute of

Technology (MIT). Licklider's need for inter-networking would be made evident by the

problems this caused.

"For each of these three terminals, I had three different sets of user commands.

So if I was talking online with someone at S.D.C. and I wanted to talk to

someone I knew at Berkeley or M.I.T. about this, I had to get up from the

S.D.C. terminal, go over and log into the other terminal and get in touch with

them.

I said, oh, my goodness gracious me, it's obvious what to do (But I don't want

to do it): If you have these three terminals, there ought to be one terminal that

goes anywhere you want to go where you have interactive computing. That idea

is the ARPANET."


-Robert W. Taylor, co-writer with Licklider of "The Computer as a

Communications Device", in an interview with the New York Times

SWITCHED PACKETS

At the core of the inter-networking problem lay the issue of connecting separate

physical networks to form one logical network. During the 1960s, Donald Davies (NPL),

Paul Baran (RAND Corporation), and Leonard Kleinrock (MIT) developed and

implemented packet switching. The notion that the Internet was developed to survive a

nuclear attack has its roots in the early theories developed by RAND. Baran's research

had approached packet switching from studies of decentralisation to avoid combat

damage compromising the entire network.

Networks that led to the Internet

As head of the information processing office at ARPA, Robert Taylor intended to realize Licklider's ideas of an

interconnected networking system. Bringing in Larry Roberts from MIT, he initiated a

project to build such a network. The first ARPANET link was established between the

University of California, Los Angeles and the Stanford Research Institute on 21

November 1969. By 5 December 1969, a 4-node network was connected by adding the

University of Utah and the University of California, Santa Barbara. Building on ideas

developed in ALOHAnet, the ARPANET grew rapidly. By 1981, the number of hosts

had grown to 213, with a new host being added approximately every

twenty days.

ARPANET became the technical core of what would become the Internet, and a

primary tool in developing the technologies used. ARPANET development was centered

around the Request for Comments (RFC) process, still used today for proposing and

distributing Internet Protocols and Systems. RFC 1, entitled "Host Software", was written

by Steve Crocker from the University of California, Los Angeles, and published on April

7, 1969. These early years were documented in the 1972 film Computer Networks: The

Heralds of Resource Sharing.

International collaborations on ARPANET were sparse. For various political

reasons, European developers were concerned with developing the X.25 networks.


Notable exceptions were the Norwegian Seismic Array (NORSAR) in 1972, followed in

1973 by Sweden with satellite links to the Tanum Earth Station and University College

London.

X.25 AND PUBLIC ACCESS

Following on from DARPA's research, packet switching network standards were

developed by the International Telecommunication Union (ITU) in the form of X.25 and

related standards. In 1974, X.25 formed the basis for the SERCnet network between British

academic and research sites, which later became JANET. The initial ITU Standard on

X.25 was approved in March 1976. This standard was based on the concept of virtual

circuits.

The British Post Office, Western Union International and Tymnet collaborated to

create the first international packet switched network, referred to as the International

Packet Switched Service (IPSS), in 1978. This network grew from Europe and the US to

cover Canada, Hong Kong and Australia by 1981. By the 1990s it provided a worldwide

networking infrastructure.

Unlike ARPANET, X.25 was also commonly available for business use. X.25 would

be used for the first dial-in public access networks, such as CompuServe and Tymnet. In

1979, CompuServe became the first service to offer electronic mail capabilities and

technical support to personal computer users. The company broke new ground again in

1980 as the first to offer real-time chat with its CB Simulator. There were also the

America Online (AOL) and Prodigy dial in networks and many bulletin board system

(BBS) networks such as The WELL and FidoNet. FidoNet in particular was popular

amongst hobbyist computer users, many of them hackers and amateur radio operators.

UUCP

In 1979, two students at Duke University, Tom Truscott and Jim Ellis, came up

with the idea of using simple Bourne shell scripts to transfer news and messages on a

serial line with nearby University of North Carolina at Chapel Hill. Following public

release of the software, the mesh of UUCP hosts forwarding on the Usenet news rapidly

expanded. UUCPnet, as it would later be named, also created gateways and links between


FidoNet and dial-up BBS hosts. UUCP networks spread quickly due to the lower costs

involved, and ability to use existing leased lines, X.25 links or even ARPANET

connections. By 1983 the number of UUCP hosts had grown to 550, nearly doubling to

940 in 1984.

MERGING THE NETWORKS AND CREATING THE INTERNET

With so many different network methods, something needed to unify them.

Robert E. Kahn of DARPA and ARPANET recruited Vint Cerf of Stanford University to

work with him on the problem. By 1973, they had worked out a fundamental

reformulation, where the differences between network protocols were hidden by using a

common internetwork protocol, and instead of the network being responsible for

reliability, as in the ARPANET, the hosts became responsible. Cerf credits Hubert

Zimmermann, Gérard Le Lann and Louis Pouzin (designer of the CYCLADES network)

with important work on this design. With the role of the network reduced to the bare

minimum, it became possible to join almost any networks together, no matter what their

characteristics were, thereby solving Kahn's initial problem. DARPA agreed to fund

development of prototype software, and after several years of work, the first somewhat

crude demonstration of a gateway between the Packet Radio network in the SF Bay area

and the ARPANET was conducted. By November 1977 a three network demonstration

was conducted including the ARPANET, the Packet Radio Network and the Atlantic

Packet Satellite network—all sponsored by DARPA. Stemming from the first

specifications of TCP in 1974, TCP/IP emerged in mid-late 1978 in nearly final form. By

1981, the associated standards were published as RFCs 791, 792 and 793 and adopted for

use. DARPA sponsored or encouraged the development of TCP/IP implementations for

many operating systems and then scheduled a migration of all hosts on all of its packet

networks to TCP/IP. On 1 January 1983, TCP/IP protocols became the only approved

protocol on the ARPANET, replacing the earlier NCP protocol.

ARPANET TO NSFNET

After the ARPANET had been up and running for several years, ARPA looked for

another agency to hand off the network to; ARPA's primary business was funding


cutting-edge research and development, not running a communications utility.

Eventually, in July 1975, the network had been turned over to the Defense

Communications Agency, also part of the Department of Defense. In 1983, the U.S.

military portion of the ARPANET was broken off as a separate network, the MILNET.

The networks based around the ARPANET were government funded and

therefore restricted to noncommercial uses such as research; unrelated commercial use

was strictly forbidden. This initially restricted connections to military sites and

universities. During the 1980s, the connections expanded to more educational

institutions, and even to a growing number of companies such as Digital Equipment

Corporation and Hewlett-Packard, which were participating in research projects or

providing services to those who were.

Another branch of the U.S. government, the National Science Foundation (NSF),

became heavily involved in internet research and started development of a successor to

ARPANET. In 1984 this resulted in CSNET, the first Wide Area Network designed

specifically to use TCP/IP. CSNET connected with ARPANET using TCP/IP, and ran

TCP/IP over X.25, but it also supported departments without sophisticated network

connections, using automated dial-up mail exchange. This grew into the NSFNET backbone,

established in 1986, and intended to connect and provide access to a number of

supercomputing centers established by the NSF.

THE TRANSITION TOWARD AN INTERNET

The term "Internet" was adopted in the first RFC published on the TCP protocol

Internet Transmission Control Protocol). It was around the time when ARPANET was

interlinked with nsfnet, that the term Internet came into more general use, with "an

internet" meaning any network using TCP/IP. "The Internet" came to mean a global and

large network using TCP/IP, which at the time meant nsfnet and ARPANET. Previously

"internet" and "internetwork" had been used interchangeably, and "internet protocol" had

been used to refer to other networking systems such as Xerox Network Services.

As interest in widespread networking grew and new applications for it arrived,

the Internet's technologies spread throughout the rest of the world. TCP/IP's network-


agnostic approach meant that it was easy to use any existing network infrastructure, such

as the IPSS X.25 network, to carry Internet traffic. In 1984, University College London

replaced its transatlantic satellite links with TCP/IP over IPSS.

Many sites unable to link directly to the Internet started to create simple gateways

to allow transfer of e-mail, at that time the most important application. Sites which only

had intermittent connections used UUCP or fidonet and relied on the gateways between

these networks and the Internet. Some gateway services went beyond simple e-mail

peering, such as allowing access to FTP sites via UUCP or e-mail.

TCP/IP BECOMES WORLDWIDE

The first ARPANET connection outside the US was established to NORSAR in

Norway in 1973, just ahead of the connection to Great Britain. These links were all

converted to TCP/IP in 1982, at the same time as the rest of the ARPANET.

CERN, the European internet, the link to the Pacific and beyond.

Between 1984 and 1988 CERN began installation and operation of TCP/IP to

interconnect its major internal computer systems, workstations, PCs and an accelerator

control system. CERN continued to operate a limited self-developed system CERNET

internally and several incompatible (typically proprietary) network protocols externally.

There was considerable resistance in Europe towards more widespread use of TCP/IP and

the CERN TCP/IP intranets remained isolated from the rest of the Internet until 1989.

In 1988 Daniel Karrenberg, from CWI in Amsterdam, visited Ben Segal, CERN's

TCP/IP Coordinator, looking for advice about the transition of the European side of the

UUCP Usenet network (much of which ran over X.25 links) over to TCP/IP. In 1987,

Ben Segal had met with Len Bosack from the then still small company Cisco about

purchasing some TCP/IP routers for CERN, and was able to give Karrenberg advice and

forward him on to Cisco for the appropriate hardware. This expanded the European

portion of the Internet across the existing UUCP networks, and in 1989 CERN opened its

first external TCP/IP connections. This coincided with the creation of Réseaux IP

Européens (RIPE), initially a group of IP network administrators who met regularly to


carry out co-ordination work together. Later, in 1992, RIPE was formally registered as a

cooperative in Amsterdam.

At the same time as the rise of internetworking in Europe, ad hoc networking to

ARPA and between Australian universities formed, based on various technologies

such as X.25 and UUCPnet. These were limited in their connection to the global networks,

due to the cost of making individual international UUCP dial-up or X.25 connections. In

1989, Australian universities joined the push towards using IP protocols to unify their

networking infrastructures. AARNet was formed in 1989 by the Australian Vice-

Chancellors' Committee and provided a dedicated IP based network for Australia.

The Internet began to penetrate Asia in the late 1980s. Japan, which had built the UUCP-

based network JUNET in 1984, connected to NSFNET in 1989. It hosted the annual meeting

of the Internet Society, INET'92, in Kobe. Singapore developed TECHNET in 1990, and

Thailand gained a global Internet connection between Chulalongkorn University and

UUNET in 1992.

A DIGITAL DIVIDE

While developed countries with technological infrastructures were joining the

Internet, developing countries began to experience a digital divide separating them from

the Internet. At the beginning of the 1990s, African countries relied upon X.25 IPSS and

2400 baud modem UUCP links for international and internetwork computer

communications. In 1996 a USAID-funded project, the Leland Initiative, started work on

developing full Internet connectivity for the continent. Guinea, Mozambique, Madagascar

and Rwanda gained satellite earth stations in 1997, followed by Côte d'Ivoire and Benin

in 1998.

In 1991, the People's Republic of China saw its first TCP/IP college network,

Tsinghua University's TUNET. The PRC went on to make its first global Internet

connection in 1994, between the Beijing Electro-Spectrometer Collaboration and

Stanford University's Linear Accelerator Center. However, China went on to implement

its own digital divide by implementing a country-wide content filter.


OPENING THE NETWORK TO COMMERCE

The interest in commercial use of the Internet became a hotly debated topic.

Although commercial use was forbidden, the exact definition of commercial use could be

unclear and subjective. UUCPnet and the X.25 IPSS had no such restrictions, which would

eventually see the official barring of UUCPnet use of ARPANET and NSFNET connections.

Some UUCP links still remained connected to these networks, however, as administrators

turned a blind eye to their operation.

During the late 1980s, the first Internet service provider (ISP) companies were

formed. Companies like PSINet, UUNET, Netcom, and Portal Software were formed to

provide service to the regional research networks and provide alternate network access,

UUCP-based email and Usenet News to the public. The first dial-up ISP, world.std.com,

opened in 1989.

This caused controversy amongst university users, who were outraged at the idea

of noneducational use of their networks. Eventually, it was the commercial Internet

service providers who brought prices low enough that junior colleges and other schools

could afford to participate in the new arenas of education and research.

By 1990, ARPANET had been overtaken and replaced by newer networking technologies

and the project came to a close. In 1994, the NSFNET, now renamed ANSNET (Advanced

Networks and Services) and allowing non-profit corporations access, lost its standing as

the backbone of the Internet. Both government institutions and competing commercial

providers created their own backbones and interconnections. Regional network access

points (NAPs) became the primary interconnections between the many networks and the

final commercial restrictions ended.

THE IETF AND A STANDARD FOR STANDARDS

The Internet has developed a significant subculture dedicated to the idea that the

Internet is not owned or controlled by any one person, company, group, or organization.

Nevertheless, some standardization and control is necessary for the system to function.


The liberal Request for Comments (RFC) publication procedure engendered confusion

about the Internet standardization process, and led to more formalization of official

accepted standards. The IETF started in January of 1986 as a quarterly meeting of U.S.

government funded researchers. Representatives from non-government vendors were

invited starting with the fourth IETF meeting in October of that year.

Acceptance of an RFC by the RFC Editor for publication does not automatically

make the RFC into a standard. It may be recognized as such by the IETF only after

experimentation, use, and acceptance have proved it to be worthy of that designation.

Official standards are numbered with a prefix "STD" and a number, similar to the RFC

naming style. However, even after becoming a standard, most are still commonly referred

to by their RFC number.

In 1992, the Internet Society, a professional membership society, was formed and

the IETF was transferred to operation under it as an independent international standards

body.

NIC, INTERNIC, IANA AND ICANN

The first central authority to coordinate the operation of the network was the

Network Information Centre (NIC) at Stanford Research Institute (SRI) in Menlo Park,

California. In 1972, management of these issues was given to the newly created Internet

Assigned Numbers Authority (IANA). In addition to his role as the RFC Editor, Jon

Postel worked as the manager of IANA until his death in 1998.

As the early ARPANET grew, hosts were referred to by names, and a

HOSTS.TXT file would be distributed from SRI International to each host on the

network. As the network grew, this became cumbersome. A technical solution came in

the form of the Domain Name System, created by Paul Mockapetris. The Defense Data

Network—Network Information Center (DDN-NIC) at SRI handled all registration

services, including the top-level domains (TLDs) of .mil, .gov, .edu, .org, .net, .com and .us,

root nameserver administration and Internet number assignments under a United States

Department of Defense contract. In 1991, the Defense Information Systems Agency

(DISA) awarded the administration and maintenance of DDN-NIC (managed by SRI up


until this point) to Government Systems, Inc., who subcontracted it to the small private-

sector Network Solutions, Inc. Since at this point in history most of the growth on the

Internet was coming from non-military sources, it was decided that the Department of

Defense would no longer fund registration services outside of the .mil TLD. In 1993 the

U.S. National Science Foundation, after a competitive bidding process in 1992, created

the InterNIC to manage the allocation of addresses and management of the address

databases, and awarded the contract to three organizations. Registration Services would

be provided by Network Solutions; Directory and Database Services would be provided

by AT&T; and Information Services would be provided by General Atomics.

In 1998 both IANA and InterNIC were reorganized under the control of ICANN, a

California non-profit corporation contracted by the US Department of Commerce to

manage a number of Internet-related tasks. The role of operating the DNS system was

privatized and opened up to competition, while the central management of name

allocations would be awarded on a contract tender basis.

USE AND CULTURE

Email and Usenet—The growth of the text forum

E-mail is often called the killer application of the Internet. However, it actually

predates the Internet and was a crucial tool in creating it. E-mail started in 1965 as a way

for multiple users of a time-sharing mainframe computer to communicate. Although the

history is unclear, among the first systems to have such a facility were SDC's Q32 and

MIT's CTSS.

The ARPANET computer network made a large contribution to the evolution of

e-mail. There is one report indicating experimental inter-system e-mail transfers on it

shortly after ARPANET's creation. In 1971 Ray Tomlinson created what was to become

the standard Internet e-mail address format, using the @ sign to separate user names from

host names.

A number of protocols were developed to deliver e-mail among groups of time-

sharing computers over alternative transmission systems, such as UUCP and IBM's

VNET e-mail system. E-mail could be passed this way between a number of networks,


including ARPANET, BITNET and NSFNET, as well as to hosts connected directly to other

sites via UUCP.

In addition, UUCP allowed the publication of text files that could be read by

many others. The News software developed by Steve Daniel and Tom Truscott in 1979

was used to distribute news and bulletin board-like messages. This quickly grew into

discussion groups, known as newsgroups, on a wide range of topics. On ARPANET and

NSFNET, similar discussion groups would form via mailing lists, discussing both technical

issues and more culturally focused topics (such as science fiction, discussed on the

sflovers mailing list).

A WORLD LIBRARY—FROM GOPHER TO THE WWW

As the Internet grew through the 1980s and early 1990s, many people realized the

increasing need to be able to find and organize files and information. Projects such as

Gopher, WAIS, and the FTP Archive list attempted to create ways to organize distributed

data. Unfortunately, these projects fell short in being able to accommodate all the existing

data types and in being able to grow without bottlenecks.

One of the most promising user interface paradigms during this period was

hypertext. The technology had been inspired by Vannevar Bush's "memex" and

developed through Ted Nelson's research on Project Xanadu and Douglas Engelbart's

research on NLS. Many small self-contained hypertext systems had been created before,

such as Apple Computer's HyperCard.

In 1991, Tim Berners-Lee was the first to develop a network-based

implementation of the hypertext concept. This was after Berners-Lee had repeatedly

proposed his idea to the hypertext and Internet communities at various conferences to no

avail—no one would implement it for him. Working at CERN, Berners-Lee wanted a

way to share information about their research. By releasing his implementation to public

use, he ensured the technology would become widespread. Subsequently, Gopher became

the first commonly-used hypertext interface to the Internet. While Gopher menu items

were examples of hypertext, they were not commonly perceived in that way. One early

popular web browser, modeled after HyperCard, was ViolaWWW.


Scholars generally agree, however, that the turning point for the World Wide Web began

with the introduction of the Mosaic web browser in 1993, a graphical browser

developed by a team at the National Center for Supercomputing Applications at the

University of Illinois at Urbana-Champaign (NCSA-UIUC), led by Marc Andreessen.

Funding for Mosaic came from the High-Performance Computing and Communications

Initiative, a funding program initiated by then-Senator Al Gore's High Performance

Computing and Communication Act of 1991, also known as the Gore Bill. Indeed,

Mosaic's graphical interface soon became more popular than Gopher, which at the time

was primarily text-based, and the WWW became the preferred interface for accessing the

Internet.

Mosaic was eventually superseded in 1994 by Andreessen's Netscape Navigator,

which replaced Mosaic as the world's most popular browser. Competition from Internet

Explorer and a variety of other browsers has almost completely displaced it. Another

important event, held on January 11, 1994, was the Superhighway Summit at UCLA's

Royce Hall. This was the "first public conference bringing together all of the major

industry, government and academic leaders in the field [and] also began the national

dialogue about the Information Superhighway and its implications."

Finding what you need—The search engine

Even before the World Wide Web, there were search engines that attempted to

organize the Internet. The first of these was the Archie search engine from McGill

University in 1990, followed in 1991 by WAIS and Gopher. All three of those systems

predated the invention of the World Wide Web but all continued to index the Web and

the rest of the Internet for several years after the Web appeared. There are still Gopher

servers as of 2006, although there are a great many more web servers.

As the Web grew, search engines and Web directories were created to track pages

on the Web and allow people to find things. The first full-text Web search engine was

WebCrawler in 1994. Before WebCrawler, only Web page titles were searched. Another

early search engine, Lycos, was created in 1993 as a university project, and was the first

to achieve commercial success. During the late 1990s, both Web directories and Web

search engines were popular—Yahoo! (founded 1995) and AltaVista (founded 1995) were

the respective industry leaders.


By August 2001, the directory model had begun to give way to search engines,

tracking the rise of Google (founded 1998), which had developed new approaches to

relevancy ranking. Directory features, while still commonly available, became after-

thoughts to search engines.

Database size, which had been a significant marketing feature through the early

2000s, was similarly displaced by emphasis on relevancy ranking, the methods by which

search engines attempt to sort the best results first. Relevancy ranking first became a

major issue circa 1996, when it became apparent that it was impractical to review full

lists of results. Consequently, algorithms for relevancy ranking have continuously

improved. Google's PageRank method for ordering the results has received the most press,

but all major search engines continually refine their ranking methodologies with a view

toward improving the ordering of results. As of 2006, search engine rankings are more

important than ever, so much so that an industry has developed ("search engine

optimizers", or "SEO") to help web-developers improve their search ranking, and an

entire body of case law has developed around matters that affect search engine rankings,

such as use of trademarks in metatags. The sale of search rankings by some search

engines has also created controversy among librarians and consumer advocates.

THE DOT-COM BUBBLE

The suddenly low price of reaching millions worldwide, and the possibility of

selling to or hearing from those people at the same moment when they were reached,

promised to overturn established business dogma in advertising, mail-order sales,

customer relationship management, and many more areas. The web was a new killer app

—it could bring together unrelated buyers and sellers in seamless and low-cost ways.

Visionaries around the world developed new business models, and ran to their nearest

venture capitalist. Of course a proportion of the new entrepreneurs were truly talented at

business administration, sales, and growth; but the majority were just people with ideas,

and didn't manage the capital influx prudently. Additionally, many dot-com business

plans were predicated on the assumption that by using the Internet, they would bypass the

distribution channels of existing businesses and therefore not have to compete with them;


when the established businesses with strong existing brands developed their own Internet

presence, these hopes were shattered, and the newcomers were left attempting to break

into markets dominated by larger, more established businesses. Many did not have the

ability to do so.

The dot-com bubble burst on March 10, 2000, when the technology heavy

NASDAQ Composite index peaked at 5048.62 (intra-day peak 5132.52), more than

double its value just a year before. By 2001, the bubble's deflation was running full

speed. A majority of the dot-coms had ceased trading, after having burnt through their

venture capital, often without ever making a gross profit.

RECENT TRENDS

The World Wide Web has led to a widespread culture of individual self

publishing and co-operative publishing. The moment-to-moment accounts of blogs, photo

publishing on Flickr and the information store of Wikipedia are a result of the open ease of

creating a public website. In addition, the communication capabilities of the internet are

being realised with VoIP telephone services such as Skype, Vonage, or ViaTalk.

Increasingly complex on-demand content provision has led to the delivery of all forms

of media, including those that had been found in the traditional media forms of

newspapers, radio, television and movies, via the Internet. The Internet's peer-to-peer

structure has also influenced social and economic theory, most notably with the rise of

file sharing.

HYPERTEXT TRANSFER PROTOCOL

HTTP is a method used to transfer or convey information on the World Wide

Web. Its original purpose was to provide a way to publish and retrieve HTML pages.

Development of HTTP was coordinated by the World Wide Web Consortium and the

Internet Engineering Task Force, culminating in the publication of a series of RFCs, most


notably RFC 2616 (1999), which defines HTTP/1.1, the version of HTTP in common use

today.

HTTP is a request/response protocol between clients and servers. The originating

client, such as a web browser, spider, or other end-user tool, is referred to as the user

agent. The destination server, which stores or creates resources such as HTML files and

images, is called the origin server. In between the user agent and origin server may be

several intermediaries, such as proxies, gateways, and tunnels.

An HTTP client initiates a request by establishing a Transmission Control

Protocol (TCP) connection to a particular port on a remote host (port 80 by default).

An HTTP server listening on that port waits for the

client to send a request message.

Upon receiving the request, the server sends back a status line, such as "HTTP/1.1

200 OK", and a message of its own, the body of which is perhaps the requested file, an

error message, or some other information.

REQUEST MESSAGE

The request message consists of the following:

Request line, such as GET /images/logo.gif HTTP/1.1, which requests the file logo.gif

from the /images directory

Headers, such as Accept-Language: en

An empty line

An optional message body

The request line and headers must all end with CRLF (i.e. a carriage return followed by a

line feed). The empty line must consist of only CRLF and no other whitespace.

In the HTTP/1.1 protocol, all headers except Host are optional.
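
As an illustration of this structure, the sketch below builds such a request by hand over a TCP socket with Python's standard library; www.example.com and the path are placeholders:

import socket

# Request line, headers and the terminating empty line, each ended with CRLF.
request = (
    "GET /index.html HTTP/1.1\r\n"   # request line
    "Host: www.example.com\r\n"      # the only mandatory header in HTTP/1.1
    "Connection: close\r\n"          # ask the server to close when it is done
    "\r\n"                           # empty line: end of the header section
)

with socket.create_connection(("www.example.com", 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while True:                      # read until the server closes the connection
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk

print(response.split(b"\r\n", 1)[0].decode())   # status line, e.g. "HTTP/1.1 200 OK"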

REQUEST METHODS

HTTP defines eight methods (sometimes referred to as "verbs") indicating the

desired action to be performed on the identified resource.

HEAD


Asks for a response identical to the one that would correspond to a GET

request, but without the response body. This is useful for retrieving meta-information

written in response headers, without having to transport the entire content.

GET

Requests a representation of the specified resource. By far the most common

method used on the Web today. Should not be used for operations that cause side-effects

(using it for actions in web applications is a common misuse). See 'safe methods' below.

POST

Submits data to be processed (e.g. from an HTML form) to the identified

resource. The data is included in the body of the request. This may result in the creation

of a new resource, the update of existing resources, or both.

PUT

Uploads a representation of the specified resource.

DELETE

Deletes the specified resource.

TRACE

Echoes back the received request, so that a client can see what intermediate

servers are adding or changing in the request.

OPTIONS

Returns the HTTP methods that the server supports. This can be used to check the

functionality of a web server.

CONNECT


For use with a proxy that can switch to being an SSL tunnel.

HTTP servers are supposed to implement at least the GET and HEAD methods and,

whenever possible, also the OPTIONS method.
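
A brief sketch of two of these methods in use with Python's http.client against a placeholder host; the headers actually returned (for example an Allow header in reply to OPTIONS) depend on the particular server:

from http.client import HTTPConnection

conn = HTTPConnection("www.example.com", 80)

# HEAD: the same response as GET would produce, but without the body.
conn.request("HEAD", "/index.html")
resp = conn.getresponse()
print("HEAD ->", resp.status, resp.reason, "Content-Length:", resp.getheader("Content-Length"))
resp.read()                       # drain the (empty) body so the connection can be reused

# OPTIONS: ask which methods the server supports; "*" refers to the server as a whole.
conn.request("OPTIONS", "*")
resp = conn.getresponse()
print("OPTIONS ->", resp.status, resp.reason, "Allow:", resp.getheader("Allow"))
conn.close()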

SAFE METHODS

Some methods (e.g. HEAD or GET) are defined as safe, which means they are

intended only for information retrieval and should not change the state of the server (in

other words, they should not have side effects). Unsafe methods (such as POST, PUT and

DELETE) should be displayed to the user in a special way, typically as buttons rather

than links, thus making the user aware of possible obligations (such as a button that

causes a financial transaction).

Despite the required safety of GET requests, in practice they can cause changes

on the server. For example, an HTML page may use a simple hyperlink to initiate

deletion of a domain database record, thus causing a change of the server's state as a side-

effect of a GET request. This is discouraged, because it can cause problems for Web

caching, search engines and other automated agents, who can make unintended changes

on the server. Another case is that a GET request may cause the server to create a cache

space. This is perfectly fine because the effect is not visible to the client and the client is

not responsible for the effect.

OBLIGATED METHODS

Methods (e.g. POST, PUT, or DELETE) that are intended to cause "real-world"

effects are defined as Obligated because the client that initiates the request is responsible

for such effects.

IDEMPOTENT METHODS

Methods GET, HEAD, PUT and DELETE are defined to be idempotent, meaning

that multiple identical requests should have the same effect as a single request. Methods

OPTIONS and TRACE, being safe, are inherently idempotent.


HTTP VERSIONS

HTTP has evolved into multiple, mostly backwards-compatible protocol versions.

RFC 2145 describes the use of HTTP version numbers. Basically, the client states the

version it uses at the beginning of the request, and the server uses the same or an earlier

version in the response.

HTTP/0.9 (1991)

Only supports one command, GET — which does not specify the HTTP version.

Does not support headers. Since this version does not support POST, the client can't pass

much information to the server.

HTTP/1.0 (May 1996)

This is the first protocol revision to specify its version in communications and is

still in wide use, especially by proxy servers.

HTTP/1.1 (June 1999)

Current version; persistent connections are enabled by default and work well with

proxies. It also supports request pipelining, allowing multiple requests to be sent at the

same time, allowing the server to prepare for the workload and potentially transfer the

requested resources more quickly to the client.

HTTP/1.2

The initial 1995 working drafts of PEP (an Extension Mechanism for HTTP),

prepared by W3C and submitted to the IETF, were aiming to become a distinguishing feature

of HTTP/1.2. In later PEP working drafts, however, the reference to HTTP/1.2 was

removed. PEP later became subsumed by the experimental RFC 2774 — HTTP

Extension Framework.

STATUS CODES


In HTTP/1.0 and since, the first line of the HTTP response is called the status line

and includes a numeric status code (such as "404") and a textual reason phrase (such as

"Not Found"). The way the user agent handles the response primarily depends on the

code and secondarily on the response headers. Custom status codes can be used since, if

the user agent encounters a code it does not recognize, it can use the first digit of the code

to determine the general class of the response.

Also, the standard reason phrases are only recommendations and can be replaced

with "local equivalents" at the web developer's discretion. If the status code indicated a

problem, the user agent might display the reason phrase to the user to provide further

information about the nature of the problem. The standard also allows the user agent to

attempt to interpret the reason phrase, though this might be unwise since the standard

explicitly specifies that status codes are machine-readable and reason phrases are human-

readable.
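
The fallback on the first digit can be expressed very simply; the following illustrative Python snippet (not part of the standard itself) maps any status code to its general class:

CLASSES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}

def classify(status_code):
    """Return the general class of an HTTP status code, e.g. 404 -> 'client error'."""
    return CLASSES.get(str(status_code)[0], "unknown")

print(classify(200))   # success
print(classify(404))   # client error
print(classify(418))   # still "client error": an unrecognized code is classified by its first digit
print(classify(503))   # server error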

PERSISTENT CONNECTIONS

In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair.

In HTTP/1.1 a keep-alive mechanism was introduced, whereby a connection could be

reused for more than one request.

Such persistent connections reduce lag perceptibly, because the client does not

need to re-negotiate the TCP connection after the first request has been sent.

Version 1.1 of the protocol also introduced chunked encoding to allow content on

persistent connections to be streamed, rather than buffered, and HTTP pipelining, which

allows clients to send some types of requests before the previous response has been

received, further reducing lag.
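
A small sketch of connection reuse with Python's http.client, assuming a placeholder server that keeps the connection open between requests:

from http.client import HTTPConnection

conn = HTTPConnection("www.example.com", 80)    # one TCP connection...

for path in ("/index.html", "/about.html"):     # ...carrying several requests in turn
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()                          # the body must be drained before reuse
    print(path, resp.status, len(body), "bytes")

conn.close()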

HTTP SESSION STATE

HTTP can occasionally pose problems for Web developers (Web Applications),

because HTTP is stateless. The advantage of a stateless protocol is that hosts don't need

to retain information about users between requests, but this forces the use of alternative


methods for maintaining users' state, for example, when a host would like to customize

content for a user who has visited before. The common method for solving this problem

involves the use of cookies. Other methods include server-side

sessions, hidden variables (when the current page is a form), and URL-encoded parameters

(such as index.php?userid=3).
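
The cookie mechanism can be sketched with Python's http.client as follows; the host, paths and cookie value are placeholders, and a real application would usually rely on a higher-level library to manage cookies:

from http.client import HTTPConnection

conn = HTTPConnection("www.example.com", 80)

# First request: the server identifies the user and issues a cookie.
conn.request("GET", "/login?userid=3")
resp = conn.getresponse()
resp.read()
cookie = resp.getheader("Set-Cookie")            # e.g. "session=abc123; Path=/"

# Later request: the client returns the cookie, re-establishing the "session".
headers = {}
if cookie:
    headers["Cookie"] = cookie.split(";", 1)[0]  # send back just "session=abc123"
conn.request("GET", "/profile", headers=headers)
resp = conn.getresponse()
print("with cookie ->", resp.status, resp.reason)
conn.close()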

SECURE HTTP

There are currently two methods of establishing a secure HTTP connection: the

https URI scheme and the HTTP 1.1 Upgrade header. The https URI scheme has been

deprecated by RFC 2817, which introduced the Upgrade header; however, as browser

support for the Upgrade header is nearly non-existent, the https URI scheme is still the

dominant method of establishing a secure HTTP connection.
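
In practice the https URI scheme means negotiating TLS first (on port 443 by default) and then speaking ordinary HTTP inside the encrypted channel, as in this Python sketch against a placeholder host:

import ssl
from http.client import HTTPSConnection

context = ssl.create_default_context()              # verifies the server certificate
conn = HTTPSConnection("www.example.com", 443, context=context)
conn.request("GET", "/")
resp = conn.getresponse()
print(resp.status, resp.reason)
print("negotiated TLS version:", conn.sock.version())   # e.g. "TLSv1.3"
conn.close()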

HTTP 1.1 UPGRADE HEADER

HTTP 1.1 introduced support for the Upgrade header. In the exchange, the client

begins by making a clear-text request, which is later upgraded to TLS. Either the client or

the server may request (or demand) that the connection be upgraded. The most common

usage is a clear-text request by the client followed by a server demand to upgrade the

connection, which looks like this:

Client:

GET /encrypted-area HTTP/1.1

Host: www.example.com

Server:

HTTP/1.1 426 Upgrade Required

Upgrade: TLS/1.0, HTTP/1.1

Connection: Upgrade


The server returns a 426 status code because 400-level codes indicate a client failure,

which correctly alerts legacy clients that the failure was

client-related.

The benefits of using this method for establishing a secure connection are that it

removes messy and problematic redirection and URL rewriting on the server side, and that

it reduces user confusion by providing a single way to access a particular resource.

SAMPLE

Below is a sample conversation between an HTTP client and an HTTP server

running on www.example.com, port 80.

Client request (followed by a blank line, so that request ends with a double newline, each

in the form of a carriage return followed by a line feed):

GET /index.html HTTP/1.1

Host: www.example.com

The "Host" header distinguishes between various DNS names sharing a single IP address,

allowing name-based virtual hosting. While optional in HTTP/1.0, it is mandatory in

HTTP/1.1.

Server response (followed by a blank line and text of the requested page):

HTTP/1.1 200 OK

Date: Mon, 23 May 2005 22:38:34 GMT

Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)

Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT

Etag: "3f80f-1b6-3e1cb03b"

Accept-Ranges: bytes

Content-Length: 438

Connection: close

Content-Type: text/html; charset=UTF-8


INTERNET ADDRESSING

An IP address (Internet Protocol address) is a unique address that certain

electronic devices use in order to identify and communicate with each other on a

computer network utilizing the Internet Protocol standard (IP)—in simpler terms, a

computer address. Any participating network device—including routers, computers,

time-servers, printers, Internet fax machines, and some telephones—can have its own

unique address. Also, many people can find personal information through IP addresses.

An IP address can also be thought of as the equivalent of a street address or a

phone number (compare: VoIP (voice over (the) internet protocol)) for a computer or

other network device on the Internet. Just as each street address and phone number

uniquely identifies a building or telephone, an IP address can uniquely identify a specific

computer or other network device on a network.

An IP address can appear to be shared by multiple client devices either because

they are part of a shared hosting web server environment or because a proxy server (e.g.,

an ISP or anonymizer service) acts as an intermediary agent on behalf of its customers, in

which case the real originating IP addresses might be hidden from the server receiving a

request. The analogy to telephone systems would be the use of predial numbers (proxy)

and extensions (shared).

IP addresses are managed and created by the Internet Assigned Numbers

Authority. IANA generally allocates super-blocks to Regional Internet Registries, who in

turn allocate smaller blocks to Internet service providers and enterprises.

IP VERSIONS

The Internet Protocol has two primary versions in use. Each version has its own

definition of an IP address. Because of its prevalence, "IP address" typically refers to

those defined by IPv4.


IP VERSION 4

IPv4 uses 32-bit (4 byte) addresses, which limits the address space to

4,294,967,296 (2^32) possible unique addresses. However, many are reserved for special

purposes, such as private networks (~18 million addresses) or multicast addresses (~1

million addresses). This reduces the number of addresses that can be allocated as public

Internet addresses, and as the number of addresses available is consumed, an IPv4

address shortage appears to be inevitable in the long run. This limitation has helped

stimulate the push towards IPv6, which is currently in the early stages of deployment and

is currently the only contender to replace IPv4.

Example: 127.0.0.1
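
The 32-bit structure and the reserved ranges mentioned above can be inspected with Python's ipaddress module, for example:

import ipaddress

addr = ipaddress.IPv4Address("127.0.0.1")
print(int(addr))                                           # the address as a 32-bit integer
print(addr.is_loopback)                                    # True: reserved, not publicly routable

print(ipaddress.IPv4Address("192.168.1.10").is_private)    # True: private-network range
print(ipaddress.IPv4Address("224.0.0.1").is_multicast)     # True: multicast range
print(ipaddress.IPv4Network("0.0.0.0/0").num_addresses)    # 4294967296, i.e. 2^32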


IP VERSION 5

What would be considered IPv5 existed only as an experimental non-IP real time

streaming protocol called ST2, described in RFC 1819. The version number 5 was taken

by this experimental protocol, which was never intended for general deployment; it was

not so much abandoned as never adopted. RSVP has replaced it

to some degree.

IP VERSION 6

In IPv6, the new (but not yet widely deployed) standard protocol for the Internet,

addresses are 128 bits wide, which, even with a generous assignment of netblocks, will

more than suffice for the foreseeable future. In theory, there would be exactly 2^128, or

about 3.403 × 10^38 unique host interface addresses. The exact number is:

340,282,366,920,938,463,463,374,607,431,768,211,456

This large address space will be sparsely populated, which makes it possible to

again encode more routing information into the addresses themselves.

This enormous number of available addresses will be sufficiently large for the indefinite

future, even though mobile phones, cars and all types of personal devices are coming to

rely on the Internet for everyday purposes.

Example: 2001:0db8:85a3:08d3:1319:8a2e:0370:7334

IP VERSION 6 PRIVATE ADDRESSES

Just as there are addresses for private, or internal networks in IPv4 (one example

being the 192.168.0.1 - 192.168.0.254 range), there are blocks of addresses set aside in

IPv6 for private addresses. Addresses starting with FE80: are called link-local addresses

and are routable only on the local link. This means that if several hosts are

connected to each other through a hub or switch, they can talk to each other using their

link-local IPv6 addresses. There was going to be an address range used for "private"

addressing but that has changed and IPv6 won't include private addressing anymore. The

prefix that was used for that is the FEC0: range. There are still some of these addresses in

use now but no new addresses in this range will be used. These are called site-local


addresses and are routable within a particular site, just like IPv4 private addresses. Neither

of these address ranges is routable over the Internet.
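
The IPv6 address categories discussed here can be checked the same way with the ipaddress module, for example:

import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:85a3:08d3:1319:8a2e:0370:7334")
print(addr.exploded)                      # full form: eight 16-bit groups of the 128-bit address

print(2 ** 128)                           # total address space: 340282366920938463463374607431768211456

print(ipaddress.IPv6Address("fe80::1").is_link_local)   # True: FE80:: prefix, valid only on the local link
print(ipaddress.IPv6Address("fec0::1").is_site_local)   # True: FEC0:: prefix, the deprecated site-local range
print(addr.is_global)                     # False: 2001:db8::/32 is a documentation prefix, not globally routable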

With IPv6, virtually every device in the world can have an IP address: cars,

refrigerators, lawnmowers and so on. If one's refrigerator stopped working, for example,

a repair specialist could identify the problem without ever visiting in person. It might

even be possible to make repairs from abroad, depending on the severity of the problem.

NEWSGROUP

A newsgroup is a repository, usually within the Usenet system, for messages

posted by many users at different locations. The term is somewhat confusing, because

it is usually a discussion group. Newsgroups are technically distinct from, but

functionally similar to, discussion forums on the World Wide Web. Newsreader software

is used to read newsgroups.

Hierarchies

Newsgroups are often arranged into hierarchies, theoretically making it simpler to

find related groups. The term top-level hierarchy refers to the hierarchy defined by the

prefix prior to the first dot.

The most commonly known hierarchies are the Usenet hierarchies. So, for instance, the

newsgroup rec.arts.sf.starwars.games would be in the rec.* top-level Usenet hierarchy,

where the asterisk (*) is defined as a wildcard character. There were seven original major

hierarchies of Usenet newsgroups, known as the "Big 7":

comp.* — Discussion of computer-related topics

news.* — Discussion of Usenet itself

sci.* — Discussion of scientific subjects

rec.* — Discussion of recreational activities (e.g. games and hobbies)

soc.* — Socialising and discussion of social issues.

talk.* — Discussion of contentious issues such as religion and politics.

misc.* — Miscellaneous discussion—anything which doesn't fit in the other

hierarchies.


These were all created in the Great Renaming of 1986–1987, prior to which all of

these newsgroups were in the net.* hierarchy. At that time there was a great controversy

over what newsgroups should be allowed. Among those that the Usenet cabal (who

effectively ran the Big 7 at the time) did not allow were those concerning recipes, drugs,

and sex.

This resulted in the creation of an alt.* (short for "alternative") Usenet hierarchy

where these groups would be allowed. Over time the laxness of rules on newsgroup

creation in alt.* compared to the Big 7 meant that many new topics could, given time,

gain enough popularity to get a Big 7 newsgroup. This resulted in a rapid growth of alt.*

which continues to this day. Due to the anarchic manner in which the groups sprang

up, some jokingly referred to ALT as standing for "Anarchists, Lunatics and Terrorists".

In 1995, humanities.* was created for the discussion of the humanities (e.g.

literature, philosophy), and the Big 7 became the Big 8.


The alt.* hierarchy has discussion of all kinds of topics, and many hierarchies for

discussion specific to a particular geographical area or in a language other than English.

Before a new Big 8 newsgroup can be created, an RFD (Request For Discussion) must be

posted into the newsgroup news.announce.newgroups, which is then discussed in

news.groups.proposals. Once the proposal has been formalized with a name, description, and

charter, the Big-8 Management Board will vote on whether to create the group. If the

proposal is approved by the Big-8 Management Board, the group is created. Groups are

removed in a similar manner.

Creating a new group in the alt.* hierarchy is not subject to the same rules;

anybody can create a newsgroup, and anybody can remove one, but most news

administrators will ignore these requests unless a local user requests the group by name.

FURTHER HIERARCHIES

There are a number of newsgroup hierarchies outside of the Big 8 (and alt.*) that

can be found on many news servers. These include non-English language groups, groups

managed by companies or organizations about their products, geographic/local

hierarchies, and even non-internet network boards routed into NNTP. Examples include

(alphabetic):

ba.* — Discussion in the San Francisco Bay Area

ca.* — Discussion in California

can.* — Canadian news groups

cn.* — Chinese news groups

de.* — Discussions in German

england.* — Discussions (mostly) local to England, see also uk.*

fidonet.* — Discussions routed from FidoNet

fr.* — Discussions in French

fj.* — "From Japan," discussions in Japanese

gnu.* — Discussions about GNU software

hawaii.* — Discussions (mostly) local to Hawaii

harvard.* — Discussions (mostly) local to Harvard

hp.* — Hewlett-Packard internal news groups

microsoft.* — Discussions about Microsoft products


tw.* — Taiwan news groups

uk.* — Discussions on matters in the UK

Additionally, there is the free.* hierarchy, which can be considered "more alt than

alt.*". There are many local sub-hierarchies within this hierarchy, usually for specific

countries or cultures (such as free.it.* for Italy).

TYPES OF NEWSGROUPS

Typically, a newsgroup is focused on a particular topic such as "pigeon hunting".

Some newsgroups allow the posting of messages on a wide variety of themes, regarding

anything a member chooses to discuss as on-topic, while others keep more strictly to their

particular subject, frowning on off-topic postings. The news admin (the administrator of a

news server) decides how long articles are kept before being expired (deleted from the

server). Usually they will be kept for one or two weeks, but some admins keep articles in

local or technical newsgroups around longer than articles in other newsgroups.

Newsgroups generally come in either of two types, binary or text. There is no

technical difference between the two, but the naming differentiation allows users and

servers with limited facilities the ability to minimize network bandwidth usage.

Generally, Usenet conventions and rules are enacted with the primary intention of

minimizing the overall amount of network traffic and resource usage.

Newsgroups are much like the public message boards on old bulletin board

systems. For those readers not familiar with this concept, envision an electronic version

of the corkboard in the entrance of your local grocery store.

Newsgroups frequently become cliquish and are subject to sporadic flame wars

and trolling, but they can also be a valuable source of information, support and

friendship, bringing people who are interested in specific subjects together from around

the world.

There are currently well over 100,000 Usenet newsgroups, but only 20,000 or so

of those are active. Newsgroups vary in popularity, with some newsgroups only getting a

few posts a month while others get several hundred (and in a few cases several thousand)

messages a day.

Weblogs have replaced some of the uses of newsgroups (especially because, for a

while, they were less prone to spamming).


A website called DejaNews began archiving Usenet in the 1990s. DejaNews also

provided a searchable web interface. Google bought the archive from them and made

efforts to buy other Usenet archives to attempt to create a complete archive of Usenet

newsgroups and postings from its early beginnings. Like DejaNews, Google has a web

search interface to the archive, but Google also allows newsgroup posting.

Non-Usenet newsgroups are possible and do occur, as private individuals or

organizations set up their own NNTP servers. Examples include the newsgroups Microsoft

runs to allow peer-to-peer support of its products and those at news://news.grc.com.

HOW NEWSGROUPS WORK

Newsgroup servers are hosted by various organizations and institutions. Most

Internet Service Providers host their own News Server, or rent access to one, for their

subscribers. There are also a number of companies who sell access to premium news

servers.

Every host of a news server maintains agreements with other news servers to

regularly synchronize. In this way news servers form a network. When a user posts to one

news server, the message is stored locally. That server then shares the message with the

servers that are connected to it if both carry the newsgroup, and from those servers to

servers that they are connected to, and so on. For newsgroups that are not widely carried,

sometimes a carrier group is used as a crosspost to aid distribution. This is typically only

useful for groups that have been removed or newer alt.* groups. Crossposts between

hierarchies, outside of the big eight and alt, are prone to failure.
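
Clients and servers exchange these articles using NNTP. As a minimal sketch, the snippet below uses Python's nntplib module (part of the standard library in older releases; it was removed in Python 3.13) against a placeholder news server and an illustrative newsgroup:

from nntplib import NNTP

with NNTP("news.example.com") as news:                  # hypothetical open news server
    resp, count, first, last, name = news.group("comp.lang.python")
    print(f"{name}: about {count} articles ({first}-{last})")

    # Fetch overview data (subject, author, date, ...) for the last few articles.
    resp, overviews = news.over((max(first, last - 4), last))
    for number, fields in overviews:
        print(number, fields.get("subject"))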


BINARY NEWSGROUPS

While Newsgroups were not created with the intention of distributing binary files,

they have proven to be quite effective for this. Due to the way they work, a file uploaded

once will be spread and can then be downloaded by an unlimited number of users. More

useful is the fact that every user is drawing on the bandwidth of their own news server.

This means that unlike P2P technology, the user's download speed is under their

own control, rather than dependent on the willingness of other people to share files. In fact this

is another benefit of Newsgroups: it is usually not expected that users share. If every user

uploaded files, the servers would be flooded; thus it is acceptable and often

encouraged for users to just leech.

There were originally a number of obstacles to the transmission of binary files over Usenet. Firstly, Usenet was designed with the transmission of text in mind, so for a long time binary data could not be sent as-is. As a workaround, encodings such as uuencode (and later Base64 and yEnc) were developed, which map the binary data of the files to be transmitted (e.g. sound or video files) onto text characters that survive transmission over Usenet. At the receiver's end, the data has to be decoded by the user's news client. Additionally, there was a limit on the size of individual posts, so large files could not be sent as single posts. To get around this, newsreaders were developed that could split long files into several posts, and intelligent newsreaders at the other end could automatically reassemble such split files into single files, allowing the user to retrieve the file easily. These advances have meant that Usenet is used to send and receive many gigabytes of files per day.
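As a rough illustration of that encode-and-split step, the sketch below uses Base64 (one of the encodings mentioned above) to turn arbitrary bytes into printable text and cuts the result into post-sized parts. The part size and function names are arbitrary choices for this example, not part of any standard.

```python
# A rough sketch of the encode-and-split step a binary newsreader performs
# before posting. The part size and function names are arbitrary choices
# for this example, not part of any standard.
import base64

PART_SIZE = 500_000          # bytes of raw data per post (illustrative only)

def split_for_posting(filename):
    """Yield (part_number, text_payload) pairs that are safe to post."""
    with open(filename, "rb") as fh:
        data = fh.read()
    for n, start in enumerate(range(0, len(data), PART_SIZE), start=1):
        chunk = data[start:start + PART_SIZE]
        # Base64 maps arbitrary bytes onto printable ASCII characters,
        # so the payload survives transmission over a text-only protocol.
        yield n, base64.encodebytes(chunk).decode("ascii")

def reassemble(text_parts):
    """Reverse the process at the receiving end."""
    return b"".join(base64.decodebytes(p.encode("ascii")) for p in text_parts)
```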

There are two main issues that pose problems for transmitting binary files over newsgroups: completion rates and retention rates. The business of premium news servers is built primarily on their ability to offer superior completion and retention rates, as well as very fast connections to users. Completion rates matter when users wish to download large files that are split into pieces; if any one piece is missing, it is impossible to successfully download and reassemble the desired file. To work around this, a redundancy scheme known as PAR is commonly used.
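Real PAR and PAR2 files use Reed-Solomon coding, which can rebuild several missing pieces; the toy sketch below uses a single XOR parity block merely to illustrate the underlying idea that posting one redundant piece lets a downloader reconstruct one missing piece.

```python
# Illustration only: real PAR/PAR2 recovery files use Reed-Solomon coding,
# which can rebuild several missing pieces. A single XOR parity block is
# enough to show the principle of recovering one lost piece.
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

pieces = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]   # equal-sized pieces
parity = reduce(xor_blocks, pieces)                # posted alongside them

# Suppose the second piece never arrived: XOR-ing the parity block with
# the pieces that did arrive reproduces the missing one.
recovered = reduce(xor_blocks, [parity, pieces[0], pieces[2]])
assert recovered == pieces[1]
```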

A number of websites exist for the purpose of keeping an index of the files posted

to binary Newsgroups.

MODERATED NEWSGROUPS

A moderated newsgroup has one or more individuals who must approve articles

before they are posted at large. A separate address is used for the submission of posts and

the moderators then propagate posts which are approved for the readership. The first

moderated newsgroups appeared in 1984 under mod.* according to RFC 2235, "Hobbes' Internet Timeline".

TRANSMISSION CONTROL PROTOCOL

The Internet protocol suite is the set of communications protocols that implements the protocol stack on which the Internet and many commercial networks run. It is commonly called the TCP/IP protocol suite, after two of its most important protocols: the Transmission Control Protocol (TCP) and the Internet Protocol (IP), which were also the first two networking protocols defined. A review of TCP/IP is given under that heading. Note that today's TCP/IP networking represents a synthesis of two developments that began in the 1970s and revolutionised computing: LANs (Local Area Networks) and the Internet.

The Internet protocol suite — like many protocol suites — can be viewed as a set

of layers. Each layer solves a set of problems involving the transmission of data, and

provides a well-defined service to the upper layer protocols based on using services from

some lower layers. Upper layers are logically closer to the user and deal with more

abstract data, relying on lower layer protocols to translate data into forms that can

eventually be physically transmitted. The original TCP/IP reference model consisted of

four layers, but has evolved into a five-layer model.

The OSI model describes a fixed, seven-layer stack for networking protocols.

Comparisons between the OSI model and TCP/IP can give further insight into the

significance of the components of the IP suite. The OSI model, with its greater number of layers, provides more flexibility. Both the OSI and the TCP/IP models are 'standards', and application developers often implement solutions without strict adherence to the proposed 'division of labour' within the standard while still providing the functionality within the application suite. This separation of practice from theory often leads to confusion.

HISTORY

The Internet protocol suite came from work done by DARPA in the early 1970s.

After building the pioneering ARPANET, DARPA started work on a number of other

data transmission technologies. In 1972, Robert E. Kahn was hired at the DARPA

Information Processing Technology Office, where he worked on both satellite packet

networks and ground-based radio packet networks, and recognized the value of being

able to communicate across them. In the spring of 1973, Vinton Cerf, the developer of the

existing ARPANET Network Control Program (NCP) protocol, joined Kahn to work on

open-architecture interconnection models with the goal of designing the next protocol for

the ARPANET.

By the summer of 1973, Kahn and Cerf had worked out a fundamental

reformulation, where the differences between network protocols were hidden by using a

common internetwork protocol, and instead of the network being responsible for

reliability, as in the ARPANET, the hosts became responsible. (Cerf credits Hubert

Zimmerman and Louis Pouzin [designer of the CYCLADES network] with important

influences on this design.)

With the role of the network reduced to the bare minimum, it became possible to

join almost any networks together, no matter what their characteristics were, thereby

solving Kahn's initial problem. (One popular saying has it that TCP/IP, the eventual

product of Cerf and Kahn's work, will run over "two tin cans and a string", and it has in

fact been implemented using homing pigeons.) A computer called a gateway (later

changed to router to avoid confusion with other types of gateway) is provided with an

interface to each network, and forwards packets back and forth between them.

The idea was worked out in more detailed form by Cerf's networking research

group at Stanford in the 1973–74 period. (The early networking work at Xerox PARC,

which produced the PARC Universal Packet protocol suite, much of which was

contemporaneous, was also a significant technical influence; people moved between the

two.)

DARPA then contracted with BBN Technologies, Stanford University, and the

University College London to develop operational versions of the protocol on different

hardware platforms. Four versions were developed: TCP v1, TCP v2, a split into TCP v3

and IP v3 in the spring of 1978, and then stability with TCP/IP v4 — the standard

protocol still in use on the Internet today.

In 1975, a two-network TCP/IP communications test was performed between

Stanford and University College London (UCL). In November, 1977, a three-network

TCP/IP test was conducted between the U.S., UK, and Norway. Between 1978 and 1983,

several other TCP/IP prototypes were developed at multiple research centres. A full

switchover to TCP/IP on the ARPANET took place January 1, 1983.[1]

In March 1982,[2] the US Department of Defense made TCP/IP the standard for all

military computer networking. In 1985, the Internet Architecture Board held a three-day

workshop on TCP/IP for the computer industry, attended by 250 vendor representatives,

helping popularize the protocol and leading to its increasing commercial use.

On November 9, 2005 Kahn and Cerf were presented with the Presidential Medal of

Freedom for their contribution to American culture.[3]

LAYERS IN THE INTERNET PROTOCOL SUITE STACK

The IP suite uses encapsulation to provide abstraction of protocols and services.

Generally a protocol at a higher level uses a protocol at a lower level to help accomplish

its aims. The Internet protocol stack can be roughly fitted to the four layers of the original

TCP/IP model:

4. Application: DNS, TFTP, TLS/SSL, FTP, HTTP, IMAP, IRC, NNTP, POP3, SIP, SMTP, SNMP, SSH, TELNET, ECHO, BitTorrent, RTP, PNRP, rlogin, ENRP, …
   Routing protocols like BGP, which for a variety of reasons run over TCP, may also be considered part of the application or network layer.

3. Transport: TCP, UDP, DCCP, SCTP, IL, RUDP, …

2. Internet: IP (IPv4, IPv6)
   Routing protocols like OSPF, which run over IP, are also considered part of the network layer, as they provide path selection. ICMP and IGMP, which run over IP, are considered part of the network layer, as they provide control information. ARP and RARP operate underneath IP but above the link layer, so they belong somewhere in between.

1. Network access: Ethernet, Wi-Fi, token ring, PPP, SLIP, FDDI, ATM, Frame Relay, SMDS, …

In many modern textbooks, this model has evolved into the seven-layer OSI model, where the Network access layer is split into a Data link layer on top of a Physical layer, and the Internet layer is called the Network layer.
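The encapsulation described above can be pictured as each layer wrapping the payload handed down from the layer above with its own header. The sketch below is a toy illustration with made-up header strings, not real packet formats:

```python
# A toy illustration of encapsulation with made-up header strings, not
# real packet formats: each layer wraps the payload handed down from the
# layer above with its own header.
app_data = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"      # application
segment  = b"[TCP src=50000 dst=80]" + app_data                 # transport
packet   = b"[IP src=10.0.0.2 dst=192.0.2.1]" + segment         # internet
frame    = b"[Ethernet dst=aa:bb:cc:dd:ee:ff]" + packet         # network access

# On the receiving host each layer strips its own header and hands the
# remaining payload upward, reversing the wrapping done here.
print(frame)
```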

IMPLEMENTATIONS

Today, most commercial operating systems include and install the TCP/IP stack

by default. For most users, there is no need to look for implementations. TCP/IP is

included in all commercial Unix systems, Mac OS X, and all free-software Unix-like

systems such as Linux distributions and BSD systems, as well as Microsoft Windows.

Unique implementations include Lightweight TCP/IP, an open source stack

designed for embedded systems and KA9Q NOS, a stack and associated protocols for

amateur packet radio systems and personal computers connected via serial lines.

TELNET

TELNET (TELetype NETwork) is a network protocol used on the Internet or

local area network (LAN) connections. It was developed in 1969 and standardized as

IETF STD 8, one of the first Internet standards. It has limitations that are considered to be

security risks.

The term telnet also refers to software which implements the client part of the

protocol. TELNET clients have been available on most Unix systems for many years and

are available for virtually all platforms. Most network equipment and operating systems with a TCP/IP stack support some kind of TELNET server for remote configuration (including ones based on Windows NT). However, SSH has since become the dominant means of remote access to Unix-based machines.

"To telnet" is also used as a verb meaning to establish or use a TELNET or other

TCP connection, as in, "To change your password, telnet to the server and run the passwd

command".

Most often, a user will be telnetting to a Unix-like server system or a simple

network device such as a switch. For example, a user might "telnet in from home to

check his mail at school". In doing so, he would be using a telnet client to connect from

his computer to one of his servers. Once the connection is established, he would then log

in with his account information and execute operating system commands remotely on that

computer, such as ls or cd.

On many systems, the client may also be used to make interactive raw-TCP

sessions.

PROTOCOL DETAILS

TELNET is a client-server protocol, based on a reliable connection-oriented

transport. Typically this is TCP port 23, although TELNET predates TCP/IP and was

originally run on NCP.

The protocol has many extensions, some of which have been adopted as Internet

Standards. IETF standards STD 27 through STD 32 define various extensions, most of

which are extremely common. Other extensions are on the IETF standards track as Proposed Standards.

SECURITY

When TELNET was initially developed in 1969, most users of networked

computers were in the computer departments of academic institutions, or at large private

and government research facilities. In this environment, security was not nearly as much

of a concern as it became after the bandwidth explosion of the 1990s. The rise in the

number of people with access to the Internet, and by extension, the number of people

attempting to crack into other people's servers made encrypted alternatives much more

necessary.

Experts in computer security, such as SANS Institute, and the members of the

comp.os.linux.security newsgroup recommend that the use of TELNET for remote logins

should be discontinued under all normal circumstances, for the following reasons:

TELNET, by default, does not encrypt any data sent over the connection (including

passwords), and so it is trivial to eavesdrop on the communications and use the password

later for malicious purposes; anybody who has access to a router, switch, or gateway

located on the network between the two hosts where TELNET is being used can intercept

the packets passing by and easily obtain login and password information (and whatever

else is typed) with any of several common utilities like tcpdump and Wireshark.

Most implementations of TELNET lack an authentication scheme that makes it possible

to ensure that communication is carried out between the two desired hosts, and not

intercepted in the middle.

Commonly used TELNET daemons have had several vulnerabilities discovered over the years.

These security-related shortcomings have seen the usage of the TELNET protocol

drop rapidly, especially on the public Internet, in favor of a more secure and functional

protocol called SSH, first released in 1995. SSH provides all functionality of telnet, with

the addition of strong encryption to prevent sensitive data such as passwords from being

intercepted, and public key authentication, to ensure that the remote computer is actually

who it claims to be.

As has happened with other early Internet protocols, extensions to the TELNET

protocol provide TLS security and SASL authentication that address the above issues.

However, most TELNET implementations do not support these extensions; and there has

been relatively little interest in implementing these as SSH is adequate for most purposes.

The main advantage of TLS-TELNET would be the ability to use certificate-authority

signed server certificates to authenticate a server host to a client that does not yet have the

server key stored. In SSH, there is a weakness in that the user must trust the first session

to a host when it has not yet acquired the server key.

CURRENT STATUS

As of the mid-2000s, while the TELNET protocol itself has been mostly

superseded, TELNET clients are still used, usually when diagnosing problems, to

manually "talk" to other services without specialized client software. For example, it is

sometimes used in debugging network services such as an SMTP or HTTP server, by

serving as a simple way to send commands to the server and examine the responses.

On Unix, however, tools such as nc (netcat) and socat are finding greater favor with some system administrators for testing purposes, as they can be invoked with arguments that suppress terminal-control handshaking data.
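The same kind of manual conversation can also be scripted. The sketch below does in a few lines what "telnetting to port 80" does interactively: it opens a raw TCP connection to a web server, sends a hand-typed HTTP request, and prints the reply (example.com is used purely as a placeholder host):

```python
# A scripted version of "telnetting to port 80": open a raw TCP connection,
# type an HTTP request by hand, and read whatever the server sends back.
# example.com is used purely as a placeholder host.
import socket

with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(b"HEAD / HTTP/1.1\r\n"
                 b"Host: example.com\r\n"
                 b"Connection: close\r\n\r\n")
    reply = b""
    while chunk := sock.recv(4096):    # read until the server closes
        reply += chunk

print(reply.decode("latin-1"))
```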

TELNET is still very popular in enterprise networks to access host applications, e.g. on IBM mainframes.

TELNET is also heavily used for MUD games played over the Internet, as well as

talkers, MUSHes, MUCKs and MOOs. By using image-to-ASCII algorithms, it can

also be used for primitive "video" streaming. Recently, ASCII-WM offered live

broadcasts of the 2006 World Cup.

TELNET can also be used as a rudimentary IRC client if one knows the protocol

well enough.

TELNET CLIENTS

WINDOWS

AbsoluteTelnet is a client for all versions of Windows, and includes telnet, SSH1,

and SSH2. Hyperterminal Private Edition is another Windows telnet client, free for

personal use.

TeraTerm is a free telnet/SSH client for Windows that offers more features than

the built-in telnet as well as offering a free SSH plug-in.

Windows comes with a built-in telnet client, accessible from the command prompt. Note

that in Windows Vista, as of RC1, the telnet client is not installed by default, and needs to

be installed as an optional Windows component.

MACINTOSH

Tn3270 is a free TELNET client for Macintosh designed to work with IBM

mainframe systems that use the TN3270 protocol.

Terminal is a TELNET-capable command-line application that comes as part of

all versions of Macintosh OS X.

dataComet is a full-featured Telnet & SSH application for the Macintosh.

MULTIPLATFORM

PuTTY is a free SSH, TELNET, rlogin, and raw TCP client for Windows, Linux,

and Unix.

mTelnet is a free full-screen TELNET client for Windows & OS/2; it is easy to use and offers Zmodem download capability.

Twisted Conch includes a telnet client/server implementation.

IVT is a free multi-session TELNET client for Windows & DOS. It also supports SSH and Kerberos (these features are not free) and includes useful features such as auto-login and scripting.

UNIX TO UNIX COPY

UUCP stands for Unix to Unix CoPy. The term generally refers to a suite of

computer programs and protocols allowing remote execution of commands and transfer

of files, email and netnews between computers. Specifically, uucp is one of the programs

in the suite; it provides a user interface for requesting file copy operations. The UUCP

suite also includes uux (user interface for remote command execution), uucico

(communication program), uustat (reports statistics on recent activity), uuxqt (execute

commands sent from remote machines), and uuname (reports the uucp name of the local

system).

Although UUCP was originally developed on and is most closely associated with

Unix, UUCP implementations exist for several other operating systems, including

Microsoft's MS-DOS, Digital's VAX/VMS, and Mac OS.

TECHNOLOGY

UUCP can use several different types of physical connections and link-layer

protocols, but was most commonly used over dial-up connections. Before the widespread

availability of Internet connectivity, computers were only connected by smaller private

networks within a company or organization. They were also often equipped with modems

so they could be used remotely from character-mode terminals via dial-up lines. UUCP

uses the computers' modems to dial out to other computers, establishing temporary,

point-to-point links between them. Each system in a UUCP network has a list of neighbor

systems, with phone numbers, login names and passwords, etc. When work (file transfer

or command execution requests) is queued for a neighbor system, the uucico program

typically calls that system to process the work. The uucico program can also poll its

neighbors periodically to check for work queued on their side; this permits neighbors

without dial-out capability to participate.

Today, UUCP is rarely used over dial-up links, but is occasionally used over

TCP/IP. One example of the current use of UUCP is in the retail industry by Epicor CRS

Retail Solutions for transferring batch files between corporate and store systems via TCP

and dial-up on SCO Unix, Red Hat Linux, and Microsoft Windows (with Cygwin). The

number of systems involved, as of early 2006, ran between 1500 and 2000 sites across 60

enterprises. UUCP's longevity can be attributed to its low/zero cost, extensive logging,

native failover to dialup, and persistent queue management. However, this technology is

anticipated to be retired in favor of Windows-only alternatives.

HISTORY

UUCP was originally written at AT&T Bell Laboratories, by Mike Lesk, and

early versions of UUCP are sometimes referred to as System V UUCP. The original

UUCP was rewritten by AT&T researchers Peter Honeyman, David A. Nowitz, and

Brian E. Redman, and the rewrite is referred to as HDB or HoneyDanBer uucp, which was later enhanced, bug-fixed, and repackaged as BNU UUCP ("Basic Network Utilities").

All of these versions had security holes which allowed some of the original internet

worms to remotely execute unexpected shell commands, which inspired Ian Lance Taylor

to write a new version from scratch. Taylor UUCP was released under the GNU General

Public License and became the most stable and bug-free version. Taylor uucp also

incorporates features of all previous versions of uucp, allowing it to communicate with

any other version with the greatest level of compatibility and even use similar config file

formats from other versions.

One surviving feature of uucp is the chat file format, largely inherited by the

expect software package.

UUCP FOR MAIL ROUTING

The uucp and uuxqt capabilities could be used to send e-mail between machines,

with suitable mail user interface and delivery agent programs. A simple uucp mail

address was formed from the adjacent machine name, an exclamation mark or bang,

followed by the user name on the adjacent machine. For example, the address barbox!user would refer to user user on the adjacent machine barbox.

Mail could furthermore be routed through the network, traversing any number of

intermediate nodes before arriving at its destination. Initially, this had to be done by

specifying the complete path, with a list of intermediate host names separated by bangs.

For example, if machine barbox is not connected to the local machine, but it is known

that barbox is connected to machine foovax which does communicate with the local

machine, the appropriate address to send mail to would be foovax!barbox!user.

User barbox!user might publish their UUCP email address in a form such as …!bigsite!foovax!barbox!user. This directs people to route their mail to machine bigsite

(presumably a well-known and well-connected machine accessible to everybody) and

from there through the machine foovax to the account of user user on barbox. Many users

would suggest multiple routes from various large well-known sites, providing even better

and perhaps faster connection service from the mail sender.
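A bang path is easy to take apart mechanically: the final component names the user, and everything before it is the relay route, read left to right. The short sketch below (illustrative only, not part of any UUCP tool) shows the split:

```python
# Illustrative only, not part of any UUCP tool: split a bang path into the
# relay route (read left to right) and the final user name.
def parse_bang_path(address):
    *route, user = address.split("!")
    return route, user

route, user = parse_bang_path("bigsite!foovax!barbox!user")
print(route)   # ['bigsite', 'foovax', 'barbox']
print(user)    # 'user'
```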

Bang paths of eight to ten machines (or hops) were not uncommon in 1981, and

late-night dial-up UUCP links would cause week-long transmission times. Bang paths

were often selected by both transmission time and reliability, as messages would often

get lost. Some hosts went so far as to try to "rewrite" the path, sending mail via "faster"

routes — this practice tended to be frowned upon.

The "pseudo-domain" ending .uucp was sometimes used to designate a hostname

as being reachable by UUCP networking, although this was never formally in the Internet

root as a top-level domain.

UUCPNET AND MAPPING

UUCPNET was the name for the totality of the network of computers connected

through UUCP. This network was very informal, maintained in a spirit of mutual

cooperation between systems owned by thousands of private companies, universities, and

so on. Often, particularly in the private sector, UUCP links were established without

official approval from the companies' upper management. The UUCP network was

constantly changing as new systems and dial-up links were added, others were removed,

etc.

The UUCP Mapping Project was a volunteer, largely successful effort to build a

map of the connections between machines that were open mail relays and establish a

managed namespace. Each system administrator would submit, by e-mail, a list of the

systems to which theirs would connect, along with a ranking for each such connection.

These submitted map entries were processed by an automatic program that combined

them into a single set of files describing all connections in the network. These files were

then published monthly in a newsgroup dedicated to this purpose. The UUCP map files

could then be used by software such as "pathalias" to compute the best route path from

one machine to another for mail, and to supply this route automatically. The UUCP maps

also listed contact information for the sites, and so gave sites seeking to join UUCPNET

an easy way to find prospective neighbors.

CONNECTIONS WITH THE INTERNET

Many uucp hosts, particularly those at universities, were also connected to the

Internet in its early years, and e-mail gateways between Internet SMTP-based mail and UUCP mail were developed. A user at a system with UUCP connections could thereby

exchange mail with Internet users, and the Internet links could be used to bypass large

portions of the slow UUCP network. A "UUCP zone" was defined within the Internet

domain namespace to facilitate these interfaces.

With this infrastructure in place, UUCP's strength was that it permitted a site to

gain Internet e-mail connectivity with only a dial-up modem link to another, cooperating

computer. This was at a time when true Internet access required a leased data line

providing a connection to an Internet Point of Presence, both of which were expensive

and difficult to arrange. By contrast, a link to the UUCP network could usually be

established with a few phone calls to the administrators of prospective neighbor systems.

Neighbor systems were often close enough to avoid all but the most basic charges for

telephone calls.

DECLINE

UUCP usage began to die out with the rise of ISPs offering inexpensive SLIP and

PPP services. The UUCP Mapping Project was formally shut down in late 2000.

Usenet traffic was originally transmitted using the UUCP network, and bang paths are still in use within the Path header lines of the Usenet message format. They now have only an

informational purpose, and are not used for routing, although they can be used to ensure

that loops do not occur. In general, this form of e-mail address has now been superseded

by the SMTP "@ notation", even by sites still using uucp.

Currently UUCP is used mainly over high-cost links (e.g., marine satellite links). UUCP over TCP/IP (preferably encrypted, such as via the SSH protocol) can be used when a computer does not have a fixed IP address but is still willing to run a standard mail transfer agent (MTA) such as Sendmail or Postfix.