Spider Course Day 1
![Page 1: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/1.jpg)
INTRODUCTION TO WEB
Spider Web Weaving Course.
Day 1
Harishankaran K
![Page 2: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/2.jpg)
HTTP
Hypertext Transfer Protocol
HTTP is a request/response protocol between a client and a server.
The client making an HTTP request (such as a web browser, spider, or other end-user tool) is referred to as the user agent.
The responding server, which stores or creates resources such as HTML files and images, is called the origin server.
Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, and tunnels.
![Page 3: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/3.jpg)
REQUEST/RESPONSE PROTOCOL
The client sends the request. The server sends a response according to the
request from the client.
![Page 4: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/4.jpg)
REQUEST/RESPONSE PROTOCOL
The client sends an HTTP request.
The server receives the request.
The server may do some processing according to the request sent.
It returns an HTTP response.
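The round trip described above can be sketched with Python's standard library. This is only an illustration under assumptions: the handler class `HelloHandler`, the helper `fetch_once`, and the page it returns are hypothetical names invented for this example, and the server listens on a throwaway local port rather than on a real host.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small HTML page."""

    def do_GET(self):
        # "Some processing according to the request": echo the requested path.
        body = f"<html><body>You asked for {self.path}</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path="/course/"):
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)        # the client sends the request
        response = conn.getresponse()    # the server returns the response
        status = response.status
        body = response.read().decode("utf-8")
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_once("/course/")` plays both roles in one process: the thread acts as the server and the calling code acts as the user agent.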
![Page 5: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/5.jpg)
REQUEST/RESPONSE PROTOCOL
![Page 6: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/6.jpg)
WEB SERVER
A program that accepts HTTP requests from the client and provides an HTTP response to the client.
The HTTP response usually consists of an HTML document, but can also be a raw file, an image, or some other type of document.
Examples are Apache, Microsoft IIS, Google GFE, lighttpd, etc.
![Page 7: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/7.jpg)
WEB BROWSER
Web browsers communicate with Web servers primarily using HTTP to fetch web pages.
Examples are Firefox, Opera, Internet Explorer, ELinks, Safari, etc.
Web browsers format HTML information for display, so the appearance of a Web page may differ between browsers.
![Page 8: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/8.jpg)
SPIDER
A program or automated script which browses the World Wide Web in a methodical, automated manner.
Other names are web crawlers, ants, automatic indexers, bots, and worms.
Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.
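A minimal sketch of the crawling idea follows. It is not how any particular search engine works. To stay self-contained it crawls a hypothetical in-memory site (`FAKE_WEB`) instead of the real network, and the names `LinkExtractor` and `crawl` are invented for this example. A real spider would plug in a function that fetches pages over HTTP.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; returns {url: html} for each page fetched."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page not found; skip it
        pages[url] = html  # keep a copy for later indexing
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical three-page site used in place of real HTTP requests.
FAKE_WEB = {
    "http://spider/": '<a href="/course/">Course</a> <a href="/about.html">About</a>',
    "http://spider/course/": '<a href="day1.html">Day 1</a> <a href="/">Home</a>',
    "http://spider/course/day1.html": "<p>No links here.</p>",
}
```

Here `crawl("http://spider/", FAKE_WEB.get)` visits the three pages that exist and skips the missing `/about.html`. The `seen` set is what stops the crawler from looping forever between pages that link to each other.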
![Page 9: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/9.jpg)
HTTP REQUEST
GET /course/ HTTP/1.1
Host: spider
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070603 Fedora/2.0.0.4-2.fc7 Firefox/2.0.0.4
Accept: text/html
Accept-Language: en-us,en
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8
Keep-Alive: 300
Connection: keep-alive
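To make the structure of a request concrete, here is a small sketch that splits a raw request into its request line and header fields. The function name `parse_request` is an assumption made for this example, and the sample text is a shortened version of the request shown on the slide. It ignores details such as folded headers and message bodies.

```python
def parse_request(raw):
    """Split a raw HTTP request head into (method, target, version, headers)."""
    lines = raw.strip().splitlines()
    # First line is the request line: METHOD TARGET VERSION
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Shortened version of the request on the slide (lines end with CRLF).
RAW_REQUEST = (
    "GET /course/ HTTP/1.1\r\n"
    "Host: spider\r\n"
    "Accept: text/html\r\n"
    "Accept-Encoding: gzip,deflate\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
```

`parse_request(RAW_REQUEST)` yields the method `GET`, the target `/course/`, the version `HTTP/1.1`, and a dictionary of the four headers.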
![Page 10: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/10.jpg)
HTTP RESPONSE
HTTP/1.x 200 OK
Date: Mon, 04 Feb 2008 03:58:24 GMT
Server: Apache/2.2.0 (Fedora)
X-Powered-By: PHP/5.2.2
Content-Length: 2742
Connection: close
Content-Type: text/html; charset=UTF-8
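The response has the same shape as the request: a status line, header fields, a blank line, then the body. The sketch below assembles one by hand to show where `Content-Length` comes from. The helper `build_response` is a hypothetical name, and a real server such as Apache adds further headers (`Date`, `Server`, and so on) that are left out here.

```python
def build_response(body_text, status=200, reason="OK"):
    """Assemble a minimal HTTP/1.1 response as bytes."""
    body = body_text.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        # Content-Length counts bytes of the encoded body, not characters.
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line separates headers from the body
    )
    return head.encode("ascii") + body
```

Note that a non-ASCII character such as "é" occupies two bytes in UTF-8, so the length is measured after encoding.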
![Page 11: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/11.jpg)
HTTP STATUS CODES
1** Informational
2** Success
3** Redirection
4** Client Error
5** Server Error
200 – OK
403 – Forbidden
404 – Not Found
500 – Internal Server Error
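Since the first digit of the code decides its class, the mapping can be sketched in a few lines. The function `describe_status` and the `STATUS_CLASSES` table are names made up for this example. The reason phrases come from Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus

# Class of a status code, keyed by its first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return a string such as '404 Not Found (Client Error)'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code is not a registered status
    return f"{code} {phrase} ({status_class})"
```

For instance, `describe_status(404)` gives `404 Not Found (Client Error)`.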
![Page 12: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/12.jpg)
SCRIPTING LANGUAGES
The web server should support the scripting languages.
Apache supports PHP, Python, and many other languages.
The web server executes the PHP file on the server to produce dynamic HTML content.
Examples are PHP, Python, Ruby, Perl, etc.
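The idea of producing HTML at request time can be sketched in Python as a small WSGI application. This stands in for the PHP case on the slide rather than reproducing it: the greeting page and the `name` query parameter are assumptions invented for this example.

```python
from datetime import datetime, timezone
from html import escape
from urllib.parse import parse_qs


def application(environ, start_response):
    """WSGI app: builds a different HTML page for each request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    now = datetime.now(timezone.utc).strftime("%H:%M:%S UTC")
    # escape() keeps user input from being interpreted as HTML markup.
    page = (
        "<html><body>"
        f"<h1>Hello, {escape(name)}!</h1>"
        f"<p>Generated at {now}</p>"
        "</body></html>"
    )
    body = page.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=UTF-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it in a browser, the function can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()`. The port 8000 is an arbitrary choice.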
![Page 13: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/13.jpg)
DATABASE SERVER
A database management system (DBMS) is used to store the data.
Most scripting languages have built-in API support to connect to the database server and process the data.
The database server can be a separate machine or can run on the same machine as the web server.
Examples are MySQL, MSSQL, and PostgreSQL.
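As a rough illustration of a scripting language talking to a database through a built-in API, the sketch below uses Python's `sqlite3` module. SQLite is an embedded database, not a separate server like MySQL or PostgreSQL, so it is used here only as a stand-in that runs without installation. The `students` table and its rows are made-up sample data.

```python
import sqlite3


def demo_database():
    """Create a table, insert two rows, and read them back."""
    conn = sqlite3.connect(":memory:")  # temporary in-memory database
    try:
        conn.execute(
            "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # "?" placeholders let the driver quote values safely.
        conn.executemany(
            "INSERT INTO students (name) VALUES (?)",
            [("Asha",), ("Ravi",)],
        )
        conn.commit()
        rows = conn.execute("SELECT id, name FROM students ORDER BY id").fetchall()
    finally:
        conn.close()
    return rows
```

With a server-based DBMS the pattern is the same (connect, execute, fetch, close), except that the connect call names a host, user, and password instead of a file.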
![Page 14: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/14.jpg)
WEB MODEL
![Page 15: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/15.jpg)
LAMP and WAMP

| Role | LAMP | WAMP |
| --- | --- | --- |
| Operating System | Linux | Windows |
| Web server | Apache | Apache |
| Database | MySQL | MySQL |
| Scripting Language | PHP | PHP |
![Page 16: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/16.jpg)
LAMP
![Page 17: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/17.jpg)
HTTP SESSION STATE
HTTP is a stateless protocol. The advantage of a stateless protocol is that
hosts do not need to retain information about users between requests, but this forces web developers to use alternative methods for maintaining users' states.
A common method for solving this problem involves sending and requesting cookies.
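The cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and the helper names `make_set_cookie` and `read_cookie` are assumptions made for this example. Real applications also set attributes such as expiry and security flags, which are left out here.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"  # send it back for every path
    return cookie["session_id"].OutputString()


def read_cookie(cookie_header, name):
    """Server side: read one cookie from the Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None
```

The server sends the first value in a `Set-Cookie` header. The browser stores it and returns it in the `Cookie` header of later requests, which lets the server recognize the same user even though each HTTP request is independent.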
![Page 18: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/18.jpg)
STATELESS HTTP
![Page 19: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/19.jpg)
COOKIE
![Page 20: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/20.jpg)
INTRODUCTION TO UNIFORM SERVER
Spider Web Weaving Course.
Day 1
Harishankaran K
![Page 21: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/21.jpg)
STEPS
Download Uniserver from the URL mentioned.
Extract the file.
Start the server using the Server_start file in the Uniserver folder.
Add the files under the W:/www folder.
Stop the server using the Stop file.
![Page 22: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/22.jpg)
SAMPLE FILES
Index.html
<html>
<head><title>Hello</title></head>
<body>
Ha ha ha. He he he. Hoo hoo hoo. My first web page is ready. :).
</body>
</html>
![Page 23: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/23.jpg)
SAMPLE FILES
index.php
<?php phpinfo(); ?>
![Page 24: Spider Course Day 1](https://reader036.fdocuments.in/reader036/viewer/2022062704/555ae4c9d8b42a62528b54d0/html5/thumbnails/24.jpg)
Thank you