Training Project Report on Search Engines

Posted on 22-Nov-2014

This is a summer training project report that I prepared for submission to my college. It describes a Tiny WebSQL search engine that I built during the training period.

SHIVAM SAXENA, B.Tech CSE, 1131910024

INDUSTRIAL TRAINING PROJECT REPORT

A Project Report

Submitted to the Faculty of

RAKSHPAL BAHADUR COLLEGE OF ENGG. AND TECH.

in Partial Fulfilment of the Requirement for the Degree of

Bachelor of Technology

in Computer Science and Engineering

on “A Customized Web Search Engine Using a Tiny WebSQL Query Language”

Using PHP

ABSTRACT

This project proposes:

1) The Tiny WebSQL search engine

2) The Tiny WebSQL query as the restriction for a spider collecting Web information

3) A customized Web spider (the Tiny Spider)

4) Web-database connectivity

INTRODUCTION

The following list gives the objectives of this project:

1. Study how search engines work.

2. Examine simple and advanced services from some popular search engines.

3. Define an advanced search method by using an SQL-like language, the Tiny WebSQL Query language.

How much information is on the web?

35 GB? 300 GB? 3 TB? More?

Mid-1999 estimate: 800 million pages

Mid-2000 estimate: 3 billion pages

Mid-2003 estimate: 15 billion pages + “Deep Web”

Google now indexes (only?) well over 4 billion

Early 2001 “Deep Web” estimate: 500 billion

How do you even estimate?

How can you find what you are looking for?

Doesn’t this remind you of going to the library?

Search Engines

E-Commerce

Prof. Sheizaf Rafaeli 8

Not all search engines are American, or even English-language. Here, for example, are several Hebrew engines:

Walla (וואלה): http://www.walla.co.il

Achla (אחלה): http://www.achla.co.il

Tapuz (תפוז): http://www.tapuz.co.il

Nana (נענע): http://www.nana.co.il

Sababa (סבבה): http://www.sababa.co.il

Haaretz and IOL

Nadav Harel and iguide

How far do people look for results?

What do Search Engines search?

They do NOT search the Web! That is, they do not search the web the very moment you ask for something. Rather, they search their own databases or indexes.

Search engines store the contents of millions of websites in an index or database, and your query is matched against that index.
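The idea of matching a query against a prebuilt index, rather than the live web, can be sketched as follows. This is a minimal Python illustration, not the PHP code of the project; the page texts, URLs, and the function names `build_index` and `search` are assumptions made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs that contain every word of the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical, hand-made "crawled" pages; a real engine collects these in advance.
pages = {
    "http://example.com/a": "tiny web spider collects pages",
    "http://example.com/b": "the indexer stores web pages",
    "http://example.com/c": "search the index not the web",
}
index = build_index(pages)
print(sorted(search(index, "web pages")))
```

The query never touches the pages themselves at search time; it only intersects the word-to-URL sets built earlier, which is why results can be stale or incomplete.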

They don’t even catalog the entire contents of the WWW! Nowhere near, in fact... you only get what they have!

For the most part, they don’t have the contents of the websites they show you, only links to these sites.

How do they find it?

They use:

Spiders, webbots, and bots

Crawlers, worms, and harvesters

Wanderers, indexers, and sitesuckers

What are they? Self-directed browsers which go from link to link, retrieving all or part of the contents of any given site for inclusion in the search engine's database.

Crawling the Web

Mode of crawl: breadth-first search (BFS)…

Frequency of crawl: important

robots.txt gives explicit directions on what not to crawl…

Parallel machines crawl all the time.
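A breadth-first crawl that respects robots.txt might look like the sketch below. It is written in Python for illustration and is not the Tiny Spider itself: the link graph `LINKS`, the robots.txt text, and the agent name `TinySpider` are hypothetical, and the "web" is an in-memory dictionary so that the example runs without network access. The robots.txt rules are evaluated with the standard-library `urllib.robotparser`.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> list of outgoing links.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
    "http://example.com/private/x": ["http://example.com/secret"],
}

# Hypothetical robots.txt content for the site above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def bfs_crawl(seed, links, robots, agent="TinySpider", max_pages=100):
    """Visit pages breadth-first, skipping URLs that robots.txt disallows."""
    queue = deque([seed])
    seen = {seed}
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed: neither fetched nor expanded
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
crawl_order = bfs_crawl("http://example.com/", LINKS, robots)
print(crawl_order)
```

Because the queue is first-in, first-out, all pages one link away from the seed are visited before any page two links away. The disallowed page is never fetched, so the links it contains are never discovered.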

Architecture of My Tiny Search Engine

[Figure: architecture diagram. Components shown: The Web, Tiny Web spider, Indexer, Indexes, Ad indexes, Search, User. The figure includes a sample results page for the query “miele” with web results and sponsored links.]

User Interface Of My Tiny Web Spider

User Interface Of My Tiny Search Software

Know the Keywords

Full-text indexing: an indexing method in which every word in the web page is put into the database, with the exception of prepositions, conjunctions, and the like.

Controlled-language indexing: how directories are implemented.

Both of these are done for you by the search engine.
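The exclusion of prepositions, conjunctions, and similar words during full-text indexing can be sketched as a stop-word filter. This is a Python illustration under stated assumptions: the `STOP_WORDS` list is a small made-up sample, and `index_terms` is a hypothetical helper name, not part of the project.

```python
import re

# A small, illustrative stop-word list (articles, prepositions, conjunctions).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "with", "for"}

def index_terms(text):
    """Return the words of a page that a full-text indexer would store."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

terms = index_terms("The spider crawls the Web and stores pages in a database.")
print(terms)
```

Every remaining word would go into the database; real engines use much longer stop-word lists and often language-specific ones.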

Stemming: a type of search that uses the common root of a word to include all possible occurrences of that word.

Example: "child*" would yield results that include childhood, childless, children, etc.
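The "child*" example behaves like a prefix (truncation) match against the indexed vocabulary, which can be sketched as below. This Python sketch assumes the index keeps its words in a sorted list; the vocabulary and the function name `prefix_matches` are made up for illustration. Linguistic stemmers, which reduce words to a root by rule, work differently and are not shown here.

```python
from bisect import bisect_left

def prefix_matches(sorted_vocab, pattern):
    """Expand a trailing-wildcard pattern such as 'child*' against a sorted word list."""
    if not pattern.endswith("*"):
        return [pattern] if pattern in sorted_vocab else []
    prefix = pattern[:-1]
    # Binary search for the first word that is >= the prefix.
    start = bisect_left(sorted_vocab, prefix)
    matches = []
    for word in sorted_vocab[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches.append(word)
    return matches

vocab = sorted(["chill", "child", "childhood", "childless", "children", "chimney"])
print(prefix_matches(vocab, "child*"))
```

Each matched word would then be looked up in the index and the result sets combined.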

Problems with search engines

Coverage

Invalid links

Search Engines Refer Only A Small Percentage Of Traffic To Web Sites Worldwide

                                                                                       

Are Search Engines truly so important?

The “Deep Web”

500 times larger than the surface web

95% of it is public and free

Content in the deep web is of 1000+ times better quality

7,500 terabytes (TB) of information

45,000 search engines in the “surface web”

Meta-Search Engines

Use multiple search engines in parallel to provide an answer to a single query

They are front-ends to other search engines and their collections, and typically do not contain their own databases.

Examples: Surfwax, Vivisimo, Ask Jeeves, Metacrawler, The Mining Company
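Querying several engines in parallel and merging their answers can be sketched as follows. This is a Python illustration only: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed lists, and the rank-interleaving merge is one simple choice among many, not the method of any engine named above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real engines; each returns a ranked list of URLs.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    return ["http://example.com/2", "http://example.com/3"]

def meta_search(query, engines):
    """Query all engines in parallel and merge results, dropping duplicates.

    Expects at least one engine.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    # Interleave by rank so every engine's top hit comes before any second hit.
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(meta_search("tiny websql", [engine_a, engine_b]))
```

The meta-search front-end holds no index of its own; it only forwards the query and combines what comes back.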

The Best Search Engine is…

Whichever one you can actually find things with.

Sometimes their indexing is a little more “natural” to you.

Some people prefer search engines that use directories (Yahoo! and others) and some prefer simple indexing (AltaVista and others).

Some people prefer the “human touch” (“bibliographies”, “about” The Mining Company).

Resources

www.MetaSpy.com

THANK YOU

SHIVAM SAXENA, B.Tech CSE, 1131910024