ENGS 4 - Lecture 4: Technology of Cyberspace
Winter 2004, Thayer School of Engineering, Dartmouth College

Instructor: George Cybenko, x6-3843, [email protected]
Assistant: Sharon Cooper (“Shay”), x6-3546
Course webpage: www.whoopis.com/engs4

Today’s Class
• Discussion of Morgridge’s talk
• Assignment (due Jan 20)
• Web page and HTML status
• Basics of search technologies
• Break
• Phillip’s presentation
• Chad’s presentation
• Rule-based and expert systems


Cisco – Technology and Culture

• What were your reactions?
• What were his main points?
• Was it an effective presentation?
• How could it have been improved?
• What are Cisco’s strengths?
• What are Cisco’s weaknesses?
• Would you invest in Cisco?
• Would you want to work for Cisco?


Homework 1 – Due Jan 20

1. Estimate the number of bytes in the ORC (2003-2004 edition, printed).

2. How much time would downloading it require on a 56 kbps modem line?

3. How much time would downloading it require on a 10 Mbps Ethernet?

4. How much time would downloading it require on a 100 Mbps Ethernet?

5. What are the bandwidth and latency of the channel from the NASA Mars Rover to Earth?

6. Create a web page with your answers to these questions.
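Questions 2 to 4 reduce to one piece of arithmetic: time = (bytes × 8) / (bits per second). The sketch below assumes 1 kbps = 1,000 bits/s and 1 Mbps = 1,000,000 bits/s and ignores protocol overhead; the 10,000,000-byte size is a hypothetical placeholder, not an estimate of the ORC. Latency (question 5) is sketched as distance divided by the speed of light; the Earth-Mars distance changes over time, so it is left as an input.

```python
def download_seconds(num_bytes, bits_per_second):
    """Transmission time for num_bytes over a link with the given raw bit rate.

    Assumes 8 bits per byte and ignores protocol overhead, compression,
    and latency, so a real download would take somewhat longer.
    """
    return num_bytes * 8 / bits_per_second


def propagation_seconds(distance_m, speed_m_per_s=299_792_458.0):
    """One-way signal delay (latency) over a distance at the speed of light."""
    return distance_m / speed_m_per_s


# Hypothetical file size, NOT an estimate of the ORC.
SIZE_BYTES = 10_000_000
for name, rate in [("56 kbps modem", 56_000),
                   ("10 Mbps Ethernet", 10_000_000),
                   ("100 Mbps Ethernet", 100_000_000)]:
    print(f"{name}: {download_seconds(SIZE_BYTES, rate):.1f} s")
```

Plug your own byte estimate into `SIZE_BYTES`; each tenfold increase in bit rate cuts the time tenfold.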


Web pages and basic HTML

• Questions?
• Have you tested your web account?
• Try something simple first and build up from that
– simple page with “hello”: upload and test it, then add text, add graphics, etc.

• Read about more advanced HTML and try to use advanced constructs

• Try to copy interesting/clever constructs you have seen on other pages
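As a hypothetical starting point for the “hello” page, the Python sketch below prints the text of a minimal HTML page. You would save the output as a file (for example `hello.html`, a name assumed here), upload it, and check it in a browser before adding text and graphics.

```python
import html


def hello_page(title, message):
    """Return the text of a minimal HTML page.

    html.escape keeps characters such as < and & in the title or message
    from being interpreted as markup.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body><p>{html.escape(message)}</p></body>\n"
        "</html>\n"
    )


print(hello_page("My ENGS 4 page", "hello"))
```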


Basic web search technology

• Visit www.google.com and www.excite.com

• What are some differences?

• How does the basic technology work?


Vector Space Model in Information Retrieval

• List all words in your “dictionary”
– e.g., aardvark, aaron, able, act, advertise, bad, …

• A “stop list” consists of words too common to be useful for retrieval
– e.g., the, is, a, up

• Process a document to obtain a “vector” of word frequencies:
– “Aaron the acting aardvark was able to join the Aardvark Society of Actors.”
– becomes (2, 1, 1, 0, 0, 0, …), one count per dictionary word, in dictionary order
– this is a document word-frequency representation
– no syntax, grammar, or semantics… just words
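A minimal sketch of this step in Python, assuming case is ignored, punctuation is dropped, and the dictionary is cut down to the six words shown on the slide:

```python
import re
from collections import Counter

# The first six dictionary words from the slide (the slide's list continues with "...").
DICTIONARY = ["aardvark", "aaron", "able", "act", "advertise", "bad"]
# The example stop list from the slide.
STOP_LIST = {"the", "is", "a", "up"}


def word_frequency_vector(text):
    """Count how often each dictionary word occurs in text.

    Case is ignored and punctuation is dropped; word order, grammar,
    and meaning are thrown away, leaving only the counts.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_LIST)
    return [counts[term] for term in DICTIONARY]


doc = "Aaron the acting aardvark was able to join the Aardvark Society of Actors."
print(word_frequency_vector(doc))  # [2, 1, 1, 0, 0, 0]
```

Without stemming, “acting” and “Actors” do not count toward “act”, so that coordinate stays 0.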


Comparing two word-frequency vectors

Dictionary: aardvark, aaron, able, act, advertise, bad, …

• Another document:
– “An aardvark would be a bad actor.”
– becomes (1, 0, 0, 0, 0, 1, …)

• The first document:
– “Aaron the acting aardvark was able to join the Aardvark Society of Actors.”
– was (2, 1, 1, 0, 0, 0, …)

• The score between the two documents is obtained by multiplying corresponding coordinates and adding the results.


Comparing two word-frequency vectors

– “An aardvark would be a bad actor.”
– is (1, 0, 0, 0, 0, 1, …)
– “Aaron the acting aardvark was able to join the Aardvark Society of Actors.”
– is (2, 1, 1, 0, 0, 0, …)

• The score between the two documents is obtained by multiplying corresponding coordinates and adding the results:
• 1*2 + 0*1 + 0*1 + 0*0 + 0*0 + 1*0 + … = 2
• Stemming: reduce words to their roots (e.g., actor, actors, and acting all have “act” as their root)
• With stemming, the score becomes larger: the “act” coordinates become 1 and 2, so the score is 1*2 + 1*2 = 4
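The scoring and stemming steps can be sketched as follows. The `crude_stem` function is a hypothetical suffix stripper that handles only the endings in this example; it is not a real stemming algorithm such as Porter's.

```python
import re
from collections import Counter

DICTIONARY = ["aardvark", "aaron", "able", "act", "advertise", "bad"]
STOP_LIST = {"the", "is", "a", "up"}


def crude_stem(word):
    """Hypothetical suffix stripper for illustration only.

    Strips -ors, -or, -ing, or -s when at least three letters remain,
    which is enough to map actor, actors, and acting to "act".
    """
    for suffix in ("ors", "or", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vector(text, stem=False):
    """Word-frequency vector over DICTIONARY, optionally stemming first."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_LIST]
    if stem:
        words = [crude_stem(w) for w in words]
    counts = Counter(words)
    return [counts[term] for term in DICTIONARY]


def score(u, v):
    """Multiply corresponding coordinates and add (the dot product)."""
    return sum(x * y for x, y in zip(u, v))


doc1 = "Aaron the acting aardvark was able to join the Aardvark Society of Actors."
doc2 = "An aardvark would be a bad actor."
print(score(vector(doc1), vector(doc2)))                        # 2
print(score(vector(doc1, stem=True), vector(doc2, stem=True)))  # 4
```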


First-generation web search
• Each web page is represented as a word-frequency vector after stemming and other normalizations
• A user’s search is turned into another word-frequency vector
• The search vector is compared against the vectors of web pages that have been indexed
• Pages with the highest scores are listed as the results for that search
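The four steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not how any real engine is built: the pages are made up, vectors are stored sparsely as word counts, and stemming is skipped.

```python
import re
from collections import Counter

STOP_LIST = {"the", "is", "a", "up"}  # example stop words from the earlier slide


def vectorize(text):
    """Sparse word-frequency vector (word -> count) with stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_LIST)


def score(u, v):
    """Multiply matching coordinates and add; words missing from v count as 0."""
    return sum(count * v[word] for word, count in u.items())


def build_index(pages):
    """Indexing step: precompute one vector per page."""
    return {page_id: vectorize(text) for page_id, text in pages.items()}


def search(index, query, top_k=10):
    """Turn the query into a vector, score every page, return the best first."""
    q = vectorize(query)
    scored = [(score(q, vec), page_id) for page_id, vec in index.items()]
    ranked = [(s, p) for s, p in scored if s > 0]      # drop non-matching pages
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))  # high score first, ties by id
    return [page_id for _, page_id in ranked[:top_k]]


# Hypothetical pages standing in for crawled web pages.
PAGES = {
    "p1": "Dartmouth hockey schedule for the winter",
    "p2": "The women's hockey team: hockey, hockey, hockey",
    "p3": "Aardvark facts",
}
index = build_index(PAGES)
print(search(index, "Dartmouth hockey"))  # ['p2', 'p1']
```

Here p2 outranks p1 even though it never mentions Dartmouth: raw counts reward repetition, which is one reason retrieval systems commonly normalize vectors (for example, by their length).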


Embellishments

• Placing pages from retailers and others who pay to be ranked highly near the top of search results (Excite seems to do that; how does Google handle this revenue opportunity?)

• Taking the highest-ranking pages and doing some more advanced processing to determine the “hubs” and “authorities” (Google does something like this)

• Mini-lecture topic: economics of search engines, revenue models, etc.


Hubs and authorities

[Figure: a graph of linked web pages, with one page labeled “hub” and one labeled “authority”]

Google ranks hubs and authorities differently than other pages
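The slide does not say how hubs and authorities are computed. One well-known method is Kleinberg’s HITS algorithm, sketched below on a made-up link graph; it is not necessarily what Google does. Each page’s authority score sums the hub scores of the pages linking to it, each hub score sums the authority scores of the pages it links to, and the two are recomputed in turn with normalization.

```python
import math


def normalize(scores):
    """Scale scores so their squares sum to 1 (keeps them from growing without bound)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {p: s / norm for p, s in scores.items()}


def hits(links, iterations=50):
    """HITS-style hub and authority scores.

    links maps each page to the list of pages it links to.
    A good authority is linked to by good hubs; a good hub links to
    good authorities.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        auth = normalize(auth)
        # Hub score: sum of the authority scores of pages linked to.
        hub = normalize({p: sum(auth[q] for q in links.get(p, [])) for p in pages})
    return hub, auth


# Hypothetical link graph with one obvious hub and one obvious authority.
links = {
    "hub": ["a", "b", "c", "authority"],
    "a": ["authority"],
    "b": ["authority"],
    "c": ["authority"],
}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # hub authority
```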


Small world graphs

• Social networks

• Biological networks

• Infrastructure networks

• Kevin Bacon

• Milgram’s experiment

• Power-law distributions

• Mini-lecture topic…volunteers?
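The “Kevin Bacon” idea (degrees of separation) amounts to finding the shortest path between two people in a network, which breadth-first search does. A sketch on a hypothetical co-star network; every name except Kevin Bacon is a placeholder:

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Fewest links between start and goal, found by breadth-first search.

    graph maps each person to the people they are directly connected to.
    Returns None if goal cannot be reached from start.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


# Hypothetical, undirected co-star network (each link listed in both directions).
costars = {
    "Kevin Bacon": ["Actor A"],
    "Actor A": ["Kevin Bacon", "Actor B"],
    "Actor B": ["Actor A", "Actor C"],
    "Actor C": ["Actor B"],
    "Actor D": [],
}
print(degrees_of_separation(costars, "Actor C", "Kevin Bacon"))  # 3
```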


Break


Phillip’s Mini-lecture


Chad’s Mini-lecture


Rule-based systems
• Visit advanced search in Google
• User constructs a “Boolean query”
• E.g.:
– must include: dartmouth, hockey
– must include at least one of: women, female
– must not include: men

• Boolean expression is: “dartmouth and hockey and (women or female) and (not men)”
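The example query can be sketched as a Python check on a document’s set of words. Matching whole words (an assumption, since the slide does not specify tokenization) keeps “men” from matching inside “women”; the sample documents are made up.

```python
import re


def words_in(text):
    """Set of lowercase words in text, so 'men' does not match inside 'women'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(text, must=(), any_of=(), must_not=()):
    """Evaluate (all of must) and (any of any_of) and (none of must_not)."""
    w = words_in(text)
    return (all(t in w for t in must)
            and (not any_of or any(t in w for t in any_of))
            and not any(t in w for t in must_not))


query = dict(must=["dartmouth", "hockey"],
             any_of=["women", "female"],
             must_not=["men"])
docs = [
    "Dartmouth women's hockey wins again",
    "Dartmouth men's and women's hockey schedules",
    "Female athletes at Dartmouth",
]
print([matches(d, **query) for d in docs])  # [True, False, False]
```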


Aristotelian logic
• Predicates: A, B, C, etc.
• Basic operators:
– and: “A and B” is true when both are true
– or: “A or B” is true when either (or both) is true
– not: “not A” is true when A is false

• Derived operators: if A then B
– true provided B is true whenever A is true
– false only when A is true but B is false
– (if A then B) is equivalent to: not (A and not B)
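The equivalence on the last line can be checked by enumerating all four truth assignments. In this sketch, `IMPLIES_TABLE` encodes the case-by-case definition above and `implies` uses the equivalent form not (A and not B):

```python
from itertools import product

# "if A then B" defined case by case: false only when A is true and B is false.
IMPLIES_TABLE = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}


def implies(a, b):
    """The equivalent form: not (A and not B)."""
    return not (a and not b)


for a, b in product([True, False], repeat=2):
    assert implies(a, b) == IMPLIES_TABLE[(a, b)]
    print(f"A={a!s:5}  B={b!s:5}  if A then B = {implies(a, b)}")
```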


Aristotelian Logic at work

• Set of “rules”
– e.g., All humans are mortal.
– logical form: “if (x is human) then (x is mortal).”
– (x is human) is a predicate with variable x

• Set of “axioms”: statements known to be true
– e.g., (Aristotle is human).

• Combine them to get: Aristotle is mortal.
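Combining rules with axioms until nothing new follows is usually called forward chaining. A minimal sketch, assuming every rule has a single premise of the form “x is P” and facts are stored as hypothetical (predicate, individual) pairs:

```python
def forward_chain(rules, facts):
    """Apply 'if (x is P) then (x is Q)' rules until nothing new follows.

    rules: list of (P, Q) predicate-name pairs.
    facts: set of (predicate, individual) pairs known to be true (the axioms).
    Returns every fact that can be derived, including the axioms.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a copy because new facts are added inside the loop.
            for predicate, x in list(known):
                if predicate == premise and (conclusion, x) not in known:
                    known.add((conclusion, x))
                    changed = True
    return known


rules = [("human", "mortal")]        # All humans are mortal.
axioms = {("human", "Aristotle")}    # Aristotle is human.
print(("mortal", "Aristotle") in forward_chain(rules, axioms))  # True
```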


Such logic is the basis for “expert systems” or “rule-based systems”

• Early automated medical diagnosis
• Maintenance procedures for complicated machinery (cars, planes, etc.)
• It is the easiest and most prevalent way to implement some sort of “artificial intelligence”
• What are the limitations?
– inability to deal with uncertainty (i.e., probability)
– large sets of rules developed by many people often become inconsistent, brittle, and unmaintainable


Next lecture

• Classical uses of rule-based systems to “predict the future” with distributed information

• Current uses of rule-based systems on the internet

• Critique


Mini-lecture topics

• Technology behind recommender systems such as those at Amazon, Netflix, etc.