craigslist++
sean anastasi
joseph chen
tatiana gershanovich
andreas sekine
cse454 craigslist++
our goal
• to enhance craigslist’s interface
  – show related items also being sold at craigslist
  – show related items from other third-party sites
how we do it
• main components
  – crawler (heritrix)
  – clusterer (carrot2)
  – relevance sorting
  – user interface (greasemonkey)
  – other stuff
crawler
• specific crawling needs
  – volatile data
  – questionable legalities
• heritrix
  – only crawling one domain
  – problematic setup
• our setup
  – 2 crawlers for new posts, 1 cleaner
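The slide does not say how the cleaner works. One plausible reading of "volatile data" is that posts disappear after they are crawled, so a cleaner pass would re-check stored posts and drop the dead ones. The sketch below illustrates that idea only; `clean_posts`, `fake_is_live`, and the example URLs are hypothetical names and data, not taken from the project.

```python
from typing import Callable, Dict


def clean_posts(posts: Dict[str, str], is_live: Callable[[str], bool]) -> Dict[str, str]:
    """Keep only the stored posts whose listing is still live.

    `posts` maps a post id to its URL; `is_live` is any check the caller
    supplies (in a real setup, presumably an HTTP request).
    """
    return {post_id: url for post_id, url in posts.items() if is_live(url)}


def fake_is_live(url: str) -> bool:
    """Hypothetical stand-in for an HTTP check: made-up 'expired' URLs count as dead."""
    return "expired" not in url


# Assumed example data; real post ids and URLs would come from the crawlers.
stored = {
    "101": "http://example.org/sss/101.html",
    "102": "http://example.org/sss/expired-102.html",
}
remaining = clean_posts(stored, fake_is_live)
print(remaining)
```

Passing the liveness check in as a function keeps the cleaning logic testable without touching the network.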
clusterer
• Carrot2
  – what to cluster (title, body or title + body)?
  – need for reclustering and combination
• WordNet
  – combination of synonym clusters
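Combining synonym clusters can be sketched as a union-find merge over cluster labels. This is an illustration under assumptions, not the project's actual code: the synonym pairs are hand-written here as a stand-in for a WordNet lookup, and `merge_synonym_clusters` and the sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_synonym_clusters(
    clusters: Dict[str, List[str]],
    synonym_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Merge clusters whose labels are listed as synonyms of each other."""
    parent = {label: label for label in clusters}

    def find(label: str) -> str:
        # Follow parent links to the representative label, halving the path.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for a, b in synonym_pairs:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the alphabetically first label so results are deterministic.
                keep, drop = sorted((root_a, root_b))
                parent[drop] = keep

    merged: Dict[str, List[str]] = defaultdict(list)
    for label, posts in clusters.items():
        merged[find(label)].extend(posts)
    return dict(merged)


# Assumed example: cluster labels mapped to post ids.
clusters = {"couch": ["p1", "p2"], "sofa": ["p3"], "bike": ["p4"]}
# Stand-in for synonym pairs that a WordNet lookup might return.
synonyms = [("couch", "sofa")]
merged = merge_synonym_clusters(clusters, synonyms)
print(merged)
```

Union-find also handles chains of synonyms: if A matches B and B matches C, all three clusters end up under one label.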
relevance sorting
relevance sorting (cont.)
user interface
• greasemonkey
  – show related posts (grouped by clusters)
  – show which items have data
• jquery
  – folding item lists
  – mouseover details/images
other
• amazon product advertising api
• yahoo term extraction
• botnet
demo
• greasemonkey plugin
  – https://addons.mozilla.org/en-US/firefox/addon/748
• craigslist++ script
  – http://cubist.cs.washington.edu/~lidor7/craigslistpp.user.js
• craigslist
  – http://seattle.craigslist.org/