Recommendation System for Opinion Articles in Turkish ...canf/CS533/hwSpring13... · Article...

Recommendation System for Opinion Articles in Turkish Newspapers

Üstün Özgür

System Components

● Article Metadata Scraper
● Article Metadata Consumer
● Article Text Extractor
● Article Text Analyzer

Article Metadata Scraper

● Article Metadata Consumer
● Article Text Scraper
● Article Text Analyzer

Article Metadata Scraper

Article Metadata Scraper (contd)

● Rewritten in node.js
● Due to impedance mismatch between developer tools and Python
● Outputs a JSON document containing an array of documents
● Each document has several metadata fields, such as author name, newspaper name, and article link
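
The output described above might look like the sample below. This is only a sketch of the shape: the field names `author`, `newspaper`, and `link`, the sample values, and the helper `parse_scraper_output` are assumptions chosen to mirror the metadata listed on the slide, not the scraper's actual schema.

```python
import json

# Hypothetical scraper output: a JSON array with one object per article.
# Field names are assumptions based on the metadata named on the slide.
SAMPLE_OUTPUT = """
[
  {"author": "Author A", "newspaper": "Newspaper X", "link": "http://example.com/x/1"},
  {"author": "Author B", "newspaper": "Newspaper Y", "link": "http://example.com/y/2"},
  {"author": "Author C", "newspaper": "Newspaper X"}
]
"""

REQUIRED_FIELDS = ("author", "newspaper", "link")


def parse_scraper_output(raw):
    """Parse the JSON array and keep only records that have every required field."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of documents")
    return [r for r in records if all(r.get(f) for f in REQUIRED_FIELDS)]


records = parse_scraper_output(SAMPLE_OUTPUT)
for record in records:
    print(record["newspaper"], "|", record["author"], "|", record["link"])
```

The third sample record has no link, so it is dropped before any later stage sees it.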

Article Metadata Consumer

● Existing Python codebase modified
● Data stored in an RDBMS
● Just consumes incoming data
● “Dumb” on purpose
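
A deliberately "dumb" consumer can be sketched in a few lines. The example assumes SQLite as a stand-in for the RDBMS (the slides do not name one), and the table name `article_metadata`, its columns, and the function `consume` are hypothetical.

```python
import json
import sqlite3


def consume(conn, raw_json):
    """Store every incoming record as-is: no deduplication, no analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article_metadata ("
        "id INTEGER PRIMARY KEY, author TEXT, newspaper TEXT, link TEXT)"
    )
    rows = [(d["author"], d["newspaper"], d["link"]) for d in json.loads(raw_json)]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO article_metadata (author, newspaper, link) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# SQLite in memory stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
incoming = json.dumps(
    [{"author": "Author A", "newspaper": "Newspaper X", "link": "http://example.com/x/1"}]
)
stored = consume(conn, incoming)
print(stored, "record(s) stored")
```

Keeping the consumer this simple means any filtering or analysis has to live in the later components, which matches the "dumb on purpose" note.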

Article Text Extractor

● Consumes the output of either the metadata scraper (currently implemented) or the metadata consumer

● Separate scrapers for each article content
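
One way to organize separate scrapers is a lookup table keyed by newspaper name. The sketch below assumes one extractor per newspaper and that the article body sits inside an element with a known CSS class; the class names, newspaper names, and every function name are hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_by_class(html, css_class):
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())  # normalize whitespace


# Hypothetical registry: one extractor per newspaper.
EXTRACTORS = {
    "Newspaper X": lambda html: extract_by_class(html, "article-body"),
    "Newspaper Y": lambda html: extract_by_class(html, "column-text"),
}


def extract_text(record, html):
    """Pick the extractor that matches the record's newspaper."""
    extractor = EXTRACTORS.get(record["newspaper"])
    if extractor is None:
        raise LookupError("no extractor for " + record["newspaper"])
    return extractor(html)


page = (
    '<html><body><div class="nav">Menu</div>'
    '<div class="article-body"><p>Hello <b>world</b></p><br>again</div>'
    "</body></html>"
)
text = extract_text({"newspaper": "Newspaper X"}, page)
print(text)
```

Supporting a new newspaper then only requires one more entry in the registry. A real page layout would likely need a more robust parser than this class-based collector.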

Article Text Analyzer

Demo

● http://localhost:3000/yazi-short/286
● http://localhost:3000/yazi-short/100
● http://localhost:3000/yazi-short/3

Remaining Work

● More sophisticated comparison methods
● Other similarity measures
● Most common words and phrases for categorization
– Documents containing those
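
As an illustration of what a similarity measure and a common-word count could look like, here is a minimal sketch. The slides do not say which measure the analyzer uses; cosine similarity over raw term counts is an assumption, as are all function names and the sample texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into runs of letters (Unicode-aware).

    Note: str.lower() does not apply Turkish casing rules for the
    dotted and dotless I, so a real system would need special handling.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def term_counts(text):
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_common_words(texts, n):
    """Return the n most frequent words across all texts, with their counts."""
    total = Counter()
    for text in texts:
        total.update(tokenize(text))
    return total.most_common(n)


articles = ["ekonomi kriz ekonomi", "ekonomi büyüme", "futbol maç"]
vectors = [term_counts(a) for a in articles]
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
print(most_common_words(articles, 1))
```

Swapping in another measure only means replacing `cosine_similarity`, since the term-count vectors stay the same.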