Meshwork - Insight Data Engineering Project
Motivation
• The internet is huge
• How does your page rank amongst others in your mesh?
• What is the reach of your website?
• Which pages are affecting your page rank?
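
The slides do not show how page rank is computed, so the following is only a minimal sketch of the standard power-iteration form of PageRank on a tiny in-memory link graph, not the project's actual pipeline. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the sample domains are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping every page to its rank; ranks sum to ~1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}

        # Each page splits its damped rank evenly among its outgoing links.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share

        rank = new_rank

    return rank


if __name__ == "__main__":
    # Hypothetical four-page mesh; the domains are placeholders.
    mesh = {
        "a.example": ["b.example", "c.example"],
        "b.example": ["c.example"],
        "c.example": ["a.example"],
        "d.example": ["c.example"],
    }
    for page, score in sorted(pagerank(mesh).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy mesh the pages linking to a given page are the ones "affecting" its rank, which is the question the slide raises. At Common Crawl scale the same iteration would need a distributed implementation rather than Python dictionaries.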
Data Source
• Common Crawl Organization
• More than 7 years of web page data, over 500TB
• CC April 2015 web corpus ~168TB
• Processed ~445GB for project
• Readily available in S3
About Me
Justin Cano
UC Riverside, BS Computer Engineering

Previous work experience
• Software Engineer @
• Embedded Systems Developer @
• Software Engineer Intern @

Hobbies
• I like building things! Hardware, software
• Learning and using new technologies
• Moviegoer
• Outdoor activities: biking, snowboarding
• Interests: design, app dev
• Favorite TV Shows: Futurama & The Daily Show