Exploring the Enron Email Dataset with Kiji and Hive
Uploaded by wibidata. Category: Technology.
![Page 1: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/1.jpg)
![Page 2: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/2.jpg)
![Page 3: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/3.jpg)
![Page 4: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/4.jpg)
![Page 5: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/5.jpg)
![Page 6: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/6.jpg)
![Page 7: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/7.jpg)
…
![Page 9: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/9.jpg)
-- Each Hive column is a struct holding the Kiji cell timestamp and its value.
-- The [0] index in 'kiji.columns' selects the most recent cell of each column.
CREATE EXTERNAL TABLE emails (
  mid STRUCT<ts: TIMESTAMP, value: STRING>,
  dateLong STRUCT<ts: TIMESTAMP, value: BIGINT>,
  fromStr STRUCT<ts: TIMESTAMP, value: STRING>,
  toStr STRUCT<ts: TIMESTAMP, value: STRING>,
  subject STRUCT<ts: TIMESTAMP, value: STRING>,
  body STRUCT<ts: TIMESTAMP, value: STRING>
)
STORED BY 'org.kiji.hive.KijiTableStorageHandler'
WITH SERDEPROPERTIES (
  'kiji.columns' = 'info:mid[0],info:date[0],info:from[0],info:to[0],info:subject[0],info:body[0]'
)
TBLPROPERTIES (
  'kiji.table.uri' = 'kiji://.env/enron_email/emails'
);
![Page 10: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/10.jpg)
SELECT
fromStr.value AS fromStr,
count(1) AS count
FROM emails
GROUP BY fromStr.value
ORDER BY count DESC
LIMIT 10;
![Page 11: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/11.jpg)
![Page 12: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/12.jpg)
SELECT
  fromStr.value AS fromStr,
  trim(splitToStr) AS toStr,
  count(1) AS count
FROM emails
LATERAL VIEW explode(split(toStr.value, ',')) tos AS splitToStr
GROUP BY fromStr.value, trim(splitToStr)
ORDER BY count DESC
LIMIT 10;
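The split/explode/group-by logic of the query above can be tried on a handful of rows without a cluster. The following is a minimal Python sketch of the same counting idea, not the Kiji or Hive implementation; the function name `top_pairs` and the `(sender, to_field)` tuple layout are assumptions made for illustration.

```python
from collections import Counter

def top_pairs(emails, limit=10):
    """Count (sender, recipient) pairs, most frequent first.

    `emails` is an iterable of (sender, to_field) tuples, where to_field is a
    comma-separated recipient string (hypothetical layout for this sketch).
    """
    counts = Counter()
    for sender, to_field in emails:
        # split(...) + explode(...) in the query: one row per recipient.
        for recipient in to_field.split(","):
            # strip() drops all surrounding whitespace; Hive's trim() drops spaces.
            counts[(sender, recipient.strip())] += 1
    # ORDER BY count DESC LIMIT n
    return counts.most_common(limit)
```

Calling `top_pairs(rows, 10)` on a list of such tuples returns up to ten `((sender, recipient), count)` entries, mirroring the shape of the query result.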
![Page 13: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/13.jpg)
![Page 14: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/14.jpg)
![Page 15: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/15.jpg)
![Page 16: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/16.jpg)
[Diagram labels: User Emails; Emails Table; Sentiment Producer]
![Page 17: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/17.jpg)
SELECT
  ((year(datelong.ts) - 1999) * 52 + weekofyear(datelong.ts)) AS weeknum,
  avg(sentiment.value) AS avgsentiment,
  stddev(sentiment.value) AS stddevsentiment,
  count(1) AS nummessages
FROM emails
WHERE regexp_replace(fromStr.value, ".*@", "") == "enron.com"
GROUP BY ((year(datelong.ts) - 1999) * 52 + weekofyear(datelong.ts));
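The week bucket in the query above is plain arithmetic: `(year - 1999) * 52 + weekofyear`. A small Python sketch of that formula follows, assuming Hive's `weekofyear` uses ISO-8601 week numbering, which Python's `isocalendar()` also does; the function name `week_index` is hypothetical.

```python
from datetime import date

def week_index(d, base_year=1999):
    """Mirror of (year(ts) - 1999) * 52 + weekofyear(ts) from the query."""
    iso_week = d.isocalendar()[1]  # ISO-8601 week number, 1..53
    # Like the query, this uses the calendar year, not the ISO week-year.
    return (d.year - base_year) * 52 + iso_week

print(week_index(date(2001, 10, 16)))  # 146
print(week_index(date(2001, 1, 3)))    # 105
print(week_index(date(2001, 12, 31)))  # 105
```

Under this assumption, dates at a year boundary can share a bucket with the first week of the same calendar year, as the last two lines show, so points near late December may be worth checking when reading a weekly plot built from this index.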
![Page 18: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/18.jpg)
![Page 19: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/19.jpg)
![Page 20: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/20.jpg)
![Page 21: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/21.jpg)
![Page 22: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/22.jpg)
SELECT
  lword AS word,
  sum(sentiment) AS totalsentiment
FROM (
  SELECT
    mid.value AS mid,
    lower(word) AS lword,
    sentiment.value AS sentiment
  FROM emails
  LATERAL VIEW explode(sentences(body.value)[0]) wds AS word
  WHERE regexp_replace(fromStr.value, ".*@", "") == "enron.com"
) subquery
GROUP BY lword
ORDER BY totalsentiment ASC;
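The query above attributes each email's sentiment score to every word of its first sentence and sums per lowercased word. A rough Python sketch of that aggregation is shown here; the regex tokenizer is only an approximation of Hive's `sentences()`, the sender-domain filter is left out, and the names `first_sentence_words` and `word_sentiment_totals` are hypothetical.

```python
import re
from collections import defaultdict

def first_sentence_words(body):
    # Approximation of sentences(body)[0]: text up to the first ., ! or ?,
    # split into alphabetic words.
    first = re.split(r"[.!?]", body, maxsplit=1)[0]
    return re.findall(r"[A-Za-z']+", first)

def word_sentiment_totals(emails):
    """emails: iterable of (body, sentiment) pairs (hypothetical layout)."""
    totals = defaultdict(float)
    for body, sentiment in emails:
        for word in first_sentence_words(body):
            totals[word.lower()] += sentiment  # GROUP BY lword, sum(sentiment)
    # ORDER BY totalsentiment ASC: most negative words first.
    return sorted(totals.items(), key=lambda kv: kv[1])
```

Because every word inherits the whole message's score, frequent neutral words can accumulate large totals simply by appearing often, which is worth keeping in mind when reading the ranked output.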
![Page 23: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/23.jpg)
![Page 24: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/24.jpg)
![Page 25: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/25.jpg)
![Page 26: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/26.jpg)
![Page 27: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/27.jpg)
![Page 28: Exploring the Enron Email Dataset with Kiji and Hive](https://reader033.fdocuments.in/reader033/viewer/2022052601/5591d8551a28ab0d468b46bc/html5/thumbnails/28.jpg)