Web Log Data Analytics with Hadoop

Presented by Yang-Syuan Chen, 2015-12-05



Outline
• Analyzing Web Application Log Files with Hadoop
  ▫ Introduction to Cloud Computing
  ▫ Hadoop: An Overview
  ▫ System Architecture and Implementation
  ▫ Results of Analyzing Web Application Logs
• LASyM: A Learning Analytics System for MOOCs
  ▫ Learning Analytics for MOOCs
  ▫ LASyM: Architecture, Implementation and Evaluation
• Big Log Analysis for E-Learning Ecosystem
  ▫ Characteristics of Big Log Analysis
  ▫ Logging Architecture for E-learning Ecosystem
  ▫ Applications of Logging Architecture
• Conclusion


Analyzing Web Application Log Files to Find Hit Count through the Utilization of Hadoop MapReduce in Cloud Computing Environment

Sayalee Narkhede, Trupti Baraskar, Debajyoti Mukhopadhyay
Conference on IT in Business, Industry and Government (CSIBIG), March 8-9, 2014, pp. 1-7


Introduction to Cloud Computing

• Cloud computing is a kind of Internet-based computing in which shared resources and information are provided to computers and other devices on demand.
  ▫ Characteristics: low-cost hardware, large storage capacity, growing computing power, and huge data sizes.
• The main challenge in the cloud is how to effectively store, query, analyze, and utilize immense datasets.
  ▫ Solution: the MapReduce model and Hadoop.
• Log files contain a wealth of information that is useful for making business decisions and future assessments.


Hadoop: An Overview

• Hadoop
  ▫ Open-source framework
  ▫ Distributed processing of massive data sets on clusters
• HDFS
  ▫ Splits each file into blocks that are distributed across the nodes.
  ▫ A replication mechanism provides reliability and availability even when nodes fail.
• MapReduce
  ▫ MapReduce provides a mechanism for programmers to process data sets on a distributed system.

Fig. 1 MapReduce Framework
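To make the map, shuffle, and reduce steps of Fig. 1 concrete, here is a minimal single-process sketch of a hit-count job. It only simulates what Hadoop distributes across nodes; the record layout (requested page as the first whitespace-separated field) and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job` are hypothetical, not Hadoop's API or the paper's code.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (page, 1) for one preprocessed log record.

    Hypothetical layout: the requested page is the first whitespace-separated field.
    """
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map -> shuffle -> reduce in one process and return {page: hit_count}."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    records = ["/index.html 200", "/about.html 200", "/index.html 304"]
    print(run_job(records))  # {'/about.html': 1, '/index.html': 2}
```

On a real cluster the mapper and reducer would run as separate tasks (for example through the Java MapReduce API or Hadoop Streaming), and the framework, not the program, would perform the shuffle.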


System Architecture and Implementation

• The system consists of two phases: log preprocessing and log analysis.

Fig. 2 System Workflow (Preprocessing, Analysis)
Fig. 3 Preprocessed Log File
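Fig. 3 shows the preprocessed log file, but the slide does not list the cleaning rules. As an illustration only, the sketch below assumes raw input in the Apache Common Log Format and a hypothetical cleaning policy that drops malformed lines, error responses (status 400 and above), and requests for static resources such as images and stylesheets; the paper's actual rules and output layout may differ.

```python
import re
from typing import NamedTuple, Optional

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical cleaning rule: static resources would inflate page hit counts.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class LogRecord(NamedTuple):
    host: str
    time: str
    path: str
    status: int


def preprocess(line: str) -> Optional[LogRecord]:
    """Parse one raw log line; return None if it is malformed or filtered out."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    status = int(match.group("status"))
    path = match.group("path")
    if status >= 400 or path.lower().endswith(STATIC_SUFFIXES):
        return None
    return LogRecord(match.group("host"), match.group("time"), path, status)


if __name__ == "__main__":
    raw = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
    record = preprocess(raw)
    print(record.path, record.status)  # /index.html 200
```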


System Architecture and Implementation

Fig. 4 System Architecture
Fig. 5 Results of Analysis


Results of Analyzing Web Application Logs

Fig. 6 Hits for Each City (Bar Chart) and Hits for Each Quarter of the Year (Pie Chart)
Fig. 7 Performance of Different Clusters
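The quarterly breakdown in Fig. 6 amounts to grouping hits by the calendar quarter of their timestamps. A minimal sketch, assuming Common Log Format timestamps such as `10/Oct/2000:13:55:36 -0700` and one timestamp per hit; the sample timestamps are made up, and the per-city breakdown would additionally need an IP-to-location lookup, which is not shown here.

```python
from collections import Counter
from datetime import datetime


def quarter_of(clf_timestamp: str) -> str:
    """Map a Common Log Format timestamp to 'Q1'..'Q4'."""
    ts = datetime.strptime(clf_timestamp, "%d/%b/%Y:%H:%M:%S %z")
    return f"Q{(ts.month - 1) // 3 + 1}"


def hits_per_quarter(timestamps):
    """Count hits per calendar quarter, given one timestamp per hit."""
    return Counter(quarter_of(t) for t in timestamps)


if __name__ == "__main__":
    sample = [  # hypothetical timestamps, for illustration only
        "10/Jan/2014:08:00:00 +0530",
        "03/Feb/2014:09:30:00 +0530",
        "21/Jul/2014:17:45:00 +0530",
    ]
    print(hits_per_quarter(sample))  # Counter({'Q1': 2, 'Q3': 1})
```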


LASyM: A Learning Analytics System for MOOCs

Yassine Tabaa, Abdellatif Medouri
International Journal of Advanced Computer Science and Applications, Vol. 4, No. 5, 2013, pp. 113-119


Learning Analytics for MOOCs

• MOOCs
  ▫ An online course aimed at unlimited participation and open access via the web. Two features: open accessibility and scalability.
• Learning Analytics
  ▫ The measurement, collection, analysis, and reporting of data about learners.
  ▫ Its purpose is to understand and optimize learning and the environments in which it occurs.


Learning Analytics for MOOCs

• MOOCs' Big Data
  ▫ Coursera in 2012: 3.1 million students, 332 courses
  ▫ edX in 2014: 2.5 million students, over 200 courses
• Our Solution
  ▫ LASyM, a Learning Analytics System for MOOCs

Fig. 8 Lifecycle of MOOCs’ big data


Learning Analytics for MOOCs

• MOOC Student Patterns
  ▫ Based on Phil Hill's classification of student types in Coursera-style MOOCs, we redefine selected groups in the following modified classification: Ghosts, Observers, Non-completers, Passive Participants, Active Participants.

“at-risk student”


Learning Analytics for MOOCs

• A method to identify "at-risk" students in MOOC environments
  ▫ Two principal characteristics: Interaction and Persistence
  ▫ Engagement Degree

Fig. 9 Method of identifying “at-risk” learners
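The slide names interaction and persistence as the inputs to an engagement degree but gives no formula. The sketch below is a hypothetical scoring rule, not the authors' method: each characteristic is normalized to [0, 1], the two are combined as a weighted mean, and learners below an arbitrary threshold are flagged as at risk. The field names, the 0.5 weight, and the 0.3 threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StudentActivity:
    student_id: str
    interactions: int  # hypothetical interaction measure, e.g. forum posts + quiz attempts
    active_weeks: int  # hypothetical persistence measure: weeks with any activity


def engagement_degree(a: StudentActivity, max_interactions: int, course_weeks: int,
                      w_interaction: float = 0.5) -> float:
    """Hypothetical engagement degree: weighted mean of normalized interaction and persistence."""
    interaction = min(a.interactions / max_interactions, 1.0) if max_interactions else 0.0
    persistence = min(a.active_weeks / course_weeks, 1.0) if course_weeks else 0.0
    return w_interaction * interaction + (1 - w_interaction) * persistence


def at_risk(students, course_weeks: int, threshold: float = 0.3):
    """Return IDs of students whose engagement degree falls below the (assumed) threshold."""
    max_interactions = max((s.interactions for s in students), default=0)
    return [s.student_id for s in students
            if engagement_degree(s, max_interactions, course_weeks) < threshold]


if __name__ == "__main__":
    cohort = [
        StudentActivity("a", interactions=40, active_weeks=8),
        StudentActivity("b", interactions=2, active_weeks=1),
        StudentActivity("c", interactions=20, active_weeks=4),
    ]
    print(at_risk(cohort, course_weeks=8))  # ['b']
```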


LASyM: Architecture, Implementation and Evaluation

• Experimental setup
  ▫ Based on Hadoop
  ▫ 1 resource manager
  ▫ 12 nodes
  ▫ Data Integrator, based on a MapReduce application

Fig. 10 LASyM Architecture


LASyM: Architecture, Implementation and Evaluation

• Evaluation
  ▫ We executed the developed MapReduce-based application on LASyM with different numbers of parallel nodes.

Fig. 11 Learning analytics speedup using LASyM
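Fig. 11 reports speedup. Assuming the conventional definition (the slide does not state which baseline the authors used), speedup and parallel efficiency on p nodes are:

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Here T_1 is the job's running time on one node and T_p its running time on p nodes. Ideal (linear) speedup is S(p) = p, i.e. E(p) = 1. For instance, with purely hypothetical timings of T_1 = 120 min and T_12 = 15 min, S(12) = 8 and E(12) = 8/12 ≈ 0.67.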


Big Log Analysis for E-Learning Ecosystem

Qinghua Zheng, Huan He, Tian Ma, Ni Xue, Bing Li, Bo Dong
IEEE 11th International Conference on e-Business Engineering (ICEBE), Nov. 5-7, 2014, pp. 258-263


Characteristics of Big Log Analysis

• There are three challenges in taking full advantage of e-Learning log data:
  1. Multi-dimensional data
  2. Massive log data with various sources, formats, and applications across the ecosystem
  3. Complexity and variety of log analysis


Characteristics of Big Log Analysis


Logging Architecture for E-learning Ecosystem

•The logging architecture consists of five modules.

Fig. 12 Logging architecture for e-Learning ecosystem


Applications of Logging Architecture

• Computing students' admission and attendance statistics.

Fig. 13 An implemented logging architecture based on BlueSky ecosystem
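The slide does not describe the log schema, so the following sketch assumes hypothetical records of (student_id, course_id, ISO-8601 timestamp) and counts each student at most once per course per day, which is one plausible reading of "attendance". In a Hadoop job the (course, date) pair would be the map output key and the reducer would count distinct student IDs; here everything runs in a single process.

```python
from collections import defaultdict
from datetime import datetime


def daily_attendance(records):
    """Count distinct students per (course_id, date).

    Each record is a hypothetical (student_id, course_id, iso_timestamp) tuple;
    a student counts once per course per day regardless of how many log lines they produce.
    """
    seen = defaultdict(set)
    for student_id, course_id, iso_ts in records:
        day = datetime.fromisoformat(iso_ts).date().isoformat()
        seen[(course_id, day)].add(student_id)
    return {key: len(students) for key, students in sorted(seen.items())}


if __name__ == "__main__":
    log = [  # hypothetical log records, for illustration only
        ("s1", "c1", "2014-03-03T09:00:00"),
        ("s1", "c1", "2014-03-03T09:05:00"),
        ("s2", "c1", "2014-03-03T10:00:00"),
        ("s1", "c1", "2014-03-04T09:00:00"),
    ]
    print(daily_attendance(log))  # {('c1', '2014-03-03'): 2, ('c1', '2014-03-04'): 1}
```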


Applications of Logging Architecture

• Results of log analytics
  ▫ Raw data size: 17 GB (15,489,655 rows)
  ▫ Size of the analytics results: 1.2 GB

Fig. 14 Statistics for number of students attending class
Fig. 15 Statistics for total number of all online students
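As a quick check on the reported sizes: the analytics output is roughly 7% of the raw log volume, about a 14-fold reduction, and, assuming 1 GB = 10^9 bytes, each raw row averages about 1.1 KB:

```latex
\frac{1.2\,\text{GB}}{17\,\text{GB}} \approx 0.071, \qquad
\frac{17\,\text{GB}}{1.2\,\text{GB}} \approx 14.2, \qquad
\frac{17 \times 10^{9}\,\text{B}}{15{,}489{,}655\ \text{rows}} \approx 1.1 \times 10^{3}\,\text{B/row}
```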


Conclusion

• The Correlation

Diagram: the three papers (Analyzing Web Application Log Files with Hadoop; LASyM: A Learning Analytics System for MOOCs; Big Log Analysis for E-Learning Ecosystem) related through the topics E-Learning, Hadoop, Log analysis, and Web app analytics.


Conclusion

• Hadoop reduced the time needed to analyze huge amounts of data.

• The data analytics life cycle derived from the architectures:
  ▫ Collection, Transport, Storage, Computation, and Service
• The architecture that covers the whole e-learning ecosystem is more complex, but it also provides more analysis services.