Web Log Data Analytics with Hadoop Presented by Yang-Syuan Chen, 2015-12-05.
Outline
• Analyzing Web Application Log Files with Hadoop
  ▫ Introduction to Cloud Computing
  ▫ Hadoop: An Overview
  ▫ System Architecture and Implementation
  ▫ Result of Analyzing Web Application Log
• LASyM: A Learning Analytics System for MOOCs
  ▫ Learning Analytics for MOOCs
  ▫ LASyM: Architecture, Implementation and Evaluation
• Big Log Analysis for E-Learning Ecosystem
  ▫ Characteristics of Big Log Analysis
  ▫ Logging Architecture for E-learning Ecosystem
  ▫ Applications of Logging Architecture
• Conclusion
Analyzing Web Application Log Files to Find Hit Count through the Utilization of Hadoop MapReduce in Cloud Computing Environment
Sayalee Narkhede, Trupti Baraskar, Debajyoti Mukhopadhyay
Conference on IT in Business, Industry and Government (CSIBIG), March 8-9, 2014, pp. 1-7
Introduction to Cloud Computing
• Cloud computing is a kind of Internet-based computing in which shared resources and information are provided to computers and other devices on demand.
  ▫ Characteristics: low-cost hardware, storage capacity, increase in computing power and huge data size.
• The main challenge in the cloud is how to effectively store, query, analyze, and utilize immense datasets.
  ▫ Solution: the MapReduce model and Hadoop
• Log files contain a wealth of information that is useful for making business decisions and for future assessment.
Hadoop: An Overview
• Hadoop
  ▫ Open-source framework
  ▫ Distributed processing of massive data sets on clusters.
• HDFS
  ▫ Splits each file into blocks, which are allocated across the nodes.
  ▫ A replication mechanism provides reliability and availability regardless of node failures.
• MapReduce
  ▫ MapReduce gives programmers a mechanism to process data sets on a distributed system.
Fig. 1 MapReduce Framework
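The MapReduce flow can be illustrated with a small single-process sketch of a hit count, the task the first paper addresses. This is not the authors' code and it does not use the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `hit_count`), the choice of the client IP as the key, and the sample log lines are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit (key, 1) for one log line.

    Assumption: the key is the first whitespace-separated field (the client IP).
    """
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def hit_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical log lines, invented for this example.
sample_log = [
    '10.0.0.1 - - [05/Dec/2015:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/Dec/2015:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 256',
    '10.0.0.1 - - [05/Dec/2015:10:00:02 +0800] "GET /index.html HTTP/1.1" 200 512',
]
hits = hit_count(sample_log)
print(hits)
```

On a real cluster the map and reduce steps would run on different nodes and the framework would perform the shuffle; the sketch only shows how the three steps fit together.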
System Architecture and Implementation
• The system works in two phases: log preprocessing and log analysis.
Fig. 2 System Workflow (phases: Preprocessing, Analysis)
Fig. 3 Preprocessed Log File
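The slides do not say what the preprocessing phase does in detail, so the following is only a sketch of one plausible cleaning step. It assumes the raw log is in Apache Common Log Format and that preprocessing drops malformed lines, requests for static resources, and unsuccessful requests. The names `LOG_PATTERN`, `STATIC_SUFFIXES` and `preprocess`, and the sample lines, are hypothetical.

```python
import re

# Assumed input format: Apache Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumed list of static-resource suffixes to discard.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def preprocess(lines):
    """Return one dict per log line that survives the cleaning rules."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        record = match.groupdict()
        if record["path"].lower().endswith(STATIC_SUFFIXES):
            continue  # static resource, not a page hit
        if not record["status"].startswith("2"):
            continue  # keep successful (2xx) requests only
        cleaned.append(record)
    return cleaned


# Hypothetical raw log lines, invented for this example.
raw = [
    '10.0.0.1 - - [05/Dec/2015:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [05/Dec/2015:10:00:01 +0800] "GET /style.css HTTP/1.1" 200 128',
    '10.0.0.2 - - [05/Dec/2015:10:00:02 +0800] "GET /missing HTTP/1.1" 404 0',
    'corrupted line',
]
records = preprocess(raw)
print(records)
```

Only the first sample line survives; the cleaned records would then be passed to the analysis phase.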
System Architecture and Implementation
Fig. 4 System Architecture
Fig. 5 Results of Analysis
Result of Analyzing Web Application Log
Fig. 6 Hits for Each City (Bar Chart); Hits for Each Quarter of the Year (Pie Chart)
Fig. 7 Performance of Different Clusters
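The two aggregations behind Fig. 6 (hits per city and hits per quarter) can be sketched as simple counts over preprocessed records. The record layout, the city labels and the dates below are made up for illustration, and the slides do not say how a city is derived from a log entry.

```python
from collections import Counter
from datetime import date


def quarter_of(day):
    """Map a date to its calendar quarter (1 to 4)."""
    return (day.month - 1) // 3 + 1


# Hypothetical preprocessed records: (city label, date of the hit).
records = [
    ("City A", date(2013, 1, 15)),
    ("City A", date(2013, 5, 2)),
    ("City B", date(2013, 5, 20)),
    ("City A", date(2013, 11, 30)),
]

hits_per_city = Counter(city for city, _ in records)
hits_per_quarter = Counter(f"Q{quarter_of(day)}" for _, day in records)

print(hits_per_city)
print(hits_per_quarter)
```

In a MapReduce job the same result would come from emitting the city or the quarter as the map key and summing in the reducer.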
LASyM: A Learning Analytics System for MOOCs
Yassine Tabaa, Abdellatif Medouri
International Journal of Advanced Computer Science and Applications, Vol. 4, No. 5, 2013, pp. 113-119
Learning Analytics for MOOCs
• MOOCs
  ▫ An online course aimed at unlimited participation and open access via the web.
  ▫ Two features: open accessibility and scalability
• Learning Analytics
  ▫ The measurement, collection, analysis and reporting of data about learners.
  ▫ For purposes of understanding and optimizing learning and the environments in which it occurs.
Learning Analytics for MOOCs
• MOOCs' Big Data
  ▫ Coursera in 2012: 3.1 million students, 332 courses
  ▫ edX in 2014: 2.5 million students, over 200 courses
• Proposed Solution
  ▫ LASyM, a Learning Analytics System for MOOCs.
Fig. 8 Lifecycle of MOOCs’ big data
Learning Analytics for MOOCs
• MOOC Student Patterns
  ▫ Based on the Phil classification of student types in Coursera-style MOOCs, selected groups are redefined in the following modified classification list: Ghosts, Observers, Non-completers, Passive Participants, Active Participants
“at-risk student”
Learning Analytics for MOOCs
• A method to identify "at-risk" students in MOOC environments.
  ▫ Two principal characteristics: Interaction and Persistence
  ▫ Engagement Degree
Fig. 9 Method of identifying “at-risk” learners
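The slide names interaction, persistence and an engagement degree but gives no formula, so the sketch below is purely illustrative and is not the LASyM method. It assumes both characteristics are normalized to the range 0 to 1, that the engagement degree is their weighted mean, and that a learner below a threshold is flagged as at-risk. The weight, the threshold and all learner values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    name: str
    interaction: float  # assumed normalized to [0, 1]
    persistence: float  # assumed normalized to [0, 1]


def engagement_degree(learner, interaction_weight=0.5):
    """Hypothetical engagement degree: weighted mean of the two characteristics."""
    return (interaction_weight * learner.interaction
            + (1 - interaction_weight) * learner.persistence)


def at_risk(learners, threshold=0.4):
    """Return the names of learners whose engagement degree is below the threshold."""
    return [learner.name for learner in learners
            if engagement_degree(learner) < threshold]


# Hypothetical learners, invented for this example.
learners = [
    Learner("a", interaction=0.9, persistence=0.8),
    Learner("b", interaction=0.2, persistence=0.3),
    Learner("c", interaction=0.5, persistence=0.1),
]
flagged = at_risk(learners)
print(flagged)
```

Any real deployment would need the paper's own definitions of the two characteristics and a threshold validated against course outcomes.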
LASyM: Architecture, Implementation and Evaluation
• Experimental setup
  ▫ Based on Hadoop
  ▫ 1 resource manager
  ▫ 12 nodes
  ▫ Data Integrator
  ▫ Based on a MapReduce application
Fig. 10 LASyM Architecture
LASyM: Architecture, Implementation and Evaluation
• Evaluation
  ▫ The developed MapReduce-based application was executed in LASyM on different numbers of parallel nodes.
Fig. 11 Learning analytics speedup using LASyM
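Speedup in an evaluation like this is conventionally the single-node runtime divided by the runtime on n nodes, and parallel efficiency is the speedup divided by n. The slide does not reproduce the measured runtimes, so the timings in the sketch below are placeholders, not results from the paper.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Speedup relative to the baseline run (baseline time / parallel time)."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, nodes):
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup(baseline_seconds, parallel_seconds) / nodes


# Placeholder runtimes in seconds, keyed by node count (NOT measured values).
runtimes = {1: 1200.0, 4: 400.0, 12: 150.0}

baseline = runtimes[1]
speedups = {nodes: speedup(baseline, seconds) for nodes, seconds in runtimes.items()}
efficiencies = {nodes: efficiency(baseline, seconds, nodes)
                for nodes, seconds in runtimes.items()}

for nodes in sorted(runtimes):
    print(nodes, speedups[nodes], round(efficiencies[nodes], 2))
```

An efficiency well below 1 at higher node counts would indicate that coordination and data-transfer overhead is eating into the gain from adding nodes.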
Big Log Analysis for E-Learning Ecosystem
Qinghua Zheng, Huan He, Tian Ma, Ni Xue, Bing Li, Bo Dong
IEEE 11th International Conference on e-Business Engineering (ICEBE), Nov. 5-7, 2014, pp. 258-263
Characteristics of Big Log Analysis
• There are three challenges to taking full advantage of e-Learning log data:
  1. Multi-dimensional data
  2. Massive log data with various sources, formats and applications of the ecosystem.
  3. Complexity and variety of log analysis.
Logging Architecture for E-learning Ecosystem
•The logging architecture consists of five modules.
Fig. 12 Logging architecture for e-Learning ecosystem
Applications of Logging Architecture
• Computing statistics on students' admission and attendance.
Fig. 13 An implemented logging architecture based on BlueSky ecosystem
Applications of Logging Architecture
• Results of log analytics
  ▫ Raw data size: 17 GB (15,489,655 rows)
  ▫ Size of the log analytics results: 1.2 GB
Fig. 14 Statistics for number of students attending class
Fig. 15 Statistics for total number of all online students
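The reported sizes imply that the analytics output is about 7% of the raw log, roughly a 14-fold reduction. The arithmetic can be checked with the short sketch below. The only inputs are the three figures from the slide; the average row size additionally assumes 1 GB = 10^9 bytes, which the slide does not specify.

```python
RAW_GB = 17.0          # raw data size from the slide
RESULT_GB = 1.2        # size of the analytics results from the slide
RAW_ROWS = 15_489_655  # raw row count from the slide

# Fraction of the raw size that remains after analysis.
result_share = RESULT_GB / RAW_GB

# How many times smaller the results are than the raw data.
reduction_factor = RAW_GB / RESULT_GB

# Assumption: 1 GB = 10**9 bytes (the slide does not say GB or GiB).
avg_bytes_per_row = RAW_GB * 10**9 / RAW_ROWS

print(f"results are {result_share:.1%} of the raw size")
print(f"reduction factor: {reduction_factor:.1f}x")
print(f"average raw row size: {avg_bytes_per_row:.0f} bytes")
```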
Conclusion
• The Correlation
  ▫ Works compared: Analyzing Web Application Log Files with Hadoop; LASyM: A Learning Analytics System for MOOCs; Big Log Analysis for E-Learning Ecosystem
  ▫ Shared topics: E-Learning, Hadoop, Log analysis, Web app analytics
Conclusion
• Hadoop reduced the time needed to analyze huge amounts of data.
• The data analytics life cycle common to the architectures:
  ▫ Collection, Transport, Storage, Computation and Service
• The architecture that covers the e-learning ecosystem is more complex but also provides more analysis services.