Big Data Airline Project at UAEU
![Page 1: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/1.jpg)
Big Data Airlines Project
Ziyad Saleh
![Page 2: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/2.jpg)
What is Big Data?
Big data is a broad term for data sets so large or complex that they are difficult to process using traditional data processing applications.
Big data means terabytes (1 TB = 1024 GB) of data to be processed and analyzed. Terabytes of new data are generated daily, so the speed of analyzing this huge flow of data is a challenge.
Big data can be described by the 4 Vs: Volume, Velocity, Variety and Veracity.
![Page 3: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/3.jpg)
![Page 4: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/4.jpg)
![Page 5: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/5.jpg)
![Page 6: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/6.jpg)
Small Data Vs. Big Data
![Page 7: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/7.jpg)
![Page 9: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/9.jpg)
Map Reduce
![Page 10: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/10.jpg)
Map Reduce model
![Page 11: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/11.jpg)
![Page 12: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/12.jpg)
Project Scope
![Page 13: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/13.jpg)
The scope is limited to:
1. Installing and configuring the Hadoop Map/Reduce platform.
2. Analyzing a big data sample of U.S. domestic flight performance and delays over 5 years to identify:
   1. Top carriers experiencing delays.
   2. Top airports and states with departure delays.
3. Plotting state delays on a thematic map of the USA.
![Page 14: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/14.jpg)
Source of Data for the Project
Datasets will be collected from: U.S. Department of Transportation (DOT) – Statistical Computing
![Page 15: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/15.jpg)
Size of Data
The dataset will be between 500 MB and 1 TB in size, covering 5 years of flight statistics.
![Page 16: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/16.jpg)
| Field Name | Description |
|---|---|
| Year | Year of the scheduled flight |
| Month | Month of the scheduled flight (1–12) |
| Day | Day of the month (1–31) |
| DepTime | Actual departure time of the flight |
| CRSDepTime | Scheduled departure time |
| ArrTime | Actual arrival time in HH/MM format |
| CRSArrTime | Scheduled arrival time |
| FlightNum | Flight number |
| ArrDelay | Arrival delay |
| DepDelay | Departure delay, in minutes |
| CarrierDelay | Delay (in minutes) caused by factors within the control of the carrier |
| WeatherDelay | Delay (in minutes) caused by extreme weather conditions |
| NASDelay | Delay (in minutes) within the control of the National Airspace System (NAS) |
| SecurityDelay | Delay (in minutes) caused by security reasons |
| LateAircraftDelay | Delay (in minutes) due to the same aircraft arriving late at a previous airport |

Table 1: Airline Dataset Dictionary.
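To make the dictionary concrete, here is a minimal Python sketch of reading flight records and keeping only the date and delay fields named in Table 1. It is an illustration, not the project's code (the project used Java on Hadoop). The CSV layout with a header row, the `NA` marker for missing values, and the two sample rows are assumptions made for the example.

```python
import csv
import io

# Column names come from Table 1. The header row and the "NA" marker for
# missing values are assumptions about the file layout.
DELAY_FIELDS = ["ArrDelay", "DepDelay", "CarrierDelay", "WeatherDelay",
                "NASDelay", "SecurityDelay", "LateAircraftDelay"]


def parse_minutes(value):
    """Convert a delay field to whole minutes, or None when it is missing."""
    value = value.strip()
    if value in ("", "NA"):
        return None
    return int(float(value))


def read_delays(text):
    """Yield one dict per flight holding the date and the delay fields only."""
    for row in csv.DictReader(io.StringIO(text)):
        record = {"Year": int(row["Year"]), "Month": int(row["Month"]),
                  "Day": int(row["Day"])}
        for name in DELAY_FIELDS:
            record[name] = parse_minutes(row.get(name, ""))
        yield record


# Made-up sample rows for illustration only.
SAMPLE = """Year,Month,Day,DepTime,ArrDelay,DepDelay,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2008,1,3,2003,-14,8,NA,NA,NA,NA,NA
2008,1,3,1829,34,34,2,0,0,0,32
"""
records = list(read_delays(SAMPLE))
```

Columns such as `DepTime` are dropped on purpose here, since only the delay fields matter for the analysis.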
![Page 17: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/17.jpg)
Data Pre-Processing, Processing and Analytics
Data pre-processing: Data will be cleansed and some artifacts will be filtered out as necessary. Many fields in the airline data set will be discarded because they are irrelevant to the subject of delay that we are concerned with.
Data processing and analytics: Data will be processed with Java programs on Map/Reduce to reduce the size of the data and produce smaller, organized datasets. Next, the resulting datasets will be analyzed using additional tools such as R.
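The processing step can be sketched in plain Python as a map step, a grouping step, and a reduce step. This is a simplified, single-machine illustration of the Map/Reduce idea and not the project's Java code. The carrier code field is an assumption (it is not listed in Table 1), the function names are hypothetical, and the flight tuples are made-up values.

```python
from collections import defaultdict

# Hypothetical input: (carrier code, arrival delay in minutes or None).
# A carrier column is not listed in Table 1, so its presence is an assumption.
FLIGHTS = [("WN", 12), ("AA", 45), ("WN", None),
           ("AA", -5), ("DL", 30), ("WN", 20)]


def map_flight(flight):
    """Map step: emit (carrier, delay) only for flights that arrived late."""
    carrier, arr_delay = flight
    if arr_delay is not None and arr_delay > 0:
        yield carrier, arr_delay


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_delays(carrier, delays):
    """Reduce step: total delay minutes and number of delayed flights."""
    return carrier, sum(delays), len(delays)


def run_job(flights):
    """Run map, shuffle and reduce in sequence on one machine."""
    emitted = (pair for flight in flights for pair in map_flight(flight))
    return [reduce_delays(k, v) for k, v in sorted(shuffle(emitted).items())]


totals = run_job(FLIGHTS)
```

The output has one small row per carrier, which is the kind of reduced dataset that can then be loaded into a tool such as R.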
![Page 18: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/18.jpg)
Data Storage
Data will be stored in HDFS across multiple storage nodes, with a total size between 500 GB and 1 TB.
Airlines Big Data
HDFS
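As a rough sizing aid for the storage figures above, the sketch below estimates how many HDFS blocks a dataset occupies and how much raw disk it needs once replicated. The 128 MB block size and replication factor of 3 are assumed values (common defaults), not settings confirmed by the slides, and the calculation treats the dataset as one contiguous file.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS block size (a common default)
REPLICATION = 3      # assumed replication factor (a common default)


def hdfs_footprint(dataset_gb, block_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, raw storage in GB) for a dataset of this size."""
    blocks = math.ceil(dataset_gb * 1024 / block_mb)
    raw_gb = dataset_gb * replication
    return blocks, raw_gb


low = hdfs_footprint(500)    # 500 GB lower bound
high = hdfs_footprint(1024)  # 1 TB upper bound
```

Under these assumptions, the 500 GB case needs about three times that much raw disk across the storage nodes.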
![Page 19: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/19.jpg)
Target Analysis: Over the 5 years of flight information for all US domestic airlines:
1. Which carriers have the most aggregated delay in their flights?
2. Which states have the most delays?
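Both questions reduce to ranking aggregated totals. A minimal Python sketch, assuming the per-carrier and per-state delay totals have already been computed; the function name `top_n` is hypothetical and the numbers are placeholders, not results from the project.

```python
import heapq


def top_n(totals, n=3):
    """Return the n (key, total) pairs with the largest totals, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Hypothetical aggregated delay minutes, not findings from the project.
carrier_delay = {"AA": 900, "WN": 1200, "DL": 400, "UA": 700}
state_delay = {"TX": 1500, "CA": 1800, "IL": 1100}

top_carriers = top_n(carrier_delay, 2)
top_states = top_n(state_delay, 2)
```

Using `heapq.nlargest` avoids sorting the full table when only the first few entries are needed.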
![Page 20: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/20.jpg)
Design
![Page 21: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/21.jpg)
Airlines Project Workflow and Design
[Diagram labels: Master Node (Name Node, Job Tracker); Node 1, Node 2, Node 3, Node 4; Airlines Big Data; HDFS; Task (Java Code); Mapper; Reducer; Reducer Node; Top Airlines]
![Page 22: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/22.jpg)
Implementation
![Page 23: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/23.jpg)
Software and Tools
1. CentOS Linux operating system
2. Apache Hadoop
3. Cloudera CDH 5.3 virtual machine
4. Oracle VM VirtualBox Manager
5. Eclipse IDE
6. Java (Oracle JDK)
7. Maven
8. Microsoft Excel and Access 2010
9. The R statistical tool
![Page 24: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/24.jpg)
Mapper :
![Page 25: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/25.jpg)
Reducer:
![Page 26: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/26.jpg)
R:
![Page 27: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/27.jpg)
Findings
![Page 28: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/28.jpg)
US Airlines Delay (Per Carrier)
[Chart: carriers on the horizontal axis (WN, AA, OO, MQ, US, DL, UA, XE, NW, CO, EV, 9E, FL, YV, OH, B6, AS, F9, HA, AQ, PI, HP, EA, PS, TW); vertical axis from 0 to 1.2; series: ArrivalOnTime, ArrivalDelays, DepartureOnTime, DepartureDelays, Cancellations, Diversions]
![Page 29: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/29.jpg)
Thematic Map of US Airlines Delay (Per State)
![Page 30: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/30.jpg)
Conclusion
![Page 31: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/31.jpg)
Conclusion:
Big Data is the large volume of continuously generated data that cannot be processed and analyzed using traditional data management tools.
Big data is a rapidly growing field that is reshaping the future. Demand for big data scientists is high and is expected to continue in the coming years.
Hadoop is an open source framework for storing and processing large datasets on clusters of commodity hardware.
Big Data analytics is attracting both businesses and policy makers, who use it to make more informed decisions and to plan for the future.
Big Data now, normal data tomorrow.
![Page 32: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/32.jpg)
Big Data Tutorials
![Page 33: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/33.jpg)
Online Big Data Tutorials:
1. Udemy: https://www.udemy.com/course/subscribe/?courseId=336982&dtcode=lGCe31035ujY
2. Udacity: https://www.udacity.com/courses#!/data-science
3. EMC: https://education.emc.com/guest/campaign/data_science.aspx
4. Coursera: https://www.coursera.org/course/datasci
5. Caltech, Learning from Data: http://work.caltech.edu/telecourse.html
6. MIT OpenCourseWare: http://ocw.mit.edu/courses/sloan-school-of-management/15-062-data-mining-spring-2003/index.htm
7. Stanford OpenClassroom: http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
8. Big Data University: https://bigdatauniversity.com/curriculum-map/
![Page 34: Big Data Airline Project at UAEU](https://reader036.fdocuments.in/reader036/viewer/2022062420/55c74798bb61ebf2268b4689/html5/thumbnails/34.jpg)
Thank You
Ziyad Saleh
O Allah, teach us what benefits us, benefit us with what You have taught us, and increase us in knowledge.