The United States air transportation network analysis
Dorothy Cheung
Introduction
• The problem and its importance
• Missing Pieces
• Related works in summary
• Methodology
  – Data set
  – Network Generation
  – Network Analysis
• Conclusion
Outline
• The problem and its importance
• Missing Pieces
• Related works
• Methodology
  – Data set
  – Network Generation
  – Network Analysis
• Conclusion
The problem and its importance
• Problem
  – Analyze the air transportation network in the U.S.
    • The network is driven by profits and politics
    • The goal is to better understand the network structure, not to maximize utility
• Importance
  – Economy: transport of goods and services
  – Air traffic flow: convenience
  – Health studies: propagation of diseases
Missing pieces
• A substantial body of research on the network focuses on utility optimization.
• Commercial enterprises: OAG and Innovata
• But … there is a lack of research analyzing the network features studied in class.
Related works – Air transportation network analysis
• WAN – World-wide Airport Network
• ANI – Airport Network of India
• ANC – Airport Network of China
Related works – Summary
Features of air transportation networks
• Small world network (compared with random graphs)
  – Small average shortest path
  – High average clustering coefficient
  – Degree mixing differs
• Scale-free power law degree distribution

Metric                   WAN          ANI             ANC
Avg. shortest path       4.4          4               2.067
Avg. clustering coef.    0.62         0.6574          0.733
Degree mixing            Assortative  Disassortative  Disassortative
Power law exponent       1.0          2.2 +/- 0.1     1.65
Methodology
• Data Set
• Network Generation
• Network Analysis
Methodology – Data Set
Legend
• OAI : Office of Airline Information
• RITA : Research and Innovative Technology Administration
• BTS : Bureau of Transportation Statistics
[Diagram: T100 -> OAI / RITA -> BTS database -> My data]
Methodology – Data Set
Domestic Air Traffic Hubs [1]
Methodology – Data Set
• Domestic scheduled flights
  – Passengers, cargo, and mail
  – Military excluded
• Market Data vs. Segment Data
  – Market : Used
    • Counts a passenger once for the same flight number
  – Segment : Not used
    • Counts a passenger once per leg, so possibly more than once
• Month specific : July 2011
Methodology – Data Set
• Relevant information
  – Number of passengers
  – Amount of cargo : freight and mail
  – Origin city
  – Destination city
PASSENGERS,FREIGHT,MAIL,ORIGIN_CITY_NAME,DEST_CITY_NAME,DEST_CITY_NUM,DEST_STATE_ABR,DEST_STATE_FIPS,DEST_STATE_NM,DEST_WAC,YEAR,QUARTER,MONTH,DISTANCE_GROUP,CLASS
59,700,17,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,F
19,200,2,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,L
24,0,0,"Akhiok, AK","Kodiak, AK",1017,AK,2,Alaska,1,2011,3,7,1,F
2,0,0,"Akiachak, AK","Akiak, AK",1024,AK,2,Alaska,1,2011,3,7,1,F
176,47748,2250,"Adak Island, AK","Anchorage, AK",1029,AK,2,Alaska,1,2011,3,7,3,F
20,0,0,"Adak Island, AK","Anchorage, AK",1029,AK,2,Alaska,1,2011,3,7,3,L
105,28,320,"Akiachak, AK","Bethel, AK",1055,AK,2,Alaska,1,2011,3,7,1,F
Sample .csv from BTS
Methodology – Network Generation
• Network
  – 850 nodes: airports
  – 21405 entries
    • Weighted edges: sum of passengers and cargo
  – Directed and undirected network input files for Pajek [2] and GUESS [5].
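The edge-weighting rule above (weight = passengers plus cargo) can be sketched in a few lines. This is only an illustration in Python, not the C# tool used for the project. It assumes that cargo means FREIGHT + MAIL, that all rows sharing an origin and destination are summed into one edge, and that the undirected network merges both directions of a city pair. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

def build_weighted_edges(csv_text):
    """Sum PASSENGERS + FREIGHT + MAIL for each (origin, destination) pair."""
    weights = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        load = float(row["PASSENGERS"]) + float(row["FREIGHT"]) + float(row["MAIL"])
        weights[(row["ORIGIN_CITY_NAME"], row["DEST_CITY_NAME"])] += load
    return dict(weights)

def to_undirected(directed):
    """Merge A->B and B->A into one undirected edge (assumed merge rule)."""
    merged = defaultdict(float)
    for (src, dst), weight in directed.items():
        merged[tuple(sorted((src, dst)))] += weight
    return dict(merged)

# A few columns from the BTS sample rows, enough for the weighting rule.
sample = '''PASSENGERS,FREIGHT,MAIL,ORIGIN_CITY_NAME,DEST_CITY_NAME
59,700,17,"Akhiok, AK","Kodiak, AK"
19,200,2,"Akhiok, AK","Kodiak, AK"
2,0,0,"Akiachak, AK","Akiak, AK"
'''
directed_edges = build_weighted_edges(sample)
undirected_edges = to_undirected(directed_edges)
```

With the sample rows, the two Akhiok to Kodiak records collapse into a single edge whose weight is the sum of both rows.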
Methodology – Network Generation
[Diagram: .CSV -> ParseCSV (Microsoft.Jet.OLEDB.4.0 provider) -> Data Table -> GenerateNwk (LINQ) -> PajekDirected.net, PajekUndirected.net, GUESSDirected.gdf, GUESSUndirected.gdf]
Network Generation Tool written in C# using LINQ (Language Integrated Query)
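As a rough idea of what the generated Pajek input might look like, the sketch below writes a weighted edge list in the Pajek .net layout (a *Vertices section followed by *Arcs for directed or *Edges for undirected links). It is written in Python for brevity and is not the C# tool itself. The function name to_pajek and the city labels are hypothetical.

```python
def to_pajek(edges, directed=True):
    """Return Pajek .net text for a {(source, target): weight} mapping."""
    nodes = sorted({name for pair in edges for name in pair})
    # Pajek vertex ids are 1-based integers.
    index = {name: i for i, name in enumerate(nodes, start=1)}
    lines = ["*Vertices %d" % len(nodes)]
    lines += ['%d "%s"' % (i, name) for name, i in index.items()]
    lines.append("*Arcs" if directed else "*Edges")
    for (src, dst), weight in sorted(edges.items()):
        lines.append("%d %d %g" % (index[src], index[dst], weight))
    return "\n".join(lines) + "\n"

# Hypothetical two-edge network.
net_text = to_pajek({("A", "B"): 997.0, ("B", "C"): 2.0})
```

Saving net_text to a file such as PajekDirected.net would give a file with three vertices and two weighted arcs.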
Methodology – Network Generation
The U.S. Air Transportation Network drawn in Pajek
Methodology – Network Analysis
• Metrics
  – Degree distributions and correlations
    • Top 10 most connected cities
    • Top 10 most central cities
  – Small world network?
    • Shortest path length
    • Clustering coefficient
    • Compare against WAN, ANI, and ANC
  – Cumulative degree distribution and the power law
  – Resilience
  – Assortativity : Rich-club?
  – Random graph
  – Z-Score TBD?
Methodology – Network Analysis
– Degree distributions and correlations
  • Directed network
  • Pajek:
      In degree : Net -> Partitions -> Degree -> Input
      Out degree : Net -> Partitions -> Degree -> Output
      Both : Net -> Partitions -> Degree -> All
– Shortest path length
  • Directed network
  • Pajek:
      Net -> Paths between 2 vertices -> Diameter
– Clustering coefficient
  • Directed network
  • Pajek:
      Net -> Paths between 2 vertices -> Diameter
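For readers without Pajek, the two small-world measures listed above can be approximated with a short script. The sketch below is a simplification: it treats the network as undirected and unweighted, assumes no self-loops, averages path lengths only over reachable pairs, and counts nodes of degree below 2 as clustering 0. All function names are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count links among the neighbours of u, each pair once.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Hypothetical toy network: a triangle A-B-C with a spoke C-D.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
toy_path = average_shortest_path(toy)
toy_clustering = average_clustering(toy)
```

On the real data, these two numbers are the ones to set against the WAN, ANI, and ANC values.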
Methodology – Network Analysis
– Cumulative degree distribution and the power law
  • Directed network
  Step 1 in Pajek:
    – Create a partition of all degrees
    – Export the partition to a tab-delimited file: Tools -> Export to Tab Delimited File -> Current Partition
  Step 2 in MATLAB [6]:
    – Generating a power law integer distribution
        X = GetInput.m : reads the partition from the tab-delimited file (X => X.name, X.label, X.degree)
    – Calculating the cumulative distribution with cumulativecounts.m [4]
        [xlincumulative,ylincumulative] = cumulativecounts(X.degree)
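The cumulative-distribution step can also be sketched outside MATLAB. The Python below is not the course's cumulativecounts.m, whose exact behavior is not shown here. It assumes the cumulative count at degree k is the number of nodes with degree at least k, and it estimates the power-law slope with a plain least-squares fit on log-log axes, which is a rough method. If the cumulative distribution falls off as k^-(a-1), the degree distribution exponent is a = 1 - slope.

```python
import math
from collections import Counter

def cumulative_counts(degrees):
    """Return (ks, counts), where counts[i] is the number of nodes with degree >= ks[i]."""
    hist = Counter(degrees)
    ks = sorted(hist)
    counts, running = [], 0
    for k in reversed(ks):
        running += hist[k]
        counts.append(running)
    counts.reverse()
    return ks, counts

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); requires x > 0 and y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Hypothetical degree sequence, standing in for the exported partition.
ks, counts = cumulative_counts([1, 1, 1, 2, 2, 4])
slope = loglog_slope(ks, counts)
```

Nodes with degree 0 would have to be dropped before the fit, since the logarithm is undefined there.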
Methodology – Network Analysis
– Resilience
  What % of nodes must be removed to reduce the size of the giant component by half?
  • Consider:
    – Random attack
    – Targeted attack : remove nodes with the highest degree and betweenness centrality measures
  • Undirected network with 850 nodes
  • GUESS toolbars: resiliencedegree.py and resiliencebetweenness.py, downloaded from cTools [4]
  • Compare against a random network (random and targeted attacks)
      GUESS : makeSimpleRandom(numberOfNodes, numberOfEdges)
      => numberOfNodes = 850, numberOfEdges = 21405
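The resilience question above can be illustrated with a small stand-alone simulation. This is not the GUESS resiliencedegree.py script, only a sketch. It assumes nodes are ranked once by their initial degree (no recalculation after each removal), and it stops as soon as the giant component is at most half its original size. Betweenness-based attacks are omitted, and all names are hypothetical.

```python
import random

def giant_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def fraction_removed_to_halve(adj, order):
    """Fraction of nodes removed, in the given order, before the giant component halves."""
    target = giant_component_size(adj, ()) / 2.0
    removed = set()
    for count, node in enumerate(order, start=1):
        removed.add(node)
        if giant_component_size(adj, removed) <= target:
            return count / len(adj)
    return 1.0

def degree_attack_order(adj):
    """Targeted attack: highest initial degree first."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)

def random_attack_order(adj, seed=0):
    """Random attack: nodes in shuffled order (seeded for repeatability)."""
    order = list(adj)
    random.Random(seed).shuffle(order)
    return order

# Hypothetical hub-and-spoke network: one hub connected to five leaves.
star = {"hub": {"a", "b", "c", "d", "e"}}
for leaf in "abcde":
    star[leaf] = {"hub"}
targeted = fraction_removed_to_halve(star, degree_attack_order(star))
at_random = fraction_removed_to_halve(star, random_attack_order(star))
```

In this toy network, removing the hub alone is enough for a targeted attack, while a random attack usually needs more removals.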
Methodology – Network Analysis
– Assortativity : Rich-club?
  • Draw conclusions from graphical analysis in GUESS
– Random graph
  • Difficulty in constructing a realistic random network that models the real network [3].
– Z-Score?
  • To be determined.
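The slides plan to judge degree mixing graphically in GUESS. As a possible numeric cross-check, which is not part of the stated method, the degree assortativity coefficient can be computed as the Pearson correlation between the degrees at the two ends of each edge. The sketch below assumes a simple undirected network with no duplicate edges. The function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of end-point degrees over all undirected edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected edge contributes both (u, v) and (v, u).
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")

# Hypothetical hub-and-spoke network: high-degree hub linked only to leaves.
star_edges = [("hub", leaf) for leaf in "abcde"]
r_star = degree_assortativity(star_edges)
```

A negative coefficient indicates that high-degree nodes tend to connect to low-degree nodes, and a positive one indicates the opposite.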
Methodology – Network Analysis
• Expectations/Predictions
  – Larger degree nodes are more central (betweenness). Consider LAX, SFO, HOU, JFK, etc.
  – Small world as compared to WAN, ANI, and ANC
  – Scale-free power law distribution
  – Disassortative degree mixing
Conclusion
The United States air transportation network analysis
• The problem and its importance
• Missing Pieces
• Related works – WAN, ANI, ANC
• Methodology
  – Data set : BTS (Bureau of Transportation Statistics)
  – Network Generation : Directed and undirected network input files
  – Network Analysis :
    • Degree distribution
    • Small world network as compared to WAN, ANI, and ANC
    • Cumulative degree distribution and power law
    • Resilience
    • Assortativity
    • Z-score – TBD?
References for this presentation
1. T-100 reporting guide, RITA, http://www.rita.dot.gov/, www.transtats.bts.gov, http://www.bts.gov/programs/airline_information/.
2. Pajek, program for large network analysis, http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
3. Albert-Laszlo Barabasi and Reka Albert, “Emergence of Scaling in Random Networks”, Department of Physics, University of Notre Dame, October, 1999.
4. CTools, https://ctools.umich.edu/portal.
5. GUESS, graph exploration system, http://graphexploration.cond.org/.
6. Matlab, The language of technical computing, http://www.mathworks.com/products/matlab/index.html