UIR Alert Agent: An alert system for identifying suspicious web-site browsing leading to unintended information revelation (UIR)
Rohini K. Srihari
State University of New York at Buffalo
May 6, 2003
FAA Workshop May 2003 2
Tracking suspicious web browsing
Should we let him see it? Should we monitor his next moves?
What information has the user obtained so far?
What was inferred from the visited pages?
What additional information can they infer with this new web page?
Did we intend to reveal this information?
Should we be alerted if this is unintended?
User has visited these pages:
    http://www.faa.gov/apa/safer_skies/fsstats.htm
    http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps
User is requesting:
    http://www.awp.faa.gov/fsdo/docs/spm_info/what/fy2000/sdplan00.doc
Measuring Unintended Information Revelation (UIR) for visited and requested pages will answer these questions.
Outline
Unintended Information Revelation
    Problem Definition
    Solutions with Existing Technology
    Proposed Solution
UIR System Architecture
    Extracting Concepts and Associations
    Creating Concept Chain Graphs (CCG)
    Mining and visualization of CCGs
Evaluation Methodology
Preliminary Results
Summary
User’s previous request
Fact Sheet: Aviation Accident Statistics
http://www.faa.gov/apa/safer_skies/fsstats.htm
Important Concepts
    safer skies, fatal accidents, runway incursions, hijack, etc.
Interesting Information
    Number and percentage of fatal accidents in 1996
    Runway incursions
    Ice/snow
    In-flight fire
User’s current request
Fuel tank ignition events
http://www.faa.gov/certification/aircraft/sfar88/01hstry2.pps
Important Concepts
    fatalities, fuel tank ignition, hull loss, electrostatics, etc.
Interesting Information
    Identifies causes for fuel tank ignition accidents
    Small bomb
    Faulty wiring
    Pump faults
Synthesized Information
In-flight fire can cause accidents.
Fuel-tank ignitions are caused by small bombs, faulty pumps/wiring, etc.
Domain Knowledge: In-flight fires and fuel-tank ignitions are aviation hazards.
Inference: faulty wiring can cause in-flight fires.
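The synthesis on this slide can be illustrated with a small sketch. It is not the system's actual inference method: it assumes each page is reduced to a set of hypothetical (cause, effect) pairs, and it models the domain knowledge as a single assumed bridging link from fuel-tank ignition to in-flight fire. It then reports the links that only appear when both pages are combined.

```python
def closure(edges):
    """Return every (cause, effect) pair obtainable by chaining the given pairs."""
    reach = set(edges)
    while True:
        new = {(a, d) for a, b in reach for c, d in reach if b == c and a != d} - reach
        if not new:
            return reach
        reach |= new


def synthesized(pages, domain):
    """Pairs derivable from all pages together but from no single page alone."""
    separately = set()
    for facts in pages:
        separately |= closure(facts | domain)
    together = closure(set().union(*pages) | domain)
    return together - separately


# Hypothetical facts read off the two example pages.
PAGE_STATS = {("in-flight fire", "accident")}
PAGE_IGNITION = {
    ("faulty wiring", "fuel-tank ignition"),
    ("small bomb", "fuel-tank ignition"),
    ("faulty pump", "fuel-tank ignition"),
}
# Assumption: domain knowledge modeled as one bridging link between the two hazards.
DOMAIN = {("fuel-tank ignition", "in-flight fire")}

new_links = synthesized([PAGE_STATS, PAGE_IGNITION], DOMAIN)
print(sorted(new_links))
```

In this toy setting, the link from faulty wiring to in-flight fire already follows from the second page plus domain knowledge, while the links from each cause to accidents appear only once both pages have been seen.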
UIR Alert Agent
UIR is a phenomenon where the information synthesized from multiple documents exceeds the sum of the information provided by the individual documents.
The agent generates alerts for unintended information revelation based on the user’s browsing history and requested pages.
[Figure: users A, B, and C browsing numbered documents; UIR Alert Agent; User Browsing History; Alerts Log; "Alert Generated on User B".]
Architecture of UIR System
Input: User surfing web pages on sites of interest to national security
[Figure: architecture diagram. Components: Document Collection (web pages); Information Extraction; Pre-existing Domain Ontology/Lexicon (e.g. Aviation Ontology); Concept Chain Graphs (CCG); Document subset; CCG instantiated for subset of interest, with example chains "Accident-hazard-fuel tank -…" and "ice/snow-hazard-fatalities-…"; UIR Alert Module; User alerts / logs.]
Output: web pages that reveal too much information; a human monitor can visualize paths in the CCG
Proposed Solution
Step 1: Determine significant concepts and associations in target domain (offline, semi-automatic)
    use of existing ontologies such as DAML ontology on aviation
    use of information extraction to automatically extract concepts and associations from representative document collection
Step 2: Create Concept Chain Graph (CCG)
    consolidates underlying domain knowledge, specific documents
    weights concepts and associations using both domain weights, individual document weights
Step 3: Visualization and text mining operations on CCG
Step 4: UIR Alert agent invoked
    tracking user surfing patterns
    what-if scenarios
Evaluation Methodology
TREC Query: find pages that discuss ways of causing air disasters
TREC Narrative: Pages that are relevant to causing air disasters will mention aircraft maintenance operations or passenger screening procedures
Typical IR evaluation
    IR system (includes query expansion)
    Ranked web pages
    Evaluate precision and recall of IR system
UIR Evaluation
    UIR System (CCG)
    Relevant web pages
    Evaluate ability to generate narrative
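For the typical IR side of this comparison, precision and recall can be computed as in the sketch below. The page identifiers and the optional cutoff `k` are illustrative assumptions, not values from the talk.

```python
def precision_recall(ranked, relevant, k=None):
    """Precision and recall of a ranked result list against a set of relevant pages.

    ranked:   list of page ids, best first
    relevant: set of page ids judged relevant
    k:        optional cutoff; only the top-k results are scored
    """
    retrieved = ranked[:k] if k is not None else ranked
    hits = sum(1 for page in retrieved if page in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments for one query.
ranked_pages = ["p1", "p2", "p3", "p4"]
relevant_pages = {"p1", "p3", "p9"}
print(precision_recall(ranked_pages, relevant_pages))
print(precision_recall(ranked_pages, relevant_pages, k=2))
```

The UIR evaluation on the slide asks a different question (whether the narrative can be generated), which this measure does not capture.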
Step 1: Extracting Concepts and Associations
Extracting Concepts: Use InfoXtract engine from Cymfony
    Named Entity Tagger (NE) identifies common entities like Date, Time, Location, State, Country, Organization, Person.
    InfoXtract also identifies significant noun groups, verb groups, e.g. fuel tanker, runway de-icing
Extracting Associations:
    Concept co-occurrence in documents
    Concept proximity in sentences/paragraphs
    Advanced techniques using machine learning
Example text:
    … The designation for one end of the runway should be used on the sign only when the taxiway intersects the beginning of that runway. Taxiways that intersect the runway at intermediate points must have the designations for both runway ends. ...
Association Learning output:
    (runway, taxiway): 0.85
Output implies: the system has 85% confidence that runway and taxiway are associated by some relation.
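The simplest association signal listed above, sentence-level co-occurrence, can be sketched as follows. This is an assumed stand-in for illustration only: it is not InfoXtract, it uses naive prefix matching to catch plurals, and it is not expected to reproduce the 0.85 confidence shown on the slide.

```python
import re


def mentions(sentence, concept):
    """Naive match: any word in the sentence starts with the concept (catches plurals)."""
    return any(tok.startswith(concept) for tok in re.findall(r"[a-z]+", sentence.lower()))


def sentence_cooccurrence(text, concept_a, concept_b):
    """Share of sentences mentioning either concept that mention both (Jaccard overlap)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    both = either = 0
    for sentence in sentences:
        has_a = mentions(sentence, concept_a)
        has_b = mentions(sentence, concept_b)
        both += has_a and has_b
        either += has_a or has_b
    return both / either if either else 0.0


# Hypothetical three-sentence document.
sample = "The taxiway intersects the runway. The runway was icy. The tower is closed."
print(sentence_cooccurrence(sample, "runway", "taxiway"))
```

A learned model, as mentioned under the advanced techniques, would presumably combine several such signals rather than rely on one ratio.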
Sample Information Extraction output
DATE: October 23, 1992 NO. 92-03
TO: AIRPORT CERTIFICATION PROGRAM INSPECTORS
TOPIC: Effects Of Type II Deicing Fluid On Runway Friction
The FAA's Technical Center in conjunction with the Port Authority of New York and New Jersey conducted tests to determine the effects of Type II aircraft deicing fluids on runway friction. The tests were conducted this past July and August at La Guardia and John F. Kennedy International Airports on grooved asphaltic pavement. Since the tests were conducted in the summer no attempt was made to simulate ice or snow on the pavement surface. (See future test programs.) Two specially instrumented B-727's and two Saab friction devices were used to measure the runway friction.
The purpose of this effort was to test the premise that Type II deicing fluid deposited on a runway poses a hazard to aircraft landing on the runway. At the present time it is unknown to what extent Type II actually falls off a departing aircraft and what portion of it is deposited on the runway. (See future test programs.)
Concepts and Named Entities are marked up during information extraction
Step 2: Create Concept Chain Graph
Create concept chain graph based on underlying domain knowledge (concepts, associations).
    Weight concept nodes based on frequency, type, user-defined importance
    Weight associations based on proximity, importance of concepts they link, uniqueness
Project/map documents viewed by user onto CCG
    A document is represented as a probabilistic sub-graph in the CCG
    Proximity and other metrics are used to assign weights on the concepts (nodes) and associations (edges) discovered in a document
[Figure: Aviation Ontology graph with document-specific concepts and associations, with weights. Weights shown: 1, 0.124, 0.2324, 0.54, 0.013, 0.101, 0.123, 0.239, 0.088, 0.1065, 0.0, 1.]
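One way to picture the weighted graph and the projection of a document onto it is the sketch below. The class name, the weights, and in particular the rule of multiplying domain weight by document weight are assumptions made for illustration; the slides do not specify how the two kinds of weight are combined. The sketch also drops concepts that are missing from the domain graph, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptChainGraph:
    node_weight: dict = field(default_factory=dict)  # concept -> weight
    edge_weight: dict = field(default_factory=dict)  # frozenset({a, b}) -> weight

    def add_concept(self, name, weight):
        self.node_weight[name] = weight

    def add_association(self, a, b, weight):
        self.edge_weight[frozenset((a, b))] = weight

    def project(self, doc_concepts, doc_associations):
        """Return the sub-graph for one document.

        Assumed combination rule: domain weight times document weight.
        """
        sub = ConceptChainGraph()
        for concept, w in doc_concepts.items():
            if concept in self.node_weight:
                sub.node_weight[concept] = self.node_weight[concept] * w
        for (a, b), w in doc_associations.items():
            key = frozenset((a, b))
            if key in self.edge_weight and a in sub.node_weight and b in sub.node_weight:
                sub.edge_weight[key] = self.edge_weight[key] * w
        return sub


# Hypothetical domain graph and document weights.
domain = ConceptChainGraph()
for name, weight in [("hazard", 1.0), ("fuel tank", 0.8), ("wiring", 0.5)]:
    domain.add_concept(name, weight)
domain.add_association("hazard", "fuel tank", 0.9)

doc_graph = domain.project(
    {"hazard": 0.5, "fuel tank": 0.5, "small bomb": 0.7},
    {("fuel tank", "hazard"): 0.5},
)
print(doc_graph.node_weight, doc_graph.edge_weight)
```

Using an unordered pair as the edge key means the direction in which a document mentions two concepts does not matter.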
Step 2: Instantiated Concept Chain Graph
[Figure: instantiated CCG. Legend: Associations in Document; Domain Knowledge. Documents: Fuel tank ignition events; Accident Statistics. Nodes: AVIATION, AIRPLANE, ACCIDENT, HAZARD, Statistics, Ice/snow, Windshear, In-flight fire, Air_traffic__control_tower, Runway Incursions, Fuel Tank, Fuel Tank Ignition events, hull losses, Fatalities, Lightning, Wiring, Pumps, Small Bomb.]
Step 3: Mining the CCG
Goals
    detecting information-rich concept chains, e.g. air disaster - onboard explosion - fuel tanker
    quantifying information revealed
    issue alerts when too much information is revealed
    “what-if” scenarios to enable dissemination of benign information
Graph traversal
    generate CCG representing documents viewed by user
    start with explicit query/search terms as seed concepts; could be multiple terms
    strategies:
        try to find best paths/chains that connect “seed” concepts; could generate multiple chains
        try to find best subgraph
    various graph traversal algorithms are suitable
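The "best chain between seed concepts" strategy could be realized in several ways. The sketch below assumes one of them: association weights are treated as independent confidences in (0, 1], a chain's strength is the product of its weights, and Dijkstra's algorithm over costs of -log(weight) finds the strongest chain. The weights are invented for the example.

```python
import heapq
import math


def best_chain(edges, source, target):
    """Strongest chain between two concepts; strength = product of edge weights.

    edges: dict mapping (a, b) -> weight in (0, 1], treated as undirected.
    Returns (chain, strength), or (None, 0.0) if the concepts are not connected.
    """
    graph = {}
    for (a, b), w in edges.items():
        cost = -math.log(w)  # maximizing a product == minimizing summed -log
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if target not in dist:
        return None, 0.0
    chain = [target]
    while chain[-1] != source:
        chain.append(prev[chain[-1]])
    chain.reverse()
    return chain, math.exp(-dist[target])


# Hypothetical association weights.
weights = {
    ("air disaster", "onboard explosion"): 0.8,
    ("onboard explosion", "fuel tanker"): 0.7,
    ("air disaster", "fuel tanker"): 0.3,
}
chain, strength = best_chain(weights, "air disaster", "fuel tanker")
print(chain, round(strength, 2))
```

Here the two-step chain through onboard explosion (0.8 × 0.7) beats the weak direct link (0.3), so an indirect chain can be the more informative one.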
Graph Traversal Techniques
Minimum cover techniques
    INSTANCE: Graph G = (V, E)
    SOLUTION: A vertex cover for G, i.e., a subset V' ⊆ V such that, for each edge (u,v) ∈ E, at least one of u and v belongs to V'.
    MEASURE: Cardinality of the vertex cover, i.e., |V'|.
Flow networks
    given a network (G,s,t,c) where G = (V,E) is a directed graph with n vertices and m edges, s and t are two vertices (source and sink), and c: E -> R+ is a function that defines capacities of edges
    find maximum flow from s to t that satisfies capacity constraints
Energy minimization (used in image processing)
    active contours (e.g. snakes) used for tracking various shapes, including road detection
    dynamic programming solutions available
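The first two formulations above have standard textbook algorithms, sketched here for reference. The talk does not say which algorithms the system uses; the greedy 2-approximation for vertex cover and Edmonds-Karp for maximum flow are chosen only because they are short and well known, and the example graphs are made up.

```python
from collections import deque


def approx_vertex_cover(edges):
    """Greedy 2-approximation: take both endpoints of every still-uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def max_flow(capacity, s, t):
    """Edmonds-Karp: augment along shortest residual paths found by BFS.

    capacity: dict mapping directed edge (u, v) -> non-negative capacity.
    """
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse edge for undoing flow
    if s == t or s not in residual or t not in residual:
        return 0

    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical examples.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
cover = approx_vertex_cover(path_edges)
caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(sorted(cover), max_flow(caps, "s", "t"))
```

Minimum vertex cover is NP-hard in general, which is why the sketch settles for an approximation; maximum flow, by contrast, is solved exactly.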
Step 4: Track user surfing with UIR module
[Figure: the instantiated Concept Chain Graph from Step 2 (same concept nodes), with the previously viewed page(s) and the requested page marked.]
UIR module determines that these two documents reveal a new association between wiring and accidents.
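A rough sketch of how such a module might track one user is given below. Everything in it is hypothetical: the class and method names, the idea of a list of watched concept pairs, and the rule that an alert fires when a watched pair becomes connected only through the combination of browsing history and the requested page. For brevity, domain-knowledge links (such as membership in HAZARD) are folded into each page's associations.

```python
from collections import defaultdict


def connected(edges, a, b):
    """True if a chain of associations links a and b (undirected depth-first search)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        if node == b:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False


class UIRAlertAgent:
    def __init__(self, watched_pairs):
        self.watched_pairs = list(watched_pairs)  # concept pairs not meant to be linked
        self.history = defaultdict(set)           # user -> associations seen so far
        self.alerts_log = []

    def request(self, user, page_edges):
        """Record a page request and return the watched pairs it newly connects."""
        page_edges = set(page_edges)
        before = self.history[user]
        after = before | page_edges
        revealed = [
            pair for pair in self.watched_pairs
            if connected(after, *pair)
            and not connected(before, *pair)
            and not connected(page_edges, *pair)
        ]
        if revealed:
            self.alerts_log.append((user, revealed))
        self.history[user] = after
        return revealed


agent = UIRAlertAgent([("wiring", "accident")])
first = agent.request("B", [("in-flight fire", "accident"), ("in-flight fire", "hazard")])
second = agent.request("B", [("wiring", "fuel tank ignition"), ("fuel tank ignition", "hazard")])
print(first, second, agent.alerts_log)
```

Because the check runs before the history is updated, the same method could support the what-if scenarios mentioned earlier by evaluating a page without committing it.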
Preliminary Experiments
Summary
Benefits to FAA
    Automated monitoring of information acquired by users of the FAA website, and an alert mechanism for unintentionally revealed information
    Shortlist and identify documents and concepts seen by the user that reveal unintended information
    Domain map visualization tool facilitates concept- and association-based queries
Claims
    New, richer representation for information retrieval that combines keyword statistics (bag-of-words model) with NLP-based information extraction
    Solution is general to any domain; only domain map needs to be customized/retrained
    Experts can intervene, guide the process, if desired; tools provided