Copyright© 2003 Avaya Inc. All rights reserved
Avaya Interactive Dashboard (AID): An Interactive Tool for Mining Avaya Problem Ticket Database
Ziyang Wang
Department of Computer Science
New York University
Amit Bagga
Avaya Labs Research
Presenter: Amit Bagga
Outline of the presentation
• Motivations and goals
• Application overview
• Functionalities
• Architecture
• Algorithms
• Implementation features
• Demo
Motivations and Goals
• Motivations
– Current data mining techniques do not examine unstructured text in large databases.
– NSM and service engineers currently scan text fields manually to identify, track, and classify problems across customers, locations, and products.
• Goals
– Develop an interface that helps automate the text analysis done by NSM and service engineers.
– Provide advanced functionality to help them quickly and conveniently verify intuitions about problems.
Overview: what we develop
• Interactive Dashboard
– A tool that uses search-engine and data-mining techniques
– Find similar problems
– Identify sub-problems
– Trace similar problems for a given customer and product
Overview: data source
• Maestro database
– Huge amount of information
– Highly dynamic
• The database maintained by Patrick Tendick
– A subset of Maestro database
– 1 million records, 600 thousand tickets
– TOOS (Totally Out Of Service) cases: high severity
• Unstructured data as pure text
– Ticket description
– Resolution description.
• Structured data in relational database
– Case ID, timestamp, product, customer, location, etc.
Overview: algorithms and implementation
• Interactive Dashboard
– Programming languages: Java, Perl, C
– Service model: sockets, client/server model
– Database management: Oracle, JDBC
– Relevance metric: TF*IDF
– Clustering: hierarchical clustering
– Web interface: Perl, CGI
Functionalities: Major ones
• Search relevant tickets
– Helps find similar problems
– Relevance score: the similarity of the unstructured text data
– Search constraints: product name, customer code, time, and severity of tickets
– Top-level summary: ticket case ID, relevance score, ticket description
• Cluster relevant tickets
– Groups similar tickets into clusters
– Helps identify sub-problems
– Keyword expansion
– Adaptive online search
Functionalities: Supporting ones
• Categorize a set of tickets
– Categorized by product name, customer name, and location name
– Provides a high-level summary
– Discovers similar problems across the same or different customers and products
• Retrieve detailed ticket information
– Complete product/customer/location information, ticket resolution note, etc.
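The categorizing step above can be sketched as a simple count over structured fields. This is a minimal illustration, not the dashboard's actual code: the record layout, the field names (`product`, `customer`, `location`), and the sample values are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical ticket records; field names and values are illustrative only.
tickets = [
    {"case_id": 101, "product": "Paging System", "customer": "CUST-A", "location": "Site 1"},
    {"case_id": 102, "product": "Paging System", "customer": "CUST-B", "location": "Site 2"},
    {"case_id": 103, "product": "Voice Mail", "customer": "CUST-A", "location": "Site 1"},
]


def categorize(tickets, field):
    """Count tickets per distinct value of one structured field."""
    return Counter(ticket[field] for ticket in tickets)


# High-level summary: one count table per categorizing field.
summary = {
    field: categorize(tickets, field)
    for field in ("product", "customer", "location")
}
```

Reading `summary["product"]` shows at a glance which products dominate a result set, which is the kind of high-level view the slide describes.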
Functionalities (cont.)
• Accessibility
– Web portal
[Diagram: the web portal gives access to four views: Relevant Tickets, Categorized Set, Clustered Relevant Tickets, and Ticket Information]
An example
PLAT Csr cld to report trouble on Paging System, TOOS. No overhead music, no MOH. DPO tech SEV 4 dispatch to diagnose.
PLAT Csr cld to report trouble on Paging System, TOOS. No overhead music, no MOH. DPO tech SEV 4 dispatch to diagnose.
PLAT Csr cld to report paging system TOOS, no overhead music at all. System has been reset by csr (power confirmed). DPO tech SEV 4 to diagnose.
PLAT Csr cld to report trouble with overhead music, TOOS. Paging appears OK, but they cannot get music output. DPO tech SEV 4 dispatch to check volume levels.
……
paging plat dpo tech music csr overhead cust check report
tech dpo plat csr access assist
speakers overhead diagnose x15255
paging plat dpo tech sev power csr diagnose
overhead carrier
……
Architecture: Main Frame
• Main frame: application server infrastructure
– 3-tier server architecture
– Integrated central server: service provider and server logic organizer
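[Diagram: Web Interface, Integrated Central Server, and Database; the web interface connects to the central server via CGI, and the central server connects to the database via JDBC]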
Architecture: Integrated Central Server
[Diagram: Integrated Central Server, containing the Server Socket Module, Query Engine, Database module, Text Analysis Module, and Response Module. Labels outside the server: Incoming requests, Output results, Database.]
Architecture: Text Analysis Module
[Diagram: Text Analysis Module, split into a Data module and a Functional module. Labels shown: Database module, Database, Stop words, Text Filter, Structured data fields, Unstructured data, Dictionary, Document Frequency, TFIDF Module, Relevance Evaluator, Clustering, Categorizing, Keywords/Sample, Top Relevant Tickets, Output Manager, Response Module.]
Algorithm: TFIDF
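• TFIDF: a similarity metric for text data
– Text document view: a bag of words.
– Document representation: a vector $D_i = (w_{i1}, w_{i2}, \ldots)$, with term weights
$$w_{ik} = (TF)_{ik} \cdot (IDF)_k = (TF)_{ik} \cdot \log\frac{N}{DF_k}$$
where $N$ is the number of documents and $DF_k$ is the number of documents containing term $k$.
– The similarity of two documents is the normalized inner product of the two vectors (the cosine of the angle between them):
$$\mathrm{Similarity}_{ij} = \frac{D_i \cdot D_j}{\|D_i\|\,\|D_j\|} = \frac{\sum_k w_{ik}\, w_{jk}}{\sqrt{\sum_k w_{ik}^2}\,\sqrt{\sum_k w_{jk}^2}}$$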
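The weighting and cosine formulas above can be illustrated with a short sketch. It assumes the textbook form of TF*IDF (raw term counts times log(N / DF)); the function names and the toy corpus are hypothetical and are not taken from the dashboard's code.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map a bag of words to sparse weights w_ik = TF_ik * log(N / DF_k)."""
    tf = Counter(tokens)
    return {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in tf.items()
        if doc_freq.get(term, 0) > 0  # skip terms never seen in the corpus
    }


def cosine_similarity(vec_a, vec_b):
    """Normalized inner product of two sparse vectors."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical, already tokenized and filtered).
docs = [
    ["paging", "system", "toos", "no", "overhead", "music"],
    ["paging", "system", "toos", "overhead", "music", "reset"],
    ["phone", "display", "blank"],
]
# Document frequency: number of documents in which each term appears.
doc_freq = Counter(term for doc in docs for term in set(doc))
vectors = [tfidf_vector(doc, doc_freq, len(docs)) for doc in docs]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # two paging tickets
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no shared terms
```

The two paging tickets share most of their terms and score well above zero, while the unrelated ticket shares none and scores exactly zero.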
Algorithm: TFIDF (cont.)
• Issues
– Document frequency
• Global vs. local
• Vocabulary: 10,708 terms after text filtering
• Solution: offline scan of database
– Term frequency
• Online scan of ticket description
• Text filtering
– Computing the similarity of ticket descriptions
• Searching relevant tickets: 1-to-N similarity
• Clustering: N-to-N similarity
• The TFIDF module: two different versions.
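One way to picture the split described above is an offline pass that builds the document-frequency table once, and an online pass that computes term frequencies per request. The sketch below is only an illustration under assumptions: the stop-word list, the tokenizer, all function names, and the sample tickets are hypothetical, and the real module's two versions may be organized differently.

```python
import math
from collections import Counter

STOP_WORDS = {"to", "on", "no", "the", "a"}  # hypothetical stop-word list


def text_filter(description):
    """Lowercase, strip simple punctuation, and drop stop words."""
    words = (w.strip(".,()").lower() for w in description.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_document_frequency(descriptions):
    """Offline pass: count in how many tickets each term appears."""
    df = Counter()
    for text in descriptions:
        df.update(set(text_filter(text)))
    return df


def weigh(description, df, n_docs):
    """Online pass: term frequencies of one description, scaled by IDF."""
    tf = Counter(text_filter(description))
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] > 0}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, tickets, df, top_k=2):
    """1-to-N: rank every ticket against one query; returns (score, index)."""
    q = weigh(query, df, len(tickets))
    scored = [(cosine(q, weigh(t, df, len(tickets))), i) for i, t in enumerate(tickets)]
    return sorted(scored, reverse=True)[:top_k]


def similarity_matrix(tickets, df):
    """N-to-N: pairwise similarities, the input to clustering."""
    vecs = [weigh(t, df, len(tickets)) for t in tickets]
    return [[cosine(a, b) for b in vecs] for a in vecs]


# Hypothetical ticket descriptions in the style of the example slide.
tickets = [
    "Csr cld to report trouble on Paging System, TOOS. No overhead music.",
    "Csr cld to report paging system TOOS, no overhead music at all.",
    "Phone display is blank after power outage.",
]
df = build_document_frequency(tickets)  # done once, offline
hits = search("paging system no music", tickets, df)
matrix = similarity_matrix(tickets, df)
```

Search touches each ticket once per query, while clustering needs the full pairwise matrix, which is why the two cases have different cost profiles.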
Algorithm: hierarchical clustering
• Clustering
– A data mining technique to find data aggregates in multi-dimensional space.
– Data representation
• Each data item has many different attributes
• Each attribute is a dimension in vector space.
• Each data item is a vector whose elements are values of attributes.
Algorithm: hierarchical clustering (cont.)
• Hierarchical clustering
– Similarity metric for data vectors: TFIDF, Euclidean
– Hierarchical clustering
• Step-by-step bottom-up cluster merging
• Merging criterion: complete linkage
• Cost: N-squared performance
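The bottom-up merging with complete linkage described above can be sketched as follows. This is a deliberately naive illustration on a precomputed distance matrix, with a hypothetical function name and toy one-dimensional data; it rescans all cluster pairs at every merge, so it is slower than an optimized implementation such as the C kernel mentioned later in the deck.

```python
def complete_linkage(distance, n_clusters):
    """Bottom-up merging on a precomputed N x N distance matrix."""
    clusters = [[i] for i in range(len(distance))]  # start: one item per cluster

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(distance[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


# Toy data (hypothetical): points on a line, Euclidean distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
distance = [[abs(p - q) for q in points] for p in points]
clusters = complete_linkage(distance, n_clusters=2)
```

With a similarity score such as the TFIDF cosine, one common choice is to use `1 - similarity` as the distance; that conversion is an assumption here, not something the slides state.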
Implementation: features
• Abstract database SQL manager for parallel requests
– Mapping parallel requests to a single database connection:
• Loading the database driver and authenticating are done only once.
• Reduces the slow start of database connections.
• Issuing multiple JDBC SQL statements over one database connection schedules data transmission so that it looks like parallel retrieval.
– Stateful abstract database connection manager
• Unified error message processor
– Exception catching and re-throwing
– Benefits
• Formats error messages as HTML text
• Keeps the database connection status consistent
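The two ideas above, one shared connection serving parallel requests and a single place where errors are caught, re-thrown, and formatted, can be sketched together. This is an analogy rather than the original design: it uses Python's `sqlite3` in place of Oracle/JDBC, and the class name, the `QueryError` type, and the `tickets` table are all hypothetical.

```python
import html
import sqlite3
import threading


class QueryError(Exception):
    """Uniform error raised for any database failure."""


class SharedConnectionManager:
    """Serves requests from many threads over one database connection."""

    def __init__(self, dsn=":memory:"):
        # Connect once; sqlite3 stands in for the Oracle/JDBC connection.
        self._conn = sqlite3.connect(dsn, check_same_thread=False)
        self._lock = threading.Lock()

    def query(self, sql, params=()):
        with self._lock:  # one statement at a time on the shared connection
            try:
                rows = self._conn.execute(sql, params).fetchall()
                self._conn.commit()
                return rows
            except sqlite3.Error as exc:
                # Leave the connection in a clean state, then re-throw
                # as the single error type callers need to handle.
                self._conn.rollback()
                raise QueryError(str(exc)) from exc


def error_as_html(exc):
    """Format an error message as an HTML fragment for the web interface."""
    return '<p class="error">' + html.escape(str(exc)) + "</p>"


manager = SharedConnectionManager()
manager.query("CREATE TABLE tickets (case_id INTEGER, product TEXT)")
manager.query("INSERT INTO tickets VALUES (?, ?)", (1, "Paging System"))

# Four "parallel" requests share the one connection.
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(manager.query("SELECT product FROM tickets"))
    )
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

try:
    manager.query("SELECT * FROM no_such_table")
except QueryError as exc:
    error_page = error_as_html(exc)
```

The lock in this sketch fully serializes statements, which is a simplification; the slide suggests the original interleaved several JDBC statements on one connection.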
Implementation: features (cont.)
• Multiple system-dependent process interaction through the Java runtime
– Kernel clustering module is written in C
• High performance for numerical computation
• Unix/Linux OS required
– Communication between processes
– I/O redirection
• Extensibility
– Search space
– Localized index engine
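The process interaction with I/O redirection listed above follows a common pattern: the server launches an external numerical program, feeds it data on standard input, and reads the result from standard output. The sketch below shows that pattern with Python's `subprocess` module rather than the Java runtime, and it uses a tiny child Python process as a stand-in for the compiled C clustering kernel; `run_kernel` and `CHILD_SOURCE` are hypothetical names.

```python
import subprocess
import sys

# Stand-in for the compiled clustering kernel: a child process that reads
# numbers from stdin and writes their sum to stdout.
CHILD_SOURCE = "import sys; print(sum(float(x) for x in sys.stdin.read().split()))"


def run_kernel(values):
    """Send values to the child over stdin and parse its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=" ".join(str(v) for v in values),  # redirected to child's stdin
        capture_output=True,                     # redirect stdout and stderr
        text=True,
        check=True,                              # raise if the child fails
    )
    return float(completed.stdout.strip())


total = run_kernel([1.5, 2.5, 4.0])
```

Keeping the numerical kernel in a separate process lets it be written in a faster language, at the price of tying the deployment to an operating system where that binary runs.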
Demo
• Web entry
– http://amit-pc.woods.avayalabs.com/xui/xui-web/xui.shtm
Future directions
• Search precision
– Refine algorithms for relevance computation
– Refine algorithms for clustering
– Text filtering
• Search performance
– Database organization
– Java primitive functions
• Automatic classification of root causes of problems
– Machine learning approach
• Scalability
• Adaptive search
– Users’ feedback