L.A.S.I.
Feasibility Presentation
Presented by: CS410 Red Group
November 12, 2012
Linguistic Analysis for Subject Identification
Outline
•Team Red Staff Chart
•Introduction
•Societal Problem
•Case Study
•Proposed Solution
•Major Component Diagram
•Algorithm
•The Competition
•Risk
•Conclusion
Team Red Staff Chart
Scott Minter: Project Co-Leader, Software Specialist
Brittany Johnson: Project Co-Leader, Documentation Specialist
Dustin Patrick: Algorithm Specialist, Expert Liaison
Richard Owens: Documentation Specialist, Communication Specialist
Aluan Haddad: Algorithm Specialist, Software Specialist
Erik Rogers: Marketing Specialist, GUI Developer
What is a theme?
A specific and distinctive quality, characteristic, or concern.1
1“Theme” Merriam Webster
What are you looking for when you are identifying a theme?
5 W’s & 1 H
•Who
•What
•When
•Where
•Why
•How
Bill’s stove was broken. He had been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.
Who: Bill
What: He travelled to some place
When: Yesterday
Where: The store
Why: To buy a stove because his broke
How: By driving
The Theme from the 5 W’s & 1 H
Bill drove to the store yesterday to buy a new stove because his broke.
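The six answers from the Bill example can be held in a small record. This is only an illustrative sketch: the `FiveWOneH` class and its `is_complete` check are hypothetical names introduced here, and the rule that all six answers should be present is an assumption, not something the presentation specifies.

```python
from dataclasses import dataclass, fields


@dataclass
class FiveWOneH:
    """Hypothetical container for the six answers behind a theme."""
    who: str
    what: str
    when: str
    where: str
    why: str
    how: str

    def is_complete(self) -> bool:
        # Assumed rule: every one of the six answers must be non-empty.
        return all(getattr(self, f.name).strip() for f in fields(self))


# Answers taken from the Bill example.
bill = FiveWOneH(
    who="Bill",
    what="He travelled to some place",
    when="Yesterday",
    where="The store",
    why="To buy a stove because his broke",
    how="By driving",
)
print(bill.is_complete())
```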
Why are themes important?
•Comprehension
•Summarization
•Assists in communication between people
Societal Problem
It is difficult for people to identify a common theme over a large set of
documents in a timely, consistent, and objective manner.
How long does it take?
•Finding a theme over multiple documents is a time-consuming process.
•The average reading speed of an adult is 250 words per minute.2
2Thomas "What Is the Average Reading Speed and the Best Rate of Reading?"
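To put the 250 words per minute figure in perspective, here is a rough estimate of reading time for a document set. The workload (100 documents of 5,000 words each) is a hypothetical example, not a number from the presentation, and the function name is illustrative.

```python
WORDS_PER_MINUTE = 250  # average adult reading speed cited on this slide


def reading_time_hours(num_documents: int, words_per_document: int,
                       words_per_minute: int = WORDS_PER_MINUTE) -> float:
    """Return the hours needed to read every document once."""
    total_words = num_documents * words_per_document
    return total_words / words_per_minute / 60


# Hypothetical workload: 100 documents of 5,000 words each.
hours = reading_time_hours(100, 5_000)
print(f"{hours:.1f} hours")  # 500,000 words / 250 wpm = 2,000 minutes
```

This counts a single read-through only. Assessing and interrelating the documents would add to it.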
Consistency and Objectivity
•The criteria for evaluation may vary from person to person.
•Large quantities of documents must be mentally digested, assessed, and interrelated.
Dr. Patrick Hester
“My research interests include multi-objective decision making under uncertainty, probabilistic and non-probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.”3
- Dr. Hester
Ph.D. from Vanderbilt University, 2007
Major: Risk and Reliability Engineering and Management
3Patrick Hester Website
•Dr. Hester is a systems analyst and researcher. He must:
▫Conduct extensive research
▫Quickly become familiar with client systems
▫Formulate concise, objective assessments
•LASI will help with all of this
Assessment Improvement Design (A.I.D.)
•A preliminary problem statement is identified from the documents
•The problem statement is then used to find Critical Operational Issues (COIs)
•COIs are used to find Measures of Effectiveness (MOEs)
•MOEs are used to find Measures of Performance (MOPs)
Current Method
[Flowchart]
Customer Contact
Situational Awareness Meeting
Will NCSOSE be needed?
no: Client Goes Elsewhere
yes: Document Gathering Process
Document Analysis
Problem Statement Presentation
Is the customer satisfied? (yes / no)
yes: Continue on to the rest of the A.I.D. Process
LASI: Linguistic Analysis for Subject Identification
[Figure labels: LASI, THEMES]
Our Proposed Solution
•LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents. It is our goal with LASI to:
▫accurately find themes
▫be system efficient
▫provide consistent results
What do we mean by “linguistic analysis”?
The contextual study of written works and how the words combine to form an overall
meaning.
Linguistic analysis involves
Syntactic
•Logical grammar
•Statistical data
•Alphabetical frequencies
•Word counts
•Parts of speech
•Word dependencies
Semantic
•Relating syntactic structures to language-independent meanings
•Extracting meaning and conceptual arguments
•Summarization
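Two of the syntactic statistics listed above, word counts and alphabetical frequencies, can be sketched with the Python standard library. This is a minimal illustration only: the function names are hypothetical, the sample sentence is adapted from the Bill example, and nothing here is LASI's actual implementation.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count case-insensitive word occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def letter_frequencies(text: str) -> Counter:
    """Count occurrences of each alphabetic character, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


sample = "Bill drove to the store to buy a new stove."
words = word_counts(sample)
letters = letter_frequencies(sample)
print(words.most_common(3))
print(letters.most_common(3))
```

Parts of speech and word dependencies need a tagger or parser and are not covered by this sketch.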
The Wills and Will Nots of LASI
What LASI Will Do What LASI Will Not Do
• Analyze multiple documents to find common themes
• Provide statistical data to help a user make a decision
• Provide a concise synopsis
• Provide a single theme
Who Would This Appeal To?
•Researchers
•Consultants
•Academics
•Students
Benefits To The Customer
•Time saving
•Objective output
•Consistent output
•Cost saving solution
How does LASI fit into our Case Study?
Before LASI
[Flowchart]
Customer Contact
Situational Awareness Meeting
Will NCSOSE be needed?
no: Client Goes Elsewhere
yes: Document Gathering Process
Document Analysis
Problem Statement Presentation
Is the customer satisfied? (yes / no)
yes: Continue on to the rest of the A.I.D. Process
After LASI
[Flowchart]
Customer Contact
Situational Awareness Meeting
Will NCSOSE be needed?
no: Client Goes Elsewhere
yes: Document Gathering Process
LASI-Aided Document Analysis
Problem Statement Presentation
Is the customer satisfied? (yes / no)
yes: Continue on to the rest of the A.I.D. Process
Major Functional Components
Software
User Interface:
- Multi-Level Views
- Weighted Phrase List
- Detailed Breakdown
- Step by Step Justification
Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain
Hardware
High End Notebook PC (~$1500 USD):
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid State Storage
Linguistic Analysis Algorithm
Primary Analysis: Word Count and Syntactic Assessment
•Traverse Document in Word-Wise Manner
•Identify Corresponding Parts of Speech
•Determine Frequency by Grammatical Role
Secondary Analysis: Associative Identification
•Identify Potential Noun Phrases
•Bind Adjectives to Nouns
•Bind Pronouns to Nouns, Updating Frequency
Tertiary Analysis: Semantic Relationship Assessment
•Identify Potential Synonyms
•Assess Potential Subject-Object-Verb Relationships
Output List of Weighted Themes
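The three analysis stages can be pictured as a chain of functions that ends in a weighted list. The toy sketch below is not LASI's algorithm. The `POS` lexicon and `SYNONYMS` table are hypothetical hand-made stand-ins for a real tagger and thesaurus, pronouns are bound to the nearest preceding noun as a simplifying assumption, and noun-phrase detection, adjective binding, and subject-object-verb assessment are left out.

```python
from collections import Counter

# Hypothetical hand-made lexicon; a real system would use a part-of-speech tagger.
POS = {
    "bill": "noun", "stove": "noun", "store": "noun", "oven": "noun",
    "he": "pronoun", "it": "pronoun",
    "bought": "verb", "broke": "verb",
}
# Hypothetical synonym table mapping a word to its canonical form.
SYNONYMS = {"oven": "stove"}


def primary(text):
    """Primary analysis: walk the text word by word and tag parts of speech."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [(w, POS.get(w, "other")) for w in words if w]


def secondary(tagged):
    """Secondary analysis: credit each pronoun to the most recent noun."""
    counts = Counter()
    last_noun = None
    for word, pos in tagged:
        if pos == "noun":
            counts[word] += 1
            last_noun = word
        elif pos == "pronoun" and last_noun is not None:
            counts[last_noun] += 1  # assumed nearest-preceding-noun binding
    return counts


def tertiary(counts):
    """Tertiary analysis: merge synonyms, then normalize counts to weights."""
    merged = Counter()
    for word, n in counts.items():
        merged[SYNONYMS.get(word, word)] += n
    total = sum(merged.values())
    if total == 0:
        return []
    return [(word, n / total) for word, n in merged.most_common()]


def weighted_themes(text):
    """Run the three stages in order and return (theme, weight) pairs."""
    return tertiary(secondary(primary(text)))


text = "The stove broke. It was old. Bill bought a new oven at the store."
for word, weight in weighted_themes(text):
    print(f"{word}: {weight:.2f}")
```

Nearest-noun pronoun binding is easy to fool: in "Bill drove to the store. He bought an oven." it would credit "He" to "store".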
The Competition
•WordStat
•Stanford CoreNLP
•ReadMe
•AutoMap
Risk Matrix
Customer Risks
C1 -- Product Interest
C2 -- Maintenance
C3 -- Trust
Technical Risks
T1 -- System Limitations
T2 -- Scanned Text Recognition
T3 -- Jargon Recognition
T4 -- Illegal Character Handling
Customer Risks
C1. Product Interest: Probability 2, Impact 4
Mitigation: LASI offers unique functionality and user friendliness.
C2. Maintenance: Probability 3, Impact 2
Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.
C3. Trust: Probability 3, Impact 3
Mitigation: LASI will provide a step by step breakdown of output analysis and algorithm reasoning.
Technical Risks
T1. System Limitations: Probability 4, Impact 2
Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
T2. Scanned Text Recognition: Probability 4, Impact 3
Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.
Technical Risks
T3. Jargon Recognition: Probability 3, Impact 2
Mitigation: LASI will have domain specific dictionaries and feature intuitive contextual inference.
T4. Illegal Character Handling: Probability 4, Impact 2
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
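One common way to compare entries in a probability/impact matrix is to multiply the two ratings into a single exposure score. The slides give only the two ratings, so the multiplication rule below is an assumption added for illustration, not the team's stated method. The rating pairs themselves are copied from the risk slides.

```python
# (probability, impact) pairs copied from the risk slides.
RISKS = {
    "C1 Product Interest": (2, 4),
    "C2 Maintenance": (3, 2),
    "C3 Trust": (3, 3),
    "T1 System Limitations": (4, 2),
    "T2 Scanned Text Recognition": (4, 3),
    "T3 Jargon Recognition": (3, 2),
    "T4 Illegal Character Handling": (4, 2),
}


def exposure(probability: int, impact: int) -> int:
    """Assumed scoring rule: exposure = probability x impact."""
    return probability * impact


# Rank risks from highest to lowest assumed exposure.
ranked = sorted(RISKS.items(), key=lambda item: exposure(*item[1]), reverse=True)
for name, (p, i) in ranked:
    print(f"{name}: probability={p}, impact={i}, exposure={exposure(p, i)}")
```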
Conclusion
•LASI is feasible.
•LASI is a decision support tool, not a decision making tool.
•Implications of success affect a wide area of study and professions.
•In order for LASI to succeed, the output needs to be immediately usable and the interface user-friendly.
References
1. "Theme." Def. 1b. Merriam Webster. N.p., n.d. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.
2. Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-Average-Reading-Speed-and-the-Best-Rate-of-Reading.html>.
3. "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
4. Osinski, Stanislaw, and Dawid Weiss. Carrot 2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
5. "WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
6. "ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
7. "AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
8. "AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
9. "CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.