Tutorial Datasys 2020 - iaria.org
Text Mining as a Tool in Repressive and Preventive Investigation Process | Michael Spranger(C) 24.09.2020 University of Applied Sciences Mittweida
hs-mittweida.de
Michael Spranger
Text Mining as a Tool in Repressive and Preventive Investigation Process
Tutorial Datasys 2020
Criminalistic Cycle
• Suspicion
• Analyzing data
• Forming hypotheses
• Specify program
• Retrieve missing data
Current: mainly manual
Forensic (textual) information management
Specialized algorithms (Computer Science):
• Text mining
• Information extraction
• Knowledge representation
Real World Case
Prosecutor General's Office Hamburg
Investigation for support of a terrorist group
• 5,093 messages / 381 chats
• 27,578 messages / 640 chats
• 9,735 messages / 39 chats
• 29,823 messages / 351 chats
• 13,665 messages / 41 chats
• 323 messages / 293 chats
• 7,986 messages / 794 chats
• 132,640 messages / 1,432 chats
weeks → minutes
Total: 226,843 messages in 3,971 chats to analyze
Is text mining well researched in this domain?
• well-formed English texts
• closed, limited domains
retrieval/filtering of relevant documents
obtaining (forensic) information
visualizing (forensic) information
Challenges
• rich in slang
• little context
• socio-economically shaped
• heterogeneous
• hidden semantics
• language-economically eroded
• non-English forensic texts
• interdisciplinary domains
TEXT MINING
KNOWLEDGE
Hypothesis: Adding a knowledge model (investigative knowledge, legal norms) to text mining processes leads to a quality in the interdisciplinary and cross-lingual domain of forensic texts that is comparable to the quality achieved for well-formed texts in closed domains.
Forensic Knowledge Representation as Central Element
• investigative knowledge/experience
• legal norms
MoNA, SoNA, SemanTA
KNOWLEDGE
Forensic Topic Map
Forensics: Analysis of mobile communication
Universal Approach
• In extremely erroneous, slang-rich and low-context texts, such as SMS, forensic information can only be detected with high reliability by incorporating investigator knowledge.
• An error margin can be determined.
KNOWLEDGE
Visualization, Classification
reduction of complexity
Methods
State-of-the-art methods:
• Semi-supervised approaches
• probabilistic language models (unigram, char-n-gram) + rules
• performance → poor
• Difference analysis → mainly individual spellings
• phonetic algorithms (e.g., Kölner Phonetik, Double Metaphone)
low precision
low sensitivity
Hypothesis: Search space reduction and conservative word matching result in high sensitivity with acceptable precision.
Positive side effect: conservation of the context → increase of comprehensibility
Method for Detection of Conversations
Assumption:
$B_t = B_0 e^{-\lambda t}$
Best fit (regression):
$\{\lambda_{opt}, B_{opt}\} = \arg\min_{\lambda, B_0} \sum_t \left( B_t^{real} - B_t^{fit} \right)^2$
Determining cut-off:
$F(B) = \int_0^{t_c} B_0 e^{-\lambda t} \, dt = 1 - \varepsilon$
[Figures: frequency of response times of all messages; frequency of response times of relevant messages]
Clustering
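The fitting and cut-off steps of this slide can be sketched roughly as follows. This is a minimal illustration and not the tutorial's implementation. It rests on several assumptions: the exponential model is fitted by ordinary least squares on the logarithm of the frequencies (a common simplification of the regression), the cut-off is taken on the fitted curve normalized to total mass 1, and clustering simply starts a new conversation whenever the gap between two messages exceeds the cut-off. The names `fit_exponential`, `cutoff_time`, `split_conversations` and the parameter `eps` are hypothetical.

```python
import math


def fit_exponential(times, counts):
    """Fit counts ~ b0 * exp(-lam * t) via least squares on log(counts).

    Assumption: log-linear regression stands in for the slide's regression.
    Bins with a zero count are skipped because their logarithm is undefined.
    """
    pts = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    lam = -slope
    b0 = math.exp(mean_y - slope * mean_t)
    return lam, b0


def cutoff_time(lam, eps):
    """Time at which the normalized fitted curve has accumulated 1 - eps.

    For the normalized curve the accumulated mass is 1 - exp(-lam * t),
    so the cut-off is -ln(eps) / lam.
    """
    return -math.log(eps) / lam


def split_conversations(timestamps, cutoff):
    """Assumed clustering rule: a gap larger than cutoff starts a new conversation."""
    conversations = []
    for ts in sorted(timestamps):
        if conversations and ts - conversations[-1][-1] <= cutoff:
            conversations[-1].append(ts)
        else:
            conversations.append([ts])
    return conversations


# Usage with synthetic response-time frequencies (hypothetical data).
times = list(range(10))
counts = [100 * math.exp(-0.5 * t) for t in times]
lam, b0 = fit_exponential(times, counts)
t_c = cutoff_time(lam, eps=0.05)
groups = split_conversations([0, 1, 2, 20, 21, 50], cutoff=t_c)
```

A smaller `eps` moves the cut-off further out, so fewer and longer conversations are formed.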
Entire Process
KNOWLEDGE
• semantic relations
• pattern (entities)
• Multi (cross)-linguality
Analysis platform for mobile communication (MoNA)
Reduction of the manual effort by >> 70%
Interactive analysis allows constant adaptation of the criminalistic hypothesis
Cross-lingual through parallel knowledge model
Prevention: Analysis of social networks
Is crime predictable using social networks and scientific methods?
Rioting in the wake of demonstrations, sporting events or as a result of political dissatisfaction often becomes apparent in advance on social media.
Terrorists often recruit their future assassins via social networks.
Perpetrators of rampage attacks often signal their readiness in social networks.
Rioters often announce themselves in social networks
Hypothesis: By monitoring social networks, damage events in the real world can be predicted.
Process model for hazard prediction
• topic analysis: topics in $\Theta_{risk}$
• sentiment analysis: sentiment score above a threshold
• profile selection
• long-term development forecast: trend
• associated profiles, opinion leaders, multipliers
• extraction of location and time, geocoding
• risk assessment
• visualization: short-term risk
KNOWLEDGE
Prediction of events through sentiment analysis
[Figure: sentiment scores of the Facebook page of Pegida e.V. over time (x-axis: date), with real events marked and a 95% prediction interval]
Cooling phases often mark real events
Analysis Platform for Social Networks (SoNA)
Opinion leader detection
Evaluation of the risk potential
Cross-lingual through parallel knowledge model
Challenges
• Huge number of potential (hazardous) profiles
• Closed/secret groups and bots
Hypothesis: By transferring strategies of the human immune system, threats in social networks can be effectively identified.
Strategies:
• pattern recognition
• adaptation
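Agent-based analysis of social networks
Scoring function:
$s(p_{id}) = \lambda \frac{count(I_s, p_{id})}{\sum_{p_j \in P} count(I_s, p_j)} + (1 - \lambda) \frac{1}{|I_{nd}|} \sum_{i=1}^{|I_{nd}|} w_i I_{nd}^{i}(p_{id})$
Activation function:
$I_A(p_{id}) = \begin{cases} 1, & \text{if } s(p_{id}) > \theta \\ 0, & \text{otherwise} \end{cases}$
[Figure: Actors of an artificial immune system for social networks]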
An Artificial Immune System
Which profiles should be contacted?
Which profile provides the most valuable information?
Activities in an artificial immune system for social networks (process view)
Opinion Leader
What exactly does that mean?
“Opinion leadership is the degree to which an individual is able to influence informally other individuals’ attitudes or overt behavior in a desired way with relative frequency.” [Rogers, 1962, p. 331]
What makes an influencer?
Katz and Lazarsfeld 1957 defined the following features:
(1) personification of certain values,
(2) competence,
(3) strategic position in the social network (topology).
What does "influence" mean?
Spreading information quickly
Meaning:
• depends on a strategic position in the network
• strategic position is mainly determined by mentions (quotations)
• own activity is the most important factor
Are social bots influencers?
Approaches:
• topology-based
Methods:
• network centrality measures, PageRank, LeaderRank
Writing something of importance
Meaning:
• depends on the topic
Change over time!
Approaches:
• topology-based
• content-based
Methods:
• network centrality measures, PageRank, LeaderRank, sentiment analysis, topic mining
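How does LeaderRank work?
• Users are nodes; directed edges connect followers with leaders
• Random walk on this graph, starting with $s_i(0) = 1$ for every user node and $s_g(0) = 0$ for the additional ground node:
$s_i(t+1) = \sum_{j=1}^{N+1} \frac{a_{ji}}{k_j^{out}} s_j(t)$
where $a_{ji} = 1$ if node $j$ points to node $i$ (0 otherwise) and $k_j^{out}$ is the out-degree of node $j$
• Finds nodes that spread information further and faster.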
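The random walk can be sketched as below. This is a minimal sketch following the commonly cited LeaderRank formulation, not necessarily the exact variant used in the tutorial. It assumes a ground node that is linked in both directions to every user, an iteration that stops once scores change by less than a tolerance, and a final step in which the ground node's score is shared equally among the users. The function name `leader_rank` and the parameters `tol` and `max_iter` are hypothetical.

```python
def leader_rank(n, edges, tol=1e-10, max_iter=10_000):
    """Compute LeaderRank-style scores for users 0..n-1.

    edges: iterable of (follower, leader) pairs, i.e. follower -> leader.
    Assumption: a ground node with index n is connected bidirectionally
    to every user, which makes the graph strongly connected.
    """
    ground = n
    out = [set() for _ in range(n + 1)]
    for follower, leader in edges:
        out[follower].add(leader)
    for user in range(n):
        out[user].add(ground)
        out[ground].add(user)

    # Every user starts with one unit of score, the ground node with none.
    scores = [1.0] * n + [0.0]
    for _ in range(max_iter):
        nxt = [0.0] * (n + 1)
        for j, targets in enumerate(out):
            share = scores[j] / len(targets)  # s_j(t) / k_j^out
            for i in targets:
                nxt[i] += share
        delta = max(abs(a - b) for a, b in zip(scores, nxt))
        scores = nxt
        if delta < tol:
            break

    # Distribute the ground node's score evenly among the users.
    return [scores[i] + scores[ground] / n for i in range(n)]


# Usage: a small star in which users 1..3 follow user 0.
star_scores = leader_rank(4, [(1, 0), (2, 0), (3, 0)])
```

Because each node passes on its full score in every step, the total score stays equal to the number of users.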
Problems with LeaderRank
In networks with star topology:
• the network owner is highly central
• high centrality of a fraction of the nodes leads to a strongly distorted LeaderRank distribution
• competence is not considered
• peripheral nodes are not adequately represented
Facebook network: “Die Linke” over 5 months
• LeaderRank is not meaningful!
Hypothesis: Using the normalized LeaderRank skewness, star topologies can be detected in network graphs.
Detection of a Star-Shaped Topology
How can the degree of approximation to the star topology be quantified?
Normalized LeaderRank skewness $\hat{\nu}$:
The normalized LeaderRank skewness, with values in [0, 1], shows how strongly a network is distorted towards the star topology.
• $\hat{\nu} = 0$ for regular graphs
• $\hat{\nu} = 1$ for star-shaped graphs
$\nu_{LR} = \frac{1}{N} \sum_{i=1}^{N} z(LR_i)^3, \quad \hat{\nu} = \frac{\nu_{LR} - \nu_{min}}{\nu_{max} - \nu_{min}}$
Normalized graph entropy $H$:
The normalized graph entropy quantifies the uncertainty of a specific path of information distribution.
• $H = 1$ for regular graphs
• $H = 0$ for star-shaped graphs
$H = -\sum_{i=1}^{N} \frac{\deg(v_i)}{\sum_{j=1}^{N} \deg(v_j)} \log_2 \frac{\deg(v_i)}{\sum_{j=1}^{N} \deg(v_j)}$
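Both measures can be illustrated with a short sketch. It is not the tutorial's code and makes explicit assumptions: the skewness is the population skewness of z-scored values, its minimum is taken as 0 (all scores equal) and its maximum as (N - 2) / sqrt(N - 1), which is the value reached when one node dominates and all others are equal. The entropy function computes the plain degree-distribution entropy as written in the formula. Any further normalization is not specified on the slide and is left out. All function names are hypothetical.

```python
import math


def skewness(values):
    """Population skewness: mean of the cubed z-scores (0.0 if all values are equal)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / n


def normalized_skewness(values):
    """Skewness rescaled to [0, 1].

    Assumptions: minimum 0 (regular case), maximum (n - 2) / sqrt(n - 1)
    (one dominant value, all others equal). Negative skewness is clipped to 0.
    """
    n = len(values)
    if n < 3:
        return 0.0
    max_skew = (n - 2) / math.sqrt(n - 1)
    return min(1.0, max(0.0, skewness(values) / max_skew))


def degree_entropy(degrees):
    """Shannon entropy (base 2) of the normalized degree distribution."""
    total = sum(degrees)
    return -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)


# Usage with hypothetical score and degree vectors.
star_like = normalized_skewness([10.0, 1.0, 1.0, 1.0, 1.0])
regular = normalized_skewness([2.0, 2.0, 2.0, 2.0])
h_regular = degree_entropy([2, 2, 2, 2])
h_star = degree_entropy([3, 1, 1, 1])
```

In this sketch the star-like vector yields a normalized skewness of 1 and the uniform vector yields 0, matching the two limiting cases named above.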
Comparison of both measures
Experiment:
• 6 networks with star topology and a fixed number of nodes (N = 16, 32, 64, 128, 256, 512)
• mutation over 100 generations towards a regular graph
• in each generation, edges are randomly added or removed between each pair of nodes
[Figures: normalized graph entropy; normalized LeaderRank skewness]
Test on real networks
[Figures: normalized graph entropy and normalized LeaderRank skewness in comparison for the “Die Linke” Facebook network (almost completely star-shaped) and the “Epinions” network (almost regular)]
The normalized LeaderRank skewness, as a function of network regularity, enables a stable detection of star-shaped topologies.
Hypothesis: The irregularity of a star graph can be compensated by penalizing high activity combined with low mentioning.
Way out: CompetenceRank
Variant of the LeaderRank adapted to competence:
$CR(L_i) = \frac{LR(L_i)}{1 + \frac{k_i^{out}}{k_{total}^{out}} LR_{total}}, \quad LR_{total} = N$
In regular graphs, $k_i^{out} = k_j^{out} = D \;\; \forall (v_i, v_j)$, so
$CR(L_i) = \frac{LR(L_i)}{1 + \frac{D N}{D N}} = \frac{1}{2} LR(L_i)$
Assumption: $LR(L_i) = CR(L_i)$ in regular graphs, hence
$CR(L_i) = \frac{2 \, LR(L_i)}{1 + \frac{k_i^{out}}{k_{total}^{out}} N}$
Cumulative discrepancy, a function of network regularity:
$\sum_{i=1}^{N} \left( CR(L_i) - LR(L_i) \right)$
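A small sketch of the CompetenceRank correction follows, as reconstructed from this slide: each LeaderRank score is doubled and divided by one plus the node's out-degree relative to the average out-degree. It is an illustration under assumptions, not the author's implementation. The function names are hypothetical, the out-degree is used as the activity measure, and a network without any outgoing edges is returned unchanged because the ratio is undefined there.

```python
def competence_rank(leader_rank_scores, out_degrees):
    """Apply CR_i = 2 * LR_i / (1 + N * k_i_out / k_total_out) to each node.

    leader_rank_scores: LeaderRank score per node (computed elsewhere).
    out_degrees: number of outgoing edges per node, used as activity.
    Assumption: if no node has outgoing edges, scores are returned unchanged.
    """
    n = len(leader_rank_scores)
    total_out = sum(out_degrees)
    if total_out == 0:
        return list(leader_rank_scores)
    return [
        2.0 * lr / (1.0 + n * k / total_out)
        for lr, k in zip(leader_rank_scores, out_degrees)
    ]


def cumulative_discrepancy(cr_scores, lr_scores):
    """Sum of the differences CR_i - LR_i over all nodes."""
    return sum(cr - lr for cr, lr in zip(cr_scores, lr_scores))


# Usage with hypothetical scores: a regular case and a star-like case.
cr_regular = competence_rank([1.0, 1.0, 1.0], [2, 2, 2])
lr_star = [3.0, 0.5, 0.5]
cr_star = competence_rank(lr_star, [6, 0, 0])
gap = cumulative_discrepancy(cr_star, lr_star)
```

In the regular case the scores stay unchanged, while in the star-like case the very active central node is scaled down and the inactive peripheral nodes are scaled up.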
An Artificial Immune System
Which profiles should be contacted?
Profiles with a high CompetenceRank!
Activities in an artificial immune system for social networks (process view)
Conclusion
• Investigator knowledge helps to improve text mining in forensics
• MoNA is an analysis platform for mobile communication that incorporates this paradigm.
  • Algorithm for conversation detection
  • Rating algorithm with search space reduction and conservative word matching
• With SoNA, an analysis platform for social networks was created incorporating this paradigm.
  • Process for predicting potential hazardous events
  • Model of an artificial immune system for social networks
  • CompetenceRank as an improved measure of opinion leadership
Future Work
• Joint Semantic Analysis: joint analysis of media and text for mobile devices
• Incorporation of context data (CPLSA/NetPLSA)
• Time-related analysis of messages → prediction of cyclically recurring topics
• Evolution of topics
• Multilingual text analysis with a minimal amount of training data
• approach through adaptation and expansion of Human Behaviour-based Optimization
Questions?
Feel free to contact me: