110524 IAPA Social Networks
8/2/2019 110524 IAPA Social Networks
1/16
Social Network Analysis for Intelligence and Analytics
Rohan Baxter
Corporate Analytics
Office of the Chief Knowledge Officer
Australian Taxation Office
24 May 2011
-
Social Network Mining 2
Overview
What is a Social Network?
Analytics vs Intelligence/Visualisation
A scalable network-finding algorithm
Four case studies (abstracted, de-identified)
A method for representing networks
Future work/lessons learnt
-
What is a Social Network?
Network consists of nodes and links
Real social network: nodes are people and links mean "is a friend of"
Node type examples: forms, companies, individuals, various asset classes
Link type examples: share an attribute, distributes to, is owner of
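As a minimal sketch of the node and link types listed above, typed nodes and typed links could be held as two small record types. The class names, field names, and sample entities here are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str  # e.g. "individual", "company", "form" (illustrative values)


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: str  # e.g. "is_owner_of", "distributes_to", "shares_attribute"


# Hypothetical sample data: an individual owns a company that distributes to another individual.
nodes = [Node("A", "individual"), Node("B", "company"), Node("C", "individual")]
links = [Link("A", "B", "is_owner_of"), Link("B", "C", "distributes_to")]


def neighbours(node_id, links):
    """Return the ids of nodes directly linked to node_id, ignoring link direction."""
    found = set()
    for link in links:
        if link.source == node_id:
            found.add(link.target)
        elif link.target == node_id:
            found.add(link.source)
    return found
```

Keeping the type on both nodes and links lets later steps filter by type (for example, only ownership links) without changing the structure.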
-
Social Network Analysis Uses
Issues around network data collection
Data analysis
  Low volume, manual: specific targets, complex analysis (for Intelligence or Audit work)
  High volume, automated: risk assessment of large populations of networks (for Analytics for larger-scale treatments)
Visualisation (usually low volume, but see later)
-
Social Network Risk Assessment
-
SQL Implementation
Consolidation of internal implementations in a variety of systems and languages (e.g. Visual Basic, Python, SAS, R, Netmap)
SQL implementation is about 45 lines and follows the Union-Find algorithm
Demonstrated advantages: scalable, correct, concise
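The slides do not reproduce the SQL itself. As a rough illustration of the Union-Find idea it is said to follow, here is a minimal Python sketch that groups linked nodes into networks (connected components). The function names and the rule of keeping the smaller id as representative are assumptions made for this example, not details from the presentation.

```python
def find(parent, x):
    """Return the representative of x's component, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def find_networks(nodes, links):
    """Group nodes into networks, where each link joins the networks of its two ends.

    Assumes every link endpoint also appears in `nodes`.
    """
    parent = {n: n for n in nodes}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            # Assumed convention: the smaller id becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)
    networks = {}
    for n in nodes:
        networks.setdefault(find(parent, n), set()).add(n)
    return list(networks.values())
```

For example, `find_networks(["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")])` groups the five nodes into two networks.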
-
Algorithm has Fast Convergence
[Figure: Network Detection Algorithm Convergence. x-axis: Number of Iterations (1 to 9); y-axis: Number of Components to Merge, log scale (1 to 10,000,000)]
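One way a set-based, iterative version of network detection can be pictured is minimum-label propagation: on each pass every node takes the smallest label seen across its links, and the passes stop when nothing changes. The sketch below is an assumption about the general shape of such a loop, not the presentation's SQL, and it records the number of nodes relabelled per pass as a stand-in for the "components to merge" measure on the chart.

```python
def propagate_labels(nodes, links):
    """Iteratively give each node the smallest label reachable over one link per pass.

    Returns the final labels and the number of nodes relabelled in each pass.
    Assumes every link endpoint also appears in `nodes`.
    """
    label = {n: n for n in nodes}
    changes_per_iteration = []
    while True:
        new_label = dict(label)
        for a, b in links:
            # Use labels from the previous pass, as a set-based update would.
            low = min(label[a], label[b])
            new_label[a] = min(new_label[a], low)
            new_label[b] = min(new_label[b], low)
        changed = sum(1 for n in nodes if new_label[n] != label[n])
        if changed == 0:
            break
        changes_per_iteration.append(changed)
        label = new_label
    return label, changes_per_iteration
```

The number of passes in this sketch grows with the longest chain of links in a network, so a long chain is its worst case.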
-
Case Studies
Purpose | No. of Nodes | No. of Links
Filter low-risk entities from a list of high-risk entities | 2,000 | 8,000
Use a starting list of identified high-risk entities to find related unknown high-risk entities | 300-2,000 | 2,000-16,000
-
Case Studies
Purpose | No. of Nodes | No. of Links
Find non-agent tax returns that appear to have a common guiding mind | 1.2m (networks of interest: 6,000) | 20m
Find high-risk entity networks involving company structures | 2m | 18m
-
A High Risk Social Network
-
Network Representation in a Database Field
(coy -> ind)
(ptr -> ind)
(trt -> ind unk)
(coy -> (coy -> ind))
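The arrow notation above can be produced from a nested structure with a short recursive function. This is a sketch only: it treats `coy`, `ptr`, `trt` and `ind` as opaque entity-type codes (the slides do not define them), and the nested-tuple input format and the `render` name are assumptions made for illustration.

```python
from collections import Counter


def render(structure):
    """Render a nested (head, tail) pair in the slide's arrow notation.

    A plain string is a leaf; a 2-tuple is (head_type, nested_structure).
    """
    if isinstance(structure, str):
        return structure
    head, tail = structure
    return f"({head} -> {render(tail)})"


# Hypothetical structures for three networks, two of which share a shape.
shapes = [("coy", "ind"), ("coy", ("coy", "ind")), ("coy", "ind")]

# Because each shape becomes one string, networks can be grouped by shape.
shape_counts = Counter(render(s) for s in shapes)
```

Holding the shape as a single text value means it can sit in one database field and be grouped or filtered like any other column.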
-
Network Representation in a Database Field
-
Power laws: No. of Networks vs Network Size
[Figure: x-axis: Network Size (Trusts-Beneficiaries), 0 to 100; y-axis: Number of Networks (log scale), 1 to 10,000]
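A size distribution like the one charted here can be tabulated by counting how many detected networks have each size. The sketch below assumes each network is available as a set of node ids; the function name and sample data are hypothetical.

```python
from collections import Counter


def size_distribution(networks):
    """Map each network size to the number of networks of that size."""
    return Counter(len(network) for network in networks)


# Hypothetical sample: two networks of size 2, one of size 3, one singleton.
sample_networks = [{1, 2}, {3, 4}, {5, 6, 7}, {8}]
distribution = size_distribution(sample_networks)
```

Plotting the counts on a logarithmic axis against size, as on the slide, makes a heavy tail of a few large networks easier to see.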
-
Risk Differentiation for Networks
-
Related Work
PWC Research Centre, San Jose, CA: used a network detection algorithm to assist with financial accounts audit by highlighting high-risk entries in a general ledger
Internal Revenue Service (IRS), US: built a database of company networks and a graphical tool to query the networks with known scheme structures
Detecting securities fraud based on the network of relationships between brokers
-
Lessons Learnt
Continued awareness of network-level risk assessments in the forward plan of risk assessment work
Using links has helped reduce the false positive rate for discovering non-compliant entities
Network discovery is intuitive and was reinvented at least 6 times across the organisation, but there are advantages in a corporate approach to get it scalable and correct