Detecting Blackhole and Volcano Patterns in Directed Networks
Transcript of Detecting Blackhole and Volcano Patterns in Directed Networks
![Page 1: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/1.jpg)
Detecting Blackhole and Volcano Patterns in Directed Networks
Zhongmou Li1, Hui Xiong1, Yanchi Liu1,2, Aoying Zhou31MSIS Department, Rutgers Business School - Newark and New Brunswick
Rutgers, the State University of New Jersey, Newark, NJ 07102, [email protected], [email protected]
2School of Economics and Management, University of Science and Technology Beijing, Beijing, [email protected]
3Software Engineering Institute, East China Normal University, Shanghai, [email protected]
AbstractโIn this paper, we formulate a novel problem forfinding blackhole and volcano patterns in a large directedgraph. Specifically, a blackhole pattern is a group which ismade of a set of nodes in a way such that there are only inlinksto this group from the rest nodes in the graph. In contrast,a volcano pattern is a group which only has outlinks to therest nodes in the graph. Both patterns can be observed in realworld. For instance, in a trading network, a blackhole patternmay represent a group of traders who are manipulating themarket. In the paper, we first prove that the blackhole miningproblem is a dual problem of finding volcanoes. Therefore,we focus on finding the blackhole patterns. Along this line,we design two pruning schemes to guide the blackhole findingprocess. In the first pruning scheme, we strategically prunethe search space based on a set of pattern-size-independentpruning rules and develop an iBlackhole algorithm. Thesecond pruning scheme follows a divide-and-conquer strategyto further exploit the pruning results from the first pruningscheme. Indeed, a target directed graphs can be divided intoseveral disconnected subgraphs by the first pruning scheme,and thus the blackhole finding can be conducted in eachdisconnected subgraph rather than in a large graph. Basedon these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally, experimental results on real-world datashow that the iBlackhole-DC algorithm can be several ordersof magnitude faster than the iBlackhole algorithm, which hasa huge computational advantage over a brute-force method.
Keywords-blackhole pattern; volcano pattern; fraud detec-tion; graph mining; network model
I. INTRODUCTION
Financial institutions and government agencies, such asU.S. Securities and Exchange Commission (SEC), are facingsome daunting challenges in the field of financial frauddetection. The sophistication of criminalsโ tactics makesdetecting and preventing fraud difficult, especially as thenumber of trading accounts and the volume of transac-tions grow dramatically. Indeed, the trading networks arevulnerable to these fast-growing accounts and the volumeof transactions. Particularly, criminals know fraud detectionsystems are not good at correlating user behavior acrossmultiple trading accounts. This weakness opens the doorfor cross-account collaborative fraud, which is difficult todiscover, track and resolve because the activities of the
fraudsters usually appear to be normal trading activities. Forinstance, consider a trading network with a large number ofnodes and directed edges, a trader or a group of traderscan perform trading only within several accounts for thepurpose of manipulating the market. This kind of illegaltrading activities is widely known as trading ring.
In this paper, we study a special type of trading-ringpatterns, called blackhole and volcano patterns. Given adirected graph, a blackhole pattern is a group which is madeof a set of nodes in a way such that there are only inlinks tothis group from the rest nodes in the graph. In contrast,a volcano pattern is a group which only has outlinks tothe rest nodes in the graph. To the best of our knowledge,this is the first time to have the concepts of blackholeand volcano patterns in the directed graphs. In fact, bothblackhole and volcano patterns can be observed in real-world trading networks. For example, a blackhole patterncan represent a group of traders who are manipulating themarket by performing transactions on a specific stock amongthemselves for a specific time period. In other words, theoverall shares of the target stock in their trading accountscan only increase during this time period, while these tradershave produced a large volume of transactions on this stock.After the stock price goes up to a certain degree, thesetraders start selling off their shares to the public. In thisstage, these trading accounts form a volcano pattern whichonly has outlinks to the rest public accounts.
However, the process for finding blackhole and volcanopatterns can be computationally prohibited, since this is acombinatorial problem in nature. To address this challenge,we first prove that the blackhole pattern mining problem is adual problem of finding volcano patterns. Therefore, we canfocus on finding the blackhole patterns. Along this line, wedesign two pruning schemes to guide the blackhole patternmining process. In the first pruning scheme, we identifya set of pattern-size-independent pruning rules by studyingthe structural graph properties of blackhole patterns. Thesepruning rules can be used for pruning the search space nomatter the size of the patterns is. Based on the first pruningscheme, we design an iBlackhole algorithm for finding
2010 IEEE International Conference on Data Mining
1550-4786/10 $26.00 ยฉ 2010 IEEE
DOI 10.1109/ICDM.2010.37
294
![Page 2: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/2.jpg)
blackhole patterns. In contrast, the second pruning schemefollows a divide-and-conquer strategy to further exploit thepruning results from the first pruning scheme. Specifically,because a target directed graph have been divided intoseveral disconnected subgraphs by the first pruning scheme,it becomes much more efficient to find blackhole patternsin each disconnected subgraphs rather than in a large graph.Based on these two pruning schemes, we develop an evenmore effective algorithm, named iBlackhole-DC, for miningblackhole patterns in directed graphs. Furthermore, we haveprovided the proof of the completeness and correctness ofboth iBlackhole and iBlackhole-DC algorithms.
Finally, experimental results on several real-world datasets are provided to show the pruning effect of two pruningschemes. As shown in experiments, the iBlackhole algorithmhas a huge computational advantage over the brute-forceapproach. Also, the iBlackhole-DC algorithm is severalorders of magnitude faster than the iBlackhole algorithm.Finally, we show the effectiveness of blackhole patterns forfinding some interesting stock movement patterns.
II. PRELIMINARIES
In this section, we introduce some basic concepts andnotations that will be used in this paper.
First, consider a directed graph ๐บ = (๐,๐ธ) [1], where ๐is the set of all nodes and ๐ธ is the set of all edges. Assumethat ๐บ has no self-loop and has no more than one edge fromone node to another. A directed edge ๐ in ๐บ is denoted as๐ = (๐ฅ, ๐ฆ), where ๐ฅ and ๐ฆ are nodes of ๐บ and an arc isdirected from ๐ฅ to ๐ฆ. Each edge ๐ has a positive weight,denoted as ๐๐, associated with this edge.
Definition 1 (in-weight / out-weight): Given a directedgraph ๐บ = (๐,๐ธ), ๐ต is a set of nodes and ๐ต โ ๐ .Let ๐ถ = ๐ โ ๐ต. The in-weight of ๐ต is defined as:๐๐๐(๐ต) =
โ๐=(๐ฅ,๐ฆ)โ๐ธ,๐ฅโ๐ถ,๐ฆโ๐ต ๐๐. Also, the definition of
the out-weight of B is: ๐๐๐ข๐ก(๐ต) =โ
๐=(๐ฅ,๐ฆ)โ๐ธ,๐ฅโ๐ต,๐ฆโ๐ถ ๐๐.
Figure 1 shows an example of the in-weight and out-weight of a set of nodes. The number associated with eachedge is the weight of that edge. In this figure, the in-weightof ๐ต is 6+5 = 11, while the out-weight is 3+3+1+2 = 9.
3
1
2
2
1
54
4
3
3
4
5
3
12
2
B
6
Figure 1. Illustration: in-weight and out-weight.
Next, we give the definition of blackhole and volcano ina directed graph as follows.
Definition 2 (blackhole): Given a directed graph ๐บ =(๐,๐ธ), we say that a set of nodes ๐ต โ ๐ form a blackhole,if and only if the following two conditions are satisfied: 1)โฃ๐ตโฃ โฅ 2, and the subgraph ๐บ(๐ต) induced by ๐ต is weaklyconnected, and 2) ๐๐๐(๐ต)โ ๐๐๐ข๐ก(๐ต) > ๐, where ๐ is a pre-defined positive threshold and is typically a very large value.
Definition 3 (volcano): Given a directed graph ๐บ =(๐,๐ธ), we say that a set of nodes ๐ ๐๐ โ ๐ form a volcano,if and only if the following two conditions are satisfied: 1)โฃ๐ ๐๐โฃ โฅ 2, and the subgraph ๐บ(๐ ๐๐) induced by ๐ ๐๐ isweakly connected, and 2) ๐๐๐ข๐ก(๐ ๐๐)โ๐๐๐(๐ ๐๐) > ๐, where๐ is a pre-defined positive threshold and is typically a verylarge value.
III. PROBLEM FORMULATION
In this section, we formulate the problems of detectingblackhole and volcano patterns in a directed graph.
A. A General Problem Formulation
Given a directed graph ๐บ = (๐,๐ธ), the goal of detectingblackhole patterns in ๐บ is to find out the blackhole set,denoted as ๐ต๐๐๐๐โ๐๐๐, such that, 1) for each element๐ต โ ๐ต๐๐๐๐โ๐๐๐, ๐ต โ ๐ and ๐ต satisfies the definitionof blackhole, and 2) for any other set of nodes ๐ถ โ ๐and ๐ถ โโ ๐ต๐๐๐๐โ๐๐๐, C does not satisfy the definition ofblackhole. The problem of detecting volcano patterns canbe formulated in a similar fashion.
Next, we show that the problem of detecting blackholepatterns is a dual problem of detecting volcano patterns.
Theorem 1: The problem of finding out the blackhole setin a directed graph is a dual problem of finding out thevolcano set in the same directed graph.
Proof: Consider a directed graph ๐บ = (๐,๐ธ). Let ๐บโฒ =(๐,๐ธโฒ) be the inverse graph of ๐บ, where all the nodes in ๐บโฒ
are the same as in ๐บ; while for each edge ๐ = (๐ฅ, ๐ฆ) โ ๐ธ,there is an edge ๐โฒ = (๐ฆ, ๐ฅ) โ ๐ธโฒ, and the weight associatedwith ๐โฒ are exactly the same as the weight associated with ๐.Therefore, the in-weight of a set of nodes ๐ต in ๐บ are exactlythe same as the out-weight of ๐ต in ๐บโฒ, and vice versa. If ๐ตis a blackhole in ๐บ, which means ๐๐๐(๐ต)โ ๐๐๐ข๐ก(๐ต) > ๐ in๐บ, then in ๐บโฒ, we have ๐๐๐ข๐ก(๐ต)โ๐๐๐(๐ต) > ๐. Therefore, ๐ตforms a volcano in ๐บโฒ. As a result, the problem of findingout the blackhole set in ๐บ is equivalent to the problem offinding out the volcano set in ๐บโฒ.
Now that we know the problem of detecting the blackholeset in the original directed graph is equivalent to the problemof detecting the volcano set in the inverse graph. Therefore,in the rest of this paper, we can focus on detecting blackholepatterns in a directed graph.
B. A Simplified Problem Formulation
The above general problem of detecting blackhole patternsis very complex. Instead, in this paper, we focus on a more
295
![Page 3: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/3.jpg)
Black hole 1 B lackhole 2
Figure 2. An illustration of the simplified blackhole
practical version of this problem. Specifically, we exploittwo constraints to simplify the general problem as follows.1) The weights associated with all edges are all equal to 1.This constraint results that the in-weight of a node becomesthe in-degree and the out-weight becomes the out-degree; 2)Instead of considering the general version of a blackhole,which satisfies ๐๐๐(๐ต) โ ๐๐๐ข๐ก(๐ต) > ๐, we simplify the๐๐๐๐๐โ๐๐๐ definition with ๐๐๐ข๐ก(๐ต) = 0.
Figure 2 shows an example of the simplified blackholepatterns. In this figure, there are two blackhole patterns,which have been highlighted by dashed circles.
IV. ALGORITHM DESIGN
In this section, we introduce the algorithms for detectingblackhole patterns in a directed graph.
A. A Brute-Force Approach
First, we present a brute-force approach for finding black-hole patterns. From the definition, a set of nodes ๐ต โ ๐ isa simplified blackhole, if and only if: 1) โฃ๐ตโฃ โฅ 2, and thesubgraph ๐บ(๐ต) induced by ๐ต is weakly connected, and 2)๐๐๐ข๐ก(๐ต) = 0. Therefore, the intuition is really simple: allthe possible combinations of the nodes in ๐บ are checkedusing the exhaustive search method. For each combination๐ต, if the subgraph ๐บ(๐ต) induced by ๐ต is weakly connectedand ๐๐๐ข๐ก(๐ต) = 0, then ๐ต is a blackhole in ๐บ.
In real-world scenarios, it is typically computational pro-hibited to find all blackhole patterns, since the number ofcombinations of the nodes is exponentially increased asthe number of nodes increase. A practical way is to findblackhole patterns which include only limited number ofnodes. Here, we introduce a concept of ๐-๐๐๐๐ ๐๐๐๐๐โ๐๐๐.
Definition 4 (n-node blackhole): Given a directed graph๐บ = (๐,๐ธ), we say that a set of nodes ๐ต โ ๐ is an n-nodeblackhole, if and only if the following two conditions aresatisfied: 1) ๐ต is a blackhole in ๐บ, and 2) ๐ต โ ๐ (๐), where๐ (๐) is the set of all possible subsets containing ๐ nodesin V; that is, โฃ๐ตโฃ = ๐.
Figure 3 shows the pseudocode of the brute-force algo-rithm to detect 2 through n-node blackhole patterns in adirected graph ๐บ. Since we have considered all the possiblecombinations of nodes in ๐บ from 2 through n, this algorithmis complete. Also, since for each combination of nodes, wehave checked whether it satisfies the definition of blackhole,this algorithm is correct.
ALGORITHM BRUTE-FORCE(๐บ = (๐,๐ธ), ๐)Input:
๐บ: the directed graph๐ : the set of all nodes๐ธ: the set of all edges๐: max number of nodes each blackhole may contain
Output:๐ต๐๐๐๐โ๐๐๐: 2 to n-node blackhole set of ๐บ
1. ๐ต๐๐๐๐โ๐๐๐โ โ 2. for ๐โ 2 to ๐ do3. for each ๐ต โ ๐ (๐) do4. if ๐บ(๐ต) is weakly connected then5. if ๐๐๐ข๐ก(๐ต) == 0 then6. ๐ต๐๐๐๐โ๐๐๐โ ๐ต๐๐๐๐โ๐๐๐
โช๐ต
7. end if8. end if9. end for10. end for11. return ๐ต๐๐๐๐โ๐๐๐
Figure 3. The brute-force algorithm
B. A Scheme of the iBlackhole Algorithm
In general, finding blackhole patterns in a directed graph isa combinatorial problem. Therefore, as the number of nodes๐ increases, the computation time increases exponentially,making the brute-force algorithm unrealistic to obtain theresult for a large ๐ value. To this end, we introduce somepattern-size-independent pruning rules to reduce the searchspace. The key idea behind these pruning rules is to findout irrelevant nodes that have no chance to form an n-nodeblackhole as many as possible, and eliminate these nodesfrom the candidate search list. In this way, the search spacecan be reduced dramatically. The algorithm developed basedon these pruning rules is named as iBlackhole.
ALGORITHM iBlackhole(๐บ = (๐,๐ธ), ๐)Input:
๐บ: the directed graph๐ : the set of all nodes๐ธ: the set of all edges๐: max number of nodes each blackhole may contain
Output:๐ต๐๐๐๐โ๐๐๐: 2 to n-node blackhole set of ๐บ
1. ๐ต๐๐๐๐โ๐๐๐โ โ 2. for ๐โ 2 to ๐ do3. establish potential list ๐๐
4. remove irrelevant nodes from ๐๐, get candidate list ๐ถ๐
5. remove irrelevant nodes from ๐ถ๐, get final list ๐น๐
6. apply the Brute-Force Algorithm on ๐น๐ to7. find out i-node blackhole pattern set ๐ต๐๐๐๐โ๐๐๐๐8. ๐ต๐๐๐๐โ๐๐๐โ ๐ต๐๐๐๐โ๐๐๐
โช๐ต๐๐๐๐โ๐๐๐๐
9. end for10. return ๐ต๐๐๐๐โ๐๐๐
Figure 4. A scheme of the iBlackhole algorithm
Figure 4 shows the scheme of the iBlackhole algorithmfor detecting 2 through n-node blackhole patterns in a
296
![Page 4: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/4.jpg)
S V
P 3
(a)
S V
P 3
(b)
S V
P 3
(c)
Figure 5. Illustration: the cascading delete process
directed graph ๐บ. In this algorithm, all blackhole patterns areidentified one by one according to their number of nodes. Ineach step of finding the i-node blackhole patterns, a potentiallist ๐๐ is first established. Only the nodes in this potentiallist have possibilities to form an i-node blackhole. In otherwords, nodes that are not in this list have no chance tobe in an i-node blackhole pattern. Then nodes in ๐๐ willbe examined one after another and irrelevant nodes will bedeleted based on some pruning rules. The results of thispruning form a candidate list ๐ถ๐. For each node v in ๐ถ๐,we will check it again and remove irrelevant nodes from ๐ถ๐
using some additional pruning rules. Finally, we will havethe final search list ๐น๐, and then we can apply the brute-forcealgorithm on ๐น๐ to find out all i-node blackhole patterns.More details about this algorithm will be given after weintroduce some pruning rules.
C. Pruning Rules
In this section, we introduce pruning rules associated withthe potential list, the candidate list, and the final search list.
Definition 5 (directed path): Given a directed graph ๐บ =(๐,๐ธ), ๐ฃ0, ๐ฃ1, ๐ฃ2, . . . , ๐ฃ๐ โ ๐ , ๐1, ๐2, . . . , ๐๐ โ ๐ธ,where ๐๐ = (๐ฃ๐โ1, ๐ฃ๐). We say that the sequence of๐ฃ0๐1๐ฃ1๐2๐ฃ2 . . . ๐๐๐ฃ๐ forms a directed path from ๐ฃ0 to ๐ฃ๐,if ๐ฃ๐ โ= ๐ฃ๐ for all 0 โค ๐, ๐ โค ๐, ๐ โ= ๐. The length of thisdirected path is ๐.
Definition 6 (reachable): Given a directed graph ๐บ =(๐,๐ธ), ๐ข, ๐ฃ โ ๐ . We say that ๐ฃ is reachable from ๐ข ifthere is a directed path that starts from ๐ข and ends at ๐ฃ.
Definition 7 (predecessor and successor): Given a di-rected graph ๐บ = (๐,๐ธ), ๐ข, ๐ฃ โ ๐ . If ๐ฃ is reachable from๐ข, then we say ๐ข is a predecessor of ๐ฃ, and ๐ฃ is a successorof ๐ข. If there is an edge from ๐ข to ๐ฃ, then ๐ข is a directpredecessor of ๐ฃ, and ๐ฃ is a direct successor of ๐ข.
Lemma 1: If a node ๐ฃ โ ๐ต, where ๐ต โ ๐ is a blackhole,then all the direct successors of ๐ฃ are all in ๐ต.
Proof: This can be proved by contradiction. Assumethat there is at least one of ๐ฃโฒ๐ direct successors ๐ , and๐ โโ ๐ต, then we have ๐๐๐ข๐ก(๐ต) โฅ 1 since ๐ = (๐ฃ, ๐ ) โ ๐ธand ๐ โโ ๐ต. This contradicts with the definition of blackhole.Therefore, all direct successors of ๐ฃ should be in ๐ต.
Based on Lemma 1, we have the following lemma.
Lemma 2: In an n-node blackhole ๐ต, the maximum out-degree of any node in ๐ต is ๐โ 1.
Proof: This can be proved by contradiction. Supposethere is a node ๐ฃ with out-degree at least ๐ in an n-nodeblackhole ๐ต, then ๐ฃ should have at least ๐ direct successors,denoted as ๐ 1, ๐ 2, . . . , ๐ ๐. According to Lemma 1, if ๐ฃ โ ๐ต,all of ๐ 1, ๐ 2, . . . , ๐ ๐ should be in ๐ต, which makes the sizeof this blackhole at least ๐+1. Then we find a contradictionhere. Therefore, the maximum out-degree of any node in ann-node blackhole should be no greater than ๐โ 1.
According to Lemma 2, we can derive the followingtheorem for pruning the potential list ๐๐.
Theorem 2: For the potential list ๐๐, only nodes with out-degree less than ๐ need to be considered.
Proof: By Lemma 2, the maximum out-degree of anynode in an n-node blackhole is ๐โ1. In other words, nodeswith out-degree greater than ๐โ 1 have no chance to be inan i-node blackhole. Therefore, only nodes with out-degreeless than ๐ needs to be included in the potential list ๐๐.
According to Theorem 2, only the nodes with out-degreeless than ๐ are used to establish the potential list ๐๐. Afterhaving ๐๐, some additional pruning rules can be applied toremove irrelevant nodes from ๐๐ to get the candidate list ๐ถ๐.
Lemma 3: For each node ๐ฃ โ ๐๐, if there is at least oneof ๐ฃโฒ๐ direct successors ๐ โโ ๐๐, then ๐ฃ โโ ๐ถ๐.
Proof: This can be proved by contradiction. Since๐ โโ ๐๐, this means ๐ has no chance to be in an i-node blackhole. Assume that finally ๐ฃ belongs to an i-node blackhole ๐ต. According to Lemma 1, all ๐ฃโฒ๐ directsuccessors, which include ๐ , will also belong to ๐ต. Thenwe have a contradiction here. Therefore, ๐ฃ has no chance toform an i-node blackhole, and thus ๐ฃ can be removed from๐๐ safely; that is, ๐ฃ โโ ๐ถ๐.
After a node is removed from ๐๐, there are some othernodes associated with it can also be removed from ๐๐.
Lemma 4: If a node ๐ฃ is removed from ๐๐, then all of itsdirect predecessors can also be removed from ๐๐.
Proof: For each of ๐ฃโฒ๐ direct predecessors ๐, ๐ฃ is๐โฒ๐ direct successor. Since ๐ฃ has been removed from ๐๐,then ๐ฃ โโ ๐๐. According to Lemma 3, ๐ should also beremoved from ๐๐. Therefore, all ๐ฃโฒ๐ direct predecessors canbe removed from ๐๐.
297
![Page 5: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/5.jpg)
By Lemma 4, when removing a node ๐ฃ from ๐๐, all itsdirect predecessors should also be removed. Then, the newlyremoved direct predecessors become the new โฒโฒ๐ฃโฒโฒs. Finally,the cascading delete will spread to all ๐ฃโฒ๐ predecessors.Figure 5 shows an example of the cascading delete processwhen removing node ๐ฃ from the potential list ๐3. Theshadow nodes in the figures are nodes removed from ๐3.In Figure 5(๐), ๐ has an out-degree of 3, which makes itexclude from ๐3 at the first place. When nodes in ๐3 arechecked one after another, it can be noticed that ๐ฃ has asuccessor ๐ not in ๐3. Therefore, ๐ฃ is removed from ๐3.Then all of ๐ฃโฒ๐ direct predecessors are all deleted from ๐3
as shown in Figure 5(๐), and this process spreads to all ๐ฃโฒ๐ predecessors in Figure 5(๐).
By applying Lemma 3 and Lemma 4, we prune rulesirrelevant nodes from ๐๐, and get the candidate list ๐ถ๐.Before going any further, we would like to introduce anotherconcept first.
v
a
c
e
f
g
d
h
b
Figure 6. An example of the closure of node ๐ฃ
Definition 8 (closure): Given a directed graph ๐บ =(๐,๐ธ), ๐ฃ โ ๐ . The closure of ๐ฃ, denoted as ๐ฃ+, is definedas: ๐ฃ+ = {๐ โฃ there is a directed path from v to s} โช {๐ฃ}.
Figure 6 shows an example of the closure of node ๐ฃ. Inthis figure, ๐ฃ+ = {๐ฃ, ๐, ๐, ๐, โ, ๐}. Indeed, the closure of anode has an important feature as the following.
Theorem 3: The closure of a node ๐ฃ is a blackhole.Moreover, it is a subset of any blackhole that contains ๐ฃ.
Proof: For the first part of this theorem, by the defi-nition of closure, ๐ฃ+ is the set of all nodes reachable from๐ฃ, together with ๐ฃ. If ๐ฃ+ does not form a blackhole, therehave to be at least an edge ๐ = (๐ , ๐ก) โ ๐บ, such that ๐ โ ๐ฃ+
and ๐ก โโ ๐ฃ+. If ๐ = ๐ฃ, then ๐ก is reachable from ๐ฃ, we have๐ก โ ๐ฃ+; If ๐ โ= ๐ฃ, since there is a directed path from ๐ฃ to ๐ ,and ๐ = (๐ , ๐ก), ๐ก can be reached from ๐ฃ. We can also have๐ก โ ๐ฃ+. In either condition, we can have a contradictionhere. Therefore, ๐ฃ+ is a blackhole.
For the second part of this theorem, if a blackhole ๐ตcontains ๐ฃ, by Lemma 1, all ๐ฃโฒ๐ direct successors should allbe in ๐ต. And then these direct successors become the newโฒโฒ๐ฃโฒโฒs. Eventually, this procedure will be spread to all the ๐ฃโฒ๐ successors. The above leads to ๐ฃ+ โ ๐ต.
The feature of closure (Theorem 3) can be used to derivesome pruning rules to remove some irrelevant nodes from๐ถ๐, and finally lead to the final search list ๐น๐.
ALGORITHM iBlackhole(๐บ = (๐,๐ธ), ๐)Input:
๐บ: the directed graph๐ : the set of all nodes๐ธ: the set of all edges๐: max number of nodes each blackhole may contain
Output:๐ต๐๐๐๐โ๐๐๐: 2 to n-node blackhole set of ๐บ
1. ๐ต๐๐๐๐โ๐๐๐โ โ 2. for ๐โ 2 to ๐ do3. ๐๐ โ {๐ฃโฃ๐๐๐ข๐ก(๐ฃ) < ๐}4. for each ๐ฃ in ๐๐ do5. if at least one of ๐ฃโฒ๐ directed6. successors are not in ๐๐ then7. remove ๐ฃ from ๐๐
8. remove all ๐ฃโฒ๐ predecessors from ๐๐
9. end if10. end for11. ๐ถ๐ โ ๐๐
12. for each ๐ฃ in ๐ถ๐ do13. if โฃ๐ฃ+โฃ > ๐ then14. remove ๐ฃ from ๐ถ๐
15. remove all ๐ฃโฒ๐ predecessors from ๐ถ๐
16. end if17. if โฃ๐ฃ+โฃ == ๐ then18. ๐ต๐๐๐๐โ๐๐๐โ ๐ต๐๐๐๐โ๐๐๐
โช๐ฃ+
19. remove ๐ฃ from ๐ถ๐
20. remove all ๐ฃโฒ๐ predecessors from ๐ถ๐
21. end if22. end for23. ๐น๐ โ ๐ถ๐
24. for each ๐ต โ ๐น๐(๐) do25. if ๐บ(๐ต) is weakly connected then26. if ๐๐๐ข๐ก(๐ต) == 0 then27. ๐ต๐๐๐๐โ๐๐๐โ ๐ต๐๐๐๐โ๐๐๐
โช๐ต
28. end if29. end if30. end for31. end for32. return ๐ต๐๐๐๐โ๐๐๐
Figure 7. The iBlackhole algorithm
Lemma 5: For each node ๐ฃ โ ๐ถ๐, if โฃ๐ฃ+โฃ > ๐, then ๐ฃ andall its predecessors are not in ๐น๐.
Proof: By Theorem 3, ๐ฃ+ is a subset of any blackholewhich contains ๐ฃ. Suppose ๐ฃ is in an i-node blackhole ๐ต.Then we have ๐ฃ+ โ ๐ต. So โฃ๐ตโฃ โฅ โฃ๐ฃ+โฃ > ๐. We can havea contradiction here. Therefore, ๐ฃ has no chance to forman i-node blackhole, and we can remove ๐ฃ from ๐ถ๐ safely.Then, the similar cascading delete procedure can be applied,and thus all the ๐ฃโฒ๐ predecessors can be deleted from ๐ถ๐.Therefore, ๐ฃ and all its predecessors will not be in ๐น๐.
Lemma 6: For each node ๐ฃ โ ๐ถ๐, if โฃ๐ฃ+โฃ = ๐, then ๐ฃ+
can be outputted as an i-node blackhole. Also, ๐ฃ and all itspredecessors can be removed from ๐ถ๐.
Proof: According to Theorem 3, ๐ฃ+ is a blackhole.Since โฃ๐ฃ+โฃ = ๐, ๐ฃ+ can be outputted as an i-node blackhole.Assume that ๐ฃ will also be in another blackhole ๐ต. By
298
![Page 6: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/6.jpg)
s
b
v
a
d
e h
i
f
g
P 3
(a)
k
j
cs
b
v
a
d
e h
i
f
g
C3
(b)
k
j
cs
b
v
a
d
e h
i
f
g
F 3
(c)
k
j
cs
b
v
a
d
e h
i
f
g
F 3
(d)
k
j
c
Figure 8. An example of the procedure of the iBlackhole Algorithm
Theorem 3, ๐ฃ+ โ ๐ต. If โฃ๐ตโฃ > ๐, ๐ต is not an i-node blackholeand cannot be outputted as an i-node blackhole; If โฃ๐ตโฃ = ๐,then ๐ต is exactly ๐ฃ+, and has already been out putted asan i-node blackhole. In either situation, we can remove ๐ฃfrom ๐ถ๐. Then the similar cascading delete procedure canbe applied, and finally all ๐ฃโฒ๐ predecessors will be deletedfrom ๐ถ๐.
Lemma 5 and Lemma 6 are used as pruning rules to prunethe candidate list ๐ถ๐ and get the final search list ๐น๐. Afterhaving ๐น๐, we can apply the brute-force approach on ๐น๐ tofind out all i-node blackhole patterns. In the next subsection,we will give the details of the iBlackhole algorithm.
D. The iBlackhole Algorithm
The iBlackhole algorithm exploits the pruning rules statedfrom Lemma 1 to Lemma 6. Figure 7 shows the detailedpseudocode of the iBlackhole algorithm. Specifically, Line3 establishes the potential list ๐๐. Lines 4 - 11 removeirrelevant nodes from ๐๐, and get the candidate list ๐ถ๐. Lines12 - 23 remove irrelevant nodes from ๐ถ๐, and get the finalsearch list ๐น๐. Lines 24 - 30 apply the brute-force approachon ๐น๐ to find out all i-node blackhole patterns.
Completeness and Correctness. In the iBlackhole algo-rithm, since only the nodes that have no chance to form an i-node blackhole pattern are removed in each iteration ๐ (this isguaranteed by Lemma 1 through Lemma 6). In other words,all the possible combinations of nodes have been checkedto produce ๐น๐, this algorithm is complete. Also, for eachcandidate blackhole pattern, since we have checked whetherthis candidate pattern satisfies the definition of blackhole ornot, this algorithm is correct.
Figure 8 shows an example of the procedure of theiBlackhole algorithm when searching the 3-node blackholepatterns. The shadow nodes in the figures are nodes whichhave been deleted. In Figure 8(๐), ๐ has an out-degree of 3,so ๐ can be deleted from ๐3 at the first place. In Figure 8(๐),when we check the nodes in ๐3 one after another, we noticethat ๐ฃ has a successor ๐ not in ๐3. Therefore, we can remove๐ฃ from ๐3. Then, all the direct predecessors of ๐ฃ can becascaded deleted from ๐3, and this delete process spreadsto all ๐ฃโฒ๐ predecessors. Finally, we have the candidate list๐ถ3 = {๐, ๐, ๐, ๐, ๐, ๐}. In Figure 8(๐), we find that โฃ๐+โฃ = 3.Therefore, we output ๐+ = {๐, ๐, ๐} as a 3-node blackhole,and delete ๐ from ๐ถ3. Now, we have the final search list๐น3 = {๐, ๐, ๐, ๐, ๐}. In Figure 8(๐), we examine each 3-combination of nodes in ๐น3, and find out a 3-node blackhole
{๐, ๐, ๐}. Therefore, there are two 3-node blackhole patternsin this example, {๐, ๐, ๐} and {๐, ๐, ๐} respectively.
E. The iBlackhole-DC Algorithm
While the search space has been reduced dramaticallyin the iBlackhole algorithm, it is still possible to developsome pruning strategies for the graphs with some specialcharacteristics. Indeed, for a node ๐ฃ โ ๐ , ๐ฃ can only forma blackhole pattern with nodes within the same weaklyconnected component in ๐บ by the blackhole definition.Therefore, if a directed graph has several weakly connectedcomponents, which are not connected to each other, adivide-and-conquer pruning strategy can be exploited forfirst identifying these weakly connected components and theblackhole finding method can be conducted in each weaklyconnected component. This pruning strategy can drasticallydivide a large exponential growth search space into severalmuch smaller exponential growth search space, and thusreducing a lot of computational cost.
Along this line, we combine the iBlackhole algorithmwith this divide-and-conquer pruning strategy and developan even more effective algorithm, named iBlackhole-DC, forfinding blackhole patterns. Figure 9 shows the scheme ofthis algorithm for finding out 2 through n-node blackholepatterns in a directed graph ๐บ.
ALGORITHM iBlackhole-DC(๐บ = (๐,๐ธ), ๐)Input:
๐บ: the directed graph๐ : the set of all nodes๐ธ: the set of all edges๐: max number of nodes each blackhole may contain
Output:๐ต๐๐๐๐โ๐๐๐: 2 to n-node blackhole set of ๐บ
1. ๐ต๐๐๐๐โ๐๐๐โ โ 2. for ๐โ 2 to ๐ do3. establish potential list ๐๐
4. remove useless nodes from ๐๐, get candidate list ๐ถ๐
5. remove useless nodes from ๐ถ๐, get final list ๐น๐
6. for each weakly connected component in ๐บ(๐น๐) do7. apply the Brute-Force Algorithm to8. find out i-node blackhole pattern set ๐ต๐๐๐๐โ๐๐๐๐9. ๐ต๐๐๐๐โ๐๐๐โ ๐ต๐๐๐๐โ๐๐๐
โช๐ต๐๐๐๐โ๐๐๐๐
10. end for11. end for12. return ๐ต๐๐๐๐โ๐๐๐
Figure 9. A scheme of the iBlackhole-DC algorithm
299
![Page 7: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/7.jpg)
The completeness and correctness of the iBlackhole-DC algorithm is straightforward. Since the only differencebetween iBlackhole and iBlackhole-DC is the use of thedivide-and-conquer strategy. We know that the iBlackholealgorithm is complete and correct. Also, the divide-and-conquer strategy only separates the nodes which cannot formblackhole patterns. Therefore, the iBlackhole-DC algorithmis also complete and correct.
V. EXPERIMENTAL RESULTS
In this section, we present the experimental results toevaluate the performances of Brute-Force, iBlackhole andiBlackhole-DC algorithms.
A. The Experimental Setup
Experimental Data. The experiments were conductedon four real-world data sets: Wiki, Amazon, Roget, andStock. Table I shows some basic characteristics of thesedata sets.
Table IDATA CHARACTERISTICS
Data set # nodes # egdesWiki 7,115 103,689Wiki500 500 3,865Wiki1000 1,000 9,741Wiki1500 1,500 16,389Wiki1500-full 1,500 16,820Amazon 262,111 1,234,877Amazon1000 1,000 3,952Amazon500-full 500 1,911Roget 1,022 5,075Roget-full 1,022 5,127Stock-0.35 2,453 273
Wiki Data Set. There are 7,115 nodes and 103,689 edgesin the Wiki data set [2]. To make the brute-force algorithmrunnable, we derived three subgraphs from the originalgraph, with the number of nodes 500, 1,000, and 1,500respectively. These subgraphs are named as Wiki500,Wiki1000, and Wiki1500 separately. In addition, wesynthesized a weakly connected directed graph, named asWiki1500-full, by adding some edges to Wiki1500data set.Amazon Data Set. There are 262,111 nodes and
1,234,877 edges in the Amazon data set [3]. Similar to theWiki data set, we derived a subgraph Amazon1000 with1000 nodes, and synthesized a weakly connected directedgraph Amazon500-full from the original graph.Roget Data Set. There are 1,022 nodes and 5,075 edges
in the Roget data set [4]. Also, we synthesized a weaklyconnected directed graph Roget-full from the originalgraph, by adding some edges to it.Stock Data Set. This is a data set generated by our-
selves. Specifically, we collected daily stock prices fromWharton Research Data Services [5] of 3,081 instruments in
Figure 10. An overview of the Dow Jones Index from Jan 08 to Jun 08
the U.S. stock market over a period of 125 consecutive trad-ing days from Jan 2, 2008 to Jun 30, 2008. We tried to avoidselecting the period with a strong movement trend in thestock market, since the movements of all instruments duringthat period tend to have high correlations among each other.As can be seen in Figure 10, there was no strong trend inthe Dow Jones index during the selected period. Then we re-moved instruments in Dow Jones and S&P 500 indexes fromour collection. Those instruments are more representative inthe stock market and therefore tend to have high correlationswith the other instruments. Since we target on finding outsome not-so-obvious blackhole patterns, we only considerinstruments not in Dow Jones and S&P 500 indexes. Afterthat, we constructed the Stock data set as follows. 1)Nodes in this data set correspond to instruments. There are2,453 nodes in this data set; 2) we build a vector ๐๐ ={๐๐1 , ๐๐2 , . . . , ๐๐๐ก , . . . , ๐๐125} for each instrument, where ๐๐๐กis the closing price of instrument ๐ on day ๐ก; 3) we createa Boolean vector ๐ต๐ = {๐๐1 , ๐๐2 , . . . , ๐๐๐ก , . . . , ๐๐124} basedon ๐๐, where ๐๐๐ก = 1, ๐๐ ๐๐๐ก+1
โฅ ๐๐๐ก , ๐๐กโ๐๐๐ค๐๐ ๐ 0; 4) For๐ = {๐ฅ1, . . . , ๐ฅ๐ก, . . . , ๐ฅ๐} and ๐ = {๐ฆ1, . . . , ๐ฆ๐ก, . . . , ๐ฆ๐},๐๐ฅ๐ฆ(๐) is the lagged correlation when ๐ is delayed by ๐.A symmetric situation can be applied to get ๐๐ฆ๐ฅ(๐). Wecompute the lagged correlations ๐๐๐(1) and ๐๐๐(1) for eachpair of instrument ๐ and ๐; 5) there is an edge from node ๐ tonode ๐, if ๐๐๐(1) > ๐, where ๐ is a pre-defined threshold, andvice versa. Since we compute the lagged correlation of 1-day delay between two instruments, if there is an edge fromnode ๐ to node ๐, it indicates the movement of instrument ๐followed the movement of instrument ๐ on the previous day.Here, we specify ๐ as 0.35 to get Stock-0.35 data set.
Note that the method we used to construct the Stockdata set is similar to the way in [6]. However, there are somedifferences. We used the lagged Pearson correlation amonginstruments, and ended up with a directed graph. WhileBoginski et al [6] employed the general Pearson correlationand constructed an undirected graph.
Experimental Platform. All the experiments were per-formed on a Dell Optiplex 960 Desktop with Intel Core 2Quad Processor Q9550 and 4 GB of memory running theWindows XP Professional Service Pack 3 operating system.
300
![Page 8: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/8.jpg)
2 3 4 5 6 7 8 9 1010
โ2
10โ1
100
101
102
103
104
105
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(a) Wiki1000
2 3 4 5 6 7 8 9 1010
โ3
10โ2
10โ1
100
101
102
103
104
105
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(b) Amazon1000
2 3 4 5 6 7 8 9 1010
โ3
10โ2
10โ1
100
101
102
103
104
105
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(c) Roget
Figure 11. The running time of Brute-Force, iBlackhole, and iBlackhole-DC algorithms on different data sets
2 3 4 5 6 7 8 9 1010
โ4
10โ3
10โ2
10โ1
100
101
102
103
104
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(a) Wiki500
2 3 4 5 6 7 8 9 1010
โ2
10โ1
100
101
102
103
104
105
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(b) Wiki1000
2 3 4 5 6 7 8 9 1010
โ2
10โ1
100
101
102
103
104
105
106
Number of Nodes
Run
ning
Tim
e (s
ec)
BruteโForce iBlackhole iBlackholeโDC
(c) Wiki1500
Figure 12. The running time of Brute-Force, iBlackhole, and iBlackhole-DC algorithms for different # nodes
B. An Overall Comparison
In this subsection, we provide an overall comparison ofBrute-Force, iBlackhole, and iBlackhole-DC algorithms.
First, we compare the performances of three algorithmson different data sets with almost the same number ofnodes. In this experiment, we choose data sets Wiki1000,Amazon1000, and Roget. Figure 11 shows the runningtime of these algorithms. As can be seen, both Brute-Force and iBlackhole algorithms are runnable within certainnumber of nodes, while iBlackhole can go a litter bit furtherthan Brute-Force. In contrast, the iBlackhole-DC algorithmis runnable for finding n-node blackhole patterns with a large๐ value.
The running time of three algorithms for detecting black-hole patterns with different number of nodes forms threeapproximately straight lines in logarithm scale for all threedata sets. (For iBlackhole-DC, it is more clear if we onlyfocus on ๐ โฅ 4). This indicates that the running timefor those algorithms follow an exponential increasing time.Also, the slopes of three performance curves for each dataset are significantly different. For Brute-Force, since we dothe exhaust search at the beginning and the number of nodesof the three data sets are almost the same, the slopes inthose three subfigures are almost the same. For iBlackhole,as well as iBlackhole-DC, they are a little different. Theslope of the curve on the Wiki1000 data set is larger thanslopes in Amazon1000 and Roget. For both iBlackholeand iBlackhole-DC, we prune irrelevant nodes from eachdata set. However, the pruning effect depends on the graphproperties of each data set (i.e. the average in-degree andout-degree plays an important role). This makes the runningtime of iBlackhole and iBlackhole-DC algorithms vary for
different data sets, but after all, much less than the Brute-Force algorithm.
Next, we compare the performances of three algorithmson the same data set with different number of nodes. In thisexperiment, we choose data sets Wiki500, Wiki1000,and Wiki1500. Figure 12 shows the running time of thesethree algorithms on those three data sets.
The overall performances of these three algorithms arevery similar to the first experiment. However, there are stillsomething interesting here. We can observe that the slopesof the three lines in these three data sets are almost thesame. (For iBlackhole-DC, it is more clear if we only focuson ๐ โฅ 4). Since these three subgraphs are derived from thesame network, the inherent graph properties of these datasets are similar. The above might be the reason that similarslopes are observed in the results.
C. iBlackhole vs. iBlackhole-DC
In this subsection, we compare the performancesof iBlackhole and iBlackhole-DC algorithms. We showhow significant the divide-and-conquer strategy im-proves the performance of iBlackhole. In this experi-ment, we choose three synthetical weakly connected di-rected networks, Amazon500-full, Roget-full, andWiki1500-full.
Figure 13 shows the running time of these two algorithmson those three data sets. In the figure, we can see thatthe performance of iBlackhole-DC is several orders ofmagnitude faster than the performance of iBlackhole, sinceit drastically divides a large exponential growth search spaceinto several much smaller exponential growth search space,and thus reduces a lot of computational cost.
301
![Page 9: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/9.jpg)
2 3 4 5 6 7 8 9 1010
โ3
10โ2
10โ1
100
101
102
103
104
105
Number of Nodes
Run
ning
Tim
e (s
ec)
iBlackhole iBlackholeโDC
(a) Amazon500-full
2 3 4 5 6 7 8 9 10
10โ2
100
102
104
Number of Nodes
Run
ning
Tim
e (s
ec)
iBlackhole iBlackholeโDC
(b) Roget-full
2 3 4 5 6 7 8 9 1010
โ2
100
102
104
Number of Nodes
Run
ning
Tim
e (s
ec)
iBlackhole iBlackholeโDC
(c) Wiki1500-full
Figure 13. The running time of iBlackhole algorithm and iBlackhole-DC algorithm on different data sets
Figure 14 shows the visualizations of the structures ofdifferent data sets before and after applying the first pruningscheme to these data sets while detecting 7-node blackholepatterns. This figure is drawn with Pajek [7]. From thisfigure, we can observe that the number of nodes in eachdata set decreases dramatically after pruning, and eachnetwork becomes very sparse. Table II shows some maincharacteristics of the data sets after pruning. As can be seen,while the original data sets are all weakly connected, wecan still get a large number of connected components afterpruning. Therefore, the divide-and-conquer strategy can helpdramatically reduce the search space.
Table IICHARACTERISTICS OF DATA SETS AFTER PRUNING
Data set # nodes # edges # connected compAmazon500 14 36 3
Roget 32 37 11Wiki1500 252 209 43
Pajek Pajek Pajek
PajekPajek Pajek
Befo
re P
runi
ngA
๏ฟฝer P
runi
ng
Amazon500 Roget Wiki1500
Figure 14. Structures of different data sets before and after pruning
D. Blackhole Patterns in the Stock Data
Here, we show an application of blackhole patterns forunderstanding the structural relationship of stock movement.
Figure 15 shows two blackhole patterns identified in theStock-0.35 data set. Owens Corning (OC) is in theleft blackhole pattern. The Westmoreland Coal Company(WLB) has an outlink to OC. This indicates that the pricemovement of WLB followed the price movement of OC.By doing some research, we find out Owens Corning is one
CDL
A DCT
ISIL
MTEX
T KR
TRID
ZRAN
OC
WLB
VQ H P
Figure 15. Two blackhole patterns identified in Stock-0.35 data set
of the biggest building material producers in the country. Itsproducts include the manufactured stone products used in thebuilding. In recent years, there is a trend in the industry thatcompanies are developing new innovative building materialsby recycling the waste in the energy industry, which areprimarily the residual byproducts of coal combustion. As anenergy company, WLB owns five coal mines. Therefore, itis understandable that the stock price of the WestmorelandCoal Company has a lag correlation with the stock price ofOwens Corning. The other two companies in this patternare Venoco Inc. (VQ) and Helmerich & Payne Inc. (HP).Venoco Inc. is an energy company primarily engaged in theacquisition, exploration, exploitation, and development of oiland natural gas properties, while Helmerich & Payne Inc. isa contract drilling company drilling oil and gas wells forothers. Therefore, it is not surprising that the stock pricemovements of these two companies are lag correlated withthe stock price of the Westmoreland Coal Company.
The second blackhole pattern is a star-shaped blackhole,which indicates the stock prices for the other six instrumentsare triggered by Citadel Broadcasting Corp. (CDL). Amongthese six companies, there are one telecommunication com-pany (ADCT), three IC related companies (ISIL, TRID, andZRAN), and one highly engineered steel produce company(TKR), which are all related to the Broadcasting Corporationto some extent. The other company is a wellness solutionprovider, which may be involved in this pattern by chanceor for some unknown reasons.
This application is just a simple indication of the useof the blackhole patterns. Indeed, the blackhole patternscan provide an unique view of some structural properties,and help us better understand the interactions among somenodes in the network. However, we should note that thisuse of blackhole patterns is still preliminary and morecomprehensive studies are expected in the future.
302
![Page 10: Detecting Blackhole and Volcano Patterns in Directed Networks](https://reader031.fdocuments.in/reader031/viewer/2022022418/58a2d1d61a28abe1338b6a72/html5/thumbnails/10.jpg)
VI. RELATED WORK
Related work can be grouped into two categories. The firstcategory includes the work on frequent subgraph mining,which studies how to efficiently find frequent subgraphs inthe graph data. For instance, Jiang et al. [8] proposed ameasure for mining globally distributed frequent subgraphsin a single labeled graph. Meanwhile, there are many worksin mining frequent subgraphs in multiple labeled graphs [9],[10], [11], [12], [13], [14]. The problem of detecting black-hole patterns is different from the above works for tworeasons. First, the definition of blackhole patterns is differentfrom the definition of frequent subgraphs. Second, blackholepatterns are identified whether they are frequent or not.
The second category includes the works for detectingcommunity structures in large networks. Communities ina network are groups of nodes within which connectionsare dense, but between which connections are sparse [15].There are a lot of works on how to detect communitiesin a network. For instance, Newman and Girvan [16],[17] proposed a betweenness-based method, Hopcroft [18]proposed a stable method, and Ghosh [19] proposed a globalinfluence based method to detect community structures.All these methods detect community structures based oncertain definitions and criteria. However, the definition ofblackhole patterns is different from the above definitionsof communities. Also, once a network has been decided,the number of n-node blackhole patterns is determined. Incontrast, it is usually difficult to know how many communitystructures are in the network.
VII. CONCLUDING REMARKS
In this paper, we formulated a problem of finding black-hole and volcano patterns in directed networks. Both black-hole and volcano patterns can be observed in real-worldscenarios, such as the trading ring for market manipulation.Indeed, it is essentially a combinatorial problem for miningblackhole or volcano patterns. To reduce the complexity ofthe problem, we first proved that the problem of findingblackhole patterns is a dual problem of finding volcano pat-terns. Thus, we could be only focused on mining blackholepatterns. To that end, we derived two pruning schemes. Thefirst scheme is based on a set of size-independent pruningrules which can help to prune the candidate search spaceeffectively and thus can dramatically reduce the computa-tional cost of blackhole mining. Based on the first pruningscheme, we developed the iBlackhole algorithm for miningblackhole patterns. In addition, the second scheme is to takeadvantage of an unique graph property; that is, we couldsearch in each individual subgraphs if the target directedgraph contains several disconnected subgraphs. Therefore,by exploiting these two pruning schemes, we developed theiBlackhole-DC algorithm for finding blackhole patterns.
Finally, as shown in the experimental results, the prun-ing effect of both pruning schemes is significant and
the iBlackhole-DC algorithm is several order-of-magnitudefaster than the iBlackhole algorithm, which outperforms abrute-force approach by several orders of magnitude as well.
VIII. ACKNOWLEDGEMENTS
The authors were supported in part by National ScienceFoundation (NSF) via grant number CCF-1018151, andNational Natural Science Foundation of China (NSFC) viaproject number 60925008.
REFERENCES
[1] R. Diestel, Graph Theory (Graduate Texts in Mathematics).Springer, 2006.
[2] J. Leskovec, K. Lang, and et al, โCommunity structure inlarge networks: Natural cluster sizes and the absence of largewell-defined clusters,โ in arXiv:0810.1355, 2008.
[3] J. Leskovec, L. Adamic, and B. Adamic, โThe dynamics ofviral marketing,โ ACM TWEB, vol. 1, 2007.
[4] V. Batagelj and A. Mrvar. Pajek datasets, 2006.http://vlado.fmf.uni-lj.si/pub/networks/data.
[5] Wharton Research Data Services, University of Pennsylvania.https://wrds.wharton.upenn.edu/wrdsauth/members.cgi.
[6] V. Boginski, S. Butenko, and P. M. Pardalos, โStatisticalanalysis of financial networks,โ Computational Stat. and DataAnalysis, vol. 48, pp. 431โ443, 2005.
[7] W. Nooy, A. Mrvar, and V. Batagelj, Exploratory SocialNetwork Analysis with Pajek. Cambridge University Press,2005.
[8] X. Jiang, H. Xiong, C. Wang, and A. H. Tan, โMining globallydistributed frequent subgraphs in a single labeled graph,โData and Knowledge Engineering, vol. 68, pp. 1034โ1058,2009.
[9] X. Yan and J. Han, โgspan: Graph-based substructure patternmining.โ in IEEE ICDMโ02, 2002.
[10] C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi, โScal-able mining of large disk-based graph databases,โ in ACMSIGKDDโ04, 2004.
[11] J. Huan, W. Wang, and J. Prins, โEfficient mining of fre-quent subgraphs in the presence of isomorphism,โ in IEEEICDMโ03, 2003.
[12] J. Wang, W. Hsu, M. Lee, and C. Sheng, โA partition-basedapproach to graph mining,โ in ICDE 2006, 2006, p. 74.
[13] M. Kuramochi and G. Karypis, โFinding frequent patternsin a large sparse graph.โ Data Min. Knowl. Discov., vol. 11,no. 3, pp. 243โ271, 2005.
[14] D. J. Cook and L. B. Holder, โSubstructure discovery usingminimum description length and background knowledge.โ J.Artif. Intell. Res. (JAIR), vol. 1, pp. 231โ255, 1994.
[15] M. E. J. Newman, โDetecting community structure in net-works,โ Eur. Phys. J. B, vol. 38, pp. 321โ330, 2004.
[16] M. E. J. Newman and M. Girvan, โFinding and evaluatingcommunity structure in networks,โ Phys. Rev. E 69, vol.026113, 2004.
[17] M. Girvan and M. E. J. Newman, โCommunity structurein social and biological networks,โ in Proc. National Acad.Science, 2002.
[18] J. Hopcroft, O. Khan, and et al, โNatural communities in largelinked networks,โ in ACM SIGKDDโ03, 2003.
[19] R. Ghosh and K. Lerman, โCommunity detection using ameasure of global influence,โ in SNA-KDDโ08, 2008.
303