The Banking Transactions Dataset and its Comparative ...

9
The Banking Transactions Dataset and its Comparative Analysis with Scale-free Networks* Akrati Saxena * , Yulong Pei * , Jan Veldsink ,Werner van Ipenburg , George Fletcher * , Mykola Pechenizkiy * * Eindhoven University of Technology, Eindhoven, The Netherlands {a.saxena,y.pei.1,g.h.l.fletcher,m.pechenizkiy}@tue.nl Co¨ operatieve Rabobank U.A., Utrecht, The Netherlands {jan.veldsink,werner.van.ipenburg}@rabobank.nl Abstract—We construct a network of 1.6 million nodes from banking transactions of users of Rabobank. We assign two weights on each edge, which are the aggregate transferred amount and the total number of transactions between the users from the year 2010 to 2020. We present a detailed analysis of the unweighted and both weighted networks by examining their degree, strength, and weight distributions, as well as the topological assortativity and weighted assortativity, clustering, and weighted clustering, together with correlations between these quantities. We further study the meso-scale properties of the networks and compare them to a randomized reference system. We also analyze the characteristics of nodes and edges using centrality measures to understand their roles in the money transaction system. This will be the first publicly shared dataset of intra-bank transactions, and this work highlights the unique characteristics of banking transaction networks with other scale- free networks. I. I NTRODUCTION The analysis of large-scale complex networks, including so- cial networks, technological networks, biological networks, has offered in-depth insights to uncover and explore their evolution and understand several dynamic phenomena taking place on these networks [1]. Some interesting examples, where complex networks have been used to provide solutions for various real- world problems, include understanding human communication pattern [2], designing fair policies for overcoming pandemic [3], disrupting communities in terrorist networks [4], medicine development and treatment of complex diseases [5], and so on. The analysis of bank transaction network will be beneficial for several research directions. It will help in understanding the flow of money at the microscopic level as well as how this contributes towards the macroscopic money transaction system. It will improve the downstream tasks in the financial domain with guidance from the topological perspective. Some representative tasks including fraud detection and user classification in transaction networks. It will shed light on financial simulator design. Existing simulators focus on individual customer behavior, while our analysis can provide complementary information about collective behaviors of group users. The existing studies on financial network analysis focus on trade network or inter-bank transaction networks that show different patterns from intra-bank transaction data. Most of the studies on the financial domain, either make use of a financial simulator to generate synthetic transactions, e.g., PaySim [6], or concentrate on specific downstream tasks such as anomaly detection [7]. In this paper, we aim to bridge the gap by presenting a detailed analysis of unweighted and weighted networks constructed from a data set of the money transactions of more than 1.6 million users of Rabobank. This paper analyzes unweighted and weighted, large-scale, one-to-one bank transaction networks. We follow the ‘cookbook approach’ for performing a systematic analysis of basic and more advanced network characteristics, which has also been followed in previous network analysis work [2]. We explain the characteristics of the dataset and networks in Section II. We study some of the basic network characteristics in Section III. We explore meso-scale structural properties of the network, including k-cliques and the coupling of link weight with the network topology in Section IV. We briefly highlight centrality- based roles of nodes and edges in Section V. We present a comparative analysis of the unique characteristics of the transaction network with other scale-free networks in Section VI and, finally, discuss our findings and future directions in Section VII. Our main contributions are mentioned below. 1) The anonymized banking transaction data of users. As per the best of our knowledge, this will be the first publicly available dataset of users’ banking transactions. The dataset is available at https://github.com/akratiiet/ RaboBank Dataset. 2) We perform a detailed analysis of unweighted and weighted banking transaction networks and highlight the similarities and differences of these networks with other types of networks. As per the best of our knowledge, this is the first work to analyze intra-bank transaction network. We hope the shared data and this work will help further help in understanding the evolution of money flow in society. II. DATASET AND NETWORKS The dataset is collected from the Co ¨ operatieve Rabobank U.A. 1 , a Dutch multinational banking and financial services 1 https://www.rabobank.nl/ 1 arXiv:2109.10703v1 [cs.SI] 22 Sep 2021

Transcript of The Banking Transactions Dataset and its Comparative ...

Page 1: The Banking Transactions Dataset and its Comparative ...

The Banking Transactions Dataset and itsComparative Analysis with Scale-free Networks*Akrati Saxena∗, Yulong Pei∗, Jan Veldsink†,Werner van Ipenburg†, George Fletcher∗, Mykola Pechenizkiy∗

∗Eindhoven University of Technology, Eindhoven, The Netherlands{a.saxena,y.pei.1,g.h.l.fletcher,m.pechenizkiy}@tue.nl

†Cooperatieve Rabobank U.A., Utrecht, The Netherlands{jan.veldsink,werner.van.ipenburg}@rabobank.nl

Abstract—We construct a network of 1.6 million nodes frombanking transactions of users of Rabobank. We assign twoweights on each edge, which are the aggregate transferredamount and the total number of transactions between the usersfrom the year 2010 to 2020. We present a detailed analysisof the unweighted and both weighted networks by examiningtheir degree, strength, and weight distributions, as well as thetopological assortativity and weighted assortativity, clustering,and weighted clustering, together with correlations between thesequantities. We further study the meso-scale properties of thenetworks and compare them to a randomized reference system.We also analyze the characteristics of nodes and edges usingcentrality measures to understand their roles in the moneytransaction system. This will be the first publicly shared datasetof intra-bank transactions, and this work highlights the uniquecharacteristics of banking transaction networks with other scale-free networks.

I. INTRODUCTION

The analysis of large-scale complex networks, including so-cial networks, technological networks, biological networks, hasoffered in-depth insights to uncover and explore their evolutionand understand several dynamic phenomena taking place onthese networks [1]. Some interesting examples, where complexnetworks have been used to provide solutions for various real-world problems, include understanding human communicationpattern [2], designing fair policies for overcoming pandemic[3], disrupting communities in terrorist networks [4], medicinedevelopment and treatment of complex diseases [5], and so on.The analysis of bank transaction network will be beneficial forseveral research directions.• It will help in understanding the flow of money at the

microscopic level as well as how this contributes towardsthe macroscopic money transaction system.

• It will improve the downstream tasks in the financialdomain with guidance from the topological perspective.Some representative tasks including fraud detection anduser classification in transaction networks.

• It will shed light on financial simulator design. Existingsimulators focus on individual customer behavior, whileour analysis can provide complementary information aboutcollective behaviors of group users.

The existing studies on financial network analysis focus ontrade network or inter-bank transaction networks that showdifferent patterns from intra-bank transaction data. Most of the

studies on the financial domain, either make use of a financialsimulator to generate synthetic transactions, e.g., PaySim [6],or concentrate on specific downstream tasks such as anomalydetection [7]. In this paper, we aim to bridge the gap bypresenting a detailed analysis of unweighted and weightednetworks constructed from a data set of the money transactionsof more than 1.6 million users of Rabobank.

This paper analyzes unweighted and weighted, large-scale,one-to-one bank transaction networks. We follow the ‘cookbookapproach’ for performing a systematic analysis of basic andmore advanced network characteristics, which has also beenfollowed in previous network analysis work [2]. We explainthe characteristics of the dataset and networks in Section II.We study some of the basic network characteristics in SectionIII. We explore meso-scale structural properties of the network,including k-cliques and the coupling of link weight with thenetwork topology in Section IV. We briefly highlight centrality-based roles of nodes and edges in Section V. We presenta comparative analysis of the unique characteristics of thetransaction network with other scale-free networks in SectionVI and, finally, discuss our findings and future directions inSection VII.

Our main contributions are mentioned below.

1) The anonymized banking transaction data of users. Asper the best of our knowledge, this will be the firstpublicly available dataset of users’ banking transactions.The dataset is available at https://github.com/akratiiet/RaboBank Dataset.

2) We perform a detailed analysis of unweighted andweighted banking transaction networks and highlight thesimilarities and differences of these networks with othertypes of networks. As per the best of our knowledge,this is the first work to analyze intra-bank transactionnetwork. We hope the shared data and this work willhelp further help in understanding the evolution of moneyflow in society.

II. DATASET AND NETWORKS

The dataset is collected from the Cooperatieve RabobankU.A.1, a Dutch multinational banking and financial services

1https://www.rabobank.nl/

1

arX

iv:2

109.

1070

3v1

[cs

.SI]

22

Sep

2021

Page 2: The Banking Transactions Dataset and its Comparative ...

company. This dataset consists of bank accounts and transac-tions between them. For any pairs of accounts with one or moretransactions, we also collected more information, includingthe numbers of transactions between two accounts and thetotal amount of money transferred from one account to anotherover a period of 11 years from 2010 to 2020. Thus, it canbe organized into a transaction network, either unweighted orweighted. The data was shared for 1,624,030 bank accountsand 4127043 transactions based on (from account, to account)pair.

We create the network from this data, having 1,624,030 nodeswhich are accounts, and 3,823,167 edges which represent thatthe respective users performed one or more transactions. Next,we identify the weakly connected components in the network,and it contains 723 connected components, where the largestweakly connected component contains 1,622,173 nodes and3,821,514 edges. The information of connected componentsis: 1 component has 1,622,172 nodes, 3 components have 27,15, and 13 nodes, and the rest of the components have lessthan ten nodes. The weakly largest connected component willbe referred to as G(V,E) where V is the set of node and Eis the set of edges. In our analysis, we mainly consider theweakly largest connected component (WLCC) if not mentionedotherwise.

Considering the transaction information, we create twoweighted networks (i) edge-weight is the total amount of moneytransferred between two accounts, and (ii) edge-weight is thetotal number of transactions between two accounts. The firstnetwork will be referred to as GT , and the second will bereferred to as GN . If there are more than one rows for a(from account, to account) pair in the banking data, then theweights are added to compute the edge-weight in GT and GN

networks.Newman categorized the real-world scale-free networks

into four main categories: (i) Biological, (ii) Information, (iii)Technological, and (iv) Social [8]. The banking transactionnetwork falls under the category of Information Network.

III. BASIC NETWORK CHARACTERISTICS

We show a small subgraph of the network in Fig. 1 that issampled using snowball sampling [9]. The sample has beenextracted from the network by picking a source node at random(shown with black color) and including all nodes in the samplethat are within a topological distance of l = 3 from the sourcenode. The node size and color are scaled based on the totaldegree (in-degree + out-degree) of the node. It appears fromthis figure that the network consists of a few hubs that arewell connected with each other and small clusters of nodes aspreviously observed in other scale-free networks [2], [10].

To further understand network structure, we show in Fig. 2the average number of nodes Ns(l) as a function of distance l,where Ns(l) represents the number of nodes within a distancel from a given source node. The plots are shown for severalchoices of the source node using solid colored lines, and theaverage value of Ns(l) for uniformly sampled 0.1% sourcenodes is shown by the black dashed line. The Ns(l) is computed

for a small sample due to the high computational complexity.To a good approximation, we analyze that the curve followsthe Boltzman equation B+ (A−B)/(1+ (x/x0)

p), where A,B, x0, and p are the parameters [11].

We further analyze the macro-scale properties of thenetwork. In Fig. 3, we show the cumulative in- and out-degree distribution of the weakly largest connected component(WLCC). The cumulative degree distribution P>(k) is definedas P>(k) =

∫∞kp(x)dx, where p(x) is the degree probability

density function. The cumulative degree distribution P>(k) forthe WLCC is approximated well by the power-law equation ofthe form P (k) = c∗(k)(−γ), having the exponent γ ≈ 1.43 and1.13 for the cumulative in-degree and out-degree distribution,respectively. Similar to the communication network [2], thisnetwork also has a rapidly decaying degree distribution wherehubs are few, and therefore this network follows the propertiesof traditional scale-free networks such as anomalous diffusion.

For a better understanding of the network, we plot theout-degree versus in-degree in Fig. 4; the plot shows thatthe out-degree is not correlated with in-degree and has theSpearman correlation coefficient −0.15. It indicates that thereis no correlation between in-degree and out-degree. Thatis reasonable because most accounts that belong to regularcustomers have more transfer-out operations, e.g., purchasingproducts, than transfer-in operations, e.g., receiving salaries.

By taking into consideration the transaction information,i.e., times of transactions and amount of money, we constructtwo weighted networks, GT , and GN . The cumulative edge-weight distributions are shown in Fig. 5. We also analyze thecorrelation of edge-weights in both the networks that denote thedependency of the total transferred amount with the number oftransactions. In specific, we show the correlation for randomlysampled 10000 edges in Fig. 6, and observe that the Spearmancorrelation coefficient of both types of edge-weights is 0.71.The results show that the users have a high correlation betweenthe total amount transferred and the number of transactions fortransferring that amount over time.

The weights on edges contain richer information of trans-actions, so we compute the strength where the in-strengthsin(i) and out-strength sout(i) of a node i is the sum ofweights of incoming edges and out-going edges, respectively.For example, the in-strength of a node i is computed as,sin(i) =

∑(j,i)∈E wji, where wji is the weight of an edge

from node j to node i. We investigate the cumulative in- andout-strength distributions of both weighted networks, and theplots are shown in Fig. 7. As expected, both distributionsfollow the power law. The distributions of in-strength and out-strength are similar because in the process of data collection,we filter out special accounts such as gas stations that havemuch higher in-degrees than out-degrees. In Fig. 8, we showthe out-strength versus in-strength for GT and GN networkshaving the Spearman correlation −0.08 and −0.09, respectively.We observe no correlation between out-strength and in-strengthas expected, as most outgoing transactions are made to buybasic necessities for daily life, including grocery shopping orgas station payments, and these transactions have already been

2

Page 3: The Banking Transactions Dataset and its Comparative ...

Fig. 1: A sample subgraph extracted from the black node (chosen uniformly at random) till distance l = 3.

Fig. 2: Average Number of nodes (Ns(l)) at distance l as afunction of distance (l) for uniformly sampled 0.1% sourcenodes (dashed black lines) and the number of nodes at distancel for some random nodes (solid thin lines).

removed from the considered data. However, further detailedanalysis will help in highlight the fraction of expenses fordifferent types; this is out of scope for this work due to theunavailability of this data.

We further analyze the average In-strength vs. In-degreeand average out-strength vs. out-degree for understanding the

Fig. 3: Cumulative In-degree and out-degree distribution.

correlation of average edge-weights from lower to higher degreenodes. The results are shown in Fig. 9. We observe that theaverage in/out-strength increases with in/out-degree till a certainrange (∼ 100), and after that, no clear correlation is observedin GT network. The reason is that a user may have incomingor outgoing transactions with many users, but the transactionswith those users might be performed for a small amount. Forexample, a user providing a paying-guest facility may havemany incoming transactions, but each user will transfer a

3

Page 4: The Banking Transactions Dataset and its Comparative ...

Fig. 4: Out-degree versus In-degree for WLCC.

Fig. 5: Cumulative edge-weight distribution for GT and GN .

Fig. 6: Correlation of edge-weights for randomly sampled10000 edges in two networks GT and GN . The two weightsare clearly correlated having Spearman’s coefficient 0.71.

Fig. 7: Cumulative In-strength and Out-strength distribution.

Fig. 8: Out-strength versus In-strength for GT network in redcolor and for GN network in blue color.

(i) GT

(ii) GN

Fig. 9: Average In-strength vs. In-degree and average out-strength vs. out-degree.

small amount as it is rare that a person stays at one placequite often. Due to different types of accounts having a higherdegree, no clear correlation is observed for the higher degreenodes. However, the average in/out-strength increases within/out-degree in GN network and has a higher correlation.

A. Neighborhood Analysis

For further understanding, we analyze the characteristicsof the local neighborhood of nodes. We first compute theassortativity of degree and strength of the network that measuresthe similarity of connections in the network with respectto the node degree or strength, respectively [12], [13]. Theassortativity values are mentioned in Table I, and neither

4

Page 5: The Banking Transactions Dataset and its Comparative ...

Assortativity ValueIn-degree Assortativity -0.018Out-degree Assortativity -0.011In-strength Assortativity GT -0.004Out-strength Assortativity GT -0.005In-strength Assortativity GN -0.005Out-strength Assortativity GN -0.003

TABLE I: In and Out- Degree and Strength Assortativity valuesfor different networks.

Fig. 10: Average neighbor In-degree versus In-degree andaverage neighbor out-degree versus out-degree.

degree or strength assortativity nor dis-assortativity is observed.These results are not in correlation with the assortativityobserved in other scale-free networks, such as phone call basedcommunication network [2] or the Internet network [14].

We plot the average neighbor In-degree versus In-degree⟨kinnn|kin

⟩and average neighbor out-degree versus out-degree

〈koutnn |kout〉 in Fig. 10, and observe that the most of the usershave a high average neighbor degree irrespective of in/out-degree of a user. The average neighbor In-strength versus In-strength

⟨sinnn|sin

⟩and average neighbor out-strength versus

out-strength 〈soutnn |sout〉 for GT and GN networks are shownin Fig. 11 and no correlation is observed due to the basicnetwork characteristics (as explained for Fig. 9).

We compute the clustering coefficient of nodes that is theratio of the number of closed triads in the neighborhood of anode with the total possible triangles. For unweighted networks,the clustering coefficient of a node i is computed as [15],

CCi =1

ktoti (ktoti − 1)− 2kriTu (1)

where ktot is the sum of in-degree and out-degree of node i,kri is the reciprocal degree of node i, and Tu is the number ofdirected triangles through node i. For weighted networks, theclustering coefficient is computed as [16],

CCi =1

ktoti (ktoti − 1)− 2kri

∑t∈Tdir

ijk

(wt)1/3 (2)

where T dirijk is the set of all directed triangle including node i,and wt for a triangle is computed as wijwjkwki if the trianglehas edges (i, j), (j, k), and (k, i) edges, where wij is thenormalized edge-weight by that is computed by dividing theedge-weight wij by the maximum weight in the network.

(i) GT

(ii) GN

Fig. 11: Average Neighbor In-strength versus In-strength andaverage neighbor out-strength versus out-strength.

The average clustering coefficient versus degree is plotted inFig. 12. The clustering coefficient decreases with the totaldegree of nodes. It shows that when a user makes moretransactions with different accounts, it is less probable thatthe neighboring accounts will make a transaction with eachother. The clustering coefficient follows the same patternas observed in other large-scale scale-free unweighted andweighted networks [2], [17].

In Table II, we summarize the characteristics of the con-sidered networks that include the minimum, maximum, mean,standard deviation, skewness2, and Kurtosis3 values of variousbasic network properties.

IV. MESO-SCALE NETWORK CHARACTERISTICS

In this section, we analyze the meso-scale characteristicsof the network. The macro-scale characteristics focus on theproperties of the network, and micro-scale characteristicsanalyze the properties of the nodes; however, meso-scalecharacteristics mainly focus on the structure of the network.For example, how the nodes are connected, the evolution wasrandom or regulated by any evolving phenomenon, and so on.

First to analyze the local structure of the network, wecompare the presence of cliques of different orders as these are

2The skewness measures the asymmetry of the given probability distributionabout its mean.

3Kurtosis computes how heavily the tail of the distribution differs from thetail of a normal distribution.

5

Page 6: The Banking Transactions Dataset and its Comparative ...

Min Max Mean Std Skewness Kurtosis

GIn-degree 0 2.04× 104 2.36 32.66 2.17× 102 1.05× 105

Out-degree 0 1.27× 104 2.36 20.26 3.35× 102 1.84× 105

Clustering Coefficient 0 1 0.02 0.12 7.29 53.95

GT

In-strength 0 1.57× 1010 1.54× 105 1.34× 107 1.01× 103 1.18× 106

Out-strength 0 4.04× 109 1.54× 105 6.53× 106 4.71× 102 2.61× 105

Edge-weight 1 3.35× 109 6.56× 104 2.75× 106 8.99× 102 9.43× 105

Weighted Clustering Coefficient 0 2.72× 10−3 2.31× 10−7 7.30× 10−6 1.62× 102 4.22× 104

GN

In-strength 0 5.04× 106 68.16 4.08× 103 1.17× 103 1.44× 106

Out-strength 0 3.11× 106 68.16 2.94× 103 8.19× 102 8.01× 105

Edge-weight 1 2.58× 104 28.93 1.19× 102 34.02 3.24× 103

Weighted Clustering Coefficient 0 5.47× 10−2 1.41× 10−5 1.83× 10−4 71.15 1.24× 104

TABLE II: Summary of descriptive network statistics.

Fig. 12: Average clustering coefficient versus total degree.

considered the basic building blocks and moderate differentfunctionalities of the network [18], [19]. We compare thecliques of order k = 1, 2, ..., 8 with random networks in whichthe probability to have a connection is the same for all pairs ofnode. The random networks are created using Erdos-Renyi (ER)network generative model [20]. The link formation probabilityin the ER graph is computed as, p = 2m/(n(n− 1)), and theexpected number E[X] of cliques with k nodes is computed

as E[X] =

(nk

)(k!/a) pl, where l = k(k − 1)/2 and a = k!

is the number of automorphic graphs, those are isomorphic tonode-edge order preserved permutation of the graph [21]. Thecliques are counted by considering the undirected version ofthe network, and the results are shown in Table III. Note thatk = 1 corresponds to the number of nodes n = 1622173 andk = 2 to the number of edges m = 3318903, which are thesame in the Rabobank and random network. The presence ofhigh-order cliques (cliques beyond order three) in the Rabobanknetwork makes it starkly different from a random network thathas a very low probability of high-order cliques being present.

We also compare the reciprocity [22] of the network withrandom networks. The reciprocity of a directed network isdefined as the ratio of the number of edges pointing in bothdirections to the total number of edges in the network; itis computed as, r = |(i,j)∈G|(j,i)∈G|

|E| . The expected value ofreciprocity in an Erdos-Renyi directed network is computedas E[r] = m

n(n−1) , where n is the number of nodes and m isthe number of edges. The network G has reciprocity 0.26 thatis much higher than the expected value of the reciprocity in a

Order Empirical Count ER Expectation1 1622173 16221732 3318903 33189033 417461 11.424 34638 7.43× 10−11

5 3815 9.76× 10−28

6 417 2.70× 10−50

7 26 1.61× 10−78

8 0 1.12× 10−112

TABLE III: Number of cliques of order k = 1, 2, ..., 8in the undirected Rabobank network (empirical count) andtheir expectation values in a corresponding ER network (ERexpectation).

random network that is 1.05× 10−6.To further understand how the users are connected in a

group and how these groups are further connected with eachother, we analyze the community structure of the network.For the Rabobank data, the information of ground truthcommunities is not known. However, the unavailability of theground truth data is a well-known issue in analyzing networkdata, and the researchers have proposed several methods toidentify communities using the structural properties of thenetwork. We use the Leiden community detection method[23] that is an extension of the Louvain community detectionmethod to identify well-connected communities in directedweighted/unweighted networks. The Leiden algorithm startsfrom a singleton partition and executes in three phases (i) movenodes from one community to another to identify a partition,(ii) refinement of the partition, and (iii) create an aggregatednetwork based on the refined partition using the non-refinedpartition. The algorithm is applied iteratively until there is nofurther improvement. The implementation for the communitydetection and their evaluation is used provided by [24].

The distribution of the number of communities versuscommunity size is shown in Fig. 13. We observe that thereare several small-size communities and a few big-size com-munities as observed in several scale-free networks [25]. Thecommunities for network G are linearly binned with bucketsize 100. In Table IV, we discuss the properties of the observedcommunities. The first row tells the number of communities;next, we discuss the min, max, mean and std. of communitysize. The communities are further evaluated using (i) ModularDensity [26] that explains the expected density of community,

6

Page 7: The Banking Transactions Dataset and its Comparative ...

Fig. 13: No. of Communities versus community size.

(ii) Hub Dominance [27] that defines the ratio of the degreeof the highest connected node with respect to the theoreticallymaximal degree within the community, (iii) Conductance [28]that computes the fraction of total edges that point outside thecommunity, and (iv) Normalized-cut [28] that is the normalizedvalue of the fraction of existing edges (out of all possibleedges) leaving the community. The results show that the banktransaction networks contain community structure as observedin other networks, such as social networks, co-authorshipnetworks, financial networks, and so on [25]. However, thelimitation of this analysis is that it is entirely based on theLeiden community detection method due to the unavailabilityof the ground-truth data.

Apart from community structure, we analyze the core-periphery structure of the network that is another well-known meso-scale network structure [29], [30]. In a bankingtransaction network, the core-periphery network will provideinsights into the flow of money from peripheral to more centralnodes. The core-Periphery structure is analyzed using k-shelldecomposition algorithm [31]. In the k-shell method, first, weremove all nodes of degree (in-degree + out-degree) 1 untilthere is no node of degree one and assign them shell-index(kS) 1. Then iteratively, we remove nodes of degree 2, 3, 4, ...until all nodes have a shell-index value. While removing nodesof degree k if any node having degree less than k appears,it is also removed in the same iteration and is assigned shellindex ks = k. In Fig. 14, we show the distribution of the nodesas we move from the outer peripheral layer to the inner corelayer. The inner layers have a very small number of nodes.

Metric/Network G GT GN

#Communities 219 1434 685Size-Min 3 2 2Size-Max 129545 173698 200910Size-Mean 7407.18 1131.22 2368.14Size-Std. 17927.49 7456.09 12261.84Modularity Density 418.99 1557.24 998.55Hub Dominance 0.86 1.22 1.26Conductance 0.10 0.22 0.17Normalized-cut 0.10 0.22 0.17

TABLE IV: Characteristics of communities identified usingLeiden Community Detection method.

Fig. 14: Distribution of nodes from periphery to core.

In network G, the highest shell-index is 46 shared by 2,871nodes that is a very small fraction of all nodes. The resultsare similar to as observed in other networks where only a fewnodes acquire the central position in a network [30].

The structural analysis of the network using k-clique,reciprocity, community, and core-periphery structure, showsthat the network is not random and follows the structuralproperties observed in other real-world large-scale scale-freenetworks [2], [32], [33].

V. CHARACTERISTICS OF NODES AND EDGES

We briefly analyze the characteristics of nodes and edgesusing centrality measures [34] to observe their behavior incorrelation with the scale-free network structure. We firstanalyze the betweenness centrality of edges to study therole of a transaction in the money flow. The betweennesscentrality of an edge is computed based on the number ofshortest paths passing through that edge [35]. The algorithm iscomputationally costly, we therefore u.a.r. sample a subgraphof 10,000 nodes having 59832 edges and show the cumulativedistribution of the link betweenness centrality in Fig. 15. Theresults show that it is very less probable to have a highbetweenness centrality, confirming that most of the transactionshappen within groups and a few transactions happen with therest of the network.

Fig. 15: The cumulative edge betweenness centrality for thesampled graph having 10k nodes.

7

Page 8: The Banking Transactions Dataset and its Comparative ...

Fig. 16: The cumulative PageRank distribution on network G.

To analyze the importance of nodes, we compute PageRankof network G, and its cumulative distribution is shown inFig. 16. The network has a few nodes having very highPageRank and many nodes having lower PageRank value,as also observed in other scale-free networks [36], [10].The experimental analysis of link betweenness centralityand PageRank concludes that a few transactions and a fewnodes have much higher importance than others in the banktransaction network. The one limitation of this analysis is thatthe comparative study of nodes’ and edges’ characteristicswith their attributes couldn’t be performed due to the limitedinformation available in anonymized dataset.

VI. TRANSACTION NETWORK VS. SCALE-FREE NETWORKS

The banking transaction network follows the characteristicsof scale-free networks; still, it has some clear similaritiesand differences with other scale-free real-world networks. Thedegree distribution, strength distribution, edge-weight followspower law as observed in other social, information, biologicalor technological networks [10]; however, there is no correlationbetween in-degree vs. out-degree, and in-strength vs. out-strength as observed in other Information networks, suchas WWW [37]. In social and Information (transportation)networks, the strength of nodes increases with degree [2],[38]; however, the banking transaction network has differentpattern due to the nature of the network evolution (pleaserefer to Section III). These basic characteristics show thatneighborhood connectivity is different from other kinds ofnetworks. The network is neither assortative nor disassortative,as we observe in most of the other scale-free networks includingtrade and finance networks [12], [17]; therefore, no correlationwith the neighborhood nodes’ degree is observed.

The evolution of the network is not random and regulatedby an underlying evolving mechanism. The network hascommunity, and hierarchical structure [25], [30] as observed inother scale-free networks as shown in Section IV. One anothermain difference that we observed is the correlation of weak tieswith edge-weights. In social networks, the weak ties, or alsoreferred to as bridges, are the connections between communitiesand therefore have a lower edge-weight as the social interactionbetween people belonging to two different communities is notvery strong [2]. However, in the banking transaction network,

no such correlation is observed. The transactions that happenedbetween users belonging to different communities might havelower or higher strength based on the transferred amountand the total number of transactions. The evolving modelfor banking transaction networks is still an open question, andthe above statistical observations mentioning the similaritiesand differences will help pin down the underlying evolvingmechanism.

VII. CONCLUSION

In this work, we constructed an unweighted network frombank transaction records of Rabobank and two weightednetworks where edge-weights are assigned using the aggregatedamount of transactions and the total number of transactionsfrom 2010 to 2020. The network is useful in understandingthe flow of money in society as it is derived from one-to-onemoney transactions. To our knowledge, it is the first intra-bank transaction network studied so far and will be the firstbank transaction dataset that will be publicly available. One ofour aims was to explore the relationship between the networktopology and the money flow based on the associated weights.The analysis of k-cliques and reciprocity shows that the networkevolution is not random. To further analyze, we study thenetwork’s community structure that shows that the users areorganized in the smaller groups making more transactionsbetween them and fewer transactions outside the group.

We believe that this bank transaction network and ouranalysis of the network’s characteristics can help solve somepractical tasks in the financial domain. For instance, in anomalydetection, the deviation of anomalous users’ transaction behav-ior from structural characteristics of normal users might help inthe early detection of suspicious and anomalous users. One canfurther analyze the mutual bank transaction network havingconnections if both end-users make transactions to each otheras it will work as a good proxy to understand the underlyingtrust-based social network and its impact on the money flow.

REFERENCES

[1] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani. Dynamicalprocesses on complex networks. Cambridge university press, 2008.

[2] Jukka-Pekka Onnela, Jari Saramaki, Jorkki Hyvonen, Gabor Szabo,M Argollo De Menezes, Kimmo Kaski, Albert-Laszlo Barabasi, andJanos Kertesz. Analysis of a large-scale weighted network of one-to-onehuman communication. New journal of physics, 9(6):179, 2007.

[3] James Atwood, Hansa Srinivasan, Yoni Halpern, and David Sculley. Fairtreatment allocations in social networks. arXiv preprint arXiv:1911.05489,2019.

[4] Ryan Miller, Ralucca Gera, Akrati Saxena, and Tanmoy Chakraborty.Discovering and leveraging communities in dark multi-layered networksfor network disruption. In 2018 IEEE/ACM International Conference onAdvances in Social Networks Analysis and Mining (ASONAM), pages1152–1159. IEEE, 2018.

[5] Edwin K Silverman, Harald HHW Schmidt, Eleni Anastasiadou, LuciaAltucci, Marco Angelini, Lina Badimon, Jean-Luc Balligand, GiudittaBenincasa, Giovambattista Capasso, Federica Conte, et al. Molecularnetworks in network medicine: Development and applications. WileyInterdisciplinary Reviews: Systems Biology and Medicine, 12(6):e1489,2020.

[6] Edgar Lopez-Rojas, Ahmad Elmir, and Stefan Axelsson. Paysim: Afinancial mobile money simulator for fraud detection. In 28th EuropeanModeling and Simulation Symposium, EMSS, Larnaca, pages 249–255.Dime University of Genoa, 2016.

8

Page 9: The Banking Transactions Dataset and its Comparative ...

[7] Yulong Pei, Fang Lyu, Werner van Ipenburg, and Mykola Pechenizkiy.Subgraph anomaly detection in financial transaction networks. In ACMInternational Conference on AI in Finance, 2020.

[8] Mark EJ Newman. The structure and function of complex networks.SIAM review, 45(2):167–256, 2003.

[9] Sang Hoon Lee, Pan-Jun Kim, and Hawoong Jeong. Statistical propertiesof sampled networks. Physical review E, 73(1):016102, 2006.

[10] Mark Ed Newman, Albert-Laszlo Ed Barabasi, and Duncan J Watts. Thestructure and dynamics of networks. Princeton university press, 2006.

[11] Carlo Cercignani. The boltzmann equation. In The Boltzmann equationand its applications, pages 40–103. Springer, 1988.

[12] Mark EJ Newman. Assortative mixing in networks. Physical reviewletters, 89(20):208701, 2002.

[13] Mark EJ Newman. Mixing patterns in networks. Physical review E,67(2):026126, 2003.

[14] Romualdo Pastor-Satorras, Alexei Vazquez, and Alessandro Vespignani.Dynamical and correlation properties of the internet. Physical reviewletters, 87(25):258701, 2001.

[15] Giorgio Fagiolo. Clustering in complex directed networks. PhysicalReview E, 76(2):026107, 2007.

[16] Jari Saramaki, Mikko Kivela, Jukka-Pekka Onnela, Kimmo Kaski, andJanos Kertesz. Generalizations of the clustering coefficient to weightedcomplex networks. Physical Review E, 75(2):027105, 2007.

[17] Stefano Schiavo, Javier Reyes, and Giorgio Fagiolo. International tradeand financial integration: a weighted network analysis. QuantitativeFinance, 10(4):389–399, 2010.

[18] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, DmitriChklovskii, and Uri Alon. Network motifs: simple building blocks ofcomplex networks. Science, 298(5594):824–827, 2002.

[19] Nadav Kashtan, S Itzkovitz, R Milo, and U Alon. Topologicalgeneralizations of network motifs. Physical Review E, 70(3):031909,2004.

[20] Paul Erdos, Alfred Renyi, et al. On the evolution of random graphs.Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.

[21] Bela Bollobas and Bollobas Bela. Random graphs. Number 73.Cambridge university press, 2001.

[22] Franco Ruzzenenti, Diego Garlaschelli, and Riccardo Basosi. Complexnetworks and symmetry ii: Reciprocity and evolution of world trade.Symmetry, 2(3):1710–1744, 2010.

[23] Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From louvainto leiden: guaranteeing well-connected communities. Scientific reports,9(1):1–12, 2019.

[24] Giulio Rossetti, Letizia Milli, and Remy Cazabet. Cdlib: a python libraryto extract, compare and evaluate communities from complex networks.Applied Network Science, 4(1):1–26, 2019.

[25] Alex Arenas, Leon Danon, Albert Diaz-Guilera, Pablo M Gleiser, andRoger Guimera. Community analysis in social networks. The EuropeanPhysical Journal B, 38(2):373–380, 2004.

[26] Shihua Zhang, Xue-Mei Ning, Chris Ding, and Xiang-Sun Zhang.Determining modular organization of protein interaction networks bymaximizing modularity density. In BMC systems biology, volume 4,pages 1–12. BioMed Central, 2010.

[27] Andrea Lancichinetti, Mikko Kivela, Jari Saramaki, and Santo Fortunato.Characterizing the community structure of complex networks. PloS one,5(8):e11976, 2010.

[28] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmenta-tion. IEEE Transactions on pattern analysis and machine intelligence,22(8):888–905, 2000.

[29] Stephen P Borgatti and Martin G Everett. Models of core/peripherystructures. Social networks, 21(4):375–395, 2000.

[30] Akrati Saxena and SRS Iyengar. Evolving models for meso-scalestructures. In 2016 8th International Conference on CommunicationSystems and Networks (COMSNETS), pages 1–8. IEEE, 2016.

[31] Stephen B Seidman. Network structure and minimum degree. Socialnetworks, 5(3):269–287, 1983.

[32] Yihjia Tsai, Cheng-Chin Lin, and Ching-Chang Lin. Characteristics ofweighted email communications. In 2005 5th International Conferenceon Information Communications & Signal Processing, pages 399–403.IEEE, 2005.

[33] Akrati Saxena. A survey of evolving models for weighted complexnetworks based on their dynamics and evolution. arXiv preprintarXiv:2012.08166, 2020.

[34] Akrati Saxena and Sudarshan Iyengar. Centrality measures in complexnetworks: A survey. arXiv preprint arXiv:2011.07190, 2020.

[35] Ulrik Brandes. On variants of shortest-path betweenness centrality andtheir generic computation. Social Networks, 30(2):136–145, 2008.

[36] M Zubair Shafiq, Muhammad U Ilyas, Alex X Liu, and Hayder Radha.Identifying leaders and followers in online social networks. IEEE Journalon Selected Areas in Communications, 31(9):618–628, 2013.

[37] Yihong Hu and Daoli Zhu. Empirical analysis of the worldwidemaritime transportation network. Physica A: Statistical Mechanics andits Applications, 388(10):2061–2071, 2009.

[38] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani. Modelingthe evolution of weighted networks. Physical review E, 70(6):066149,2004.

9