
Adrianna Diaz Siddhartha Kattoju

Exploring the Blockchain

The Ethereum blockchain

In February: 1 Ether ~ $800 USD. Now: 1 Ether ~ $400 USD.

● Decentralized
● Global distributed ledger
● Programmable via smart contracts
● Each state change requires a cryptographically verified transaction.

Data Preparation

● Blockchain data is stored on each node as Merkle trees in LevelDB files.
● Data can be queried using the Ethereum client API via a REST interface.

○ We expected this would be slower than reading the data from disk.

● We downloaded an Ethereum client called geth and synced the blockchain from the network.

○ ~5 million blocks, thousands of transactions per block
○ ~27,800 files, ~2.06 MB each

● Blockchain data was then exported into RLP-encoded binary files.
● We prepared binary files containing increasingly larger numbers of blocks to aid in developing our code.

○ 100, 200, 500, 1k, 2k, 5k, 10k, 20k, 50k, 100k, 200k ...

DataFrame

timestamp: long (nullable = false)

number: decimal(38,0) (nullable = true)

value: decimal(38,0) (nullable = true)

receiveAddress: binary (nullable = true)

sendAddress: binary (nullable = true)

hash: binary (nullable = true)
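For reference, a minimal Scala sketch expressing this schema with Spark SQL types; the field names and types come from the slide, everything else is illustrative:

```scala
import org.apache.spark.sql.types._

// The transaction schema above, expressed with Spark SQL types.
// DecimalType(38, 0) holds integers of up to 38 digits, which comfortably
// covers real-world Wei amounts, unlike a signed 64-bit long.
val txSchema = StructType(Seq(
  StructField("timestamp", LongType, nullable = false),
  StructField("number", DecimalType(38, 0), nullable = true),
  StructField("value", DecimalType(38, 0), nullable = true),
  StructField("receiveAddress", BinaryType, nullable = true),
  StructField("sendAddress", BinaryType, nullable = true),
  StructField("hash", BinaryType, nullable = true)
))
```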

Issues encountered: Integer Overflow

● The basic unit is Wei: 10^18 Wei = 1 Ether.
● Transaction value as a Java long: 64 bits, signed, -2^63 to 2^63 - 1. Since 2^63 - 1 ≈ 9.2 × 10^18 Wei, any transaction above about 9.2 Ether overflows.
● The internal representation is a 256-bit integer.

Solution:

● Replace longs with BigDecimal in the library used to read the blockchain data. We contacted the developers and they pushed a fix…
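As an illustration of the idea behind the fix (not the library's actual patch), a 256-bit big-endian value can be decoded without overflow via BigInteger and BigDecimal; rawValue below is a hypothetical byte array:

```scala
import java.math.{BigDecimal, BigInteger}

// Hypothetical raw transaction value: 100 Ether in Wei (10^20), as a
// big-endian byte array like the one found in the RLP data.
val rawValue: Array[Byte] =
  new BigInteger("100000000000000000000").toByteArray

// Signum 1 treats the bytes as an unsigned magnitude, so values with the
// high bit set are not misread as negative -- the trap a signed long falls into.
val wei = new BigDecimal(new BigInteger(1, rawValue))

val ether = wei.movePointLeft(18) // 1 Ether = 10^18 Wei
println(s"$wei Wei = $ether Ether")
```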

Issues encountered (2): Bouncy Castle

● Some fields had to be computed using the crypto library Bouncy Castle. Spark 2.2 shipped an older version of this library, which conflicted with the one used by the hadoopcryptoledger library.

Solutions:

● Create a “shaded jar” that included the classes needed under a different package name (see the sketch after this list).

● Migrate to Spark 2.3, but Compute Canada didn’t have it by default until mid-March (2.3 was released on February 28).
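A minimal build.sbt sketch of the shading approach, assuming the sbt-assembly plugin is used; the relocated package name is illustrative, and a Maven shade-plugin relocation would work the same way:

```scala
// build.sbt (fragment) -- relocate Bouncy Castle classes in the fat jar so
// our copy cannot clash with the older version bundled with Spark 2.2.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.bouncycastle.**" -> "shaded.org.bouncycastle.@1").inAll
)
```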

Issues encountered (3): Intermittent issues with Slurm

● We ran a number of small-ish jobs (~1 hr, 2k blocks), and they were timing out.
● Ultimately we found out that the scheduler was sometimes down or too busy to serve our requests on the weekends.

Solution:

● Wait until it became available again.

Graph Analysis

Spark GraphFrames package

● A GraphFrame consists of vertices and edges.
● Vertices are Ethereum account addresses.
● Edges are the (sender, receiver) address pairs of each transaction.
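A minimal sketch of how such a graph can be assembled from the transaction DataFrame described earlier; the column handling (hex-encoding the binary addresses) is an assumption, not necessarily what we did:

```scala
import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame

// txs: the transaction DataFrame from earlier. GraphFrames expects the
// vertex id column to be named "id" and the edge endpoints "src"/"dst".
def buildGraph(txs: DataFrame): GraphFrame = {
  val edges = txs.selectExpr(
    "hex(sendAddress) AS src",    // sender account address
    "hex(receiveAddress) AS dst", // receiver account address
    "value"                       // amount transferred, in Wei
  )
  val vertices = edges.selectExpr("src AS id")
    .union(edges.selectExpr("dst AS id"))
    .distinct()
  GraphFrame(vertices, edges)
}
```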

Algorithms applied

● Connected Components
● PageRank
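Both algorithms are one-liners in GraphFrames. A sketch, with standard default parameters (the slide does not state the ones we used) and a hypothetical checkpoint path:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc
import org.graphframes.GraphFrame

def runAlgorithms(spark: SparkSession, g: GraphFrame): Unit = {
  // connectedComponents requires a checkpoint directory (path hypothetical).
  spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

  // Assigns each vertex a "component" id.
  val components = g.connectedComponents.run()

  // PageRank with the usual damping setup; vertices gain a "pagerank" column.
  val ranked = g.pageRank.resetProbability(0.15).maxIter(10).run()
  ranked.vertices.orderBy(desc("pagerank")).show(5)
}
```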

Connected Components among Transactions larger than 100 ETH

The Highest Page Rank in the Largest Connected Component was a Cryptocurrency Exchange

Connected Components among 0 ETH Transactions

The Highest Page Rank in the Largest Connected Component was an EOS Token Contract

Decreasing Running Time

BOTTLENECK:

● We noticed that a significant amount of processing time was spent reading from the binary files and populating our DataFrame.

SOLUTION:

● We decided to preprocess the data and write the relevant fields to a CSV file in a separate job (sketched below).

● This resulted in about a 6x improvement in the time it took to run our algorithms.
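A minimal sketch of that preprocessing job; loadTransactions and both paths are hypothetical stand-ins for the expensive binary-parsing step:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("PreprocessTxs").getOrCreate()

// The expensive step: parse the RLP binary files (hypothetical helper).
val txs = loadTransactions(spark, "/data/blocks-rlp")

// Keep only the fields the graph algorithms need. CSV cannot store raw
// bytes, so the binary columns are hex-encoded.
txs.selectExpr(
    "timestamp", "value",
    "hex(sendAddress) AS sendAddress",
    "hex(receiveAddress) AS receiveAddress",
    "hex(hash) AS hash")
  .write
  .option("header", "true")
  .mode("overwrite")
  .csv("/data/transactions-csv")

// Later jobs read the cheap CSV instead of re-parsing the binaries.
val cached = spark.read.option("header", "true").csv("/data/transactions-csv")
```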

Performance seems linear in the number of blocks processed

Next Steps

● Apply the PageRank and Connected Components algorithms to a larger sample of the dataset
● If possible, process the entire ~56 GB dataset
● Explore changes over time