Exploring the Blockchain - GitHub Pages · 2020-01-06 · We downloaded an ethereum client called...


Adrianna Diaz Siddhartha Kattoju

Exploring the Blockchain


The Ethereum blockchain

In February, 1 Ether ~ $800 USD
Now, 1 Ether ~ $400 USD

● Decentralized
● Global distributed ledger
● Programmable via smart contracts
● Each state change requires a cryptographically verified transaction.


Data Preparation

● Blockchain data is stored on each node as Merkle trees in LevelDB files
● Data can be queried through the Ethereum client's JSON-RPC API over HTTP
○ We expected this would be slower than reading the data from disk
● We downloaded an Ethereum client called geth and synced the blockchain from the network.
○ ~5 million blocks, thousands of transactions per block
○ ~27,800 files, ~2.06 MB each
● Blockchain data was then exported into RLP-encoded binary files.
● We prepared binary files containing increasingly large numbers of blocks to aid in developing our code
○ 100, 200, 500, 1k, 2k, 5k, 10k, 20k, 50k, 100k, 200k ...
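The binary export format is RLP (Recursive Length Prefix), Ethereum's serialization scheme for nested byte arrays. The sketch below illustrates the encoding rules only. It is not the code used in this project, and the function names are made up for the example.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Prefix for a payload of `length` bytes (offset 0x80 for strings, 0xC0 for lists)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes, non-negative ints, or (nested) lists of those."""
    if isinstance(item, int):
        # Integers become big-endian bytes without leading zeros; 0 is the empty string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte is its own encoding
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


print(rlp_encode([b"cat", b"dog"]).hex())
```

A block is a nested list of such items, so a reader has to walk the length prefixes to find where each block ends.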


Dataframe


timestamp: long (nullable = false)

number: decimal(38,0) (nullable = true)

value: decimal(38,0) (nullable = true)

receiveAddress: binary (nullable = true)

sendAddress: binary (nullable = true)

hash: binary (nullable = true)
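As a quick sanity check on the schema, assuming the `value` column holds amounts in wei (Ethereum's base unit, 10^18 wei per ether), the arithmetic below compares the range of `decimal(38,0)` with a full 256-bit integer. The variable names are ours.

```python
WEI_PER_ETHER = 10**18

max_decimal_38 = 10**38 - 1   # largest value a decimal(38,0) column can hold
max_uint256 = 2**256 - 1      # largest value of a 256-bit unsigned integer

# A 256-bit value can need far more than 38 decimal digits...
print(len(str(max_uint256)))
# ...but 38 digits still cover this many whole ether.
print(max_decimal_38 // WEI_PER_ETHER)
```

So `decimal(38,0)` cannot hold every possible 256-bit value, but it leaves a very wide margin for ether amounts.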


Issues encountered

Integer Overflow

● Basic unit is Wei: 10^18 Wei = 1 Ether
● Transaction value as a Java long: 64 bits, signed, -2**63 to 2**63 - 1
● Internal representation: 256-bit integer

Solution:

● Replace longs with Big Decimal in the library used to read the blockchain data. We contacted the developers and they pushed a fix…
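To see how little headroom a signed 64-bit value leaves, the sketch below simulates two's-complement wraparound in Python. The helper `as_java_long` is hypothetical, and the library's actual failure mode may have differed (for example, an exception instead of a silent wrap).

```python
WEI_PER_ETHER = 10**18
LONG_MAX = 2**63 - 1


def as_java_long(value: int) -> int:
    """Wrap an integer the way two's-complement 64-bit arithmetic would."""
    wrapped = value & (2**64 - 1)
    return wrapped - 2**64 if wrapped > LONG_MAX else wrapped


ten_ether_in_wei = 10 * WEI_PER_ETHER

# Largest ether amount that fits in a signed long: a little over 9 ether.
print(LONG_MAX / WEI_PER_ETHER)
# A 10 ether transfer no longer fits and wraps to a negative number.
print(as_java_long(ten_ether_in_wei))
```

Any transaction above roughly 9.22 ether is therefore unrepresentable as a long, which is why an arbitrary-precision type is needed.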


Issues encountered (2)

Bouncy Castle

● Some fields had to be computed using the Bouncy Castle crypto library. Spark 2.2 shipped an older version of this library, which conflicted with the one used by the HadoopCryptoLedger library

Solutions:

● Create a “shaded jar” that included the classes needed under a different package name.

● Migrate to Spark 2.3, although Compute Canada did not offer it by default until mid-March (2.3 was released on February 28).


Issues encountered (3)

Intermittent issues with Slurm

● We ran a number of small-ish jobs (~1 hr, 2k blocks) that kept timing out.
● Ultimately we found out that the scheduler was sometimes down or too busy to serve our requests on the weekends.

Solution:

● Wait till it became available again.


Graph Analysis

Spark GraphFrames package

● A GraphFrame consists of vertices and edges
● Vertices are Ethereum account addresses
● Edges are pairs of addresses in each transaction.

Algorithms applied

● Connected Components
● PageRank
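GraphFrames provides both algorithms as built-ins. The plain-Python sketch below only illustrates what they compute, on a toy edge list. The account names, the damping factor, and the iteration count are assumptions for the example, and GraphFrames' own implementation may normalize ranks differently.

```python
def connected_components(edges):
    """Label each vertex with a component representative, ignoring edge direction."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for src, dst in edges:
        parent[find(src)] = find(dst)
    return {v: find(v) for v in list(parent)}


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; ranks sum to 1."""
    out_links = {}
    vertices = set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        vertices.update((src, dst))
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if v not in out_links)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Toy transactions as (sender, receiver) pairs; the names are made up.
edges = [
    ("alice", "exchange"), ("bob", "exchange"), ("carol", "exchange"),
    ("exchange", "alice"), ("dave", "erin"),
]
labels = connected_components(edges)
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))
```

In this toy graph the account that receives from many senders ends up with the highest rank, which is the pattern one would expect from an exchange or a popular contract.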


Connected Components among Transactions larger than 100 ETH
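Selecting the edges for this graph amounts to a threshold filter on transaction value. A minimal sketch, assuming values are stored in wei and using made-up transactions:

```python
WEI_PER_ETHER = 10**18
THRESHOLD_WEI = 100 * WEI_PER_ETHER  # 100 ETH expressed in wei

# (sender, receiver, value in wei); hypothetical sample data
transactions = [
    ("a", "b", 250 * WEI_PER_ETHER),
    ("b", "c", 3 * WEI_PER_ETHER),
    ("c", "d", 0),
    ("d", "a", 100 * WEI_PER_ETHER),
]

# Keep only transfers strictly larger than the threshold.
large_edges = [(s, r) for s, r, value in transactions if value > THRESHOLD_WEI]
print(large_edges)
```

Comparing in wei with integers avoids any rounding that a floating-point ether amount could introduce.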


The Highest PageRank in the Largest Connected Component was a Cryptocurrency Exchange


Connected Components among 0 ETH Transactions


The Highest PageRank in the Largest Connected Component was an EOS Token Contract


Decreasing Running Time

BOTTLENECK

● We noticed that a significant amount of processing time was spent reading from the binary file and populating our dataframe

SOLUTION:

● We decided to preprocess the data in a separate job that writes only the relevant fields to a CSV file.

● This resulted in about a 6x improvement in the time it took to run our algorithms.
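The preprocessing step can be sketched as follows: decode the binary data once, write only the needed columns to CSV, and let later jobs read the CSV. The field names follow the schema shown earlier. Hex-encoding the binary fields and the sample record are our assumptions, not the project's actual code.

```python
import csv
import io

FIELDS = ["timestamp", "number", "value", "sendAddress", "receiveAddress", "hash"]
BINARY_FIELDS = ("sendAddress", "receiveAddress", "hash")


def write_transactions(records, out):
    """Write already-decoded transaction records to CSV, hex-encoding binary fields."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        for key in BINARY_FIELDS:
            row[key] = rec[key].hex()
        writer.writerow(row)


def read_transactions(src):
    """Read the CSV back, restoring integer and binary types."""
    for row in csv.DictReader(src):
        rec = {key: int(row[key]) for key in ("timestamp", "number", "value")}
        for key in BINARY_FIELDS:
            rec[key] = bytes.fromhex(row[key])
        yield rec


# Made-up record standing in for the output of the binary-file parser.
sample = [{
    "timestamp": 1520000000,
    "number": 5000000,
    "value": 150 * 10**18,
    "sendAddress": bytes(20),
    "receiveAddress": bytes([1] * 20),
    "hash": bytes([2] * 32),
}]

buffer = io.StringIO()  # stands in for a file on disk
write_transactions(sample, buffer)
buffer.seek(0)
restored = list(read_transactions(buffer))
```

The analysis jobs then skip the expensive decoding entirely and only pay for parsing text rows.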


Performance seems linear


Next Steps

● Apply PageRank and Connected Components algorithms to a larger sample of the dataset
● If possible, process the entire ~56 GB dataset
● Explore changes over time