Exploring the Blockchain - GitHub Pages · 2020-01-06 · We downloaded an ethereum client called...
Adrianna Diaz Siddhartha Kattoju
Exploring the Blockchain
The Ethereum blockchain
In February: 1 Ether ~ $800 USD
Now: 1 Ether ~ $400 USD
● Decentralized
● Global distributed ledger
● Programmable via smart contracts
● Each state change requires a cryptographically verified transaction.
Data Preparation
● Blockchain data is stored on each node as Merkle trees in LevelDB files
● Data can be queried through the ethereum client's JSON-RPC API over HTTP
○ We expected this would be slower than reading the data from disk
● We downloaded an ethereum client called geth and synced the blockchain from the network.
○ ~5 million blocks, thousands of transactions per block
○ ~27,800 files, ~2.06 MB each
● Blockchain data was then exported into RLP-encoded binary files.
● We prepared binary files containing increasingly large numbers of blocks to aid in developing our code
○ 100, 200, 500, 1k, 2k, 5k, 10k, 20k, 50k, 100k, 200k ...
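For readers unfamiliar with the export format: RLP (Recursive Length Prefix) encodes byte strings and nested lists with a short length prefix. The function below is a minimal decoding sketch of that scheme, not the library used in this project; the name `rlp_decode` is hypothetical, and it does no validation of malformed input.

```python
def rlp_decode(data, pos=0):
    """Decode one RLP item starting at pos; return (item, next_pos)."""
    prefix = data[pos]
    if prefix < 0x80:
        # A single byte below 0x80 is its own encoding.
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:
        # Short string: 0-55 bytes, length stored in the prefix.
        length = prefix - 0x80
        start = pos + 1
        return data[start:start + length], start + length
    if prefix < 0xC0:
        # Long string: the prefix says how many bytes hold the length.
        len_of_len = prefix - 0xB7
        start = pos + 1 + len_of_len
        length = int.from_bytes(data[pos + 1:start], "big")
        return data[start:start + length], start + length
    if prefix < 0xF8:
        # Short list: total payload of 0-55 bytes.
        length = prefix - 0xC0
        start = pos + 1
    else:
        # Long list: the prefix says how many bytes hold the payload length.
        len_of_len = prefix - 0xF7
        start = pos + 1 + len_of_len
        length = int.from_bytes(data[pos + 1:start], "big")
    end = start + length
    items = []
    while start < end:
        item, start = rlp_decode(data, start)
        items.append(item)
    return items, end


# The list ["cat", "dog"] encodes as 0xc8 0x83 'cat' 0x83 'dog'.
encoded = bytes([0xC8, 0x83]) + b"cat" + bytes([0x83]) + b"dog"
decoded, next_pos = rlp_decode(encoded)
print(decoded)  # [b'cat', b'dog']
```

A block in the exported files would decode to a nested list (header fields, transactions, and so on); interpreting those fields is left to the reading library.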
Dataframe
timestamp: long (nullable = false)
number: decimal(38,0) (nullable = true)
value: decimal(38,0) (nullable = true)
receiveAddress: binary (nullable = true)
sendAddress: binary (nullable = true)
hash: binary (nullable = true)
Issues encountered
Integer Overflow
● Basic unit is Wei: 10**18 Wei = 1 Ether
● Transaction value as Java Long: 64 bits, signed, range -2**63 to 2**63 - 1
● Internal representation: 256-bit integer
Solution:
● Replace longs with Big Decimal in the library used to read the blockchain data. We contacted the developers and they pushed a fix…
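To make the overflow concrete: a signed 64-bit value tops out just above 9.22 × 10**18 Wei, so any transfer of roughly 9.22 Ether or more cannot be held in a Java long. The sketch below uses Python integers (which are unbounded) and a hypothetical helper `to_int64` to mimic two's-complement wrap-around. The slides do not say how the library actually misbehaved, so the wrap-around is an assumption for illustration only.

```python
WEI_PER_ETHER = 10**18
INT64_MAX = 2**63 - 1


def to_int64(value):
    """Wrap an integer to signed 64-bit two's complement (hypothetical helper)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Whole Ether that fit in a signed 64-bit Wei amount.
print(INT64_MAX // WEI_PER_ETHER)  # 9

# A 10 Ether transfer no longer fits and wraps to a negative number.
ten_ether_in_wei = 10 * WEI_PER_ETHER
print(to_int64(ten_ether_in_wei))  # -8446744073709551616
```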
Issues encountered (2)
Bouncy Castle
● Some fields had to be computed using the crypto library Bouncy Castle. Spark 2.2 shipped an older version of this library, which conflicted with the one used by the hadoop crypto ledger library
Solutions:
● Create a “shaded jar” that included the classes needed under a different package name.
● Migrate to Spark 2.3, but Compute Canada didn’t have it by default until mid-March (2.3 was released on February 28).
Issues encountered (3)
Intermittent issues with Slurm
● We ran a number of small-ish jobs (~1 hr, 2k blocks) that kept timing out.
● Ultimately we found out that the scheduler was sometimes down or too busy to serve our requests on the weekends.
Solution:
● Wait till it became available again.
Graph Analysis
Spark GraphFrames package
● A GraphFrame consists of vertices and edges
● Vertices are ethereum account addresses
● Edges are pairs of addresses in each transaction.
Algorithms applied
● Connected Components
● Page Rank
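The two algorithms can be illustrated on a toy transaction graph. This is a plain-Python sketch (union-find for connected components, power iteration for Page Rank), not the GraphFrames implementation, and its scores are normalized differently from Spark's. The addresses and the functions `connected_components` and `pagerank` are hypothetical.

```python
# Hypothetical toy transactions as (sendAddress, receiveAddress) pairs.
edges = [("a", "x"), ("b", "x"), ("c", "x"), ("x", "a"), ("d", "e")]


def connected_components(edges):
    """Map each vertex to a component label using union-find (direction ignored)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst
    return {v: find(v) for v in list(parent)}


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration Page Rank; ranks sum to 1."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


components = connected_components(edges)
ranks = pagerank(edges)
print(len(set(components.values())))  # 2 components: {a, b, c, x} and {d, e}
print(max(ranks, key=ranks.get))      # x, the address most others send to
```

In the toy graph, `x` plays the role of a hub that many accounts send to, which is the pattern the analysis looks for in the real data.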
Connected Components among Transactions larger than 100 ETH
The Highest Page Rank in the Largest Connected Component was a Cryptocurrency Exchange
Connected Components among 0 ETH Transactions
The Highest Page Rank in the Largest Connected Component was an EOS Token Contract
Decreasing Running Time
BOTTLENECK
● We noticed that a significant amount of processing time was spent reading from the binary file and populating our dataframe
SOLUTION:
● We decided to preprocess the data and write the relevant fields to a csv file in a separate job.
● This resulted in about a 6x improvement in the time it took to run our algorithms.
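A minimal sketch of what such a preprocessing step could look like, using the field names from the Dataframe schema. The helper `write_transactions_csv`, the hex encoding of binary fields, and the sample row are assumptions for illustration; the slides do not describe the actual job or file layout.

```python
import csv
import io

# Field names taken from the Dataframe schema; the column order is an assumption.
FIELDS = ["timestamp", "number", "value", "sendAddress", "receiveAddress", "hash"]
BINARY_FIELDS = ("sendAddress", "receiveAddress", "hash")


def write_transactions_csv(rows, out):
    """Write transaction dicts to CSV, hex-encoding the binary fields."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        for key in BINARY_FIELDS:
            record[key] = row[key].hex()
        writer.writerow(record)


# Hypothetical sample row: a 10 Ether transfer, with the value kept in Wei.
sample = [{
    "timestamp": 1520000000,
    "number": 5000000,
    "value": 10 * 10**18,
    "sendAddress": bytes.fromhex("aa" * 20),
    "receiveAddress": bytes.fromhex("bb" * 20),
    "hash": bytes.fromhex("cc" * 32),
}]

buffer = io.StringIO()
write_transactions_csv(sample, buffer)
print(buffer.getvalue())
```

Later jobs can then load the text file directly instead of decoding the binary export on every run.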
Performance seems linear
Next Steps
● Apply Page Rank and Connected Components algorithms to a larger sample of the dataset
● If possible, process the entire ~56 GB dataset
● Explore changes over time