Checkpoints
Check pizza query
/ 10
Check dogecoin query
/ 10
transaction hash query
/ 40
Calculate balance query
/ 40
Exploring the Public Cryptocurrency Datasets Available in BigQuery
GSP665
Overview
This lab lets you explore the six cryptocurrency blockchain datasets released publically in BigQuery. This was introduced in the blog post Introducing six new cryptocurrencies in BigQuery Public Datasets—and how to analyze them.
The following is from the blog post, please read to understand the background of the lab:
Since they emerged in 2009, cryptocurrencies have experienced their share of volatility—and are a continual source of fascination. In the past year, as part of the BigQuery Public Datasets program, Google Cloud released datasets consisting of the blockchain transaction history for Bitcoin and Ethereum, to help you better understand cryptocurrency. Today, we're releasing an additional six cryptocurrency blockchains.
We are also including a set of queries and views that map all blockchain datasets to a double-entry book data structure that enables multi-chain meta-analyses, as well as integration with conventional financial record processing systems.
Additional blockchain datasets
The six cryptocurrency blockchain datasets we’re releasing today are Bitcoin Cash, Dash, Dogecoin, Ethereum Classic, Litecoin, and Zcash.
Five of these datasets, along with the previously published Bitcoin dataset now follow a common schema that enables comparative analyses. We are releasing this group of Bitcoin-like datasets (Bitcoin, Bitcoin Cash, Dash, Dogecoin, Litecoin and Zcash) together because they all have similar implementations, i.e., their source code is derived from Bitcoin’s. Similarly, we’re also releasing the Ethereum Classic dataset alongside the previously published Ethereum dataset, and Ethereum Classic is also using the same common schema.
A unified data ingest architecture
All datasets update every 24 hours via a common codebase, the Blockchain ETL ingestion framework (built with Cloud Composer, previously described here), to accommodate a variety of Bitcoin-like cryptocurrencies. While this means higher latency for loading Bitcoin blocks into BigQuery, it also means that:
We are able to ingest additional BigQuery datasets with less effort, meaning additional datasets can be onboarded more quickly in the future. We can implement a low-latency loading solution once that can be used to enable real-time streaming transactions for all blockchains.
Unified schema and views
Since we provided the original Bitcoin dataset last year, we’ve learned how users want to access data, and restructured the dataset accordingly. Some of these changes address performance and convenience concerns, yielding faster and lower cost queries (commonly accessed nested data are denormalized; each table is partitioned by time).
We’ve also included more data, such as script op-codes. Most Bitcoin transactions describe transfers of value not simply as a debit/credit pair, but rather as a series of functions that describe both simple transfers and more complex transactions.
Having these scripts available for Bitcoin-like datasets enables more advanced analyses similar to this smart contract analyzer that Tomasz Kolinko recently built on top of the BigQuery Ethereum dataset. For example, we can now identify and report on patterns of activity involving multi-signature wallets. This is particularly important for analyzing privacy-oriented cryptocurrencies like Zcash.
For analytics interoperability, we designed an unified schema that allows all Bitcoin-like datasets to share queries. To further interoperate with Ethereum and ERC-20 token transactions, we also created some views that abstract the blockchain ledger to be presented as a double-entry accounting ledger.
What you'll do
- Explore and perform a simple SQL query on the BigQuery public cryptocurrency datasets.
- Verify that the BigQuery public crpytocurrency dataset is correct.
- Perform a complex query that calculates the gini coeffienct for Litecoin and Dash, on a week by week basis.
- Use your SQL skills to complete two interesting queries on the bitcoin dataset.
Prerequisites
To get the best out of this lab you should have some:
- Familiarity with cryptocurrencies.
- Ability to write basic SQL statements.
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab---remember, once you start, you cannot pause a lab.
How to start your lab and sign in to the Google Cloud console
-
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:
- The Open Google Cloud console button
- Time remaining
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
-
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account. -
If necessary, copy the Username below and paste it into the Sign in dialog.
{{{user_0.username | "Username"}}} You can also find the Username in the Lab Details panel.
-
Click Next.
-
Copy the Password below and paste it into the Welcome dialog.
{{{user_0.password | "Password"}}} You can also find the Password in the Lab Details panel.
-
Click Next.
Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges. -
Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
Task 1. View the cryptocurrencies in the public dataset
-
Open Navigation Menu > BigQuery.
-
The Welcome to BigQuery in the Cloud Console dailog box opens and click DONE.
-
Click + ADD > Additional sources > Public Datasets.
-
In Search Marketplace, type
bitcoin
and press Enter. -
Click Bitcoin Cash Cryptocurrency Dataset.
-
Click VIEW DATASET.
A new tab will open with BigQuery, and you should be on the bigquery-public-data:crypto_bitcoin_cash
dataset.
All the public datasets are visible.
bigquery-public-data
doesn't appear to the Explorer panel, then click on + ADD > Star a project by name > Project name (bigquery-public-data) and Star.
- In Type to search, type
crypto
.
Only the public datasets starting with crypto will be displayed.
- Expand the datasets so you can see they all share the same structure. This makes performing queries across the different cryptocurrencies easy as the tables, views, and fields are identical in each cryptocurrency dataset.
Task 2. Perform a simple query
In this task, you will view the famous 10,000 bitcoin pizza purchase transaction. You can learn more by reading the Bitcoin Pizza Day 2018 article.
- Change to the Editor tab. Copy and paste this query into the query window and then press Run:
You can see the raw data returned.
Points to note:
- Values for input, output and fee are in satoshis. The satoshi is currently the smallest unit of the bitcoin currency recorded on the block chain. It is a one hundred millionth of a single bitcoin (0.00000001 BTC).
- The sent amount is via multiple inputs all coming from the same address.
- The output amount is sent as a single transaction to the one address.
Task 3. Validate the data
In this task, you will check that you can access the cryptocurrency datasets by performing simple validation queries on two cryptocurrencies.
Double-entry book query of Bitcoin Cash
To motivate an initial exploration of these new datasets, let’s start with a simple example, comparing the way to query both payments and receipts across multiple cryptocurrencies. This comparison is the simplest way to verify that a cryptocurrency is operating as intended, and at least operationally, is a mathematically correct store of value.
- Copy and paste this query into the query window and then press Run:
Verify the Bitcoin values returned are accurate
-
In a new browser tab open the website
https://www.blockchain.com/explorer/search?search=
It will say: What can we help you find?. -
Return to the BigQuery window and copy (Ctrl+C) the top address value returned in BigQuery results.
-
Paste (Ctrl+V) the address value into the search box and click Search.
-
Check the final balance returned (in BTC) - it should be the same as the balance listed in BigQuery results for that address.
Double-entry book query of Dogecoin
- Modify the query(used in Task3) by replacing bitcoin with dogecoin, and then Run the query.
Hint: There are two places in the query to change the dataset name from bitcoin to dogecoin.
Verify the Dogecoin values returned are accurate
-
In a new browser tab open the website
https://dogechain.info/
. -
Return to the BigQuery window and copy the top address returned in BigQuery results.
-
Paste (Ctrl+V) the address value into the search box and click Search.
-
Note the balance returned, it should be the same as the balance listed in BigQuery results for that address.
Notice that the only difference between them is the name of the data location. You can swap in Bitcoin Cash, Dash, Litecoin, and Zcash in a similar fashion.
Task 4. Plot the Gini coefficient for cryptocurrency
Beyond quality control and auditing applications, presenting cryptocurrency in a traditional format enables integration with other financial data management systems. As an example, let’s consider a common economic measure, the Gini Coefficient. In the field of macroeconomics, the Gini Coefficient is a member of a family of econometric measures of wealth inequality. Values range between 0.0 and 1.0, with completely distributed wealth (all members have the same amount) mapping to a value of 0.0 and completely accumulated wealth (one member has everything) mapping to 1.0.
Typically, the Gini Coefficient is estimated for a specific country’s economy based on data sampling or imputation. For crypto-economies, we have complete transparency of the data at the highest possible resolution.
In addition to data transparency, one of the purported benefits of cryptocurrencies is that they allow the implementation of money to more closely resemble the implementation of digital information. It follows that a fully digitized money network will come to resemble the internet, with reduced transactional friction and fewer barriers that impede capital flow. Frequently, implicit in this narrative is that capital will distribute more equally. But we don’t always observe that particular outcome, and the crypto-assets presented here display a broad spectrum of distribution patterns over time. You can read more about using the Gini coefficient to reason about crypto-economic network performance in Quantifying Decentralization.
To set a baseline to interpret our findings, consider how resources are distributed in traditional, non-crypto economies. According to a World Bank analysis in 2013, recent Gini coefficients for world economies have a mean value of 39.6 (with a standard deviation of 9.6).
Create the query
In this next query you will calculate the gini coefficient of dash, on a week-by-week basis. This query will take about 3 minutes to execute. Once you have the data you can easily visualize the graph and compare it with the one generated for the source article.
- Copy and paste this query into the query window and then press Run:
Save the results in a BigQuery table
-
In the Query Results ribbon, click Save Results and select BigQuery table
-
In the Export to BigQuery Table, leave the project as it is, select the
lab
Dataset and name the Table, such asdash_gini
and then press Export. After it finishes, click Go To Table.
Use Looker Studio to visualize the query
-
In the Table view, click Export and select
Export with Looker Studio
. Then clickAuthorize
in the Requesting Authorization prompt. -
When Looker Studio comes up, delete the automatically generated charts (select them and press the Delete key). Click Add a chart on the top menu bar. Under Time Series, select
Time series chart
.
-
Change the metric to gini.
-
Compare the graph with the original source article graph (reproduced below). In the chart below you are looking for the brown line. Starting in December 2019, see how it's changed since then.
Generate the gini coefficient for litecoin
- Perform the same steps as you did for the Dash cryptocurrency to get the gini coefficient, but this time perform it for Litecoin.
Task 5. Explore two famous cryptocurrency events
There is a blog post on the Blockchain site that describes a number of memorable events in bitcoin. You will perform two queries on the bitcoin public dataset to retrieve the data in a couple of those events.
November 22, 2013: This bitcoin transaction sparks mystery and speculation
In the fall of 2013, a 194,993 bitcoin transaction hit the network, which caused many to wonder who was behind this very large transaction. The value at the time of the transaction in USD was over $149 million. CoinDesk wrote "unsurprisingly, a transaction of that size has prompted the bitcoin community to do some analysis and detective work. The transaction involved a large number of sending addresses, with some of them from blocks mined in February 2010 or even earlier, prompting excited speculation they might be from Satoshi Nakamoto, bitcoin’s absent (and likely pseudonymous) founder."
- Write a SQL query to find the
transaction_hash
for transfer of 194993 bitcoins (BTC) and write that data to a table called 51 in the dataset lab. Below is SQL to help you get started:
This will require you to understand the structure of the transactions table, or you can use the outputs view. Your query must return the transaction hash, other fields are not important.
After all this time, what is the balance of the address that purchased the two pizzas for 10,000 BTC?
- Write a SQL query to write the balance for the account that paid 10,000 BTC for the two pizzas in 2010, into the table 52 in the lab dataset.
The output must have the balance for the address.
Here is some SQL to start you off:
Congratulations!
Google Cloud training and certification
...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual last updated Feb 22, 2024
Lab last tested Feb 22, 2024
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.