Are you a Blockchain developer looking for a job at a Blockchain company? If that’s the case, you should probably prepare for technical interviews. In this article I will give you 12 typical interview questions that are asked during technical interviews for Blockchain programmer jobs, as well as their answers. Maximize your chances of passing the interview by reading them thoroughly!
I have done dozens of interviews for Blockchain developer jobs, and these are the questions that I got most often:
- What is the difference between cryptocurrencies and Blockchain?
- Market Data Questions
- Transaction Fees Questions
- How does Blockchain work?
- What is the security model of the Blockchain?
- What is the problem with the Proof-Of-Work Algorithm?
- What is a Merkle tree?
- What is the difference between the UTXO and account model?
- Explain Public vs Private Blockchains
- What are the main differences between Bitcoin and Ethereum?
- What is the scaling problem?
- Explain Gas in Ethereum
Some of the questions are just about market awareness (e.g. how many cryptocurrencies exist), while others are more technical. They cover both Bitcoin and Ethereum, the 2 most popular Blockchain technologies. You are not likely to get questions on other Blockchains, unless the company you interview with is specifically working with one of them.
Depending on whether you apply for a senior or junior position, and whether the position is for developing a Blockchain protocol or an application built on top of a Blockchain, the interview will be more or less technical. In any
The Blockchain is the underlying technology whereas cryptocurrencies are the application of this technology to specific business problems.
The 2 are often conflated because Blockchain was first used in the context of Bitcoin. We can say that Bitcoin is a cryptocurrency (digital money) that is powered by the Blockchain technology.
For Bitcoin, the Blockchain technology is used to maintain a ledger of who owns which quantity of Bitcoin, but it doesn’t need to be that way. For example, Steemit is a blogging platform that leverages the Blockchain to store articles and protect them against any form of censorship.
The goal of market data questions is to assess if the candidate is following the news of the Blockchain market. Make sure you know some of the important figures of the Blockchain market, such as:
- Number of cryptocurrencies in existence
- Total market cap
- Largest cryptocurrencies by market cap
As of October 2018, the site coinmarketcap listed more than 2000 cryptocurrencies:
At the same time, the total market capitalization of the crypto market was about 200 billion USD:
Bitcoin makes up an increasingly large share of the total market cap (it used to be only 50% in 2017):
But most of the transactions come from Ethereum:
On Bitinfocharts you will find some other interesting charts.
At the peak of the crypto bubble in late 2017, the total market cap was about 600 billion USD:
Most Blockchains require you to pay transaction fees to miners to add data to the Blockchain. As a developer who will build on top of the Blockchain, it’s important that you know how to assess the cost of a specific transaction. Once you work as a Blockchain developer, if you are asked to do something that is not economically feasible on the Blockchain, you need to be able to recognize it and tell your boss.
Since the introduction of Segwit, transaction fees on the Bitcoin network went down a lot. In October 2018, a Bitcoin transaction cost about 50 cents.
For Ethereum, transaction fees also went down a lot since the peak of the Blockchain bull market in late 2017. As of October 2018, a standard transaction cost about 12 cents:
Note that transaction fees on Ethereum are a little bit more difficult to measure than for Bitcoin, because an Ethereum transaction can not only send assets like a Bitcoin transaction, but can also run all sorts of arbitrary computations in smart contracts, resulting in higher or lower transaction fees. Transaction fees given on this chart
You can find other transaction fee charts on Bitinfocharts. For Ethereum, another place to understand transaction fees is Eth Gas Station. It will tell you how much gasPrice you need to pay given the current market conditions and how fast you want your transaction mined. Check the interview question on gas for more info on what gas is.
Blockchain is an append-only distributed database running on a public network of nodes. It uses a combination of:
- Distributed system techniques
- Economic incentives
Blockchain has 3 parts:
Blockchain data is structured into a series of blocks. Each block references the previous one by hash, all the way back to the first one, which is called the Genesis block. The combination of the links between blocks and the proof-of-work algorithm makes forging the Blockchain very difficult: if someone modifies any block earlier in the Blockchain, the hash of the modified block will change, and the attacker will have to redo all the blocks up to the latest one to maintain the integrity of the Blockchain. This is impractical to do on Blockchains with a high hashrate (i.e. when a lot of miners compete to find new blocks).
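To make the hash links concrete, here is a minimal sketch of a toy chain in Python. It is an illustration under simplifying assumptions, not how any real client stores blocks: the block fields (`index`, `data`, `prev_hash`) and the helper names are hypothetical, and the blocks are serialized as JSON purely for convenience.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including the hash of the previous block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def build_chain(payloads: list) -> list:
    """Build a toy chain where each block stores the hash of the previous one."""
    chain = []
    prev_hash = "0" * 64  # the Genesis block has no predecessor
    for index, data in enumerate(payloads):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain(["genesis", "Bob pays Alice 1 BTC", "Alice pays Carol 0.5 BTC"])
print(is_valid(chain))  # True

chain[1]["data"] = "Bob pays Mallory 1 BTC"  # tamper with an earlier block
print(is_valid(chain))  # False: block 2 no longer points at block 1's hash
```

To hide the change, the attacker would have to recompute the hash stored in every later block, and on a real proof-of-work chain each of those blocks would have to be mined again.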
Each block contains records known as transactions. Transactions are cryptographically signed data packages that describe data changes in the Blockchain. For example, in Bitcoin transactions describe simple financial transfers (i.e. Bob sends 1 BTC to Alice), but in Ethereum transactions can describe more complex data changes, like the execution of a function in a smart contract that changes some arbitrary variables.
Blocks and transactions can be represented like this in a Blockchain:
Blockchain uses the Proof-Of-Work algorithm as a consensus mechanism across all nodes. To append data to the blockchain, certain nodes called miners include a certain number of transactions in a new block, then compute a hash of the block + an arbitrary integer (called a nonce), looking for a hash that is below a certain threshold called the target. If the hash is not below the target, the miner increments the nonce and restarts the process, until the hash is below the target.
When this happens, the miner broadcasts the block to the other nodes on the network, which verify the hash and accept the new block. In the new block, some reward is given to the miner who found the block.
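The mining loop described above can be sketched in a few lines of Python. This is a simplified illustration: the `mine` function, the string-based block data and the toy target are assumptions made for the example, and real Bitcoin mining hashes a binary block header (with a double SHA-256) against a much lower target.

```python
import hashlib


def mine(block_data: str, target: int):
    """Try nonces 0, 1, 2, ... until sha256(block_data + nonce) is below target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


# Toy difficulty: the hash must start with 16 zero bits,
# which takes about 65,000 attempts on average.
TARGET = 1 << (256 - 16)

nonce, digest = mine("block 42: Bob pays Alice 1 BTC", TARGET)
print(nonce, digest)
```

Finding the nonce takes many attempts, but any other node can check the result with a single hash. Lowering the target makes the search harder, which is how a network can keep the block time roughly constant as miners join.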
When a Blockchain node is started, it connects to some special nodes called “bootstrap nodes”. The IP addresses of these special nodes are hard-coded in the code of the Blockchain client. These bootstrap nodes act as initial routers for the whole network and provide each new node with a list of IPs of nearby nodes already connected to the network. The newly connected node then connects to these nodes.
Let’s assume that the interviewer is talking about the Proof-Of-Work algorithm, the historical and most common consensus algorithm used in Blockchains.
To remain safe, a proof-of-work Blockchain needs honest participants to control most of the network’s mining power (hashrate). The strict minimum is an honest majority, i.e. more than 50% of the hashrate; 2/3rd is a safer rule of thumb, because some attacks such as selfish mining can pay off below 50%. At or above this threshold the main chain of the Blockchain will be the “honest” one and it will not be possible for attackers to fork this chain. As a user, you don’t need to know who is honest and who is not. You only need the 2/3rd assumption to hold.
90% Honest / 10% Dishonest. Ok 😎:
66% Honest / 33% Dishonest. Ok 😎:
50% Honest / 50% Dishonest. NOT Ok 😢:
The Proof-Of-Work algorithm has 2 problems:
- Centralization of miners
- Consumes a lot of energy
Miners have organized themselves into mining groups called mining pools. In mining pools, the block rewards are shared among participants proportionally to the mining power they provided. This does not change the expected mining reward of a participant compared to mining alone (ignoring pool fees), but it does reduce the variability of income.
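A quick back-of-the-envelope calculation shows why pools are attractive. The sketch below assumes a simple model that is not in the original discussion: each of the roughly 144 blocks mined per day (one every 10 minutes) is found by a given miner or pool with a probability equal to its share of the hashrate, and pool fees are ignored. The shares used are hypothetical.

```python
import math

BLOCKS_PER_DAY = 144  # one block every 10 minutes


def daily_income_stats(share_of_network: float, payout_fraction: float = 1.0):
    """Mean and standard deviation of daily income, in units of one block reward.

    share_of_network: fraction of the total hashrate held by whoever finds
        the blocks (a solo miner or a whole pool).
    payout_fraction: fraction of those rewards that is paid to our miner.
    """
    mean_blocks = BLOCKS_PER_DAY * share_of_network
    std_blocks = math.sqrt(BLOCKS_PER_DAY * share_of_network * (1 - share_of_network))
    return payout_fraction * mean_blocks, payout_fraction * std_blocks


my_share = 0.001   # hypothetical miner with 0.1% of the network hashrate
pool_share = 0.2   # hypothetical pool with 20% of the network hashrate

solo_mean, solo_std = daily_income_stats(my_share)
pool_mean, pool_std = daily_income_stats(pool_share, payout_fraction=my_share / pool_share)

print(f"solo:   mean={solo_mean:.3f}  std={solo_std:.3f}")
print(f"pooled: mean={pool_mean:.3f}  std={pool_std:.3f}")
```

Under this model both means are the same, but the day-to-day spread is far smaller for the pooled miner, which is the incentive that pushes hashing power into a few large pools.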
This has led to the centralization of hashing power in just a few mining pools that control a large percentage of the overall mining effort. This is a chart from blockchain.com showing the hashrate distribution on the Bitcoin network:
This centralization of miners is not desirable because the whole point of Blockchain is to build decentralized systems where no single party is in control.
The second problem is the gigantic power consumption of miners. To find a solution to the proof-of-work puzzle and be entitled to the mining reward, miners need to spend a lot of electricity. It was estimated in 2017 that the miners of the Bitcoin network consumed as much electricity as a small country like Denmark. Here is a chart that shows the power consumption of the whole Bitcoin network from 2017 to 2018:
The problem lies in the fact that this energy consumption does not serve any useful purpose other than guaranteeing the security of the Bitcoin network. This has led some people to criticize the proof-of-work algorithm for being bad for the environment.
Merkle Trees are data structures that are used extensively in Blockchains.
Before describing the structures of a Merkle tree, let’s first understand what they are used for.
What are Merkle trees used for?
The general use case of a Merkle tree is to summarize a lot of information with little data. With a Merkle tree, you can prove that a piece of data is included in a bigger set of data, without knowing much about the bigger set of data.
Is B in A?
In the context of Blockchains, it is often used to prove that a transaction is included in a block:
Merkle trees are particularly useful for light clients that run on embedded devices with limited space and CPU, like mobiles. For example, wallets often run on mobile devices and can’t process or store a whole Blockchain (several 100’s of GB).
Now that we know what Merkle trees are used for, let’s describe their structure.
Structure of a Merkle tree
Merkle trees are recursive tree-like data structures where the lower rows are used to compute the higher rows:
They are used for various applications in Blockchains, but let’s take the most common use case, which is to summarize all the transactions in a block.
Let’s start with the bottom row of a Merkle tree.
All the transactions are hashed individually (i.e. reduced to a cryptographic fingerprint that always has the same size). We are left with one hash for each transaction. Let’s call H1, H2, … Hn the hashes of transactions 1, 2, … n:
Now, let’s go up one row. We will group the hashes of the first row by 2: H1 with H2, and H3 with H4. For each group, we will:
- Concatenate the 2 hashes
- Compute the hash of the result
- Store the resulting hash in the new row. Let’s call this hash H1,2 if it is the result of hashing H1 and H2.
For example, for H1 and H2, H1,2 = hash(H1+H2), where + means concatenation:
H3,4 is computed the same way:
And the algorithm continues all the way up until we find the hash of the top row, also called the root hash:
There are 2 important things to understand:
- If we modify any transaction on the first row, its hash will be different and it will cause the root hash to be different as well. A single root hash therefore identifies one specific set of transactions.
- It’s possible to use the intermediary hashes of the Merkle Tree to prove that a transaction was included in a block. No need to have all the hashes of the first row for that. In our examples, we just had 4 transactions so this property does not seem very interesting, but in real Bitcoin blocks there can be 1000’s of transactions of several 100’s of bytes each, and this quickly adds up for hardware-constrained clients.
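The second point is easier to see with code. The sketch below rebuilds the 4-transaction tree by hand and checks a proof for one transaction using only 2 hashes. The `verify_proof` helper and the proof format (a list of sibling hashes with their side) are assumptions made for this illustration; real clients use their own encodings, and Bitcoin applies SHA-256 twice.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash function used for every node of the tree (single SHA-256 here)."""
    return hashlib.sha256(data).digest()


def verify_proof(tx: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from a transaction and its sibling hashes.

    proof is a list of (sibling_hash, sibling_is_left) pairs, bottom row first.
    """
    current = h(tx)
    for sibling, sibling_is_left in proof:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == root


# Build the 4-transaction tree by hand.
txs = [b"tx1", b"tx2", b"tx3", b"tx4"]
h1, h2, h3, h4 = (h(tx) for tx in txs)
h12, h34 = h(h1 + h2), h(h3 + h4)
root = h(h12 + h34)

# To prove that tx3 is in the block, a light client only needs H4 and H1,2.
proof_for_tx3 = [(h4, False), (h12, True)]
print(verify_proof(b"tx3", proof_for_tx3, root))     # True
print(verify_proof(b"forged", proof_for_tx3, root))  # False
```

The proof grows with the height of the tree rather than with the number of transactions, so a block with 1000’s of transactions still needs only about a dozen hashes.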
Lastly, as a bonus, I will give you the answer to a typical question that arises when explaining Merkle Trees:
What happens if the number of transactions is odd? In this case, we can’t simply group the bottom row hashes by 2, because one of them will be left alone.
Well, in this case, all we have to do is to duplicate the last hash so that we have our last group of 2: we will just pair the transaction’s hash with itself!
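Putting the whole construction together, including the rule for odd rows, gives a short function. This is a minimal sketch rather than a production implementation: it uses a single SHA-256 and raw byte strings as stand-ins for transactions, and the name `merkle_root` is chosen for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash function used for every node of the tree (single SHA-256 here)."""
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Hash the bottom row, then pair hashes row by row until one is left."""
    if not transactions:
        raise ValueError("need at least one transaction")
    row = [h(tx) for tx in transactions]
    while len(row) > 1:
        if len(row) % 2 == 1:
            row.append(row[-1])  # odd row: pair the last hash with itself
        row = [h(row[i] + row[i + 1]) for i in range(0, len(row), 2)]
    return row[0]


three = [b"tx1", b"tx2", b"tx3"]
print(merkle_root(three).hex())

# Duplicating the last hash gives the same root as listing tx3 twice.
print(merkle_root(three) == merkle_root([b"tx1", b"tx2", b"tx3", b"tx3"]))  # True

# Changing any transaction changes the root.
print(merkle_root(three) == merkle_root([b"tx1", b"tx2", b"txX"]))  # False
```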
In Blockchain systems, there are 2 different ways of organizing the ledger (i.e. keeping track of who owns what):
Earlier Blockchain systems like Bitcoin used UTXO. With UTXO Blockchains, coins that can be spent are put into outputs, which can be spent by anyone who knows a specific private key. New transactions spend existing outputs and create new ones. Outputs that haven’t been spent yet belong to a group called UTXO, short for “Unspent Transaction Outputs”.
When a user wants to send some coins to someone, they need to send a transaction to the Blockchain network. This transaction needs to reference all the outputs of the user that will be spent, a little bit like when you buy something with cash and first need to collect the bills you will need. What is surprising to a lot of developers is that Bitcoin DOES NOT help you at all in this task (i.e. there is no API in Bitcoin clients to get the balance of an address). Instead, that’s the job of external entities like wallets.
Because of this, getting the balance of a specific Bitcoin address takes time on UTXO Blockchains: all the unspent outputs that belong to the address have to be found and added up.
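Here is a rough sketch of what a wallet has to do on a UTXO Blockchain. The `Output` class, the owner names and the helper functions are hypothetical and heavily simplified: a real output is locked by a script rather than an owner name, and a real wallet also has to account for fees. Amounts are in satoshis (1 BTC = 100,000,000 satoshis).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    tx_id: str    # transaction that created this output
    index: int    # position of the output inside that transaction
    owner: str    # stands in for "whoever holds the matching private key"
    amount: int   # in satoshis


def balance(utxo_set: list, owner: str) -> int:
    """A balance is not stored anywhere: it is the sum of the owner's unspent outputs."""
    return sum(o.amount for o in utxo_set if o.owner == owner)


def select_outputs(utxo_set: list, owner: str, amount: int):
    """Collect outputs until they cover the amount, like picking bills from a wallet."""
    selected, total = [], 0
    for output in utxo_set:
        if output.owner == owner:
            selected.append(output)
            total += output.amount
            if total >= amount:
                return selected, total - amount  # the surplus comes back as change
    raise ValueError("insufficient funds")


utxo_set = [
    Output("tx-a", 0, "alice", 60_000_000),
    Output("tx-b", 1, "bob", 50_000_000),
    Output("tx-c", 0, "alice", 70_000_000),
]

print(balance(utxo_set, "alice"))  # 130000000

inputs, change = select_outputs(utxo_set, "alice", 100_000_000)
print([o.tx_id for o in inputs], change)  # ['tx-a', 'tx-c'] 30000000
```

Both operations scan the whole set of unspent outputs, which is why wallets maintain their own index instead of asking the node for a balance.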
Later Blockchains like Ethereum switched to a simpler system called accounts. In account-based Blockchains, the Blockchain maintains records of balances indexed by addresses:
0xERui90npo9OeR: 10 ether
0xUU78JKry78KLo: 25 ether
...
This allows these Blockchains to offer a simple API to get the balance of any address. Because the data is already available when the API call takes place, the read operation can return right away.
The ability to synchronously retrieve the balance of any address greatly simplifies the design of wallets and external applications.
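In code, the account model boils down to a lookup table. The sketch below reuses the two example addresses from above; the `get_balance` and `transfer` helpers are hypothetical names for this illustration and leave out everything a real client does (signatures, nonces, gas).

```python
# State of a toy account-based ledger: address -> balance in ether.
balances = {
    "0xERui90npo9OeR": 10,
    "0xUU78JKry78KLo": 25,
}


def get_balance(address: str) -> int:
    """Reading a balance is a single lookup; unknown addresses hold nothing."""
    return balances.get(address, 0)


def transfer(sender: str, recipient: str, amount: int) -> None:
    """Move funds by updating the two balances in place."""
    if get_balance(sender) < amount:
        raise ValueError("insufficient funds")
    balances[sender] -= amount
    balances[recipient] = get_balance(recipient) + amount


transfer("0xERui90npo9OeR", "0xUU78JKry78KLo", 4)
print(get_balance("0xERui90npo9OeR"))  # 6
print(get_balance("0xUU78JKry78KLo"))  # 29
```

Compared with UTXO, there is nothing to collect or add up before reading a balance.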
Public Blockchains are computer networks that any computer is free to join. For example, the Bitcoin network is a public Blockchain network. Anyone can connect their computer to the network and assume different roles:
- light-client (read data from the Blockchain to maintain a wallet)
- miner (add transactions to the Blockchain by mining new blocks)
- relayer (just relay data between nodes)
Private Blockchains are like public Blockchains except that not everybody is allowed to join the network. If it’s a completely private Blockchain, only pre-approved nodes are allowed to connect to the network. However, this is often too restrictive. Most of the time, semi-private Blockchains are more useful. In semi-private Blockchains, anyone is allowed to join the network, but the public is restricted to a subset of all roles (usually just reading data from the Blockchain). Adding data to the Blockchain (i.e. mining) is restricted to some nodes that have been pre-approved. For example, that’s the mechanism adopted by EOS. Critics argue that such systems are not real Blockchains but glorified distributed databases.
Ethereum was created as an improvement over the limitations of Bitcoin. In particular, Ethereum adds the possibility to run arbitrary computations, compared to Bitcoin where there are just a few kinds of transactions that can be run. Ethereum implemented this by creating its own virtual machine (the EVM), capable of running small programs called smart contracts.
Below is a full comparison between the 2 cryptocurrencies:
| | Bitcoin | Ethereum |
|---|---|---|
| Creator | Satoshi Nakamoto* | Vitalik Buterin |
| # of coins | 21 million | Unlimited** |
| Block time | 10 mins | 15s |
| Smart contracts | Limited (Bitcoin scripting is not Turing-complete) | Evolved (EVM is Turing-complete) |

* We don’t know the real identity of Satoshi Nakamoto
** 100 million coins issued as of October 2018
*** Switch to Proof-Of-Stake in 2019/2020
Everybody wants a fast Blockchain:
- Blockchain developers want their application to have more users
- Users want Blockchain applications to be fast to use.
However, with the current design of Proof-Of-Work Blockchains (like Bitcoin and Ethereum), this is not possible. For example, Bitcoin can only process about 7 transactions per second, while Ethereum can process about 15 transactions per second. When too many users use a Blockchain at the same time, it becomes clogged and slow, and transaction fees skyrocket:
The reason is that, unlike in traditional systems, on the Blockchain every transaction needs to be processed by all the computers of the Blockchain network.
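The figure of about 7 transactions per second for Bitcoin can be recovered with simple arithmetic. The sketch below assumes a 1 MB block size limit, a 10 minute block time and an average transaction size of 250 bytes; the average size is a round number picked for illustration, not a measured value.

```python
def max_transactions_per_second(
    block_size_bytes: int, avg_tx_size_bytes: int, block_time_seconds: int
) -> float:
    """Upper bound on throughput: transactions per block divided by block time."""
    txs_per_block = block_size_bytes // avg_tx_size_bytes
    return txs_per_block / block_time_seconds


bitcoin_tps = max_transactions_per_second(
    block_size_bytes=1_000_000,  # assumed 1 MB block size limit
    avg_tx_size_bytes=250,       # assumed average transaction size
    block_time_seconds=600,      # one block every 10 minutes
)
print(round(bitcoin_tps, 1))  # 6.7
```

The formula has only three levers: bigger blocks, smaller transactions, or faster blocks. That is why so much of the scaling debate revolves around these parameters, or around moving transactions off the main chain altogether.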
The scaling problem is the largest barrier to mass-adoption for Blockchain systems and most Blockchain projects are working on it. Several general solutions are being developed:
- Block size increase
On Bitcoin, there was a terrible war in 2016/2017 about the best way to make Bitcoin scalable. This led Bitcoin to fork into Bitcoin Core and Bitcoin Cash.
These are the specific solutions that were implemented by various cryptocurrencies:
- Segwit (adopted by Bitcoin)
- Block size increase (Bitcoin Cash)
- Lightning Network (Bitcoin payment channels)
- Raiden Network (Ethereum payment channels)
- Loom network (Ethereum sidechain)
- Eth 2.0 (implements sharding in Ethereum)
- Casper (implements Proof-Of-Stake in Ethereum)
Gas is used in smart contracts to measure the computational effort required by the EVM (Ethereum Virtual Machine) to run the smart contract. Every time you execute code on the EVM, it consumes a certain quantity of gas. The more gas consumed, the more computationally intensive the task.
Code execution takes place in the context of a transaction. Each transaction contains an Ether allocation to cover the gas cost. This Ether will be paid by the signer of the transaction to the miner who includes the transaction in a block.
Gas is just a measurement unit and users don’t pay in gas. Instead, gas costs are paid in ether, the native currency of Ethereum. To convert from gas to Ether, use the formula below:
etherPaid = gasPrice * gasCost
gasCost will be given by the EVM of the miner who mines the transaction. It’s not possible to know it exactly when you code the smart contract because, depending on the data you feed to your smart contract function, different code paths can be executed. However, it’s possible to get an estimate with the Solidity compiler.
gasPrice defines how much ether must be paid for 1 unit of gas. Each user is free to set any arbitrary value for gasPrice. However, in order to have a chance to get your transaction included in a block, the gasPrice needs to be high enough to be attractive for miners. If the network is saturated, each transaction will compete against a lot of others and users will have to pay more to have their transaction included in a block by miners. If the network is under-utilized, the converse is true. There is no right or wrong value for gasPrice, but generally speaking the higher the gasPrice, the faster your transaction will be included in a block. To know the current market value of gas, you can check Eth Gas Station.
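As a worked example of etherPaid = gasPrice * gasCost, the sketch below computes the fee of a plain ether transfer, which uses 21,000 gas. The gasPrice of 10 gwei is a hypothetical market value picked for illustration (1 gwei is one billionth of an ether), and the helper names are made up for this example. Amounts are kept in wei, the smallest unit of ether, to avoid rounding errors.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def fee_in_wei(gas_price_wei: int, gas_cost: int) -> int:
    """etherPaid = gasPrice * gasCost, computed in wei to stay exact."""
    return gas_price_wei * gas_cost


def wei_to_ether(amount_wei: int) -> Decimal:
    """Convert an amount in wei to ether for display."""
    return Decimal(amount_wei) / Decimal(WEI_PER_ETHER)


gas_cost = 21_000              # gas used by a plain ether transfer
gas_price = 10 * WEI_PER_GWEI  # hypothetical market price: 10 gwei per unit of gas

fee = fee_in_wei(gas_price, gas_cost)
print(wei_to_ether(fee))  # 0.00021
```

A call to a smart contract function uses more gas than a plain transfer, so the same gasPrice leads to a proportionally higher fee.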
A last important parameter is gasLimit. As mentioned previously, it’s not possible to know exactly the gasCost of a code execution in advance, but it’s possible to set the maximum amount of gas that you are willing to pay for. Users set this as a parameter of a transaction, based on the estimate they get from the Solidity compiler.
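The effect of gasLimit can be sketched as follows. The `settle` function is a hypothetical simplification that ignores gas refunds and other details of the EVM; it relies on two rules: when execution finishes within the limit, only the gas actually used is charged, and when execution runs out of gas, the state changes are reverted but the gas spent up to the limit is still paid. The gas amounts and the gasPrice are made-up values.

```python
def settle(gas_needed: int, gas_limit: int, gas_price_wei: int):
    """Return (succeeded, fee_in_wei) for a transaction.

    gas_needed is what the execution really consumes; the sender cannot
    know it exactly in advance and only controls gas_limit.
    """
    if gas_needed <= gas_limit:
        # Execution finishes: only the gas actually used is charged.
        return True, gas_needed * gas_price_wei
    # Out of gas: state changes are reverted, but the gas consumed
    # up to the limit is still paid to the miner.
    return False, gas_limit * gas_price_wei


GAS_PRICE = 10 * 10**9  # hypothetical gasPrice: 10 gwei, expressed in wei

print(settle(gas_needed=45_000, gas_limit=60_000, gas_price_wei=GAS_PRICE))
# (True, 450000000000000)
print(settle(gas_needed=45_000, gas_limit=30_000, gas_price_wei=GAS_PRICE))
# (False, 300000000000000)
```

In both cases the fee can never exceed gasLimit * gasPrice. Setting the limit a bit above the estimate is therefore cheap insurance, while setting it too low wastes the whole fee.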