Understanding The Ethereum Virtual Machine

Rahul Ravindran

Once you have written your first few Solidity contracts, you may wonder how contracts are actually executed. Compiling Solidity code may look like a traditional compilation, similar to that of a C program. But understanding how your smart contract is compiled and executed will make you a better developer and help you debug your code faster. In this post we’ll try to answer the following questions:

  • How is a Solidity program compiled?
  • What is the Ethereum Virtual Machine (EVM)?
  • How does the EVM interpret and execute your code?

Prerequisites

  • This is an advanced discussion. You need some experience with blockchain development and a basic understanding of what a smart contract is. If you do not have this knowledge yet, feel free to check out courses like 6-Figure Blockchain Developer or Web Development For Blockchain.
  • Please install the Solidity compiler (solc). Refer to this link for details.

A simple smart contract

//hello.sol
pragma solidity >=0.7.0 <0.9.0;

contract HelloWolrd{ 
   uint256 a;
   constructor() {
      a = 1;
   }
}

Let’s compile the program using the following command:

$ solc --bin --asm hello.sol

You should see output similar to the following on your terminal:

======= hello.sol:HelloWolrd =======
EVM assembly:
    /* "hello.sol":33:109  contract HelloWolrd{ ... */
  mstore(0x40, 0x80)
    /* "hello.sol":73:107  constructor() {... */
  callvalue
  dup1
  iszero
  tag_1
  jumpi
  0x00
  dup1
  revert
tag_1:
  pop
    /* "hello.sol":99:100  1 */
  0x01
    /* "hello.sol":95:96  a */
  0x00
    /* "hello.sol":95:100  a = 1 */
  dup2
  swap1
  sstore
  pop
    /* "hello.sol":33:109  contract HelloWolrd{ ... */
  dataSize(sub_0)
  dup1
  dataOffset(sub_0)
  0x00
  codecopy
  0x00
  return
stop

sub_0: assembly {
        /* "hello.sol":33:109  contract HelloWolrd{ ... */
      mstore(0x40, 0x80)
      0x00
      dup1
      revert

    auxdata: 0xa26469706673582212205fe60a80f73e86d9f3ad4b2fce33b99a5903282fbd1e3b5c7df084ce790bd8d964736f6c634300080a0033
}

Binary:
6080604052348015600f57600080fd5b506001600081905550603f8060256000396000f3fe6080604052600080fdfea26469706673582212205fe60a80f73e86d9f3ad4b2fce33b99a5903282fbd1e3b5c7df084ce790bd8d964736f6c634300080a0033

I admit that this output is very difficult to understand. Let’s first focus on the output under the Binary heading:

Binary:
6080604052348015600f57600080fd5b506001600081905550603f8060256000396000f3fe6080604052600080fdfea26469706673582212205fe60a80f73e86d9f3ad4b2fce33b99a5903282fbd1e3b5c7df084ce790bd8d964736f6c634300080a0033

This is the compiled bytecode of our contract. The bytecode is a sequence of instructions that encodes what we have written in Solidity. It is deployed to the Ethereum network, where the EVM executes it.

The Ethereum Virtual Machine

The part of the protocol that actually handles processing the transactions is Ethereum’s own virtual machine, known as the Ethereum Virtual Machine (EVM).

The EVM is a Turing-complete virtual machine with one important limitation that a typical Turing-complete machine does not have: execution is intrinsically bound by gas. Instructions are charged gas as they execute, so the total amount of computation that can be done is limited by the amount of gas provided.

Source: CMU
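
To make the gas bound concrete, here is a minimal Python sketch (not a real EVM implementation). It runs a tiny program that would loop forever, `5b 60 00 56` (JUMPDEST, PUSH1 0x00, JUMP), and charges the per-opcode gas listed in the opcode table later in this post. The names `run_with_gas` and `OutOfGas` are made up for this illustration, and the sketch skips checks a real EVM performs, such as validating jump destinations.

```python
class OutOfGas(Exception):
    """Raised when the remaining gas cannot pay for the next instruction."""


# Static gas costs for the three opcodes used here (values from the opcode table).
GAS_COST = {0x5B: 1, 0x60: 3, 0x56: 8}  # JUMPDEST, PUSH1, JUMP


def run_with_gas(code: bytes, gas: int) -> int:
    """Execute a tiny opcode subset and return the number of executed instructions."""
    stack, pc, steps = [], 0, 0
    while pc < len(code):
        op = code[pc]
        cost = GAS_COST[op]
        if gas < cost:
            raise OutOfGas(f"ran out of gas after {steps} instructions")
        gas -= cost
        steps += 1
        if op == 0x60:      # PUSH1: push the next byte and skip over it
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x56:    # JUMP: continue at the address on top of the stack
            pc = stack.pop()
        else:               # JUMPDEST: a marker that does nothing by itself
            pc += 1
    return steps


infinite_loop = bytes.fromhex("5b600056")
try:
    run_with_gas(infinite_loop, gas=100)
except OutOfGas as err:
    print(err)  # ran out of gas after 26 instructions
```

Each pass through the loop costs 1 + 3 + 8 = 12 gas, so 100 gas pays for eight full passes plus two more instructions before the JUMP can no longer be afforded. Without the gas check, the loop would never terminate.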

Moreover, the EVM has a stack-based architecture. A stack machine is a computer that uses a last-in, first-out stack to hold temporary values.

Each stack item in the EVM is 256 bits wide, and the stack has a maximum depth of 1024 items.
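
The two limits can be sketched in a few lines of Python. This is only a model: the `Stack` class and its method names are invented for illustration, and masking values to 256 bits is simply how this sketch keeps every item within one word.

```python
WORD_MASK = 2**256 - 1   # every stack item is a 256-bit word
MAX_DEPTH = 1024         # maximum number of items on the stack


class Stack:
    def __init__(self):
        self.items = []

    def push(self, value: int) -> None:
        if len(self.items) >= MAX_DEPTH:
            raise OverflowError("stack overflow: more than 1024 items")
        self.items.append(value & WORD_MASK)  # keep only the low 256 bits

    def pop(self) -> int:
        if not self.items:
            raise IndexError("stack underflow: pop from an empty stack")
        return self.items.pop()


stack = Stack()
stack.push(0x80)
stack.push(2**256 + 5)   # too wide for one word, so only the low 256 bits survive
print(stack.pop())       # 5
print(stack.pop())       # 128
```

In the real EVM, exceeding the depth limit or popping from an empty stack aborts execution of the current call.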

The EVM has memory, where items are stored as word-addressed byte arrays. Memory is volatile, meaning it is not permanent.

The EVM also has storage. Unlike memory, storage is non-volatile and is maintained as part of the system state. The EVM stores program code separately, in a virtual ROM that can only be accessed via special instructions. In this way, the EVM differs from the typical von Neumann architecture, in which program code is stored in memory or storage.
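
As a rough mental model of these three data areas, consider the following Python sketch. The `Account` class and the `execute` function are hypothetical and only illustrate lifetimes: code is fixed, storage persists between executions, and memory starts out empty every time.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    code: bytes                                  # program code, never modified here
    storage: dict = field(default_factory=dict)  # persistent, part of the state


def execute(account: Account, slot: int, value: int) -> tuple:
    """Model one execution: fresh memory, shared storage."""
    memory = bytearray()                       # volatile: starts empty every time
    previous = account.storage.get(slot, 0)    # unset storage slots read as zero
    memory.extend(value.to_bytes(32, "big"))   # scratch data lives only in memory
    account.storage[slot] = value              # survives after execution ends
    return previous, len(memory)


contract = Account(code=bytes.fromhex("6080604052"))
print(execute(contract, slot=0, value=1))  # (0, 32)
print(execute(contract, slot=0, value=2))  # (1, 32)
```

The second call sees the value written by the first call in storage, while its memory again starts from nothing.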

If you scan the raw bytecode byte by byte (two hexadecimal characters at a time), you can identify the opcodes that the EVM associates with particular actions. For example, the first 4 characters of the bytecode, 6080, split into two bytes (in hexadecimal):

0x60 0x80

The disassembled code is still very low-level and difficult to read, but as you will see, we can start making sense of it. Each instruction is made up of an opcode and optional arguments.
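
Scanning the bytecode this way is easy to automate. The following Python sketch is a minimal disassembler written for this post, not an official tool. The `disassemble` function and the `NAMES` table are my own, and `NAMES` only covers the opcodes that occur in our constructor. The one subtlety is that PUSH1 to PUSH32 (0x60 to 0x7f) are followed by 1 to 32 bytes of data that must not be read as opcodes.

```python
# Names for the handful of opcodes that appear in our constructor bytecode.
NAMES = {
    0x00: "STOP", 0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY",
    0x50: "POP", 0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
}


def disassemble(hex_code: str) -> list:
    code = bytes.fromhex(hex_code)
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:           # PUSH1..PUSH32
            size = op - 0x5F             # number of data bytes that follow
            data = code[pc + 1 : pc + 1 + size]
            out.append(f"PUSH{size} 0x{data.hex()}")
            pc += 1 + size
            continue
        if 0x80 <= op <= 0x8F:           # DUP1..DUP16
            out.append(f"DUP{op - 0x7F}")
        elif 0x90 <= op <= 0x9F:         # SWAP1..SWAP16
            out.append(f"SWAP{op - 0x8F}")
        else:
            out.append(NAMES.get(op, f"UNKNOWN_0x{op:02x}"))
        pc += 1
    return out


# The start of the "Binary" output above, up to the constructor's RETURN.
constructor = "6080604052348015600f57600080fd5b506001600081905550603f8060256000396000f3"
for line in disassemble(constructor):
    print(line)
```

The first line printed is `PUSH1 0x80`, matching the `0x60 0x80` pair above. I stop at the RETURN (`f3`) on purpose: what follows in the binary is the runtime code and the metadata (the `auxdata` in the assembly output), and the metadata is not meant to be decoded as instructions.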

Opcodes

Opcodes are low-level stack operators that tell the EVM what to do in an instruction. You can think of opcodes as the basic arithmetic operations used in algebra. Before we get started on our ambitious endeavour of deconstructing the bytecode, you’re going to need a basic tool set for understanding individual opcodes such as PUSH, ADD, SWAP, DUP, etc. An opcode, in the end, can only push or consume items from the EVM’s stack, memory, or storage belonging to the contract. That’s it.

The following table lists the EVM’s opcode instruction set. The gas costs are a snapshot; several of them have changed across network upgrades.

Opcode | Name | Description | Extra Info | Gas
--- | --- | --- | --- | ---
0x00 | STOP | Halts execution | | 0
0x01 | ADD | Addition operation | | 3
0x02 | MUL | Multiplication operation | | 5
0x03 | SUB | Subtraction operation | | 3
0x04 | DIV | Integer division operation | | 5
0x05 | SDIV | Signed integer division operation (truncated) | | 5
0x06 | MOD | Modulo remainder operation | | 5
0x07 | SMOD | Signed modulo remainder operation | | 5
0x08 | ADDMOD | Modulo addition operation | | 8
0x09 | MULMOD | Modulo multiplication operation | | 8
0x0a | EXP | Exponential operation | | 10*
0x0b | SIGNEXTEND | Extend length of two’s complement signed integer | | 5
0x0c - 0x0f | Unused | Unused | |
0x10 | LT | Less-than comparison | | 3
0x11 | GT | Greater-than comparison | | 3
0x12 | SLT | Signed less-than comparison | | 3
0x13 | SGT | Signed greater-than comparison | | 3
0x14 | EQ | Equality comparison | | 3
0x15 | ISZERO | Simple not operator | | 3
0x16 | AND | Bitwise AND operation | | 3
0x17 | OR | Bitwise OR operation | | 3
0x18 | XOR | Bitwise XOR operation | | 3
0x19 | NOT | Bitwise NOT operation | | 3
0x1a | BYTE | Retrieve single byte from word | | 3
0x1b | SHL | Shift Left | EIP 145 | 3
0x1c | SHR | Logical Shift Right | EIP 145 | 3
0x1d | SAR | Arithmetic Shift Right | EIP 145 | 3
0x20 | KECCAK256 | Compute Keccak-256 hash | | 30*
0x21 - 0x2f | Unused | Unused | |
0x30 | ADDRESS | Get address of currently executing account | | 2
0x31 | BALANCE | Get balance of the given account | | 700
0x32 | ORIGIN | Get execution origination address | | 2
0x33 | CALLER | Get caller address | | 2
0x34 | CALLVALUE | Get deposited value by the instruction/transaction responsible for this execution | | 2
0x35 | CALLDATALOAD | Get input data of current environment | | 3
0x36 | CALLDATASIZE | Get size of input data in current environment | | 2*
0x37 | CALLDATACOPY | Copy input data in current environment to memory | | 3
0x38 | CODESIZE | Get size of code running in current environment | | 2
0x39 | CODECOPY | Copy code running in current environment to memory | | 3*
0x3a | GASPRICE | Get price of gas in current environment | | 2
0x3b | EXTCODESIZE | Get size of an account’s code | | 700
0x3c | EXTCODECOPY | Copy an account’s code to memory | | 700*
0x3d | RETURNDATASIZE | Pushes the size of the return data buffer onto the stack | EIP 211 | 2
0x3e | RETURNDATACOPY | Copies data from the return data buffer to memory | EIP 211 | 3
0x3f | EXTCODEHASH | Returns the keccak256 hash of a contract’s code | EIP 1052 | 700
0x40 | BLOCKHASH | Get the hash of one of the 256 most recent complete blocks | | 20
0x41 | COINBASE | Get the block’s beneficiary address | | 2
0x42 | TIMESTAMP | Get the block’s timestamp | | 2
0x43 | NUMBER | Get the block’s number | | 2
0x44 | DIFFICULTY | Get the block’s difficulty | | 2
0x45 | GASLIMIT | Get the block’s gas limit | | 2
0x46 | CHAINID | Returns the current chain’s EIP-155 unique identifier | EIP 1344 | 2
0x47 | SELFBALANCE | Get balance of currently executing account | EIP 1884 | 5
0x48 | BASEFEE | Returns the value of the base fee of the current block it is executing in | EIP 3198 | 2
0x49 - 0x4f | Unused | | |
0x50 | POP | Remove word from stack | | 2
0x51 | MLOAD | Load word from memory | | 3*
0x52 | MSTORE | Save word to memory | | 3*
0x53 | MSTORE8 | Save byte to memory | | 3
0x54 | SLOAD | Load word from storage | | 800
0x55 | SSTORE | Save word to storage | | 20000**
0x56 | JUMP | Alter the program counter | | 8
0x57 | JUMPI | Conditionally alter the program counter | | 10
0x58 | GETPC | Get the value of the program counter prior to the increment | | 2
0x59 | MSIZE | Get the size of active memory in bytes | | 2
0x5a | GAS | Get the amount of available gas, including the corresponding reduction for the cost of this instruction | | 2
0x5b | JUMPDEST | Mark a valid destination for jumps | | 1
0x5c - 0x5f | Unused | | |
0x60 | PUSH1 | Place 1-byte item on stack | | 3
0x61 | PUSH2 | Place 2-byte item on stack | | 3
0x62 | PUSH3 | Place 3-byte item on stack | | 3
0x63 | PUSH4 | Place 4-byte item on stack | | 3
0x64 | PUSH5 | Place 5-byte item on stack | | 3
0x65 | PUSH6 | Place 6-byte item on stack | | 3
0x66 | PUSH7 | Place 7-byte item on stack | | 3
0x67 | PUSH8 | Place 8-byte item on stack | | 3
0x68 | PUSH9 | Place 9-byte item on stack | | 3
0x69 | PUSH10 | Place 10-byte item on stack | | 3
0x6a | PUSH11 | Place 11-byte item on stack | | 3
0x6b | PUSH12 | Place 12-byte item on stack | | 3
0x6c | PUSH13 | Place 13-byte item on stack | | 3
0x6d | PUSH14 | Place 14-byte item on stack | | 3
0x6e | PUSH15 | Place 15-byte item on stack | | 3
0x6f | PUSH16 | Place 16-byte item on stack | | 3
0x70 | PUSH17 | Place 17-byte item on stack | | 3
0x71 | PUSH18 | Place 18-byte item on stack | | 3
0x72 | PUSH19 | Place 19-byte item on stack | | 3
0x73 | PUSH20 | Place 20-byte item on stack | | 3
0x74 | PUSH21 | Place 21-byte item on stack | | 3
0x75 | PUSH22 | Place 22-byte item on stack | | 3
0x76 | PUSH23 | Place 23-byte item on stack | | 3
0x77 | PUSH24 | Place 24-byte item on stack | | 3
0x78 | PUSH25 | Place 25-byte item on stack | | 3
0x79 | PUSH26 | Place 26-byte item on stack | | 3
0x7a | PUSH27 | Place 27-byte item on stack | | 3
0x7b | PUSH28 | Place 28-byte item on stack | | 3
0x7c | PUSH29 | Place 29-byte item on stack | | 3
0x7d | PUSH30 | Place 30-byte item on stack | | 3
0x7e | PUSH31 | Place 31-byte item on stack | | 3
0x7f | PUSH32 | Place 32-byte (full word) item on stack | | 3
0x80 | DUP1 | Duplicate 1st stack item | | 3
0x81 | DUP2 | Duplicate 2nd stack item | | 3
0x82 | DUP3 | Duplicate 3rd stack item | | 3
0x83 | DUP4 | Duplicate 4th stack item | | 3
0x84 | DUP5 | Duplicate 5th stack item | | 3
0x85 | DUP6 | Duplicate 6th stack item | | 3
0x86 | DUP7 | Duplicate 7th stack item | | 3
0x87 | DUP8 | Duplicate 8th stack item | | 3
0x88 | DUP9 | Duplicate 9th stack item | | 3
0x89 | DUP10 | Duplicate 10th stack item | | 3
0x8a | DUP11 | Duplicate 11th stack item | | 3
0x8b | DUP12 | Duplicate 12th stack item | | 3
0x8c | DUP13 | Duplicate 13th stack item | | 3
0x8d | DUP14 | Duplicate 14th stack item | | 3
0x8e | DUP15 | Duplicate 15th stack item | | 3
0x8f | DUP16 | Duplicate 16th stack item | | 3
0x90 | SWAP1 | Exchange 1st and 2nd stack items | | 3
0x91 | SWAP2 | Exchange 1st and 3rd stack items | | 3
0x92 | SWAP3 | Exchange 1st and 4th stack items | | 3
0x93 | SWAP4 | Exchange 1st and 5th stack items | | 3
0x94 | SWAP5 | Exchange 1st and 6th stack items | | 3
0x95 | SWAP6 | Exchange 1st and 7th stack items | | 3
0x96 | SWAP7 | Exchange 1st and 8th stack items | | 3
0x97 | SWAP8 | Exchange 1st and 9th stack items | | 3
0x98 | SWAP9 | Exchange 1st and 10th stack items | | 3
0x99 | SWAP10 | Exchange 1st and 11th stack items | | 3
0x9a | SWAP11 | Exchange 1st and 12th stack items | | 3
0x9b | SWAP12 | Exchange 1st and 13th stack items | | 3
0x9c | SWAP13 | Exchange 1st and 14th stack items | | 3
0x9d | SWAP14 | Exchange 1st and 15th stack items | | 3
0x9e | SWAP15 | Exchange 1st and 16th stack items | | 3
0x9f | SWAP16 | Exchange 1st and 17th stack items | | 3
0xa0 | LOG0 | Append log record with no topics | | 375
0xa1 | LOG1 | Append log record with one topic | | 750
0xa2 | LOG2 | Append log record with two topics | | 1125
0xa3 | LOG3 | Append log record with three topics | | 1500
0xa4 | LOG4 | Append log record with four topics | | 1875
0xa5 - 0xaf | Unused | | |
0xb0 | JUMPTO | Tentative (libevmasm has different numbers) | EIP 615 |
0xb1 | JUMPIF | Tentative | EIP 615 |
0xb2 | JUMPSUB | Tentative | EIP 615 |
0xb4 | JUMPSUBV | Tentative | EIP 615 |
0xb5 | BEGINSUB | Tentative | EIP 615 |
0xb6 | BEGINDATA | Tentative | EIP 615 |
0xb8 | RETURNSUB | Tentative | EIP 615 |
0xb9 | PUTLOCAL | Tentative | EIP 615 |
0xba | GETLOCAL | Tentative | EIP 615 |
0xbb - 0xe0 | Unused | | |
0xe1 | SLOADBYTES | Only referenced in pyethereum | |
0xe2 | SSTOREBYTES | Only referenced in pyethereum | |
0xe3 | SSIZE | Only referenced in pyethereum | |
0xe4 - 0xef | Unused | | |
0xf0 | CREATE | Create a new account with associated code | | 32000
0xf1 | CALL | Message-call into an account | | Complicated
0xf2 | CALLCODE | Message-call into this account with alternative account’s code | | Complicated
0xf3 | RETURN | Halt execution returning output data | | 0
0xf4 | DELEGATECALL | Message-call into this account with an alternative account’s code, but persisting the current values for sender and value | | Complicated
0xf5 | CREATE2 | Create a new account and set creation address to sha3(sender + sha3(init code)) % 2**160 | |
0xf6 - 0xf9 | Unused | | |
0xfa | STATICCALL | Similar to CALL, but does not modify state | | 40
0xfb | Unused | | |
0xfc | TXEXECGAS | Not in yellow paper FIXME | |
0xfd | REVERT | Stop execution and revert state changes, without consuming all provided gas and providing a reason | | 0
0xfe | INVALID | Designated invalid instruction | | 0
0xff | SELFDESTRUCT | Halt execution and register account for later deletion | | 5000*
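
To get a feel for the Gas column, here is a small worked example in Python. It adds up the static costs from the table for the six instructions that the compiler emitted for `a = 1` in the assembly output earlier (0x01, 0x00, dup2, swap1, sstore, pop). Treat the result as a rough estimate under the table’s numbers only: the `static_gas` helper is hypothetical, the SSTORE cost is marked `**` because it depends on the circumstances, and the fixed cost of the transaction itself is ignored.

```python
# Static gas costs copied from the table above.
GAS = {"PUSH1": 3, "DUP2": 3, "SWAP1": 3, "SSTORE": 20000, "POP": 2}


def static_gas(program: list) -> int:
    """Sum the table's gas cost for each instruction name in the program."""
    return sum(GAS[op] for op in program)


# The instructions emitted for `a = 1` in our constructor.
assignment = ["PUSH1", "PUSH1", "DUP2", "SWAP1", "SSTORE", "POP"]
print(static_gas(assignment))  # 20014
```

Almost all of the cost comes from the single SSTORE. Writing to storage is far more expensive than shuffling values on the stack.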

Instructions

Each line in the disassembled code above is an instruction for the EVM to execute. Each instruction contains an opcode. For example, 0x60 0x80 translates to:

 PUSH1 0x80
  |     |     
  |     Hex value for push.
  Opcode.

Deconstructing our contract

The compiled code contains a lot of boilerplate. The code we actually wrote is compiled under “tag_1”:

tag_1:
  pop
    /* "hello.sol":99:100  1 */
  0x01
    /* "hello.sol":95:96  a */
  0x00
    /* "hello.sol":95:100  a = 1 */
  dup2
  swap1
  sstore
  pop
    /* "hello.sol":33:109  contract HelloWolrd{ ... */
  dataSize(sub_0)
  dup1
  dataOffset(sub_0)
  0x00
  codecopy
  0x00
  return
stop

The assignment a = 1 is represented by the bytecode “6001600081905550”. Let’s break it up into one instruction per line:

60 01
60 00
81
90
55
50

The EVM is basically a loop that executes each instruction from top to bottom (unless an instruction jumps elsewhere). Let’s annotate the assembly code (indented under the label tag_1) with the corresponding bytecode to better see how they are associated:

tag_1:
  // 60 01
  0x1
  // 60 00
  0x0
  // 81
  dup2
  // 90
  swap1
  // 55
  sstore
  // 50
  pop

Note that 0x1 in the assembly code is actually a shorthand for push(0x1). This instruction pushes the number 1 onto the stack.

EVM: A Stack Machine

The EVM is a stack machine. Instructions might use values on the stack as arguments, and push values onto the stack as results. Let’s consider the operation add. Assume that there are two values on the stack:

[1 2]

When the EVM sees add, it adds the top 2 items together, and pushes the answer back onto the stack, resulting in:

[3]
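
The same step can be written as a few lines of Python. This is a sketch with a hypothetical helper named `op_add`. The list mirrors the bracket notation used here, with the top of the stack at index 0. One detail the small example hides is that EVM words are 256 bits wide, so addition wraps around modulo 2**256.

```python
WORD = 2**256  # EVM arithmetic wraps around at 256 bits


def op_add(stack: list) -> None:
    """ADD: pop the top two items and push their sum modulo 2**256."""
    a = stack.pop(0)   # index 0 is the top of the stack in this notation
    b = stack.pop(0)
    stack.insert(0, (a + b) % WORD)


stack = [1, 2]
op_add(stack)
print(stack)           # [3]

stack = [WORD - 1, 1]  # the largest word plus one wraps to zero
op_add(stack)
print(stack)           # [0]
```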

We’ll write the stack with [], with the top of the stack on the left, and notate the contract storage with {}:

// Nothing in storage.
store: {}
// The value 0x1 is stored at the position 0x0.
store: { 0x0 => 0x1 }

Let’s now look at some real bytecode. We’ll simulate the bytecode sequence “6001600081905550” as the EVM would, and print out the machine state after each instruction:

// 60 01: pushes 1 onto stack
0x1
  stack: [0x1]
// 60 00: pushes 0 onto stack
0x0
  stack: [0x0 0x1]
// 81: duplicate the second item on the stack
dup2
  stack: [0x1 0x0 0x1]
// 90: swap the top two items
swap1
  stack: [0x0 0x1 0x1]
// 55: store the value 0x1 at position 0x0
// This instruction consumes the top 2 items
sstore
  stack: [0x1]
  store: { 0x0 => 0x1 }
// 50: pop (throw away the top item)
pop
  stack: []
  store: { 0x0 => 0x1 }

The end. The stack is empty, and there’s one item in storage. The EVM has now executed our contract successfully.
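
If you would like to replay this trace yourself, the following Python sketch does it. It is a toy interpreter written for this post, not a real EVM. The `run` function is my own and supports only the five opcodes in this sequence (PUSH1, DUP2, SWAP1, SSTORE, POP), with no gas accounting. As in the trace above, the top of the stack is on the left (index 0).

```python
def run(hex_code: str):
    code = bytes.fromhex(hex_code)
    stack, store, pc = [], {}, 0   # stack[0] is the top of the stack
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == 0x60:             # PUSH1: push the byte that follows
            stack.insert(0, code[pc])
            pc += 1
        elif op == 0x81:           # DUP2: copy the second item to the top
            stack.insert(0, stack[1])
        elif op == 0x90:           # SWAP1: exchange the top two items
            stack[0], stack[1] = stack[1], stack[0]
        elif op == 0x55:           # SSTORE: pop position, pop value, write storage
            position = stack.pop(0)
            value = stack.pop(0)
            store[position] = value
        elif op == 0x50:           # POP: throw away the top item
            stack.pop(0)
        else:
            raise ValueError(f"opcode 0x{op:02x} is not supported by this sketch")
        shown_stack = " ".join(hex(v) for v in stack)
        shown_store = ", ".join(f"{hex(k)} => {hex(v)}" for k, v in store.items())
        print(f"{op:02x}  stack: [{shown_stack}]  store: {{{shown_store}}}")
    return stack, store


final_stack, final_store = run("6001600081905550")
```

The function prints one line per executed instruction and ends with an empty stack and the value 0x1 at storage position 0x0, the same final state as the hand-written trace.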

Conclusion

It’s definitely a good investment to learn how a high-level language like Solidity runs on the Ethereum Virtual Machine (EVM). Knowing the EVM well will help you build better tools for yourself and others, and debug your code more effectively.