IPFS is a distributed system for storing and accessing files, websites, applications, and data. IPFS allows users to host and receive content in a manner similar to BitTorrent. Instead of relying on a centrally located server, IPFS is built around a decentralized network of user-operators who each hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers can find and request that content from any node that has it using a distributed hash table (DHT).
History of IPFS
IPFS began in 2015 as an effort by Protocol Labs to build a system that could fundamentally change the way information is transmitted across the globe and pave the way for a distributed, more resilient web. IPFS has grown to support an array of different use cases and is improving information management for industries across the spectrum: from disintermediating the music industry to unblocking weather risk protection for agribusiness. Currently, Protocol Labs’ projects include IPFS, the modular protocols and tools that support it, and Filecoin, among others. Between them, these tools serve thousands of organizations and millions of people.
IPFS stores data in various locations, making it possible to download a file from many locations that aren't managed by a single organization. This has the following advantages:
- Supports a resilient internet. If someone attacks Wikipedia’s web servers or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same webpages from somewhere else.
- Makes it harder to censor content. Because files on IPFS can come from many places, it’s harder for anyone (whether they’re states, corporations, or someone else) to block things. We hope IPFS can help provide ways to circumvent actions like these when they happen.
- Can speed up the web when you’re far away or disconnected. If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster. This is especially valuable if your community is networked locally but doesn’t have a good connection to the wider internet. (Well-funded organizations with technical expertise do this today by using multiple data centers or content distribution networks (CDNs). IPFS hopes to make this possible for everyone.)
That last point is actually where IPFS gets its full name: the InterPlanetary File System. We’re striving to build a system that works across places as disconnected or as far apart as planets.
Instead of referring to data (photos, articles, videos) by location, that is, by which server it is stored on, IPFS refers to everything by the hash of that data, meaning by the content itself. The idea is that if you want to access a particular page from your browser, IPFS will ask the network, “Does anyone have the data that corresponds to this hash?” Any node that holds the corresponding data will return it, allowing you to access it from anywhere (and potentially even offline).
IPFS uses content addressing where HTTP uses location-based URLs. Instead of creating identifiers that address artifacts by location, we address them by some representation of the content itself. This content-addressable approach separates the “what” from the “where,” so data and files can be stored and served from anywhere by anyone. It works by hashing a file cryptographically, which yields a very small and reproducible representation of it. Because it is computationally infeasible to create a different file with the same hash, no one can substitute other content under that address. Instead of talking to a server, you are asking for a specific piece of data.
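To make the idea concrete, here is a minimal Python sketch of a content-addressed store. It assumes plain SHA-256 hex digests as addresses. Real IPFS addresses are CIDs, which wrap the hash in extra encoding, so the strings below will not match what an IPFS node produces. The names `content_address` and `ContentStore` are hypothetical.

```python
import hashlib
from typing import Dict


def content_address(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a stand-in for a real CID."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy key-value store whose keys are derived from the values themselves."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash what came back: data that does not match its address is rejected.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"hello, interplanetary web")
# Identical bytes always produce the identical address, whoever stores them.
print(addr == content_address(b"hello, interplanetary web"))  # True
print(store.get(addr))  # b'hello, interplanetary web'
```

Because the address is derived from the bytes, a reader can verify what it received without trusting whoever served it.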
HTTP vs. IPFS
HTTP has a helpful property: the location is part of the identifier, which makes it easy to find the computers hosting the file and talk to them. This generally works very well, but not when you are offline or in large distributed scenarios where you want to minimize load across the network. It also means that if a particular server is down, the content it hosts is unavailable.
In IPFS you separate the steps into two parts:
- Identify the file with content addressing, via the hash.
- Ask who has it. Once you have the hash, you ask the network you’re connected to, “Who has this content (hash)?”, then connect to the corresponding nodes and download it.
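The two steps can be sketched with a toy, in-memory stand-in for a DHT. IPFS’s DHT is based on Kademlia, which measures closeness between IDs with XOR, and this sketch keeps only that idea. It makes several assumptions:

- Every peer can see every other peer.
- Networking and iterative lookups are skipped.
- A peer’s ID is the SHA-256 of its name.

`ToyDHT`, `provide`, and `find_providers` are hypothetical names, not a libp2p API.

```python
import hashlib
from typing import Dict, List, Set


def key_of(value: bytes) -> int:
    """Map peer names and content into one 256-bit ID space (SHA-256 as an integer)."""
    return int.from_bytes(hashlib.sha256(value).digest(), "big")


class ToyDHT:
    """Single-process stand-in for a Kademlia-style DHT with a global view of all peers."""

    def __init__(self, peer_names: List[str], k: int = 3) -> None:
        self.k = k
        self.peer_ids = {name: key_of(name.encode()) for name in peer_names}
        # Provider records held by each peer: content key -> names of peers that have it.
        self.records: Dict[str, Dict[int, Set[str]]] = {name: {} for name in peer_names}

    def closest_peers(self, key: int) -> List[str]:
        # Kademlia measures distance with XOR; the k smallest distances win.
        return sorted(self.peer_ids, key=lambda name: self.peer_ids[name] ^ key)[: self.k]

    def provide(self, provider: str, content: bytes) -> None:
        """Announce 'I have this content' to the peers closest to its key."""
        key = key_of(content)
        for peer in self.closest_peers(key):
            self.records[peer].setdefault(key, set()).add(provider)

    def find_providers(self, key: int) -> Set[str]:
        """Ask the peers closest to the key who has the content."""
        found: Set[str] = set()
        for peer in self.closest_peers(key):
            found |= self.records[peer].get(key, set())
        return found


dht = ToyDHT([f"peer-{i}" for i in range(10)])
video = b"some video bytes"

dht.provide("peer-7", video)       # peer-7 announces that it holds the video
wanted = key_of(video)             # step 1: identify the content by its hash
print(dht.find_providers(wanted))  # step 2: ask who has it -> {'peer-7'}
```

Storing provider records on the peers whose IDs are closest to the content key means that anyone who knows the hash can compute where to ask, without a central index.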
The result is a peer-to-peer overlay network in which content is located by what it is rather than where it is. It is not tied to a particular physical location, and any node holding a copy can serve it. To learn more, check out this overview of how IPFS works or watch this video to learn how IPFS deals with files.
How IPFS Works
IPFS is essentially a P2P system for retrieving and sharing IPFS objects. An IPFS object is a data structure with two fields:
- Data: a blob of unstructured binary data of size < 256 kB.
- Links: an array of Link structures. These are links to other IPFS objects.
A Link structure has three data fields:
- Name: the name of the Link.
- Hash: the hash of the linked IPFS object.
- Size: the cumulative size of the linked IPFS object, including all objects reachable through its links.
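As an illustration, the two structures above can be modeled as Python dataclasses. This is a simplified sketch. The field names mirror the description. The hashing (SHA-256 over a Python `repr`) and the size calculation (raw data bytes only) are assumptions, not the wire format IPFS actually uses. `link_to` is a hypothetical helper.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    name: str  # the name of the link, e.g. a file name inside a directory
    hash: str  # hash of the linked object
    size: int  # cumulative size of the linked object and everything reachable from it


@dataclass
class IPFSObject:
    data: bytes  # unstructured binary data
    links: List[Link] = field(default_factory=list)

    def cumulative_size(self) -> int:
        # Simplification: counts raw data bytes, not the size of the encoded object.
        return len(self.data) + sum(link.size for link in self.links)

    def digest(self) -> str:
        # Toy serialization: hash a repr of the data and every link's fields.
        fields = (self.data, [(lk.name, lk.hash, lk.size) for lk in self.links])
        return hashlib.sha256(repr(fields).encode()).hexdigest()


def link_to(name: str, obj: IPFSObject) -> Link:
    return Link(name=name, hash=obj.digest(), size=obj.cumulative_size())


readme = IPFSObject(data=b"# readme\n")  # 9 bytes
logo = IPFSObject(data=b"\x89PNG...")    # 7 bytes
folder = IPFSObject(data=b"", links=[link_to("README.md", readme), link_to("logo.png", logo)])
print(folder.cumulative_size())  # 16
```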
IPFS objects are normally referred to by their Base58-encoded hash. For instance, let’s take a look at the IPFS object with hash QmarHSr9aSNaPSR6G9KFPbuLV9aEqJfTk1y9B8pdwqK4Rq using the IPFS command-line tool.
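Hashes that start with `Qm` are Base58-encoded multihashes, built from three parts:

- a one-byte code for the hash function (`0x12` for SHA2-256),
- a one-byte digest length (`0x20`, i.e. 32),
- then the digest itself.

The sketch below encodes a SHA-256 multihash that way. One caveat: it hashes the raw bytes, whereas the IPFS command-line tool hashes an encoded UnixFS node wrapping the file. The result therefore has the same shape as the hash above but will not match what the tool reports for the same file. `base58_encode` and `cidv0_from_block` are hypothetical helper names.

```python
import hashlib

# Bitcoin-style Base58 alphabet: no 0, O, I or l, to avoid look-alike characters.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(raw: bytes) -> str:
    """Encode bytes as Base58 by repeated division of the big-endian integer."""
    n = int.from_bytes(raw, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(BASE58_ALPHABET[rem])
    # Each leading zero byte is written as the first alphabet character, '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(out))


def cidv0_from_block(block: bytes) -> str:
    """Base58-encode a SHA-256 multihash: 0x12 = sha2-256, 0x20 = 32-byte digest."""
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(block).digest()
    return base58_encode(multihash)


cid = cidv0_from_block(b"hello")
print(cid[:2], len(cid))  # Qm 46
```

The fixed `0x12 0x20` prefix is why every hash in this format starts with `Qm` and is 46 characters long.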
Directed acyclic graphs (DAGs)
IPFS and many other distributed systems take advantage of a data structure called a directed acyclic graph, or DAG. Specifically, they use Merkle DAGs, which are DAGs where each node has a unique identifier that is a hash of the node’s contents. Sound familiar? This refers back to the content-addressing concept covered earlier. Put another way: identifying a data object (like a Merkle DAG node) by the value of its hash is content addressing. Check out our guide to Merkle DAGs for a more in-depth treatment of this topic.
IPFS uses a Merkle DAG that is optimized for representing directories and files, but you can structure a Merkle DAG in many different ways. For example, Git uses a Merkle DAG that has many versions of your repo inside of it.
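A directory tree makes a small Merkle DAG. This sketch uses a toy encoding, not the one IPFS uses: SHA-256 over a `file:` or `dir:` prefix plus sorted `name=id;` entries. A directory’s ID depends on its children’s IDs, so a change anywhere propagates to the root, while unchanged subtrees keep their IDs. `merkle_id` is a hypothetical helper.

```python
import hashlib
from typing import Dict, Union

# A node is either file contents (bytes) or a directory mapping names to child nodes.
Node = Union[bytes, Dict[str, "Node"]]


def merkle_id(node: Node) -> str:
    """ID = hash of the node's contents; a directory's contents are its children's IDs."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    h = hashlib.sha256(b"dir:")
    for name in sorted(node):  # sort so the ID does not depend on insertion order
        h.update(f"{name}={merkle_id(node[name])};".encode())
    return h.hexdigest()


site_v1 = {"index.html": b"<h1>Hi</h1>", "img": {"cat.png": b"\x89PNG cat"}}
site_v2 = {"index.html": b"<h1>Hi</h1>", "img": {"cat.png": b"\x89PNG dog"}}

# Changing one leaf changes the IDs of all its ancestors, up to the root...
print(merkle_id(site_v1) != merkle_id(site_v2))  # True
# ...while untouched subtrees keep their IDs and can be shared between versions.
print(merkle_id(site_v1["index.html"]) == merkle_id(site_v2["index.html"]))  # True
```

Sharing unchanged subtrees is the same property that lets Git store many versions of a repository without copying every file each time.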
To build a Merkle DAG representation of your content, IPFS often first splits it into blocks. Splitting it into blocks means that different parts of the file can come from different sources and be authenticated quickly. (If you’ve ever used BitTorrent, you may have noticed that when you download a file, BitTorrent can fetch it from multiple peers at once; this is the same idea.)
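Chunking and per-block verification might look like the sketch below. It assumes fixed-size chunks of 262,144 bytes (256 KiB), which matches the default chunker of Kubo, the Go implementation of IPFS. The flat list of block hashes standing in for the root node is a simplification of the real DAG layout. All function names are hypothetical.

```python
import hashlib
from typing import List

# Assumption: fixed-size chunks of 262,144 bytes, matching Kubo's default chunker.
CHUNK_SIZE = 256 * 1024


def split_into_blocks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def block_hashes(blocks: List[bytes]) -> List[str]:
    return [hashlib.sha256(b).hexdigest() for b in blocks]


def root_id(hashes: List[str]) -> str:
    # Simplified root: one hash over the ordered list of block hashes.
    return hashlib.sha256("".join(hashes).encode()).hexdigest()


def verify_block(index: int, block: bytes, hashes: List[str]) -> bool:
    # A block fetched from any peer is checked on its own, before the rest arrives.
    return hashlib.sha256(block).hexdigest() == hashes[index]


data = bytes(i % 251 for i in range(768_000))  # 768,000 bytes of sample data
blocks = split_into_blocks(data)
hashes = block_hashes(blocks)
root = root_id(hashes)  # the single ID a downloader would start from

print([len(b) for b in blocks])  # [262144, 262144, 243712]
print(verify_block(1, blocks[1], hashes))  # True, even if block 1 came from another peer
print(verify_block(1, b"corrupted", hashes))  # False
```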
Because changing an object changes its hash (content addressing) and thus its address, IPFS needs some way to provide a stable address and, ideally, a human-readable name. This is handled by IPNS (the InterPlanetary Name System), which supports the creation of:
- mutable pointers to objects
- human-readable names
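IPNS is based on SFS (the Self-Certifying File System). It is a PKI namespace: a name is simply the hash of a public key. Records are signed by the corresponding private key, so they can be distributed anywhere and still be verified by anyone who knows the name.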
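The self-certifying idea can be sketched in a few lines. To stay within the Python standard library, this uses textbook RSA with two Mersenne primes as a toy key pair. That scheme is unpadded and far too small to be secure. Real IPNS keys are typically Ed25519, and records use a defined protobuf format. The record fields `value` and `sequence` are loosely modeled on IPNS records. They, the placeholder paths, and all function names are assumptions for illustration. Python 3.8 or later is needed for `pow` with exponent `-1`.

```python
import hashlib
import json

# Toy RSA key pair from two Mersenne primes. Unpadded and far too small to be
# secure: for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the name's owner


def public_key_bytes() -> bytes:
    return f"{N}:{E}".encode()


def ipns_name(pub: bytes) -> str:
    # The name is simply the hash of the public key.
    return hashlib.sha256(pub).hexdigest()


def record_digest(record: dict, modulus: int) -> int:
    payload = json.dumps(record, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % modulus


def sign(record: dict) -> int:
    # Only the holder of the private exponent D can produce this value.
    return pow(record_digest(record, N), D, N)


def verify(name: str, pub: bytes, record: dict, signature: int) -> bool:
    # Anyone can check a record, wherever it came from: the key must hash to the name...
    if ipns_name(pub) != name:
        return False
    n, e = (int(part) for part in pub.decode().split(":"))
    # ...and the signature must match the record under that key.
    return pow(signature, e, n) == record_digest(record, n)


pub = public_key_bytes()
name = ipns_name(pub)
record = {"value": "/ipfs/<cid-of-version-2>", "sequence": 2}  # placeholder path
sig = sign(record)
print(verify(name, pub, record, sig))  # True
forged = dict(record, value="/ipfs/<some-other-cid>")
print(verify(name, pub, forged, sig))  # False
```

The name never changes, but the owner can publish a new signed record with a higher `sequence` to point it at new content. This is what makes it a mutable pointer.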
Using IPFS with blockchains is one of the most interesting use cases for IPFS. A blockchain has a natural DAG structure, since each block links to the previous block by its hash. The Ethereum blockchain also has an associated state database with a Merkle Patricia trie structure that can be emulated using IPFS objects.
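A hash-linked chain can be sketched with each block referring to its predecessor by hash. This is a toy model, not Ethereum’s block format: blocks are JSON-serialized dicts hashed with SHA-256. It only shows why the structure is naturally a DAG and why altering an old block is detectable.

```python
import hashlib
import json
from typing import List, Optional


def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def make_block(prev: Optional[dict], txs: List[str]) -> dict:
    # Each block names its predecessor by hash, so the chain is a (linear) Merkle DAG.
    return {"prev": block_hash(prev) if prev is not None else None, "txs": txs}


def chain_is_valid(chain: List[dict]) -> bool:
    return all(chain[i]["prev"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))


genesis = make_block(None, ["mint 50 to alice"])
b1 = make_block(genesis, ["alice pays bob 10"])
b2 = make_block(b1, ["bob pays carol 3"])
chain = [genesis, b1, b2]
print(chain_is_valid(chain))  # True

# Rewriting history breaks the hash link from the following block.
tampered = [genesis, {**b1, "txs": ["alice pays bob 40"]}, b2]
print(chain_is_valid(tampered))  # False
```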
A key point here is the difference between storing data on the blockchain and storing hashes of data on the blockchain. On the Ethereum blockchain, users pay a large fee for storing data in the associated state database, in order to minimize the bloat of the state database (so-called “blockchain bloat”). For larger pieces of data, the standard design pattern is therefore to store not the data itself but an IPFS hash of the data in the state database.
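The hash-on-chain pattern can be simulated in Python. `HashRegistry` is a hypothetical stand-in for contract state, and `off_chain` is a dict standing in for IPFS. A real deployment would store the digest in a smart contract (for example as a `bytes32` value) and keep the data on IPFS.

```python
import hashlib
from typing import Dict


class HashRegistry:
    """Hypothetical stand-in for contract state: one 32-byte digest per key, never the data."""

    def __init__(self) -> None:
        self.state: Dict[str, bytes] = {}

    def register(self, key: str, data: bytes) -> None:
        self.state[key] = hashlib.sha256(data).digest()

    def check(self, key: str, data: bytes) -> bool:
        return self.state.get(key) == hashlib.sha256(data).digest()


off_chain: Dict[str, bytes] = {}  # stands in for IPFS: content stored under its hash
registry = HashRegistry()

document = b"x" * 1_000_000  # a 1 MB file; "deed-42" below is a made-up key
registry.register("deed-42", document)
off_chain[hashlib.sha256(document).hexdigest()] = document

print(len(registry.state["deed-42"]))  # 32 bytes on chain instead of 1,000,000
# The on-chain digest doubles as the address for fetching the data off chain.
fetched = off_chain[registry.state["deed-42"].hex()]
print(registry.check("deed-42", fetched))  # True
```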
Usually, blockchains distinguish between what is in the global ledger replicated by every miner (that is, data stored in the chain itself) and data that might be referenced within the chain but is not replicated across all nodes and must be retrieved and checked separately (because it is too large). If the blockchain and its associated state database are already embodied in IPFS, the distinction between storing a hash on the blockchain and storing the data on the blockchain becomes somewhat blurred, since everything is stored in IPFS anyway.
As we all know, HTTP is one of the most successful systems for distributing files and is used worldwide. Shifting toward peer-to-peer data distribution is not an easy task. The concepts now exist, and it is up to us which direction we take them. If the world is heading toward peer-to-peer systems, DHTs and Merkle DAGs are a great place to start. IPFS, alongside Filecoin, is taking a huge leap toward decentralized storage of personal data, a shift in which all of us should take part.