Minimum Viable Block Chain
Cryptocurrencies, and Bitcoin in particular, have been getting a lot of attention from just about every angle: regulation, governance, taxation, technology, product innovation, and the list goes on. The very concept of a “peer-to-peer (decentralized) electronic cash system” turns many of our previously held assumptions about money and finance on their head.
That said, putting the digital currency aspects aside, an arguably even more interesting and far-reaching innovation is the underlying block chain technology. Regardless of what you think of Bitcoin, or its [altcoin derivatives][altcoin], as a currency and a store of value, behind the scenes they are all operating on the same basic block chain principles [outlined by Satoshi Nakamoto][bitcoin]:
> We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power... The network itself requires minimal structure.
The block chain is agnostic to any “currency”. In fact, it can (and will) be adapted to power many other use cases. As a result, it pays to understand the how and the why behind the “minimum viable block chain”:
- What follows is not an analysis of the Bitcoin block chain. In fact, I intentionally omit mentioning both the currency aspects, and the many additional features that the Bitcoin block chain is using in production today.
- What follows is an attempt to explain, from the ground up, why the particular pieces (digital signatures, proof-of-work, transaction blocks) are needed, and how they all come together to form the “minimum viable block chain” with all of its remarkable properties.
- Securing transactions with triple-entry bookkeeping
- Securing transactions with PKI
- Balance = Σ(receipts)
- Multi-party transfers & verification
- Double-spending and distributed consensus
  - Requirements for a distributed consensus network
  - Protecting the network from Sybil attacks
  - Proof-of-work as a participation requirement
- Building the minimum viable block chain
  - Adding "blocks" & transaction fee incentives
  - Racing to claim the transaction fees
  - Resolving chain conflicts
  - Blocks are never final
- Properties of the (minimum viable) block chain
Public-key cryptography, also known as asymmetric cryptography, refers to a cryptographic algorithm which requires two separate keys, one of which is secret (or private) and one of which is public. Although different, the two parts of this key pair are mathematically linked. The public key is used to encrypt plaintext or to verify a digital signature; whereas the private key is used to decrypt ciphertext or to create a digital signature.

The original intent behind using a third party (Chuck) was to ensure three properties:

- **Authentication:** a malicious party can't masquerade as someone else.
- **Non-repudiation:** participants can't claim that the transaction did not happen after the fact.
- **Integrity:** the transaction receipt can't be modified after the fact.

Turns out, public key cryptography can satisfy all of the above requirements. Briefly, the workflow is as follows:

1. Both Alice and Bob generate a set of public-private keypairs.
2. Both Alice and Bob publish their public keys to the world.
3. Alice writes a transaction receipt in plaintext.
4. Alice encrypts the plaintext of the transaction using her private key.
5. Alice prepends a plaintext "signed by" note to the ciphertext.
6. Both Alice and Bob store the resulting output.

Note that step #5 is only required when many parties are involved: if you don't know who signed the message then you don't know whose public key you should be using to decrypt it. This will become relevant very soon...

This may seem like a lot of work for no particular reason, but let's examine the properties of our new receipt:

1. Bob doesn't know Alice's private key, but that doesn't matter because he can look up her public key (which is shared with the world) and use it to decrypt the ciphertext of the transaction.
2. Alice is not really "encrypting" the contents of the transaction.
   Instead, by using her private key to encode the transaction she is "signing" it: anyone can decrypt the ciphertext with her public key, and because she is the only one in possession of the private key, this mechanism guarantees that only she could have generated the ciphertext of the transaction.

How does Bob, or anyone else for that matter, get Alice's public key? There are many ways to handle distribution of public keys - e.g. Alice publishes it on her website. We'll assume that some such mechanism is in place.

As a result, the use of public key infrastructure (PKI) fulfills all of our earlier requirements:

1. Bob can use Alice's public key to authenticate the signed transaction by decrypting the ciphertext.
2. Only Alice knows her private key, hence Alice can't deny that the transaction took place - she signed it.
3. Neither Bob nor anyone else can fake or modify a transaction without access to Alice's private key.

Note that for #2, Alice can deny that she is the true owner of the public-private keypair in question - i.e. someone is faking her identity in #1. The keypair-to-identity association is something your key distribution mechanism needs to account for.

**Both Alice and Bob simply store a copy of the signed transaction and the need for an intermediary is eliminated.** The "magic" of public key cryptography is a perfect match for their two-party barter system.

To keep things simple, we'll assume that all transfers "spend" the full value of the transaction being transferred. It's not too hard to extend this system to allow fractional transfers, but that's unnecessary complexity at this point.

With the new transaction in place, John makes a copy of the encrypted ledger for his safekeeping (now there are three copies) and runs some checks to verify its integrity. First, John fetches Alice's and Bob's public keys and verifies the first three transactions.
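The sign-and-verify workflow can be illustrated with a toy "textbook RSA" keypair. Everything below is a sketch: the primes are tiny and wildly insecure, and the helper names and receipt string are hypothetical; a real system would use a vetted cryptography library.

```python
import hashlib

# Toy "textbook RSA" keypair with tiny primes -- illustrative only,
# never secure in practice.
p, q, e = 1009, 1013, 17
n = p * q                          # public modulus
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

alice_public, alice_private = (n, e), (n, d)

def digest(receipt: str, modulus: int) -> int:
    """Reduce a SHA-256 digest of the receipt into the RSA modulus."""
    h = hashlib.sha256(receipt.encode()).digest()
    return int.from_bytes(h, "big") % modulus

def sign(receipt: str, private_key) -> int:
    """Alice "encrypts" the receipt digest with her private key."""
    n, d = private_key
    return pow(digest(receipt, n), d, n)

def verify(receipt: str, signature: int, public_key) -> bool:
    """Anyone can decrypt the signature with Alice's public key and
    compare it against their own hash of the receipt."""
    n, e = public_key
    return pow(signature, e, n) == digest(receipt, n)

receipt = "Alice pays Bob 2 chroms for one red stamp"
signature = sign(receipt, alice_private)

print(verify(receipt, signature, alice_public))       # True
print(verify(receipt + "!", signature, alice_public)) # False: receipt modified
```

Because only Alice holds the private exponent, a valid signature over the receipt could only have come from her, while anyone holding her public key can check it.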
Second, John verifies that Bob is transferring a "valid" transaction:

- The transaction that is being transferred is addressed to Bob.
- Bob has not previously transferred the same transaction to anyone else.

If all the checks pass, they complete the exchange and we can compute the new balances by traversing the ledger: Bob has a net zero balance, Alice has a debit of 2 chroms, and John has a credit of 2 chroms (courtesy of Alice).

Further, John can now take his new ledger to Alice and ask her for payment, and even though Alice wasn't present for their transaction, that's not a problem:

- Alice can verify the signature of the new transfer transaction using Bob's public key.
- Alice can verify that the transfer transaction references one of her own valid transactions with Bob.

The above transfer and verification process is a pretty remarkable property of the system! Note that to make it all work, we need two enabling technologies: (a) use of PKI, which enables digital signature verification, and (b) the receipt ledger, which enables us to look at the full transaction history to verify balances and to link previous transactions to enable the transfer.

Satisfied with their ingenuity, John and Bob part ways: Bob goes home with a new stamp and John with a new ledger. On the surface everything looks great, but **they've just exposed themselves to a challenging security problem...**
**Can you spot it?**

In CS speak, a two-party ledger provides "strong consistency", but growing the ledger beyond two parties requires some form of distributed consensus to mitigate double-spend.

**The simplest possible solution to this problem is to require that all parties listed in the ledger must be present at the time when each new transaction is made, such that everyone can update their books simultaneously.** An effective strategy for a small group, but not a scalable one for a large number of participants.

Note that the "how" of building a P2P network is a massive subject in its own right: protocols and signaling, traversing firewalls and NATs, bootstrapping, optimizing how updates are propagated, security, and so on. That said, the low-level mechanics of building such a network are out of scope of our discussion... we'll leave that as an exercise for the reader.

Turns out, [distributed consensus][consensus] is a well-studied problem in computer science, and one that offers some promising solutions. For example, [two-phase commit][2PC] (2PC) and [Paxos] both enable a mechanism where we only need a majority quorum (50%+) of participants to be present to safely commit a new transaction: as long as the majority has accepted the transaction, the remainder of the group is guaranteed to eventually converge on the same transaction history.

**That said, neither 2PC nor Paxos is sufficient on its own.** For example, how would either 2PC or Paxos know the total number of participants in our P2P stamp-collector network when new participants are joining on a daily basis and others are disappearing without notice? If one of the prior participants is offline, are they offline temporarily, or have they permanently left the network?
Similarly, there is another, even more challenging "[Sybil attack][sybil]" that we must account for: there is nothing stopping a malicious participant from creating many profiles to gain an unfair share of voting power within our P2P network.

If the number of participants in the system were fixed and their identities were authenticated and verified (i.e. a trusted network), then both 2PC and Paxos would work really well. Alas, that is simply not the case in our ever-changing stamp collector P2P network. Have we arrived at a dead end? Well, not quite…

**One obvious solution to our dilemma is to eliminate the "distributed" part from the problem statement.** Instead of building a P2P distributed system we could, instead, build a global registry of all stamp collectors that would record their account information, authenticate them and (try to) ensure that nobody is cheating by creating multiple identities, and most importantly, keep one shared copy of the ledger! Concretely, **we could build a website where these trades can take place, and the website would then take care of ensuring the integrity and correct ordering of all transactions by recording them in its centralized database.**

The above is a practical solution but, let's admit it, an unsatisfying one, since it forces us to forfeit the peer-to-peer nature of our ledger system. It places all of the trust in a single centralized system and opens up an entirely new set of questions: what is the uptime, security, and redundancy of the system; who maintains the system and what are their incentives; who has administrative access; and so on. **Centralization brings its own set of challenges.**

Let's rewind and revisit some of the problems we've encountered with our P2P design:

- Ensuring that every participant is always up to date (a strongly consistent system) imposes high coordination costs and affects availability: if a single peer is unreachable the system cannot commit new transactions.
- In practice we don't know the global status of the P2P network: the number of participants, whether individuals are temporarily offline or have decided to leave the network, etc.
- Assuming we can resolve the above constraints, the system is still open to a Sybil attack, where a malicious user can create many fake identities and exercise unfair voting power.

**Unfortunately, resolving all of the above constraints is impossible unless we relax some of the requirements:** the [CAP theorem][cap] tells us that our distributed system can't simultaneously have strong consistency, availability, and partition tolerance. As a result, in practice our P2P system [must operate under the assumption of weak(er) consistency][weak] and deal with its implications:

- We must accept that some ledgers will be out of sync (at least temporarily).
- The system must eventually converge on a global ordering (linearizability) of all transactions.
- The system must resolve ledger conflicts in a predictable manner.
- The system must enforce global invariants - e.g. no double-spends.
- The system should be secure against Sybil and similar attacks.

How do we know whether any particular peer is allowed to participate in the first place? If all that's needed is just a unique private key to sign a vote, then a malicious user could simply generate an unlimited number of new keys and flood the network. The root problem is that **when forged identities are cheap to generate and use, any voting system is easily subverted.**

**To solve this problem we need to make the process of submitting a vote "expensive".** Either the cost of generating a new identity must be raised, or the very process of submitting a vote must incur sufficiently high costs. To make this concrete, consider some real-world examples:

- When you show up to vote in your local government election, you are asked to present an ID (e.g. a passport) that is (hopefully) expensive to fake.
  In theory, nothing stops you from generating multiple fake IDs, but if the costs are high enough (the monetary cost of making a fake, the risk of being caught, etc.), then the cost of running a Sybil attack will outweigh its benefits.

- Alternatively, imagine that you had to incur some other cost (e.g. pay a fee) to submit a vote. If the cost is high enough, then once again, the barrier to running a large-scale Sybil attack is raised.

Note that neither of the above examples "solves" the Sybil attack completely, but they also don't need to: as long as we raise the cost of the attack to be larger than the value gained by successfully subverting the system, then the system is secure and behaves as intended.

Note that we're using a loose definition of "secure": the system is still open to manipulation, and the exact vote count is affected, but the point is that a malicious participant doesn't affect the final outcome.
How can we make a vote "expensive" in a network without a central authority? One option is to require a proof-of-work: say, the SHA-256 checksum of the vote statement must begin with two zeros. To submit a valid vote, the sender has to search for one:

1. The sender hashes their vote statement:

   ```
   sha256("I vote for Bob") → b28bfa35bcd071a321589fb3a95cac...
   ```

2. The resulting hash value is invalid because it does not start with our required substring of two zeros.
3. We modify the vote statement by appending an arbitrary string and try again:
   ```
   sha256("I vote for Bob - hash attempt #2") → 7305f4c1b1e7...
   ```

4. The resulting hash value does not satisfy our condition either. We update the value and try again, and again, and… 155 attempts later we finally get:
   ```
   sha256("I vote for Bob - hash attempt #155") → 008d08b8fe...
   ```

The critical property of the above workflow is that the output of the cryptographic hash function (SHA-256 in this case) is completely different every time we modify the input: the hash value of the previous attempt tells us nothing about what the hash value of the next attempt will be when we increment our counter. **As a result, generating a valid vote is not just a "hard problem", but one better described as a lottery where each new attempt gives you a random output.** Also, we can adjust the odds of the lottery by changing the length of the required prefix:

1. Each character of the SHA-256 checksum has 16 possible values: _0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f._
2. In order to generate a hash with a valid two-zero prefix the sender will need 256 (16^2) attempts on average.
3. Bumping the requirement to five zeros will require more than 1,000,000 (16^5) attempts on average…

Point being, we can easily increase the cost and make the sender spend more CPU cycles to find a valid hash.

How many SHA-256 checksums can we compute on a modern CPU? The cost depends on the size of the message, CPU architecture, and other variables. If you're curious, open up your console and run the benchmark shown below.

The net result is that generating a valid vote is "expensive" for the sender, but is still trivial to verify for the receiver: the receiver hashes the transaction (one operation) and verifies that the checksum starts with the required prefix (a partial hash collision)... Great, so how is this useful for our P2P system? **The above proof-of-work mechanism allows us to adjust the cost of submitting a vote such that the total cost of subverting the system (i.e. spoofing enough valid votes to guarantee a certain outcome) is higher than the value gained by attacking the system.**
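The vote lottery above can be sketched in a few lines of Python. The `proof_of_work` helper is hypothetical, and the "hash attempt #N" suffix simply mirrors the example:

```python
import hashlib
from itertools import count

def proof_of_work(message: str, prefix: str = "00"):
    """Rehash the message with an incrementing counter until its
    SHA-256 checksum starts with the required prefix: every attempt
    is an independent draw in the hash lottery."""
    for nonce in count(1):
        attempt = f"{message} - hash attempt #{nonce}"
        digest = hashlib.sha256(attempt.encode()).hexdigest()
        if digest.startswith(prefix):
            return attempt, digest

attempt, digest = proof_of_work("I vote for Bob")
# Expensive to find (~256 tries on average for "00"), trivial to verify:
assert hashlib.sha256(attempt.encode()).hexdigest().startswith("00")
```

Lengthening the required prefix multiplies the expected number of attempts by 16 per character, which is how the cost of a vote can be tuned.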
```
$> openssl speed sha
```

Note that the "high cost to generate a message" is a useful property in many other contexts. For example, email spam works precisely because it is incredibly cheap to generate a message. If we could raise the cost of sending an email message - say, by requiring a proof-of-work signature - then we could break the spam business model by raising costs to be higher than profits.

To illustrate the double-spend problem, consider the following sequence of events:

1. Bob holds a valid, PKI-protected IOU from Alice.
2. Bob completes a transaction with John where he transfers Alice's IOU to John. Both Bob and John update their ledgers, but Alice doesn't know about the transaction… yet.
3. **Happy scenario:** John asks Alice to redeem his new IOU; Alice verifies his transaction by fetching Bob's public key; if the transaction is valid she pays John the required amount.
4. **Not so happy scenario:** Bob uses his old ledger, which omits his transaction with John, to create a double-spend transaction with Katy. Next, both Katy and John show up at Alice's doorstep and realize that only one of them will get paid.

The double-spend is possible due to the "weak consistency" of the distributed ledger: neither Alice nor Katy knows about John and Bob's transaction, which allows Bob to exploit this inconsistency to his advantage.

Solution? If the network is small and all participants are known, we can require that each transaction must be "accepted" by the network before it is deemed valid:

- **Unanimous consensus:** whenever a transaction takes place the two parties contact all other participants, tell them about the transaction, and then wait for their "OK" before they commit the transaction. As a result, all of the ledgers are updated simultaneously and double-spend is no longer possible.
- **Quorum consensus:** to improve processing speed and availability of the network (i.e. if someone is offline, we can still process the transaction) we can relax the above condition of unanimous consensus to quorum consensus (50% of the network).
Either of the above strategies would solve our immediate problem for a small network of known and verified participants. However, neither strategy scales to a larger, dynamic network where neither the total number of participants nor their identities are known at any point in time:

1. We don't know how many people to contact to get their approval.
2. We don't know whom to contact to get their approval.
3. We don't know whom we are calling.

Note that we can use any means of communication to satisfy the above workflow: in person, internet, avian carriers, etc!

Lacking identity and global knowledge of all the participants in the network, we have to relax our constraints. **While we can't guarantee that any particular transaction is valid, that doesn't stop us from making a statement about the probability of a transaction being accepted as valid:**

- **Zero-confirmation transaction:** we can accept a transaction without contacting any other participants. This places full trust in the integrity of the payer of the transaction - i.e. they won't double-spend.
- **N-confirmation transaction:** we can contact some subset of the (known) participants in the network and get them to verify our transaction. The more peers we contact, the higher the probability that we will catch malicious parties attempting to defraud us.

**What is a good value for "N"?** The answer depends on the amount being transferred and your trust in and relationship with the opposite party. If the amount is small, you may be willing to accept a higher level of risk, or you may adjust your risk tolerance based on what you know about the other party. Alternatively, you will have to do some extra work to contact other participants to validate your transaction. In either case, there is a tradeoff between the speed with which the transaction is processed (zero-confirmation is instant), the extra work, and the risk of that transaction being invalid. So far, so good.
Except, there is an additional complication that we must consider: our system relies on transaction confirmations from other peers, but nothing stops a malicious user from generating as many fake identities as needed (recall that an "identity" in our system is simply a public-private keypair, which is trivial to generate) to satisfy Katy's acceptance criteria.

**Whether Bob decides to execute the attack is a matter of simple economics: if the gain is higher than the cost then he should consider running the attack.** Conversely, if Katy can make the cost of running the attack higher than the value of the transaction, then she should be safe (unless Bob has a personal vendetta and/or is willing to lose money on the transaction... but that's out of scope). To make it concrete, let's assume the following:

- Bob is transferring 1 chrom to Katy.
- The cost of generating a fake identity and transaction response is 0.001 chroms: energy costs to keep the computer running, paying for internet connectivity, etc.

If Katy asks for 1001 confirmations, then it no longer makes (economic) sense for Bob to run the attack. Alternatively, **we could add a proof-of-work requirement for each confirmation and raise the cost of each valid response from 0.001 chroms to 0.1:** finding a valid hash will take CPU time, which translates to a higher energy bill. As a result, Katy would only need to ask for 11 confirmations to get the same guarantee.

Note that Katy also incurs some costs while requesting each confirmation: she has to expend effort to send out the requests and then validate the responses. Further, if the cost of generating a confirmation and verifying it is one-to-one, then Katy will incur the same total cost to verify the transaction as its value… which, of course, makes no economic sense.

Great, problem solved, right? Sort of... in the process we've created another economic dilemma.
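Katy's arithmetic can be sketched in a couple of lines; the `min_confirmations` helper and the ~0.1-chrom proof-of-work cost are illustrative assumptions:

```python
import math

def min_confirmations(tx_value: float, cost_per_confirmation: float) -> int:
    """Katy is safe once the attacker's total cost of faking every
    confirmation exceeds the value of the transaction."""
    return math.floor(tx_value / cost_per_confirmation) + 1

# Cheap fake identities: 0.001 chroms per forged confirmation.
print(min_confirmations(1, 0.001))  # 1001
# With a proof-of-work requirement each confirmation costs ~0.1 chroms:
print(min_confirmations(1, 0.1))    # 11
```

The more expensive each confirmation is to forge, the fewer confirmations Katy needs for the same economic guarantee.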
**Our network now incurs a cost to validate each transaction that is of equal or higher value than the transaction itself.** While this acts as an economic deterrent against malicious participants, why would any legitimate participant be willing to incur any costs for someone else? A rational participant simply wouldn't; it doesn't make sense. Doh.
This is why the asymmetry of proof-of-work is critical. Katy incurs a low cost to send out requests and validate the responses, but the party generating the confirmation needs to expend significantly more effort to generate a valid response.

We made a big leap here. Previously we had only one type of record in our network - the signed transaction. Now we have signed transactions and blocks. The former are generated by the individuals engaging in trade, and the latter are generated by parties interested in collecting fees by validating and confirming transactions.

**Phew, ok, Alice has announced a new transaction and received a valid block from Chris confirming it. That's one confirmation, what about the rest?** Also, Chris is (hopefully) not the only participant who is incentivized to work on generating the blocks. What if someone else generates a different block at the same time, and which of those blocks is "valid"? This is where it gets interesting...
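A block of this kind can be sketched as a batch of pending transactions plus the confirming peer's fee claim, sealed with a proof-of-work. The field names, the `mine_block` helper, and the two-zero difficulty are hypothetical, not any production format:

```python
import hashlib
import json
from itertools import count

def mine_block(transactions, miner, prefix="00"):
    """Bundle pending signed transactions with the miner's fee claim,
    then search for a nonce that gives the block's SHA-256 checksum
    the required prefix."""
    block = {"transactions": transactions, "fee_claim": miner}
    payload = json.dumps(block, sort_keys=True)
    for nonce in count():
        checksum = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
        if checksum.startswith(prefix):
            block["nonce"], block["checksum"] = nonce, checksum
            return block

block = mine_block(
    [{"from": "Alice", "to": "Bob", "amount": 2, "fee": 0.1}], miner="Chris")
print(block["checksum"].startswith("00"))  # True: cheap for anyone to verify
```

Finding the nonce is the expensive part; any peer can re-hash the payload once to check that Chris really did the work before accepting his fee claim.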
Also, note that the above scheme requires some minimum volume of transactions in the system to sustain the incentives for the individuals creating the blocks: the more transactions there are, the lower the fees have to be for any single transaction.

Note that there is also no consensus amongst the peers on which transactions should be validated next. Each participant aggregates their own list and can use different strategies to optimize their expected payoff. Also, due to the nature of our proof-of-work function (finding a partial hash collision for a SHA-256 checksum of the block), the only way to increase the probability of claiming a block is to expend more CPU cycles.

There is one more caveat that we need to deal with: it's possible that two peers will find a valid block at about the same time and begin propagating it through the network - e.g. Kent and Chris in the diagram above. As a result, some fraction of the network may end up accepting Kent's block as the topmost block, while the rest will take Chris's block. Now what?

It's possible that the race condition can persist for multiple blocks, but eventually one branch will race ahead of the other and the rest of the network will converge on the same longest chain.

Great, we now have a strategy to resolve conflicts between different chains in the network. Specifically, the network promises linearizability of transactions by recording them in a linked list of blocks. But, crucially, it makes no promises about an individual block "guaranteeing" the state of any transaction. Consider the example above:

- Alice sends out her transaction into the network.
- Chris generates a valid block that confirms her transaction. Except, there is a fork in the chain, and Chris's block is later "removed" as the network converges on Kent's branch of the chain.
As a result, even when Alice receives a block with her transaction, she can't be sure that this block won't be undone in the future!

By "undo" we mean any scenario where one of the participants can make the network accept an alternative transaction transferring funds to any account other than yours - e.g. you complete the transaction, hand over the widget and get a receipt, but the attacker then injects a transaction that "double-spends" those same funds to an alternative account.

Why does the length of the block chain act as a good proxy for the "safety" of a transaction? If an attacker wanted to undo a particular transaction, they would need to start at a block prior to the one where that transaction is listed, and then build a chain of other blocks that is longer than the one currently used by the network. As a result, the deeper the block, the higher the amount of computational effort required to replace it with an alternative chain. The longer the chain, the more expensive it is to execute an attack.

How many blocks should you wait for before accepting a transaction? There is no one number. The answer depends on the properties of the network (time to generate each block, propagation latency of transactions and blocks, size of the network, etc.) and on the transaction itself: its value, what you know about the other party, your risk profile, and so on.
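The longest-chain rule that drives all of the above can be sketched in one line of logic; the `resolve_fork` helper and the branch contents are hypothetical:

```python
def resolve_fork(branches):
    """Every peer adopts the longest valid chain it has seen; a tie
    persists until one branch pulls ahead and the network converges."""
    return max(branches, key=len)

shared_history = ["block-1", "block-2"]
chris_branch = shared_history + ["block-3a"]             # confirms Alice's tx
kent_branch = shared_history + ["block-3b", "block-4"]   # raced ahead

winner = resolve_fork([chris_branch, kent_branch])
print(winner is kent_branch)  # True: Chris's block 3a is "removed"
```

This is exactly why deep blocks are safer: to displace one, an attacker must out-produce the rest of the network over the entire remaining length of the chain.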