Overview of Bitcoin Blockchain: SHA256 Hashing

In the previous post, we saw

  • the complexity of modern financial transactions,
  • how things are centralized due to the need for trust,
  • how the costs & risks are high due to centralization, and
  • what a blockchain is and the different attributes of a block in the blockchain

We came across a term called hash, which the miner generates from the contents of his or her block. The miner then checks whether the hash is below a ‘target’; if yes, he/she is allowed to append the block to the blockchain

In this blog, we will delve deeper into the hash function. We will cover

1. What is hashing and SHA256
2. How does that provide security
3. How does this fit into the Bitcoin ecosystem

Consider a hash to be the equivalent of a human fingerprint in the digital space, i.e. if you generate a SHA256 hash for this sentence

SHA256 is an amazing cryptographic finding

then you will get the following hash

1433ed18ed506f53632a1f8f004760a9b61a49917a16856af481a9c1359cc077

However many times you enter the same text, the algorithm will always spit out the same hash. This property is called determinism and plays a key role in cryptography
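To see determinism for yourself without a website, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the choice of UTF-8 encoding are assumptions made for this illustration; an online tool may treat whitespace or encoding differently, so its output for the same sentence is not guaranteed to match character for character:

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA256 hash of `text` as a hexadecimal string."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


sentence = "SHA256 is an amazing cryptographic finding"

# Hash the same sentence twice, in two separate calls.
first = sha256_hex(sentence)
second = sha256_hex(sentence)

print(first)
print("Same hash both times:", first == second)
```

Run it as many times as you like, on any machine: the two values always match each other.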

You can experiment by generating SHA256 (Secure Hash Algorithm) hashes on the codebeautify site. You can generate a hash by keying in text or even by supplying large Word documents, presentations, etc

The beauty is, if you so much as add an extra space (between amazing & cryptographic) in the below sentence, then the hash changes dramatically. This property is called the avalanche effect

SHA256 is an amazing  cryptographic finding

9017c8475e7edf374338aa9d3a2d561a993650a8349e45c84af4e8330bf3e660
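One rough way to put a number on "changes dramatically" is to count how many of the 256 output bits differ between the two hashes. The sketch below does that; the helper name sha256_as_int and the UTF-8 encoding are assumptions for this illustration:

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Return the SHA256 hash of `text` as one big 256-bit integer."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


one_space = "SHA256 is an amazing cryptographic finding"
two_spaces = "SHA256 is an amazing  cryptographic finding"

# XOR leaves a 1 in every bit position where the two hashes disagree.
difference = sha256_as_int(one_space) ^ sha256_as_int(two_spaces)
differing_bits = bin(difference).count("1")

print(f"{differing_bits} of 256 bits differ")
```

For a well-behaved hash function you would expect roughly half of the bits to flip, even though only a single space was added.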

If you observe both the above hashes, they each have 64 characters, made up of the digits “0-9” and the letters “a-f”. No matter how big a file you input to generate a hash, you will get a hash of the same fixed length, comprising a combination of these digits and letters. For all practical purposes, every distinct input gets its own hash; this ability to avoid collisions (two different inputs producing the same hash) is another important property needed for a secure cryptographic hash algorithm
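The fixed output length is easy to check. This small sketch hashes an empty input, a short input and a one-million-byte input (all three made up for this illustration) and prints the length of each hash:

```python
import hashlib

# Three inputs of very different sizes (chosen only for illustration).
inputs = {
    "empty": b"",
    "short": b"hello",
    "large": b"x" * 1_000_000,
}

lengths = {}
for name, data in inputs.items():
    digest = hashlib.sha256(data).hexdigest()
    lengths[name] = len(digest)
    print(f"{name:>5}: {len(data):>9} bytes in -> {len(digest)} hex characters out")
```

Whatever the input size, the output is always 64 hexadecimal characters.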

Now you may think, this algorithm generates only 64 hexadecimal characters and if ANY digital file can be converted into such a hash, wouldn’t there obviously be collisions? Mathematically yes. However, in the hexadecimal format each character represents 4 bits, so 64 characters make 256 bits, and the number of possible hashes is 2^256; the probability of an accidental collision is therefore very very very low. Further, it would endanger our use case (in Bitcoin) if someone could generate collisions on purpose, which is practically impossible with the available technology. There is an interesting discussion on the number of combinations in this stackoverflow thread
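The arithmetic above can be checked in a few lines. The last part uses the well-known "birthday" approximation, which says you would need on the order of 2^128 random hashes before an accidental collision becomes likely; treat it as a rough estimate rather than an exact figure:

```python
hex_chars = 64
bits_per_hex_char = 4

total_bits = hex_chars * bits_per_hex_char
possible_hashes = 2 ** total_bits

print("Bits in a SHA256 hash:", total_bits)
print("Same as 16^64:", possible_hashes == 16 ** hex_chars)
print("Decimal digits in 2^256:", len(str(possible_hashes)))

# Birthday approximation: about 2^(bits/2) random hashes are needed
# before any two of them are likely to collide.
birthday_estimate = 2 ** (total_bits // 2)
print(f"Rough number of hashes before a collision is likely: {birthday_estimate:.3e}")
```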

If you are with me till now, the question you may have is: if the hash is exposed, would someone be able to reverse the hash and generate back the message/document? This is the difference between hashing & encryption. Hashing is non-reversible, while encryption is meant to be reversible, i.e. the intent is to be able to decrypt. This is yet another property that we need for Bitcoin hashing

So in summary, the following are the needed properties, all of which SHA256 supports

1. Deterministic
2. Non-reversible
3. Avalanche effect
4. Non-collision
5. Fast computation

Forget about Bitcoin, does hashing have any other real-world use cases?

Absolutely! When you download software from a third-party server, there is always a possibility that a malicious actor has added a piece of malicious code. To ensure that you as a user get the unmanipulated software, the author will publish its hash. After you download the software, you can hash your copy and if your hash and the developer’s hash match, then you know that you have got the exact copy. Here is an example of how Fedora (a Linux distribution) suggests you verify their download
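As a rough sketch of that check in Python (this is not Fedora's own procedure; the function names sha256_of_file and matches_published_hash are made up for this illustration), you could hash the downloaded file in chunks and compare the result with the published value:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file piece by piece so that large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Return True if the file's SHA256 equals the hash the author published."""
    # Ignore surrounding whitespace and upper/lower case in the published value.
    return sha256_of_file(path) == published_hex.strip().lower()
```

You would call matches_published_hash with the path of your download and the hash copied from the author's site; a False result means your copy is not the one the author published.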

Similarly, if you are sending a large financial/legal contract through a third party (e-mail, runner, etc), you can generate the hash and share it with the recipient, so that he or she, on receiving the document, can hash it and compare, so as to ensure that there has been no tampering

How does all this tie up with Bitcoin blockchain?

Now that you have understood the characteristics of SHA256, let us see how this is put to use in Bitcoin blockchain

When a miner adds transactions to his block, the block has the previous block’s hash, the transactions (data) and a nonce, from which the miner generates a hash. If the hash is not below the set target (we will see what this means in a later blog), the miner changes the nonce & regenerates the hash, and keeps doing so until he or some other miner finds a hash below the target. The miner who finds a hash below the target gets to add his block to the blockchain and earn the reward
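The nonce search can be sketched as a toy loop. This is a simplified illustration, not Bitcoin's actual format: real Bitcoin hashes a binary block header and applies SHA256 twice, whereas here the block is just a text string. The names block_hash, mine and toy_target, the sample data and the very easy target are all assumptions made so that the example finishes quickly:

```python
import hashlib


def block_hash(previous_hash: str, transactions: str, nonce: int) -> str:
    """Toy block hash: SHA256 over previous hash, data and nonce joined as text."""
    payload = f"{previous_hash}|{transactions}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine(previous_hash: str, transactions: str, target: int, max_nonce: int = 2_000_000):
    """Try nonces 0, 1, 2, ... until the hash, read as a number, is below the target."""
    for nonce in range(max_nonce):
        candidate = block_hash(previous_hash, transactions, nonce)
        if int(candidate, 16) < target:
            return nonce, candidate
    raise RuntimeError("no nonce below the target found within max_nonce tries")


# A deliberately easy toy target: about 1 in 4096 hashes falls below it.
toy_target = 1 << 244

nonce, winning_hash = mine("00" * 32, "alice pays bob 1 BTC", toy_target)
print("nonce:", nonce)
print("hash: ", winning_hash)
```

A lower target means fewer hashes qualify, so the miner has to try more nonces on average before one works.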

This miner then announces to the Bitcoin network the value of the nonce which generated the hash; with this information, all the other miners & nodes can verify if they are able to generate the same hash. This is one of the many checks which the nodes perform in order to ensure the integrity of the blockchain
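The verification step is cheap compared with the search: a node only has to hash once. Below is a minimal sketch under the same toy assumptions (a text-based block and a single SHA256, unlike real Bitcoin); block_hash, verify_block and the sample data are hypothetical names and values for this illustration:

```python
import hashlib


def block_hash(previous_hash: str, transactions: str, nonce: int) -> str:
    """Toy block hash: SHA256 over previous hash, data and nonce joined as text."""
    payload = f"{previous_hash}|{transactions}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify_block(previous_hash: str, transactions: str, nonce: int,
                 claimed_hash: str, target: int) -> bool:
    """Recompute the hash once and check it matches the claim and is below the target."""
    recomputed = block_hash(previous_hash, transactions, nonce)
    return recomputed == claimed_hash and int(recomputed, 16) < target


previous = "00" * 32
transactions = "alice pays bob 1 BTC"
easy_target = 1 << 248  # toy target: about 1 in 256 hashes falls below it

# Stand-in for the miner's work: find some nonce that meets the easy target.
nonce = next(n for n in range(100_000)
             if int(block_hash(previous, transactions, n), 16) < easy_target)
claimed = block_hash(previous, transactions, nonce)

honest = verify_block(previous, transactions, nonce, claimed, easy_target)
tampered = verify_block(previous, "alice pays bob 100 BTC", nonce, claimed, easy_target)
print("honest block accepted:", honest)
print("tampered block accepted:", tampered)
```

Because the hash is deterministic, every node that repeats this check on the same block reaches the same verdict, and changing the transactions after the fact makes the announced hash stop matching.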

As you can observe, the determinism of SHA256 allows every node to audit the work of the successful miner. The non-reversible property ensures that no bad actor can start from a desired hash and work backwards to block contents that would produce it. The avalanche effect ensures that it is impossible for the miner to guess what the nonce should be for him to get a hash below the target; the only way is to try nonces one after another. The non-collision property ensures that in a decentralized distributed network, two miners do not get the same hash for different data sets, nor can a bad actor swap incorrect transactions into a block and insert it into the blockchain under the same hash. Last but not least, unless the hashing computation were fast, it would not be practically usable

What have you learnt so far?

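Between the previous blog and this one, you have understood why centralized finance depends on trust, what a blockchain and its blocks look like, and how SHA256 hashing helps secure them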

I suppose by now you are able to piece together parts of our discussion and have a high-level understanding that the integrity of the system is maintained through hashing. However, you aren’t yet clear on how a miner is able to write to the blockchain, what role decentralization plays in ensuring trust-less transactions, or how the blockchain handles exceptions. In the next blog, we will look into the immutable ledger and the peer-to-peer network, and in subsequent blogs we will dive into the rest

If you still feel dazed, not to worry; by the end of this blog series, with one or two re-readings, you will understand the simplicity of this technology. Do also leave a comment on whether this is helpful or needs to be further simplified