What Is Hashing? Here’s What SHA-256 and MD5 Actually Do
I Spent a Year Using Hashes Without Understanding Them
I’d run `git log` and see those 40-character commit hashes every day. I’d store passwords as "hashed" values because every tutorial said to. I’d see SHA-256 mentioned in API docs and nod along like I knew what made it different from MD5. I didn’t.
Then one afternoon I accidentally changed a single character in a config file and watched the hash go from `a7ffc6f3...` to `3b2d9286...` — a completely different string. Same file, one byte different, and the hash output looked like it came from a different planet. That’s when I realized I needed to actually understand this thing.
So I went down the rabbit hole. And it turns out hashing is one of those concepts that’s way simpler than it sounds — once someone explains it without throwing linear algebra at you.
A Hash Is a One-Way Fingerprint — That’s It
Three properties make hashing useful:
**It’s deterministic.** Feed the same input, get the same output. Every single time. No exceptions. You can test this yourself right now — paste "hello world" into a hash generator and you’ll get the SHA-256 output `b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9`. Run it again tomorrow and it’ll be identical.
**It’s one-way.** You can go from input to hash, but you can’t go backward. This isn’t like Base64 encoding, where you decode the output and get the original back. A hash destroys information. A SHA-256 output is 64 hex characters whether your input was 5 characters or 5 million, so there’s no way to reconstruct what got compressed.
**It’s avalanche-sensitive.** Change one character in the input, and the output changes completely. "hello world" and "hello World" (just one capital letter) produce entirely different hashes. Not slightly different — utterly, unrecognizably different. This property makes hashes incredibly useful for detecting tampering.
That third one is what clicked for me. It means you can verify data hasn’t been modified by comparing its hash before and after transmission. If even one bit changed, the hash won’t match.
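If you’d rather see the three properties than take my word for it, here’s a minimal sketch using Python’s standard `hashlib` module. The helper name `sha256_hex` is my own, not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Deterministic: the same input always yields the same digest.
first = sha256_hex("hello world")
second = sha256_hex("hello world")

# Fixed length: a 5-million-character input still gives 64 hex characters.
large = sha256_hex("a" * 5_000_000)

# Avalanche: one capital letter changes the digest beyond recognition.
changed = sha256_hex("hello World")
differing = sum(a != b for a, b in zip(first, changed))

print(first)
print(changed)
print(f"{differing} of {len(first)} hex characters differ")
```

The first printed line should match the "hello world" digest quoted above, and most of the 64 positions should differ between the two digests.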
SHA-256 vs MD5 vs SHA-1 — Which One Matters
| Algorithm | Output Length | Speed | Security Status | Use in 2026 |
|-----------|---------------|-------|-----------------|-------------|
| MD5 | 32 hex chars (128 bits) | Very fast | Broken | Checksums only |
| SHA-1 | 40 hex chars (160 bits) | Fast | Broken since 2017 | Legacy, avoid |
| SHA-256 | 64 hex chars (256 bits) | Moderate | Strong | General purpose |
| SHA-512 | 128 hex chars (512 bits) | Moderate | Very strong | High-security apps |
| BLAKE3 | 64 hex chars (256 bits) | Very fast | Strong | Modern alternative |
**MD5** was the standard for decades. Then researchers proved you could generate two different inputs that produce the same hash — a "collision." That broke its security guarantees. You’ll still see MD5 used for non-security checksums, like verifying a file download isn’t corrupted. But don’t use it for anything security-related.
**SHA-1** held on longer, but the SHAttered attack, published in 2017 by researchers at Google and CWI Amsterdam, demonstrated a practical collision. Git still uses SHA-1 internally for commit hashes (it’s migrating to SHA-256), but for new projects there’s zero reason to pick SHA-1.
**SHA-256** is the workhorse. It’s the hash behind the most common JWT signing algorithms, HS256 and RS256 (I covered the full mechanics in my JWT explainer if you want the details). It’s what Bitcoin uses. It’s what many APIs use when they sign requests. For general development work in 2026, SHA-256 is your safe default.
**BLAKE3** is the newer option. It’s faster than SHA-256 on modern hardware, designed for a comparable security level, and it supports features like keyed hashing and tree hashing natively. You probably won’t encounter it in most web dev work yet, but it’s worth knowing about.
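To check the output lengths for yourself, here’s a small sketch that hashes the same message with each algorithm that ships in Python’s standard library. BLAKE3 is left out because it isn’t in `hashlib`; you’d need a third-party package for it.

```python
import hashlib

message = b"hello world"

# Algorithm names as hashlib spells them. BLAKE3 is omitted because it is
# not part of the standard library.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")

digest_lengths = {}
for name in ALGORITHMS:
    digest = hashlib.new(name, message).hexdigest()
    digest_lengths[name] = len(digest)
    print(f"{name:<7} {len(digest):>3} hex chars  {digest[:16]}...")
```

Each hex character encodes 4 bits, which is why a 256-bit digest prints as 64 characters.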
Where You’re Already Using Hashes Without Knowing
**Git commits.** Every commit SHA you’ve ever copied is a hash of the commit’s content, parent references, author info, and timestamp. That’s why they look random. Change the commit message by one character and the entire commit hash changes. Git uses this to guarantee the integrity of your entire repository history.
**Password storage.** When you sign up for a site and set a password, that password should never be stored in plain text. The server hashes it and stores the hash. When you log in later, it hashes what you typed and compares the two hashes. If they match, you’re in. The actual password never sits in the database — which is why "forgot password" flows make you create a new one instead of emailing you the old one. They literally can’t retrieve it.
**File integrity checks.** Ever downloaded a Linux ISO and seen a "SHA-256 checksum" listed next to the download link? You hash the downloaded file locally and compare it to the published hash. If they match, the file wasn’t corrupted or tampered with during the download.
**JWT signatures.** When a server signs a JWT with an HMAC-based algorithm such as HS256, it runs the header and payload through a keyed hash together with a secret key. That hash becomes the signature. It’s how the server knows the token hasn’t been modified by a client.
**Subresource Integrity (SRI).** Those `integrity="sha384-..."` attributes on CDN `<script>` tags? Hash checks. Your browser hashes the downloaded script and compares it to the expected hash in the attribute. If a CDN gets compromised and serves malicious JavaScript, the hash won’t match and the browser blocks it.
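The file-integrity and SRI checks above both come down to "hash the bytes you received and compare." Here’s a rough sketch of each. The function names and the temporary demo file are my own inventions for illustration.

```python
import base64
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large download never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a local file's digest with the checksum from the download page."""
    return file_sha256(path) == published_hex.strip().lower()


def sri_value(script_bytes: bytes) -> str:
    """Build an SRI string: algorithm prefix plus the base64 SHA-384 digest."""
    raw = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(raw).decode("ascii")


# Demo with a stand-in "download" (hypothetical content, not a real ISO).
content = b"pretend this is a Linux ISO"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

published = hashlib.sha256(content).hexdigest()
download_ok = matches_published(demo_path, published)
os.remove(demo_path)

print("checksum matches:", download_ok)
print(sri_value(b"console.log('hi');"))
```

The string that `sri_value` returns is what would go inside an `integrity="..."` attribute.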
You interact with all of this daily without thinking about it. I know I did for a long time.
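To make the JWT case a bit more concrete, here’s a simplified sketch of HS256-style signing with Python’s `hmac` module. It’s an illustration under assumptions, not a full JWT implementation: there’s no expiry or claim validation, the secret and payload are made up, and in real code you’d reach for a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-do-not-reuse"  # hypothetical key for illustration


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding JWTs use for each segment."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_segment(obj: dict) -> str:
    """Serialize a dict to compact JSON and base64url-encode it."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def sign_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    signing_input = f"{encode_segment({'alg': 'HS256', 'typ': 'JWT'})}.{encode_segment(payload)}"
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def signature_is_valid(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    try:
        signing_input, received = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected).encode("ascii"), received.encode("utf-8"))


token = sign_token({"sub": "user-42", "role": "viewer"}, SECRET)

# A client swaps in a new payload but cannot recompute the signature.
header_seg, _, signature_seg = token.split(".")
forged = f"{header_seg}.{encode_segment({'sub': 'user-42', 'role': 'admin'})}.{signature_seg}"

print(signature_is_valid(token, SECRET))
print(signature_is_valid(forged, SECRET))
```

The forged token fails because the signature was computed over the original payload, and without the secret a client has no way to produce a matching one.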
Why You Can’t Un-Hash Something
Because information gets destroyed in the process. Think of it like a meat grinder — you can turn a steak into ground beef, but you can’t turn ground beef back into a steak. The structure is gone.
A SHA-256 hash is always 256 bits. But the input could be anything from a single byte to terabytes of data. An infinite number of possible inputs map to a finite number of hash outputs. Mathematically, you can’t reconstruct the exact original.
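You can watch the "many inputs, few outputs" problem happen by shrinking the hash. The sketch below uses a deliberately tiny 16-bit hash (just the first 2 bytes of SHA-256, a toy of my own making) so that two different inputs collide within seconds. With only 65,536 possible outputs, a collision is guaranteed by the time you’ve hashed 65,537 inputs.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak 16-bit hash: the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def find_collision() -> tuple[bytes, bytes]:
    """Hash "input-0", "input-1", ... until two inputs share a tiny hash."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        candidate = f"input-{i}".encode("ascii")
        fingerprint = tiny_hash(candidate)
        if fingerprint in seen:
            return seen[fingerprint], candidate
        seen[fingerprint] = candidate


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

Given only the 2-byte output, nothing tells you which of the two inputs produced it. Full SHA-256 has the same property; the output space is just so large that nobody can find such pairs in practice.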
"But what about rainbow tables?" Fair question. A rainbow table is a massive precomputed database of input-hash pairs. Someone hashes millions of common passwords and stores the results. If your password’s hash matches one in the table, they know the original input.
The defense? **Salting.** Before hashing a password, you prepend a random string (the "salt") that’s unique to each user. So "password123" becomes something like "x9k2mQ_password123" before hashing. Now a precomputed table is useless: the attacker would need a separate rainbow table for every possible salt, which is computationally impractical.
Every serious password hashing library handles salting automatically. If you’re hashing passwords manually and not salting them, stop. Use bcrypt or Argon2 instead. They do the right thing out of the box.
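Here’s a small sketch of why salting defeats a precomputed table. It uses plain SHA-256 purely to keep the illustration short, which is exactly what you should not do for real password storage. The password list and helper names are made up.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password123", "qwerty", "letmein"]


def unsalted(password: str) -> str:
    """Illustration only: a bare SHA-256 of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted(password: str, salt: bytes) -> str:
    """Illustration only: SHA-256 of salt + password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# The attacker's precomputed table: digest -> original password.
lookup_table = {unsalted(p): p for p in COMMON_PASSWORDS}

# Without a salt, two users with the same password share one digest,
# and a single table lookup recovers it.
alice_unsalted = unsalted("password123")
bob_unsalted = unsalted("password123")
cracked = lookup_table.get(alice_unsalted)

# With a random 16-byte salt per user, the same password produces
# unrelated digests that the precomputed table does not contain.
alice_salted = salted("password123", os.urandom(16))
bob_salted = salted("password123", os.urandom(16))

print("unsalted lookup:", cracked)
print("salted lookup:  ", lookup_table.get(alice_salted))
```

Salting only kills the precomputed shortcut. An attacker who knows the salt can still guess passwords one by one, which is where deliberately slow functions like bcrypt and Argon2 come in.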
Password Hashing Is a Different Game Entirely
The problem is that SHA-256 is *fast*. That’s a feature for file checksums and data integrity, but it’s a liability for passwords. A modern GPU can compute billions of SHA-256 hashes per second. If someone gets your database of SHA-256-hashed passwords, they can brute-force billions of guesses every second.
Password-specific hash functions are designed to be slow on purpose.
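For a rough feel of the gap, this sketch times a plain SHA-256 against scrypt from Python’s standard library on your own CPU. The scrypt parameters are illustrative assumptions, not tuned recommendations, and the numbers will vary a lot by machine (and say nothing about GPU speeds).

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

# Average the fast hash over many runs, since one call is too quick to time.
runs = 1000
start = time.perf_counter()
for _ in range(runs):
    hashlib.sha256(salt + password).digest()
fast_seconds = (time.perf_counter() - start) / runs

# One scrypt call with illustrative cost settings (about 16 MiB of memory).
start = time.perf_counter()
hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
slow_seconds = time.perf_counter() - start

print(f"SHA-256: {fast_seconds * 1e6:.2f} microseconds per hash")
print(f"scrypt:  {slow_seconds * 1e3:.2f} milliseconds per hash")
print(f"scrypt is roughly {slow_seconds / fast_seconds:,.0f}x slower per guess")
```

That slowdown is paid once by a legitimate login but on every single guess by an attacker.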
**bcrypt** has been the standard since the late ’90s. It includes a configurable "cost factor" that controls how many iterations the hash runs. Higher cost means slower hashing, which means harder to brute-force. Most implementations default to a cost of 10–12, and each bump roughly doubles the computation time.
**scrypt** adds memory-hardness on top of that. It’s not just CPU-intensive — it requires significant RAM to compute. This makes it harder to parallelize on GPUs, which have tons of cores but limited memory per core.
**Argon2** won the Password Hashing Competition in 2015 and is the OWASP-recommended choice for new projects as of 2026. It’s configurable for both time and memory usage, with three variants: Argon2d (GPU-resistant), Argon2i (side-channel resistant), and Argon2id (hybrid — the one you should pick).
If you’re building authentication today, use Argon2id. If your framework only supports bcrypt, that’s still solid. Just don’t reach for MD5, SHA-1, or plain SHA-256 for passwords. Ever.
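Argon2id and bcrypt both need third-party packages in Python, so as a stand-in this sketch uses scrypt, which ships in the standard library’s `hashlib`. Treat it as an outline of the hash-then-verify flow under assumed parameters, not a drop-in replacement for your framework’s password hasher. The function names and cost settings are my own.

```python
import hashlib
import hmac
import os

# Illustrative cost settings (assumptions, not tuned recommendations).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)  # unique random salt per password
    derived = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("password123", salt, stored))
```

Dedicated libraries usually pack the salt and cost settings into a single encoded string, so you can raise the cost later without breaking existing hashes.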
And while you’re setting up password hashing, make sure you’re actually requiring strong passwords from users in the first place. The best hashing algorithm in the world won’t save "password123" from a determined attacker with a dictionary file.
The irony of password hashing? The best hash function is the slowest one. Everything else in computing optimizes for speed. This is the one place where slow is a feature, not a bug.
Frequently Asked Questions
What is hashing and how is it different from encryption?
Hashing is a one-way function that converts any input into a fixed-length string. You can’t reverse a hash to get the original data back — the information is permanently destroyed. Encryption is two-way: data gets scrambled with a key and can be unscrambled with the same key (or a paired key). If you need to retrieve the original data later, you want encryption. If you just need to verify data matches without storing the original — like checking a password at login — hashing is the right tool.
Is MD5 safe to use in 2026?
MD5 isn’t safe for any security-sensitive use. Researchers demonstrated practical collision attacks years ago, meaning attackers can generate two different inputs that produce the same MD5 hash. It’s still acceptable for non-security purposes like catching accidental corruption in a file transfer. But for anything involving authentication, digital signatures, or detecting deliberate tampering, use SHA-256 or stronger.
Why do different hash algorithms produce different length outputs?
Each hash algorithm is designed with a specific output size built into its mathematics. MD5 produces 128 bits (32 hex characters), SHA-256 produces 256 bits (64 hex characters), and SHA-512 produces 512 bits (128 hex characters). Longer outputs mean a larger space of possible hashes, which makes collisions — two inputs producing the same hash — exponentially less likely. The output length doesn’t depend on the input size at all. A 3-character word and a 3-gigabyte file produce the same length hash.
What happens if two different inputs produce the same hash?
That’s called a collision, and it’s theoretically inevitable because hash functions map an infinite number of possible inputs to a finite set of outputs. For strong algorithms like SHA-256, finding a collision deliberately would require trying roughly 2^128 combinations, about 3.4 × 10^38, which is far beyond what any existing hardware could attempt. For broken algorithms like MD5, researchers found efficient methods to manufacture collisions on purpose, which is why it’s considered insecure.
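The 2^128 figure comes from the birthday bound: for an n-bit hash, a brute-force collision search needs on the order of 2^(n/2) attempts. A quick sketch of that arithmetic (the function name is my own):

```python
def collision_work(output_bits: int) -> int:
    """Approximate number of hashes a brute-force collision search needs."""
    return 2 ** (output_bits // 2)


for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
    print(f"{name:<8} ~2^{bits // 2} = {collision_work(bits):.2e} hashes")
```

These are generic brute-force estimates. MD5 and SHA-1 are considered broken because researchers found attacks that need far less work than this bound suggests.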
Should I use SHA-256 or bcrypt for hashing passwords?
Always use bcrypt (or Argon2id) for passwords, never SHA-256. SHA-256 is designed to be fast, which is great for checksums but terrible for passwords. A modern GPU can compute billions of SHA-256 hashes per second, making brute-force attacks feasible. bcrypt and Argon2id are deliberately slow and include configurable work factors that let you increase computation time as hardware gets faster. They also handle salting automatically.
How does salting make password hashes more secure?
A salt is a random string added to each password before hashing. Without salting, identical passwords always produce identical hashes — so if two users both choose "password123," their stored hashes are the same. An attacker with a precomputed rainbow table can crack all of them at once. With salting, each user gets a unique random salt prepended to their password, so identical passwords produce completely different hashes. The salt is stored alongside the hash and doesn’t need to be secret — it just needs to be unique per user.