
Why do we use hexadecimal as the default representation of a hash function's output?

Take SHA-256 as an example: its output takes 64 characters in hex, while Base64-encoding the raw output produces only 44 characters.

Demo:

<?php
$password = "password";
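// Hex-encoded digest (the default output of hash())
$sha256 = hash('sha256', $password);
echo 'sha256(' . strlen($sha256) . '): ' . $sha256 . '<br />';

// Raw 32-byte digest (third argument true), then Base64-encoded
$sha256Base64 = base64_encode(hash('sha256', $password, true));
echo 'sha256(' . strlen($sha256Base64) . '): ' . $sha256Base64 . '<br />';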

Output:

sha256(64): 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
sha256(44): XohImNooBHFR0OVvjcYpJ3NgPQ1qq73WKhHvch0VQtg=
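The two lengths follow directly from the encodings: hex spends two characters per byte, while padded Base64 spends four characters per group of up to three bytes. A minimal sketch of that arithmetic (the helper names `hexLength` and `base64PaddedLength` are hypothetical, not PHP built-ins):

```php
<?php
// Hex: every byte becomes exactly two characters.
function hexLength(int $bytes): int
{
    return 2 * $bytes;
}

// Padded Base64 (as produced by base64_encode): every group of up to three
// bytes becomes four characters, with '=' filling out a short final group.
function base64PaddedLength(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

$raw = hash('sha256', 'password', true); // 32 raw bytes
echo hexLength(strlen($raw)), ' ', strlen(bin2hex($raw)), "\n";                   // 64 64
echo base64PaddedLength(strlen($raw)), ' ', strlen(base64_encode($raw)), "\n";    // 44 44
```

For 32 bytes that gives 64 versus 44; the trailing `=` in the Base64 output is padding, so an unpadded variant would need 43 characters.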
Patriot
Neil Yoga Crypto

3 Answers


Hexadecimal is traditional. By this I mean that there first were command-line tools that used hexadecimal for output; then other people using hash functions found it fitting to stick to hexadecimal, if only to be able to compare their values with the output of those tools. That's how traditions get established: a more-or-less random choice at the start, then the need for interoperability and backward compatibility kicks in.

In the case of hexadecimal in cryptographic algorithms, one can probably trace it to the use of the C language for reference implementations. Most algorithms are described with a specification (a mathematical description, usually typeset in LaTeX) and a reference implementation that produces basic test vectors. For better or worse, the reference implementation is usually in C (or sometimes C++). In C, there is no standard facility for Base64 encoding (some platforms or external libraries provide one, but it is not part of the standard); hexadecimal, however, is easily obtained with a simple printf() using a "%08x" format string. As a classic example, consider the MD5 specification (RFC 1321), which contains a reference implementation that produces hexadecimal output.
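To illustrate the "%08x" approach, here is a sketch that treats a SHA-256 digest as eight 32-bit big-endian words and formats each one with `sprintf('%08x', ...)`, roughly as a C reference implementation might print its state words. The helper `hexViaWords` is hypothetical, and the sketch assumes a 64-bit PHP build (so unsigned 32-bit words stay positive):

```php
<?php
function hexViaWords(string $rawDigest): string
{
    // 'N*' unpacks unsigned 32-bit big-endian integers (array keys start at 1).
    $words = unpack('N*', $rawDigest);
    $out = '';
    foreach ($words as $w) {
        $out .= sprintf('%08x', $w); // exactly 8 lowercase hex digits, zero-padded
    }
    return $out;
}

$raw = hash('sha256', 'password', true);
echo hexViaWords($raw), "\n";
var_dump(hexViaWords($raw) === hash('sha256', 'password')); // bool(true)
```

Because every 32-bit word is exactly eight hex digits, printing the words one after another yields the same string as hex-encoding the whole digest byte by byte.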

The tradition is well entrenched; for the SHA-3 competition, NIST actually asked for reference implementations in C, and known-answer tests with a fully-specified text format that was hexadecimal throughout.
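A hex-only test format is easy to consume: decode the message from hex, hash it, and compare against the expected digest in hex. The sketch below is illustrative only; `checkKat` and the `Msg`/`MD` keys are hypothetical stand-ins for whatever a real test-vector file uses, and SHA-256 is used in place of a SHA-3 candidate. The vector is the well-known SHA-256 test for "abc":

```php
<?php
function checkKat(string $msgHex, string $mdHex): bool
{
    $msg = hex2bin($msgHex);   // hex text -> raw message bytes
    if ($msg === false) {
        return false;          // not valid hex
    }
    // hash() returns lowercase hex, so normalise the expected value's case.
    return hash('sha256', $msg) === strtolower($mdHex);
}

$vector = [
    'Msg' => '616263', // "abc"
    'MD'  => 'BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD',
];
var_dump(checkKat($vector['Msg'], $vector['MD'])); // bool(true)
```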

It must also be said that hexadecimal is convenient for debugging: the human developer can easily read hexadecimal output and map it to individual bits by doing the simple conversion in his head. Base64 is not as simple, because it entails 64 glyphs instead of 16, including some which are prone to visual confusion (1 vs I vs l, 0 vs O...). Also, many algorithms internally use 32-bit or 64-bit words, which map well to CPU registers; 32 and 64 are multiples of 4 but not of 6, so Base64 encoding again implies some non-trivial splitting.
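The "conversion in his head" works because each hex digit is exactly 4 bits. A small sketch (the helper `hexToNibbles` is hypothetical) expands a hex string one digit at a time, using the first 32-bit word of SHA-256("password") from the question:

```php
<?php
function hexToNibbles(string $hex): string
{
    $groups = [];
    foreach (str_split($hex) as $digit) {
        // hexdec() yields 0..15; pad the binary form to exactly 4 bits.
        $groups[] = str_pad(decbin(hexdec($digit)), 4, '0', STR_PAD_LEFT);
    }
    return implode(' ', $groups);
}

// 8 hex digits = one 32-bit word.
echo hexToNibbles('5e884898'), "\n"; // 0101 1110 1000 1000 0100 1000 1001 1000

// A 32-bit word is a whole number of hex digits, but not of 6-bit Base64 characters.
echo 32 % 4, ' ', 32 % 6, "\n"; // 0 2
```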

Thomas Pornin

In short: Hexadecimal is virtually a gold standard for radix 16 encoding. Base64 isn't standard at all.

Hex (quoting):

the letters A–F or a–f represent the values 10–15, while the numerals 0–9 are used to represent their usual values.

Each character represents a nibble (4 bits), so there are exactly two characters per byte.
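The fixed two-characters-per-byte mapping, and the fact that letter case does not change the meaning, can be checked directly in PHP. A minimal sketch using only built-ins (`bin2hex()` emits lowercase; `hex2bin()` accepts either case):

```php
<?php
$raw   = hash('sha256', 'password', true); // 32 raw bytes
$lower = bin2hex($raw);                    // lowercase hex
$upper = strtoupper($lower);

echo strlen($raw), ' bytes -> ', strlen($lower), " hex characters\n"; // 32 bytes -> 64 hex characters
var_dump(hex2bin($upper) === $raw); // bool(true)
var_dump(hex2bin($lower) === $raw); // bool(true)
```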

Now consider Base64. There may or may not be padding. The 62nd and 63rd characters can vary according to protocol. Sometimes a cyclic redundancy check is even included automatically (OpenPGP's Radix-64 armor appends a CRC-24 checksum). Let me just list part of the Base64 Wikipedia page contents:

3 Examples
    3.1 Output padding
    3.2 Decoding Base64 with padding
    3.3 Decoding Base64 without padding

4 Implementations and history
    4.1 Variants summary table
    4.2 Privacy-enhanced mail
    4.3 MIME
    4.4 UTF-7
    4.5 OpenPGP
    4.6 RFC 3548
    4.7 RFC 4648
    4.8 The URL applications
    4.9 HTML
    4.10 Other applications
    4.11 Radix-64 applications not compatible with Base64

It's a protocol-dependent mess. And notice §4.11! Hex is simply simpler, and less prone to differing implementations, interpretations, variations and errors.
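To make the variant problem concrete, here is a sketch of the same two bytes in standard Base64 (RFC 4648 section 4) and in the URL- and filename-safe alphabet without padding (RFC 4648 section 5). The helper `base64UrlNoPad` is hypothetical, not a PHP built-in:

```php
<?php
function base64UrlNoPad(string $raw): string
{
    // Swap the 62nd/63rd characters ('+' and '/') and drop '=' padding.
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

$bytes = "\xfb\xff";
echo base64_encode($bytes), "\n";  // +/8=
echo base64UrlNoPad($bytes), "\n"; // -_8
echo bin2hex($bytes), "\n";        // fbff (hex has no such variants)
```

A consumer has to know which variant it was given before it can decode the string; the hex form has only the harmless upper/lowercase ambiguity.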

Paul Uszak

Because in JavaScript, numbers are always 64-bit floating-point values (sign bit, exponent and mantissa), and for the SHA-256 hash function, hex (base 16) always produces a 64-character (256-bit) representation.

Patriot
yarn