20

I understand this may not be the best place to ask a question like this, but I believe that this community may be the best/only place I can ask such a question.

I have inputs and outputs from an in-house hash developed years ago, and the developer no longer works with us. I need to work out the process behind the hash. Before you mention how this may be impossible: it appears to be a relatively simple hash, and I have a few theories that seem to point me in the right direction. First, the data:

 Input   Hash
1293239 EHELKHE
1331487 ILZXFZF
1320709 IKYJXWH
1328166 GUVUMMY
1156693 HFXLFYZ
1313273 ELHZLZV
1287367 GKDMWGM
1318623 EHDHYYF

This does not appear to be a very good hash, in my opinion. For starters, the output length matches the input length, though this may be a coincidence. The main thing that catches my eye is that the output characters all come from a small pool. Here's the list of unique output characters: DEFGHIJKLMUVWXYZ. That is 16 characters, which lines up exactly with the hex digits 0123456789ABCDEF; in fact, in ASCII, each character is exactly 20 above its hex counterpart. So I hypothesize:

D = 0
E = 1
F = 2
G = 3
H = 4
I = 5
J = 6
K = 7
L = 8
M = 9
U = A
V = B
W = C
X = D
Y = E
Z = F
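
The mapping above amounts to a fixed ASCII offset of 20. A minimal sketch in Python, assuming the hypothesis holds (the names `SUBSTITUTION` and `to_hex_digits` are made up for illustration):

```python
# Hypothesised substitution: each output letter sits 20 code points above
# the hex digit it stands for ('D' -> '0', ..., 'M' -> '9', 'U' -> 'A', ..., 'Z' -> 'F').
HEX_DIGITS = "0123456789ABCDEF"
SUBSTITUTION = {chr(ord(h) + 20): h for h in HEX_DIGITS}

def to_hex_digits(code):
    """Translate an output string such as 'EHELKHE' into hex digits."""
    return "".join(SUBSTITUTION[c] for c in code)

print("".join(SUBSTITUTION))     # DEFGHIJKLMUVWXYZ
print(to_hex_digits("EHELKHE"))  # 1418741
```

A character outside the pool raises `KeyError`, which is a quick way to test the hypothesis against more samples.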

From there, I'm out of ideas. I've compared the hash results with many standard hashes and have seen no correlation and have determined that it is most likely an original process. I've played with the numbers as best as I can and I have made no progress. What techniques are available to analyze this hash?

Mike Edward Moras
user316

3 Answers

34

You are correct that it is a "bad hash". In fact it is not a hash at all. I've worked at a company that used a slightly different scheme for obfuscating database keys/numbers in URLs. And I also worked for another company that used a scheme that looked surprisingly similar for unlock codes for electronic devices.

The formula for converting "hashes" back into inputs looks like:

  1. Take the "hash" (column "hash" below) and use your substitution cipher (E goes to 1, etc.) to turn it into hex (column "subs" below).

  2. Convert the result into decimal (column "to dec" below).

  3. You now have an 8-digit number. Swap the first and last halves of the digits (column "swap" below).

  4. Divide the result of step 3 by 13 to obtain the customer id (column "divide" below).

"hash"  - subs    - to dec   - swap     - divide
EHELKHE - 1418741 - 21071681 - 16812107 - 1293239
ILZXFZF - 58FD2F2 - 93311730 - 17309331 - 1331487
IKYJXWH - 57E6DC4 - 92171716 - 17169217 - 1320709
GUVUMMY - 3ABA99E - 61581726 - 17266158 - 1328166
HFXLFYZ - 42D82EF - 70091503 - 15037009 - 1156693
ELHZLZV - 184F8FB - 25491707 - 17072549 - 1313273
GKDMWGM - 3709C39 - 57711673 - 16735771 - 1287367
EHDHYYF - 1404EE2 - 20991714 - 17142099 - 1318623
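
The four steps can be sketched in Python and checked against every row of the table. This is only an illustration of the steps as described: the function name `decode` is made up, and zero-padding the decimal value to 8 digits is an assumption that none of the samples actually exercises.

```python
# Substitution from the question: each letter is 20 code points above its hex digit.
SUBSTITUTION = {chr(ord(h) + 20): h for h in "0123456789ABCDEF"}

def decode(code):
    """Turn a 'hash' such as 'EHELKHE' back into the customer id."""
    hex_digits = "".join(SUBSTITUTION[c] for c in code)   # step 1: substitute
    decimal = str(int(hex_digits, 16)).zfill(8)           # step 2: hex -> decimal (padding is assumed)
    swapped = decimal[4:] + decimal[:4]                   # step 3: swap the two halves
    quotient, remainder = divmod(int(swapped), 13)        # step 4: divide by 13
    if remainder:
        raise ValueError(f"{code!r} does not decode to a multiple of 13")
    return quotient

SAMPLES = {
    "EHELKHE": 1293239, "ILZXFZF": 1331487, "IKYJXWH": 1320709, "GUVUMMY": 1328166,
    "HFXLFYZ": 1156693, "ELHZLZV": 1313273, "GKDMWGM": 1287367, "EHDHYYF": 1318623,
}

for code, customer_id in SAMPLES.items():
    assert decode(code) == customer_id
print("all", len(SAMPLES), "samples decode correctly")
```

The remainder check is a cheap sanity test: a code that does not come out as an exact multiple of 13 was probably not produced by this scheme.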

For folks who might find this in the future, it helps to make a spreadsheet and multiplication tables (customer id * 1 through customer id * 50). Schemes like this are usually simple linear formulas like $mx + b$, sometimes with a modulo involved (such as $(mx + b) \bmod c$), and sometimes with "junk" digits added to make things harder to reverse engineer.
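
The multiplication-table search can also be automated. The sketch below is one possible heuristic, not the procedure actually used here: for each multiplier from 1 to 50 it checks whether customer id * m uses exactly the same digits as the decimal value of the decoded hex, which would reveal a multiply-then-rearrange scheme. The name `candidate_multipliers` is hypothetical.

```python
# (customer id, decimal value of the hex string) pairs taken from the table.
PAIRS = [
    (1293239, 21071681),
    (1331487, 93311730),
    (1320709, 92171716),
    (1328166, 61581726),
]

def candidate_multipliers(pairs, limit=50):
    """Return every m in 1..limit for which id * m is a digit rearrangement
    of the decimal value, for all pairs."""
    hits = []
    for m in range(1, limit + 1):
        if all(sorted(str(cid * m)) == sorted(str(dec)) for cid, dec in pairs):
            hits.append(m)
    return hits

print(candidate_multipliers(PAIRS))
```

Once a multiplier survives across all samples, comparing the digits of the product with the decimal value shows how they were rearranged.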

D.W.'s idea of "chosen plaintext" attack is also a good one. If you had pairs of customer id and "hashes" that varied by a single digit in the customer id, that would probably make it easier to attack.

And "security by obscurity" is no security at all. I spent about 2 hours avoiding homework tonight.

wythagoras
Tangurena
9

I agree that reverse-engineering the binary code might be a worthwhile approach.

Another option is to try a chosen-plaintext attack. For example, try hashing 0000, 0001, 0002, ..., 0009, 0010, ..., 0090, ..., 1000, ..., 9000, and see what you can learn from that.
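
A list of such probe inputs is easy to generate. A small sketch, assuming you have some way to feed chosen inputs to the hash (the function name `single_digit_probes` and the default width of 4 are just for illustration):

```python
def single_digit_probes(width=4):
    """Return the all-zero input plus every input with exactly one non-zero digit."""
    probes = ["0" * width]
    for position in range(width - 1, -1, -1):   # rightmost digit first
        for digit in "123456789":
            probe = ["0"] * width
            probe[position] = digit
            probes.append("".join(probe))
    return probes

probes = single_digit_probes(4)
print(len(probes))   # 37
print(probes[:4])    # ['0000', '0001', '0002', '0003']
```

Comparing the outputs for neighbouring probes shows which output positions react to which input digit.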

D.W.
6

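Since the input is base-10 and the output is base-16, the output space is actually bigger than the input space: seven hex digits cover $16^7 \approx 2.7 \times 10^8$ values, versus $10^7$ for seven decimal digits.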

Do you have access to a binary code implementation? Disassembling or debugging that would be easiest.

How many hashed pair samples do you have access to? Can you generate more with chosen inputs?

What was the skill set and mindset of the developer who built it? When did he make it?

Is this a software license key crack? :-)

EDIT: Since it's compiled C++, you can disassemble or debug the executable to extract the algorithm. This will always be easier than attempting to reverse it as a black box.

Hint: To find crypto code, look for XOR instructions whose two operands differ; an XOR of a register with itself merely zeroes that register. If you find magic constants being used, do a web search for them. They may tell you exactly which algorithm is being used.
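
As a rough illustration of that filter, the sketch below scans a text disassembly for XORs whose operands differ. It assumes an Intel-syntax listing with one instruction per line and no trailing comments; the listing and the name `interesting_xors` are made up for the example, and a real disassembler's output may need a different pattern.

```python
import re

# Matches "xor <operand1>, <operand2>" anywhere in a line (assumed Intel syntax).
XOR_RE = re.compile(r"\bxor\s+(.+?)\s*,\s*(.+?)\s*$", re.IGNORECASE)

def interesting_xors(listing):
    """Return the lines containing an XOR whose two operands are not identical."""
    hits = []
    for line in listing.splitlines():
        match = XOR_RE.search(line)
        if match and match.group(1).lower() != match.group(2).lower():
            hits.append(line.strip())
    return hits

# Made-up listing, purely for demonstration.
LISTING = """\
401000: xor eax, eax
401002: mov ecx, 0x1234
401007: xor edx, ecx
40100a: xor EBX, ebx
"""

print(interesting_xors(LISTING))  # ['401007: xor edx, ecx']
```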

Marsh Ray