I was encrypting some Ansible secrets this morning and noticed that the ciphertexts seemed to have a lot of 3s and 6s in them. I did some frequency counts and found that yes, in fact, about 40% of the digits are 3s, and more than 20% 6s:

$ANSIBLE_VAULT;1.2;AES256;staging-swarm
65313330663061336263663766653361336535316363303035386461656265643262346238313962
3961353035636635663338633338616437353332343735390a346135346166613737343265623239
31343733663762613533306332626263366330313639356239316532643134383035333932316535
3766396161306433640a656638363338323636653933323064343431363530616637363035303938
35373235656264636134623030633430343435306336646330633739323430663763343366333733
33666663323466323463343732353533353833306537383936336662343161333137303763326534
353764373232326263353238623634623565

Frequency count:

3: 207
6: 114
5: 37
2: 35
4: 31
1: 25
0: 23
7: 19
9: 12
8: 11

This seems to hold up for other Ansible secrets, e.g. the ones in this question, for which the combined frequency count is:

3: 247
6: 145
2: 54
5: 42
4: 35
1: 26
9: 26
7: 25
0: 22
8: 22

Is this a peculiarity of the bytes of the underlying AES ciphertext, or of the way Ansible/Python represents the binary ciphertext as a digit string?

David Moles

2 Answers

Is this a peculiarity of the bytes of the underlying AES ciphertext, or of the way Ansible/Python represents the binary ciphertext as a digit string?

It's an artifact of how it represents a binary ciphertext.

What it does is first convert the binary into an ASCII representation of the hex, that is, into the characters '0' through '9' and 'a' through 'f'. Then it converts each of those ASCII characters into its own two-digit hex code, that is, the values '30' through '39' and '61' through '66'.

Back translating the first 16 characters of your quoted string:

6531333066306133

when converted back into ASCII, those are the 8 characters e130f0a3, which is the hex representation of the actual binary value.
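The back-translation above can be checked with a few lines of Python. This is only a sketch using the standard `binascii` module; the variable names are illustrative and it is not taken from Ansible's code.

```python
import binascii

# First 16 characters of the vault body quoted in the question.
outer = "6531333066306133"

# Undo the outer hex layer: the result is itself ASCII hex text.
inner = binascii.unhexlify(outer).decode("ascii")
print(inner)  # e130f0a3

# Undo the inner hex layer to reach the raw binary value (4 bytes).
raw = binascii.unhexlify(inner)
print(len(raw))  # 4
```

Sixteen characters of vault text therefore carry only four bytes of actual data.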

As for why the character distribution you saw:

  • The character 3 is the first digit of 62.5% of the encoded bytes (the hex digits '0' through '9' encode as '30' through '39') and the second digit of 12.5% of them ('3' and 'c'); hence it is expected to occur 37.5% of the time in aggregate, the average of the two.

  • The character 6 is the first digit of 37.5% of the encoded bytes (the hex digits 'a' through 'f' encode as '61' through '66') and the second digit of 12.5% of them ('6' and 'f'); hence it is expected to occur 25% of the time.

  • Each of the characters 1, 2, 4, 5 is the second digit of 12.5% of the encoded bytes (for example, 1 comes from '1' and 'a'); hence each is expected to occur 6.25% of the time.

  • Each of the characters 0, 7, 8, 9 is the second digit of 6.25% of the encoded bytes (only one hex digit produces it); hence each is expected to occur 3.125% of the time.

Those are roughly the frequencies you've seen (although the second example doesn't hold that closely to these expected statistics; I'm not certain why that would be).
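The expected percentages can also be derived mechanically. The sketch below assumes the 16 hex digits of the inner layer are equally likely (reasonable for ciphertext) and counts the characters produced when each one is hex-encoded again; `expected_digit_frequencies` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction


def expected_digit_frequencies():
    """Exact output-character frequencies, assuming uniform inner hex digits."""
    counts = Counter()
    for hex_digit in "0123456789abcdef":
        # Each inner hex digit becomes the two hex characters of its ASCII code.
        counts.update(format(ord(hex_digit), "02x"))
    total = sum(counts.values())  # 16 digits * 2 characters = 32
    return {ch: Fraction(n, total) for ch, n in counts.items()}


freqs = expected_digit_frequencies()
for ch, p in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(ch, f"{float(p):.3%}")
```

Under that uniformity assumption this gives 37.5% for 3, 25% for 6, 6.25% each for 1, 2, 4, 5, and 3.125% each for 0, 7, 8, 9.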

poncho

Can't comment as I need 50 reputation points, so I'm posting this as an answer. To add to what @poncho noted, this is the piece of code in Ansible where you can see that hexlify() is being called twice:

Also see the note:

    # Unnecessary but getting rid of it is a backwards incompatible vault
    # format change
tvsaru