I have a base64 'ciphertext'. I know that the plaintext is an XML document (I know nothing about its structure), but the base64 alphabet has been shuffled. Is there a smarter way to recover the modified alphabet than checking all 64! permutations? I can slightly reduce the search space by using the fact that the document starts with an opening tag (no doctype, just the root element), so its first character is '<' and its last is '>'. That still leaves roughly 62! permutations, which is far too many to enumerate.

I have also tried to infer other characters, but since tag names may be arbitrarily long and the content may contain any character, I don't see much room for frequency analysis.

Jainabhi
malejpavouk

1 Answer

A well-formed XML document is subject to all sorts of constraints.

You can incorporate these constraints into a function that checks a base64 substitution key by looking for invalid syntax, such as improperly nested tags, or a tag name containing control characters, white space, or any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, or that starts with a dot, hyphen or numeral.
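As a rough illustration of such a check, here is a minimal Python sketch. The function names (`violates_xml_syntax` and its helpers) are made up for this example. It assumes the candidate key has already been used to decode some prefix of the document into a string, and it only reports definite violations: illegal control characters, bad tag names, and mismatched closing tags. It is deliberately simplified; for instance, attribute values that contain '>' are not handled.

```python
import string

# ASCII characters allowed at the start of / inside an XML name.
NAME_START = set(string.ascii_letters + "_:")
NAME_CHARS = NAME_START | set(string.digits + ".-")

def _name_of(tag):
    """Element name from the text between '<' and '>' (or the end of the prefix)."""
    body = tag[1:] if tag.startswith("/") else tag
    for k, c in enumerate(body):
        if c.isspace() or c == "/":
            return body[:k]
    return body

def _bad_name(name):
    """True if `name` is empty or has an ASCII character XML forbids there."""
    if not name:
        return True
    if ord(name[0]) < 128 and name[0] not in NAME_START:
        return True
    return any(ord(c) < 128 and c not in NAME_CHARS for c in name[1:])

def _skip_opaque(prefix, i):
    """If a comment or CDATA section starts at i, return the index after it."""
    for start, end in (("<!--", "-->"), ("<![CDATA[", "]]>")):
        if prefix.startswith(start, i):
            j = prefix.find(end, i + len(start))
            return len(prefix) if j == -1 else j + len(end)
    return None

def violates_xml_syntax(prefix, complete=False):
    """True if `prefix` cannot be the start of a well-formed XML document.

    With complete=True, `prefix` is treated as the whole document, so
    unfinished tags and unclosed elements also count as violations.
    """
    if any(ord(c) < 32 and c not in "\t\n\r" for c in prefix):
        return True                        # control characters are never legal
    stack, i = [], 0
    while i < len(prefix):
        if prefix[i] != "<":
            i += 1                         # character data is not checked further
            continue
        after = _skip_opaque(prefix, i)
        if after is not None:
            i = after
            continue
        j = prefix.find(">", i)
        tag = prefix[i + 1:] if j == -1 else prefix[i + 1:j]
        if tag[:1] in ("?", "!"):          # declaration, PI or doctype: skipped
            if j == -1:
                break
            i = j + 1
            continue
        name = _name_of(tag)
        if j == -1:                        # tag not fully decoded yet
            return complete or (bool(name) and _bad_name(name))
        if _bad_name(name):
            return True
        if tag.startswith("/"):
            if not stack or stack.pop() != name:
                return True                # closing tag does not match
        elif not tag.endswith("/"):
            stack.append(name)             # ordinary opening tag
        i = j + 1
    return complete and bool(stack)
```

Because the function accepts a partial prefix, it can be called after every tentative key assignment, which is what makes it usable for pruning.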

Combine this with a backtracking search algorithm, and with a bit of luck you should obtain a solution within a reasonable amount of time.

If it turns out to be too slow, there are other assumptions you could make. For example, most XML documents use only printable ASCII characters in their tag names, and often contain white space after closing tags.
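The backtracking idea can be sketched in Python as follows. All names here (`decode_prefix`, `search`, `plausible`) are hypothetical, and the consistency test is only a stand-in based on the printable-ASCII assumption; a real attempt would plug in stricter XML syntax checks. Symbols are assigned in order of first appearance so that the decoded prefix grows with each assignment and bad branches are cut early.

```python
import base64
import random

STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_prefix(cipher, key):
    """Decode bytes from the start of `cipher` until a symbol is missing in `key`.

    `key` maps a ciphertext symbol to its 6-bit value (0..63).
    """
    symbols = cipher.rstrip("=")
    out = bytearray()
    for j in range(len(symbols) * 6 // 8):
        i, offset = divmod(8 * j, 6)       # every byte spans exactly two symbols
        a, b = symbols[i], symbols[i + 1]
        if a not in key or b not in key:
            break
        out.append((((key[a] << 6) | key[b]) >> (4 - offset)) & 0xFF)
    return bytes(out)

def plausible(prefix):
    """Stand-in check (an assumption): printable ASCII, starting with '<'."""
    ok = all(32 <= c < 127 or c in b"\t\n\r" for c in prefix)
    return ok and prefix[:1] in (b"", b"<")

def search(cipher, key, is_consistent):
    """Yield every extension of `key` to all symbols in `cipher` that passes."""
    key = dict(key)                        # do not mutate the caller's dict
    todo = [s for s in dict.fromkeys(cipher.rstrip("=")) if s not in key]
    used = set(key.values())

    def extend(k):
        if k == len(todo):
            yield dict(key)
            return
        for value in range(64):
            if value in used:
                continue
            key[todo[k]] = value
            used.add(value)
            if is_consistent(decode_prefix(cipher, key)):
                yield from extend(k + 1)   # go deeper only if still consistent
            used.discard(value)
            del key[todo[k]]

    yield from extend(0)

# Demo on made-up data: shuffle the alphabet, encode a tiny document,
# then hide three symbols of the key and let the search fill them in.
rng = random.Random(0)
shuffled = "".join(rng.sample(STD, 64))
plain = b"<doc><p>hi!</p></doc>"
cipher = base64.b64encode(plain).decode().translate(str.maketrans(STD, shuffled))
true_key = {s: v for v, s in enumerate(shuffled)}
seen = list(dict.fromkeys(cipher))
known = {s: true_key[s] for s in seen[:-3]}
solutions = list(search(cipher, known, plausible))
```

The demo hides only three symbols to keep the run short. With such a weak check many candidate keys survive, which is why stronger constraints (tag nesting, legal name characters) matter when searching from an empty key.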

r3mainer