I am working on Python 2/3 compatibility and have run into an issue when working with str and bytes types. Here is an example:
# python 2
x = b"%r" % u'hello' # this returns "u'hello'"
# python 3
x = b"%r" % u'hello' # this returns b"'hello'"
Notice how the extra u prefix appears in the final representation of x in Python 2? I need my code to return the same value in Python 2 and Python 3. My code can take in str, bytes, or unicode values.
I can coerce the Python 3 value to match the Python 2 value by doing:
# note: six.text_type comes from the six compatibility library. It is unicode in Python 2 and str in Python 3.
import six
new_data = b"%r" % original_input
if isinstance(original_input, six.text_type) and not new_data.startswith(b"u'"):
    # prepend the u prefix that Python 3 omits from the repr
    new_data = b"u%s" % new_data
This makes the u'hello' case work correctly but breaks the 'hello' case.
This is what happens:
# python 2
x = b"%r" % 'hello' # this returns "'hello'"
# python 3
x = b"%r" % 'hello' # this returns b"'hello'"
The problem is that in Python 3 u'hello' is the same as 'hello', so if I include my code above, both u'hello' and 'hello' end up returning b"u'hello'" in Python 3.
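To illustrate, here is a minimal check, meant to be run on Python 3 only (the variable names are just for illustration). As far as I can tell, the two literals are indistinguishable once the code is running:

```python
# Python 3 only: the u prefix is accepted but produces an ordinary str.
with_prefix = u'hello'
without_prefix = 'hello'

same_type = type(with_prefix) is type(without_prefix)   # both are str
same_value = with_prefix == without_prefix
same_repr = repr(with_prefix) == repr(without_prefix)   # neither repr carries a u
```

All three comparisons come out True for me, which is why the isinstance check above cannot tell the two cases apart.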
So I need some way to tell whether a Python 3 input string was explicitly written with the u prefix, and to run my code above only in that case.