6

I’ve developed an iOS app that can send emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal display perfectly except “© and ®”.

Here is the emoji encoding piece of code.

NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding]; 
NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

// This code returns \251 and \256 for the copyright and registered symbols. These are not standard Unicode escapes, so they don't display on the web portal.

So what should I do to convert them to standard Unicode escapes?

Test run:

messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";

// The encoded string I get from the above encoding is

Copy right symbol : \\251 AND Registered Mark symbol : \\256

Whereas it should look like this (with standard Unicode escapes):

Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE
aqsa arshad

3 Answers

5

messageBody is already a string, so there is no reason to convert it to data only to convert it back to a string. Replace your code with:

NSString *encodedString = messageBody;

If the messageBody object is incorrect then the way to fix it is to change the way it was created. The server sends data, not strings. The data that the server sends is encoded in some agreed-upon way. Generally this encoding is UTF-8. If you know the encoding you can convert the data to a string; if you don't, then the data is gibberish that cannot be read. If the messageBody is incorrect, the problem occurred when it was converted from the data that the server sent. It seems likely that you are parsing it with the incorrect encoding.

The code you posted is just plain wrong. It converts a string to data using one encoding (ASCII) and then reads that data with a different encoding (UTF-8). That is like translating a book to Spanish and then having a Portuguese speaker translate it back - it might work for some words, but it is still wrong.
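To illustrate why both sides must agree on the encoding, here is a minimal sketch. The helper names `DecodeServerBytes` and `DemoEncodingMismatch` are hypothetical and not part of the question's code. It decodes the same two bytes, the UTF-8 form of ©, once with the matching encoding and once with a mismatched one:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode raw bytes received from a server using the
// encoding both sides are assumed to have agreed on.
static NSString *DecodeServerBytes(NSData *data, NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:data encoding:encoding];
}

static void DemoEncodingMismatch(void) {
    // 0xC2 0xA9 is the UTF-8 representation of U+00A9 (©).
    const unsigned char bytes[] = {0xC2, 0xA9};
    NSData *data = [NSData dataWithBytes:bytes length:sizeof(bytes)];

    // Matching encoding: one character, the copyright sign.
    NSString *right = DecodeServerBytes(data, NSUTF8StringEncoding);
    // Mismatched encoding: each byte becomes its own Latin-1 character.
    NSString *wrong = DecodeServerBytes(data, NSISOLatin1StringEncoding);

    NSLog(@"UTF-8: %@ (length %lu), Latin-1: %@ (length %lu)",
          right, (unsigned long)right.length,
          wrong, (unsigned long)wrong.length);
}
```

With the matching encoding the two bytes come back as a single ©; with Latin-1 they come back as two unrelated characters, which is the kind of damage a mismatch causes.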

If you are still having trouble then you should share the code of where messageBody is created.

If your server expects an ASCII string with all Unicode characters changed to \u00xx then you should first yell at your server guy because he is an idiot. But if that doesn't work you can use the following code:

NSString* messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
NSData* utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
uint32_t *bytes = (uint32_t *) [utf32Data bytes];
NSMutableString* escapedString = [[NSMutableString alloc] init];
// Start at 1 because the first 4 bytes hold the byte-order mark
NSUInteger count = utf32Data.length / sizeof(uint32_t);
for (NSUInteger index = 1; index < count; index++) {
    uint32_t charValue = bytes[index];
    if (charValue <= 127) {
        [escapedString appendFormat:@"%C", (unichar)charValue];
    } else {
        // Note: code points above U+FFFF produce more than four hex digits here
        [escapedString appendFormat:@"\\\\u%04X", (unsigned int)charValue];
    }
}
Jon Rose
  • Then unicodes will be displayed as it is , it won't be converted to emojis. – aqsa arshad Apr 04 '17 at 11:16
  • What do you mean by 'as is'? What does the string look like? – Jon Rose Apr 04 '17 at 11:32
  • Please read question again. Here i'm creating a encoded string. – aqsa arshad Apr 04 '17 at 12:31
  • I still don't understand what is wrong with the `messageBody` string that you are trying to change it. Do you want to replace all non-ascii characters with LITERALLY a backslash and a u and a series of numbers? – Jon Rose Apr 04 '17 at 13:11
  • please read this: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ – Jon Rose Apr 05 '17 at 06:18
  • this answer definitely doesn't deserve downvote. it provides at least half of the solution. – Mert Buran Apr 05 '17 at 11:15
  • @aqsaarshad I think this answer is fundamentally sound in terms of how to pass data that happens to be text, with both sender and receiver agreeing on the encoding. But, I suspect that you are right in that it doesn't provide any special handling of emojis with an alternate representation such as :bike: in place of the character . Perhaps you can add details to your question about how you are using emojis. – Tom Blodget Apr 05 '17 at 11:35
5

First, I will try to provide the solution. Then I will try to explain why.

Escaping non-ASCII chars

To escape Unicode chars in a string, you shouldn't rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape non-ASCII chars in a string:

// NSMutableString category
- (void)appendChar:(unichar)charToAppend {
    [self appendFormat:@"%C", charToAppend];
}

// NSString category
- (NSString *)UEscapedString {
    char const hexChar[] = "0123456789ABCDEF";
    NSMutableString *outputString = [NSMutableString string];
    for (NSInteger i = 0; i < self.length; i++) {
        unichar character = [self characterAtIndex:i];
        if ((character >> 7) > 0) {
            [outputString appendString:@"\\u"];
            [outputString appendChar:(hexChar[(character >> 12) & 0xF])]; // append the hex character for the left-most 4-bits
            [outputString appendChar:(hexChar[(character >> 8) & 0xF])];  // hex for the second group of 4-bits from the left
            [outputString appendChar:(hexChar[(character >> 4) & 0xF])];  // hex for the third group
            [outputString appendChar:(hexChar[character & 0xF])];         // hex for the last group, e.g., the right most 4-bits
        } else {
            [outputString appendChar:character];
        }
    }
    return [outputString copy];
}

(NOTE: I guess Jon Rose's method does the same but I didn't wanna share a method that I didn't test)

Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
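For emojis, it helps to see what this per-character escaping produces. The following is a compact sketch of the same idea as a plain function; the name `EscapeNonASCII` is hypothetical and it uses `appendFormat:` instead of the hex lookup table above. It assumes the input contains no literal backslashes, since those are passed through unescaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer above): escapes every non-ASCII
// UTF-16 code unit as \uXXXX and leaves 7-bit ASCII untouched.
static NSString *EscapeNonASCII(NSString *input) {
    NSMutableString *output = [NSMutableString stringWithCapacity:input.length];
    for (NSUInteger i = 0; i < input.length; i++) {
        unichar unit = [input characterAtIndex:i];
        if (unit > 0x7F) {
            // One escape per code unit, so a character above U+FFFF comes
            // out as a surrogate pair (two escapes in a row).
            [output appendFormat:@"\\u%04X", (unsigned int)unit];
        } else {
            [output appendFormat:@"%C", unit];
        }
    }
    return [output copy];
}
```

`EscapeNonASCII(@"© and ®")` gives `\u00A9 and \u00AE`, and `EscapeNonASCII(@"😀")` gives `\uD83D\uDE00`, because NSString stores characters above U+FFFF as two UTF-16 code units.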

Escaping unicode is done. Now let's convert it back to display the emojis.

Converting back

This is gonna be confusing at first but this is what it is:

NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];

Now you have your emojis (and other non-ASCIIs) back.
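The conversion can be wrapped in a small function, sketched below. The name `UnescapeNonLossyASCII` is hypothetical. As an assumption of mine, the first step uses NSASCIIStringEncoding rather than NSUTF8StringEncoding so that a stray raw non-ASCII character in the input fails early and visibly instead of reaching the decoder.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turn \uXXXX escape text back into real characters.
// Returns nil if the input is not pure ASCII or cannot be decoded.
static NSString *UnescapeNonLossyASCII(NSString *escaped) {
    // Strict conversion: yields nil if any character is outside 7-bit ASCII.
    NSData *data = [escaped dataUsingEncoding:NSASCIIStringEncoding];
    if (data == nil) {
        return nil; // input contained a raw non-ASCII character
    }
    return [[NSString alloc] initWithData:data
                                 encoding:NSNonLossyASCIIStringEncoding];
}
```

Callers should check the result for nil, since the input has to consist entirely of ASCII escape text for the decoding to succeed.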

What is happening?

The problem

In your case, you are trying to create a common language between your server side and your app. However, NSNonLossyASCIIStringEncoding is a pretty bad choice for that purpose, because it is a black box created by Apple and we don't really know what exactly it does inside. As we can see, it writes some non-ASCII characters as \uXXXX and others, such as © and ®, as \XXX (\251 and \256 are the octal forms of 0xA9 and 0xAE). That is why you shouldn't rely on it to build a multi-platform system. There is no equivalent of it on backend platforms or Android.

Mysteriously, NSNonLossyASCIIStringEncoding can still decode \u00AE back into ®, even though it encodes ® as \256 in the first place. I'm sure there are tools on other platforms to convert \uXXXX into Unicode chars, so that shouldn't be a problem for you.

Mert Buran
  • Yes i got this Now. I've implement it but i got your point here. Thankyou – aqsa arshad Apr 07 '17 at 12:01
  • I'm having one more problem of same type. But that is when receiving a message from android or web portal. When i try to display emojis using `NSNonLossyASCIIStringEncoding` some times it returns nil due to the presence of character code "\u00a0" or any other non supported code. So any other way around of it ? any help would be appreciated. Thanks – aqsa arshad Apr 07 '17 at 12:11
  • Details are here : http://stackoverflow.com/questions/42021291/nsnonlossyasciistringencoding-returns-nil – aqsa arshad Apr 07 '17 at 12:12
  • the problem with "\u00a0" seems to be non-escaped "\" character. `NSNonLossyASCIIStringEncoding` works fine if it is "\\u00a0", at least it turns into whitespace instead of `nil` string. i guess you should make sure your strings are escaped properly after you receive them from network. so, you should turn "\u00a0" into "\\u00a0" during response serialization or any time before converting into unicode string. hope that helps. – Mert Buran Apr 07 '17 at 12:25
  • i Just mentioned a test code.. This is a sample string i got from server `Good morning \ud83c\udf3c .. So here comes test emojis \ud83d\ude00\ud83d\ude0d\ud83d\ude17\ud83d\ude05\ud83d\ude19\u263a️\ud83e\udd17\ud83d\ude1b\ud83d\ude0e\ud83d\ude18\ud83d\ude1d more emojis \ud83d\ude0e\ud83e\udd11\ud83e\udd11` & its returning Nill. This only happens when i receive emojis from web portal. (Not all time it occurs randomly) ... Can you please tell me whats wrong in this and how i can fix this. ....... P.s: This string doesn't contain "\u00a0" in it. – aqsa arshad Apr 10 '17 at 06:18
  • I've also tried this `testString= [testString stringByReplacingOccurrencesOfString:@“\\” withString:@“\\\\”];' but this didn't worked as well. – aqsa arshad Apr 10 '17 at 06:22
  • your string seems weird: "\u263!" and then i see a "?". i suggest you testing with shorter strings to see where the problem is. it can't happen randomly, something isn't converted properly – Mert Buran Apr 10 '17 at 07:51
  • This is valid string received from web portal. Yes i know something isn't converting properly but that needs to be figured out. – aqsa arshad Apr 12 '17 at 06:33
0

I really do not understand your problem.

You can simply convert any character into NSData and back into a string. You can simply pass a UTF-8 string including both emojis and other symbols using a POST request.

NSString* newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
NSData* data = [newStr dataUsingEncoding:NSUTF8StringEncoding];

It has to work for both the server and the client side.
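As a rough sketch of what sending the message as plain UTF-8 could look like: the helper name `MakeMessageRequest`, the `text/plain` content type, and any URL passed in are assumptions for illustration, not details of the asker's server.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a POST request whose body is the message as
// raw UTF-8, with no escaping step. The URL is supplied by the caller.
static NSURLRequest *MakeMessageRequest(NSURL *url, NSString *messageBody) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    request.HTTPMethod = @"POST";
    // Tell the server which encoding the body uses (assumed content type).
    [request setValue:@"text/plain; charset=utf-8"
        forHTTPHeaderField:@"Content-Type"];
    request.HTTPBody = [messageBody dataUsingEncoding:NSUTF8StringEncoding];
    return request;
}
```

The receiving side then decodes the body with the same encoding, for example `[[NSString alloc] initWithData:body encoding:NSUTF8StringEncoding]`, and gets the original characters back, emojis included.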

But, of course, you have the other problem that some fonts do not cover all Unicode characters. That's why, e.g., in a terminal you might not see some of them. But this is beyond the scope of this question.

NSNonLossyASCIIStringEncoding is only needed when you really want to convert a symbol into a sequence of ASCII symbols. Here it is not needed.

Vyacheslav
  • Please read this one. http://stackoverflow.com/questions/8635393/ios-5-how-to-convert-an-emoji-to-a-unicode-character – aqsa arshad Apr 05 '17 at 10:16
  • this one too : http://stackoverflow.com/questions/8635393/ios-5-how-to-convert-an-emoji-to-a-unicode-character?noredirect=1&lq=1 – aqsa arshad Apr 05 '17 at 10:21