Emoji to be encoded in Unicode
The Unicode Technical Committee is working on encoding emoji (絵文字) in the Unicode Standard and ISO/IEC 10646. The proposal has spurred loads of discussion on the Unicode mailing list, with more than a handful of forked threads, leading to fundamental questions such as whether emoji should be encoded at all and what constitutes a character.
Not to worry, the Unicode Consortium is a veteran when it comes to dealing with the hairy issues of creating standards that work across languages, cultures and geographic regions. They simply can’t please everyone.
To me, the motivation for this is clear: interoperability. The current state of affairs in the Japanese mobile industry leaves a lot to be desired. Across the carriers, there exist different sets of supported emoji, different private-use characters, different substitution mappings, and different code pages (user-defined characters in Shift_JIS, really). As one can imagine, the result is chaos, and as a software engineer, I really don’t want to imagine what those poor software engineers have to do to make it “just work” when a message crosses carrier boundaries.
To illustrate my point, let’s look at what Google does when you send a message with some emoji characters from Gmail to each of DoCoMo, Softbank, and au.
The screenshot above is for DoCoMo, but I also repeated the experiment for Softbank and au. From the bounce messages, one can easily tell that what is saved as a sent message and what actually gets transmitted to each carrier’s SMTP server are all distinct in their encodings.
What’s saved in Gmail (visible when you click on “Show original”) embeds the graphics using standard MIME techniques (multipart/related with CID URIs, a.k.a. RFC 2111), with an extension attribute called goomoji in the HTML version, which carries part of the Unicode private-use code point assigned to the emoji. For example, the crab is assigned U+FE1E3 in Google, so its goomoji value is 1E3.
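The relationship between the code point and the goomoji value can be sketched in a few lines of Python 3. This is only a guess generalized from the single crab example: the base value 0xFE000, the three-hex-digit limit, and the function names are my assumptions, not anything Google documents here.

```python
# Assumption: Google's private-use emoji sit in a block starting at 0xFE000
# and the goomoji attribute is the hex offset into that block. Only the crab
# example (U+FE1E3 -> "1E3") comes from the post itself.
GOOGLE_PUA_BASE = 0xFE000


def goomoji_value(code_point):
    """Return the (hypothetical) goomoji attribute value for a code point."""
    offset = code_point - GOOGLE_PUA_BASE
    if not 0 <= offset <= 0xFFF:
        raise ValueError("outside the assumed Google emoji block")
    return format(offset, "X")


def goomoji_code_point(value):
    """Inverse mapping: goomoji attribute value back to a code point."""
    return GOOGLE_PUA_BASE + int(value, 16)


print(goomoji_value(0xFE1E3))  # 1E3
```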
What’s sent to DoCoMo is a different story altogether:
It’s a standard multipart/alternative message with 2 parts: text/plain and text/html, both encoded in Shift_JIS. Decoding the text/plain part gives:
>>> import base64
>>> sjis = base64.b64decode("W4NKg2pd+aT56ApYT1hPIIFfKF4tXimBXgoKaHR0cDovL3hyaS5uZXQvPXdpbAo=")
>>> sjis
'[\x83J\x83j]\xf9\xa4\xf9\xe8\nXOXO \x81_(^-^)\x81^\n\nhttp://xri.net/=wil\n'
>>> print sjis.decode("shift_jis", 'ignore')
[カニ]
XOXO \(^-^)/
http://xri.net/=wil
Since DoCoMo doesn’t have the decapod in its emoji set, the crab gets encoded as カニ (Japanese for crab) in square brackets. Next come the double musical notes, which are assigned a user-defined Shift_JIS value of F9A4 in DoCoMo. (This explains why I had to pass the ‘ignore’ parameter to the decode method above: Python has no way to map that sequence to Unicode and therefore barfs.) The same goes for the tulip (F9E8). Ignoring the plain text “XOXO”, the last emoticon is a hug face, which Google assigned a code point to but which none of the carriers represent with a graphic. Instead, it is mapped to a kaomoji (顔文字, “face characters”).
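Rather than throwing the carrier-specific bytes away with ‘ignore’, one could keep them visible. Here is a minimal Python 3 sketch that walks the Shift_JIS bytes and renders anything with a lead byte in 0xF0 to 0xF9 as a hex placeholder. Treating that whole range as the user-defined area is my assumption, and the function name is made up for illustration; the sample bytes are the start of the decoded text/plain part shown earlier.

```python
def decode_keeping_carrier_codes(raw):
    """Decode Shift_JIS text, rendering user-defined codes as <XXXX>."""
    data = bytearray(raw)
    out = []
    pending = bytearray()  # bytes left for the standard codec

    def flush():
        # Hand the accumulated ordinary bytes to Python's shift_jis codec.
        if pending:
            out.append(bytes(pending).decode("shift_jis", "replace"))
            del pending[:]

    i = 0
    while i < len(data):
        lead = data[i]
        if 0xF0 <= lead <= 0xF9:
            # Assumed user-defined (carrier emoji) range: show the raw code.
            flush()
            pair = data[i:i + 2]
            out.append("<" + "".join("%02X" % b for b in pair) + ">")
            i += 2
        elif 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC:
            # Ordinary double-byte character: keep lead and trail together.
            pending += data[i:i + 2]
            i += 2
        else:
            # ASCII or single-byte half-width katakana.
            pending.append(lead)
            i += 1
    flush()
    return "".join(out)


sample = b"[\x83J\x83j]\xf9\xa4\xf9\xe8\nXOXO"
decoded = decode_keeping_carrier_codes(sample)
print(decoded)  # first line: [カニ]<F9A4><F9E8>
```

A real converter would then look each placeholder up in a per-carrier table; building those tables is exactly the tedious part.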
For au (KDDI), the message was also sent as multipart/alternative with text/plain and text/html parts, but this time encoded in ISO-2022-JP. The situation is similar: the crab becomes [カニ], the musical notes and tulip use KDDI’s version of ISO-2022-JP, and the hug becomes a kaomoji.
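To check which charset each part of such a message declares, the standard email package in Python 3 is enough. The sketch below builds a small stand-in message locally (it is not the actual message Gmail sent, and the helper name is my own), then lists the content type and charset of every leaf part.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def part_charsets(raw_message):
    """List (content type, declared charset) for each non-container part."""
    msg = message_from_bytes(raw_message)
    return [
        (part.get_content_type(), part.get_content_charset())
        for part in msg.walk()
        if not part.is_multipart()
    ]


# Stand-in for the au-bound message: two alternatives, both ISO-2022-JP.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("XOXO", "plain", "iso-2022-jp"))
sample.attach(MIMEText("<p>XOXO</p>", "html", "iso-2022-jp"))

for content_type, charset in part_charsets(sample.as_bytes()):
    print(content_type, charset)
```

Note that the declared charset only tells you the label; carrier-specific extensions hiding behind that label still need their own handling.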
Similar deal for Softbank, except that the charset specified is PDC, which smells just like Shift_JIS with user-defined characters to me.
I hope by now you have an appreciation of the kind of fiddling that Google engineers had to do in order to get their messages to display properly on Japanese mobile phones, just because the carriers decided to go invent their own mappings and character sets. It’s no wonder that Google and Apple, with its recently announced emoji support in iPhone, are among those supporting this effort.
The ongoing work can be found here, and all the emoji are available here and here in gory detail.
If I had the time and luxury (read: paid) to participate, I would. I wish them all the best and hope to see a good set of emoji in Unicode soon.

February 1st, 2009 at 3:52 am
[...] remember reading about Apple supporting emoji on the iPhone OS 2.2. Now that I’ve upgraded, I decided to try it out but for could never [...]
June 22nd, 2010 at 11:29 pm
Aww, I want apple to approve of these cute little characters
PLEASE APPLE!!
I used to be able to send them to different carriers but not anymore..