Emoji Encoding Conversion between Carriers

I remember reading about Apple supporting emoji in iPhone OS 2.2. Now that I’ve upgraded, I decided to try it out, but could never find the option after hunting through the keyboard preferences. Googling showed that these cute little emoticons are only available to Softbank users. Thankfully, Steven Troughton-Smith figured out that by editing a file in your iPhone backup, the “emoji” option suddenly shows up under Settings -> General -> Keyboard -> International Keyboards -> Japanese!

Now that I have the ability to enter these emoji on my iPhone, I figured I’d try it out by sending an email to myself. Alas, all I get is a row of boxes. Time to look at the message content (relevant fields only):

Content-Type: text/plain;
  charset=cp932;
  format=flowed
Content-Transfer-Encoding: base64
X-Mailer: iPhone Mail (5G77)
Mime-Version: 1.0 (iPhone Mail 5G77)
Subject: trying out emoji
Date: Sat, 31 Jan 2009 19:11:03 +0800

aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==
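As an aside, the headers already say everything needed to decode the body: base64 for the transfer encoding, cp932 for the charset. Here is a rough sketch of letting Python's standard email package do both steps. It is written for Python 3 (the session further down is Python 2), and the RAW string is a trimmed copy of the message with only the fields that affect decoding.

```python
from email import message_from_string
from email.policy import default

# Trimmed copy of the message above: only the fields that affect decoding.
RAW = (
    "Content-Type: text/plain;\n"
    "  charset=cp932;\n"
    "  format=flowed\n"
    "Content-Transfer-Encoding: base64\n"
    "Subject: trying out emoji\n"
    "\n"
    "aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==\n"
)

msg = message_from_string(RAW, policy=default)

charset = msg.get_content_charset()  # 'cp932'
body = msg.get_content()             # base64-decoded, then decoded with the charset

print(charset)
print(ascii(body))  # ascii() keeps the private-use characters readable as \uXXXX
```

This only saves copying the base64 by hand; the decoded text is the same either way.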

OK, it’s pretty strange that it’s sent in cp932 encoding, but let’s decode it and see:

>>> from base64 import b64decode
>>> s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
>>> s
'hey there \xf1\x90\r\n\r\n\xf5\x9c\xf5\xdd\xf0H\xf0{\xf2\xe9\xf3G\xf4\x98\xf4\x9b\xf7D\xf0\x96\xf6\xf9\xf0\x95'
>>> s.decode('cp932')
u'hey there \ue10b\r\n\r\n\ue407\ue448\ue008\ue03b\ue220\ue23b\ue347\ue34a\ue528\ue055\ue520\ue054'
>>>
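Why does a cp932 decode produce private-use code points at all? As far as I can tell, cp932 reserves lead bytes 0xF0 to 0xF9 for user-defined characters and maps them linearly onto U+E000 onward, 188 characters per lead byte. The sketch below (Python 3, and the helper name udc_to_pua is my own) redoes that arithmetic by hand. I have only checked it against the bytes in this message, so treat it as an illustration rather than a reference for the code page.

```python
from base64 import b64decode


def udc_to_pua(lead, trail):
    """Map one cp932 user-defined double-byte character to a code point.

    Assumption: lead bytes 0xF0-0xF9 map linearly onto U+E000 onward,
    with 188 valid trail bytes per lead byte (0x7F is skipped).
    """
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined range")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
        raise ValueError("invalid trail byte")
    offset = trail - 0x40 if trail < 0x80 else trail - 0x41
    return 0xE000 + 188 * (lead - 0xF0) + offset


raw = b64decode("aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==")
text = raw.decode("cp932")

# The first emoji is the byte pair F1 90, right after "hey there ".
first = udc_to_pua(0xF1, 0x90)
print("U+%04X" % first)  # U+E10B

# Everything non-ASCII in this message sits in the Private Use Area.
pua = [c for c in text if 0xE000 <= ord(c) <= 0xF8FF]
non_ascii = [c for c in text if ord(c) > 0x7F]
print(pua == non_ascii)
```

So the iPhone is not doing anything exotic here: it writes the emoji as user-defined cp932 characters, and any cp932 decoder turns those into the matching private-use code points.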

The Unicode code points do look like they correspond to the Softbank private-use ones, so I’m going to use the emoji4unicode package to convert them to HTML. The following Python script converts the text to each carrier’s representation:

from base64 import b64decode

import emoji4unicode
import carrier_data

s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
uni = s.decode('cp932')

sd = carrier_data.GetSoftbankData()
dd = carrier_data.GetDocomoData()
kd = carrier_data.GetKddiData()

emoji4unicode.Load()
def find_symbol(pua, carrier):
    for sym in emoji4unicode.GetSymbols():
        uni = sym.GetCarrierUnicode(carrier)
        if uni and uni == pua:
            return sym

def map_symbol(sym, carrier, cdata):
    uni = sym.GetCarrierUnicode(carrier)
    if not uni: # no mapping for this carrier
        return sym.GetTextFallback()
    else:
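        if uni.startswith(">"): # ">" marks a fallback mapping to another symbol
            if len(uni) > 5: # more than one code point after the ">"
                raise Exception("cannot handle this yet")
            uni = uni[1:] # strip the ">" and keep the hex code point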

        return cdata.SymbolFromUnicode(uni).ImageHTML()

softbank = []
docomo = []
kddi = []

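for u in uni:
    code = "%04X" % ord(u) # e.g. "E10B"; named so it doesn't shadow the built-in hex()
    if u > u'\x7F': # assumes every non-ASCII character is a Softbank emoji
        sym = find_symbol(code, "softbank")

        softbank.append(sd.SymbolFromUnicode(code).ImageHTML())
        kddi.append(map_symbol(sym, "kddi", kd))
        docomo.append(map_symbol(sym, "docomo", dd))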
    elif u == '\n':
        softbank.append('<br />')
        kddi.append('<br />')
        docomo.append('<br />')
    else:
        softbank.append(u)
        kddi.append(u)
        docomo.append(u)

print "Softbank:<br />", ''.join(softbank).encode('utf-8')
print "<hr />KDDI:<br />", ''.join(kddi).encode('utf-8')
print "<hr />DoCoMo:<br />", ''.join(docomo).encode('utf-8')

Results below:

Softbank:
hey there

KDDI:
hey there

DoCoMo:
hey there

[サンタ][<][イチゴ][ナス][サル][イルカ][クジラ]

As you can see, DoCoMo has the fewest emoji, so many of the characters, like “Santa”, “Strawberry”, “Eggplant”, “Monkey”, “Dolphin” and “Whale”, are replaced by their fallback text.

Do note that the Python script is not optimized at all: it loops through every emoji in the database for each character it needs to convert. Also, if you need this functionality in your application, there are various libraries out there that already do the mapping well. This is just an experiment.
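If the per-character scan ever becomes a problem, the obvious fix is to walk the symbol list once and build a dictionary keyed by the carrier's code point. Here is a minimal sketch of that idea in Python 3. FakeSymbol and its data are made up so the snippet runs without emoji4unicode installed; the only thing it borrows from the script above is the GetCarrierUnicode(carrier) call.

```python
class FakeSymbol(object):
    """Hypothetical stand-in for an emoji4unicode symbol (illustration only)."""

    def __init__(self, name, carrier_unicode):
        self.name = name
        self._carrier_unicode = carrier_unicode

    def GetCarrierUnicode(self, carrier):
        # Returns a hex string such as "E001", or None if there is no mapping.
        return self._carrier_unicode.get(carrier)


def build_index(symbols, carrier):
    """Map carrier code point (hex string) -> symbol, in a single pass."""
    index = {}
    for sym in symbols:
        code = sym.GetCarrierUnicode(carrier)
        if code:
            # Keep the first match, which is what the linear scan returns.
            index.setdefault(code, sym)
    return index


# Made-up sample data, not real carrier mappings.
symbols = [
    FakeSymbol("symbol A", {"softbank": "E001", "kddi": "E469"}),
    FakeSymbol("symbol B", {"softbank": "E002"}),
    FakeSymbol("symbol C", {"kddi": "E46A"}),
]

softbank_index = build_index(symbols, "softbank")

print(softbank_index["E001"].name)
print(softbank_index.get("FFFF"))  # None: no symbol uses this code point
```

With the real package, the list passed in would be emoji4unicode.GetSymbols(), and find_symbol would shrink to a single dictionary lookup.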




