Emoji Encoding Conversion between Carriers

I remember reading about Apple supporting emoji in iPhone OS 2.2. Now that I’ve upgraded, I decided to try it out, but I could never find the option after hunting through the keyboard preferences. Googling showed that these cute little emoticons are only available to Softbank users. Thankfully, Steven Troughton-Smith has figured out that by editing a file in your iPhone backup, the “emoji” option suddenly shows up under Settings -> General -> Keyboard -> International Keyboards -> Japanese!

Now that I can enter these emoji on my iPhone, I figured I’d try it out by sending an email to myself. Alas, all I get is a row of boxes. Time to look at the message content (relevant fields):

Content-Type: text/plain;
  charset=cp932;
  format=flowed
Content-Transfer-Encoding: base64
X-Mailer: iPhone Mail (5G77)
Mime-Version: 1.0 (iPhone Mail 5G77)
Subject: trying out emoji
Date: Sat, 31 Jan 2009 19:11:03 +0800

aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==

OK, it’s pretty strange that it’s sent in cp932 encoding, but we’ll see:

>>> from base64 import b64decode
>>> s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
>>> s
'hey there \xf1\x90\r\n\r\n\xf5\x9c\xf5\xdd\xf0H\xf0{\xf2\xe9\xf3G\xf4\x98\xf4\x9b\xf7D\xf0\x96\xf6\xf9\xf0\x95'
>>> s.decode('cp932')
u'hey there \ue10b\r\n\r\n\ue407\ue448\ue008\ue03b\ue220\ue23b\ue347\ue34a\ue528\ue055\ue520\ue054'
>>>
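
The code points above are consistent with a simple arithmetic mapping from the cp932 user-defined byte range (lead bytes 0xF0 to 0xF9) onto the private use area starting at U+E000. Here is a minimal Python 3 sketch of that arithmetic, checked only against the bytes in this message. The names udc_to_pua and decode_payload are made up for illustration, and this is not a general cp932 decoder:

```python
from base64 import b64decode

def udc_to_pua(lead, trail):
    """Map one two-byte user-defined cp932 sequence to a private-use code point."""
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined range")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41  # 0x7F is not a valid trail byte, so it is skipped
    else:
        raise ValueError("invalid trail byte")
    # Each lead byte covers 188 trail bytes (63 + 125).
    return 0xE000 + 188 * (lead - 0xF0) + offset

def decode_payload(data):
    """Decode ASCII plus user-defined sequences only (an assumption that
    holds for this message); any other byte raises ValueError."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            out.append(chr(udc_to_pua(b, data[i + 1])))
            i += 2
    return "".join(out)

raw = b64decode("aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==")
text = decode_payload(raw)
print(["%04X" % ord(c) for c in text if ord(c) >= 0xE000])
```

For example, the first emoji is the byte pair F1 90: 188 * 1 + (0x90 - 0x41) = 267 = 0x10B, which gives U+E10B, matching the session above.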

The Unicode code points do look like the Softbank private-use ones, so I’m going to use the emoji4unicode package to convert them to HTML. The following Python script converts the text to each carrier’s representation:

from base64 import b64decode

import emoji4unicode
import carrier_data

s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
uni = s.decode('cp932')

sd = carrier_data.GetSoftbankData()
dd = carrier_data.GetDocomoData()
kd = carrier_data.GetKddiData()

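emoji4unicode.Load()

def find_symbol(pua, carrier):
    # Linear scan: return the first symbol whose code point for this carrier matches.
    for sym in emoji4unicode.GetSymbols():
        uni = sym.GetCarrierUnicode(carrier)
        if uni and uni == pua:
            return sym
    return None # no symbol uses this code point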

def map_symbol(sym, carrier, cdata):
    # Render sym as the target carrier's image HTML, or as fallback text if unmapped.
    uni = sym.GetCarrierUnicode(carrier)
    if not uni: # no mapping for this carrier
        return sym.GetTextFallback()
    else:
        if uni.startswith(">"): # mapped
            if len(uni) > 5: # longer than ">" plus four hex digits
                raise Exception("cannot handle this yet")
            uni = uni[1:]

        return cdata.SymbolFromUnicode(uni).ImageHTML()

softbank = []
docomo = []
kddi = []

for u in uni:
    code = "%04X" % ord(u) # code point as a hex string, e.g. "E10B"
    if u > '\x7F': # non-ASCII: treat as a Softbank private-use emoji
        sym = find_symbol(code, "softbank")

        softbank.append(sd.SymbolFromUnicode(code).ImageHTML())
        kddi.append(map_symbol(sym, "kddi", kd))
        docomo.append(map_symbol(sym, "docomo", dd))
    elif u == '\n':
        softbank.append('<br />')
        kddi.append('<br />')
        docomo.append('<br />')
    else:
        softbank.append(u)
        kddi.append(u)
        docomo.append(u)

print "Softbank:<br />", ''.join(softbank).encode('utf-8')
print "<hr />KDDI:<br />", ''.join(kddi).encode('utf-8')
print "<hr />DoCoMo:<br />", ''.join(docomo).encode('utf-8')

Results below:

Softbank:
hey there

KDDI:
hey there

DoCoMo:
hey there

[サンタ][<][イチゴ][ナス][サル][イルカ][クジラ]

As you can see, DoCoMo has the fewest emoji, so characters like “Santa”, “Strawberry”, “Eggplant”, “Monkey”, “Dolphin” and “Whale” are replaced by the fallback text.

Do note that the Python script is not optimized at all: it loops through every emoji in the database for each character it needs to convert. Also, if you need this functionality in your application, there are various libraries out there that already do the mapping well. This is just an experiment.
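
If the repeated scan ever mattered, one option would be to build a lookup table once and query it per character. Here is a rough Python 3 sketch of that idea. FakeSymbol is a hypothetical stand-in so the example runs without emoji4unicode; only the GetCarrierUnicode method name comes from the script above, and the symbol names and code strings are placeholders, not real mapping data:

```python
class FakeSymbol:
    """Hypothetical stand-in for an emoji4unicode symbol (illustration only)."""

    def __init__(self, name, carrier_codes):
        self.name = name
        self._codes = carrier_codes  # e.g. {"softbank": "E10B"}

    def GetCarrierUnicode(self, carrier):
        # Same method name as in the script above; None means "no mapping".
        return self._codes.get(carrier)

def build_index(symbols, carrier):
    """Build a {code point string: symbol} table for one carrier in one pass."""
    index = {}
    for sym in symbols:
        code = sym.GetCarrierUnicode(carrier)
        if code:
            # Keep the first match, like the linear scan does.
            index.setdefault(code, sym)
    return index

# Placeholder data: names and codes are for illustration only.
symbols = [
    FakeSymbol("first", {"softbank": "E10B"}),
    FakeSymbol("second", {"softbank": "E407"}),
    FakeSymbol("unmapped", {}),
]

index = build_index(symbols, "softbank")
print(index["E407"].name)   # dictionary lookup instead of a scan
print(index.get("FFFF"))    # unknown code points simply return None
```

With the real package, the table would presumably be built from emoji4unicode.GetSymbols() right after emoji4unicode.Load(), and find_symbol would become a single index.get(pua) call.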



