Emoji Encoding Conversion between Carriers
I remember reading that Apple added emoji support in iPhone OS 2.2. Now that I’ve upgraded, I decided to try it out, but I could never find it after hunting through the keyboard preferences. Googling showed that these cute little emoticons are only available to Softbank users. Thankfully, Steven Troughton-Smith figured out that by editing a file in your iPhone backup, the “emoji” option suddenly shows up under Settings -> General -> Keyboard -> International Keyboards -> Japanese!
Now that I can enter these emoji on my iPhone, I figured I’d try it out by sending an email to myself. Alas, all I got was a row of boxes. Time to look at the message content (relevant fields):
Content-Type: text/plain; charset=cp932; format=flowed
Content-Transfer-Encoding: base64
X-Mailer: iPhone Mail (5G77)
Mime-Version: 1.0 (iPhone Mail 5G77)
Subject: trying out emoji
Date: Sat, 31 Jan 2009 19:11:03 +0800

aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==
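If you would rather let the headers drive the decoding than do it by hand, something like the following should work. This is only a minimal sketch using the standard email module, written for Python 3; the header subset is copied from the message above and the variable names are my own.

```python
from email import message_from_string

# Headers and body copied from the message above (subset of the fields).
RAW_MESSAGE = (
    "Content-Type: text/plain; charset=cp932; format=flowed\n"
    "Content-Transfer-Encoding: base64\n"
    "Mime-Version: 1.0 (iPhone Mail 5G77)\n"
    "Subject: trying out emoji\n"
    "\n"
    "aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==\n"
)

msg = message_from_string(RAW_MESSAGE)
charset = msg.get_content_charset()      # charset parameter of Content-Type
raw_body = msg.get_payload(decode=True)  # undoes the base64 transfer encoding
text = raw_body.decode(charset)          # bytes -> str using the declared charset

print(charset)
print(repr(text))
```

The point of going through the headers is that the charset is not hard-coded: whatever the mail client declares is what gets used to decode the body.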
OK, it’s pretty strange that the message is sent in cp932 encoding, but let’s see:
>>> from base64 import b64decode
>>> s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
>>> s
'hey there \xf1\x90\r\n\r\n\xf5\x9c\xf5\xdd\xf0H\xf0{\xf2\xe9\xf3G\xf4\x98\xf4\x9b\xf7D\xf0\x96\xf6\xf9\xf0\x95'
>>> s.decode('cp932')
u'hey there \ue10b\r\n\r\n\ue407\ue448\ue008\ue03b\ue220\ue23b\ue347\ue34a\ue528\ue055\ue520\ue054'
>>>
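Why does cp932 produce Private Use code points at all? As far as I can tell, the bytes fall in the user-defined area of cp932 (lead bytes 0xF0 to 0xF9), which is conventionally mapped linearly onto U+E000 upwards, 188 characters per lead byte. The sketch below redoes that arithmetic by hand and compares it with the codec. It is written for Python 3, and the helper name cp932_user_area_to_pua is my own, not part of any library.

```python
from base64 import b64decode


def cp932_user_area_to_pua(lead, trail):
    """Map one two-byte cp932 user-defined character to a Private Use code point.

    Assumes the usual layout: lead bytes 0xF0-0xF9, trail bytes 0x40-0x7E and
    0x80-0xFC, 188 characters per lead byte, starting at U+E000.
    """
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined area")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41  # 0x7F is skipped, so this run starts at offset 63
    else:
        raise ValueError("invalid trail byte")
    return 0xE000 + 188 * (lead - 0xF0) + offset


raw = b64decode("aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==")
text = raw.decode("cp932")

# First emoji in the message: bytes F1 90.
first = cp932_user_area_to_pua(0xF1, 0x90)
print("%04X" % first)

# Code points of every Private Use character the codec produced.
pua = ["%04X" % ord(ch) for ch in text if 0xE000 <= ord(ch) <= 0xF8FF]
print(pua)
```

For the bytes F1 90 this gives 0xE000 + 188 * 1 + 0x4F = 0xE10B, which matches the first \ue10b in the session above.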
The Unicode code points do look like they correspond to the Softbank private use ones, so I’m going to use the emoji4unicode package to convert the message to HTML. The following Python script converts it to each carrier’s representation:
# Python 2 syntax (print statements), matching the session above.
from base64 import b64decode

import emoji4unicode
import carrier_data

s = b64decode('aGV5IHRoZXJlIPGQDQoNCvWc9d3wSPB78unzR/SY9Jv3RPCW9vnwlQ==')
uni = s.decode('cp932')

# Per-carrier symbol tables, used to render each symbol as an HTML image.
sd = carrier_data.GetSoftbankData()
dd = carrier_data.GetDocomoData()
kd = carrier_data.GetKddiData()
emoji4unicode.Load()

def find_symbol(pua, carrier):
    # Linear scan for the symbol whose code point for this carrier equals pua
    # (a hex string such as "E10B"). Returns None if nothing matches.
    for sym in emoji4unicode.GetSymbols():
        code = sym.GetCarrierUnicode(carrier)
        if code and code == pua:
            return sym

def map_symbol(sym, carrier, cdata):
    code = sym.GetCarrierUnicode(carrier)
    if not code:  # no mapping for this carrier
        return sym.GetTextFallback()
    if code.startswith(">"):  # mapped
        if len(code) > 5:
            raise Exception("cannot handle this yet")
        code = code[1:]
    return cdata.SymbolFromUnicode(code).ImageHTML()

softbank = []
docomo = []
kddi = []
for u in uni:
    code = "%04X" % ord(u)
    if u > u'\x7F':
        # Non-ASCII: treat it as a Softbank emoji and map it to each carrier.
        sym = find_symbol(code, "softbank")
        softbank.append(sd.SymbolFromUnicode(code).ImageHTML())
        kddi.append(map_symbol(sym, "kddi", kd))
        docomo.append(map_symbol(sym, "docomo", dd))
    elif u == '\n':
        softbank.append('<br />')
        kddi.append('<br />')
        docomo.append('<br />')
    else:
        softbank.append(u)
        kddi.append(u)
        docomo.append(u)

print "Softbank:<br />", ''.join(softbank).encode('utf-8')
print "<hr />KDDI:<br />", ''.join(kddi).encode('utf-8')
print "<hr />DoCoMo:<br />", ''.join(docomo).encode('utf-8')
Results below:
Softbank:
hey there (emoji images not preserved)

KDDI:
hey there (emoji images not preserved)

DoCoMo:
hey there
[サンタ](image)
(image)
[<][イチゴ][ナス][サル]
[イルカ][クジラ]
As you can see, DoCoMo has the fewest emoji, so many of the characters, like “Santa”, “Strawberry”, “Eggplant”, “Monkey”, “Dolphin” and “Whale”, are replaced by the text fallback.
Do note that the Python script is not optimized at all: it loops through every emoji in the database for each character it needs to convert. Also, if you need such functionality in your application, there are various libraries out there that already do the mapping well. This is just an experiment.
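One cheap way around the repeated scan would be to build a dictionary keyed by code point once and look symbols up in it. The sketch below shows the idea with a hypothetical FakeSymbol class standing in for the real emoji4unicode symbol objects; only the GetCarrierUnicode call mirrors the script, and the sample entries are illustrative rather than taken from the emoji4unicode data.

```python
class FakeSymbol(object):
    """Hypothetical stand-in for an emoji4unicode symbol (illustration only)."""

    def __init__(self, name, carrier_codes):
        self.name = name
        self._carrier_codes = carrier_codes

    def GetCarrierUnicode(self, carrier):
        # Same call as in the script: a hex string, or None if unmapped.
        return self._carrier_codes.get(carrier)


def build_index(symbols, carrier):
    """Index symbols by their code point string for one carrier, in one pass."""
    index = {}
    for sym in symbols:
        code = sym.GetCarrierUnicode(carrier)
        if code:
            index.setdefault(code, sym)  # keep the first match, like find_symbol
    return index


# Illustrative sample data, not real emoji4unicode entries.
symbols = [
    FakeSymbol("santa", {"softbank": "E448"}),
    FakeSymbol("whale", {"softbank": "E054"}),
    FakeSymbol("unmapped", {}),
]
softbank_index = build_index(symbols, "softbank")

print(softbank_index["E448"].name)
print(softbank_index.get("E999"))
```

With the real data, the index would be built once from emoji4unicode.GetSymbols() after Load(), and find_symbol would shrink to a single dictionary lookup per character.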

May 12th, 2009 at 4:22 am
Cool. I just learned about this emoji encoding. Looks like it takes a "bit" of hard work to make these emoji come out.