<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Wil Tan &#187; i18n</title>
	<atom:link href="http://dready.org/blog/category/i18n/feed/" rel="self" type="application/rss+xml" />
	<link>http://dready.org/blog</link>
	<description>musings on internationalized identifiers: domain names, OpenID, TLDs</description>
	<lastBuildDate>Thu, 15 Dec 2011 03:42:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>What&#8217;s with the J in Emails?</title>
		<link>http://dready.org/blog/2011/12/01/whats-with-the-j/</link>
		<comments>http://dready.org/blog/2011/12/01/whats-with-the-j/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 05:52:23 +0000</pubDate>
		<dc:creator>wil</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[emoticons]]></category>
		<category><![CDATA[font]]></category>
		<category><![CDATA[outlook]]></category>
		<category><![CDATA[windings]]></category>

		<guid isPermaLink="false">http://dready.org/blog/?p=292</guid>
		<description><![CDATA[This has bothered me ever since I saw it appearing in emails:
I&#8217;d love that J
WTF is that &#8220;J&#8221;? Does it stand for &#8220;joke&#8221;? &#8220;Jesus&#8221;?
After a while it became apparent that it&#8217;s somewhat equivalent to a smiley face, but I was still puzzled by it until I peeked under the hood today and found an email [...]


]]></description>
			<content:encoded><![CDATA[<p>This has bothered me ever since I saw it appearing in emails:</p>
<blockquote><p>I&#8217;d love that J</p></blockquote>
<p>WTF is that &#8220;J&#8221;? Does it stand for &#8220;joke&#8221;? &#8220;Jesus&#8221;?</p>
<p>After a while it became apparent that it&#8217;s somewhat equivalent to a smiley face, but I was still puzzled by it until I peeked under the hood today and found an email sent from Outlook with the following bit in the HTML part:</p>
<pre class="code">I'd love that &lt;span style="font-family:Wingdings"&gt;J&lt;/span&gt;</pre>
<p>A-ha!</p>
<p>When rendered using the Wingdings font, you do indeed get a smiley face:</p>
<pre class="code">I'd love that <span style="font-family:Wingdings">J</span></pre>
<p>And the <code>text/plain</code> part of the email actually does contain the regular <code>&#58;)</code>, so you&#8217;d only see the &#8220;J&#8221; if your device displays the HTML version but doesn&#8217;t have the Wingdings font available.</p>


]]></content:encoded>
			<wfw:commentRss>http://dready.org/blog/2011/12/01/whats-with-the-j/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Why Internationalize?</title>
		<link>http://dready.org/blog/2009/02/19/why-internationalize/</link>
		<comments>http://dready.org/blog/2009/02/19/why-internationalize/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 19:52:43 +0000</pubDate>
		<dc:creator>wil</dc:creator>
				<category><![CDATA[dns]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[icann]]></category>
		<category><![CDATA[arabic]]></category>
		<category><![CDATA[domain]]></category>
		<category><![CDATA[idn]]></category>
		<category><![CDATA[l33t]]></category>
		<category><![CDATA[sethgodin]]></category>
		<category><![CDATA[tld]]></category>

		<guid isPermaLink="false">http://dready.org/blog/?p=231</guid>
		<description><![CDATA[
Seth Godin&#8217;s book Tribes: We Need You to Lead Us looks like a good read, especially for marketers, crowd-herders, and entrepreneurs. Along with the book, he also started an invitation-only triiibal network on Ning, and got the folks to write an ebook called The Tribes Casebook (free download).
There&#8217;s a particular essay in there written by [...]


]]></description>
			<content:encoded><![CDATA[<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e1/Brueghel-tower-of-babel.jpg/400px-Brueghel-tower-of-babel.jpg" alt="Tower of Babel" title="Tower of Babel" /></p>
<p><a href="http://sethgodin.typepad.com">Seth Godin</a>&#8217;s book <em>Tribes: We Need You to Lead Us</em> looks like a good read, especially for marketers, crowd-herders, and entrepreneurs. Along with the book, he also started an invitation-only triiibal network on Ning, and got the folks to write an ebook called The Tribes Casebook (<a href="http://sethgodin.typepad.com/seths_blog/2008/10/free-tribes-ebo.html">free download</a>).</p>
<p>There&#8217;s a particular essay in there written by <a href="http://datesndip.wordpress.com/">Dr. Saleh AlShebil</a> titled <em>When Technology Fails: A Language gets Born in an Online Tribe</em>. Dr. AlShebil wrote about how an ASCII-based language (that he calls <em>Araby</em>) was born due to the lack of Arabic language input support on early instant messaging networks. It is a transliteration of Arabic into Latin letters and digits, not unlike <a href="http://en.wikipedia.org/wiki/Leet">l33t</a>, though it grew out of different motivations.</p>
<p>Here&#8217;s what it looks like (<a href="http://jcmc.indiana.edu/vol9/issue1/palfreyman.html" title="A Funky Language for Teenzz to Use: Representing Gulf Arabic in Instant Messaging">source</a>):</p>
<table cellspacing="1" bgcolor="#666">
<tbody>
<tr bgcolor="#777">
<th width="50%">Sound</th>
<th width="10%">Arabic letter</th>
<th width="10%">ASCII</th>
<th width="30%">Example</th>
</tr>
<tr>
<td>/&#295;/ (a heavy /h/-type sound)</td>
<td dir="rtl">&#1581;</td>
<td align="center">7</td>
<td>wa7ed (one)</td>
</tr>
<tr>
<td>/&#661;/ (a tightening of the throat resembling a light gargle)</td>
<td dir="rtl">&#1593;</td>
<td align="center">3</td>
<td>ba3ad (after)</td>
</tr>
<tr>
<td>/t&#8217;/ (the emphatic version of /t/)</td>
<td dir="rtl">&#1591;</td>
<td align="center">6</td>
<td>6arrash (he sent)</td>
</tr>
<tr>
<td>/s&#8217;/ (the emphatic version of /s/)</td>
<td dir="rtl">&#1589;</td>
<td align="center">9</td>
<td>a9lan (actually)</td>
</tr>
<tr>
<td>/&#660;/ (glottal stop)</td>
<td dir="rtl">&#1569;</td>
<td align="center">2</td>
<td>so2al (question)</td>
</tr>
</tbody>
</table>
<p>So, <span dir="rtl" lang="ar">واحد</span> (one) sounds roughly like &#8220;wahed&#8221;, and you&#8217;d write it as &#8220;wa7ed&#8221;.</p>
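<p>The table above is effectively a small lookup from ASCII digits to Arabic letters. As a rough sketch (the names <code>ARABY_DIGITS</code> and <code>arabic_letters_in</code> are made up for illustration, and only the five digits listed in the table are covered), it could be written in Python like this:</p>

```python
# Digit-to-letter mapping copied from the table above.
# Escapes are used so the right-to-left letters do not scramble the source.
ARABY_DIGITS = {
    "7": "\u062d",  # ARABIC LETTER HAH
    "3": "\u0639",  # ARABIC LETTER AIN
    "6": "\u0637",  # ARABIC LETTER TAH
    "9": "\u0635",  # ARABIC LETTER SAD
    "2": "\u0621",  # ARABIC LETTER HAMZA
}


def arabic_letters_in(word):
    """Return the Arabic letters that the digits in an Araby word stand for."""
    return [ARABY_DIGITS[ch] for ch in word if ch in ARABY_DIGITS]


# The example words from the table.
for example in ("wa7ed", "ba3ad", "6arrash", "a9lan", "so2al"):
    print(example, arabic_letters_in(example))
```

<p>This only picks out the digits. It does not attempt to transliterate the Latin letters around them, which would need far more than a lookup table.</p>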
<p>Quoting Dr. AlShebil (<em>emphases</em> added):</p>
<blockquote>
<p>Arabic language alphabet is comprised of 28 letters. Some of these letters do not have an equivalent “sound” in English. So what did our online tribe do? They began looking for numbers and other keystrokes that can somehow resemble what the real Arabic letter “looks” like. Let me explain…</p>
<p>For instance, the Arabic letter “ﻉ” is pronounced as A’aa when used in a word and it got replaced with the number “3” since “3” looks like an inverted “ﻉ”. So the word Arabic which is written “Araby” (in Arabic sounding English) and begins with “ﻉ” was then written as “3raby.”</p>
<p>&#8230;This new form of tribal net lingo began to spread like wildfire. It would probably be a safe assumption to say that <em>any Arab who is online today (especially the youth)</em> is pretty familiar with it. Using it was not limited to chat and instant messaging but has also swelled to include any form of writing in online communities and even in mobile text messaging (sms). The Arabic net lingo virus caught on to <em>Arabic websites that even wanted their domain names to sound or “look” Arabic.</em></p>
</blockquote>
<p>As mentioned above, this is similar to l33t-speak, and also the lesser-known <a href="http://ja.wikipedia.org/wiki/%E3%82%AE%E3%83%A3%E3%83%AB%E6%96%87%E5%AD%97">ギャル文字</a> (<a href="http://en.wikipedia.org/wiki/Gyaru-moji">Gyaru-Moji</a>).</p>
<p>Now, I dig subcultures like these, but don&#8217;t you think there&#8217;s something wrong with the emergence of a new lingo that could potentially <em>erode a language like Arabic just because technology couldn&#8217;t support it</em>?</p>
<p>Is this serious enough to erode the Arabic language? Maybe I&#8217;m exaggerating but one can imagine youths forgetting how to spell correctly in Arabic script because they&#8217;re so used to using &#8220;Araby&#8221;.</p>
<p>This is why internationalization matters for the Internet (and technology in general). More importantly, it is the prime motivation behind Internationalized Domain Names, which are in turn a primary contributor to the need for <a href="http://www.cloudregistry.net">new TLDs</a>.</p>
<p>Internationalization is not a vanity or a luxury; it&#8217;s a necessity for preserving culture.</p>


]]></content:encoded>
			<wfw:commentRss>http://dready.org/blog/2009/02/19/why-internationalize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python 3.0: Text vs. Data Instead Of Unicode vs. 8-bit</title>
		<link>http://dready.org/blog/2008/12/15/python-30-text-vs-data-instead-of-unicode-vs-8-bit/</link>
		<comments>http://dready.org/blog/2008/12/15/python-30-text-vs-data-instead-of-unicode-vs-8-bit/#comments</comments>
		<pubDate>Sun, 14 Dec 2008 17:26:58 +0000</pubDate>
		<dc:creator>wil</dc:creator>
				<category><![CDATA[i18n]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://dready.org/blog/?p=188</guid>
		<description><![CDATA[Python 3.0 (Py3K) is out. I&#8217;m with Sam Ruby &#8212; this seemingly simple change of paradigm from &#8220;Unicode vs. 8-bit&#8221; to &#8220;Text vs. Data&#8221; is a breath of fresh air.
What&#8217;s inconsistent in this new version though is that the new bytes type still contains many of the methods with text semantics that should only make [...]


]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.python.org/download/releases/3.0/">Python 3.0</a> (Py3K) is out. I&#8217;m with <a href="http://intertwingly.net/blog/2008/12/04/Python-3-0-Released">Sam Ruby</a> &#8212; this seemingly simple change of paradigm<a href="http://docs.python.org/dev/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit"> from &#8220;Unicode vs. 8-bit&#8221; to &#8220;Text vs. Data&#8221;</a> is a breath of fresh air.</p>
<p>What&#8217;s inconsistent in this new version though is that the new <code>bytes</code> type still contains many of the methods with text semantics that should only make sense as <code>str</code> methods: e.g. <code>capitalize()</code> and <code>islower()</code>. I suspect these are provided as convenience methods, which is fine. But one would imagine that these byte methods work by decoding the bytes using the default encoding of your locale, then performing the operations on the resulting string. As it turns out from my trials, they appear to treat only ASCII bytes as letters and leave every other byte alone:</p>
<p><code><br />
&gt;&gt;&gt; greek_beta = "Β" # This is the uppercase Greek letter "beta", not a regular B.<br />
&gt;&gt;&gt; greek_beta.isupper()<br />
True<br />
&gt;&gt;&gt; greek_beta.islower()<br />
False<br />
&gt;&gt;&gt; greek_beta.lower()<br />
'β'<br />
&gt;&gt;&gt; greek_beta_bytes = greek_beta.encode('iso8859-7')<br />
&gt;&gt;&gt; greek_beta_bytes<br />
b'\xc2'<br />
&gt;&gt;&gt; greek_beta_bytes.isupper()<br />
False<br />
&gt;&gt;&gt; greek_beta_bytes.islower()<br />
False<br />
&gt;&gt;&gt; greek_beta_bytes.lower()<br />
b'\xc2'<br />
&gt;&gt;&gt; greek_beta_bytes.upper()<br />
b'\xc2'<br />
</code></p>
<p>This is definitely a gotcha that may lead to hard-to-find bugs, so it is best to avoid using those methods on <code>bytes</code> objects.</p>
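<p>If you do need case operations on encoded text, one option is to decode with the codec you know the bytes are in, work on the resulting <code>str</code>, and encode again. A minimal sketch, assuming the data really is ISO 8859-7 (<code>lower_encoded</code> is a hypothetical helper, not something Python provides):</p>

```python
def lower_encoded(data, encoding):
    """Lowercase encoded text by round-tripping through str."""
    return data.decode(encoding).lower().encode(encoding)


greek_beta_bytes = "\u0392".encode("iso8859-7")  # uppercase Greek beta

# bytes.lower() leaves the non-ASCII byte untouched.
print(greek_beta_bytes.lower())

# Decoding first lets str.lower() apply the real Unicode case mapping.
print(lower_encoded(greek_beta_bytes, "iso8859-7"))
```

<p>The first line prints <code>b'\xc2'</code> and the second <code>b'\xe2'</code>, which is the lowercase beta in ISO 8859-7.</p>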
<p>Otherwise, this change is definitely more &#8220;correct&#8221; in that Py3K forces you to know the type of your variables earlier, or at the interfaces to the outside world, so errors like these are less likely to sneak up on you. For example, you can no longer use <code>+</code> (the sequence concatenation operator) to mix text and data, whereas in Python 2.x you can do:</p>
<p><code><br />
&gt;&gt;&gt; name = 'Wil'<br />
&gt;&gt;&gt; greet = lambda n: u'Hello ' + n<br />
&gt;&gt;&gt; greet(name)<br />
u'Hello Wil'<br />
&gt;&gt;&gt; name = u'François'.encode('utf-8')<br />
&gt;&gt;&gt; name<br />
'Fran\xc3\xa7ois'<br />
&gt;&gt;&gt; greet(name)<br />
Traceback (most recent call last):<br />
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;<br />
  File "&lt;stdin&gt;", line 1, in &lt;lambda&gt;<br />
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)<br />
</code></p>
<p><small>What went wrong is that you&#8217;re relying on Python&#8217;s automatic Unicode conversion, which uses the standard ASCII codec (which is a good thing) to promote your <code>str</code> to <code>unicode</code>. The code therefore works fine until the first non-ASCII byte shows up.</small></p>
<p>In Python 3.x, you will be greeted by the <code><a href="http://docs.python.org/dev/3.0/library/exceptions.html#exceptions.TypeError">TypeError</a>: Can't convert 'bytes' object to str implicitly</code> message if you try to pass a <code>bytes</code> object to the function. This happens for any <code>bytes</code> object, not just one containing non-ASCII bytes, so the error is easier to catch.</p>
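<p>Here is a rough Python 3 sketch of the same <code>greet</code> example, with the bytes decoded once at the boundary. It assumes the incoming name is UTF-8, and the exact wording of the <code>TypeError</code> message varies between 3.x releases:</p>

```python
def greet(name):
    return "Hello " + name


# Bytes, as they might arrive from a socket or a file.
raw_name = "Fran\u00e7ois".encode("utf-8")

try:
    greet(raw_name)
except TypeError as exc:
    # Raised for any bytes argument, ASCII-only or not.
    print("refused to mix text and data:", exc)

# Decode once at the boundary; everything past this point handles str only.
print(greet(raw_name.decode("utf-8")))
```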
<p>In this new version, the <a href="http://docs.python.org/dev/3.0/library/unicodedata.html"><code>unicodedata</code></a> module is upgraded to Unicode version 5.1.0.</p>
<p>Now, I&#8217;m not ready to run production code on Py3K yet, but it would be nice if Django (my favourite Python-based web framework) could run on it. It looks like <a href="http://loewis.de/martin/">Martin von Löwis</a> has started the <a href="http://wiki.python.org/moin/PortingDjangoTo3k" title="Porting Django to Python 3.0 (Py3K)">porting</a>.</p>


]]></content:encoded>
			<wfw:commentRss>http://dready.org/blog/2008/12/15/python-30-text-vs-data-instead-of-unicode-vs-8-bit/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

