git conflict resolution

April 28th, 2010

Logging this mostly for myself.

When a conflict occurs during a pull from a remote repository, we get this:

$ git pull kumo1 develop
From ssh://kumo0/home/wil/...
 * branch            develop    -> FETCH_HEAD
Auto-merged src/.../
CONFLICT (content): Merge conflict in src/.../
Automatic merge failed; fix conflicts and then commit the result.

Git fetched objects from the remote repository and then tried to merge the branch you specified into your current branch. Sometimes the merge fails due to a conflict, and the conflicting edits are left in the file. It is then up to you to eyeball the file, straighten it out, and then “commit the result” (as the message says).

However, if you tried to commit that file (after fixing the conflict), you’d get this:

$ git commit -m "my fixes" src/.../
fatal: cannot do a partial commit during a merge.

What you want is to add the -i argument to the git commit command, which tells it to stage the named file, in addition to whatever is already staged, before committing.

Optimizing Autocomplete by Utilizing Browser Cache

March 29th, 2010

Say you have a snazzy AJAXified autocomplete field that gives instantaneous feedback to the user as she types — perhaps a username field on a signup form or something akin to Google Suggest. Except, it’s not performing as well as you thought it should. That round trip to the server for each character is taking too long.

The first thing you should do is to see if HTTP Keep-Alive is supported by your server.

Second, and this may seem obvious, but I’ve seen too many developers forget to leave a hint to the browser to cache the results. As a result, the page becomes sluggish due to a feature that’s meant to be responsive.

See what happens behind the scenes when you sign up for a new Twitter account. Suppose you try to register the username “wil”, but it’s taken. For each character you type, the browser makes an HTTP request to check whether the username in the input field is available.

So that’s one request each for “w”, “wi”, and “wil”. Then you find that all 3 are taken. Ok, perhaps time to add a numeric suffix? “wil1” – nope, delete the “1”, and we’re back to “wil” again. Guess what? Another HTTP request is sent to Twitter for the same string “wil”!

Had Twitter set an “Expires” header on the responses for usernames that are taken, the browser wouldn’t have had to make that round trip!
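To make the saving concrete, here is a rough sketch of the typing sequence above, with a toy cache standing in for the browser. The names ExpiringCache and check_username are made up for illustration, and the 600-second lifetime is an arbitrary assumption; a real browser does this for you once the response carries an expiry time.

```python
import time


class ExpiringCache:
    """Toy stand-in for a browser cache that honours an expiry time."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # still fresh: no round trip needed
        value = fetch(key)  # missing or expired: ask the server
        self.entries[key] = (self.clock() + self.ttl, value)
        return value


requests_sent = []


def check_username(name):
    """Hypothetical server call; records every round trip it receives."""
    requests_sent.append(name)
    return "taken"


cache = ExpiringCache(ttl_seconds=600)
for typed in ["w", "wi", "wil", "wil1", "wil"]:
    cache.get(typed, check_username)

print(requests_sent)  # ['w', 'wi', 'wil', 'wil1']
```

Five lookups, four round trips: the second “wil” is answered from the cache.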

Below are the headers sent by Twitter for the URI (courtesy of Hurl):

In the case of signup forms, usernames are rather stable pieces of data, so they’re a prime optimization target. As your namespace becomes more scarce, people will tend to try stranger combinations, increasing the number of requests to your servers.

In my case, I’ve applied it to our TLD management platform. Domain names that are registered get a 10-minute cache timeout (which is heaps short, but good enough to ensure snappy UI operations). Domain names, however, are a lot more volatile, but since we’re not guaranteeing success at the point of registering a name, anywhere between 1 and 5 minutes is usually sufficient.

A simple view snippet in Django does the trick and goes a long way to making your users happy.
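The original snippet isn’t reproduced here, so what follows is only a sketch of the idea and not the code we actually run. It uses nothing but the standard library to build the two headers involved; cache_headers is a made-up helper name, and the 600 seconds matches the 10-minute figure above. In a Django view you would copy these onto the response object (response[name] = value) before returning it.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Build headers telling the browser it may reuse a response for a while."""
    if now is None:
        now = time.time()
    return {
        # Relative lifetime, understood by HTTP/1.1 caches.
        "Cache-Control": "max-age=%d" % max_age_seconds,
        # Absolute expiry time in the HTTP date format.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Fixed "now" (the Unix epoch) so the result is reproducible.
headers = cache_headers(600, now=0)
print(headers["Expires"])  # Thu, 01 Jan 1970 00:10:00 GMT
```

Sending both headers is a belt-and-braces choice: Cache-Control takes precedence where it is understood, and Expires covers older clients.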

/ Blogging from a HSDPA connection

Debugging without source on Mac

March 11th, 2010

This is probably newbie stuff for hardcore C programmers, but I’m logging it here for posterity and for my own benefit. I don’t pretend to be one, but I recently found myself needing to find the cause of a curious message while running Python on Mac OS X 10.5.8:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().

Okay, this means that my program forked, and the child did not immediately exec(). Instead, it went on to call some Core Foundation function. Naturally, I’d want to find out what it is doing! Time to fire up our trusty gdb.

macmac:~ $ sudo gdb -p 52772
GNU gdb 6.3.50-20050815 (Apple version gdb-960) (Sun May 18 18:38:33 UTC 2008)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin".
Attaching to process 52772.
Reading symbols for shared libraries . done
Reading symbols for shared libraries .................................... done
0x9354a6fa in select$DARWIN_EXTSN ()
Breakpoint 1 at 0x938df314
(gdb) cont
(gdb) bt
#1  0x938d97bd in CFRunLoopGetCurrent ()
#2  0x95bf09fa in +[NSThread currentThread] ()
#3  0x95befe39 in _NSInitializePlatform ()
#4  0x918568b8 in _class_initialize ()
#5  0x9185678c in _class_initialize ()
#6  0x91855239 in _class_lookupMethodAndLoadCache ()
#7  0x918656d6 in objc_msgSend ()
#8  0x9185dbdf in call_load_methods ()
#9  0x918570d3 in load_images ()
#10 0x8fe02e38 in __dyld__ZN4dyld12notifySingleE17dyld_image_statesPK11mach_headerPKcl ()
#11 0x8fe0e7cf in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#12 0x8fe0e775 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#13 0x8fe0e775 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#14 0x8fe0e775 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#15 0x8fe0e775 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#16 0x8fe0e775 in __dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#17 0x8fe0e8c9 in __dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextE ()
#18 0x8fe02202 in __dyld__ZN4dyld15runInitializersEP11ImageLoader ()
#19 0x8fe0bbdd in __dyld_dlopen ()
#20 0x935042c2 in dlopen ()
#21 0x001b09ac in _PyImport_GetDynLoadFunc ()
#22 0x001a35a4 in _PyImport_LoadDynamicModule ()
.. (snipped) ..

This tells us that Python was trying to call the dlopen() function to load a dynamic library. I know that because the stack trace went from a bunch of CPython functions to dlopen() (at frame #20) and eventually led up to the function that we were told to break on. Now it’s time to find out which library it is trying to load.

(gdb) frame 20
#20 0x935042c2 in dlopen ()
(gdb) info frame
Stack level 20, frame at 0xb007d6f0:
 eip = 0x935042c2 in dlopen; saved eip 0x1b09ac
 called by frame at 0xb007d9b0, caller of frame at 0xb007d6d0
 Arglist at 0xb007d6e8, args: 
 Locals at 0xb007d6e8, Previous frame's sp is 0xb007d6f0
 Saved registers:
  ebx at 0xb007d6e4, ebp at 0xb007d6e8, eip at 0xb007d6ec

If you had the source code for dlopen(), gdb would’ve helpfully printed the arguments for you, but alas we don’t have the source code!

Thankfully, we know what the function looks like from the dlopen man page:

     void *dlopen(const char* path, int mode);

So we know that the library we’re looking for is in the first argument, which is a pointer to a C string. And we will find it.

Some googling later, I found the layout for Darwin stack frames:

Mac OS X stack frame layout

Sweet! According to this, the first function argument is in EBP+8:

(gdb) info registers ebp
ebp            0xb007d6e8	0xb007d6e8

Aha! But you didn’t really need to do that; the info frame command output already told us that locals are at 0xb007d6e8.

0xb007d6e8 + 8 = 0xb007d6f0, so we shall get the address stored at this address:

(gdb) x/a 0xb007d6f0
0xb007d6f0:	0xb007da67

Now print the string at the target address:

(gdb) x/s 0xb007da67
0xb007da67:	 "/Users/wil/src/proj/root/lib/python2.5/lib-dynload/"
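For the record, the address arithmetic can be written out as below. This is a sketch that assumes the usual 32-bit x86 frame, where the saved EBP sits at [ebp] and the return address at [ebp + 4] (which agrees with the “eip at 0xb007d6ec” line in the info frame output). The arg_address helper is made up for illustration.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def arg_address(ebp, index):
    """Address of the index-th (0-based) stack argument of a frame."""
    # [ebp] holds the saved EBP and [ebp + 4] the return address,
    # so the arguments start at ebp + 8.
    return ebp + 2 * WORD + index * WORD


ebp = 0xb007d6e8
print(hex(arg_address(ebp, 0)))  # 0xb007d6f0, the path pointer
print(hex(arg_address(ebp, 1)))  # 0xb007d6f4, the mode
```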

Et voila! This happens to be the glue code for Python to access the Mac OS X Internet Config settings. In some previous foray, I learned (to my surprise) that Python’s urllib (on Darwin) actually uses IC to find out your proxy settings and uses them! For some reason, I never expected it to do so, but I guess it makes sense for a seamless user experience.

Did I find out why it’s giving me that fork/exec message? No, not really but it doesn’t matter much.
(Update: I do know the reason for this; it’s because some code started fetching over HTTP after the fork, but that’s the expected behavior, and I don’t wish to re-exec. The program continues to work, so I just have to ignore it in my development environment. This will be deployed to Linux or FreeBSD, which is why I said it doesn’t matter much.)

Now, I’m convinced that there is a better way to do it so if you have any idea, please leave a comment.

p.s. This can probably be easily achieved with DTrace, but it makes my head spin.

logcheck update on FreeBSD

January 23rd, 2010

Logging this quickly for posterity.

If like me, you just updated the logcheck port on your FreeBSD to version 1.2.69_1 and found that it’s broken, you might have run into the same permission problem I did.

Apparently, the port installed some files with too restrictive permissions and the error message from logcheck does not help. The files in question are some dynamically interpreted Perl plug-ins to detect rotated files in various schemes.

Just change the permissions as follows and it should all work again:

# chmod 644 /usr/local/share/logcheck/detectrotate/*.dtr

Tornado with VirtualEnv and Pip Quickstart

October 9th, 2009

FriendFeed’s open source Tornado web server is great, and is incredibly easy to get up and running. Just install tornado, write your app and run it.

At some point, however, you’d want more structure in your project and a way to manage dependencies to ease deployment. This is where virtualenv and pip shine. For a few more steps, you can bootstrap your project and have the warm fuzzy feeling that you can easily deploy the stuff when the code is ready.

Installing virtualenv and pip

If you haven’t set up virtualenv, do so (as root):

# easy_install virtualenv

Decide where you’d put your project directory. I’ll use /path/to/myapp for now. The next step is to create a virtualenv where all your Python packages are stored. I like to use the convention of a directory called root where all dependencies are installed. I’d generally also use it as the prefix for any cmmi packages that I’d like to contain within the project.

$ cd /path/to/myapp
$ virtualenv --no-site-packages root

Activate the environment that we just created:

$ . root/bin/activate
(root)[wil@wasabi /path/to/myapp]$ 

From now on, all packages installed with easy_install will be placed in this virtualenv.

Next, we will install pip into this virtualenv:

(root)[wil@wasabi /path/to/myapp]$ easy_install pip

Once pip is installed, as long as you’ve got your virtualenv activated, anything installed with pip will also go into the right place (without your having to remember to use the -E command line argument.)

Installing Tornado

Tornado (as of the current version) has two mandatory dependencies: PycURL and simplejson. Make sure you have the right libcurl version installed on your system (using apt-get or another mechanism) and pick the compatible PycURL version.

(root)[wil@wasabi /path/to/myapp]$ pip install pycurl==7.16.4
(root)[wil@wasabi /path/to/myapp]$ pip install simplejson

Now we’ll install tornado proper. I chose to go with the bleeding edge and ask pip to install from the git trunk.

(root)[wil@wasabi /path/to/myapp]$ pip install -e \

Should you not want that, you can tell pip to install from the tarball URL instead (at least until tornado gets added to PyPI.)

(root)[wil@wasabi /path/to/myapp]$ pip install \

Tornado is installed!

Every now and then, it’s a good idea to save your pip dependencies by running

(root)[wil@wasabi /path/to/myapp]$ pip freeze > pip-req.txt

Start your project

What I like about this is that the project directory has all the dependencies contained within a single directory (root). This is really just my convention; I’d create a src directory where my application code lives.

(root)[wil@wasabi /path/to/myapp]$ mkdir src
(root)[wil@wasabi /path/to/myapp]$ cd src
(root)[wil@wasabi /path/to/myapp/src]$ 

Let’s test drive Tornado:

(root)[wil@wasabi /path/to/myapp/src]$ cp ../root/src/tornado/demos/helloworld/ .
(root)[wil@wasabi /path/to/myapp/src]$ python

From a browser, visit your host at port 8888 to verify.

That’s it!

Forums are so 1999

May 22nd, 2009

I cringe at forums[1] or message boards every time I visit one.

They are usually cluttered with distracting animated GIFs, elaborate signatures and tons of useless stats about the posters (novice/expert level, IM status, joining date, etc.).

They are notoriously cumbersome to navigate, let alone find what you need. The forum administrators and moderators know it, so most forums use stickies as a band-aid. For the casual visitor, stickies are usually road signs that tell you where to look for certain things, a summary of important points or FAQs. Needless to say, they’re pretty effective when you compare them with the rest of the mess, but a band-aid nonetheless.

I’d blame the user interface for this awkward communication medium. It’s not so much due to its age, because I’d rather read stuff on old skool NNTP with vi key bindings than navigate these forums. Rather, it’s the constant need for paging (displaying page 1 of 32!!), which means that you can’t effectively use the browser’s in-page find feature. Search is usually broken or otherwise less than relevant. There’s no threading, nor any ability to easily filter messages. I could go on.

Perhaps it’s just a matter of tweaking the skin and applying sane user interface design. Certainly, there are some better-designed forums out there that are less painful to use. I haven’t seen much innovation in that area in a long time, actually not since my first encounter with them. I suspect people actually like and have come to expect those cumbersome features.

Forum communities are an entire subculture of their own, and I don’t expect a shift anytime soon. Yet, I can’t help but wonder if we can do better.

1 In case you’re wondering, I intentionally used the word “forums” instead of “fora”.

links for 2009-02-19

February 20th, 2009

Why Internationalize?

February 19th, 2009

Tower of Babel

Seth Godin’s book Tribes: We Need You to Lead Us looks like a good read, especially for marketers, crowd-herders, and entrepreneurs. Along with the book, he also started an invitation-only triiibal network on Ning, and got the folks to write an ebook called The Tribes Casebook (free download).

There’s a particular essay in there written by Dr. Saleh AlShebil titled When Technology Fails: A Language gets Born in an Online Tribe. Dr. AlShebil wrote about how an ASCII-based language (that he calls Araby) was born due to the lack of Arabic language input support on early instant messaging networks. These are transliterations of Arabic into the Latin alphabet, not unlike l33t, but they grew out of a different motivation.

Here’s what it looks like (source):

Sound | Arabic letter | ASCII | Example
/ħ/ (a heavy /h/-type sound) | ح | 7 | wa7ed (one)
/ʕ/ (a tightening of the throat resembling a light gargle) | ع | 3 | ba3ad (after)
/t’/ (the emphatic version of /t/) | ط | 6 | 6arrash (he sent)
/s’/ (the emphatic version of /s/) | ص | 9 | a9lan (actually)
/ʔ/ (glottal stop) | ء | 2 | so2al (question)

So, واحد (one) sounds roughly like “wahed”, and you’d write it as “wa7ed”.
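If you think of the table as a lookup, it amounts to something like the sketch below. It covers only the five digits listed above; ARABY_DIGITS and araby_letters are names made up for this illustration, and real-world usage surely has more substitutions than these.

```python
# Digit-to-letter substitutions taken from the table above.
ARABY_DIGITS = {"7": "ح", "3": "ع", "6": "ط", "9": "ص", "2": "ء"}


def araby_letters(word):
    """Return (digit, Arabic letter) pairs for each Araby digit in word."""
    return [(ch, ARABY_DIGITS[ch]) for ch in word if ch in ARABY_DIGITS]


for word in ["wa7ed", "ba3ad", "so2al"]:
    print(word, araby_letters(word))
```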

Quoting Dr. AlShebil (emphases added):

Arabic language alphabet is comprised of 28 letters. Some of these letters do not have an equivalent “sound” in English. So what did our online tribe do? They began looking for numbers and other keystrokes that can somehow resemble what the real Arabic letter “looks” like. Let me explain…

For instance, the Arabic letter “ﻉ” is pronounced as A’aa when used in a word and it got replaced with the number “3” since “3” looks like an inverted “ﻉ”. So the word Arabic which is written “Araby” (in Arabic sounding English) and begins with “ﻉ” was then written as “3raby.”

…This new form of tribal net lingo began to spread like wildfire. It would probably be a safe assumption to say that any Arab who is online today (especially the youth) is pretty familiar with it. Using it was not limited to chat and instant messaging but has also swelled to include any form of writing in online communities and even in mobile text messaging (sms). The Arabic net lingo virus caught on to Arabic websites that even wanted their domain names to sound or “look” Arabic.

As mentioned above, this is similar to l33t-speak, and also the lesser-known ギャル文字 (Gyaru-Moji).

Now, I dig subcultures like these, but don’t you think there’s something wrong with the emergence of a new lingo that could potentially erode a language like Arabic just because technology couldn’t support it?

Is this serious enough to erode the Arabic language? Maybe I’m exaggerating but one can imagine youths forgetting how to spell correctly in Arabic script because they’re so used to using “Araby”.

This is why internationalization is important for the Internet (and technology in general). More importantly, it is the prime motivation behind Internationalized Domain Names, which are in turn a primary contributor to the need for new TLDs.

Internationalization is not for vanity or luxury; it’s a necessity to preserve culture.

links for 2009-02-18

February 19th, 2009

Why be an OpenID Relying Party?

February 12th, 2009

Plaxo’s Joseph Smarr presented the following at the OpenID Design Summit at Facebook HQ yesterday:

This was a controlled experiment combining 3 technologies (2 of which are from the Open Stack, but hybridized) under the hood to create a streamlined signup experience that goes like this:

  1. Someone at Plaxo invites you to join by entering your Gmail address
  2. You get an invitation email from Plaxo
  3. You click on the link
  4. Plaxo knows that you’re a Gmail user (and likely still signed in), so it presents you with the following screen:

    I believe that since Plaxo already has your Gmail address, it is somehow already encoded in here to save you from having to type it in, but I haven’t tried it, so I’m not sure.
  5. Clicking “Sign up with my Google Account” brings you over to Google with the following screen:
  6. Clicking “Continue Sign-in” tells Plaxo that you are indeed the holder of the Gmail address, at the same time authorizing Plaxo to import your address book from Google.
  7. That’s it! You’re signed up to Plaxo and your Gmail address book is available in Plaxo.

The result was a staggering 92% return rate (from the Google authorization confirmation screen above), of which 92% continued with the sign up and allowed Plaxo to import their contacts from their Google address book. The results were so impressive that Plaxo’s business folks stopped the tech folks from turning off the experiment!

Indeed these results are impressive by today’s standard of endless signup forms and social networking fatigue. I would whole-heartedly agree that through this clever experiment, Plaxo has met their goals of making it better for the user, the identity provider, as well as the relying site.

The technologies that made this possible were:

  • OpenID for proving who you are (proving to Plaxo that you do indeed own the Gmail address);
  • OAuth (implemented as an extension to OpenID) for granting Plaxo access to your contacts stored on Google; and
  • Google Contacts API for actually importing them into Plaxo (it would be nice to see Portable Contacts being adopted by Google).

Individually, those technologies are good at what they’re designed to do, but when combined with a simple hint such as “the user is a Gmail account holder, and is probably still signed in to the service”, they can be very powerful.

Still, my biggest takeaways from the slides are:

  • 17% (of Plaxo signups) come from Gmail account holders;
  • 73% come from the top 4 (Yahoo, Microsoft, Google, and AOL); and
  • all of them are OpenID Providers.

This shows that you can already take advantage of the fact that a large percentage of users already own an OpenID, and may be more willing to sign up for your service than they would be if faced with another tedious registration form.

While many (including myself) have criticized OpenID for having more providers than relying parties, Plaxo has proven (with impressive numbers) that with a little ingenuity and optimization of UX, sites can reap the benefits of being an RP!