Emojis: Lost in translation


"Yet another WhatsApp hoax!"

I sighed when the image above landed in my WhatsApp inbox yesterday. At first I brushed it aside, but after receiving it for the fifth time, curiosity got the better of me. I learned that it has been doing the rounds on social media for quite a while, and I grew very skeptical of the snail's "exclusivity". Naturally, I went digging and discovered that all but 10 of the emojis on my computer (a Mac; Windows users, YMMV) face leftwards. Those "rightful" ten are:

๐ŸŒ โœˆ๏ธ ๐Ÿš€ ๐Ÿ”‰ ๐Ÿ’ƒ๐Ÿป ๐Ÿ‚ ๐Ÿ’บ ๐Ÿ’ค ๐Ÿ”œ ๐Ÿ”™

Coming from an Indo-European linguistic background 1, this struck me as an odd design decision on Unicode's part. Most Indo-European languages are written left-to-right; Urdu and Persian are among the few exceptions. These cultures also conceptualise time as moving from left to right. 2 We gesticulate in that order, and even our mathematics establishes that left to right means moving forward. Given this context, the left-facing nature of emojis presents quite the communication problem. Say I need to text my girlfriend that I'm "Leaving work for home". I might type something like this:

๐Ÿข ๐Ÿš— ๐Ÿ 

which looks like I am asking her to pick me up from work. Let's try changing the order:

🏠 🚗 🏢

Ah, now it makes perfect sense, but only if you read it from right to left. Left-to-right writing and emojis don't gel well together. How could the smart folks at Unicode commit such a rookie semantic error? It just doesn't compute. To fully understand the design decisions behind emojis, I had to go all the way back to where it all started: Japan.

The birth of 〓〓〓〓〓〓

'Tis the year 1999. As the new millennium dawns, the Japanese telecom giant NTT DoCoMo prepares to launch its mobile internet platform, iMode. To set itself apart from the competition, the company commissions a group of designers to create special pictographs to jazz up iMode's messaging features. And the emoji is born: 絵 (e ≅ picture) 文 (mo ≅ writing) 字 (ji ≅ character).

Shortly afterwards, the competitors - KDDI and SoftBank (then Vodafone) - follow suit by developing different (but partially overlapping) character sets. The subsequent years see a meteoric rise in the use of emojis in Japan. But since each mobile carrier has different rules for representing these emojis in the underlying software ("encoding" in software engineering parlance), emojis sent from a SoftBank phone can't be read on a DoCoMo or a KDDI phone. The carriers develop cross-mapping tables to allow limited exchange of emojis across carrier lines, but it is all too easy for the characters to get corrupted or, worse, misinterpreted. Things get even worse when Japanese carriers start to interoperate email and text messaging with non-Japanese carriers. Recipients are often left wondering what "〓〓〓〓〓〓" could mean. The 〓 character was used as a placeholder to denote a missing emoji.
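For the programmers in the audience, here is a rough sketch of how such a cross-mapping table might behave. Everything in it is made up for illustration: the carrier names, the numeric codes and the `translate` helper are all hypothetical, and the real carrier tables were different and far larger. The only real ingredient is the placeholder itself, 〓 (U+3013, the "geta mark").

```python
# The placeholder shown when an emoji has no equivalent on the other side.
GETA = "\u3013"  # 〓

# HYPOTHETICAL private codes for two imaginary carriers, A and B.
# These numbers are invented for this sketch, not taken from any real table.
CARRIER_A_TO_B = {
    0xE001: 0xF101,  # say, "sun" on both carriers
    0xE002: 0xF102,  # say, "snail" on both carriers
}


def translate(codes, table):
    """Translate carrier-A codes to carrier-B codes.

    Codes with an entry in the table are shown as [XXXX];
    codes without one are replaced by the geta mark.
    """
    out = []
    for code in codes:
        mapped = table.get(code)
        out.append(f"[{mapped:04X}]" if mapped is not None else GETA)
    return "".join(out)


# The middle code has no mapping, so the recipient sees a geta mark.
print(translate([0xE001, 0xE999, 0xE002], CARRIER_A_TO_B))
# prints: [F101]〓[F102]
```

A message made up entirely of unmapped codes would come out as a row of 〓 and nothing else, which is exactly what puzzled recipients were staring at.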

The tech industry gets in on the action too; some go as far as adding animation into the mix. Skype, AOL, MSN, Google Talk, Gmail and thousands of internet forums launch their own emoji sets in a mad scramble to appear hip and cool to their young audience. Emojis, or emoticons as they are known outside Japan, became to the 2000s what image filters are to the 2010s.

As the use of emoji expands, so do the compatibility issues. Since all emojis use proprietary encodings, interoperability between different software clients is poorly supported. This is further exacerbated by the fact that emojis are not plain text 3 and hence can't be sent via SMS or plain text email.

The emoji revolution stands at a crossroads. Business as usual means gradual obsolescence. But becoming part and parcel of everyday communication would require a Herculean effort - standardising global emoji use.

Enter Unicode

The Unicode Consortium is a non-profit that helps standardise how text is represented in software. Every character you are seeing on your screen right now has a unique underlying code, called its Unicode code (or code point). For example, the Unicode code for the English capital letter A is U+0041, B is U+0042, C is U+0043, and so on. As virtually all computers and smartphones come with built-in Unicode support, all the symbols included in the Unicode character set are de facto supported.
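If you want to check those codes for yourself, a few lines of Python will do it. This is only a minimal sketch; `code_point_label` is a helper name I made up for the occasion, built on Python's built-in `ord`.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the character's Unicode code as an integer;
    # 04X formats it as at least four uppercase hex digits.
    return f"U+{ord(ch):04X}"


for ch in "ABC":
    print(ch, code_point_label(ch))
# prints:
# A U+0041
# B U+0042
# C U+0043
```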

In early 2000, DoCoMo sought to get its emoji character set accepted into the Unicode standard but couldn't muster enough support from fellow Japanese carriers. It was felt that there wasn't sufficient evidence to suggest that emojis would become widespread enough to be considered regionally and/or internationally significant symbols.

On the other side of the Pacific, Google had been using private-use Unicode codes to serve emojis to their Gmail and GTalk customers. In late 2006, Google wondered if emojis could become part of the Unicode standard and later submitted a working draft. In March 2009, Google and Apple jointly filed a formal proposal to encode emoji symbols.

The right way

To create an emoji standard, Unicode first needed to consolidate the symbols used by KDDI, DoCoMo and SoftBank into a single reference list. Then they had to select a single illustration out of the three (or a matching new symbol) and give it a name and Unicode code. See the picture below taken from the official proposal.

[Image: snail-looking-left] Note the left-facing snail.

As you can see, the emojis across the different carriers all have left-facing illustrations. This is because traditional Japanese writing is vertical, with columns moving from right to left. In such a reading system, left-facing symbols feel right at home. Japanese consumers were by far the biggest emoji users in 2010. Perhaps this is why most of the symbols retained their Japanese-style orientation while making their way into Unicode.

Aha! Read from bottom up for maximum aha! SOV stands for subject-object-verb.

After 18 months of uninterrupted consolidation, identification, and standardisation, Unicode 6.0 was launched, becoming the first Unicode standard to have characters explicitly intended as emojis, thus ensuring their universal availability and bringing the "〓〓〓〓〓〓" era to a much-deserved close.

Not so fast...

Following the launch of Unicode 6.0, the three different emoji fonts from the Japanese carriers were merged into a single emoji font to be used universally. Alas, it has since metastasised into half a dozen fonts. Tech firms follow Unicode guidelines at their discretion and often create their own interpretation of a snail or chair or a dancing person, turning emojis into branding opportunities.

Remember the outlier snail 🐌 in the opening? The only animal in the emoji set looking rightwards. Well, that's not entirely true. The Unicode code for the snail is U+1F40C and the "official" illustration depicts a lefty snail. (See the proposal snapshot above.) Here's how it appears on different platforms.
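It is worth pausing on what Unicode actually pins down here: the code and the name, not the drawing. A minimal Python sketch, using only the standard `unicodedata` module, makes the point. Note that nothing in it says which way the snail faces.

```python
import unicodedata

# Build the snail from its Unicode code, U+1F40C.
snail = chr(0x1F40C)

# The code and the official character name are fixed by the standard.
print(f"U+{ord(snail):04X}")     # prints: U+1F40C
print(unicodedata.name(snail))   # prints: SNAIL

# So are the bytes that travel over the wire when the snail is sent as UTF-8.
print(snail.encode("utf-8").hex(" "))  # prints: f0 9f 90 8c
```

The picture you end up seeing comes from whichever emoji font your device ships with, which is where the left-versus-right business creeps in.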


Things get more interesting.


And then they get ridiculous: the "seat" on iOS is a coach class airplane seat, and on Android a Charles Eames chair.


Back to square one?

Emojis have made tremendous progress since their humble beginnings in Japan in 1999. But they are far from the "universal language" the world trusts them to be. Google, Apple and Microsoft, along with Facebook and Twitter, each interpret the emoji character set with slight but significant differences. They are tech firms' personal IP and branding tools, and can be highly irregular and even confusing at times.

Fifteen years of progress and we have come full circle. Maybe time really is cyclical.

  1. हिंदी (Hindi) since childhood and English for the last 14 years. Both are Indo-European languages.

  2. Except the Indians. Hinduism, and by extension Jainism and Buddhism, treat time as a cyclical force, something that has a beginning, a middle and a new beginning (instead of an "end"). This is why reincarnation and breaking the cycle of life to achieve moksha receive so much air-time in Hindu, Buddhist and Jain theologies.

  3. By 2006, Google was using private-use Unicode codes for their Gmail/GTalk emojis. Most internet forums used GIFs, and other software companies developed their own custom character sets.


