What emojis tell us about encoding


They're our punchlines. They're our kisses. They're our favourite way to share a lol. Emojis do a lot of work for us. But sometimes emojis don't show up properly, even though the text around them comes through just fine. For example, this happens:

[Image: an emoji displayed as an empty box]

Ah, the mysterious empty box. What is that? Why does it pop up from time to time?

This error happens because emojis are an example of encoding at work. Yes, emojis are transmitted as code! And sometimes compatibility issues arise. Something's getting lost in translation between your device and the sender's device. In this post, we'll explore what's going on under the hood.

You probably aren't thinking about the rules of encoding when sending a text-based message. But when you see an emoji error like this one:

[Image: an emoji display error]

you're getting a tiny taste of the massive amount of translation work that's happening between devices in the blink of an eye.

Let's walk through how emojis work.

WHAT IS ENCODING

All of the letters and emojis that we type are encoded. Encoding is a set of rules that standardizes how we represent letters and emojis in a way that computers understand -- binary.

At the end of the day, the computer boils everything down to binary, which is just ones and zeros. Our devices need to agree on encoding and decoding rules to say which combination of ones and zeros will mean "a". And which combo means capital "A", etc.
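To make that concrete, here is a minimal Python sketch. Python is simply our choice for illustration, and the helper name `to_bits` is made up for this post. It assumes the text is encoded as UTF-8, where "a" and "A" each fit in a single byte.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


for letter in ("a", "A"):
    # ord() gives the number assigned to the character.
    print(letter, ord(letter), to_bits(letter))
# a 97 01100001
# A 65 01000001
```

The two letters differ by a single bit, which is exactly the kind of detail both devices have to agree on.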

An emoji is just another character, like letters in the alphabet. So if emojis are all represented as characters, not pictures, how do we handle them in programming languages?

THE UNICODE STANDARD

The Unicode Standard has assigned numbers to represent emojis. Here's how it works.

In the Unicode Standard, each emoji is represented by a "code point" (or, for some newer emojis, a short sequence of them). A code point is a number, usually written in hexadecimal with a U+ prefix, that looks like U+1F603, for example. Thanks to Unicode, our devices all over the world can all agree that U+1F603 is the number that triggers a grinning face.
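As a rough illustration, the Python sketch below turns the number U+1F603 into a character and back again, then shows the bytes that would travel between devices if the text were encoded as UTF-8 (an assumption; other encodings produce different bytes). The helper name `code_point` is hypothetical.

```python
def code_point(char: str) -> str:
    """Format a single character in the U+XXXX notation."""
    return f"U+{ord(char):04X}"


grinning = chr(0x1F603)  # build the character from its number
print(code_point(grinning))               # U+1F603
print(grinning.encode("utf-8").hex(" "))  # f0 9f 98 83
```

Whether you then see a face or an empty box depends on the device that displays it, not on these bytes.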

Even though we're using the same number, what the user sees can vary. Let's see how.

WHY EMOJIS VARY ACROSS PLATFORMS

If you want to use the "beers" emoji, you'd use U+1F37B. Let's see how it looks on various platforms.

[Image: the "beers" emoji (U+1F37B) as drawn by various platforms]

As you can see, each platform has its own style. The Unicode Consortium only provides suggested new emoji concepts and their assigned code points (those U+ numbers). Then, each software producer develops their own visual style for the concept. Apple chooses to design graphics that aim for realism by using lots of gradients. Meanwhile, HTC and Twitter have bright colors and a more cartoony "flat art" style. And until recently, Google's emoji set featured partially-empty beer glasses topped with magical, gravity-defying froth! (Google issued a redesign in late November.)
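One way to see that the text itself carries no picture: ask Python's standard `unicodedata` module what it knows about U+1F37B. This is only a sketch, but it shows that the character comes with a number and a formal name, and nothing about gradients or froth.

```python
import unicodedata

beers = "\U0001F37B"  # the "beers" emoji, written as an escape

print(f"U+{ord(beers):X}")      # U+1F37B
print(unicodedata.name(beers))  # CLINKING BEER MUGS
print(len(beers))               # 1 (a single character, not an image)
```

The artwork lives in each platform's emoji font, which is why the same character can look so different from one device to the next.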

WHY EMOJI DISPLAY ERRORS HAPPEN

During the design phase for new emoji graphics, each company sets their own schedule to release them. As a result, your device may not be able to decode what friends are sending you, and vice versa.

This, my friend, is the reason why emoji display errors happen. When your device receives a U+ code it doesn't recognize, or doesn't have a matching picture for, you'll see a replacement -- the empty box, or a box with a question mark, or split emoji components, like a profession emoji + a gender sign 👮‍♀️ or a person + a skin tone swatch 👶 🏿 .
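The "split components" case is easier to picture once you look inside one of these emojis. The Python sketch below assumes the standard sequences for a woman police officer and a baby with a dark skin tone, and lists the code points each one is made of. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """List (code point, formal name) for every character in `text`."""
    return [
        (f"U+{ord(char):04X}", unicodedata.name(char, "<no name>"))
        for char in text
    ]


# Police officer + zero width joiner + female sign + variation selector
officer = "\U0001F46E\u200D\u2640\uFE0F"
# Baby + skin tone modifier
baby = "\U0001F476\U0001F3FF"

for emoji in (officer, baby):
    for number, name in describe(emoji):
        print(number, name)
    print()
```

A device that knows the pieces but not the combination has nothing to draw for the whole sequence, so it falls back to showing the pieces one by one.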

Here's how the full emoji update process works, step-by-step:

- Anyone can suggest a new emoji idea through Unicode's submission process.

- The Unicode Consortium accepts and reviews proposals for new emojis.

- Once a year, the Consortium announces which new emoji concepts they have accepted.

- Application vendors design and implement new emojis.

So what do developers do when new emoji specifications are released? Unicode describes the process on their website as follows:

"As part of normal software release cycles, platform vendors periodically make decisions about which Unicode characters to support in new versions of their software. Supporting new emoji characters involves additions to fonts, enhancements to emoji input methods (keyboards or palettes), and often updates to libraries that determine character properties and behavior (such as word selection or line breaking). Depending on release cycle length and timing relative to a Unicode release, it may take a year or so for new Unicode characters to appear on phones and other platforms."

The Unicode Consortium does not require software creators to comply with suggested emoji updates, but they do filter new emoji proposals by anticipated adoption rates. According to their proposal submission process guide, Unicode's homing in on ideas that are likely to be used by millions of people. They aim for popular ideas that will likely be picked up by the leading platform vendors: Google, Apple, Twitter, Facebook and Microsoft. All of these organizations are members of the Unicode Consortium and provide input on emoji selection.

Unicode also seeks input through public reviews, an annual conference, volunteer roles, and emoji proposal submissions from the public. (For more details on participation in Unicode decision-making, see Unicode's list of members and their membership FAQ.)
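Coming back to the "libraries that determine character properties" in Unicode's description: Python's `unicodedata` module is one small example of such a library, and it ships with a fixed version of the Unicode database. The sketch below is an illustration only. The helper name `is_assigned` is hypothetical, and a permanently unassigned code point (U+1FFFE) stands in for "an emoji newer than this database", since the real answer depends on your Python version.

```python
import unicodedata


def is_assigned(char: str) -> bool:
    """True if this Unicode database has a character at this code point."""
    # Category "Cn" means "not assigned" as far as this database knows.
    return unicodedata.category(char) != "Cn"


# Which Unicode release this Python installation was built with.
print("Unicode database version:", unicodedata.unidata_version)

print(is_assigned("\U0001F37B"))  # True: the beers emoji is known
print(is_assigned("\U0001FFFE"))  # False: stand-in for an unknown code point
```

Until a library like this is updated, a brand-new emoji looks just as unknown as that stand-in does.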

WHAT DEVELOPERS NEED TO KNOW

If you're building an iPhone app and want to let users type emojis:

- You do not need to add special code to allow emoji input in text boxes. Apple provides a framework called UIKit that includes a predefined keyboard with emojis.

- New emojis will automatically appear in your app when Apple updates their own libraries.

If you're building an Android app and want to let users type emojis:

- You'll use the Android Studio Layout Editor to add text fields.

- You can implement the EmojiCompat font library to help make sure that emoji will show up consistently.

If you're building a website and you want to let users input text with emojis:

- Emojis will work in most text fields, like form fields you might add in a to-do list web app.

- Browsers and computer operating systems tend to have more emoji compatibility problems than mobile applications do. You may not notice this, unless your own content relies heavily on the latest and greatest emoji, like Emojipedia does. Better safe than sorry. Test, test, test.
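One thing worth including in that testing is whether emoji text survives whatever encoding your site stores or sends it in. The Python sketch below is a minimal, hypothetical check (the function name `survives_round_trip` is ours, not from any framework): it encodes a string, decodes it again, and reports whether the text came back unchanged.

```python
def survives_round_trip(text: str, encoding: str) -> bool:
    """Return True if `text` is unchanged after encoding and decoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The encoding has no way to represent at least one character.
        return False


message = "Cheers \U0001F37B"

print(survives_round_trip(message, "utf-8"))    # True
print(survives_round_trip(message, "latin-1"))  # False
print(survives_round_trip(message, "ascii"))    # False
```

If any layer between the text field and the screen uses an encoding that fails this check, the emoji will be lost even when the browser itself supports it.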

Copyright:

If you're building a mobile app and want to make it available on multiple platforms, your emoji might look a little different on each platform. You cannot force Android to use Apple emojis, for example. This would be breaking copyright rules. However, you can use custom emoji sets in your mobile apps if you want emojis to look the same everywhere.

You can license emoji fonts from vendors such as EmojiOne.

Open-source emoji libraries are free to use anywhere you want, for commercial or non-commercial purposes. For example, check out EmojiTwo.

Twitter has made their Twemoji set open-source, so it's available for everyone to use. You can tell the browser to use the Twitter emoji set in your web apps by following this tutorial.

RESOURCES

- Build a Simple User Interface [TUTORIAL] A good place to start learning Android user interface fundamentals. (Source: Android Developers)

- iOS From Scratch With Swift: First Steps With UIKit [TUTORIAL] Suitable for absolute beginners, this tutorial introduces UIKit and the ever-so-important MVC (model-view-controller) concept. TutsPlus includes excellent screenshots as they walk you through the steps to get started building a single-view iPhone app. (Source: TutsPlus)

- How a 16-year-old Muslim girl made the "woman with headscarf" emoji a reality [NEWS] Learn about the Unicode submission process and recent moves to make emoji more diverse. (Source: Vox)

- Exploring the Android Emoji Compat Library [BLOG] Learn how Android processes emoji and replaces missing ones with emoji look-alikes. (Source: Joe Birch on Medium)

- Emoji 4.0 [BLOG] Ole Begemann wrote an excellent blog post that explains multi-code emoji. Read this for more details on the new skin tone modifiers and gendered profession emojis. (Source: Ole Begemann)

- Membership Levels and Fees (Unicode) [REFERENCE] If you're curious about how the Unicode Consortium works, their membership rates are listed here. Most members are tech companies. Note: it's $75/year for individuals or $35/year for students. Top tiers (paying over $17,000/year) have voting power. (Source: Unicode)

- Emoji Versions [REFERENCE] See how Unicode has developed over the years with this chart of emojis grouped by year of release. (Source: Unicode)

- Choosing and Applying a Character Encoding [REFERENCE] Learn more about website encoding with beginner-friendly definitions of encoding terms.

Learn more about encoding in Season 1, Episode 2 of the base.cs podcast.