A Strategy for i18n and Node.js


Recently I internationalized a Node/Express web application that I’ve been working on and it seems to have gone fairly well (users in multiple languages are using it happily and I’m seeing a marked increase in traffic because of it!). Not much of what I’m writing up here is particular to Node, per se, just a general strategy for internationalizing a web application.

I’ve used enough internationalized web sites, and travelled to enough foreign countries and attempted to use English language sites back in the US, that I knew what kind of features I wanted:

  • Full parity between languages. Wherever possible the same content should be available to everyone.
  • Use sub-domains to contain different language versions. It’s overkill/expensive to use different TLDs and it’s annoying to have to twiddle query strings or paths to match a language.
  • No automatic translations of content to the user’s native language. There is nothing worse than arriving at a site and being forced into a version that you can’t read, either because it’s poorly translated, or you’re being GeoIP detected, or you’re on a computer whose language settings don’t match your own. If you visit a URL it should always be in the same language.
  • No automatic redirects to a site with the user’s native language. Same as the last one. If I visit “foo.com” I should not be automatically routed to “es.foo.com” because it thinks that I speak Spanish. Instead, give the user a notification in their native language and allow them to visit the page themselves.

The end result would be a simple URL system that works like so:

  • domain.com – Main site (English)
  • ja.domain.com – Japanese Site
  • XX.domain.com – Other languages

Due to the full translation parity and the lack of URL modification, for every page that you visit you can view the same page in another language just by changing the sub-domain.

For example:

   domain.com/search?q=mountain
ja.domain.com/search?q=mountain

Both work identically; the second is simply presented in Japanese.

This has the benefit that in the header of the page I can link the user to the same exact page but in their native language.

Additionally I can use the rel="alternate" hreflang="x" technique to help Google understand the structure of my site better. I can put this in the header of my page and Google will show the language-preferred version of the site in its search results:

<link rel="alternate" hreflang="ja"
    href="http://ja.domain.com/" />

Server

Encouraging users to find the correct content is a key implementation detail, considering that the content is not translated automatically nor is the user redirected to content in their native language. While the links in the header are a good start I also wanted to show a message at the top of the page encouraging the user to view the content (with the message being written in their native language).

As it turns out this can be particularly tricky to implement. The easiest way to do it is to simply check the user’s request headers, look at what’s listed in their “Accept-Language” header, and then display the message based upon that. This really only works if your content is always dynamic and is never being cached.
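
As a sketch of that header-based approach, picking the best supported language from an Accept-Language header might look like the following (the pickLanguage helper is illustrative, not part of any particular library):

```javascript
// Pick the best-matching supported language from an Accept-Language
// header such as "ja,en-US;q=0.8,en;q=0.6". Falls back when nothing
// the client asks for is supported.
function pickLanguage(acceptLanguage, supported, fallback) {
    var candidates = (acceptLanguage || "").split(",").map(function(part) {
        var pieces = part.trim().split(";");
        return {
            // Reduce region codes like "en-US" to the base language.
            lang: pieces[0].split("-")[0].toLowerCase(),
            // Default quality is 1 when no ";q=" parameter is given.
            q: pieces[1] ? parseFloat(pieces[1].split("=")[1]) || 0 : 1
        };
    }).sort(function(a, b) { return b.q - a.q; });

    for (var i = 0; i < candidates.length; i++) {
        if (supported.indexOf(candidates[i].lang) >= 0) {
            return candidates[i].lang;
        }
    }
    return fallback;
}
```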

If that’s not the case for your application, how and where you do your caching matters.

In my particular application I’m using nginx in front of a proxied collection of Node/Express servers. This means that everything coming from the Node server is cached (including any messages to the user telling them to visit another page).

As a result, in order to display this message to the user we’re going to need to manage the logic for this on the client. Unfortunately this is where we hit another stumbling block: It’s not possible to reliably determine the desired language of the user using just JavaScript/the DOM.

Thus we’ll need to get the server to pass us some extra information on what the user’s desired language is.

To do this I used the nginx AcceptLanguage Module and then set a cookie with the desired language and passed it to the client. This is the relevant nginx configuration to make that happen.

set_from_accept_language $lang en ja;
add_header Set-Cookie lang=$lang;

And now on the client-side all that needs to happen is reading the cookie for the desired language and displaying the redirect message if the current language and the desired language don’t match.
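
A sketch of that client-side check (the helper names are mine; in the browser you would pass in document.cookie and location.hostname):

```javascript
// Read the desired language out of a cookie string such as
// "lang=ja; other=1". Takes the string as an argument so it can be
// exercised outside a browser; pass document.cookie in practice.
function getLangFromCookie(cookieString) {
    var match = /(?:^|;\s*)lang=([^;]+)/.exec(cookieString || "");
    return match ? decodeURIComponent(match[1]) : null;
}

// Determine the language the current page is served in from its
// hostname: "ja.domain.com" -> "ja", plain "domain.com" -> "en".
function getPageLang(hostname) {
    var sub = hostname.split(".")[0];
    return /^[a-z]{2}$/.test(sub) ? sub : "en";
}

// In the browser the message is then shown when the two differ:
// if (getLangFromCookie(document.cookie) !== getPageLang(location.hostname)) {
//     /* display the "view this page in your language" message */
// }
```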

This gives the best of all worlds: nginx continues to aggressively cache the results from my Node servers and the client displays a message in the user’s native language encouraging them to visit the appropriate sub-domain.

i18n Logic

I’ve written a new Node i18n module which makes the following strategy possible.

None of the i18n logic is particularly out of the ordinary but there are a few strategies I took that helped to simplify things.

  • All translations are stored in named JSON files.
  • Those files are loaded and used for in-place translation in the application.
  • Translations are done using the typical __("Some string.") technique (wherein “Some string.” is replaced with the translated string, if it exists, otherwise “Some string.” is returned instead).
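
As a rough sketch of that lookup (the file contents and the makeTranslator helper here are hypothetical, not the module’s actual API):

```javascript
// A translation file such as ja.json might map source strings to
// their translations:
//   { "Some string.": "何かの文字列。" }

// Minimal sketch of the __() lookup: return the translation when one
// exists, otherwise fall back to the original string.
function makeTranslator(table) {
    return function __(str) {
        return Object.prototype.hasOwnProperty.call(table, str) ?
            table[str] : str;
    };
}

var __ = makeTranslator({ "Some string.": "何かの文字列。" });
```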

Since a single set of servers handles requests for every language, translation state cannot be shared globally – it must be initialized and used on a request-by-request basis. I’ve seen other i18n solutions, like i18n-node, that assume the server will only ever be serving up pages in a single language, and this tends to fail in practice – especially in the shared-state, asynchronous realm of Node. For example:

Whenever a request comes in, it sets the current language on the shared i18n object. Given the asynchronous nature of Node, other requests may be in flight at the same time, so one request can end up changing the displayed language of another.

You’ll want to make sure that, at minimum, your current language state is stored relative to the current request to avoid this problem. (My new i18n node module fixes this, for example.)

In practice it means that you’ll be adding an i18n property to the request object, likely as a piece of Express middleware, like so:

app.use(function(req, res, next) {
	req.i18n = new i18n(/* options... */);
	next();
});
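
Determining which language a given request should use can then be driven by the sub-domain. This is a sketch: the subdomainToLocale helper, and the options accepted by the i18n constructor, are assumptions rather than a documented API.

```javascript
// Map a request's Host header to a locale, e.g. "ja.domain.com" -> "ja".
// Anything that isn't a recognized language sub-domain (including the
// bare "domain.com") falls back to the default.
function subdomainToLocale(host, supported, fallback) {
    var sub = (host || "").split(".")[0].toLowerCase();
    return supported.indexOf(sub) >= 0 ? sub : fallback;
}

// Inside the middleware this might be used as (i18n options assumed):
// req.i18n = new i18n({
//     locale: subdomainToLocale(req.headers.host, ["en", "ja"], "en")
// });
```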

Workflow

I have it so that the i18n logic behaves differently depending upon whether the server is in development mode or in production mode.

When in development mode:

  • Translation JSON files are read on every request.
  • Translation files are updated automatically any time a new string is detected.
  • Warnings and debug messages are shown.

In production mode:

  • All JSON translation files are cached the first time they’re read.
  • Translation files are never updated dynamically.
  • No warnings or debug messages are shown.

The two major differences here are the caching and the auto-updating of the JSON files. When in development it’s quite useful to have the translation files reload on every request, in case a change has been made to their contents, whereas in production they really should be treated as static files.

Additionally, the workflow of having the translation files update every time a new string is found is actually quite useful: It helps you to catch strings that you may have forgotten to translate. Naturally doing this in production (frequently hitting the disk) is not a good idea.

Marking Up Strings

Strings that need to be translated can be found in a number of locations: Inside your application source, inside templates, inside JavaScript files, and even (god forbid) inside CSS files.

In the case of my application I had no strings inside my JavaScript or CSS files. This worked out nicely because I had already written my application in such a way that content is never being dynamically constructed from strings inside my client-side JavaScript. If I were to do that I would use a template and put the template directly into the HTML of my page, using something like my JavaScript Micro-Templating solution.

I consider it especially important to avoid having any translatable strings in your JavaScript or CSS files, as those are files you’ll want to cache heavily and likely serve from a CDN. Naturally, you could dynamically replace those strings as part of your build process and generate a separate script/css file for each language you support, but that depends upon how much extra work you want to introduce into your build.

Inside my application I made sure that the only time I ever attempted to translate a string was inside of an Express view handler (meaning that I had access to the request object, which is where I bound my i18n object).

An example of using the i18n object inside a view:

module.exports = {
    index: function(req, res) {
        res.render("index", {
            title: req.i18n.__("My Site Title"),
            desc: req.i18n.__("My Site Description")
        });
    }
};

For my templates I use the confusingly-named swig, but the technique for actually using the i18n methods will be roughly the same for most templating systems:

{% extends "page.swig" %}

{% block content %}
<h1>{{ __("Welcome to:") }} {{ title }}</h1>
<p>{{ desc }}</p>
{% endblock %}

A string is wrapped with a __(...) call and is replaced with the translated string when the template is rendered.

Translation

So far the actual translation process has been relatively simple. It’s just me doing the translation and I’m not out-sourcing anything (at least not yet). Additionally, my site is relatively simple, with only a couple dozen strings at the moment.

I’ve been able to temporarily “cheat” in a few ways (at least until I hire a real translator):

  • Use Open Source Translations. There are already massive Open Source projects out there that have done a lot of hard work in translating their UIs. For example Drupal has all of their translations online in easy-to-download formats. I was able to find a number of strings that I needed by going through their files.
  • Look at already-localized sites. This is another cheat but look for other sites that have some of the same features of your site and have already gone through the hard work of localizing into multiple languages, like Google. (In my case I was working on a search engine so a number of Google’s strings directly matched strings on my site.)
  • Google Translate. I know, I know – but I was really surprised at how much better Google Translate has gotten as of late, especially for translating single words or concepts. It’s able to tell you the exact meanings for different possible translations, which is really impressive.

Conclusion

I’m only a couple of weeks into having implemented this process on my site, so I’m sure some things are likely to change as I start to scale up more. I’ve already substantially increased traffic to my site, especially as Google has started to index the newly-translated site. As I mentioned before, I have a new Node i18n module that complements the above process and hopefully makes it easier for others to follow, as well.

Posted: January 11th, 2013

