the:chris:walker

Javascript Static Internationalisation (i18n)

Recently I needed to think about how to add another language to some client-side-javascript-heavy code.

It seems that most people go for a language file (often JSON encoded) and an i18n library (e.g. jQuery.i18n). However, this seems to put more work on the client and, if the language file is loaded separately, costs an extra HTTP request.

My initial thought was to statically build different language versions of the files, and have the client request the language it wants, e.g. ‘some.file.en_GB.js’.

This posed some questions about pluralisation and how to parse the code to introduce the new language’s strings. It seemed that it could be very error prone.

So I came up with a two-layer solution. We use two tiny i18n functions in javascript that let us develop easily and identify strings for translation, and also let us add translation comments. Then we use JS parsing and processing capabilities to create an abstract syntax tree, which we can modify by removing the i18n function calls and replacing them with fixed strings. The same process can be used to create the list of strings to translate (i.e. generate a gettext .pot file).

In the browser-side code I use these two functions, one for simple strings and one for plurals:

window.$t = function(msgid, default_msg, comment){
  return default_msg;
};

window.$tn = function(msgid, default_singular, default_plural, count, comment){
  return count !== 1 ? default_plural : default_singular;
};

Simple enough, basically just passes the ‘default’ value straight through. That way in development, everything works and I get my default messages correctly.
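To illustrate how call sites look during development, here is a minimal sketch. The two helpers are restated as plain functions (rather than on window) so the snippet also runs under node.js; the msgids, messages and comments are made up for the example.

```javascript
// Development-time stubs, same behaviour as the window.$t / window.$tn above.
function $t(msgid, default_msg, comment) {
  return default_msg;
}
function $tn(msgid, default_singular, default_plural, count, comment) {
  return count !== 1 ? default_plural : default_singular;
}

// Hypothetical call sites: msgids, messages and comments are illustrative only.
var greeting = $t('greeting', 'Welcome back!', 'shown after login');
var one = $tn('inbox_count', 'One new message', 'Several new messages', 1, 'inbox badge');
var many = $tn('inbox_count', 'One new message', 'Several new messages', 3, 'inbox badge');

console.log(greeting, '|', one, '|', many);
```

The msgid and comment arguments are ignored at runtime here; they only exist so the build step can find them later.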

However, the magic happens when we want to translate this file into another language. We need to do 2 things:

  1. Extract the strings to be translated and translate them
  2. Inject the translations back in.

How can we do this cleanly? And why did you include translation comments in the $t and $tn functions?

Well, I’ll tell you. For javascript minification I use uglify-js. It’s great, but what’s even greater, and what I didn’t previously know about, is the access to its internals. It can help you take javascript code, turn it into an Abstract Syntax Tree (AST), manipulate it, and then turn the AST back into javascript code.

The implications should be sinking in.

We simply parse our file to an AST, and walk the tree for instances of our $t and $tn functions. When we find one, we can extract the msgid, the default (i.e. the en_GB translation) and some helpful translation comments.

Now if we are on an extract run, we can simply store this info and generate a .pot file – gettext-style.
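For the extract run, turning the collected records into .pot entries is mostly string formatting. A minimal sketch, assuming each extracted record is an object with msgid, default_msg and comment fields (that record shape, and the potEscape and toPot names, are my own invention for illustration). The default message is written as an extracted comment so translators can see the en_GB text; a real .pot file would also start with a header entry, which is omitted here.

```javascript
// Escape a string for use inside a double-quoted gettext value.
function potEscape(s) {
  return String(s)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Hypothetical record shape: { msgid, default_msg, comment }.
// Comments and default messages are assumed to be single-line.
function toPot(records) {
  return records.map(function (r) {
    var lines = [];
    if (r.comment) { lines.push('#. ' + r.comment); }
    lines.push('#. default: ' + r.default_msg);
    lines.push('msgid "' + potEscape(r.msgid) + '"');
    lines.push('msgstr ""');
    return lines.join('\n');
  }).join('\n\n') + '\n';
}

var pot = toPot([{
  msgid: 'hello_world',
  default_msg: 'Hello World!',
  comment: 'this hello world is alerted to the user!'
}]);
console.log(pot);
```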

If we are on a translate run, we can use our translations to replace the entire function call with the new translation. That means we go from:

alert($t(
  'hello_world', 
  'Hello World!', 
  'this hello world is alerted to the user!'
));

to, in ‘English’:

alert("Hello World!");

or, in French (pardon my poor language skills!):

alert("Bonjour, tous les mondes!");

It’s all a completely safe replacement, and we can minify the new AST in the same process if we want. The code to do this is super-simple too:

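/* node.js code (uses the older uglify-js API that exposes parser and uglify) */
var parser = require('uglify-js').parser,
    printer = require('uglify-js').uglify;

/* get your code from your file, as a string rather than a Buffer */
var my_js_code = require('fs').readFileSync("/path/to/file.js", "utf8");

/* placeholder: swap in your own AST transformation; this one returns the tree unchanged */
function my_mangling_function(ast) {
  return ast;
}

/* create the syntax tree */
var abstract_syntax_tree = parser.parse(my_js_code);

/* mangle it to your liking... */
var mangled = my_mangling_function(abstract_syntax_tree);

/* re-generate the js */
var new_js = printer.gen_code(mangled);

/* output */
process.stdout.write(new_js);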
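my_mangling_function is only a stand-in above. Here is a minimal sketch of what the translate run could look like, assuming the tree is made of nested arrays such as ["call", ["name", "$t"], [["string", "hello_world"], ...]]. That is my understanding of the shape the old uglify-js parser produces, so check it against the version you have installed. The translations table and the isTCall and translateTree names are hypothetical, and the tree is written by hand so the snippet runs without uglify-js.

```javascript
// Hypothetical lookup table: msgid -> translated string.
var translations = { hello_world: 'Bonjour, tous les mondes!' };

// True for nodes shaped like ["call", ["name", "$t"], [["string", msgid], ["string", default], ...]].
// The node shape is an assumption about the parser's output.
function isTCall(node) {
  return Array.isArray(node) && node[0] === 'call' &&
    Array.isArray(node[1]) && node[1][0] === 'name' && node[1][1] === '$t' &&
    Array.isArray(node[2]) && node[2].length >= 2 &&
    node[2][0][0] === 'string' && node[2][1][0] === 'string';
}

// Recursively copy the tree, swapping each $t(...) call for a string node.
function translateTree(node, table) {
  if (!Array.isArray(node)) { return node; }
  if (isTCall(node)) {
    var msgid = node[2][0][1];
    var fallback = node[2][1][1];
    // Fall back to the default message when no translation exists.
    var text = Object.prototype.hasOwnProperty.call(table, msgid) ? table[msgid] : fallback;
    return ['string', text];
  }
  return node.map(function (child) { return translateTree(child, table); });
}

// Hand-written tree for: alert($t('hello_world', 'Hello World!', '...'));
var tree = ['toplevel', [['stat', ['call', ['name', 'alert'], [
  ['call', ['name', '$t'], [
    ['string', 'hello_world'],
    ['string', 'Hello World!'],
    ['string', 'this hello world is alerted to the user!']
  ]]
]]]]];

var translated = translateTree(tree, translations);
console.log(JSON.stringify(translated));
```

Returning a copy rather than editing in place means the same parsed tree can be reused for each locale in one build.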

The dynamic part of our site works out the language needed and includes the correct javascript files, which in turn lazy-load other modules of the correct locale.

Pretty cool. That only leaves the annoying problem of pluralisation, which we solved by keeping a different count-to-index function for each locale in the dynamic site and including the correct one directly in a script tag on the page.

Our plural translation function ended up looking like this (count first, with the default forms passed as an array, rather than the signature sketched earlier):

$tn = function(count, msgid, defaults, comment){
  // English plural calculation
  return (count === 1) ? defaults[0] : defaults[1];
}

After translation into a language, this is replaced with a pluralisation function which also uses the locale’s particular plural function, for example:

// before translation
var filecount = $tn(
  count, 
  "number_of_files", 
  [ "One file", "%d files" ], 
  "describes the number of files"
);

//after translation into english
/* English Plural Index function zero if n=1, one else. */
var $p = function(n){ return +(n!==1); }
var filecount = ["One file","%d files"][$p(count)];


//and after translation into Polish (which has 3 forms)
/* Polish Plural Index function  */
var $p = function(n){ 
  if(n===1){ return 0; }
  if(n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)){ return 1; }
  return 2;
}
var filecount = ["1 plik", "%d pliki", "%d plików"][$p(count)];
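The examples above pick the right form but leave the %d placeholder in place. A minimal sketch of filling it in, with a quick check of the Polish index function against a few counts. The pluralise helper name is my own, and the simple %d replacement is an assumption rather than anything the build step does.

```javascript
// Polish plural index function, as above.
var $p = function (n) {
  if (n === 1) { return 0; }
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) { return 1; }
  return 2;
};

// Hypothetical helper: choose the form for `count`, then substitute %d.
function pluralise(forms, count) {
  return forms[$p(count)].replace('%d', String(count));
}

var forms = ['1 plik', '%d pliki', '%d plików'];
var results = [1, 2, 5, 12, 22, 25].map(function (n) {
  return pluralise(forms, n);
});
console.log(results.join(', '));
// prints "1 plik, 2 pliki, 5 plików, 12 plików, 22 pliki, 25 plików"
```

Note how 12 takes the third form while 22 takes the second, which is exactly the case the n%100 test exists for.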

With the new Javascript Source Maps, this seems a really nice way to do translations in your javascript files.