development

Meet Trubar, a friend of Orange

Janez Demšar

Jan 17, 2023

Orange has been translated to Slovenian language (no official release yet: rough corners are very much being polished). This will pave the way for translations into other languages.

What do we use? Gettext, right? Wrong. Orange is written in modern Python and uses f-strings for string interpolation. They are great, but don't play well with gettext.

Why gettext doesn't cut it

If you've never heard about gettext: gettext is a popular framework for software localization. Developers enclose every string that needs to be translated into a function call that translates it. The function would be called something short, like _ or tr. One of gettext's utilities extracts all such strings into messages files. Translators provide translations. Messages are compiled into binary form and imported into the program. When function _ (or tr or whatever) gets a string, it searches for a translation and returns it.

So in Python, a programmer would write print(tr("Data contains {n} instances").format(n))). Translator would see the message "Data contains {n} instances" and provide a translation, for instance "Podatki vsebujejo {n} primerov.". After messages are compiled and in their place, the function tr would receive the English message and return the Slovenian translation.

The above example uses the old-style interpolation with format. is does not work for Python's f-strings. With f-strings, we would have print(tr(f"Data contains {n} instances"). The number, n, is interpolated before the function call, so tr already receives the entire string, like "Data contains 1234 instances". This obviously doesn't match any pre-translated messages, because it includes a specific number. Call tr earlier? See, that's the thing: you can't. There is no such string as "Data contains {n} instances". It is never "materialized": in Python's abstract syntax tree, these are three distinct elements (a constant "Data contains ", expression n, and a constant instance). At the moment they are joined into an actual object (of type str) that can be passed to some function, the value of n is already interpolated.

Gettext-like approaches thus can't work on f-strings. To our knowledge, there is currently no tool to support translation of f-strings.

Luckily, we are programmers.

Enters Trubar

We developed Trubar. Named after Primož Trubar (the author of the first Slovenian printed books, Catechismus and Abecedarium, and the first translator of part of the Bible in Slovenian language), Trubar collects all strings from sources and places them into a yaml-like file. There, they can be translated - or marked as strings that must not be translated. Then, Trubar copies the sources, substituting strings with their translations, if provided.

This is a part of the file related to the translation of the Table widget above.

widgets/data/owtable.py:
    class `OWDataTable`:
        Data Table: Tabela
        icons/Table.svg: false
        class `Inputs`:
            Data: Podatki
        class `Outputs`:
            Selected Data: Izbor podatkov
        def `__init__`:
            Variables: Spremenljivke
            show_attribute_labels: false
            Show variable labels (if present): Pokaži oznake spremenljivk
            show_distributions: false
            Visualize numeric values: Vizualiziraj številske vrednosti
            color_by_class: false
            Color by instance classes: Obarvaj primere glede na razred
            Selection: Izbor
            select_rows: false
            Select full rows: Izbiraj cele vrstice
            Restore Original Order: Izvirni vrstni red
        def `set_dataset`:
            name: false
            Data: Podatki

If it looks simple, it's because it is simple.

Besides working with f-strings, the advantage of Trubar's approach is that - unlike gettext - it does not pollute the sources with calls and parentheses. There's enough of them already. Instead of marking strings for translation within the source code, they are marked (for non-translation) in the message file. This must be done at some point, but it's better to do it in a dedicated place, not in the code.

This of course comes at a cost: unlike with gettext where the user can switch between languages, software translated with Trubar requires a separate distribution for each language. Or packing separate sources for multiple languages into one distribution. Whether this hurts or not, depends upon circumstances.

What about plural forms?

You ask that seriously? So you don't speak Slovenian, I guess.

See the text "Ni ciljne spremenljivke" in the screenshot? It means "No target variable". Slovenian language counts like this:

Ni ciljne spremenljivke
Ena ciljna spremenljivka
Dve ciljni spremenljivki
Tri ciljne spremenljivke
Štiri ciljne spremenljivke
Pet ciljnih spremenljivk
...

and then the same form is used until 101, where we go back to singular, dual, plural, plural and genitive (which is what you see above).

Developed by native speaker of this beautifully complicated Slavic language, Trubar handles it with excellence. The beauty of f-strings is that they make implementation of plural forms much simpler because they allow using arbitrary functions. We simply inserted the appropriate function call and that's it. For instances, besides having four plural forms, the translation of proposition "with" before a number depends upon the number, and this is handled simply by using the function that returns the proper former for the given number. Which all happens without any help in the original source.

More about localization in documentation.

Ready to use?

Sure, try it out. Trubar is pip-installable, documentation is decent enough and although it is still early in development, we don't plan any huge changes. The least we can promise is that its compatibility-breaking changes will give you less headache that those by pyqtgraph. (Though ... this may not be much of a promise. :)

This site uses cookies to improve your experience.