[Scummvm-devel] Translations of the About text in ScummVM

Mon May 9 12:22:14 CEST 2011

On 08.05.2011 at 22:16, Marcus Comstedt wrote:

> 
> Max Horn <max at quendi.de> writes:
> 
>>> If the Ukrainian locale is active, don't you have to map it to
>>> iso-8859-5 instead of latin1?
>> 
>> No, because this mapping is only used as a fallback, when no translation is available. The idea is that then only the built-in font is available, which in turn is latin1 based, too.
> 
> Hm, maybe I'm missing something about how the translation system in
> ScummVM works, but to me it's not obvious how it follows that only
> latin1 fonts are available from the fact that the Ukrainian
> translation does not provide a translation for the msgid
> "Touch&eacute;", thus making a fallback necessary...

Right, I was implicitly assuming that all localizations would provide translations for all strings with HTML entities. But of course you are right, this might not be the case for some of them.

In that case, we could provide a function which can map the entities to either Latin1, or ASCII (both mappings are present in credits.pl already).

Or we go back to the variant of the plan where we use ASCII msgids, i.e. approach 1 in my other email.

> Anyway, if only mapping to latin1 should be performed, I'd like to
> suggest "Touch&#233;" instead of "Touch&eacute;", to avoid bloating
> lookup tables.

I don't really mind much either way, and could very well live with that, too.

It's just that I personally find it much easier to remember what &eacute; stands for, as opposed to &#233;. So if I had to translate the resulting .po files, I'd have a much easier time if the .po file contained the msgid "Touch&eacute;". But if we can come up with a way that automatically pre-populates correct "translations" in the .po files for these strings, then I guess it wouldn't matter either way.
(Another reason why I always preferred &auml; etc. is that they separate meaning from encoding. But that's admittedly a point that is less relevant for our purpose.)

By the way, I don't think the lookup table size is such a problem, we are talking about currently 12 entities here. Plus one extra check for "Torbjörn", in case we are converting to ASCII, where his name should become "Torbjorn", as opposed to the other "ö" which all become "oe" ;). 
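
A rough sketch of what such a fallback mapping function could look like, assuming named entities are used in the msgids. The names mapEntities, EntityMapping and kEntities are hypothetical, the table shows only two of the entities, and the ASCII replacement "e" for &eacute; is an assumption; the authoritative list is whatever credits.pl contains.

```cpp
#include <string>

// Hypothetical table entry: one HTML entity with its two fallback forms.
struct EntityMapping {
    const char *entity;  // entity as it appears in the msgid
    const char *latin1;  // single Latin1 byte
    const char *ascii;   // plain ASCII approximation
};

// Illustrative excerpt only; the real list would mirror credits.pl.
static const EntityMapping kEntities[] = {
    { "&eacute;", "\xE9", "e"  },  // ASCII form "e" is an assumption
    { "&ouml;",   "\xF6", "oe" },
};

// Replace every occurrence of 'from' in 's' with 'to'.
static void replaceAll(std::string &s, const std::string &from,
                       const std::string &to) {
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

// Map entities to Latin1 (toAscii == false) or to ASCII (toAscii == true).
std::string mapEntities(std::string text, bool toAscii) {
    // Extra check: this name drops the umlaut instead of using "oe".
    if (toAscii)
        replaceAll(text, "Torbj&ouml;rn", "Torbjorn");

    for (const EntityMapping &m : kEntities)
        replaceAll(text, m.entity, toAscii ? m.ascii : m.latin1);
    return text;
}
```

With this, mapEntities("Touch&eacute;", false) would give the same bytes as the old "Touch\351" literal, while the ASCII variant gives "Touche".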

Anyway, this is IMHO a bikeshed detail we can worry about later; it won't be a problem to implement exclusive support for "&#NNN;" style entities after all :). Let's first settle if we really want to go in this direction.

>  Or even better, one could have "Touché" in the code,
> and map it to "Touché" in the non-fallback case, no?

Depends on what you mean by "have Touché in the code"...? Of course, our code files should all be ASCII only, hence so far we have had this in the code (using octal character constants):
  "Touch\351",

But whatever we use in the code is then used for the msgid in the .po files, though octal encoded character constants like the above are indeed converted. So the above would lead to msgid "Touché" showing up in latin1 encoded files -- and something very different in iso-8859-5 files. Which is what we want to avoid... Changing that would require modifying xgettext, I think, so I'd rather avoid that (but I am not a gettext expert, if there is a way, I'd be happy to learn about it).
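
To illustrate why the raw byte is a problem, here is a small sketch that decodes the same byte under both charsets. The function names are made up for this example, and the ISO-8859-5 decoder is deliberately partial: it only handles ASCII and the contiguous Cyrillic block 0xB0..0xEF, which is enough to show the effect.

```cpp
#include <cstdint>

// Latin1 (ISO-8859-1): every byte equals its Unicode code point.
inline std::uint32_t decodeLatin1(unsigned char byte) {
    return byte;
}

// Partial ISO-8859-5 decoder (sketch only). Bytes 0xB0..0xEF map to
// U+0410..U+044F; bytes outside ASCII and that block return 0 here.
inline std::uint32_t decodeIso8859_5(unsigned char byte) {
    if (byte < 0x80)
        return byte;
    if (byte >= 0xB0 && byte <= 0xEF)
        return 0x0410 + (byte - 0xB0);
    return 0;  // not covered by this sketch
}
```

The byte written as \351 (0xE9) decodes to U+00E9 (é) as Latin1, but to U+0449 (щ) as ISO-8859-5, so the same msgid bytes would read as a different string in the Ukrainian .po file.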

So, for now, it seems preferable to put one of these into the code:
  "Touch&eacute;"
  "Touch&#233;"

In each case, this is the string the translators would see as "msgid".

Bye,
Max