[Scummvm-devel] Translations of the About text in ScummVM

Mon May 9 14:16:16 CEST 2011

Max Horn <max at quendi.de> writes:

[...]
> (Another reason why I always preferred &auml; etc. is that they separate meaning from encoding. But that's admittedly a point that is less relevant for our purpose.)

Yes, that's why I made the reservation about recoding only to latin1.
If multiple output encodings (such as both latin1 and us-ascii) are to
be supported, then an encoding-neutral representation makes sense.

> By the way, I don't think the lookup table size is such a problem, we are talking about currently 12 entities here. Plus one extra check for "Torbjörn", in case we are converting to ASCII, where his name should become "Torbjorn", as opposed to the other "ö" which all become "oe" ;). 

So would you use &ouml; for one and &ouml2; for the other?
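With such a scheme, recoding becomes a plain table lookup. Here is a minimal Python sketch, assuming a hypothetical table in which each entity carries both a latin1 form and an ASCII fallback. The function name `recode`, the sample entries, and the `&ouml2;` variant entity are illustrative only and are not taken from ScummVM's actual credits tooling.

```python
# Hypothetical entity table: entity -> (latin1 form, ASCII fallback).
# Only a few sample entries; this is not the real ScummVM table.
ENTITIES = {
    "&auml;": ("\u00e4", "ae"),
    "&ouml;": ("\u00f6", "oe"),
    "&eacute;": ("\u00e9", "e"),
    # Hypothetical variant for names like "Torbj&ouml2;rn", whose
    # ASCII form should use a plain "o" rather than "oe".
    "&ouml2;": ("\u00f6", "o"),
}


def recode(text: str, encoding: str) -> bytes:
    """Expand the entities in text, then encode it as 'latin1' or 'ascii'."""
    use_ascii = encoding == "ascii"
    for entity, (latin1_form, ascii_form) in ENTITIES.items():
        # "&ouml;" cannot match inside "&ouml2;", so the order does not matter.
        text = text.replace(entity, ascii_form if use_ascii else latin1_form)
    # Entities missing from the table are left untouched as ASCII text.
    return text.encode(encoding)


print(recode("Torbj&ouml2;rn", "ascii"))   # b'Torbjorn'
print(recode("Torbj&ouml2;rn", "latin1"))  # b'Torbj\xf6rn'
```

The same entity-encoded source string yields both outputs, which is exactly the encoding neutrality that only pays off once more than one target encoding is needed.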

[...]
> But whatever we use in the code is then used for the msgid in the
> .po files, though octal encoded character constants like the above
> are indeed converted. So the above would lead to msgid "Touché"
> showing up in latin1 encoded files -- and something very different
> in iso-8859-5 files. Which is what we want to avoid... Changing that
> would require modifying xgettext, I think, so I'd rather avoid that
> (but I am not a gettext expert, if there is a way, I'd be happy to
> learn about it).

Actually, no change to xgettext is required; just postprocess the
generated file:

  perl -pe 's/[\200-\377]/"&#".ord($&).";"/ge'

or even

  perl -pe 'use HTML::Entities;s/[\200-\377]/encode_entities($&)/ge'

I do fear, though, that xgettext might recode the file to UTF-8 (which
would already be a problem today, since the msgid would then be
"Touch\303\251" rather than "Touch\351"), so some additional filtering
would be needed to fix that as well.
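The byte values involved can be checked directly. Below is a minimal Python sketch (the function names are made up for illustration) contrasting a byte-wise filter, which behaves like the first Perl one-liner applied to every high byte, with a filter that decodes UTF-8 first. The second function is only one assumption of what the "additional filtering" could look like.

```python
import re


def bytes_to_refs(data: bytes) -> str:
    """Byte-wise filter: each byte 0x80-0xFF becomes a numeric
    reference to that byte's value."""
    replaced = re.sub(rb"[\x80-\xff]", lambda m: b"&#%d;" % m.group()[0], data)
    return replaced.decode("ascii")  # only ASCII bytes remain


def utf8_to_refs(data: bytes) -> str:
    """Hypothetical extra filtering: decode UTF-8 first, so each
    non-ASCII character (not each byte) becomes one reference."""
    text = data.decode("utf-8")
    return re.sub(r"[^\x00-\x7f]", lambda m: "&#%d;" % ord(m.group()), text)


latin1 = "Touch\u00e9".encode("latin1")  # b"Touch\351"
utf8 = "Touch\u00e9".encode("utf-8")     # b"Touch\303\251"

print(bytes_to_refs(latin1))  # Touch&#233;
print(bytes_to_refs(utf8))    # Touch&#195;&#169;  (one reference per byte)
print(utf8_to_refs(utf8))     # Touch&#233;
```

Run over a UTF-8 file, the byte-wise filter yields two bogus references for a single "é", so the decoding step has to happen before the references are generated.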

But anyway, having

  "Touch\351"

or

  "Touché"

in the code does not make much difference of course.  :-)

  // Marcus