Friday, August 8, 2008

Translating man-pages

Update, 2020-06-20: I've maintained the post below mainly for historical interest. However, things have moved along quite a bit in 12 years, rendering several of my suggestions obsolete. If you are thinking about translating manual pages, see the manpages-l10n project, which oversees translations into several languages and employs a range of useful tooling for translation.

Lately, I've gotten a few requests for information about how to translate man-pages into other languages. First off, I should say that I have never translated man-pages. But I have communicated with a few people who do. So these are my current thoughts...

Do you really want to do this? Before you answer this, consider the following:
  • man-pages contains the documentation of the Linux and glibc programming APIs (i.e., pages in Sections 2, 3, 4, 5, and 7). Is this the set of man pages that your group of language speakers most need? If, for example, you are more interested in translating pages for end users, then you might want instead to translate pages in the coreutils package. (More generally, if you want to find out which package a particular man page belongs to, take a look here.)
  • How big is the target audience? Your target audience is primarily programmers. What proportion of them aren't able to read English well enough to read man pages, and therefore would benefit from a translation? Is that group big enough to warrant the effort of a translation? Or is there perhaps a better place where you can invest your time in working on Linux?
  • How much time do you have? There are currently around 850 pages in man-pages, amounting to perhaps 2000 pages of printed text. My guess is that this represents one to two person-years of translation work. In other words, you'll need a team of translators if you intend to complete the translation in any reasonable time.
  • What is your longer-term commitment? man-pages is a moving target: as of a couple of months ago, I'm working full time on man-pages, and I make a release every week or so. The French translator estimates that each release means around two days of translation work for him. Now, I may not be working full time on man-pages forever, and therefore the required translation effort may decrease some day, but the point remains that a significant ongoing effort is required to keep a translation up to date and useful.
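To make the time questions above concrete, here is a rough back-of-the-envelope sketch. The person-year range and the two days per release are the figures quoted above; the number of working days per year, the number of releases per year, and the function names are my own assumptions for illustration, so treat the output as an order-of-magnitude guide only.

```python
# Figures quoted in the post.
PERSON_YEARS_LOW = 1.0     # lower guess for the initial translation
PERSON_YEARS_HIGH = 2.0    # upper guess for the initial translation
DAYS_PER_RELEASE = 2       # the French translator's estimate

# Assumptions for illustration only (not from the post).
WORKING_DAYS_PER_YEAR = 230
RELEASES_PER_YEAR = 52     # reading "a release every week or so" literally


def initial_translation_years(translators, fraction_of_time):
    """Return (low, high) calendar years needed for the initial translation.

    `fraction_of_time` is the share of a full-time job each translator
    can give, e.g. 0.25 for roughly one day in four.
    """
    capacity = translators * fraction_of_time  # person-years per calendar year
    return (PERSON_YEARS_LOW / capacity, PERSON_YEARS_HIGH / capacity)


def maintenance_days_per_year():
    """Translator-days per year needed just to keep up with releases."""
    return DAYS_PER_RELEASE * RELEASES_PER_YEAR


def maintenance_fraction():
    """Share of one full-time translator consumed by ongoing maintenance."""
    return maintenance_days_per_year() / WORKING_DAYS_PER_YEAR


if __name__ == "__main__":
    low, high = initial_translation_years(translators=4, fraction_of_time=0.25)
    print(f"Initial translation: {low:.1f} to {high:.1f} calendar years")
    print(f"Maintenance: {maintenance_days_per_year()} days per year "
          f"({maintenance_fraction():.0%} of one full-time translator)")
```

Under these assumptions, four volunteers each giving a quarter of their time would need one to two calendar years for the initial translation, before any maintenance work is counted.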

The size of the translation effort should not be underestimated. It is precisely because the task is so large that, to date, there has been only one complete and up-to-date translation: the French translation. (For a while, there was a fairly full German translation, but it seems to have languished for a few years now.) The state of the French translation is largely due to the extraordinary work of two people: Christophe Blaess, and more recently, Alain Portal. (In fact, there are nowadays two French translations which cooperate to some extent: the Debian distribution has a team doing a French translation of man-pages.) But nowadays even the French translator(s) have started to feel the strain resulting from the recent increase in my output.

If you decide you really want to do a translation (and think very carefully before you do decide that!), then I have a few thoughts on how you go about it.

Tools: I have no real recommendations here (since I have never translated man-pages). But it's worth mentioning that the Debian French translators use po4a, and see it as very beneficial for their work, especially for facilitating the work of a team of translators.

Other than that, I'd say that you need to:

  • Estimate the time required to translate the 850 pages in man-pages, and decide if you have the necessary number of translators who have sufficient time to complete the work.
  • Divide the work up so that your translators can work independently on translations. I suggest you divide the pages up into small, related parcels. For example, the POSIX message queue pages (mq_*) could be a parcel translated by a single translator, or the math man pages could be a parcel translated by a single translator, etc.
  • Come up with a review plan, so that each translation by one member of your team is reviewed by at least one other member.
  • Devise a glossary of terminology, so that you all translate English technical terms (e.g., "shared memory segment") into the same terms in the target language.
  • Plan for ongoing maintenance, so that as the English man pages are updated, the translated pages are updated too. Don't underestimate the amount of this work!
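As a starting point for dividing the work, the parcels could be drafted automatically and then adjusted by hand. The following is only a sketch under my own assumptions: page names are taken to look like "mq_open.3", pages sharing an underscore prefix are grouped together, and everything else falls back to a per-section bucket. The function names are hypothetical, and a prefix rule will not find topical parcels such as the math pages, which still need a hand-written list.

```python
from collections import defaultdict


def parcel_key(page_name):
    """Return a parcel label for a page name such as 'mq_open.3'.

    Assumption: names have the form '<name>.<section>'. Pages that share
    an underscore prefix (mq_open, mq_send, ...) land in one parcel;
    all other pages are bucketed by section.
    """
    stem, dot, section = page_name.rpartition(".")
    if not dot:
        # No section suffix found; keep the page but mark it clearly.
        stem, section = page_name, "unknown"
    if "_" in stem:
        return stem.split("_", 1)[0] + "_*"
    return "section " + section


def group_into_parcels(page_names):
    """Group page names into parcels, each a sorted list of names."""
    parcels = defaultdict(list)
    for name in sorted(page_names):
        parcels[parcel_key(name)].append(name)
    return dict(parcels)


if __name__ == "__main__":
    pages = ["mq_send.3", "sin.3", "mq_open.3", "mq_overview.7", "cos.3"]
    for label, members in sorted(group_into_parcels(pages).items()):
        print(f"{label}: {', '.join(members)}")
```

Each resulting parcel can then be assigned to one translator and one reviewer, which also gives you a simple table for tracking progress.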

My suggestion is that if you go forward with a translation project, then:

  • Pick a particular man-pages release -- let's say man-pages-3.x -- and translate all of the pages in that version.
  • When that is completed, you can then update your translation with all of the changes that have occurred in the English original since release man-pages-3.x. I keep fairly detailed changelogs which should assist you during this phase of the work.

I suggest doing things this way since I estimate that trying to do a translation while simultaneously trying to keep up with changes in already translated pages would just prove too difficult. You might decide otherwise.

And finally... did I mention that you should think long and hard before embarking on a translation of man-pages?

[12 Aug 08: minor updates, to point out exactly which sections are in man-pages, and to suggest more appropriate pages to translate, if targeting end users.]

1 comment:

Michael H. said...

Do we have any kind of figures as to which language's translation would have the most benefits for the dev community? I know it might well be impossible to compile that kind of data. Still worth considering though.