PO4A.7
Section: User Contributed Perl Documentation (1)
Updated: 202-0-13
Index
Return to Main Contents
NAME
po4a - framework to translate documentation and other materials
Introduction
po4a (PO for anything) eases the maintenance of documentation translation using
the classical gettext tools. The main feature of po4a is that it decouples the
translation of content from its document structure.
This document serves as an introduction to the po4a project with a focus on
potential users considering whether to use this tool and on the curious wanting
to understand why things are the way they are.
Why po4a?
The philosophy of Free Software is to make the technology truly available to
everyone. But licensing is not the only consideration: untranslated free
software is useless for non-English speakers. Therefore, we still have some
work to do to make software available to everybody.
This situation is well understood by most projects and everybody is now
convinced of the necessity to translate everything. Yet, the actual
translations represent a huge effort of many individuals, crippled by small
technical difficulties.
Thankfully, Open Source software is actually very well translated using the
gettext tool suite. These tools are used to extract the strings to translate
from a program and present the strings to translate in a standardized format
(called PO files, or translation catalogs). A whole ecosystem of tools has
emerged to help the translators actually translate these PO files. The result
is then used by gettext at run time to display translated messages to the end
users.
Regarding documentation, however, the situation still somewhat disappointing.
At first translating documentation may seem to be easier than translating a
program as it would seem that you just have to copy the documentation source
file and start translating the content. However, when the original
documentation is modified, keeping track of the modifications quickly turns
into a nightmare for the translators. If done manually, this task is unpleasant
and error-prone.
Outdated translations are often worse than no translation at all. End-users can
be tricked by documentation describing an old behavior of the program.
Furthermore, they cannot interact directly with the maintainers since they
don't speak English. Additionally, the maintainer cannot fix the problem as
they don't know every language in which their documentation is translated.
These difficulties, often caused by poor tooling, can undermine the motivation
of volunteer translators, further aggravating the problem.
The goal of the po4a project is to ease the work of documentation translators.
In particular, it makes documentation translations maintainable.
The idea is to reuse and adapt the gettext approach to this field. As with
gettext, texts are extracted from their original locations and presented to
translators as PO translation catalogs. The translators can leverage the
classical gettext tools to monitor the work to do, collaborate and organize as
teams. po4a then injects the translations directly into the documentation
structure to produce translated source files that can be processed and
distributed just like the English files. Any paragraph that is not translated
is left in English in the resulting document, ensuring that the end users never
see an outdated translation in the documentation.
This automates most of the grunt work of the translation maintenance.
Discovering the paragraphs needing an update becomes very easy, and the process
is completely automated when elements are reordered without further
modification. Specific verification can also be used to reduce the chance of
formatting errors that would result in a broken document.
Please also see the FAQ below in this document for a more complete list
of the advantages and disadvantages of this approach.
Supported formats
Currently, this approach has been successfully implemented to several kinds
of text formatting formats:
- man (mature parser)
-
The good old manual pages' format, used by so many programs out there. po4a
support is very welcome here since this format is somewhat difficult to use and
not really friendly to newbies.
The Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
the BSD man pages (they are also quite common on Linux).
- AsciiDoc (mature parser)
-
This format is a lightweight markup format intended to ease the authoring of
documentation. It is for example used to document the git system. Those
manpages are translated using po4a.
See Locale::Po4a::AsciiDoc for details.
- pod (mature parser)
-
This is the Perl Online Documentation format. The language and extensions
themselves are documented using this format in addition to most existing Perl
scripts. It makes easy to keep the documentation close to the actual code by
embedding them both in the same file. It makes programmer's life easier, but
unfortunately, not the translator's, until you use po4a.
See Locale::Po4a::Pod for details.
- sgml (mature parser)
-
Even if superseded by XML nowadays, this format is still used for documents
which are more than a few screens long. It can even be used for complete books.
Documents of this length can be very challenging to update. diff often
reveals useless when the original text was re-indented after update.
Fortunately, po4a can help you after that process.
Currently, only DebianDoc and DocBook DTD are supported, but adding support for
a new one is really easy. It is even possible to use po4a on an unknown SGML
DTD without changing the code by providing the needed information on the
command line. See Locale::Po4a::Sgml(3pm) for details.
- TeX / LaTeX (mature parser)
-
The LaTeX format is a major documentation format used in the Free Software
world and for publications.
The Locale::Po4a::LaTeX(3pm) module was tested with the Python
documentation, a book and some presentations.
- text (mature parser)
-
The Text format is the base format for many formats that include long blocks of
text, including Markdown, fortunes, YAML front matter section, debian/changelog,
and debian/control.
This supports the common format used in Static Site Generators, READMEs, and
other documentation systems. See Locale::Po4a::Text(3pm) for details.
- xml and XHMTL (probably mature parser)
-
The XML format is a base format for many documentation formats.
Currently, the DocBook DTD (see Locale::Po4a::Docbook(3pm) for details) and
XHTML are supported by po4a.
- BibTex (probably mature parser)
-
The BibTex format is used alongside LaTex for formatting lists of references (bibliographies).
See Locale::Po4a::BibTex for details.
- Docbook (probably mature parser)
-
A XML-based markup language that uses semantic tags to describe documents.
See Locale::Po4a:Docbook for greater details.
- Guide XML (probably mature parser)
-
A XML documentation format. This module was developed specifically to help with supporting
and maintaining translations of Gentoo Linux documentation up until at least March 2016
(Based on the Wayback Machine). Gentoo have since moved to the DevBook XML format.
See Locale::Po4a:Guide for greater details.
- Wml (probably mature parser)
-
The Web Markup Language, do not mixup WML with the WAP stuff used on cell phones.
This module relies on the Xhtml module, which itself relies on the XmL module.
See Locale::Po4a::Wml for greater details.
- Yaml (probably mature parser)
-
A strict superset of JSON. YAML is often used as systems or configuration projects.
YAML is at the core of Red Hat's Ansible.
See Locale::Po4a::Yaml for greater details.
- RubyDoc (probably mature parser)
-
The Ruby Document (RD) format, originally the default documentation format for Ruby
and Ruby projects before converted to RDoc in 2002. Though apparently the Japanese
version of the Ruby Reference Manual still use RD.
See Locale::Po4a::RubyDoc for greater details.
- Halibut (probably experimental parser)
-
A documentation production system, with elements similar to TeX, debiandoc-sgml,
TeXinfo, and others, developed by Simon Tatham, the developer of PuTTY.
See Locale::Po4a:Halibut for greater details.
- Ini (probably experimental parser)
-
Configuration file format popularized by MS-DOS.
See Locale::Po4a::Ini for greater details.
- texinfo (very highly experimental parser)
-
All of the GNU documentation is written in this format (it's even one of the
requirements to become an official GNU project). The support for
Locale::Po4a::Texinfo(3pm) in po4a is still at the beginning.
Please report bugs and feature requests.
- gemtext (very highly experimental parser)
-
The native plain text format of the Gemini protocol. The extension
".gmi" is commonly used. Support for this module in po4a is still in
its infancy. If you find anything, please file a bug or feature
request.
- org (very highly experimental parser)
-
The document format used by the Org mode. Support for this module in
po4a is still in its infancy. If you find anything, please file a bug
or feature request.
- vimhelp (very highly experimental parser)
-
The format used for Vim help files (and some third-party plugin
documentation). Support for this format in po4a is still in its
infancy. If you find anything, please file a bug report or feature
request.
- simplepod (very highly experimental parser)
-
Similar to the previously mentioned pod, this one adopts the new
Pod::Simple as its parser. Since it is newly created, some bugs are
expected. If you notice any strange behavior, please let us know.
Eventually, this module will replace pod.
- Others supported formats
-
Po4a can also handle some more rare or specialized formats, such as the
documentation of compilation options for the 2.4+ Linux kernels (Locale::Po4a::KernelHelp) or the diagrams
produced by the dia tool (Locale::Po4a::Dia). Adding a new format is often very easy and the main
task is to come up with a parser for your target format. See
Locale::Po4a::TransTractor(3pm) for more information about this.
- Unsupported formats
-
Unfortunately, po4a still lacks support for several documentation formats. Many
of them would be easy to support in po4a. This includes formats not just used
for documentation, such as, package descriptions (deb and rpm), package
installation scripts questions, package changelogs, and all the specialized
file formats used by programs such as game scenarios or wine resource files.
Using po4a
The easiest way to use this tool in your project is to write a configuration file for the
po4a
program, and only interact with this program. Please refer to its documentation, in
po4a(1). The
rest of this section provides more details for the advanced users of po4a wanting to deepen their
understanding.
Detailed schema of the po4a workflow
Make sure to read
po4a(1) before this overly detailed section to get a simplified overview of the
po4a workflow. Come back here when you want to get the full scary picture, with almost all details.
In the following schema, master.doc is an example name for the documentation to be translated;
XX.doc is the same document translated in the language XX while doc.XX.po is the translation
catalog for that document in the XX language. Documentation authors will mostly be concerned with
master.doc (which can be a manpage, an XML document, an AsciiDoc file, etc); the translators
will be mostly concerned with the PO file, while the end users will only see the XX.doc file.
Transitions with square brackets such as "[po4a updates po]" represent the execution of a po4a tool
while transitions with curly brackets such as "{update of master.doc}" represent a manual
modification of the project's files.
master.doc
|
V
+<-----<----+<-----<-----<--------+------->-------->-------+
: | | :
{translation} | {update of master.doc} :
: | | :
XX.doc | V V
(optional) | master.doc ->-------->------>+
: | (new) |
V V | |
[po4a-gettextize] doc.XX.po -->+ | |
| (old) | | |
| ^ V V |
| | [po4a updates po] |
V | | V
translation.pot ^ V |
| | doc.XX.po |
| | (fuzzy) |
{translation} | | |
| ^ V V
| | {manual editing} |
| | | |
V | V V
doc.XX.po --->---->+<---<-- doc.XX.po addendum master.doc
(initial) (up-to-date) (optional) (up-to-date)
: | | |
: V | |
+----->----->----->------> + | |
| | |
V V V
+------>-----+------<------+
|
V
[po4a updates translations]
|
V
XX.doc
(up-to-date)
Again, this schema is overly complicated. Check on po4a(1) for a simplified overview.
The left part depicts how po4a-gettextize(1) can be used to convert an existing translation
project to the po4a infrastructure. This script takes an original document and its translated
counterpart, and tries to build the corresponding PO file. Such manual conversion is rather
cumbersome (see the po4a-gettextize(1) documentation for more details), but it is only needed
once to convert your existing translations. If you don't have any translation to convert, you can
forget about this and focus on the right part of the schema.
On the top right part, the action of the original author is depicted, updating the documentation.
The middle right part depicts the automatic updates of translation files: the new material is
extracted and compared against the exiting translation. The previous translation is used for the
parts that didn't change, while partially modified parts are connected to the previous translation
with a "fuzzy" marker indicating that the translation must be updated. New or heavily modified
material is left untranslated.
Then, the manual editing block depicts the action of the translators, that
modify the PO files to provide translations to every original string and
paragraph. This can be done using either a specific editor such as the GNOME
Translation Editor, KDE's Lokalize or poedit, or using an online
localization platform such as weblate or pootle. The translation result is
a set of PO files, one per language. Please refer to the gettext documentation
for more details.
The bottom part of the figure shows how po4a creates a translated source document from the
master.doc original document and the doc.XX.po translation catalog that was updated by the
translators. The structure of the document is reused, while the original content is replaced by its
translated counterpart. Optionally, an addendum can be used to add some extra text to the
translation. This is often used to add the name of the translator to the final document. See below
for details.
Upon invocation, po4a updates both the translation files and the translated documentation files
automatically.
Starting a new translation project
If you start from scratch, you just have to write a configuration file for po4a, and you are set.
The relevant templates are created for the missing files, allowing your contributors to translate
your project to their language. Please refer to
po4a(1) for a quick start tutorial and for all
details.
If you have an existing translation, i.e. a documentation file that was translated manually, you can
integrate its content in your po4a workflow using po4a-gettextize. This task is a bit cumbersome
(as described in the tool's manpage), but once your project is converted to po4a workflow,
everything will be updated automatically.
Updating the translations and documents
Once setup, invoking
po4a is enough to update both the translation PO files and translated
documents. You may pass the
"--no-translations" to
po4a to not update the translations (thus
only updating the PO files) or
"--no-update" to not update the PO files (thus only updating the
translations). This roughly corresponds to the individual
po4a-updatepo and
po4a-translate
scripts which are now deprecated (see "Why are the individual scripts deprecated" in the FAQ below).
Using addenda to add extra text to translations
Adding new text to the translation is probably the only thing that is easier in
the long run when you translate files manually :). This happens when you want
to add an extra section to the translated document, not corresponding to any
content in the original document. The classical use case is to give credits to
the translation team, and to indicate how to report translation-specific
issues.
With po4a, you have to specify addendum files, that can be conceptually
viewed as patches applied to the localized document after processing. Each
addendum must be provided as a separate file, which format is however very
different from the classical patches. The first line is a header line,
defining the insertion point of the addendum (with an unfortunately cryptic
syntax -- see below) while the rest of the file is added verbatim at the
determined position.
The header line must begin with the string PO4A-HEADER:, followed by a
semi-colon separated list of key=value fields.
For example, the following header declares an addendum that must be placed
at the very end of the translation.
PO4A-HEADER: mode=eof
Things are more complex when you want to add your extra content in the middle
of the document. The following header declares an addendum that must be
placed after the XML section containing the string "About this document" in
translation.
PO4A-HEADER: position=About this document; mode=after; endboundary=</section>
In practice, when trying to apply an addendum, po4a searches for the first line
matching the "position" argument (this can be a regexp). Do not forget that
po4a considers the translated document here. This documentation is in
English, but your line should probably read as follows if you intend your
addendum to apply to the French translation of the document.
PO4A-HEADER: position=À propos de ce document; mode=after; endboundary=</section>
Once the "position" is found in the target document, po4a searches for the next
line after the "position" that matches the provided "endboundary". The
addendum is added right after that line (because we provided an
endboundary, i.e. a boundary ending the current section).
The exact same effect could be obtained with the following header, that is equivalent:
PO4A-HEADER: position=About this document; mode=after; beginboundary=<section>
Here, po4a searches for the first line matching "<section>" after the line
matching "About this document" in the translation, and add the addendum
before that line since we provided a beginboundary, i.e. a boundary marking
the beginning of the next section. So this header line requires placing the
addendum after the section containing "About this document", and instruct po4a
that a section starts with a line containing the "<section>" tag. This is
equivalent to the previous example because what you really want is to add this
addendum either after "</section>" or before "<section>".
You can also set the insertion mode to the value "before", with a similar
semantic: combining "mode=before" with an "endboundary" will put the addendum
just after the matched boundary, that is the last potential boundary line before
the "position". Combining "mode=before" with an "beginboundary" will put the
addendum just before the matched boundary, that is the last potential boundary
line before the "position".
Mode | Boundary kind | Used boundary | Insertion point compared to the boundary
========|===============|========================|=========================================
'before'| 'endboundary' | last before 'position' | Right after the selected boundary
'before'|'beginboundary'| last before 'position' | Right before the selected boundary
'after' | 'endboundary' | first after 'position' | Right after the selected boundary
'after' |'beginboundary'| first after 'position' | Right before the selected boundary
'eof' | (none) | n/a | End of file
Hint and tricks about addenda
- *
-
Remember that these are regexp. For example, if you want to match the end of a
nroff section ending with the line ".fi", do not use ".fi" as endboundary,
because it will match with "the[ fi]le", which is obviously not what you
expect. The correct endboundary in that case is: "^.fi$".
- *
-
White spaces ARE important in the content of the "position" and boundaries. So
the two following lines are different. The second one will only be found if
there is enough trailing spaces in the translated document.
PO4A-HEADER: position=About this document; mode=after; beginboundary=<section>
PO4A-HEADER: position=About this document ; mode=after; beginboundary=<section>
- *
-
Although this context search may be considered to operate roughly on each line
of the translated document, it actually operates on the internal data string of
the translated document. This internal data string may be a text spanning a
paragraph containing multiple lines or may be a XML tag itself alone. The
exact insertion point of the addendum must be before or after the internal
data string and can not be within the internal data string.
- *
-
Pass the "-vv" argument to po4a to understand how the addenda are added to the
translation. It may also help to run po4a in debug mode to see the actual
internal data string when your addendum does not apply.
Addenda examples
- *
-
If you want to add something after the following nroff section:
.SH "AUTHORS"
You should select a two-step approach by setting mode=after. Then you should
narrow down search to the line after AUTHORS with the position argument
regex. Then, you should match the beginning of the next section (i.e.,
^.SH) with the beginboundary argument regex. That is to say:
PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=.SH
- *
-
If you want to add something right after a given line (e.g. after the line
"Copyright Big Dude"), use a position matching this line, mode=after and
give a beginboundary matching any line.
PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^
- *
-
If you want to add something at the end of the document, give a position
matching any line of your document (but only one line. Po4a won't proceed if
it's not unique), and give an endboundary matching nothing. Don't use simple
strings here like "EOF", but prefer those which have less chance to be in
your document.
PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary
More detailed example
Original document (POD formatted):
|=head1 NAME
|
|dummy - a dummy program
|
|=head1 AUTHOR
|
|me
Then, the following addendum will ensure that a section (in French) about
the translator is added at the end of the file (in French, "TRADUCTEUR"
means "TRANSLATOR", and "moi" means "me").
|PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
|
|=head1 TRADUCTEUR
|
|moi
|
To put your addendum before the AUTHOR, use the following header:
PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1
This works because the next line matching the beginboundary "/^=head1/" after
the section "NAME" (translated to "NOM" in French), is the one declaring the
authors. So, the addendum will be put between both sections. Note that if
another section is added between NAME and AUTHOR sections later, po4a
will wrongfully put the addenda before the new section.
To avoid this you may accomplish the same using mode=before:
PO4A-HEADER:mode=before;position=^=head1 AUTEUR
How does it work?
This chapter gives you a brief overview of the po4a internals, so that you
may feel more confident to help us to maintain and to improve it. It may also
help you to understand why it does not do what you expected, and how to
solve your problems.
TransTractors and project architecture
At the core of the po4a project, the
Locale::Po4a::TransTractor(3pm) class is the common ancestor to
all po4a parsers. This strange name comes from the fact that it is at the same
time in charge of translating document and extracting strings.
More formally, it takes a document to translate plus a PO file containing
the translations to use as input while producing two separate outputs:
Another PO file (resulting of the extraction of translatable strings from
the input document), and a translated document (with the same structure as
the input one, but with all translatable strings replaced with content of
the input PO). Here is a graphical representation of this:
Input document -- /---> Output document
TransTractor:: / (translated)
+-->-- parse() --------+
/
Input PO --------/ ---> Output PO
(extracted)
This little bone is the core of all the po4a architecture. If you provide both
input and disregard the output PO, you get po4a-translate. If you disregard
the output document instead, you get po4a-updatepo. The po4a uses a first
TransTractor to get an up-to-date output POT file (disregarding the output
documents), calls msgmerge -U to update the translation PO files on disk, and
builds a second TransTractor with these updated PO files to update the output
documents. In short, po4a provides one-stop solution to update what needs to
be, using a single configuration file.
po4a-gettextize also uses two TransTractors, but another way: It builds one
TransTractor per language, and then build a new PO file using the msgids of the
original document as msgids, and the msgids of the translated document as
msgstrs. Much care is needed to ensure that the strings matched this way
actually match, as described in po4a-gettextize(1).
Format-specific parsers
All po4a format parsers are implemented on top of the TransTractor. Some of them
are very simple, such as the Text, Markdown and AsciiDoc ones. They load the
lines one by one using
TransTractor::shiftline(), accumulate the paragraphs'
content or whatever. Once a string is completely parsed, the parser uses
TransTractor::translate() to (1) add this string to the output PO file and
(2) get the translation from the input PO file. The parser then pushes the
result to the output file using
TransTractor::pushline().
Some other parsers are more complex because they rely on an external parser to
analyze the input document. The Xml, HTML, SGML and Pod parsers are built on top
of SAX parsers. They declare callbacks to events such as "I found a new title
which content is the following" to update the output document and output POT
files according to the input content using TransTractor::translate() and
TransTractor::pushline(). The Yaml parser is similar but different: it
serializes a data structure produced by the YAML::Tiny parser. This is why the
Yaml module of po4a fails to declare the reference lines: the location of each
string in the input file is not kept by the parser, so we can only provide
"$filename:1" as a string location. The SAX-oriented parsers use globals and
other tricks to save the file name and line numbers of references.
One specific issue arises from file encodings and BOM markers. Simple parsers
can forget about this issue, that is handled by TransTractor::read() (used
internally to get the lines of an input document), but the modules relying on an
external parser must ensure that all files are read with an appropriate PerlIO
decoding layer. The easiest is to open the file yourself, and provide an
filehandle or directly the full string to your external parser. Check on
Pod::read() and Pod::parse() for an example. The content read by the
TransTractor is ignored, but a fresh filehandle is passed to the external
parser. The important part is the "<:encoding($charset)" mode that is passed
to the open() perl function.
Po objects
The
Locale::Po4a::Po(3pm) class is in charge of loading and using PO and
POT files. Basically, you can read a file, add entries, get translations with
the
gettext() method, write the PO into a file. More advanced features such as
merging a PO file against a POT file or validating a file are delegated to
msgmerge and
msgfmt respectively.
Contributing to po4a
Even if you have never contributed to any Open Source project in the past, you
are welcome: we are willing to help and mentor you here. po4a is best maintained
by its users nowadays. As we lack manpower, we try to make the project welcoming
by improving the doc and the automatic tests to make you confident in
contributing to the project. Please refer to the CONTRIBUTING.md file for more
details.
Open-source projects using po4a
Here is a very partial list of projects that use po4a in production for their
documentation. If you want to add your project to the list, just drop
us an email (or a Merge Request).
- *
-
adduser (man): users and groups management tool.
- *
-
apt (man, docbook): Debian package manager.
- *
-
aptitude (docbook, svg): terminal-based package manager for Debian
- *
-
F-Droid website <https://gitlab.com/fdroid/fdroi-website> (markdown):
installable catalog of FOSS (Free and Open Source Software) applications for the
Android platform.
- *
-
git <https://github.com/jnavila/gi-manpage-l10n> (asciidoc):
distributed version-control system for tracking changes in source code.
- *
-
Linux manpages <https://salsa.debian.org/manpage-l10-team/manpage-l10n> (man)
This project provides an infrastructure for translating many manpages
to different languages, ready for integration into several major
distributions (Arch Linux, Debian and derivatives, Fedora).
- *
-
Stellarium <https://github.com/Stellarium/stellarium> (HTML):
a free open source planetarium for your computer. po4a is used to
translate the sky culture descriptions.
- *
-
Jamulus <https://jamulus.io/> (markdown, yaml, HTML):
a FOSS application for online jamming in real time. The website
documentation is maintained in multiple languages using po4a.
- *
-
Other item to sort out:
<https://gitlab.com/fdroid/fdroid-website/>
<https://github.com/fsfe/reuse-docs/pull/61>
FAQ
How do you pronounce po4a?
I personally vocalize it as pouah <
https://en.wiktionary.org/wiki/pouah>, which
is a French onomatopoetic that we use in place of yuck :) I may have a strange sense of humor :)
Why are the individual scripts deprecated?
Indeed,
po4a-updatepo and
po4a-translate are deprecated in favor of
po4a. The reason is
that while
po4a can be used as a drop-in replacement to these scripts, there is quite a lot of
code duplication here. Individual scripts last around 150 lines of codes while the
po4a program
lasts 1200 lines, so they do a lot in addition of the common internals. The code duplication results
in bugs occurring in both versions and needing two fixes. One example of such duplication are the
bugs #1022216 in Debian and the issue #442 in GitHub that had the exact same fix, but one in
po4a
and the other
po4a-updatepo.
In the long run, I would like to drop the individual scripts and only maintain one version of this
code. The sure thing is that the individual scripts will not get improved anymore, so only po4a
will get the new features. That being said, there is no deprecation urgency. I plan to keep the
individual scripts as long as possible, and at least until 2030. If your project still use
po4a-updatepo and po4a-translate in 2030, you may have a problem.
We may also remove the deprecation of these scripts at some point, if a refactoring reduces the code
duplication to zero. If you have an idea (or better: a patch), your help is welcome.
What about the other translation tools for documentation using gettext?
There are a few of them. Here is a possibly incomplete list, and more tools are
coming at the horizon.
- poxml
-
This is the tool developed by KDE people to handle DocBook XML. AFAIK, it
was the first program to extract strings to translate from documentation to
PO files, and inject them back after translation.
It can only handle XML, and only a particular DTD. I'm quite unhappy with
the handling of lists, which end in one big msgid. When the list become big,
the chunk becomes harder to swallow.
- po-debiandoc
-
This program done by Denis Barbier is a sort of precursor of the po4a SGML
module, which more or less deprecates it. As the name says, it handles only
the DebianDoc DTD, which is more or less a deprecated DTD.
- xml2po.py
-
Used by the GIMP Documentation Team since 2004, works quite well even if,
as the name suggests, only with XML files and needs specially configured
makefiles.
- Sphinx
-
The Sphinx Documentation Project also uses gettext extensively to manage its
translations. Unfortunately, it works only for a few text formats, rest and
markdown, although it is perhaps the only tool that does this managing the whole
translation process.
The main advantages of po4a over them are the ease of extra content addition
(which is even worse there) and the ability to achieve gettextization.
SUMMARY of the advantages of the gettext based approach
- *
-
The translations are not stored along with the original, which makes it
possible to detect if translations become out of date.
- *
-
The translations are stored in separate files from each other, which prevents
translators of different languages from interfering, both when submitting
their patch and at the file encoding level.
- *
-
It is based internally on gettext (but po4a offers a very simple
interface so that you don't need to understand the internals to use it).
That way, we don't have to re-implement the wheel, and because of their
wide use, we can think that these tools are more or less bug free.
- *
-
Nothing changed for the end-user (beside the fact translations will
hopefully be better maintained). The resulting documentation file
distributed is exactly the same.
- *
-
No need for translators to learn a new file syntax and their favorite PO
file editor (like Emacs' PO mode, Lokalize or Gtranslator) will work just fine.
- *
-
gettext offers a simple way to get statistics about what is done, what should
be reviewed and updated, and what is still to do. Some example can be found
at those addresses:
- https://docs.kde.org/stable5/en/kdesdk/lokalize/project-view.html
- http://www.debian.org/intl/l10n/
But everything isn't green, and this approach also has some disadvantages
we have to deal with.
- *
-
Addenda are somewhat strange at the first glance.
- *
-
You can't adapt the translated text to your preferences, like splitting a
paragraph here, and joining two other ones there. But in some sense, if
there is an issue with the original, it should be reported as a bug anyway.
- *
-
Even with an easy interface, it remains a new tool people have to learn.
One of my dreams would be to integrate somehow po4a to Gtranslator or
Lokalize. When a documentation file is opened, the strings are
automatically extracted, and a translated file + po file can be
written to disk. If we manage to do an MS Word (TM) module (or at
least RTF) professional translators may even use it.
SEE ALSO
- *
-
The documentation of the all-in-one tool that you should use: po4a(1).
- *
-
The documentation of the individual po4a scripts: po4a-gettextize(1),
po4a-updatepo(1), po4a-translate(1), po4a-normalize(1).
- *
-
The additional helping scripts:
msguntypot(1), po4a-display-man(1), po4a-display-pod(1).
- *
-
The parsers of each formats, in particular to see the options accepted
by each of them: Locale::Po4a::AsciiDoc(3pm)
Locale::Po4a::Dia(3pm), Locale::Po4a::Gemtext(3pm),
Locale::Po4a::Guide(3pm), Locale::Po4a::Ini(3pm),
Locale::Po4a::KernelHelp(3pm), Locale::Po4a::Man(3pm),
Locale::Po4a::RubyDoc(3pm), Locale::Po4a::Texinfo(3pm),
Locale::Po4a::Text(3pm), Locale::Po4a::Xhtml(3pm),
Locale::Po4a::Yaml(3pm), Locale::Po4a::BibTeX(3pm),
Locale::Po4a::Docbook(3pm), Locale::Po4a::Halibut(3pm),
Locale::Po4a::LaTeX(3pm), Locale::Po4a::Org(3pm),
Locale::Po4a::Pod(3pm), Locale::Po4a::SimplePod(3pm),
Locale::Po4a::Sgml(3pm), Locale::Po4a::TeX(3pm),
Locale::Po4a::VimHelp, Locale::Po4a::Wml(3pm),
Locale::Po4a::Xml(3pm).
- *
-
The implementation of the core infrastructure:
Locale::Po4a::TransTractor(3pm) (particularly important to understand the
code organization), Locale::Po4a::Chooser(3pm), Locale::Po4a::Po(3pm),
Locale::Po4a::Common(3pm). Please also check the CONTRIBUTING.md file in
the source tree.
AUTHORS
Denis Barbier <barbier,linuxfr.org>
Martin Quinson (mquinson#debian.org)
Index
- NAME
-
- Introduction
-
- Why po4a?
-
- Supported formats
-
- Using po4a
-
- Detailed schema of the po4a workflow
-
- Starting a new translation project
-
- Updating the translations and documents
-
- Using addenda to add extra text to translations
-
- How does it work?
-
- TransTractors and project architecture
-
- Format-specific parsers
-
- Po objects
-
- Contributing to po4a
-
- Open-source projects using po4a
-
- FAQ
-
- How do you pronounce po4a?
-
- Why are the individual scripts deprecated?
-
- What about the other translation tools for documentation using gettext?
-
- SUMMARY of the advantages of the gettext based approach
-
- SEE ALSO
-
- AUTHORS
-