We give a step-by-step description of how to make the on-screen messages of a toy program appear in Oriya instead of English, starting from the programmer's viewpoint and ending with the user's. We also discuss how to go about the task of translation.
This article describes how to support native languages under a system using the GNU gettext utilities. While it should be applicable to other versions of gettext, the one actually used for the examples here is version 0.12.1. Another system, called catgets, described in the X/Open Portability Guide, is also in use, but we shall not discuss that here.
     1  #include <libintl.h>
     2  #include <locale.h>
     3  #include <stdio.h>
     4  #include <stdlib.h>
     5  int main(void)
     6  {
     7      setlocale( LC_ALL, "" );
     8      bindtextdomain( "hello", "/usr/share/locale" );
     9      textdomain( "hello" );
    10      printf( gettext( "Hello, world!\n" ) );
    11      exit(0);
    12  }

Of course, a real program would check the return values of these functions and try to deal with any errors, but we have omitted that part of the code for clarity. Compile as usual with gcc -o hello hello.c. The program must be linked against the GNU libintl library; as this is part of the GNU C library, this happens automatically under Linux and other systems using glibc (elsewhere, add -lintl to the gcc command line).
To save typing, many programs define a shorthand macro,

    #define _(STRING) gettext(STRING)

and then use _(string) instead of gettext(string).
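Let us dissect the program line-by-line. Lines 1 to 4 include the needed headers: libintl.h declares the gettext functions and locale.h declares setlocale. Line 7 sets all locale categories from the user's environment (LANG and related variables). Line 8 tells gettext that the message catalogs for the domain "hello" are to be found under /usr/share/locale, and line 9 makes "hello" the current message domain. Line 10 marks the string for translation by replacing the usual

    printf( "Hello, world!\n" );

with

    printf( gettext( "Hello, world!\n" ) );

(For readers unfamiliar with C, the \n at the end of the string produces a newline at the end of the output.) Applying this simple change to every translatable string lets the translator work independently of the programmer. For larger packages, the gettextize program eases the task of adapting a package to GNU gettext for the first time, or of upgrading it to a newer version of gettext.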
The translatable strings are then extracted from the source with xgettext:

    xgettext -d hello -o hello.pot hello.c

This processes the source code in hello.c and saves the output in hello.pot (the argument to the -o option). The message domain for the program is given as the argument to the -d option, and should match the domain specified in the call to textdomain (on line 9 of the program source). Further details on xgettext can be found with “man xgettext”.
A .pot (portable object template) file is used as the basis for translating program messages into any language. To start translation, one can simply copy hello.pot to oriya.po (this preserves the template file for later translation into a different language). However, the preferred way is to use the msginit program, which correctly sets up some default values:
    msginit -l or_IN -o oriya.po -i hello.pot

Here, the -l option defines the locale (an Oriya locale must be installed on your system), and the -i and -o options define the input and output files, respectively. If there is only a single .pot file in the directory, it will be used as the input file, and the -i option can be omitted. For me, the oriya.po file produced by msginit looked like this:
    # Oriya translations for PACKAGE package.
    # Copyright (C) 2004 THE PACKAGE'S COPYRIGHT HOLDER
    # This file is distributed under the same license as the PACKAGE package.
    # Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2004-06-22 02:22+0530\n"
    "PO-Revision-Date: 2004-06-22 02:38+0530\n"
    "Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
    "Language-Team: Oriya\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"

    #: hello.c:10
    msgid "Hello, world!\n"
    msgstr ""

msginit prompted for my email address, and probably obtained my real name from the system password file. It also filled in values such as the revision date, language, and character set, presumably using information from the or_IN locale.
It is important to respect the format of the entries in the .po (portable object) file. Each entry has the following structure:
    WHITE-SPACE
    #  TRANSLATOR-COMMENTS
    #. AUTOMATIC-COMMENTS
    #: REFERENCE...
    #, FLAG...
    msgid UNTRANSLATED-STRING
    msgstr TRANSLATED-STRING

where the initial white space (spaces, tabs, newlines, ...) and all comments may or may not be present in a particular entry. Comment lines start with '#' as the first character, and there are two kinds: (i) translator comments, added manually, which have white space immediately following the '#'; and (ii) automatic comments, added and maintained by the gettext tools, which have a non-white-space character (such as '.', ':' or ',') immediately after the '#'. The msgid line contains the untranslated (English) string, if there is one for that entry, and the msgstr line is where the translated string is to be entered. More on this later. For details on the format of PO files, see gettext::Basics::PO Files:: in the Emacs info browser (see Appdx. A for an introduction to using the info browser in Emacs).
The first thing to do is fill in the comments at the beginning and the header entry, parts of which msginit has already filled in. The lines in the header entry are largely self-explanatory, and details can be found in the gettext::Creating::Header Entry:: info node. After that, the remaining work consists of typing the Oriya text that is to serve as the translation of each English string: on the msgstr line of each remaining entry, enter the Oriya translation of that entry's msgid string between the double quotes. For example, for the phrase “Hello, world!\n” in oriya.po, we could enter “ନମସ୍କାର\n”. The trailing \n is kept in the translation so that the output still ends with a newline. The final oriya.po file might look like:
    # Oriya translations for hello example package.
    # Copyright (C) 2004 Gora Mohanty
    # This file is distributed under the same license as the hello example package.
    # Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: oriya\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2004-06-22 02:22+0530\n"
    "PO-Revision-Date: 2004-06-22 10:54+0530\n"
    "Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
    "Language-Team: Oriya\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "X-Generator: KBabel 1.3\n"

    #: hello.c:10
    msgid "Hello, world!\n"
    msgstr "ନମସ୍କାର\n"
For editing PO files, I have found that the kbabel editor suits me best. The only problem is that, while Oriya text can be entered directly into kbabel using the xkb Oriya keyboard layouts [1] and the entries are saved properly, the text is not displayed correctly in the kbabel window if it includes conjuncts. Emacs po-mode is a little restrictive, but strictly enforces conformance with the PO file format. Its main problem is that it does not currently seem possible to edit Oriya text in Emacs. yudit is the best at editing Oriya text, but does not ensure that the PO file format is followed. You can experiment with these editors to find the one that suits your preferences. One possibility is to first edit the header entry with kbabel or Emacs po-mode, and then use yudit to enter the Oriya text on the msgstr lines.
Once the translation is done, the PO file is compiled into a binary message catalog with msgfmt:

    msgfmt -c -v -o hello.mo oriya.po

The -c option performs detailed checks on the PO file, -v makes the program verbose, and the output filename is given as the argument to the -o option. Note that the base of the output filename should match the message domain given as the first argument to bindtextdomain and textdomain on lines 8 and 9 of the example program in Sec. 2. The .mo (machine object) file should be stored under the base directory given as the second argument to bindtextdomain, in the sub-directory LL/LC_MESSAGES or LL_CC/LC_MESSAGES, where LL stands for a language code and CC for a country code. For example, as we have chosen the standard location, /usr/share/locale, for our base directory, and our language and country codes are “or” and “IN”, respectively, we will place hello.mo in /usr/share/locale/or_IN/LC_MESSAGES. Note that you will need super-user privileges to copy hello.mo to this system directory. Thus,
    mkdir -p /usr/share/locale/or_IN/LC_MESSAGES
    cp hello.mo /usr/share/locale/or_IN/LC_MESSAGES
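To see the translated messages, the program must run in an Oriya locale. In a Bourne-compatible shell such as bash, type

    echo $LANG
    export LANG=or_IN

The first command shows the current setting of your locale (usually en_US; note it down, as you will need it to restore the default locale at the end), while the second sets the locale to Oriya. If LC_ALL or LANGUAGE is set in your environment, it takes precedence over LANG for message lookup.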
A Unicode-capable terminal emulator is needed to view Oriya output directly. Recent versions of both gnome-terminal and konsole (the KDE terminal emulator) are Unicode-aware. I will focus on gnome-terminal, as it seems to have better support for internationalization. gnome-terminal needs to be told that the incoming bytes are UTF-8 encoded multibyte sequences. This can be done by (a) choosing Terminal -> Character Coding -> Unicode (UTF-8), (b) typing “/bin/echo -n -e '\033%G'” in the terminal, or (c) running /bin/unicode_start. Likewise, you can revert to the default encoding by (a) choosing Terminal -> Character Coding -> Current Locale (ISO-8859-1), (b) typing “/bin/echo -n -e '\033%@'”, or (c) running /bin/unicode_stop. Now run the example program (after compiling it with gcc as described in Sec. 2):

    ./hello

This should give you output in Oriya. Please note that conjuncts will most likely be displayed with a visible “halant”, as the terminal probably does not render Indic scripts correctly. Also, as most terminal emulators assume fixed-width fonts, the results are unlikely to be aesthetically appealing.
An alternative is to save the program output in a file and view it with yudit, which renders the glyphs correctly. Thus,

    ./hello > junk
    yudit junk

Do not forget to reset the locale (export LANG set back to the value that echo $LANG showed earlier) before resuming normal work in the terminal. Otherwise, your English text might not display correctly.
While all this should give the average user some pleasure in being able to see Oriya output from a program without a whole lot of work, it should be kept in mind that we are still far from our desired goal. Hopefully, one day the situation will be such that rather than deriving special pleasure from it, users take it for granted that Oriya should be available and are upset otherwise.
Now suppose the program is modified to print a second message:

     1  #include <libintl.h>
     2  #include <locale.h>
     3  #include <stdio.h>
     4  #include <stdlib.h>
     5  int main(void)
     6  {
     7      setlocale( LC_ALL, "" );
     8      bindtextdomain( "hello", "/usr/share/locale" );
     9      textdomain( "hello" );
    10      printf( gettext( "Hello, world!\n" ) );
    11      printf( gettext( "How are you?\n" ) );
    12      exit(0);
    13  }

For such a small change, it would be simple enough to just repeat the above cycle of extracting the English text, translating it to Oriya, and preparing a new message catalog. We could even simplify the work by cutting and pasting most of the old oriya.po file into the new one. However, real programs have thousands of such strings, and we would like to translate only the changed strings, leaving the gettext utilities to handle the drudgery of combining the new translations with the old ones. This is indeed possible.
First, extract the messages into a new template file:

    xgettext -d hello -o hello-new.pot hello.c

Next, we use a new program, msgmerge, to merge the existing .po file, with its translations, into the new template file:
    msgmerge -U oriya.po hello-new.pot

The -U option updates the existing .po file, oriya.po, in place. We could instead have created a new .po file by using “-o <filename>” in place of -U. The updated .po file still contains the old translations, plus new entries whose msgstr lines are empty. For us, the new lines in oriya.po will look like,
    #: hello.c:11
    msgid "How are you?\n"
    msgstr ""

For the new translation, we could use “ଆପଣ କିପରି ଅଛନ୍ତି?” in place of the English phrase “How are you?”. The updated oriya.po file, including the translation, might look like:
    # Oriya translations for hello example package.
    # Copyright (C) 2004 Gora Mohanty
    # This file is distributed under the same license as the hello example package.
    # Gora Mohanty <gora_mohanty@yahoo.co.in>, 2004.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: oriya\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2004-06-23 14:30+0530\n"
    "PO-Revision-Date: 2004-06-22 10:54+0530\n"
    "Last-Translator: Gora Mohanty <gora_mohanty@yahoo.co.in>\n"
    "Language-Team: Oriya\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "X-Generator: KBabel 1.3\n"

    #: hello.c:10
    msgid "Hello, world!\n"
    msgstr "ନମସ୍କାର\n"

    #: hello.c:11
    msgid "How are you?\n"
    msgstr "ଆପଣ କିପରି ଅଛନ୍ତି?\n"
Compile oriya.po to a machine object file and install it in the appropriate place, as in Sec. 2.4:

    msgfmt -c -v -o hello.mo oriya.po
    mkdir -p /usr/share/locale/or_IN/LC_MESSAGES
    cp hello.mo /usr/share/locale/or_IN/LC_MESSAGES

You can then test the Oriya output as above, after recompiling hello.c and running it in an Oriya locale.
This work is part of the project for enabling the use of Oriya under Linux. I thank my uncle, N. M. Pattnaik, for conceiving of the project. We have all benefited from the discussions amidst the group of people working on this project. On the particular issue of translation, the help of H. R. Pansari, A. Nayak, and M. Chand is much appreciated.
The info browser can be started by typing “C-h i” in Emacs. The first time you do this, it will briefly list some commands available inside the info browser, and present you with a menu of major topics. Each menu item or cross-reference is hyperlinked to the appropriate node, and you can visit that node either by moving the cursor to the item and pressing Enter, or by clicking on it with the middle mouse button. To get to the gettext menu items, you can either scroll down to the line

    * gettext: (gettext). GNU gettext utilities.

and visit that node, or, as it is several pages down, locate it using “I-search”. Type “C-s” to enter I-search, which will then prompt you for a string in the minibuffer at the bottom of the window. This is an incremental search, so Emacs keeps moving you forward through the buffer as you type your search string. If you have reached the last occurrence of the search string in the current buffer, pressing “C-s” will give you the message “Failing I-search: ...”. At that point, press “C-s” again to resume the search from the beginning of the buffer. Likewise, “C-r” searches incrementally backwards from the present location.
Info nodes are listed in this document with a “::” separator, so that one can go to the gettext::Creating::Header Entry:: node by visiting the “gettext” node from the main info menu, navigating to the “Creating” node, and following that to the “Header Entry” node.
A stand-alone info browser, independent of Emacs, is also available on many systems. Thus, the gettext info page can also be accessed by typing “info gettext” in a terminal. xinfo is an X application serving as an info browser, so that if it is installed, typing “xinfo gettext” from the command line will open a new browser window with the gettext info page.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
The translation was initiated by Gora Mohanty on 2004-07-24