Sinhala is sometimes classified as a `complex script', along with other `Indic' scripts. Some modifiers are written before the base character, which complicates matters further.
Present standards (SLS 1134 and Unicode) assign a character to each vowel, consonant and modifier. This makes tasks such as collation simple, but because a modifier can have more than one shape (i.e., glyph) associated with it, and vice versa, the rendering engine has to do some extra work.
Examples: ko and kra
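The two examples can be inspected at the code point level. The sketch below assumes that `ko' is encoded as U+0D9A U+0DDC and `kra' as U+0D9A U+0DCA U+200D U+0DBB; these sequences and the helper name describe are illustrative assumptions, not taken from the text above.

```python
import unicodedata

# Assumed encodings of the two examples (not given in the text itself).
KO = "\u0d9a\u0ddc"               # ka + vowel sign o
KRA = "\u0d9a\u0dca\u200d\u0dbb"  # ka + al-lakuna + ZWJ + ra


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


if __name__ == "__main__":
    for label, text in (("ko", KO), ("kra", KRA)):
        print(label, text)
        for code, name in describe(text):
            print("   ", code, name)
```

In `ko' the vowel sign is stored after the consonant even though part of it is drawn before it; in `kra' four code points are meant to be rendered as a single shape.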
The GNU C Library implements locales as defined by POSIX and other related standards. Each locale is associated with a language, a country and, optionally, a character encoding. The Sinhala language (si) locale for Sri Lanka (LK) with UTF-8 encoding is denoted by si_LK.UTF-8.
Each locale contains attributes related to a language and a country, including currency formats and symbols, number formats, day and month names, date formats, paper sizes, ways of writing names and addresses, phone codes and formats, and measurement systems.
Properly internationalized programs query the locale database even for trivial matters such as day or month names. Therefore, changing the locale makes all such programs behave according to the relevant country, language and encoding.
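As a rough illustration of such a query, the following Python sketch asks the C library for the abbreviated weekday names of a locale. The function name abbreviated_weekdays is hypothetical, and the fallback to the "C" locale is an assumption made so that the sketch also runs on systems where si_LK.UTF-8 has not been installed.

```python
import locale


def abbreviated_weekdays(locale_name):
    """Return the seven abbreviated weekday names (%a) for locale_name.

    Falls back to the "C" locale if locale_name is not installed.
    Note that setlocale() changes the locale of the whole process.
    """
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        locale.setlocale(locale.LC_ALL, "C")
    items = (locale.ABDAY_1, locale.ABDAY_2, locale.ABDAY_3, locale.ABDAY_4,
             locale.ABDAY_5, locale.ABDAY_6, locale.ABDAY_7)
    return [locale.nl_langinfo(item) for item in items]


if __name__ == "__main__":
    print(abbreviated_weekdays("C"))
    print(abbreviated_weekdays("si_LK.UTF-8"))
```

The program itself contains no weekday names; it only asks the locale database, which is why switching the locale is enough to change what is displayed.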
Here is an example: the abbreviated names of the days of the week (from /usr/share/i18n/locales/si_LK):
% Abbreviated weekday names (%a)
abday "<U0D89>";"<U0DC3>";/
"<U0D85>";"<U0DB6>";/
"<U0DB6><U0DCA><U200D><U0DBB>";"<U0DC3><U0DD2>";/
"<U0DC3><U0DD9>"
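The <Uxxxx> symbols in the excerpt name Unicode code points in hexadecimal. A small Python sketch (the helper name decode_ucs_notation is hypothetical) turns the seven entries back into readable text:

```python
import re


def decode_ucs_notation(value):
    """Replace each <Uxxxx> symbol in value with the character it names."""
    return re.sub(r"<U([0-9A-Fa-f]{4,8})>",
                  lambda match: chr(int(match.group(1), 16)),
                  value)


# The seven abday entries from the excerpt above, in order.
ABDAY = [
    "<U0D89>", "<U0DC3>", "<U0D85>", "<U0DB6>",
    "<U0DB6><U0DCA><U200D><U0DBB>", "<U0DC3><U0DD2>", "<U0DC3><U0DD9>",
]

if __name__ == "__main__":
    for entry in ABDAY:
        print(entry, "->", decode_ucs_notation(entry))
```

The fifth entry contains U+200D (zero width joiner) between the al-lakuna and the following consonant, which is how a conjunct form is requested explicitly.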
Once a locale is created, it has to be `compiled' by running localedef or a higher level tool such as locale-gen on Debian.
Here is an example which shows the effect of setting the locale on the date program.
% date
Thu Mar 10 18:05:04 LKT 2005
% export LC_ALL=si_LK.UTF-8
% date
2005 මාර්තු 10 වැනි බ්රහස්පතින්දා 18:04:52 +0600
%
Aliases for locales can be defined in /etc/locale.alias for convenience as follows:
si si_LK.UTF-8
si_LK si_LK.UTF-8
sinhala si_LK.UTF-8
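The effect of these entries can be sketched as a simple lookup table. The Python below is only an illustration of the mapping, not the C library's own resolution code; the function names parse_aliases and resolve are hypothetical, and it assumes one alias pair per line with `#' starting a comment.

```python
def parse_aliases(text):
    """Parse 'alias locale' pairs, one per line; '#' starts a comment."""
    aliases = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:
            aliases[fields[0]] = fields[1]
    return aliases


def resolve(name, aliases):
    """Return the full locale name for an alias, or name itself if unknown."""
    return aliases.get(name, name)


SAMPLE = """\
si       si_LK.UTF-8
si_LK    si_LK.UTF-8
sinhala  si_LK.UTF-8
"""

if __name__ == "__main__":
    table = parse_aliases(SAMPLE)
    for name in ("si", "sinhala", "en_US.UTF-8"):
        print(name, "->", resolve(name, table))
```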
The Sinhala locale for Sri Lanka has been submitted to the GNU C Library Bugzilla.
Internationalized programs do not simply print hard-coded messages. Instead, they look up the proper translation in the message database.
The GNU C Library provides two ways of translating messages. Of the two, the Uniforum approach, i.e. the gettext family of functions, is the more popular on GNU/Linux systems, and we provide only gettext catalogues so far.
Changing environment variables such as LC_ALL has a similar effect on internationalized applications: all strings that have equivalents in the gettext catalogue are displayed translated.
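A lookup of this kind can be sketched with Python's gettext module, which reads the same compiled catalogues. The domain name "gedit" and the catalogue directory /usr/share/locale are assumptions for illustration, and the helper name translator is hypothetical. When no language is passed, the module derives it from LANGUAGE, LC_ALL, LC_MESSAGES and LANG.

```python
import gettext


def translator(domain, language=None, localedir="/usr/share/locale"):
    """Return a translation function for the given domain.

    If language is None, the language is taken from the environment.
    With fallback=True, a missing catalogue yields a translator that
    returns every msgid unchanged instead of raising an error.
    """
    languages = [language] if language else None
    catalogue = gettext.translation(domain, localedir=localedir,
                                    languages=languages, fallback=True)
    return catalogue.gettext


if __name__ == "__main__":
    _ = translator("gedit")   # "gedit" is an assumed domain name
    print(_("Close"))
```

Strings without an entry in the catalogue, or all strings when the catalogue is not installed, come back unchanged, which is why partially translated applications show a mixture of both languages.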
A message catalogue is made by first creating a PO file and then compiling it into an MO file using the msgfmt tool. A typical PO file looks like this:
msgid "Close"
msgstr "වසන්න"
msgid "Copy"
msgstr "පිටපත් කරන්න"
msgid "Contents"
msgstr "පටුන"
Compiled MO files for Sinhala are placed in /usr/share/locale/si/LC_MESSAGES/. Generally, each program or library has an MO file associated with it. Here is an example of gedit with the locales C and si_LK.UTF-8 respectively.
The X Window System also has a locale system that is almost independent of the C Library. However, programs based on the GTK and Qt libraries have their own rendering engines and use the locales and translation catalogues of the C Library. Therefore, it is not necessary to add a separate si_LK locale to X.
However, the X Window System will not switch to a locale that it does not know about. The common workaround is to `bind' such locales to en_US.UTF-8. We have submitted patches to both X.Org and XFree86 to do this for si_LK.UTF-8.
The relevant files are compose.dir, locale.alias and locale.dir in /usr/X11R6/lib/X11/locale/. Here is an extract from locale.dir:
en_US.UTF-8/XLC_LOCALE: sh_YU.UTF-8
en_US.UTF-8/XLC_LOCALE: si_LK.UTF-8
en_US.UTF-8/XLC_LOCALE: sk_SK.UTF-8
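Each entry binds the locale named on the right to the locale definition file on the left. The following Python sketch reads entries of this `file: locale' form into a table; it is an illustration of the binding only, not the lookup code used by X itself, and the function name parse_locale_dir is hypothetical. It assumes that `#' starts a comment and ignores any line without a colon.

```python
def parse_locale_dir(text):
    """Map each locale name to the XLC_LOCALE file it is bound to."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        path, _, name = line.partition(":")
        if name.strip():
            bindings[name.strip()] = path.strip()
    return bindings


SAMPLE = """\
en_US.UTF-8/XLC_LOCALE: sh_YU.UTF-8
en_US.UTF-8/XLC_LOCALE: si_LK.UTF-8
en_US.UTF-8/XLC_LOCALE: sk_SK.UTF-8
"""

if __name__ == "__main__":
    for name, path in parse_locale_dir(SAMPLE).items():
        print(name, "uses", path)
```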
Almost all `complex' scripts use OpenType fonts. OpenType is an extension of Apple's TrueType font format.
For Sinhala GNU/Linux, we created an OpenType font using outlines from the Sinhala LaTeX project. These outlines were originally developed by Yannis Haralambous.
Pattern substitution was added to the font using FontForge (formerly known as PFAEdit). Basic glyphs were left in the Sinhala block of Unicode, but ligatures (combinations) were added at the end of the font without assigning code points (-1). Here is an example (`nu'):
The GNU Network Object Model Environment (GNOME) is an application development framework, which is commonly known for its desktop environment.
GNOME is built on top of the GIMP Toolkit (GTK), originally developed for the GNU Image Manipulation Program (GIMP).
GTK / GNOME uses a library called Pango to `shape' strings, i.e., to convert strings in different languages into sequences of glyphs (shapes) from fonts.
The original patch submitted to Pango simply added Sinhala to the Indic OpenType rendering module. However, it tried to form conjuncts implicitly, even when a ZWJ was not present. A fix was submitted and is available from Pango 1.8.1 onwards.
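The intended rule, that a conjunct is formed only where a ZWJ asks for one, can be illustrated with a short Python sketch. This is a simplified model and not Pango's code: the function names are hypothetical, and it assumes the pattern consonant + al-lakuna + ZWJ + consonant, with consonants taken to be the range U+0D9A to U+0DC6.

```python
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200d"        # ZERO WIDTH JOINER


def is_consonant(ch):
    """True for the Sinhala consonant range (includes a few unassigned slots)."""
    return "\u0d9a" <= ch <= "\u0dc6"


def explicit_conjuncts(text):
    """Return start indices of consonant + al-lakuna + ZWJ + consonant."""
    hits = []
    for i in range(len(text) - 3):
        first, sign, joiner, second = text[i:i + 4]
        if (is_consonant(first) and sign == AL_LAKUNA
                and joiner == ZWJ and is_consonant(second)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    with_zwj = "\u0d9a\u0dca\u200d\u0dbb"   # ka + al-lakuna + ZWJ + ra
    without_zwj = "\u0d9a\u0dca\u0dbb"      # ka + al-lakuna + ra
    print(explicit_conjuncts(with_zwj))
    print(explicit_conjuncts(without_zwj))
```

Under this model the second string is left as two separate letters, the first of which carries a visible al-lakuna.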
GTK supports `input method' modules, which can be selected by right-clicking on any widget that accepts text input.