...making Linux just a little more fun!

Linux's problem with Simplified Chinese

By Silas Brown

Chinese characters are used not only in Mainland China but also in Taiwan, Hong Kong, Japan, and sometimes in Korea and other places. There are differences in the way these countries write certain characters, and ideally this should just be a matter of selecting a different font. However, for various reasons some (but not all) of the "Simplified Chinese" characters used on the mainland have been given their own Unicode numbers that are different from the Unicode numbers of their nearest equivalent characters in Japan, etc. This leads to a problem.

Imagine a typical text in Simplified Chinese. Some of the characters have their own special code numbers that occur only in Simplified texts, whereas other characters have code numbers that can also occur in Japanese and other texts. Now imagine that a rendering system such as Pango is going to render this text. Pango takes the text character by character, and tries to find each character in the available fonts. Suppose it first of all finds a character that occurs in a Japanese font. There is a Japanese font on the system, so it takes that character from the Japanese font and renders it. But then the next character is not in the Japanese font because it is a special Simplified character that occurs only in a Simplified Chinese font. There is a Simplified Chinese font on the system too, so off it goes and renders that second character from the Simplified Chinese font.

The problem is that some of the characters (the ones that have special Simplified-only codepoints) will be rendered from the Simplified Chinese font, but others (the ones that share codepoints with Japanese, etc.) will be rendered from another font. If the fonts happen to have exactly the same style and weight, nobody will notice, but usually there are noticeable differences between them. This mixing of fonts, with some characters taken from one font and others taken from another, can make the display of Simplified Chinese on Linux look very unprofessional.
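
The effect can be illustrated with a toy model. The sketch below is not Pango's real algorithm; it only imitates the "first font that has the character wins" behaviour described above. The font names and the tiny coverage sets are made up for illustration: 中 and 文 share codepoints with Japanese, while 汉 and 语 have Simplified-only codepoints.

```python
# Toy model of per-character font fallback. This is NOT Pango's real
# algorithm; the font names and coverage sets are illustrative assumptions.

FONTS = [
    # (name, characters the font can draw), listed in search order
    ("Japanese font", set("中文日本語漢字")),
    ("Simplified Chinese font", set("中文汉语简体")),
]

def pick_fonts(text, fonts=FONTS):
    """Return [(char, font_name)], using the first font that covers each char."""
    result = []
    for ch in text:
        name = next((n for n, coverage in fonts if ch in coverage), None)
        result.append((ch, name))
    return result

def fonts_used(text, fonts=FONTS):
    """Return the distinct fonts chosen for text, in order of first use."""
    seen = []
    for _, name in pick_fonts(text, fonts):
        if name not in seen:
            seen.append(name)
    return seen

if __name__ == "__main__":
    for ch, name in pick_fonts("中文汉语"):
        print(ch, "->", name)
```

With the Japanese font searched first, the four-character string is drawn from two different fonts. Calling fonts_used("中文汉语", list(reversed(FONTS))) puts the Simplified Chinese font first, and in this model every character then comes from that one font.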

Many users can tolerate this situation on the screen, but printing is a different matter. For example, the music typesetting system GNU LilyPond uses Pango to render the text of lyrics, etc., and if you want to produce beautiful-looking copy with Simplified Chinese text, this character-by-character mixing of fonts could be a showstopper.

The Pango renderer does have facilities for application programmers to specify the language and therefore influence the choice of fonts, such as by calling pango_context_set_language() or by using Pango markup. However, this is small consolation for those using applications that do not expose this functionality to the user.
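
Where an application does accept Pango markup from the user, the language can be declared inline with the lang attribute of a span element. The following is only a sketch: tag_language is a hypothetical helper name, and it assumes "zh-cn" is the appropriate language tag for Simplified Chinese on your system. Whether a given application passes markup through to Pango is something you would need to check.

```python
from xml.sax.saxutils import escape, quoteattr

def tag_language(text, lang="zh-cn"):
    """Wrap text in a Pango markup span that declares its language.

    The text is escaped so that characters such as '&' and '<'
    cannot be mistaken for markup.
    """
    return "<span lang=%s>%s</span>" % (quoteattr(lang), escape(text))

if __name__ == "__main__":
    print(tag_language("汉语"))
```

The resulting string can be handed to anything that parses Pango markup, in place of the plain text.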

One thing you can do is edit your ~/.fonts.conf file to tell the system which fonts you prefer. This will affect all applications, so you may need to change it back when you want to see Traditional Chinese or Japanese, etc. Below is a ~/.fonts.conf file which specifies the Arphic Simplified Chinese font as a preference (Debian/Ubuntu package ttf-arphic-gbsn00lp). It also specifies DejaVu for the Latin fonts, along with Gandhari, a font which is good for Pinyin markup.

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
	<alias>
		<family>serif</family>
		<prefer>
			<family>gandhari unicode</family>
			<family>dejavu serif</family>
			<family>ar pl sungtil gb</family>
		</prefer>
	</alias>
	<alias>
		<family>sans-serif</family>
		<prefer>
			<family>gandhari unicode</family>
			<family>dejavu sans</family>
			<family>ar pl sungtil gb</family>
		</prefer>
	</alias>
	<alias>
		<family>monospace</family>
		<prefer>
			<family>dejavu sans mono</family>
			<family>ar pl sungtil gb</family>
		</prefer>
	</alias>
</fontconfig>
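
Because the file may need changing back when you want to read Traditional Chinese or Japanese, it can be convenient to generate it rather than edit it by hand. The sketch below builds the same alias structure as the file above for whichever CJK family name you pass in. The function name build_fonts_conf and the LATIN table are hypothetical, and the script only prints the XML; redirecting the output to ~/.fonts.conf is left to you.

```python
import xml.etree.ElementTree as ET

# Latin families to list ahead of the CJK family, per generic alias.
# These mirror the hand-written file above.
LATIN = {
    "serif": ["gandhari unicode", "dejavu serif"],
    "sans-serif": ["gandhari unicode", "dejavu sans"],
    "monospace": ["dejavu sans mono"],
}

def build_fonts_conf(cjk_family):
    """Return fontconfig XML that prefers cjk_family for CJK text."""
    root = ET.Element("fontconfig")
    for generic, latin in LATIN.items():
        alias = ET.SubElement(root, "alias")
        ET.SubElement(alias, "family").text = generic
        prefer = ET.SubElement(alias, "prefer")
        for name in latin + [cjk_family]:
            ET.SubElement(prefer, "family").text = name
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root, space="\t")
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(build_fonts_conf("ar pl sungtil gb"))
```

To switch to another script, call build_fonts_conf with the family name of a font installed on your system for that script.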

On a Debian or Ubuntu system, most of the families can be seen by looking at /var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType/fonts.dir. To install Gandhari, download it from Andrew Glass's site, visit fonts:/ in Konqueror (package konqueror if you don't already have KDE) and drag gur.ttf into that folder.

This approach does not solve everything. For example, the on-screen display of Chinese in Tk applications might still be inconsistent: it does not seem possible to set a preferred order of fonts in Tk/X11. You can set one preferred font, but there is no obvious way to control what it falls back on when displaying characters that are not available in that font. However, the use of ~/.fonts.conf should at least help with applications that you are likely to use for printing.




Silas Brown is a legally blind computer scientist based in Cambridge, UK. He has been using heavily-customised versions of Debian Linux since 1999.


Copyright © 2010, Silas Brown. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 173 of Linux Gazette, April 2010
