Localization/Simplified Chinese

From ArchWiki

In keeping with "The Arch Way", Arch cannot configure everything for you, because "preferences and needs are different for everyone", but it tries to keep configuration convenient and simple. In fact, setting up Chinese support is even easier than on some Chinese Linux distributions.

This article provides Chinese localization guidance for as much common software as possible. In practice you may still run into all kinds of issues. Do not be discouraged when you are in trouble; solving problems is a pleasure in itself. You can also seek help through various community platforms.

Basic Chinese support

To properly display Chinese, you must set the locale correctly and install the appropriate Chinese fonts.

locale settings

Install Chinese locale

In Linux, locales define the language and regional environment that programs run in. Commonly used Chinese locales are listed below; the most visible difference between them is the number of characters each encoding can represent:

zh_CN.GB2312
zh_CN.GBK
zh_CN.GB18030
zh_CN.UTF-8
zh_TW.BIG-5
zh_TW.UTF-8

Using a UTF-8 locale is recommended. Edit /etc/locale.gen to choose which locales are available on the system by removing the comment symbol "#" in front of the corresponding entries:

en_US.UTF-8 UTF-8
zh_CN.UTF-8 UTF-8

After running locale-gen, the selected locales become available on the system. Use locale to view the locale currently in use, and locale -a to list all available locales.
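
As a minimal sketch of the edit described above, the following uncomments the two entries with sed. It works on a scratch copy created with mktemp rather than on /etc/locale.gen itself; the sample file contents and the variable names are assumptions made for illustration. On a real system you would edit /etc/locale.gen as root and then run locale-gen.

```shell
# Scratch stand-in for /etc/locale.gen (sample contents are illustrative).
locale_gen=$(mktemp)
printf '%s\n' '#en_US.UTF-8 UTF-8' '#zh_CN.GBK GBK' '#zh_CN.UTF-8 UTF-8' > "$locale_gen"

# Remove the leading "#" from the two UTF-8 entries only.
uncommented=$(mktemp)
sed -e 's/^#\(en_US\.UTF-8 UTF-8\)/\1/' \
    -e 's/^#\(zh_CN\.UTF-8 UTF-8\)/\1/' "$locale_gen" > "$uncommented"
mv "$uncommented" "$locale_gen"

# List the entries that are now enabled (lines not starting with "#").
enabled=$(grep -v '^#' "$locale_gen")
printf '%s\n' "$enabled"
```

The zh_CN.GBK entry stays commented out because neither sed expression matches it.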

Enable Chinese locales

Warning: Setting a Chinese locale globally in /etc/locale.conf causes garbled text on the tty, because of the glyph limitation of the Linux kernel console. To display Chinese characters properly on the tty, install and configure zhcon (AUR).
Set the global default locale to English (optional)

To avoid the tty garbled text issue mentioned above, globally set the LANG locale to en_US.UTF-8 in /etc/locale.conf:

LANG=en_US.UTF-8
User-specific locales

You may set your own user environment variables in ~/.bashrc, ~/.xinitrc, or ~/.xprofile.

  • .bashrc: Settings are applied every time you log in using the terminal.
  • .xinitrc: Settings are applied every time you use startx or SLiM to start the X interface.
  • .xprofile: Settings are applied every time you log in using a display manager such as GDM.
Set Chinese locales for graphical interfaces

It is not recommended to set a global Chinese locale in /etc/locale.conf because it causes tty to display garbled characters.

As mentioned earlier, a Chinese locale can be set separately in ~/.xinitrc or ~/.xprofile. Prepend the following two lines to one of the two files (if you are not sure which file to use, prepend them to both):

export LANG=zh_CN.UTF-8
export LANGUAGE=zh_CN:en_US
Warning: Be sure to put them before the exec _example_WM_or_DE_ line in ~/.xinitrc.
Note: This method is suitable for SLiM users or for people who don't use a graphical login interface (aka greeter). GDM and SDDM users can configure the display language in GNOME or KDE settings.
Note: Overriding all locale settings with a global export LC_ALL is not recommended. LC_ALL should be reserved for diagnostic and debugging purposes only; setting it globally makes language setting issues unnecessarily hard to diagnose.
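
Prepending rather than appending matters because exec replaces the shell, so nothing after the exec line in ~/.xinitrc is run. The following is a minimal sketch of the prepend step, applied to a scratch file instead of a real ~/.xinitrc; the line exec example_wm is a placeholder for your own window manager or desktop environment, not a real command.

```shell
# Scratch stand-in for ~/.xinitrc containing only a placeholder exec line.
xinitrc=$(mktemp)
printf '%s\n' 'exec example_wm' > "$xinitrc"

# Write the two export lines first, then the original contents.
updated=$(mktemp)
{
    printf '%s\n' 'export LANG=zh_CN.UTF-8' 'export LANGUAGE=zh_CN:en_US'
    cat "$xinitrc"
} > "$updated"
mv "$updated" "$xinitrc"

# Show the result: the exports now come before the exec line.
cat "$xinitrc"
```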

Chinese fonts

Install fonts

In addition to locales, Chinese fonts are also required.

Commonly used free (GPL or compatible copyright) Chinese fonts include:

This article or section is out of date.

Reason: The ~/.fonts path is deprecated. It may be preferable to link to Fonts#Manual installation instead of duplicating information regarding font paths here. (Discuss in Talk:Localization/Simplified Chinese)

System fonts are installed to /usr/share/fonts by default. If you do not have root privileges, or want certain fonts only for your own user, you can copy them directly to ~/.fonts (or its subdirectories) and add the path to /etc/fonts/local.conf. For details, see the following sections.

See also: [1]

Chinese fonts configuration

fontconfig settings

This article or section is out of date.

Reason: The ~/.fonts.conf path appears to be deprecated. It may be preferable to link to Font configuration#Fontconfig configuration or Font configuration (简体中文)#Fontconfig配置 instead of explicitly mentioning font configuration paths. (Discuss in Talk:Localization/Simplified Chinese)

The fontconfig configuration file is ~/.fonts.conf (per user) or the files under /etc/fonts/conf.d (global). Modifying the former is recommended.

For Chinese fonts settings, see Fonts (简体中文) and Font configuration (简体中文).

Font Configuration (简体中文)/Chinese (简体中文) provides an example fontconfig configuration for Chinese.

See also:

Fix Simplified Chinese displayed as variant (Japanese) glyphs

After installing Noto Sans CJK, adobe-source-han-sans-otc-fonts (Source Han Sans) or adobe-source-han-serif-otc-fonts (Source Han Serif), some Chinese characters, such as 门, 关, and 复, are in some cases (when the framework does not define a region) rendered with glyphs that do not match the standard Simplified Chinese form.

This is because each program may set a different default font, such as Arial or Tahoma, and how these fonts are resolved is controlled by fontconfig. Fonts are tried in order of their region code, which defaults to A-Z. Since ja-JP sorts before zh_{CN,HK,SG,TW}, Japanese fonts are used first.

Tip: Fonts can be set separately in Chromium/Chrome/Firefox browser settings, for example, adjust the font option to Noto xxx CJK SC.

You can use the following methods to solve the issue (taking simplified Chinese as an example):

  • Install only the Simplified Chinese variant of the CJK fonts, such as the Source Han Simplified Chinese packages adobe-source-han-sans-cn-fonts and adobe-source-han-serif-cn-fonts, or noto-fonts-sc (AUR).
  • Add LANG=zh_CN.UTF-8 to locale.conf to set Simplified Chinese as the default language. The locale then determines which CJK variant takes priority, and the default ordering is ignored.
  • Manually adjust the priority so that Chinese fonts come before Japanese fonts [2]:

Create a file under /etc/fonts/conf.d/ or /etc/fonts/conf.avail/, such as 64-language-selector-prefer.conf, or modify or create ~/.fonts.conf (only effective for the user):

If noto-fonts-cjk is installed, write:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>sans-serif</family>
    <prefer>
      <family>Noto Sans CJK SC</family>
      <family>Noto Sans CJK TC</family>
      <family>Noto Sans CJK JP</family>
    </prefer>
  </alias>
  <alias>
    <family>monospace</family>
    <prefer>
      <family>Noto Sans Mono CJK SC</family>
      <family>Noto Sans Mono CJK TC</family>
      <family>Noto Sans Mono CJK JP</family>
    </prefer>
  </alias>
</fontconfig>

If you have installed adobe-source-han-sans-otc-fonts:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>sans-serif</family>
    <prefer>
      <family>Source Han Sans SC</family>
      <family>Source Han Sans TC</family>
      <family>Source Han Sans HW</family>
      <family>Source Han Sans K</family>
    </prefer>
  </alias>
  <alias>
    <family>monospace</family>
    <prefer>
      <family>Source Han Sans SC</family>
      <family>Source Han Sans TC</family>
      <family>Source Han Sans HW</family>
      <family>Source Han Sans K</family>
    </prefer>
  </alias>
</fontconfig>

Note that if you create the XML file under /etc/fonts/conf.avail, you must also symlink it into /etc/fonts/conf.d, for example:

# ln -s /etc/fonts/conf.avail/64-language-selector-prefer.conf /etc/fonts/conf.d/64-language-selector-prefer.conf

Then update the font cache for the change to take effect:

# fc-cache -fv

Run the following command to check. If NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular" appears in the output, the settings have been applied successfully:

$ fc-match -s | grep 'Noto Sans CJK'

Chinese input method

Commonly used Chinese input method frameworks are IBus, fcitx and scim. For specific installation and configuration, please refer to the respective articles.

Note: SCIM currently lacks maintenance and is therefore not recommended.

Terminal Chinese support

Bootloader Chinese support

See GRUB2 (简体中文).

Chinese localization of software

Firefox

Simplified Chinese installation: firefox-i18n-zh-cn

Traditional Chinese installation: firefox-i18n-zh-tw

LibreOffice

Simplified Chinese installation: libreoffice-fresh-zh-cn or libreoffice-still-zh-cn.

Traditional Chinese installation: libreoffice-fresh-zh-tw or libreoffice-still-zh-tw.

PDF reader

Most PDF viewers already support Chinese. However, there are some additional language packs/fonts that need to be installed:

For Acrobat, the fonts are provided by acroread-fonts (AUR), or by acroread-fonts-systemwide (AUR) for system-wide fonts.

For poppler-based readers (e.g., Okular, Evince) and image processing tools that can handle PDF files (e.g., Inkscape, Krita, MyPaint), poppler-data needs to be installed.

Java

For Sun Java users, create a fallback directory under /opt/java/jre/lib/fonts, then link or copy several Chinese fonts into it so that Java programs can display Chinese correctly. For example, if jre (AUR) and opendesktop-fonts are installed, use the following commands:

# mkdir -p /opt/java/jre/lib/fonts/fallback
# ln -s /usr/share/fonts/TTF/odosung.ttc /opt/java/jre/lib/fonts/fallback/
# cd /opt/java/jre/lib/fonts/fallback/
# mkfontdir
# mkfontscale

vim

If the locale uses UTF-8, files in other Chinese encodings may appear garbled when opened in vim. Add the following setting to ~/.vimrc:

~/.vimrc
...
set fileencodings=utf8,cp936,gb18030,big5
...

Chinese video subtitles

MPlayer

For MPlayer to display Chinese subtitles correctly, the encoding of the subtitle file must match the encoding set in MPlayer's configuration. If the subtitle file is encoded as GBK, use subcp=cp936; if it is encoded as UTF-8, use subcp=utf8. A UTF-8 subtitle file played with subcp=cp936 will show garbled characters. A simpler alternative is subcp=enca:zh:ucs-2, which makes enca responsible for detecting the subtitle encoding.

Modify ~/.mplayer/config:

~/.mplayer/config
font='文泉驿正黑'
subcp=enca:zh:ucs-2
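
If you are unsure which encoding a subtitle file uses, one rough check is whether it decodes as UTF-8. The sketch below uses iconv for that check. pick_subcp is a hypothetical helper name, the two sample files are created only for the demonstration, and treating every non-UTF-8 file as GBK (cp936) is an assumption that only holds for Simplified Chinese subtitles.

```shell
# pick_subcp FILE: print "utf8" if FILE decodes as UTF-8, otherwise "cp936".
# Assumption: anything that is not valid UTF-8 is GBK.
pick_subcp() {
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        printf 'utf8\n'
    else
        printf 'cp936\n'
    fi
}

# Demonstration files: the word 中文 as GBK bytes and as UTF-8 bytes (octal).
subs=$(mktemp -d)
printf '\326\320\316\304\n' > "$subs/gbk.srt"
printf '\344\270\255\346\226\207\n' > "$subs/utf8.srt"

pick_subcp "$subs/gbk.srt"
pick_subcp "$subs/utf8.srt"
```

A pure ASCII file also counts as valid UTF-8, so the helper reports utf8 for it; that is harmless because ASCII text displays the same under either setting.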

Use the following command to manually load subtitles:

$ mplayer xxx.avi -sub xxxxx.srt

If a graphical front end (such as SMPlayer) is used, it will work as long as you set the default subtitle encoding and font in the settings dialog box.

xine

Xine can also display Chinese subtitles, but you need to build the Chinese fonts for it yourself. For details, please refer to [3].

gstreamer

In totem 1.4.0, since gstreamer0.10 is used, it should be able to automatically load srt subtitles with the same name.

LaTeX

You need to install CJK packages and the appropriate fonts. For details, please refer to [4].

Garbled text

The basic principle to avoid garbled characters is to use utf-8 instead of gbk/gb2312.

Garbled file names

Install convmv and use the convmv command to convert the encoding format. For example:

$ convmv -f GBK -t UTF-8 --notest --nosmart file

-f specifies the original encoding, and -t specifies the output encoding. Use convmv --list to list all supported encodings. --notest performs the actual renaming (without it, convmv only prints what it would do). By default convmv skips file names that already appear to be UTF-8; --nosmart disables this check and forces the conversion.

Garbled file contents

Use the iconv command to convert the format. For example:

$ iconv -f GBK -t UTF-8 -o new-file origin-file

-f specifies the original encoding, and -t specifies the output encoding. Use iconv -l to query all supported encodings. -o specifies the output file.
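
A small self-contained check of this conversion, assuming a scratch directory from mktemp: it writes the word 中文 as raw GBK bytes, converts the file, and prints the result. The file names mirror the example above. The sketch redirects standard output instead of using -o, which has the same effect here.

```shell
workdir=$(mktemp -d)

# "中文" encoded as GBK is the four bytes D6 D0 CE C4 (given here in octal).
printf '\326\320\316\304\n' > "$workdir/origin-file"

# Convert from GBK to UTF-8 and store the result in new-file.
iconv -f GBK -t UTF-8 "$workdir/origin-file" > "$workdir/new-file"

# In a UTF-8 terminal the converted text is readable again.
converted=$(cat "$workdir/new-file")
printf '%s\n' "$converted"
```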

Garbled file names in zip archives

Avoid creating zip archives in non-UTF-8 environments (typically Chinese Windows); 7z is recommended instead. To extract a zip archive created in such an environment, pass the encoding to unzip with a special parameter:

$ unzip -O gbk file.zip

gbk is the encoding format of the file, specified with -O.

Alternatively, use unzip-natspec (AUR).

Garbled MP3 tags

For players that use GStreamer as the backend, such as Rhythmbox and totem, GBK-encoded ID3 tags in MP3 files can be read correctly after setting the following environment variables:

export GST_ID3_TAG_ENCODING=GBK:UTF-8:GB18030
export GST_ID3V2_TAG_ENCODING=GBK:UTF-8:GB18030

For Beep Media Player (BMP), select the MPEG Audio plugin under Preferences -> Plugins -> Media, then click Preferences below it. In the dialog box that appears, select Title, tick Disable ID3v2 and Convert non-UTF8 ID3 tags to UTF8, and enter gbk in ID3 encoding. BMP can then correctly display GBK-encoded ID3 tags.

Quod Libet player supports tag editing and setting ID3v2 encoding. This can be set in ~/.quodlibet/config:

~/.quodlibet/config
...
id3encoding = gbk
...
Note: Quod Libet supports utf8 encoding by default.

The best solution is to convert GBK-encoded ID3 tags to UTF-8. Install python-mutagen, then use the following command to convert:

$ mid3iconv -e gbk XXX.mp3

Garbled Chinese file names on Windows partitions

This usually happens because the character set used when mounting differs from the locale. You can fix it in /etc/fstab (if you are unfamiliar with this file, read fstab carefully first). If the locale is UTF-8, change the line to:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=utf8 0 0

If the locale is GBK, it should be:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=cp936 0 0
...

Garbled file names with Samba

When using Arch as a Samba server, adding the following line to /etc/samba/smb.conf can fix garbled file names on Windows clients:

/etc/samba/smb.conf
...
unix charset=gb2312
...

Garbled file names over FTP

Many FTP sites are GBK encoded. If you use a UTF-8 locale, downloaded file names may be garbled. For lftp, make the following settings in .lftp/rc:

.lftp/rc
...
set ftp:charset "gbk"
set file:charset "UTF-8"
...

For gftp, you can do the following settings in .gftp/gftprc:

.gftp/gftprc
...
remote_charsets=gb2312
...

However, downloaded file names are still garbled unless gftp is patched and recompiled. The patch is available at: https://www.teatime.com.tw/%7Etommy/linux/gftp_remote_charsets.patch

Translation software

  • stardict: StarDict.
  • sdcv: command line StarDict.
  • ydcv: Youdao dictionary on the command line.
  • youdao-dict (AUR): Youdao dictionary with a graphical interface; supports on-screen word lookup.
  • goldendict-git (AUR): Ships with no dictionaries by default; you can download the dictionary packages you need. It supports Babylon's format (.BGL), the no longer maintained StarDict format (.ifo/.dict/.idx/.syn), the dictd format (.index/.dict(.dz)), ABBYY Lingvo's format (.dsl/.lsa/.dat), the mdict format, and others. Dictionary files in these formats can be downloaded from the Internet and imported into GoldenDict (there may be copyright issues).
  • moedict (AUR): A multi-platform Chinese dictionary. In addition to Chinese characters, words, idioms, etc., it also contains Hakka, Hokkien, simple foreign language translations, stroke order, etc. An online version of moedict is also available.
  • linedict (AUR): An online English-Chinese dictionary that gets its results by scraping the Youdao translation web page, with partial support for English-Chinese translation. Imitating dmenu, it displays the results at the top of the screen and is rather easy to use. The API used by ydcv has expired and the new API limits the frequency of free use, so linedict is a good alternative.