NLS support E

2025/08/25

.NLS support.

Internal error messages are in English only, except in a Japanese environment.
This section is about file encoding and screen display language.

For Japanese, see "Japanese DBCS, Code conversion".

For the Linux console version,
set "OPT LINECH OFF" so that ACS (line-drawing characters) are not used if your language
has SBCS code points in 0x80 to 0xff.

(Note) On Windows, it is better not to use UTF8-encoded filenames, which may not be properly
translated to the UTF-16 Windows internal encoding,
because Windows assumes input is in the locale code.

Windows:
To use a codepage different from the system default,
you have to set the DOS prompt codepage for the console version.
For example, enter "chcp 28591" or "chcp 1252" for German.
Both 28591 and 1252 are based on ISO-8859-1; 1252 additionally assigns characters to code points 0x80-0x9f.
For the GUI version, use the /C cmdline parameter, e.g. "xe /c1252".
Then select the CharSet from the "Other" combobox; use ANSI for ISO-8859-1.
Beforehand, you have to add the language through the Windows Control Panel
and change the language selection through the Language bar.
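
The difference between the two codepages named above can be checked with a small sketch. It is not part of xe; it only uses Python's standard codecs "cp1252" and "latin-1", on the assumption that "latin-1" corresponds to codepage 28591 (ISO-8859-1).

```python
# Compare how codepage 1252 and ISO-8859-1 (28591) decode the same bytes.
def decode_both(data: bytes):
    """Return (text decoded as cp1252, text decoded as ISO-8859-1)."""
    return data.decode("cp1252"), data.decode("latin-1")

# 0xE4 is the same letter in both codepages.
same = decode_both(b"\xe4")
# 0x80 is a printable character only in 1252;
# ISO-8859-1 has a C1 control code at that position.
diff = decode_both(b"\x80")

print(same)
print([hex(ord(ch)) for ch in diff])
```

A file that contains bytes in 0x80-0x9f is therefore a hint that 1252, not 28591, is the codepage to choose.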
Linux:
For the console version, you have to set the codepage of the terminal emulator, such as gnome-terminal.
You have to set the LANG environment variable to match the terminal emulator encoding
(UTF8 or not).
The keyboard layout also has to be considered.
Use the SCIM setup or the "setxkbmap" cmd, e.g. "setxkbmap de" for the German layout.
SCIM setup on FC5 is System->Management->Preference->SCIM.

For gxe/wxe, the selected font may support ligatures.
A ligature combines two glyphs into one glyph for some combinations such as "fi", "ff".
If the ligature chkbox is Off, mono-spacing is kept.
If On, the cursor position may not match the display position.
gxe/wxe apply ligatures to both UTF8 and locale code files;
xe displays the character at the byte offset of the cursor position.
The "OPT LIGATURE" cmd or LIG cmd (A+";" key) is available.
For the console version, ligature is applied to UTF8 files only.
Ligature is not applied to a file opened as a binary file.
I have heard that in some languages split glyphs are unreadable without ligature.
Try it in combination with the Unicode combining character option,
which is set by the "OPT UNICOMB" cmd or CMB cmd (A+":" key).

The A+u key ("UTF SWKB" as cmd) switches the treatment of kbd input between UTF8 and locale code.
See "UTF8 support" for details.

-----------------------------------------------------------------------

The following describes how to display Simplified Chinese (GB18030) in a Japanese environment.
GB18030 4-byte DBCS and EUC 3-byte supplementary Kanji characters are displayed with tab padding characters.
The display of padding characters can be toggled on/off with the "TAB" command.
If an IME is not available, you can enter characters in Hex input mode (toggle with C+F11).
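
As an illustration of why some characters need more bytes than others, the sketch below counts the encoded length of a few characters in GB18030. It is not xe code; it relies only on Python's standard "gb18030" codec, and the sample characters are chosen arbitrarily.

```python
# Count how many bytes each character occupies in a given encoding.
def byte_lengths(text: str, encoding: str = "gb18030"):
    """Return a list of (character, encoded byte count) pairs."""
    return [(ch, len(ch.encode(encoding))) for ch in text]

# ASCII letter, a common Hanzi (U+4E2D), and a supplementary-plane
# ideograph (U+20000) that GB18030 encodes as a 4-byte sequence.
sample = "A\u4e2d\U00020000"
lengths = byte_lengths(sample)

for ch, n in lengths:
    print(f"U+{ord(ch):04X}: {n} byte(s)")
```

The 4-byte case is the one that the text above says is displayed with padding characters.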

For Windows,
For the Console version (xe), set the codepage in the command prompt properties, e.g. chcp 54936.
For the GUI version (wxe), specify the code page parameter, e.g. wxe /c54936.
In the Setup dialog, set CharSet to ANSI or select GB2312 from Others.
GB18030 is an extension of GB2312; 4-byte DBCS is also supported when GB2312 is selected.
You may also need to change the FontStyle.
For charsets other than GB18030, see "Windows CodePage & Font" below and the command line parameter "-C".

For Linux,
For the Console version (xe), specify the -Czh_CN.GB18030 command line parameter.
If you get the error "setlocale failed", run "locale-gen" once.
sudo locale-gen "zh_CN.GB18030"
In the terminal emulator settings, select a font that can display kanji.

For the GUI version (gxe), specify -Czh_CN.GB18030 as well.
If you also get the error "setlocale failed", run "locale-gen".
Also select a font that can display kanji, using the "Font Change" button in the Setup menu.

Windows CodePage & Font

Windows: wingdi.h defines the following.

#define ANSI_CHARSET 0
#define DEFAULT_CHARSET 1
#define SYMBOL_CHARSET 2
#define SHIFTJIS_CHARSET 128
#define HANGEUL_CHARSET 129
#define HANGUL_CHARSET 129
#define GB2312_CHARSET 134
#define CHINESEBIG5_CHARSET 136
#define OEM_CHARSET 255

#define JOHAB_CHARSET 130
#define HEBREW_CHARSET 177
#define ARABIC_CHARSET 178
#define GREEK_CHARSET 161
#define TURKISH_CHARSET 162
#define VIETNAMESE_CHARSET 163
#define THAI_CHARSET 222
#define EASTEUROPE_CHARSET 238
#define RUSSIAN_CHARSET 204

#define MAC_CHARSET 77
#define BALTIC_CHARSET 186

Command-line parameter.
-C : change locale charset.

Windows : Codepage. ex) -c949 -cGerman_Germany.1252
Use the xcv cmd ("xcv -List") to list the available codepages.
For the xe console version, the font is determined by the charset property
of the "command prompt". You may see strange glyphs.
For wxe, you also have to set the charset in the setup dialog.

Linux : Charset. ex) -cGBK, -ciso88591 -czh_CN.GB18030
Available charsets are displayed by the xcv cmd or "iconv --list".

The default charset is taken from the LANG environment variable if its charset is not UTF8.
ex) iso88591 when LANG is "de_DE.iso88591".
If the charset is UTF8, the charset is selected as follows
(the first available charset from the left is selected).
  Locale  Charset
  ------  ------------------
  zh_CN   GB18030,GBK,GB2312
  ko_KR   UHC,EUC_KR
  ja_JP   eucjp
On a fullscreen console, "ISO88591" is selected if iconv supports it,
otherwise "C".
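
The selection rule in the table can be sketched as below. This is a hypothetical helper written for illustration, not xe's actual code: the name default_charset, the UTF8_FALLBACKS table layout, and the "available" argument (standing in for whatever iconv reports) are all assumptions.

```python
from typing import Iterable, Optional

# Candidate charsets per locale when LANG says UTF8 (copied from the table).
UTF8_FALLBACKS = {
    "zh_CN": ["GB18030", "GBK", "GB2312"],
    "ko_KR": ["UHC", "EUC_KR"],
    "ja_JP": ["eucjp"],
}

def default_charset(lang: str, available: Iterable[str]) -> Optional[str]:
    """Pick a default charset from a LANG value such as "de_DE.iso88591".

    Returns None when the table has no answer (other locales are
    handled separately).
    """
    locale_part, _, codeset = lang.partition(".")
    codeset = codeset.split("@", 1)[0]      # drop a modifier such as "@euro"
    if codeset and codeset.replace("-", "").lower() != "utf8":
        return codeset                      # non-UTF8 charset is used as is
    usable = set(available)
    for candidate in UTF8_FALLBACKS.get(locale_part, []):
        if candidate in usable:             # first available one from the left
            return candidate
    return None

print(default_charset("de_DE.iso88591", []))
print(default_charset("zh_CN.UTF-8", ["GBK", "GB2312"]))
```

Here "zh_CN.UTF-8" falls through to GBK because GB18030 is not in the assumed list of available charsets.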

Axe uses the ICU converters as follows.
zh_CN :"GB18030","GBK","GB2312"
ko_KR :"korean","EUC-KR"
ja_JP :"EUC-JP"
zh_TW :"Big5-HKSCS","Big5"
else :"ISO-8859-1"

For other locales, the charset is obtained by nl_langinfo after setlocale with the locale code only, e.g. "setlocale(LC_ALL,"de_DE")".
If setlocale fails (check it with the "locale -a" cmd), iso88591 is selected.
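
The setlocale / nl_langinfo lookup can be tried out with Python's standard locale module, which wraps the same C library calls on Unix. The helper name charset_for is an assumption for illustration; the sketch also switches only LC_CTYPE rather than LC_ALL, which is enough for the codeset query and easier to restore afterwards.

```python
import locale

def charset_for(locale_code: str, fallback: str = "iso88591") -> str:
    """Return the codeset of a locale such as "de_DE", or the fallback
    when the locale is not installed (see "locale -a")."""
    saved = locale.setlocale(locale.LC_CTYPE)      # remember current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_code)
        return locale.nl_langinfo(locale.CODESET)  # Unix only
    except locale.Error:
        return fallback                            # setlocale failed
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)   # restore

print(charset_for("de_DE"))
```

The printed value depends on which locales are generated on the machine, so no fixed output is shown.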

For gxe, input from GTK is UTF8; gxe translates it to this charset
and translates it back to UTF8 when displaying to the screen.
For the xe console version, input from the terminal emulator is translated
to this charset. If -c is not specified, the default charset is selected
using the LANG environment variable (if LANG is UTF8, the proper charset is determined).
It is then translated to UCS for display using ncursesw.
Ex) 0xa4a2 in EUC-JP is the Hiragana pronounced "a" in Japanese, and the same glyph is 0xaaa2 in EUC-KR.
When you press the "a" key and then the Enter key in the IME window:
                          Input from IME    glyph
  ----------------------  ----------------  ------
  EUC-JP.UTF8 + -cEUC-KR  aaa2 (by KR)      "yy"
  EUC-JP + -cEUC-KR       a4a2 (by JP)      "xx"
(yy (Japanese Hiragana) and xx (Hangul) cannot be shown on an ASCII screen.
The xe console version may display a space, depending on the terminal emulator's font selection.)
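
The byte values in this example can be reproduced with Python's standard "euc_jp" and "euc_kr" codecs. This is only a check of the code points, not a model of how xe or the IME behaves.

```python
raw = b"\xa4\xa2"

# Read as EUC-JP, the bytes are the Hiragana pronounced "a" (U+3042).
hiragana = raw.decode("euc_jp")

# The same glyph re-encoded for EUC-KR gets a different byte pair.
as_kr = hiragana.encode("euc_kr")

# The unchanged JP bytes read as EUC-KR give a Hangul letter instead.
misread = raw.decode("euc_kr")

print(f"U+{ord(hiragana):04X}", as_kr.hex(), f"U+{ord(misread):04X}")
```

The first table row corresponds to as_kr (the glyph is preserved), the second to misread (the bytes are preserved).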

-Nm : Accept the UTF8 byte sequence itself.
When /Nm is specified, for UTF8 code input to a CPLC (non UTF8) file:
the UTF8 code itself is set if Alt+u is ON (indicated by =u=> on the command input line);
the translated locale code is set if Alt+u is OFF (===>), or "?" if a translation error occurred.
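
One reading of the rule above, written as a small sketch. The function store_input and its parameters are hypothetical names for illustration, and Python's codecs stand in for whatever converter xe uses.

```python
def store_input(utf8_bytes: bytes, charset: str, alt_u_on: bool) -> bytes:
    """Bytes stored into a non-UTF8 (CPLC) file for one keyboard input."""
    if alt_u_on:
        return utf8_bytes                  # =u=> : keep the UTF8 code itself
    try:
        # ===> : translate the UTF8 input to the file's locale code
        return utf8_bytes.decode("utf-8").encode(charset)
    except UnicodeError:
        return b"?"                        # translation error

a_umlaut = "\u00e4".encode("utf-8")        # UTF8 input for a-umlaut
print(store_input(a_umlaut, "latin-1", True).hex())
print(store_input(a_umlaut, "latin-1", False).hex())
print(store_input("\u3042".encode("utf-8"), "latin-1", False))
```

The last call shows the "?" case: Hiragana has no code point in ISO-8859-1, so the translation fails.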