#if defined(W32) || defined(LNX)
.NLS support.
Internal error messages are in English only, except in a Japanese environment.
This section covers file encoding and the screen display language.
See "Japanese DBCS, Code conversion" for Japanese.
For the Linux console version,
set "OPT LINECH OFF" so that ACS (line-drawing characters) is not used if your language
has SBCS codepoints in the range 0x80 to 0xff.
(Note) On Windows, it is better to avoid UTF8-encoded filenames, which may not be
translated properly to the UTF-16 that Windows uses internally,
because Windows assumes the input is in the locale code.
Windows:
To use a codepage different from the system default,
you have to set the DOS prompt codepage for the console version.
For example, enter "chcp 28591" or "chcp 1252" for German.
Codepage 28591 is ISO-8859-1. Codepage 1252 (Windows-1252) matches it except that it assigns printable characters to 0x80-0x9f.
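The difference between the two codepages can be seen with a short Python sketch. It is only an illustration and not part of xe; it assumes Python's "iso-8859-1" and "cp1252" codecs correspond to Windows codepages 28591 and 1252.

```python
# Byte values 0x80-0x9f are where ISO-8859-1 (28591) and Windows-1252 differ.
sample = bytes([0x80, 0x93, 0xE4])  # 0xE4 is a-umlaut in both codepages

as_latin1 = sample.decode("iso-8859-1")  # 0x80 and 0x93 become C1 control codes
as_cp1252 = sample.decode("cp1252")      # 0x80 and 0x93 become printable characters

for b, a, c in zip(sample, as_latin1, as_cp1252):
    print(f"0x{b:02x}  iso-8859-1: U+{ord(a):04X}  cp1252: U+{ord(c):04X}")
```

A file that uses the Euro sign or typographic quotes therefore needs 1252, not 28591.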
For the GUI version, use the /C command-line parameter, for example "xe /c1252".
Then select the CharSet from the "Other" combobox; choose ANSI for ISO-8859-1.
Beforehand, you have to add the language through the Windows Control Panel
and change the language selection through the Language bar.
Linux:
For the console version, you have to set the encoding of the terminal emulator, such as gnome-terminal.
You also have to set the LANG environment variable to match the terminal emulator encoding,
whether it is UTF8 or not.
The keyboard layout also has to be considered.
Use the SCIM setup or the "setxkbmap" command, for example "setxkbmap de" for German.
On FC5, the SCIM setup is under System->Management->Preference->SCIM.
For gxe/wxe, the selected font may support ligatures.
A ligature combines two glyphs into one glyph for certain combinations such as "fi" and "ff".
If the ligature checkbox is Off, mono-spacing is kept.
If On, the cursor position may not match the display position.
gxe/wxe applies ligatures to both UTF8 and locale code files;
xe displays the character at the byte offset of the cursor position.
The "OPT LIGATURE" command or the LIG command (A+";" key) is available.
For the console version, ligatures are applied to UTF8 files only.
Ligatures are not applied to files opened as binary.
I have heard that in some languages split glyphs are unreadable without ligatures.
Try it in combination with the Unicode combining character option,
which is set by the "OPT UNICOMB" command or the CMB command (A+":" key).
The A+u key ("UTF SWKB" as a command) switches the treatment of keyboard input between UTF8 and locale code.
See "UTF8 support" for details.
-----------------------------------------------------------------------
The following is about Chinese and Korean DBCS support.
I tested CN and KR. I can recognize the glyphs but cannot understand them,
especially Hangul.
Please send me a report if something looks strange to you.
My test environments are WindowsXP Japanese version and Linux FC5.
The selected locale code is displayed on the top menu.
Each code is displayed with a width equal to its byte count.
For Linux,
a 4-byte DBCS of GB18030 or an SS3 Kanji of EUC is followed by padding characters, like tab padding.
The TAB command also controls the display status of DBCS padding characters.
Printing of padding is controlled by the WWScrPrt checkbox on the Setup dialog of gxe.
To input a 3- or 4-byte DBCS using HexInputMode (toggled by C+F11),
send it in two parts, 2+1 (or 1+2) or 2+2 (use the "x" key to send the incomplete bytes).
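As an illustration of where the 3- and 4-byte codes come from, the byte length of a character can be judged from its leading bytes. The following Python sketch is not xe's code; the function names are hypothetical and the byte ranges are the usual EUC-JP and GB18030 ones.

```python
def euc_jp_char_len(lead: int) -> int:
    """Byte length of an EUC-JP character, judged from its lead byte."""
    if lead == 0x8F:          # SS3: JIS X 0212 Kanji, 3 bytes in total
        return 3
    if lead == 0x8E:          # SS2: half-width Katakana, 2 bytes in total
        return 2
    if 0xA1 <= lead <= 0xFE:  # ordinary DBCS, 2 bytes
        return 2
    return 1                  # ASCII (or a byte this sketch does not classify)


def gb18030_char_len(lead: int, second: int) -> int:
    """Byte length of a GB18030 character, judged from its first two bytes."""
    if 0x81 <= lead <= 0xFE:
        # A second byte in 0x30-0x39 marks the 4-byte form.
        return 4 if 0x30 <= second <= 0x39 else 2
    return 1


# Cross-check against Python's own codec with a character outside the BMP.
four = "\U00020000".encode("gb18030")
print(len(four), gb18030_char_len(four[0], four[1]))
print(euc_jp_char_len(0x8F), euc_jp_char_len(0xA4))
```

A 3-byte SS3 code is thus entered as 2+1 or 1+2 bytes, and a 4-byte GB18030 code as 2+2.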
C/J keyboard setup on WindowsXP Japanese version.
Control-panel-->locale and language-->Detail button on language tab
Add a language and the proper IME.
Search the Web for details of the operation.
For wxe, set the codepage in Charset on the setup dialog.
Select the "Other" radio button and select a codepage from the combobox,
or enter the codepage in the textbox, referring to wingdi.h.
Then select the Fontstyle.
For the xe console version, set the properties of "command prompt".
Right-click the "command prompt" icon (not the icon at the top left of the command prompt screen).
Set the current codepage on the "Option" tab.
Use the chcp command if the property offers no suitable codepage.
When a codepage is selected, the font list may change.
After the codepage is set, the IME on the Language bar can be selected
when the focus is on the command prompt screen.
Brief notes on IME operation.
(To change the defaults: Control panel-->locale and language-->Detail button on language tab-->key setting)
LeftAlt+Shift: switches the IME on the Language bar.
The mouse can be used for GUI applications.
Korean MS-IME2002      R-Alt       : English/Numeric<-->Hangul
                       Alt+^       : Single<-->Double width English/Numeric letter
CN(Simplified) PinYin  Shift       : Translation On<-->Off
                       Shift+Space : Single<-->Double width English/Numeric letter
                       Ctrl+Space  : English/Numeric<-->IME mode
CN(Big5) Phonetic      Same as PinYin.
Windows CodePage & Font
          Charset on Setup dialog  CodePage  CmdPromptFont  IME
Japanese  128                      cp 932    MSGothic       MSIME 2002
hangul    129                      cp 949    GulimChe       MSIME 2002
GB2312    134                      cp 936    NSimSun        PinYin 2002
Big5      136                      cp 950    MingLiU        NewPhonetic
When you ssh from Windows to Linux/390, the screen may be corrupted by UTF8 characters.
Use the -C option, for example "xe -cUS-ASCII".
Characters >0x80 are displayed as ".".
Windows: wingdi.h defines the following.
#define ANSI_CHARSET 0
#define DEFAULT_CHARSET 1
#define SYMBOL_CHARSET 2
#define SHIFTJIS_CHARSET 128
#define HANGEUL_CHARSET 129
#define HANGUL_CHARSET 129
#define GB2312_CHARSET 134
#define CHINESEBIG5_CHARSET 136
#define OEM_CHARSET 255
#define JOHAB_CHARSET 130
#define HEBREW_CHARSET 177
#define ARABIC_CHARSET 178
#define GREEK_CHARSET 161
#define TURKISH_CHARSET 162
#define VIETNAMESE_CHARSET 163
#define THAI_CHARSET 222
#define EASTEUROPE_CHARSET 238
#define RUSSIAN_CHARSET 204
#define MAC_CHARSET 77
#define BALTIC_CHARSET 186
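The relation between the Charset value on the setup dialog and the console codepage can be written as a small lookup. This Python sketch only restates the four pairs from the "Windows CodePage & Font" table; the helper name chcp_command is hypothetical, and charsets not in that table are deliberately left unmapped.

```python
# Charset values (wingdi.h) paired with the codepages listed in this section.
CHARSET_TO_CODEPAGE = {
    128: 932,  # SHIFTJIS_CHARSET, Japanese
    129: 949,  # HANGUL_CHARSET, Korean
    134: 936,  # GB2312_CHARSET, Simplified Chinese
    136: 950,  # CHINESEBIG5_CHARSET, Traditional Chinese
}


def chcp_command(charset: int):
    """Return the chcp command for a charset, or None if it is not in the table."""
    codepage = CHARSET_TO_CODEPAGE.get(charset)
    return None if codepage is None else f"chcp {codepage}"


print(chcp_command(129))
print(chcp_command(0))
```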
Linux(FedoraCore 5)
Change the selected language (Desktop-->Administration-->Language), then log out and log in.
Note. If you change the terminal emulator (e.g. gnome-terminal) encoding,
also change the locale environment, for example "export LANG=xxx".
Additional command-line option.
-C : change locale charset.
Windows : Codepage. ex) -c949 -cGerman_Germany.1252
Use the xcv command ("xcv -List") for available codepages.
For the xe console version, the font is determined by the charset property
of "command prompt". You may see strange glyphs.
For wxe, you also have to set the charset on the setup dialog.
When you ssh from Windows to Linux/390, the screen may be corrupted by UTF8 characters.
Use this option, for example "xe -c437". Characters >0x80 will be displayed as ".".
Linux :Charset ex) -cGBK, -ciso88591 -czh_CN.GB18030
Available charsets are displayed by the xcv command or "iconv --list".
The default charset is taken from the LANG environment variable if its charset is not UTF8.
ex) iso88591 when LANG is "de_DE.iso88591".
If the charset is UTF8, the charset is selected as follows
(the first available charset from the left is selected).
Locale  Charset
------  -------
zh_CN   GB18030,GBK,GB2312
ko_KR   UHC,EUC_KR
ja_JP   eucjp
On a fullscreen console, "ISO88591" is selected if iconv supports it,
otherwise "C".
Axe uses the ICU converter as follows.
zh_CN :"GB18030","GBK","GB2312"
ko_KR :"korean","EUC-KR"
ja_JP :"EUC-JP"
zh_TW :"Big5-HKSCS","Big5"
else :"ISO-8859-1"
For other locales, the charset is obtained by nl_langinfo after setlocale with the locale code only, for example "setlocale(LC_ALL,"de_DE")".
If setlocale fails (check with the "locale -a" command), iso88591 is selected.
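The selection rules for the Linux default charset can be sketched in Python as follows. This is a simplified illustration, not xe's implementation: the function name and the "available" set are hypothetical, and the nl_langinfo step for other locales is replaced here by the iso88591 fallback.

```python
# Candidate charsets per locale, tried from the left (from the table above).
UTF8_FALLBACKS = {
    "zh_CN": ["GB18030", "GBK", "GB2312"],
    "ko_KR": ["UHC", "EUC_KR"],
    "ja_JP": ["eucjp"],
}


def default_charset(lang: str, available: set) -> str:
    """Pick a locale charset from a LANG value such as "de_DE.iso88591"."""
    locale_name, _, charset = lang.partition(".")
    charset = charset.split("@")[0]  # drop a modifier such as "@euro"
    if charset and charset.replace("-", "").lower() != "utf8":
        return charset  # a non-UTF8 LANG charset is used as it is
    for candidate in UTF8_FALLBACKS.get(locale_name, []):
        if candidate in available:
            return candidate
    # Assumption of this sketch: skip setlocale/nl_langinfo and fall back.
    return "iso88591"


print(default_charset("de_DE.iso88591", set()))
print(default_charset("zh_CN.UTF-8", {"GBK", "GB2312"}))
```

In this sketch the set passed as "available" stands for the charsets reported by "iconv --list".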
For gxe, input from GTK is UTF8; gxe translates it to this charset
and translates it back to UTF8 when displaying it on the screen.
For the xe console version, input from the terminal emulator is translated
to this charset. If -c is not specified, the default charset is selected
using the LANG environment variable (if LANG is UTF8, the proper charset is determined).
The data is then translated to UCS for display using ncursesw.
Ex) 0xa4a2 in EUC-JP is the Hiragana pronounced "a" in Japanese, and the same glyph is 0xaaa2 in EUC-KR.
When you press the "a" key and then the Enter key on the IME window:
                        Input from IME    glyph
----------------------  ----------------  ------
EUC-JP.UTF8 + -cEUC-KR  aaa2(by KR)       "yy"
EUC-JP + -cEUC-KR       a4a2(by JP)       "xx"
yy (Japanese Hiragana) and xx (Hangul) cannot be shown on an ASCII screen.
(The xe console version may display a space because of the terminal emulator's font selection.)
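The two rows of the example can be reproduced with Python's codecs. This is an illustration only; it assumes Python's "euc_jp" and "euc_kr" codecs behave like the converters xe uses.

```python
jp_bytes = bytes.fromhex("a4a2")
hiragana_a = jp_bytes.decode("euc_jp")  # the Hiragana pronounced "a"

# UTF8 terminal: the IME input arrives as Unicode and is translated to EUC-KR,
# so the file receives the EUC-KR code of the same Hiragana glyph.
kr_bytes = hiragana_a.encode("euc_kr")
print(kr_bytes.hex())  # aaa2

# EUC-JP terminal: the bytes a4a2 arrive unchanged and are read as EUC-KR,
# so a different (Hangul) glyph is shown.
misread = jp_bytes.decode("euc_kr")
print(f"U+{ord(hiragana_a):04X} -> U+{ord(misread):04X}")
```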
-Nm : Accept the UTF8 byte sequence itself.
When /Nm is specified, UTF8 code input to a CPLC (non-UTF8) file is stored as
the UTF8 code itself if Alt+u is ON (indicated by =u=> on the command input line), or as
the translated locale code if Alt+u is OFF (===>), or as "?" if a translation error occurs.
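The three outcomes can be summarized with a minimal Python sketch. The function name and parameters are hypothetical and only mirror the rule described above; they are not xe internals.

```python
def bytes_to_store(ch: str, file_charset: str, alt_u_on: bool) -> bytes:
    """Bytes written to a non-UTF8 file for one input character."""
    if alt_u_on:
        return ch.encode("utf-8")      # Alt+u ON: keep the UTF8 code itself
    try:
        return ch.encode(file_charset)  # Alt+u OFF: translate to locale code
    except UnicodeEncodeError:
        return b"?"                    # translation error


print(bytes_to_store("\u00e4", "iso-8859-1", True).hex())   # c3a4
print(bytes_to_store("\u00e4", "iso-8859-1", False).hex())  # e4
print(bytes_to_store("\u3042", "iso-8859-1", False))        # b'?'
```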
#endif // W32 || LNX