EBCDIC conversion cfg-E

2024/11/25


   .EBCDIC translation config file.

      In addition to the specific internal conversion table,
      EBCDIC file support allows the use of external converters such as ICU/WindowsApi/iconv.
      When using the internal conversion table, you can also specify a specific EBCDIC conversion
      without a cfg file by specifying a command line parameter.

      The location of the cfg file is specified in the ini file. The default is ::xeebc.map

            EBCDIC_cfg   ="" #("::xeebc.map")# EBCDIC translation config filename

      In the XCV utility, this is specified by the /MF:mapfile parameter.

      ****************************************************************************

      There are two types of conversion patterns.
          (a).Japanese conversion using internal tables.
              In addition to specifying command line parameters, you can specify conversion options.
              Options such as SJIS_OPT, SOSI_A2E, and SOSI_E2A are provided.
          (b).Conversion using ICU or system converter(Windows API, Linux iconv).
              You can use the uconv command for ICU and the iconv command for iconv.
              List of converters can be obtained with "uconv -l" (ICU) or "iconv -l".
              In Windows 11, ICU seems to be installed by default, but uconv.exe is not available.
              If uconv is not available, try "xcv -list -ICU".
              The Windows code page and ICU code page can be listed by "xcv -List".
              Note:Windows code page 20290 is a Japanese Katakana extension,
              but Japanese Kanji does not seem to be supported. If DBCS is required, use ICU.

      Optional parameter.

        #####################################################################################

           CONVERTER      : Converter selection
                            0 : Convert using internal tables only.
                                MAP_E2A, MAP_A2E allows Character-by-character adjustment.
                            1 : Convert using ICU (requires ICU to be installed)
                            2 : Convert using iconv API on Linux, WideCharToMultibyte/MultiByteToWideChar on Windows.
                              e.g.
                                 CONVERTER   1           # use ICU converter

        ## The following ICU_DLL_SUFFIX and ICU_API_SUFFIX are required for "CONVERTER 1",
           but xe checks the system directory, so you probably don't need to specify them.
           If an error occurs, please specify them.
           (On Windows, check c:\Windows\System32, and on Linux, use "uconv --version" to check.)

           ICU_DLL_SUFFIX : Suffix for ".dll" or ".so" library name.
                              e.g.
                                 ICU_DLL_SUFFIX   44
           ICU_API_SUFFIX : Suffix for the ICU API name.
                              e.g.
                                 ICU_DLL_SUFFIX  _44

        ## The next ICU_DATA does not need to be specified if c:\Windows\Globalization\ICU is OK on Windows 11.
           It is not necessary to specify it on Linux if "uconv --version" is OK.
           If the converter Open fails, search libicudata.so.xx then specify its directory.
           When you have created your own ".cnv" file and placed it somewhere other than the default folder,
           you will need to specify its location.
           You can also set it in the environment variable ICU_DATA.

           ICU_DATA
                              e.g. (Linux)   ICU_DATA /system/usr/icu:/data/data/yourcnvs
                                   (Windows) ICU_DATA w:\icu\icu562\icu\bin;w:\icu\icu481\icu\bin

           DBCS_CHARSET   : Specifies the converter that supports DBCS in UCS<-->EBCDIC conversion.
                            If this is not specified, all conversions will be SBCS mode.
                            When "CONVERTER 0" is used, specify it in SJIS_OPT instead of DBCS_CHARSET.

                              e.g.
                                 DBCS_CHARSET cp939     #Japanese
                                 DBCS_CHARSET cp933     #Korean Mixed EBCDIC
                                 DBCS_CHARSET cp935     #Chinese(Simplified) Mixed EBCDIC
                                 DBCS_CHARSET cp937     #Chinese(Traditional)Mixed EBCDIC

           SBCS_CHARSET   : Specify the UCS2<-->EBCDIC SBCS converter.
                            Not required if DBCS_CHARSET is specified.
                            If "DefaultMap" is specified, the internal table equivalent to ISO8859-1<-->EBCDIC of CP037 is used.
                            "DefaultMapEuro" is equivalent to CP1140, and converts ebc-9f to u-20ac (Euro Sign) instead of u00a4 (Currency Sign).
                            If "CONVERTER 0" is specified, only DEFAULTMAP/DeafultMapEuro can be specified, and DBCS_CHARSET cannot be specified.
                            If "CONVERTER 1" is specified, there are converters that define 9f-->20ac (Euro sign)
                            such as cp037 vs cp1140 and cp273 vs cp1141, so use them appropriately.

                              e.g.
                                 SBCS_CHARSET DefaultMap
                                 SBCS_CHARSET CP1140     #ICU; EBCDIC 037+Euro
                                 SBCS_CHARSET 500        #Windows codepage;IBM EBCDIC International

           LOCAL_CHARSET  : Name of the Unicode<-->PC code page converter.
                            If not specified, the code page will be determined from the environment variables, etc.
                            e.g.
                                LOCAL_CHARSET   437          # Windows codepage
                                LOCAL_CHARSET ISO-8859-1     # Linux, iconv converter

           MAP_E2A/MAP_A2E: Specifies the SBCS conversion adjustment for each code point.
                            Only effective when "CONVERTER 0" is used.
                            E2A is effective for "CV b2m" but has no effect on "CV m2b".
                            The inverse of E2A must be specified separately by A2E.
                              e.g.
                                 MAP_E2A  0xa0:0xaf
                                 MAP_A2E  0xaf:0xa0
                                 MAP_E2A  0xa1:~      # EBCDIC 0xa1 -> ASCII tilde
                                 MAP_E2A  0xa1:u0101  # Uxxxx format is target of MAP_E2A only
                                 MAP_A2E     ~:0xa1   # EBCDIC 0xa1 <- ASCII tilde

                            If you use SJIS_OPT KANA_EXT/ENG_EXT, you need to be careful when you set Ascii>0x80 in E2A.
                            In ShiftJis, 0x81<-->0x9f, 0xe0<-->0xfc, and in EUC-JP, 0xa1<-->0xfe are defined
                            as the first byte of Japanese DBCS, so if you M2B the output of B2M, there is a possibility that SBCS will be mistakenly converted to DBCS.
                            For example, the following example is thought to be intended to swap ebc-15 and ebc-25.
                                MAP_E2A   0x15:0x0a
                                MAP_E2A   0x25:0x85
                                MAP_A2E   0x0a:0x15
                                MAP_A2E   0x85:0x25
                            However, since 0x85 is the first byte of SJIS DBCS.
                            So, the B2M output 0x85xx is considered Japanese DBCS when combined with the byte following 0x85.

           SUBCHAR_0a     : Controls the output of 0x0a (line feed code) when converting to a PC code page.
                            Parameter valid only for pattern (b)-Use external converter above.
                            1 : Replace 0x0a with converted SBCS alternative character.
                            0 : Output 0x0a as it is. (Default value)

           SUBCHAR_S2D    : When converting to a PC code page, is EBCDIC SBCS->multibyte conversion allowed?
                            This parameter is only valid for pattern (b)-Use external converter.
                            For example, in cp037, ebc-a7==>u-00a7, but in cp932, "CV b2m" results in u-00a7==> 0x8198 (Double-Byte).
                            If you set "SUBCHAR_S2D 1", it will be u-00a7==> '?'.
                            1 : Replace with SBCS alternative character.
                            0 : Allow multibyte output. (Default value)

           SJIS_OPT       : Specifies SJIS conversion options.
                            This parameter is only valid for pattern (a)-Use internal table.
                            The XCV/CV command has optional parameter with the same effect,
                            and the command parameter specification takes precedence.

                            ENG_EXT: Japanese English Lowercase Extended (CP939=CP300+CP1027)
                            KANA_EXT: Japanese Katakana Extended (CP930=CP300+CP290)

                            IBM: Maps EBCDIC kanji to the SJIS-IBM area (default value)
                            NEC: Maps EBCDIC kanji to the SJIS-NEC area
                            JIS78: SJIS 1978 version
                            JIS83: SJIS 1983 version (default value)
                              e.g.
                                 SJIS_OPT      NEC
                                 SJIS_OPT      JIS78
                                 SJIS_OPT      KANA_EXT

           SOSI_A2E       : SO/SI setting option when converting DBCS to EBCDIC.
                            The XCV/CV/SAVe/REPlace/COPy/... commands have optional parameters with the same effect which take precedence.
                            The default value is INS.

                            INS  : Inserts SO(0xe), SI(0x0f). The output is expanded.
                            REP  : Replaces spaces before and after the DBCS string if there are any, inserts if not.
                            SHIFT: In addition to REP, absorbs the expansion caused by the insertion by deleting the trailing spaces.
                              e.g.
                                 SOSI_A2E      REP

           SOSI_E2A       : How SO/SI is handled when converting DBCS from EBCDIC.
                            Commands such as XCV/CV/SAVe/REPlace/COPy/... have optional parameters with the same effect which takes precedence.

                            DEL : Delete SO/SI. This shortens the output line length.
                            REP : Replace SO/SI with ASCII space (default).


      ## Sample file ##

        xeebc.map                                                     ~
        ###########################################################################
        # CONVERTER             1     # 0:Internal Table, 1:ICU, 2:iconv/WindowsAPI||+v124R
        # ICU_DLL_SUFFIX      44      # ICU dllname suffix
        # ICU_API_SUFFIX      _44     # ICU apiname suffix
        # DBCS_CHARSET        cp939   #(Linux)EBCDIC Japanese English lower-case letter extension.~||+v124R
        # SBCS_CHARSET        cp037   #(ICU)EBCDIC-US               ||+v124R
        # SBCS_CHARSET        37      #(Windows)ECDIC-US              ||+v124I
        # SBCS_CHARSET DefaultMapEuro # for "Converter 0"
        # LOCAL_CHARSET    ISO-8859-1 #(Linux)Latin-1                 ||+v124R
        # LOCAL_CHARSET    28591      #(Windows Codepage) for ISO-8859-1||+v124R
        #
        #     SJIS_OPT     ENG_EXT    # ENG_EXT/KANA_EXT
        #     SJIS_OPT     NEC        # IBM/NEX/JIS78/JIS83
        #
        #     MAP_E2A    0xa2: 0x5c   # Yen sign and backslash
        #     MAP_A2E    0x5c: 0xa2   #
        #     MAP_E2A    0xa1: ~      # tilde and upper bar
        #     MAP_A2E       ~: 0xa1   #
        #     MAP_E2A    0xa0: ?      # tilde and upper bar
        #
        #   SOSI_A2E   INS            # INS/REP/SHIFT
        #   SOSI_E2A   DEL            # DEL/REP
        #   SUBCHAR_0a     1          #1/0 replace by SBCS substitution char.
        #   SUBCHAR_S2D    1          #1/0 replace converter output by sub-char when SBCS is translated to pc-DBCS.
        ##################################################################################||~v124R
        #
        ## Use Internal mapping table
        #
        #        CONVERTER          0
        #        SJIS_OPT           KANA_EXT
        #        MAP_E2A            0x15:0x0a
        #
        ## Use ICU , DBCS codepage
        #
        #        CONVERTER          1
        #        DBCS_CHARSET       cp930
        #
        ## Use ICU , SBCS codepage
        #
        #        CONVERTER          1
        #        SBCS_CHARSET       cp037
        #
        ## Use iconv (Linux)
        #
        #        CONVERTER          2
        #        SBCS_CHARSET       cp1047
        #
        ##################################################################################