EBCDIC file support-E

2025/01/20


   .EBCDIC file support.

      -CPEB option is used for EDIt/BROwse/SAVe/REPlace/CREate/APPend/CUT/END/COPy/MOVe/PASte cmd.
         e.g. "e file1 cpeb:IBM1047"
       The code page of the cpeb option is stored in the profile, so you only need to specify it at first time.
       To reset the profile record, specify null, like "CPEB=".

       When opening a directory list, the CPEB option is the default for files in the folder.
       If you open with other options (CPLC/CPU8/CPAS), the profile will be overwritten.

       The embedded XCV utility is a tool similar to Linux's iconv that converts EBCDIC<-->Unicode<-->locale code.
       See the next "EBCDIC translation" about cfg file(default value is ::xeebc.map).
       You can use multiple converters by specifying "Converter 1" (ICU) or "Converter 2" (CodePage on Windows, iconv on Linux) in the cfg file.

      -EDIt/BROwse displays/edits directly without converting to locale code.
       Displays codes in Unicode on screen, and in HEX display displays the EBCDIC code itself.
       The line feed code is 0x15 unless /Mp(0x0d0a) /Mu(0x0a) is specified.
       It is intended for editing mainframe EBCDIC files downloaded without conversion,
       or for editing/displaying EBCDIC files mounted by NFS with the no conversion option.
       It can also be used if you have access to Linux on IBM Z via sshfs.
       (About sshfs, see "Remote file access".)
       The header line displays "=E", and in binary mode, "=e".
       DBCS is not taken into account when opening in binary mode (EB command).
       If there are too many control characters and it is determined to be binary file,
       open with the ET command or with /Mt option.

      -If you experience broken-gryph when using -wxe, try changing the charset in the Setup dialog to something other than Default.

      -Line feed character.
       PC files are 0x0a or 0x0d0a, but the EBCDIC line feed code is 0x15.
       The following commands support line feed code and record mode LRECL specification.
       However, SAVe can only be available in the case of specifying a file name.
           EDIt/BROwse, END, CREate/REPlace, SAVe, COPy/MOVe.
       When writing to the file by END/SAVe after "CV b2m", change the line feed character with /Mu, etc.
       If you simply END/SAVe, the EBDCIC line feed character x15 will be written as is.

       The Fxx in the COPy/MOVe command specifies the LRECL of the source file to copy.
       The record mode ON/OFF and LRECL are saved in the profile,
       but the line feed character must be specified each time, except for the default value.
           /M{t|p|u|m|e|r} /Fnn[-mm]
             e:EBCDIC-NL(0x15), r;RecordMode, nn:LRECL
       e.g.
           e ebcf1 CPEB                        : EBCDIC file TextMode(EOL-ID=0x15)
           e ebcf1 CPEB  /mp                   : EBCDIC file TextMode(EOL-ID:0x0d0a)
           e ebcf1 CPEB  /mr /F72              : EBCDIC file RecordMode(LRECL=72)
                                                 Default LRECL is 80.
           end /mu                             : Change EOL-ID to 0x0a.
           end /mr /f80                        : Save as LRECL=80 RecordMode file(no EOL-ID).
           s ebcf1 CPEB /mr /F80               : translate (From PC file) to EBCDIC file.
                                                 Output is LRECL=80 RecordMode file.
           rep u8f1 nx CPU8 /mp                : translate (from RecordMode EBCDIC file) to UTF8 file.
                                                 EOL-ID:0x0d0a is appended to each line.

       The EBCDIC line feed code is 0x15, but the converter may map it as follows:
       EBC-NL(0x15)<-->ASCII-0x85, EBC-LF(0x25)<-->ASCII-LF(0x0a).
       When opened the file by text mode, lines are split at 0x15 and 0x15 is not displayed.
       (If you also want 0x0d15 to be a line feed, specify the /Mz option.)
       When text mode, line feed chars are not displayed, so they are not subject to conversion by the CV command.
       However, if you open the file in binary mode and run "CV b2m", those will be converted as 0x15->0x85 (0x0d will remain unchanged).
       If 0x85 is not printable as a locale code, it will be converted to '?'.
       If you are not using an external converter ("CONVERTER 0"), you can use two correction options to convert 0x15-->0x0a
          cv b2m crlf / cv m2b crlf
       Or, in the cfg file, specify
          MAP_E2A  0x15: 0x0a   # EBCDIC 0x15(NL) -> ASCII 0a(LF)
       In the case of M2B conversion, if opened in text mode, 0x0d0a/0x0a will not be converted,
       and 0x15 will be output with the "e" of the output option /Mpe,
       but in binary mode, the CRLF option and MAP_E2A are effective, and 0x0a-->0x15 conversion is enabled,
       but 0x0d0a will become 0x0d15.
       With the -m2b option of the xcv command, -Mseteol can be specified to convert 0d0a/0a->0x15.
       Without Mseteol, 0d0a/0a will be ignored and 0x15 will not be output.
       If you want both 0x0d15 and 0x15 to be line feede characters in B2M, specify the crlfz option.

         :====>See below: (e.g.01) LineFeed character.

      -If you do not use a configuration file, the internal conversion table supports only one of the following, and B2B conversion is not possible.
       Specify this with a command line parameter.
       If you specify "CONVERTER 0" in the cfg file, you can also set the same option by SJIS_OPT.

        xe /EBC[=]{KANA_EXT | ENG_EXT | DefaultMap | DefaultMapEuro | cfg=filepath}
                KANA_EXT(=CP930=CP290+CP300)  : Japanese extension, Katakana
                ENG_EXT(=CP939=CP1027+CP300)  : Japanese extension, alphanumeric lowercase
                DefaultMap(=CP037)            : Latin-1, No Support of Kanji
                DefaultMapEuro(=CP037+Euro)   : ebc-9f:u-20ac
                cfg=filepath : The default value is specified in the ini file.

       The default values are as follows:
         (Windows)
           In a Japanese environment, the code page is displayed on the screen as CP930 and JIS83.
           In other environments, the code page is CP037 and No DBCS (SO/SI) consideration.
         (Linux)
           In a Japanese environment, the code page is displayed on the screen as EUC-JP (=CP930) and JIS83.
           In other environments, the code page is CP037 and No DBCS (SO/SI) consideration.

         :====>See below: (e.g.02) Example of internal conversion table.

      -COPy/MOVe/PASte commands convert from the specified code page to the destination code page,
       if you don't want conversion, specify the "B" (Binary mode) suffix on the line command
       (see Line Commands below)

         :====>See below: (e.g.03) Copy/Move/Paste example.

      -SAVe/REPlace/CREate/APPend/CUT/END convert to the specified code page, then written to the file.
       END command does not output a file if there are no changes, so specifying CPEB is meaningless.
       COPy/MOVe/PASte commands with a specified file name use the profile record of that file.
       If a record exists, there is no need to specify CPxx.
       However, if you change the CONVERTER parameter in cfg from ICU to something other than ICU
       or vice versa over session close, the naming will be different, which may result in the code page not being found.
       If you get an "invalid code page" error, open the file with "CPEB=" (null specification) and reset the code page once.

       Even if there is a conversion error in the output command, it will be output to the end of the file, but the profile will not be updated.
       Conversion error characters will be replaced with alternative characters.
       Alternative characters depend on the code page.
       If no DBCS alternative characters are defined, the following will be used in this order:
       u-fffd, u-ff1f (DBCS "?"), u-3000 (DBCS space).

         :====>See below: (e.g.04) Example of Save/Replace/Cretae/Append/Cut/End.

      -The line commands "C"/"M" will also convert if the source and destination code pages are different.
       If you do not want to convert, specify the "B" (Binary mode) suffix.
       Binary mode does not support UTF8 files for either the source or destination.
       When converting locale file to EBCDIC, spaces caused by TAB (0x09) expansion will be deleted.

        Line Commands: Destination Line Format
          {A|B}[B][C][n][,b][.s]
            A : After, B: Before, B : Binary Mode, C : Copy with CID
            n : repeat, b : bandle, s : skip

       The line command "=" will also convert the source and destination code pages if they are different.
       (When comparing with UTF8 and locale code files, EBCDIC-->UTF8 is compared after locale conversion)
       If you want to compare in binary mode, specify the "B" (binary mode) suffix.

        =[B][n][,b][.s]
        ==[B]

         :====>See below: (e.g.05) Example of line command.

      -The B>CV command can use B2M/M2B for CPLC (Locale code) files.
       When "CONVERTER 0" is specified in the cfg file, Japanese conversion options such as JIS83 can be specified.
       When using ICU/iconv, various EBCDIC code pages can be used
       by specifying the CPEB:ebcidc-codepage parameter in B2M/M2B.
       When performing B2M for an EBCDIC file opened, omitting CPEB will use the code page of that file.
       If you want to change the code page of an EBCDIC file, use the B2B option.
       However, because the code page attribute of the file is not been changed, the screen display shows the codes in the original code page.
       To switch to a new code page, specify the SETCP option.

         :====>See below: (e.g.06) CV command example.

      -EBC command.

       You can check the current cfg file settings by "EBC".
       You can also change the code page with "EBC SETCP".
       With SETCP the original code is not changed. It is displayed according to the specified code page.
       Therefore, if there are no other changes, the file will not be rewritten by a save operation.
       The UNDo function will revert the code page set with the SETCP option of the EBC/CV command.
       The code page set with the SETCP option is recorded in the profile when you save a file.
       "EBC SETCP=" will return to the default code page.

         :====>See below: (e.g.07) Example of EBC command.

      -xcv Utility
       xcv is a command line tool included in the xe package that runs in the command prompt/Terminal emulator.
       In B2M/M2B, "M" is the current local code page, and "B" is specified with /MF:mapfile or /CPEB:cp.
       If you want to change the EBCDIC code page, use B2B.
       The /Mseteol option enables you to add a line feed code when converting from PC to EBCDIC.
           /mASCEOL: set line feed code of the EBCDIC file to Windows:0x0d0a, Linux 0x0a. Default:0x15.
           /mSETEOL: Adds an line feed code to each output line. For x2B and record mode B2x.
       The fixed length LRECL is specified with the /F option.
           /F[nn][N]:EBCDIC file is fixed length. nn:LRECL(default=80).
                     LRECL of B2x (input) or x2B (output) for EBCDIC file.
                     N: The last 8 digits of the line are the line number field. Adjust the line length by inserting SO/SI.
                        "N" is only valid for m2b.

         :====>See below: (e.g.08) Example of xcv command.

      -About DBCS (Double-Byte Character Set)

       In EBCDIC, double-byte characters are enclosed by SO(0x0e) and SI(0x0f). For example, 0e-40-40-0f is a DBCS space.
       SO/SI encloses the whole string, not necessarily one character at a time.
       In non-EBCDIC code pages, except UTF8, DBCS is a combination of a specific leading byte (>=0x80) and the next character.
       It can be 3 bytes (EUC-SS3), or 4 bytes (GB18030). For example, the DBCS space is 81-40 in MS932.
       Therefore, line length changes in EBCDIC<-->locale code page conversion.
       DBCS enclosing char(SO:0x0e and SI:0x0f) are displayed as "?".
       SO/SI is inserted when typing DBCS.
       When "Cut and Paste" from an EBC file, only when it is enclosed in SO/SI (even bytes only) will it be evaluated as DBCS.
       For example, when pasting into a UTF8/locale code file, the code will be converted, but if the SO/SI are not brought, it will be converted as SBCS.
       After splitting a line at a DBCS position with the Split line command,
       joining the lines with the Join line command can be evaluated as DBCS, but the TFLow command inserts a space
       between the lines, so it will not go back to the original.
       Invalid DBCS-EBCDIC is displayed on the screen as ":;". Check the hex code in vertical HEX display.

       Specify the SO/SI handling in the command options.
       The default value is specified in the cfg file. The Copy line command is lacking parameters of SO/SI option, so the cfg file is applied.
         e.g. COPy ebcfile -sd  : When copying an EBCDIC file into the PC file screen, SO/SI will be deleted and the line length will change.
              COPy ebcfile -sr  : When copying an EBCDIC file into the PC file screen, SO/SI will be replaced with spaces, and the line length will not change. Default value.
              COPy pcfile  -si  : When copying a PC file into an EBCDIC file, SO/SI will be inserted, and the line length will change. Default value.
              COPy pcfile  -sr  : When copying a PC file into an EBCDIC file, SO/SI will be inserted,
                                  but if there are spaces before SO or after SI, that spaces will be deleted.
       In cfg, set the default value.
           SOSI_A2E     REP       # REP(="-sr") or INS ("=-si")
           SOSI_E2A     DEL       # DEL(="-sd") or REP ("=-sr")

      -0x09 is not treated as a tab. (0x05 is also not treated as a tab)
       When in character insert mode, the Tab key does not insert 0x09 but jumps to next tab position like in character replace mode.

      -In HEX input mode (Ctrl+F11), DBCS is not taken into account and each byte is sent to the screen.

      -Find/Change command.
       Search and replace in EBCDIC code regardless of kbd's UTF8 mode.
       DBCS is not taken into account when opened in binary mode.
       Option: -g (Grep) and search-word: P'.' are not supported.

      -TFLow cmd
       When lines are concatenated by TFLow command, a space is inserted at the concatenation point,
       so that there is an odd number of chars, then usually the SO on the previous line and the SI on the next line will not reconstruct DBCS.

      -The "#" line command (which executes what is on that line as a native command) is not supported.

      -About fixed length records.

       For the XCV command, if the /F[xx][N] command line option is specified (the CV command does not have the /F option),
       when converting from PC to EBCDIC, long sentences are split and short sentences are padded with spaces up to the LRECL.
       Also, newline characters are not written.
       The N option in /F[xx][N] specifies that the last 8 chars of the line are the line number.
       When a line is extended by inserting SO/SI during DBCS conversion, if there are spaces before the line number field
       it will delete that spaces to keep the LRECL. If it is not possible
       the line will be split, but if N is specified the line number field will be truncated.
       When converting from EBCDIC to PC, fixed-length files are read in binary mode (ignoring line feed codes),
       converted, and output with line feed codes added.