characters and codepages with SFK for Windows:
SFK uses 8-bit character codes with a possible
range of 255 different characters. see: sfk ascii
character codes 32-126, or hexadecimal 0x20-0x7E,
are 7-bit ASCII characters. within SFK they are
called "Low Codes", or LoCodes. as long as you
use only a-z A-Z 0-9 !"#$%&_ etc. you use LoCodes,
which will work the same on every computer in the
world, and you can ignore code pages.
but as soon as you want to use accent characters,
umlauts, cyrillic, greek etc. you need HiCodes
in the range 0x80-0xFF. these are dependent on the
codepages of your Windows system, and you can only
use chars of your own language, plus English.
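the LoCode / HiCode split above can be sketched in a few lines of Python. this is only an illustration, not part of SFK: the function name classify_codes is hypothetical, and cp1252 is assumed as the Ansi codepage.

```python
def classify_codes(data: bytes) -> dict:
    """Count LoCodes (0x20-0x7E) and HiCodes (0x80-0xFF) in a byte string."""
    counts = {"lo": 0, "hi": 0, "other": 0}
    for b in data:
        if 0x20 <= b <= 0x7E:
            counts["lo"] += 1      # 7-bit ASCII, same on every codepage
        elif b >= 0x80:
            counts["hi"] += 1      # meaning depends on the codepage
        else:
            counts["other"] += 1   # control codes such as CR, LF, TAB
    return counts

# assumption: cp1252 stands in for the Ansi codepage of the system
plain = "hello world".encode("cp1252")
accented = "Müller".encode("cp1252")

print(classify_codes(plain))
print(classify_codes(accented))
```

text that yields a "hi" count of zero can be passed around without thinking about codepages.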
your Windows CMD.EXE command line uses two codepages:
1. ANSI codepage for data processing.
every text within SFK is encoded in this codepage.
Most text editor programs like Notepad will
use this codepage by default.
2. Dos/OEM codepage for input and display.
what you type on your keyboard is encoded in this
codepage, e.g. 850 on Western European systems.
the CMD.EXE terminal can only display HiCodes in
this codepage correctly.
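to see why the two codepages matter, the following Python sketch prints the byte value of a few German characters in both. it assumes a Western European system with Ansi codepage 1252 and OEM codepage 850; other systems use different pairs.

```python
import unicodedata

ANSI = "cp1252"  # assumption: Western European Ansi codepage
OEM = "cp850"    # assumption: Western European OEM codepage

for ch in "äöüß":
    ansi_byte = ch.encode(ANSI)[0]
    oem_byte = ch.encode(OEM)[0]
    # print the Unicode name instead of the character itself,
    # so the output is readable on any terminal
    print(f"{unicodedata.name(ch)}: Ansi 0x{ansi_byte:02X}, OEM 0x{oem_byte:02X}")
```

the same character gets a different HiCode in each codepage, which is why a conversion is needed whenever text moves between the terminal and data processing.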
HiCode conversions step by step:
- when you run sfk, and pass parameters, these are
converted from OEM to Ansi and then given to sfk.
so sfk gets only Ansi encoded parameters.
- within SFK all data processing is done with Ansi,
e.g. filter ... +xed ... will pass Ansi text.
- when printing text to terminal, SFK converts it
from Ansi to OEM for output. otherwise HiCodes
would all look wrong, as the terminal needs OEM.
- when writing text output to file, like
filter ... >out.txt
filter ... +tofile out.txt
it is written as Ansi, without any conversion.
you can then open out.txt with Notepad
or Depeche View, which expect Ansi text,
and HiCodes will display correctly.
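the conversion steps listed above can be modelled roughly as follows. this is a sketch in Python, not SFK's actual code: the helper names oem_to_ansi and ansi_to_oem are hypothetical, and codepages 1252 and 850 are assumed.

```python
ANSI = "cp1252"  # assumption: Western European Ansi codepage
OEM = "cp850"    # assumption: Western European OEM codepage

def oem_to_ansi(raw: bytes) -> bytes:
    """Parameters typed in the terminal are OEM; sfk receives them as Ansi."""
    return raw.decode(OEM).encode(ANSI, errors="replace")

def ansi_to_oem(raw: bytes) -> bytes:
    """Text printed to the terminal is converted from Ansi back to OEM."""
    return raw.decode(ANSI, errors="replace").encode(OEM, errors="replace")

typed = "Grüße".encode(OEM)            # bytes as typed on the command line
parameter = oem_to_ansi(typed)         # what sfk works with internally
on_terminal = ansi_to_oem(parameter)   # what is sent to the terminal
in_file = parameter                    # written to out.txt without conversion

print(typed.hex(), parameter.hex(), on_terminal.hex(), in_file.hex())
```

the terminal bytes match what was typed, while the file holds the Ansi bytes that Notepad expects.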
Beware of HiCodes within batch files.
- if you run SFK interactively like:
sfk filter in.txt -+myword
and myword contains HiCodes, you type them
all as OEM chars, and it works.
- if you create a batch file with Windows Notepad,
and therein type
sfk filter in.txt -+myword
and myword contains HiCodes, you will find that
filter no longer finds the word.
this is because Notepad created an Ansi encoded
text file, so the "myword" chars are Ansi encoded.
what happens?
- CMD.EXE still thinks "myword" is OEM,
and incorrectly "converts" it to Ansi,
which actually breaks all HiCode chars.
- sfk.exe then gets myword with completely
wrong encoding, and the search fails.
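the failure described above can be reproduced in Python by reading Ansi bytes as if they were OEM. the word "Käse" and the sample file content are made up for illustration, and codepages 1252 and 850 are assumed.

```python
ANSI = "cp1252"  # assumption: Western European Ansi codepage
OEM = "cp850"    # assumption: Western European OEM codepage

word = "Käse"
saved_by_notepad = word.encode(ANSI)   # bytes stored in the .bat file

# the Ansi bytes are taken for OEM and "converted" to Ansi once more
received_by_sfk = saved_by_notepad.decode(OEM).encode(ANSI, errors="replace")

file_content = "Brot und Käse".encode(ANSI)   # in.txt, Ansi encoded

print(saved_by_notepad.hex())
print(received_by_sfk.hex())
print(received_by_sfk in file_content)   # the search word no longer matches
```

only the HiCode byte changes; the LoCode letters K, s and e survive, which is why plain English search words keep working in batch files.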
how to fix this:
- write your .bat files with OEM encoding.
this can be done with Notepad++:
- create a new file mytest.bat
- select: Encoding / Character Set / your area,
then select your OEM codepage.
- now type sfk commands into the batch file,
and save it.
- side effect: if you create sfk scripts
embedded in such a batch file, like:
sfk batch mytest2.bat
searches therein will fail again if the file
is OEM encoded, because by default "sfk script"
wants to load Ansi text. to fix this, use
option -dos like: sfk script -dos ...
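if Notepad++ is not at hand, an OEM encoded batch file can also be generated by a small script. the following Python sketch is one possible way, under the assumption that the OEM codepage is 850; the function name write_oem_batch and the search word are hypothetical.

```python
import os
import tempfile

def write_oem_batch(path: str, lines: list, oem: str = "cp850") -> None:
    """Write a batch file in the given OEM codepage with CRLF line ends."""
    # newline="\r\n" makes Python write every "\n" as CR LF
    with open(path, "w", encoding=oem, newline="\r\n") as f:
        for line in lines:
            f.write(line + "\n")

# demo: write into a temporary folder and show the raw bytes
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "mytest.bat")
write_oem_batch(demo_path, ["@echo off", "sfk filter in.txt -+Käse"])

with open(demo_path, "rb") as f:
    raw = f.read()
print(raw)
```

on a real system, check the active OEM codepage first (the chcp command prints it) and pass the matching codec name instead of the default.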
What is not possible?
SFK cannot process any text outside your Ansi codepage.
for example, if a computer uses Western Europe
codepage 1252, it is possible to search German umlauts
and some French accent characters. but it is impossible
to search and filter cyrillic text (encoded in 1251),
and it will even be impossible to type cyrillic chars
in the first place, as the keyboard has no such keys.
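this limit can be checked for any given text with a short Python sketch. the helper name fits_codepage is hypothetical, and the codec names cp1252 and cp1251 are assumed to correspond to the Windows codepages 1252 and 1251.

```python
def fits_codepage(text: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of text exists in the codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

print(fits_codepage("Grüße"))             # German text, Western Europe codepage
print(fits_codepage("привет"))            # cyrillic text, Western Europe codepage
print(fits_codepage("привет", "cp1251"))  # cyrillic text, cyrillic codepage
```

text that does not fit the Ansi codepage has no 8-bit representation there, so there is nothing that a search or filter could match.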
see also:
sfk help nocase about case insensitive search
sfk help unicode unicode to Ansi conversion


