the way of sfk - command line text file processing

how to speed up your working process by use of the command shell, a set of batchfiles, and the swiss file knife.

NOTE: this is additional documentation. for the primary syntax help, run "sfk" without parameters.

sfk is not just a tool, but an implementation of working process principles, namely:

  • realtime search and analysis of ASCII text files (usually sourcecode).
  • power editing through cluster files, trial and backup development.
  • dynamic source patching (how to make local changes permanent without ever checking in).
  • structuring work by batchfiles.

    which will be discussed here in a short overview. furthermore, this document contains

  • sfk grep, detab, filter, run examples.
  • sfk general command syntax.
  • sfk run syntax.
  • sfk patch syntax.
  • sfk instant ftp server.
  • windows vs. linux syntax differences.

    but first of all, let's talk about the command line principle, which is a prerequisite for using sfk efficiently.

    01: the shell-based working process

    The average developer is using some kind of integrated development environment (IDE). An IDE provides many comfortable features for basic tasks, and can be sufficient for small projects; but as soon as you need to automate a couple of steps, especially massive file operations, it gets complicated - if not impossible.

    Therefore I will now talk about the Windows (XP/NT/2K) Command Shell (more about Unix shells later). If you shiver just by hearing this term, your only contact with the shell so far was probably this:

         Start Menu -> Run -> cmd.exe

    By default, you get an uncomfortable, inefficient shell, about 80x25 characters, wide font, and without some important mouse support for text marking and insertion. Furthermore, just opening the shell via "Start Menu" is way too complicated. So let's do some configuration:

  • create a shell desktop icon for immediate access. to do so, open Start Menu -> Programs -> Accessories, then right mouse button over the Command Prompt icon, select "copy". now left-click onto some empty space on the desktop, select "paste". a new icon appears.
  • right-click on this icon, select the properties.
  • layout: set screen buffer size: width 120, height 3000. this means, whatever text is listed, the shell will allow you to view the last 3000 lines, so no important outputs will get lost.
  • layout: select window size: width 120, height 25. this tells how much of the buffer is displayed by default, it can be changed afterwards by resizing the window.
  • options: activate QuickEdit Mode (and Insert Mode), if not done already. this essential option allows you to mark text in the shell either by free selection (keeping left button pressed) or by double-click (selecting a whole word), to copy this into clipboard by right button, and to insert clipboard content by right button (if nothing is marked up in the shell).
  • font: select a more compact font size, I recommend "7x12".
  • now select the Shortcut tab. extend the Target expression to

         %SystemRoot%\system32\cmd.exe /K c:\batch\init.bat

    which tells the shell to always run init.bat on startup.

  • finally, close the shell properties, and create a directory and file "c:\batch\init.bat" with this content:

         set PATH=%PATH%;c:\batch

    the efficient shell: first contact

    now double-click on the new icon. a shell window opens. at first, learn how to move quickly: few people may have noticed, but command auto-completion by pressing the TAB key has become a standard with the windows command shell. to enter the directory C:\batch, type this:

         C:   (+ENTER KEY)
         cd \   (+ENTER KEY)
         cd ba

    and do NOT press ENTER after "ba", but simply press the TAB key. your command should be autocompleted to:

         cd batch

    and then you can press ENTER. if there happens to be another dir starting with "ba", e.g. "baba", you may get "cd baba" at first. don't mind, just press TAB again - sooner or later, "cd batch" will be listed.

    now, this isn't highly spectacular yet, but once you try to walk into a directory like

         D:\work100\TheProject\BaseLib\CoreDriver\include\

    it is a huge difference if you try to type the whole expression (including approx. 3 typos and retries), or if you simply type:

         cd wo(TABKEY)\th(TABKEY)\ba(TABKEY)\(TABKEY)\(TABKEY)

    note that you're not even required to type any word at all - for the last two parts of above expression, I just typed \ and then TABKEY already. this way, the shell simply lists the first directory (or file) available.

    if your window has no auto-completion, you may be using an older Win2K, in which this feature is available, but not active by default. to activate, say

         regedit
         search HKEY_CURRENT_USER\Software\Microsoft\Command Processor
         set CompletionChar to value 9

    from now on, every new shell opened supports autocompletion. As for unix users: well, most unix shells support autocompletion by default, and have a comfortable working layout, so there should be no configuration effort.

    now, what do we have?

  • a shell in which we can navigate nearly as quick as in explorer.
  • but we can also run every command-line tool instantly, just by typing its name. of course we have to copy the tool into C:\batch first, or alternatively, we may extend the PATH in C:\batch\init.bat to include the tool.
  • and we can extend our working environment anytime, by the creation of new batch files in C:\batch.

    First of all, download sfk, and copy sfk.exe to C:\batch.

    And from now on, stay in the shell. Whatever follows now, I expect that you have permanent command line access.

    02: realtime search and analysis of sourcecode.

    If you're working with thousands of source files, you often have to look something up. This process is sped up massively by the creation of "snapfiles", for example:

          sfk snapto=all-src.cpp -dir TheProject -file .c .h .cpp .hpp .xml

    This command collects all source files from the directory tree "TheProject" into one large text file, "all-src.cpp". Now you can load this file into your favourite text editor, and perform high-speed lookups across all content, with less than 5 seconds per lookup on a current machine (Pentium IV etc.).
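
    The snapfile principle can be sketched in a few lines of Python. This is only an illustration of the idea, not sfk's implementation: the function name make_snapfile and the ":file:" header line written before each file are assumptions made for this sketch, and the layout sfk actually writes may differ.

```python
import os

def make_snapfile(root, extensions, out_path):
    """Concatenate all files under root whose names end with one of
    the given extensions into a single text file. Returns the number
    of files collected."""
    count = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, "w", encoding="utf-8", errors="replace") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.abspath(path) == out_abs:
                    continue  # never include the snapfile itself
                if not name.lower().endswith(tuple(extensions)):
                    continue
                with open(path, "r", encoding="utf-8", errors="replace") as src:
                    text = src.read()
                # hypothetical header line marking where a file starts
                out.write(":file: " + path + "\n")
                out.write(text)
                if not text.endswith("\n"):
                    out.write("\n")
                count += 1
    return count
```

    For example, make_snapfile("TheProject", [".c", ".h", ".cpp", ".hpp", ".xml"], "all-src.cpp") would roughly correspond to the sfk command above.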

    This principle can be optimized into pingpong reverse tracking:

  • you open the same snapfile in two windows (e.g. with the commercial text editor UltraEdit, use the command "Duplicate Window")
  • then you arrange both windows parallel on the screen (UltraEdit: tile vertical)
  • now, start searching something in the left window. once you're in a source spot of interest, you may want to lookup something else, e.g. a method name which is called from there.
  • now simply change to the right window, and do your next search. this way you keep one eye on the original spot, and one on the next.
  • now you may want to research a third term. ok, you only have two windows, so change back to the left, and search there again.
  • therefore it's called ping-pong: you change between a left and right window, step by step proceeding through a huge source base.
  • this principle is efficient with a snapfile only. if you were loading each local sourcefile one by one, you would soon end up with hundreds (thousands?) of opened windows, losing any overview.

    03: power editing through cluster files.

    "If I jam all those files together into a snapshot... why can't I change these contents directly in the snapshot?". An intriguing question indeed. To make this possible, sfk would have to

  • scan the snapshot file permanently for changes.
  • isolate changes, write them back to the target files.
  • scan all target files as well for changes, and if there are, re-integrate them into the snapshot.

    we cannot really do this with a huge snapshot containing the contents of thousands of files. the system would break down if we check a thousand files for a change each second... but on a smaller scale, it is possible. sfk calls this a clusterfile.

    to create a clusterfile, first think for a moment about which files you actually need in there for editing. select a file tree with a maximum of 200 files approx., then don't use snapto, but:

          sfk synctext=100-edit.cpp -dir TheProject/FooLib/CoreDriver -file .hpp .cpp

    why is it called synctext? sfk collects all source files from the mentioned directory into 100-edit.cpp, and then it doesn't exit, but continues to run in sync mode. in this mode, sfk does exactly what was mentioned above: check both the cluster (100-edit.cpp) as well as all target files for changes, and sync them either "down" (from cluster to targets) or "up" (from targets into cluster, if targets were changed directly).
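
    The "down" direction of this sync can be sketched as follows. This is a simplified illustration, not sfk's code: it assumes a hypothetical cluster layout in which each file section starts with a ":file: path" header line, and it takes the file read and write operations as parameters so that the logic stays independent of the disk.

```python
def split_cluster(cluster_text, marker=":file: "):
    """Split cluster text into {path: content}, using the (assumed)
    header lines that mark where each file section begins."""
    sections = {}
    current = None
    for line in cluster_text.splitlines(keepends=True):
        if line.startswith(marker):
            current = line[len(marker):].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {path: "".join(lines) for path, lines in sections.items()}

def down_sync(cluster_text, read_file, write_file):
    """Write every cluster section that differs from its target file.
    Returns the list of target paths that were updated."""
    changed = []
    for path, content in split_cluster(cluster_text).items():
        if read_file(path) != content:
            write_file(path, content)
            changed.append(path)
    return changed
```

    An up-sync would run the comparison the other way round: re-read the changed target and rebuild the cluster text from all sections.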

    now load the cluster into your favourite editor, make some really global changes, and select save. you will notice a short information popup listing all targets concerned by your change. the target contents are updated automatically. it's the same as if you were selecting a function like "replace in files" from a text editor, recursively in a file tree.

    up-syncing

    but what happens if you load one of the target files directly in your editor, change and save it?

  • sfk will detect the change, for example, in CoreLib.hpp.
  • it loads CoreLib.hpp into memory, re-integrates the content, and writes a new 100-edit.cpp to disk.
  • your text editor should now auto-detect that 100-edit.cpp was changed, and offer you to reload the file.

    NOTE: cluster editing should be used ONLY if your text editor is capable of autodetecting changes in text files, and offers you to reload such files automatically!

    you also get a short popup info from sfk, saying "RELOAD CLUSTER NOW".

    trial and backup development

    let's say, you integrated a new, cool feature into 100-edit.cpp, and it's working fine. before you integrate the next feature, make a save point of your work this way:

  • end sfk syncmode by pressing ESCAPE (or CTRL+C on unix).
  • copy 100-edit.cpp 110-next.cpp
  • sfk synctext=110-next.cpp

    this way, you create a new revision of your codebase. sfk is synced onto this new revision. drop 100-edit.cpp from your editor, load 110-next.cpp instead, and continue working.

    now, you may jump back and forth between these revisions any time by stopping sfk, and re-syncing onto the other one. for example, if you changed lots of stuff in 110-next.cpp, but want to check again how 100-edit.cpp behaved, then

  • end sfk syncmode by pressing ESCAPE.
  • sfk synctext=100-edit.cpp

    sfk will automatically do a down-sync by default, which means all content of 100-edit.cpp is written out into the target files. after your test of this version, jump forward again through

  • end sfk syncmode by pressing ESCAPE.
  • sfk synctext=110-next.cpp

    and, again, all target file content will be overwritten by the new code.

    this is what I call trial and backup development - having both massive, global changes, but also a very easy and transparent local backup system.

    line number mapping

    so far, so fast - you warp over your sources, and change them in realtime. great! but once you compile, you may have to cope with compile errors, for example:

          CoreLib.cpp(6300) : error C2065: 'nCnt' : undeclared identifier

    unfortunately, the line number 6300 is not the line number in the cluster, but in a local target file. it must be mapped. to do so, write yourself a short compile batch like this:

         make yoursys.mak >err.txt 2>&1
         sfk maptext=110-next.cpp <err.txt

    sfk will read the compiler's error messages from err.txt, parse through it, and whatever looks like a filename and line number is mapped into a cluster location, and listed.
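
    The mapping itself is simple arithmetic, as the following Python sketch shows. It is an assumption-laden illustration rather than sfk's parser: it presumes a hypothetical cluster layout with a ":file: path" header line before each file, matches only messages of the form "name(line)", and identifies files by their base name, so two files with the same name in different directories would collide.

```python
import re

def build_offsets(cluster_text, marker=":file: "):
    """Return {file base name: cluster line number of its first line}."""
    offsets = {}
    for number, line in enumerate(cluster_text.splitlines(), start=1):
        if line.startswith(marker):
            path = line[len(marker):].strip()
            name = path.replace("\\", "/").split("/")[-1]
            offsets[name] = number + 1  # content starts after the header
    return offsets

# matches e.g. "CoreLib.cpp(6300)" anywhere in a compiler message
ERROR_RE = re.compile(r"([\w.\-]+)\((\d+)\)")

def map_errors(error_text, offsets):
    """Return (file name, local line, cluster line) for every message
    that refers to a file contained in the cluster."""
    result = []
    for line in error_text.splitlines():
        match = ERROR_RE.search(line)
        if not match:
            continue
        name, local_line = match.group(1), int(match.group(2))
        if name in offsets:
            result.append((name, local_line, offsets[name] + local_line - 1))
    return result
```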

    04: dynamic source patching.

    or: how to keep local changes, even across new codebases - without ever checking in.

    the primary reason why you may need local, permanent code changes is customizing the system for your own needs. imagine your company is making a security software, and you are working on a component in this software which tries to read something from a file. now let's say, whenever you change anything in your code, you have to do this:

    this whole process costs you about 60 seconds. even worse, it is boring and tiring, an extremely robotic, stupid task - but on every change in your code, you have to run through the whole process, again and again - 20 or 30 times a day.

    if you're a real crack, you start thinking about process optimization. you investigate into the source, and soon you find the authentication module, with its password check. with 3 lines of code, the check can be worked around. the same applies for the other stuff - with just a few lines of code, you manage to run the plugin loading and file selections fully automatically.

    in other words, with a handful of changes at the right places, your system runs fully automated to your point of interest. great!

    of course you do not intend to check these changes in - they're local optimization just to improve your working process. and maybe you cannot check them in, even if you liked to, as you have no write access to these modules.

    but a few days later, you may have to upgrade to a new codebase. you have to sync to the newest sources from the cvs depot, to get the latest features and fixes from your colleagues. and as you sync, your local changes get lost.

    now, what do you do? on the new codebase, you insert your code optimizations again - by hand. you

  • open file by file
  • search for a specific source line, or couple of lines. you do not jump to a fixed line number, because new code may have been inserted, and all line numbers are invalid now. instead, you search for a source pattern.
  • then you replace it with a couple of other lines, a replacement pattern.
  • then you save the file, and proceed to the next.

    and this is exactly what sfk patch does - fully automated. instead of changing the target files directly, you write a patch script, or patch file. for example, c:\patch\comfort.hpp:
     
    :patch "working comfort optimization"
    
    :info skip authentication, automatize plugin and cert file load
    
    :root SecuFooBase
    
    :file Base\Authentication\StartupManager.cpp
    :from
       char szUser[100];
       char szPW[100];
       long lRC = showUserPWDialogue(szUser,100,szPW,100);
       if (lRC) { return 197; }
    :to
       // [patch-id]
       char szUser[100];
       char szPW[100];
       // long lRC = showUserPWDialogue(szUser,100,szPW,100);
       // if (lRC) { return 197; }
       strcpy(szUser, "tester1");
       strcpy(szPW, "hello123");
    :done
    
    :file Base\GUIController\MainCore.cpp
    :from
    void CMainCore::showHelp()
    :to
    // [patch-id]
    void autoRunFileLoad() {
       processUserMessage(eMsg_LoadPlugin, "d:\\tmp\\testplug.dll");
       processUserMessage(eMsg_Plugin+10 , "d:\\tmp\\testcert.x509");
    }
    
    void CMainCore::showHelp()
    :from
       GUIMsg *pMsg;
       while (true) {
          pMsg = getNextUserInput();
    :to
       autoRunFileLoad();
    
       GUIMsg *pMsg;
       while (true) {
          pMsg = getNextUserInput();
    :done
    

    In this example,

    Now, to actually patch the code, you have to

    Then sfk will load the patch file, load the target files, check if everything is OK, create backups of the targets, and actually apply the patch.

    You may also revoke the patch anytime by saying:
       sfk patch c:\patch\comfort.hpp -revoke

    And if you change the patchfile itself, i.e. you rework your patch, you may revoke and re-apply it in one step by saying:
       sfk patch c:\patch\comfort.hpp -redo

    sfk patching is called dynamic because it adapts to changing codebases. for example, as long as the code line "void CMainCore::showHelp()" itself is not changed, the insertion of "autoRunFileLoad" will work, even if MainCore.cpp is completely reworked many times. patch tools using line numbers are not that flexible.
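
    The difference to line-number based patching can be shown with a small Python sketch. It illustrates the search-and-replace principle only and is not how sfk patch is implemented: the function name apply_patch is hypothetical, and the rules that every :from pattern must occur exactly once and that an already patched file is rejected are assumptions of this sketch.

```python
def apply_patch(source, replacements, patch_id="// [patch-id]"):
    """Apply a list of (from_text, to_text) pairs to source text.
    Patterns are located by content, never by line number."""
    if patch_id in source:
        raise ValueError("source is already patched")
    # check all patterns first, so a failing patch changes nothing
    for from_text, _ in replacements:
        if source.count(from_text) != 1:
            raise ValueError("pattern not found exactly once: " + from_text)
    for from_text, to_text in replacements:
        source = source.replace(from_text, to_text, 1)
    return source
```

    Because the pattern is found by its text, the same patch still applies after colleagues have inserted or removed lines elsewhere in the file. It fails only if the pattern lines themselves were changed.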

    One word of WARNING: dynamic patching is a powerful mechanism - you can use it for all kinds of stuff, for example getting rid of tracing spam from other colleagues, or doing a total "source conversion", e.g. for improvements of the tracing system. But always take great care that you do NOT CHECK IN PATCHED FILES. It's also for your own security that every patched file contains the string [patch-id]. If you use patches regularly, always have a quick check that [patch-id] appears nowhere in the source you're about to check in.
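
    Such a check is easy to automate. The following Python sketch walks a directory tree and reports every file that still contains the marker; the function name find_patched_files is made up for this example, and on a real codebase you would probably restrict the scan to source file types.

```python
import os

def find_patched_files(root, marker="[patch-id]"):
    """Return a sorted list of all files under root that contain
    the patch marker string."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    if marker in handle.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

    If the returned list is not empty, revoke the patches before checking anything in.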

    05: structuring work by batch files.

    the following batch file examples should be placed within c:\batch.

    ec.bat - edit command
    notepad c:\batch\%1.bat
    
       usage example: to create another batch "jamsrc.bat",
       you now simply type "ec jamsrc". of course, replace "notepad"
       by your favourite text editor. (say "ec init" first, then extend
       the PATH to c:\program files\your editor, or wherever it's located).
    
    e.bat - edit a file in the text editor
    @echo off
    "c:\program files\your favourite text editor\theEditor.exe" %1
    
       usage example: this is not so much a batch, but an alias -
       it reduces typing effort. for example, if I'm using UltraEdit,
       I never type "ultraedit mysource.cpp". I may use this command -
       running the editor - about 200 times a day, so it's unacceptable
       to always type more than 1 character for this essential function.
       instead, I say "e mysource.cpp". a trivial, primitive trick -
       but still not obvious for many people, therefore I'm mentioning it.
    
    jamsrc.bat - collect contents of local source tree
    sfk snapto=all-src.cpp -dir . !save_ -file .cpp .c .hpp .h .xml .cfg !all-head !all-src
    
       usage example: wherever you are within the source base,
       just type "jamsrc" to create a local source collection.
    
    setcur.bat - set current working directory
    set VCURRENT=proj%1
    
       usage example: structure your work by giving different code bases
       different numbers. e.g., the first codebase you get from cvs may be
       called 100, residing under the work directory proj100.
       a few days later, you fetch the latest sources in another new directory,
       proj101, and so on. this way you can quickly switch back and forth
       between proven code and fresh code.
    
    jamsrc2.bat - collect contents of whole current working dir
    @ECHO off
    IF "%VCURRENT%"=="" goto xend
    sfk snapto=C:\%VCURRENT%\all-src.cpp -dir C:\%VCURRENT% !\save_ -file .cpp .c .hpp .h .xml !all-head !all-src
    dir C:\%VCURRENT% /S /B >C:\%VCURRENT%\lslr
    :xend
    
       usage example: wherever you are within the source base,
       just type "jamsrc2" to re-create the global source collection.
       this collection is then available under c:\proj100\all-src.cpp
       (if your current codebase number is 100.)
    
    do-check-all.bat - check if patches can be applied, or are still valid
    @echo off
    IF "%1"=="" goto err01
    IF "%VCURRENT%"=="" goto xend
    C:
    cd \%VCURRENT%
    IF "%1"=="pre" (
    sfk run "sfk patch $pfile -sim -qs" -quiet -norec c:\patch .hpp
    ) ELSE (
    sfk run "sfk patch $pfile -verify -qs" -quiet -norec c:\patch .hpp
    )
    goto xend
    :err01
    echo supply pre or post
    :xend
    
       usage example: this batch expects that all your patch files
       are located in c:\patch and have a file type ".hpp".
    
       if you say "do-check-all pre", all patches are checked against
       the codebase, and sfk tells if the source patterns match, i.e.
       if the patches might be inserted, if you liked to do so.
       so it's a pre-check before doing the actual patching.
    
       however, if you have patched the code already, and synced to
       a new cvs codebase, you may say "do-check-all post" anytime
       to check if the applied patches are still intact.
    
    do-patch-all.bat - apply all patches to the codebase
    @echo off
    IF "%VCURRENT%"=="" goto xend
    C:
    cd \%VCURRENT%
    echo === applying to codebase ===
    sfk run "sfk patch -qs $pfile" -quiet -norec c:\patch .hpp
    :xend
    
       usage example: whenever you get a new codebase from cvs,
       you want to apply all your patches from c:\patch to it.
       let's say the new codebase is proj110, then you say
    
          setcur 110
          do-check-all pre
          do-patch-all
    
    do-revoke-all.bat - undo all patches
    @echo off
    IF "%VCURRENT%"=="" goto xend
    C:
    cd \%VCURRENT%
    echo === revoking all patches ===
    sfk run "sfk patch -revoke -qs $pfile" -quiet -norec c:\patch .hpp
    :xend
    
       usage example: you may have traced and analyzed the code enough
       by the aid of self-written patches - and now you actually want
       to change and check-in some code parts, which are within files
       also changed by patching. then you must first revoke the patches,
       before you can checkout for edit.
    
    erw.bat - edit as read-write
    @echo off
    attrib -R %1
    notepad %1
    
       usage example: whenever you check out from cvs, the stuff is
       read-only by default. so, if you want to apply some quick local changes,
       you have to "attrib -R" so often that it makes sense to provide
       this within another small batch. so simply say "erw foobar.cpp"
       to edit foobar.cpp, even if it's readonly.
    
    

    06: sfk text processing primitives.

    examples: search for a string in all files of a dir tree
    
       sfk grep . mystring
       sfk grep -pat mystring -dir . -file .hpp .cpp
    
    examples: remove all tabs from source files
    
       sfk detab=3 . .hpp .cpp
       sfk detab=4 -dir . -file .h .cpp
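
    What detabbing does to a single text can be sketched in Python. This assumes that tabs are expanded to column-aligned tab stops, which is what Python's str.expandtabs does; whether sfk detab pads in exactly the same way is not confirmed here.

```python
def detab(text, tab_size=4):
    """Replace every tab by spaces up to the next multiple of
    tab_size, counting columns from the start of each line."""
    return text.expandtabs(tab_size)
```

    For example, detab("ab\tc", 4) pads with two spaces, because the next tab stop after column 2 is column 4.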
    
    example: filter all file paths containing FooSys but not CoreLib
    
       sfk list . | sfk filter -+FooSys -!CoreLib
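
    The selection logic of this filter example corresponds to the following Python sketch. The function name filter_paths is hypothetical, and the sketch compares case-sensitively, which may not match how sfk filter treats upper and lower case.

```python
def filter_paths(paths, include, exclude):
    """Keep only paths that contain the include text
    but do not contain the exclude text."""
    return [p for p in paths if include in p and exclude not in p]
```

    Called as filter_paths(all_paths, "FooSys", "CoreLib"), it keeps the same kind of lines as the pipe above.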
    
    examples: run a command on all .cpp files of the dir tree
    
       sfk run "mything.exe $pfile" -dir . -file .cpp
       sfk run "mything.exe $qfile" . .cpp
    
    for the syntax of all commands, type "sfk" in the command shell.
    


    SFK instant ftp server, and client

    Why an ftp server? because

    Anyway, just say "sfk ftpserv" on one machine, e.g. yourpc, and you have an instant ftp server - no installation, no configuration, no nothing. Then, on the other machine, say "ftp yourpc". You get instant read access to the directory in which sfk ftpserv is running.

    FTP is a slightly complicated protocol, creating an extra connection for every file transfer. This mechanism may fail sometimes, due to firewall problems. In this case, using "sfk ftp yourpc" may help: the sfk client is detecting an sfk server, and uses an easier transfer protocol. Furthermore, a different port may be specified.

    For example, I actually need sfk ftp every time I want to compile the linux version of sfk. my linux is running on the same machine as windows, under vmware; from linux, I can ping my host system, and I can even connect to the IP address using ftp. But with a normal ftp client, I can NOT transfer any files. All data connections are blocked for reasons (windows service pack, network config, firewall...) that I will not invest any further time to find out. Instead, I use "sfk ftp ipnumber get src.zip", and after compilation, I say "sfk ftp ipnumber put sfk-linux" to write the result back to the host.

    SFK ftpserv is very simple, and only one user can connect at a time. I.e. if you're connecting with two clients to the same server, you will be blocked until the 1st connection times out. If you need more power, download and install a full ftp server like filezilla.

    sfk ftpserv [-h[elp]] [-port=nport] [-rw] [-maxsize=n]
    
       creates an instant ftp server to enable easy file transfer.
       * the CURRENT DIRECTORY is made accessible, without subdirs.
       * any kind of directory traversal (.., / etc.) is blocked.
       * just ONE CLIENT (browser etc.) can connect at a time.
       * after 30 seconds of inactivity, the connection is closed.
       port: use other port than default, e.g. -port=30199.
       rw  : allow read+write access. default is readonly.
       maxsize: increment size limit per file write to n mbytes.
    
       NOTE: be aware that ANYONE may connect to your server.
             with -rw specified, ANYONE may also write large files.
             if this is a problem, do NOT use sfk ftpserv, but download
             and install a full-scale ftp server like filezilla.
    
       if you login to the server using a regular ftp client, but you cannot
       transfer any files, it's usually a firewall vs. ftp protocol problem.
       in this case, the sfk ftp client may help. type "sfk ftp" for info.
    
    sfk ftp host[:port] put|get filename
    
       simple anonymous ftp client. if connected to sfk server,
       this uses sfk/sft protocol, requiring fewer connections.
    
          sfk ftp farpc put test.zip
             send test.zip to farpc
    
          sfk ftp 192.168.1.99:30199 get test.zip
             receive test.zip from 192.168.1.99 port 30199
    
          sfk ftp hostname
             enter interactive mode, supporting commands:
                dir, get filename, put filename.
    
    

    sfk windows vs. sfk linux: syntax differences

    the syntax of all commands listed above is for the windows version of sfk. under linux (and all other unix systems), bash has problems with several characters, especially ! and $.

    therefore, sfk linux uses these replacements:

    • the exclusion-char ! is replaced by :
    • the run pattern identifier $ is replaced by #

    for the correct unix syntax of all commands, type "sfk" under linux.
