Re: [ Attn: Randy ] Ad-hoc Parsing?

From: Herbert Kleebauer (klee_at_unibwm.de)
Date: 12/25/04


Date: Sat, 25 Dec 2004 11:36:19 +0100

Phil Carmody wrote:
> Herbert Kleebauer <klee@unibwm.de> writes:

> > > > Now, with your superior shell and your posted method
> > > >
> > > > phil@nonospaz:tmp$ echo -n -e '\x01' >| crap
> > > >
> > > > you would need 4 characters for any binary byte giving a total
> > > > of 8192 bytes. In this case I prefer to use the inferior shell
> > > > which needs only 1812 bytes.

With your table given below I get:

  0- 6: 1236 * 2 = 2472
  7- 13: 42 * 2 = 84
 14- 15: 11 * 3 = 33
 16- 31: 105 * 4 = 420
 32- 38: 36 * 1 = 36
     39: 0 * 2 = 0
 40- 91: 195 * 1 = 195
     92: 0 * 2 = 0
 93-126: 301 * 1 = 301
127-159: 26 * 4 = 104
160-255: 96 * 1 = 96
-------------------------
                    3741 bytes

Compared to the 1871 bytes, this is still twice the size.
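
For anyone who wants to recheck the tally, here is a minimal sketch in Python. The counts and per-byte costs are copied from the table above; the names ROWS and tally are only illustrative.

```python
# (code range, occurrences in the 2048-byte file, characters needed per byte),
# copied row by row from the table above.
ROWS = [
    ("0-6", 1236, 2),
    ("7-13", 42, 2),
    ("14-15", 11, 3),
    ("16-31", 105, 4),
    ("32-38", 36, 1),
    ("39", 0, 2),
    ("40-91", 195, 1),
    ("92", 0, 2),
    ("93-126", 301, 1),
    ("127-159", 26, 4),
    ("160-255", 96, 1),
]


def tally(rows):
    """Return (bytes in the binary, characters needed in the script)."""
    total_bytes = sum(count for _, count, _ in rows)
    total_chars = sum(count * cost for _, count, cost in rows)
    return total_bytes, total_chars


if __name__ == "__main__":
    total_bytes, total_chars = tally(ROWS)
    print(total_bytes, total_chars)  # 2048 3741
```

The occurrence counts add up to 2048, so the table covers the whole file.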

Now to your table itself.

> Character codes 0-7 can be done in 2 characters - \#

And what happens if the code 0x03 is followed by the
character '3'? I suppose \33 would be interpreted as
octal 33, so 2 characters are not always sufficient
for codes 0-7.

> Character codes 7-13 can be done in 2 characters - \a, \b, \t, \n, \v, \f, \r
> Character codes 14-15 can be done in 3 characters - \x#

The same as above. What happens when 0x0f is followed by the letter 'a'?
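
Both objections can be illustrated with a toy decoder in Python. This is only a sketch under an assumption: that the shell reads escapes greedily, up to three octal digits after the backslash and up to two hex digits after \x. The real shell may differ in detail, and the function decode is hypothetical, not part of any shell.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode(text: str) -> bytes:
    """Toy model of greedy escape decoding (assumed rules, ASCII input only)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 >= len(text):
            out.append(ord(ch))
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in OCTAL_DIGITS:
            # Assumed rule: consume up to three octal digits.
            j = i + 1
            while j < len(text) and j < i + 4 and text[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(text[i + 1:j], 8) & 0xFF)
            i = j
        elif nxt == "x" and i + 2 < len(text) and text[i + 2] in HEX_DIGITS:
            # Assumed rule: consume up to two hex digits.
            j = i + 2
            while j < len(text) and j < i + 4 and text[j] in HEX_DIGITS:
                j += 1
            out.append(int(text[i + 2:j], 16))
            i = j
        else:
            # Other escapes are outside this sketch: keep the backslash as is.
            out.append(ord(ch))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(decode("\\33"))    # one byte 0x1b, not 0x03 followed by '3'
    print(decode("\\0033"))  # 0x03 followed by '3', at a cost of 4 characters
    print(decode("\\xfa"))   # one byte 0xfa, not 0x0f followed by 'a'
    print(decode("\\x0fa"))  # 0x0f followed by 'a', at a cost of 4 characters
```

Under these assumed rules the short forms are only safe when the next character is not itself a valid digit, so an encoder has to fall back to the padded form in those cases.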

> Character codes 32-126 \ 39,92 can be done in 1 character - themselves
> Character code 39, 92 can be done in 2 characters - \', \\
> Character codes 160-255 can be done in 1 character - themselves

This will make your script a binary file. Maybe the shell doesn't
have a problem with binary files, but many editors do. If you
open the file in an ASCII editor to change a single byte, the
whole file can be corrupted.

> Assuming a binary with no bias towards any particular character, that's
> a mean chars/byte of
>
> (14*2+16*3+93*1+2*2+33*4+96*1)/256
> 1.56640625
>
> Which is substantially less than 4.

3741/2048 = 1.8
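
The two factors can be put side by side with a few lines of Python. This is just an arithmetic check: the terms are copied from the quoted formula, the totals 3741 and 2048 from my tally, and the function names are illustrative. Note that the class sizes in the quoted formula cover 254 byte values while it divides by 256, so the 1.566 figure is itself approximate.

```python
# (byte values in the class, characters per byte), exactly as in the quoted formula.
QUOTED_TERMS = [(14, 2), (16, 3), (93, 1), (2, 2), (33, 4), (96, 1)]


def quoted_mean(terms, denominator=256):
    """Mean characters per byte for a binary with no bias, as in the quoted formula."""
    return sum(n * cost for n, cost in terms) / denominator


def measured_mean(total_chars=3741, total_bytes=2048):
    """Mean characters per byte for the actual 2048-byte file."""
    return total_chars / total_bytes


if __name__ == "__main__":
    print(sum(n for n, _ in QUOTED_TERMS))  # 254
    print(quoted_mean(QUOTED_TERMS))        # 1.56640625
    print(round(measured_mean(), 2))        # 1.83
```

The real file comes out worse than the uniform estimate because it is dominated by codes 0-6, which cost 2 characters each.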

But as I already said, it doesn't matter whether this factor is 4
(using \xnn for every byte), 1.8 (the example above, which will
produce encoding errors) or 0.9 (the self-extracting compressed
ASCII encoding used in my batch program). They are all in the
same category, far above the 43 bytes of the com program solution.
 
> Of course, with Here Documents, as long as you don't mind binary data
> in your script file, then there's no character-escaping, so no overhead.

And if you do mind binary data, you have to use the
\xnn form for every byte >127, which will increase your
1.56 factor substantially.


