Re: Help reading structured binary files



Robert Kilroy wrote:
I was given a task to read some binary files that we will be dumping
into our database. The files are in key/value pairs in the format of:

Field ID = 2 bytes ( (Byte2 * 256) + Byte1 )
Field Length = 1 byte (0-255)
Field Data = 1 to 255 bytes
Files terminated by 2 bytes of 255

I'm not entirely sure what these files are; I was told "Put these in the
database".

The bytes are converted into CHRs and I'm returning them in a
Stringlist for easier manipulation.

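Better hope none of the IDs have the value 61. That's the character '=', which the string list's Values property uses to separate each name from its value.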
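A minimal console sketch of what goes wrong, assuming the standard TStringList from Delphi (or Free Pascal in Delphi mode). RoundTripsThroughValues is a hypothetical helper written only for this demonstration: it stores a name/value pair through Values and checks whether the same name finds it again.

```pascal
program EqualsInName;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes;

// Hypothetical helper: stores Name=abc in a fresh string list and reports
// whether Values can look it up again under the same name.
function RoundTripsThroughValues(const Name: string): Boolean;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Values[Name] := 'abc';
    Result := SL.Values[Name] = 'abc';
  finally
    SL.Free;
  end;
end;

begin
  Writeln(RoundTripsThroughValues('4660'));   // TRUE
  // Chr(61) is '='. The stored line becomes '==abc', whose name part
  // (everything before the first '=') is empty, so the lookup fails.
  Writeln(RoundTripsThroughValues(Chr(61)));  // FALSE
end.
```

Names built with IntToStr, like '4660', contain only digits and round-trip fine; the problem appears only when a raw byte value of 61 ends up in the name itself.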

The function below works fine, but
I'm frustrated at its speed and was wondering if someone could give me
a shove in a different direction. This is really the first time I've
messed with binary files like this and I'm sure I'm doing it all wrong.
:)

function TForm1.LoadWRK(filename : String) : TStringlist;
var
  i : Integer;
  tmpList : TStringlist;
  fid1, fid2, l, c : Byte;
  fieldid, value : String;
  MS : TMemoryStream;
begin
  tmpList := TStringList.Create;

You need to protect this object with a try-except block. If anything after the Create call raises an exception, nothing frees the string list and it leaks:

try

  // current function contents

except
  tmplist.Free;
  raise;
end;
Result := tmplist;

  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile(filename);

Instead of loading the entire file into memory, why don't you just use a TFileStream?


      MS.Read(fid1, 1);
      MS.Read(fid2, 1);
      if (fid1 = 255) AND (fid2 = 255) then
        break;
      fieldid := IntToStr((fid2*256)+fid1);

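I'd read a single Word instead of two bytes. Declare fid as a Word; on a little-endian CPU such as x86, the Word you read already equals (Byte2 * 256) + Byte1.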

MS.ReadBuffer(fid, SizeOf(fid));
if fid = $ffff then break;
fieldid := IntToStr(fid);
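A small console sketch of that suggestion, assuming a little-endian CPU (x86 and x64 both are). ReadFieldId and the Sample bytes are hypothetical names made up for this demonstration; the four bytes encode field ID $1234 followed by the $FF $FF terminator.

```pascal
program ReadFieldIdDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: reads one 16-bit field ID in a single call.
// On a little-endian CPU the Word holds (Byte2 * 256) + Byte1.
function ReadFieldId(Stream: TStream): Word;
begin
  Stream.ReadBuffer(Result, SizeOf(Result));
end;

const
  // Byte1 = $34, Byte2 = $12, then the $FF $FF terminator.
  Sample: array[0..3] of Byte = ($34, $12, $FF, $FF);

var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    Writeln(ReadFieldId(MS));           // 4660, i.e. $12 * 256 + $34
    Writeln(ReadFieldId(MS) = $FFFF);   // TRUE: end of the field list
  finally
    MS.Free;
  end;
end.
```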

      MS.Read(l, 1);
      value := '';
      for i := 1 to l do
      begin
        MS.Read(c, 1);
        value := value + chr(c);
      end;

Ugh. Neither the loop nor the repeated string re-allocation is necessary. Set the string's length once and read straight into it:

MS.ReadBuffer(l, SizeOf(l));
SetLength(value, l);
if l > 0 then
  MS.ReadBuffer(value[1], l);
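A console sketch of the single-read approach, wrapped in a hypothetical ReadFieldValue helper. It declares the result as AnsiString rather than string. Before Delphi 2009 the two were the same thing, but in Delphi 2009 and later string is UnicodeString, two bytes per character, and ReadBuffer(value[1], l) would pack the l bytes into the first half of the string. The Sample bytes are made up for the demo.

```pascal
program ReadFieldValueDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: reads a 1-byte length, then that many bytes of data
// in one ReadBuffer call. AnsiString keeps one byte per character.
function ReadFieldValue(Stream: TStream): AnsiString;
var
  Len: Byte;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);             // allocate once
  if Len > 0 then
    Stream.ReadBuffer(Result[1], Len); // no per-byte loop or concatenation
end;

const
  // A 3-byte field 'ABC', then a zero-length field.
  Sample: array[0..4] of Byte = (3, Ord('A'), Ord('B'), Ord('C'), 0);

var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    Writeln(ReadFieldValue(MS));          // ABC
    Writeln(Length(ReadFieldValue(MS)));  // 0
  finally
    MS.Free;
  end;
end.
```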

tmpList.Values[trim(fieldid)] := trim(value);

Why are you trimming fieldid? Are you expecting IntToStr to return a value with leading or trailing whitespace?


Why are you trimming the value at all? Doing that changes the value that you read from the file, so the data you store in your database isn't the same as what you read from the file. Trim strips every leading and trailing character whose value is 32 or less, and if this is really binary, non-text data, it can legitimately start or end with such bytes.

Are you guaranteed that no field ID will be repeated in a file? If not, then the quoted line above destroys some of your data: each later field with the same ID overwrites the earlier one.
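A console sketch of the difference, using a hypothetical StoreFields helper that stores two records sharing field ID 7. Passing True mimics tmpList.Values[fieldid] := value; passing False uses Add, which keeps every record in file order. One more side effect worth knowing, at least in Delphi's VCL implementation: assigning an empty string through Values deletes the entry, so a zero-length field (or one that Trim reduces to nothing) silently disappears as well.

```pascal
program KeepEveryField;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: stores two records that share field ID 7 and
// returns how many lines survive in the list.
function StoreFields(UseValues: Boolean): Integer;
const
  Ids: array[0..1] of string = ('7', '7');
  Data: array[0..1] of string = ('first', 'second');
var
  SL: TStringList;
  I: Integer;
begin
  SL := TStringList.Create;
  try
    for I := Low(Ids) to High(Ids) do
      if UseValues then
        SL.Values[Ids[I]] := Data[I]      // second record overwrites the first
      else
        SL.Add(Ids[I] + '=' + Data[I]);   // every record kept, in file order
    Result := SL.Count;
  finally
    SL.Free;
  end;
end;

begin
  Writeln(StoreFields(True));   // 1
  Writeln(StoreFields(False));  // 2
end.
```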

    end;
  finally
    MS.Free;
  end;
  result := tmpList;
end;
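Here is a sketch of the whole function with the suggestions in this reply folded in: a TFileStream instead of loading the file into memory, the try-except that frees the list on failure, a Word for the field ID, one ReadBuffer per value, no Trim, and Add instead of Values so repeated IDs and empty fields survive. It is written as a standalone LoadWRK function rather than a TForm1 method, and the demo writes a made-up sample file named sample.wrk to the current directory; both are assumptions for illustration. Values are read into an AnsiString and converted with string(), which in Unicode Delphi maps bytes above 127 through the ANSI code page, so if the data is truly binary you may prefer to keep it as bytes rather than text.

```pascal
program LoadWrkDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

// Sketch: each field becomes one "ID=value" line. Add keeps repeated IDs
// and empty values; nothing is trimmed.
function LoadWRK(const FileName: string): TStringList;
var
  Stream: TFileStream;
  FieldId: Word;
  Len: Byte;
  Value: AnsiString;
begin
  Result := TStringList.Create;
  try
    Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      while True do
      begin
        Stream.ReadBuffer(FieldId, SizeOf(FieldId)); // little-endian Word
        if FieldId = $FFFF then
          Break;                                     // end-of-fields marker
        Stream.ReadBuffer(Len, SizeOf(Len));
        SetLength(Value, Len);
        if Len > 0 then
          Stream.ReadBuffer(Value[1], Len);
        Result.Add(IntToStr(FieldId) + '=' + string(Value));
      end;
    finally
      Stream.Free;
    end;
  except
    Result.Free;  // don't leak the list if the file is short or unreadable
    raise;
  end;
end;

const
  Sample: array[0..13] of Byte = (
    $07, $00, $02, Ord('h'), Ord('i'),  // field 7 = 'hi'
    $07, $00, $01, Ord('!'),            // field 7 again = '!'
    $09, $00, $00,                      // field 9, zero length
    $FF, $FF);                          // terminator

var
  FileName: string;
  F: TFileStream;
  List: TStringList;
begin
  FileName := 'sample.wrk';  // hypothetical demo file
  F := TFileStream.Create(FileName, fmCreate);
  try
    F.WriteBuffer(Sample, SizeOf(Sample));
  finally
    F.Free;
  end;

  List := LoadWRK(FileName);
  try
    Write(List.Text);  // 7=hi, 7=! and 9= on separate lines
  finally
    List.Free;
  end;
end.
```

One thing worth measuring: TFileStream hands each Read or ReadBuffer call straight to the operating system, so thousands of 1- and 2-byte reads can cost more than a single LoadFromFile into a TMemoryStream. If the timing test suggested below points that way, keep the memory stream and read from it the same way.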

I was wondering if I could use some sort of Record structure, but
either way it seems I need to read the third byte to get the length of
the data. Maybe a BlockRead? But is there a faster way to convert each
byte into its CHR value for a string?
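On the record question: a packed record can describe the three header bytes, but there is a catch. The file ends with only two $FF bytes, so reading a full three-byte header with ReadBuffer would raise an exception at the terminator. This console sketch uses Read and checks the byte count instead. TFieldHeader, ReadHeader and the Sample bytes are hypothetical names for illustration, and the ID again assumes a little-endian CPU.

```pascal
program RecordHeaderDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical header record: 2-byte ID + 1-byte length.
  // "packed" keeps it at 3 bytes with no padding.
  TFieldHeader = packed record
    Id: Word;
    Len: Byte;
  end;

// Returns False at the 2-byte terminator; raises on a truncated header.
function ReadHeader(Stream: TStream; out Header: TFieldHeader): Boolean;
var
  Got: Integer;
begin
  Result := False;
  Got := Stream.Read(Header, SizeOf(Header));
  if (Got >= SizeOf(Header.Id)) and (Header.Id = $FFFF) then
    Exit;                                   // terminator: no more fields
  if Got <> SizeOf(Header) then
    raise EReadError.Create('Truncated field header');
  Result := True;
end;

const
  Sample: array[0..6] of Byte = ($07, $00, $02, Ord('h'), Ord('i'), $FF, $FF);

var
  MS: TMemoryStream;
  H: TFieldHeader;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    while ReadHeader(MS, H) do
    begin
      Writeln('ID ', H.Id, ', length ', H.Len);  // ID 7, length 2
      MS.Position := MS.Position + H.Len;        // skip the data in this demo
    end;
  finally
    MS.Free;
  end;
end.
```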

Chr is a function. What you want is the Char value; Char is a data type. If you wanted to load the bytes as Chars, all you had to do was declare the C variable as a Char instead of a Byte. The data in the file doesn't have any inherent data type. It's all just plain old bytes on a disk, and the Read method doesn't care what type of variable you pass to it; it will fill a Byte just as well as it will fill a Char.
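A console sketch of that point: the same byte is read once into a Byte and once into a character variable. It uses AnsiChar rather than Char because in Delphi 2009 and later Char is a two-byte WideChar, so a one-byte read would fill only half of it; in earlier versions Char and AnsiChar are the same one-byte type. The Sample bytes are made up for the demo.

```pascal
program ByteOrChar;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

const
  Sample: array[0..1] of Byte = (72, 105);  // the bytes of 'Hi'

var
  MS: TMemoryStream;
  B: Byte;
  C: AnsiChar;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Sample, SizeOf(Sample));
    MS.Position := 0;
    MS.ReadBuffer(B, SizeOf(B));  // first byte, typed as a number
    MS.Position := 0;
    MS.ReadBuffer(C, SizeOf(C));  // the same byte, typed as a character
    Writeln(B);  // 72
    Writeln(C);  // H
  finally
    MS.Free;
  end;
end.
```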


How long does it take just to load the file? To find out, remove anything to do with the string list. Just load the data and discard it. I suspect you'll find that the string operations are your bottleneck.
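A rough console sketch of that experiment. The hypothetical SkimFields function walks the fields and skips over each value without building any strings, so the elapsed time is mostly I/O. It times a file named on the command line, or a tiny built-in sample if none is given. Now is a coarse timer, which is fine for a file large enough to feel slow.

```pascal
program TimeRawRead;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils, DateUtils;

// Walks every field without building strings; returns the field count.
function SkimFields(Stream: TStream): Integer;
var
  Id: Word;
  Len: Byte;
begin
  Result := 0;
  while True do
  begin
    Stream.ReadBuffer(Id, SizeOf(Id));
    if Id = $FFFF then
      Break;
    Stream.ReadBuffer(Len, SizeOf(Len));
    Stream.Position := Stream.Position + Len;  // discard the data
    Inc(Result);
  end;
end;

const
  // Hypothetical one-field sample used when no file name is given.
  Sample: array[0..6] of Byte = ($07, $00, $02, Ord('h'), Ord('i'), $FF, $FF);

var
  Stream: TStream;
  Start: TDateTime;
  Fields: Integer;
begin
  Start := Now;
  if ParamCount >= 1 then
    Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite)
  else
  begin
    Stream := TMemoryStream.Create;
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
  end;
  try
    Fields := SkimFields(Stream);
  finally
    Stream.Free;
  end;
  Writeln(Fields, ' field(s) in ', MilliSecondsBetween(Now, Start), ' ms');
end.
```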

In the code I wrote above, I used ReadBuffer instead of Read. If for some reason you reach the end of the file prematurely, ReadBuffer will raise an exception. The Read method, on the other hand, will pretend everything's fine: it just returns the number of bytes it actually read (possibly zero) and leaves the rest of the variable you passed to it unchanged.
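A console sketch of the difference, using a one-byte stream and a two-byte Word variable (OneByte is a made-up name for the demo data).

```pascal
program ReadVersusReadBuffer;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Classes, SysUtils;

const
  OneByte: array[0..0] of Byte = ($2A);

var
  MS: TMemoryStream;
  W: Word;
  Got: Integer;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(OneByte, SizeOf(OneByte));

    // Read asks for 2 bytes, gets 1, and reports it only through its result.
    MS.Position := 0;
    W := 0;
    Got := MS.Read(W, SizeOf(W));
    Writeln('Read returned ', Got);  // 1

    // ReadBuffer refuses to continue with a half-filled variable.
    MS.Position := 0;
    try
      MS.ReadBuffer(W, SizeOf(W));
    except
      on E: EReadError do
        Writeln('ReadBuffer raised ', E.ClassName);  // EReadError
    end;
  finally
    MS.Free;
  end;
end.
```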

--
Rob