Re: Scanf and number formats

From: Vig (gtg121p_at_mail.gatech.edu)
Date: 03/14/05


Date: Mon, 14 Mar 2005 13:12:53 -0500


"Walter Roberson" <roberson@ibd.nrc-cnrc.gc.ca> wrote in message
news:d14hdf$2u8$1@canopus.cc.umanitoba.ca...

> :Also, I cannot directly replace an e with a d
> :because Scientific notation is usually written as 0.123456e+01 while d is
> :1.23456d0 (I am not completely sure, which is why I want C to handle it all
> :for me :) )
>
> On output, C's e format,
>
> is converted to the style [-]d.ddde+dd, where there is one digit
> before the decimal-point character (which is nonzero if the
> argument is nonzero)

Yes...It's pretty retarded of me to grumble about convention if converting
d's to e's will still be read correctly.

> On input, a string of digits is accepted before the decimal point.
> The sign after the 'e' on input is optional. Thus, 0.123456e+01
> and 1.23456e0 are equivalent [except perhaps in the last bit or two
> when one is at the limit of precision.]
>
>
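
A quick way to check that claim is to push both spellings through strtod() and compare. This is only a sketch: parse_number and same_value are names made up for the illustration, and the 1e-12 tolerance is an arbitrary allowance for the "last bit or two".

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one number with strtod(). Returns 1 if any characters were
   consumed, 0 if the text did not start with a number. */
static int parse_number(const char *text, double *out)
{
    char *end;
    *out = strtod(text, &end);
    return end != text;
}

/* Returns 1 when both spellings read back as (nearly) the same double.
   The tolerance is an assumption, not something the standard specifies. */
static int same_value(const char *a, const char *b)
{
    double x, y, diff;
    if (!parse_number(a, &x) || !parse_number(b, &y))
        return 0;
    diff = (x > y) ? x - y : y - x;
    return diff <= 1e-12;
}
```

Calling same_value("0.123456e+01", "1.23456e0") should report a match, with or without the sign after the 'e'.
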
> :Almost everything we read from files are numbers. Currently, it is scanned
> :with a %lf unless otherwise specified. If we are to handle the problem of
> :the 'd' that would mean almost multiplying our time for reading even good
> :files without d's by 3.
>
> No, that doesn't follow. The time required to read data from a file is
> largely dominated by the disk I/O rate... modified by operating
> system predictive reads, direct I/O or not, DMA block size, SCSI
> Command Tag Queuing (CTQ), ability of the OS to flip a DMA page
> directly into user space without having to copy it, and so on.

Ya...just thinking it out and talking to you has made me remove a lot of
ridiculous code I had put in place. I think the d to e substitution will
work albeit it would have to be done smartly when I am more awake :)
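
Roughly what I have in mind for the "smart" substitution, as a minimal sketch only. The function names are mine, and the rule is an assumption about what our files contain: a 'd' or 'D' is treated as an exponent marker only when it sits between a digit (or '.') and a digit or signed digit, so ordinary words are left alone.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Rewrite Fortran-style exponent markers (1.23456d0, 4.5D-03) to 'e'
   in place. Returns how many markers were rewritten. */
static size_t fix_d_exponents(char *buf, size_t len)
{
    size_t i, fixed = 0;

    for (i = 1; i + 1 < len; i++) {
        unsigned char prev, next;

        if (buf[i] != 'd' && buf[i] != 'D')
            continue;
        prev = (unsigned char)buf[i - 1];
        next = (unsigned char)buf[i + 1];
        if (!isdigit(prev) && prev != '.')
            continue;                     /* not inside a number */
        if ((next == '+' || next == '-') && i + 2 < len)
            next = (unsigned char)buf[i + 2];
        if (isdigit(next)) {
            buf[i] = 'e';
            fixed++;
        }
    }
    return fixed;
}

/* Parse one number that may use 'e' or 'd' as its exponent marker.
   Works on a local copy, so the caller's text is not modified;
   input longer than 63 characters is truncated. */
static double parse_fortran_double(const char *text)
{
    char tmp[64];
    size_t n = strlen(text);

    if (n >= sizeof tmp)
        n = sizeof tmp - 1;
    memcpy(tmp, text, n);
    tmp[n] = '\0';
    fix_d_exponents(tmp, n);
    return strtod(tmp, NULL);
}
```

A token such as "x2d3" would still be rewritten, so this is only safe if identifiers like that never show up next to the numbers.
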

> When you use scanf(), then unless you have specifically turned off
> buffering, the C I/O library will usually [but not promised in the
> standard] fill a block from the I/O subsystem (or I/O cache),
> putting the block into your memory space; the block size is often
> 8 KB. Once the block has been read in, scanf() is really just
> reading the data from memory, as if it were using getc() to fetch
> each character. [It has to be that way because you are allowed
> to mix getc() and scanf(), so they both have to read from the
> same input buffer, and it usually isn't worth duplicating the
> logic.] getc() is usually a macro that works with the FILE
> structure.
>
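
To make the buffering point concrete, here is a small sketch that asks for a larger stdio buffer with setvbuf() and then mixes getc() and fscanf() on the same stream. The helper names, the '#' comment convention and the buffer size are all assumptions for the illustration, not anything from our real files.

```c
#include <ctype.h>
#include <stdio.h>

/* Open a file for reading with a fully buffered stream of the requested
   size. The size is only a request; the library may treat it as a hint. */
static FILE *open_buffered(const char *path, size_t bufsize)
{
    FILE *fp = fopen(path, "r");

    if (fp != NULL && setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Sum every number in the stream, skipping '#' comments to end of line.
   getc()/ungetc() and fscanf() are mixed freely because they all draw
   from the same FILE buffer. */
static double sum_numbers(FILE *fp, long *count)
{
    double value, total = 0.0;
    int c;

    *count = 0;
    while ((c = getc(fp)) != EOF) {
        if (isspace(c))
            continue;
        if (c == '#') {                   /* discard the rest of the line */
            while ((c = getc(fp)) != EOF && c != '\n')
                ;
            continue;
        }
        ungetc(c, fp);                    /* hand the byte back to fscanf */
        if (fscanf(fp, "%lf", &value) == 1) {
            total += value;
            (*count)++;
        } else {
            (void)getc(fp);               /* drop a byte fscanf rejected */
        }
    }
    return total;
}
```

Something like open_buffered("input.dat", 1 << 16) followed by sum_numbers() would be the intended use; whether a bigger buffer helps at all is something to measure rather than assume.
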
> The slow part of reading is getting the data from disk to your
> program the first time; once there, you could examine the data a
> number of times before the next batch was ready. For example if your
> disk subsystem is SCSI-2 Fast, your disk might be limited to
> 20 megabytes per second; on a 2 GHz CPU, you could run 100
> cycles per character and still keep up with the disk.
>
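
The arithmetic behind that figure, written out as a sketch (the function name is made up, and it ignores everything except raw clock rate and disk throughput):

```c
/* CPU cycles available per input byte when the disk is the bottleneck:
   clock rate divided by disk throughput in bytes per second. */
static double cycles_per_byte(double cpu_hz, double disk_bytes_per_sec)
{
    return cpu_hz / disk_bytes_per_sec;
}
```

With the numbers quoted above, cycles_per_byte(2e9, 20e6) is 2,000,000,000 / 20,000,000 = 100 cycles per character.
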
> If you are sufficiently starved for CPU resources that
> doing a quick scan-and-replace over the buffer is slowing you
> down, then you should probably already have done a bunch
> of work on custom I/O (e.g., using "real time" partitions,
> using a raw partition instead of a block device, using
> scatter-gather buffering, using any available O/S
> facilities to bypass caching; ensuring your input data
> is always a multiple of an I/O page and always reading
> in full blocks instead of going through the per-character
> end-of-buffer checks imposed by getc().) You should not
> presume that a simple scan over the buffer will prove
> to be the limiting speed factor on your program: it
> probably won't.
>
> Speaking of limiting speed factors: consider having a
> pre-pass program that does nothing other than reading in
> the data and converting it to binary and storing the
> binary as a file with fixed length records. Such a program
> could probably run asynchronously with whatever calculation
> you are doing -- and if you are reading the input file
> multiple times in different programs, you will have
> saved having to convert the ASCII multiple times.
> You will get about a 3:1 compression ratio by converting
> the input to binary.

That is actually a good idea, but I had to stamp it out of my head in about
10 seconds because I am only fixing a bug right now and there doesn't seem
to be a possibility of me being able to talk people into this :)
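
For the record, the pre-pass would not need to be much more than the sketch below. It assumes the input is nothing but whitespace-separated numbers and that the binary file is read back on the same kind of machine that wrote it (raw doubles are not portable across byte orders); the function names are mine.

```c
#include <stdio.h>

/* Convert whitespace-separated text numbers to fixed-length binary
   records (one raw double each). Stops at the first token that is not
   a number. Returns the number of records written, or -1 on a write
   error. */
static long text_to_binary(FILE *in, FILE *out)
{
    double value;
    long count = 0;

    while (fscanf(in, "%lf", &value) == 1) {
        if (fwrite(&value, sizeof value, 1, out) != 1)
            return -1;
        count++;
    }
    return count;
}

/* Fetch record number `index` (0-based) from the binary file.
   Fixed-length records make this a single seek. Returns 1 on success. */
static int read_record(FILE *bin, long index, double *value)
{
    if (fseek(bin, index * (long)sizeof *value, SEEK_SET) != 0)
        return 0;
    return fread(value, sizeof *value, 1, bin) == 1;
}
```

The output file should be opened in binary mode ("wb" or "wb+"). Each value becomes sizeof(double) bytes, typically 8, against the twenty-odd characters a full-precision number takes as text, which is roughly where the 3:1 figure comes from.
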

> Any sufficiently old bug becomes a feature.

And Vice Versa :)

Thanks for all the help

-- 
Vig

