ANNOUNCE: Text-CSV_XS 0.28



Note the IMPORTANT CHANGE!

file: $CPAN/authors/id/H/HM/HMBRAND/Text-CSV_XS-0.28.tar.gz
size: 33289 bytes
md5: 8c00161d04793deaf383b4331fe09db4

2007-06-03 0.28 - H.Merijn Brand <h.m.brand@xxxxxxxxx>

* IMPORTANT CHANGE: new () returns undef if it gets unsupported
attributes. Until now, an unsupported attribute, as in
new ({ esc_char => "\\" }), was just silently ignored. Rejecting it
and failing is better than continuing with false assumptions.
* Added allow_loose_quotes (see doc)
* Added t/65_allow.t
* Added allow_loose_escapes (see doc) RT 15076
* More code cleanup in XS
* Added allow_whitespace (see doc)
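
As a rough illustration of the IMPORTANT CHANGE above, the sketch below passes the unsupported esc_char attribute from the changelog entry and, for contrast, the supported escape_char attribute. The variable names $bad and $good are only illustrative, and the result for esc_char is what the changelog entry describes for 0.28 and later.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# "esc_char" is the unsupported attribute named in the changelog entry;
# as of 0.28 new () is expected to return undef for it.
my $bad  = Text::CSV_XS->new ({ esc_char    => "\\" });

# "escape_char" is the supported attribute, so this should succeed.
my $good = Text::CSV_XS->new ({ escape_char => "\\" });

print "esc_char:    ", (defined $bad  ? "accepted" : "rejected"), "\n";
print "escape_char: ", (defined $good ? "accepted" : "rejected"), "\n";
```

Code that used to rely on misspelled attributes being ignored will therefore need to check the return value of new () before calling methods on it.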

2007-05-31 0.27 - H.Merijn Brand <h.m.brand@xxxxxxxxx>

* checked with perlcritic (still works under 5.00504)
so 3-arg open cannot be used (except in the docs)
* 3-arg open in docs too
* Added a lot to the TODO list
* Some more info on using escape character (jZed)
* Mention Text::CSV_PP in README
* Added t/45_eol.t, eol tests
* Added a section about embedded newlines in the pod
* Allow \r as eol ($/) for parsing
* More docs for eol
* More eol = \r fixes, tfrayner's test case added to t/45_eol.t


=item allow_whitespace

When this option is set to true, whitespace (TAB's and SPACE's)
surrounding the separation character is removed when parsing. So
lines like:

1 , "foo" , bar , 3 , zapp

are now correctly parsed, even though they violate the CSV specs.
Note that B<all> whitespace is stripped from the start and end of each
field. That makes it more a I<feature> than a way to be able
to parse bad CSV lines, as

1, 2.0, 3, ape , monkey

will now be parsed as

("1", "2.0", "3", "ape", "monkey")

even if the original line was perfectly sane CSV.
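
A minimal sketch of the behaviour described above, assuming Text::CSV_XS 0.28 or later is installed. It parses the second example line with allow_whitespace enabled; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Enable whitespace stripping around the separation character.
my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 })
    or die "Cannot create Text::CSV_XS object\n";

# The example line from the text above.
my $line = q{1, 2.0, 3, ape , monkey};

$csv->parse ($line) or die "Failed to parse: $line\n";
my @fields = $csv->fields;

# Per the text above, this should show the fields without the
# surrounding spaces.
print join ("|", @fields), "\n";
```

If leading or trailing whitespace inside a field is significant for the data at hand, this option is best left off.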

=item allow_loose_quotes

By default, parsing fields that have C<quote_char> characters inside
an unquoted field, like

1,foo "bar" baz,42

would result in a parse error. Though it is still bad practice to
allow this format, we cannot help that some vendors make
their applications spit out lines styled like this.
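
The sketch below contrasts the default behaviour with allow_loose_quotes enabled, using the example line above. It assumes that setting the option makes the parser accept the line and keep the quote characters as part of the field; the names $strict and $loose are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# The example line: quote_char characters inside an unquoted field.
my $line = q{1,foo "bar" baz,42};

# Default settings: the text above says this results in a parse error.
my $strict = Text::CSV_XS->new ()
    or die "Cannot create Text::CSV_XS object\n";
my $strict_ok = $strict->parse ($line) ? 1 : 0;

# With allow_loose_quotes the same line is assumed to be accepted.
my $loose = Text::CSV_XS->new ({ allow_loose_quotes => 1 })
    or die "Cannot create Text::CSV_XS object\n";
my $loose_ok = $loose->parse ($line) ? 1 : 0;
my @fields   = $loose_ok ? $loose->fields : ();

print "default:            ", ($strict_ok ? "parsed" : "parse error"), "\n";
print "allow_loose_quotes: ", ($loose_ok  ? "parsed" : "parse error"), "\n";
print "second field:       $fields[1]\n" if $loose_ok;
```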