Re: RFC: The future of Text::CSV_XS



On Fri, 25 May 2007 15:22:02 -0400, "Richard Dice" <rdice@xxxxxxxxxxxxxxxxxx>
wrote:

Merijn,

Why no Cc: to the list?

Thanks for asking, and for your work on this. (Looks like you just took
over maintainership recently...?)

Yes.

My recent "I wish Text::CSV_XS could handle X..." experience was -

- Save an Excel spreadsheet to CSV format

My Spreadsheet::Read module on CPAN includes a utility that does just that:

# xlscat -c file.xls >file.csv

- But some of the cells in the Excel spreadsheet contained line breaks

Shouldn't matter

- So iterating line-by-line through the file in order to have lines to
parse with Text::CSV_XS meant that any line derived from a row in Excel
containing a cell containing a line break would fail

That new feature idea regarding reading the whole file at once might be a
good place to address this.

Don't think so, but feel free to enlighten me on the reasoning you have
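
On the line breaks themselves, a minimal sketch of why they need not be a problem: let the parser pull records from the file handle instead of feeding it one physical line at a time. This assumes the binary attribute and the getline method of Text::CSV_XS; the sample data is made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: the second field of the first record
# contains a line break, as an Excel cell might.
my $data = qq{1,"line one\nline two",x\n2,plain,y\n};

# binary => 1 allows a newline inside a quoted field
my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot create CSV parser";

# An in-memory file stands in for the real CSV file here
open my $fh, "<", \$data or die "Cannot open in-memory file: $!";
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;    # one array ref per record, not per physical line
    }
close $fh;

printf "%d records\n", scalar @rows;
```

The point is that the parser, not the caller, decides where a record ends.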

Other features that could be nice -

- given a file, tell it whether it has a header row and if so provide
a hash-key-style interface on each row per the names in columns of the
header row

Could be one of the options to the suggested

parse_file ($file, { cols => [ ... ], has_header_row => 1 });

causing the construct of

{ fields => [ .... ],

to change to

{ fields => { Name => "...", Address => "...", ... },

but I think that would have a huge impact on memory use, and it is also
quite easy to create yourself in a map {} construct.
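
A rough sketch of such a map {} construct, assuming the header row and the data rows have already been parsed into arrays (the names and values below are made up):

```perl
use strict;
use warnings;

# Hypothetical data: header row first, then already-parsed rows
my @header = ("Name", "Address", "City");
my @rows   = (
    [ "Alice", "1 Main St", "Toronto"   ],
    [ "Bob",   "2 Side St", "Amsterdam" ],
    );

# One hash per row, keyed by the column names from the header
my @records = map {
    my %rec;
    @rec{@header} = @$_;    # hash slice: header names => row values
    \%rec;
    } @rows;

print "$records[1]{Name} lives in $records[1]{City}\n";
```

Every row now carries its own copy of the keys, which is where the extra memory goes.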

- have it return how many rows and columns there are in the file

# xlscat -i file.csv

I don't think that kind of functionality should be in the low level
that this module lives in. Consider that reading CSV has no defined
way to jump back in the data stream, so once you've read the data,
you cannot go back. It has no random access structure like Excel.

- ability to automatically ignore trailing (and perhaps leading) empty
rows

Also an option in xlscat

- provide a "best guess" count of how many columns there _should_ be
in a row, based on the header row (if present) and/or general agreement
amongst the other rows in the file (if 99 have 14 columns in a row and 1 has
10 columns, that 1 is likely an outlier)

Nice example. I like that. Should not be in the module itself, but could
be a file in the examples/ folder.
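
Something along these lines could serve as a starting point for such an example. It is only a sketch: guess_columns is a hypothetical helper, not part of any module, and the rows are assumed to be parsed into array refs already.

```perl
use strict;
use warnings;

# Hypothetical helper: the most common field count among the rows.
# On a tie, the larger count wins.
sub guess_columns {
    my @rows = @_;
    my %seen;
    $seen{scalar @$_}++ for @rows;
    my ($best) = sort { $seen{$b} <=> $seen{$a} || $b <=> $a } keys %seen;
    return $best;
    }

# Made-up data matching the example: 99 rows of 14 fields, 1 row of 10
my @rows;
push @rows, [ ("x") x 14 ] for 1 .. 99;
push @rows, [ ("x") x 10 ];

my $cols = guess_columns (@rows);
my @odd  = grep { @{$rows[$_]} != $cols } 0 .. $#rows;
print "expected $cols columns, outlier rows (0-based): @odd\n";
```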

- In the event of rows with fewer columns than the best-guess (or a
user-defined number of how many columns there should be) then provide
extra undef column (array) values

I would say you use Spreadsheet::Read and do it in that framework.
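
If you do want it on plain parsed rows, padding is a small step of your own. A sketch, where pad_row is a hypothetical helper and the wanted column count is supplied by the caller:

```perl
use strict;
use warnings;

# Hypothetical helper: pad a row with undef up to $want fields.
# Rows that are already long enough are left alone.
sub pad_row {
    my ($row, $want) = @_;
    push @$row, (undef) x ($want - @$row) if @$row < $want;
    return $row;
    }

my $row = pad_row ([ "a", "b" ], 4);
print scalar @$row, " fields\n";
```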

- ability to extract a row/column range, e.g. columns 2 through 7 in
rows 3 through 13

You definitely want xlscat :) Both are supported as options

/home/merijn 101 > xlscat --help
usage: xlscat [-s <sep>] [-L] [-u] [ Selection ] file.xls
[-c | -m] [-u] [ Selection ] file.xls
-i [ -S sheets ] file.xls
Generic options:
-v[#] Set verbose level (xlscat)
-d[#] Set debug level (Spreadsheet::Read)
-u Use unformatted values
--noclip Do not strip empty sheets and
trailing empty rows and columns
Input CSV:
--in-sep=c Set input sep_char for CSV
Output Text (default):
-s <sep> Use separator <sep>. Default '|', \n allowed
-L Line up the columns
Output Index only:
-i Show sheet names and size only
Output CSV:
-c Output CSV, separator = ','
-m Output CSV, separator = ';'
Selection:
-S <sheets> Only print sheets <sheets>. 'all' is a valid set
Default only prints the first sheet
-R <rows> Only print rows <rows>. Default is 'all'
-C <cols> Only print columns <cols>. Default is 'all'
-F <flds> Only fields <flds> e.g. -FA3,B16
/home/merijn 102 >

You planning on being at YAPC::EU? Maybe I'll run into you there.

Yes, and planning to talk about another (new) module. I've
already been registered.

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.0 & 10.2, AIX 4.3 & 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/
.

