Page last updated: 2003 Oct 20 20:19

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] Another Perl CSV question - long lines = segfault

Thanks to the folks who helped me 'fix' the CSV file the other day.

I'm now stumbling upon another problem, where Perl is segfaulting on long
lines.  If I skip CSV lines > 3500 characters, it goes on its merry way.
Lines nearing 4000 characters or more cause Perl itself to die (segfault)
when I try to parse using the following code (from "Perl Cookbook"):

sub parse_csv
{
  my $text = shift;
  my @fields = ();
  my $field;

  while ($text =~ m{
    # Either some non-quote/non-comma text:
    ( [^",]+ )

     # Or...
     |

    # ...a double-quote field:
    " # field's opening quote; don't save this
     (  # now a field is either:
      (?:   [^"]    # non-quotes
          |      # or
            ""      # adjacent quote pairs
      )* # any number
     )
    " # field's closing quote; don't save this either
  }gx)
  {
      if (defined $1) { $field = $1; }
      else { ($field = $2) =~ s/""/"/g; }

      push @fields, $field;
  }

  return @fields;
}

At first, I thought it was dying due to character combos (commas, quotes,
double-quotes, or what-have-you), but then I started examining line lengths
just before running the parse subroutine.
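For reference, the line-length check can be sketched roughly like this. The helper name long_lines and the in-memory sample data are made up for illustration; only the 3500-character cutoff comes from what I saw above.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, length] pairs for every line
# longer than $limit characters (trailing newline not counted).
sub long_lines {
    my ($fh, $limit) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, [$., length $line] if length($line) > $limit;
    }
    return @hits;
}

# Made-up sample held in memory, so the sketch runs without a real CSV file.
my $sample = join "\n", 'a,b,c', 'x' x 4000, 'd,e', '';
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
printf "line %d: %d characters\n", @$_ for long_lines($fh, 3500);
close $fh;
```

Running the same helper over the real file should show whether every line that kills the parser is also one of the long ones.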

Looking at an strace, when I do this:

  open(F, $somefile);

  while (<F>)
  {
    # ... parse each line here ...
  }

...it's only reading 4096 bytes at a time.

Is there a way to increase that buffer, so that I can ensure it reads,
say, up to 20,000 bytes?  (The longest line in the file seems to be around
14,000 bytes; at least, that's what "wc -L" tells me.)
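One way to check whether those 4096-byte reads actually limit what the <F> operator hands back is to write a line of known length and compare. This is only a sketch: the helper name roundtrip_length and the File::Temp scratch file are assumptions for illustration, and the 14,000 figure is just the "wc -L" number from above. As far as I understand it, readline keeps pulling in buffers until it hits the record separator in $/, so the size of each underlying read should not cap the line length.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one line of $n characters to a scratch file,
# read it back with the <FH> operator, and return the length that came back.
sub roundtrip_length {
    my ($n) = @_;
    my ($out, $path) = tempfile(UNLINK => 1);
    print {$out} ('x' x $n), "\n";
    close $out or die "close failed: $!";

    open(my $in, '<', $path) or die "cannot open $path: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return length $line;
}

print roundtrip_length(14_000), "\n";
```

If the printed length matches what was written, the read size seen in strace is probably not what is breaking the long lines.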

In the meantime, I'll Google, since the books I'm looking at don't have
good examples of this. ;)



bill@newbreedsoftware.com                           Got kids?  Get Tux Paint! 
http://newbreedsoftware.com/bill/       http://newbreedsoftware.com/tuxpaint/
