Linux Users' Group of Davis (LUGOD)

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Parsing Html




On Wed, Jun 11, 2003 at 06:41:00PM -0700, Michael Wenk wrote:
> On Wednesday 11 June 2003 02:54 pm, Mike Simons wrote:
> > On Wed, Jun 11, 2003 at 04:00:06PM -0500, Jay Strauss wrote:
> > > > http://quote.cboe.com/QuoteTable.asp?TICKER=qqq&ALL=2
> >
> >   save the html into a file with wget, then feed that as an argument to
> > the perl below... if you want the calls and puts broken into separate
> > arrays or into hashes it should be easy from here.
>
> Silly question, why not open the wget in the open call?  Ie:
>
> open FOOF, "wget -q -O - <url> |";
> while (<FOOF>) {
>     $file .= $_;
> }
> $_ = $file;
> ...
>
> I realize using something fun like LWP would be better, but this would be the
> poor man's way of doing it I would think...

Michael,

Good question... here are some reasons I didn't put the fetch in the code.

- I normally develop a script like that via multiple passes, adding
  one line of s### junk at a time and sending the output to less.  Even
  over DSL it takes much longer to develop if the page gets pulled on
  every run... and since the script isn't in its final form anyway, it's
  better to have Jay add the fetching later if he wants.

- I don't really like relying on shell commands being there, so
  since Perl can do most anything itself, I like using Perl facilities
  when they're available (like LWP).

- I have already sent samples of the LWP fetching method to vox-tech,
  along with code that uses a local cache of fetched pages, which makes
  development much faster.  But that code would overwhelm the simple
  "how to parse an HTML table" question Jay asked, and I wasn't going
  to get any bonus points for including it anyway (*).
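[Editor's note: the multi-pass s### approach described in the first point
above could be sketched roughly like this; the regexes and the
table_to_tsv name are illustrative assumptions, not the original script.]

```perl
#!/usr/bin/perl -w
# Illustrative sketch of the multi-pass substitution idea: reduce a
# saved HTML page to one tab-separated line per table row, one s///
# pass at a time.  The regexes are an assumption about the approach,
# not the code from the original post.
use strict;

sub table_to_tsv {
    my ($html) = @_;
    $html =~ s/.*?<table[^>]*>//si;   # drop everything before the first table
    $html =~ s/<\/table>.*//si;       # ...and everything after it
    $html =~ s/<\/tr>/\n/gi;          # one output line per table row
    $html =~ s/<\/t[dh]>/\t/gi;       # tab-separate the cells
    $html =~ s/<[^>]+>//g;            # strip any remaining tags
    $html =~ s/\t+\n/\n/g;            # trim trailing tabs on each row
    return $html;
}

# e.g.:  perl parse.pl saved-page.html | less
if (@ARGV) {
    local $/;                         # slurp the whole file
    print table_to_tsv(scalar <>);
}
```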

  With all that said, putting in a wget call much like you mentioned is
a great idea for the final version.  If I had to use wget, I would do
something like:
===
open GET, "wget -q -O - '$url' |" or die "wget failed on: '$url'";
$_ = join '', <GET>;
close GET;
===
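[Editor's note: the LWP-with-local-cache idea mentioned above could look
roughly like this; fetch_cached() and the /tmp cache-file naming are
illustrative assumptions, not the code from the earlier vox-tech thread.]

```perl
#!/usr/bin/perl -w
# Sketch of fetching a page via LWP with a local file cache, so that
# repeated development runs don't hit the network.  fetch_cached()
# and the cache naming scheme are illustrative, not the original code.
use strict;
use Digest::MD5 qw(md5_hex);

sub fetch_cached {
    my ($url) = @_;
    my $cache = "/tmp/fetch-" . md5_hex($url) . ".html";

    # During development, reuse the cached copy instead of pulling
    # the page over the network on every run.
    if (-e $cache) {
        open my $fh, '<', $cache or die "can't read '$cache': $!";
        local $/;                     # slurp mode
        my $page = <$fh>;
        close $fh;
        return $page;
    }

    require LWP::Simple;              # only needed on a cache miss
    my $page = LWP::Simple::get($url);
    die "fetch failed on: '$url'" unless defined $page;

    open my $out, '>', $cache or die "can't write '$cache': $!";
    print $out $page;
    close $out;
    return $page;
}
```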

    TTFN,
      Mike

*: The fancy LWP caching code is in the thread about downloading the
Debian Bug List of stupid packages like apt (which have many hundreds
of open bugs).

    http://simons-clan.com/~msimons/debian/howto/dldb/

--
GPG key: http://simons-clan.com/~msimons/gpg/msimons.asc
Fingerprint: 524D A726 77CB 62C9 4D56  8109 E10C 249F B7FA ACBE

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech








