Linux Users' Group of Davis
Page last updated: 2010 Aug 19 08:15

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Suggestions for cleaning up repetitive HTML tags?

On Wed, 2010-08-18 at 16:23 -0700, Bill Kendrick wrote:
> On Wed, Aug 18, 2010 at 01:29:14PM -0500, Chanoch (Ken) Bloom wrote:
> > Consider writing a SAX filter that just drops the offending <font> and
> > </font>.
> Well, we want the style info to remain... there's just no reason in
> the world for the document to specify it over and over again on
> a per-word or per-character(!) basis. :)

The filter doesn't have to drop all of them, only the redundant ones.
I'm not sure how easy or hard it will be to model the state you need to
do this correctly, though; that depends on the structure of your data.
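
To make the "model the state" part concrete, here is a minimal sketch of
such a filter, written in Python with the standard xml.sax module rather
than the Ruby or PHP discussed in this thread. The names FontMerger and
merge_fonts are made up for illustration, and it assumes the input is
well-formed XML (XHTML, say), not tag-soup HTML. The only state it keeps
is a </font> that has been seen but not yet written out: if the very next
event is a <font> with identical attributes, both tags are dropped.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class FontMerger(XMLFilterBase):
    """Hypothetical SAX filter: merge directly adjacent <font> elements
    whose attributes are identical, leaving all other markup untouched."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._open = []       # attributes of each currently open <font>
        self._pending = None  # attributes of a </font> seen but not yet written

    def _flush(self):
        # Write the held-back </font>, if there is one.
        if self._pending is not None:
            super().endElement("font")
            self._pending = None

    def startElement(self, name, attrs):
        if name == "font":
            wanted = dict(attrs.items())
            if self._pending is not None and self._pending == wanted:
                # Same styling reopened immediately: swallow both the
                # held-back </font> and this <font>.
                self._pending = None
                self._open.append(wanted)
                return
            self._flush()
            self._open.append(wanted)
        else:
            self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        if name == "font":
            # Hold this </font> back until we see what comes next.
            self._pending = self._open.pop()
            return
        super().endElement(name)

    def characters(self, content):
        # Any text, even a single space, ends the chance to merge.
        self._flush()
        super().characters(content)

    def endDocument(self):
        self._flush()
        super().endDocument()


def merge_fonts(xml_text):
    """Run xml_text through FontMerger and return the rewritten XML."""
    out = io.StringIO()
    merger = FontMerger(xml.sax.make_parser())
    merger.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    merger.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Calling merge_fonts() on a string of markup returns the rewritten
document, with an XML declaration prepended by XMLGenerator. This version
is deliberately strict: text of any kind between two font tags, including
whitespace, prevents the merge.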

> > Also consider using XPath, like my following example in Ruby (using the
> > Nokogiri XML library)
> Ooooh.  Thanks, I'll poke at this.  (I know there's some XPath stuff
> in PHP that I know nothing about, since I've only spoken to it about
> XML via its DOMDocument stuff, so far.)

Note that part of the challenge here is dealing with the spaces that
come between the font tags -- deciding which ones should be included in
the newly-merged font tags, and which shouldn't. So for both techniques,
experimentation is in order.
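
As one way to experiment with the whitespace question, here is a rough
tree-based sketch in Python using the standard xml.etree.ElementTree
module (not the Nokogiri example from earlier in the thread). The function
merge_font_siblings and its absorb_whitespace switch are hypothetical
names, and the input is again assumed to be well-formed XML. In
ElementTree, the text between two sibling elements is stored as the first
one's "tail", so the policy decision comes down to one test on that tail.

```python
import xml.etree.ElementTree as ET


def _append_text(elem, text):
    """Append text at the very end of elem's content."""
    if not text:
        return
    if len(elem):
        # elem has children: trailing text lives in the last child's tail.
        last = elem[-1]
        last.tail = (last.tail or "") + text
    else:
        elem.text = (elem.text or "") + text


def merge_font_siblings(parent, absorb_whitespace=True):
    """Merge runs of sibling <font> elements with identical attributes,
    in place. With absorb_whitespace=True, whitespace-only text between
    two such siblings is pulled inside the merged element; with False,
    any text in between blocks the merge."""
    prev = None
    for child in list(parent):
        gap = (prev.tail or "") if prev is not None else ""
        mergeable = (
            prev is not None
            and prev.tag == "font"
            and child.tag == "font"
            and prev.attrib == child.attrib
            and (gap == "" or (absorb_whitespace and gap.isspace()))
        )
        if mergeable:
            # Move the gap, then the child's text and children, into prev.
            _append_text(prev, gap + (child.text or ""))
            prev.extend(list(child))
            prev.tail = child.tail
            parent.remove(child)
        else:
            prev = child
    # Recurse afterwards so content joined by a merge is also examined.
    for child in list(parent):
        merge_font_siblings(child, absorb_whitespace)


doc = ('<p><font color="red">Hello</font> <font color="red">world</font>, '
       '<font color="blue">bye</font></p>')

root = ET.fromstring(doc)
merge_font_siblings(root, absorb_whitespace=True)
merged = ET.tostring(root, encoding="unicode")

root = ET.fromstring(doc)
merge_font_siblings(root, absorb_whitespace=False)
strict = ET.tostring(root, encoding="unicode")
```

With absorb_whitespace=True the two red runs become a single
<font color="red">Hello world</font>; with False the document comes back
unchanged. Which behaviour is right depends on whether a space inside the
font tag renders differently from one outside it in your data (underlined
or background-coloured text, for instance).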
