Page last updated:
2010 Aug 19 08:15

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] Suggestions for cleaning up repetitive HTML tags?

I've come across some documents that are formatted in
such a way that, when converted to HTML, they come out
something like this:

  <font face="Arial">And</font> <font face="Arial">then</font>
  <font face="Arial">they</font> <font face="Arial">looked</font>

or even worse:

  <font face="Arial">A</font><font face="Arial">n</font><font

I've come up with a way, using PHP's DOMDocument system, to
scrub a file clean of these, but it's very slow, and the job is
basically something that could be done on a stream of text
(rather than having to worry about the document's structure).

I'm thinking of writing something in PHP or C to clean stuff
like this up, but am wondering if anyone else has any experience
and suggestions?

(And yes, I've used "htmltidy", but while that can merge _nested_
styles, e.g., "<font face="Arial"><font size=+1>" gets
combined into its own CSS style, e.g., "<span class="c123">",
it doesn't seem to be able to merge _consecutive_ styles,
as shown in the examples above. :^/ )
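
For what it's worth, the stream-of-text idea could look roughly like
the sketch below. It is only a sketch under stated assumptions, not a
tested tool: the function name merge_font_runs is made up for
illustration, tags are assumed to be lowercase, consecutive opening
tags are assumed to be byte-for-byte identical (same attributes, same
spacing, same line wrapping), and nested <font> elements are left
untouched rather than merged. Whenever a "</font>" is followed by
optional whitespace and then the same opening tag again, both tags are
dropped and the whitespace between them is kept.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): returns a newly
 * allocated copy of `in` with consecutive identical <font ...> runs
 * merged. The caller frees the result. Returns NULL if malloc fails.
 */
char *merge_font_runs(const char *in)
{
    static const char close_tag[] = "</font>";
    const size_t close_len = sizeof close_tag - 1;
    const size_t n = strlen(in);
    const char *open = NULL;   /* most recent <font ...> tag seen in the input */
    size_t open_len = 0;
    size_t i = 0, o = 0;
    char *out = malloc(n + 1); /* the output is never longer than the input */

    if (out == NULL)
        return NULL;

    while (i < n) {
        if (strncmp(in + i, "<font", 5) == 0 &&
            (isspace((unsigned char)in[i + 5]) || in[i + 5] == '>')) {
            const char *end = strchr(in + i, '>');
            if (end == NULL)
                break;         /* unterminated tag: the rest is copied below */
            open = in + i;
            open_len = (size_t)(end - open) + 1;
            memcpy(out + o, open, open_len);
            o += open_len;
            i += open_len;
        } else if (open != NULL &&
                   strncmp(in + i, close_tag, close_len) == 0) {
            size_t gap = i + close_len; /* whitespace after </font> starts here */
            size_t j = gap;
            while (isspace((unsigned char)in[j]))
                j++;
            if (strncmp(in + j, open, open_len) == 0) {
                /* Same tag reopens: drop "</font>" and the repeat, keep the gap. */
                memcpy(out + o, in + gap, j - gap);
                o += j - gap;
                i = j + open_len;
            } else {
                /* The run ends here: emit the close tag, forget the open tag. */
                memcpy(out + o, close_tag, close_len);
                o += close_len;
                i += close_len;
                open = NULL;
            }
        } else {
            out[o++] = in[i++];
        }
    }
    memcpy(out + o, in + i, n - i); /* leftover after an unterminated tag */
    o += n - i;
    out[o] = '\0';
    return out;
}
```

Because it only remembers the most recent opening tag, this works in a
single pass over the text and never builds a document tree, which is
the point of doing it on a stream. The trade-off is that anything
outside the assumptions above (mixed case, reordered attributes,
nesting) is passed through unchanged instead of being merged.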

Sent from my computer

Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
