Linux Users' Group of Davis (LUGOD)

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] Suggestions for cleaning up repetitive HTML tags?

Check out ANTLR, and perhaps create a grammar for it along with some of
your own production rules.

http://www.antlr.org

I am using ANTLR at the moment to parse fixed-field data
returned from a mainframe query.
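The lexer-plus-rewrite-rule idea can be sketched without ANTLR. The Python below is a swapped-in stand-in, not an ANTLR grammar: a regular expression splits the input into opening `<font face="...">` tags, `</font>` tags, and everything else, and a single rule drops a `</font>`, optional whitespace, `<font face="F">` sequence when F matches the face of the run just closed. It assumes font tags carry only a `face` attribute and are not nested; `merge_font_runs` and `TOKEN_RE` are hypothetical names.

```python
import re

# Token shapes (an assumption, not a full HTML lexer):
#   group 1 - face of an opening <font face="..."> tag (may span lines)
#   group 2 - a closing </font> tag
#   group 3 - any other text, or a lone '<' that starts some other tag
TOKEN_RE = re.compile(
    r'<font\s+face="([^"]*)"\s*>'
    r'|(</font\s*>)'
    r'|([^<]+|<)',
    re.IGNORECASE,
)


def merge_font_runs(html):
    """Rewrite rule: '</font>' WS* '<font face="F">' -> WS* when F equals the
    face of the run being closed. Assumes font tags are not nested."""
    out = []          # pieces of the rewritten document
    open_face = None  # face of the most recently opened <font>
    pending = None    # (closing tag, face it closes) awaiting a decision
    gap = ""          # whitespace seen after the pending closing tag
    for m in TOKEN_RE.finditer(html):
        face, close, text = m.group(1), m.group(2), m.group(3)
        if pending is not None:
            if text is not None and text.isspace():
                gap += text  # still undecided; keep collecting whitespace
                continue
            if face is not None and face == pending[1]:
                out.append(gap)  # same face reopened: drop close and open
                pending, gap = None, ""
                continue
            out.append(pending[0] + gap)  # anything else: keep the close
            pending, gap = None, ""
        if face is not None:
            open_face = face
            out.append(m.group(0))
        elif close is not None:
            pending = (close, open_face)
        else:
            out.append(text)
    if pending is not None:
        out.append(pending[0] + gap)  # flush a close at end of input
    return "".join(out)
```

Applied to the examples in Bill's message, this should turn the first into `<font face="Arial">And then</font>` and the second into `<font face="Arial">And</font>`, while leaving runs with different faces alone. Since it makes one pass over a token stream and never builds a DOM, it may avoid the DOMDocument overhead, though that has not been measured here.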

brian

On Wed, Aug 18, 2010 at 10:48:58AM -0700, Bill Kendrick wrote:
> 
> I've come across some documents that are formatted in
> such a way that, when converted to HTML, they come out
> something like this:
> 
>   <font face="Arial">And</font> <font face="Arial">then</font>
>   <font face="Arial">they</font> <font face="Arial">looked</font>
> 
> or even worse:
> 
>   <font face="Arial">A</font><font face="Arial">n</font><font
>   face="Arial">d</font>
>   ...
> 
> 
> I've come up with a way, using PHP's DOMDocument system, to
> scrape a file clear of these, but it's very slow, and it's
> basically something that can be done on a stream of text
> (rather than having to worry about the document's structure).
> 
> I'm thinking of writing something in PHP or C to clean stuff
> like this up, but am wondering if anyone else has any experience
> and suggestions?
> 
> (And yes, I've used "htmltidy", but while that can merge _nested_
> styles, e.g., a "<font face="Arial"><font size=+1>" gets
> combined into its own CSS style, e.g., "<span class="c123">",
> it doesn't seem to be able to merge _consecutive_ styles,
> as shown in the examples above. :^/ )
> 
> 
> -- 
> -bill!
> Sent from my computer
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
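Because this cleanup can be done on a stream of text without caring about document structure, a plain regular-expression filter is another option. This minimal Python sketch assumes the bodies of the font runs contain no other tags and that the tags carry only a `face` attribute. The `\2` backreference ensures only runs with the same face are joined, and the loop repeats until a pass changes nothing, because each `re.sub` pass joins only non-overlapping pairs. `PAIR_RE`, `collapse`, and `main` are hypothetical names.

```python
import re
import sys

# Two adjacent <font face="F"> runs separated only by whitespace.
# Group 1: first opening tag plus its text (no nested tags allowed);
# group 2: the face; group 3: whitespace between the runs.
# \2 requires the second opening tag to use the same face.
PAIR_RE = re.compile(
    r'(<font\s+face="([^"]*)"\s*>[^<]*)</font>(\s*)<font\s+face="\2"\s*>',
    re.IGNORECASE,
)


def collapse(html):
    """Fold each same-face pair into one run until no pair is left."""
    while True:
        # Keep the first opening tag, its text, and the whitespace;
        # drop the </font> and the redundant reopening tag.
        new = PAIR_RE.sub(r"\1\3", html)
        if new == html:
            return html
        html = new


def main():
    """Filter standard input to standard output."""
    sys.stdout.write(collapse(sys.stdin.read()))
```

To use it as a pipeline filter, add a call to `main()` at the end of the file and run something like `python collapse.py < in.html > out.html`. Note that it reads all of standard input before writing, so it is a filter in the shell sense rather than a truly incremental stream processor.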

-- 
Brian Lavender
http://www.brie.com/brian/

"There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies."

Professor C. A. R. Hoare
The 1980 Turing Award lecture