LIBTIDY: Tidy up your HTML code
libtidy has many uses; this post covers one simple one.
It takes your HTML page, parses it, tidies it, and finally produces an XML (XHTML) document ready to be fed to an XML parser such as libxml.
To do that, we can either pass options to tidy on the command line or write all of them in a single file, so that we only need to pass one command-line argument: you guessed it, the config file path. :D
Config file used:
fix-bad-comments: yes
tidy-mark: no
write-back: yes
fix-uri: yes
hide-comments: yes
bare: yes
markup: yes
clean: yes
wrap-attributes: no
wrap-script-literals: yes
wrap-sections: no
input-encoding: ascii
output-encoding: utf8
error-file: errors.txt
indent: auto
indent-spaces: 2
indent-cdata: no
show-warnings: yes
show-body-only: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: utf8
vertical-space: no
output-xml: no
input-xml: no
output-html: no
output-xhtml: yes
add-xml-decl: yes
add-xml-space: yes
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
#new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
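If you want to sanity-check a config file like the one above before running tidy, here is a minimal Python sketch that reads "name: value" lines into a dict. It is not Tidy's own parser: the function name parse_tidy_config and the SAMPLE text are made up for illustration, and it assumes that lines starting with # are comments and that indented lines continue the previous value.

```python
def parse_tidy_config(text):
    """Parse 'name: value' lines into a dict (simplified, not Tidy's own parser)."""
    options = {}
    last_key = None
    for raw in text.splitlines():
        # Skip blank lines and '#' comment lines.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        # Assumption: an indented line continues the previous option's value.
        if raw[0].isspace() and last_key is not None:
            options[last_key] = (options[last_key] + " " + raw.strip()).strip()
            continue
        name, sep, value = raw.partition(":")
        if not sep:
            continue  # not a 'name: value' line; ignored in this sketch
        last_key = name.strip()
        options[last_key] = value.strip()
    return options


# Hypothetical excerpt of a config file, used only for this example.
SAMPLE = """\
tidy-mark: no
write-back:yes
output-xhtml: yes
#new-blocklevel-tags: cfoutput, cfquery
new-inline-tags: cfif, cfelse,
  math, mroot
"""

if __name__ == "__main__":
    opts = parse_tidy_config(SAMPLE)
    if opts.get("write-back") == "yes":
        print("note: write-back is on, so tidy will overwrite the input file")
```

A quick check like this is handy for spotting options such as write-back that change files on disk.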
Now that your config file is ready, it's time to run tidy to convert your HTML into a nearly-XML (XHTML) document.
Open a command prompt and type:
tidy -config tidyConfig.txt MyWebPage.html
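If you would rather drive this from a script, the command above can be wrapped in a few lines of Python. This is only a sketch: the helper names build_tidy_command and run_tidy are made up here, and it assumes the tidy executable is on your PATH. Tidy's exit status is 0 when all is well, 1 when there were warnings, and 2 when there were errors.

```python
import shutil
import subprocess


def build_tidy_command(config_path, html_path):
    """Build the same argument list as the command line shown above."""
    return ["tidy", "-config", config_path, html_path]


def run_tidy(config_path, html_path):
    """Run tidy and return its exit status (0 ok, 1 warnings, 2 errors)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    completed = subprocess.run(build_tidy_command(config_path, html_path))
    return completed.returncode


if __name__ == "__main__":
    # File names taken from the post; adjust them to your own files.
    status = run_tidy("tidyConfig.txt", "MyWebPage.html")
    print("tidy exit status:", status)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.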
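Now your HTML, oops, YOUR XML is ready. :) Note that because the config sets write-back: yes, tidy overwrites MyWebPage.html in place, and because of error-file: errors.txt the warnings and errors go to errors.txt.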
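To see what "ready for an XML parser" means in practice, here is a small sketch that parses a tidied page. It uses Python's standard xml.etree.ElementTree rather than libxml, simply to keep the example self-contained. The SAMPLE_XHTML document and the extract_links helper are hypothetical; real tidy output depends on your input page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of what tidy might write with output-xhtml and
# add-xml-decl enabled; it is not captured from a real run.
SAMPLE_XHTML = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My Web Page</title></head>
  <body>
    <p>Hello, <a href="https://example.com/">world</a></p>
  </body>
</html>
"""

# XHTML elements live in this namespace, so queries must use it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_links(xhtml_bytes):
    """Return (title, [href, ...]) from a well-formed XHTML document."""
    root = ET.fromstring(xhtml_bytes)
    title = root.findtext("x:head/x:title", default="", namespaces=XHTML_NS)
    hrefs = [a.get("href") for a in root.iterfind(".//x:a", XHTML_NS)]
    return title, hrefs


if __name__ == "__main__":
    print(extract_links(SAMPLE_XHTML))
```

The same query on the untidied HTML would typically fail with a parse error as soon as the parser hits an unclosed tag, which is the whole point of running tidy first.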