How do you configure tidy to parse XML instead of HTML?
Explanation:
A while ago, a co-worker showed me a trick for using tidy to clean up XML. Apparently, you create a tidyrc file like this:
input-xml: yes
quiet: yes
indent: yes
indent-attributes: yes
indent-spaces: 4
char-encoding: utf8
wrap: 0
wrap-asp: no
wrap-jste: no
wrap-php: no
wrap-sections: no
Even after adding this to ~/.tidyrc, tidy still parses the input as HTML (its default) rather than XML:
$ cat -v foo.out | tidy > foo.xml
line 3 column 1 - Error: <data> is not recognized!
line 3 column 1 - Warning: missing <!DOCTYPE> declaration
line 3 column 1 - Warning: discarding unexpected <data>
I've tried various file permissions:
[root@mongo-test3 tmp]# ls -ial ~
51562 -rw------- 1 root root 11550 Jul 16 02:17 .bash_history
50973 -rw-r--r-- 1 root root 18 May 1 00:40 .bash_logout
51538 -rw-r--r-- 1 root root 176 May 1 00:40 .bash_profile
51537 -rw-r--r-- 1 root root 124 May 1 00:40 .bashrc
51561 -rwxr-xr-x 1 root root 164 Jul 16 22:16 .tidyrc
I've tried naming the file both .tidyrc and tidyrc.
Versions:
I've tried this on both Mac OS X 10.8.4 and CentOS 6.4:
Mac OSX 10.8.4
Darwin spuders-macbook-pro 12.4.0 Darwin Kernel Version 12.4.0: Wed May 1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64
CentOS 6.4
Linux mongo-test3 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Research:
Normally I would ask the person who taught me this trick, but they are unreachable.
Workaround:
As a workaround, I can use the -xml flag, but I would prefer to get the tidyrc working:
$ cat -v foo.out | tidy -xml > foo.xml