
The sample XML looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <level01>
    <field01>AAAAAAAAAAAAAAAAAAAA</field01>
    <field02>BBBBBBBB</field02>
    <field03>CCCCCCCCCCCCCCCCCCCC</field03>
    <field04>DDDDDDDDDDDDDDDDDDDD</field04>
    <field05>DDD</field05>
    <level02>
      <field01>EEEEEEEEE</field01>
      <field02>FFF</field02>
      <field04>GGGGGGGGGGs</field04>
      <field05>HHH</field05>
      <level03>
        <field01>IIIIIIIII</field01>
        <field02>JJJ</field02>
        <field04>KKKKKKKKK</field04>
        <field05>L</field05>
      </level03>
    </level02>
  </level01>
</root>

The desired output looks like:

AAAAAA,BBBBB, CCCCCCCCCCCCC ,DDDDDDDDDD ,DDD,EEE,FFF,GGGG,HHH,III,JJJ,KKK,L
Chris Davies
sdfsdf
  • How important are the spaces in the comma-separated fields of your output? I don't see those spaces in your input file, and they're not consistent. – Chris Davies Jul 07 '16 at 10:30
  • Does the solution have to use xmlstarlet or can any suitable tool be used? – Chris Davies Jul 07 '16 at 10:31
  • And why are some fields truncated in the output to 3 or 4 characters while others are full-length? – cas Jul 07 '16 at 10:54

2 Answers


xmlstarlet arguments are a little bit tricky. You have to think of them as templates (-t) in the XSLT sense.

xmlstarlet sel -B -t -m '//text()' -c 'concat(.,",")' x1.xml

where:

  • -B : remove insignificant whitespace from the XML tree
  • -t : template, in the XSLT sense
  • -m : match an XPath expression
  • -c : copy-of an XPath expression

This expression produces an extra trailing ",". Naturally, we can use ordinary Unix tools to help:

xmlstarlet sel -B -t -v '//text()' x1.xml | 
    sed -z 's/\n/, /g; s/$/\n/'
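  • -t : a template (in the XSLT sense)
  • -v : value-of (XPath expression)
  • sed -z (a GNU extension) reads the whole output as one record, turns each newline into ", " and appends a final newline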
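
The extra trailing comma from the first command (the one using concat) can also be trimmed after the fact. A minimal sketch, assuming the output is a single line; the function name strip_trailing_comma is made up for illustration, and printf stands in for the xmlstarlet output:

```shell
#!/bin/sh
# Hypothetical helper (not part of xmlstarlet): remove one trailing
# comma from the end of each input line.
strip_trailing_comma() {
    sed 's/,$//'
}

# Stand-in for the output of the concat(.,",") command.
printf 'AAA,BBB,CCC,\n' | strip_trailing_comma
```

In practice the helper would simply go at the end of the xmlstarlet pipeline in place of the printf.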
JJoao

Using xml2 (available packaged for Debian and most other distros) instead of xmlstarlet, along with awk and paste:

$ xml2 <sdfsdf.xml | awk -F= '{ print $2 }' | paste -sd,
AAAAAAAAAAAAAAAAAAAA,BBBBBBBB,CCCCCCCCCCCCCCCCCCCC,DDDDDDDDDDDDDDDDDDDD,DDD,EEEEEEEEE,FFF,GGGGGGGGGGs,HHH,IIIIIIIII,JJJ,KKKKKKKKK,L

If you want spaces after each comma, add them with sed:

xml2 <sdfsdf.xml | awk -F= '{ print $2 }' | paste -sd, | sed -e 's/,/, /g'

cut can also work in place of awk, but I'm guessing there are other criteria you haven't mentioned yet, so I'll stick with awk for now. Anyway, here's the cut version:

xml2 <sdfsdf.xml | cut -d= -f2 | paste -sd,
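
One caveat with splitting on "=": both awk -F= '{ print $2 }' and cut -d= -f2 keep only the text between the first and second "=", so a field value that itself contains "=" gets cut short. A minimal sketch of one workaround, assuming xml2 emits one path=value line per field as the commands above rely on; the function name values_to_csv and the sample lines are made up, and printf stands in for xml2:

```shell
#!/bin/sh
# Hypothetical helper: keep everything after the first "=" on each line,
# then join all lines into one comma-separated line.
values_to_csv() {
    cut -d= -f2- | paste -sd, -
}

# Made-up stand-in for xml2 output; the second value contains "=".
printf '/root/level01/field01=AAA\n/root/level01/field02=x=y\n' | values_to_csv
```

With the sample XML from the question this should make no difference, since none of the values contain "=".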
cas