How can I sort this XML?

  1. first by alphabetical element: module before property.
  2. then by the alphabetical name attribute: <module name="ClassTypeParameterName"/> before <module name="PackageName"/>.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE module PUBLIC "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN" "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <property name="severity" value="error"/>
  <property name="fileExtensions" value="java"/>
  <module name="NewlineAtEndOfFile"/>
  <module name="FileTabCharacter"/>
  <module name="TreeWalker">
    <module name="PackageName"/>
    <module name="ClassTypeParameterName"/>
    <module name="InterfaceTypeParameterName"/>
    <module name="MethodTypeParameterName"/>
    <module name="LambdaParameterName"/>
    <module name="PatternVariableName"/>
    <module name="RecordComponentName"/>
    <module name="RecordTypeParameterName"/>
    <module name="TypeName">
      <property name="format" value="^[A-Z][_a-zA-Z0-9]*$"/>
    </module>
    <module name="AvoidDoubleBraceInitialization"/>
    <module name="AvoidNoArgumentSuperConstructorCall"/>
    <module name="OneTopLevelClass"/>
    <module name="OuterTypeFilename"/>
  </module>
</module>

I'd like to use xq, as in the answers to the question "Sorting an XML file in UNIX with a Bash script?".

caduceus

2 Answers

You may want something like this:

xq -x -S 'walk(if type == "array" then sort_by(."@name") else . end)' file

This uses -S (or --sort-keys) to sort the keys (XML tags) using their names so that the module keys come before the property keys.

It then uses the recursive walk() function to apply sort_by() to each array, sorting the elements of each array based on the value of the name attribute (written ."@name").

This walk() usage is almost identical to an example in the jq manual.

This would produce the following output:

<module name="Checker">
  <module name="FileTabCharacter"></module>
  <module name="NewlineAtEndOfFile"></module>
  <module name="TreeWalker">
    <module name="AvoidDoubleBraceInitialization"></module>
    <module name="AvoidNoArgumentSuperConstructorCall"></module>
    <module name="ClassTypeParameterName"></module>
    <module name="InterfaceTypeParameterName"></module>
    <module name="LambdaParameterName"></module>
    <module name="MethodTypeParameterName"></module>
    <module name="OneTopLevelClass"></module>
    <module name="OuterTypeFilename"></module>
    <module name="PackageName"></module>
    <module name="PatternVariableName"></module>
    <module name="RecordComponentName"></module>
    <module name="RecordTypeParameterName"></module>
    <module name="TypeName">
      <property name="format" value="^[A-Z][_a-zA-Z0-9]*$"></property>
    </module>
  </module>
  <property name="fileExtensions" value="java"></property>
  <property name="severity" value="error"></property>
</module>

Note that xq writes out the end tags explicitly, even for empty nodes. If you want to fix that (so that <tag attr="..."></tag> is changed to <tag attr="..."/>), pass the result through xmlstarlet fo or xmlstarlet format.
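
If xmlstarlet is not at hand, the same clean-up can be sketched with Python's standard library instead. This is only an illustration, not part of the xq workflow above: the `xq_output` string is a shortened, hand-copied excerpt of the output shown earlier, and the sketch relies on `xml.etree.ElementTree` serializing childless, text-free elements in short form by default.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the xq output above (assumption: copied by hand for the demo).
xq_output = """<module name="Checker">
  <module name="FileTabCharacter"></module>
  <module name="TreeWalker">
    <module name="TypeName">
      <property name="format" value="^[A-Z][_a-zA-Z0-9]*$"></property>
    </module>
  </module>
  <property name="severity" value="error"></property>
</module>"""

root = ET.fromstring(xq_output)

# Elements with no text and no children are written as <tag ... /> by default
# (short_empty_elements=True); existing indentation is kept as-is.
collapsed = ET.tostring(root, encoding="unicode")
print(collapsed)
```

Note that ElementTree writes a space before the closing `/>` and does not emit an XML declaration or DOCTYPE here, so this is closer to a quick check than a full replacement for xmlstarlet fo.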


As a reference, the JSON document that the original XML document is translated into (with no sorting whatsoever) and to which the jq expression is applied is the equivalent of the following:

{
   "module": {
      "@name": "Checker",
      "module": [
         { "@name": "NewlineAtEndOfFile" },
         { "@name": "FileTabCharacter" },
         {
            "@name": "TreeWalker",
            "module": [
               { "@name": "PackageName" },
               { "@name": "ClassTypeParameterName" },
               { "@name": "InterfaceTypeParameterName" },
               { "@name": "MethodTypeParameterName" },
               { "@name": "LambdaParameterName" },
               { "@name": "PatternVariableName" },
               { "@name": "RecordComponentName" },
               { "@name": "RecordTypeParameterName" },
               {
                  "@name": "TypeName",
                  "property": { "@name": "format", "@value": "^[A-Z][_a-zA-Z0-9]*$" }
               },
               { "@name": "AvoidDoubleBraceInitialization" },
               { "@name": "AvoidNoArgumentSuperConstructorCall" },
               { "@name": "OneTopLevelClass" },
               { "@name": "OuterTypeFilename" }
            ]
         }
      ],
      "property": [
         { "@name": "severity", "@value": "error" },
         { "@name": "fileExtensions", "@value": "java" }
      ]
   }
}
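
To make the effect of the jq expression on this JSON structure more concrete, here is a rough Python sketch of the same idea. It is an emulation under stated assumptions, not what xq runs internally: `walk_sort` is a hypothetical helper name, the `doc` dictionary is a trimmed version of the document above, and `sort_keys=True` stands in for the -S option.

```python
import json


def walk_sort(node):
    """Visit children first, then sort each list by its items' "@name"."""
    if isinstance(node, dict):
        return {key: walk_sort(value) for key, value in node.items()}
    if isinstance(node, list):
        children = [walk_sort(item) for item in node]
        # Assumption: items without "@name" sort first instead of raising an error.
        return sorted(
            children,
            key=lambda item: item.get("@name", "") if isinstance(item, dict) else "",
        )
    return node


# Trimmed version of the JSON document above, keys deliberately out of order.
doc = {
    "module": {
        "property": [
            {"@name": "severity", "@value": "error"},
            {"@name": "fileExtensions", "@value": "java"},
        ],
        "module": [
            {"@name": "NewlineAtEndOfFile"},
            {"@name": "FileTabCharacter"},
        ],
        "@name": "Checker",
    }
}

sorted_doc = walk_sort(doc)

# sort_keys=True plays the role of -S: "@name" < "module" < "property".
print(json.dumps(sorted_doc, indent=2, sort_keys=True))
```

A single object such as the lone "property" under TypeName is not a list, so it passes through unchanged, just as it does with the `type == "array"` test in the jq expression.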

Kusalananda


I had a very similar issue to the OP (with the added problem that the XML contained customer data, which ruled out online tools) and first went down the route of using xq. I had decent success using this script as a starting point: https://unix.stackexchange.com/a/659245/367314.

In the end, however, I found a nice plugin for VS Code that sorts blocks of code and lets you configure the sorting depth too. I thought I'd post it here in case it helps anyone else.

https://marketplace.visualstudio.com/items?itemName=1nVitr0.blocksort

[Image: demo of the blocksort plugin]

If you want to sort only certain blocks in a file, you can select them in the UI and sort just those, or you can smart-sort the entire document.

It's not a good solution if you're trying to automate things, but it works well for one-offs.