
I'm new to Linux (shell). I need to decode base64 text in an XML file using a Linux shell script. Could you please help me write a shell script that decodes the values of those tags whose attribute is encoding="base64"? The structure of my file is:

    <directory-entries>
        <entry dn="ads">
        <attr name="memberof">
        <value>CN=VPN-employee</value>
        <value encoding="base64">aGVsbG8gd29ybGQ=   </value>
<value encoding="base64">
Q049RmxvcHB5IC0g0LTQvtGB0YLRg9C/INC30LDQutGA0YvRgixPVT1EZXZpY2UgQ29udHJv
bCxPVT1Hcm91cHMsT1U90JHQkNCd0JosREM9aHEsREM9YmM=
    </value>
    <value encoding="base64">
Q049VVNCLdC00LjRgdC60LggLSDRgtC+0LvRjNC60L4g0YfRgtC10L3QuNC1LE9VPURldmlj
ZSBDb250cm9sLE9VPUdyb3VwcyxPVT3QkdCQ0J3QmixEQz1ocSxEQz1iYw==
    </value>
    </attr>
    </entry>
    </directory-entries>

The wanted output is

    <directory-entries>
        <entry dn="ads">
        <attr name="memberof">
        <value>CN=VPN-employee</value>
        <value encoding="base64">hello world  </value>
       <value encoding="base64"> decoded         </value>
       <value encoding="base64">    decoded         </value>
    </attr>
    </entry>
    </directory-entries>

I'm generating XML from Active Directory using ldapsearch. The script that I used to obtain this file is:

ldapsearch -h host -p 389 -D "CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc" -w password -s sub -B -E UTF-8 -X "(&(objectClass=organizationalPerson)(CN=*))" employeeID memberof > ldap_logins.xml

I don't know if it is possible to decode the text while generating the xml file. Thank you in advance!

  • I don't have a complete answer, but a couple of hints. On the ldapsearch side, you can use the -t option to output "non-printable" text to temporary files rather than Base64-encoded values. If you want to parse XML, check out XMLStarlet. Also, does the output need to be valid XML? Shouldn't the "encoded" attribute be dropped from the output? – Stephen Kitt May 20 '15 at 08:02
  • Thank you for feedback. Yes, the output should be valid XML. I need decoded value, the attribute itself can be dropped from the output – Meruyert May 20 '15 at 08:31
  • @Meruyert I've provided a proper answer using an xml parser called xmlstarlet. Just check it, if it helps. – shivams May 22 '15 at 02:57

4 Answers

1

Compact Script

Assuming the xml is in file.xml, just do:

sed -r 's/("base64">)([[:graph:]]+)/\1'"`grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d`"'/g' file.xml 

This compact one-liner handles the simple case of a single base64 value that sits on one line. Let me break it down and explain.

Break Down

First I select the base64 string using grep and decode it:

grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d

I could save this in a variable:

baseString=`grep -oP '"base64">\K[[:graph:]]+' file.xml | base64 -d`

Then use sed to replace the base64 with the decoded string saved in the variable:

sed -r 's/("base64">)([[:graph:]]+)/\1'"$baseString"'/g' file.xml
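
The one-liner substitutes the same decoded text at every match, so it only suits a file with one encoded value. A minimal sketch of the same idea applied one line at a time, so that each value is replaced by its own decoding, could look like the following. The function name decode_inline_base64 is hypothetical, and the sketch assumes at most one encoded value per line, with the opening and closing tags on that same line; multi-line values pass through unchanged, and the decoded text is not XML-escaped.

```shell
# Hypothetical helper: read XML from stdin and decode the first
# base64 value on each line that has both tags on the same line.
decode_inline_base64() {
    marker='encoding="base64">'
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$marker"*'</value>'*)
                prefix=${line%%"$marker"*}$marker     # text up to and including the marker
                rest=${line#*"$marker"}               # everything after the marker
                payload=${rest%%'</value>'*}          # base64 text, maybe with padding spaces
                suffix='</value>'${rest#*'</value>'}  # closing tag and whatever follows
                decoded=$(printf '%s' "$payload" | tr -d '[:space:]' | base64 -d)
                line=$prefix$decoded$suffix
                ;;
        esac
        printf '%s\n' "$line"
    done
}

# Usage: decode_inline_base64 < file.xml
```

Because every line is handled on its own, there is no need to splice the output of one command into the sed expression of another.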
shivams
  • Thank you for your answer! The script works for cases where values do not have line breaks. I have line breaks in values. I've updated the structure of the file in the question, added more examples. Do you have any ideas how to deal with those line breaks? – Meruyert May 20 '15 at 10:57
  • Oh! Multi-line regex is very tricky using bash. For such cases, it is better advised to go for some proper xml parser. However, I will provide some solution using regex. Wait. – shivams May 20 '15 at 11:29
  • I know this is old. I want to use the sed command, but it says "test" is not defined. Do you remember how it was defined? – elysch Aug 30 '18 at 00:12
  • @elysch: test is not a command here. I used it to denote the file-name. I should have used file.xml instead. I am correcting it. – shivams Aug 30 '18 at 00:19
  • I tried that but I get an error: sed: -e expression #1, char 297: unknown option to `s'. Don't know how to find which value is causing problems – elysch Aug 30 '18 at 00:22
  • Another question: How would it know how to "select" each base64 string in the right place? Testing the grep command on its own, it shows all the base64 strings, not just one – elysch Aug 30 '18 at 00:29
  • I added a specific question here – elysch Aug 30 '18 at 00:56
1

I'll say what I always say: please NEVER use regular expressions to parse XML. It's bad news. XML allows many formatting variations, so semantically identical XML may or may not match a given regular expression. Simple things like line wrapping, unary tags, etc. are enough to cause that.

This means you create brittle code, which one day might mysteriously break because of an upstream and perfectly valid change to your data flow.
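
As a small illustration of that brittleness, the sketch below runs one line-oriented pattern against three spellings of the same element. The pattern "base64">[[:graph:]]+ and the helper name count_matches are only illustrative assumptions, not taken from any particular script.

```shell
# Count the lines on which a line-oriented regex finds a base64 payload.
count_matches() {
    printf '%s\n' "$1" | grep -c -E '"base64">[[:graph:]]+' || true
}

# Three semantically identical ways of writing the same element.
one_line='<value encoding="base64">aGVsbG8gd29ybGQ=</value>'
wrapped='<value encoding="base64">
aGVsbG8gd29ybGQ=
</value>'
single_quoted="<value encoding='base64'>aGVsbG8gd29ybGQ=</value>"

count_matches "$one_line"        # prints 1
count_matches "$wrapped"         # prints 0: the payload moved to its own line
count_matches "$single_quoted"   # prints 0: the attribute uses single quotes
```

An XML parser treats all three as the same element, which is why it is the safer tool here.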

For parsing your XML I would suggest using perl and the quite excellent XML::Twig module.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;
use MIME::Base64;

#we take a "value" element, check it for an "encoding=base64" and if it is
#we rewrite the content and delete that attribute in the XML. 
sub decode_value {
    my ( $twig, $value ) = @_;
    if (    $value->att('encoding')
        and $value->att('encoding') eq "base64" )
    {
        my $decoded_text = decode_base64( $value->text );
        if ( $decoded_text =~ m/[^\s\d\w\=\-\,\.]/ ) {
            $decoded_text = "decoded";
        }
        $value->set_text($decoded_text);
        $value -> del_att('encoding');

    }
}


#twig handlers 'fires' a piece of code each time you hit a 'value' element. 
#it passes this piece of code that chunk of XML to handle, which means
#you can do things like dynamic XML rewrites 
#pretty print controls output XML rendering - there's a variety of options
#check the manpage. 
my $twig = XML::Twig->new(
    pretty_print  => "indented",
    twig_handlers => { 'value' => \&decode_value, }
);
$twig->parsefile('your_xml_file');
$twig->print;

This will give:

<directory-entries>
  <entry dn="ads">
    <attr name="memberof">
      <value>CN=VPN-employee</value>
      <value>hello world</value>
      <value>decoded</value>
      <value>decoded</value>
    </attr>
  </entry>
</directory-entries>

You could alternatively transform $decoded_text like this:

$decoded_text =~ s/[^\s\d\w=,-. ]+/_/g;

(URI::Escape module is worth a look here too, as it 'percent encodes' text URL style. )

Which would give instead:

  <value>CN=Floppy - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
  <value>CN=USB-_ - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc</value>
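
For anyone who wants to try the same masking from the shell, a rough equivalent can be sketched with tr. This is an approximation under stated assumptions rather than a port of the Perl substitution: the helper name mask_non_ascii is hypothetical, tr works on raw bytes under LC_ALL=C, and base64 -d is assumed to be the GNU coreutils option.

```shell
# Replace each run of bytes outside the allowed set with a single "_".
# Allowed: ASCII letters, digits, "=", ",", ".", space, newline and "-".
mask_non_ascii() {
    LC_ALL=C tr -cs 'A-Za-z0-9=,. \n-' '_'
}

# First multi-line value from the question, joined onto one line.
encoded='Q049RmxvcHB5IC0g0LTQvtGB0YLRg9C/INC30LDQutGA0YvRgixPVT1EZXZpY2UgQ29udHJvbCxPVT1Hcm91cHMsT1U90JHQkNCd0JosREM9aHEsREM9YmM='

printf '%s' "$encoded" | base64 -d | mask_non_ascii
# prints: CN=Floppy - _ _,OU=Device Control,OU=Groups,OU=_,DC=hq,DC=bc
```

The -c option complements the allowed set and -s squeezes each run of replaced bytes into one underscore, which matches the effect of the + quantifier in the substitution.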

But you might also find using Net::LDAP does what you need.

#!/usr/bin/perl
use strict;
use warnings;

use Net::LDAP;

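# new() returns undef on failure and leaves the reason in $@
my $ldap = Net::LDAP->new('host')
    or die "Error connecting to LDAP server: $@";
# bind() takes the DN followed by named options
my $result = $ldap->bind(
    'CN=informatica,OU=Accounts for System Purposes,OU=System Accounts,DC=hq,DC=bc',
    password => 'password'
);
if ( $result->code ) { die "Error binding to LDAP server: ", $result->error; }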

my $ldap_search = $ldap->search(
    base   => 'DC=hq,DC=bc',
    scope  => 'subtree',
    filter => '(&(objectClass=organizationalPerson)(CN=*))',
    attrs  => [ 'employeeID', 'memberOf' ],
);

foreach my $entry ( $ldap_search->entries ) {
    print "dn:\t", $entry->dn(), "\n";
    foreach my $attr ( $entry->attributes ) {
        print "$attr:";
        foreach my $value ( $entry->get_value($attr) ) {
            next unless defined $value;
            if ( $value =~ m/[^\s\d\w,-=+@\'.()]/ ) { $value = "binary_data" }
            chomp($value);
            print "\t$value\n";
        }
    }
}
Sobrique
  • Yes. Using an xml parser is always the only sane option. @Meruyert please use this solution (if it works fine) , rather than going for my regex based solution. – shivams May 20 '15 at 14:57
  • It is unclear which language you are using. @Sobrique. – shivams May 20 '15 at 15:00
  • Wow, that's impressive on my part. Amended answer to indicate that I do mean perl here ;) – Sobrique May 20 '15 at 15:02
  • Sorry for my ignorance. But I am really a new kid. Born in the era of Python, rather than Perl. Done a lot of bash but never touched Perl :/ Perhaps, I should be ashamed :| – shivams May 20 '15 at 15:03
  • Hardly. Perl and Python have very similar use cases. I'm crusty enough to pre-date python, and learned perl back when it was really the only option for extending shell scripting. Still like it though, not least because it remains pretty similar to shell, and very widely supported. – Sobrique May 20 '15 at 15:45
0

Here is a proper answer using xmlstarlet. This is a tool used for xml parsing and editing. First of all, install this package on your system. If you're on a Debian-based system, then do:

sudo apt-get install xmlstarlet

Now,

  1. first we read the value of base64 encoded string
  2. then we decode this string
  3. then we modify the corresponding tag value

Here is the complete script for that:

#!/bin/bash

for i in $(seq 3)
do
    # Find the string, decode it, and save it in a variable.
    # The character class is quoted so that tr removes whitespace only.
    decodedString=$(xmlstarlet sel -t -v "/directory-entries/entry/attr/value[@encoding='base64'][$i]" file.xml | tr -d '[:space:]' | base64 -d)

    #Now modify the xml document
    xmlstarlet ed -L -u "/directory-entries/entry/attr/value[@encoding='base64'][$i]" -v "$decodedString" file.xml
done

The loop runs 3 times because the sample file contains three base64-encoded values. Adjust the count to match the number of such elements in your file.
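
The whitespace-stripping step is what makes the multi-line values decodable, so here it is in isolation as a minimal sketch. The function name decode_wrapped is hypothetical and base64 -d is assumed to be the GNU coreutils option. Note that the character class is quoted: unquoted, the shell would strip the backslashes from an argument such as \r\n[:space:], and tr would then also delete the letters r and n from the base64 text.

```shell
# Strip all whitespace (newlines, indentation, padding spaces) from a
# base64 string and decode what remains.
decode_wrapped() {
    printf '%s' "$1" | tr -d '[:space:]' | base64 -d
}

# A value wrapped over several lines, as in the question's XML.
wrapped='
aGVsbG8g
d29ybGQ=
    '

decode_wrapped "$wrapped"   # prints: hello world
```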

shivams
0

Using xq from https://kislyuk.github.io/yq/:

xq -x '( .. | select(type == "object" and ."@encoding" == "base64")."#text" )
    |= ( gsub("\n"; "") | @base64d )' file.xml

This walks through the whole document recursively and finds any node that has an encoding attribute with the value base64. For each such node, it takes the node's value, removes all newlines (using gsub()), and decodes the base64 string using the @base64d operator. The decoded value replaces the original base64 data.

Given the document in the question, the output will be

<directory-entries>
  <entry dn="ads">
    <attr name="memberof">
      <value>CN=VPN-employee</value>
      <value encoding="base64">hello world</value>
      <value encoding="base64">CN=Floppy - доступ закрыт,OU=Device Control,OU=Groups,OU=БАНК,DC=hq,DC=bc</value>
      <value encoding="base64">CN=USB-диски - только чтение,OU=Device Control,OU=Groups,OU=БАНК,DC=hq,DC=bc</value>
    </attr>
  </entry>
</directory-entries>

The xq tool also has an option, -i or --in-place, to do "in-place" editing.

Kusalananda