1

I have a data file (see sample three lines below) separated by pipes.

What I want to be able to do is to "chomp" out the fields and put them into variables. They are of pre-defined lengths. I want to turn the file into an SQL script as per:

Input:

|          416|CAWNBORE LIMITED                                                                                              |CAWNBORE                 |     8|            0|     0|00.00     |            0|            0|********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |            0|            0|            0|            0|            0|
|          431|MAIN HOLDINGS LIMITED                                                                                         |MAINHOLDINGSCHA          |     8|            0|     0|00.00     |            0|            0|********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |********NO ADDRESS DETAILS*******                 |            0|     19650509|            0|            0|            0|
|          432|DUBLIN NORTH CITY MILLING COMPANY LIMITED                                                                     |DUBLINNORTHCITY          |     8|            0|     1|00.00     |     18750125|     19830124|113 PHIBSBORO ROAD                                |DUBLIN                                            |                                                  |                                                  |       216410|     19901106|            0|     20030124|            0|

Basically, strip the pipes - separate the data with commas and insert the SQL string INSERT INTO .... VALUES( data_in_here comma separated );

Desired output:

INSERT INTO tbcrocompany (id_company, nm_company, id_keyword, cd_status, dt_company_status, cd_type, cd_principle_obj, dt_register, dt_last_ar, ad_line_1, ad_line_2, ad_line_3, ad_line_4, cd_town_number, dt_dissolved, dt_bond_expiry, dt_next_ar, dt_last_accounts) VALUES (416,'CAWNBORE LI|MITED','CAWNBORE',8,0,0, '00.00', 0, 0, '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', 0, 0, 0, 0, 0);
INSERT INTO tbcrocompany (id_company, nm_company, id_keyword, cd_status, dt_company_status, cd_type, cd_principle_obj, dt_register, dt_last_ar, ad_line_1, ad_line_2, ad_line_3, ad_line_4, cd_town_number, dt_dissolved, dt_bond_expiry, dt_next_ar, dt_last_accounts) VALUES (431,'MAIN HOLDIN|GS LIMITED','MAINHOLDINGSCHA',8,0,0, '00.00', 0, 0, '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', '********NO ADDRESS DETAILS*******', 0, 19650509, 0, 0, 0);
INSERT INTO tbcrocompany (id_company, nm_company, id_keyword, cd_status, dt_company_status, cd_type, cd_principle_obj, dt_register, dt_last_ar, ad_line_1, ad_line_2, ad_line_3, ad_line_4, cd_town_number, dt_dissolved, dt_bond_expiry, dt_next_ar, dt_last_accounts) VALUES (432,'DUBLIN NORTH CITY MILLING COMPANY LIMITED','DUBLINNORTHCITY',8,0,1, '00.00', 18750125, 19830124, '113 PHIBSBORO ROAD', 'DUBLIN', '', '', 216410, 19901106, 0, 20030124, 0);
Vérace
  • 561
  • You are setting the field separator as , but there are no commas in your sample data. Please clarify exactly what you are trying to do... – jasonwryan Mar 24 '16 at 21:09
  • if you change while IFS="," to while IFS="|" your script snippet should work. Although '$c5' will output $c5 literally, not the value of variable c5 – MelBurslan Mar 24 '16 at 21:11
  • 2
  • 2
    Seriously, use awk. You will be able to do this job much, much easier because awk is actually designed for text processing. bash is not. – Wildcard Mar 24 '16 at 21:19
  • @Vérace, I'd be more than happy to. Could you please edit your post to include an example input and example output? It doesn't have to be real data, but realistic — with xxxxxxxx I can't tell which part you want to end up where. (Use names of fruits or "example.com" or whatever.) – Wildcard Mar 24 '16 at 21:23
  • 3
    @MelBurslan, by the way, echo "Some text and '$c5'" will correctly expand $c5. The single quotes aren't special within double quotes. – Wildcard Mar 24 '16 at 21:28
  • @Windcard. Sorry but I disagree. Bash is designed for text processing - a shell script is a text file. – fpmurphy Mar 25 '16 at 00:41
  • 1
    @fpmurphy1, by that logic, gcc is designed for text processing, because C++ source code is stored in a text file. Look at the post I linked to earlier; it has a great discussion of the conceptual purpose of a shell. – Wildcard Mar 25 '16 at 00:42
  • 1
    @Wildcard Yes, it's marvellous what one can do with awk, but honestly, the syntax makes complex SQL look like a children's story book :-) - same goes for sed. Having said that Bruce Momjian has written an SQL in shell - see here (also here). Apparently awk is involved. – Vérace Mar 25 '16 at 01:02

2 Answers2

4

Assuming your data is in a file data, this should work for reasonable input (e.g. you don't have | or newline characters in your data):

sed -e 's/^ *| *//' -e 's/ *$//' -e 's/ *| */|/g' data |
    while IFS='|' read -r f1 f2 f3 f4 f5 f6
    do
        # INSERT INTO mt (F1, F2, F3, F4, F5, F6) VALUES ( 16524,01,'10/17/2012','3930621977','XXNPUES        ', 'S1');
        echo "INSERT INTO mt (F1, F2, F3, F4, F5, F6) VALUES ($f1,$f2,'$f3','$f4','$f5','$f6');"
    done

The first line trims leading and trailing spaces from the data inputs, and discards the initial |, so that | 416|CABlah |Somewhere else | becomes 416|CABlah|Somewhere else|. The expression(s) can be modified or removed entirely if this isn't appropriate for your use case.

Jeff Schaller
  • 67,283
  • 35
  • 116
  • 255
Chris Davies
  • 116,213
  • 16
  • 160
  • 287
  • For those wondering why this output doesn't match the asker's desired output, it's because the question got edited (again). – Chris Davies Mar 25 '16 at 08:28
1

Here's a perl script which does the job. It can easily be modified to work for other tables just by changing the $tablename string as well as the @fields and @types arrays.

This script would be significantly shorter if the input file had the field names in the first line, as is common with many CSV files. The @types array would still need to be manually created, otherwise there's no way for the script to know which fields are strings and which are numbers. OTOH, if the input file had such a first line, the script could be written to use either the Text::CSV or DBD::CSV module.

It gets its input from any/all filenames specified on the command line and/or from stdin.

#! /usr/bin/perl

use strict;

my $tablename = 'tbcrocompany';

# field names array.  This will be used later to construct
# the %fields hash, and to print the field names in the 
# INSERT statement.
my @fields = qw(id_company nm_company id_keyword cd_status
    dt_company_status cd_type cd_principle_obj dt_register
    dt_last_ar ad_line_1 ad_line_2 ad_line_3 ad_line_4
    cd_town_number dt_dissolved dt_bond_expiry dt_next_ar
    dt_last_accounts);

# fields can be string (s) or number (n)
my @types = qw(n s s n n n n n n s s s s n n n n n);

# initialise the field types hash.
my %types=();
my $fieldnum=0;
foreach my $f (@fields) {
    $types{$f} = $types[$fieldnum++];
}

while (<>) {
    my %fields=();

    # remove leading '|' from line
    s/^\|//;

    # split each input line by '|'
    my @row = split(/\|/);

    # add each element of @row to the appropriate record in the %fields hash
    my $fieldnum=0;
    foreach my $f (@fields) {
        $fields{$f} = $row[$fieldnum++];
        # strip leading and trailing spaces from field value
        $fields{$f} =~ s/^\s*|\s*$//g;
    }

    # construct the VALUES( ... ) section of the output line,
    # with quotes for string values but not for numbers.
    my @values = ();
    foreach my $f (@fields) {
        if ($types{$f} eq 's') {
            push @values, "'$fields{$f}'"
        } else {
            push @values, $fields{$f}
        };
    }

    # and, finally, output the line.
    print "INSERT INTO $tablename (", join(', ',@fields),
          ") VALUES (", join(", ",@values), ";\n";
    }
cas
  • 78,579