
I have a file addresses.csv that contains addresses in these formats:

Street 1
Street 10
Street 100
Street 1000
Straße 1b
Straße1b
Street 1 B
Street, 1B
The Street 1B
The-Street 1B
The'Street 1B
The&Street 1B
The Str. 1B
Street 1-3
Street 1 - 3
Street 1A-3B
Street 1A -3 B
Super's Street-Str., 1 - 1000B

Is there a way to separate/extract all street names and street numbers into two files?

output-names.csv

Street
Street
Street
Street
Straße
Straße
Street
Street
The Street
The-Street
The'Street
The&Street
The Str.
Street
Street
Street
Street
Super's Street-Str.

output-numbers.csv

1
10
100
1000
1b
1b
1 B
1B
1B
1B
1B
1B
1B
1-3
1 - 3
1A-3B
1A -3 B
1 - 1000B

I have found a solution that I want to share here:

R 9000
  • Please don't post the same question twice. Once is quite enough – Chris Davies Mar 02 '23 at 23:43
  • I think this question is different than my other question. – R 9000 Mar 02 '23 at 23:52
  • No, it's not, and the answer to it is the same answer you already have. – Ed Morton Mar 03 '23 at 00:12
  • @EdMorton 1) It's definitely a different question! Here I asked about a different address format (only regular German addresses) than in the other post. 2) Maybe your answer solves both this question and my other one, but the questions are still different. 3) I get a syntax error, so I can't tell whether your answer works or not. – R 9000 Mar 03 '23 at 00:43
  • No, they're not different - you have some set of street addresses in some sort of formats and want to extract the numbers and names in both questions. What the specific formats are is completely irrelevant - it's some set of street addresses and all that changes is the regexp(s) to match them. You're getting a syntax error because you didn't use the awk version I said was required. – Ed Morton Mar 03 '23 at 01:02
  • Even with GNU awk, your solution does not work for all the addresses in the addresses.csv here. – R 9000 Mar 03 '23 at 01:36
  • As I said in my answer - I wrote a couple of the regexps for you, you'll have to write the rest. Just take the regexp you have here and use it in match() as I show in my answer if you like and you think that regexp you wrote is adequate for the set of addresses you want to match. – Ed Morton Mar 03 '23 at 01:44

1 Answer


My solution has two steps:

  1. Check whether the address format is valid:
# Keep the regex in a variable: no backslash escapes are then needed inside the bracket expression.
re="^[[:alpha:][:space:].'&-]+,?[[:space:]]?[0-9]{1,4}[[:space:]]?[a-zA-Z]?[[:space:]]?-?[[:space:]]?[0-9]{0,4}[[:space:]]?[a-zA-Z]?$"
if [[ ${var_street_and_number} =~ ${re} ]]; then
    echo 'Address format is valid :)'
else
    echo 'Address format is invalid!'
fi

The variable var_street_and_number should hold a single line consisting of the street name followed by the street number.

If you have a file with many streets and numbers (one address per line), you can loop over it:

while IFS= read -r line; do
    if [[ ${line} =...
done < addresses.csv
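The loop above is abbreviated. A fuller version might look like the following sketch. It assumes the regex from step 1 is stored in a variable named re, and the function name check_addresses is made up for this illustration. Character classes such as [[:alpha:]] depend on the locale, so letters such as ß are only expected to match in a suitable (for example UTF-8) locale.

```shell
#!/usr/bin/env bash
# Regex from step 1, stored in a variable (assumed name: re).
re="^[[:alpha:][:space:].'&-]+,?[[:space:]]?[0-9]{1,4}[[:space:]]?[a-zA-Z]?[[:space:]]?-?[[:space:]]?[0-9]{0,4}[[:space:]]?[a-zA-Z]?\$"

# check_addresses FILE (hypothetical helper):
# prints "valid: LINE" or "invalid: LINE" for every line of FILE.
check_addresses() {
    local line
    # IFS= and -r keep leading blanks and backslashes intact;
    # the || test also processes a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf 'valid: %s\n' "$line"
        else
            printf 'invalid: %s\n' "$line"
        fi
    done < "$1"
}
```

It would be called as: check_addresses addresses.csv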
  2. If the address format is valid, you can use sed to split each line into name and number:
sed 's/[,]\{0,1\}[[:space:]]\{0,1\}[[:digit:]].*$//' addresses.csv > output-names.csv
sed 's/^[^[:digit:]]*//' addresses.csv > output-numbers.csv
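As an alternative to running sed after the check, validation and splitting can be combined in one pass by putting parentheses around the name and number parts of the regex and reading the captures from BASH_REMATCH. The following is only a sketch: the function name split_file is made up, and the regex is a slight variant of the one in step 1 in which the street name must end in a non-space character, so the captured name carries no trailing blank (as a consequence, two blanks before the number are not accepted).

```shell
#!/usr/bin/env bash
# Group 1 = street name, group 2 = street number (or number range).
re="^([[:alpha:][:space:].'&-]*[[:alpha:].'&-]),?[[:space:]]?([0-9]{1,4}[[:space:]]?[a-zA-Z]?[[:space:]]?-?[[:space:]]?[0-9]{0,4}[[:space:]]?[a-zA-Z]?)\$"

# split_file INPUT NAMES NUMBERS (hypothetical helper):
# reads INPUT once and writes names and numbers to the two output files.
split_file() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}" >&3   # street name
            printf '%s\n' "${BASH_REMATCH[2]}" >&4   # street number
        else
            # Lines that do not match are reported and left out of both files.
            printf 'skipped: %s\n' "$line" >&2
        fi
    done < "$1" 3> "$2" 4> "$3"
}
```

It would be called as: split_file addresses.csv output-names.csv output-numbers.csv

Because non-matching lines are skipped in both files, the two outputs stay aligned line by line.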

R 9000
  • The number is not correct if the street name has a number in its name like "Straße des 17. Juni 135" :) – Freddy Mar 02 '23 at 23:40
  • I know. There are also streets like "Straße 5 Nr. 6" or "F5 12" in 68159 Mannheim, but my answer covers only the address formats that I listed above (not all address formats in Germany). – R 9000 Mar 02 '23 at 23:54
  • This is just a slower, buggier, more fragile, less flexible, less maintainable implementation of the answer I gave you at https://unix.stackexchange.com/a/738460/133219. If you only want to test for 1 regexp, then just provide that 1 regexp within the awk script I showed you, and no matter how many regexps you have, don't additionally use sed to try to carve out parts of them. – Ed Morton Mar 03 '23 at 00:06
  • This solution works for me (and I am using it now). Your answer does not work for me. So this is clearly the better solution. – R 9000 Mar 03 '23 at 00:47
  • Your solution will take orders of magnitude longer to run than the one I provided, requires more code and more complicated code to write (once you show the full script including the read loop), and will fail in various ways given various inputs so - no, it is not clearly the better solution. My answer would work for you if you use GNU awk as I said is required and write/use a regexp that matches the input you show in this question as I also said is required (use the regexp you have in this answer if you're happy with it - just put parens around the name and number parts you want to isolate). – Ed Morton Mar 03 '23 at 14:10