25

I have to take a list (loads) of IP addresses in this format:

 134.27.128.0
 111.245.48.0
 109.21.244.0

and turn them into this format with a pipe in-between (IPs made up)

134.27.128.0 | 111.245.48.0 | 109.21.244.0 | 103.22.200.0/22

I think it is a find and replace command like sed but I can't get it to work.

jasonwryan
  • 3
    You just want to translate newlines into | pipes? Like <ipfile tr \\n \| >outfile? – mikeserv Apr 01 '15 at 17:26
  • Is the space around | required? – cuonglm Apr 01 '15 at 17:28
  • Yeah, the space in between the pipes is required. Please remember I am clueless, so let's say my doc is in gedit format and is called mydoc and I want to put it into a file called mydoc2. Would I then use the command as above as: mydoc tr \n | >mydoc2 ??? – uselesslinuxman Apr 01 '15 at 17:32
  • 2
    @uselesslinuxman - no. You'd need the input redirect <. So <mydoc tr \\n \| >mydoc2. But that won't get you the spaces. For those, probably the quickest solution is paste -d' | ' mydoc /dev/null /dev/null >mydoc2 – mikeserv Apr 01 '15 at 17:55
  • @mikeserv: your paste won't work in this case. – cuonglm Apr 01 '15 at 18:15
  • @cuonglm - oh man. What'd I get wrong? – mikeserv Apr 01 '15 at 18:17
  • @mikeserv: You will get separated lines instead of all in one line. – cuonglm Apr 01 '15 at 18:19
  • 1
    @mikeserv: I don't think it will work. paste writes lines corresponding from each file. Without -s, you will get back number of lines you have in file. – cuonglm Apr 01 '15 at 18:27
  • If you do not need the space around |, this column ip | tr -s '\t' '|' produces 134.27.128.0|111.245.48.0|109.21.244.0. – Ivan Chau Apr 02 '15 at 07:50
  • Using your shell only: while read -r ip; do printf '%s | ' "$ip"; done < file – Valentin Bajrami Apr 02 '15 at 10:43
  • 2
    @val0x00ff: I invite you to read http://unix.stackexchange.com/q/169716/38906 – cuonglm Apr 02 '15 at 18:20

9 Answers

22

Using sed, based on Famous Sed One-Liners Explained, Part I: 39. Append a line to the next if it ends with a backslash "\" (except that here we ignore the part about the backslash and replace the \n newlines with the required | separator):

sed -e :a -e '$!N; s/\n/ | /; ta' mydoc > mydoc2

should produce in mydoc2

134.27.128.0 |  111.245.48.0 |  109.21.244.0
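As a rough check of the loop above, the sketch below runs it on a three-line sample (a hypothetical temporary file, not the asker's mydoc) and compares it with the hold-space variant that jthill suggests in the comments on this answer. It assumes the input lines carry no leading whitespace, in which case the separators come out with single spaces.

```shell
# Hypothetical sample input: three made-up addresses, one per line.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# Loop form: N appends the next line, s replaces the embedded newline,
# and ta branches back to :a for as long as a substitution succeeded.
joined_loop=$(sed -e :a -e '$!N; s/\n/ | /; ta' "$tmp")

# Hold-space form: collect every line in the hold space, then do one
# global substitution once the last line has been read.
joined_hold=$(sed 'H;1h;$!d;x;s/\n/ | /g' "$tmp")

printf '%s\n' "$joined_loop"
printf '%s\n' "$joined_hold"

rm -f "$tmp"
```

Both forms keep the whole result in sed's buffers, so mikeserv's caveat about very large inputs applies to either one.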
steeldriver
  • @don_crissti sorry that was a typo - corrected, thanks – steeldriver Apr 01 '15 at 18:00
  • This doesn't actually work in practice, unfortunately. At least, not for unlimited streams. When you do this you have to swallow the whole of your input a line at a time and cannot write even a single byte of it to output until you have digested it all - all of it transformed into a single line. It's unwieldy and prone to segfault. – mikeserv Apr 01 '15 at 18:20
  • A million IPs is <16M, you'd need an awfully big list to blow limits here. Using search for eof detection is more problematic: as is, this'll run O(N^2) on the input file size. sed 'H;1h;$!d;x;s/\n/ | /g' is linear. – jthill Apr 01 '15 at 21:04
  • @jthill - POSIX only guarantees a sed pattern space of 8K; that's a whole lot less than 16M. – mikeserv Apr 01 '15 at 23:54
14

I was curious to see how some of these (+ some alternatives) work speed-wise with a rather large file (163MiB, one IP per line, ~ 13 million lines):

wc -l < iplist
13144256

Results (with sync; echo 3 > /proc/sys/vm/drop_caches after each command; I repeated the tests - in reverse order - after a couple of hours but the differences were negligible; also note that I am using gnu sed):

steeldriver:
Very slow. Aborted after two minutes of waiting... so no result for this one.

cuonglm:

awk 'FNR!=1{print l}{l=$0};END{ORS="";print l}' ORS=' | ' iplist

real    0m3.672s

perl -pe 's/\n/ | / unless eof' iplist

real    0m12.444s

mikeserv:

paste -d\  /dev/null iplist /dev/null | paste -sd\| - 

real    0m0.983s

jthill:

sed 'H;1h;$!d;x;s/\n/ | /g' iplist

real    0m4.903s

Avinash Raj:

time python2.7 -c'
import sys
with open(sys.argv[1]) as f:
    print " | ".join(line.strip() for line in f)' iplist

real    0m3.434s

and

val0x00ff:

while read -r ip; do printf '%s | ' "$ip"; done < iplist

real    3m4.321s

which means 184.321s. Unsurprisingly, this is roughly 190 times slower than mikeserv's solution.
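The ratio behind that comparison can be recomputed from the two timings listed above (3m4.321s for the shell loop, 0.983s for the paste pipeline). This is only a small arithmetic sketch; the variable name is made up for illustration.

```shell
# Convert 3m4.321s to seconds and divide by the paste timing (0.983s).
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
ratio=$(LC_ALL=C awk 'BEGIN { printf "%.1f", (3 * 60 + 4.321) / 0.983 }')
printf 'shell loop / paste = %s\n' "$ratio"
```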


Here are some other ways:

awk:

awk '$1=$1' RS= OFS=' | ' iplist

real    0m4.543s

awk '{printf "%s%s",sep,$0,sep=" | "} END {print ""}' iplist

real    0m5.511s

perl:

perl -ple '$\=eof()?"\n":" | "' iplist

real    0m9.646s

xargs:

xargs <iplist printf ' | %s' | cut -c4-

real    0m6.326s

a combination of head+paste+tr+cat:

{ head -n -1 | paste -d' |' - /dev/null /dev/null | tr \\n \ ; cat ; } <iplist

real    0m0.991s

If you have GNU coreutils and if your list of IPs isn't really huge (let's say up to 50000 IPs) you could also do this with pr:

pr -$(wc -l <infile) -tJS' | ' -W1000000 infile >outfile

where

-$(wc -l <infile)        # no. of columns (= no. of lines in your file)
-t                       # omit page headers and trailers
-J                       # merge lines
-S' | '                  # separate columns by STRING
-W1000000                # set page width

e.g. for a 6-line file:

134.28.128.0
111.245.28.0
109.245.24.0
128.27.88.0
122.245.48.0
103.44.204.0

the command:

pr -$(wc -l <infile) -tJS' | ' -W1000 infile

outputs:

134.28.128.0 | 111.245.28.0 | 109.245.24.0 | 128.27.88.0 | 122.245.48.0 | 103.44.204.0
don_crissti
  • don - could you also add in the suggestion in the question by @val0x00ff for the while ... read loop? I'm curious to see what 163k read() and write() calls translates to in a benchmark. Great answer, by the way. – mikeserv Apr 02 '15 at 17:54
  • 1
    @mikeserv - no problem, I'll do it (it will be really slow though). – don_crissti Apr 02 '15 at 17:58
  • That's a really cool link. I especially like that the author offers a link to a similar 6 year old benchmark there as well. Do you notice that sed seems to have improved its standing in that time (and had probably only a very few changes to its regexp engine) but grep seems to have dramatically fallen behind in its performance (especially for the longer lines)? I wonder if the perl additions to its engine have any bearing on those results... It's also neat that dash isn't abysmal. The bash here would likely be far slower w/ the common IFS= prepended. – mikeserv Apr 02 '15 at 18:18
  • hmm... that link is yet another strong indicator that I really need to buckle down and learn C so I can finally start using lex properly. – mikeserv Apr 02 '15 at 18:25
8

You can use awk:

awk 'FNR!=1{print l}{l=$0};END{ORS="";print l}' ORS=' | ' file > new_file

ORS=' | ' sets the output record separator to ' | ' instead of newline.

or with perl (as written this prints to standard output; add the -i option to edit the file in place):

perl -pe 's/\n/ | / unless eof' file
cuonglm
  • thanks man. I just learned how paste works. much appreciated. – mikeserv Apr 01 '15 at 19:25
  • @mikeserv: You're welcome. As don_crissti showed in his benchmark, the paste solution is the fastest one. – cuonglm Apr 02 '15 at 17:24
  • The output does not end with a newline. You might have to replace ORS="" inside the END block with ORS="\n" so that it does. – phk Jan 17 '17 at 19:55
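Following phk's comment, here is a sketch of the awk command with ORS switched to a newline in the END block, run against a hypothetical three-line sample file. The awk program prints each line one step late, so that the last line can be emitted with a different terminator.

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# Every line except the last is printed followed by ORS (' | ');
# the END block resets ORS so the final line ends with a newline.
joined=$(awk 'FNR!=1{print l}{l=$0};END{ORS="\n";print l}' ORS=' | ' "$tmp")
printf '%s\n' "$joined"

# Inspect the last byte of the raw output; 0a is a newline.
last_byte=$(awk 'FNR!=1{print l}{l=$0};END{ORS="\n";print l}' ORS=' | ' "$tmp" |
    tail -c 1 | od -An -tx1 | tr -d '[:space:]')
printf 'last byte: %s\n' "$last_byte"

rm -f "$tmp"
```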
6

one-liner with tr and sed:

cat file | tr '\n' '|' | sed 's/||$/\n/'
134.27.128.0|111.245.48.0|109.21.244.0
  • Why delete 2 trailing pipes? There will only be 2 at the end if the input ended with a blank line (two newlines). – JigglyNaga Jan 19 '17 at 20:50
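Taking JigglyNaga's point together with the asker's requirement for spaces around the pipe, one possible variant (an assumption on my part, not the answerer's command) strips only the single trailing pipe and then pads the remaining ones. The sample file is hypothetical.

```shell
# Hypothetical sample input ending in exactly one newline.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# tr turns every newline into '|', which leaves one trailing pipe;
# sed removes that one pipe and surrounds the others with spaces.
joined=$(tr '\n' '|' < "$tmp" | sed 's/|$//; s/|/ | /g')
printf '%s\n' "$joined"

rm -f "$tmp"
```

Whether the raw output of this pipeline ends with a newline may depend on the sed implementation, since sed is handed input without a final newline.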
5

So I had the whole thing wrong - and this question has taught me a lot about paste. As cuonglm correctly notes, unless you paste an infile in -serial, you'll always wind up w/ the last \newline from your infile list being appended to the output as it is written. I was mistaken in the belief that paste -s behavior was its default mode - and this is a misconception which, apparently, busybox paste was happy to reinforce. The following command does work as advertised w/ busybox:

paste -d'|  ' - - infile </dev/null >outfile

It does not work according to spec, though. A correctly implemented paste would still append a trailing \newline for each sequence written. Still, that's no big deal after all:

paste -d\  - infile - </dev/null | paste -sd\| - >outfile
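To see what each paste stage contributes, here is a sketch on a hypothetical three-line sample. It uses /dev/null as the two empty operands (the form used in don_crissti's benchmark) rather than stdin. The first paste pads every line with a space on each side, and the second joins the lines serially with a pipe, which leaves one leading and one trailing space that can be trimmed if unwanted.

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# Stage 1: empty column + space + line + space + empty column.
# Stage 2: -s joins all lines into one, separated by '|'.
padded=$(paste -d' ' /dev/null "$tmp" /dev/null | paste -sd'|' -)
printf '[%s]\n' "$padded"

# Optional: trim the single leading and trailing space in the shell.
trimmed=${padded# }
trimmed=${trimmed% }
printf '[%s]\n' "$trimmed"

rm -f "$tmp"
```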
mikeserv
  • @don_crissti - dangit. stupid tablet. I guess the obvious thing to do is two pastes. – mikeserv Apr 01 '15 at 18:57
  • 1
    Well, I had pr in mind but apparently it runs out of steam with huge input files so I couldn't actually test the speed, but with reasonable length files it works OK. Your solution is by far the fastest (no surprise - paste is really fast), see my post. – don_crissti Apr 02 '15 at 17:25
3

Utilize vim:

vim -n -u NONE -c '1,$-1s/\n/ | /g|wq!' data

Explanation:

-n disable swap file

-u NONE is used to skip all initializations.

-c {command} execute commands after file has been read.

1,$-1s/\n/ | /g is s/\n/ | /g (replace newline with space pipe space) applied to the range 1,$-1 (1st line to last line - 1)

wq! force write and quit


Note:

Depending on how big your file really is, this may be a bad idea.

FloHimself
  • 1
    I thank you all, because basically nearly every one of these commands works for what I need to achieve. I know where to come now if (when) I am stuck again. Thanks – uselesslinuxman Apr 01 '15 at 22:42
3

For completeness' sake, here is another awk-based solution; this one does not use ORS as the separator at all (it only sets it to the empty string):

awk 'BEGIN { ORS="" } { print p$0; p=" | " } END { print "\n" }' file > new_file

For an explanation see my post at https://unix.stackexchange.com/a/338121/117599.
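As a brief illustration of how the prefix variable behaves, the sketch below runs the same awk program on a hypothetical three-line file and on a single line. The variable p is empty for the first record and ' | ' afterwards, so a separator is only ever written between records.

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# p is unset (empty) on the first record, then holds the separator.
joined=$(awk 'BEGIN { ORS="" } { print p $0; p=" | " } END { print "\n" }' "$tmp")
printf '%s\n' "$joined"

# With a single input line no separator is emitted at all.
single=$(printf '%s\n' 10.0.0.1 |
    awk 'BEGIN { ORS="" } { print p $0; p=" | " } END { print "\n" }')
printf '%s\n' "$single"

rm -f "$tmp"
```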

phk
2

Through python.

$ python -c '
import sys
with open(sys.argv[1]) as f:
    print " | ".join(line.strip() for line in f)' file

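The indentation before print is important. Note that this uses the Python 2 print statement; under Python 3 the line needs parentheses: print(" | ".join(line.strip() for line in f)).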

Avinash Raj
2

Here is another one using xxd

xxd -c1 -ps data | sed '$!s/0a/207c20/' | xxd -r -ps
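The hex strings in that sed expression are the byte values of the characters involved: 0a is a newline, and 20 7c 20 is space, pipe, space. The sketch below derives them with od and then, only if xxd happens to be installed, runs the pipeline on a hypothetical sample file.

```shell
# Byte values of the characters being swapped.
nl_hex=$(printf '\n' | od -An -tx1 | tr -d '[:space:]')
sep_hex=$(printf ' | ' | od -An -tx1 | tr -d '[:space:]')
printf 'newline=%s separator=%s\n' "$nl_hex" "$sep_hex"

# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 134.27.128.0 111.245.48.0 109.21.244.0 > "$tmp"

# xxd -c1 -ps prints one byte per line as plain hex; sed rewrites every
# newline byte except the last one; xxd -r -ps converts back to bytes.
if command -v xxd >/dev/null 2>&1; then
    joined=$(xxd -c1 -ps "$tmp" | sed '$!s/0a/207c20/' | xxd -r -ps)
    printf '%s\n' "$joined"
fi

rm -f "$tmp"
```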
FloHimself