Output everything before the first slash '/' in a line

Question

I have a set of URLs, and I'm only interested in the part up to and including the first /.

How can I write that part of each line to a text file?

Input (foo.txt):

apple.com/nothing.js  
t1.msn.com/cookie=22  
happy.net/whatever

Output (redirected to file: foo_filter.txt):

apple.com/  
t1.msn.com/  
happy.net/

Do you have URLs such as unix.stackexchange.com, with no path, rather than something like unix.stackexchange.com/questions? If so, what output do you expect: skip those lines or print the URL? – αғsнιη, Feb 10 '23 at 04:02
And what about URLs that include https://, like https://unix.stackexchange.com/help/someone-answers? – αғsнιη, Feb 10 '23 at 04:13
@αғsнιη I don't think this is a duplicate. Here, the obvious pattern match is to be included in the result, which is trivial because it's a single character. In the other answer, the obvious pattern match is to be excluded from the result. Not the same at all. – Chris Davies, Feb 10 '23 at 07:22
@roaima These are the same. Both there and here, the OPs want to cut everything off after a specific pattern (here the first /, there .com), and in both cases they want that pattern included in the result. So I have no doubt these are exact duplicates. – αғsнιη, Feb 10 '23 at 07:31
@αғsнιη No, here they want the split character; there they do not. – Chris Davies, Feb 10 '23 at 07:44
@roaima There they also wanted the split string (.com). Please read the question there carefully and you will see it. You can also compare your answer with the answers there. Both the question and the answers are duplicates, except that you have an extra cut approach, which is not even what the OP wanted here; your later approaches (printing the split character) are. – αғsнιη, Feb 10 '23 at 07:52

Chris Davies · Answer 1 · 2023-02-10T07:17:44.227

If you don't want the trailing slash, it's very straightforward:

cut -d/ -f1 foo.txt
awk -F/ '{print $1}' foo.txt
sed 's!/.*!!' foo.txt

If you do want that trailing slash, use:

awk -F/ '{print $1 "/"}' foo.txt
sed 's!/.*!/!' foo.txt

All of these write to stdout (your screen), so you can see the result immediately. To send the output to your target file instead, append >foo_filter.txt to the command. For example:

awk -F/ '{print $1 "/"}' foo.txt >foo_filter.txt
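As a quick sanity check, the sketch below recreates the sample foo.txt from the question in a temporary directory (so no real file is overwritten) and uses diff to confirm that the cut, awk and sed variants print the same lines. The temporary directory and the intermediate .out file names are illustrative only.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so an existing foo.txt is not touched.
tmp=$(mktemp -d)
cd "$tmp"

# Sample input from the question.
printf '%s\n' apple.com/nothing.js t1.msn.com/cookie=22 happy.net/whatever > foo.txt

# Without the trailing slash: all three commands should print the same lines.
cut -d/ -f1 foo.txt > cut.out
awk -F/ '{print $1}' foo.txt > awk.out
sed 's!/.*!!' foo.txt > sed.out
diff cut.out awk.out    # diff exits non-zero (and set -e stops) on any mismatch
diff cut.out sed.out

# With the trailing slash, redirected to the target file.
awk -F/ '{print $1 "/"}' foo.txt > foo_filter.txt
sed 's!/.*!/!' foo.txt | diff foo_filter.txt -

cat foo_filter.txt
```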

score 1 · Answer 2 · edited Feb 10 '23 at 03:59

Using Miller:

mlr --nidx --ifs '/' -N cut -f 1 file

or using GNU datamash:

datamash dirname 1 <file
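One caveat: dirname removes everything from the last slash onward, not from the first, so it only produces the requested result when each line contains exactly one /. The sketch below shows this with the dirname utility from coreutils, on the assumption that datamash's dirname operation follows the same rule; example.com/a/b.js is a made-up input with two slashes.

```shell
#!/bin/sh
# dirname drops the last path component, i.e. everything from the LAST slash.
dirname apple.com/nothing.js    # apple.com
dirname example.com/a/b.js      # example.com/a   (not example.com)

# Splitting at the FIRST slash instead, for comparison.
echo example.com/a/b.js | cut -d/ -f1   # example.com
```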

edited Feb 10 '23 at 03:59 by αғsнιη
answered Feb 09 '23 at 21:57 by Prabhjot Singh

score 1 · Answer 3 · answered Feb 09 '23 at 23:27

$ awk 'sub("/.*","/")' foo.txt
apple.com/
t1.msn.com/
happy.net/
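This works because sub() returns the number of substitutions it made, so it can serve as the pattern itself: lines containing a / are rewritten and printed, and lines without one are silently skipped. If such lines should be printed unchanged instead, a minimal variant calls sub() as an action and prints every line with the always-true pattern 1. The input line example.com below is hypothetical, added only to show the difference.

```shell
#!/bin/sh
# sub() as the pattern: only lines where a substitution happened are printed.
printf '%s\n' apple.com/nothing.js example.com happy.net/whatever |
  awk 'sub("/.*","/")'
# apple.com/
# happy.net/

# sub() as an action, then pattern 1 prints every line, changed or not.
printf '%s\n' apple.com/nothing.js example.com happy.net/whatever |
  awk '{ sub("/.*","/") } 1'
# apple.com/
# example.com
# happy.net/
```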

answered Feb 09 '23 at 23:27 by Ed Morton

score 0 · Answer 4 · edited Feb 10 '23 at 04:11

sed works well here, because it edits the stream line by line.

Try the following:

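sed 's/\/.*/\//' foo.txt > foo_filter.txt

This tells sed to replace, on each line, the first / and everything after it with a single /, so the slash is kept as in the desired output. You then redirect the output to the new file with >. You can read more in the sed manual.

Note: the match starts at the leftmost /, and without the g flag sed replaces only the first match, so the command already stops at the first slash. Adding a 1 at the end of the command makes this explicit but does not change the result:

sed 's/\/.*/\//1' foo.txt > foo_filter.txt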
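To see what a numeric flag on the s command does, the sketch below tries several flags on a hypothetical line with three slashes. It uses : as the delimiter (as the comment below suggests) so the / needs no escaping. Because a match always begins at the leftmost position where it can, /.* starts at the first slash whether or not 1 is given; the number only matters when the regex can match more than once.

```shell
#!/bin/sh
line='example.com/a/b/c'

# Default (first match): the leftmost / starts the match and .* runs to the end.
echo "$line" | sed 's:/.*:/:'     # example.com/
# An explicit 1 also selects the first match, so the result is identical.
echo "$line" | sed 's:/.*:/:1'    # example.com/

# With a regex that matches a single slash, the flag picks which one changes.
echo "$line" | sed 's:/:|:1'      # example.com|a/b/c
echo "$line" | sed 's:/:|:2'      # example.com/a|b/c
echo "$line" | sed 's:/:|:g'      # example.com|a|b|c
```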

You can also use awk, which handles lines with multiple slashes just as well:

awk -F"/" '{print $1"/"}' foo.txt > foo_filter.txt

The -F"/" option sets the field separator to a forward slash, and '{print $1"/"}' prints the first field followed by a slash. Because / is the field separator, it is not part of $1, so it has to be added back explicitly.
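Because -F/ also stores the separator in the FS variable, the slash can be written back as $1 FS instead of a literal "/". Either way, a slash is appended even to lines that contain no / at all, unlike the sub() and grep -o answers, which skip such lines. The sketch prints NF (the number of fields) to make the split visible; example.com is a hypothetical slash-free line.

```shell
#!/bin/sh
# NF is the field count; $1 never includes the separator, so FS re-adds it.
printf '%s\n' t1.msn.com/cookie=22 example.com |
  awk -F/ '{ print NF, $1 FS }'
# 2 t1.msn.com/
# 1 example.com/
```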

print $1/ in awk would be a syntax error. Don't escape the / in your sed command; just pick a different delimiter, like :. – Ed Morton, Feb 09 '23 at 23:28

score 0 · Answer 5 · answered Feb 09 '23 at 19:18

Here's an awk and a cut solution.

$ cut -f1 -d/ foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1 }' foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1"/" }' foo.txt
apple.com/
t1.msn.com/
happy.net/
$

answered Feb 09 '23 at 19:18 by steve

Gilles Quénot · Answer 6 · 2023-02-09T19:26:19.397

With just grep:

$ grep -oE '^[^/]+/' foo.txt

Output:

apple.com/
t1.msn.com/
happy.net/

To also save the result to foo_filter.txt (tee writes it to the file and echoes it to the screen):

grep -oE '^[^/]+/' foo.txt | tee foo_filter.txt
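The + in the pattern requires at least one non-slash character before the first /, so lines with no slash, or lines that start with one, produce no output. The sketch below uses made-up lines to show both cases.

```shell
#!/bin/sh
# -o prints only the matched part; lines with no match print nothing.
printf '%s\n' apple.com/nothing.js example.com /local/path happy.net/x |
  grep -oE '^[^/]+/'
# apple.com/
# happy.net/
```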

edited Feb 09 '23 at 19:26
answered Feb 09 '23 at 19:19 by Gilles Quénot

Thanks, time to learn some switches, apparently! – Scott Feb 09 '23 at 20:23
Advice to newcomers: If an answer solves your problem, please accept it by clicking the large check mark (✓) next to it and optionally also up-vote it (up-voting requires at least 15 reputation points). If you found other answers helpful, please up-vote them. Accepting and up-voting helps future readers. – Gilles Quénot Feb 09 '23 at 20:40
