
I have a set of URLs and I'm really only interested in anything up to the first /.

How can I capture this info to a text file?

Input (foo.txt):

apple.com/nothing.js  
t1.msn.com/cookie=22  
happy.net/whatever

Output (redirected to file: foo_filter.txt):

apple.com/  
t1.msn.com/  
happy.net/  
αғsнιη
Scott
  • Do you have URLs such as unix.stackexchange.com, with nothing like /questions after the host? If so, what output do you expect for those lines: skip them or print the URL? – αғsнιη Feb 10 '23 at 04:02
  • And what about URLs that include a scheme, like https://unix.stackexchange.com/help/someone-answers? – αғsнιη Feb 10 '23 at 04:13
  • @αғsнιη I don't think this is a duplicate. Here, the obvious pattern match is to be included in the result, which can be done trivially because it's a single character. In the other answer the obvious pattern match is to be excluded from the result. Not the same at all – Chris Davies Feb 10 '23 at 07:22
  • @roaima these are the same. There, and also here, the OP wants to cut each line off up to a specific pattern (here up to the first /, there up to .com), and in both cases they want that string included in the result. So I have no doubt that these are exact duplicates. – αғsнιη Feb 10 '23 at 07:31
  • @αғsнιη no, here they want the split character, there they do not. – Chris Davies Feb 10 '23 at 07:44
  • @roaima there they also wanted the split string (which is .com). Please read the question there carefully and you will see it. You can also compare your answer with the answers there: both the Q and the A are duplicates, except that you have an extra cut approach, which is not what the OP wanted here anyway, although your later approaches (which print the split character) are. – αғsнιη Feb 10 '23 at 07:52
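For URLs with a scheme such as https://, which the comments ask about, here is one possible awk sketch. It assumes you want to keep the scheme and cut at the first slash after the host, and that lines with no such slash are printed unchanged; the question doesn't specify either behavior, and the sample lines are made up for illustration:

```shell
# Hypothetical sample: a URL with a scheme, one without, and one with no slash
out=$(printf '%s\n' \
  'https://unix.stackexchange.com/help/someone-answers' \
  'apple.com/nothing.js' \
  'unix.stackexchange.com' |
  awk '{
    scheme = ""
    # match() sets RLENGTH to the length of a leading "scheme://", if any
    if (match($0, /^[A-Za-z][A-Za-z0-9+.-]*:\/\//)) {
      scheme = substr($0, 1, RLENGTH)
      $0 = substr($0, RLENGTH + 1)
    }
    n = index($0, "/")            # position of the first slash, 0 if none
    if (n) $0 = substr($0, 1, n)  # keep everything up to and including it
    print scheme $0
  }')

printf '%s\n' "$out"
```

This prints https://unix.stackexchange.com/, apple.com/ and unix.stackexchange.com. Doing it in two explicit steps (strip the scheme, then cut) avoids a single regex that could mistake the "//" of the scheme for the path separator.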

6 Answers


If you don't want the trailing slash, it's very straightforward:

cut -d/ -f1 foo.txt
awk -F/ '{print $1}' foo.txt
sed 's!/.*!!' foo.txt
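A quick sketch to confirm that the three commands agree, feeding the question's sample lines through printf instead of reading foo.txt:

```shell
# Sample lines from the question, fed on stdin instead of reading foo.txt
input='apple.com/nothing.js
t1.msn.com/cookie=22
happy.net/whatever'

# Each command keeps only the text before the first slash
a=$(printf '%s\n' "$input" | cut -d/ -f1)
b=$(printf '%s\n' "$input" | awk -F/ '{print $1}')
c=$(printf '%s\n' "$input" | sed 's!/.*!!')

# Print the result once, only if all three outputs are identical
[ "$a" = "$b" ] && [ "$b" = "$c" ] && printf '%s\n' "$a"
```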

If you do want the trailing slash:

awk -F/ '{print $1 "/"}' foo.txt
sed 's!/.*!/!' foo.txt
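These two differ on a line that contains no slash at all, such as unix.stackexchange.com from the comments. A small sketch showing the difference:

```shell
# A line with no slash at all, the case raised in the comments
line='unix.stackexchange.com'

# awk always prints field 1 followed by "/", so it adds a slash that was not there
awk_out=$(printf '%s\n' "$line" | awk -F/ '{print $1 "/"}')

# sed only substitutes when the line contains a slash, so this line passes through unchanged
sed_out=$(printf '%s\n' "$line" | sed 's!/.*!/!')

printf 'awk: %s\nsed: %s\n' "$awk_out" "$sed_out"
```

Which behavior is right depends on what you want for such lines; the question doesn't say.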

All of these write to stdout (your screen), so you can see the result immediately. To redirect the output to your target file, append >foo_filter.txt to the command. For example:

awk -F/ '{print $1 "/"}' foo.txt >foo_filter.txt
Chris Davies

Using Miller:

mlr --nidx --ifs '/' -N cut -f 1 file

or using GNU datamash:

datamash dirname 1 <file
αғsнιη
$ awk 'sub("/.*","/")' foo.txt
apple.com/
t1.msn.com/
happy.net/
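How this one-liner works, sketched with a made-up input that also includes a line without a slash:

```shell
# sub() replaces "/" and everything after it with a single "/" and returns
# how many substitutions it made (0 or 1). Used as the whole awk program,
# that return value acts as a condition: non-zero prints the edited line,
# zero (a line with no slash) prints nothing.
out=$(printf '%s\n' apple.com/nothing.js unix.stackexchange.com happy.net/whatever |
  awk 'sub("/.*", "/")')

printf '%s\n' "$out"
```

So, unlike the print-based awk commands, this version silently drops lines that contain no slash.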
Ed Morton

sed is a good fit here, because it edits the stream line by line.

Try the following:

sed 's/\/.*//' foo.txt > foo_filter.txt

This tells sed to replace, on each line, the first / and everything after it with nothing. Note that this removes the slash itself as well. The > then redirects the output to the new file. You can read more in the sed manual.

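Note: the trailing 1 below explicitly tells sed to replace only the first match. That is already the default, and since the leftmost match of /.* always starts at the first slash, the greediness of .* doesn't change the result; both commands produce the same output: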

sed 's/\/.*//1' foo.txt > foo_filter.txt
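A quick sketch to compare the two commands on a line with several slashes (the sample line is made up for illustration):

```shell
# A line with several slashes, to check whether the trailing 1 flag matters
input='apple.com/a/b/c'

plain=$(printf '%s\n' "$input" | sed 's/\/.*//')
first=$(printf '%s\n' "$input" | sed 's/\/.*//1')

# Both give "apple.com": the leftmost match of /.* already starts at the first slash
printf '%s\n%s\n' "$plain" "$first"
```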

You can also use awk, which handles lines with multiple slashes just as well:

awk -F"/" '{print $1"/"}' foo.txt > foo_filter.txt

The -F"/" sets the field delimiter to a forward slash, and '{print $1"/"}' prints the first field followed by a slash. Because the delimiter separates fields and is not part of any of them, it has to be added back explicitly.
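To see that field splitting on a line with several slashes (a made-up sample line):

```shell
# -F"/" splits on every slash; the slashes are separators, not part of any field
line='apple.com/a/b/c'
fields=$(printf '%s\n' "$line" | awk -F"/" '{ print NF }')
first=$(printf '%s\n' "$line" | awk -F"/" '{ print $1"/" }')

# fields is 4 (apple.com, a, b, c); first is apple.com/
printf 'fields: %s\nresult: %s\n' "$fields" "$first"
```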

αғsнιη
Yehuda
  • print $1/ in awk would be a syntax error. Don't escape the / in your sed command, just pick a different delimiter like :. – Ed Morton Feb 09 '23 at 23:28

Here's an awk and a cut solution.

$ cut -f1 -d/ foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1 }' foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1"/" }' foo.txt
apple.com/
t1.msn.com/
happy.net/
$
steve

With just grep:

$ grep -oE '^[^/]+/' foo.txt

Output:

apple.com/
t1.msn.com/
happy.net/

To also write the result to foo_filter.txt (tee sends it both to the file and to the screen):

grep -oE '^[^/]+/' foo.txt | tee foo_filter.txt
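A small sketch of what the pattern matches, using a hypothetical sample.txt that also contains a line with no slash (the case raised in the question's comments):

```shell
# Hypothetical sample file: two URLs with a path and one without any slash
printf '%s\n' apple.com/nothing.js unix.stackexchange.com happy.net/whatever > sample.txt

# ^[^/]+/ means: from the start of the line, one or more non-slash characters, then a slash.
# -o prints only the matched text, so a line with no slash produces no output.
# tee copies the result into sample_filter.txt and also passes it to stdout.
out=$(grep -oE '^[^/]+/' sample.txt | tee sample_filter.txt)

printf '%s\n' "$out"
```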
  • Thanks, time to learn some switches, apparently! – Scott Feb 09 '23 at 20:23
  • Advice to newcomers: If an answer solves your problem, please accept it by clicking the large check mark (✓) next to it and optionally also up-vote it (up-voting requires at least 15 reputation points). If you found other answers helpful, please up-vote them. Accepting and up-voting helps future readers. – Gilles Quénot Feb 09 '23 at 20:40