0

I need to remove the http:// and https:// prefixes from every line of a text file.

Like this:

http://ac.tecnicasdeinvasao.com
http://go.tecnicasdeinvasao.com
http://lp.tecnicasdeinvasao.com
https://ac.tecnicasdeinvasao.com
http://secreto.tecnicasdeinvasao.com
https://go.tecnicasdeinvasao.com
https://lp.tecnicasdeinvasao.com
https://secreto.tecnicasdeinvasao.com

To

ac.tecnicasdeinvasao.com
go.tecnicasdeinvasao.com
lp.tecnicasdeinvasao.com
ac.tecnicasdeinvasao.com
secreto.tecnicasdeinvasao.com
go.tecnicasdeinvasao.com
lp.tecnicasdeinvasao.com
secreto.tecnicasdeinvasao.com

I tried using sed, but without success.

3 Answers

2

I prefer awk to sed, so here's what I'd do:

awk allows you to define custom field separators, which makes your problem fairly straightforward:

Assuming the file containing the full URLs is tstfile.txt, declare the field separator (-F) as //, and then print the second field ($2):

$ awk -F'//' '{print $2}' tstfile.txt
ac.tecnicasdeinvasao.com
go.tecnicasdeinvasao.com
lp.tecnicasdeinvasao.com
ac.tecnicasdeinvasao.com
secreto.tecnicasdeinvasao.com
go.tecnicasdeinvasao.com
lp.tecnicasdeinvasao.com
secreto.tecnicasdeinvasao.com

If you want the results in a file, append a redirection such as > somefile.txt to the command.

And if your distro uses the GNU version of awk, a.k.a. gawk (version 4.1.0 or later), you have the option of updating your input file in-place:

$ awk -i inplace -F'//' '{print $2}' tstfile.txt
$

Now, the contents of tstfile.txt will be exactly as shown above, which may save you a step or two, depending on your end objective. Other options, including saving the original file under a different file name, are covered in this answer.
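If your awk has no inplace extension, one common workaround is to keep a backup copy and redirect the output back over the original name. This is only a minimal sketch: the scratch directory, the .bak suffix and the two sample lines are assumptions made for illustration.

```shell
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Small sample file (two lines borrowed from the question).
printf '%s\n' 'http://ac.tecnicasdeinvasao.com' \
              'https://go.tecnicasdeinvasao.com' > tstfile.txt

# Keep the original under a different name, then rebuild tstfile.txt from it.
cp tstfile.txt tstfile.txt.bak &&
  awk -F'//' '{print $2}' tstfile.txt.bak > tstfile.txt

cat tstfile.txt
```

Reading from the backup and writing to the original name avoids truncating the file awk is still reading.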

Seamus
  • 2,925
  • That worked! What does {print $2} mean? Why did this work? – Lucas Fernandes Nov 17 '21 at 01:57
  • 2
    @Chead: Generally speaking, awk treats each line as a record, and each record has one or more fields. A field separator tells awk where the field boundaries are. In this case, because we declared // as the field separator, the characters to the left of the separator are the first field and the characters to the right are the second field. As awk reads each line/record, the fields are stored in $1 (first field), $2 (second field), etc. This worked because we declared // as the field separator and printed the second field: $2. Lots of good tutorials on awk online. – Seamus Nov 17 '21 at 02:59
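To see the field splitting described in that comment, you can label the fields for a single line. A small sketch using one URL from the question; the variable name out and the labels are only illustrative.

```shell
# Split one sample line on "//" and show the field count and both fields.
out=$(printf '%s\n' 'https://go.tecnicasdeinvasao.com' |
  awk -F'//' '{ print "fields: " NF; print "$1 = " $1; print "$2 = " $2 }')

printf '%s\n' "$out"
# fields: 2
# $1 = https:
# $2 = go.tecnicasdeinvasao.com
```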
2
Because you asked about sed:

With sed, easier to read:

sed -E 's/http.+[/]//'

With sed easier to write:

sed -E 's/http.+\///'

Both do the same thing. sed's s command uses the format s/pattern/replacement/, so a literal / in the pattern has to be protected, which is why the version with the \ backslash is harder to read.

  • -E enables extended regular expressions (the "newer" syntax), in which + works without a backslash
  • s is the substitute command: it finds a pattern and replaces it
  • .+ has two parts: the . dot matches any single character, and the + sign means "one or more of the preceding item". Together, .+ matches as many characters as possible, here everything between http and the last / on the line
  • the / is not special to the regular expression itself, but it is the delimiter of the s command, so a literal / in the pattern must be protected, usually with a \ backslash. Here you want to match everything up to the last /, so you write \/, which is hard to read as \/// (the trailing // comes from the s/pattern/replacement/ format). Alternatively, put the slash in brackets: [/]. And because you want to remove what was matched, the replacement is left empty --> //
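Put together, the pieces above behave like this on two lines from the question. This is a sketch that assumes a sed supporting -E (GNU and BSD sed both do); the variable names urls and stripped are illustrative.

```shell
# Sample input taken from the question.
urls='http://ac.tecnicasdeinvasao.com
https://go.tecnicasdeinvasao.com'

# "http" matches literally, the greedy ".+" matches ":/" (or "s:/"), and
# "[/]" matches the last slash on the line. The empty replacement deletes
# the whole match.
stripped=$(printf '%s\n' "$urls" | sed -E 's/http.+[/]//')

printf '%s\n' "$stripped"
# ac.tecnicasdeinvasao.com
# go.tecnicasdeinvasao.com
```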

And because every line starts with http, you can drop it from the pattern and make the command even shorter:

sed -E 's/.+\///'  
sed -E 's/.+[/]//'

This means: select everything from the beginning of each line up to the last / (whateverGoesHere/) and replace it with nothing.
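One thing to keep in mind: because .+ is greedy and runs to the last /, a URL that also has a path would lose its host name. The sketch below uses a hypothetical URL with a path (not part of the question's data) and compares the short pattern with an anchored alternative that removes only a leading http:// or https://, written with | as the delimiter so the slashes need no escaping.

```shell
# Hypothetical URL with a path; the question's data has no paths.
url='https://go.tecnicasdeinvasao.com/some/page'

# Greedy ".+" runs to the LAST slash, so the host name is removed as well.
greedy=$(printf '%s\n' "$url" | sed -E 's/.+[/]//')

# Anchored alternative: strip only a leading "http://" or "https://".
anchored=$(printf '%s\n' "$url" | sed -E 's|^https?://||')

printf 'greedy:   %s\nanchored: %s\n' "$greedy" "$anchored"
# greedy:   page
# anchored: go.tecnicasdeinvasao.com/some/page
```

For the host-only lines in the question, both commands give the same result.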

AlexPixel
  • 290
  • Note that in cases where the search pattern or replacement string contain /, it is easiest to instruct sed to use a different command argument separator, e.g. sed -E 's|http.+/||' or sed -E 's,http.+/,,' – AdminBee Nov 18 '21 at 08:58
  • 1
    man you were so good to explain sed! now i understood! thanks so much :) – Lucas Fernandes Nov 18 '21 at 14:28
1

Try this:

awk '{sub(/https?:\/\//,"");print}' file.txt > outfile.txt

awk loops through the input lines, running the program on each one. I didn't specify a pattern in front of the curly braces, so the code inside them is executed on every line. The sub function finds the first match of the regular expression between the slashes and replaces it with the quoted empty string. The question mark in the regular expression makes the "s" optional.
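A side effect of using sub is that lines without a scheme pass through unchanged, whereas splitting on // and printing $2 prints an empty line for them. A short sketch comparing the two; the scheme-less middle line is a hypothetical addition to the question's data, and the variable names are illustrative.

```shell
# Two lines from the question plus one hypothetical line without a scheme.
input='http://ac.tecnicasdeinvasao.com
lp.tecnicasdeinvasao.com
https://go.tecnicasdeinvasao.com'

# sub() changes the line only when the pattern matches; print emits it either way.
with_sub=$(printf '%s\n' "$input" | awk '{ sub(/https?:\/\//, ""); print }')

# Splitting on "//" leaves $2 empty when the line contains no "//".
with_fs=$(printf '%s\n' "$input" | awk -F'//' '{ print $2 }')

printf 'sub():\n%s\n\n-F:\n%s\n' "$with_sub" "$with_fs"
```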