
I have web server log files that look like this:

2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:07:49:01 +0100] "GET / HTTP/1.1" 200 58266 "-" "curl/7.61.1"
2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:08:49:01 +0100] "GET / HTTP/1.1" 200 58341 "-" "curl/7.61.1"
2001:67c:1220:808::93e5:8ad - - [22/Feb/2019:08:56:10 +0100] "POST /wp-cron.php?doing_wp_cron=1550822170.2184400558471679687500 HTTP/1.1" 200 3279 "https://ios-example.com/wp-cron.php?doing_wp_cron=1550822170.2184400558471679687500" "WordPress/4.9.9; https://ios-example.com"
...

I need to extract dates and times in this format 22/Feb/2019:07:49:01.

This is what I have now (shamelessly copied from this thread: extracting date field from the lines):

file="filename"
while IFS= read -r line
do
    echo "`cut -d '[' -f2 $line | cut -d ' ' -f1`" # echoing now for testing purposes
done <"$file"

And this is the output when I run the script:

cut: '2001:67c:1220:80c:d4:985a:df2c:d717': Adresář nebo soubor neexistuje
cut: '[22/Feb/2019:07:49:01': Adresář nebo soubor neexistuje
cut: +0100]: Adresář nebo soubor neexistuje
cut: '"GET': Adresář nebo soubor neexistuje
cut: /: je adresářem
cut: 'HTTP/1.1"': Adresář nebo soubor neexistuje
cut: 200: Adresář nebo soubor neexistuje
cut: 58266: Adresář nebo soubor neexistuje
cut: '"-"': Adresář nebo soubor neexistuje
cut: '"curl/7.61.1"': Adresář nebo soubor neexistuje
22/Feb/2019:08:49:01
22/Feb/2019:08:56:10
22/Feb/2019:08:56:10
22/Feb/2019:09:24:33
22/Feb/2019:09:24:33
22/Feb/2019:09:43:13
22/Feb/2019:09:43:24
...

"Adresář nebo soubor neexistuje" means "Directory or file does not exist".

For a reason unknown to me, it does not work on the first line of the log file, but works fine with the rest of the file.

Kusalananda
  • 333,661
stitch123
  • 111

2 Answers

1

You made several mistakes:

  • cut takes a file name as its argument, so the words in $line were treated as file names
  • you forgot some double quotes ( " )

So here is your example rewritten with a minimal number of changes, plus two stylistic ones:

  • $( ... ) is used instead of backticks ( ` ); this is more robust and it can be nested.
  • ${VARIABLE_NAME} is used instead of $VARIABLE_NAME; this is more robust.

The new version:

file="filename"
while IFS= read -r line
do
    EXTRACT_DATE=$(echo "$line" | cut -d '[' -f2 | cut -d ' ' -f1)
    echo "${EXTRACT_DATE}"
done <"$file"
EchoMike444
  • 3,165
  • This fixed the issue, thank you very much for a quick answer! – stitch123 Mar 16 '19 at 15:42
  • There is absolutely no difference between $var and ${var}. Both mean exactly the same thing. The important thing is the double quoting of the variable expansion. Your code uses both variations of $var and ${var} for no good reason. The only place where you need ${var} is when the expansion is part of a string and the very next character is a character that is valid in a variable name, as in "${var}x". – Kusalananda Mar 16 '19 at 15:42
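To illustrate the point made in the comment above, here is a minimal sketch of the one case where the braces change the result. The variable names var and varx are made up for the demonstration, and varx is assumed to be unset.

```shell
var='log'

# Plain and braced forms expand to the same value.
same=no
[ "$var" = "${var}" ] && same=yes

# Braces are needed when the next character could be part of a name:
with_braces="${var}x"     # expands var, then appends a literal x
without_braces="$varx"    # expands a different (unset) variable called varx

printf 'same=%s with_braces=%s without_braces=%s\n' \
    "$same" "$with_braces" "$without_braces"
```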
1

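The main issue, which causes the errors, is that you are using the line read into $line as file names for cut to read. Since $line is unquoted, the shell splits it into words and cut tries to open each word as a file. Two of those words are -, which cut takes to mean standard input, so it reads the remainder of the log file from the loop's input; that is why every line after the first appears to work.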
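As a rough illustration of that word splitting, the sketch below counts how many arguments a command would receive from the first log line in the question, unquoted versus quoted. The variable names unquoted_count and quoted_count are only for the demonstration.

```shell
# First log line from the question, stored the way the loop stores it.
line='2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:07:49:01 +0100] "GET / HTTP/1.1" 200 58266 "-" "curl/7.61.1"'

set -f                 # turn off filename globbing so only splitting is shown

set -- $line           # unquoted: split on whitespace into separate arguments
unquoted_count=$#
printf '<%s>\n' "$@"   # each of these is what cut tried to open as a file

set -- "$line"         # quoted: the whole line stays one argument
quoted_count=$#

set +f
printf 'unquoted: %s arguments, quoted: %s argument\n' \
    "$unquoted_count" "$quoted_count"
```

Ten of the twelve words match the ten error messages in the question; the remaining two are the bare - words.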

You are also using echo to output the result of a command substitution. This is an anti-pattern. Just run the pipeline, without echo or command substitution; it writes its output to the terminal by itself.

Here, we use printf to give cut the line read from the file:

file="filename"

while IFS= read -r line; do
    printf '%s\n' "$line" | cut -d '[' -f2 | cut -d ' ' -f1
done <"$file"

The next thing to note is that the while loop is totally unnecessary. You are calling cut twice for each line in the log file. The cut utility is perfectly capable of reading the file line by line by itself:

file="filename"

cut -d '[' -f2 "$file" | cut -d ' ' -f1
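For anyone who wants to try the loop-free pipeline without a real log, here is a small self-contained sketch. It assumes a temporary file created with mktemp stands in for the log, and uses the first two lines from the question as sample data.

```shell
# Temporary stand-in for the real log file (an assumption for this demo).
file=$(mktemp)
cat >"$file" <<'EOF'
2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:07:49:01 +0100] "GET / HTTP/1.1" 200 58266 "-" "curl/7.61.1"
2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:08:49:01 +0100] "GET / HTTP/1.1" 200 58341 "-" "curl/7.61.1"
EOF

# First cut keeps what follows the first '[' on each line;
# second cut keeps what precedes the first space of that.
dates=$(cut -d '[' -f2 "$file" | cut -d ' ' -f1)
printf '%s\n' "$dates"

rm -f "$file"
```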

Or, you could use GNU grep:

grep -oP '(?<=\[)[^ ]+' "$file"

(This extracts everything up to the first space after the first [)
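To show what the lookbehind (?<=\[) contributes, here is a sketch of a variant without -P, which is not the command above but a swapped-in alternative. It assumes a grep that supports -o (not required by POSIX, though GNU, BSD and BusyBox grep all have it). Without the lookbehind the match includes the bracket, so it has to be stripped in a second step.

```shell
line='2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:07:49:01 +0100] "GET / HTTP/1.1" 200 58266 "-" "curl/7.61.1"'

# Match a literal '[' followed by everything up to the next space.
with_bracket=$(printf '%s\n' "$line" | grep -o '\[[^ ]*')

# Drop the leading '[' by keeping characters from position 2 onward.
date_only=$(printf '%s\n' "$with_bracket" | cut -c2-)

printf '%s\n%s\n' "$with_bracket" "$date_only"
```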

or standard sed,

sed 's/\].*//; s/.*\[//; s/ .*//' "$file"

(This deletes everything from the first ] onward, then deletes everything up to and including the [, then chops off the first space and everything after it.)
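The three substitutions can be followed one at a time. The sketch below applies each one separately to the first log line from the question; the names step1, step2 and step3 are only for the demonstration.

```shell
line='2001:67c:1220:80c:d4:985a:df2c:d717 - - [22/Feb/2019:07:49:01 +0100] "GET / HTTP/1.1" 200 58266 "-" "curl/7.61.1"'

# 1. Remove the first ']' and everything after it.
step1=$(printf '%s\n' "$line"  | sed 's/\].*//')
# 2. Remove everything up to and including the '['.
step2=$(printf '%s\n' "$step1" | sed 's/.*\[//')
# 3. Remove the first space and everything after it (the time zone).
step3=$(printf '%s\n' "$step2" | sed 's/ .*//')

printf '%s\n' "$step1" "$step2" "$step3"
```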


Kusalananda
  • 333,661