7

I am writing a menu-based bash script; one of the menu options sends an email with a text file attachment. I am having trouble checking whether my file is a text file. Here is what I have:

fileExists=10
until [ $fileExists -eq 9 ]
do
  echo "Please enter the name of the file you want to attach: "
  read attachment
  isFile=$(file $attachment | cut -d\ -f2)
  if [[ $isFile = "ASCII" ]]
    then
      fileExists=0
    else
      echo "$attachment is not a text file, please use a different file"
  fi
done

I keep getting the error cut: delimiter must be a single character.

Rui F Ribeiro
Powea
  • 3
    Put an extra space after -d\. – Michael Homer Jun 11 '15 at 09:34
  • 1
    Depending on the file version you have available you should consider using some options like --brief (which doesn't output the filename so you will have less of a problem with filenames that contain spaces) or --mime which returns the MIME type (e.g. text/plain) instead of a textual description of the file type. – Dubu Jun 11 '15 at 10:00
  • 2
    Just a note on the off-topic closure - This question would still help a lot of future readers like me. I was looking for an if statement to check if a file contained text, and this one helped me perfectly. – thepiercingarrow Mar 16 '16 at 23:58

5 Answers

7
  1. From the fact that it says file $attachment rather than file "$attachment", I guess your script cannot handle filenames that contain spaces.  But, be advised that filenames can contain spaces, and well-written scripts can handle them.  Note, then:

    $ file "foo bar"
    foo bar:  ASCII text
    
    $ file "foo bar" | cut -d' ' -f2
    bar:
    

    One popular and highly recommended approach is to null-terminate the filenames:

    $ file -0 "foo bar" | cut -d $'\0' -f2
    :  ASCII text
    
  2. The file command makes educated guesses about what type of file a file is.  Guesses, naturally, are sometimes wrong.  For example, file will sometimes look at an ordinary text file and guess that it is a shell script, C program, or something else.  So you don't want to check whether the output from file is ASCII text; you want to see whether it says that the file is a text file.  If you look at the man page for file, you will see that it more-or-less promises to include the word text in its output if the file is a text file, but this might be in a context like shell commands text.  So, it may be better to check whether the output from file contains the word text:

    isFile=$(file -0 "$attachment" | cut -d $'\0' -f2)
    case "$isFile" in
       (*text*)
          echo "$attachment is a text file"
          ;;
       (*)
          echo "$attachment is not a text file, please use a different file"
          ;;
    esac
    
  • 4
    You cannot rely on substrings in the output of file as for many formats file extracts and displays strings from the file (look for %s in the magic sources) which may include text – Stéphane Chazelas Jun 12 '15 at 22:36
  • 1
    @StéphaneChazelas: Perhaps you should take the wisdom that you have sprinkled over this page (-b, < filename, --mime) and add it to your answer, rather than scribbling it on leaves and letting them blow into everybody else’s yards.  :-)  ⁠ – G-Man Says 'Reinstate Monica' Jun 16 '15 at 23:52
6
case $(file -b --mime-type - < "$attachment") in
  (text/*)
     printf '%s\n' "$attachment is probably text according to file"
     case $(file -b --mime-encoding - < "$attachment") in
       (us-ascii) echo "and probably in ASCII encoding"
     esac
esac
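
As a rough sketch of how the check above could be wrapped for reuse in a menu script: the helper name `is_text` and the temporary files are assumptions made for illustration, not part of the answer. It relies only on the `-b`, `--mime-type` and `-` (read stdin) behaviour of file already used above.

```shell
#!/usr/bin/env bash
# is_text is a hypothetical helper name, not part of the answer above.
# Returns 0 when file(1) reports a text/* MIME type for the file's content.
is_text() {
    # Reading from stdin (-) sidesteps names such as "-" or "-x";
    # -b keeps the filename out of the output.
    case $(file -b --mime-type - < "$1") in
        (text/*) return 0 ;;
        (*)      return 1 ;;
    esac
}

# Illustrative check on two temporary files created with mktemp.
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/plain"
printf '\0\1\2\377'    > "$tmpdir/binary"
for f in "$tmpdir/plain" "$tmpdir/binary"; do
    if is_text "$f"; then
        printf '%s\n' "$f looks like text"
    else
        printf '%s\n' "$f does not look like text"
    fi
done
rm -r "$tmpdir"
```

In the question's loop, this could replace the `[[ $isFile = "ASCII" ]]` test with `if is_text "$attachment"`. The result is still only file's guess.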
4

I would circumvent the escaping and do:

... | cut -d' ' -f2 

That way it is clear that you need a space between the delimiter character (specified by the three-character sequence ' ') and the following option. With -d\ -f2 it is easy to miss that you should have written -d\  -f2, with two spaces after the backslash.

Anthon
4

The problem occurs in cut -d\ -f2. Change it to cut -d\  -f2, with two spaces after the backslash.

To cut, the arguments look like this:

# bash: args(){ for i; do printf '%q \\\n' "$i"; done; }
# args cut -d\ -f2
cut \
-d\ -f2 \

And here is the problem. \ escaped the space to a space literal instead of a delimiter between arguments in your shell, and you didn't add an extra space so the whole -d\ -f2 part appears as one argument. You should add one extra space so -d\ and -f2 appear as two arguments.

To avoid confusion, many people use quotes like -d' ' instead.
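
A small sketch of the difference, assuming a made-up helper `count_args` that prints how many arguments it was given; the sample line fed to cut is also invented for illustration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

count_args cut -d\ -f2      # one space: "-d -f2" is a single argument, prints 2
count_args cut -d\  -f2     # two spaces: "-d " and "-f2" are separate, prints 3
count_args cut -d' ' -f2    # quoted delimiter: same split, easier to read, prints 3

# With the delimiter passed correctly, cut returns the second field.
printf '%s\n' 'notes.txt: ASCII text' | cut -d' ' -f2    # prints ASCII
```

The first call is the case from the question: cut receives `-d -f2` as its delimiter, which is longer than one character, hence the error message.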

P.S.: Instead of requiring file's output to say ASCII, I'd rather check whether it ends in text:

if file "$attachment" | grep -q 'text$'; then
    : # is text
else
    : # file doesn't think it's text
fi
Mingye Wang
2

Another option is to not use cut and to match a regex against the full output of file:

#...
isFile=$(file "$attachment")
if [[ "$isFile" =~ ^[^:]*:\ ASCII ]]
#...
kos
  • 1
    The elephant in the room is that the output from file is of the form “(filename) (colon) (whitespace) (filetype)” — i.e., it includes the filename.  Therefore, if there is a binary file called VACASCIING.JPG, the output from file will be VACASCIING.JPG: JPEG image, and your code will call it a text file because the filename matches ASCII.  Note that the second part of my answer, which is functionally comparable to yours,   … (Cont’d) – G-Man Says 'Reinstate Monica' Jun 12 '15 at 18:57
  • 1
    (Cont’d) …  would not have needed the cut command but for this.  (The OP is using cut to extract the first word of the filetype, which he tests for equality.  I’m doing a pattern match, like you, so I could have just used the entire output, but I use cut to get the filetype only, without the filename.)  Also, Stéphane Chazelas's answer uses file < "$attachment", and, while not explained, I presume that that trick is also intended to get output without the filename. – G-Man Says 'Reinstate Monica' Jun 12 '15 at 18:58
  • @G-Man Indeed that was a pretty big elephant. Thanks for your comment. I've updated making the regex way more consistent. – kos Jun 12 '15 at 20:42
  • @G-Man, no, -b avoids printing the filename. Using < is to avoid problems with a file called -. – Stéphane Chazelas Jun 12 '15 at 20:44
  • 1
    @kos, now you're moving the problem to files with : in their name. – Stéphane Chazelas Jun 12 '15 at 20:46
  • @StéphaneChazelas You're right. I'm coming from Windows and I always forget about this. Either I'll work out a solution to make this work consistently or I'll delete this answer. – kos Jun 12 '15 at 20:52
  • @StéphaneChazelas: Wow; I keep on learning things from you.  Thanks².  But I guess any filename beginning with - would be a problem.  And it looks like you could have used file -- "$attachment", although it’s not mentioned in the man page. – G-Man Says 'Reinstate Monica' Jun 12 '15 at 21:56
  • 1
    @G-Man, I used file - < "$attachment". file - tries to identify the file content of stdin, so it's similar to file -sL -- "$attachment" except that it also works for a file called - (-- doesn't help with that). – Stéphane Chazelas Jun 12 '15 at 22:23
  • @G-Man - I think the simple solution to the filename thing is simply to cancel by comparison. It can't be that much of a hurdle if we already have all of the pertinent values in named vars at our disposal. Just: printf 'file reports type:\t%s\n' "${var#*"$attachment"*:}" – mikeserv Jun 13 '15 at 09:03
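
A minimal sketch of the filename-cancelling idea from the last comment, using an anchored variant of the expansion. The sample name and output string are invented for illustration and no file call is made; `ftype` is a hypothetical variable name.

```shell
#!/usr/bin/env bash
# Sample values are made up for illustration; no file(1) call is made here.
attachment='notes: draft.txt'
var='notes: draft.txt: ASCII text'   # what file might print for that name

ftype=${var#"$attachment":}                  # drop the exact "<name>:" prefix
ftype=${ftype#"${ftype%%[![:space:]]*}"}     # trim leading whitespace
printf 'file reports type:\t%s\n' "$ftype"
```

Because the prefix is matched against the quoted value of `$attachment`, a colon inside the filename does not confuse the split. Using `file -b` avoids the problem altogether where that option is available.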