3

This is a follow-up to "Normal looking text file, detected by file as ASCII Pascal program text".

It seems that file's verdict can't always be trusted.

A given file might be valid under both encoding/format1 and encoding/format2. file reports only encoding/format1, but I need to check whether the file also satisfies the constraints of encoding/format2.

  • Is there a way to do that?
  • Is there a way I can ask "Check if this file follows the rules of ASCII English Text (or some other encoding)" and the answer will be "yes" or "no"?
user13107

2 Answers

4

You could try file's --keep-going (-k) option, which prints every matching format instead of stopping at the first one.

Related man page description of this option:

Don't stop at the first match, keep going. Subsequent matches will have the string ‘\012- ’ prepended. (If you want a newline, see the -r option.) The magic pattern with the highest strength (see the -l option) comes first.
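A minimal sketch of how this could be wrapped, assuming a file build that supports -k and -r together. The function name list_matches is made up for illustration and is not part of file. It asks file for every match and strips the leading "- " separator so that each candidate format ends up on its own line:

```shell
# list_matches FILE  (hypothetical helper, not part of file itself)
# Prints every format that file matches for FILE, one per line.
list_matches() {
    # -b: omit the file name from the output
    # -k: keep going after the first match
    # -r: emit the separator as a real newline instead of the escaped \012
    # sed then drops the "- " that file prepends to each subsequent match
    file -b -k -r -- "$1" | sed 's/^- //'
}
```

For example, list_matches tmp would list all candidates for the file named tmp. If only one line comes back, file found no alternative match.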

jofel
  • Thanks, I should have checked the man pages. Unfortunately, this doesn't work for the file I uploaded in the linked question: file -k tmp still shows ASCII Pascal program text as the only matching format, whereas I expected it to also match ASCII English Text. – user13107 Jul 02 '14 at 12:08
  • file: invalid option -- 'l' @illuminÉ – user13107 Jul 02 '14 at 12:13
  • 1
    @illuminÉ No, the -l option just lists all patterns with their strength values, which say nothing about how well they actually fit the file. – jofel Jul 02 '14 at 12:27
0

Only answering your second question, as jofel has already answered the first.

  • Is there a way I can ask "Check if this file follows the rules of ASCII English Text (or some other encoding)" and the answer will be "yes" or "no"?

See the -e/--exclude option. From file(1):

-e, --exclude testname

Exclude the test named in testname from the list of tests made to determine the file type.

[...]

  • soft Consults magic files

Those magic files are responsible for the Pascal report, so -e soft should be enough. You could try excluding other tests from that list too, as long as you leave ascii enabled.

For your "yes/no" test, combine it with -b ("brief", i.e. without the filename) and --mime-encoding to output only the encoding. Then it's a simple string comparison:

if [ "$(file -b --mime-encoding -e soft "$file")" = "us-ascii" ]; then
  echo yes
else
  echo no
fi
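To cover the "or some other encoding" part of the question, the same comparison could be wrapped in a small function. This is only a sketch: has_encoding is a hypothetical name, and the encoding strings it compares against (us-ascii, utf-8, ...) are whatever your version of file prints for --mime-encoding.

```shell
# has_encoding FILE ENCODING  (hypothetical helper name)
# Exit status 0 if file reports ENCODING for FILE, non-zero otherwise.
has_encoding() {
    # -e soft skips the magic-file tests; --mime-encoding prints only the
    # encoding name, and -b suppresses the file name.
    [ "$(file -b --mime-encoding -e soft -- "$1")" = "$2" ]
}

# Usage:
#   if has_encoding "$file" us-ascii; then echo yes; else echo no; fi
```

Returning an exit status rather than printing "yes" or "no" lets the caller use the function directly in if, && and || constructs.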
JigglyNaga