
I have a "normal"-looking text file (it contains English sentences) that the file command detects as ASCII Pascal program text.

How does Pascal program text differentiate itself from normal ASCII English text?

I ran head -10 file > tmp

file tmp still reports Pascal. Here is tmp opened in vi with :set list:

HELEN'S BABIES$
$
With some account of their ways, innocent, crafty, angelic, impish,$
witching and impulsive; also a partial record of their actions during$
ten days of their existence$
$
By JOHN HABBERTON$
$
$
$

Output of head file | od -c

0000000   H   E   L   E   N   '   S       B   A   B   I   E   S  \n  \n
0000020   W   i   t   h       s   o   m   e       a   c   c   o   u   n
0000040   t       o   f       t   h   e   i   r       w   a   y   s   ,
0000060       i   n   n   o   c   e   n   t   ,       c   r   a   f   t
0000100   y   ,       a   n   g   e   l   i   c   ,       i   m   p   i
0000120   s   h   ,  \n   w   i   t   c   h   i   n   g       a   n   d
0000140       i   m   p   u   l   s   i   v   e   ;       a   l   s   o
0000160       a       p   a   r   t   i   a   l       r   e   c   o   r
0000200   d       o   f       t   h   e   i   r       a   c   t   i   o
0000220   n   s       d   u   r   i   n   g  \n   t   e   n       d   a
0000240   y   s       o   f       t   h   e   i   r       e   x   i   s
0000260   t   e   n   c   e  \n  \n   B   y       J   O   H   N       H
0000300   A   B   B   E   R   T   O   N  \n  \n  \n  \n
0000314

File uploaded here: http://www.fileswap.com/dl/L0eCWJTvy/

I'm on CentOS release 6.5, file version 5.04

Something in the 4th line seems to trigger this: if I remove everything from the 4th line onwards, the file is detected as plain text.

polym
user13107
  • Please give us a minimal example of the file that reproduces the issue. Also have a look through head file | od -c to check for non-printing characters. – terdon Jul 01 '14 at 14:13
  • @terdon done it. – user13107 Jul 01 '14 at 14:20
  • Thanks. What operating system are you running? Could you provide us with a link where we can download the actual file? There seems to be nothing strange in the od output. – terdon Jul 01 '14 at 14:24
  • Do you have a $HOME/.magic or $HOME/.magic.mgc file? – steeldriver Jul 01 '14 at 14:35
  • @steeldriver yes, should i share it? Is there some version number for that file that will make you identify it? – user13107 Jul 01 '14 at 14:38
  • TBH I don't understand the syntax of the magic file specification - but someone else might. In any case, you could temporarily rename that file to see if that fixes the issue. – steeldriver Jul 01 '14 at 14:41
  • @steeldriver see edit at the end of question. – user13107 Jul 01 '14 at 14:48
  • It is reported as ASCII for me. @steeldriver's suggestion about ~/.magic is probably right on. Did you try running file as another user or renaming the ~/.magic file? That's where file reads the filetype patterns it recognizes from. – terdon Jul 01 '14 at 14:57
  • Hang on, you said you had a $HOME/.magic file. If not, which one are you talking about? /etc/magic? That's normal, you're supposed to have it. – terdon Jul 01 '14 at 15:05
  • Yes, it's /etc/magic, not under $HOME. I just wanted to say that I have the magic file somewhere (it shows the path when you do file -v). – user13107 Jul 01 '14 at 15:11
  • Please answer your question yourself with your new findings, and accept it - so it will not stay open forever. – Volker Siegel Jul 01 '14 at 19:55

1 Answer


I was able to reproduce this on both OS X 10.6.8 and OpenBSD 5.5-current.

Printing out debug information using file -D tmp, it turns out that your text file fails roughly 2000 tests before file(1) recognizes the Pascal keyword record and decides that it must be a Pascal program text.

A minimal working example can be obtained as follows:

$ echo record > test
$ file test
test: ASCII Pascal program text
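
The same kind of experiment can be scaled up to find the line at which the classification flips in a longer file. The sketch below is only an illustration: classify_prefixes is a hypothetical helper name, and the descriptions it prints depend on the installed file version and its magic database.

```shell
# Hypothetical helper: show how file(1) classifies each growing prefix of a file.
# Usage: classify_prefixes FILE [MAX_LINES]   (MAX_LINES defaults to 10)
classify_prefixes() {
    input=$1
    max=${2:-10}
    n=1
    while [ "$n" -le "$max" ]; do
        # -b (brief) omits the file name; "-" makes file(1) read standard input.
        printf '%s: %s\n' "$n" "$(head -n "$n" "$input" | file -b -)"
        n=$((n + 1))
    done
}
```

Running classify_prefixes tmp 10 prints one line per prefix length; the first prefix whose description changes is the one that adds the triggering line.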

After numerous heuristics, only the "third & last set of tests, based on hardwired assumptions" in ascmagic.c applies. These tests recognize "file types that we know based on keywords that can appear anywhere in the file". Therefore, minimal changes to your file result in the correct identification as ASCII English text, for example changing "their" to "the" in the third line.
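
To see where a suspect keyword occurs in a file, a plain grep is usually enough. The sketch below assumes a hypothetical helper named find_keyword_lines and defaults to the single keyword confirmed above, record. It only locates whole-word matches and does not reproduce the logic of file(1); the real keyword list lives in the file source and differs between versions.

```shell
# Hypothetical helper: list the lines of a file that contain a given keyword.
# Usage: find_keyword_lines FILE [KEYWORD...]
find_keyword_lines() {
    input=$1
    shift
    # With no keywords given, fall back to the one confirmed above.
    [ "$#" -gt 0 ] || set -- record
    for word in "$@"; do
        # -n prefixes each match with its line number, -w matches whole words only.
        grep -nw -e "$word" "$input" || true
    done
}
```

For the tmp file from the question, find_keyword_lines tmp reports line 4 ("witching and impulsive; also a partial record of their actions during"), which matches the observation that the detection changes once the 4th line is removed.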

damien
  • Arch file-5.19 detects OP's file as ASCII text, but your test shows Pascal source, ASCII text. After trying strings /usr/share/file/misc/magic.mgc | grep -C 10 'pascal' I can see that it's record or program at the beginning of a line, or (input anywhere in the text, that triggers this identification in my case. +1 –  Jul 02 '14 at 07:12
  • could you have a stab at http://unix.stackexchange.com/questions/140359/test-for-a-particular-file-format? thanks. – user13107 Jul 02 '14 at 12:12