I have a bunch of LaTeX source files that all share the same properties: Unix-style line endings, UTF-8 encoding, roughly the same size (1-2 KB), and spaces for indentation. They are included in a bigger document, each file handling a separate section, and every section has the same layout, so the files are structured identically and use mostly the same LaTeX commands, just with different text content. Every file starts and ends directly with LaTeX commands and contains many of them. The strange thing now is this:
$ file *.tex
file1.tex: LaTeX document, Unicode text, UTF-8 text
file2.tex: CSV text
This is just a tiny excerpt. Which files are detected as CSV and which as LaTeX looks totally random; CSV is detected slightly less often (maybe 40% CSV, 60% LaTeX), but for any given file the detected type is reproducible.
I tried varying some formatting and content in CSV-detected files, but they stay detected as CSV.
What is going on here?
`man file`. Inspect the first few bytes of your files with `head -n 1 file | od -bc`. `file` only looks at the first few bytes. – waltinator Nov 25 '23 at 21:59
`file` reads up to the first mebibyte of a file to identify it. – Stephen Kitt Nov 25 '23 at 22:31
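The inspection suggested in the first comment can be applied to several files at once. The following is only a minimal sketch: the function name `dump_first_line` is made up for illustration, and it assumes a POSIX shell with `head` and `od` available. It does not explain the CSV detection by itself; it only makes the leading bytes of a LaTeX-detected and a CSV-detected file easy to compare side by side.

```shell
#!/bin/sh
# dump_first_line is a hypothetical helper name, not part of file(1).
# For each file given, print a header and then the first line of the
# file as octal bytes (-b) and as characters (-c), so that invisible
# differences such as a BOM or a stray carriage return become visible.
dump_first_line() {
    for f in "$@"; do
        printf '== %s ==\n' "$f"
        head -n 1 "$f" | od -bc
    done
}
```

Called as `dump_first_line file1.tex file2.tex`, it prints one dump per file. Since `file` may read well beyond the first line, as the second comment points out, identical first lines do not rule out a difference further down; running `file -k` (`--keep-going`) on a CSV-detected file lists further matches instead of stopping at the first one.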