I have a set of scraped raw files that contains both plain text and source code. Below are the file types as reported by file *. I want to remove every file that is C source, a Python script, HTML, or empty, and keep only the plain ASCII and UTF-8 Unicode text files.
file *
1dW6WJMN.txt: Python script, ASCII text executable
9dJbZ3Vv.txt: ASCII text, with CRLF line terminators
9dQsmVU4.txt: Python script, UTF-8 Unicode text executable, with CRLF line terminators
A5hENB7D.txt: C source, ASCII text, with CRLF line terminators
cidREdJG.txt: UTF-8 Unicode text, with very long lines, with CRLF line terminators
exhjw1gK.txt: UTF-8 Unicode text, with CRLF line terminators
iu7LPrqz.txt: ASCII text, with very long lines, with CRLF line terminators
LsDHarjD.txt: ASCII text
nLABt1a6.txt: C source, ASCII text, with CRLF line terminators
nqMDtVuz.txt: ASCII text, with CRLF line terminators
nqPuYb23.txt: UTF-8 Unicode text, with CRLF line terminators
nQtzxhfQ.txt: ASCII text, with CRLF line terminators
NQuLWwpt.txt: ASCII text, with CRLF line terminators
nQXeJeED.txt: ASCII text, with CRLF line terminators
nqXGv6ws.txt: UTF-8 Unicode text, with CRLF line terminators
nQxr4Hwi.txt: ASCII text, with CRLF line terminators
nQxr4Hwii.txt: empty
VQjrxevh.txt: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators
yfDEfn4L.txt: C source, ASCII text, with CRLF line terminators
yydAEDRn.txt: HTML document, ASCII text, with very long lines, with CRLF line terminators
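In the listing above, each unwanted type is the first thing file prints for that file ("C source,", "Python script,", "HTML document,", "empty"). Plain text files start directly with the encoding. Matching on the start of the description is therefore enough to pick out the four types named in the question. The following is a minimal POSIX sh dry-run sketch that assumes the descriptions look like the listing above. The function name `is_unwanted_type` is hypothetical. `file -b` (brief mode) prints the description without the leading "name: ".

```sh
#!/bin/sh
# Sketch: flag files whose `file -b` description starts with one of the
# four types named in the question. Dry run only: nothing is deleted.

# Hypothetical helper: succeeds for C source, Python script, HTML document
# and empty descriptions, fails for everything else.
is_unwanted_type() {
    case "$1" in
        "C source"*|"Python script"*|"HTML document"*|empty) return 0 ;;
        *) return 1 ;;
    esac
}

for f in ./*; do
    [ -f "$f" ] || continue                  # regular files only
    if is_unwanted_type "$(file -b -- "$f")"; then
        echo "would remove: $f"
        # rm -- "$f"                         # enable after checking the dry run
    fi
done
```

Run it in the directory that holds the scraped files. Once the "would remove" list looks right, uncomment the rm line. This version only catches the types it names, so anything else, such as a PHP script, is left in place.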
I tried a simple grep for ASCII, but the descriptions of the source code files also contain the word ASCII. Is there another way to filter out these source code files? Sometimes there are also PHP and JavaScript files that I want to get rid of. I'm quite new to Linux, so any help would be appreciated. Thanks in advance.
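A bare grep for ASCII also matches lines such as "C source, ASCII text", because file reports the encoding after the type. What distinguishes plain text is that its description starts with the encoding. One option is to whitelist instead of blacklist: keep only descriptions that begin with "ASCII text" or "UTF-8 Unicode text", and flag everything else. That also covers PHP or other script types without naming each one. The following is a POSIX sh sketch under these assumptions. The helper name `is_plain_text` is hypothetical. The pattern "Unicode text, UTF-8 text" is included on the assumption that newer file releases spell "UTF-8 Unicode text" that way, so check your own version's output.

```sh
#!/bin/sh
# Sketch: whitelist approach. Flag every regular file whose `file -b`
# description does not start with plain text. Dry run only.

# Hypothetical helper: succeeds when the description begins with a plain
# text encoding. Source files start with "Python script,", "C source,",
# "HTML document," and so on, and empty files are just "empty", so the
# anchored patterns reject them.
# Assumption: "Unicode text, UTF-8 text" is the newer spelling of
# "UTF-8 Unicode text" in some file(1) versions.
is_plain_text() {
    case "$1" in
        "ASCII text"*|"UTF-8 Unicode text"*|"Unicode text, UTF-8 text"*) return 0 ;;
        *) return 1 ;;
    esac
}

for f in ./*; do
    [ -f "$f" ] || continue                  # regular files only
    if ! is_plain_text "$(file -b -- "$f")"; then
        echo "would remove: $f"
        # rm -- "$f"                         # enable after checking the dry run
    fi
done
```

file's type detection is heuristic. If it reports a JavaScript or PHP file as plain "ASCII text", this sketch keeps that file, so review the dry-run output before enabling rm.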