
If you use FreeBSD as a file server for Windows clients, it's useful to be able to run file searches server-side rather than client-side.

A typical example might be: find all files meeting some metadata criteria (name, path, size, date, etc.) that also have some literal or regex in their text-extracted content. The search runs across a large recursive directory containing mixed files, and content hits could be in any (or several) of: .txt notes, .docx/.xlsx documents, .pdf, .zip/.rar/.tgz/.iso archives, or, failing those, maybe even strings in a binary file.

The first part is easy: just use find. Searching within one type of file isn't hard either. But FreeBSD has no notion of "well-known" file filters, nor a single API that parses file data to text through pluggable filters into a common format. There are well-known text-extraction filters for many individual file types (pdf, doc/docx, xls/xlsx, archive formats, sqlite databases, binary files containing strings, etc.), but you can't just throw grep, find -exec, pdftotext, or unzip | sed with Microsoft XML extraction code at the results universally. I guess you would have to generate a list or stream of filenames with find, pass each file through the appropriate filter based on its extension or its type as reported by file, and gather up whatever matches as the output.
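To make the idea concrete, here is a minimal sketch of that dispatch approach in Python: walk the tree, choose an extractor by extension, and run the regex over whatever text comes out. The function names and the EXTRACTORS table are hypothetical, invented for illustration. It assumes pdftotext (from poppler) is installed for PDFs and falls back to a crude strings-style scan otherwise. The .docx/.xlsx handling simply strips XML tags, which is much rougher than a real converter, and archive formats are not handled at all.

```python
import os
import re
import shutil
import subprocess
import zipfile


def extract_strings(path, min_len=4):
    """Fallback: runs of printable ASCII, roughly what strings(1) reports."""
    with open(path, "rb") as f:
        data = f.read()
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return "\n".join(r.decode("ascii") for r in runs)


def extract_plain(path):
    """Plain text files: decode leniently so odd bytes don't abort the search."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="replace")


def extract_ooxml(path):
    """.docx/.xlsx are zip containers of XML; strip the tags crudely."""
    parts = []
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if name.endswith(".xml"):
                xml = z.read(name).decode("utf-8", errors="replace")
                parts.append(re.sub(r"<[^>]+>", " ", xml))
    return "\n".join(parts)


def extract_pdf(path):
    """Use pdftotext if present ("-" sends text to stdout), else scan for strings."""
    if shutil.which("pdftotext") is None:
        return extract_strings(path)
    result = subprocess.run(["pdftotext", path, "-"], capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical dispatch table: extension -> extractor function.
EXTRACTORS = {
    ".txt": extract_plain,
    ".docx": extract_ooxml,
    ".xlsx": extract_ooxml,
    ".pdf": extract_pdf,
}


def search(root, pattern):
    """Yield paths under root whose extracted text matches the regex."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            extractor = EXTRACTORS.get(ext, extract_strings)
            try:
                text = extractor(path)
            except (OSError, zipfile.BadZipFile):
                # Unreadable or mislabelled file: skip or fall back.
                try:
                    text = extract_strings(path)
                except OSError:
                    continue
            if regex.search(text):
                yield path
```

Something like `for p in search("/tank/share", r"invoice\s+\d+"): print(p)` would then list the matching files (the path and pattern here are made up). This reads every file in full on each search, which is exactly the cost I'd like a purpose-built tool to reduce.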

If I need to do this kind of content search quite often in a large file store, is there a specific tool that's designed for it and more efficient, or otherwise what's the most efficient approach out there?

Update - I'm only interested in direct file-by-file CLI search. I'm not interested, even slightly, in indexing content and later searching an index. This question relates to on-the-spot, file-by-file literal/regex search, as with find, but where the content is also searched and isn't plain text: it spans multiple file types needing varied text-extraction filters. So it's not a dup of the existing questions about indexed content searching. Sorry this wasn't clear before; I hadn't realised the ambiguity.

Stilez
  • I'm not asking for index-based systems, but pure search systems. I've updated the question to clarify that it asks about non-indexed searching (similar to find, but across files of different formats). Sorry this wasn't clear; please unmark as dup. – Stilez Jun 23 '17 at 05:27
