I have a large number of files, around 185,000. About 99% of them start with a 6-digit number followed by an underscore, then other random symbols and a random extension:

 312095_ck_image-24-10-20-11-29-1.jpeg
 312095_ck_image-24-10-20-11-29-2.jpeg
 312095_ck_image-24-10-20-11-29.jpeg

Basically, this six-digit number is a user ID (assigned by a backend Oracle DB), and each user ID can appear multiple times in the directory. I also have a text file of 6-digit numbers, one per line (around 18,000 numbers).

Is it possible to match the files in the directory against the contents of the text file? If a file starts with a number that is in the text file, I want it moved to another directory (regardless of the rest of the name or the extension). I just want all the matching files in one folder so I can delete the entire folder later rather than each individual file.

Is this even possible in Linux (in the shell, or by installing or building from source some other Linux program)? The OS version is RHEL Linux 6. If it makes this any easier, I can load the list of files in the directory into a DB table and match it against the list of numbers in the text file, so I know exactly the name of each file to be moved or removed. I just don't know how to then feed that list to the mv command so it can move or delete the files. What's the easiest way to achieve this?

My actual folder is /new_upload/entrants/, and I have an empty folder called junk on the same level as entrants (/new_upload/junk). If 312095 appears in the list of IDs, I want to execute:

mv 312095_* /new_upload/junk

[idis] entrants# sh -h
sh-4.1#
[idis] entrants# $SHELL --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

  • Start with the grep manpage, particularly the -f flag which allows you to pass a filename full of patterns (your file of user IDs). Also, see this answer which talks generally about how to approach problems with a simple text-based approach to where you're actually creating a script, instead of executing one. IMO, it is much more conducive to learning, experimentation and testing, while minimizing the risk of damage or malfunction. – Jim L. Dec 12 '23 at 19:25
  • Also, please [edit] your post to tell us what shell you're using and show us the sample commands you would like to have executed on the three sample files for user ID 312095. If files can be in various directories, please include some examples that demonstrate that. – Jim L. Dec 12 '23 at 19:31
  • With the same effort you could move, you could just as well delete the right files … so that moving seems to be a "decoy", something you don't actually need but makes your problem a bit more complicated – Marcus Müller Dec 12 '23 at 19:50
  • You really need tools like RANGER or FZF for this job. Install them and try before doing anything else. – user9101329 Dec 12 '23 at 19:57
  • @Marcus Müller The only reason I want to mv the files instead of just deleting them is to know the size of the junk folder and possibly double-check before making such a drastic change to the file system – Azat Usmanov Dec 12 '23 at 20:05
  • that makes no sense. "Junk folders" are a thing that only file manager applications know, just delete the file, no trash folder involved. and regarding drastic changes: either you want to delete all these non-six-digit-files, or you don't. Your computer is quite good at selecting files starting with six digits, I dare say it's better than your eye! Anyway, you can just echo "${file}" instead of rm "${file}" and get a list of files that would have been deleted. – Marcus Müller Dec 12 '23 at 20:14

2 Answers

0

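On a GNU system, you could do the following (note that gawk releases before 4.0, such as the 3.1.x that RHEL 6 ships as far as I know, need the --re-interval option for the {6} interval to be recognised):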

find . -type f -print0 |
  LC_ALL=C gawk -F/ '
    !list_processed {user[$0]; next}
    match($NF, /^([0-9]{6})_/, f) && f[1] in user
    ' user-list.txt list_processed=1 RS='\0' ORS='\0' - |
  xargs -r0 mv -it /where/to/move/them --
0

You have GNU bash, so this is straightforward using shell constructs and GNU grep:

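#!/bin/bash

# NOTE: run this inside the directory to clean; keep numbersfile.txt and this
# script elsewhere (or use absolute paths), since neither starts with 6 digits.
for file in *; do
  # Extract the leading 6 digits, if any (-E is needed for the {6} interval)
  match=$(printf '%s\n' "${file}" | grep -oE '^[[:digit:]]{6}')
  if [[ -z "${match}" ]]; then
    # File did not start with 6 digits, delete
    rm -- "${file}"
  else
    # Delete unless the digits are found in numbersfile.txt
    grep -q -F -- "${match}" numbersfile.txt || rm -- "${file}"
  fi
done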

that is:

  • loop over all file names
  • for each file name, check using grep whether it starts with 6 digits
  • if not, delete file
  • if so, check using grep whether the matching digits are in numbersfile.txt
  • if not, delete file

(if you want to test before deleting, just replace rm with echo and you'll get the list of all files to be deleted)
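
The loop above deletes whatever is not in the list, whereas the question asks for the opposite: move the files whose ID is in the list to a junk folder. A minimal sketch of that variant follows, assuming bash 4.0 or later (for associative arrays) and one 6-digit ID per line in the list. The function name move_listed, its argument order and the optional dry-run flag are hypothetical, chosen only for illustration. Loading the IDs into an array once avoids starting a grep process for each of the ~185,000 files.

```shell
#!/bin/bash
# Sketch: move every regular file whose 6-digit prefix appears in an ID list.
# move_listed and its arguments are hypothetical names, not an existing tool.
# Usage: move_listed ID_FILE SOURCE_DIR DEST_DIR [any non-empty value = dry run]
move_listed() {
    local idfile=$1 src=$2 dest=$3 dry_run=${4:-}
    local id file name prefix
    local -A wanted=()                 # associative array, needs bash >= 4.0

    # Load the IDs once so that each file costs a single hash lookup.
    while IFS= read -r id || [[ -n $id ]]; do
        id=${id%$'\r'}                 # tolerate DOS line endings
        if [[ $id =~ ^[0-9]{6}$ ]]; then
            wanted[$id]=1
        fi
    done < "$idfile"

    for file in "$src"/*; do
        [[ -f $file ]] || continue     # skip directories and an unmatched glob
        name=${file##*/}
        # Only names of the form <6 digits>_<anything> are candidates.
        [[ $name =~ ^([0-9]{6})_ ]] || continue
        prefix=${BASH_REMATCH[1]}
        [[ -n ${wanted[$prefix]:-} ]] || continue
        if [[ -n $dry_run ]]; then
            printf 'would move: %s\n' "$file"
        else
            mv -- "$file" "$dest"/
        fi
    done
}
```

With the paths from the question (the list file name ids.txt is an assumption), a preview followed by the real move would look like this, after which du -sh /new_upload/junk shows the size before the folder is deleted:

move_listed ids.txt /new_upload/entrants /new_upload/junk dry
move_listed ids.txt /new_upload/entrants /new_upload/junk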