Arithmetic is possible in the shell, but it's always awkward, so I recommend you look for another scripting language to do most of the work here. The following uses awk, but you could use perl equally well. I'd like to be able to say that you could also use python easily in the example below, but aspects of python's syntax make it not obvious how to embed a python script in-line into a pipeline like this. (It can be done, but it's irritatingly tricky.) Note that I don't use awk to perform the actual moves, just to do the calculation needed to produce the destination directory. If you use perl or python, they can perform the filesystem operations as well.
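If you would rather keep everything in one language, a perl version might look roughly like the following. This is only a sketch under the same assumptions spelled out below (single underscore, no spaces), not something battle-tested:

ls | perl -ne '
    chomp;
    my ($num, $rest) = split /_/, $_, 2;              # e.g. 123456 and file.txt
    my $base = ($num - 1) - (($num - 1) % 100000);    # same zero-based rounding as the awk below
    my $dir  = sprintf "%d_%d", $base + 1, $base + 100000;
    mkdir $dir unless -d $dir;                        # create the bucket directory if needed
    rename $_, "$dir/$_" or warn "could not move $_: $!\n";
'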
Some assumptions:

- You want to move the file with its full original name. It's not much harder to modify the script to strip off the numeric prefix of the original (although then it had better be the case that the files don't all end in _file.txt); a sketch of that variation appears at the end of this answer.
- There is only a single _ and no spaces in the filenames. If that's not true, something like the following can still work, but you need to be more careful in the awk script and the following shell loop.
So, given those, the following should work.
ls |
awk -F_ '
{
    n = $1 - 1                 # working zero-based is easier here
    base = n - (n % 100000)    # round down to the nearest multiple of 100,000
    printf "%d_%d %s_%s\n", base + 1, base + 100000, $1, $2
}' |
while read destdir orig
do
    mkdir -p $destdir && mv $orig $destdir
done
So, what's going on here?
ls | ...
This just lists the filenames, and because the output is going to a pipe and not the terminal, it lists them one per line. The files will be sorted in ls's default order, but the rest of the script doesn't care about that and will work fine with a randomized list of filenames.
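If you want to see that behaviour for yourself, piping through cat forces the non-terminal code path:

ls            # to a terminal: names laid out in columns
ls | cat      # to a pipe: one name per line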
... | awk -F_ '
{
    n = $1 - 1                 # working zero-based is easier here
    base = n - (n % 100000)    # round down to the nearest multiple of 100,000
    printf "%d_%d %s_%s\n", base + 1, base + 100000, $1, $2
}' | ...
This is not complicated, but if you haven't played with awk before it's a bit tricky to understand. First, the goal here is to read the filenames one at a time from ls, and then for each filename produce an output line with two fields: the first field with the appropriate destination directory for the original filename, and the second field passing on the original filename so the following part of the pipeline can use it. So, in more detail:

The -F_ flag to awk tells it to split each input line into fields on the _ character. Assuming that _ occurs only once in these filenames, awk will assign $1 to the numeric part of the name, and $2 to all the text after the _. Then, the braced block is applied with $1 and $2 set as just described.
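A quick way to see the splitting in isolation, using a made-up filename:

$ echo 123456_file.txt | awk -F_ '{ print "$1 = " $1; print "$2 = " $2 }'
$1 = 123456
$2 = file.txt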
The calculation of base identifies which block of 100,000 files this file belongs in. First, calculate n by subtracting 1 from the initial number of the filename. This zero-bases the number, which makes it easier to work with the modular arithmetic used in the next line. Next, round n down to the nearest multiple of 100,000. If n is already a multiple of 100,000 it is left undisturbed. (If you're not familiar with the % operator, N % M computes the remainder when N is divided by M. So, 5 % 3 == 2, 6 % 3 == 0, and so on.)
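As a spot check of the arithmetic around a boundary (the filenames here are invented for the test):

$ printf '99999_file.txt\n100000_file.txt\n100001_file.txt\n' |
    awk -F_ '{ n = $1 - 1; print $1, "->", n - (n % 100000) }'
99999 -> 0
100000 -> 0
100001 -> 100000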
Finally, the printf assembles the output line necessary for the following stage of the pipeline. It produces a line with two fields, separated by a space. The first is the name of the destination directory, generated by using base to derive the upper and lower bound parts of the directory name; it's here that we move back into a 1-based counting scheme for output. The second field is the reconstructed original input filename.
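Putting it together, a single invented filename run through just the awk stage produces a line like this:

$ echo 123456_file.txt |
    awk -F_ '{ n = $1 - 1; base = n - (n % 100000)
               printf "%d_%d %s_%s\n", base + 1, base + 100000, $1, $2 }'
100001_200000 123456_file.txt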
... | while read destdir orig
do
    mkdir -p $destdir && mv $orig $destdir
done
This is the final stage of the pipeline, and actually does all the moves. It reads each line produced by the awk script as two fields, and then
- it ensures the directory exists, using mkdir -p (which does nothing if the directory already exists),
- and if that succeeds, it moves the original file to the new directory.
It's often a good idea to use the mkdir ... && mv ... pattern in shell scripts, because if mkdir fails for any reason, the rename is not attempted.
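As a trivial illustration of why that matters (the path is deliberately unwritable here, and the exact error text varies by system), the mv is never attempted:

$ mkdir -p /no_such_root/1_100000 && mv 42_file.txt /no_such_root/1_100000
mkdir: cannot create directory '/no_such_root': Permission denied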
This pattern of multiple pipeline stages, each incrementally transforming the data in some simple but useful way, is a very effective way of writing many sorts of shell scripts. It plays to the shell's strengths in process and pipeline control, while allowing you to push the more complex calculations that the shell isn't good at into the more appropriate languages.
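Finally, here is a sketch of the variation mentioned in the assumptions above, where each file is moved under a name with its numeric prefix stripped off. It's the same pipeline with a third field added, and as before it assumes a single _ and no spaces in the names:

ls |
awk -F_ '
{
    n = $1 - 1
    base = n - (n % 100000)
    # third field: the original name with its numeric prefix removed
    printf "%d_%d %s_%s %s\n", base + 1, base + 100000, $1, $2, $2
}' |
while read destdir orig newname
do
    mkdir -p $destdir && mv $orig $destdir/$newname
done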