
I am a newbie to this and was wondering how to go about building this shell script:

I have files in directory1 with filenames such as A1-001.xyz, A29-002.xyz and A82-003.xyz, and I want to move them based on the second part of the filename (001, 002, 003) into subfolders of directory2 named 001, 002 and 003.

Here is what I have done so far:

for file in /path/to/directory1/** ; do
echo "$file" | awk -F '[-]' '{print $2}' | cut -f 1 -d '.' ;
done >> dummy.txt 

input="dummy.txt"
while IFS= read -r file; do
echo "$file" | mv "$file" /path/to/directory2/$file ;
done 

My thinking was putting the output filenames from 1st part into a dummy.txt then reading the filenames and moving it. The 2nd part of the script doesn't seem to be working so are there any suggestions on how to do this?

Sri

2 Answers


Start small

Break your problem into smaller pieces. Part of the reason you're stuck is because you're trying to craft the entire solution in one whack, even while you are trying to learn how to operate the tools you're using to craft the solution itself.

Here's a tip that I hope will help a light bulb go off for you, and that you and other beginning scripters will benefit from when you have to break down and analyze similar problems in the future:

Start by specifying the exact nature of what needs to be done to each file. In fact, you should be able to manually write the commands you need to process one specific filename sampled from your list of files. Don't do the work, just write the commands. In your example, each file needs to be moved, yes? Therefore, each file requires one mv command. Rather than struggle with how to do the mv command, just worry about how to create it. How would you manually write just one such mv command to move the file? Then the question becomes how to get awk (or whatever tool you want to use) to output that command:

mv (filename) (to-where-you-want-it)

for each filename that you give it. When you're learning new tools, it's much easier to debug a script that simply creates a series of shell commands as its output, without actually doing anything, than it is to debug a script that just went sideways and moved hundreds of the wrong files into hundreds of the wrong directories, and now you're no longer sure where anything is.

For starters, consult the man page for the tool you think will work for you. Then experiment with that command in a manual mode, just to learn what you need to do to get that tool to parse your input the way you want and to create the output that you need. Before you can write a script to move 100 or 1000 files, you need a script that can correctly move just one file. So create a Test Case of One, and take the time you need to take to "make friends" with the tool or tools you think will work. Your post is tagged awk and I think that's a wise choice, so let's go with that.

awk has a -F parameter that can be used to specify the delimiter that awk should use to break the string up into component fields. That delimiter can be a simple character, or it can be any of several characters enclosed in brackets. In regexp parlance, that is known as a character class. Your input uses both a hyphen '-' and a period '.' as field separators, so we can specify the character class [-.] to tell awk to split on either a hyphen or a period. Note carefully that awk doesn't care which one is which, and ensure that your source directories don't contain any hyphens or periods.

Using awk to break each filename into component fields

Take a sample case of filename A1-001.xyz and try running it through this awk command manually, to learn what awk does with that filename:

$ awk -F[-.] '{print $0 " " $1 " " $2 " " $3}' <<< 'A1-001.xyz'

That command tells awk, "Using both hyphen and period as field delimiters, print the entire input line ($0), a space, field 1, a space, field 2, a space, and finally field 3."

The output is:

A1-001.xyz A1 001 xyz

Hopefully that shows you a lot: that $0 is what you need in the mv command source, because that's the full, original filename; and that $2 is what you need in the mv command destination, because that's the numeric directory name you want. The biggest realization is that awk can entirely format the mv command for you, and print it out. All it takes is to tweak awk's print statement a little. Rather than trying to have your script do everything, just have the script create the commands you need to execute. That way, an error in your scripts won't make it blow up and move files to the wrong places. It will just print some output that is wrong, and you'll notice that it's wrong, but there will be no harm done.

Second iteration of refining the awk command

The filename may well have a source path in front of it. But make sure there aren't any . or - characters in the path! So the mv command for each file starts with mv and a space, then the filename (including the full source path, perhaps), another space, and the directory you're moving the file to. For good measure, we'll put a slash after the destination directory. Since you are not changing the name of the file, we'll just specify a destination directory and omit the destination filename. That is also easier, which is worth noting. Don't make things any more difficult than they need to be.

$ awk -F[-.] '{print "mv " $0 " " $2 "/"}' <<< '/path/to/directory1/A1-001.xyz'
mv /path/to/directory1/A1-001.xyz 001/

Look at the print command: it begins with mv and a space, then $0, which is the full filename; another space, then $2, which is the output sub-directory. Again, you'll have to make sure your source path names DO NOT contain any hyphens or periods, because they have special meaning as field delimiters within your filenames. If they do, awk won't split your fields the way you expect, and your script will break.

But the destination directory isn't just $2, it has a prefix in front of it, like the source filename did. We can get awk to print that for us, since it's the same every time:

$ awk -F[-.] '{print "mv " $0 " /path/to/directory2/" $2 "/"}' <<< '/path/to/directory1/A1-001.xyz'
mv /path/to/directory1/A1-001.xyz /path/to/directory2/001/
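
The warning about hyphens and periods in the source path is easy to check for yourself, and there is a way around it. The sketch below is not part of the approach above: subdir_of is a made-up helper name, and it assumes every filename has the form NAME-NUMBER.EXT. It strips the directory part with the shell's ${var##*/} expansion before awk sees the name, so only the filename itself gets split.

```shell
# Hypothetical helper (name invented for this example): print the numeric
# field of a path, ignoring any hyphens or periods in its directories.
subdir_of() {
    name=${1##*/}                         # drop everything up to the last slash
    printf '%s\n' "$name" | awk -F'[-.]' '{print $2}'
}

# Splitting the whole path trips over the hyphen in "my-dir":
printf '%s\n' '/path/to/my-dir/A1-001.xyz' | awk -F'[-.]' '{print $2}'

# Splitting only the filename gives the field we want:
subdir_of '/path/to/my-dir/A1-001.xyz'
```

The first command prints dir/A1, which is clearly not a usable directory name; the second prints 001.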

Test the solution over the entire list of files

So that looks promising. Now make a list of files in file-list.txt:

$ cat file-list.txt 
A1-001.xyz
A29-002.xyz
A82-003.xyz

and then run your awk command over that entire list of files. Remember, there's no harm here, because all awk is doing is printing stuff. It's not actually doing anything about moving the files. It's just showing you the commands that will do what you want to do.

$ awk -F[-.] '{print "mv " $0 " /path/to/directory2/" $2 "/"}' < file-list.txt 
mv A1-001.xyz /path/to/directory2/001/
mv A29-002.xyz /path/to/directory2/002/
mv A82-003.xyz /path/to/directory2/003/
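
If you're not sure how to build the list file in the first place, here is one possible way, offered as a sketch rather than the only method. It works in a scratch directory created with mktemp so that it is harmless to try; the variable names demo and list are invented for the example, and in real use you would cd into your own directory1 instead.

```shell
# Demo setup in a scratch directory (an assumption for illustration;
# in practice you would use your real directory1).
demo=$(mktemp -d)
touch "$demo/A1-001.xyz" "$demo/A29-002.xyz" "$demo/A82-003.xyz"

# Write one bare filename per line. The list is stored outside the
# directory so that it does not end up listing itself.
list="$demo.list"
( cd "$demo" && printf '%s\n' *.xyz ) > "$list"

cat "$list"
```

Because the list holds bare filenames, the mv commands generated from it have to be run from inside the source directory.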

Inspect the output carefully, test, and execute

If you have lots of files to move, you'll want to pipe the awk command above into less so that you can inspect it carefully. Look for dots and dashes in the wrong places, or other strange characters in file or directory names. If you wanted to you could copy-and-paste a sample line of that output into a shell prompt, to ensure that it does the right thing. But this is a simple enough example that we can test by inspection. Once you're satisfied that this list of mv commands is what you want to do, just pipe the output of awk directly into sh to execute it. If you want to see the commands while they execute, use sh -v instead of just sh:

$ awk -F[-.] '{print "mv " $0 " /path/to/directory2/" $2 "/"}' < file-list.txt | sh -v
mv A1-001.xyz /path/to/directory2/001/
mv A29-002.xyz /path/to/directory2/002/
mv A82-003.xyz /path/to/directory2/003/
$
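
One thing the run above takes for granted is that the numbered directories already exist under /path/to/directory2. If they don't, mv will report an error instead of moving the file. A sketch of one way to handle that, staying with the same print-the-commands approach, is to have awk emit a mkdir -p line before each mv. The function name gen_commands and the awk variable dest are made up for this example, and it still assumes filenames without spaces or shell metacharacters.

```shell
# Hypothetical generator: reads filenames on stdin and prints shell
# commands. $1 is the destination root. Nothing is executed here.
gen_commands() {
    awk -F'[-.]' -v dest="$1" '{
        print "mkdir -p " dest "/" $2          # create the target dir if missing
        print "mv " $0 " " dest "/" $2 "/"     # same mv line as before
    }'
}

printf '%s\n' A1-001.xyz A29-002.xyz | gen_commands /path/to/directory2
```

That prints a mkdir -p line followed by the matching mv line for each file. As before, read the output first and pipe it into sh only once it looks right.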

Conclusion

I hope you don't object to having such a detailed breakdown, but this sort of question arises a lot on Stack Exchange, and many beginning scripters think that their problem is a unique, one-off problem that requires a unique solution.

The real key to scripting is to realize that scripting provides generalized tools that can meet a wide variety of problems, and one of the first steps to gaining proficiency is in learning how to do small things with those tools, and then combine those small things into larger and larger things.

The first step was just in learning how to tell awk how to break the filename up the way we needed it. That's a critical step anytime you're trying to parse component fields out of a filename that has multiple pieces of information embedded in it.

The second step was to tell awk to automatically print the parts of the command that were always the same for each file (the mv at the beginning, the destination path before the $2 field), and to place the extracted fields of the filename in the correct places. print statements and their kin are among the most basic pieces of any type of coding, and I can't recall much harm that ever came from a well-placed print statement. To be sure, you want to output only what is necessary, but when you're learning and you don't know what a variable holds, print it. It rarely hurts to ask. Long term you'll take that print statement back out, but the whole point of the "print-it-then-pipe-to-shell" technique of scripting is that you have a "dry run" built in, because you always look at the shell commands output by your script before you actually pipe them to a shell to execute. In complex cases, even putting comments in your output is fair game, to "show your work":

$ awk -F[-.] '{print "# move file " $0 " to subdir " $2; print "mv " $0 " /path/to/directory2/" $2 "/"}' < file-list.txt 
# move file A1-001.xyz to subdir 001
mv A1-001.xyz /path/to/directory2/001/
# move file A29-002.xyz to subdir 002
mv A29-002.xyz /path/to/directory2/002/
# move file A82-003.xyz to subdir 003
mv A82-003.xyz /path/to/directory2/003/

And the third key, perhaps closely related to my second point, but one I think is often overlooked is, when you're doing something that's a bit of a stretch for you, don't write a script that could potentially go wrong and leave your files all littered about in umpteen different but wrong places. Just write a script that writes the script to do the work. It's much easier to troubleshoot that way. Then, when you finally have the script correct, just pipe the script output (in your example, the series of mv commands, one per file) into a shell, and they will run.
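
A variation on "write a script that writes the script", sketched here under a few assumptions: instead of piping straight into sh, save the generated commands to a file, read it, and only then run it. The file name moves.sh and the scratch directory are invented for the example. The commands use bare filenames and are run from inside the scratch directory, because mktemp's directory name may itself contain a period.

```shell
# Scratch area so the example is harmless to run (all paths temporary).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p dest/001
touch A1-001.xyz

# 1. Generate the commands into a file instead of piping them to sh.
printf '%s\n' A1-001.xyz |
    awk -F'[-.]' '{print "mv " $0 " dest/" $2 "/"}' > moves.sh

# 2. Read it over, then ask the shell to parse it without executing.
cat moves.sh
sh -n moves.sh

# 3. Run it, echoing each line as the shell reads it.
sh -v moves.sh
```

The saved file also gives you a record of exactly which commands were run, which helps if you need to undo something later.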

Jim L.

There are two reasons the second part of your script is failing. First, you're not actually reading any input into the loop. You had:

while IFS= read -r file; do something; done

But you need:

while IFS= read -r file; do something; done < "$inputFile"

Second, mv does not read from standard input, so there is no point in piping data to it. It takes its filenames as arguments, not as text on a stream. So echo "$file" | mv "$file" "/somewhere" is exactly the same as just running mv "$file" "/somewhere"; the echo "$file" is pointless. And the mv itself doesn't work because $file holds only the second part of the file name (001, 002, etc.), not the actual file name.

In any case, you can do the whole thing with a single loop directly, no need for an intermediate file:

for file in /path/to/directory1/** ; do 
    dirName=$(awk -F[-.] '{print $2}' <<<"$file"); 
    echo mv "$file" "/path/to/directory2/$dirName"; 
done

If that prints out what you need, remove the echo and run it again to actually move the files.
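
For comparison, here is a sketch of the same single loop done with shell parameter expansion instead of awk. This is a different technique from the one above, not a correction of it. The function name move_by_suffix is invented for the example, it assumes names of the form NAME-NUMBER.EXT, and it creates each target directory with mkdir -p before moving. Because it only looks at the part after the last slash, hyphens or periods in the directory names don't matter.

```shell
# Hypothetical helper: move SRC/NAME-NNN.EXT into DEST/NNN/.
move_by_suffix() {
    src=$1
    dest=$2
    for file in "$src"/*-*.*; do
        [ -e "$file" ] || continue        # the glob matched nothing
        name=${file##*/}                  # A1-001.xyz
        stem=${name%.*}                   # A1-001
        dirName=${stem##*-}               # 001
        mkdir -p "$dest/$dirName"
        mv -- "$file" "$dest/$dirName/"
    done
}
```

You would call it as move_by_suffix /path/to/directory1 /path/to/directory2. For a dry run, put echo in front of the mkdir and mv lines first, just as with the loop above.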

Jeff Schaller
terdon