
Running a deployment script that removes the .html extension from each file before uploading it to an S3 bucket. Currently using the following:

cd out/ 
for file in `ls -l *.html`; do 
    newname=`echo $file|sed 's/\.html$//g'`
    mv $file $newname
    aws2 s3 cp ./$newname s3://$S3_BUCKET_NAME/
done

This works perfectly for files in the out/ folder, but it completely ignores, for example, out/folder/file.

Ideally I would be able to perform the same action (of removing the .html part of the filename) on sub-folders and pass that to my upload command.

t988GF
  • The reason this was a oneliner is because it is part of a CICD process. – t988GF Dec 19 '19 at 23:09
  • Huh? If "CICD" means "continuous integration and continuous delivery", that should have no impact on how a script is formatted. If anything, it should mean not writing it as a one-liner would be preferred, with comments, so that team members can easily see what it does, what failure conditions it has etc. – Kusalananda Dec 19 '19 at 23:15

1 Answer

shopt -s globstar nullglob

for pathname in out/**/*.html; do
    newname=${pathname%.html}
    mv "$pathname" "$newname"
    aws2 s3 cp "$newname" "s3://$S3_BUCKET_NAME/"
done

This takes care to

  1. not parse the output of ls (see "Why *not* parse `ls` (and what to do instead)?"); your loop would additionally iterate over all the other information that ls -l outputs (permissions, ownership, timestamps etc.), so I find it strange that you say it works at all, and
  2. properly quote variables (see "Why does my shell script choke on whitespace or other special characters?"), and
  3. handle pathnames as entities and not as strings (sed etc. are primarily for editing lines of text; pathnames on Unix may contain newlines (unlikely, but allowed); using a simple parameter substitution to remove a filename suffix is safer, and quicker, than calling sed (and echo) to do it; see the sketch after this list), and
  4. find files in any subdirectory of out with filenames ending in .html, using the ** glob (enabled via the globstar shell option), and
  5. not run the loop if the pattern doesn't match anything (using the nullglob shell option).
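
A minimal sketch of point 3, using a made-up pathname; ${pathname%.html} strips the shortest matching suffix in the shell itself, with no external sed or echo process:

pathname='out/blog/post.html'    # hypothetical example path
newname=${pathname%.html}        # removes the trailing ".html" only
printf '%s\n' "$newname"         # prints: out/blog/post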

It additionally does not need to use cd as the $pathname and $newname values will include the full path from the current directory to any matched filename.
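
To illustrate with a made-up tree containing out/index.html and out/blog/post.html: the ** pattern matches .html files both directly in out and in any subdirectory, and nullglob makes the pattern expand to nothing (so the loop body never runs) when there are no matches:

shopt -s globstar nullglob
printf '%s\n' out/**/*.html
# out/blog/post.html
# out/index.html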

Kusalananda
  • Won't this ignore the out/ directory, which holds .html files that also need to be stripped of the .html? Files are both in the out/ as well as sub-directories. – t988GF Dec 19 '19 at 23:08
  • @t988GF It will not ignore any .html files in the out directory. The ** glob will also match inside that directory. – Kusalananda Dec 19 '19 at 23:10
  • Worked very well. The only issue I'm having is that when the upload happens, it does not include the folder the file came from (a file from root/folder/file is uploaded as file, and not as folder/file). Any tips on addressing that? – t988GF Dec 20 '19 at 01:34
  • @t988GF That's an issue related to how the aws2 s3 cp command works and how it treats pathnames. I have never used S3 and I don't really know how their "buckets" work. You could potentially try using "s3://$S3_BUCKET_NAME/$newname" as the destination path, but I don't know if that would work as I have no way of testing it. – Kusalananda Dec 20 '19 at 06:35
  • that also fixed it. Specifically aws2 s3 cp "$newname" s3://$S3_BUCKET_NAME/"$newname". The only issue I'm having now is that even the "out" folder is being appended to the files (aka the root folder). This works perfectly for sub-folders but, since out/ is the root folder, it would be good if it wasn't in the path that is given to $newname. – t988GF Dec 20 '19 at 18:51
  • @t988GF Just strip the leading out/ off from the destination name: "s3://$S3_BUCKET_NAME/${newname#out/}". Note that the whole destination path needs to be double quoted, not just the variable substitution at the end (you have another variable, $S3_BUCKET_NAME, in there that also should be quoted). – Kusalananda Dec 20 '19 at 19:15
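
Putting the fixes from this comment thread together, a sketch of the adjusted loop (untested against a real bucket; aws2 and $S3_BUCKET_NAME are taken from the question):

shopt -s globstar nullglob

for pathname in out/**/*.html; do
    newname=${pathname%.html}
    mv "$pathname" "$newname"
    # ${newname#out/} strips the leading out/ so the object key mirrors
    # the tree below out/ (folder/file rather than out/folder/file)
    aws2 s3 cp "$newname" "s3://$S3_BUCKET_NAME/${newname#out/}"
done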