I'm writing a bash script that needs to fetch all *_out.csv files from a directory on a remote server. All of these files are several directories deep inside another directory. For instance, say the directory is called ox_20190404/. I can find all my files with:

find ox_20190404/assessment/LWR/validation -type f -name "*_out.csv"

This question answers part of my question, but since I don't want to copy the directory in its entirety, I need to figure out how to incorporate the find command above. Suppose I start with this:

$ dir="/projects/ox/git"
$ server="myusername@server"
$ scp $server:$dir/"$(ssh $server "ls -t $dir | head -1")" .

How would I grab the files I need from there?

The last part of my question asks whether there is a way to then take all the copied files and place them in the same file paths and directory structure they had on the remote server.

dylanjm
  • To clarify, you want the directory under $dir that has the most recent ... name? timestamp? You hard-coded ox_20190404 in the lead-up, so it's not clear how you selected it. – Jeff Schaller Apr 05 '19 at 20:11
  • @JeffSchaller Suppose I'm ssh'd into the server. If I type ls -t /projects/ox/git | head -1 then ox_20190404 is the directory that is returned. I then want to go inside that folder and get the files from there. – dylanjm Apr 05 '19 at 20:13
  • is zsh available on $server? – Jeff Schaller Apr 05 '19 at 20:14
  • @JeffSchaller It appears so, but it's not really set up (no .zshrc files). – dylanjm Apr 05 '19 at 20:15
  • and so the final scp command would explicitly list all of the *_out.csv files underneath the most recent directory under $dir in order to be copied locally? – Jeff Schaller Apr 05 '19 at 20:20
  • @JeffSchaller it would be several levels deeper than the most recent. It would specifically be ox_20190404/assessments/LWR/validation/. That's where I want to find all my *_out.csv files. – dylanjm Apr 05 '19 at 20:22
  • I would try to approach this from the $server side, instead -- since you have ssh access. Alternatively, maybe an sshfs mount? – Jeff Schaller Apr 05 '19 at 20:24
  • @JeffSchaller so you're suggesting to maybe write a script that's stored on the server side to just pass the files to my local machine? That would take out the back & forth and simplify the problem a bit? I'm not sure about sshfs because this will be a generalized script that will eventually need to be run on a cron-job or something. – dylanjm Apr 05 '19 at 20:36
  • Are you asking “How do I find the most recently changed folder?” – ctrl-alt-delor Apr 06 '19 at 10:46
  • @ctrl-alt-delor No, I already know how to do that, I'm asking how to then access files several folders deep after finding the most recent folder and then use scp or anything else to fetch those files and place them in the same folder structure on my machine. – dylanjm Apr 06 '19 at 18:02

3 Answers

I've adjusted some of your variable names a bit.

Surely there are better ways to do this than something dangerous like parsing the output of ls, but see whether this works for you:

$ pth="/projects/ox/git"
$ server="myusername@server"
$ dir="$(ssh $server "ls -t \"$pth\" | head -1")"
$ mkdir -p "$pth/$dir"
$ scp -p $server:"$pth/$dir"/'*_out.csv' "$pth/$dir"/

Once dir has been set to the name of the newest remote directory, mkdir -p ensures that a directory of the same name exists locally. scp then copies the files into that local directory, which has the same path and name as its remote counterpart. I was looking for an rsync solution, but couldn't think of one.

Jim L.
  • Your answer gets me most of the way there, but is there a recursive flag we can set to get all the *_out.csv files? The files I want aren't directly in $dir but scattered about inside that folder. Do you get what I mean? – dylanjm Apr 08 '19 at 16:42
  • As an exercise to build your skills, can you craft a find command line to run on the remote server that will create a list of all of the remote server's *_out.csv files that are in or below $pth/$dir? Then capture that list into a temp file foo or some better name, and run rsync -av --files-from=foo $server:"$pth/" "$pth/". Then update your question to fully describe the NEW steps you're taking, and describe what part of the task you're still missing (a sketch of this approach follows these comments). – Jim L. Apr 08 '19 at 23:25
  • Please see my answer as I think it will shed more light on what I was trying to do. – dylanjm Apr 09 '19 at 14:56
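For reference, here is a minimal sketch of the find + rsync --files-from approach suggested in the comment above. It is untested and assumes rsync is available on both ends; the file list is generated relative to $pth so that rsync recreates the same relative paths locally:

# List the matching files remotely (paths relative to $pth), then let rsync
# copy exactly those files, rebuilding their directory structure under $pth.
ssh $server "cd '$pth' && find '$dir' -type f -name '*_out.csv'" > foo
rsync -av --files-from=foo $server:"$pth/" "$pth/"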

This will find the most recently modified directory, assuming that the directory name does not contain a newline (\n). The find prints each directory's modification time as an epoch timestamp (%T@) followed by its name (%f); sorting numerically in reverse and keeping only the first line selects the newest:

newest=$(
    ssh -qn REMOTE 'find ./* -mindepth 0 -maxdepth 0 -type d -printf "%T@\t%f\n"' |
    sort -t$'\t' -r -nk1,2 |
    head -n1 |
    cut -f2-
)

If you can guarantee that the target contains only directories of interest, you can simplify it considerably (again bearing the newline issue in mind):

newest=$(ssh -qn REMOTE ls -t | head -n1)
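Since the comments suggest zsh is installed on the server, another option (a sketch, assuming zsh really is usable there) is a zsh glob qualifier, which sidesteps both the ls parsing and the newline issue. *(/om[1]) expands to directories only (/), ordered by modification time newest first (om), keeping the first match ([1]):

newest=$(ssh -qn REMOTE "zsh -c 'printf %s *(/om[1])'")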

You can copy an entire tree of files using scp, but if you want to filter the tree you'll probably be better off using rsync. Here the --include '*/' rule lets rsync descend into every directory, files matching *_out.csv are included, everything else is excluded, and --prune-empty-dirs removes directories that would otherwise arrive empty:

rsync -av --include '*/' --include '*_out.csv' --exclude '*' --prune-empty-dirs REMOTE:"$newest/" "$newest/"

If you're keeping the previous sets of files locally and really just want to add the latest set without re-copying the older ones, rsync can do that too, syncing from the remote home directory into the current directory:

rsync -av --include '*/' --include '*_out.csv' --exclude '*' --prune-empty-dirs REMOTE: .
Chris Davies

This is the code that ended up working for me. I might not have described my question perfectly, but I wasn't having trouble finding the most recently changed directory; my problem was finding all the files beneath that directory and then ensuring they ended up in the right place on my local machine. Here is the bash script that does it:

# Grab the most recently updated ox directory off the server; return its glob
# path in the form /projects/ox/git/ox_XXXXXXXX/assessment/LWR/validation/HBEP/analysis/BK363/*
newest=$(
    ssh -qn username@server 'find /projects/ox/git/* -mindepth 0 -maxdepth 0 -type d -printf "%T@\t%f\n"' |
    sort -t$'\t' -r -nk1,2 |
    head -n1 |
    cut -f2- |
    awk '{print "/projects/ox/git/"$1"/assessment/LWR/validation/HBEP/analysis/BK363/*"}'
      )

# Take $newest and find all associated *_out.csv files beneath that directory
newestf=$(
    ssh -qn username@server "find $newest -type f -name '*_out.csv'"
    )

# Write these filepaths to a .csv on the local machine
echo "$newestf" | tr " " "\n" remote_fp.csv

# Run Rscript to parse and transform the filepaths so they go to the right place on the local machine
Rscript ~/transform_fp.R

# Read the remote file paths from the .csv - we'll need these to pull the files with scp
get_scp_fp=$(awk -F "\"*,\"*" '{print $1}' ~/remote_fp.csv)

# Read the matching local file paths from the .csv - we'll need these to place the files locally
get_local_fp=$(awk -F "\"*,\"*" '{print $1}' ~/local_fp.csv)

# Walk the two file-path lists in lockstep, pulling each remote file to its
# matching local destination.
scp_fps=($get_scp_fp)
local_fps=($get_local_fp)
for idx in "${!scp_fps[@]}"; do
    scp -p username@server:"${scp_fps[$idx]}" "${local_fps[$idx]}"
done

Rscript:

suppressPackageStartupMessages(library(tidyverse))

# Remote file paths written out by the bash script above
test <- read_csv("remote_fp.csv", col_names = FALSE)

# Swap the remote repo prefix for the local project root, then redirect
# everything from "analysis" onward into doc/figures/<filename>
str_replace_all(test$X1, "/projects/ox/git/ox_[0-9]{8}", "~/Documents/projects/ox") %>% 
  str_replace_all("(?:analysis).*$", paste0("doc/figures/", basename(.))) %>% 
  tibble() %>% 
  write_csv(path = "~/local_fp.csv", col_names = FALSE)
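
For anyone who would rather avoid the Rscript dependency, the same path transformation can be approximated in bash with sed. This is an untested sketch under the same assumptions about the fixed path layout; the two substitutions mirror the two str_replace_all calls above:

# Map each remote path to its local destination, then copy it there.
while IFS= read -r remote; do
    local_path=$(printf '%s' "$remote" | sed -E \
        -e "s|^/projects/ox/git/ox_[0-9]{8}|$HOME/Documents/projects/ox|" \
        -e "s|analysis.*$|doc/figures/$(basename "$remote")|")
    mkdir -p "$(dirname "$local_path")"
    scp -p username@server:"$remote" "$local_path"
done < remote_fp.csv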
dylanjm