Working from home, the contents of my laptop undergo a lot of changes. Before lockdown, I was usually in the office, and I could manually mirror files to a network drive.

Working from home means no network access. I have to compare the network drive contents to the laptop drive contents using a listing command such as:

shopt -s extglob    # needed for the !(tmp) pattern below

touch -d 2021-07-25 ~/tmp/2021-07-25.txt

stdbuf -i0 -o0 -e0 \
    find ~/!(tmp) ~/.[^.]* \
    -type f -newer ~/tmp/2021-07-25.txt \
    -printf '%p\t%CY%Cm%Cd.%CH%CM\t%s\n' \
    2>&1 | tee ~/tmp/find.out

The "stdbuf" command simply disables buffering for the ensuing command so that I can see signs of life. The file/folder arguments to "find" exclude "tmp" and include all the files/folders starting with ".", but not the current folder "." itself and not the parent folder "..". The printf clause prints out the file information in the format:

File path <tab> YYYYMMDD.HHmm <tab> size

YYYY = 4-digit year
MM = 2-digit month
DD = 2-digit day-of-month
HH = 24-hour hour
mm = 2-digit minute
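
To make the format concrete, here is a small sketch that runs the same -printf format on a throwaway file. The scratch directory and the file name are made up for illustration. As in the listing command above, the time directive is %C (status change), so the stamp shown is simply when the file was created.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory; nothing here touches real data.
demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/my file.txt"   # 5 bytes, name contains a space

# Same format as the listing command: path <tab> YYYYMMDD.HHmm <tab> size
listing=$(find "$demo_dir" -type f -printf '%p\t%CY%Cm%Cd.%CH%CM\t%s\n')
printf '%s\n' "$listing"

rm -r "$demo_dir"
```

Because the fields are tab-delimited, the space in the file name stays inside the first field.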

The above is for the laptop. I can remote in to the on-site desktop and do a similar listing of my network drive for comparison. There, find's file/folder arguments consist only of "/i" (the letter-drive mapping of the network drive). That top folder doesn't contain files/folders starting with ".". I have to clean up the two listings to avoid the discrepancy between "~" and "/i".
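
The cleanup mentioned above could look something like the following sketch. The sample lines and the /home/me prefix are invented for illustration; the idea is just to strip each listing's root so that the remaining relative paths are directly comparable.

```shell
#!/usr/bin/env bash
# Made-up sample lines standing in for the two listings (tab-delimited).
laptop_listing=$'/home/me/docs/a b.txt\t20210801.1030\t120\n/home/me/notes.txt\t20210802.0900\t45'
network_listing=$'/i/docs/a b.txt\t20210801.1030\t120\n/i/old.txt\t20210701.0800\t10'

# Strip a root prefix from the start of each line only.
# The prefixes used below (/home/me/ and /i/) are assumptions; substitute the real roots.
strip_prefix() {
    local prefix=$1
    # index/substr treat the prefix literally, not as a regular expression.
    awk -v p="$prefix" 'index($0, p) == 1 { $0 = substr($0, length(p) + 1) } { print }'
}

laptop_rel=$(printf '%s\n' "$laptop_listing" | strip_prefix /home/me/)
network_rel=$(printf '%s\n' "$network_listing" | strip_prefix /i/)

printf '%s\n' "$laptop_rel"
printf '%s\n' "$network_rel"
```

After this step, the same file has the same first field in both listings, e.g. "docs/a b.txt".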

My next step is to "diff" the listings from the laptop and the network drive. Unfortunately, diff is easily thrown off by the commonality in the "dirname" portions of the path. I tried modifying find's printf format to obtain:

Basename <tab> File path <tab> YYYYMMDD.HHmm <tab> size

Unfortunately, diff is still easily misled into matching lines based on their initial few characters.

What I really want is for diff to match lines based on the file path, then match characters thereafter, i.e., based on the YYYYMMDD.HHmm stamp, then on the size. Is there a way to steer diff into doing this?

Afternote: I'm using Cygwin on both ends. Unfortunately, being in a Windows environment, names of files shared with (or coming from) others have spaces. Some folder names also have punctuation and spaces. If needed, I can mitigate the problem by embedding a marker (e.g., "QQQQ") after the file path, but the fields in find's output are already delimited by <tab>. I still need "diff" to match lines based on the entire file path, then look for discrepancies (if any) after the file path field.

  • "Working from home means no network access" does your company not offer VPN connectivity? – Chris Davies Apr 25 '22 at 21:17
  • Not for the network being discussed. Only remote desktop and FTP, which I will use surgically to migrate files that differ. – user2153235 Apr 25 '22 at 21:17
  • [ ! -f ~/marker ] && touch -t 197001010000 ~/marker; touch ~/marker.new; find ~ -type f -newer ~/marker -print; mv ~/marker.new ~/marker will get you files from find that have been created or modified since the last run. (It won't identify deleted files, however.) – Chris Davies Apr 25 '22 at 21:20
  • I'm interested in a less specific diff between file listings because I may have deleted files on the laptop (or re-arranged some of the hierarchy), which I also want to mirror to the on-site network drive. I also want to explicitly choose a time window starting before my last "sync" because I might catch stuff missed before. It is a very manual process. – user2153235 Apr 25 '22 at 21:35
  • Sure no worries – Chris Davies Apr 26 '22 at 09:13

1 Answer

This answer is admittedly incomplete, not least because I don't have a Windows machine from which to test. But I will touch first briefly on the grave disservice your IT department is doing you by not providing a VPN service to facilitate your work from home. Certainly in these unprecedented times, a greater effort on their part would save you untold hours of unnecessary finagling.

The only solution I can think of that's going to come close to solving your needs is rsync, which is available on Cygwin. The problem will arise in creating a viable SSH link between your home laptop and office network drive. Fortunately, that is arguably more straightforward to solve than some sort of Windows-compatible ad-hoc file diff'ing system. You will likely have to set your office machine to SSH out to your home laptop (with a TCP port pass-through set up on your home router). That in turn may require your home laptop to be set up on some sort of dynamic DNS so that the office machine can SSH to it reliably. I'm envisioning that the office machine will SSH to your laptop solely to create a tunneled TCP connection to bind perhaps to port 22 on IP 127.0.1.1 on your laptop. Then it shouldn't be too great a leap to being able to rsync from your laptop to 127.0.1.1 (your office machine).

In brief:

  1. set up dynamic DNS on your home router's IP so that it can be reached at a known DNS name;

  2. set up a pass-through TCP port (nominally port 922) on the home router;

  3. set up an SSH VPN that your office machine can initiate to connect outgoing through your office firewall (since your IT department apparently won't accommodate incoming SSH connections) to port 922 at your home router's dynamic DNS name, and configure your home laptop to allow the incoming connection attempts to bind to any known, unused TCP port on a local (to your laptop) IP.

Once the office computer has connected out to your home laptop, the SSH tunneling should give you a functional VPN tunnel whereby you can:

  • SSH from home laptop to 127.0.0.1 port 2222 (or some unused port of your choosing) and be logged in to your office machine
  • rsync from home laptop to 127.0.0.1 port 2222 to upload or download files
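
As a rough, untested sketch of those steps, the commands could be wrapped in functions like the ones below. Every host name, user name, and path is a placeholder I made up. Ports 922 and 2222 are the nominal ones from above. The sketch also assumes an SSH server is running on both the laptop and the office machine.

```shell
#!/usr/bin/env bash
# All host names, user names and paths below are placeholders (assumptions).
HOME_DDNS=myhome.example.net   # dynamic DNS name of the home router
ROUTER_PORT=922                # router port passed through to the laptop's sshd
TUNNEL_PORT=2222               # loopback port on the laptop for the tunnel

# Run on the OFFICE machine: connect out to the laptop and ask it to forward
# its local port $TUNNEL_PORT back to the office machine's sshd (port 22).
office_open_tunnel() {
    ssh -N -p "$ROUTER_PORT" -R "$TUNNEL_PORT:localhost:22" \
        "laptopuser@$HOME_DDNS"
}

# Run on the LAPTOP once the tunnel is up: log in to the office machine.
laptop_login() {
    ssh -p "$TUNNEL_PORT" officeuser@127.0.0.1
}

# Run on the LAPTOP: dry-run (-n) upload of one folder through the tunnel.
laptop_upload() {
    rsync -avn -e "ssh -p $TUNNEL_PORT" "$HOME/docs/" officeuser@127.0.0.1:/i/docs/
}
```

Nothing runs until a function is called. Call office_open_tunnel on the office machine first, then laptop_login or laptop_upload on the laptop. Drop the -n from rsync once the dry run lists what you expect.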

Granted, that is somewhat of a tall order, with many steps. I suspect there are already resources on StackExchange that address how to accomplish those steps.

You will likely find better links, but to get you started:

A vote for ddclient: https://apple.stackexchange.com/questions/75372/how-do-i-configure-dynamic-dns-when-my-router-does-not-support-it

A brief SSH tunneling how-to: How can I configure a reverse SSH connection to the connecting computer?

By exploring those resources and others you may find on your own, I'm confident you can piece together the parts of a greater solution and coalesce them into a usable tool for your needs. Good luck! And post any follow-up questions you have as you work through this chain of tasks.

Jim L.
  • I want to express my appreciation for the time you took to elaborate on your answer. Unfortunately, a solution involving connections other than the few approved methods is a non-starter. At this point, however, with so many disparities between the two file systems, I would still need a diff'ing solution even if I had unfettered network access. What I did was to read the listings from the two file systems into Matlab dataframes (which they call "tables", same as in SQL) and outer-join the tables based on the "Path" field. – user2153235 Apr 26 '22 at 00:07
  • In the outer-joined table, I whittled away records having mismatched date/time, as long as the size matches. Otherwise, they all mismatch, likely because the date/time reflects when the file was copied. Find's printf uses %C for status change, but it also has %T for modification times. I may try the latter next time, but for this round, I will accept the risk just to get the job done. I don't know how accurately the Windows file timestamps translate into Unix time stamps anyway. This leaves "only" 1333 discrepancies to manually reconcile. – user2153235 Apr 26 '22 at 00:13
  • Nice work, treat it as a database using the filename as the key value. Kudos for an outside-the-box solution. – Jim L. Apr 26 '22 at 00:14
  • Outside the box thinking is just a nice way of saying mad desperation :) . In case it helps anyone else, I found that I needed to redo the process using %T because if the sizes mismatch, I need a hint of which has more recent modifications. Thankfully, using %T gave saner results. There are only about 2 dozen files that match in size but differ in modification time. Previously, I used time of status change, which made all files with matching paths differ. – user2153235 Apr 26 '22 at 00:43
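
For readers without Matlab, the outer join on the path field described in the comments above can be approximated with sort and join. This is only a sketch under assumptions: the sample rows are invented, both listings are taken to be already reduced to comparable relative paths, and MISSING is an arbitrary placeholder for a file that exists on one side only.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # sort and join must agree on collation order

work=$(mktemp -d)
# Made-up listings: relative path <tab> YYYYMMDD.HHmm <tab> size
printf '%s\t%s\t%s\n' \
    'docs/a b.txt'  20210801.1030 120 \
    'docs/same.txt' 20210705.1200 7 \
    'notes.txt'     20210802.0900 45 > "$work/laptop.tsv"
printf '%s\t%s\t%s\n' \
    'docs/a b.txt'  20210726.0815 100 \
    'docs/same.txt' 20210705.1200 7 \
    'old.txt'       20210701.0800 10 > "$work/network.tsv"

# join requires both inputs sorted on the join field (field 1, the path).
sort -t $'\t' -k1,1 "$work/laptop.tsv"  > "$work/laptop.sorted"
sort -t $'\t' -k1,1 "$work/network.tsv" > "$work/network.sorted"

# Full outer join on the path; MISSING fills in for an absent side.
# Output columns: path, laptop stamp, laptop size, network stamp, network size
joined=$(join -t $'\t' -a 1 -a 2 -e MISSING -o 0,1.2,1.3,2.2,2.3 \
    "$work/laptop.sorted" "$work/network.sorted")

# Keep only paths whose stamp or size differs, or that exist on one side only.
discrepancies=$(printf '%s\n' "$joined" | awk -F '\t' '$2 != $4 || $3 != $5')

printf '%s\n' "$discrepancies"
rm -r "$work"
```

Because the tab is the only delimiter, spaces and punctuation in paths do not disturb the match. Relaxing the awk condition to compare sizes only would mimic the "ignore date/time if the size matches" pass described above.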