
I am asking the same question as this: How to list files that were changed in a certain range of time?

But with a twist:

I have 10 folders with a huge amount of content (13 TB), each containing multiple levels of subfolders. For each of these folders, I would like to get the list of files changed within a certain period of time, with decent performance (returning within minutes instead of hours).

An example of the directory tree: the folder repository1 has 26 folders, A to Z, and each of those 26 folders has 26 subfolders. This continues for another 100 levels or so. Each folder contains at least 100 images of about 300 KB to 1 MB each.

In the end, we want to sync two systems in different data centers with delta changes. We tried rsync, but it takes a few hours to detect the changes, which is way over our SLA.

I am looking for any Linux command, or a file index with timestamps, that I can query for the list of files changed within a time period, so that I can rsync the individual files over.

I am also fine with any open source tool you can suggest for this job.

Seng Zhe

1 Answer


This depends on your choice of filesystem and how your filesystem maintains this information. What filesystem do you have now? Can you change it if necessary?

If you have a traditional filesystem (such as UFS or EXT), then there's no separate index maintained for timestamps or changes. The only way to find the changes is to visit every inode and examine its timestamp. When the filesystem is large (> 10M inodes), that query is going to take a while. If your disks are fast, you can probably improve speeds a bit by splitting up the search and running multiple threads. If your disks are slow or are already IOPS-bound, then multiple threads may not improve anything.
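As a rough illustration of "splitting up the search", here is a minimal Python sketch that scans each top-level subdirectory in its own thread and keeps files whose modification time falls inside a window. The function names `changed_under` and `changed_files`, the default of 8 workers, and the choice of mtime as the timestamp are assumptions made for this example, not something the setup above prescribes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def changed_under(root, start, end):
    """Walk one subtree; return files whose mtime is in [start, end)."""
    hits = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        mtime = entry.stat(follow_symlinks=False).st_mtime
                        if start <= mtime < end:
                            hits.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it and keep scanning.
            continue
    return hits


def changed_files(top, start, end, workers=8):
    """Split the scan by top-level subdirectory and run the parts in parallel."""
    subtrees, hits = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subtrees.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                if start <= mtime < end:
                    hits.append(entry.path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda d: changed_under(d, start, end), subtrees):
            hits.extend(result)
    return sorted(hits)
```

`start` and `end` are Unix timestamps in seconds. This still visits every file, so it only helps if the storage can serve the extra concurrent requests, and it cannot detect deletions.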

Other filesystems (such as BTRFS or ZFS) can maintain a record of all changes over a period of time and can transmit those changes to a replica location. You can send the incremental differences to your replica location quickly without using rsync.
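To make the incremental approach concrete, here is a small Python sketch that only builds (and does not run) the command pair for a ZFS incremental send between two snapshots. It assumes the data lives on ZFS, that both snapshots already exist, and that the replica is reachable over ssh; the dataset, snapshot, and host names are hypothetical.

```python
import shlex


def incremental_send_pipeline(dataset, old_snap, new_snap, remote_host, remote_dataset):
    """Build the two halves of `zfs send -i ... | ssh host zfs receive ...`."""
    send = ["zfs", "send", "-i", f"{dataset}@{old_snap}", f"{dataset}@{new_snap}"]
    receive = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return send, receive


def as_shell(send, receive):
    """Render the pipeline as one shell line, for review before running it."""
    return " | ".join(shlex.join(cmd) for cmd in (send, receive))


# Hypothetical names, for illustration only.
send_cmd, receive_cmd = incremental_send_pipeline(
    "tank/images", "monday", "tuesday", "replica.example.com", "backup/images"
)
print(as_shell(send_cmd, receive_cmd))
```

The filesystem already knows which blocks changed between the two snapshots, so no per-file scan is needed to produce the stream.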

"example of a directory tree is as follows:"

Except for making it easier to split the work across multiple commands, the structure doesn't matter at all. For a traditional filesystem (and definitely for an NFS client), all you can do is check each and every file. That means running one or more find/rsync processes and waiting for them to finish.
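Once a scan has produced a list of changed paths, rsync can be told to transfer only those files through its `--files-from` option instead of walking the tree again. The sketch below, in Python, writes such a list and builds the matching command; the helper names are made up for this example, and it assumes no file name contains a newline.

```python
import os


def write_file_list(paths, source_root, list_path):
    """Write paths relative to source_root, one per line, for --files-from."""
    count = 0
    with open(list_path, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(os.path.relpath(path, source_root) + "\n")
            count += 1
    return count


def rsync_command(source_root, list_path, destination):
    """Build an rsync argv that copies only the files named in list_path."""
    return ["rsync", "-a", f"--files-from={list_path}", source_root, destination]
```

The returned list can be passed to `subprocess.run` as is. The entries in the list file are interpreted relative to the source directory, which is why the paths are made relative before writing.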

"my filesystem is NFS."

You are using NFS currently to access some other filesystem. If this is all you can do, then you're going to have to visit every file.

On the other hand, anything could be serving the NFS export. If it's a NetApp appliance, then it can ship changed blocks to another NetApp (assuming you have another NetApp and the correct licenses).

NetApp does have a vendor API to gather information about changed blocks (SnapDiff), but it's not available to you.

BowlOfRed
  • An inode is only for a folder and not its subfolders, right? So I have to iterate over each level and retrieve the inodes of its subfolders? – Seng Zhe Mar 23 '17 at 08:00
  • Basically every file and folder in the filesystem has its own inode (ignoring multiple hard links), and you'll have to visit every one to guarantee finding all changes in a traditional filesystem. – BowlOfRed Mar 23 '17 at 08:06
  • My filesystem is NFS. Does that mean I have to go through the inodes too? – Seng Zhe Mar 23 '17 at 08:09
  • Your filesystem may be accessed via NFS, but exist on the NFS server in some other form. Do you know what that server is? It may be easier to copy things from that server directly (Linux? NetApp appliance? Synology home device?) – BowlOfRed Mar 23 '17 at 08:12
  • It is a NetApp mounted drive; I'm not entirely sure whether we can run scripts on that machine. – Seng Zhe Mar 23 '17 at 08:17