
My company developed an application that watches a directory for XML files of less than 10 KB each, reads each file, sends its body as an API call to an external service, and then moves the file into a processed directory.
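For reference, the workflow described above can be sketched roughly as follows. This is a Python illustration, not the actual .NET code: the function name, the directory arguments, and the `send` callback (standing in for the external API call) are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Callable


def process_pending(incoming: Path, processed: Path,
                    send: Callable[[bytes], None]) -> int:
    """Read each .xml file in `incoming`, hand its body to `send`, then move it.

    `send` is a hypothetical placeholder for the external API call.
    Returns the number of files handled.
    """
    processed.mkdir(parents=True, exist_ok=True)

    # Snapshot the names first so the directory is not modified while it is
    # being iterated.
    with os.scandir(incoming) as entries:
        names = [e.name for e in entries
                 if e.is_file() and e.name.endswith(".xml")]

    handled = 0
    for name in names:
        src = incoming / name
        send(src.read_bytes())
        # When both directories are on the same filesystem this is a rename,
        # so no file data is copied.
        os.replace(src, processed / name)
        handled += 1
    return handled
```

At roughly 2000 files/min, each file costs one directory lookup, one open/read/close, and one rename, so metadata operations dominate rather than data throughput.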

Due to the volume of files (roughly 2000/min), we were getting dreadful performance out of NTFS. We were nowhere near able to keep up with the processing.

I'm a Linux guy through and through, and in my experience Linux handles this kind of workload a lot better, especially with facilities like inotify, which is leaps and bounds ahead of the NTFS API. That's why I've ported the code to .NET Core to give it a shot.

At home I use XFS on my workstations and ZFS on my servers, so beyond those and ext4 I have no real experience with any other filesystem.

So my question is: which filesystem (preferably in-tree) would be the most performant for this kind of workload?
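One way to approach this empirically is to time the same create/read/rename cycle on a directory mounted from each candidate filesystem. The following is only a rough sketch: the function name, the default file count, and the payload size are illustrative assumptions, and reads will largely be served from the page cache, so it mostly measures create, fsync, and rename cost.

```python
import os
import time
from pathlib import Path


def time_small_file_cycle(root: Path, count: int = 2000,
                          size: int = 8 * 1024) -> float:
    """Create, read back and rename `count` files of `size` bytes under `root`.

    Point `root` at a directory on the filesystem under test.
    Returns the elapsed time in seconds.
    """
    incoming = root / "incoming"
    processed = root / "processed"
    incoming.mkdir(parents=True, exist_ok=True)
    processed.mkdir(parents=True, exist_ok=True)
    payload = b"x" * size

    start = time.perf_counter()
    for i in range(count):
        with open(incoming / f"{i}.xml", "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the write to the device, not just RAM
    for i in range(count):
        src = incoming / f"{i}.xml"
        src.read_bytes()
        os.replace(src, processed / src.name)
    return time.perf_counter() - start
```

Running it with the same `count` on, say, an ext4 mount and an XFS mount on the same virtual disk gives a like-for-like number for this access pattern.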

Rui F Ribeiro
  • Just try ext4. Don't over-optimise early. – ctrl-alt-delor Nov 04 '18 at 20:17
  • What about the hardware (the drive, where your file system will reside)? – sudodus Nov 04 '18 at 20:24
  • Currently it's on hyper-v with shared vhdx, it's going to move over to ESXi - I don't know the exact specs of the hypervisor. – user3861788 Nov 04 '18 at 20:33
  • If the data is small in relation to the amount of memory then I would look at tmpfs; basically a RAMdisk. It's non-persistent, but is great for temporary files. So code your primary loop to work from tmpfs, and then have a batch (once an hour?) move processed files to persistent storage (if it's needed). – Stephen Harris Nov 04 '18 at 21:49
  • @ctrl-alt-delor It is probably nearly impossible for the OP to change his root filesystem. – peterh Nov 04 '18 at 22:05
  • See https://unix.stackexchange.com/questions/28756/what-is-the-most-high-performance-linux-filesystem-for-storing-a-lot-of-small-fi – Panther Nov 04 '18 at 22:28
  • @peterh if the OP could not change it, then they would not ask. – ctrl-alt-delor Nov 05 '18 at 10:35
  • @StephenHarris what you are describing is caching. You will probably not be better at it than the kernel is: once you have read the data, it will all be in RAM (cached). It will only fall out of cache if the data set is large. – ctrl-alt-delor Nov 05 '18 at 10:36
  • @ctrl-alt-delor No.. tmpfs is different to caching. – Stephen Harris Nov 05 '18 at 18:13
  • @StephenHarris yes a little, but. – ctrl-alt-delor Nov 05 '18 at 21:56
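The tmpfs suggestion in the comments amounts to doing the hot-path work on a RAM-backed mount and archiving processed files to persistent storage in a periodic batch. A minimal sketch of that batch step follows; the function name and both directory arguments are hypothetical. `shutil.move` is used because a plain rename fails across filesystems, and it falls back to copy-then-delete in that case.

```python
import shutil
from pathlib import Path


def archive_processed(tmpfs_dir: Path, archive_dir: Path) -> int:
    """Move every regular file from `tmpfs_dir` to `archive_dir`.

    Intended to be run periodically (for example once an hour).
    Returns the number of files moved.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for path in sorted(tmpfs_dir.iterdir()):
        if path.is_file():
            # Crossing from tmpfs to disk means a copy followed by a delete.
            shutil.move(str(path), str(archive_dir / path.name))
            moved += 1
    return moved
```

Anything still on the tmpfs mount is lost on a crash or reboot, so this only fits if losing up to one batch interval of processed files is acceptable.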
