38

Is there a way to apply the dos2unix command so that it runs against all of the files in a folder and its subfolders? man dos2unix doesn't show any -r or similar option that would make this straightforward.

sam
  • 22,765

6 Answers

53

find /path -type f -print0 | xargs -0 dos2unix --

8

Using bash:

shopt -s globstar
dos2unix **

The globstar shell option in bash enables the use of the ** glob. This works just like * but matches across / in pathnames (hence matching names in subdirectories too). This would work in a directory containing a moderate number of files in its subdirectories (not many thousands).

In the zsh and yash shells (with set -o extended-glob in yash), you would do

dos2unix **/*
Kusalananda
  • 333,661
6

Skipping binaries and hidden files was important for me. This one worked well:

find . -type f -not -path '*/\.*' -exec grep -Il '.' {} \; | xargs -d '\n' -L 1 dos2unix -k

Which translates to: find all non-hidden files recursively in the current directory, then using grep, list all non-binary (-I) non-empty files, then pipe it into xargs (delimited by newlines) one file at a time to dos2unix and keep the original timestamp.
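A possible variant of the pipeline above, sketched under the assumption that GNU grep is available (its -Z option ends each printed file name with a NUL byte), keeps the names null-delimited all the way to xargs, so names containing newlines survive, and batches files with `{} +` rather than one grep per file. The scratch directory and its file names are made up for illustration, and printf stands in for `dos2unix -k` so nothing is modified.

```shell
# Scratch directory: a text file, a binary file, an empty file and a hidden file.
demo=$(mktemp -d)
printf 'hello\r\n' > "$demo/notes with space.txt"
printf 'bin\000ary\r\n' > "$demo/blob.bin"
: > "$demo/empty.txt"
printf 'secret\r\n' > "$demo/.hidden.txt"

# grep -I skips binary files, -l prints only matching file names,
# -Z (GNU grep, assumed here) terminates each name with NUL for xargs -0.
# printf is a stand-in for `dos2unix -k`, so this is only a preview.
selected=$(cd "$demo" && find . -type f -not -path '*/.*' -exec grep -IlZ . {} + \
    | xargs -0 printf 'would convert: %s\n')
echo "$selected"

rm -rf "$demo"
```

Once the preview lists the expected files, replacing the printf part with `dos2unix -k` should apply the conversion to the same selection.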

See also:

https://github.com/mdolidon/endlines

Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232
phyatt
  • 607
5

You can use find to locate all of the files in a directory structure that you want to run through dos2unix:

find /path/to/the/files -type f -exec dos2unix {} \;

Take a look at the man page for find; it has a lot of options that you can use to control which files get processed.
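Two find options that are often handy here are -name, to restrict the match to a file name pattern, and the `{} +` form of -exec, which passes many files to a single command invocation instead of starting one process per file. The sketch below uses a made-up scratch directory and prints the dos2unix command lines with echo rather than running them, so the difference in the number of invocations is visible.

```shell
# Scratch directory with two .txt files and one file that should be ignored.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'a\r\n' > "$demo/one.txt"
printf 'b\r\n' > "$demo/sub/two.txt"
printf 'c\r\n' > "$demo/sub/skip.log"

# `\;` runs the command once per matching file.
per_file=$(cd "$demo" && find . -type f -name '*.txt' -exec echo dos2unix {} \; | wc -l)

# `+` appends as many file names as fit onto one command line.
batched=$(cd "$demo" && find . -type f -name '*.txt' -exec echo dos2unix {} + | wc -l)

echo "per-file invocations: $per_file, batched invocations: $batched"

rm -rf "$demo"
```

Dropping the echo turns either form into the real conversion.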

JayJay
  • 160
2

How to recursively run dos2unix (or any other command) on your desired directory or path using multiple processes

This answer also implicitly covers "how to use xargs".

I've combined the best from this answer, this answer, and this answer, to make my own answer, with 3 separate solutions depending on what you need:

  1. Run dos2unix (or any other command) on all files in an entire directory.

    find . -type f -print0 | xargs -0 -n 50 -P $(nproc) dos2unix
    

    (NB: do not run the above command in a git repository or else it will botch some of the contents of your .git dir and make you have to re-clone the directory from scratch! For git directories, you must exclude the .git dir. See the solutions below for that.)

  2. Run dos2unix (or any other command) on all files, or all checked-in files, in an entire git repository:

    # A) Use `git ls-files` to find just the files *checked-in* to the repo.
    git ls-files -z | xargs -0 -n 50 -P $(nproc) dos2unix
    

    # Or B): use `find` to find all files in this dir, period, but exclude the
    # `.git` dir so we don't damage the repo.
    # - See my answer on excluding directories using `find`:
    #   https://stackoverflow.com/a/69830768/4561887
    find . -not \( -path "./.git" -type d -prune \) -type f -print0 \
    | xargs -0 -n 50 -P $(nproc) dos2unix

  3. Run dos2unix (or any other command) on all files, or all checked-in files, in a specified directory or directories within a git repository:

    # 1. only in this one directory: "path/to/dir1":

    # A) Use `git ls-files` to find just the files *checked-in* to the repo.
    git ls-files -z -- path/to/dir1 | xargs -0 -n 50 -P $(nproc) dos2unix

    # Or B): use `find` to find all files in this repo dir, period.
    find path/to/dir1 -type f -print0 | xargs -0 -n 50 -P $(nproc) dos2unix

    # 2. in all 3 of these directories:

    # A) Use `git ls-files` to find just the files *checked-in* to the repo.
    git ls-files -z -- path/to/dir1 path/to/dir2 path/to/dir3 \
    | xargs -0 -n 50 -P $(nproc) dos2unix

    # Or B): use `find` to find all files in these 3 repo dirs, period. Note
    # that by specifying specific folders you are automatically excluding the
    # `.git` dir, which is what you need to do.
    find path/to/dir1 path/to/dir2 path/to/dir3 -type f -print0 \
    | xargs -0 -n 50 -P $(nproc) dos2unix
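To check what the .git exclusion in solution 2B selects before converting anything, a dry run along these lines may help. It is only a sketch: the scratch directory and file names are invented, and the file list is printed instead of being piped to dos2unix.

```shell
# Scratch "repository" layout; nothing here is a real git repo.
demo=$(mktemp -d)
mkdir -p "$demo/.git/objects" "$demo/src"
printf 'x\r\n' > "$demo/.git/config"
printf 'y\r\n' > "$demo/src/main.c"
printf 'z\r\n' > "$demo/README.md"

# Without pruning, find descends into .git and lists its files too.
all=$(cd "$demo" && find . -type f | LC_ALL=C sort)

# With the prune expression, the .git directory is never entered.
safe=$(cd "$demo" && find . -not \( -path "./.git" -type d -prune \) -type f | LC_ALL=C sort)

printf 'all files:\n%s\n\nsafe to convert:\n%s\n' "$all" "$safe"

rm -rf "$demo"
```

The second list is the set that the full command would hand to xargs and dos2unix.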

Speed:

Unfortunately, I didn't write down the time it took when I ran it, but I know that the git ls-files -z | xargs -0 -n 50 -P $(nproc) dos2unix command above converted about 1.5M files in my massive git repo in < 3 minutes. The multi-process command I used above helped a ton, allowing my computer's total CPU processing power (consisting of 20 cores) to be as high as 90% utilized overall throughout the duration of the procedure.

Explanation:

  1. dos2unix is the command we are running via xargs.
  2. The -print0 in find, -0 in xargs, and -z in git ls-files, all mean to "zero-separate", or "null-separate" file path listings. This way, file paths with special chars and spaces are easily separated by simply looking for the binary zero separating them.
  3. nproc lists the number of CPU cores your computer has (ex: 8). So, passing -P $(nproc) says to spawn as many processes to run the command (dos2unix in our case) as we have cores. This way, we attempt to optimize the run-time by spawning one worker process per CPU core.
  4. xargs reads items (here, file paths) from the stream piped into it, builds command lines from them, and runs those commands.
  5. -n 50 says to pass 50 filepaths to each process spawned to run the command (dos2unix in our case); this way, we reduce the overhead of spawning a new dos2unix process since we pass it many files at once to process, rather than just one or two or a few.
  6. find . finds files (-type f) in the current directory (.).
  7. git ls-files lists the files tracked by (checked in to) your git repository.
    1. -- ends the options passed to git ls-files by marking to its parser that no more options to this function will come afterwards. In this way, it knows that everything after -- is going to be a list of file or folder paths.
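The null separation and -n batching described above can be observed without touching any files. In this sketch the three names are made up, and a small sh command that reports its argument count stands in for dos2unix.

```shell
# Three invented file names, one with a space and one with a newline,
# separated by NUL bytes the way `find -print0` and `git ls-files -z` emit them.
names() { printf 'plain.txt\0with space.txt\0with\nnewline.txt\0'; }

# -0 splits on NUL only, so each name arrives as exactly one argument.
# -n 2 caps every invocation at two names; sh reports how many it received.
batches=$(names | xargs -0 -n 2 sh -c 'echo "invocation got $# name(s)"' sh)
echo "$batches"

# For contrast: without -0, xargs splits on blanks, so one name with a
# space is handed over as two separate arguments.
split=$(printf 'with space.txt\n' | xargs -n 1 echo | wc -l)
echo "arguments seen without -0: $split"
```

Adding -P to the first xargs call would run the batches concurrently, in which case the order of the output lines is no longer guaranteed.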

References:

  1. The 3 answers I linked to above.
  2. Where I learned about nproc: How to obtain the number of CPUs/cores in Linux from the command line?
  3. My answer on How do I exclude a directory when using find?

See also:

  1. How to find out line-endings in a text file? - use file instead of dos2unix in the commands above if you just want to see what the line endings currently are for all files in a given directory.
  2. My answer: What are the file limits in Git (number and size)?
  3. GitHub: Configuring Git to handle line endings
  4. Another xargs example of mine, with the addition of the -I{} option to specify argument placement: How to unzip multiple files at once...using parallel operations (one CPU core per process, with as many processes as you have cores), into output directories with the same names as the zip files
  5. Sometimes you need to use bash -c with xargs in order to get proper substitution, such as with dirname. See here: Stack Overflow: Why using dirname in find command gives dots for each match?
    1. I used that trick in some of the xargs commands to extract .zip files in my repo here: https://github.com/ElectricRCAircraftGuy/FatFs. See the readme for those xargs commands.
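As an alternative to the `file` approach in the first "See also" item, the following sketch lists text files that still contain a carriage return, which can serve as a before and after check around a dos2unix run. It assumes a grep that supports -r and -I (GNU and BSD grep do), and the scratch directory and file names are invented.

```shell
# Scratch directory with one CRLF file and one LF-only file.
demo=$(mktemp -d)
printf 'dos line\r\n' > "$demo/dos.txt"
printf 'unix line\n' > "$demo/unix.txt"

# A literal carriage return, produced portably without bash's $'\r'.
cr=$(printf '\r')

# -r recurses, -I skips binary files, -l prints only names of matching files.
crlf_files=$(cd "$demo" && grep -rIl "$cr" . | LC_ALL=C sort)
echo "files with CR line endings: $crlf_files"

rm -rf "$demo"
```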
1

Use a wildcard. If you're inside the folder, run this (note that * only matches entries directly in the folder, not files in its subfolders):

dos2unix *

Or, if you're outside the folder, run:

dos2unix /path/to/folder/*
DisplayName
  • 11,688