
I have a file with 20000 lines and I want to split it into smaller files of 2 lines each, named with a numeric suffix. I am simply using:

split -l 2 -d my_file my_file_new

The numeric suffixes of the output files are not sequential: I get files 00-89, and then instead of 90, 91, 92, ... it jumps to 9000, 9001! Does anyone know what is wrong?

Jeff Schaller
Anna1364
  • Have you tried using the -a (or --suffix-length) option to pad the numeric suffixes to the same number of digits? That way the shell should sort them naturally. – steeldriver Sep 30 '17 at 22:07
  • Thanks @steeldriver, yes you are right. I just added split -l 2 -a 5 -d my_file my_file_new and they are sorted this time – Anna1364 Sep 30 '17 at 22:20
  • I think it is a simple question, but I can't see where would it be a dupe. – peterh Oct 01 '17 at 00:59
  • I agree with peterh; this Q is not about removing a suffix or changing how the existing files sort, but about telling split to create files that sort naturally. – Jeff Schaller Oct 01 '17 at 01:59
  • @JeffSchaller: I would argue that you’re getting into XY territory.  The OP wants to break a big file into small files that, when listed, contain all the data from the original file (in the correct order).  There are two ways to approach that problem.  The OP specifically laments that file 89 is followed by 900 or 9000;  why not attack from that angle? – G-Man Says 'Reinstate Monica' Oct 01 '17 at 02:18

1 Answer

This seems to be by design, so that when you list the files or use a wildcard to match all of them, they will be shown in the correct order. If the names were strictly sequential, suffix 99 would be followed by 100, but filename.100 sorts between filename.10 and filename.11 (filenames are normally sorted lexicographically, not numerically).
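To see the ordering problem concretely, here is a small sketch that sorts the same four illustrative names both ways. The `filename.N` names are only examples, not files that split creates.

```shell
# Four example names; only the numeric part after the dot differs.
names='filename.9 filename.10 filename.11 filename.100'

# Lexicographic order, as used by ls and shell globs: compare character by character.
lexical=$(printf '%s\n' $names | LC_ALL=C sort | paste -sd' ' -)

# Numeric order on the field after the dot, for comparison.
numeric=$(printf '%s\n' $names | LC_ALL=C sort -t. -k2,2n | paste -sd' ' -)

echo "lexical: $lexical"   # filename.10 filename.100 filename.11 filename.9
echo "numeric: $numeric"   # filename.9 filename.10 filename.11 filename.100
```

In the lexicographic listing, filename.100 lands between filename.10 and filename.11, which is exactly what split is trying to avoid.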

So when it reaches the 90's, it adds more digits to the suffix, to ensure that the additional files will sort correctly if there are more than 10 more. I suppose it could have waited until 99, and then continued with 9900, 9901, etc., but then when it reaches 9999 it will have to add digits again; by widening at 90 it can handle an additional 900 files (9000 through 9899) before having to grow again.
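The widening is easy to reproduce on a small input. This is a minimal sketch, assuming GNU split from a reasonably recent coreutils (other implementations may behave differently); the `part_` prefix and the scratch directory are made up for the demo.

```shell
# Work in a scratch directory so the demo does not touch real files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# 200 lines at 2 lines per file gives 100 files, ten more than fit in 00-89.
seq 1 200 > input.txt
split -l 2 -d input.txt part_

count=$(ls part_* | wc -l | tr -d ' ')
last_two_digit=$(ls part_?? | tail -n 1)      # last file with a 2-digit suffix
first_widened=$(ls part_???? | head -n 1)     # first file with a 4-digit suffix
echo "$count files; $last_two_digit is followed by $first_widened"

# Concatenating in glob (sorted) order reproduces the input exactly.
cat part_* | cmp - input.txt && echo "order preserved"
```

With GNU split this should report 100 files, with part_89 followed by part_9000, and the final `cmp` confirms that the glob order still matches the original line order.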

As mentioned in the comments, you can use the -a option to specify the suffix length instead of letting it choose the default (starting at 2 digits until it reaches 90).
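For example, the command from the comments can be tried on a small stand-in file. This sketch assumes GNU split; the 200-line `my_file` here is generated only for illustration.

```shell
# Scratch directory with a small stand-in for the real 20000-line file.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
seq 1 200 > my_file

# Fixed 5-digit numeric suffixes: every name has the same width.
split -l 2 -a 5 -d my_file my_file_new

first=$(ls my_file_new* | head -n 1)
last=$(ls my_file_new* | tail -n 1)
echo "$first ... $last"   # my_file_new00000 ... my_file_new00099
```

For the original 20000-line file at 2 lines per file there are 10000 pieces, so 4 digits (0000 through 9999) would already be enough; with a fixed length that is too short, split stops with an error once the suffixes run out.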

Barmar