It's kinda basic but surprised me a lot.

I'm using poppler-utils on WSL (Ubuntu 18.04 LTS) for managing PDFs. Usually it's as simple as:

cd <DIR of bunch of PDFs>
pdfunite `ls` out.pdf

since the pdfunite SYNOPSIS just requires a list of PDFs to merge and an output file name.

But recently I got some files with characters specific to my country, such as ó, ś, ź, and now I'm getting the following I/O error:

I/O Error: Couldn't open file 'Og<c3><b3>rki': No such file or directory.
Syntax Error: Could not merge damaged documents ('Og<c3><b3>rki')

(Ogórki, which is Polish for cucumbers ;) )

C3 B3 is of course the UTF-8 encoding of the Unicode character:

U+00F3 ó c3 b3 LATIN SMALL LETTER O WITH ACUTE

But is there an option to force ls to pass such characters in the correct format when its output is substituted?

Or is the problem elsewhere? When I substitute them with echo:

echo `ls`

I get correctly formatted Unicode characters.

Thanks in advance!

Tomas
  • Does it work if you replace the command substitution with the safer *? I.e. pdfunite * out.pdf – Kusalananda Jun 14 '21 at 10:51
  • Yes it does, but the main question remains: can we make it work with substitution, or if not, then why not :) And BTW, how do you determine that * is "safer"? – Tomas Jun 14 '21 at 11:02
  • ls can print any file name, including the ones with spaces in them, line breaks, semicolons..., and your shell will then take that string and tear it apart into individual arguments, incorrectly. * tells your shell to take exactly one file name per argument, never a problem. "Don't use ls to get a list of files in shell scripts" is a rule that usually we hammer into beginners very early, so this might be your lucky day :) – Marcus Müller Jun 14 '21 at 11:15
  • @Tomas because parsing ls is always unsafe precisely because it leads to this sort of error. See https://mywiki.wooledge.org/ParsingLs and Why *not* parse `ls` (and what to do instead)?. – terdon Jun 14 '21 at 11:16
  • Are these files on Linux or Windows filesystem? – tansy Jun 14 '21 at 11:33
  • @tansy they live on Windows FS – Tomas Jun 14 '21 at 11:35
  • You don't want to parse ls output. Apart from @terdon's examples mentioned before, here you can see how it manifests in real life. – tansy Jun 14 '21 at 11:38
  • Check what happens if you copy them onto a Linux fs. It may be an encoding question. Linux encodes names in UTF-8 (as you have shown) but Windows uses UTF-16, so it may be that the name stored in the file system is actually encoded differently from what ls prints. – tansy Jun 14 '21 at 11:41
  • This is a perfect case of "Doctor, it hurts when I do this..." "Then don't do that" ; wildcards are there for exactly this case. And you shouldn't do * either, it should be *.pdf ; output of ls is designed to be read, not substituted. – user10489 Jun 14 '21 at 11:55
  • I tried to reproduce it with a file on NTFS with some diacritics, using $ cat `\ls b*` > cc, but it worked, so I'm not sure. The others have spaces in their names, which disqualifies ls usage. – tansy Jun 14 '21 at 12:13
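
The word-splitting point made in the comments can be checked with a small sketch. It assumes a POSIX shell with the default IFS and a `mktemp` that supports `-d`. The helper `count_args` and both file names are made up for illustration. It does not reproduce the asker's WSL/Windows file system error; it only shows how an unquoted `ls` substitution and a glob can hand a command a different number of arguments.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Work in a throwaway directory with two illustrative file names:
# one with a non-ASCII character, one with a space.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 'Ogórki.pdf' 'two words.pdf'

# Unquoted command substitution: the shell splits the output of ls on
# whitespace, so 'two words.pdf' becomes two separate arguments.
split_count=$(count_args $(ls))

# Glob: the shell passes exactly one argument per matching file name.
glob_count=$(count_args *.pdf)

echo "ls substitution: $split_count arguments, glob: $glob_count arguments"

# Clean up the temporary directory.
cd / && rm -rf "$tmp"
```

With the two files above, the substitution yields 3 arguments and the glob yields 2, which is why `pdfunite *.pdf out.pdf` is the safer form.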
