Delete files with names that appear to begin with '?' in command line

Question

I am running Debian stable with the Cinnamon desktop, and I have some files that I would like to delete from the command line (for now I delete these files with Nemo).

For example, these .txt files begin with '?' in the shell and in Nemo this '?' is replaced by a carriage return:

$@debian: ls
ssolveIncpUL46pK  ?ssolveIncpUL46pK.txt

I tried:

 rm ?ss*
 rm \?ss*
 rm \ ss*

Most likely, rather than ?, it's a non-printable character that ls renders as ?. What's the output of ls | sed -n l? — Stéphane Chazelas, Aug 21 '18 at 16:39
What about rm "?FileName" or find . -name "\?*" -exec rm {} \; since rm does not handle regular expressions? — Hossein Vatani, Aug 21 '18 at 16:43
@HosseinVatani rm does not parse ? or *, but the shell expands these globs before rm executes. — Kevin Kruse, Aug 21 '18 at 16:45
@kevin-kruse Yes, I meant that he should not try to pass a regex to rm. — Hossein Vatani, Aug 21 '18 at 16:49
@StéphaneChazelas Maybe ls -Q would work better in Debian. — , Aug 21 '18 at 18:16
If you take the time to search... https://unix.stackexchange.com/q/28983/22142 or https://unix.stackexchange.com/q/402558/22142 or https://unix.stackexchange.com/q/33483/22142 and plenty of others... — don_crissti, Aug 21 '18 at 18:55
@isaac, better as LC_ALL=C ls -Q as there are plenty of Unicode characters that would result in ambiguous output with GNU ls -Q. Those should be OK with GNU sed -n l, but you would also need LC_ALL=C with some other sed implementations. It's true that newline characters would be a problem with ls | sed -n l. — Stéphane Chazelas, Aug 22 '18 at 19:47
@StéphaneChazelas Care to give a couple of examples of ambiguous output of Unicode? — , Aug 22 '18 at 22:40
@isaac, see with touch $'\ue9' $'e\u301' $'foo\u200bbar' foobar; ls -Q and the many other "invisible" characters, or the many characters that look the same or are the same but meant to be used in different contexts (like U+00C5 vs U+212B or the mathematical letters...). — Stéphane Chazelas, Aug 22 '18 at 22:58
@StéphaneChazelas The intent of using -Q is to expose what the ? stands for. This works reasonably well for humans, where a human could differentiate the filenames. That human language has confusing characters (like a Cyrillic а ($'\U430') and a Latin a ($'\U61')) that may "look" exactly the same is a different problem for which there is no simple solution. In any case, I find \303\251 much more difficult to process visually than é in everyday use. Programs and scripts do not have that problem. — , Aug 23 '18 at 00:00
@StéphaneChazelas In looking back at this issue I find that it happens that none of your example filenames are encoded with a ? with ls. So, ls -Q is not a solution for your problem. — , Aug 23 '18 at 00:17
Note the ls options --show-control-chars, --hide-control-chars (-q) and --escape (-b). — Volker Siegel, Aug 23 '18 at 02:01
@isaac Here a possible explanation could be $'\ufeff\nssolveIncpUL46pK.txt' (with a UTF-8 BOM as sometimes found at the start of strings coming from the Microsoft world) which would show as ?ssolveIncpUL46pK.txt (and "\nssolveIncpUL46pK.txt" with ls -Q and "\357\273\277\nssolveIncpUL46pK.txt" with LC_ALL=C ls -q) but not match the ?ss* as there are two characters before ss. — Stéphane Chazelas, Aug 23 '18 at 06:35
@StéphaneChazelas There are three levels here IMO. (1) In general the file names should be meaningful to users (otherwise we would just use the inode number). (2) If the name may be confusing, it may help to use ls -Q (not -q as in your comment as it means something different) and (3) If both tests fail, then use something like ls | od -c or even ls | od -tx1c as a tool of last resort. I really dislike the use of LC_ALL=C for everything. — , Aug 23 '18 at 18:00
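
The diagnostics discussed in these comments can be tried safely in a scratch directory. The following is a minimal sketch, not taken from the thread: it assumes the mystery character is a newline (one of the possibilities raised above), and the scratch directory and the name ssolve.txt are made up for illustration. It relies only on POSIX sh features and od.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Build a name that starts with a newline (assumed reproduction of the problem).
nl=$(printf '\nx'); nl=${nl%x}
touch "./${nl}ssolve.txt"

# Dump the raw bytes of every name; each name is wrapped in <...>
# and the newline appears as the two characters \n in the od output.
dump=$(printf '<%s>' ./* | od -An -c)
printf '%s\n' "$dump"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

Unlike ls | sed -n l, this does not get confused by newlines inside names, because the shell passes each name to printf as a single argument.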

Siva · Answer 1 · 2018-08-21T17:33:56.017

26

The appropriate way to remove this kind of file is by using the inode number of the file.

Use the following command to get the inode number:

 ls -li 

 12582925 -rw-r--r--  1 root root   646 May 23 02:19 ?ssolveIncpUL46pK.txt

The first field of the long listing is the inode number.

Then use the find command to delete the file by its inode number:

find . -inum 12582925 -exec rm -i {} \;
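
Before running a delete like this, it may be worth counting how many paths find would actually hand to rm. The sketch below is an illustration under stated assumptions rather than part of the original answer: it builds a scratch directory with a made-up awkward file name and a hypothetical hard link (sub/backup.txt), and it adds -xdev so the search stays on one filesystem. Like the command above, it needs a find that supports -inum.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Assumed setup: an awkwardly named file plus a hard link in a subdirectory.
nl=$(printf '\nx'); nl=${nl%x}
touch "./${nl}ssolve.txt"
mkdir sub
ln "./${nl}ssolve.txt" sub/backup.txt

# The inode number is the first field of the ls -i output.
set -- $(ls -id ./*ssolve*)
inum=$1

# Count the matching paths without deleting anything.
# One "x" is printed per match, so newlines in names cannot skew the count.
count=$(find . -xdev -inum "$inum" -exec printf x \; | wc -c | tr -d ' ')
echo "paths sharing inode $inum: $count"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

A count greater than 1 means the delete would remove more than one directory entry, so the -i prompt of rm deserves close attention in that case.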

edited Aug 21 '18 at 17:33

answered Aug 21 '18 at 16:54

Siva


This would be appropriate if no part of the filename could be matched with a filename globbing pattern, and if the find implementation on the system in question supports the nonstandard -inum predicate (which e.g. GNU find does). – Kusalananda Aug 22 '18 at 14:58
This is actually unsafe. More than one file can have the same inode (using hard links). So instead of deleting just the file with the apparent question mark in it, you would be deleting that file and all the other files with the same inode number, in that directory and in any descendant subdirectories. – Flimm Aug 22 '18 at 19:35
@Flimm ls -li will tell you whether the file has any hard links. (If the number immediately before root root is 1, then there are no hard links). – craq Aug 23 '18 at 04:21
@craq Actually, I think this may be unsafe even in the case where ls is showing only 1 hard link. This is because two separate files with separate data entries can actually share the same inode number if they are on different mounted filesystems, I think. You should probably use the -x or -xdev option to avoid this (although you still have the problem of the previous comment.) – Flimm Aug 23 '18 at 12:44
@Flimm inodes belong to one filesystem only, and it is not possible to hardlink across filesystems. See https://unix.stackexchange.com/questions/290525/why-are-hard-links-only-valid-within-the-same-filesystem or https://www.cyberciti.biz/tips/why-isnt-it-possible-to-create-hard-links-across-file-system-boundaries.html – craq Aug 23 '18 at 21:22
@craq What Flimm is getting at is that an inode n may be allocated on several different mounted filesystems, for very different files. Having inode 1234 on one filesystem does not exclude that the same inode, 1234, exists in another. – Kusalananda Aug 24 '18 at 14:58
@Kusalananda, wow, I never thought of that. Now I agree that deleting by inode is very dangerous. The files would be completely unrelated, so you'd probably never even realise that you'd deleted more than you thought. – craq Aug 24 '18 at 19:23
@Flimm see above (I can only notify one person per comment). That new question you asked really helped clarify this for me too. Thanks for that. – craq Aug 24 '18 at 19:24
@Flimm What if additional limitations are placed on the search, say inode value of x and within directory y, would that then be OK? – Josh Rumbut Aug 27 '18 at 13:50

Kusalananda · Accepted Answer · 2018-08-21T19:03:52.477

18

The character is not a question mark. The ls utility replaces non-printable characters with ?. It is further unclear whether the non-printable character really is the first character in the filename or whether there may be one or several spaces before it.

If you want to delete both of those files, you could match the "bad part" with * and then specify the rest of the visible filename more closely:

rm -i ./*ssolve*

The shell would first expand the given pattern to all the filenames matching it, and then rm would remove them. Be more specific and spell out a longer part of the filename if there are files that you don't want to delete that match the above short pattern, e.g. with

rm -i ./*ssolveIncpUL46pK*

This is assuming that you are located in the same directory as the files that you want to delete.

The -i option to rm makes it ask for confirmation before actually deleting anything.
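
To see how the two kinds of pattern behave before involving rm at all, the expansion can be previewed in a scratch directory. This is a minimal sketch under an assumption: the real file name is unknown, so the example invents a name with two newline characters in front of "ssolve". The helper count_matches is hypothetical and exists only for this illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Assumed reproduction: two unprintable characters before "ssolve".
nl=$(printf '\nx'); nl=${nl%x}
touch "./${nl}${nl}ssolveIncpUL46pK.txt" ./ssolveIncpUL46pK

# Count how many existing files a pattern expanded to.
# An unmatched pattern is passed through as literal text, hence the -e check.
count_matches() {
    n=0
    for f in "$@"; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

short=$(count_matches ./?ss*)      # "?" matches exactly one character
wide=$(count_matches ./*ssolve*)   # "*" matches any run of characters
echo "?ss* matched $short, *ssolve* matched $wide"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

In this made-up case the ?ss* pattern matches nothing because more than one character precedes "ss", while *ssolve* matches both files.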

edited Aug 21 '18 at 19:03

answered Aug 21 '18 at 16:39

Kusalananda


The next question would be: why does *ssolve* match it when ?ss* doesn't? – Stéphane Chazelas Aug 21 '18 at 17:29
@StéphaneChazelas Because ? does not match enough of the "bad part" of the filename. It may have something to do with multi-byte characters? Or it may simply be that there's a space before the non-printable character. You're better than me with that sort of thing. – Kusalananda Aug 21 '18 at 17:37
We'll probably never know unless the OP posts the output of ls | sed -n l. – Stéphane Chazelas Aug 21 '18 at 18:01
Might want to do echo ./*ssolve* before the rm... just in case. – Christopher Schultz Aug 22 '18 at 19:44

score 12 · Answer 3 · 2018-08-26T00:04:17.187

It is not recommended to use a * to remove files. It could match more than you like.

Since you are on Debian, the ls command (from GNU) can print file names in quoted form^[1]:

$ ls -Q
"\nssolve"  "\n\nssolve"  "y"  "z"

Or, even better, list files with quoted names and inodes:

$ ls -iQ
26738692 "\nssolve"  26738737 "\n\nssolve"  26738785 "y"  26738786 "z"

Then, use find with the inode number (calling rm on each match) to ensure that only the correct files are removed:

$ find . -xdev -inum 26738737 -exec rm -i {} \;

The call to find is limited to one filesystem (-xdev) to avoid matching a file on another filesystem with the same inode number. Note also that rm is called with the -i (interactive) option, so it will prompt on the command line before erasing each file.

^[1] Note that this does not solve the problem of visually confusing characters like a Cyrillic а ($'\U430') and a Latin a ($'\U61') that look exactly the same but are not. To have a better look at the bytes that a filename is using, we need a hex viewer:

$ touch а a é $'e\U301' $'\U301'e
$ ls
a  ́e  é  é  а              # what you "see" here depends on your system.

$ printf '<%s>' * | od -An -c
   <   a   >   < 314 201   e   >   <   e 314 201   >   < 303 251
   >   < 320 260   >
