
I found the inode of a file using ls -li, then the file’s starting block on disk. I copied the contents of that block to another directory using the dd command. Then I shredded the file using the shred utility: shred -uvz -n=10 file1.txt. When I ran the same dd command again, the file was recovered. I expected to read all zeroes after the file was shredded. What am I missing?

In the second iteration I ran shred -vz -n=10 file2.txt instead, which does not remove the file. Following the same steps as before, I was again able to recover the original contents using the dd command and the block position, even though hexdump file2.txt showed the shredded file’s contents were all zeroes. What am I missing?

Stephen Kitt
  • Kindly state the exact series of commands you used. Is it reproducible? – FelixJN Nov 01 '23 at 08:43
  • What filesystem? What type of disk? – symcbean Nov 01 '23 at 08:54
  • Only hearsay, but surely the SSD algorithms that ensure even wear rates will defeat shredding methods (because the re-written blocks are not the original ones)? – Paul_Pedant Nov 01 '23 at 10:04
  • @Paul_Pedant, yes, but you can't access the old data through the normal block device node. Same applies to any filesystem that doesn't hold the data in a single place always (data-journaling, log-based systems, any copy-on-write), and with those you can read the old data through the block device node. – ilkkachu Nov 01 '23 at 10:08
  • @ilkkachu I understood the reference to removal as describing shred -u — that’s the only difference between the two commands, and it removes the file after shredding. – Stephen Kitt Nov 01 '23 at 10:41
  • @StephenKitt, oh, right, the flags were different between the two – ilkkachu Nov 01 '23 at 11:13
  • @paul_pedant the internals of the SSD shuffling should be perfectly abstracted. The SSD does not know about the filesystem so does not know about deletion. The CHS and byte offset are wear-level-mitigation agnostic. – mckenzm Nov 01 '23 at 21:34

1 Answer


Filesystems are supposed to have exclusive access to their block devices. You are not supposed to use dd directly on the block device while the filesystem is mounted.

When you use dd (or any other userspace program) to read bytes directly from a block device, the read goes through the kernel's page cache. If you run dd again, it reads the data from that cache.

Writing data through the filesystem unfortunately does not update this cache, so you end up in a situation where the cached data no longer reflects the data on disk.

Another example where this happens is TRIM: if the block device's data was cached, you still get the old data from the cache even after TRIM has already discarded it. That's why you have to drop caches when testing TRIM.
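Dropping the caches by hand looks like this — a sketch: sync is safe for any user, but writing to drop_caches requires root, so the write is guarded here:

```shell
# Flush dirty pages to disk first.
sync
# Then drop clean page-cache, dentry and inode entries ("3" means
# pagecache plus slab objects). Root only; skipped when not permitted.
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
fi
# The remaining page cache is visible in /proc/meminfo:
grep '^Cached:' /proc/meminfo
```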

Make a filesystem with some files:

# truncate -s 10G filesystem.img
# mkfs.ext4 filesystem.img
# losetup --find --show filesystem.img
/dev/loop1
# mount /dev/loop1 loop/
# for n in {000..100} ; do yes $n | dd bs=1M count=1 iflag=fullblock of=loop/$n; done 2> /dev/null
# sync

Check file contents and physical offsets:

# hexdump -C loop/042
00000000  30 34 32 0a 30 34 32 0a  30 34 32 0a 30 34 32 0a  |042.042.042.042.|
*
00100000
# filefrag -ve loop/042
Filesystem type is: ef53
File size of loop/042 is 1048576 (256 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     255:      44800..     45055:    256:             last,eof
loop/042: 1 extent found
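filefrag reports physical offsets in filesystem blocks, so the byte offset of the extent on the device is the block number times the 4096-byte block size:

```shell
# Extent starts at physical block 44800, with 4096-byte blocks,
# so the byte offset into /dev/loop1 is:
echo $((44800 * 4096))
# 183500800
```

This is also why the dd reads use bs=4096 skip=44800 rather than a byte count.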

Read from block device:

# dd bs=4096 skip=44800 count=256 if=/dev/loop1 | hexdump -C
00000000  30 34 32 0a 30 34 32 0a  30 34 32 0a 30 34 32 0a  |042.042.042.042.|
*
00100000
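The skip/count arithmetic behaves the same on a regular file, so you can try it without root — a toy version (the file name is made up):

```shell
# Four 4-byte records; pull out record 1 (0-indexed) with dd's
# skip/count, the same arithmetic used against /dev/loop1 above.
printf 'AAAABBBBCCCCDDDD' > records.bin
dd if=records.bin bs=4 skip=1 count=1 2>/dev/null
# prints BBBB
```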

Shred:

# shred -v -n 1 loop/042
shred: loop/042: pass 1/1 (random)...

Read from block device (cached):

# dd bs=4096 skip=44800 count=256 if=/dev/loop1 | hexdump -C
00000000  30 34 32 0a 30 34 32 0a  30 34 32 0a 30 34 32 0a  |042.042.042.042.|
*
00100000

Read from block device (iflag=nocache):

# dd bs=4096 skip=44800 count=256 if=/dev/loop1 iflag=nocache | hexdump -C
00000000  30 34 32 0a 30 34 32 0a  30 34 32 0a 30 34 32 0a  |042.042.042.042.|
*
00100000
# dd bs=4096 skip=44800 count=256 if=/dev/loop1 iflag=nocache | hexdump -C | head
00000000  59 c2 d8 d4 5a 02 35 15  a1 fb f1 07 ae 53 59 99  |Y...Z.5......SY.|
00000010  5b 47 4f fc 2c e7 d3 db  10 70 c6 72 3e 6f 0b 05  |[GO.,....p.r>o..|
00000020  f5 07 c6 f7 95 64 8b a2  4e 7f 32 4f 0c b1 a3 32  |.....d..N.2O...2|
00000030  18 b5 99 7d 7d 6e 6d d6  b9 36 77 af 30 02 ba 23  |...}}nm..6w.0..#|
00000040  f5 55 a5 b7 01 51 cd 5b  64 c9 29 1f f6 48 23 6c  |.U...Q.[d.)..H#l|

dd iflag=nocache drops the data from the cache only after reading it from the cache, so it has to be run twice to see the new data. Alternatively, you can use sync; echo 3 > /proc/sys/vm/drop_caches to drop all caches, or try your luck with direct I/O (iflag=direct).
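A root-free illustration of the flag itself (file name made up) — the eviction only takes effect after the read completes, which is why the data returned is still whatever was cached:

```shell
# iflag=nocache asks the kernel to evict the file's pages *after*
# the read; the bytes you get back still come from the cache.
printf 'hello' > nocache-demo.bin
dd if=nocache-demo.bin iflag=nocache 2>/dev/null
# prints hello
```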

frostschutz
  • "Writing data (through a filesystem) unfortunately does not update this cache." -- I'm not saying this isn't true, but it conflicts so badly with my mental model of how the file system works that I feel it needs a reference, or tests to confirm it. My model is that both the file system and dd use the block device's cache, so cache coherency would be the normal outcome. – Wayne Conrad Nov 01 '23 at 19:32
  • Indeed, the cache, or buffer we speak of is in RAM, not the storage device's cache which we should not even be aware of for an IDE block device. There may be multiple buffers involved. A mounted filesystem has buffers, but these may be augmented or duplicated. – mckenzm Nov 01 '23 at 21:39
  • @WayneConrad Your model is technically correct (both do use the block device’s cache, and there is coherency there), but it’s looking at the wrong cache. The issue here is the kernel’s page cache, not the block device’s cache. And for the page cache, the block device and the file being read from the filesystem on it are different files on different filesystems, so there is no relation whatsoever between them in the cache. – Austin Hemmelgarn Nov 02 '23 at 02:33