Programmatically identify block device of filesystem

Question

The following code was tested on RedHat 7 using df version 8.2 and lsblk version 2.23.2. This is important because this lsblk has significantly more output options than the RedHat 6 version (2.17.2).

It is rather easy to determine the filesystem that contains a file:

df -h /path/to/file | tail -n 1 | awk '{print $1}'
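
As a minimal sketch of the same lookup without awk, assuming GNU df from coreutils 8.21 or later (the release that added --output). fsSource is a hypothetical helper name, not part of the original script:

```shell
#!/usr/bin/env bash
# fsSource is a hypothetical helper name.
# Assumes GNU df with --output support (coreutils 8.21 or later).
function fsSource {
  # --output=source prints a header line plus the source column only,
  # so tail drops the header and no field splitting is needed.
  df --output=source -- "${1}" | tail -n 1
}

# Example call: fsSource /path/to/file
```

This avoids depending on the column order of the default df output, but it will not work with df implementations that lack --output.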

However it is surprisingly difficult to determine what block device contains that filesystem. Sure, you can manually determine the block device with lsblk -f, but I'm wondering how you do this in an automated way.

I have done some pretty deep digging, but I haven't been able to find any way to do this, which is strange because it seems like a common use case: scanning a directory on a drive and returning the serial number of the physical disk that was scanned.

I created a recursive bash function that does some ugly text parsing. It works, but it seems pretty hackish. That said, the lsblk documentation does seem to suggest that using the --output switch keeps scripts reliable across version updates.

function findBlockDevice {
  # Walk backwards through the lsblk tree, one line per call, starting at
  # the line that matches the filesystem, until a "disk" line is reached.
  local fileSystem="${1}"
  local count="${2}"
  local potentialBlockDeviceOutput blockDevice blockType

  potentialBlockDeviceOutput=$(lsblk --paths --output name,type | grep -B"${count}" "${fileSystem}" | head -n 1)
  blockDevice=$(echo "${potentialBlockDeviceOutput}" | awk '{print $1}')
  blockType=$(echo "${potentialBlockDeviceOutput}" | awk '{print $2}')

  if [[ "${blockType}" != "disk" ]]; then
    count=$(( count + 1 ))
    findBlockDevice "${fileSystem}" "${count}"
  else
    echo "${blockDevice}"
  fi
}

Usage:

# Assume directory is on /dev/sda1
scanDirectory='/media/suspiciousDrive'
fileSystem=$(df -h "${scanDirectory}" | tail -n 1 | awk '{print $1}')
blockDevice=$(findBlockDevice "${fileSystem}" 0)

echo "${fileSystem}" # /dev/sda1
echo "${blockDevice}" # /dev/sda

# Now we can get the disk information to use in a report
# (the column list must stay one word; indenting a continuation line would split it)
lsblk --nodeps --paths --pairs \
    --output NAME,SERIAL,MOUNTPOINT,VENDOR,FSTYPE,UUID,MODEL,SIZE,TYPE,WWN,STATE \
    "${blockDevice}"
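
For the report itself, the --pairs output can be split into fields without eval. The sketch below is one possible approach, assuming each line has the KEY="value" shape that lsblk prints with --pairs and that values never contain a raw double quote. parsePairs is a hypothetical helper name, and the sample values are made up:

```shell
#!/usr/bin/env bash
# parsePairs is a hypothetical helper: split one line of `lsblk --pairs`
# output (KEY="value" KEY="value" ...) into KEY=value lines.
# Assumption: values never contain a raw double quote.
function parsePairs {
  local line="${1}"
  local pattern='^([A-Z:_-]+)="([^"]*)" ?(.*)$'

  # Peel one KEY="value" pair off the front of the line per iteration.
  while [[ "${line}" =~ ${pattern} ]]; do
    printf '%s=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    line="${BASH_REMATCH[3]}"
  done
}

# Sample line with made-up values, shaped like the lsblk --pairs output.
parsePairs 'NAME="/dev/sda" SERIAL="EXAMPLE123" MODEL="Example Disk"'
# NAME=/dev/sda
# SERIAL=EXAMPLE123
# MODEL=Example Disk
```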

EDIT: The output of df is not sufficient, because the lsblk utility returns different results when given a filesystem than when given a disk. The following command returns very different information when given the entire block device as opposed to just the filesystem:

# Run this on your machine and notice the significant difference
lsblk --nodeps --paths --pairs --output NAME,SERIAL,VENDOR,MODEL /dev/sda
lsblk --nodeps --paths --pairs --output NAME,SERIAL,VENDOR,MODEL /dev/sda1

Ultimately, I wanted a simple solution to a problem of the form: "Scan this hard drive and also automatically return the drive information of the physical device, regardless of which directory on the drive is being scanned."

I have a solution; it's just pretty complex, and I was wondering if there is something easier.

Another Edit: I'm surprised so many people think this is a duplicate, or are confused as to why the output of df is not sufficient. df returns the filesystem, NOT the block device. Querying information on the filesystem does NOT return any metadata about the block device, such as its serial number or model. Why would I programmatically want to know the serial number or model of hard disks? I hope that wouldn't be a serious follow-up question from anyone.

The command in your question does list the block device containing the filesystem. What you're calling “the filesystem” is the block device containing the filesystem. Given what you're trying to do, it seems that you're trying to determine the disk containing the block device containing the filesystem, or something like that, but this is poorly defined. What would you expect for a RAID volume, for example? Or for a remote filesystem? — Gilles 'SO- stop being evil', Jun 12 '18 at 20:45
Please edit your question to clarify what you actually want. Give explanations and examples and cover as many cases as you need. As it stands, your question is meaningless because we have no way to know why the simple answer (first column of the df output) doesn't satisfy you. — Gilles 'SO- stop being evil', Jun 12 '18 at 20:46
Possible duplicate of How do I find on which physical device a folder is located? — phemmer, Jun 13 '18 at 01:17

score 1 · Accepted Answer · answered Jun 12 '18 at 22:08

Looking at the code you've provided, it seems that you want to be able to map a file on a filesystem back to a physical disk on which it resides. There appears to be no consideration of RAID, LVM, or encrypted filesystems.

The following code will print the disk device(s) that contain the specified file. For RAID and LVM it's possible the file will be present on more than one device; in this situation all relevant disk device names will be printed, one per line.

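# Ask for the file, find the device mounted at its mount point, then map
# that device name to the most recent "disk" line in lsblk's list output.
read -r -p 'Filename: ' file
devpart=$(mount | awk -v mount="$(stat --format '%m' "$file")" '$3 == mount {print $1}')
lsblk --list | awk -v part="${devpart/#*\/}" '$6 == "disk" {disk = $1} $6 != "disk" && $1 == part {print disk}'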
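
To see how the awk filter above maps a partition to its disk, here is a small sketch that runs the same filter over canned text instead of live lsblk output. The device names and sizes are made up, and diskForPartition is a hypothetical wrapper name. It relies on lsblk --list printing each disk before its partitions, with TYPE in the sixth column as in the default layout:

```shell
#!/usr/bin/env bash
# diskForPartition is a hypothetical wrapper around the awk filter.
# It reads `lsblk --list` style text on stdin and prints the disk that
# precedes the partition named in $1.
diskForPartition() {
  awk -v part="$1" '
    $6 == "disk" { disk = $1 }                  # remember the most recent disk
    $6 != "disk" && $1 == part { print disk }   # print it when the name matches
  '
}

# Canned sample with made-up devices, in the default lsblk --list layout.
diskForPartition sdb1 <<'EOF'
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0  100G  0 disk
sda1   8:1    0    1G  0 part /boot
sda2   8:2    0   99G  0 part /
sdb    8:16   0   50G  0 disk
sdb1   8:17   0   50G  0 part /data
EOF
# prints: sdb
```

On a live system the equivalent call would be lsblk --list | diskForPartition sda1. Note that nothing is printed when the name passed in is itself a whole disk, because only non-disk lines are matched.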

Excellent alternative. I should have known to use stat since that utility seems to solve every problem. — Luke Pafford, Jun 12 '18 at 22:26
@LukePafford the stat doesn't really do anything much different from your df -h, other than (IMO) being a cleaner method. — Chris Davies, Jun 12 '18 at 22:49
