
I am a junior system administrator working in IT monitoring. I have an issue with the check_snmp_storage.pl script: for one partition (/var) it reports disk usage a few percent lower than what the df command shows on the same server.

I call check_snmp_storage.pl like this:

perl check_snmp_storage.pl -2 -C public <IP_ADDRESS> -m /var -w 80 -c 90 -G

and the output looks like this:

Alarm at 15
SNMP v2c login
Filter : /var
OID : 1.3.6.1.2.1.25.2.3.1.3.8, Desc : Shared memory
OID : 1.3.6.1.2.1.25.2.3.1.3.56, Desc : /dev/shm
OID : 1.3.6.1.2.1.25.2.3.1.3.31, Desc : /var
   Name : /var, Index : 31
OID : 1.3.6.1.2.1.25.2.3.1.3.6, Desc : Memory buffers
OID : 1.3.6.1.2.1.25.2.3.1.3.10, Desc : Swap space
OID : 1.3.6.1.2.1.25.2.3.1.3.40, Desc : /sys/fs/cgroup
OID : 1.3.6.1.2.1.25.2.3.1.3.7, Desc : Cached memory
OID : 1.3.6.1.2.1.25.2.3.1.3.3, Desc : Virtual memory
OID : 1.3.6.1.2.1.25.2.3.1.3.36, Desc : /run
OID : 1.3.6.1.2.1.25.2.3.1.3.32, Desc : /
OID : 1.3.6.1.2.1.25.2.3.1.3.1, Desc : Physical memory
storages selected : 1
1.3.6.1.2.1.25.2.3.1.6.31  : 320923825
1.3.6.1.2.1.25.2.3.1.4.31  : 4096
1.3.6.1.2.1.25.2.3.1.5.31  : 428831117
Descr : /var
Size :  428831117
Used : 320923825
Alloc : 4096
Perf data : /var=1224GB;
/var: 75%used(1224GB/1636GB) (<80%) : OK
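As a cross-check, the summary line can be reproduced from the three raw counters above. This is only a sketch in Python, not the script's actual Perl code. It assumes the script multiplies the block counts by the allocation unit, treats its "GB" as 2^30 bytes, and rounds to the nearest integer, which is consistent with the printed figures.

```python
# Raw hrStorage values copied from the output above (index 31, /var).
ALLOC_UNIT = 4096        # hrStorageAllocationUnits, bytes per block
SIZE_BLOCKS = 428831117  # hrStorageSize
USED_BLOCKS = 320923825  # hrStorageUsed

GIB = 2 ** 30  # assumption: the script's "GB" is really 2**30 bytes


def blocks_to_gib(blocks, alloc_unit=ALLOC_UNIT):
    """Convert a block count to GiB, rounded to the nearest integer."""
    return round(blocks * alloc_unit / GIB)


def percent_used(used_blocks, size_blocks):
    """Usage as used / size, rounded to the nearest percent."""
    return round(100 * used_blocks / size_blocks)


if __name__ == "__main__":
    print(f"/var: {percent_used(USED_BLOCKS, SIZE_BLOCKS)}%used"
          f"({blocks_to_gib(USED_BLOCKS)}GB/{blocks_to_gib(SIZE_BLOCKS)}GB)")
```

So the 75% is simply used blocks divided by total blocks; no "available" figure enters the calculation.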

In contrast, the df command run on the same server shows disk usage like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/md3        1.8T  1.4T  354G  79% /var

The difference is too big for my monitoring cluster to serve as a reliable source of information about our systems' health.

I've tried to find out what the real difference between the two is, but I couldn't find any explanation. I assume that df "adds something" to the disk usage, but I don't know what, and I have no idea how to make the two values the same (or very close, within about 1%).

terdon
Burak
  • 1
    Your df output looks like it has the -h flag applied to make the output "human-readable". Can you provide the output without rounding, perhaps with \df -k /var (the \ is there to bypass any aliases)? Possible reasons for things not matching include GB vs GiB (powers of 1000 vs powers of 1024), and not accounting for "reserved" space, typically 5%. – icarus Jan 26 '21 at 15:40
  • 1
    Does this answer your question? – terdon Jan 26 '21 at 15:46

1 Answer


Look into the filesystem's reserved space and into what information SNMP actually provides. This post explains the difference between the two results:

https://thwack.solarwinds.com/product-forums/f/general-it-topics/19043/linux-drive-monitors-not-accounting-for-reserved-space

As you can see, net-snmp only returns Used and Size, but not available. This leaves it up to the monitoring software to perform a calculation without all of the relevant data, most notably the values available in df's Available column:

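The available space is actually reported in two different ways: bfree (all free blocks, including those reserved for root) and bavail (free blocks available to unprivileged users).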

df.c :

      input_units = fsu.fsu_blocksize;
      output_units = output_block_size;
      total = fsu.fsu_blocks;
      /* free blocks available to unprivileged users */
      available = fsu.fsu_bavail;
      negate_available = (fsu.fsu_bavail_top_bit_set
                          & (available != UINTMAX_MAX));
      /* all free blocks, including the reserved ones */
      available_to_root = fsu.fsu_bfree;
[..]
      used = total - available_to_root;

df reports available space after subtracting the well-known reserved space (typically 5% of the filesystem), whereas SNMP reports used space against the total size without taking that reserved space into account.
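The difference between the two calculations can be sketched in a few lines of Python. This is an illustration, not code from df or net-snmp, and the function names are made up here. It assumes the monitoring side divides used space by the total size (as the check output in the question suggests), while df divides used space by used plus the space available to unprivileged users and rounds the result up.

```python
import math
import os


def snmp_style_percent(total, free_root):
    """Usage relative to the full size (reserved blocks stay in the denominator)."""
    used = total - free_root
    return 100 * used / total


def df_style_percent(total, free_root, avail_user):
    """Used / (used + available to non-root users), rounded up as df does."""
    used = total - free_root
    return math.ceil(100 * used / (used + avail_user))


def usage_for(path):
    """Return (snmp_style, df_style) percentages for a mounted filesystem."""
    st = os.statvfs(path)  # f_bfree ~ fsu_bfree, f_bavail ~ fsu_bavail
    return (snmp_style_percent(st.f_blocks, st.f_bfree),
            df_style_percent(st.f_blocks, st.f_bfree, st.f_bavail))


if __name__ == "__main__":
    # Hypothetical filesystem: 1000 blocks, 250 free, 50 of those reserved.
    print(snmp_style_percent(1000, 250))     # 75.0
    print(df_style_percent(1000, 250, 200))  # 79
```

With a 5% reservation in this made-up example, the same filesystem reads 75% one way and 79% the other. Running usage_for("/var") on the server itself would show both figures side by side for the real partition.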

thyss