1

In our company there are around 30 to 40 virtual Linux machines. Every Linux VM has maybe 3 partitions.
Every now and then a partition fills up and brings one or more applications to a standstill.

I know we can write cron job scripts that run every 30 minutes and send an email when a threshold is passed.
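For reference, a minimal sketch of that kind of threshold check, assuming Python 3 is available on the VMs. The 90% threshold, the mount point list, and the function names are illustrative assumptions, not part of any existing tool. The percentage is computed from `shutil.disk_usage`, so it can differ slightly from what `df` reports on filesystems with reserved blocks.

```python
import shutil


def usage_percent(path):
    """Return the percentage of used space on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(mount_points, threshold=90.0, get_percent=usage_percent):
    """Return (path, percent) pairs for every mount point at or above threshold."""
    alerts = []
    for path in mount_points:
        percent = get_percent(path)
        if percent >= threshold:
            alerts.append((path, percent))
    return alerts


if __name__ == "__main__":
    # Hypothetical mount point list; replace with the partitions on your VMs.
    for path, percent in over_threshold(["/"]):
        print(f"WARNING: {path} is {percent:.1f}% full")
```

Run from cron every 30 minutes, the script stays silent unless a partition crosses the threshold. Cron normally mails any output of a job to the address set in `MAILTO`, provided a mail transfer agent is configured on the machine, so the warning line can double as the alert email.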

But is there no "monitoring or alerting" infrastructure that is built into normal Linux?

Rui F Ribeiro

2 Answers

3

There are plenty of open source (and proprietary) monitoring tools designed to solve this problem. They rely on tools within Linux, and they in turn rely on system calls within the kernel.

Some tools focus on data gathering and monitoring, while others focus on alerting. Which one you pick depends on your primary need.

The best-known example of an alerting and monitoring tool is Nagios. Other tools, more focussed on data gathering and graphing with some alerting built in, are Cacti and Munin. If you have large clusters with lots of machines, then Ganglia might be your best bet.

These tools are often called Network Monitoring Systems, and Wikipedia has an extensive list.

I recommend that you don't reinvent the wheel, and that you look for and use a tool like this instead.

Depending on which Linux distribution you're using, one or more of these tools may already be available in the distribution repository, with default configurations that support the environment you have.

EightBitTony
1

By "built into normal Linux", I assume you mean the kernel?

There is no such thing in the kernel. Moreover, basic distributions don't ship such a thing out of the box. You do have default tools like du that make this easy to do with bash. Here are some links:

How can I monitor disk io?

Tracking down where disk space has gone on linux?

Since you said you already know how to write such scripts, I will spare you the code. Most sysadmins will probably prefer the core tools anyway: they know them, they offer great power, and they are simple to use. Any new "monitoring" facility would require you to learn it.

MatthewRock