The Software Watchdog
The kernel software watchdog's ability to reboot depends on the state of the machine and of its interrupts. The watchdog tool itself runs several health checks and acts appropriately if the system is not in good shape.
Watchdog is software that monitors the system and reboots the machine when it observes a fault. A reboot is its only response to a fault; it cannot take a targeted action such as restarting Apache.
Installation:
# yum install watchdog
(output omitted)
Installed: watchdog.i386 0:5.6-1.el5
Complete!
Linux has no lack of monitoring tools. Because Linux is used so frequently in a server context, monitoring applications have always been important. For the Linux desktop, you will find many smaller applications that focus on specific areas of monitoring. Glances is a program that keeps you up to date with the current health state of your system.
First, build the Linux kernel with watchdog support. The full guide is located here:
After a reboot with the new kernel there should be a /dev/watchdog file:
Next, install a watchdog daemon:
List the files that get installed by the watchdog package:
This looks interesting: /usr/lib/systemd/system/watchdog.service is a systemd service file.
Starting and stopping the watchdog:
The watchdog gets automatically started once you open /dev/watchdog. To stop the watchdog, you will need to:
- Write the character V into /dev/watchdog. Requiring this character prevents stopping the watchdog accidentally.
- Close the file /dev/watchdog, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled. When this option is enabled, the watchdog cannot be stopped at all.
After the watchdog has been enabled you have to reset the watchdog timer at least once every 60 seconds, else your system gets rebooted. The watchdog daemon resets the timer as long as none of its tests fails.
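The open, reset, and stop sequence described above can be sketched in a few lines of Python. This is only an illustration, not how the watchdog daemon itself is implemented: the function name run_watchdog and its parameters (kicks, interval, sleep) are made up for this example, and running it against the real /dev/watchdog requires root and will arm the timer.

```python
import os
import time


def run_watchdog(device="/dev/watchdog", kicks=3, interval=10.0, sleep=time.sleep):
    """Arm the watchdog, reset its timer a few times, then stop it cleanly."""
    # Opening the device is what starts the watchdog.
    fd = os.open(device, os.O_WRONLY)
    try:
        for _ in range(kicks):
            # Any write resets the timer. The interval has to stay well
            # below the 60-second timeout mentioned above.
            os.write(fd, b"\0")
            sleep(interval)
        # Writing the character V marks the upcoming close as intentional.
        os.write(fd, b"V")
    finally:
        # Closing stops the watchdog, unless the kernel was built with
        # CONFIG_WATCHDOG_NOWAYOUT. If an error occurred before the V was
        # written, the timer keeps running and the system will reboot.
        os.close(fd)
```

Because the device path is a parameter, the sequence can be tried safely against an ordinary file first to see exactly which bytes get written.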
Tests the watchdog daemon supports for checking the system status:
- Is the process table full?
- Is there enough free memory?
- Are some files accessible?
- Have some files changed within a given interval?
- Is the load average too high?
- Has a file table overflow occurred?
- Is a process still running? The process is specified by a pid file.
- Do some IP addresses answer to ping?
- Do network interfaces receive traffic?
- Is the temperature too high? (Temperature data not always available.)
- Execute a user defined command to do arbitrary tests.
- Execute one or more test/repair commands found in /etc/watchdog.d. These commands are called with the argument test or repair.
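A test/repair command for /etc/watchdog.d could look roughly like the following Python sketch. The disk-space check, the threshold MIN_FREE_BYTES, and the function name main are assumptions chosen for illustration. The sketch also assumes the usual convention that exit status 0 means "healthy" and any other status means "failed"; check the watchdog man page for the exact arguments your version passes.

```python
import shutil
import sys

# Hypothetical threshold and path, chosen only for this example.
MIN_FREE_BYTES = 50 * 1024 * 1024
CHECK_PATH = "/"


def main(argv, path=CHECK_PATH, min_free=MIN_FREE_BYTES):
    """Return an exit status for the 'test' or 'repair' argument."""
    action = argv[1] if len(argv) > 1 else ""
    if action == "test":
        # Healthy (0) if enough disk space is free, failed (1) otherwise.
        free = shutil.disk_usage(path).free
        return 0 if free >= min_free else 1
    if action == "repair":
        # Nothing is repaired automatically in this sketch, so report
        # failure and let the daemon decide what to do next.
        return 1
    # Unknown or missing argument.
    return 2

# An installed script would end with:
#     sys.exit(main(sys.argv))
```

Keeping the logic in a function that returns the status makes it easy to try both arguments by hand before dropping the script into /etc/watchdog.d.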
The configuration file should be self-explanatory:
Now we will enable the watchdog daemon. Currently it should be disabled:
For testing purposes I've added the following to my /etc/watchdog.conf:
So when my WiFi connection is lost, my system should reboot.
Start the watchdog daemon:
OK, then I will have to use the IP address, because the watchdog daemon fails to start. The ping option of watchdog only supports numeric IPv4 addresses:
In general you are safer pinging your router: packets to a remote host can get lost or delayed, Google's IP may change, or your IP may get blocked if you send ping requests to Google 24/7.
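Since the daemon refuses to start when a ping target is not a numeric IPv4 address, it can help to check the configuration beforehand. The following Python sketch assumes the "ping = <address>" line format of /etc/watchdog.conf; the helper names is_numeric_ipv4 and bad_ping_targets are invented for this example and are not part of the watchdog package.

```python
import ipaddress


def is_numeric_ipv4(value):
    """Return True only for dotted-quad IPv4 addresses such as 192.168.1.1."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def bad_ping_targets(conf_text):
    """List the ping targets in a watchdog.conf text that are not numeric IPv4."""
    bad = []
    for line in conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "ping" and not is_numeric_ipv4(value):
            bad.append(value)
    return bad
```

Calling bad_ping_targets() on the contents of the configuration file returns an empty list when every ping line is usable; host names and IPv6 addresses show up in the result.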
And it works:
Now disconnect the WiFi and voilà, after at most 60 seconds the system reboots:
Later we can enable the watchdog on boot when everything is working correctly:
The Hardware Watchdog
The software watchdog module is, of course, no protection against a kernel fault, but hardware watchdog support is coming for the iMX233-OLinuXino.
Have a look at chapter 23 of the iMX233 Reference Manual (17,5 MB):
23.7 Watchdog Reset Function
The watchdog reset is a CPU-configurable device. It is programmed by software to generate a chip-wide reset after HW_RTC_WATCHDOG milliseconds. The watchdog generates this reset if software does not rewrite this register before this time elapses.
The watchdog timer decrements the register value once for every tick of the 1-kHz clock supplied from the RTC analog section (see Figure 23-1). The reset generated by the watchdog timer has no effect on the values retained in the master registers of the real-time clock seconds counter, alarm, or persistent registers (analog persistent storage).
The watchdog timer is initially disabled and set to count 4,294,967,295 milliseconds before generating a watchdog reset.
The watchdog timer does not run when the chip is in its powered-down state. Therefore, there is no master/shadow register pairing for the watchdog timer, and it must be reprogrammed after cycling power or resetting the block.
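To put the numbers from the manual excerpt into perspective: the register counts down at 1 kHz, so its value is a timeout in milliseconds, and the initial value 4,294,967,295 is the largest 32-bit number (2^32 - 1). The short Python sketch below does the conversion; the names WATCHDOG_TICK_HZ, DEFAULT_COUNT, ticks_for_timeout, and count_to_days are made up for this example and do not come from the manual or the kernel driver.

```python
WATCHDOG_TICK_HZ = 1000          # 1-kHz clock from the RTC analog section
DEFAULT_COUNT = 4_294_967_295    # initial register value, equal to 2**32 - 1


def ticks_for_timeout(seconds):
    """Register value needed for a timeout given in seconds."""
    return int(seconds * WATCHDOG_TICK_HZ)


def count_to_days(count):
    """Time in days until the counter reaches zero from the given value."""
    return count / WATCHDOG_TICK_HZ / 86_400
```

With these helpers, a 60-second timeout corresponds to a register value of 60,000, and the initial value works out to roughly 49.7 days.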
I've seen a kernel option (<*> Freescale STMP3XXX & i.MX23/28 watchdog) on newer kernels and also some log messages:
Now I have 3 watchdog devices:
But which is the hardware watchdog?
So by default the hardware watchdog timer gets assigned to /dev/watchdog, which makes sense. I haven't yet tested whether the hardware watchdog timer works on the OLinuXino, but I think it does.
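One more way to tell the devices apart, on kernels that expose watchdog attributes in sysfs, is to read the identity file of each entry under /sys/class/watchdog. This is an assumption about the running kernel: the attribute needs CONFIG_WATCHDOG_SYSFS and may be missing on older kernels, in which case the sketch below reports None for that device. The function name watchdog_identities is invented for this example.

```python
from pathlib import Path


def watchdog_identities(sysfs_root="/sys/class/watchdog"):
    """Map each watchdogN entry to its identity string, or None if unavailable."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No watchdog class in sysfs at all.
        return result
    for entry in sorted(root.glob("watchdog*")):
        identity = entry / "identity"
        if identity.is_file():
            result[entry.name] = identity.read_text().strip()
        else:
            # Attribute not exposed by this kernel or driver.
            result[entry.name] = None
    return result
```

Printing the returned dictionary shows at a glance which watchdogN entry belongs to the hardware driver and which to the software watchdog.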
The Watchdog component is a solution that ensures that your server is clean from malware, all services are up and running and there is enough free disk space on the server.
Watchdog can monitor the following services:
- Web server providing the control panel interface
- Web server providing WWW service to users' sites
- SMTP Server (QMail)
- IMAP/POP3 Server (Courier-IMAP)
- DNS Server (BIND)
- Tomcat
- MySQL
- PostgreSQL
- SpamAssassin
- Plesk Premium antivirus
It can start, stop, and restart the services it monitors, and it can be configured to take actions depending on the stability of a service over some time period.
It can run other utilities and notify you when disk space usage has reached the specified amount.
For the purpose of monitoring services and disk space usage, Watchdog uses the monit utility. For information on the monit utility, visit the monit developers' website at http://www.tildeslash.com/monit/.
The Watchdog can scan the server file system for rootkits, backdoors, exploits, trojan horses and other malicious software on demand or on schedule. It can notify you by email of scanning results and show reports through the control panel. It updates its security knowledge base through the Internet before each scan.
For the purpose of scanning the server for malware, Watchdog uses the Rootkit Hunter utility. For information on Rootkit Hunter, visit the Rootkit Hunter developer's Web site at http://www.rootkit.nl.