Hardware watchdog on a Supermicro motherboard (CentOS 7)

Long story short: I was trying to get the hardware watchdog on a Supermicro motherboard to work, had trouble finding a straightforward guide, and decided to write one.

yum install OpenIPMI OpenIPMI-tools watchdog

configure watchdog

nano /etc/watchdog.conf

watchdog.conf contents:

#ping                   = 172.31.14.1
#ping                   = 172.26.1.255
#interface              = eth0
#file                   = /var/log/messages
#change                 = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1             = 24
#max-load-5             = 18
#max-load-15            = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory             = 1

# With enforcing SELinux policy please use the /usr/libexec/watchdog/scripts/
# or /etc/watchdog.d/ for your test-binary and repair-binary configuration.
#repair-binary          = /usr/sbin/repair
#repair-timeout         =
#test-binary            =
#test-timeout           =

watchdog-device = /dev/watchdog
watchdog-timeout  = 300
interval          = 10

# Defaults compiled into the binary
#temperature-device     =
#max-temperature        = 120

# Defaults compiled into the binary
#admin                  = root
#interval               = 1
#logtick                = 1
#log-dir                = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 1

# When using custom service pid check with custom service
# systemd unit file please be aware the "Requires="
# does dependent service deactivation.
# Using "Before=watchdog.service" or "Before=watchdog-ping.service"
# in the custom service unit file may be the desired operation instead.
# See man 5 systemd.unit for more details.
#
# Check if rsyslogd is still running by enabling the following line
#pidfile                = /var/run/rsyslogd.pid
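
As a quick sanity check on the file above, the following sketch prints the active watchdog-timeout and interval and fails if the daemon would not ping within the timeout. The helper names conf_value and check_watchdog_conf are made up for this example, and it assumes that the last uncommented assignment of a key is the one in effect.

```shell
# conf_value KEY FILE: print the value of the last uncommented "KEY = value" line.
# (Assumption: if a key appears twice, the last one is the one in effect.)
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                        # skip commented-out lines
        {
            k = $1; gsub(/[ \t]/, "", k)           # key without padding
            if (k == key) {
                v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$2"
}

# check_watchdog_conf [FILE]: report both values; succeed only if interval < timeout.
check_watchdog_conf() {
    conf=${1:-/etc/watchdog.conf}
    timeout=$(conf_value watchdog-timeout "$conf")
    interval=$(conf_value interval "$conf")
    echo "watchdog-timeout=${timeout:-unset} interval=${interval:-unset}"
    [ -n "$timeout" ] && [ -n "$interval" ] && [ "$interval" -lt "$timeout" ]
}
```

Run it as check_watchdog_conf with no arguments to check /etc/watchdog.conf, or pass another path as the first argument.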

enable and configure ipmi watchdog

nano /etc/sysconfig/ipmi

ipmi file contents:

## Path:        Hardware/IPMI
## Description: Enable standard hardware interfaces (KCS, BT, SMIC)
## Type:        yesno
## Default:     "yes"
## Config:      ipmi
# Enable standard hardware interfaces (KCS, BT, SMIC)
# You probably want this enabled.
IPMI_SI=yes

## Path:        Hardware/IPMI
## Description: Enable /dev/ipmi0 interface, used by ipmitool, ipmicmd,
## Type:        yesno
## Default:     "yes"
## Config:      ipmi
# Enable /dev/ipmi0 interface, used by ipmitool, ipmicmd,
# and other userspace IPMI-using applications.
# You probably want this enabled.
DEV_IPMI=yes

## Path:        Hardware/IPMI
## Description: Enable IPMI_WATCHDOG if you want the IPMI watchdog
## Type:        yesno
## Default:     "no"
## Config:      ipmi
# Enable IPMI_WATCHDOG if you want the IPMI watchdog
# to reboot the system if it hangs
IPMI_WATCHDOG=yes

## Path:        Hardware/IPMI
## Description: Watchdog options - modinfo ipmi_watchdog for details
## Type:        string
## Default:     "timeout=60"
## Config:      ipmi
# Watchdog options - modinfo ipmi_watchdog for details
# watchdog timeout value in seconds
# as there is no userspace ping application that runs during shutdown,
# be sure to give it enough time for any device drivers to
# do their cleanup (e.g. megaraid cache flushes)
# without the watchdog triggering prematurely
IPMI_WATCHDOG_OPTIONS="timeout=300"


## Type:        yesno
## Default:     "no"
## Config:      ipmi
# Enable IPMI_POWERCYCLE if you want the system to be power-cycled (power
# down, delay briefly, power on) rather than power off, on systems
# that support such.  IPMI_POWEROFF=yes is also required.
IPMI_POWERCYCLE=no

## Path:        Hardware/IPMI
## Description: Enable "legacy" interfaces for applications
## Type:        yesno
## Default:     "no"
## Config:      ipmi
# Enable "legacy" interfaces for applications
# Intel IMB driver interface
IPMI_IMB=no
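
The file above consists of plain shell-style assignments, so one way to confirm the watchdog settings took effect is to source it in a subshell and print them. This is only a sketch; check_ipmi_sysconfig is a name invented for this example.

```shell
# check_ipmi_sysconfig [FILE]: source the file in a subshell (so nothing leaks
# into the calling shell) and succeed only if the IPMI watchdog is switched on.
check_ipmi_sysconfig() {
    file=${1:-/etc/sysconfig/ipmi}
    (
        # Ignore any values inherited from the environment.
        unset IPMI_WATCHDOG IPMI_WATCHDOG_OPTIONS
        . "$file" || exit 1
        echo "IPMI_WATCHDOG=${IPMI_WATCHDOG:-no} IPMI_WATCHDOG_OPTIONS=${IPMI_WATCHDOG_OPTIONS:-unset}"
        [ "$IPMI_WATCHDOG" = "yes" ]
    )
}
```

Pass an absolute path if you want to check a copy of the file rather than /etc/sysconfig/ipmi.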

enable the corresponding services and reboot:

systemctl enable watchdog
systemctl enable ipmi
reboot now

Next, change watchdog.service so that it starts after ipmi.service. Otherwise the daemon will fail to "kick the dog", because it starts before the ipmi_watchdog module is loaded.

nano /etc/systemd/system/multi-user.target.wants/watchdog.service

contents of watchdog.service:

[Unit]
Description=watchdog daemon
# man systemd.special
# auto added After=basic.target
After=ipmi.service

[Service]
Type=forking
ExecStart=/usr/sbin/watchdog
ControlGroup=cpu:/

[Install]
WantedBy=multi-user.target
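
An alternative to editing the unit file directly is a systemd drop-in, which adds the same After=ipmi.service ordering while leaving the packaged unit untouched, so a package update is less likely to undo it. A minimal sketch; the function name and the file name after-ipmi.conf are arbitrary choices for this example.

```shell
# write_after_ipmi_dropin [SYSTEMD_DIR]: create a drop-in that orders
# watchdog.service after ipmi.service without touching the packaged unit.
# The file name after-ipmi.conf is arbitrary (chosen for this example).
write_after_ipmi_dropin() {
    dir=${1:-/etc/systemd/system}/watchdog.service.d
    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=ipmi.service\n' > "$dir/after-ipmi.conf"
}
```

After running write_after_ipmi_dropin as root, run systemctl daemon-reload. systemctl show -p After watchdog should then list ipmi.service.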

Because of a bug in the existing hardware watchdog driver (iTCO_wdt, blacklisted below), we need to stop it from loading into the kernel. Otherwise the ipmi_watchdog module will not load and the watchdog service will fail, leaving the machine without watchdog protection. (see: https://access.redhat.com/solutions/176323)

Add a file to /etc/modprobe.d and blacklist the iTCO drivers.

nano /etc/modprobe.d/blacklist.conf

contents of blacklist.conf:

blacklist iTCO_wdt
blacklist iTCO_vendor_support

make it executable (optional: modprobe only reads this file, so the executable bit is not actually needed)

chmod +x /etc/modprobe.d/blacklist.conf
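
After the next reboot, the blacklist can be verified from the loaded-module list. The sketch below reads lsmod and succeeds only if ipmi_watchdog is loaded and iTCO_wdt is not. check_wdt_modules is a name invented for this example.

```shell
# check_wdt_modules: read the loaded-module list from lsmod and succeed only if
# ipmi_watchdog is loaded and iTCO_wdt is not.
check_wdt_modules() {
    mods=$(lsmod | awk 'NR > 1 { print $1 }')    # first column, header skipped
    if echo "$mods" | grep -qx 'iTCO_wdt'; then
        echo "iTCO_wdt is still loaded"
        return 1
    fi
    if ! echo "$mods" | grep -qx 'ipmi_watchdog'; then
        echo "ipmi_watchdog is not loaded"
        return 1
    fi
    echo "ok: ipmi_watchdog loaded, iTCO_wdt absent"
}
```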

reboot system and check if everything is running correctly

ipmitool mc watchdog get

desired output:

[root@Testeroni1 ~]# ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval:   1 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      300 sec
Present Countdown:      295 sec

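note: with interval = 10 in watchdog.conf the daemon resets the timer every 10 seconds, so Present Countdown should always stay above 280 sec. If it drops lower, the watchdog daemon is not kicking the timer!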
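
To check the countdown from a script rather than by eye, the output can be parsed as sketched below. Both function names are invented for this example, and the parsing assumes the output format shown in the sample above.

```shell
# present_countdown: read `ipmitool mc watchdog get` output on stdin and print
# the "Present Countdown" value in seconds.
present_countdown() {
    awk -F: '/^Present Countdown/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# countdown_ok [MIN]: succeed if the countdown read from stdin is at least MIN
# (default 280, the floor given in the note above).
countdown_ok() {
    min=${1:-280}
    left=$(present_countdown)
    [ -n "$left" ] && [ "$left" -ge "$min" ]
}
```

Example: ipmitool mc watchdog get | countdown_ok || echo "watchdog countdown too low"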

other troubleshooting commands

systemctl show watchdog
systemctl show ipmi
lsmod
ls /dev/
...


