Hardware watchdog on Supermicro motherboard (CentOS7)
Long story short: I was trying to get a hardware watchdog working on a Supermicro motherboard, had trouble finding a straightforward guide, so I decided to write one.
yum install OpenIPMI OpenIPMI-tools watchdog
configure watchdog
nano /etc/watchdog.conf
watchdog.conf contents:
#ping = 172.31.14.1
#ping = 172.26.1.255
#interface = eth0
#file = /var/log/messages
#change = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1 = 24
#max-load-5 = 18
#max-load-15 = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory = 1

# With enforcing SELinux policy please use the /usr/libexec/watchdog/scripts/
# or /etc/watchdog.d/ for your test-binary and repair-binary configuration.
#repair-binary = /usr/sbin/repair
#repair-timeout =
#test-binary =
#test-timeout =

watchdog-device = /dev/watchdog
watchdog-timeout = 300
interval = 10

# Defaults compiled into the binary
#temperature-device =
#max-temperature = 120

# Defaults compiled into the binary
#admin = root
#interval = 1
#logtick = 1
#log-dir = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime = yes
priority = 1

# When using custom service pid check with custom service
# systemd unit file please be aware the "Requires="
# does dependent service deactivation.
# Using "Before=watchdog.service" or "Before=watchdog-ping.service"
# in the custom service unit file may be the desired operation instead.
# See man 5 systemd.unit for more details.
#
# Check if rsyslogd is still running by enabling the following line
#pidfile = /var/run/rsyslogd.pid
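Almost everything in this file is commented out, so it can help to list only the settings that are actually in effect. The following is a minimal sketch, assuming the file keeps the "key = value" layout shown above; active_settings is a hypothetical helper name, not something shipped with the watchdog package.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented "key = value" lines of a
# watchdog.conf-style file, normalised to "key=value".
active_settings() {
    # Drop comment and blank lines, then trim whitespace around "=" and
    # at both ends of each remaining line.
    grep -Ev '^[[:space:]]*(#|$)' "$1" |
        sed -E 's/[[:space:]]*=[[:space:]]*/=/; s/^[[:space:]]+//; s/[[:space:]]+$//'
}

# Usage on a real system (path taken from the step above):
#   active_settings /etc/watchdog.conf
```

For the file above this should leave only watchdog-device, watchdog-timeout, interval, realtime and priority, which makes it easy to confirm that interval is well below watchdog-timeout.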
enable and configure ipmi watchdog
nano /etc/sysconfig/ipmi
ipmi file contents:
## Path: Hardware/IPMI
## Description: Enable standard hardware interfaces (KCS, BT, SMIC)
## Type: yesno
## Default: "yes"
## Config: ipmi
# Enable standard hardware interfaces (KCS, BT, SMIC)
# You probably want this enabled.
IPMI_SI=yes

## Path: Hardware/IPMI
## Description: Enable /dev/ipmi0 interface, used by ipmitool, ipmicmd,
## Type: yesno
## Default: "yes"
## Config: ipmi
# Enable /dev/ipmi0 interface, used by ipmitool, ipmicmd,
# and other userspace IPMI-using applications.
# You probably want this enabled.
DEV_IPMI=yes

## Path: Hardware/IPMI
## Description: Enable IPMI_WATCHDOG if you want the IPMI watchdog
## Type: yesno
## Default: "no"
## Config: ipmi
# Enable IPMI_WATCHDOG if you want the IPMI watchdog
# to reboot the system if it hangs
IPMI_WATCHDOG=yes

## Path: Hardware/IPMI
## Description: Watchdog options - modinfo ipmi_watchdog for details
## Type: string
## Default: "timeout=60"
## Config: ipmi
# Watchdog options - modinfo ipmi_watchdog for details
# watchdog timeout value in seconds
# as there is no userspace ping application that runs during shutdown,
# be sure to give it enough time for any device drivers to
# do their cleanup (e.g. megaraid cache flushes)
# without the watchdog triggering prematurely
IPMI_WATCHDOG_OPTIONS="timeout=300"

## Type: yesno
## Default: "no"
## Config: ipmi
# Enable IPMI_POWERCYCLE if you want the system to be power-cycled (power
# down, delay briefly, power on) rather than power off, on systems
# that support such. IPMI_POWEROFF=yes is also required.
IPMI_POWERCYCLE=no

## Path: Hardware/IPMI
## Description: Enable "legacy" interfaces for applications
## Type: yesno
## Default: "no"
## Config: ipmi
# Enable "legacy" interfaces for applications
# Intel IMB driver interface
IPMI_IMB=no
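The file is a list of plain shell variable assignments, so the two values changed in this step can be read back by sourcing it. This is only a sketch: show_ipmi_watchdog is a hypothetical helper name, and it assumes the file contains nothing but comments and KEY=value lines, as above.

```shell
#!/bin/sh
# Hypothetical helper: source an /etc/sysconfig/ipmi-style file in a subshell
# (so the variables do not leak into the calling shell) and report the two
# watchdog-related settings from this step.
show_ipmi_watchdog() (
    . "$1"
    echo "IPMI_WATCHDOG=${IPMI_WATCHDOG:-unset}"
    echo "IPMI_WATCHDOG_OPTIONS=${IPMI_WATCHDOG_OPTIONS:-unset}"
)

# Usage on a real system (path taken from the step above):
#   show_ipmi_watchdog /etc/sysconfig/ipmi
```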
enable the corresponding services and reboot:
systemctl enable watchdog
systemctl enable ipmi
reboot now
Now we have to change watchdog.service so that it starts after ipmi. Otherwise it will fail to "kick the dog", because it starts before the ipmi_watchdog module has been loaded.
nano /etc/systemd/system/multi-user.target.wants/watchdog.service
contents of watchdog.service:
[Unit]
Description=watchdog daemon
# man systemd.special
# auto added After=basic.target
After=ipmi.service

[Service]
Type=forking
ExecStart=/usr/sbin/watchdog
ControlGroup=cpu:/

[Install]
WantedBy=multi-user.target
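As an alternative sketch (not the method used in this guide), the same ordering can be added through a systemd drop-in, which leaves the packaged unit file untouched so that a package update is less likely to undo the change. This assumes the installed systemd supports drop-in directories; write_ipmi_dropin and the file name 10-after-ipmi.conf are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch of an alternative: add "After=ipmi.service" via a drop-in instead of
# editing the unit file. The optional first argument is the systemd
# configuration root (defaults to /etc/systemd/system).
write_ipmi_dropin() {
    root=${1:-/etc/systemd/system}
    mkdir -p "$root/watchdog.service.d" &&
        printf '[Unit]\nAfter=ipmi.service\n' \
            > "$root/watchdog.service.d/10-after-ipmi.conf"
}

# On a real system (as root), followed by a reload so systemd picks it up:
#   write_ipmi_dropin && systemctl daemon-reload
```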
Due to a bug in the existing hardware watchdog driver (iTCO_wdt), we need to prevent it from loading into the kernel. Otherwise the ipmi_watchdog module will not load and the watchdog service will fail, leaving the system with no watchdog protection. (bug tracker: https://access.redhat.com/solutions/176323)
Add a file to /etc/modprobe.d and blacklist the iTCO drivers.
nano /etc/modprobe.d/blacklist.conf
contents of blacklist.conf:
blacklist iTCO_wdt
blacklist iTCO_vendor_support
make it executable (optional: modprobe reads .conf files in /etc/modprobe.d whether or not they are executable)
chmod +x /etc/modprobe.d/blacklist.conf
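Once the system has been rebooted with the blacklist in place, the module list can confirm that it worked. A minimal sketch, assuming the usual lsmod layout with the module name in the first column; check_watchdog_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and succeed only if
# ipmi_watchdog is listed and neither iTCO module is.
check_watchdog_modules() {
    awk '
        $1 == "ipmi_watchdog"                           { ipmi = 1 }
        $1 == "iTCO_wdt" || $1 == "iTCO_vendor_support" { itco = 1 }
        END { exit !(ipmi && !itco) }
    '
}

# Usage on a real system:
#   lsmod | check_watchdog_modules && echo "modules OK"
```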
reboot system and check if everything is running correctly
ipmitool mc watchdog get
desired output:
[root@Testeroni1 ~]# ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval:   1 seconds
Timer Expiration Flags: 0x00
Initial Countdown:      300 sec
Present Countdown:      295 sec
note: with interval = 10 the watchdog daemon resets the timer every 10 seconds, so Present Countdown should always stay above 280 sec. If it keeps dropping below that, the daemon is not kicking the timer.
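The check from the note can be scripted. This is a rough sketch that assumes the output format shown above; check_countdown is a hypothetical helper name, and 280 is simply the threshold from the note.

```shell
#!/bin/sh
# Hypothetical helper: read `ipmitool mc watchdog get` output on stdin and
# succeed only if the timer is running and Present Countdown is above the
# given minimum in seconds (default 280).
check_countdown() {
    awk -v min="${1:-280}" -F': *' '
        /Watchdog Timer Is:/ { running = ($2 ~ /Running/) }
        /Present Countdown:/ { present = $2 + 0 }
        END { exit !(running && present > min) }
    '
}

# Usage on a real system:
#   ipmitool mc watchdog get | check_countdown 280 || echo "watchdog not being kicked"
```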
other troubleshooting commands
systemctl show watchdog
systemctl show ipmi
lsmod
ls /dev/
...