Check_smartstatus
Script: check_smartstatus
check_smartstatus is a plugin that runs a smartctl check to verify the status of all local hard disks and SSDs.
It works on physical machines only.
Requirements
- smartctl
The user that runs the Icinga checks (icingaclient in the rule below) needs sudo permissions on the smartctl binary.
icingaclient ALL=(ALL) NOPASSWD: /sbin/smartctl
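The path to smartctl differs between distributions, so the rule above may need adjusting. The following is a minimal sketch of how such a rule could be generated and validated before it is installed. The helper name `sudoers_rule` and the file name under /etc/sudoers.d are assumptions for illustration and are not part of the plugin.

```shell
#!/usr/bin/env bash
# Sketch only: sudoers_rule is a hypothetical helper, not part of check_smartstatus.

# Print a sudoers rule that lets <user> run <binary> without a password.
sudoers_rule() {
    local user="$1" binary="$2"
    printf '%s ALL=(ALL) NOPASSWD: %s\n' "$user" "$binary"
}

# Possible usage as root (the target file name is an assumption):
#   sudoers_rule icingaclient "$(command -v smartctl)" > /tmp/icinga-smartctl
#   visudo -cf /tmp/icinga-smartctl \
#     && install -m 0440 /tmp/icinga-smartctl /etc/sudoers.d/icinga-smartctl
```

Checking the file with `visudo -cf` first avoids installing a rule with a syntax error, which could otherwise break sudo for all users.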
Standalone installation
From this repository you need the following file next to this script:
- inc_pluginfunctions: shared functions for all IML checks written in bash
Syntax
______________________________________________________________________
CHECK_SMARTSTATUS
v1.6
(c) Institute for Medical Education - University of Bern
Licence: GNU GPL 3
https://os-docs.iml.unibe.ch/icinga-checks/Checks/check_smartstatus.html
______________________________________________________________________
Show status of local S.M.A.R.T. devices.
SYNTAX:
check_smartstatus [-h] [-l] [devices]
OPTIONS:
-h|--help show this help.
-l|--list list devices only.
PARAMETERS:
EXAMPLES
check_smartstatus
Scan all local disks
check_smartstatus -l
List all local disks without scanning them.
Parameters
(none)
Examples
For testing purposes, list the devices without scanning them:
./check_smartstatus -l
Devices to scan:
- /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
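The listed entry has the same shape as a line printed by `smartctl --scan`: the device, an optional `-d <type>` argument, and a `#` comment. As a rough sketch, and assuming the list really comes from such a scan, the arguments for a later smartctl call could be separated from the comment like this. The function name `scan_line_to_args` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes input lines of the form "<device> -d <type> # <comment>".

# Print the part of a scan line before the "#" comment, without trailing blanks.
scan_line_to_args() {
    local line="$1"
    local args="${line%%#*}"                  # drop the comment, if any
    args="${args%"${args##*[![:space:]]}"}"   # trim trailing whitespace
    printf '%s\n' "$args"
}

# Possible usage with real data (needs smartctl and sufficient permissions):
#   sudo smartctl --scan | while IFS= read -r line; do
#       scan_line_to_args "$line"
#   done
```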
Without parameters, check_smartstatus loops over all detected devices and performs a SMART scan on each. The output starts with a status line containing a summary, followed by one output section per disk.
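The loop and the summary line can be pictured with the following sketch. It is not the plugin's actual code: the function names are hypothetical, the mapping of any failure to CRITICAL is an assumption, and the verdict is taken from the "SMART overall-health self-assessment test result" line, which smartctl prints for ATA and NVMe devices.

```shell
#!/usr/bin/env bash
# Sketch only: health_verdict and summary_line are hypothetical helpers.

# Read smartctl output on stdin and print the overall health verdict (e.g. PASSED).
health_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1
}

# Build a status line from arguments of the form "<device>:<verdict>".
# Every verdict other than PASSED counts as an error (assumption).
summary_line() {
    local total=0 errors=0 details="" item verdict state="OK"
    for item in "$@"; do
        verdict="${item##*:}"
        total=$((total + 1))
        [ "$verdict" = "PASSED" ] || errors=$((errors + 1))
        details="$details - ${item%:*}: $verdict"
    done
    [ "$errors" -eq 0 ] || state="CRITICAL"
    echo "$state: SMART check on $total Disks - $errors errors$details"
}

# Possible usage with a real device (needs sudo permissions on smartctl):
#   verdict=$(sudo smartctl -H /dev/nvme0 | health_verdict)
#   summary_line "/dev/nvme0:$verdict"
```

Keeping the parsing separate from the smartctl call makes it possible to test the logic with saved output instead of real hardware.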
This is the output of a single SSD:
OK: SMART check on 1 Disks - 0 errors - /dev/nvme0: PASSED
SMART/Health Information (NVMe Log 0x02)
----------------------------------------------------------------------
/dev/nvme0
sudo smartctl -Ha /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.9.2-1-MANJARO] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SKHynix_HFS001TEJ9X162N
Serial Number: AJC9N469110209D22
Firmware Version: 51730A10
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 0035db84db
Local Time is: Fri Jun 7 12:59:02 2024 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 86 Celsius
Critical Comp. Temp. Threshold: 87 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.50W - - 0 0 0 0 5 305
1 + 3.9000W - - 1 1 1 1 30 330
2 + 1.5000W - - 2 2 2 2 100 400
3 - 0.0500W - - 3 3 3 3 500 1500
4 - 0.0050W - - 4 4 4 4 1000 9000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 43 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 6,589,009 [3.37 TB]
Data Units Written: 3,879,914 [1.98 TB]
Host Read Commands: 39,241,205
Host Write Commands: 72,717,841
Controller Busy Time: 2,112
Power Cycles: 176
Power On Hours: 642
Unsafe Shutdowns: 21
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 40 Celsius
Temperature Sensor 2: 37 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged
/dev/nvme0 - rc=0
PASSED SMART/Health Information (NVMe Log 0x02)
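The `rc=0` at the end of the device section is presumably the exit status of the smartctl call. The smartctl manual documents this status as a bit mask, so a non-zero value can be broken down into individual findings. The sketch below shows one way to do that. The function name `describe_smartctl_rc` is hypothetical, and the descriptions are shortened paraphrases of the manual.

```shell
#!/usr/bin/env bash
# Sketch only: describe_smartctl_rc is a hypothetical helper, not part of the plugin.

# Print one line per bit that is set in a smartctl exit status.
describe_smartctl_rc() {
    local rc="$1" bit
    local -a meaning=(
        "command line did not parse"
        "device open failed"
        "a SMART or other command to the disk failed"
        "SMART status check returned DISK FAILING"
        "prefail attributes at or below threshold"
        "attributes were at or below threshold in the past"
        "device error log contains errors"
        "self-test log contains errors"
    )
    if [ "$rc" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    for bit in 0 1 2 3 4 5 6 7; do
        if [ $(( rc & (1 << bit) )) -ne 0 ]; then
            echo "bit $bit: ${meaning[$bit]}"
        fi
    done
}

# Possible usage:
#   sudo smartctl -Ha /dev/nvme0 > /dev/null; describe_smartctl_rc $?
```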