Check_smartstatus

Script: check_smartstatus

check_smartstatus is a plugin that runs smartctl checks to verify the health status of all local hard disks and SSDs.

It works on physical machines only.

Requirements

  • smartctl

The Icinga user (icingaclient in the example below) needs sudo permissions on the smartctl binary:

icingaclient ALL=(ALL) NOPASSWD: /sbin/smartctl
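Here is a minimal sketch for installing this rule as a sudoers drop-in file. It assumes the Icinga agent runs as icingaclient and that the main sudoers file includes /etc/sudoers.d. The file name icinga-smartctl and the two functions are hypothetical. The path in the rule has to match the path the plugin actually calls. The rule above uses /sbin/smartctl, while the sketch uses whatever `command -v smartctl` finds.

```shell
#!/usr/bin/env bash
# Sketch only: write the sudoers rule for smartctl as a drop-in file.
# Assumptions: agent user "icingaclient", drop-in directory /etc/sudoers.d,
# hypothetical file name "icinga-smartctl".

# Print the sudoers rule for a given user and smartctl path.
sudoers_rule() {
    local user="$1" bin="$2"
    printf '%s ALL=(ALL) NOPASSWD: %s\n' "$user" "$bin"
}

# Validate the rule with visudo, then install it (must run as root).
install_rule() {
    local user="${1:-icingaclient}"
    local bin tmp rc=0
    bin="$(command -v smartctl)" || { echo "smartctl not found in PATH" >&2; return 1; }
    tmp="$(mktemp)" || return 1
    sudoers_rule "$user" "$bin" > "$tmp"
    # A syntax error in a sudoers file can break sudo, so check first.
    if visudo -c -f "$tmp" >/dev/null; then
        install -m 0440 "$tmp" /etc/sudoers.d/icinga-smartctl || rc=1
    else
        echo "sudoers syntax check failed" >&2
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

Run install_rule as root, for example `install_rule icingaclient`. The visudo check runs before the file is activated, so a typo cannot break sudo for the whole system.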

Standalone installation

Besides this script, you need the following file from this repository:

  • inc_pluginfunctions: shared functions for all IML checks written in bash
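The sketch below checks a standalone installation. It assumes both files are placed in the same plugin directory. This layout and the function name missing_plugin_files are assumptions and do not come from the repository.

```shell
#!/usr/bin/env bash
# Sketch: report which required files are missing from a plugin directory.
# Assumption: check_smartstatus and inc_pluginfunctions share one directory.
missing_plugin_files() {
    local dir="$1" f rc=0
    for f in check_smartstatus inc_pluginfunctions; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_plugin_files /usr/lib/nagios/plugins` prints each missing file and returns 1 if anything is missing. The directory here is only an example, so adjust it to your setup.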

Syntax

______________________________________________________________________

CHECK_SMARTSTATUS
v1.9

(c) Institute for Medical Education - University of Bern
Licence: GNU GPL 3

https://os-docs.iml.unibe.ch/icinga-checks/Checks/check_smartstatus.html
______________________________________________________________________

Show status of local S.M.A.R.T. devices.

SYNTAX:
    check_smartstatus [-h] [-l] [DEVICE(S)]

OPTIONS:

    -h|--help          show this help.
    -l|--list          list devices without scanning them.

    -n|--noscan        do not use 'smartctl --scan' to detect devices. Add
                       devices to scan as parameter
    -s|--short         short output
    -i|--ignore REGEX  ignore disks matching the given regex

PARAMETERS:

    DEVICE         A disk drive to scan with smartctl, eg
                   /dev/hda, /dev/sga, ...

EXAMPLES

    check_smartstatus
      Scan all disks found by 'smartctl --scan' and show full output.

    check_smartstatus -l
      List all local disks without scanning them.

    check_smartstatus -s
      Scan all disks found by 'smartctl --scan' and show short output only.

    check_smartstatus /dev/sg0 /dev/sg1
      Scan all disks found by 'smartctl --scan' plus /dev/sg0 and /dev/sg1
      and show full output.

    check_smartstatus --noscan /dev/sg0 /dev/sg1
      Scan all /dev/sg0 and /dev/sg1 and show full output.

    check_smartstatus --noscan --ignore "sg(1|10)" /dev/sg*
      Scan all /dev/sg* but ignore /dev/sg1 and /dev/sg10.
      Show full output.

Parameters

(none)

Examples

For testing purposes, show the devices without scanning them:

./check_smartstatus -l
Devices to scan:
- /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
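The device line above has the same format as a line of `smartctl --scan` output. It contains the device, an optional `-d TYPE`, and a comment after `#`. The hypothetical helpers below show one way to strip the comment so that the remaining words can be passed to smartctl. They are not taken from the plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn 'smartctl --scan' lines into smartctl arguments.

# Strip the "# ..." comment and trailing whitespace from one scan line.
scan_line_to_args() {
    local line="${1%%#*}"
    line="${line%"${line##*[![:space:]]}"}"
    printf '%s\n' "$line"
}

# Read scan lines from stdin and print one argument string per device.
# The "|| [ -n ... ]" also handles a last line without a trailing newline.
list_scan_devices() {
    local line args
    while IFS= read -r line || [ -n "$line" ]; do
        args="$(scan_line_to_args "$line")"
        if [ -n "$args" ]; then
            printf '%s\n' "$args"
        fi
    done
}
```

Usage: `sudo smartctl --scan | list_scan_devices`. Each output line, such as /dev/nvme0 -d nvme, can be split into words with `read -ra` and passed to smartctl.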

Without parameters, check_smartstatus loops over all detected devices and performs a SMART check on each. The output starts with a status line containing a summary, followed by one output section per disk.
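The loop described above can be pictured roughly as follows. This is a simplified sketch, not the plugin's code. The following are all assumptions:

  • the function name check_devices
  • the SMARTCTL variable, which lets a stub replace smartctl in tests
  • treating any non-zero smartctl exit status as an error
  • reporting CRITICAL in that case

Devices are passed as single words, so `-d TYPE` arguments are not handled. Icinga interprets plugin exit codes 0, 1, 2 and 3 as OK, WARNING, CRITICAL and UNKNOWN.

```shell
#!/usr/bin/env bash
# Simplified sketch of a per-device SMART loop with an Icinga summary line.
# Assumptions: any non-zero smartctl exit status counts as an error, and
# errors yield CRITICAL (exit code 2). SMARTCTL is overridable for testing.
SMARTCTL="${SMARTCTL:-sudo smartctl}"

check_devices() {
    local dev rc count=0 errors=0 details=""
    for dev in "$@"; do
        count=$((count + 1))
        # -H requests only the overall health self-assessment.
        # $SMARTCTL is unquoted on purpose so "sudo smartctl" splits into two words.
        if $SMARTCTL -H "$dev" >/dev/null 2>&1; then
            details+=" - $dev: PASSED"
        else
            rc=$?
            errors=$((errors + 1))
            details+=" - $dev: rc=$rc"
        fi
    done
    if [ "$errors" -eq 0 ]; then
        echo "OK: SMART check on $count Disks - $errors errors$details"
        return 0
    fi
    echo "CRITICAL: SMART check on $count Disks - $errors errors$details"
    return 2
}
```

For a single healthy disk, `check_devices /dev/nvme0` prints a status line shaped like the one shown below. The per-disk sections with the full smartctl output are left out of the sketch.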

This is the output for a single SSD:

OK: SMART check on 1 Disks - 0 errors - /dev/nvme0: PASSED

------------------------------------------------------------------------------------------ 

>>>> /dev/nvme0 - rc=0 - PASSED

Short infos:
    Model Number:                       SKHynix_HFS001TEJ9X162N
    SMART overall-health self-assessment test result: PASSED

Full output:
    $ sudo smartctl -Ha /dev/nvme0
    smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.16.8-1-MANJARO] (local build)
    Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Number:                       SKHynix_HFS001TEJ9X162N
    Serial Number:                      AJC9N469110209D22
    Firmware Version:                   51730A10
    PCI Vendor/Subsystem ID:            0x1c5c
    IEEE OUI Identifier:                0xace42e
    Controller ID:                      0
    NVMe Version:                       1.4
    Number of Namespaces:               1
    Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
    Namespace 1 Formatted LBA Size:     512
    Namespace 1 IEEE EUI-64:            ace42e 0035db84db
    Local Time is:                      Tue Oct 21 11:15:34 2025 CEST
    Firmware Updates (0x16):            3 Slots, no Reset required
    Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
    Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
    Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
    Maximum Data Transfer Size:         64 Pages
    Warning  Comp. Temp. Threshold:     86 Celsius
    Critical Comp. Temp. Threshold:     87 Celsius
    
    Supported Power States
    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     7.50W       -        -    0  0  0  0        5     305
     1 +   3.9000W       -        -    1  1  1  1       30     330
     2 +   1.5000W       -        -    2  2  2  2      100     400
     3 -   0.0500W       -        -    3  3  3  3      500    1500
     4 -   0.0050W       -        -    4  4  4  4     1000    9000
    
    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0
    
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
    Critical Warning:                   0x00
    Temperature:                        48 Celsius
    Available Spare:                    100%
    Available Spare Threshold:          10%
    Percentage Used:                    0%
    Data Units Read:                    43,174,237 [22.1 TB]
    Data Units Written:                 31,718,352 [16.2 TB]
    Host Read Commands:                 625,085,692
    Host Write Commands:                743,860,963
    Controller Busy Time:               19,662
    Power Cycles:                       672
    Power On Hours:                     3,564
    Unsafe Shutdowns:                   76
    Media and Data Integrity Errors:    0
    Error Information Log Entries:      0
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    Temperature Sensor 1:               43 Celsius
    Temperature Sensor 2:               42 Celsius
    
    Error Information (NVMe Log 0x01, 16 of 256 entries)
    No Errors Logged
    
    Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
    Self-test status: No self-test in progress
    No Self-tests Logged
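The rc value in each disk header appears to be the exit status of smartctl. This is an assumption, since the plugin source is not shown here. The smartctl manual documents that exit status as a bitmask: each of bits 0 to 7 flags a different problem. The hypothetical function decode_smartctl_rc below decodes it. Its messages paraphrase the manual.

```shell
#!/usr/bin/env bash
# Sketch: decode the smartctl exit status bitmask into readable messages.
# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
decode_smartctl_rc() {
    local rc="$1" bit
    local -a msgs=(
        "command line did not parse"
        "device open failed"
        "a SMART or other command failed, or SMART data checksum error"
        "SMART status: DISK FAILING"
        "prefail attributes at or below threshold"
        "some attributes were at or below threshold in the past"
        "device error log contains errors"
        "self-test log contains errors"
    )
    if [ "$rc" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    # Print one line for every bit that is set in the exit status.
    for bit in 0 1 2 3 4 5 6 7; do
        if (( rc & (1 << bit) )); then
            echo "bit $bit: ${msgs[$bit]}"
        fi
    done
}
```

Usage: `sudo smartctl -H /dev/nvme0; decode_smartctl_rc $?`. A non-zero rc does not always mean a failing disk. For example, bit 6 only says the device error log contains entries.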

Scan custom devices

smartctl --scan may list RAID volumes that you want to ignore, or it may not show the individual disks behind a RAID controller. In such cases you need a custom list of devices to check.

Use --noscan to skip the automatic scan, and pass the devices to check as parameters. You can use globbing, e.g. /dev/sg*.

To exclude disks from the list, use --ignore <REGEX>.

Example:

check_smartstatus --noscan --ignore "sg(1|10)" /dev/sg*
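This page does not document how the plugin applies the regex. If the match is unanchored, as with grep -E or bash's =~, the pattern sg(1|10) also matches /dev/sg11, /dev/sg12 and so on, because those names contain sg1. The sketch below shows the difference. It uses a hypothetical function filter_devices built on bash's =~ operator. Anchoring the pattern with $ limits it to exactly sg1 and sg10.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: drop devices matching an extended regex.
# bash's =~ matches unanchored, like grep -E.
filter_devices() {
    local regex="$1" dev
    shift
    for dev in "$@"; do
        # $regex must stay unquoted so it is treated as a regex, not a string.
        if [[ ! "$dev" =~ $regex ]]; then
            echo "$dev"
        fi
    done
}
```

With this matching, `filter_devices 'sg(1|10)' /dev/sg0 /dev/sg1 /dev/sg10 /dev/sg11` keeps only /dev/sg0. The anchored pattern 'sg(1|10)$' keeps /dev/sg0 and /dev/sg11.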