Is the disk / SSD dying?

altmetaller · Post by **altmetaller** » 17.02.2025 17:58:08

Hi there,

I logged in as root today for the first time in a long while and found two e-mails in the root account warning about disk problems. The e-mails were about 7 months old, and I have no problems with the system whatsoever.

Here are the S.M.A.R.T. values:

Code:

root@predator:~# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-131-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MQ04ABF100
Serial Number:    abc123
LU WWN Device Id: 5 000039 822880484
Firmware Version: JU001J
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Zoned Device:     Device managed zones
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 17 17:59:04 2025 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 172) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1291
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36540
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       8011
 10 Spin_Retry_Count        0x0033   253   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4683
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       24
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       49
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       81033
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       25 (Min/Max 10/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   096   096   000    Old_age   Always       -       1627
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       274
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       766         -
# 2  Short offline       Completed without error       00%       700         -
# 3  Short offline       Completed without error       00%       697         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@predator:~#

root@predator:~# smartctl -a /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-131-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST500LM012 HN-M500MBB
Serial Number:    abc123
LU WWN Device Id: 5 0004cf 20d4b0c6d
Firmware Version: 2BA30001
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 17 17:51:59 2025 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25)	The self-test routine was aborted by
					the host.
Total time to complete Offline 
data collection: 		( 6840) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 114) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   094   090   025    Pre-fail  Always       -       1848
  4 Start_Stop_Count        0x0032   090   090   000    Old_age   Always       -       10237
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2901
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       32
 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       4862
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   057   000    Old_age   Always       -       26 (Min/Max 14/44)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   096   096   000    Old_age   Always       -       2371
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       53297
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       32
225 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always       -       71556

SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7 occurred at disk power-on lifetime: 2710 hours (112 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  18 9f 18 9f 18 f0 18 9f      10:54:18.685  RECALIBRATE [RET-4]
  00 00 00 00 00 00 00 00      00:00:08.581  NOP [Abort queued commands]
  60 00 00 98 76 4b 40 00      00:00:08.582  READ FPDMA QUEUED
  60 00 00 98 75 4b 40 00      00:00:08.582  READ FPDMA QUEUED
  60 00 00 98 74 4b 40 00      00:00:08.582  READ FPDMA QUEUED

Error 6 occurred at disk power-on lifetime: 2710 hours (112 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  18 9f 18 9f 18 f0 18 9f      10:54:18.685  RECALIBRATE [RET-4]
  00 00 00 00 00 00 00 00      00:00:08.168  NOP [Abort queued commands]
  60 00 00 b0 55 77 40 00      00:00:08.168  READ FPDMA QUEUED
  60 00 00 b0 54 77 40 00      00:00:08.168  READ FPDMA QUEUED
  60 00 00 b0 53 77 40 00      00:00:08.168  READ FPDMA QUEUED

Error 5 occurred at disk power-on lifetime: 2710 hours (112 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  18 9f 18 9f 18 f0 18 9f      10:54:18.685  RECALIBRATE [RET-4]
  00 00 00 00 00 00 00 00      00:00:07.736  NOP [Abort queued commands]
  60 00 00 c0 ae 15 40 00      00:00:07.736  READ FPDMA QUEUED
  60 00 00 c0 ad 15 40 00      00:00:07.736  READ FPDMA QUEUED
  60 00 00 c0 ac 15 40 00      00:00:07.736  READ FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 2709 hours (112 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  18 9f 18 9f 18 f0 18 9f      10:54:18.685  RECALIBRATE [RET-4]
  00 00 00 00 00 00 00 00      00:00:05.506  NOP [Abort queued commands]
  60 00 00 e8 2e 19 40 00      00:00:05.507  READ FPDMA QUEUED
  60 00 00 e8 2d 19 40 00      00:00:05.507  READ FPDMA QUEUED
  60 00 00 e8 2c 19 40 00      00:00:05.507  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 2709 hours (112 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 40

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  18 9f 18 9f 18 f0 18 9f      10:54:18.685  RECALIBRATE [RET-4]
  00 00 00 00 00 00 00 00      00:00:05.096  NOP [Abort queued commands]
  60 00 00 d8 20 b9 40 00      00:00:05.097  READ FPDMA QUEUED
  60 00 00 d8 1f b9 40 00      00:00:05.097  READ FPDMA QUEUED
  60 00 00 d8 1e b9 40 00      00:00:05.097  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      2017         -
# 2  Extended offline    Aborted by host               90%      2016         -
# 3  Short offline       Completed without error       00%      2014         -
# 4  Vendor (0x50)       Completed without error       00%      1081         -
# 5  Short offline       Completed without error       00%      1081         -
# 6  Vendor (0x50)       Completed without error       00%       283         -
# 7  Short offline       Completed without error       00%       283         -
# 8  Vendor (0x50)       Completed without error       00%       217         -
# 9  Short offline       Completed without error       00%       217         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Aborted_by_host [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@predator:~#

It actually looks fairly undramatic. Do I still need to worry?

Regards,
Jörg

heisenberg · Post by **heisenberg** » 17.02.2025 18:09:24

Hello Jörg,

could you post the output again inside [code] ... [/code] instead of [quote] ... [/quote]? Then the columns of the smartctl values line up cleanly and can be read at a glance. As it stands, I'm not going to go to the trouble of picking it apart.

Example:

Code:

smartctl -A /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-29-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       1458
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       324
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       4
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       42713
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       300
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       67
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       99
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       110
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       482
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       105269
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       60373
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       78030

Thanks,
h.

GregorS · Post by **GregorS** » 17.02.2025 18:15:57

altmetaller wrote:
17.02.2025 17:58:08
The e-mails were about 7 months old, and I have no problems with the system whatsoever.
...
It actually looks fairly undramatic. Do I still need to worry?

I haven't looked at the S.M.A.R.T. values; I don't know enough about them. The e-mails would be interesting, though. You didn't delete them, did you?
One request: put screen output in code tags rather than quote tags. That makes it easier to read.

heisenberg · Post by **heisenberg** » 17.02.2025 18:19:18

I pasted the output straight into ChatGPT and got the following answer:

https://chatgpt.com/share/67b36edd-842c ... 9088dc8c8b

I agree with the assessment that the Load_Cycle_Count is far too high. That is not good! You may have a problem with the drive's power-saving mode being set too aggressively. I also agree with the point about the CRC errors and with the recommendation to replace the SATA cable.

The other statements seem plausible as well.

altmetaller · Post by **altmetaller** » 17.02.2025 18:54:30

GregorS wrote:
17.02.2025 18:15:57
The e-mails would be interesting, though. You didn't delete them, did you?

That almost sounds like an accusation, doesn't it...

Thanks for the hint about the quotes.

Code:

From root@predator Sat Jul 20 01:56:31 2024
Return-Path: <root@predator>
X-Original-To: root
Delivered-To: root@predator.lan.tux-net
Received: by predator.lan.tux-net (Postfix, from userid 0)
	id 4C9D216E03AA; Sat, 20 Jul 2024 01:56:31 +0200 (CEST)
Subject: SMART error (FailedReadSmartData) detected on host: predator
To: root@predator.lan.tux-net
User-Agent: mail (GNU Mailutils 3.14)
Date: Sat, 20 Jul 2024 01:56:31 +0200
Message-Id: <20240719235631.4C9D216E03AA@predator.lan.tux-net>
From: root <root@predator>
Status: R
X-IMAPbase:           1739810963                    3
X-UID: 1

This message was generated by the smartd daemon running on:

   host name:  predator
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sdc [SAT], failed to read SMART Attribute Data

Device info:
ST500LM012 HN-M500MBB, S/N:S2ZYJ9AF516031, WWN:5-0004cf-20d4b0c6d, FW:2BA30001, 500 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

From root@predator Sat Jul 20 02:26:29 2024
Return-Path: <root@predator>
X-Original-To: root
Delivered-To: root@predator.lan.tux-net
Received: by predator.lan.tux-net (Postfix, from userid 0)
	id DD00D16E0827; Sat, 20 Jul 2024 02:26:29 +0200 (CEST)
Subject: SMART error (ErrorCount) detected on host: predator
To: root@predator.lan.tux-net
User-Agent: mail (GNU Mailutils 3.14)
Date: Sat, 20 Jul 2024 02:26:29 +0200
Message-Id: <20240720002629.DD00D16E0827@predator.lan.tux-net>
From: root <root@predator>
Status: R
X-UID: 2

This message was generated by the smartd daemon running on:

   host name:  predator
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sdc [SAT], ATA error count increased from 0 to 1

Device info:
ST500LM012 HN-M500MBB, S/N:S2ZYJ9AF516031, WWN:5-0004cf-20d4b0c6d, FW:2BA30001, 500 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

root@predator:~#

tobo · Post by **tobo** » 17.02.2025 19:13:50

altmetaller wrote:
17.02.2025 17:58:08
Hi there,

I logged in as root today for the first time in a long while and found two e-mails in the root account warning about disk problems. The e-mails were about 7 months old, and I have no problems with the system whatsoever.

If you are root and the only user on a machine, it may make sense to have system mail delivered to your regular user (not to root). This works after running the following two commands (<USER> must be replaced with the appropriate user name, and newaliases requires an installed MTA):

Code:

# printf "root: <USER>@$HOSTNAME\n" >>/etc/aliases
# newaliases

rhHeini · Post by **rhHeini** » 17.02.2025 19:39:57

Looking at the Samsung, I notice that it has a high UDMA_CRC_Error_Count, which could point to a cabling problem.

Both disks have a Start_Stop_Count that is clearly too high for my taste. It looks as if they are being switched off and on 4 times an hour.
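
The "4 times an hour" estimate can be checked against the posted raw values by dividing each counter by Power_On_Hours. The following is only a minimal sketch; `per_hour` is a hypothetical helper name, and the numbers are copied from the two smartctl outputs above.

```shell
#!/bin/sh
# Events per power-on hour, computed from two SMART raw values.
# per_hour is a hypothetical helper, not part of smartmontools.
per_hour() {
    # $1 = raw event count, $2 = Power_On_Hours raw value
    LC_ALL=C awk -v count="$1" -v hours="$2" \
        'BEGIN { printf "%.2f\n", count / hours }'
}

# Raw values copied from the smartctl output posted in this thread.
echo "sdb (Toshiba) Start_Stop_Count per hour: $(per_hour 36540 8011)"
echo "sdb (Toshiba) Load_Cycle_Count per hour: $(per_hour 81033 8011)"
echo "sdc (Samsung) Start_Stop_Count per hour: $(per_hour 10237 2901)"
echo "sdc (Samsung) Load_Cycle_Count per hour: $(per_hour 71556 2901)"
```

For the Toshiba this works out to roughly 4.6 start/stops and about 10 load cycles per power-on hour; for the Samsung, roughly 3.5 start/stops and about 25 load cycles per hour.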

cosinus · Post by **cosinus** » 17.02.2025 19:49:25

rhHeini wrote:
17.02.2025 19:39:57
Both disks have a Start_Stop_Count that is clearly too high for my taste. It looks as if they are being switched off and on 4 times an hour.

I noticed that too. The 1 TB disk even has a Start_Stop_Count of 36540.

It also has two reallocated sectors. I think its best days are behind it.
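
To pick out counters like these from a long attribute table, a small filter can help. This is a minimal sketch, not an official tool: `flag_attrs` is a hypothetical helper, and the set of attribute IDs it watches (5, 196, 197, 198, 199) is an assumption about which counters are worth a closer look. It expects `smartctl -A` style output on stdin and prints ID, name and raw value for watched attributes with a non-zero raw value.

```shell
#!/bin/sh
# Print selected SMART attributes whose raw value is non-zero.
# The watched IDs are an assumption; adjust them as needed.
flag_attrs() {
    LC_ALL=C awk '
        BEGIN {
            n = split("5 196 197 198 199", ids, " ")
            for (i = 1; i <= n; i++) watch[ids[i]] = 1
        }
        # $1 = attribute ID, $2 = attribute name, $10 = raw value
        ($1 in watch) && ($10 + 0 > 0) { print $1, $2, $10 }'
}

# Sample lines copied from the /dev/sdc output in this thread.
flag_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0036   096   096   000    Old_age   Always       -       2371
EOF
```

On a live system the same function could be fed with `smartctl -A /dev/sdX | flag_attrs` (as root), where `/dev/sdX` stands for the disk in question.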

MSfree · Post by **MSfree** » 17.02.2025 19:58:33

cosinus wrote:
17.02.2025 19:49:25
I noticed that too. The 1 TB disk even has a Start_Stop_Count of 36540.

Those are 2.5" notebook disks, which come preconfigured for aggressive power saving. hdparm should be able to remedy that.
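
What that might look like with hdparm, as a hedged sketch only: `hdparm -B` queries or sets Advanced Power Management (1 is the most aggressive level, 254 the least, 255 turns APM off where the drive allows it), and `hdparm -S 0` disables the standby timeout. The device path and the level 254 are assumptions, not values from this thread, and whether a given drive or USB bridge honors these settings is not something the posted output shows. The helper names `apm_commands` and `run_apm` are hypothetical; by default the script only prints the commands for review.

```shell
#!/bin/sh
# Sketch: relax aggressive power management on a notebook disk.
# Device path and APM level are assumptions, not taken from the thread.
apm_commands() {
    dev="$1"
    level="${2:-254}"
    echo "hdparm -B $dev"          # query the current APM level
    echo "hdparm -B $level $dev"   # set APM (254 = least aggressive)
    echo "hdparm -S 0 $dev"        # disable the standby (spindown) timeout
}

# Dry run by default: print the commands instead of executing them.
# Set APPLY=1 and run as root to actually execute them.
run_apm() {
    apm_commands "$@" | while read -r cmd; do
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
}

run_apm /dev/sdb
```

Settings made this way are generally not persistent across power cycles, so they would have to be reapplied at boot if they turn out to help.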

cosinus · Post by **cosinus** » 17.02.2025 20:22:00

MSfree wrote:
17.02.2025 19:58:33
hdparm should be able to remedy that.

At this point I would rather replace it with an SSD.

altmetaller · Post by **altmetaller** » 17.02.2025 21:39:06

Thanks for the answers.

I'm a bit more relaxed now: it is not the SSD or the internal disk of the current laptop, but an ancient disk from a gutted laptop.

At some point I hooked it up to the notebook via USB as a backup target, and I didn't see it earlier because it was lying behind the opened display. Fire and forget.

Let's see how long it keeps going. Only the good die young.

altmetaller · Post by **altmetaller** » 17.02.2025 21:57:08

cosinus wrote:
17.02.2025 19:49:25
The 1 TB disk even has a Start_Stop_Count of 36540.

That one, in turn, is one of those 2.5" disks installed as a second hard disk in a notebook (which has an mSATA SSD as its system drive). I also use it only as a backup target: the system runs a few Windows VMs, which push a backup to this disk via VEEAM every time they shut down. Since all of the VMs are just playthings, that is not so dramatic either.

wanne · Post by **wanne** » 18.02.2025 09:15:13

The first disk is OK. The second one evidently caused trouble about 200 operating hours ago. That doesn't have to mean it will die soon, but you probably won't get a clearer hint than this. It's a bit of a shame that you didn't let the self-test run to completion. In my experience, the claim about the SATA cable is a bold one. SATA uses forward error correction for transmission, followed by CRC for error detection. My impression is that most disks can no longer be addressed at all once the cable is so bad that the former no longer suffices, and that CRC errors are triggered elsewhere: faulty RAM, or a disk that computes incorrectly. Here, too, you can see various other errors occurring: Load_Retry_Count, Multi_Zone_Error_Rate, 7 errors in the error log. That is the picture I see practically every time CRC errors show up.
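
The "about 200 operating hours ago" figure follows from the posted log: the errors were recorded at 2709 to 2710 power-on hours, and Power_On_Hours currently reads 2901. A minimal sketch of that subtraction, assuming the error log format shown above; `error_age` is a hypothetical helper name.

```shell
#!/bin/sh
# For each entry of a smartctl error log on stdin, print how many
# power-on hours ago it was recorded.
# $1 = current Power_On_Hours raw value.
error_age() {
    LC_ALL=C awk -v now="$1" '
        /^Error [0-9]+ occurred at disk power-on lifetime:/ {
            # $2 = error number, $8 = power-on hours at the time of the error
            printf "Error %d: %d hours ago\n", $2, now - $8
        }'
}

# Lines copied from the /dev/sdc error log in this thread.
error_age 2901 <<'EOF'
Error 7 occurred at disk power-on lifetime: 2710 hours (112 days + 22 hours)
Error 3 occurred at disk power-on lifetime: 2709 hours (112 days + 21 hours)
EOF
```

With the values from the thread this gives 191 and 192 hours, which matches the rough figure of 200.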

altmetaller · Post by **altmetaller** » 18.02.2025 09:42:28

wanne wrote:
18.02.2025 09:15:13
It's a bit of a shame that you didn't let the self-test run to completion.

Why a shame? I can still do that. Can't I?

wanne · Post by **wanne** » 18.02.2025 15:20:51

altmetaller wrote:
18.02.2025 09:42:28
Why a shame? I can still do that. Can't I?

It was almost finished, and 114 minutes is not exactly a trivial amount of time.

debianforum.de
