Difference between revisions of "Smartmontools"

From Finninday

Revision as of 03:23, 24 November 2010


The server recently froze and then refused to boot after several attempts. At one point, it froze during LILO at the point of "LIL". I was convinced that I had a hard disk failure and bought a new drive. When I tried to boot after getting back from the store, the server came up.

Now I'm afraid to reboot since I'm sure it will fail at the next power cycle. My goal is to get as much of a backup as I possibly can without a reboot, and maybe even without unmounting the filesystem.

I've got snapshots of the important things and now it is time to get serious.

smartctl

I installed smartmontools and got this report from the suspect drive:

sda

<pre>
root@weasel:/var/run# smartctl -a -d ata /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

START OF INFORMATION SECTION

Model Family:     Seagate Barracuda 7200.8 family
Device Model:     ST3250823AS
Serial Number:    4ND06C1Q
Firmware Version: 3.03
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue May 20 13:11:39 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

START OF READ SMART DATA SECTION

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity

                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

                                       without error or no self-test has ever 
                                       been run.

Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.

                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       No Conveyance Self-test supported.
                                       Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

                                       power-saving mode.
                                       Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

                                       General Purpose Logging supported.

Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 84) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   050   046   006    Pre-fail  Always       -       67789707
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       78856544
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       871
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       39
194 Temperature_Celsius     0x0022   039   047   000    Old_age   Always       -       39 (Lifetime Min/Max 0/20)
195 Hardware_ECC_Recovered  0x001a   050   046   000    Old_age   Always       -       67789707
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1

       CR = Command Register [HEX]
       FR = Features Register [HEX]
       SC = Sector Count Register [HEX]
       SN = Sector Number Register [HEX]
       CL = Cylinder Low Register [HEX]
       CH = Cylinder High Register [HEX]
       DH = Device/Head Register [HEX]
       DC = Device Command Register [HEX]
       ER = Error register [HEX]
       ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 380 hours (15 days + 20 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 6d a1 d7 e0  Error: ICRC, ABRT at LBA = 0x00d7a16d = 14131565
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 66 a1 d7 e0 00      04:38:40.754  READ DMA EXT
 25 00 10 76 ff bb e0 00      04:38:40.753  READ DMA EXT
 25 00 08 ce ee bb e0 00      04:38:40.749  READ DMA EXT
 35 00 08 be 19 08 e0 00      04:38:40.749  WRITE DMA EXT
 35 00 68 56 19 08 e0 00      04:38:40.744  WRITE DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
</pre>
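
The "at LBA" figure in the error log can be rebuilt from the register bytes printed on the same line. The following is only a minimal sketch, assuming the 28-bit register layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27). The function name `lba28_from_registers` is made up for illustration and is not part of smartmontools. READ DMA EXT is a 48-bit command, so the upper address bytes, which the log does not show, are ignored here.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA register bytes into a 28-bit LBA (hypothetical helper).

    Assumed layout: SN holds bits 0-7, CL bits 8-15, CH bits 16-23,
    and the low nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from the "Error 1" entry of the sda report:
# ER ST SC SN CL CH DH = 84 51 00 6d a1 d7 e0
lba = lba28_from_registers(sn=0x6D, cl=0xA1, ch=0xD7, dh=0xE0)
print(f"LBA = 0x{lba:08x} = {lba}")  # LBA = 0x00d7a16d = 14131565
```

The result matches the value smartctl printed for that error, which suggests the layout assumption holds for this log.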

For comparison, here is the report for a newer disk in the same system:

sdb

<pre>
root@weasel:/var/run# smartctl -a -d ata /dev/sdb
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

START OF INFORMATION SECTION

Model Family:     Seagate Barracuda 7200.8 family
Device Model:     ST3250823AS
Serial Number:    4ND05VJS
Firmware Version: 3.03
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue May 20 13:17:34 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

START OF READ SMART DATA SECTION

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity

                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

                                       without error or no self-test has ever 
                                       been run.

Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.

                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       No Conveyance Self-test supported.
                                       Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

                                       power-saving mode.
                                       Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

                                       General Purpose Logging supported.

Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 84) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   048   046   006    Pre-fail  Always       -       121096821
  3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       41
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       349296371
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21740
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       44
194 Temperature_Celsius     0x0022   040   051   000    Old_age   Always       -       40 (Lifetime Min/Max 0/20)
195 Hardware_ECC_Recovered  0x001a   048   046   000    Old_age   Always       -       121096821
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   153   000    Old_age   Always       -       71
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 67 (device log contains only the most recent five errors)

       CR = Command Register [HEX]
       FR = Features Register [HEX]
       SC = Sector Count Register [HEX]
       SN = Sector Number Register [HEX]
       CL = Cylinder Low Register [HEX]
       CH = Cylinder High Register [HEX]
       DH = Device/Head Register [HEX]
       DC = Device Command Register [HEX]
       ER = Error register [HEX]
       ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 67 occurred at disk power-on lifetime: 21647 hours (901 days + 23 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 01 00 00 00 e0 00      00:07:03.246  READ DMA EXT
 91 00 3f 00 00 00 ef 00      00:07:02.967  INITIALIZE DEVICE PARAMETERS [OBS-6]
 c6 00 10 00 00 00 e0 00      00:07:02.670  SET MULTIPLE MODE
 91 00 3f 00 00 00 ef 00      00:04:06.868  INITIALIZE DEVICE PARAMETERS [OBS-6]
 10 00 00 00 00 00 e0 00      00:04:06.868  RECALIBRATE [OBS-4]

Error 66 occurred at disk power-on lifetime: 14169 hours (590 days + 9 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 01 00 00 00 e0 00      00:00:11.079  READ DMA EXT
 91 00 3f 00 00 00 ef 00      00:00:11.078  INITIALIZE DEVICE PARAMETERS [OBS-6]
 c6 00 10 00 00 00 e0 00      00:00:10.816  SET MULTIPLE MODE
 91 00 3f 00 00 00 ef 00      00:00:08.392  INITIALIZE DEVICE PARAMETERS [OBS-6]
 10 00 00 00 00 00 e0 00   1d+13:16:57.727  RECALIBRATE [OBS-4]

Error 65 occurred at disk power-on lifetime: 14166 hours (590 days + 6 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 ce 01 10 e0  Error: ICRC, ABRT at LBA = 0x001001ce = 1049038
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 c7 01 10 e0 00      16:27:15.928  READ DMA EXT
 25 00 08 c7 01 10 e0 00      16:27:15.928  READ DMA EXT
 35 00 08 07 02 98 e0 00      16:27:15.922  WRITE DMA EXT
 35 00 08 cf 01 98 e0 00      16:27:15.918  WRITE DMA EXT
 35 00 08 47 ff ae e0 00      16:27:15.918  WRITE DMA EXT

Error 64 occurred at disk power-on lifetime: 14166 hours (590 days + 6 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 ce 01 10 e0  Error: ICRC, ABRT at LBA = 0x001001ce = 1049038
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 c7 01 10 e0 00      16:27:15.928  READ DMA EXT
 35 00 08 07 02 98 e0 00      16:27:15.928  WRITE DMA EXT
 35 00 08 cf 01 98 e0 00      16:27:15.922  WRITE DMA EXT
 35 00 08 47 ff ae e0 00      16:27:15.918  WRITE DMA EXT
 35 00 08 f7 f1 ae e0 00      16:27:15.918  WRITE DMA EXT

Error 63 occurred at disk power-on lifetime: 13553 hours (564 days + 17 hours)

 When the command that caused the error occurred, the device was active or idle.
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 00 26 0b ac e0  Error: ICRC, ABRT at LBA = 0x00ac0b26 = 11275046
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 1f 0b ac e0 00      02:35:40.734  READ DMA EXT
 25 00 08 1f 0b ac e0 00      02:35:40.285  READ DMA EXT
 25 00 08 1f 0b ac e0 00      02:35:39.835  READ DMA EXT
 25 00 08 1f 0b ac e0 00      02:35:39.395  READ DMA EXT
 25 00 08 1f 0b ac e0 00      02:35:38.942  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
</pre>
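
Two bits of arithmetic in these reports are easy to check by hand: the "N hours (D days + H hours)" lifetime stamps and the 49.710-day wrap of Powered_Up_Time. A small sketch follows. The helper name `split_hours` is invented for illustration, and the guess that the wrap comes from a 32-bit millisecond counter is an assumption that happens to reproduce the printed figure, not something the report states.

```python
def split_hours(hours):
    """Split a power-on lifetime in hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Lifetime stamps copied from the error logs above.
for hours in (380, 21647, 14169):
    days, rest = split_hours(hours)
    print(f"{hours} hours ({days} days + {rest} hours)")

# Assumption: Powered_Up_Time is a 32-bit count of milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000
wrap_days = 2**32 / MS_PER_DAY
print(f"wraps after {wrap_days:.3f} days")  # wraps after 49.710 days
```

The loop reproduces the stamps shown for sda's Error 1 and sdb's Errors 67 and 66.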

sdc

<pre>
root@weasel:~# smartctl -a -d ata /dev/sdc
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

START OF INFORMATION SECTION

Device Model:     WDC WD5000AACS-00ZUB0
Serial Number:    WD-WCASU1903539
Firmware Version: 01.01B01
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Sep 29 15:47:01 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

START OF READ SMART DATA SECTION

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity

                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

                                       without error or no self-test has ever 
                                       been run.

Total time to complete Offline data collection: (13980) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.

                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       Conveyance Self-test supported.
                                       Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

                                       power-saving mode.
                                       Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

                                       General Purpose Logging supported.

Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 163) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   166   164   021    Pre-fail  Always       -       4691
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2316
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5893
194 Temperature_Celsius     0x0022   107   100   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

</pre>

sdd

<pre>
root@weasel:~# smartctl -a -d ata /dev/sdd
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

START OF INFORMATION SECTION

Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630AS
Serial Number:    5QG2B7HE
Firmware Version: 3.AAK
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Sep 29 15:47:24 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

START OF READ SMART DATA SECTION

SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity

                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

                                       without error or no self-test has ever 
                                       been run.

Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.

                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       No Conveyance Self-test supported.
                                       Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

                                       power-saving mode.
                                       Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

                                       General Purpose Logging supported.

Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   103   087   006    Pre-fail  Always       -       5641135
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       132546848
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2292
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   048   044   045    Old_age   Always   In_the_past 891748404
194 Temperature_Celsius     0x0022   052   056   000    Old_age   Always       -       52 (Lifetime Min/Max 0/33)
195 Hardware_ECC_Recovered  0x001a   060   047   000    Old_age   Always       -       13571444
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing

Selective self-test flags (0x0):

 After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

</pre>

Oddly, the disk that I don't suspect to be bad (sdb) is the one with the most logged errors. But my suspicion of sda is supported by how long the two drives take to run the short self-test. The suspected bad disk (sda) takes about 5 minutes to run a self-test that should complete in 1 minute. The suspected good disk (sdb) completes the self-test in under a minute. Both complete with no errors.
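
All four drives report PASSED overall, so the per-attribute columns are worth a second look. Below is a rough sketch that reads attribute rows like the ones above and lists any whose WORST value has reached THRESH. The function names and the three sample rows (copied from the sdd report) are only for illustration, and the "WORST <= THRESH" rule is my reading of when smartctl prints In_the_past, so treat it as an assumption.

```python
# Three rows copied from the sdd attribute table above.
SAMPLE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
190 Temperature_Celsius     0x0022   048   044   045    Old_age   Always   In_the_past 891748404
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Parse whitespace-separated smartctl attribute rows into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": " ".join(fields[9:]),
        })
    return rows


def marginal(rows):
    """Rows whose WORST value has reached a non-zero THRESH (assumed rule)."""
    return [r for r in rows if r["thresh"] > 0 and r["worst"] <= r["thresh"]]


for row in marginal(parse_attributes(SAMPLE_ROWS)):
    print(row["id"], row["name"], "worst", row["worst"], "thresh", row["thresh"])
```

On the sample rows this picks out only attribute 190 on sdd, the same row smartctl marks In_the_past.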

==hdparm -I==

Here is the output of hdparm -I for the suspect drive:

===sda===

<pre>
root@weasel:/var/run# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media

       Model Number:       ST3250823AS                             
       Serial Number:      4ND06C1Q
       Firmware Revision:  3.03    

Standards:

       Supported: 7 6 5 4 
       Likely used: 7

Configuration:

       Logical         max     current
       cylinders       16383   16383
       heads           16      16
       sectors/track   63      63
       --
       CHS current addressable sectors:   16514064
       LBA    user addressable sectors:  268435455
       LBA48  user addressable sectors:  488397168
       device size with M = 1024*1024:      238475 MBytes
       device size with M = 1000*1000:      250059 MBytes (250 GB)

Capabilities:

       LBA, IORDY(can be disabled)
       Queue depth: 32
       Standby timer values: spec'd by Standard, no device specific minimum
       R/W multiple sector transfer: Max = 16  Current = 16
       Recommended acoustic management value: 128, current value: 0
       DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
            Cycle time: min=120ns recommended=120ns
       PIO: pio0 pio1 pio2 pio3 pio4 
            Cycle time: no flow control=240ns  IORDY flow control=120ns

Commands/features:

       Enabled Supported:
          *    SMART feature set
               Security Mode feature set
          *    Power Management feature set
          *    Write cache
          *    Look-ahead
          *    Host Protected Area feature set
          *    WRITE_BUFFER command
          *    READ_BUFFER command
          *    DOWNLOAD_MICROCODE
               SET_MAX security extension
          *    48-bit Address feature set
          *    Device Configuration Overlay feature set
          *    Mandatory FLUSH_CACHE
          *    FLUSH_CACHE_EXT
          *    SMART error logging
          *    SMART self-test
          *    General Purpose Logging feature set
          *    SATA-I signaling speed (1.5Gb/s)
          *    Native Command Queueing (NCQ)
          *    Phy event counters
          *    Software settings preservation

Security:

       Master password revision code = 65534
               supported
       not     enabled
       not     locked
       not     frozen
       not     expired: security count
       not     supported: enhanced erase

Checksum: correct
</pre>
And here is the corresponding output for a supposedly good drive:

===sdb===

<pre>
root@weasel:/var/run# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media

       Model Number:       ST3250823AS                             
       Serial Number:      4ND05VJS
       Firmware Revision:  3.03    

Standards:

       Supported: 7 6 5 4 
       Likely used: 7

Configuration:

       Logical         max     current
       cylinders       16383   16383
       heads           16      16
       sectors/track   63      63
       --
       CHS current addressable sectors:   16514064
       LBA    user addressable sectors:  268435455
       LBA48  user addressable sectors:  488397168
       device size with M = 1024*1024:      238475 MBytes
       device size with M = 1000*1000:      250059 MBytes (250 GB)

Capabilities:

       LBA, IORDY(can be disabled)
       Queue depth: 32
       Standby timer values: spec'd by Standard, no device specific minimum
       R/W multiple sector transfer: Max = 16  Current = 16
       Recommended acoustic management value: 128, current value: 0
       DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
            Cycle time: min=120ns recommended=120ns
       PIO: pio0 pio1 pio2 pio3 pio4 
            Cycle time: no flow control=240ns  IORDY flow control=120ns

Commands/features:

       Enabled Supported:
          *    SMART feature set
               Security Mode feature set
          *    Power Management feature set
          *    Write cache
          *    Look-ahead
          *    Host Protected Area feature set
          *    WRITE_BUFFER command
          *    READ_BUFFER command
          *    DOWNLOAD_MICROCODE
               SET_MAX security extension
          *    48-bit Address feature set
          *    Device Configuration Overlay feature set
          *    Mandatory FLUSH_CACHE
          *    FLUSH_CACHE_EXT
          *    SMART error logging
          *    SMART self-test
          *    General Purpose Logging feature set
          *    SATA-I signaling speed (1.5Gb/s)
          *    Native Command Queueing (NCQ)
          *    Phy event counters
          *    Software settings preservation

Security:

       Master password revision code = 65534
               supported
       not     enabled
       not     locked
       not     frozen
       not     expired: security count
       not     supported: enhanced erase

Checksum: correct
</pre>
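The sda and sdb reports are long and nearly identical, so a line-by-line diff is a quicker way to spot differences than reading both. This is a minimal sketch using Python's standard <code>difflib</code>; the helper name <code>differing_lines</code> is hypothetical, and the two strings are short excerpts of the reports above rather than the full output.

```python
import difflib


def differing_lines(report_a, report_b):
    """Return only the lines that differ between two hdparm -I reports."""
    # Strip indentation and drop blank lines so layout differences are ignored.
    a = [ln.strip() for ln in report_a.splitlines() if ln.strip()]
    b = [ln.strip() for ln in report_b.splitlines() if ln.strip()]
    # ndiff marks removed lines with "- " and added lines with "+ ";
    # unchanged ("  ") and hint ("? ") lines are filtered out.
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


SDA = """\
Model Number: ST3250823AS
Serial Number: 4ND06C1Q
Firmware Revision: 3.03
"""

SDB = """\
Model Number: ST3250823AS
Serial Number: 4ND05VJS
Firmware Revision: 3.03
"""

for line in differing_lines(SDA, SDB):
    print(line)
```

Read by eye, the full sda and sdb reports appear to differ only in the device name and the serial number, so hdparm -I by itself does not distinguish the suspect drive from the good one.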

===sdc===

<pre>
root@weasel:~# hdparm -I /dev/sdc

/dev/sdc:

ATA device, with non-removable media

       Model Number:       WDC WD5000AACS-00ZUB0                   
       Serial Number:      WD-WCASU1903539
       Firmware Revision:  01.01B01
       Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Standards:

       Supported: 8 7 6 5 
       Likely used: 8

Configuration:

       Logical         max     current
       cylinders       16383   16383
       heads           16      16
       sectors/track   63      63
       --
       CHS current addressable sectors:   16514064
       LBA    user addressable sectors:  268435455
       LBA48  user addressable sectors:  976773168
       device size with M = 1024*1024:      476940 MBytes
       device size with M = 1000*1000:      500107 MBytes (500 GB)

Capabilities:

       LBA, IORDY(can be disabled)
       Queue depth: 32
       Standby timer values: spec'd by Standard, with device specific minimum
       R/W multiple sector transfer: Max = 16  Current = 16
       Recommended acoustic management value: 128, current value: 254
       DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
            Cycle time: min=120ns recommended=120ns
       PIO: pio0 pio1 pio2 pio3 pio4 
            Cycle time: no flow control=120ns  IORDY flow control=120ns

Commands/features:

       Enabled Supported:
          *    SMART feature set
               Security Mode feature set
          *    Power Management feature set
          *    Write cache
          *    Look-ahead
          *    Host Protected Area feature set
          *    WRITE_BUFFER command
          *    READ_BUFFER command
          *    NOP cmd
          *    DOWNLOAD_MICROCODE
               Power-Up In Standby feature set
          *    SET_FEATURES required to spinup after power up
               SET_MAX security extension
               Automatic Acoustic Management feature set
          *    48-bit Address feature set
          *    Device Configuration Overlay feature set
          *    Mandatory FLUSH_CACHE
          *    FLUSH_CACHE_EXT
          *    SMART error logging
          *    SMART self-test
          *    General Purpose Logging feature set
          *    64-bit World wide name
          *    {READ,WRITE}_DMA_EXT_GPL commands
          *    Segmented DOWNLOAD_MICROCODE
          *    SATA-I signaling speed (1.5Gb/s)
          *    SATA-II signaling speed (3.0Gb/s)
          *    Native Command Queueing (NCQ)
          *    Host-initiated interface power management
          *    Phy event counters
               DMA Setup Auto-Activate optimization
          *    Software settings preservation
          *    SMART Command Transport (SCT) feature set
          *    SCT Long Sector Access (AC1)
          *    SCT LBA Segment Access (AC2)
          *    SCT Error Recovery Control (AC3)
          *    SCT Features Control (AC4)
          *    SCT Data Tables (AC5)
               unknown 206[12] (vendor specific)
               unknown 206[13] (vendor specific)

Security:

       Master password revision code = 65534
               supported
       not     enabled
       not     locked
       not     frozen
       not     expired: security count
               supported: enhanced erase
       142min for SECURITY ERASE UNIT. 142min for ENHANCED SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 50014ee25668800e

       NAA             : 5
       IEEE OUI        : 14ee
       Unique ID       : 25668800e

Checksum: correct

</pre>

===sdd===

<pre>
root@weasel:~# hdparm -I /dev/sdd

/dev/sdd:

ATA device, with non-removable media

       Model Number:       ST3500630AS                             
       Serial Number:      5QG2B7HE
       Firmware Revision:  3.AAK   

Standards:

       Supported: 7 6 5 4 
       Likely used: 7

Configuration:

       Logical         max     current
       cylinders       16383   16383
       heads           16      16
       sectors/track   63      63
       --
       CHS current addressable sectors:   16514064
       LBA    user addressable sectors:  268435455
       LBA48  user addressable sectors:  976773168
       device size with M = 1024*1024:      476940 MBytes
       device size with M = 1000*1000:      500107 MBytes (500 GB)

Capabilities:

       LBA, IORDY(can be disabled)
       Queue depth: 32
       Standby timer values: spec'd by Standard, no device specific minimum
       R/W multiple sector transfer: Max = 16  Current = 16
       Recommended acoustic management value: 254, current value: 0
       DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
            Cycle time: min=120ns recommended=120ns
       PIO: pio0 pio1 pio2 pio3 pio4 
            Cycle time: no flow control=120ns  IORDY flow control=120ns

Commands/features:

       Enabled Supported:
          *    SMART feature set
               Security Mode feature set
          *    Power Management feature set
          *    Write cache
          *    Look-ahead
          *    Host Protected Area feature set
          *    WRITE_BUFFER command
          *    READ_BUFFER command
          *    DOWNLOAD_MICROCODE
               SET_MAX security extension
          *    48-bit Address feature set
          *    Device Configuration Overlay feature set
          *    Mandatory FLUSH_CACHE
          *    FLUSH_CACHE_EXT
          *    SMART error logging
          *    SMART self-test
          *    General Purpose Logging feature set
          *    SATA-I signaling speed (1.5Gb/s)
          *    Native Command Queueing (NCQ)
          *    Phy event counters
               Device-initiated interface power management
          *    Software settings preservation

Security:

       Master password revision code = 65534
               supported
       not     enabled
       not     locked
       not     frozen
       not     expired: security count
       not     supported: enhanced erase

Checksum: correct

</pre>
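The numbers in the Configuration blocks above can be cross-checked with a little arithmetic. The sketch below assumes 512-byte sectors, which this version of hdparm does not print, so treat that constant as an assumption; the function names are made up for illustration. It reproduces the CHS sector count, the 28-bit LBA ceiling, and both device-size lines for the 250 GB and 500 GB drives.

```python
SECTOR_BYTES = 512  # assumption: sector size is not shown in the output above


def chs_sectors(cylinders, heads, sectors_per_track):
    """Sectors addressable through the legacy cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track


def size_mbytes(sectors, unit):
    """Device size in whole units of `unit` bytes, rounded down."""
    return sectors * SECTOR_BYTES // unit


# "CHS current addressable sectors" for every drive on this page
print(chs_sectors(16383, 16, 63))

# "LBA user addressable sectors" is the largest 28-bit sector count
print(2**28 - 1)

# "LBA48 user addressable sectors" for the 250 GB and 500 GB drives
for lba48 in (488397168, 976773168):
    print(lba48,
          size_mbytes(lba48, 1024 * 1024),   # "M = 1024*1024" line
          size_mbytes(lba48, 1000 * 1000))   # "M = 1000*1000" line
```

All four drives report the same CHS and 28-bit LBA figures because those fields are capped; only the LBA48 count reflects the real capacity.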