MSFN Forum: Seagate Barracuda 7200.11 Troubles - MSFN Forum

Hard Drive and Removable Media issues Rules

If you have questions about Seagate 7200.11, do read the READ_ME_FIRST, then read the FGA. If your questions remain unanswered after reading those two stickies, then post. For all other Hard Drive and Removable Media issues, you may post right away.

Seagate Barracuda 7200.11 Troubles "Falling down!" (ST3500320AS-SD15 and others) Rate Topic: -----

#1041 User is offline   mikesw 

  • Advanced Member
  • Group: Members
  • Posts: 360
  • Joined: 05-October 05

Posted 27 January 2009 - 12:04 PM

"Today Western Digital is announcing their WD20EADS drive, otherwise known as the WD Caviar Green 2.0TB. With 32MB of onboard cache and special power management algorithms that balance spindle speed and transfer rates, the WD Caviar Green 2TB not only breaks the 2 terabyte barrier but also offers an extremely low-power profile in its standard 3.5" SATA footprint. Early testing shows it keeps pace with similar capacity drives from Seagate and Samsung."

http://hothardware.com/News/WD-2TB-Caviar-...-Drive-Preview/

MSRP for the new ginormous Caviar is set at $299. You can catch the official press release from WD. Stay tuned for the full HH monty with WD's new big-bad Caviar, coming soon. http://wdc.com/en/company/releases/PressRe...3-F872D0E6C335}

spec sheet: http://wdc.com/en/pr...asp?DriveID=576

Warranty policy in various countries for WDC drives (2TB not listed yet) http://support.wdc.c...licy.asp#policy

buy.com has it for $272.00 http://www.pricegrabber.com/wd20eads/produ...20EADS/st=query

:thumbup

This post has been edited by mikesw: 27 January 2009 - 12:21 PM



#1042 User is offline   DrDisk 

  • Group: Members
  • Posts: 3
  • Joined: 24-January 09

Posted 27 January 2009 - 01:02 PM

Well I guess Mr. SanTools, AKA Storagesecrets was just trying to do some PR for Seagate and at the same time gave his website and company a black eye.

Looks like Seagate's entire product line, except the SAS drives, was affected by at least this one bug, but how many others suffer from other bugs?

1.5TB: stutter issue and the log issue
1TB/500GB and others: LBA 0 bug

It's too much to keep count. Sure, there are some people fanning the flames, but almost EVERY SINGLE forum has people complaining, and I'm pretty sure it isn't the SAME people in all groups. You know you have a problem with your hard drive when the Under Water Basket Weaving forums start to have posts talking about the failures.

I hope Seagate paid Dlethe well for his PR spin. Odd that it was just 1 day before the WD announcement. Coincidence?

#1043 User is offline   anonymous 

  • Group: Members
  • Posts: 4
  • Joined: 27-January 09

Posted 27 January 2009 - 02:25 PM

Regarding "320", here's an exchange with Maxtorman from the Slashdot forum:

Maxtorman's explanation (which was apparently correct):

I'll answer your questions to the best of my ability, and as honestly as I can! I'm no statistician, but the 'drive becoming inaccessible at boot-up' is pretty much a very slim chance - but when you have 10 million drives in the field, it does happen. The conditions have to be just right - you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. This is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc... might happen every few days on a computer in a non-RAID home use situation.. and if that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. It'll only stop responding IF the drive is powered up with log file #320 being the latest one written... a perfect storm situation.

IF this is the case, then Seagate is trying to put in place a procedure where you can simply ship them the drive, they hook it up to a serial controller, and re-flash it with the fixed firmware. That's all it takes to restore the drive to operation! As for buying new drives, that's up to you. None of the CC firmware drives were affected - only the SD firmware drives. I'd wait until later in the week, maybe next week, until they have a known working and properly proven firmware update.

If you were to have flashed the drives with the 'bad' firmware - it would disable any read/write functions to the drive, but the drive would still be accessible in BIOS and there is a very good chance that flashing it back to a previous SD firmware (or up to the yet to be released proven firmware) would make it all better. Oh, and RAID0 scares me by its very nature... not an 'if' but 'when' the RAID 0 craps out and all data is lost - but I'm a bit jaded from too much tech support! :)

My question:

Maxtorman, is the log file written after each power-up (or POR) and before each shut down? It seems to me the #320 is being reached by many users in about 100 days... can that really be from only occasional events like bad sectors and missed writes? See this time histogram:

http://www.msfn.org/board/index.php?showto...st&p=826575

Maxtorman's response:

The log, if my information is correct, is written each time a SMART check is done. This will always happen on drive init, but can also happen at regularly scheduled events during normal usage, as the drive has to go through various maintenance functions to keep it calibrated and working properly.
_______________________

Dlethe said, "The problem is a purple squirrel (sorry about the yankee slang -- it means incredibly rare)."

Well, not if you turn your computer off every night, with or without a SMART check, at least in my opinion.

#1044 User is offline   Gradius2 

  • IT Consultant
  • Group: Members
  • Posts: 240
  • Joined: 16-January 09
  • OS:Windows 7 x64

Posted 27 January 2009 - 05:10 PM

DerSnoezie, on Jan 27 2009, 02:05 PM, said:

Gradius2, on Jan 27 2009, 05:11 PM, said:

Now, let's estimate that just 10% of 17.5 million are unlucky and will have this problem, so we have 1.75 million HDDs; in other words, almost 2 million worldwide, I might say.


LOL, that's actually a pretty fat squirrel :lol: But I guess we'll never receive any feedback on the true numbers.


I would estimate that at least 1/3 of them will not bother at all and will just return the drives and replace them with another brand at the first opportunity.

Those people will never post anything or research the problem.

mikesw, on Jan 27 2009, 03:04 PM, said:

"Today Western Digital is announcing their WD20EADS drive, otherwise known as the WD Caviar Green 2.0TB. With 32MB of onboard cache and special power management algorithms that balance spindle speed and transfer rates, the WD Caviar Green 2TB not only breaks the 2 terabyte barrier but also offers an extremely low-power profile in its standard 3.5" SATA footprint. Early testing shows it keeps pace with similar capacity drives from Seagate and Samsung."


Real capacity is 1.81TB (formatted and ready for use).

ATM they are very hard to find.

#1045 User is offline   sieve-x 

  • Newbie
  • Group: Members
  • Posts: 12
  • Joined: 19-January 09

Posted 27 January 2009 - 07:17 PM

Let's look at the root cause description again, stated a bit more clearly... :huh:

An affected drive model and firmware will trigger an assert failure (e.g. not detected
at BIOS) on the next power-up initialization, because the event log pointer gets past
the end of the event log data structure (reserved-area track data corruption), if
the drive contains a particular data pattern (left over from factory testing) and if the
Event Log counter is at entry 320 or at any later entry of the form 320 + x*256.
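
As a rough illustration only, the counter part of that condition can be written as a small check. This is a Python sketch of the description above, not Seagate's firmware logic: the function and constant names are made up for the example, and it says nothing about whether a given drive also carries the factory test pattern.

```python
# Hypothetical names; the values 320 and 256 come from the root cause text.
EVENT_LOG_FIRST_TRIGGER = 320  # first entry at which the assert can fire
EVENT_LOG_WRAP = 256           # the condition repeats every 256 entries after that


def counter_is_at_risk(event_count: int) -> bool:
    """Return True if powering up with this Event Log count would match
    the described trigger (entry 320, or 320 + x*256 for x >= 1)."""
    if event_count < EVENT_LOG_FIRST_TRIGGER:
        return False
    return (event_count - EVENT_LOG_FIRST_TRIGGER) % EVENT_LOG_WRAP == 0


if __name__ == "__main__":
    for count in (319, 320, 321, 576, 832):
        print(count, counter_is_at_risk(count))
```

Only the exact entries match; one more logged event after 320 and the drive is safe again until the next wrap.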


anonymous, on Jan 27 2009, 02:25 PM, said:

My question:
Maxtorman, is the log file written after each power-up (or POR) and before each shut down? It seems to me the #320 is being reached by many users in about 100 days... can that really be from only occasional events like bad sectors and missed writes? See this time histogram:

http://www.msfn.org/board/index.php?showto...st&p=826575

Maxtorman's response:

The log, if my information is correct, is written each time a SMART check is done. This will always happen on drive init, but can also happen at regularly scheduled events during normal usage, as the drive has to go through various maintenance functions to keep it calibrated and working properly.
_______________________


The event log counter could be written every once in a while, for example if S.M.A.R.T. automatic
off-line data collection (e.g. every 4h) is enabled (it is by default, and may include a list of the
last few errors like the example below), along with temperature history, seek error rate and others.

smartctl -l error /dev/sda (data below is an example)

SMART Error Log Version: 1
ATA Error Count: 9 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 occurred at disk power-on lifetime: 6877 hours (286 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 00 ff ff ff 0f

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ea 00 00 ff ff ff af 00      02:00:24.339  FLUSH CACHE EXT
  35 00 10 ff ff ff ef 00      02:00:24.137  WRITE DMA EXT
  35 00 08 ff ff ff ef 00      02:00:24.136  WRITE DMA EXT
  ca 00 10 77 f7 fc ec 00      02:00:24.133  WRITE DMA
  25 00 08 ff ff ff ef 00      02:00:24.132  READ DMA EXT

Error 8 occurred at disk power-on lifetime: 4023 hours (167 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 03 80 01 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 02   2d+04:33:54.009  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02   2d+04:33:54.001  IDENTIFY DEVICE
  00 00 00 00 00 00 00 ff   2d+04:33:53.532  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 02   2d+04:33:47.457  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02   2d+04:33:47.445  IDENTIFY DEVICE

... list goes on until error 5

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


This means that, theoretically, disabling S.M.A.R.T. automatic off-line self-test and attribute
autosave (something like: smartctl -s on -o off -S off /dev/sdX) and disabling S.M.A.R.T. in the
system BIOS (before powering up the drive again), or even disabling the whole S.M.A.R.T. feature
set, could be a workaround until the drive firmware is updated. Crippling S.M.A.R.T. would not be
a permanent solution, because it helps to detect and log drive errors.

smartctl -l directory /dev/sda

Log Directory Supported (this one is from an affected model)

SMART Log Directory Logging Version 1 [multi-sector log support]
Log at address 0x00 has 001 sectors [Log Directory]
Log at address 0x01 has 001 sectors [Summary SMART error log]
Log at address 0x02 has 005 sectors [Comprehensive SMART error log]
Log at address 0x03 has 005 sectors [Extended Comprehensive SMART error log]
Log at address 0x06 has 001 sectors [SMART self-test log]
Log at address 0x07 has 001 sectors [Extended self-test log]
Log at address 0x09 has 001 sectors [Selective self-test log]
Log at address 0x10 has 001 sectors [Reserved log]
Log at address 0x11 has 001 sectors [Reserved log]
Log at address 0x21 has 001 sectors [Write stream error log]
Log at address 0x22 has 001 sectors [Read stream error log]
Log at address 0x80 has 016 sectors [Host vendor specific log]
Log at address 0x81 has 016 sectors [Host vendor specific log]
Log at address 0x82 has 016 sectors [Host vendor specific log]
Log at address 0x83 has 016 sectors [Host vendor specific log]
Log at address 0x84 has 016 sectors [Host vendor specific log]
Log at address 0x85 has 016 sectors [Host vendor specific log]
Log at address 0x86 has 016 sectors [Host vendor specific log]
Log at address 0x87 has 016 sectors [Host vendor specific log]
Log at address 0x88 has 016 sectors [Host vendor specific log]
Log at address 0x89 has 016 sectors [Host vendor specific log]
Log at address 0x8a has 016 sectors [Host vendor specific log]
Log at address 0x8b has 016 sectors [Host vendor specific log]
Log at address 0x8c has 016 sectors [Host vendor specific log]
Log at address 0x8d has 016 sectors [Host vendor specific log]
Log at address 0x8e has 016 sectors [Host vendor specific log]
Log at address 0x8f has 016 sectors [Host vendor specific log]
Log at address 0x90 has 016 sectors [Host vendor specific log]
Log at address 0x91 has 016 sectors [Host vendor specific log]
Log at address 0x92 has 016 sectors [Host vendor specific log]
Log at address 0x93 has 016 sectors [Host vendor specific log]
Log at address 0x94 has 016 sectors [Host vendor specific log]
Log at address 0x95 has 016 sectors [Host vendor specific log]
Log at address 0x96 has 016 sectors [Host vendor specific log]
Log at address 0x97 has 016 sectors [Host vendor specific log]
Log at address 0x98 has 016 sectors [Host vendor specific log]
Log at address 0x99 has 016 sectors [Host vendor specific log]
Log at address 0x9a has 016 sectors [Host vendor specific log]
Log at address 0x9b has 016 sectors [Host vendor specific log]
Log at address 0x9c has 016 sectors [Host vendor specific log]
Log at address 0x9d has 016 sectors [Host vendor specific log]
Log at address 0x9e has 016 sectors [Host vendor specific log]
Log at address 0x9f has 016 sectors [Host vendor specific log]
Log at address 0xa1 has 020 sectors [Device vendor specific log]
Log at address 0xa8 has 020 sectors [Device vendor specific log]
Log at address 0xa9 has 001 sectors [Device vendor specific log]
Log at address 0xe0 has 001 sectors [Reserved log]
Log at address 0xe1 has 001 sectors [Reserved log]


It may also be (theoretically) possible to check whether the 'specific data pattern' is present in the system
area IF it can be read from the SMART log pages (using the standard ATA interface/specification),
so this could be used to create a simple (multi-platform) tool for verifying whether a particular
drive is actually affected by the issue, and maybe even as a workaround IF
the wrong pattern data or event counter can be changed (i.e. is readable/writable).

This post has been edited by sieve-x: 29 January 2009 - 12:42 AM


#1046 User is online   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 11,457
  • Joined: 23-July 04
  • OS:none specified

Posted 28 January 2009 - 04:49 AM

Gradius2, on Jan 27 2009, 05:11 PM, said:

20,000 is too optimistic in my opinion. The only reason this problem isn't bigger is that the "320 thing" depends on "luck".


Yep :), the point I was trying to make was that if you can go in a few "logical" steps from 100-150 reports here on MSFN to a bare minimum of 20,000, rounding everything down and using largely speculative "safety" factors, we can say, rightfully and without fear of being proved wrong by actual figures (when and if they come to light), that the phenomenon is HUGE.

Which does not mean it's a matter of millions (though it might be :unsure:) but enough to allow me to say that the known title is incorrect:

Quote

Seagate boot-of-death analysis - nothing but overhyped FUD

as the issue does not appear that much overhyped (read not at all ;)) and it's definitely not FUD.

Using Dirk Gently's I-CHING calculator:
http://www.thateden.co.uk/dirk/
anything resulting above 4 becomes "A Suffusion of Yellow", on my personal calculator anything above 20,000 results in "lots" or "too many to count".
I don't care if they represent "only" "some percentage of the drives". :P

Besides, while dlethe advises the use of common sense:

dlethe, on Jan 26 2009, 04:38 PM, said:

Use some common sense here, factor in how many 'cudas that Seagate ships in a year, and tell me how many millions of disk drives SHOULD be failing if this is a firmware bug that affects all disks running this particular firmware. Seagate is on a 5-year run rate to ship 1,000,000,000 disk drives ANNUALLY by 2014. If the drive problem was as big as you say it is, then they would have caught it in QC. The problem is a purple squirrel (sorry about the yankee slang -- it means incredibly rare).


In his own article:
http://storagesecrets.org/2009/01/seagate-...-overhyped-fud/
he seems to be lacking the same.

As long as we are "talking adjectives", everyone is free to have their own stance and definitions, but when it comes to probabilities and calculating them, checking the math twice would be advised.

Compare the "cryptic" explanation of the "magic number":

Quote

So here is what happened. For whatever reason, some of Seagate’s test equipment didn’t zero out the test pattern once the test suite completed, and these disks were shipped. When disks that have this test pattern pre-loaded into the reserved area, and put into service, they are subjected to certain errors, warnings, or I/O activity [remember, I'm not going to tell you what the specific trigger is ..., but the information is available to people who need to know] that results in a counter reaching a certain value. (This is NOT a threshold, but an exact value. I.e., if the magic number was 12345, then 12346 and higher would NOT trigger the bricking logic. Only 12345 triggers it. ). Furthermore, this value is stored in a circular buffer, so it can go up and down over the life of the disk. In order for the disk to brick, the disk must be spun down at the EXACT MOMENT this field is set to this magic number. (The magic number is not an 8-bit value, either). So either on power-down, or power-up, the firmware saw the bit pattern, and the magic number in the circular buffer, and likely did what it was programmed to do … perform a type of lockdown test that is supposed to happen in the safety of the manufacturing/test lab, where it can be unlocked and appropriate action taken by test engineers.

So, let’s say you have a disk with the naughty firmware, that was tested on the wrong test systems at the wrong time. Let’s say that the magic number is a 16-bit number. Then even if you had one of the disks that are at risk, then the odds are > 65,000:1 that you will power the disk off when the counter is currently set to this magic number. If the magic number is stored in a 32-bit field, then buy lottery tickets, because you have a higher probability of winning the lottery then you do that the disk will spin down with the register set to the right value. (BTW, the magic number is not something simple like number of cumulative hours.)


With the one reported from Seagate:

Quote

The firmware issue is that the end boundary of the event log circular buffer (320) was set incorrectly. During Event Log initialization, the boundary condition that defines the end of the Event Log is off by one.
During power up, if the Event Log counter is at entry 320, or a multiple of (320 + x*256), and if a particular data pattern (dependent on the type of tester used during the drive manufacturing test process) had been present in the reserved-area system tracks when the drive's reserved-area file system was created during manufacturing, firmware will increment the Event Log pointer past the end of the event log data structure. This error is detected and results in an "Assert Failure", which causes the drive to hang as a failsafe measure. When the drive enters failsafe further updates to the counter become impossible and the condition will remain through subsequent power cycles. The problem only arises if a power cycle initialization occurs when the Event Log is at 320 or some multiple of 256 thereafter. Once a drive is in this state, there is no path to resolve/recover existing failed drives without Seagate technical intervention.


Since I guess that this latter info was available to dlethe in his "under NDA" documentation, let's see how many x's we have in a 16-bit number :ph34r: :
We have 65,536 values, possibly from 0 to 65,535.
In this range, maximum x can be found by resolving:
320+x*256=65,535
Thus:
x*256=65,535-320
x=(65,535-320)/256
x=254.7461 => 254 (plus the 0 value, i.e. "plain" 320 case) => 255 possible values for x

This would put the odds at 65,536:255, i.e. roughly 257:1 instead of the proposed "> 65,000:1" :w00t:

Which would mean that the original estimate grossly understated the probability of failure.
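
For anyone who wants to check the count without doing it by hand, here is a minimal Python sketch. It assumes, as the article's hypothetical does, that the counter is a 16-bit value and that every value of the form 320 + x*256 is a trigger; the function name is made up for the example.

```python
def count_trigger_values(bits: int, first: int = 320, step: int = 256) -> int:
    """Count the values first, first+step, first+2*step, ... that fit
    in an unsigned counter of the given width."""
    limit = 2 ** bits  # a 16-bit counter holds 0 .. 65,535
    return len(range(first, limit, step))


if __name__ == "__main__":
    total = 2 ** 16
    triggers = count_trigger_values(16)
    print(f"trigger values: {triggers}")
    print(f"odds: roughly {total / triggers:.0f}:1")
```

Under those assumptions it counts 255 trigger values, which gives the roughly 257:1 figure rather than "> 65,000:1".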

Again, it is possible that today is not my "lucky" day with math.....:whistle:

jaclaz

This post has been edited by jaclaz: 28 January 2009 - 04:59 AM


#1047 User is offline   icefloe01 

  • Junior
  • Group: Members
  • Posts: 61
  • Joined: 04-January 09

Posted 28 January 2009 - 06:06 AM

jaclaz is teh kewlerest!

#1048 User is offline   Oliver.HH 

  • Group: Members
  • Posts: 7
  • Joined: 28-January 09

Posted 28 January 2009 - 06:29 AM

Another attempt to estimate the probability of a drive failing...

Given the "root cause" document posted here by sieve-x, this is what we know:
  • A drive is affected by the bug if it contains the defective firmware and has been tested on certain test stations.
  • An affected drive will fail if turned off after exactly 320 internal events were logged initially or any multiple of 256 thereafter.

We don't have the details on how often exactly the event log is written to. Someone mentioned that it's written to when the drive initializes on power-up (though I don't remember the source). If that's true, we would have one event per power cycle plus an unknown and possibly varying number in between.

Given that, the probability of an affected drive being alive after one power cycle is 255/256. After two power cycles it's 255/256 * 255/256. After three power cycles it's (255/256)^3. And so on. While the isolated probability of the drive failing on a single power-up is just 0.4%, the numbers go up when you calculate the probability of a drive failing over time.

Let's assume a desktop drive is power cycled once a day. The probability of an affected drive failing then is:
0.4% for 1 day
11.1% over 30 days
29.7% over 90 days
76.0% over 365 days
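
The figures above follow from 1 - (255/256)^n. A small Python sketch of that calculation, under the same assumptions as the post (one power cycle per day, each cycle an independent 1-in-256 chance); the function name is just for illustration:

```python
def p_fail_within(cycles: int, p_per_cycle: float = 1 / 256) -> float:
    """Probability that at least one of `cycles` independent power cycles
    lands on a trigger entry, given a fixed per-cycle probability."""
    return 1 - (1 - p_per_cycle) ** cycles


if __name__ == "__main__":
    for days in (1, 30, 90, 365):
        print(f"{days:>3} days: {p_fail_within(days):.1%}")
```

Changing `p_per_cycle` shows how sensitive the yearly figure is to the per-cycle assumption.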

Obviously, I'm ignoring the fact that initially a higher number of events (320) must be logged to trigger the failure. Anyway, this would not change the numbers substantially and the initial number might be even lower than 256 depending on the number of events logged during the manufacturing process. I'm also ignoring the number of events written while the drive is powered on, as it should not affect the overall probability.

This post has been edited by Oliver.HH: 28 January 2009 - 08:59 AM


#1049 User is online   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 11,457
  • Joined: 23-July 04
  • OS:none specified

Posted 28 January 2009 - 08:33 AM

Oliver.HH, on Jan 28 2009, 01:29 PM, said:

Obviously, I'm ignoring the fact that initially a higher number of events (320) must be logged to trigger the failure. Anyway, this would not change the numbers substantially and the initial number might be even lower than 256 depending on the number of events logged during the manufacturing process. I'm also ignoring the number of events written while the drive is powered on, as it should not affect the overall probability.


Yep :), and we don't even have a clear idea on WHICH events are logged and HOW MANY such events take place in an "average powered on hour".

If, as it has been hinted/reported somewhere on the threads, a S.M.A.R.T. query raises an event that is actually logged, we will soon fall in the paradox that the more you check your hardware status the more prone it is to fail.....:w00t:

Additionally, supposing that certain commands create multiple entries (or "sets" of entries), it is debatable whether "320" becomes more or less likely to be reached.

I mean, how probable is it, with a "random" number of arbitrary "sets" (say, resulting in 1, 2, 3 or 4 log entries), to reach exactly 320 or to miss it, as in:

Quote

317+4
318+3
319+2


I don't think we can find an accurate answer :unsure:, but we can say that we are definitely NOT in an Infinite Improbability Drive (pardon me the pun ;)):
http://en.wikipedia.org/wiki/Sub-Etha#Infi...obability_Drive

Douglas Adams said:

two to the power of two hundred and sixty-seven thousand seven hundred and nine to one against.

.....

It sounded quite a sensible voice, but it just said, "Two to the power of one hundred thousand to one against and falling," and that was all.

Ford skidded down a beam of light and span round trying to find a source for the voice but could see nothing he could seriously believe in.

"What was that voice?" shouted Arthur.

"I don't know," yelled Ford, "I don't know. It sounded like a measurement of probability."

"Probability? What do you mean?"

"Probability. You know, like two to one, three to one, five to four against. It said two to the power of one hundred thousand to one against. That's pretty improbable you know."

.....


The voice continued.

"Please do not be alarmed," it said, "by anything you see or hear around you. You are bound to feel some initial ill effects as you have been rescued from certain death at an improbability level of two to the power of two hundred and seventy-six thousand to one against — possibly much higher. We are now cruising at a level of two to the power of twenty-five thousand to one against and falling, and we will be restoring normality just as soon as we are sure what is normal anyway. Thank you. Two to the power of twenty thousand to one against and falling."


but rather near, VERY near normality (1:1)....

:thumbup

jaclaz

This post has been edited by jaclaz: 28 January 2009 - 08:34 AM


#1050 User is offline   Oliver.HH 

  • Group: Members
  • Posts: 7
  • Joined: 28-January 09

Posted 28 January 2009 - 09:20 AM

jaclaz, on Jan 28 2009, 03:33 PM, said:

Yep :), and we don't even have a clear idea on WHICH events are logged and HOW MANY such events take place in an "average powered on hour".

True, but we don't have to know. The probability of a drive failing is the same as long as at least one event is logged per power cycle.

Quote

If, as it has been hinted/reported somewhere on the threads, a S.M.A.R.T. query raises an event that is actually logged, we will soon fall in the paradox that the more you check your hardware status the more prone it is to fail.....:w00t:

No, the chance of a drive failing due to this condition is zero unless it is powered off.

All that matters is that the event counter changes at all from power-on to power-off. It does not matter whether it increases by 1, or by 50 or by any other value as long as such values are equally probable.

#1051 User is offline   Gradius2 

  • IT Consultant
  • Group: Members
  • Posts: 240
  • Joined: 16-January 09
  • OS:Windows 7 x64

Posted 28 January 2009 - 10:10 AM

Very nice arguments; this recalls my old days in the CBBS/BBS age back in 1985. :)

Seagate boot-of-death analysis - nothing but overhyped FUD

Of course that statement above is a BIG bad joke from Seagate or whatever the source is.

To put things simply for everyone, the best proof is to look at how many views the topics here related to Seagate's problems (aka the 7200.11 syndrome) have received.

Now let's just do a simple Google search, by entering:

"Seagate 7200.11 failing": I got 72,100 links
"Seagate 7200.11 fail": 98,100 links
"Seagate 7200.11-failing": 371,000 links

The bad thing is that I don't know how many of those links might be related to the same site, so I'll divide everything by 4:

If we assume that at least four drives are necessary for someone to write about this issue on the web (hence the division by 4), that at least 10 people will read that (because they have the same issue), and that they will all have an average of 2 drives with problems, then we would have from 72,100/4*10*2 = 360,500 defective drives up to 371,000/4*10*2 = 1,855,000 drives (in rough math).


Now, let's look at those who know the issues these drives are reporting:
"Seagate 7200.11 bsy+error": 11,100 links
"Seagate 7200.11 0+lba": 4,980 links

That is 16,080 links. Unfortunately, we cannot apply the same "math" as above, since this is a bit different: few people would know the problem relatively well and try to fix the thing themselves; I would estimate as low as 1% of them. So in the best-case scenario (for Seagate) the real number is higher by a factor of 10, and in the worst case by a factor of 100. So from 16,080 * 10 = 160,800 up to 16,080 * 100 = 1,608,000.

In both cases, the upper estimate passes the 1 million mark. Coincidence?

overhyped FUD they said? :crazy: LAUGH! :w00t:

This post has been edited by Gradius2: 28 January 2009 - 10:13 AM


#1052 User is offline   SpXuxu 

  • Group: Members
  • Posts: 7
  • Joined: 19-January 09

Posted 28 January 2009 - 12:20 PM

Gradius2, on Jan 28 2009, 01:10 PM, said:

overhyped FUD they said? :crazy: LAUGH! :w00t:

I think I can understand what Seabreak (or Seabrick) means by that

FUD = FU...keD

(fill in the blanks)

This post has been edited by SpXuxu: 01 February 2009 - 06:28 PM


#1053 User is online   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 11,457
  • Joined: 23-July 04
  • OS:none specified

Posted 28 January 2009 - 12:34 PM

Gradius2, on Jan 28 2009, 05:10 PM, said:

Now let's just do a simple Google search, by entering:

"Seagate 7200.11 failing": I got 72,100 links
"Seagate 7200.11 fail": 98,100 links
"Seagate 7200.11-failing": 371,000 links


Sorry to say so :(, but that's not really a "valid" argument, as I (and some other people) see it ;):
http://homepages.tesco.net/J.deBoynePollar...ess-metric.html

jaclaz

#1054 User is offline   mikesw 

  • Advanced Member
  • Group: Members
  • Posts: 360
  • Joined: 05-October 05

  Posted 28 January 2009 - 12:39 PM

The victory tool v3.4 from hddguru seems to be MS-DOS based and to require a floppy disk.

Is there a version that runs on windows?

This post has been edited by mikesw: 28 January 2009 - 12:39 PM


#1055 User is online   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 11,457
  • Joined: 23-July 04
  • OS:none specified

Posted 28 January 2009 - 12:42 PM

mikesw, on Jan 28 2009, 07:39 PM, said:

The victory tool v3.4 from hddguru seems to be MS-DOS based and to require a floppy disk.

Is there a version that runs on windows?


Victoria

Looky here:
http://www.benchmark...ml?/be_hdd.html

jaclaz

#1056 User is offline   Gibby 

  • Group: Members
  • Posts: 8
  • Joined: 28-January 09

Posted 28 January 2009 - 01:34 PM

Oliver.HH, on Jan 28 2009, 09:20 AM, said:

jaclaz, on Jan 28 2009, 03:33 PM, said:

Yep :), and we don't even have a clear idea on WHICH events are logged and HOW MANY such events take place in an "average powered on hour".

True, but we don't have to know. The probability of a drive failing is the same as long as at least one event is logged per power cycle.

Quote

If, as it has been hinted/reported somewhere on the threads, a S.M.A.R.T. query raises an event that is actually logged, we will soon fall in the paradox that the more you check your hardware status the more prone it is to fail.....:w00t:

No, the chance of a drive failing due to this condition is zero unless it is powered off.

All that matters is that the event counter changes at all from power-on to power-off. It does not matter whether it increases by 1, or by 50 or by any other value as long as such values are equally probable.


But the events are hardly equally probable. It's much more likely that you're going to get a very small number each power cycle. The chances of dozens or hundreds of entries each power cycle are almost non-existent unless your drive is hosed to begin with.

And consider this: if the log incremented by EXACTLY one each power cycle (I don't know if that's even possible), what's the probability an (affected) drive will fail? It's 100%. It will fail with certainty because it WILL occur on the 320th power cycle. It will take just under a year or so for this to happen for a lot of home users assuming a power cycle per day. Just an example of course. We have to consider that a lot of drives from the list can be seen failing after around 60 - 100 days. Would this be something roughly like 60 - 100 power cycles for those drives? So maybe for the first 'batch' of bad drives, you're seeing something like 3 - 5 log entries on average per power cycle.

My point is that the probability of an affected drive failing may be as high as something like 3, 4 or 5:1. We have probably not seen the bulk of failures yet - it's too early! And the lower the average number of log entries per power cycle, the higher the probability eventually becomes for the initial 320th entry and each 256th circulation after that. It will take longer, i.e., more power cycles, but there's a better chance of hitting the bad entry each complete cycle. Even if the average number of entries is very low, like .5 per power cycle, there is an extremely high chance of the drive failing - eventually. It's just going to take around 640 power cycles, but you are unlikely to skip ending exactly on entry 320 (or 320 + x*256 thereafter).

Figuring out the probability of failure on any single power cycle isn't really useful. The question most 7200.11 owners have is: What are the chances my drive will fail AT ALL in the next year or two?
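
For anyone who wants to play with these numbers, here is a rough Monte Carlo sketch of the argument above. It is only a toy model, not the firmware's actual behaviour: the bad positions (320, then every 256 entries) are taken from this thread, while the uniform "1 to max_per_cycle entries per power cycle" distribution, the function names and the trial counts are all my own assumptions.

```python
import random

FIRST_BAD = 320  # first bad log position, as discussed in this thread
PERIOD = 256     # bad position recurs every 256 entries after that


def is_bad(position: int) -> bool:
    """True if the log position at power-off is one of the bad values."""
    return position >= FIRST_BAD and (position - FIRST_BAD) % PERIOD == 0


def implied_entries_per_cycle(cycles_to_failure: int) -> float:
    """Average entries per cycle needed to reach entry 320 in that many cycles."""
    return FIRST_BAD / cycles_to_failure


def cycles_until_failure(max_per_cycle: int, max_cycles: int, rng: random.Random):
    """Simulate one drive; return the failing power cycle, or None if it survives.

    ASSUMPTION: each power cycle logs a uniformly random 1..max_per_cycle
    entries. The real distribution is unknown.
    """
    position = 0
    for cycle in range(1, max_cycles + 1):
        position += rng.randint(1, max_per_cycle)  # inclusive on both ends
        if is_bad(position):
            return cycle
    return None


def failure_rate(max_per_cycle: int, max_cycles: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated drives that fail within max_cycles power cycles."""
    rng = random.Random(seed)
    failed = sum(
        cycles_until_failure(max_per_cycle, max_cycles, rng) is not None
        for _ in range(trials)
    )
    return failed / trials


if __name__ == "__main__":
    # Compare a few hypothetical "entries per power cycle" ceilings over two years.
    for max_per_cycle in (1, 2, 5, 10, 50):
        print(max_per_cycle, failure_rate(max_per_cycle, max_cycles=730))
```

With exactly one entry per cycle, failure_rate(1, 320) comes out as 1.0, which is the "100%" case described above, and implied_entries_per_cycle(100) gives 3.2, the low end of the 3 - 5 entries guess. For larger ceilings the result depends entirely on the assumed distribution, so treat the printed figures as illustration only.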

This post has been edited by Gibby: 28 January 2009 - 01:51 PM


#1057 User is offline   pichi 

  • Member
  • PipPip
  • Group: Members
  • Posts: 170
  • Joined: 03-January 09

Posted 28 January 2009 - 03:25 PM

Seagate modified commands in the new firmware:

SD15:
Level T 'i': Rev 0001.0000, Overlay, InitDefectList, i[DefectListSelect],[SaveListOpt],[ValidKey]
Level T 'm': Rev 0001.0000, Flash, FormatPartition, m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey]

SD1A:
Level T 'i': Rev 0011.0000, Overlay, InitDefectList, i[DefectListSelect],[SaveListOpt],[ValidKey]
Level T 'm': Rev 0012.0000, Flash, FormatPartition, m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey],[DataPattern]

Questions:
What is [DataPattern] in Level T 'm'?
Can SD1A bricks be repaired with the new command table?
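
To make the difference between the two listings easy to see, here is a small Python sketch that compares the bracketed parameter lists. The signature strings are copied from the listings above; the helper names are made up for illustration, and the script only compares text, it does not talk to a drive.

```python
import re

# Parameter signatures copied from the SD15 and SD1A listings above.
SD15 = {
    "i": "i[DefectListSelect],[SaveListOpt],[ValidKey]",
    "m": "m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],"
         "[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey]",
}
SD1A = {
    "i": "i[DefectListSelect],[SaveListOpt],[ValidKey]",
    "m": "m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],"
         "[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey],"
         "[DataPattern]",
}


def params(signature: str) -> list:
    """Return the bracketed parameter names of a command signature, in order."""
    return re.findall(r"\[([^\]]+)\]", signature)


def added_params(old: str, new: str) -> list:
    """Parameters present in the new signature but not in the old one."""
    old_names = set(params(old))
    return [name for name in params(new) if name not in old_names]


if __name__ == "__main__":
    for command in sorted(SD15):
        print(command, added_params(SD15[command], SD1A[command]))
```

On these listings the only change in the parameter lists is the extra [DataPattern] on 'm'; the 'i' parameters are the same, although its revision number changes from 0001.0000 to 0011.0000.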

This post has been edited by pichi: 28 January 2009 - 03:27 PM


#1058 User is offline   sieve-x 

  • Newbie
  • Group: Members
  • Posts: 12
  • Joined: 19-January 09

Posted 28 January 2009 - 06:51 PM

pichi, on Jan 28 2009, 03:25 PM, said:

Questions:
What is [DataPattern] in Level T 'm'?
Can SD1A bricks be repaired with the new command table?

They should work as long as you are dealing with the same issue, but SD1A
fixes that, so the 'bricking' cause/solution would then be something different. About
[DataPattern], I would guess the name says what it does (create/fill a data pattern).

Updated my previous post #1045 to shed some light :unsure: around root cause and S.M.A.R.T.

This post has been edited by sieve-x: 29 January 2009 - 01:12 AM


#1059 User is offline   pichi 

  • Member
  • PipPip
  • Group: Members
  • Posts: 170
  • Joined: 03-January 09

Posted 29 January 2009 - 04:24 AM

I have developed programs to automate the repair process and make it easier.
Some people have tried these programs, and they work.
I am collaborating with Fatlip to offer a worldwide low-cost solution (adapter and Torx), since some people cannot find adapters.
Neither a soldering station nor electronics knowledge is necessary.
The hard work is behind us, and thanks to a Lithuanian webpage we have the solution:
http://yura.projekta...720011_ES2.html
Because some people only know how to copy and paste and then ask for donations ... I am wondering whether to release the programs or not. :angry:

#1060 User is online   jaclaz 

  • The Finder
  • Group: Developers
  • Posts: 11,457
  • Joined: 23-July 04
  • OS:none specified

Posted 29 January 2009 - 04:45 AM

pichi, on Jan 29 2009, 11:24 AM, said:

I have developed programs to automate the repair process and make it easier.

That would be a great thing. :)

I have a few PMs from people who don't know English very well, so I'm trying to find the time to translate the existing guide (into Italian), but I am a bit reluctant, as this "kind" of people also tends to be not particularly "tech savvy", the procedure is fairly complex for a newbie, and the risk of somehow "frying" the drive by mistake is great.

Having something along the lines of what I hinted here:
http://www.msfn.org/board/index.php?showto...28807&st=48

tested and working, could make the difference. :thumbup

About the other point, of course you are free to choose your way, but:

Sir Winston Churchill said:

We make a living by what we get, but we make a life by what we give.

;)

jaclaz


All trademarks mentioned on this page are the property of their respective owners
Copyright © 2001 - 2013 msfn.org