Posted 06 June 2006 - 12:12 PM
Not a p***ing contest, but I have been working on the hardware of computers since 1968, and PCs in particular since 1983.
Much of what MGDX says is true; he should listen to his own post.
The electromigration problem is exactly what "ages" the chips. It happens gradually over a period of months and years, assuming the ambient temperature is maintained in the usual manner within typical computer cases.
Moreover, it is totally predictable for any temperature and this principle is what is used to test RAM chips as they are made.
To ensure reliability, sample chips are tested using accelerated methods, which is just another way of saying they HEAT UP the chips, very much like what overclocking does, as MGDX describes. The net effect is the same, namely a chip whose absolute access time is now longer/slower. The heat can be obtained by putting the chip into a temp-controlled oven, over-volting/over-clocking, or a combination of techniques. No matter how you get there, the chips are now "aged", essentially simulating the same effect that would happen over a much longer stretch of time at more moderate temperatures.
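For anyone who wants to see the "predictable for any temperature" part as arithmetic, here is a rough sketch using the standard Arrhenius temperature term (the same term that appears in Black's equation for electromigration). It covers temperature ONLY, not the current-density effect of over-volting, and the 0.7 eV activation energy and the example temperatures are my assumed, illustrative numbers, not anything from a datasheet.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def acceleration_factor(use_temp_c, stress_temp_c, activation_energy_ev=0.7):
    """Arrhenius acceleration factor between a use and a stress temperature.

    activation_energy_ev is an ASSUMED illustrative value; the real figure
    depends on the interconnect metal and the process.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def equivalent_use_hours(stress_hours, use_temp_c, stress_temp_c, activation_energy_ev=0.7):
    """Hours of normal use that a given oven time stands in for."""
    return stress_hours * acceleration_factor(use_temp_c, stress_temp_c, activation_energy_ev)


if __name__ == "__main__":
    # Hypothetical example: 55 C inside a case versus a 125 C test oven.
    factor = acceleration_factor(55.0, 125.0)
    print(f"acceleration factor: {factor:.1f}")
    print(f"1000 oven hours ~ {equivalent_use_hours(1000.0, 55.0, 125.0):.0f} use hours")
```

The hotter the oven relative to normal use, the larger the factor, which is why a few weeks of baking can stand in for years in a case.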
Once "aged to perfection" by these methods, the access time needed to reliably get data from the chips is clearly worse than when they were new, before any electromigration had happened. Reputable memory makers program this ultimate access time into the SPD chip on the DIMM [hopefully with a little safety factor in case the tested chip is NOT quite representative of its batch-mates, since only samples are taken, NOT every chip!]. The result should be a memory module that will not ever fail in normal use for the foreseeable future. Hopefully this is what most of you see in your computers. When NEW, the chips are far faster than what is required of them; as they age, the absolute access time gets longer, but never so long that it exceeds what the SPD chip demands, since the SPD value was set for the ultimate, longer time that the aging process produces.
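Put as a toy calculation, the whole scheme is just a margin check: the SPD-programmed time has to cover the new access time plus whatever aging adds. The function names and every nanosecond figure below are made up by me for illustration; real SPD fields and real degradation numbers are not this simple.

```python
def spd_margin_ns(spd_time_ns, new_access_ns, aging_penalty_ns):
    """Slack remaining once the chip has fully aged.

    A negative result means the aged chip is slower than the SPD timing
    allows, so it will eventually fail. All inputs are hypothetical.
    """
    aged_access_ns = new_access_ns + aging_penalty_ns
    return spd_time_ns - aged_access_ns


def will_stay_reliable(spd_time_ns, new_access_ns, aging_penalty_ns):
    """True if the SPD timing still covers the chip after aging."""
    return spd_margin_ns(spd_time_ns, new_access_ns, aging_penalty_ns) >= 0


if __name__ == "__main__":
    # Honest module: SPD programmed for the aged time plus a safety factor.
    print(will_stay_reliable(spd_time_ns=10.0, new_access_ns=6.0, aging_penalty_ns=3.0))
    # Marginal part: passes when new (6 < 7), fails once aged (9 > 7).
    print(will_stay_reliable(spd_time_ns=7.0, new_access_ns=6.0, aging_penalty_ns=3.0))
```

The second case is the one to watch for: it works out of the box and only goes bad after the aging has had time to happen.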
The problem is there are sleazy manufacturers, some with cutesy names with three letters in them, that violate the entire process. They buy what amounts to reject quality or marginal quality stuff that will NOT hold up with time. Simply put, the ultimate absolute access time violates the value programmed into the SPD chip, so eventually these crap parts WILL fail due to aging. They use parts that POSSIBLY could have been fine, had the SPD chip been programmed to properly account for the predictable degradation, but then some people would notice the slower performance from day one because the timing is now much slower than you would want.
Back in the "old days" [in the '80s], every board had wait-state jumpers to accommodate almost anything you could get for the boards. [Apparently few of you remember the boards with the multiple rows of chips which had capacities like 256K x 1 and 64K x 1. Yes, these are socketed chips with a total capacity of 64K bits for the small ones and 256K bits for the large ones. Do the math and see how many sockets it takes to FULLY populate a board out to a mighty 640K [the most memory ANYONE would ever need! Or so said IBM back then!] using a bunch of 256K chips to get to 512K bytes, and the 64K chips to get from there to 640K, etc.] For motherboards like that, you had to conservatively set the wait-state jumpers for the RATED memory speed, but there was always a joker or two who thought they could get "something for nothing" and just hand-waved settings that gave a faster machine that worked for a few months, only to have to set it back to what was recommended in the first place for the machine to work at all! Again, normal migration aging at work, etc.
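For those who don't feel like doing the math by hand, here is a quick sketch of the socket count. I am assuming an 8-bit data path with one extra parity chip per bank, which is how I remember the typical boards of that era; the function name is just mine, and you can set the parity to 0 if your board had none.

```python
def chips_needed(total_kbytes, chip_kbits, data_bits=8, parity_bits=1):
    """Count x1 DRAM chips needed to provide total_kbytes of memory.

    Each x1 chip supplies one bit of the data path, so a bank needs
    data_bits chips (plus parity_bits parity chips) and stores
    chip_kbits * data_bits / 8 kilobytes. Parity is an ASSUMPTION here.
    """
    bank_kbytes = chip_kbits * data_bits // 8
    banks, remainder = divmod(total_kbytes, bank_kbytes)
    if remainder:
        raise ValueError("total_kbytes must be a whole number of banks")
    return banks * (data_bits + parity_bits)


if __name__ == "__main__":
    large = chips_needed(512, 256)  # first 512K bytes from 256K x 1 chips
    small = chips_needed(128, 64)   # remaining 128K bytes from 64K x 1 chips
    print(large, small, large + small)  # 18 18 36
```

So with parity that is 36 sockets to fill by hand, each one a chance for a bent pin.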
And of course, there were some unscrupulous memory suppliers as well [not necessarily even having names, alphabet soup or otherwise!] selling essentially reject chips that BARELY worked as marked, only to age into total unreliability or an outright inability to boot up, etc. Some were so bad that even setting the absolute maximum wait-states couldn't make them work properly, such was the brazenness of selling junk, etc.
We went through a period of CPU counterfeiting as well, back in the Pentium days [and I believe it has happened again recently with some AMD CPUs being mis/re-marked]. Essentially, a P-75 was being passed off as a P-133 by removing Intel's speed rating silk-screen and replacing it with the bogus over-rating.
Some of the chips worked perfectly fine as a P-133, simply because, as new, migration hadn't set in yet. As the chip ages, it becomes impossible to run at that much of an overclock, and it will work perfectly fine only as the P-75 it REALLY was rated as originally. [I once had one of these fakes in my hand. The subtle giveaway is that the REAL Intel speed marking is done LATER, and as such the silk-screen never quite lines up with the rest of the markings on the chip; go look at any of the grey ceramic P chips (before the green ones). But the counterfeiters used bogus marking machines that ALWAYS lined up, etc.]
Electromigration is normal, expected, and can be dealt with, and for the most part it is. As such, most CPUs and memory never see reliability problems, because the aging is built into the design. For the rest, you get a foreshortened product life because you were cheated. In a world with automatic SPD timing, it's easier to rip people off, and it is happening to a few, but just enough that you need to be aware of it.
cjl (using knowledge, experience, and facts, not intuition)