The Yellow Light Of Death (YLOD)

Arguably the biggest and most fatal design flaw of the PS3. Let's break it down.

What is YLOD?

YLOD is an error code, presented in the form of a blinking yellow light, that indicates a general hardware failure.

Why does it happen?

YLOD occurs when a microcontroller called the SYSCON detects an issue during the bootup process. You can think of the SYSCON as the "all seeing eye" of the PS3. It is responsible for many things, most notably acting as the main power controller of the PS3. When the console boots up, the SYSCON powers on the components on the motherboard in a predetermined order called the boot order. If the SYSCON cannot power on one of the components, if one of the components is producing malformed signals, or if one of the components does not have a key the SYSCON recognises, it terminates the boot process and stores the error code in its 32 kilobyte ledger. We see this from the outside as the YLOD.
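As a rough illustration of the logic described above, here is a minimal sketch of a power-on sequence that stops at the first failed check and records an error. The component names, their order and the error strings are all made up for this example; the real SYSCON boot order and error codes are not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical boot order, for illustration only (not the real SYSCON sequence).
BOOT_ORDER = ["standby_rail", "cell_be", "rsx", "flash"]


def run_boot_sequence(checks: Dict[str, Callable[[], bool]],
                      error_log: List[str]) -> bool:
    """Bring components up in order; abort and log on the first failure."""
    for name in BOOT_ORDER:
        if not checks[name]():
            # A failed power-on, malformed signal or unrecognised key all
            # end the same way here: log it and terminate the boot.
            error_log.append("fail:" + name)
            return False  # seen from the outside as the YLOD
    return True


# Usage: every check passes except the (hypothetical) RSX check.
log: List[str] = []
checks = {name: (lambda: True) for name in BOOT_ORDER}
checks["rsx"] = lambda: False
booted = run_boot_sequence(checks, log)
```

The point of the sketch is only that the sequence is ordered and fails fast, which is why one bad component is enough to stop the whole console from booting.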

What kind of failures cause the YLOD?

A YLOD could realistically be caused by the failure of nearly any component, such as the Blu-ray drive, the HDD or anything else on the motherboard. However, certain components are much more likely to fail than others. According to data from PSXPlace, the most common point of failure for the PS3 is overwhelmingly a BGA/bump failure of the RSX, followed by failure of the NEC/TOKIN filter capacitors and then failures of certain SMT components such as fuses and other capacitors.

What do you mean by a BGA/Bump Failure?

Simple diagram of BGA Package, I made this diagram

Note: the information listed here mainly comes from RIP-felix's video on the subject and is therefore only a single viewpoint, which by definition does not make it neutral. However, due to the specificity of this subject, other viewpoints that disagree with this one and have actual evidence seem to be few and far between, at least from what I can see. It should also be noted that this is just a theory; we do not have the equipment to prove it with certainty.

Back around 2006, Sony and Nvidia decided to use the 90nm process for their next generation console's GPU and CPU. However, this 90nm process came with issues throughout the entire design and manufacturing phase. These issues all centred on chips running increasingly hot as a consequence of the increase in transistor density. This was one of the inevitabilities that led to these BGA/bump failures being so apparent.

This issue impacted all corporations that manufactured processors at 90nm and smaller. That being said, some planned for it better than others. Nvidia and ATI, the manufacturers of the PS3 and Xbox 360 GPUs respectively, showed a lack of foresight and planning in the design process, which ultimately led to them producing chips that were defective.

Timeline of BumpGate, I made this diagram

The reason for this issue is very complex and multilayered. The most likely theory we have at the moment relates to Bumpgate, a period roughly between 2006 and 2008 when Nvidia produced 90nm chipsets that were defective. The defects have several contributing causes.

The first reason is due to a property that affects all matter: it expands when heated and contracts when cooled. The rate at which it expands and contracts is known as the Coefficient of Thermal Expansion, or CTE for short. The problem with the RSX is that the underfill that is put in place between the BGA and the bumps has a different CTE to the interposer and the die respectively. This means they expand at different rates, which puts thermomechanical strain on the solder balls/bumps that connect the interposer to the motherboard and the die to the interposer.
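To get a feel for the size of the effect, the unconstrained mismatch strain between two bonded materials can be approximated as the difference in their CTEs multiplied by the temperature swing. The sketch below does that arithmetic. The numbers are illustrative assumptions (roughly 2.6 ppm/°C is a commonly quoted figure for silicon, and 16 ppm/°C is a ballpark for an organic substrate); they are not measured values for the RSX package.

```python
def mismatch_strain(cte_a_ppm: float, cte_b_ppm: float, delta_t_c: float) -> float:
    """Approximate thermal mismatch strain (dimensionless).

    cte_a_ppm, cte_b_ppm: CTEs of the two materials in ppm per degree C.
    delta_t_c: temperature swing in degrees C.
    """
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * delta_t_c


# Illustrative values only, not RSX measurements.
SILICON_CTE_PPM = 2.6      # assumed die CTE
SUBSTRATE_CTE_PPM = 16.0   # assumed interposer/substrate CTE
strain = mismatch_strain(SILICON_CTE_PPM, SUBSTRATE_CTE_PPM, delta_t_c=50.0)
```

The strain per cycle is small, but it is applied every time the console heats up and cools down, so the joints are fatigued over thousands of cycles rather than broken by any single one.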

The second reason is that the silicon die does not heat up completely evenly. There are hotspots on the die in places with a higher transistor density and a higher electrical current passing through them (for example, the SPEs on the CELL-BE would heat up much quicker than the EIB). The balls around these hotspots expand and contract much more than balls at cooler points in the chip, such as the edges, so these specific balls are much more likely to fail first.

The third reason is a phenomenon known as electromigration. Electromigration is where the atoms (specifically ions) that make up a material move due to an electric current. It causes voids in the solder bumps between the die and the interposer, which makes them more fragile and prone to cracking. This effect becomes more pronounced as more solder bumps fail: the electrical load is distributed across fewer bumps than before, which increases the load each remaining bump must carry. This increases the electromigration in each bump, which speeds up its death.
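The feedback loop described above can be shown with a toy model. It is not a physical simulation of electromigration: each bump simply gets a made-up "limit" standing in for how much current it tolerates before it cracks, the total current is shared equally between surviving bumps, and the weakest bump fails whenever its share exceeds its limit. All names and numbers are assumptions for illustration.

```python
from typing import List


def surviving_bumps(total_current: float, limits: List[float]) -> int:
    """Return how many bumps survive once load redistribution settles."""
    alive = sorted(limits)  # weakest bump first
    while alive:
        per_bump = total_current / len(alive)  # equal share of the load
        if alive[0] >= per_bump:
            break  # even the weakest bump can carry its share
        alive.pop(0)  # weakest bump fails; the rest now carry more each
    return len(alive)


# Made-up limits for five bumps (arbitrary units).
limits = [1.0, 2.0, 3.0, 4.0, 5.0]
light_load = surviving_bumps(4.0, limits)    # nothing fails
medium_load = surviving_bumps(6.0, limits)   # one bump fails, then it settles
heavy_load = surviving_bumps(10.0, limits)   # each failure triggers the next
```

With the heavy load, the first failure pushes the per-bump share past the next bump's limit, and so on until none are left, which is the runaway behaviour the paragraph describes.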

The fourth reason is the choice of solder used for the bumps. Nvidia picked a high lead solder because high lead solder can carry more current. However, the pads that connect these bumps to the die and interposer are eutectic. Eutectic solder is solder that has the same melting and crystallisation temperature. This means that during manufacturing the joint crystallises much quicker, which produces a stronger and much longer lasting joint because the crystals in it are smaller, having had less time to grow. Anyway, the point is that high lead solder and eutectic pads do not make as good a solder joint as eutectic solder and eutectic pads.

The fifth reason is the underfill that Nvidia chose to use. Underfill needs to be used on this chip because high lead bumps are brittle, which means they fragment instead of flexing and bending and thus need extra reinforcement. The underfill cannot be too hard or it will cause delamination of the die due to flexing of the polyimide stress layer between the bumps and the die, and it cannot be too soft or it will let the bumps crack. The underfill that Nvidia chose has too low a glass transition temperature (Tg). At the Tg, the underfill goes from being stiff like glass to a rubbery, pasty mess that offers next to no support. The Tg of the underfill Nvidia used was very low, around 70-85 °C. Combined with Sony's extreme fan curve, this means the temperature can and almost certainly will be reached when playing intensive games. Once the BGA and bumps reach the Tg of the underfill, they are at the full mercy of the difference in CTE and normal thermal strain.

To sum it up: the high lead alloy that was used for the solder is brittle, and thus a high Tg underfill is needed. Nvidia did not have access to many high Tg underfills at the time and picked an underfill with a low Tg. This low Tg did not give the solder balls the reinforcement they needed under regular gaming conditions, which led to them expanding and contracting at abnormal rates due to the interposer and underfill having a different CTE. This caused them to crack, which spread the electrical load onto the remaining balls. Electromigration then caused voids to form in those balls and made them crack faster, setting off a chain reaction that inevitably led to the chip not having enough connections to function. The SYSCON then checks the chip, does not let it pass, and we get the YLOD.

What is the solution to the BGA/Bump failure?

The best method is replacing the failed GPU with a 65nm/40nm GPU. Contrary to public opinion, it has recently been noted that the 65nm GPUs are not actually more prone to failure despite using Bumpgate materials. This is suspected to be due to the lower TDP (thermal design power) of the smaller GPU.

Here is my guide that explains the replacement of the RSX

I watched a video where a person replaced the NEC/Tokin capacitors and their PS3 came to life. Isn't that the fix for the YLOD?

Most likely no. The NEC/Tokin capacitors CAN cause the YLOD but they are not THE cause of the YLOD. Replacing the NEC/Tokin capacitors seems to work for one of two reasons: 1, the capacitors were the issue all along in your system, or 2, much more likely, the application of heat directly onto the motherboard caused it to flex, which caused a temporary mechanical reseating of the die/interposer or the interposer/motherboard.

Here is my guide that explains the process of replacing the capacitors

If the CELL-BE uses the same 90nm process, why does the RSX fail so much more often than the CELL-BE?

The IHS of the CELL-BE and the support ring, Link.

The square hole in the motherboard, I took this image

The CELL-BE has a support ring that surrounds its IHS and stops the interposer from flexing. The RSX does not have this support ring. Instead, the RSX's IHS is mounted directly on top of the memory chips and die. Also, there is a square hole cut out of the motherboard, and the CELL-BE is mounted with the middle of it over this hole. This allows more airflow to the bottom of the chip, which cools it down and consequently reduces how much the interposer flexes. As you could probably guess, the RSX does not have this either.




The Red Light Of Death (RLOD)

Another fatal error, typically resulting from flash corruption.

What is RLOD?

RLOD is an error code, presented in the form of a flashing red light.

Why does it happen?

RLOD occurs when the console boots up and cannot read one of the ISO modules in the flash/hard drive. If the console cannot read the modules, it cannot proceed with the boot process.

What is an ISO module, and why does it matter if it cannot be read?

The ISO modules are software modules that are used for the decryption of keys during the bootup process. When you turn the console on the SYSCON gets keys from certain components on the motherboard such as the CELL-BE, the RSX and the Flash.

A consequence of the keys is that you cannot just swap the CELL-BE or the RSX for another one. You either have to change the SYSCON to accept the new RSX or use a mod called the ORBIS modchip, which spoofs the ID of the GPU to look like the 90nm RSX.

What is the solution to the RLOD?