Safeguard data center power with regular UPS maintenance

Data center power infrastructure maintenance is essential for operational continuity. In addition to regular battery health evaluation, you should also develop a maintenance plan for backup power options, such as uninterruptible power supplies.

This type of preventive maintenance plan can not only reduce unexpected expenses, but also help ensure that your data center still has power during an emergency or unexpected outage.

Although the specifics of an uninterruptible power supply (UPS) maintenance checklist vary from one organization to the next, there are a few main areas that your team should address as part of regular operations or through a service provider.

Visual inspection

Periodic visual inspections are among the most important steps you can take to keep UPSes healthy. A partial visual inspection covers only the unit's exterior, but a comprehensive inspection also includes an examination of internal components.

Doing so, however, can expose you to dangerous electrical currents. Such an inspection should therefore be performed only by qualified individuals, such as an electrical engineer, a facility electrician or a qualified third-party service provider.

For a partial visual inspection, look for any buildup of dust or dirt, which can clog the unit's vents and cause it to overheat. Although data centers are generally clean environments, the fans that cool a UPS draw air across the intake vents, so dust naturally accumulates there, as shown in Figure 1. It is important to remove this dust to prevent airflow obstructions that could result in overheating.

Figure 1. Dust tends to accumulate on UPS intake vents.

The inspection should also include a battery assessment. Look out for any signs of corrosion, leakage or swelling, as these are indicators that the batteries need replacement.

It is also a good idea to check the alternating current input and output capacitors, as well as the direct current filter capacitors. The capacitors should be clean with no signs of cracking or swelling.

Figure 2. Inspect the batteries for signs of leakage, corrosion or swelling.

To check the internal battery connectors of the UPS, remove the batteries and inspect the connectors for any signs of corrosion, damage or abnormal wear.

Figure 3. Check the battery connectors within the UPS for any signs of corrosion, damage or abnormal wear.

During a visual UPS maintenance inspection, don't rely solely on your eyes; also listen for unusual sounds. For example, a buzzing sound might indicate a failing transformer. Listen, too, for fans that are not running smoothly: grinding sounds or the sound of debris trapped in a fan indicate problems that need immediate resolution.

Also, pay attention to abnormal smells. The smell of burning (or even just hot) plastic, for example, could indicate that the UPS is overloaded or that it is not being adequately cooled. Leaky batteries also tend to give off a distinct smell.

Thermal scans

Another important UPS maintenance task is a thermal scan that checks the temperature around the UPS to ensure that it is within the manufacturer's operating specifications.

One way to check UPS temperature is to use a noncontact infrared thermometer to measure the surface temperature of the chassis. If the UPSes are fan-cooled, you can also use the thermometer to check the temperature of the air that exits the units, as shown in Figure 4.
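
If you record readings from many units, comparing them against the manufacturer's limits is easy to automate. The Python sketch below is illustrative only: the measurement-point names and the limits in `LIMITS_C` are hypothetical placeholders, not published UPS specifications, so substitute the figures from your own manufacturer's documentation.

```python
# Hypothetical upper limits in degrees Celsius, keyed by measurement point.
# These are placeholders; use the values from your UPS manufacturer's documentation.
LIMITS_C = {"ambient": 40.0, "chassis": 45.0, "exhaust": 50.0}


def out_of_spec(readings, limits=LIMITS_C):
    """Return readings whose temperature exceeds the limit for their measurement point.

    readings: iterable of (ups_id, point, temp_c) tuples.
    Points with no configured limit are skipped.
    """
    flagged = []
    for ups_id, point, temp_c in readings:
        limit = limits.get(point)
        if limit is not None and temp_c > limit:
            flagged.append((ups_id, point, temp_c, limit))
    return flagged


# Example readings taken with a noncontact infrared thermometer (illustrative values).
readings = [
    ("UPS-A", "chassis", 38.5),
    ("UPS-B", "chassis", 47.2),
    ("UPS-B", "exhaust", 49.0),
]
print(out_of_spec(readings))  # [('UPS-B', 'chassis', 47.2, 45.0)]
```

This sketch checks only upper bounds; if your manufacturer also specifies a minimum operating temperature, add a lower-bound comparison in the same loop.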

Figure 4. A noncontact infrared thermometer can measure the heat coming from a UPS exhaust fan.

If your organization has many UPSes to check, a thermal imaging camera might be more efficient to use. A thermal imaging camera creates a heat-based picture that makes it easy to locate thermal anomalies among your UPS systems.

A thermal imaging camera is much like a noncontact infrared thermometer, except that instead of measuring the temperature at a single point, it takes thousands of temperature measurements. These measurements are plotted graphically in a way that forms a picture. This picture enables inspectors to see any thermal variations that might exist.

Most thermal imaging cameras work by measuring the hottest and the coolest points within the frame. The hottest point is assigned one color, usually red, and the coolest point is assigned another, such as blue. All temperatures in between are mapped to intermediate colors or shades.

To see this concept in action, look at Figure 5. In this figure, the UPSes are at the bottom of the rack, beneath some servers. Because the servers are hotter than the UPSes, the camera's color range is stretched to cover the servers, and the UPS portion of the image lacks detail. The UPSes all appear to be the same color, which means that any temperature differences among them are too small to show up at that scale.
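
The color mapping described above can be sketched as min-max scaling: each temperature is placed on a 0-to-1 scale between the coolest and hottest points in the frame, then assigned one of a fixed number of colors. This is a simplification (real cameras typically use smoother palettes, and many let you set the range manually), and the eight-color palette and the temperatures below are assumptions chosen for illustration. The example shows why the UPSes in Figure 5 blend together: when hot servers stretch the range, small UPS temperature differences fall into the same color band.

```python
def palette_index(temp_c, t_min, t_max, n_colors):
    """Map a temperature to a color slot by min-max scaling within the frame.

    Slot 0 is the coolest color (for example, blue) and slot n_colors - 1
    is the hottest (for example, red).
    """
    if t_max == t_min:  # uniform frame: everything gets the same color
        return 0
    fraction = (temp_c - t_min) / (t_max - t_min)
    return min(int(fraction * n_colors), n_colors - 1)


def render(frame, n_colors=8):
    """Assign a color slot to every temperature reading in a frame."""
    t_min, t_max = min(frame), max(frame)
    return [palette_index(t, t_min, t_max, n_colors) for t in frame]


# Wide shot: room air, three UPSes, then two much hotter servers (illustrative values).
wide = [20.0, 26.0, 27.0, 29.0, 55.0, 60.0]
print(render(wide))     # [0, 1, 1, 1, 7, 7]: the three UPSes share one color

# Close-up of the same three UPSes only.
closeup = [26.0, 27.0, 29.0]
print(render(closeup))  # [0, 2, 7]: now the differences are visible
```

Framing only the UPSes, as in the close-up, narrows the temperature range so the same small differences spread across more colors.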

Figure 5. The thermal imaging camera picks up on the heat of the servers.

In contrast, look at Figure 6. Here, the thermal imaging camera took a closeup of the UPSes. While the UPSes are indeed of a similar temperature to one another, if you look closely at the figure, you can see the heat coming from some of the intake fans.

Figure 6. The thermal imaging camera looks at the UPSes up close.

An unexpected buildup of heat indicates either a cooling problem or a problem with the UPS itself.

Examine the exhaust fans on the back of the UPS, as shown in Figure 7; this is normally the hottest area of the unit. If one fan is significantly cooler than the others, that fan is likely not expelling hot air from the UPS properly.
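
The cooler-fan check can be expressed as a comparison of each fan's exhaust reading against its peers. In this sketch, the 5-degree threshold is a hypothetical value, not a manufacturer figure, and comparing against the median of the other fans (rather than the mean) is a design choice that keeps one very hot or very cold fan from skewing the baseline.

```python
from statistics import median


def cool_fans(exhaust_c, threshold_c=5.0):
    """Flag fans whose exhaust is much cooler than the other fans on the same UPS.

    exhaust_c: dict mapping fan label -> exhaust air temperature in Celsius.
    threshold_c: hypothetical margin; tune it for your hardware.
    """
    flagged = []
    for fan, temp in exhaust_c.items():
        peers = [t for other, t in exhaust_c.items() if other != fan]
        # A fan that runs well below its peers may not be expelling hot air.
        if peers and median(peers) - temp > threshold_c:
            flagged.append(fan)
    return flagged


print(cool_fans({"fan1": 41.0, "fan2": 40.5, "fan3": 33.0}))  # ['fan3']
```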

Figure 7. Be sure to look at the back of your UPSes.

Load and load bank testing

Load tests are essential for UPS maintenance. A load test verifies the UPS' ability to power your data center hardware in the event of a power failure, as well as how much load the unit can support. Load tests require careful planning so they don't jeopardize production workloads.

There are several types of load tests, some of which require specialized knowledge and should be performed only by a qualified technician. A load test can involve more than a battery rundown; common types include steady-state load tests, harmonic analysis and transient response tests.

Though they have similar names, a load test and a load bank test are different procedures.

Like a load test, a load bank test verifies the UPS' ability to provide a predetermined amount of sustained power. These tests use dedicated hardware, known as load banks, that places an artificial load on the UPS at controlled power levels to exercise the unit and its batteries. Because that load is synthetic, however, a load bank test does not analyze the unit's overall ability to power your data center hardware.

An important consideration for load bank tests is that load banks work much like heating elements: they turn the power they draw into heat. You must therefore perform load bank testing with fire safety in mind, away from any smoke alarms or sprinklers.

Whether you perform a load test or a load bank test, make sure that the UPS batteries are fully charged before testing begins. Otherwise, the test results are invalid.
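
Load bank tests are often run in steps of increasing load, and the full-charge requirement makes a natural precondition for planning one. The sketch below assumes a step profile of 25%, 50%, 75% and 100% of the UPS's rated output, which is a hypothetical example rather than a standard procedure, and it assumes the UPS reports battery charge as a percentage. A qualified technician should set the actual profile.

```python
def load_bank_plan(rated_kw, charge_pct, steps_pct=(25, 50, 75, 100)):
    """Return the load in kW to apply at each step of a stepped load bank test.

    rated_kw: the UPS's rated output in kilowatts.
    charge_pct: battery charge reported by the UPS, in percent.
    steps_pct: hypothetical step profile as percentages of rated output.
    """
    # Results from partially charged batteries are not valid, so refuse to plan.
    if charge_pct < 100:
        raise ValueError(f"batteries at {charge_pct}%; charge fully before testing")
    # This sketch never plans a step above the unit's rating.
    if any(not 0 < p <= 100 for p in steps_pct):
        raise ValueError("each step must be above 0% and at most 100% of rating")
    return [rated_kw * p / 100 for p in steps_pct]


print(load_bank_plan(20.0, 100))  # [5.0, 10.0, 15.0, 20.0]
```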

Alarm verification and UPS calibration

Periodically verify that each UPS is properly communicating with your monitoring software. Also, review each UPS' alarm log for any indication that the UPS might be experiencing abnormal behavior.
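
Both checks lend themselves to a script. This sketch assumes your monitoring software can export each UPS's last check-in time and an alarm log. The five-minute staleness window and the comma-separated log format (timestamp, UPS ID, severity, message) are hypothetical, so adapt them to whatever your monitoring tool actually produces.

```python
from datetime import datetime, timedelta


def stale_units(last_seen, now, max_age=timedelta(minutes=5)):
    """Return IDs of UPSes that have not reported to monitoring within max_age."""
    return sorted(ups for ups, ts in last_seen.items() if now - ts > max_age)


def abnormal_alarms(log_lines, severities=("WARNING", "CRITICAL")):
    """Return alarm-log entries whose severity is in `severities`.

    Assumes a hypothetical format: timestamp,ups_id,severity,message
    The message is the last field, so it may itself contain commas.
    """
    hits = []
    for line in log_lines:
        parts = line.strip().split(",", 3)
        if len(parts) == 4 and parts[2].upper() in severities:
            hits.append(tuple(parts))
    return hits


now = datetime(2024, 5, 1, 12, 0)
last_seen = {"UPS-A": now - timedelta(minutes=1), "UPS-B": now - timedelta(hours=2)}
print(stale_units(last_seen, now))  # ['UPS-B']

log = [
    "2024-05-01T10:00:00,UPS-A,INFO,Self-test passed",
    "2024-05-01T11:15:00,UPS-B,WARNING,On battery, input voltage low",
]
print(abnormal_alarms(log))
```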

A trained technician should periodically make sure that UPS units are properly calibrated. A UPS that is not correctly calibrated can trigger a voltage overage alarm, even if the supported load is well within the device's power rating range. It might also cause the unit to display incorrect runtime data, which makes it harder to schedule required maintenance.
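
One way a technician might spot a calibration problem is to compare the load the UPS reports with a reading taken independently, for example with a calibrated power meter. The sketch below computes the percentage difference between the two; the 5% tolerance is a hypothetical acceptance band, not a manufacturer figure.

```python
def calibration_drift(reported_kw, measured_kw, tolerance_pct=5.0):
    """Compare the load a UPS reports with an independent measurement.

    Returns (drift_pct, within_tolerance). A positive drift means the UPS
    over-reports its load, the kind of error that could produce misleading
    alarms or runtime estimates.
    tolerance_pct: hypothetical acceptance band.
    """
    if measured_kw <= 0:
        raise ValueError("measured load must be positive")
    drift_pct = (reported_kw - measured_kw) / measured_kw * 100
    return drift_pct, abs(drift_pct) <= tolerance_pct


print(calibration_drift(9.0, 8.0))  # (12.5, False)
```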

As important as proper UPS maintenance might be, it is not a replacement for the hardware refresh cycle. UPS batteries, even if unused, have a limited lifespan. As such, UPS batteries should be periodically replaced according to the manufacturer's recommendations.
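
Tracking replacement dates is simple date arithmetic. In this sketch, `service_life_years` should come from the battery manufacturer's recommendation; the four-year figure in the example and the 90-day planning window are hypothetical values used only for illustration.

```python
from datetime import date


def replacement_due(installed, service_life_years):
    """Return the date a battery reaches the end of its recommended service life."""
    try:
        return installed.replace(year=installed.year + service_life_years)
    except ValueError:
        # Installed on Feb. 29 and the target year is not a leap year.
        return installed.replace(year=installed.year + service_life_years, day=28)


def due_soon(installed, service_life_years, today, lead_days=90):
    """True if the battery is past, or within lead_days of, its replacement date."""
    return (replacement_due(installed, service_life_years) - today).days <= lead_days


installed = date(2021, 3, 1)
print(replacement_due(installed, 4))              # 2025-03-01
print(due_soon(installed, 4, date(2025, 1, 15)))  # True (45 days left)
print(due_soon(installed, 4, date(2024, 6, 1)))   # False
```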

Brien Posey is a 15-time Microsoft MVP with two decades of IT experience. He has served as a lead network engineer for the U.S. Department of Defense and as a network administrator for some of the largest insurance companies in America.