How much cooling do you really have in your data center?

Data centers are constrained by three elements – space, power and cooling. Limitations on any one of these three will result in restricted growth and poor operation of the critical facility.  

Assessing rack and IT equipment space 

Whether there is sufficient space for new racks or other equipment is visible, measurable and quite easily assessed. You must ask, “Is there ample floor space for new racks, or space within existing racks to add more servers?”

How much power is available to the data center can be determined from the readings on the uninterruptible power supply (UPS). The distribution of that power to the power distribution units (PDUs) and racks will show whether there is sufficient power available at the rack level for new equipment.

Determining cooling capacity and airflow efficiency 

Cooling, on the other hand, is more difficult to assess. It depends on two factors: the cooling capacity of each cooling unit and the efficiency of airflow. The nameplate specifications of a cooling unit indicate its maximum cooling capacity. However, the capacity actually delivered fluctuates with temperature and humidity conditions, which determine whether the cooling system can deliver its maximum capacity or much less. Operating at a low return air temperature generally results in the cooling unit providing its minimum cooling capacity. Higher return air temperatures enable the unit to work more efficiently and deliver more cooling capacity. Cooling unit capacity can vary by up to 25%, depending on the return air temperature and humidity conditions in the room. Spread across 4-5 cooling units, this variation means the equivalent of an additional unit could be available if operating conditions are correctly set. However, the opposite is also true: you may not have as much cooling capacity as expected due to poor operating conditions.
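
To put numbers on the "additional unit" point, here is a minimal arithmetic sketch. The room is hypothetical (four identical 100 kW units), and it assumes the 25% variation is a reduction from nameplate capacity under poor conditions; real figures should come from the manufacturer's capacity tables for your return air conditions.

```python
def effective_capacity_kw(nameplate_kw: float, derate_fraction: float) -> float:
    """Capacity delivered when operating conditions cost `derate_fraction` of nameplate."""
    if not 0.0 <= derate_fraction <= 1.0:
        raise ValueError("derate_fraction must be between 0 and 1")
    return nameplate_kw * (1.0 - derate_fraction)


# Hypothetical room: four identical units, 100 kW nameplate each.
NAMEPLATE_KW = 100.0
UNIT_COUNT = 4

# Assumption: good conditions deliver full nameplate, poor conditions lose 25%.
best_case_kw = UNIT_COUNT * effective_capacity_kw(NAMEPLATE_KW, 0.0)
worst_case_kw = UNIT_COUNT * effective_capacity_kw(NAMEPLATE_KW, 0.25)

# Express the swing in "units of nameplate capacity".
swing_in_units = (best_case_kw - worst_case_kw) / NAMEPLATE_KW

print(f"Best case:  {best_case_kw:.0f} kW")
print(f"Worst case: {worst_case_kw:.0f} kW")
print(f"Difference: {swing_in_units:.2f} unit(s) of nameplate capacity")
```

With these assumed numbers, the gap between good and poor operating conditions is 100 kW, the same as one whole unit.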

Different types of cooling systems also operate differently. Wall packs and rooftop units (RTUs) are typically single-stage, meaning the cooling function is either on or off while the fan operates continuously. Larger cooling units are typically variable-capacity, meaning they operate across a range of cooling capacity depending on the cooling demands in the room. Variable-capacity units can range from 0% cooling (fan only) to 100% cooling with fans, depending on the return air temperature relative to the unit's set point. In a data center with multiple cooling units, it is not unusual to have one or two units operating at 100% cooling while the remainder are at 25% or 50%.

A worst-case scenario is a unit operating only as a fan. Sometimes this is done to “make sure there is enough airflow.” The problem is that in fan-only mode the unit pulls warm return air from the room and, without doing any cooling, pushes that warm air into the supply plenum. This warm air mixes with the cool supply air and raises the temperature of the air delivered to the IT equipment, resulting in a higher exhaust air temperature. The warmer exhaust returns to the cooling units, which engage more cooling capacity to lower the temperature. This is a vicious cycle that needlessly uses a significant quantity of additional energy.
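
The mixing effect can be estimated with an airflow-weighted average. This is a rough sketch only: the airflow and temperature figures are hypothetical, and it assumes perfect mixing in the plenum with equal air density and specific heat for every stream.

```python
def mixed_supply_temp_c(streams):
    """Airflow-weighted average temperature of (airflow_m3s, temp_c) streams.

    Assumes perfect mixing and identical air properties for every stream.
    """
    total_airflow = sum(airflow for airflow, _ in streams)
    if total_airflow <= 0:
        raise ValueError("total airflow must be positive")
    return sum(airflow * temp for airflow, temp in streams) / total_airflow


# Hypothetical plenum: three units cooling, each supplying 5 m^3/s at 18 C ...
cooling_units = [(5.0, 18.0)] * 3
# ... plus one fan-only unit pushing 5 m^3/s of uncooled 30 C return air.
fan_only_unit = [(5.0, 30.0)]

without_fan_only = mixed_supply_temp_c(cooling_units)
with_fan_only = mixed_supply_temp_c(cooling_units + fan_only_unit)

print(f"Supply air without the fan-only unit: {without_fan_only:.1f} C")
print(f"Supply air with the fan-only unit:    {with_fan_only:.1f} C")
```

Under these assumptions, the single fan-only unit raises the blended supply temperature from 18 °C to 21 °C, which the other units then have to work harder to pull back down.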

The placement of a cooling unit relative to the heat load also determines how much of its cooling capacity is being used. Common thinking is that a cooling unit operates at a predetermined cooling capacity regardless of where it is in the data center. This belief is totally incorrect! A cooling unit close to a high heat load area may be operating at 100% cooling capacity all the time due to the high return air temperature. A cooling unit in a different part of the room, which is getting less return air or air at a lower temperature, may only be operating at 25% cooling capacity. Balancing the distribution of IT heat load and return air in the room will help balance the operation of the cooling units and distribute the cooling work more evenly.

Unbalanced return airflow to the cooling units comes into play in other ways. All data centers should have N+1 cooling capacity, meaning that if one cooling unit fails, additional cooling capacity is available to make up for the failure with no impact on the operation of IT equipment. On paper it may appear there is sufficient cooling capacity for N+1; however, if the room's IT load is not balanced, the failure of a cooling unit in a high-density area may not be compensated for, because the standby cooling unit is some distance away from the high-density heat load.

N+1 capacity is essential to the uninterrupted operation of a data center; however, this does not mean that all cooling units need to be operating all the time. In fact, significant energy savings can be achieved by networking and sequencing cooling units so that the N+1 capacity sits in standby mode, available without consuming energy. A second, equally important reason for networking and sequencing is that it ensures that, regardless of which units are operating, the room will remain within the acceptable temperature ranges for the IT equipment (ITE). This is especially important in a room with an unbalanced heat load. N+1 cooling capacity may exist on paper, but if the cooling units are not properly placed or the return airflow goes primarily to one unit, failure of that unit may result in higher than acceptable room temperatures.
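
The gap between N+1 "on paper" and N+1 where the heat actually is can be shown with a small check. This sketch is illustrative only: the unit names, zones and loads are hypothetical, and it makes the deliberately pessimistic assumption that a unit can only cool the zone it sits in. A CFD model would give a far more realistic picture of how much cooling reaches each area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingUnit:
    name: str
    zone: str
    capacity_kw: float


def room_level_n_plus_1(units, zone_loads_kw):
    """Paper check: can the room lose its largest unit and still cover total load?"""
    total_load = sum(zone_loads_kw.values())
    total_capacity = sum(u.capacity_kw for u in units)
    largest = max(u.capacity_kw for u in units)
    return total_capacity - largest >= total_load


def zones_at_risk(units, zone_loads_kw):
    """Return (failed unit, zone) pairs where the failed unit's zone is left short.

    Assumption: only units located in a zone can cool that zone.
    """
    at_risk = []
    for failed in units:
        remaining = sum(
            u.capacity_kw for u in units if u.zone == failed.zone and u is not failed
        )
        if remaining < zone_loads_kw.get(failed.zone, 0.0):
            at_risk.append((failed.name, failed.zone))
    return at_risk


# Hypothetical room: one unit near the high-density racks, two elsewhere.
units = [
    CoolingUnit("CRAC-1", "high-density", 100.0),
    CoolingUnit("CRAC-2", "low-density", 100.0),
    CoolingUnit("CRAC-3", "low-density", 100.0),
]
zone_loads_kw = {"high-density": 90.0, "low-density": 60.0}

print("N+1 on paper:", room_level_n_plus_1(units, zone_loads_kw))
print("At risk if a unit fails:", zones_at_risk(units, zone_loads_kw))
```

In this example the room passes the paper check (200 kW remains for a 150 kW load), yet losing CRAC-1 leaves the high-density zone with no nearby cooling at all.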

Cooling units are often a mix of cooling capacities and vintages, making the determination of cooling capacity even more difficult. For example, newer cooling units can produce more cooling capacity with higher return air temperatures. However, older cooling units may, in fact, have less cooling capacity when operating in an environment with higher return air temperatures. 

The second factor affecting the effectiveness of cooling in a data center is airflow. It is not unusual to find a data center with three to four times the cooling capacity required, yet the room still experiences heat issues. This happens because good airflow management practices are not being followed, so the cooling units operate at less than maximum cooling efficiency. If supply and return air are poorly separated, or airflow is inhibited by obstructions in the supply plenum, among other factors, cooling capacity can be dramatically reduced.

 

Best practices for airflow management 

  1. Blanking panels should be used in racks to fill voids where no equipment exists. Filling the spaces between racks as well prevents warm exhaust air from recirculating to the inlet side of the rack.
  2. Placement of racks and cooling units affects airflow. Cooling units should be at the end of the hot aisles to facilitate the return air pathway to the cooling units. Additionally, racks should be at least 2 metres away from the front of the cooling unit.
  3. Perforated tiles, with the appropriate openings, must be used to ensure adequate airflow is provided to the IT equipment. But do not install perforated tiles everywhere just to “get good airflow”!
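
As a rough illustration of what "adequate airflow" means for a rack, the sketch below applies the standard sensible-heat relation (heat load = air density × specific heat × volumetric airflow × temperature rise). The air properties are approximate room-condition values, and the rack load, temperature rise and per-tile airflow are hypothetical numbers chosen for the example, not recommendations.

```python
import math

AIR_DENSITY_KG_M3 = 1.2            # approximate, room temperature near sea level
AIR_SPECIFIC_HEAT_KJ_KG_K = 1.005  # approximate specific heat of air


def required_airflow_m3s(heat_load_kw: float, delta_t_k: float) -> float:
    """Airflow needed to carry `heat_load_kw` with an inlet-to-exhaust rise of `delta_t_k`."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_kw / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_KJ_KG_K * delta_t_k)


def tiles_needed(heat_load_kw: float, delta_t_k: float, tile_airflow_m3s: float) -> int:
    """Whole perforated tiles needed, given an assumed airflow per tile."""
    if tile_airflow_m3s <= 0:
        raise ValueError("tile_airflow_m3s must be positive")
    return math.ceil(required_airflow_m3s(heat_load_kw, delta_t_k) / tile_airflow_m3s)


# Hypothetical rack: 5 kW of ITE, 10 K rise, tiles assumed to deliver 0.25 m^3/s each.
airflow = required_airflow_m3s(5.0, 10.0)
tiles = tiles_needed(5.0, 10.0, 0.25)

print(f"Required airflow: {airflow:.2f} m^3/s")
print(f"Perforated tiles: {tiles}")
```

Sizing tiles to the load in this way, rather than adding them everywhere, keeps supply air going to the racks that need it.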

The challenge with airflow is that it is not visible and so it is impossible to see the supply and return air paths. Airflow modelling, using computational fluid dynamics (CFD), will map airflow and help to uncover issues that are not visible. CFD modelling is especially useful to assess airflow issues that cause hot spots and to help measure the impact of adding ITE to specific areas of a data center.  

Cooling is critical to the operation of a data center. Rightsizing cooling to match the ITE heat load and the conditions of the data center while maintaining an energy efficient operation is more than a paper exercise. Understanding cooling system operation and airflow are key elements to ensure the uninterrupted operation of a critical facility.  

To learn more about airflow management, read our blog “Airflow for Dummies.”
