Game Timing and Multicore Processors - Win32 apps

With power management technologies becoming more commonplace in today's computers, a commonly used method of obtaining high-resolution CPU timings, the RDTSC instruction, may no longer work as expected. This article suggests a more accurate and reliable way to obtain high-resolution CPU timings: the Windows APIs QueryPerformanceCounter and QueryPerformanceFrequency.

Background

Since the introduction of the x86 P5 instruction set, many game developers have used the read time stamp counter instruction, RDTSC, to perform high-resolution timing. The Windows multimedia timers are precise enough for sound and video processing, but with frame times of a dozen milliseconds or less, they don't have enough resolution to provide delta-time information. Many games still use a multimedia timer at start-up to establish the frequency of the CPU, and they use that frequency value to scale results from RDTSC to get accurate time. Because of the limitations of RDTSC, the Windows API exposes a more reliable way to access this functionality through the routines QueryPerformanceCounter and QueryPerformanceFrequency.
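As a rough illustration of the scaling step described above, the sketch below divides a cycle-count difference by a frequency measured once at start-up. The helper name SecondsFromCycles and the 2 GHz figure are hypothetical, and the example assumes a processor whose time stamp counter follows the core clock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale a cycle-count difference by a CPU frequency
// that was measured once at start-up (for example, against a multimedia timer).
inline double SecondsFromCycles(std::uint64_t startCycles,
                                std::uint64_t endCycles,
                                double calibratedHz)
{
    assert(calibratedHz > 0.0);
    return static_cast<double>(endCycles - startCycles) / calibratedHz;
}
```

With a calibrated frequency of 2.0e9 Hz, a difference of 2,000,000,000 cycles converts to 1.0 second. If power management halves the clock so that only 1,000,000,000 cycles elapse during the same real second, the same call reports 0.5 seconds. A fixed start-up calibration has no way to detect that change.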

This use of RDTSC for timing suffers from these fundamental issues:

Recommendations

Games need accurate timing information, but you also need to implement timing code in a way that avoids the problems associated with using RDTSC. When you implement high-resolution timing, take the following steps:

  1. Use QueryPerformanceCounter and QueryPerformanceFrequency instead of RDTSC. These APIs may make use of RDTSC, but might instead use timing devices on the motherboard or other system services that provide high-quality, high-resolution timing information. RDTSC is much faster than QueryPerformanceCounter because the latter is an API call, but QueryPerformanceCounter can still be called several hundred times per frame without any noticeable impact. (Nevertheless, developers should attempt to have their games call QueryPerformanceCounter as little as possible to avoid any performance penalty.)
  2. When computing deltas, clamp the values so that any bugs in the timing values do not cause crashes or unstable time-related computations. The clamp range should be from 0 (to prevent negative delta values) to some reasonable value based on your lowest expected framerate. Clamping is likely to be useful when debugging your application, but keep it in mind when doing performance analysis or running the game in an unoptimized mode, because a clamped delta no longer reflects the real elapsed time.
  3. Compute all timing on a single thread. Computing timing on multiple threads (for example, with each thread associated with a specific processor) greatly reduces the performance of multi-core systems.
  4. Set that single thread to remain on a single processor by using the Windows API SetThreadAffinityMask. Typically, this is the main game thread. While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another. So, it's best to keep the thread on a single processor.
    All other threads should operate without gathering their own timer data. We do not recommend using a worker thread to compute timing, as this will become a synchronization bottleneck. Instead, worker threads should read timestamps from the main thread, and because the worker threads only read timestamps, there is no need to use critical sections.
  5. Call QueryPerformanceFrequency only once, because the frequency will not change while the system is running.
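
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names ReadTicks, ReadTicksPerSecond, PinCurrentThreadToProcessor0, ClampedDeltaSeconds, and FrameTimer are hypothetical, the std::atomic member is one possible way to let worker threads read the main thread's value without a critical section, and the std::chrono branch is only a stand-in so the sketch builds on systems other than Windows.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#define NOMINMAX            // keep windows.h from defining min/max macros
#include <windows.h>
#else
#include <chrono>           // non-Windows stand-in (assumption, not in the article)
#endif

// Step 1: read the raw counter (QueryPerformanceCounter on Windows).
inline std::int64_t ReadTicks()
{
#ifdef _WIN32
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    return now.QuadPart;
#else
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
#endif
}

// Read ticks per second (QueryPerformanceFrequency on Windows).
inline std::int64_t ReadTicksPerSecond()
{
#ifdef _WIN32
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    return freq.QuadPart;
#else
    return 1000000000;      // the stand-in counter ticks in nanoseconds
#endif
}

// Step 4: keep the calling thread on one processor (does nothing off Windows).
inline void PinCurrentThreadToProcessor0()
{
#ifdef _WIN32
    // Bit 0 selects the first processor. The call returns 0 on failure, for
    // example if that processor is not in the process affinity mask; this
    // sketch ignores the result.
    SetThreadAffinityMask(GetCurrentThread(), 1);
#endif
}

// Step 2: convert a tick difference to seconds, clamped to [0, maxSeconds].
inline double ClampedDeltaSeconds(std::int64_t prev, std::int64_t now,
                                  std::int64_t ticksPerSecond, double maxSeconds)
{
    assert(ticksPerSecond > 0);
    const double raw = static_cast<double>(now - prev) /
                       static_cast<double>(ticksPerSecond);
    return std::max(0.0, std::min(raw, maxSeconds));
}

// Step 3: owned by the main game thread; workers only call LastDeltaSeconds().
class FrameTimer
{
public:
    explicit FrameTimer(double maxDeltaSeconds)
        : ticksPerSecond_(ReadTicksPerSecond()),   // step 5: queried only once
          maxDelta_(maxDeltaSeconds),
          previous_(ReadTicks())
    {
    }

    // Call once per frame on the main thread.
    double Tick()
    {
        const std::int64_t now = ReadTicks();
        const double delta =
            ClampedDeltaSeconds(previous_, now, ticksPerSecond_, maxDelta_);
        previous_ = now;
        lastDelta_.store(delta, std::memory_order_release);
        return delta;
    }

    // Read-only access for worker threads: an atomic load, no lock.
    double LastDeltaSeconds() const
    {
        return lastDelta_.load(std::memory_order_acquire);
    }

private:
    std::int64_t ticksPerSecond_;
    double maxDelta_;
    std::int64_t previous_;
    std::atomic<double> lastDelta_{0.0};
};
```

In this sketch, the main game thread would call PinCurrentThreadToProcessor0() once at start-up, construct a FrameTimer with a maximum delta derived from the lowest expected framerate (for example, 0.1 seconds for 10 frames per second), and call Tick() once per frame. Worker threads would call LastDeltaSeconds() instead of querying the counter themselves.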

Application Compatibility

Many developers have made assumptions about the behavior of RDTSC over many years, so it is quite likely that some existing applications will exhibit problems when run on a system with multiple processors or cores due to the timing implementation. These problems will usually manifest as glitching or slow-motion movement. There is no easy remedy for applications that are not aware of power management, but there is an existing shim for forcing an application to always run on a single processor in a multiprocessor system.

To create this shim, download the Microsoft Application Compatibility Toolkit from Windows Application Compatibility.

Using the Compatibility Administrator, part of the toolkit, create a database of your application and associated fixes. Create a new compatibility mode for this database and select the compatibility fix SingleProcAffinity to force all of the threads of the application to run on a single processor/core. By using the command-line tool Fixpack.exe (also part of the toolkit), you can convert this database into an installable package for testing and distribution.

For instructions on using Compatibility Administrator, see the toolkit's documentation. For the syntax of Fixpack.exe and examples of its use, see its command-line help.

For customer-oriented information, see the following knowledge base articles from Microsoft Help and Support:

