How We Pinpointed a 244ms Latency Spike in a 500k QPS OpenResty Gateway

Recently, we partnered with a leading fintech client on a routine performance evaluation of their core cross-border payment clearing system. The system’s entry point is a high-performance API gateway built on OpenResty that handles billions of requests daily, with peak QPS exceeding 500,000. In the fintech sector, system stability and latency are the lifeblood of business operations, and the client holds its critical transaction paths to exceptionally stringent Service Level Objectives (SLOs). At first glance, the system appeared to be running smoothly: P50 latency remained stable within 10ms, and all core indicators looked healthy.

Despite the healthy median latency, a closer look at the latency curve revealed a significant stability risk: periodic spikes pushed latency up to 300ms, exceeding the strict SLA thresholds for critical transaction paths. For a system in which OpenResty serves as the critical gateway, this signals performance degradation and also creates a risk of transaction timeouts.
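
To see how a healthy P50 can coexist with spikes like these, here is a minimal sketch in Python. The latency samples are synthetic and the `percentile` helper is a hypothetical nearest-rank implementation, not the client’s monitoring stack; it only illustrates why the tail of the distribution has to be inspected alongside the median.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic latencies in milliseconds (an assumption for illustration):
# 997 fast requests and 3 slow ones out of 1,000.
latencies_ms = [8] * 997 + [300] * 3

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
worst = max(latencies_ms)

print(p50, p99, p999, worst)
```

With this synthetic data, both P50 and P99 stay at the fast value, and the slow requests only show up at P99.9 and in the maximum.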

When Traditional Monitoring Fails to Pinpoint the Root Cause
From Heuristics to the Power of Dynamic Observability
Quantifiable Engineering Efficiency and Resource Optimization Results
Building Continuous Performance Observability
What is OpenResty XRay
About The Author

When Traditional Monitoring Fails to Pinpoint the Root Cause

During this health check, we used OpenResty XRay to perform a non-intrusive deep scan of the client’s production environment. Although the client’s existing monitoring dashboards indicated that the system was running smoothly overall, OpenResty XRay’s analysis quickly uncovered two critical performance risks hidden beneath seemingly stable averages:

Unexplained Latency Spikes: Even while the vast majority of requests responded normally, a very small number still experienced brief but severe delays exceeding 300 milliseconds. Signals like these are often dismissed as statistical noise within the client’s vast monitoring data, but OpenResty XRay captured and analyzed the longest of these events and flagged them as high-risk.
Persistent CPU Bottleneck: Monitoring revealed consistently high CPU utilization within the gateway cluster, particularly during the log phase. To maintain stability during peak loads, the client team resorted to an over-provisioning strategy, directly leading to significant infrastructure costs.

This represents a classic dilemma many have encountered: you might have a general idea of where the problem lies, likely within the Lua code, but you lack the precise details—which specific line, which function, and under what exact conditions it’s triggered.

From Heuristics to the Power of Dynamic Observability

Clearly, the challenge is no longer collecting more monitoring data but extracting actionable insights from it. To break free from the inefficient “observe-guess-verify” cycle, we need a tool that can safely perform deep investigations in a production environment. This is where OpenResty XRay demonstrates its core value: its dynamic tracing is non-intrusive, requiring no code modifications and no service restarts, which is non-negotiable for critical financial systems.

We initiated OpenResty XRay’s automated analysis on one of the client’s high-load production Pods. Within minutes, the initial deep analysis reports were generated, and the answers to the puzzle began to surface.

Pinpointing Performance Hotspots

Our first objective was to explain the recurring latency spikes.

Data Insights: Through real-time sampling analysis of the production environment, we identified a specific string processing function as the root cause of the high latency. Our findings indicated that when processing certain input patterns, a single execution of this function could take up to 244.64 milliseconds, fully accounting for the observed latency spikes.
Evidence: Further investigation revealed that the pattern-matching engine underlying this function was not designed for JIT (Just-In-Time) compilation. On specific edge-case inputs, its matching algorithm fell back to inefficient backtracking, and execution time grew exponentially.

Based on these findings, we recommended replacing this function on the critical path with a more modern, JIT-friendly alternative that is available within the framework and designed for high-concurrency environments. Following implementation, the system’s latency spikes were effectively eliminated, and service availability metrics stabilized.
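
The backtracking failure mode described above can be reproduced with a toy example. The sketch below is written in Python and is not the client’s function or pattern, neither of which is named here; it hand-rolls a naive backtracking matcher for the hypothetical pattern `(a+)+$` (anchored at the start of the input) and counts how many attempts it makes, next to a linear check for the equivalent pattern `a+$`.

```python
def match_nested_plus(text):
    """Naive backtracking matcher for the pattern (a+)+$ that counts its attempts.

    Returns (matched, attempts).
    """
    attempts = 0

    def match_groups(pos):
        nonlocal attempts
        # Greedily find how far a run of 'a' extends from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Try every possible length for this a+ group, longest first.
        for stop in range(end, pos, -1):
            attempts += 1
            if stop == len(text):      # reached end of input: overall match
                return True
            if match_groups(stop):     # otherwise try another a+ group
                return True
        return False                   # every split failed: backtrack

    return match_groups(0), attempts


def match_flat(text):
    """Equivalent pattern a+$ checked with a single linear scan."""
    return len(text) > 0 and all(ch == "a" for ch in text)


# Inputs that almost match are the expensive ones: a run of 'a' plus one bad character.
for n in (5, 10, 15):
    print(n, match_nested_plus("a" * n + "!"), match_flat("a" * n + "!"))
```

A matching input is accepted on the first attempt, while each additional `a` in a non-matching input roughly doubles the work of the nested pattern. The rewritten check gives the same answer in a single pass, which is the general idea behind swapping in an engine or pattern that does not backtrack this way.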

Deep Overhead Analysis

Even after resolving the latency issues, the system’s overall CPU utilization remained above the expected baseline, indicating further optimization opportunities.

XRay Evidence: On-CPU flame graphs clearly pointed to an unexpected CPU bottleneck: the logging (log) phase, typically considered low-overhead, consumed a disproportionate 26.5% of CPU time.
Attribution Analysis: A drill-down analysis of the function stack revealed the root cause: within the log processing logic, a regular expression used for formatting and data masking was being recompiled with every request. This costly, repetitive compilation within a loop was the direct contributor to the excessive CPU overhead.

We recommended enabling the “compilation cache” option for the relevant regular expression calls, ensuring a “compile once, run many times” approach. This adjustment significantly reduced the CPU footprint of the logging module, freeing up computational resources and thereby increasing the server’s capacity to handle core business logic.
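
The “compile once, run many times” idea can be sketched in a few lines of Python. The `RegexCache` class, the masking pattern, and the log line below are all hypothetical and chosen only for illustration; the client’s actual expression is not shown in this article. Python’s `re` module also keeps an internal cache of its own, so the explicit cache here mainly serves to make the compile count visible.

```python
import re

class RegexCache:
    """Compile each distinct pattern once and reuse the compiled object afterwards."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0   # how many real compilations happened

    def get(self, pattern):
        compiled = self._compiled.get(pattern)
        if compiled is None:
            compiled = re.compile(pattern)
            self.compile_count += 1
            self._compiled[pattern] = compiled
        return compiled


# Hypothetical masking rule: keep the first and last 4 digits of a 16-digit number.
CARD_PATTERN = r"\b(\d{4})\d{8}(\d{4})\b"

def mask_log_line(cache, line):
    """Mask card-like numbers in one log line using the cached regex."""
    return cache.get(CARD_PATTERN).sub(r"\1********\2", line)


cache = RegexCache()
masked = [mask_log_line(cache, "card=4111111111111111 ok") for _ in range(1000)]
print(masked[0], cache.compile_count)
```

However many log lines pass through `mask_log_line`, the pattern is compiled a single time, so the per-request cost is reduced to the match and substitution itself.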

The Forgotten PCRE JIT Setting

While the previous two optimizations addressed specific application-level issues, the “Lua-Land” report from OpenResty XRay uncovered a deeper, more systemic problem.

Systemic Finding: Our analysis revealed that even after cache optimizations had been applied, the JIT acceleration feature of the system’s core PCRE (Perl Compatible Regular Expressions) engine remained inactive across the entire production environment.
Root Cause Analysis: This issue did not stem from application code logic but rather from an upstream build process. We confirmed that the customer’s base container image, used for deployment, omitted a critical compilation parameter (--with-pcre-jit) required to enable PCRE JIT support during the compilation of their core application gateway.
Strategic Remediation: As a result, the entire cluster had never been able to use this crucial performance feature. We recommended that the customer team revise their CI/CD process and rebuild their base image with the parameter enabled. Doing so activates a core performance feature of the platform and raises the performance baseline for all related services. It also illustrates the value of end-to-end analysis across application code, runtime environment, and infrastructure build.
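
A build-time guard against this kind of omission could look like the Python sketch below. It assumes the gateway binary reports its build options through `nginx -V`, which prints a `configure arguments:` line to standard error; the `has_pcre_jit` helper and both sample outputs are hypothetical and only illustrate the check, not the client’s pipeline.

```python
def has_pcre_jit(nginx_v_output):
    """Return True if the configure arguments include --with-pcre-jit."""
    for line in nginx_v_output.splitlines():
        if line.startswith("configure arguments:"):
            args = line.split(":", 1)[1].split()
            return "--with-pcre-jit" in args
    return False   # no configure line found: treat as not enabled


# Illustrative outputs (made up for this sketch, not taken from the client's image).
built_with_jit = (
    "nginx version: example\n"
    "configure arguments: --prefix=/usr/local --with-pcre-jit --with-http_ssl_module\n"
)
built_without_jit = (
    "nginx version: example\n"
    "configure arguments: --prefix=/usr/local --with-http_ssl_module\n"
)

print(has_pcre_jit(built_with_jit), has_pcre_jit(built_without_jit))
```

In a real pipeline, the text would come from running the binary inside the freshly built image and capturing its standard error, and the build would fail when the check returns `False`.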

Quantifiable Engineering Efficiency and Resource Optimization Results

Leveraging insights from OpenResty XRay, the customer team implemented a series of optimizations. The effects were immediate and quantifiable:

Latency Spikes Completely Eliminated: Post-optimization, the latency curve became consistently stable, and the peaks that had previously exceeded 300ms no longer appeared.
30% CPU Cost Savings: After resolving the regex cache issue and globally enabling JIT, the gateway cluster’s overall CPU utilization decreased by approximately 30%, leading to significant reductions in cloud infrastructure costs.
MTTR (Mean Time To Resolution) Significantly Shortened: The diagnosis time for performance issues was dramatically reduced from “weeks of guesswork and meetings” to “minutes of accurate pinpointing.”

Building Continuous Performance Observability

The immediate value of resolving these two performance bottlenecks is clear. However, the deeper insight reinforces a core engineering principle: in high-concurrency, low-latency OpenResty environments, performance issues often lurk within the build system, runtime configurations, and the intricate details of the underlying infrastructure.

When the root cause of a problem extends beyond application code logic, traditional monitoring methods quickly become inefficient. Without dynamic, non-intrusive, deep-level tracing capabilities, even the most seasoned engineers face significantly higher costs in pinpointing these elusive performance regressions.

Building on this experience, the client’s engineering team is planning the next phase of its engineering system optimization. The team intends to shift continuous performance analysis left by integrating OpenResty XRay into the benchmark testing phase of the CI/CD pipeline. The goal is for automated benchmark reports to detect performance anomalies stemming from environmental factors, configurations, or compilation processes before the offending code or configuration is merged into the main branch. This initiative marks a crucial shift in mindset from “passive response” to “active defense.”
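
As a rough illustration of what such a benchmark gate might check, the Python sketch below compares the P99 latency of a candidate build against a baseline and rejects the candidate when it exceeds a tolerance. The function names, the 10% tolerance, and the sample data are all assumptions made for this example; the client’s actual pipeline is not described in that level of detail here.

```python
import statistics

def p99(samples_ms):
    """99th percentile using the standard library's default (exclusive) method."""
    return statistics.quantiles(samples_ms, n=100)[98]

def within_budget(baseline_ms, candidate_ms, tolerance=0.10):
    """True if the candidate's P99 is at most (1 + tolerance) times the baseline's P99."""
    return p99(candidate_ms) <= p99(baseline_ms) * (1 + tolerance)


# Synthetic benchmark results in milliseconds (assumed for illustration).
baseline = [10] * 99 + [12]
unchanged = [10] * 99 + [12]
regressed = [10] * 98 + [250] * 2   # a couple of slow outliers appear

print(within_budget(baseline, unchanged), within_budget(baseline, regressed))
```

Gating on a tail percentile rather than the mean keeps the check sensitive to the kind of rare, severe outliers discussed in this article.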

We hope this in-depth analysis of two common performance blind spots in the OpenResty environment offers valuable insights and strategies for those of you on the front lines, dedicated to enhancing system stability and efficiency.

What is OpenResty XRay

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes such as Stap+, eBPF+, GDB, and ODB, depending on the context.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, including Cloudflare and Yahoo!, and is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, counts some of the biggest companies in the world among its customers. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.