Don't limit observability to 3 pillars
Observability is the latest buzzword to take over the IT monitoring space. But what does it take to achieve observability? And how do you know when vendors are selling you buzzwords instead of meaningful features?
Let's look at what observability means in practice and whether the three pillars of observability make sense.
What is observability?
In technical parlance, observability is the ability to determine what is happening inside a system based on data the system exposes externally. In other words, if a software system is observable, you can use external measures -- such as log files and metrics -- to figure out what is happening deep inside of it.
As a general concept in engineering, observability dates back decades. However, in recent years, software vendors who develop tools to help monitor and troubleshoot application environments have latched onto the term.
Observability vs. monitoring vs. visibility
Many software tool vendors have said that monitoring focuses on alerting you when something is wrong with your system, whereas observability reports on what is happening in the system. By that account, observability provides a deeper level of insight than monitoring.
Vendors also often juxtapose observability with visibility, claiming that visibility focuses on singular sources of data, while observability correlates multiple types of data to gain deep insights.
With that said, it's debatable to what extent observability truly differs from monitoring and visibility. Although tools and techniques have evolved in recent years to cater to cloud-native, microservices-based environments, observability is arguably not fundamentally different from other processes associated with monitoring. It's just a bit more sophisticated.
In practice, monitoring is much more than the simplistic idea that it only alerts you when something is wrong. Monitoring's purpose is to identify the cause of system errors and then deliver solutions. It also involves predicting these issues ahead of time so IT admins can fix them proactively. These practices did not start when observability came along; IT teams have been doing them under the guise of monitoring for a long time.
Additionally, the key data sources associated with observability -- the three pillars of observability detailed below -- have long been a part of monitoring routines. Observability doesn't have a monopoly on logs, metrics and traces.
In short, the observability versus monitoring versus visibility debate is one of semantics and marketing. To keep things simple, and despite what marketers have said, think of observability as an evolved form of monitoring, not a fundamental leap forward. The tools, processes and data sources associated with monitoring and observability are quite similar.
The three pillars of observability
The conversation about observability is often grounded in the idea that observability is based on three main pillars, or data sources, that provide insight into systems:
- Logs. Records of infrastructure and application activity, each capturing a single point in time
- Metrics. Measurements of application and cloud service performance
- Distributed traces. Records of requests as they move through a distributed system
These pillars are often the essential data sources on which modern monitoring or observability tools rely.
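To make the three data sources concrete, here is a minimal Python sketch, using only the standard library, in which one request handler emits all three. The in-memory sinks (`log_lines`, `metrics`, `spans`) and the `handle_request` function are hypothetical names chosen for illustration; a real system would send these signals to a logging, metrics or tracing backend.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory sinks; a real system would ship these to a backend.
log_lines = []       # logs: point-in-time event records
metrics = Counter()  # metrics: numeric aggregates
spans = []           # traces: timed operations linked by a trace ID


def handle_request(path, trace_id=None):
    """Handle one fake request and emit a log line, a metric and a span."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()

    status = 200 if path == "/ok" else 404  # stand-in for real work

    duration_ms = (time.perf_counter() - start) * 1000
    log_lines.append(json.dumps({
        "ts": time.time(),
        "path": path,
        "status": status,
        "trace_id": trace_id,
    }))
    metrics[f"requests_total{{status={status}}}"] += 1
    spans.append({
        "trace_id": trace_id,
        "name": "handle_request",
        "duration_ms": duration_ms,
    })
    return status


handle_request("/ok")
handle_request("/missing")
```

The shared `trace_id` is what lets a tool tie a log line back to the span that produced it. Nothing in the sketch requires all three signals to exist; dropping any one of the sinks leaves the other two usable.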
Author's note: I'm using the terms monitoring and observability interchangeably here because, as I explained above, I don't agree that there is always a fundamental distinction between the two concepts.
However, focusing on these three pillars alone is somewhat narrow-minded, for several reasons:
Not every system has logs, metrics and traces
Some cloud services might only expose metrics and not provide any log data to work with, for example. These systems are still observable, but their observability does not rest on each of the three pillars.
Don't forget monoliths
Distributed tracing is designed for microservices applications, not monoliths. Unless you take the view that monolithic applications are unobservable, it doesn't make sense to treat traces as a necessary pillar of observability.
Synthetic monitoring
Synthetic monitoring techniques, which help evaluate and predict system performance before software is deployed to production, typically center on collecting data from tests rather than on conventional logs, metrics and traces.
You could argue that observability is distinct from synthetic monitoring, in which case it would be OK to treat logs, metrics and traces as the key data sources for observability.
But you could also argue that synthetic monitoring is one way to achieve observability before putting your system into production. In the latter case, it's problematic to define logs, metrics and traces alone as the sources of observability.
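As a rough illustration of how synthetic monitoring produces data from tests rather than from live traffic, the sketch below runs a scripted probe several times and summarizes the results. The `checkout` function is a hypothetical stand-in for a pre-production endpoint, and `run_synthetic_check` is an illustrative helper, not part of any particular tool.

```python
import statistics
import time


def checkout(cart):
    """Hypothetical system under test; stands in for a pre-production endpoint."""
    if not cart:
        raise ValueError("empty cart")
    return {"total": sum(cart.values())}


def run_synthetic_check(probe, runs=5):
    """Run a scripted probe several times and summarize what was observed."""
    latencies_ms = []
    failures = 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            probe()
        except Exception:
            failures += 1
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "runs": runs,
        "failures": failures,
        "median_latency_ms": statistics.median(latencies_ms),
    }


healthy = run_synthetic_check(lambda: checkout({"book": 12.5, "pen": 1.5}))
broken = run_synthetic_check(lambda: checkout({}))
```

The two summaries describe the system's behavior without a single log line, metric or trace from production, which is why this kind of data sits awkwardly inside the three-pillar model.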
Other data sources
IT teams must sometimes contextualize logs, metrics and traces with data from other systems, such as ticketing systems or CI/CD pipelines, to gain the most complete picture of the overall software delivery operation's health. Focusing on the three pillars of observability alone risks overlooking other important sources of visibility into your systems.
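One way to picture this kind of contextualization is to line up a metric against events from a CI/CD pipeline. The sketch below is a simplified example under stated assumptions: the error counts, the deploy records and the `deploys_before_spikes` helper are all made up for illustration, and the 10-minute window and threshold of 10 errors are arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical data: error counts over time (a metric) and CI/CD deploy events.
error_counts = {
    datetime(2024, 1, 1, 12, 0): 1,
    datetime(2024, 1, 1, 12, 5): 2,
    datetime(2024, 1, 1, 12, 10): 40,
    datetime(2024, 1, 1, 12, 15): 38,
}
deploys = [
    {"id": "build-101", "at": datetime(2024, 1, 1, 11, 0)},
    {"id": "build-102", "at": datetime(2024, 1, 1, 12, 8)},
]


def deploys_before_spikes(errors, deploy_events, threshold=10,
                          window=timedelta(minutes=10)):
    """Pair each error spike with any deploy that landed shortly before it."""
    matches = []
    for ts, count in sorted(errors.items()):
        if count < threshold:
            continue  # not a spike
        for deploy in deploy_events:
            if timedelta(0) <= ts - deploy["at"] <= window:
                matches.append((ts, deploy["id"]))
    return matches


suspects = deploys_before_spikes(error_counts, deploys)
```

Here both spikes fall within 10 minutes of `build-102`, which the metric alone could not have revealed. The same join could be made against ticket timestamps or any other event source.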
IT observability is in the eye of the beholder
Don't get bogged down in semantic debates about what is and isn't observability -- or whether logs, metrics and traces alone are the essential pillars of observability. It's healthier to think of observability as having the same end goals, and the same variety of potential data sources, as the process we call monitoring.
As noted above, monitoring teams have long strived to figure out what was happening inside their systems so they could prevent problems, and they used a mix of data sources to do it.
Observability works toward the same ends and uses a multitude of data sources. Logs, metrics and traces might be the most common, but don't limit yourself to them alone. Use whichever pillars of observability make the most sense based on your organization's application architecture, hosting environment and end-user priorities.