API-Centric Observability: The key to deploying, operating, and securing cloud native services

Today’s cloud-native systems are built using containerized, distributed micro-services-based architectures. Accordingly, the Application Programming Interfaces (APIs) these systems utilize are the key to understanding the operations, status, and communications within those systems. Consequently, having deep API-Centric visibility is both critical and necessary for operations and security.

api2

APIs define the communications between applications. More specifically, APIs constitute the rules, messaging, and protocols by which applications (or network functions) interact. Consequently, API-Centric observability is especially important in multi-vendor cloud-native applications and 5G/IoT environments where the underlying applications or containerized network functions (CNFs) may have been developed by any number of third-party sources. As a result, the specific operational and implementation details of the underlying functionality may be outside the scope of the systems integrator or operator. Additionally, because cloud-native systems are dynamic, distributed, and complex; the resulting interactions between an individual API and the underlying network, processor and memory resources, as well as other CNFs, may be the direct cause of a myriad of performance and stability issues, such as queuing delays and flow related issues that need to be continuously monitored in real time.

More importantly, the interaction of APIs can have serious knock-on effects. For instance, a single service or function may call dozens of APIs and in-turn those APIs may evoke dozens more. Needless to say, the performance, security, stability, and reliability of the entire system can be greatly affected by the operation and interactions of just one API. Moreover, the aforementioned resource constraints; scheduling, and other issues may be indirectly affected by the behavior of an individual API, and the associated services or applications.

Furthermore, in a continuous integration / continuous delivery (CI/CD) world, where applications are developed, tested, and deployed in real-time, it is increasingly the responsibility of the systems integrator or operator to understand and manage these complex interdependencies, making it especially important to have continuous real-time insights into systemic API interactions.

API-Centric observability is not simple application or network monitoring. It requires deep instrumentation that is flexible and composable, able to monitor events down to the lowest level. In many cases the deep visibility is derived using eBPF or other forms of deep system instrumentation, which provide event monitoring; down to the most basic level: links, namespace, processes, flows, containers, applications, and users. The metadata resulting from API observability tools which describes those low-level events and behaviors, is then available for upstream analytic tools to exploit and derive insights.

net mon 3

Alternately, monitoring using only simple logs, metrics, and traces does not provide the necessary level of dynamic observability. Most legacy monitoring approaches produce huge amounts of data that cannot always be correlated with specific events or behaviors, and has resulted in what has been referred to as “data overload” making it difficult to differentiate meaningful information (signals) from spurious activity (noise). Whereas; properly implemented observability solutions, which are event driven, composable, and dynamic; are capable of selectively generating, processing, and aggregating a range of metadata types that can be easily collected, correlated, and abstracted into canonical datasets for faster, more efficient analysis and correlation ( i.e. produce more high-quality signals with less noise).

So, what types of observability metrics do we need in order to capture the behaviors we are interested in understanding; e.g. time-series relationships, data relationships, or both. The fundamental value of API-Centric observability is the ability to instrument and make inquiries into both system level events as well as the lowest level of observed primitives: behaviors or interactions; sampled down to node/kernel communications, namespace, dataflows, and timestamps which can be transcoded into (JSON formatted) metadata from which to build abstractions as well as models to correlate, analyze, and understand complex behaviors.

One of the most important requirements for an API-Centric observability platform is that it needs to be composable and dynamic. That-is: the specific API metadata generated by the API observability platform can be refined as required. This lends itself to applying iterative processes for determining the root cause of the behaviors (or anomalies). That is; one can start with a simple set of high-level observations, and successively correlate, analyze and refine successive observations until sufficient clarity is derived into the behaviors, or root causes, for the activities in question.

Using foundational API metadata elements, one can apply the appropriate analytics workflows to generate a broad range of reports, alarms, KPI, dashboards, and visualizations. Here are some use case examples:

API Discovery and Inventory
API Topology and Dependency Tracking
API Payload Monitoring (incl. encrypted payloads)
Network and Application Performance
Latency and Throughput
Resource Utilization
Communications, Flow and Tracing
API Status
Protocol conformance

API-Centric observability is not a replacement for conventional types of monitoring telemetry; PCAP, logs, metrics, and traces. Rather it greatly enhances and provides far deeper context for understanding the status and behavior of the underlying systems; enabling better event correlation, investigation, and root cause analysis which can result in improved efficiencies, stability, performance and security. Additionally, API-Observability is foundational, and a means to an end. More specifically, by utilizing advanced API observability tools, one has great latitude in the scope and depth of the resulting (meta)data available on which to apply advanced analytics, AI and ML tools to better understand and control the complex behaviors and interactions within cloud-native environments, both today, and in the future.

API-Centric Observability: The key to deploying, operating, and securing cloud native services

Written by Peter Dougherty