Logging, Tracing and Monitoring

What are Logging, Tracing and Monitoring?

Logging, Distributed Tracing, and Monitoring are three distinct yet complementary practices used in managing and maintaining software systems. Understanding their differences is key to effective system analysis and troubleshooting:

  1. Logging: This is the process of recording events and data points generated by a software application or system. Logs are time-stamped records that provide historical data about what happened and when. They can include errors, warnings, or informational messages and are essential for debugging and post-mortem analysis. Logs are typically textual and can be quite verbose, covering everything from system events to user actions.

  2. Distributed Tracing: In modern, distributed systems, where a request might pass through multiple services or microservices, distributed tracing tracks and visualizes the journey of a request across these different services. It helps in identifying latency issues and pinpointing the service or component causing performance bottlenecks. Tracing assigns unique identifiers to each request, allowing developers to follow the path of a request through various services and understand the interaction and dependencies between these services.

  3. Monitoring: This involves the continuous observation of a system's operational health and performance. Monitoring tools collect various metrics like CPU usage, memory consumption, network latency, and application response times to provide real-time insights into the system's state. It’s focused on tracking performance, availability, and overall reliability, often providing dashboards and alerts for immediate visibility into system health. Monitoring is proactive, aiming to detect and alert on issues before they impact users or business operations.

In essence, logging is about recording what has happened, distributed tracing is about understanding the journey of a request through a system, and monitoring is about keeping an eye on the system’s health and performance in real-time. Each plays a crucial role in maintaining the performance, reliability, and availability of software applications, especially in complex, distributed architectures.

PIPEFORCE offers different turnkey tools to log, trace and monitor your business solutions in order to make troubleshooting and optimization tasks as smooth as possible.

Logging

Logging helps to track errors and related data in a centralized way. Applications typically write logs to the console or into log files. In PIPEFORCE managed microservices, any log output written to the standard output (= console) will be automatically collected and sent to a central logs database, which can be filtered and searched afterwards.

In pipelines, you can use the log command in order to create and send such log entries. Here is an example of how to use it:

pipeline:
  - log:
      message: "This is a log message!"
      severity: INFO

Afterwards, you can list the log entries using the command log.list or the log viewer of the web portal.
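
For example, a minimal pipeline that returns the collected log entries could look like this. This is a sketch only: log.list is the command named above, but possible filter parameters are omitted here; consult the command reference of your PIPEFORCE version for details:

pipeline:
  - log.list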

Log only the information that is necessary and as precisely as possible. A rule of thumb is this:
If it is not important to the admin, do not log!
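
Applied to the log command from above, this rule could look as follows. Note that the severity value ERROR is an assumption based on common log severity levels; INFO is the value shown in this documentation:

pipeline:
  # Relevant for the admin, so log it with an appropriate severity:
  - log:
      message: "Could not reach the ERP endpoint after 3 retries!"
      severity: ERROR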

Forwarding logs to Elasticsearch

PIPEFORCE comes with an "out-of-the-box" integration into Elasticsearch. There is no need to install and maintain agents or tools like Logstash or Filebeat in order to monitor services managed by PIPEFORCE. However, you can do so if your requirements call for it.

Once set up correctly, any microservice managed by PIPEFORCE is recurrently scanned for new logs. This data is then published to a log message queue. Finally, this queue is consumed by a pipeline which uploads the logs to Elasticsearch. The pipeline can be customized to fit your needs.
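
Conceptually, such a shovel pipeline consumes the log queue and posts each entry to the Elasticsearch document indexing endpoint. The sketch below only illustrates this idea; the command names message.receive and http.post, their parameters, and the queue and URL values are assumptions for illustration, not the actual implementation. See the shovel-logs pipeline shipped with the app (referenced in the setup below) for the real code:

pipeline:
  # Consume the next log entry from the central log queue
  # (command name and queue name are assumptions):
  - message.receive:
      queue: "pipeforce.logs"
  # Upload the consumed entry to Elasticsearch for indexing
  # (command name and URL are assumptions):
  - http.post:
      url: "https://YOUR-DEPLOYMENT.es.cloud.es.io/logs/_doc"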

Furthermore, any process and business messages can also be forwarded to Elasticsearch this way in order to build powerful dashboards and to perform extensive analyses, including machine learning approaches.

Prerequisites

To get started, you must meet these requirements:

  1. You have an Elasticsearch server up and running and you are able to access its API endpoint. Setting this up is out of the scope of this documentation. In case you do not want to maintain the Elastic stack yourself, the easiest way to start from scratch is to sign up for an Elastic Cloud account.

  2. You have the credentials and permission to access the Elastic endpoints.

Setup

  1. Open your PIPEFORCE Web Portal and go to Marketplace. Search for the app app-elastic-integration there and click Install.

  2. Create a new Secret of format bearer with the name elastic-token, and copy and paste the bearer token of your Elastic documents API endpoint into the secret field (see your Elastic documentation for details on where to get this token). Click ADD.

  3. Copy the URL of your document indexing API endpoint from your Elastic installation (see the example URL after this list). Go to Workbench, open global/app/elastic-integration/pipeline/shovel-logs and paste the URL there. Click SAVE.

  4. Go to "Installed Apps" -> "Admin Settings" -> "Global Settings" and make sure "Shovel Logs to Queue" is enabled.
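
For orientation: the document indexing endpoint of Elasticsearch follows the pattern <host>/<index>/_doc, so the URL to paste in step 3 typically looks similar to the line below. The host and index names are placeholders; your actual values depend on your Elastic installation:

https://my-deployment.es.europe-west3.gcp.cloud.es.io/pipeforce-logs/_doc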

Done. From now on, all of your microservices and pipelines in PIPEFORCE will send their logs automatically to your Elasticsearch server for indexing and further processing.

Distributed Tracing

Since 9.0

PIPEFORCE comes with distributed tracing integrated out-of-the-box. That means any process flow inside PIPEFORCE can be followed across:

  • any Microservice

  • any Workflow Task

  • any API Gateway Endpoint

  • any Webhook

  • any Pipeline and Command

For this, the W3C Trace Context standard is implemented using OpenTelemetry.
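
Concretely, W3C Trace Context propagates the trace context via the traceparent HTTP header, which has the format version-traceid-parentid-traceflags. The example values below are taken from the W3C specification:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

A component participating in a trace reads this header from the incoming request, reuses the trace ID and sets its own span as the new parent ID on all outgoing calls.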

For Workflow Tasks, API Gateway Endpoints, Webhooks, Pipelines and Commands, this is done automatically for you. This means you can trace any process flow across these components and view reports for them without any further setup steps.

For microservices, you have to make sure to handle the given trace ID in the correct way, depending on the programming language and framework you are using inside your microservice. For details on how to do this, see the W3C Trace Context documentation.

In case external services also support this standard and can handle a given trace ID, you can trace process flows across external systems as well.

Internally, collecting and visualizing the traces is done by default using Jaeger. The Jaeger web UI displays each tracing report as a timeline of spans across the participating services.

Note: This is turned on by default in enterprise and corporate plans using a basic setup: up to 10k tracings are kept in storage, and this storage is cleaned up on a regular basis. In case you need to trace more information or over a longer time period, or to shovel the trace entries to external systems, contact our support team, since this highly depends on your specific requirements.

Monitoring

Monitoring refers to the continuous process of tracking and analyzing PIPEFORCE’s performance and availability metrics. This involves regularly measuring vital aspects like CPU usage, memory utilization, network performance, and application response times. By keeping an eye on these indicators, you can ensure that the system is operating efficiently and is readily accessible to users.

Performance monitoring helps identify any bottlenecks or issues that may affect user experience.

Availability monitoring checks that all critical components of the system, such as servers and services, are operational and reachable.

In case you're running PIPEFORCE as a subscription, basic performance and availability monitoring is done by our operations team. If you also need such insights on your side, please contact our support in order to discuss your specific requirements. On request, PIPEFORCE can be extended in multiple directions here as well.