# Setup OpenTelemetry
Backstage uses OpenTelemetry to instrument its components by reporting traces and metrics.
This tutorial shows how to set up exporters in your Backstage backend package. For demonstration purposes we will use a Prometheus exporter, but you can adjust the solution to use another one that suits your needs; see for example the article on OTLP exporters. This tutorial also exports traces using the OTLP JSON/HTTP exporter with Jaeger as the target, but this too can be adjusted to fit your needs by consulting the supported tooling in the OTLP exporters documentation.
## Install dependencies
We will use the OpenTelemetry Node SDK and the `auto-instrumentations-node` packages.

Backstage packages, such as the catalog, use the OpenTelemetry API to send custom traces and metrics. The `auto-instrumentations-node` package will automatically create spans for code called in libraries like Express.
```sh
yarn --cwd packages/backend add \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-prometheus \
  @opentelemetry/exporter-trace-otlp-http
```
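As a sketch of how Backstage code uses the OpenTelemetry API, here is an example with hypothetical meter, counter, and span names (the real Backstage packages register their own instruments); the API calls become no-ops until an SDK is registered, which we do below:

```js
const { metrics, trace } = require('@opentelemetry/api');

// Custom metric: instruments are looked up through the global API.
const meter = metrics.getMeter('example-plugin');
const requestCounter = meter.createCounter('example.requests.count', {
  description: 'Number of requests handled by the example plugin',
});
requestCounter.add(1, { endpoint: '/health' });

// Custom trace: spans created the same way are picked up once the SDK runs.
const tracer = trace.getTracer('example-plugin');
tracer.startActiveSpan('example-operation', span => {
  // ... do work ...
  span.end();
});
```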
## Configure
In your `packages/backend/src` folder, create an `instrumentation.js` file.
```js
// Prevent the SDK from being initialized more than once (due to worker threads)
const { isMainThread } = require('node:worker_threads');

if (isMainThread) {
  const { NodeSDK } = require('@opentelemetry/sdk-node');
  const {
    getNodeAutoInstrumentations,
  } = require('@opentelemetry/auto-instrumentations-node');
  const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
  const {
    OTLPTraceExporter,
  } = require('@opentelemetry/exporter-trace-otlp-http');

  // By default exports the metrics on localhost:9464/metrics
  const prometheusExporter = new PrometheusExporter();

  // We post the traces to localhost:4318/v1/traces
  const otlpTraceExporter = new OTLPTraceExporter({
    // Default Jaeger OTLP/HTTP trace endpoint.
    url: 'http://localhost:4318/v1/traces',
  });

  const sdk = new NodeSDK({
    metricReader: prometheusExporter,
    traceExporter: otlpTraceExporter,
    instrumentations: [getNodeAutoInstrumentations()],
  });

  sdk.start();
}
```
You probably won't need all of the instrumentation inside `getNodeAutoInstrumentations()`, so make sure to check the documentation and tweak it accordingly.
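For example, here is a sketch of disabling some of the bundled instrumentations; the scope names shown are instrumentation names from `auto-instrumentations-node`, but verify them against the version you install:

```js
const {
  getNodeAutoInstrumentations,
} = require('@opentelemetry/auto-instrumentations-node');

// Each instrumentation can be toggled or configured by its scope name.
const instrumentations = getNodeAutoInstrumentations({
  // The fs instrumentation tends to be very noisy.
  '@opentelemetry/instrumentation-fs': { enabled: false },
  // DNS lookups are rarely interesting for Backstage traces.
  '@opentelemetry/instrumentation-dns': { enabled: false },
});
```

Pass the resulting array to the `instrumentations` option of the `NodeSDK` as shown above.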
## Views
The default histogram buckets for OpenTelemetry are in milliseconds, but the histograms created for Catalog processing emit metrics in seconds. You might want to adjust the buckets to fit your needs. To do this you can use the Views feature like this:
```js
const {
  View,
  ExplicitBucketHistogramAggregation,
} = require('@opentelemetry/sdk-metrics');

const prometheus = new PrometheusExporter();
const sdk = new NodeSDK({
  metricReader: prometheus,
  views: [
    new View({
      // Matches every catalog instrument.
      instrumentName: 'catalog*',
      aggregation: new ExplicitBucketHistogramAggregation([
        0.01, 0.1, 0.5, 1, 5, 10, 25, 50, 100, 500, 1000,
      ]),
    }),
  ],
});
```
The above will make all the matching histograms use the same buckets. If you would like to take a more targeted approach, you can name a single instrument instead:
```js
const {
  View,
  ExplicitBucketHistogramAggregation,
} = require('@opentelemetry/sdk-metrics');

const prometheus = new PrometheusExporter();
const sdk = new NodeSDK({
  metricReader: prometheus,
  views: [
    new View({
      // Target a single instrument by its full name.
      instrumentName: 'catalog.processing.duration',
      aggregation: new ExplicitBucketHistogramAggregation([
        0, 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30, 60, 120, 300, 1000,
      ]),
    }),
  ],
});
```
## Local Development Setup
It's important to set up the NodeSDK and the automatic instrumentation before importing any library. This is why we will use the Node.js `--require` flag when we start up the application.

For local development, you can add the required flag in your `packages/backend/package.json`:
```json
"scripts": {
  "start": "backstage-cli package start --require ./src/instrumentation.js",
  ...
```
You can now start your Backstage instance as usual with `yarn start`, and you'll be able to see your metrics at http://localhost:9464/metrics.
## Troubleshooting
If you are having issues getting metrics or traces working, OpenTelemetry provides some helpful diagnostic tools you can use.
First we need to add the `@opentelemetry/api` package:
```sh
yarn --cwd packages/backend add @opentelemetry/api
```
Then we want to add the following snippet before the `sdk.start()` call:
```js
const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
```
This will add OpenTelemetry debug logs that you can inspect to get a better idea of why something may not be working as expected. We don't recommend shipping this to production because of the log volume.
## Production Setup
In your `.dockerignore`, add this line:
```
!packages/backend/src/instrumentation.js
```
This ensures that the Docker build will not ignore the instrumentation file if you are following the recommended `.dockerignore` setup.
In your `Dockerfile`, copy the `instrumentation.js` file into the root of the working directory:
```dockerfile
COPY packages/backend/src/instrumentation.js ./
```
And then add the `--require` flag that points to the file to the `CMD` array:
```diff
- CMD ["node", "packages/backend", "--config", "app-config.yaml"]
+ CMD ["node", "--require", "./instrumentation.js", "packages/backend", "--config", "app-config.yaml"]
```
If you need to disable or configure some OpenTelemetry features, there are many environment variables you can tweak.
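For example, here is a sketch using standard SDK environment variables; these names come from the OpenTelemetry specification, and the full list is documented there:

```sh
# Set the service name reported with traces and metrics.
export OTEL_SERVICE_NAME=backstage
# Disable trace export entirely.
export OTEL_TRACES_EXPORTER=none
# Disable the whole SDK without removing the --require flag.
export OTEL_SDK_DISABLED=true
```

These can be set in your container environment, so behavior can be changed without rebuilding the image.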
## Available Metrics
The following metrics are available:
- `catalog_entities_count`: Total amount of entities in the catalog
- `catalog_registered_locations_count`: Total amount of registered locations in the catalog
- `catalog_relations_count`: Total amount of relations between entities
- `catalog.processed.entities.count`: Amount of entities processed
- `catalog.processing.duration`: Time spent executing the full processing flow
- `catalog.processors.duration`: Time spent executing catalog processors
- `catalog.processing.queue.delay`: The amount of delay between being scheduled for processing, and the start of actually being processed
- `catalog.stitched.entities.count`: Amount of entities stitched
- `catalog.stitching.duration`: Time spent executing the full stitching flow
- `catalog.stitching.queue.length`: Number of entities currently in the stitching queue
- `catalog.stitching.queue.delay`: The amount of delay between being scheduled for stitching, and the start of actually being stitched
- `scaffolder.task.count`: Count of task runs
- `scaffolder.task.duration`: Duration of a task run
- `scaffolder.step.count`: Count of step runs
- `scaffolder.step.duration`: Duration of a step run
- `backend_tasks.task.runs.count`: Total number of times a task has been run
- `backend_tasks.task.runs.duration`: Histogram of task run durations