AWS S3 Discovery
This documentation is written for the old backend which has been replaced by the new backend system, being the default since Backstage version 1.24. If have migrated to the new backend system, you may want to read its own article instead. Otherwise, consider migrating!
The AWS S3 integration has a special entity provider for discovering catalog entities located in an S3 Bucket. If you have a bucket that contains multiple catalog files, and you want to automatically discover them, you can use this provider. The provider will crawl your S3 bucket and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog.
To use the entity provider, you'll need an AWS S3 integration
set up with accessKeyId
and secretAccessKey
, and/or
a roleArn
or none of these (e.g., profile- or instance-based credentials).
At production deployments, you likely manage these with the permissions attached to your instance.
In your configuration, you add a provider config per bucket:
# app-config.yaml
catalog:
providers:
awsS3:
yourProviderId: # identifies your dataset / provider independent of config changes
bucketName: sample-bucket
prefix: prefix/ # optional
region: us-east-2 # optional, uses the default region otherwise
schedule: # same options as in SchedulerServiceTaskScheduleDefinition
# supports cron, ISO duration, "human duration" as used in code
frequency: { minutes: 30 }
# supports ISO duration, "human duration" as used in code
timeout: { minutes: 3 }
For simple setups, you can omit the provider ID at the config
which has the same effect as using default
for it.
# app-config.yaml
catalog:
providers:
awsS3:
# uses "default" as provider ID
bucketName: sample-bucket
prefix: prefix/ # optional
region: us-east-2 # optional, uses the default region otherwise
schedule: # same options as in SchedulerServiceTaskScheduleDefinition
# supports cron, ISO duration, "human duration" as used in code
frequency: { minutes: 30 }
# supports ISO duration, "human duration" as used in code
timeout: { minutes: 3 }
As this provider is not one of the default providers, you will first need to install the AWS catalog plugin:
yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-aws
Once you've done that, you'll also need to add the segment below to packages/backend/src/plugins/catalog.ts
:
/* packages/backend/src/plugins/catalog.ts */
import { AwsS3EntityProvider } from '@backstage/plugin-catalog-backend-module-aws';
const builder = await CatalogBuilder.create(env);
/** ... other processors and/or providers ... */
builder.addEntityProvider(
AwsS3EntityProvider.fromConfig(env.config, {
logger: env.logger,
scheduler: env.scheduler,
}),
);