The GitHub integration has a special discovery processor for discovering catalog entities within a GitHub organization. The processor will crawl the GitHub organization and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog.
To use the discovery processor, you'll need a GitHub integration set up with a GITHUB_TOKEN. Then you can add a location target to the catalog configuration:
    catalog:
      locations:
        - type: github-discovery
          target: https://github.com/myorg/service-*/blob/main/catalog-info.yaml
Note the github-discovery type, as this is not a regular url processor.
The target is composed of three parts:
- The base organization URL, https://github.com/myorg in this case.
- The repository name pattern to scan, which accepts * wildcard tokens. This can simply be * to scan all repositories in the organization. This example only looks for repositories prefixed with service-.
- The path within each repository to find the catalog YAML file. This will usually be /blob/master/catalog-info.yaml or a similar variation for catalog files stored in the root directory of each repository.
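
To make the three-part structure concrete, here is a minimal sketch that splits a target into its parts and checks repository names against the wildcard pattern. The names DiscoveryTarget, parseDiscoveryTarget, and repoMatches are hypothetical and are not part of Backstage's API; the actual processor may parse and match targets differently.

```typescript
// Illustrative only: these helpers are assumptions, not Backstage code.
interface DiscoveryTarget {
  orgUrl: string; // base organization URL, e.g. https://github.com/myorg
  repoPattern: string; // repository name pattern, e.g. service-*
  catalogPath: string; // path inside each repository
}

function parseDiscoveryTarget(target: string): DiscoveryTarget {
  // <scheme>://<host>/<org> / <repo pattern> / <rest of the path>
  const match = /^(https?:\/\/[^/]+\/[^/]+)\/([^/]+)(\/.+)$/.exec(target);
  if (match === null) {
    throw new Error(`Not a valid discovery target: ${target}`);
  }
  const [, orgUrl = "", repoPattern = "", catalogPath = ""] = match;
  return { orgUrl, repoPattern, catalogPath };
}

function repoMatches(pattern: string, repoName: string): boolean {
  // Escape regex metacharacters in the literal parts, then let each
  // * wildcard match any run of characters.
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`).test(repoName);
}

const parsed = parseDiscoveryTarget(
  "https://github.com/myorg/service-*/blob/main/catalog-info.yaml",
);
console.log(parsed);
console.log(repoMatches(parsed.repoPattern, "service-billing"));
```

Under these assumptions, a repository named service-billing matches the example pattern, while one named website does not; a pattern of * matches every repository name.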
GitHub API Rate Limits
GitHub rate limits API requests to 5,000 per hour (or more for Enterprise accounts). The default Backstage catalog backend refreshes data every 100 seconds, which issues an API request for each discovered location.
This means if you have more than ~140 catalog entities, you may get throttled by rate limiting. This will soon be resolved once catalog refreshes make use of ETags; to work around this in the meantime, you can change the refresh rate of the catalog in your packages/backend/src/plugins/catalog.ts file, or configure Backstage to use the github-apps plugin.
This applies to any method of adding GitHub entities to the catalog, but the limit is especially easy to hit with automatic discovery.
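
The ~140 figure follows from simple arithmetic: a 100-second refresh interval means 36 refreshes per hour, and 5,000 / 36 is roughly 139. The sketch below works this out and also estimates a refresh interval that would stay within the limit for a given number of locations. The function names are hypothetical, and the calculation assumes exactly one API request per location per refresh, as described above.

```typescript
// Illustrative arithmetic only; not part of Backstage.
const RATE_LIMIT_PER_HOUR = 5000; // GitHub's standard hourly limit
const DEFAULT_REFRESH_SECONDS = 100; // default catalog refresh interval
const SECONDS_PER_HOUR = 3600;

// How many locations can be refreshed without exceeding the hourly limit?
function maxLocationsWithinLimit(
  limitPerHour: number,
  refreshSeconds: number,
): number {
  const refreshesPerHour = SECONDS_PER_HOUR / refreshSeconds;
  return Math.floor(limitPerHour / refreshesPerHour);
}

// What is the shortest refresh interval (in whole seconds) that keeps
// the given number of locations within the hourly limit?
function minRefreshSeconds(limitPerHour: number, locations: number): number {
  return Math.ceil((locations * SECONDS_PER_HOUR) / limitPerHour);
}

console.log(
  maxLocationsWithinLimit(RATE_LIMIT_PER_HOUR, DEFAULT_REFRESH_SECONDS),
); // 138
console.log(minRefreshSeconds(RATE_LIMIT_PER_HOUR, 500)); // 360
```

For example, under these assumptions a catalog with 500 discovered locations would need a refresh interval of at least 360 seconds to stay under 5,000 requests per hour.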