The GitHub integration has a special discovery processor for discovering catalog entities within a GitHub organization. The processor will crawl the GitHub organization and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog.
To use the discovery processor, you'll need a GitHub integration set up with a `GITHUB_TOKEN`. Then you can add a location target to the catalog configuration:
```yaml
catalog:
  locations:
    # (since 0.13.5) Scan all repositories for a catalog-info.yaml in the root of the default branch
    - type: github-discovery
      target: https://github.com/myorg
    # Or use a custom pattern for a subset of all repositories, using the default branch
    - type: github-discovery
      target: https://github.com/myorg/service-*/blob/-/catalog-info.yaml
    # Or use a custom file format and location
    - type: github-discovery
      target: https://github.com/myorg/*/blob/-/docs/your-own-format.yaml
    # Or use a specific branch name
    - type: github-discovery
      target: https://github.com/myorg/*/blob/backstage-docs/catalog-info.yaml
```
Note the `github-discovery` type: this is not a regular location, but one handled by the discovery processor.
When using a custom pattern, the target is composed of three parts:
- The base organization URL, `https://github.com/myorg` in this case
- The repository blob to scan, which accepts `*` wildcard tokens. This can simply be `*` to scan all repositories in the organization. The `service-*` pattern in the example above only matches repositories whose names start with `service-`.
- The path within each repository to find the catalog YAML file. This will usually be `/blob/master/catalog-info.yaml` or a similar variation for catalog files stored in the root directory of each repository. You can also use a dash (`-`) to refer to the default branch, as in `/blob/-/catalog-info.yaml`.
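To make the three-part structure concrete, here is a minimal TypeScript sketch that splits a target into organization, repository pattern, branch, and file path, and checks repository names against a `*` wildcard. `parseDiscoveryTarget`, `matchesRepoPattern`, and `DiscoveryTarget` are hypothetical names for illustration, not part of Backstage, and the real processor may parse targets differently. The defaults for an organization-only target mirror the first example in the configuration above; rejecting any other partial form is an assumption of this sketch.

```typescript
// Hypothetical sketch (not Backstage's implementation): split a
// github-discovery target into its parts and match repository names.

interface DiscoveryTarget {
  org: string; // e.g. "myorg"
  repoPattern: string; // e.g. "service-*"
  branch: string; // "-" stands for the repository's default branch
  catalogPath: string; // file path inside the repository
}

function parseDiscoveryTarget(target: string): DiscoveryTarget {
  // Drop the scheme and host, keeping only the URL path.
  const match = /^https?:\/\/[^\/]+\/(.*)$/.exec(target);
  if (!match) {
    throw new Error(`Not a valid target URL: ${target}`);
  }
  const segments = (match[1] ?? "").split("/").filter((s) => s.length > 0);
  const [org, repoPattern, blob, branch, ...pathParts] = segments;
  if (!org) {
    throw new Error(`Missing organization: ${target}`);
  }

  if (segments.length === 1) {
    // Organization only: scan every repository for catalog-info.yaml
    // in the root of its default branch.
    return { org, repoPattern: "*", branch: "-", catalogPath: "catalog-info.yaml" };
  }

  // Otherwise expect <org>/<repo-pattern>/blob/<branch>/<path...>.
  if (segments.length < 5 || blob !== "blob" || !repoPattern || !branch) {
    throw new Error(`Expected <org>/<repo-pattern>/blob/<branch>/<path>: ${target}`);
  }
  return { org, repoPattern, branch, catalogPath: pathParts.join("/") };
}

// Translate "*" wildcards into ".*" and escape all other regex characters.
function matchesRepoPattern(repoName: string, pattern: string): boolean {
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`).test(repoName);
}

const target = parseDiscoveryTarget(
  "https://github.com/myorg/service-*/blob/-/catalog-info.yaml",
);
const repos = ["service-payments", "service-users", "web-frontend"];
// Keeps "service-payments" and "service-users".
console.log(repos.filter((name) => matchesRepoPattern(name, target.repoPattern)));
```

Escaping every character except `*` means a pattern such as `service.api` matches only the literal name, which keeps the wildcard as the single source of fuzziness.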
GitHub API Rate Limits
GitHub rate limits API requests to 5,000 per hour (or more for Enterprise accounts). The default Backstage catalog backend refreshes data every 100 seconds, and each refresh issues one API request per discovered location.
This means that with more than about 140 catalog entities (36 refreshes per hour × 140 ≈ 5,000 requests), you may be throttled by rate limiting. This will be resolved once catalog refreshes make use of ETags; in the meantime, you can work around it by lowering the catalog's refresh rate where the `CatalogBuilder` is created in your backend:
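```typescript
import { CatalogBuilder } from '@backstage/plugin-catalog-backend';

// Inside your catalog plugin setup, where `env` is the backend plugin environment:
const builder = await CatalogBuilder.create(env);
// For example, to refresh every 5 minutes (300 seconds).
builder.setRefreshIntervalSeconds(300);
```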
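To estimate the limit for a different refresh interval, the budget can be computed directly. This is a minimal sketch that assumes exactly one API request per discovered location per refresh and the default 5,000 requests/hour limit, ignoring any other GitHub API traffic; `maxLocations` and `minRefreshIntervalSeconds` are hypothetical helpers, not Backstage APIs.

```typescript
// Hypothetical helpers (not Backstage APIs). Assumes one GitHub API request
// per discovered location per refresh cycle and no other API traffic.
const DEFAULT_REQUESTS_PER_HOUR = 5000;
const SECONDS_PER_HOUR = 3600;

// Largest number of locations that stays within the hourly limit.
function maxLocations(
  refreshIntervalSeconds: number,
  requestsPerHour: number = DEFAULT_REQUESTS_PER_HOUR,
): number {
  const refreshesPerHour = SECONDS_PER_HOUR / refreshIntervalSeconds;
  return Math.floor(requestsPerHour / refreshesPerHour);
}

// Shortest refresh interval (whole seconds) that keeps `locations`
// within the hourly limit.
function minRefreshIntervalSeconds(
  locations: number,
  requestsPerHour: number = DEFAULT_REQUESTS_PER_HOUR,
): number {
  return Math.ceil((locations * SECONDS_PER_HOUR) / requestsPerHour);
}

console.log(maxLocations(100)); // 138 (36 refreshes per hour)
console.log(maxLocations(300)); // 416 (12 refreshes per hour)
console.log(minRefreshIntervalSeconds(500)); // 360
```

Rounding down in `maxLocations` and up in `minRefreshIntervalSeconds` keeps both answers on the safe side of the limit.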
Alternatively, or additionally, you can configure GitHub Apps authentication, which carries a much higher rate limit at GitHub.
The rate limit applies to any method of adding GitHub entities to the catalog, but it is especially easy to hit with automatic discovery.