GitHub Organizational Data

Note: This documentation is written for the old backend system, which has been replaced by the new backend system (the default since Backstage version 1.24). If you have migrated to the new backend system, you may want to read its own article instead. Otherwise, consider migrating!

The Backstage catalog can be set up to ingest organizational data - users and teams - directly from an organization in GitHub or GitHub Enterprise. The result is a hierarchy of User and Group kind entities that mirror your org setup.

Note: This adds User and Group entities to the catalog, but does not provide authentication. See the GitHub auth provider for that.

Installation without Events Support

This guide will use the Entity Provider method. If you for some reason prefer the Processor method (not recommended), it is described separately below.

The provider is not installed by default, so you have to add a dependency on @backstage/plugin-catalog-backend-module-github to your backend package.

# From your Backstage root directory
yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github

Note: When you use a Provider instead of a Processor, you do not need to add a location pointing to your GitHub server/organization.

Update the catalog plugin initialization in your backend to add the provider and schedule it:

packages/backend/src/plugins/catalog.ts
import { CatalogBuilder } from '@backstage/plugin-catalog-backend';
import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);

  // The org URL below needs to match a configured integrations.github entry
  // specified in your app-config.
  builder.addEntityProvider(
    GithubOrgEntityProvider.fromConfig(env.config, {
      id: 'production',
      orgUrl: 'https://github.com/backstage',
      logger: env.logger,
      // Re-read the organization every 60 minutes; abort a run after 15 minutes.
      schedule: env.scheduler.createScheduledTaskRunner({
        frequency: { minutes: 60 },
        timeout: { minutes: 15 },
      }),
    }),
  );

  // ..
}

Alternatively, if you wish to ingest data from multiple GitHub organizations you can use the GithubMultiOrgEntityProvider instead. Note that by default, this provider will namespace groups according to the org they originate from to avoid potential name duplicates:

packages/backend/src/plugins/catalog.ts
import { CatalogBuilder } from '@backstage/plugin-catalog-backend';
import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);

  // The GitHub URL below needs to match a configured integrations.github entry
  // specified in your app-config.
  builder.addEntityProvider(
    GithubMultiOrgEntityProvider.fromConfig(env.config, {
      id: 'production',
      githubUrl: 'https://github.com',
      // Set the following to list the GitHub orgs you wish to ingest from. You can
      // also omit this option to ingest all orgs accessible by your GitHub integration
      orgs: ['org-a', 'org-b'],
      logger: env.logger,
      schedule: env.scheduler.createScheduledTaskRunner({
        frequency: { minutes: 60 },
        timeout: { minutes: 15 },
      }),
    }),
  );

  // ..
}
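
To illustrate why the namespacing matters, consider two orgs that each have a team called `platform`. The following is only a sketch, not the provider's code: `groupRef` is a hypothetical helper, and it assumes that the org login is used as the entity namespace and that entities without one fall into the `default` namespace.

```typescript
// Hypothetical helper, not part of Backstage: builds a Group entity
// reference in the `kind:namespace/name` form.
function groupRef(team: string, org?: string): string {
  // Assumption: the org login becomes the namespace; without an org the
  // entity lands in the `default` namespace.
  const namespace = (org ?? 'default').toLowerCase();
  return `group:${namespace}/${team.toLowerCase()}`;
}

// Without namespacing, same-named teams from different orgs collide.
const collided = groupRef('platform') === groupRef('platform');

// With namespacing, each org keeps its own `platform` group.
const refA = groupRef('platform', 'org-a');
const refB = groupRef('platform', 'org-b');
```

With the namespace in place, references such as ownership or membership relations have to include it as well, which is worth keeping in mind when migrating from the single-org provider.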

Installation with Events Support

For the legacy backend system, please read the subsection below.

The catalog module github-org comes with events support enabled for the GithubMultiOrgEntityProvider. The provider subscribes to its relevant topics and expects these events to be published via the EventsService.

Topics:

  • github.installation
  • github.membership
  • github.organization
  • github.team

Additionally, you should install the event router provided by events-backend-module-github, which routes received events from the generic topic github to more specific ones based on the event type (e.g., github.membership).
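
The routing idea can be sketched roughly as follows. This is not the module's implementation: `IncomingEvent` and `routeGithubEvent` are hypothetical names, and the sketch assumes that the event type is read from GitHub's `X-GitHub-Event` webhook header, carried along as lower-cased metadata.

```typescript
// Minimal stand-in for an event published on the generic `github` topic.
interface IncomingEvent {
  topic: string;
  payload: unknown;
  // Assumption: the webhook's HTTP headers travel with the event as metadata.
  metadata?: Record<string, string | undefined>;
}

// Hypothetical router: derives the more specific topic from the event type,
// or returns undefined when the event cannot be routed.
function routeGithubEvent(event: IncomingEvent): IncomingEvent | undefined {
  const eventType = event.metadata?.['x-github-event'];
  if (event.topic !== 'github' || !eventType) {
    return undefined;
  }
  // e.g. `github` + `membership` becomes `github.membership`
  return { ...event, topic: `github.${eventType}` };
}
```

A provider subscribed to `github.membership` would then only see membership events, rather than every delivery that arrives on the generic topic.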

To receive webhook events from GitHub, you have to decide how you want them to be ingested into Backstage and published to its EventsService. You can decide between the following options (extensible):

Legacy Backend System

Please follow the installation instructions at

Additionally, you need to decide how you want to receive events from external sources like

Set up your provider

packages/backend/src/plugins/catalog.ts
import { CatalogBuilder } from '@backstage/plugin-catalog-backend';
import { GithubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);
  builder.addProcessor(new ScaffolderEntitiesProcessor());
  const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, {
    id: 'production',
    orgUrl: 'https://github.com/backstage',
    logger: env.logger,
    events: env.events,
    schedule: env.scheduler.createScheduledTaskRunner({
      frequency: { minutes: 60 },
      timeout: { minutes: 15 },
    }),
  });
  builder.addEntityProvider(githubOrgProvider);
  const { processingEngine, router } = await builder.build();
  await processingEngine.start();
  return router;
}

Or, alternatively, if using the GithubMultiOrgEntityProvider:

packages/backend/src/plugins/catalog.ts
import { CatalogBuilder } from '@backstage/plugin-catalog-backend';
import { GithubMultiOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);

  // The GitHub URL below needs to match a configured integrations.github entry
  // specified in your app-config.
  builder.addEntityProvider(
    GithubMultiOrgEntityProvider.fromConfig(env.config, {
      id: 'production',
      githubUrl: 'https://github.com',
      // Set the following to list the GitHub orgs you wish to ingest from. You can
      // also omit this option to ingest all orgs accessible by your GitHub integration
      orgs: ['org-a', 'org-b'],
      logger: env.logger,
      events: env.events,
      schedule: env.scheduler.createScheduledTaskRunner({
        frequency: { minutes: 60 },
        timeout: { minutes: 15 },
      }),
    }),
  );

  // ..
}

You can check the official docs to configure your webhook and to secure your request. The webhook needs to be configured to forward organization, team and membership events.
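
As background on what securing the request involves: when a webhook secret is set, GitHub signs each delivery and sends the result in the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the raw body. The sketch below shows that check using Node's `crypto` module. The function names are hypothetical, and this is an illustration of the scheme rather than how Backstage itself performs the validation.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Computes the signature GitHub would send for a given secret and raw body.
function signPayload(secret: string, rawBody: string): string {
  const digest = createHmac('sha256', secret).update(rawBody).digest('hex');
  return `sha256=${digest}`;
}

// Returns true only when the received header matches the expected signature.
function isValidSignature(
  secret: string,
  rawBody: string,
  header: string | undefined,
): boolean {
  if (!header) {
    return false;
  }
  const expected = Buffer.from(signPayload(secret, rawBody));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expected.length === received.length && timingSafeEqual(expected, received)
  );
}
```

The comparison must run against the raw request body, before any JSON parsing, since re-serializing the payload can change its bytes and invalidate the signature.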

Configuration

As mentioned above, you must also have some configuration in your app-config that describes the targets that you want to import. This lets the entity provider know what authorization to use, and what the API endpoints are. You may already have such an entry:

integrations:
  github:
    # example for public github
    - host: github.com
      token: ${GITHUB_TOKEN}
    # example for a private GitHub Enterprise instance
    - host: ghe.example.net
      apiBaseUrl: https://ghe.example.net/api/v3
      token: ${GHE_TOKEN}

These examples use ${} placeholders to reference environment variables. This is often suitable for production setups, but also means that you will have to supply those variables to the backend as it starts up. If you want, for local development in particular, you can experiment first by putting the actual tokens in a mirrored config directly in your app-config.local.yaml as well.

If Backstage is configured to use GitHub Apps authentication you must grant Read-Only access for Members under Organization in order to ingest users correctly. You can modify the app's permissions under the organization settings, https://github.com/organizations/{ORG}/settings/apps/{APP_NAME}/permissions.

Please note that when you change permissions, the app owner receives an email and must approve the request before the changes are applied.

Custom Transformers

You can inject your own transformation logic to help map GitHub API responses into Backstage entities. You can do this on the user and team requests to enable further processing or updates to the entities.

To enable this, you pass a function into the GithubOrgEntityProvider. You can pass a UserTransformer, a TeamTransformer, or both. The function is invoked for each item (user or team) that is returned from the API. You can either return an Entity (User or Group), or undefined if you do not want to import that item.

There is also a defaultUserTransformer and defaultOrganizationTeamTransformer. You could use these and simply decorate the response from the default transformation if you only need to change a few properties.
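
The decorate-or-skip pattern might look like the sketch below. To keep it self-contained, it uses local stand-in types (`TeamLike`, `GroupEntityLike`) and a simplified stand-in for the default transformation instead of the real exports. The `archived-` naming convention and the `example.com/source` annotation are invented for illustration. Real transformers also receive a context as a second argument, which is omitted here.

```typescript
// Local stand-ins for the real types, trimmed to what this sketch needs.
interface TeamLike {
  slug: string;
  description?: string;
}
interface GroupEntityLike {
  kind: 'Group';
  metadata: {
    name: string;
    description?: string;
    annotations?: Record<string, string>;
  };
  spec: { type: string; children: string[] };
}

// Simplified stand-in for the default team transformation (hypothetical).
async function defaultTeamTransformerLike(
  team: TeamLike,
): Promise<GroupEntityLike | undefined> {
  return {
    kind: 'Group',
    metadata: { name: team.slug, description: team.description },
    spec: { type: 'team', children: [] },
  };
}

// Decides whether a team should be imported at all.
function shouldImport(team: TeamLike): boolean {
  return !team.slug.startsWith('archived-');
}

// Changes only a few properties and leaves the rest of the entity untouched.
function decorate(entity: GroupEntityLike): GroupEntityLike {
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      annotations: {
        ...entity.metadata.annotations,
        'example.com/source': 'github',
      },
    },
  };
}

// Custom transformer: skip unwanted teams, otherwise decorate the default.
async function teamTransformer(
  team: TeamLike,
): Promise<GroupEntityLike | undefined> {
  if (!shouldImport(team)) {
    return undefined; // returning undefined skips the item
  }
  const entity = await defaultTeamTransformerLike(team);
  return entity ? decorate(entity) : undefined;
}
```

Keeping the filter and the decoration as small pure functions makes them easy to unit test without calling the GitHub API.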

Resolving GitHub users via organization email

When you authenticate users, you should resolve them to an entity within the catalog. Often the authentication you use is a corporate SSO system that provides you with an email address as a key. To find and resolve GitHub users, it is useful to also import the organization's verified-domain emails into the User entity in Backstage.

The integration attempts to return organizationVerifiedDomainEmails from the GitHub API and makes this available as part of the object passed to UserTransformer. The GitHub API will only return emails that use a domain that's a verified domain for your GitHub Org. It also relies on the user having configured such an email in their own account. The API will only return these values when using GitHub App authentication and with the correct app permission allowing access to emails.

You can decorate the defaultUserTransformer to replace the email in the returned entity with the organization email.

packages/backend/src/plugins/catalog.ts
import {
  GithubOrgEntityProvider,
  defaultUserTransformer,
} from '@backstage/plugin-catalog-backend-module-github';

const githubOrgProvider = GithubOrgEntityProvider.fromConfig(env.config, {
  id: 'production',
  orgUrl: 'https://github.com/backstage',
  logger: env.logger,
  schedule: env.scheduler.createScheduledTaskRunner({
    frequency: { minutes: 60 },
    timeout: { minutes: 15 },
  }),
  userTransformer: async (user, ctx) => {
    // Start from the default entity, then override only the email.
    const entity = await defaultUserTransformer(user, ctx);
    if (entity && user.organizationVerifiedDomainEmails?.length) {
      entity.spec.profile!.email = user.organizationVerifiedDomainEmails[0];
    }
    return entity;
  },
});

Once you have imported the emails, you can resolve users in your sign-in resolver by searching the catalog for the entity with a matching email:

packages/backend/src/plugins/auth.ts
ctx.signInWithCatalogUser({
  filter: {
    kind: ['User'],
    'spec.profile.email': email as string,
  },
});

Using a Processor instead of a Provider

An alternative to using the Provider for ingesting organizational entities is to use a Processor. This is the old way that's based on registering locations with the proper type and target, triggering the processor to run.

The drawback of this method is that it will leave orphaned Group/User entities whenever they are deleted on your GitHub server, and you cannot control the frequency with which they are refreshed, separately from other processors.
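
The orphaning issue comes down to how entities that disappear upstream are handled. As a rough illustration (not Backstage's implementation), the hypothetical `diffSnapshot` helper below shows the comparison that a provider-style full refresh makes possible. It assumes each run yields the complete set of entity references, so anything missing from the latest set can be identified and removed.

```typescript
// Hypothetical helper: compares the previously ingested entity refs with
// the refs returned by the latest full read of the organization.
function diffSnapshot(
  previous: string[],
  latest: string[],
): { added: string[]; removed: string[] } {
  const prev = new Set(previous);
  const next = new Set(latest);
  return {
    // Present now but not before: new users or teams.
    added: latest.filter(ref => !prev.has(ref)),
    // Present before but not now: deleted upstream, so safe to remove.
    removed: previous.filter(ref => !next.has(ref)),
  };
}
```

A processor, by contrast, only emits the entities it currently finds and has no equivalent step for the `removed` list, which is why deleted users and groups linger as orphans.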

Processor Installation

The GithubOrgReaderProcessor is not registered by default, so you have to install and register it in the catalog plugin:

# From your Backstage root directory
yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github

packages/backend/src/plugins/catalog.ts
import { GithubOrgReaderProcessor } from '@backstage/plugin-catalog-backend-module-github';

builder.addProcessor(
  GithubOrgReaderProcessor.fromConfig(env.config, { logger: env.logger }),
);

Processor Configuration

The integrations section of your app-config needs to be set up in the same way as for the Entity Provider - see above.

In addition to that, you typically want to add a few static locations to your app-config, which reference your organizations to import. The following configuration enables an import of the teams and users under the org https://github.com/my-org-name on public GitHub.

catalog:
  locations:
    - type: github-org
      target: https://github.com/my-org-name
      rules:
        - allow: [User, Group]