URL Reader

Concept

Some of the core plugins of Backstage have to read files from an external location. Software Catalog has to read the catalog-info.yaml entity descriptor files to register and track an entity. Software Templates have to download the template skeleton files before creating a new component. TechDocs has to download the markdown source files before generating a documentation site.

Since reading files is so essential for Backstage plugins, the @backstage/backend-common package provides a dedicated API for reading from URL-based remote locations such as GitHub, GitLab, Bitbucket, and Google Cloud Storage. This API is commonly referred to as "URL Reader". It takes care of making authenticated requests to the remote host so that private files can be read securely. If users have GitHub App based authentication set up, URL Reader even refreshes the token, to avoid reaching the GitHub API rate limit.

As a result, plugin authors do not have to worry about any of these problems when trying to read files.

Interface

When the Backstage backend starts, a new instance of URL Reader is created. You can see this in the index file of your Backstage backend, i.e. packages/backend/src/index.ts. For example:

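// File: packages/backend/src/index.ts

import { UrlReaders } from '@backstage/backend-common';
import { Config } from '@backstage/config';

function makeCreateEnv(config: Config) {
  // ...
  // `root` is the root logger created earlier in this file.
  const reader = UrlReaders.default({ logger: root, config });
  // ...
}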

This instance contains all the default URL Reader providers in the backend-common package, including GitHub, GitLab, Bitbucket, Azure, and Google Cloud Storage (GCS). As the need arises, more URL Readers are being written to support different providers.

The generic interface of a URL Reader instance looks like this.

export type UrlReader = {
  /* Used to read a single file and return its content. */
  readUrl(url: string, options?: ReadUrlOptions): Promise<ReadUrlResponse>;
  /* Used to read a file tree and download as a directory. */
  readTree(url: string, options?: ReadTreeOptions): Promise<ReadTreeResponse>;
  /* Used to search a file in a tree. */
  search(url: string, options?: SearchOptions): Promise<SearchResponse>;
};

Using a URL Reader inside a plugin

The reader instance is available in the backend plugin environment and is passed on to all the backend plugins. When any of the methods on this instance is called with a URL, URL Reader extracts the host for that URL (e.g. github.com, ghe.mycompany.com, etc.). Using the @backstage/integration package, it then looks inside the integrations: section of app-config.yaml to find out how to work with that host, based on the provided configuration such as the authentication token and API base URL.
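The host lookup described above can be sketched roughly as follows. This is a simplified illustration and not the actual @backstage/integration code: the HostIntegration shape, the sample hosts and token, and the findIntegration helper are all assumptions made for this example.

```typescript
// Hypothetical, simplified shape of one entry under `integrations:`.
// The real @backstage/integration types carry more fields.
interface HostIntegration {
  host: string;
  apiBaseUrl?: string;
  token?: string;
}

// Stand-in for the parsed `integrations:` section of app-config.yaml.
const integrations: HostIntegration[] = [
  { host: 'github.com', apiBaseUrl: 'https://api.github.com', token: 'example-token' },
  { host: 'ghe.mycompany.com', apiBaseUrl: 'https://ghe.mycompany.com/api/v3' },
];

// Extract the host from the target URL and find the matching integration.
function findIntegration(
  url: string,
  configured: HostIntegration[],
): HostIntegration | undefined {
  const { host } = new URL(url);
  return configured.find(integration => integration.host === host);
}

const match = findIntegration(
  'https://ghe.mycompany.com/org/repo/blob/main/catalog-info.yaml',
  integrations,
);
console.log(match?.apiBaseUrl);
```

A URL whose host has no configured integration yields undefined here; how a real reader handles that case is outside the scope of this sketch.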

Make sure your plugin-specific backend file at packages/backend/src/plugins/<PLUGIN>.ts is forwarding the reader instance passed on as the PluginEnvironment to the actual plugin's createRouter function. See how this is done in Catalog and TechDocs backend plugins.

Once the reader instance is available inside the plugin, any of its methods can be called directly with a URL. Some example usages:

  • readUrl - Catalog using the readUrl method to read the CODEOWNERS file in a repository.
  • readTree - TechDocs using the readTree method to download markdown files in order to generate the documentation site.
  • readTree - TechDocs using NotModifiedError to maintain cache and speed up and limit the number of requests.
  • search - Catalog using the search method to find files for a location URL containing a glob pattern.
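As a rough illustration of calling readUrl from plugin code, the sketch below uses local stand-in types rather than the real @backstage/backend-common ones, so it can run without a backend. The readCodeowners helper, the hard-coded master branch, and the in-memory stubReader are hypothetical.

```typescript
// Local stand-ins for the relevant parts of the UrlReader interface.
interface ReadUrlResponseLike {
  buffer(): Promise<Buffer>;
  etag?: string;
}
interface ReaderLike {
  readUrl(url: string): Promise<ReadUrlResponseLike>;
}

// Hypothetical helper that a plugin's router could call with the
// reader it received from the plugin environment.
async function readCodeowners(reader: ReaderLike, repoUrl: string): Promise<string> {
  const response = await reader.readUrl(`${repoUrl}/blob/master/CODEOWNERS`);
  return (await response.buffer()).toString('utf8');
}

// In-memory stub so the helper can be tried without network access.
const stubReader: ReaderLike = {
  async readUrl(url: string) {
    return { buffer: async () => Buffer.from(`* @owners-of ${url}`) };
  },
};

readCodeowners(stubReader, 'https://github.com/backstage/backstage').then(text =>
  console.log(text),
);
```

Depending only on the small ReaderLike shape keeps the helper easy to unit test, since a stub can replace the real reader.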

Note that URL Readers which target git-based version control systems may, under the hood, leverage the ability to create tar archives based on a specific git commit-ish. A consequence of this is that files and directories configured with the export-ignore attribute via .gitattributes will not be visible to the URL reader when using the readTree or search methods.

Be aware of this limitation and ensure that end-users of your plugin are also aware via, for example, documentation.

Writing a new URL Reader

If the available URL Readers are not sufficient for your use case and you want to add a new URL Reader for any other provider, you are most welcome to contribute one!

Feel free to use the GitHub URL Reader as a source of inspiration.

1. Add an integration

The provider for your new URL Reader can also be called an "integration" in Backstage. The integrations: section of your Backstage app-config.yaml config file is supposed to be the place where a Backstage integrator defines the host URL for the integration, authentication details and other integration related configurations.

The @backstage/integration package is where most of the integration specific code lives, so that it is shareable across Backstage. Functions like "read the integrations config and process it", "construct headers for authenticated requests to the host" or "convert a plain file URL into its API URL for downloading the file" would live in this package.

2. Create the URL Reader

Create a new class which implements the UrlReader type inside @backstage/backend-common. Create and export a static factory method which reads the integration config and returns a map of host URLs the new reader should be used for. See the GitHub URL Reader for an example.
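The shape of such a class and its static factory might look roughly like the sketch below. It uses local stand-in types so that it is self-contained; MyProviderUrlReader, MyProviderIntegration, and the stubbed readUrl body are hypothetical, and the real factory signature in @backstage/backend-common takes different arguments (such as config and logger).

```typescript
// Local stand-ins; the real types live in @backstage/backend-common.
interface MinimalUrlReader {
  readUrl(url: string): Promise<{ buffer(): Promise<Buffer> }>;
}
type ReaderTuple = {
  reader: MinimalUrlReader;
  predicate: (url: URL) => boolean;
};

// Hypothetical integration config for an imaginary provider.
interface MyProviderIntegration {
  host: string;
  token?: string;
}

class MyProviderUrlReader implements MinimalUrlReader {
  // Static factory: one reader per configured host, each paired with a
  // predicate telling the caller which URLs that reader handles.
  static factory(configured: MyProviderIntegration[]): ReaderTuple[] {
    return configured.map(integration => ({
      reader: new MyProviderUrlReader(integration),
      predicate: (url: URL) => url.host === integration.host,
    }));
  }

  private readonly integration: MyProviderIntegration;

  constructor(integration: MyProviderIntegration) {
    this.integration = integration;
  }

  async readUrl(url: string): Promise<{ buffer(): Promise<Buffer> }> {
    // A real reader would call the provider API here and authenticate
    // with this.integration.token. This stub only echoes its input.
    const body = Buffer.from(`contents of ${url} via ${this.integration.host}`);
    return { buffer: async () => body };
  }
}

const tuples = MyProviderUrlReader.factory([{ host: 'git.example.com' }]);
console.log(tuples.length);
```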

3. Implement the methods

We want to make sure all URL Readers behave in the same way. Hence, if possible, all the methods of the UrlReader interface should be implemented. However, it is okay to start by implementing just one of them and to create issues for the remaining ones.

readUrl

The readUrl method expects a user-friendly URL, that is, one which can be copied from the browser's address bar while a person is browsing the provider.

  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/ADOPTERS.md
  • ❌ Not a valid URL : https://raw.githubusercontent.com/backstage/backstage/master/ADOPTERS.md
  • ❌ Not a valid URL : https://github.com/backstage/backstage/ADOPTERS.md

Upon receiving the URL, readUrl converts the user-friendly URL into an API URL which can be used to request the provider's API.

readUrl then makes an authenticated request to the provider API and returns a response containing the file's contents and ETag (if the provider supports it).
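The URL conversion step could be sketched as below for GitHub, using the repository contents endpoint of the GitHub REST API. This is an illustrative assumption rather than the conversion the built-in GitHub URL Reader actually performs; the toGithubContentsApiUrl helper is hypothetical, and it assumes the ref (branch, tag or commit) contains no slashes.

```typescript
// Convert a browser-style GitHub file URL into a contents API URL.
// Assumption: the ref is a single path segment (no slashes).
function toGithubContentsApiUrl(
  url: string,
  apiBaseUrl: string = 'https://api.github.com',
): string {
  const { pathname } = new URL(url);
  // Expected shape: /<owner>/<repo>/blob/<ref>/<path...>
  const [owner, repo, kind, ref, ...pathParts] = pathname.split('/').filter(Boolean);
  if (!owner || !repo || kind !== 'blob' || !ref || pathParts.length === 0) {
    throw new Error(`Not a browser-style file URL: ${url}`);
  }
  const filePath = pathParts.join('/');
  return `${apiBaseUrl}/repos/${owner}/${repo}/contents/${filePath}?ref=${encodeURIComponent(ref)}`;
}

console.log(
  toGithubContentsApiUrl('https://github.com/backstage/backstage/blob/master/ADOPTERS.md'),
);
```

The two invalid URL shapes listed above are rejected by this sketch, because neither has a blob segment in the expected position.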

readTree

The readTree method also expects a user-friendly URL, similar to readUrl, but the URL should point to a tree (which could be the root of a repository or a sub-directory).

  • ✅ Valid URL : https://github.com/backstage/backstage
  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master
  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/docs

Using the provider's API documentation, find an API endpoint which can be used to download either a zip or a tarball. You can download the entire tree (e.g. a repository) and filter it down when the user expects only a sub-tree. Some APIs, however, accept a path and return only that sub-tree in the downloaded archive.
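The "download everything, then filter" approach could look roughly like this once the archive has been unpacked into a list of entries. The ArchiveEntry shape and the selectSubTree helper are hypothetical, and the sketch assumes the archive wraps all files in a single root directory, which should be verified for your provider.

```typescript
// Hypothetical in-memory representation of one unpacked archive entry.
interface ArchiveEntry {
  path: string;
  content: string;
}

// Keep only the entries below subPath and make their paths relative to it.
// Assumption: every entry path starts with one archive root directory.
function selectSubTree(entries: ArchiveEntry[], subPath: string): ArchiveEntry[] {
  // Normalise 'docs', '/docs' and 'docs/' to 'docs/'; '' selects the whole tree.
  const trimmed = subPath.replace(/^\/+|\/+$/g, '');
  const prefix = trimmed === '' ? '' : `${trimmed}/`;
  const selected: ArchiveEntry[] = [];
  for (const entry of entries) {
    // Drop the archive's root directory segment.
    const withoutRoot = entry.path.split('/').slice(1).join('/');
    if (!withoutRoot.startsWith(prefix)) continue;
    const relativePath = withoutRoot.slice(prefix.length);
    // Skip directory entries and the sub-tree root itself.
    if (relativePath === '' || relativePath.endsWith('/')) continue;
    selected.push({ path: relativePath, content: entry.content });
  }
  return selected;
}

const unpacked: ArchiveEntry[] = [
  { path: 'repo-abc123/README.md', content: '# Readme' },
  { path: 'repo-abc123/docs/', content: '' },
  { path: 'repo-abc123/docs/index.md', content: '# Docs' },
  { path: 'repo-abc123/docs/guide/a.md', content: '# Guide' },
  { path: 'repo-abc123/docs-old/x.md', content: '# Old' },
];
console.log(selectSubTree(unpacked, 'docs').map(entry => entry.path));
```

Matching on the prefix with a trailing slash keeps sibling directories such as docs-old out of the result.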

search

The search method expects a URL containing a glob pattern and returns a list of files matching the query.

  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/**/catalog-info.yaml
  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/**/*.md
  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/*/package.json
  • ✅ Valid URL : https://github.com/backstage/backstage/blob/master/READM

The core logic of readTree can be reused here to extract all the files inside the tree and return those matching the pattern in the URL.
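Filtering a tree listing by the glob in the URL could be sketched as follows. The glob handling here is deliberately minimal (only **, * and ? are supported) and is an assumption for illustration; a real reader would more likely rely on an established glob library. The globToRegExp and searchFiles helpers are hypothetical.

```typescript
// Translate a small glob subset into a regular expression:
// '**/' matches zero or more directories, '*' and '?' stay within one segment.
function globToRegExp(glob: string): RegExp {
  let source = '';
  let i = 0;
  while (i < glob.length) {
    const char = glob.charAt(i);
    if (glob.startsWith('**/', i)) {
      source += '(?:[^/]+/)*';
      i += 3;
    } else if (glob.startsWith('**', i)) {
      source += '.*';
      i += 2;
    } else if (char === '*') {
      source += '[^/]*';
      i += 1;
    } else if (char === '?') {
      source += '[^/]';
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      source += char.replace(/[.+^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Split the URL into the tree part and the glob, then filter the file list.
// Assumption: URLs look like https://<host>/<owner>/<repo>/blob/<ref>/<glob>.
function searchFiles(filePaths: string[], url: string): string[] {
  const parts = /^(https?:\/\/[^/]+\/[^/]+\/[^/]+\/blob\/[^/]+\/)(.+)$/.exec(url);
  if (!parts) {
    throw new Error(`Unsupported search URL: ${url}`);
  }
  const [, treeUrl = '', pattern = ''] = parts;
  const matcher = globToRegExp(pattern);
  return filePaths.filter(path => matcher.test(path)).map(path => `${treeUrl}${path}`);
}

const treeFiles = [
  'catalog-info.yaml',
  'README.md',
  'docs/index.md',
  'docs/package.json',
  'packages/app/catalog-info.yaml',
];
console.log(
  searchFiles(treeFiles, 'https://github.com/backstage/backstage/blob/master/**/catalog-info.yaml'),
);
```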

4. Add to available URL Readers

There are two ways to make your new URL Reader available for use.

You can choose to make it open source, by updating the default factory method of URL Readers.

But for something internal which you don't want to make open source, you can update your packages/backend/src/index.ts file and update how the reader instance is created.

// File: packages/backend/src/index.ts
const reader = UrlReaders.default({
  logger: root,
  config,
  // This is where your internal URL Readers would go.
  factories: [myCustomReader.factory],
});

5. Caching

All of the methods above support ETag-based caching. If a method is called without an etag, the response contains the ETag of the resource (ideally, the reader forwards the ETag returned by the provider). If a method is called with an etag, the reader first compares it with the resource's current ETag and throws a NotModifiedError if the resource has not been modified. This approach closely mirrors the ETag and If-None-Match HTTP headers.
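The ETag flow can be sketched with a small synchronous model, assuming the NotModifiedError is thrown rather than returned as a value. The real reader methods are asynchronous and the real NotModifiedError comes from a Backstage package; the local NotModifiedError class, StoredFile, readWithEtag, and cachedRead below are stand-ins invented for this example.

```typescript
// Local stand-in for the error a reader raises when the ETag matches.
class NotModifiedError extends Error {
  constructor() {
    super('Resource has not been modified');
    this.name = 'NotModifiedError';
  }
}

// Hypothetical model of a remote file and its current ETag.
interface StoredFile {
  content: string;
  etag: string;
}

// Reader side: compare the caller's ETag before returning any content.
function readWithEtag(file: StoredFile, etag?: string): StoredFile {
  if (etag !== undefined && etag === file.etag) {
    throw new NotModifiedError();
  }
  return { content: file.content, etag: file.etag };
}

// Caller side: remember the last response and reuse it when nothing changed.
const responseCache = new Map<string, StoredFile>();

function cachedRead(url: string, remote: StoredFile): string {
  const cached = responseCache.get(url);
  try {
    const fresh = readWithEtag(remote, cached?.etag);
    responseCache.set(url, fresh);
    return fresh.content;
  } catch (error) {
    // Checking the name avoids relying on instanceof across package copies.
    if (cached && error instanceof Error && error.name === 'NotModifiedError') {
      return cached.content;
    }
    throw error;
  }
}

const remoteFile: StoredFile = { content: 'v1', etag: 'etag-1' };
console.log(cachedRead('https://example.com/file', remoteFile));
console.log(cachedRead('https://example.com/file', remoteFile));
```

The second call sends the cached ETag, hits the not-modified path, and is served from the cache without the content being transferred again.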

6. Debugging

When debugging one of the URL Readers, you can simply use the reader instance created when the backend starts and call one of its methods with your debugging URL.

// File: packages/backend/src/index.ts

async function main() {
  // ...
  const createEnv = makeCreateEnv(config);

  const testReader = createEnv('test-url-reader').reader;
  const response = await testReader.readUrl(
    'https://github.com/backstage/backstage/blob/master/catalog-info.yaml',
  );
  console.log((await response.buffer()).toString());
  // ...
}

This will be run every time you restart the backend. Note that after any change in the URL Reader code, you need to stop the backend and restart, since the reader instance is memoized and does not update on hot module reloading. Also, there are a lot of unit tests written for the URL Readers, which you can make use of.