URL Reader
Concept
Some of the core plugins of Backstage have to read files from an external
location. Software Catalog has to read
the catalog-info.yaml
entity descriptor files to register and track an entity.
Software Templates have to download
the template skeleton files before creating a new component.
TechDocs has to download the markdown source
files before generating a documentation site.
Since reading files is so essential for Backstage plugins, the
@backstage/backend-common package provides a dedicated API for reading from
URL-based remote locations like GitHub, GitLab, Bitbucket, Google Cloud
Storage, etc. This is
commonly referred to as "URL Reader". It takes care of making authenticated
requests to the remote host so that private files can be read securely. If users
have GitHub App based authentication set up, URL Reader even refreshes the
token to avoid reaching the GitHub API rate limit.
As a result, plugin authors do not have to worry about any of these problems when trying to read files.
Interface
When the Backstage backend starts, a new instance of URL Reader is created. You
can see this in the index file of your Backstage backend, i.e.
packages/backend/src/index.ts.
Example
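// File: packages/backend/src/index.ts
import { getRootLogger, UrlReaders } from '@backstage/backend-common';
import { Config } from '@backstage/config';

function makeCreateEnv(config: Config) {
  const root = getRootLogger();
  // ...
  // Create the shared reader with all default URL Reader providers.
  const reader = UrlReaders.default({ logger: root, config });
  // ...
}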
This instance contains all the default URL Reader providers in the backend-common package, including GitHub, GitLab, Bitbucket, Azure and Google Cloud Storage. More URL Readers are written as the need arises to support other providers.
The generic interface of a URL Reader instance looks like this:
export type UrlReader = {
  /* Used to read a single file and return its content. */
  readUrl(url: string, options?: ReadUrlOptions): Promise<ReadUrlResponse>;
  /* Used to read a file tree and download as a directory. */
  readTree(url: string, options?: ReadTreeOptions): Promise<ReadTreeResponse>;
  /* Used to search a file in a tree. */
  search(url: string, options?: SearchOptions): Promise<SearchResponse>;
};
Using a URL Reader inside a plugin
The reader
instance is available in the backend plugin environment and passed
on to all the backend plugins. You can see an
example.
When any of the methods on this instance is called with a URL, URL Reader
extracts the host for that URL (e.g. github.com, ghe.mycompany.com, etc.).
Using the @backstage/integration package, it looks inside the integrations:
config of the app-config.yaml to find out how to work with the host, based on
the provided configuration such as authentication token, API base URL, etc.
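The host lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual @backstage/integration code: the GithubIntegrationConfig type, the findIntegration function and the token values are hypothetical stand-ins, and the config shape is simplified.

```typescript
// Hypothetical, simplified shape of one entry under `integrations.github`
// in app-config.yaml. The real config types live in @backstage/integration.
type GithubIntegrationConfig = {
  host: string;
  token?: string;
  apiBaseUrl?: string;
};

// Example values only; in a real backend these come from app-config.yaml.
const githubIntegrations: GithubIntegrationConfig[] = [
  {
    host: 'github.com',
    token: 'example-token-a',
    apiBaseUrl: 'https://api.github.com',
  },
  {
    host: 'ghe.mycompany.com',
    token: 'example-token-b',
    apiBaseUrl: 'https://ghe.mycompany.com/api/v3',
  },
];

// Extract the host from the URL and pick the integration configured for it.
function findIntegration(
  url: string,
  integrations: GithubIntegrationConfig[],
): GithubIntegrationConfig | undefined {
  const host = new URL(url).host;
  return integrations.find(integration => integration.host === host);
}
```

A URL on a host with no matching integration yields undefined here; how a real reader reacts to that case is outside the scope of this sketch.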
Make sure your plugin-specific backend file at
packages/backend/src/plugins/<PLUGIN>.ts forwards the reader instance it
receives through the PluginEnvironment to the actual plugin's createRouter
function. See how this is done in the Catalog and TechDocs backend plugins.
Once the reader instance is available inside the plugin, any of its methods can be called directly with a URL. Some example usages:
- readUrl - Catalog using the readUrl method to read the CODEOWNERS file in a repository.
- readTree - TechDocs using the readTree method to download markdown files in order to generate the documentation site.
- readTree - TechDocs using NotModifiedError to maintain a cache, which speeds up reads and limits the number of requests.
- search - Catalog using the search method to find files for a location URL containing a glob pattern.
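As a rough illustration of forwarding the reader and calling it from a plugin, consider the sketch below. All types here are simplified local stand-ins rather than the real Backstage types, the createRouter shown returns a plain object instead of an Express router, and getCodeowners, the fake reader and the example repository URL are hypothetical.

```typescript
// Simplified stand-ins for the real types; only what this sketch needs.
type ReadUrlResponse = { buffer(): Promise<{ toString(): string }> };
type UrlReader = { readUrl(url: string): Promise<ReadUrlResponse> };
type PluginEnvironment = { reader: UrlReader };

// Hypothetical plugin entry point that receives the reader as an option.
function createRouter(options: { reader: UrlReader }) {
  return {
    // Reads a CODEOWNERS file from the root of the default branch (assumed
    // to be "master" here purely for illustration).
    async getCodeowners(repoUrl: string): Promise<string> {
      const response = await options.reader.readUrl(
        `${repoUrl}/blob/master/CODEOWNERS`,
      );
      return (await response.buffer()).toString();
    },
  };
}

// Plays the role of packages/backend/src/plugins/<PLUGIN>.ts: it forwards
// the reader from the plugin environment to the plugin itself.
function createPlugin(env: PluginEnvironment) {
  return createRouter({ reader: env.reader });
}

// In-memory fake reader so the sketch runs without network access.
const fakeFiles = new Map<string, string>([
  ['https://github.com/acme/demo/blob/master/CODEOWNERS', '* @acme/owners'],
]);
const fakeReader: UrlReader = {
  async readUrl(url) {
    const content = fakeFiles.get(url);
    if (content === undefined) {
      throw new Error(`Not found: ${url}`);
    }
    return { buffer: async () => ({ toString: () => content }) };
  },
};
```

Passing the reader in as an option, rather than constructing it inside the plugin, is what lets a fake like fakeReader be swapped in for tests.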
Note that URL Readers which target git-based version control systems may, under
the hood, leverage the ability to create tar archives based on a specific git
commit-ish. A consequence of this is that files and directories configured with
the export-ignore attribute via .gitattributes will not be visible to the URL
reader when using the readTree or search methods.
Be aware of this limitation and ensure that end-users of your plugin are also aware via, for example, documentation.
Writing a new URL Reader
If the available URL Readers are not sufficient for your use case and you want to add a new URL Reader for any other provider, you are most welcome to contribute one!
Feel free to use the GitHub URL Reader as a source of inspiration.
1. Add an integration
The provider for your new URL Reader can also be called an "integration" in
Backstage. The integrations: section of your Backstage app-config.yaml config
file is where a Backstage integrator defines the host URL for the integration,
authentication details and other integration-related configuration.
The @backstage/integration
package is where most of the integration specific
code lives, so that it is shareable across Backstage. Functions like "read the
integrations config and process it", "construct headers for authenticated
requests to the host" or "convert a plain file URL into its API URL for
downloading the file" would live in this package.
2. Create the URL Reader
Create a new class which implements the UrlReader type inside
@backstage/backend-common. Create and export a static factory method which
reads the integration config and returns a map of host URLs the new reader
should be used for. See the GitHub URL Reader for an example.
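The shape of such a class might look like the sketch below. It assumes the factory returns a list of reader and predicate pairs, where each predicate decides whether a URL belongs to the reader's host. The types are local stand-ins defined only for this example, and ExampleUrlReader and ExampleIntegrationConfig are hypothetical names. The real factory signature in @backstage/backend-common may differ, for example in the options it receives.

```typescript
// Local stand-ins; the real types live in @backstage/backend-common.
type UrlReader = {
  readUrl(url: string): Promise<unknown>;
  readTree(url: string): Promise<unknown>;
  search(url: string): Promise<unknown>;
};
type ReaderPredicateTuple = {
  reader: UrlReader;
  predicate: (url: URL) => boolean;
};

// Hypothetical integration config for an imaginary provider.
type ExampleIntegrationConfig = { host: string; token?: string };

class ExampleUrlReader implements UrlReader {
  // Static factory: one reader per configured host, each paired with a
  // predicate that tells the caller which URLs the reader is responsible for.
  static factory(
    integrations: ExampleIntegrationConfig[],
  ): ReaderPredicateTuple[] {
    return integrations.map(integration => ({
      reader: new ExampleUrlReader(integration),
      predicate: (url: URL) => url.host === integration.host,
    }));
  }

  private readonly integration: ExampleIntegrationConfig;

  constructor(integration: ExampleIntegrationConfig) {
    this.integration = integration;
  }

  // The methods are stubs; step 3 describes what each one should do.
  async readUrl(url: string): Promise<unknown> {
    throw new Error(`readUrl not implemented (${this.integration.host}): ${url}`);
  }

  async readTree(url: string): Promise<unknown> {
    throw new Error(`readTree not implemented (${this.integration.host}): ${url}`);
  }

  async search(url: string): Promise<unknown> {
    throw new Error(`search not implemented (${this.integration.host}): ${url}`);
  }
}
```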
3. Implement the methods
We want to make sure all URL Readers behave in the same way. Hence, if
possible, all the methods of the UrlReader interface should be implemented.
However, it is okay to start by implementing just one of them and create
issues for the remaining ones.
readUrl
The readUrl method expects a user-friendly URL, i.e. one that can be copied
straight from the address bar while browsing the provider in a browser.
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/ADOPTERS.md
- ❌ Not a valid URL :
https://raw.githubusercontent.com/backstage/backstage/master/ADOPTERS.md
- ❌ Not a valid URL :
https://github.com/backstage/backstage/ADOPTERS.md
Upon receiving the URL, readUrl converts the user-friendly URL into an API URL
which can be used to request the provider's API. readUrl then makes an
authenticated request to the provider API and returns the response containing
the file's contents and ETag (if the provider supports it).
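To make the conversion step concrete, here is a minimal sketch that turns a user-friendly GitHub file URL into a URL for GitHub's repository contents API. The function name toGithubContentsApiUrl is hypothetical and this is not how the actual GitHub URL Reader is implemented. It also assumes the branch name contains no slashes, which a real implementation would have to handle.

```typescript
// Convert https://<host>/<owner>/<repo>/blob/<ref>/<path> into a GitHub
// contents API URL. Assumption: <ref> is a single path segment.
function toGithubContentsApiUrl(
  url: string,
  apiBaseUrl: string = 'https://api.github.com',
): string {
  const { pathname } = new URL(url);
  const [owner, repo, kind, ref, ...pathParts] = pathname
    .split('/')
    .filter(Boolean);

  // Reject URLs that are not in the user-friendly "blob" form.
  if (!owner || !repo || kind !== 'blob' || !ref || pathParts.length === 0) {
    throw new Error(`Not a user-friendly file URL: ${url}`);
  }

  const filePath = pathParts.join('/');
  return `${apiBaseUrl}/repos/${owner}/${repo}/contents/${filePath}?ref=${ref}`;
}
```

With this sketch, the valid URL listed above maps to a contents API URL for ADOPTERS.md at ref master, while the two invalid forms are rejected with an error.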
readTree
The readTree method also expects user-friendly URLs, similar to readUrl, but
the URL should point to a tree (the root of a repository or a sub-directory).
- ✅ Valid URL :
https://github.com/backstage/backstage
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/docs
Using the provider's API documentation, find an API endpoint which can be used to download either a zip or a tarball. You can download the entire tree (e.g. a repository) and filter it down if the user expects only a sub-tree. Some APIs, however, accept a path and return only that sub-tree in the downloaded archive.
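The "download everything, then filter" approach can be sketched as below. The example assumes the archive has already been unpacked into a list of entries, and that the provider wraps all files in a single top-level directory (as repository tarballs commonly do). ArchiveEntry and filterSubtree are hypothetical names used only for illustration.

```typescript
// One file extracted from a downloaded archive (simplified for this sketch).
type ArchiveEntry = { path: string; content: string };

// Keep only the entries below `subpath` and make their paths relative to it.
function filterSubtree(
  entries: ArchiveEntry[],
  subpath: string,
): ArchiveEntry[] {
  // Normalise '', 'docs' and '/docs/' to either '' or 'docs/'.
  const trimmed = subpath.replace(/^\/+|\/+$/g, '');
  const prefix = trimmed === '' ? '' : `${trimmed}/`;

  return (
    entries
      // Drop the single top-level directory that wraps the archive contents.
      .map(entry => ({
        ...entry,
        path: entry.path.split('/').slice(1).join('/'),
      }))
      // Keep files inside the requested sub-tree only.
      .filter(entry => entry.path !== '' && entry.path.startsWith(prefix))
      // Make paths relative to the sub-tree root.
      .map(entry => ({ ...entry, path: entry.path.slice(prefix.length) }))
  );
}
```

Matching on the prefix with a trailing slash keeps a sibling directory such as docs-extra from being mistaken for part of docs.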
search
The search method expects a URL containing a glob pattern and returns a list
of files matching the pattern.
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/**/catalog-info.yaml
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/**/*.md
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/*/package.json
- ✅ Valid URL :
https://github.com/backstage/backstage/blob/master/READM
The core logic of readTree can be used here to extract all the files inside
the tree and return the files matching the pattern in the url.
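The matching step can be illustrated with a small hand-rolled glob matcher applied to the file paths of a tree. This is only a sketch supporting `*`, `**` and `?`. A production reader would more likely rely on an established glob library, and globToRegExp and searchFiles are hypothetical names.

```typescript
// Translate a simple glob into a regular expression.
//   **/  matches zero or more whole directories
//   **   matches anything, including slashes
//   *    matches any run of characters except '/'
//   ?    matches a single character except '/'
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        if (glob[i + 2] === '/') {
          source += '(?:.*/)?';
          i += 2; // skip the second '*' and the '/'
        } else {
          source += '.*';
          i += 1; // skip the second '*'
        }
      } else {
        source += '[^/]*';
      }
    } else if (ch === '?') {
      source += '[^/]';
    } else {
      // Escape characters that have a special meaning in regular expressions.
      source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${source}$`);
}

// Return the paths (relative to the tree root) that match the glob.
function searchFiles(paths: string[], pattern: string): string[] {
  const matcher = globToRegExp(pattern);
  return paths.filter(path => matcher.test(path));
}
```

In a reader, the list of paths would come from the same archive extraction that readTree performs, and the pattern would be the part of the URL that follows the branch name.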
4. Add to available URL Readers
There are two ways to make your new URL Reader available for use.
You can choose to make it open source by updating the default factory method
of URL Readers.
But for something internal which you don't want to make open source, you can
update your packages/backend/src/index.ts file and change how the reader
instance is created.
// File: packages/backend/src/index.ts
const reader = UrlReaders.default({
  logger: root,
  config,
  // This is where your internal URL Readers would go.
  factories: [myCustomReader.factory],
});
5. Caching
All of the methods above support ETag-based caching. If the method is called
without an etag, the response contains an ETag of the resource (ideally, the
reader forwards the ETag returned by the provider). If the method is called
with an etag, it first compares the ETag and throws a NotModifiedError in case
the resource has not been modified. This approach is very similar to the
actual ETag and If-None-Match HTTP headers.
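The caching contract can be sketched end to end with an in-memory fake. Everything below is a simplified stand-in: the local NotModifiedError class mimics the one Backstage provides, the response carries a plain content string instead of the real response shape, and FakeReader and readCached are hypothetical names. In this sketch the error is signalled by throwing it, and the caller recognises it by its name.

```typescript
// Local stand-in for the NotModifiedError provided by Backstage.
class NotModifiedError extends Error {
  constructor() {
    super('Not modified');
    this.name = 'NotModifiedError';
  }
}

// Simplified response shape for this sketch only.
type ReadResponse = { content: string; etag: string };

// Fake reader holding a single resource whose ETag changes on every update.
class FakeReader {
  private content = 'v1';
  private revision = 1;

  update(content: string): void {
    this.content = content;
    this.revision += 1;
  }

  async readUrl(
    _url: string,
    options?: { etag?: string },
  ): Promise<ReadResponse> {
    const currentEtag = `"rev-${this.revision}"`;
    // Same idea as If-None-Match: skip the body if the caller is up to date.
    if (options?.etag === currentEtag) {
      throw new NotModifiedError();
    }
    return { content: this.content, etag: currentEtag };
  }
}

// Caller-side cache keyed by URL.
const cache = new Map<string, ReadResponse>();

async function readCached(
  reader: FakeReader,
  url: string,
): Promise<ReadResponse> {
  const cached = cache.get(url);
  try {
    const fresh = await reader.readUrl(url, { etag: cached?.etag });
    cache.set(url, fresh);
    return fresh;
  } catch (error) {
    // Reuse the cached copy when the resource has not changed.
    if (cached && error instanceof Error && error.name === 'NotModifiedError') {
      return cached;
    }
    throw error;
  }
}
```

Calling readCached twice in a row serves the second call from the cache, and a later update on the fake reader causes the next call to fetch and store the new content.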
6. Debugging
When debugging one of the URL Readers, you can directly use the reader
instance created when the backend starts and call one of its methods with your
debugging URL.
// File: packages/backend/src/index.ts
async function main() {
  // ...
  const createEnv = makeCreateEnv(config);
  const testReader = createEnv('test-url-reader').reader;
  const response = await testReader.readUrl(
    'https://github.com/backstage/backstage/blob/master/catalog-info.yaml',
  );
  console.log((await response.buffer()).toString());
  // ...
}
This will be run every time you restart the backend. Note that after any change
in the URL Reader code, you need to stop and restart the backend, since the
reader instance is memoized and does not update on hot module reloading. Also,
there are a lot of unit tests written for the URL Readers, which you can make
use of.