Google Copybara: Transforming and Moving Code Between Repositories

Overview

Google Copybara is a tool designed to transform and move source code between different repositories. It is primarily used to maintain synchronization between an authoritative source of truth and one or more destination repositories, making it particularly effective for projects that maintain a confidential internal repository and a public-facing open-source repository.

Core Functionality and Workflow

Copybara moves code while applying transformations along the way, so the destination repository receives only the files it is meant to contain, with paths and contents rewritten to suit that repository.

Authoritative Repositories and Contributions

Copybara requires the designation of one repository as the authoritative source. While this ensures a single source of truth, the tool supports contributions from any repository. Changes made in a non-authoritative repository (such as a public GitHub repo) can be transformed and moved back into the authoritative repository. Merge conflicts are handled similarly to standard out-of-date changes within the authoritative repository.
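As a rough illustration of the contribution path described above, a workflow that imports a GitHub pull request into the authoritative repository might look like the sketch below. It is written in Copybara's Starlark configuration language. The repository URLs, branch names, and directory layout are hypothetical placeholders, and the choice of git.destination as the target is an assumption made for brevity; real setups often push to a code review system instead.

```starlark
# Hypothetical import workflow: public pull request -> authoritative repository.
# All URLs, refs, and paths below are placeholders, not values from any real project.
core.workflow(
    name = "import_pr",
    # Treat the origin change as a pending change proposed to the source of truth.
    mode = "CHANGE_REQUEST",
    origin = git.github_pr_origin(
        url = "https://github.com/example/project",
    ),
    destination = git.destination(
        url = "https://internal.example.com/monorepo.git",
        fetch = "main",
        push = "imports/project-pr",  # assumed staging branch for review
    ),
    # Only touch the directory that holds this project internally.
    destination_files = glob(["third_party/project/**"]),
    # Preserve the contributor as author; the argument is the fallback identity.
    authoring = authoring.pass_thru("Default Author <default@example.com>"),
    transformations = [
        # The public repo keeps the code at its root; internally it lives in a subdirectory.
        core.move("", "third_party/project"),
    ],
)
```

The pull request to import would be supplied when the workflow is run, and the imported change then goes through the authoritative repository's normal review and merge process.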

Stateless Operation

A key architectural feature of Copybara is that it keeps no state of its own. The state it needs lives in the destination repository, as a label in the commit message (for Git origins, a GitOrigin-RevId line recording the last imported revision). Because of this, multiple users or automated services can run Copybara against the same configuration and repositories and get the same result.

Supported Repositories

Currently, Git is the only fully supported repository type. Copybara can also read from Mercurial repositories, but that feature remains experimental. The tool's extensible architecture is designed to allow the addition of bespoke origins and destinations for various use cases.

Implementation and Configuration

Copybara uses a configuration language (Starlark) to define workflows. A workflow defines the origin, the destination, which files to include or exclude, and the transformations to apply.

Example Configuration

In a typical setup, the configuration calls core.workflow with an origin (for example, a Git repository hosted on GitHub) and a destination. Transformations can replace strings in files matching given paths or move files to a different directory in the destination repository.
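A minimal sketch of such a configuration, conventionally saved as copy.bara.sky, is shown below. The URLs, branch names, replaced strings, and target directory are illustrative assumptions, not values from any particular project.

```starlark
# copy.bara.sky: hypothetical workflow; every URL, ref, and path is a placeholder.
core.workflow(
    name = "default",
    origin = git.github_origin(
        url = "https://github.com/example/project.git",
        ref = "main",
    ),
    destination = git.destination(
        # Assumes a bare Git repository already exists at this location.
        url = "file:///tmp/project-mirror",
        fetch = "main",
        push = "main",
    ),
    # Copybara only manages files matching this glob in the destination.
    destination_files = glob(["third_party/project/**"]),
    # Use the origin author in the destination; the argument is the default author.
    authoring = authoring.pass_thru("Default Author <default@example.com>"),
    transformations = [
        # Rewrite a string, but only inside BUILD files.
        core.replace(
            before = "//internal/path:lib",
            after = "//public/path:lib",
            paths = glob(["**/BUILD"]),
        ),
        # Move everything from the repository root into a subdirectory.
        core.move("", "third_party/project"),
    ],
)
```

Transformations run in the order listed, so here the string replacement sees the original paths and the move happens last.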

Installation and Deployment

Copybara can be deployed in several ways:

  • Pre-built Binaries: Weekly snapshot releases are available, though they are not manually tested and carry no version guarantees.
  • Building from Source: Requires JDK 11 and Bazel. Users can build an executable uberjar using bazel build //java/com/google/copybara:copybara_deploy.jar.
  • Docker: An experimental Docker image is available for building and running Copybara, allowing users to pass configuration via environment variables like COPYBARA_CONFIG and COPYBARA_WORKFLOW.
  • Bazel Integration: Copybara can be integrated as an external Bazel repository using http_archive and specific repository macros.

Community Insights and Alternatives

Users and contributors have shared various perspectives on the utility and trade-offs of Copybara compared to other synchronization methods.

Use Cases and Benefits

Some users find Copybara essential for monorepo management, specifically when open-sourcing sub-projects from a larger internal repository. One user noted:

"I've seen many teams gain significant productivity when collaborating in a monorepo with public bits."

Others use it for "fire and forget" exports where a folder is extracted from a repository while preserving Git history, allowing the new project to have a different layout while maintaining git blame functionality.

Trade-offs and Alternatives

Critics and users of alternative tools have raised several points:

  • Performance: Some users reported that Copybara can be unacceptably slow for certain sync operations, suggesting that handwritten bash scripts using git-filter-repo are faster.
  • Simplicity: For simple mirroring without transformations or exclusions, native mirroring tools (such as those provided by GitLab) are often sufficient.
  • Alternative Tools: Other tools in this space include Josh (used by the Rust project) and the now-archived fbshipit from Meta.
  • Alternative Workflows: Some developers suggest using Jujutsu (jj) for basic public/private repo maintenance or utilizing Git submodules and subtrees for sharing code between repositories.
  • Architectural Philosophy: Some argue that the need to sync code between repositories is a symptom of poor architectural choices, suggesting that shared code should be extracted into separate versioned libraries rather than being synced via tools like Copybara.
