174 changes: 72 additions & 102 deletions README.md
@@ -3,143 +3,113 @@
---
# NOTICE

git-drs is not yet fully compliant with DRS. It currently works against Gen3 DRS server. Full GA4GH DRS support is expected once v1.6 of the specification has been published.
`git-drs` is not a pure GA4GH DRS client. It targets Syfon/Gen3-style DRS workflows and uses extensions where repo-scale behavior requires them.

---

[![Tests](https://github.com/calypr/git-drs/actions/workflows/test.yaml/badge.svg)](https://github.com/calypr/git-drs/actions/workflows/test.yaml)

**Git/DRS orchestration with optional Git LFS compatibility**
**Git/DRS orchestration with Git-compatible pointer workflows**

Git DRS manages Git-facing DRS workflows: local metadata, Git hooks, filter behavior, lookup/register/push/pull orchestration, and optional Git LFS compatibility. Provider-specific transfer, signed URL behavior, and direct cloud inspection live in client code outside this repo.
`git-drs` manages:

- remote Gen3/Syfon configuration
- local DRS metadata
- pointer-aware push/pull orchestration
- bucket-scoped object reference workflows

## Key Features

- **Unified Workflow**: Manage both code and large data files using standard Git commands
- **DRS Integration**: Built-in support for Gen3 DRS servers
- **Multi-Remote Support**: Work with development, staging, and production servers in one repository
- **Automatic Processing**: Files are processed automatically during commits and pushes
- **Flexible Tracking**: Track individual files, patterns, or entire directories
- unified Git/data workflow around DRS-backed pointers
- Gen3/Syfon integration
- multiple remotes in one repository
- explicit file tracking and hydration
- metadata-only reference support for existing bucket objects

## How It Works

Git DRS works alongside Git LFS when you want LFS-compatible pointers and storage, while still supporting DRS-centric workflows:
At a high level:

1. **Initialization**: Set up repository and DRS server configuration
2. **Automatic Commits**: Create DRS objects during pre-commit hooks
3. **Automatic Pushes**: Register files with DRS servers and upload to configured storage
4. **On-Demand Downloads**: Pull specific files or patterns as needed
1. initialize the repository with `git drs init`
2. configure a remote for one `organization/project`
3. track file patterns with `git drs track`
4. add/commit/push normally
5. hydrate pointer files later with `git drs pull`

## Quick Start

### Installation

```bash
# Install Git LFS first
brew install git-lfs # macOS
git lfs install --skip-smudge

# Install Git DRS
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/calypr/git-drs/refs/heads/main/install.sh)" -- $GIT_DRS_VERSION

# Install global Git filter configuration for git-drs
git drs install
```

### Basic Usage

```bash
# Initialize repository (one-time Git repo setup)
git drs init

# Add DRS remote
git drs remote add gen3 production \
--cred /path/to/credentials.json \
--url https://calypr-public.ohsu.edu \
--organization my-program \
--project my-project \
--bucket my-bucket

# Required prerequisite (usually steward/admin setup):
# create bucket credentials, then map org/project to full storage roots before users run push/pull
git drs bucket add production \
--bucket my-bucket \
--region us-east-1 \
--access-key "$AWS_ACCESS_KEY_ID" \
--secret-key "$AWS_SECRET_ACCESS_KEY" \
--s3-endpoint https://s3.amazonaws.com
git drs bucket add-organization production \
--organization my-program \
--path s3://my-bucket/my-program
git drs bucket add-project production \
--organization my-program \
--project my-project \
--path s3://my-bucket/my-program/my-project

# Track files
git lfs track "*.bam"
git drs remote add gen3 production HTAN_INT/BForePC --cred /path/to/credentials.json
git drs track "*.bam"
git add .gitattributes

# Add and commit files
git add my-file.bam
git commit -m "Add data file"
git add sample.bam
git commit -m "Add sample"
git push

# Download files
git lfs pull -I "*.bam"
git drs ls-files
git drs pull -I "*.bam"
```

## Documentation
## Current CLI Shape

For detailed setup and usage information:
The cleaned CLI intentionally removed legacy commands:

- **[Getting Started](docs/getting-started.md)** - Repository setup and basic workflows
- **[Commands Reference](docs/commands.md)** - Complete command documentation
- **[Installation Guide](docs/installation.md)** - Platform-specific installation
- **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
- **[E2E Modes + Local Setup](docs/e2e-modes-and-local-setup.md)** - Local vs remote mode, server config, and end-to-end runbooks
- **[Cloud/Object Integration](docs/adding-s3-files.md)** - Adding files from provider URLs or configured bucket object keys
- **[Developer Guide](docs/developer-guide.md)** - Internals and development
- removed:
  - `git drs fetch`
  - `git drs list`
  - `git drs upload`
  - `git drs download`
- `git drs pull` is hydration-only
- `git drs ls-files` is the local file inventory command
- `git drs remote add gen3` takes scope as `organization/project`

## Supported Servers
Example:

- **Gen3 Data Commons** (e.g., CALYPR)
```bash
git drs remote add gen3 production HTAN_INT/BForePC --cred /path/to/credentials.json
```
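
The `organization/project` scope argument shown above could be split and validated roughly as follows. This is a minimal sketch, not the git-drs implementation: the `Scope` type, the `parseScope` function, and the validation rules (no empty parts, exactly one slash) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope is a hypothetical type for this sketch; it is not taken from git-drs.
type Scope struct {
	Organization string
	Project      string
}

// parseScope splits "organization/project" into its two parts.
// Assumed validation: both parts are non-empty and there is exactly one slash.
func parseScope(s string) (Scope, error) {
	org, project, ok := strings.Cut(s, "/")
	if !ok || org == "" || project == "" || strings.Contains(project, "/") {
		return Scope{}, fmt.Errorf("invalid scope %q: want organization/project", s)
	}
	return Scope{Organization: org, Project: project}, nil
}

func main() {
	scope, err := parseScope("HTAN_INT/BForePC")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// prints: organization=HTAN_INT project=BForePC
	fmt.Printf("organization=%s project=%s\n", scope.Organization, scope.Project)
}
```

Rejecting extra slashes keeps a mistyped path such as `a/b/c` from being read silently as a project name; whether the real CLI is this strict is not stated in the README.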

## Supported Environments
## Bucket Mapping Model

- **Local Development** environments
- **HPC Systems** (e.g., ARC)
End users should not need to know the bucket name.

## Commands Overview
Push and pull depend on server-side bucket mapping for the requested scope. That mapping is normally provisioned once by a steward/admin using the bucket commands.

| Command | Description |
| ---------------------- | ------------------------------------- |
| `git drs install` | Install global git-drs filter config |
| `git drs init` | Initialize repository |
| `git drs remote add` | Add a DRS remote server |
| `git drs remote list` | List configured remotes |
| `git drs remote set` | Set default remote |
| `git drs add-url` | Add files via provider URLs or configured bucket object keys |
| `git lfs track` | Track file patterns with LFS |
| `git lfs ls-files` | List tracked files |
| `git lfs pull` | Download tracked files |
| `git drs fetch` | Fetch metadata from DRS server |
| `git drs push` | Push objects to DRS server |
## Common Commands

Use `--help` with any command for details. See [Commands Reference](docs/commands.md) for complete documentation.
| Command | Description |
| --- | --- |
| `git drs install` | Install global `git-drs` filter config |
| `git drs init` | Initialize repository-local `git-drs` state |
| `git drs remote add gen3 [remote] <org/project>` | Add or refresh a Gen3/Syfon remote |
| `git drs remote list` | List configured remotes |
| `git drs remote set <name>` | Set the default remote |
| `git drs track <pattern>` | Track files or globs |
| `git drs untrack <pattern>` | Stop tracking files or globs |
| `git drs ls-files` | List tracked files and localization state |
| `git drs pull` | Hydrate pointer files in the current checkout |
| `git drs push` | Register/upload objects and push metadata workflow |
| `git drs add-url` | Add an existing provider object by URL or scoped key |
| `git drs add-ref` | Add a local reference to an existing DRS object |
| `git drs query` | Query a DRS object by ID |
| `git drs copy-records` | Copy Syfon records between remotes for one scope |

## Requirements
## Documentation

- Git LFS installed and configured
- Access credentials for your DRS server
- Go 1.24+ (for building from source)
- [Getting Started](docs/getting-started.md)
- [Commands Reference](docs/commands.md)
- [Troubleshooting](docs/troubleshooting.md)
- [Developer Guide](docs/developer-guide.md)
- [GA4GH DRS Scalability Gaps](docs/ga4gh-drs-scalability-gaps.md)

## Support
## Requirements

- **Issues**: [GitHub Issues](https://github.com/calypr/git-drs/issues)
- **Releases**: [GitHub Releases](https://github.com/calypr/git-drs/releases)
- **Documentation**: See `docs/` folder for detailed guides
- Git
- access credentials for the target Gen3/Syfon deployment
- Go 1.26.2+ for local builds

## License
## Support

This project is part of the CALYPR data commons ecosystem.
- [GitHub Issues](https://github.com/calypr/git-drs/issues)
- [GitHub Releases](https://github.com/calypr/git-drs/releases)
51 changes: 51 additions & 0 deletions attic/issue-add-include-pattern-to-git-drs-pull.md
@@ -0,0 +1,51 @@
# Add `-I "pattern"` include filter support to `git drs pull`

## Summary
Add include-pattern filtering to `git drs pull`, similar to legacy `git lfs pull -I "pattern"` workflows.

## Motivation
Current `git drs pull` behavior pulls based on repository resolution without a user-facing path pattern filter. Users migrating from `git lfs pull -I` expect selective hydration of files by glob/path.

## Proposed UX
Support:

```bash
git drs pull -I "results/*.txt"
git drs pull -I "*.bam" -I "data/**"
git drs pull --include "path/to/file"
```

Optional:
- `--exclude` parity (if desired in same change or follow-up)

## Proposed behavior
1. Parse one or more include patterns (`-I`, `--include`).
2. Resolve candidate pointers as usual.
3. Filter by repo-relative path match before download.
4. Download only matched objects; skip others with clear logging.
5. If no pattern supplied, preserve current default behavior.
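
Steps 1, 3, 4 and 5 above can be sketched in Go as a pure path filter. This is only an illustration under stated assumptions: `matchInclude` and `filterPaths` are hypothetical names, and the pattern semantics (a trailing `/**` matches everything under a directory, a pattern without `/` matches the base name, anything else is matched against the full repo-relative path with `path.Match`) are a guess at what the final syntax might be, since the issue leaves that to the help text.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchInclude reports whether a repo-relative, slash-separated path matches
// one include pattern. Assumed semantics for this sketch:
//   - "dir/**" matches everything under dir/
//   - a pattern without "/" is matched against the base name
//   - anything else is matched against the full path with path.Match
func matchInclude(pattern, relPath string) (bool, error) {
	if prefix, ok := strings.CutSuffix(pattern, "/**"); ok {
		return strings.HasPrefix(relPath, prefix+"/"), nil
	}
	target := relPath
	if !strings.Contains(pattern, "/") {
		target = path.Base(relPath)
	}
	ok, err := path.Match(pattern, target)
	if err != nil {
		return false, fmt.Errorf("bad include pattern %q: %w", pattern, err)
	}
	return ok, nil
}

// filterPaths keeps the paths that match at least one pattern.
// With no patterns it returns the input unchanged (default behavior).
func filterPaths(patterns, paths []string) ([]string, error) {
	if len(patterns) == 0 {
		return paths, nil
	}
	var kept []string
	for _, p := range paths {
		for _, pat := range patterns {
			ok, err := matchInclude(pat, p)
			if err != nil {
				return nil, err
			}
			if ok {
				kept = append(kept, p)
				break
			}
		}
	}
	return kept, nil
}

func main() {
	paths := []string{"results/a.txt", "data/x/y.bam", "notes.md"}
	kept, err := filterPaths([]string{"results/*.txt", "data/**"}, paths)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kept) // prints [results/a.txt data/x/y.bam]
}
```

Keeping the filter free of I/O makes the "Testing matrix" cases below easy to cover with table-driven unit tests before wiring it into the pull pipeline.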

## Scope
- `cmd/pull/main.go` CLI flags and pull selection pipeline
- pointer/path inventory layer (where path<->OID candidates are produced)
- docs: `docs/commands.md`, `docs/getting-started.md`, `docs/troubleshooting.md`
- tests for include filtering semantics

## Acceptance criteria
- [ ] `git drs pull -I "<pattern>"` works for a single pattern.
- [ ] Repeated `-I` flags are supported.
- [ ] Include matching is against repo-relative paths.
- [ ] Default `git drs pull` behavior unchanged when no `-I` is passed.
- [ ] Help text documents pattern syntax and examples.
- [ ] Unit/integration tests cover positive and negative matches.

## Testing matrix
- Single file exact path include.
- Wildcard include (`*.bam`, `data/**`).
- Multiple `-I` values.
- No matches (should no-op cleanly and return success unless policy says otherwise).
- Mixed matched/unmatched objects in same pull run.

## Notes
This closes a usability gap for users transitioning from `git lfs` CLI habits to `git drs` commands while keeping pull behavior explicit and predictable.

6 changes: 3 additions & 3 deletions cmd/remote/add/gen3.go
@@ -99,13 +99,13 @@ func gen3Init(remoteName, credFile, fenceToken, project, organization, bucket st

default:
existing, err := configure.Load(remoteName)
if err == nil {
if err != nil {
return fmt.Errorf("failed to load %s config: %w", remoteName, err)
} else {
accessToken = existing.AccessToken
apiKey = existing.APIKey
keyID = existing.KeyID
apiEndpoint = existing.APIEndpoint
} else {
return fmt.Errorf("must provide either --cred or --token (or have existing profile %s)", remoteName)
}
}

26 changes: 13 additions & 13 deletions docs/adding-s3-files.md
@@ -1,6 +1,6 @@
# Adding Provider Objects with `git drs add-url`

`git drs add-url` prepares a Git LFS pointer plus local DRS metadata for an object that already exists in provider storage.
`git drs add-url` prepares a Git pointer plus local DRS metadata for an object that already exists in provider storage.

Important behavior:

@@ -26,7 +26,7 @@ The inspector also accepts other go-cloud styles (`gs://`, `azblob://`, `file://
If your remote org/project already has a bucket mapping, pass an object key relative to that configured bucket scope and set `--scheme`.

```bash
git lfs track "data/*.bin"
git drs track "data/*.bin"
git add .gitattributes

git drs add-url path/to/object.bin data/from-bucket.bin \
@@ -54,7 +54,7 @@ git drs add-url s3://my-bucket/path/to/object.bin data/from-bucket.bin \
If you know the authoritative SHA256, pass `--sha256`.

```bash
git lfs track "data/*.bin"
git drs track "data/*.bin"
git add .gitattributes

git drs add-url path/to/object.bin data/from-bucket.bin \
@@ -66,25 +66,25 @@ git commit -m "add known-sha object"
git drs push
```

## Unknown SHA256 (experimental sentinel mode)
## Unknown SHA256

If SHA256 is unknown, omit `--sha256`.

Behavior:

1. `add-url` performs object metadata lookup (HEAD/attributes).
2. Synthetic OID is derived from ETag (`sha256(etag)`).
3. A local sentinel object is written into `.git/lfs/objects/...`.
4. `git drs push` performs metadata-only registration.
2. A deterministic placeholder OID is derived from remote object metadata.
3. A pointer file and local DRS metadata are written.
4. `git drs push` performs metadata registration.
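
As an illustration of a deterministic placeholder OID, the sketch below hashes an ETag, following the `sha256(etag)` wording in the earlier version of this list. The revised wording only says "remote object metadata", so which fields the current implementation hashes is not stated here. The `placeholderOID` name and the quote-stripping step are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// placeholderOID derives a deterministic 64-character hex OID from an ETag.
// Assumption: surrounding double quotes on the ETag are stripped first, so
// quoted and unquoted forms of the same ETag map to the same OID.
func placeholderOID(etag string) (string, error) {
	trimmed := strings.Trim(etag, `"`)
	if trimmed == "" {
		return "", fmt.Errorf("empty ETag")
	}
	sum := sha256.Sum256([]byte(trimmed))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// The ETag value here is made up for the example.
	oid, err := placeholderOID(`"9b2cf535f27731c974343645a3985328"`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("placeholder oid:", oid)
}
```

A placeholder like this identifies the remote object's metadata, not its bytes, so it should not be treated as a content checksum.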

```bash
git lfs track "data/*.bin"
git drs track "data/*.bin"
git add .gitattributes

git drs add-url path/to/object.bin data/from-bucket.bin --scheme s3

git add data/from-bucket.bin
git commit -m "add unknown-sha object (sentinel mode)"
git commit -m "add unknown-sha object"
git drs push
```

@@ -103,7 +103,7 @@ For e2e/dev harnesses, `TEST_BUCKET_*` variables are also supported by command-l

## Prerequisites

- File path must be LFS-tracked (via `.gitattributes`).
- File path must be tracked (via `.gitattributes`).
- Remote configuration must point to the intended org/project scope.
- The bucket credential and org/project storage scope must exist on drs-server, for example via `git drs bucket add`, then `git drs bucket add-organization` or `git drs bucket add-project --path s3://bucket/prefix`.

@@ -118,13 +118,13 @@ Usually region/endpoint mismatch for S3-compatible storage.

### `no local payload available; skipping upload and keeping metadata-only registration`

Expected for add-url pointer/sentinel flows where local payload bytes are intentionally absent.
Expected for add-url pointer/metadata-only flows where local payload bytes are intentionally absent.

### `file is not tracked by LFS`
### `file is not tracked`

Track the path pattern and re-add:

```bash
git lfs track "data/*.bin"
git drs track "data/*.bin"
git add .gitattributes
```