Store .remote-info for remote shards to avoid extra S3 requests #435

@eguguchkin

Description

Currently, for remote shards (uploaded to S3), only an empty file <shard-name>.remote exists locally. All .info data for such shards is retrieved either from .frac_cache (if cached) or directly from S3.

Problems

  1. Cannot determine shard format without an S3 request
    We cannot tell whether a shard is a single index or split into multiple files without making a request to S3.

  2. Losing the only local source of .info when removing .frac_cache
    If we remove .frac_cache, every store initialization will require fetching .info from S3 for each remote shard. This is unacceptable in terms of speed and cost.

Solution

For each remote shard, store a local file <shard-name>.remote-info in exactly the same format as .info for regular shards.

Legacy shard format indicator (available from the local .remote-info, so no S3 request is needed):
info.BinaryDataVer < config.BinaryDataV3
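
The check above could be wrapped in a small helper. This is only a sketch: the Info struct, the BinaryDataVersion type and the numeric values of the constants below are assumptions made for illustration. Only the field name BinaryDataVer and the constant name BinaryDataV3 come from the issue.

```go
package main

// BinaryDataVersion and its values are hypothetical stand-ins for the
// constants in the config package; only the ordering matters here.
type BinaryDataVersion uint16

const (
	BinaryDataV1 BinaryDataVersion = iota + 1
	BinaryDataV2
	BinaryDataV3
)

// Info is a reduced, assumed view of the parsed .info / .remote-info data.
type Info struct {
	BinaryDataVer BinaryDataVersion
}

// isLegacyFormat applies the indicator from the issue:
// any version below BinaryDataV3 is treated as the legacy shard format.
func isLegacyFormat(info Info) bool {
	return info.BinaryDataVer < BinaryDataV3
}
```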

Disk space estimate

  • Average .info size: 450–500 bytes
  • One shard every 5 minutes (after compaction)
    Yearly volume:
    500 bytes * (525,600 min / 5 min) ≈ 50 MB
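
The estimate can be reproduced with a few constants. Using the upper bound of 500 bytes per file, the exact figure is 52,560,000 bytes (about 52.6 MB), which the issue rounds to 50 MB. The constant and function names here are illustrative only.

```go
package main

const (
	infoSizeBytes        = 500           // upper bound of the average .info size
	minutesPerYear       = 365 * 24 * 60 // 525,600
	shardIntervalMinutes = 5             // one shard every 5 minutes after compaction
)

// yearlyRemoteInfoBytes returns the disk space taken by one year of
// .remote-info files under the assumptions above.
func yearlyRemoteInfoBytes() int {
	shardsPerYear := minutesPerYear / shardIntervalMinutes // 105,120
	return infoSizeBytes * shardsPerYear
}
```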
