Thank you for your interest in contributing to MaxCompression! This document provides guidelines and information for contributors.
- C99-compatible compiler (GCC ≥ 7, Clang ≥ 6, MSVC ≥ 2019)
- CMake ≥ 3.10
- Git
- Optional: OpenMP for multi-threaded compression
git clone https://github.com/SamDreamsMaker/Max-Compression.git
cd Max-Compression
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DMCX_BUILD_TESTS=ON
cmake --build build -j$(nproc)
cd build && ctest --output-on-failure
For comprehensive roundtrip tests across all levels:
./bin/test_lzrc # LZRC v2.0 tests
./bin/test_roundtrip # Core roundtrip tests
./bin/test_bwt_levels # BWT path tests (L10-L22)
- Open an issue on GitHub with a clear title and description
- Include: MCX version, OS, compiler, and steps to reproduce
- If possible, attach the input file that triggers the bug
- For compression bugs: include the compression level used
- Open a discussion or issue describing the feature
- Explain the use case and expected behavior
- Reference relevant compression literature if applicable
- Fork the repository
- Create a branch from main: git checkout -b feature/my-feature
- Write code following the style guide below
- Add tests — all changes must include roundtrip verification
- Run the full test suite — no regressions allowed
- Submit a pull request with a clear description
- C99 standard — no GNU extensions in library code (CLI may use POSIX)
- 4-space indentation, no tabs
- snake_case for functions and variables
- MCX_ prefix for all public symbols
- mcx_ prefix for all public functions
- Braces on the same line for control structures
- Comments: /* C89-style */ for multi-line, // allowed for single-line in .c files
/* Good */
size_t mcx_lzrc_compress(uint8_t* dst, size_t dst_cap,
                         const uint8_t* src, size_t src_size,
                         int window_log, int bt_depth) {
    size_t result = 0; /* bytes written to dst */
    if (!dst || !src || src_size == 0) return 0;
    for (size_t i = 0; i < src_size; i++) {
        /* Process byte */
    }
    return result;
}

| Directory | Purpose |
|---|---|
| include/maxcomp/ | Public API only (maxcomp.h) |
| lib/ | Library implementation (internal) |
| lib/entropy/ | Entropy coders (rANS, FSE, AC, range coder) |
| lib/lz/ | LZ match finders and compressors |
| lib/preprocess/ | Transforms (BWT, MTF, RLE, delta, E8/E9) |
| cli/ | Command-line tool |
| tests/ | Test files (registered via CMake add_test) |
| docs/ | Documentation and specifications |

Follow Conventional Commits:
feat: Add ARM64 BCJ filter
fix: Correct BWT inverse for blocks > 32MB
perf: Optimize multi-rANS K-means clustering (-36% time)
docs: Update FORMAT.md with LZRC block type
test: Add LZRC roundtrip tests
refactor: Extract distance model into lz_models.h
Before submitting a PR, verify:
- Code compiles clean with -Wall -Wextra (no new warnings)
- All existing tests pass: cd build && ctest --output-on-failure
- Roundtrip verified: compressed → decompressed matches original exactly
- New tests added for any new functionality
- No compression ratio regression on Canterbury/Silesia corpora
- Commit messages follow Conventional Commits format
- Documentation updated (README, man page, CHANGELOG) if user-facing
- No hardcoded paths or platform-specific code in library (CLI may use POSIX)
- Memory: no leaks under valgrind for typical usage paths

Changes to compression code must:
- Pass roundtrip — compressed data must decompress to the exact original
- Not regress — no compression ratio decrease on Canterbury or Silesia corpora
- Handle edge cases — empty input, single byte, incompressible data, max-size blocks
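
The edge cases listed above can be exercised with a small harness. The following is a minimal sketch, not the project's test code: the `codec_fn` type, the `stored_copy` stand-in, and the `roundtrip_ok` and `run_edge_cases` helpers are hypothetical names introduced here so the example runs without linking the library. In a real test, thin wrappers around `mcx_compress` and `mcx_decompress` would presumably take the place of `stored_copy`. Max-size blocks are left out because the block size limit is not stated in this guide.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical codec signature (not part of the MCX API): writes to dst and
 * returns the number of bytes written, or (size_t)-1 on failure. */
typedef size_t (*codec_fn)(uint8_t* dst, size_t dst_cap,
                           const uint8_t* src, size_t src_size);

/* Stand-in "codec" that stores the input verbatim. */
static size_t stored_copy(uint8_t* dst, size_t dst_cap,
                          const uint8_t* src, size_t src_size) {
    if (src_size > dst_cap) return (size_t)-1;
    if (src_size > 0) memcpy(dst, src, src_size);
    return src_size;
}

/* Returns 1 if decode(encode(src)) reproduces src exactly, 0 otherwise. */
static int roundtrip_ok(codec_fn encode, codec_fn decode,
                        const uint8_t* src, size_t src_size) {
    size_t cap = src_size + src_size / 4 + 1024;
    uint8_t* comp = malloc(cap);
    uint8_t* back = malloc(src_size ? src_size : 1);
    int ok = 0;
    if (comp && back) {
        size_t csize = encode(comp, cap, src, src_size);
        if (csize != (size_t)-1) {
            size_t dsize = decode(back, src_size, comp, csize);
            ok = (dsize == src_size) &&
                 (src_size == 0 || memcmp(src, back, src_size) == 0);
        }
    }
    free(comp);
    free(back);
    return ok;
}

/* Fills buf with fixed-seed xorshift32 noise as an incompressible sample. */
static void fill_noise(uint8_t* buf, size_t size) {
    uint32_t x = 2463534242u;
    for (size_t i = 0; i < size; i++) {
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        buf[i] = (uint8_t)(x >> 24);
    }
}

/* Runs empty, single-byte and incompressible inputs; returns failure count. */
static int run_edge_cases(codec_fn encode, codec_fn decode) {
    static uint8_t noise[4096];
    const uint8_t one = 0x5A;
    int failures = 0;
    fill_noise(noise, sizeof(noise));
    failures += !roundtrip_ok(encode, decode, noise, 0);             /* empty input */
    failures += !roundtrip_ok(encode, decode, &one, 1);              /* single byte */
    failures += !roundtrip_ok(encode, decode, noise, sizeof(noise)); /* incompressible */
    return failures;
}
```

A test `main` could call `run_edge_cases(encode, decode)` and return a non-zero exit code when the count is not zero, which is how CTest detects a failure.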
Include before/after benchmarks:
File Before After Change
alice29.txt 3.53× 3.55× +0.6%
mozilla 2.93× 2.93× =
Include timing measurements:
File Before After
mozilla 0.3 MB/s 0.4 MB/s (+33%)
# Benchmark all default levels on a file
./build/bin/mcx bench corpora/alice29.txt
# Benchmark a specific level
./build/bin/mcx bench -l 12 corpora/alice29.txt
# Run the comparison script (requires gzip, bzip2, xz installed)
./benchmarks/compare.sh corpora/alice29.txt
# Full Silesia corpus comparison
for f in corpora/silesia/*; do
    echo "=== $(basename "$f") ==="
    ./build/bin/mcx bench -l 12 "$f"
done
- Build with Release mode: cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
- Use a quiet system (close browsers, stop background services)
- Timing is internal (clock_gettime), excludes I/O overhead
- For PR comparisons: benchmark the same file before and after your change
- Standard corpus files are in corpora/ (Canterbury) and corpora/silesia/ (Silesia)
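
To illustrate what "internal timing that excludes I/O" means, here is a minimal sketch that times a workload over a buffer already in memory using POSIX `clock_gettime`. It is not the `mcx bench` implementation: `work_fn`, `sum_bytes`, `elapsed_seconds`, `throughput_mbps` and `time_in_memory` are hypothetical names, and 1 MB is assumed to be 1,000,000 bytes.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime on POSIX systems */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload signature, used only for this sketch. */
typedef void (*work_fn)(const uint8_t* src, size_t size);

/* Sample workload: sums the bytes; the volatile sink keeps the loop alive. */
static volatile uint32_t sink;
static void sum_bytes(const uint8_t* src, size_t size) {
    uint32_t s = 0;
    for (size_t i = 0; i < size; i++) s += src[i];
    sink = s;
}

/* Seconds elapsed between two timespec samples. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
 * Returns 0.0 when the elapsed time is not positive. */
static double throughput_mbps(size_t bytes, double seconds) {
    if (seconds <= 0.0) return 0.0;
    return ((double)bytes / 1e6) / seconds;
}

/* Times `work` on an in-memory buffer, so file reads and writes are
 * outside the measured interval. */
static double time_in_memory(work_fn work, const uint8_t* src, size_t size) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```

`CLOCK_MONOTONIC` is used here because it is unaffected by wall-clock adjustments; which clock the tool itself uses is not stated in this guide.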
Include a table like this in your PR description:
File Level Before After Change
alice29.txt L12 43,144 42,980 -0.4% ✓
dickens L12 2,497,882 2,497,882 =
kennedy.xls L12 20,551 20,551 =
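
The Change column is the relative difference between the two sizes. A minimal sketch of that arithmetic follows; `percent_change` and `format_row` are hypothetical helpers, not project code, and the output omits the thousands separators and check mark shown in the table.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the output got smaller. */
static double percent_change(double before, double after) {
    if (before == 0.0) return 0.0;
    return (after - before) / before * 100.0;
}

/* Formats one table row into `out`; writes "=" when the sizes are equal.
 * Returns the value returned by snprintf. */
static int format_row(char* out, size_t out_cap, const char* file, int level,
                      unsigned long before, unsigned long after) {
    if (before == after) {
        return snprintf(out, out_cap, "%-12s L%-3d %10lu %10lu  =",
                        file, level, before, after);
    }
    return snprintf(out, out_cap, "%-12s L%-3d %10lu %10lu  %+.1f%%",
                    file, level, before, after,
                    percent_change((double)before, (double)after));
}
```

For the alice29.txt row, (42,980 - 43,144) / 43,144 is about -0.38%, which rounds to the -0.4% shown.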
Tests live in tests/. To add a new test pattern:
For roundtrip patterns, add to tests/test_comprehensive.c:
/* In the test_patterns array */
{"my_pattern_name", generate_my_pattern, MY_PATTERN_SIZE},
// tests/test_my_feature.c
#include <maxcomp/maxcomp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if data survives compress -> decompress unchanged, 0 otherwise. */
static int test_roundtrip(const uint8_t* data, size_t size, int level) {
    size_t comp_cap = size + size / 4 + 1024;
    uint8_t* comp = malloc(comp_cap);
    uint8_t* decomp = malloc(size);
    if (!comp || !decomp) { free(comp); free(decomp); return 0; }
    size_t comp_size = mcx_compress(comp, comp_cap, data, size, level);
    if (mcx_is_error(comp_size)) { free(comp); free(decomp); return 0; }
    size_t decomp_size = mcx_decompress(decomp, size, comp, comp_size);
    if (mcx_is_error(decomp_size)) { free(comp); free(decomp); return 0; }
    int ok = (decomp_size == size) && (memcmp(data, decomp, size) == 0);
    free(comp); free(decomp);
    return ok;
}

int main(void) {
    // Generate test data
    uint8_t data[4096];
    // ... fill data ...
    int levels[] = {1, 3, 6, 9, 12, 20};
    size_t num_levels = sizeof(levels) / sizeof(levels[0]);
    for (size_t i = 0; i < num_levels; i++) {
        if (!test_roundtrip(data, sizeof(data), levels[i])) {
            fprintf(stderr, "FAIL at level %d\n", levels[i]);
            return 1;
        }
    }
    printf("All tests passed\n");
    return 0;
}
Add to tests/CMakeLists.txt:
add_executable(test_my_feature test_my_feature.c)
target_link_libraries(test_my_feature PRIVATE maxcomp_static)
if(UNIX)
    target_link_libraries(test_my_feature PRIVATE m)
endif()
add_test(NAME my_feature COMMAND test_my_feature)
Good patterns to test (not all are covered yet):
- Sorted data (integers, strings)
- Run-heavy data (many repeated bytes)
- Alternating patterns (ABABAB...)
- Near-random with structure (random + periodic signal)
- Real-world samples (place in corpora/ if small enough)
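
As a rough illustration of the synthetic patterns above, the generators below fill a caller-provided buffer. This is a sketch only: the function names and the `(uint8_t* buf, size_t size)` signature are assumptions and may not match what the `test_patterns` array in tests/test_comprehensive.c expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Sorted data: non-decreasing byte values spread across 0..255. */
static void gen_sorted(uint8_t* buf, size_t size) {
    for (size_t i = 0; i < size; i++) {
        buf[i] = (uint8_t)((i * 256) / size);
    }
}

/* Run-heavy data: runs of 64 identical bytes, value changes per run. */
static void gen_runs(uint8_t* buf, size_t size) {
    for (size_t i = 0; i < size; i++) {
        buf[i] = (uint8_t)((i / 64) * 37);
    }
}

/* Alternating pattern: ABABAB... */
static void gen_alternating(uint8_t* buf, size_t size) {
    for (size_t i = 0; i < size; i++) {
        buf[i] = (i & 1) ? 'B' : 'A';
    }
}

/* Near-random with structure: fixed-seed xorshift32 noise with every
 * 16th byte forced to 0xFF as a periodic signal. */
static void gen_noise_periodic(uint8_t* buf, size_t size) {
    uint32_t x = 0x12345678u;
    for (size_t i = 0; i < size; i++) {
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        buf[i] = (i % 16 == 0) ? 0xFF : (uint8_t)(x >> 24);
    }
}
```

Fixed seeds keep the generated data identical across runs, so a failing roundtrip can be reproduced.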

To add a new compression strategy:
- Define the strategy in lib/internal.h (MCX_STRATEGY_*)
- Add compression logic in lib/core.c (block loop)
- Add decompression logic in lib/core.c (block type dispatch)
- Assign a block type byte (see docs/FORMAT.md)
- Add to multi-trial at L20 if appropriate
- Add roundtrip tests in tests/
- Update docs/FORMAT.md with the new block type
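
The block type dispatch step can be pictured as a switch on the first byte of a block. The sketch below uses an invented layout (one type byte followed by a payload) and invented type values; the `EX_` and `ex_` names are hypothetical, and the real block layout and byte assignments are the ones in docs/FORMAT.md and lib/core.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block type bytes, for illustration only. */
enum {
    EX_BLOCK_STORED = 0x00,  /* payload is raw bytes */
    EX_BLOCK_RLE    = 0x01   /* payload is (count, value) pairs */
};

#define EX_ERROR ((size_t)-1)

/* Copies a stored payload verbatim. */
static size_t ex_decode_stored(uint8_t* dst, size_t dst_cap,
                               const uint8_t* src, size_t src_size) {
    if (src_size > dst_cap) return EX_ERROR;
    if (src_size > 0) memcpy(dst, src, src_size);
    return src_size;
}

/* Expands (count, value) pairs, checking capacity before each write. */
static size_t ex_decode_rle(uint8_t* dst, size_t dst_cap,
                            const uint8_t* src, size_t src_size) {
    size_t out = 0;
    if (src_size % 2 != 0) return EX_ERROR;
    for (size_t i = 0; i < src_size; i += 2) {
        size_t count = src[i];
        if (count > dst_cap - out) return EX_ERROR;
        memset(dst + out, src[i + 1], count);
        out += count;
    }
    return out;
}

/* Dispatches on the block type byte. A new strategy would add one enum
 * value and one case here. */
static size_t ex_decode_block(uint8_t* dst, size_t dst_cap,
                              const uint8_t* block, size_t block_size) {
    if (!dst || !block || block_size == 0) return EX_ERROR;
    switch (block[0]) {
    case EX_BLOCK_STORED:
        return ex_decode_stored(dst, dst_cap, block + 1, block_size - 1);
    case EX_BLOCK_RLE:
        return ex_decode_rle(dst, dst_cap, block + 1, block_size - 1);
    default:
        return EX_ERROR;  /* unknown type: reject rather than guess */
    }
}
```

Rejecting unknown type bytes in the default case means a decoder built before a new block type existed fails cleanly instead of producing wrong output.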

To add a new entropy coder:
- Create files in lib/entropy/
- Add to lib/CMakeLists.txt
- Integrate as a new option in the relevant compression path
- Signal via the genome byte or block type byte

By contributing, you agree that your contributions will be licensed under the GPL-3.0 license.
Open an issue or discussion on GitHub. We're happy to help!