Add .gptinclude functionality and fix XML CDATA handling #17
Closed
imertz wants to merge 3 commits into chand1012:main
- Fixed XML generation to properly handle special characters and CDATA sections
- Added protection against premature CDATA termination by escaping "]]>" sequences
- Improved XML formatting with consistent indentation and structure
- Simplified token placeholder replacement without breaking formatting

This commit adds support for a .gptinclude file, which allows users to explicitly specify which files should be included in the repository export. The feature complements the existing .gptignore functionality:
- When both .gptinclude and .gptignore exist, files are first filtered by the include patterns, then any matching ignore patterns are excluded
- Added new command-line flag: -I/--include to specify a custom path to the .gptinclude file
- Default behavior looks for .gptinclude in repository root
- Added comprehensive tests for the new functionality
- Updated README.md with documentation and examples

With this change, users gain more fine-grained control over which parts of their repositories are processed by git2gpt, making it easier to focus on specific areas when working with AI language models.
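
The include-then-ignore order described in this commit message can be sketched roughly as follows. This is a minimal illustration in Go, not the code from the PR: `matchPattern`, `matchAny`, and `selectFiles` are hypothetical names, the matcher is deliberately simplified (only a trailing `/**` plus whatever `path.Match` supports), and treating an empty include list as "include everything" is an assumption.

```go
package main

import (
	"path"
	"strings"
)

// matchPattern is a hypothetical, simplified matcher (not git2gpt's own):
// a pattern ending in "/**" matches everything under that directory,
// and any other pattern is handed to path.Match.
func matchPattern(pattern, file string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimSuffix(pattern, "**") // keeps the trailing "/"
		return strings.HasPrefix(file, dir)
	}
	ok, err := path.Match(pattern, file)
	return err == nil && ok
}

// matchAny reports whether file matches at least one pattern.
func matchAny(patterns []string, file string) bool {
	for _, p := range patterns {
		if matchPattern(p, file) {
			return true
		}
	}
	return false
}

// selectFiles applies the include patterns first, then drops ignored files.
// Assumption: an empty include list means "include everything".
func selectFiles(files, include, ignore []string) []string {
	var selected []string
	for _, f := range files {
		if len(include) > 0 && !matchAny(include, f) {
			continue // not covered by any include pattern
		}
		if matchAny(ignore, f) {
			continue // included, but explicitly ignored
		}
		selected = append(selected, f)
	}
	return selected
}
```

With include pattern `src/**` and ignore pattern `src/test/**`, `selectFiles` keeps `src/main.go` and drops both `src/test/main_test.go` (ignored) and `docs/guide.md` (never included).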

This commit fixes an issue where the XML export would fail with "unexpected EOF in CDATA section" errors when file content contained the CDATA end marker sequence ']]>'. The fix implements a proper CDATA handling strategy that:
- Detects all occurrences of ']]>' in file content
- Splits the content around these markers
- Creates properly nested CDATA sections to preserve the original content
- Ensures all XML output is well-formed regardless of source content

This approach maintains the efficiency of CDATA for storing large code blocks while ensuring compatibility with all possible file content. Fixes the XML validation error that would occur when processing files containing CDATA end marker sequences.
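
For readers unfamiliar with the technique, here is a minimal Go sketch of one common way to split content around ']]>'. It is not necessarily what this PR implements. CDATA sections cannot contain one another in XML, so the usual approach is to end the current section after "]]" and open a new one that starts with ">". The helper names `wrapCDATA` and `decodeText` are hypothetical; `decodeText` exists only to check the round trip with the standard `encoding/xml` package.

```go
package main

import (
	"encoding/xml"
	"strings"
)

// wrapCDATA (hypothetical helper) wraps content in CDATA, splitting every
// "]]>" so that "]]" closes one section and ">" opens the next.
func wrapCDATA(content string) string {
	safe := strings.ReplaceAll(content, "]]>", "]]]]><![CDATA[>")
	return "<![CDATA[" + safe + "]]>"
}

// decodeText (hypothetical helper) parses a single-element document and
// returns its character data, concatenated across adjacent CDATA sections.
func decodeText(doc string) (string, error) {
	var elem struct {
		Text string `xml:",chardata"`
	}
	err := xml.Unmarshal([]byte(doc), &elem)
	return elem.Text, err
}
```

For example, `wrapCDATA("a]]>b")` yields `<![CDATA[a]]]]><![CDATA[>b]]>`, and parsing that inside an element gives back `a]]>b`.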
Owner
Duplicate of #17, closing.
Overview
This PR adds support for a .gptinclude file, allowing users to explicitly specify which files should be included in the repository export. It also fixes an issue with XML output when file content contains CDATA end markers.

New Feature: .gptinclude Support

This feature complements the existing .gptignore functionality by letting users specify explicitly which files to include rather than only which to exclude. When both files exist, git2gpt applies .gptinclude patterns first, then excludes any files that match .gptignore patterns.

Changes:
- Added -I/--include command-line flag to specify a custom path to the .gptinclude file
- Default behavior looks for .gptinclude in the repository root

Bug Fix: XML CDATA Handling

Fixed an issue where the XML export would fail with "unexpected EOF in CDATA section" errors when file content contained the CDATA end marker sequence ]]>.

The fix:
- […] ]]> sequence

Testing Instructions

Testing .gptinclude functionality:
- […] .gptinclude file in a test repository with patterns like: […]
- git2gpt -o output.txt /path/to/repo

Testing with both .gptinclude and .gptignore:
- […] .gptinclude file with src/**
- […] .gptignore file with src/test/**
- git2gpt -o output.txt /path/to/repo
- […] src/ except those in src/test/ are included

Testing XML output fix:
- […] ]]> (common in some code or XML files)
- git2gpt -x -o output.xml /path/to/repo

Automated Tests

The PR includes a comprehensive test for the .gptinclude functionality: […]

Documentation

README.md has been updated with:
- […] .gptinclude feature
- […] .gptinclude and .gptignore

Potential Impact

These changes are backward compatible and don't affect existing functionality:
- […] .gptignore will continue to work as before