refactor(file): composable Lines iterator; deprecate channel-based ReadFile helpers#742

Merged
Mzack9999 merged 2 commits into projectdiscovery:main from ChrisJr404:feature/read-file-split
May 8, 2026
ChrisJr404 commented May 3, 2026

Closes #719.

Supersedes the original ReadFileSplit / ReadFileWithReaderSplit proposal. Per review discussion: rather than add yet another ReadFile* variant (each new option adds a new combinatorial helper), introduce a single composable iterator and reduce the existing helpers to thin deprecated wrappers over it.

What

New file file/lines.go exposes one primitive per source, returning a Go 1.23+ iterator:

func Lines(filename string, opts ...LineOption) iter.Seq2[string, error]
func LinesReader(r io.Reader, opts ...LineOption) iter.Seq2[string, error]

Composable options (applied in order scan → split → trim → skip-empty → filter):

WithBufferSize(n int)            // scanner buffer
WithSplit(separators ...rune)    // split each line on any rune (FieldsFunc semantics)
WithTrimSpace()                  // TrimSpace on each emitted value
WithSkipEmpty()                  // drop empty values (post-trim)
WithFilter(func(string) bool)    // arbitrary predicate, e.g. skip "# comment" lines

The resolver-file use case from #719 / httpx#2351 becomes:

for v, err := range fileutil.Lines(path,
    fileutil.WithSplit(','),
    fileutil.WithTrimSpace(),
    fileutil.WithSkipEmpty(),
) {
    if err != nil { return err }
    // use v
}

Why this shape

  • No combinatorial explosion. The old surface was already trending toward ReadFile, ReadFileWithBufferSize, ReadFileSplit, ReadFileSplitWithBufferSize, …WithReaderAnd…, etc. One iterator plus N orthogonal options replaces all of them and any future variants (WithFilter, WithComment, …) compose for free.
  • Errors are surfaced. The channel-based helpers silently swallowed os.Open failures inside the goroutine and scanner.Err() entirely. iter.Seq2[string, error] makes both visible: the iterator yields a final ("", err) pair and stops.
  • No goroutine leaks. iter.Seq2 cancels naturally on break; the file is closed via defer when iteration ends. The old channel pattern leaked a goroutine forever if the consumer stopped reading early.
  • Idiomatic modern Go. Range-over-func iterators arrived in Go 1.23, and the shape mirrors the stdlib strings.Lines added in Go 1.24.
  • Lazy. The file isn't opened until the first range step.

Deprecation

The four existing exported helpers keep their (chan string, error) signatures (zero source-level breakage) and are marked // Deprecated: with the migration recipe. Their bodies now pump from the new iterator into the channel, so behaviour is byte-for-byte identical and the existing TestReadFile* tests pass unmodified.

Deprecated                               Replacement
ReadFile(path)                           Lines(path)
ReadFileWithBufferSize(path, n)          Lines(path, WithBufferSize(n))
ReadFileWithReader(r)                    LinesReader(r)
ReadFileWithReaderAndBufferSize(r, n)    LinesReader(r, WithBufferSize(n))

The ReadFileSplit / ReadFileWithReaderSplit / splitLineByRunes helpers from the earlier revision of this PR are removed; their functionality is now expressed as Lines(path, WithSplit(','), WithTrimSpace(), WithSkipEmpty()).

Tests

  • New file/lines_test.go (14 tests): each option in isolation, the resolver-file end-to-end scenario, missing-file error path, early-break behaviour, scanner-error propagation through LinesReader, and reader-side equivalents.
  • The original file/file_split_test.go was removed along with the helpers it covered.
  • Existing TestReadFile, TestReadFileWithBufferSize, TestReadFileWithReader, TestReadFileWithReaderAndBufferSize are unchanged and still pass — they validate that the deprecated wrappers preserve exact prior behaviour (including emitting empty lines and not trimming).
$ go vet ./file/...
$ go test ./file/... -count=1
ok  	github.com/projectdiscovery/utils/file	19.1s

Notes for downstream tools

After this lands, httpx / nuclei / subfinder etc. can migrate at their own pace:

  1. Replace for line := range fileutil.ReadFile(path) with for line, err := range fileutil.Lines(path), and check err inside the loop body.
  2. For resolver-file parsing, drop any local "split on comma + trim" helper in favour of Lines(path, WithSplit(','), WithTrimSpace(), WithSkipEmpty()).

No flag-day required; the deprecated channel functions stay until consumers have moved.

…very#719)

When a tool reads a list of values from a file (resolvers, wordlists,
etc.) it's common for users to mix one-per-line and comma-separated
forms on the same line. The two new helpers stream non-empty values from
the file/reader, splitting each scanned line on the supplied runes and
trimming whitespace; passing no separators reduces to the existing
ReadFile/ReadFileWithReader behaviour with TrimSpace applied.

Closes projectdiscovery#719
@dogancanbakir dogancanbakir requested a review from Mzack9999 May 6, 2026 09:49
neo-by-projectdiscovery-dev Bot commented May 8, 2026

Neo - PR Security Review

No security issues found


Mzack9999 changed the title from "feat(file): add ReadFileSplit / ReadFileWithReaderSplit (#719)" to "refactor(file): composable Lines iterator; deprecate channel-based ReadFile helpers" May 8, 2026
@Mzack9999 Mzack9999 merged commit c0f3544 into projectdiscovery:main May 8, 2026
8 checks passed

Development

Successfully merging this pull request may close these issues.

Support comma-separated lines
