A simple and efficient token count program written in Rust!
English | 简体中文 | 繁體中文 | 日本語 | 한국어 | Deutsch
This Rust implementation of the classic wc (word count) command-line tool allows you to count lines, words, characters, and even tokens in text files or from standard input. It's fast, reliable, and supports Unicode!
- Count lines
- Count words
- Count characters (including multi-byte Unicode characters)
- Count tokens using various tokenizer models
- Process multiple files
- Read from standard input
- Supports various languages (English, Korean, Japanese, and more!)
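To make the difference between bytes and characters concrete, here is a minimal sketch of how line, word, and character counts can be computed with the Rust standard library. It is not taken from tc's source: the `Counts` struct and the `count` function are hypothetical names, and the rule that a line is a newline character is an assumption borrowed from classic wc.

```rust
/// Counts for one chunk of text. Hypothetical type, for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
}

/// Counts lines, words, and characters in `text`.
pub fn count(text: &str) -> Counts {
    Counts {
        // Assumption: like classic wc, a line is counted per newline character.
        lines: text.bytes().filter(|&b| b == b'\n').count(),
        // Words are runs of non-whitespace, using Unicode-aware whitespace.
        words: text.split_whitespace().count(),
        // Unicode scalar values rather than bytes, so multi-byte
        // characters count once each.
        chars: text.chars().count(),
    }
}
```

Under these assumptions, `count("안녕하세요 세계")` reports 8 characters and 2 words even though the string occupies 22 bytes in UTF-8. Whitespace splitting treats an unspaced run of text (common in Japanese) as a single word, so a real tool may need a different word rule for such languages.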
There are two ways to install tc:
Build from source:

- Make sure you have Rust installed on your system. If not, get it from rust-lang.org.
- Clone this repository:

      git clone https://github.com/guuzaa/tc.git
      cd tc

- Build the project:

      cargo build --release

- The executable will be available at `target/release/tc`.
Download a prebuilt release:

- Go to the Releases page of the tc repository.
- Download the latest release for your operating system and architecture.
- Extract the downloaded archive.
- Move the `tc` executable to a directory in your system's PATH (e.g., `/usr/local/bin` on Unix-like systems).
- You can now use tc from anywhere in your terminal!
- `-l`, `--lines`: Show line count
- `-w`, `--words`: Show word count
- `-c`, `--chars`: Show character count
- `-t`, `--tokens`: Show token count
- `--model <MODEL>`: Choose tokenizer model (default: gpt3)
Available models:
- `gpt3`: `r50k_base`
- `edit`: `p50k_edit`
- `code`: `p50k_base`
- `chatgpt`: `cl100k_base`
- `gpt4o`: `o200k_base`
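The model list above is a lookup from a `--model` value to an encoding name. A minimal sketch of such a lookup follows. The function `encoding_for` and the constant `DEFAULT_MODEL` are hypothetical names, and this is not necessarily how tc resolves models internally.

```rust
/// Default value of `--model`, as documented in the options list.
pub const DEFAULT_MODEL: &str = "gpt3";

/// Maps a `--model` value to its encoding name.
/// Returns `None` for a model that is not in the documented list.
pub fn encoding_for(model: &str) -> Option<&'static str> {
    match model {
        "gpt3" => Some("r50k_base"),
        "edit" => Some("p50k_edit"),
        "code" => Some("p50k_base"),
        "chatgpt" => Some("cl100k_base"),
        "gpt4o" => Some("o200k_base"),
        _ => None,
    }
}
```

Returning `Option` rather than panicking lets the caller decide how to report an unknown model name.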
If no options are specified, all counts (lines, words, characters, and tokens) will be shown.
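The "no options means everything" rule can be expressed as a small normalization step over the parsed flags. The sketch below is one possible way to write it. The `Selection` struct and the `resolve` function are hypothetical and not taken from tc's source.

```rust
/// Which counts were requested on the command line. Hypothetical type.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Selection {
    pub lines: bool,
    pub words: bool,
    pub chars: bool,
    pub tokens: bool,
}

/// If no count flag was given, select all four counts;
/// otherwise keep exactly what the user asked for.
pub fn resolve(sel: Selection) -> Selection {
    if sel.lines || sel.words || sel.chars || sel.tokens {
        sel
    } else {
        Selection { lines: true, words: true, chars: true, tokens: true }
    }
}
```

With this rule, a bare `tc example.txt` would correspond to `resolve(Selection::default())`, which enables every count, while `tc -w` leaves only `words` set.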
- Count lines, words, characters, and tokens in a file:

      tc example.txt

- Count only words in multiple files:

      tc -w file1.txt file2.txt file3.txt

- Count lines and characters from standard input:

      echo "Hello, World!" | tc -lc

- Count tokens using the ChatGPT tokenizer:

      tc -t --model chatgpt example.txt

- Count everything in files with different languages:

      tc english.txt korean.txt japanese.txt
Contributions are welcome! Feel free to submit issues or pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
- The Rust community for their amazing tools and support
- The original Unix `wc` command for inspiration
- The editor Cursor
Happy counting!