Tokenizers-using-HuggingFace

A hands-on guide to exploring Hugging Face tokenizers across popular LLMs like LLaMA, PHI-3, and StarCoder2. This project demonstrates how to encode, decode, and format text, code, and chat-style messages for large language models.

📌 Features

🔄 Encode and decode text with various tokenizers
💬 Format multi-turn chat prompts using chat templates
🧠 Compare tokenization outputs across models
🧪 Visualize individual tokens and their IDs
🧰 Supports models like:
- meta-llama/Meta-Llama-3.1-8B-Instruct
- microsoft/phi-3-mini-4k-instruct
- bigcode/starcoder2-15b

📂 Folder Structure

Tokenizers-using-HuggingFace/
├── Tokenizers_using_HuggingFace.ipynb
└── README.md

🚀 Getting Started

1. Clone the repository

git clone https://github.com/your-username/Tokenizers-using-HuggingFace.git
cd Tokenizers-using-HuggingFace

2. Install dependencies

pip install transformers

Optional for some models:

pip install torch
pip install sentencepiece

🧪 Example Usage

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct", trust_remote_code=True)
text = "I love exploring tokenizers!"
tokens = tokenizer.encode(text)
decoded = tokenizer.batch_decode(tokens)

print(tokens)
print(decoded)

🧠 License

This project is open-source and available under the MIT License.

🤝 Contributing

Contributions, suggestions, and improvements are welcome! Feel free to open an issue or submit a pull request.

📬 Contact

Created by Rishi Kora (https://github.com/Rishi-Kora) – feel free to reach out with questions or ideas!

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
Tokenizers_using_HuggingFace.ipynb		Tokenizers_using_HuggingFace.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tokenizers-using-HuggingFace

📌 Features

📂 Folder Structure

🚀 Getting Started

1. Clone the repository

2. Install dependencies

🧪 Example Usage

🧠 License

🤝 Contributing

📬 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Tokenizers-using-HuggingFace

📌 Features

📂 Folder Structure

🚀 Getting Started

1. Clone the repository

2. Install dependencies

🧪 Example Usage

🧠 License

🤝 Contributing

📬 Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages