How to Install tokenizers in Python

v0.22.2 Data & Science Python >=3.9
Install pip install tokenizers

What is tokenizers?

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

Bindings over the Rust implementation. If you are interested in the High-level design, you can go check it there.

- Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions). - Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU. - Easy to use, but also extremely versatile. - Designed for research and production. - Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token. - Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.

Quick Start

Minimal example to get started with tokenizers:

import tokenizers

print(tokenizers.__version__)

Installation

pip (standard)

pip install tokenizers

Virtual environment (recommended)

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install tokenizers

pip3

pip3 install tokenizers

conda

conda install -c conda-forge tokenizers

Poetry

poetry add tokenizers

Dependencies

Installing tokenizers will also install these packages:

Verify the Installation

After installing, confirm the package is available:

python -c "import tokenizers; print(tokenizers.__version__)"

If this prints a version number, installation succeeded. If you see a ModuleNotFoundError, see the errors section below.

Installation Errors

Common errors when installing tokenizers with pip.

ModuleNotFoundError: No module named 'tokenizers'

Cause: The package is not installed in the current Python environment.

Fix: Run pip install tokenizers. If using a virtual environment, ensure it is activated first.

ModuleNotFoundError: No module named 'tokenizers' (installed but still failing)

Cause: pip installed the package into a different Python than the one running your script.

Fix: Use python -m pip install tokenizers to install into the interpreter you are running.

ImportError: cannot import name 'X' from 'tokenizers'

Cause: The function or class does not exist in the installed version.

Fix: Check the version with pip show tokenizers and upgrade with pip install --upgrade tokenizers.

pip: command not found

Cause: pip is not in PATH or Python was not added to PATH during installation.

Fix: Try python -m pip install tokenizers. On macOS/Linux try pip3.

PermissionError: [Errno 13] Permission denied

Cause: No write access to the system Python package directory.

Fix: Use a virtual environment, or add --user: pip install --user tokenizers

SSL: CERTIFICATE_VERIFY_FAILED

Cause: pip cannot verify PyPI's SSL certificate — common behind corporate proxies.

Fix: Try: pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org tokenizers

MemoryError when loading data

Cause: Dataset is too large to fit in RAM.

Fix: Read in chunks, filter columns on load, or consider Polars/Dask for out-of-core processing.

Recent Releases

VersionReleased
0.22.2 latest 2026-01-05
0.22.2rc0 2025-12-02
0.22.1 2025-09-19
0.22.1rc0 2025-09-19
0.22.0 2025-08-29

Full release history on PyPI →

Manage tokenizers

Upgrade to latest version

pip install --upgrade tokenizers

Install a specific version

pip install tokenizers==0.22.2

Uninstall

pip uninstall tokenizers

Check what is installed

pip show tokenizers

Last updated: 2026-04-11 • Data from PyPI