How to Install pyspark in Python

v4.1.1 | General Purpose | Python >=3.10 | Apache-2.0

Apache Spark Python API

Install: pip install pyspark

What is pyspark?

Apache Spark Python API

Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

You can find the latest Spark documentation, including a programming guide, on the project web page.

This README file only contains basic information related to pip-installed PySpark. This packaging is currently experimental and may change in future versions (although we will do our best to keep compatibility). Using PySpark requires the Spark JARs; if you are building from source, see the builder instructions at "Building Spark".

Quick Start

Minimal example to get started with pyspark:

import pyspark

print(pyspark.__version__)
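Printing the version only confirms the package imports. The sketch below goes one step further and runs a tiny DataFrame round-trip; it is guarded so it only executes when both pyspark and a Java runtime (which Spark requires) are available:

```python
import importlib.util
import shutil

# Guarded demo: run only when pyspark is installed and a Java runtime is on PATH
if importlib.util.find_spec("pyspark") and shutil.which("java"):
    from pyspark.sql import SparkSession

    # local[*] runs Spark on this machine using all available cores
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("quickstart")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
    rows = df.collect()   # two Row objects
    df.show()
    spark.stop()
else:
    rows = None
    print("pyspark and a Java runtime are required for this demo")
```

The `local[*]` master and `quickstart` app name are arbitrary choices for a local smoke test; in a cluster deployment the master URL comes from your environment.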

Installation

pip (standard)

pip install pyspark

Virtual environment (recommended)

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install pyspark

pip3

pip3 install pyspark

conda

conda install -c conda-forge pyspark

Poetry

poetry add pyspark

Dependencies

Installing pyspark also installs its required runtime dependency, py4j, which PySpark uses to communicate with the JVM.

Verify the Installation

After installing, confirm the package is available:

python -c "import pyspark; print(pyspark.__version__)"

If this prints a version number, installation succeeded. If you see a ModuleNotFoundError, see the errors section below.
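If the one-liner fails, it helps to know exactly which interpreter is running and whether pyspark is visible from it. This diagnostic uses only the standard library, so it works whether or not pyspark is installed:

```python
import importlib.util
import sys

# Which interpreter is actually running?
print("Interpreter:", sys.executable)

# Is pyspark importable from THIS interpreter's site-packages?
spec = importlib.util.find_spec("pyspark")
print("pyspark:", spec.origin if spec else "not found in this environment")
```

If the interpreter path points at a different Python than the one pip installed into, that mismatch is the problem (see the wrong-interpreter error below).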

Installation Errors

Common errors when installing pyspark with pip.

ModuleNotFoundError: No module named 'pyspark'

Cause: The package is not installed in the current Python environment.

Fix: Run pip install pyspark. If using a virtual environment, ensure it is activated first.

ModuleNotFoundError: No module named 'pyspark' (installed but still failing)

Cause: pip installed the package into a different Python than the one running your script.

Fix: Use python -m pip install pyspark to install into the interpreter you are running.
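To confirm whether pip and your script share an interpreter, compare the two paths below; they should match. This is a quick check, assuming `python` is the command you use to run your scripts:

```shell
# Interpreter that runs your scripts
python -c "import sys; print(sys.executable)"

# Interpreter that pip installs into (shown in parentheses)
python -m pip --version
```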

ImportError: cannot import name 'X' from 'pyspark'

Cause: The function or class does not exist in the installed version.

Fix: Check the version with pip show pyspark and upgrade with pip install --upgrade pyspark.
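The installed version can also be read programmatically, without importing the package (useful when the import itself is what fails). A small sketch using the standard library's `importlib.metadata`:

```python
from importlib.metadata import PackageNotFoundError, version

# Read the installed distribution's version from its metadata,
# without importing pyspark itself
try:
    installed = version("pyspark")
    print("pyspark", installed)
except PackageNotFoundError:
    installed = None
    print("pyspark is not installed in this environment")
```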

pip: command not found

Cause: pip is not in PATH or Python was not added to PATH during installation.

Fix: Try python -m pip install pyspark. On macOS/Linux try pip3.

PermissionError: [Errno 13] Permission denied

Cause: No write access to the system Python package directory.

Fix: Use a virtual environment, or add --user: pip install --user pyspark

SSL: CERTIFICATE_VERIFY_FAILED

Cause: pip cannot verify PyPI's SSL certificate — common behind corporate proxies.

Fix: Try: pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pyspark
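Rather than repeating the flags on every install, the trusted hosts can be persisted in pip's configuration file. A sketch of the relevant entry (the path varies by platform; `~/.config/pip/pip.conf` is the usual user location on Linux, `%APPDATA%\pip\pip.ini` on Windows):

```ini
# ~/.config/pip/pip.conf
[global]
trusted-host = pypi.org
               files.pythonhosted.org
```

Note this only bypasses certificate verification for those hosts; the cleaner long-term fix is to install your organization's CA certificate.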

Recent Releases

Version      Released
4.2.0.dev4   2026-04-10
4.2.0.dev3   2026-03-12
4.2.0.dev2   2026-02-08
4.0.2        2026-02-05
3.5.8        2026-01-15

Full release history is available on PyPI.

Manage pyspark

Upgrade to latest version

pip install --upgrade pyspark

Install a specific version

pip install pyspark==4.1.1
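For reproducible environments, the pin usually lives in a requirements file rather than on the command line. A sketch (the file name `requirements.txt` is the common convention, not a requirement):

```
# requirements.txt
pyspark==4.1.1      # exact pin
# or, to accept any 4.1.x patch release:
# pyspark~=4.1.0
```

Install with `pip install -r requirements.txt`.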

Uninstall

pip uninstall pyspark

Check what is installed

pip show pyspark

Last updated: 2026-04-11 • Data from PyPI