# A Complete Guide to Pydantic

Pydantic is a popular data validation and serialization library for Python. It enforces type hints at runtime, ensuring that your data conforms to the expected structure while offering high performance. 

Pydantic provides all the essential features expected in a data validation library, such as strict type enforcement, field constraints, custom validation rules, and serialization options. 

It also stands out with its ease of use and flexibility, allowing developers to define models effortlessly while ensuring data integrity.

This article will guide you through getting started with Pydantic, defining models, validating data, customizing fields, and handling advanced use cases. 


## Prerequisites

Before diving into Pydantic, ensure you have a recent version of Python installed on your machine. This guide uses Python 3.13, although Pydantic also runs on older Python 3 releases (check the Pydantic documentation for the minimum supported version). Pydantic relies on modern Python features like type hints, so an up-to-date environment is recommended.

You can check your Python version by running:


```command
python3 --version
```
```text
[output]
Python 3.13.2
```


## Setting up the project directory

In this section, you'll set up a project directory and create a virtual environment before installing dependencies. Using a virtual environment helps keep your project dependencies isolated and organized.



First, create a new directory for your project and navigate into it:

```command
mkdir pydantic-demo && cd pydantic-demo
```

Next, create a virtual environment inside your project directory:

```command
python3 -m venv venv
```

Then, activate the virtual environment:

```command
source venv/bin/activate 
```

Once the virtual environment is activated, install the latest version of Pydantic using `pip`:


```command
pip install pydantic
```


To ensure Pydantic is installed correctly, open a Python shell:

```command
python
```
Then, try importing `BaseModel` from Pydantic:

```python
>>> from pydantic import BaseModel
```

If this command runs without errors, Pydantic is successfully installed!

You can now exit the Python shell by typing:

```command
exit()
```
Now that Pydantic is installed, you can use it in your project.
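
If you want to confirm which release `pip` installed, a short script along these lines can help. It is a minimal sketch that assumes the examples in this guide target the Pydantic 2.x API; the variable name `major_version` is only illustrative.

```python
import pydantic

# pydantic.VERSION holds the installed version as a string
print(f"Pydantic version: {pydantic.VERSION}")

# The major version indicates whether the v2 API used in this guide is available
major_version = int(pydantic.VERSION.split(".")[0])
print("v2 API available:", major_version >= 2)
```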



## Getting started with Pydantic

Pydantic is a data validation and settings management library for Python that makes it easy to enforce data types, constraints, and serialization rules.

At its core, Pydantic leverages Python type hints to define structured data models, ensuring data integrity with minimal effort.


Let's start with a simple example. Create a new Python file, `main.py`, and define your first Pydantic model:

```python
[label main.py]
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


user = User(name="Alice", age=30)
print(user)
```

In this example, you define a `User` model with two fields: `name` as a string and `age` as an integer. The model inherits from `BaseModel`, which provides the validation and serialization functionality.

Execute the script with:

```command
python main.py
```

You will see output that looks like:

```text
[output]
name='Alice' age=30
```

Notice that Pydantic automatically converts and validates the data according to the specified types.

Now, let's see what happens when incorrect data is provided. Modify the script to introduce incorrect data:

```python
[label main.py]

[highlight]
user = User(name="Alice", age="thirty")
[/highlight]
print(user)
```

When you run this, Pydantic will raise a validation error:

```text
[output]
...
pydantic_core._pydantic_core.ValidationError: 1 validation error for User
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='thirty', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing
```

Pydantic ensures that **only valid data** is accepted, reducing potential errors in your application.
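
In an application you will usually want to catch this error rather than let the script crash. The following is a minimal sketch of one way to do that, using the `ValidationError` exception exported by Pydantic together with its `error_count()` and `errors()` methods; the variable name `first_error` is only illustrative.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


try:
    User(name="Alice", age="thirty")
except ValidationError as exc:
    # error_count() reports how many validation errors were found
    print(f"{exc.error_count()} error(s) found")
    first_error = exc.errors()[0]
    # Each error is a dict with the field location, an error type, and a message
    print(first_error["loc"], first_error["type"], first_error["msg"])
```

Working with the structured error list is often more convenient than parsing the printed traceback, for example when returning error details from an API.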

Pydantic is also smart enough to **coerce some types automatically**:

```python
[label main.py]

[highlight]
user = User(name="Alice", age="30")  # Age is a string, not an int
[/highlight]
print(user)
```

When you run the script, you'll see:

```text
[output]
name='Alice' age=30
```

Even though `"30"` was a string, Pydantic converted it to an integer automatically.
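
If automatic coercion is not what you want, Pydantic also offers a strict mode. The sketch below compares the two behaviors by passing `strict=True` to `model_validate()`; the variable names `lax_user` and `strict_error_type` are assumptions made for this illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


data = {"name": "Alice", "age": "30"}

# Default (lax) mode: the numeric string is coerced to an int
lax_user = User.model_validate(data)
print(repr(lax_user.age))

# Strict mode: the same input is rejected instead of coerced
try:
    User.model_validate(data, strict=True)
    strict_error_type = None
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]
    print("Rejected in strict mode:", strict_error_type)
```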


Now that you've seen how to create basic models and validate data, you can proceed to the next section to explore field validation and constraints for even more precise data control.


## Field validation and constraints  

When working with structured data, enforcing rules that ensure data integrity and consistency is important. Field validation allows you to define restrictions on input data, preventing invalid or unexpected values from entering your system.  

Pydantic provides an easy way to define validation rules and constraints using the `Field` function. This helps enforce requirements and ensures that your application only processes valid data.


Update the `User` model with validation rules that protect data quality: names shouldn't be empty or unreasonably long, ages must be within human limits, and email addresses should follow a standard format.

You can enforce these constraints using the `Field` function:
  
```python
[label main.py]
[highlight]
from pydantic import BaseModel, Field
[/highlight]

class User(BaseModel):
[highlight]
    name: str = Field(..., min_length=2, max_length=50)  # Name must be between 2 and 50 characters
    age: int = Field(..., gt=0, lt=120)  # Age must be greater than 0 and less than 120
    email: str = Field(..., pattern=r"^\S+@\S+\.\S+$")  # Must be a valid email format

user = User(name="Alice", age=30, email="alice@example.com")
[/highlight]

print(user)
```

Each field in the `User` model uses specific validation rules: the `name` field must be between 2 and 50 characters, the `age` field must be greater than 0 and less than 120 (both bounds are exclusive), and the `email` field must match a simple pattern that requires an `@` symbol followed by a domain containing a dot.


The `...` in each `Field()` means these fields are required when creating a new `User`.

When you run the script, the input passes every check and Pydantic prints the validated model:

```text
[output]
name='Alice' age=30 email='alice@example.com'
```


Now, test what happens when invalid input is provided:

```python
[label main.py]
[highlight]
user = User(name="A", age=-5, email="invalid-email")
[/highlight]
```

Running this will raise multiple validation errors:

```text
[output]
3 validation errors for User
name
  String should have at least 2 characters [type=string_too_short, input_value='A', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/string_too_short
age
  Input should be greater than 0 [type=greater_than, input_value=-5, input_type=int]
    For further information visit https://errors.pydantic.dev/2.10/v/greater_than
email
  String should match pattern '^\S+@\S+\.\S+$' [type=string_pattern_mismatch, input_value='invalid-email', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/string_pattern_mismatch
```

Defining the constraints lets you catch errors early rather than allowing bad data to cause issues later.
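
Because `gt` and `lt` are exclusive bounds, the boundary values themselves are rejected. The following sketch checks a few edge cases; the `Person` model and the `is_valid_age()` helper are hypothetical names used only for this illustration. If you need inclusive bounds, `Field` also accepts `ge` and `le`.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):  # hypothetical model used only for this illustration
    age: int = Field(..., gt=0, lt=120)


def is_valid_age(age: int) -> bool:
    """Return True if the value passes the gt/lt constraints."""
    try:
        Person(age=age)
        return True
    except ValidationError:
        return False


# 0 and 120 sit exactly on the exclusive bounds, so they fail
for candidate in (0, 1, 119, 120):
    print(candidate, is_valid_age(candidate))
```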


By default, Pydantic treats every field without a default value as required. However, in real-world applications, some fields may be optional and not always necessary.


A default value ensures that if a field is not provided, it will automatically **take on a sensible default value** instead of causing an error.  

In the `main.py` file, add an `is_active` field that defaults to `True`:
 


```python
[label main.py]
class User(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    ...
[highlight]
    is_active: bool = True  # Default value

# Creating a user without specifying 'is_active'
user = User(name="Bob", age=25, email="bob@example.com")
[/highlight]

print(user)
```

When you run the file, the output looks like:

```text
[output]
name='Bob' age=25 email='bob@example.com' is_active=True
```

Since `is_active` wasn’t provided, it automatically defaults to `True`.  

This ensures that every user has an active status unless explicitly set otherwise.
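
Sometimes it helps to know whether a value came from the caller or from the default. As a rough sketch, the `model_fields_set` attribute on a model instance lists the fields that were explicitly provided; the trimmed-down `User` model and the variable names here are assumptions for the example.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    is_active: bool = True  # Default value


default_user = User(name="Bob")
inactive_user = User(name="Dana", is_active=False)

print(default_user.is_active)    # value taken from the default
print(inactive_user.is_active)   # value supplied by the caller

# model_fields_set contains only the fields passed explicitly
print(default_user.model_fields_set)
print(inactive_user.model_fields_set)
```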
  

In some cases, fields should be optional, meaning they can be left out entirely without causing errors.  

To make a field optional, use `Optional` from the `typing` module and set `None` as the default value:

```python
[label main.py]
from pydantic import BaseModel, Field
[highlight]
from typing import Optional
[/highlight]

class User(BaseModel):
    ...
[highlight]
    email: Optional[str] = None  # Now optional
[/highlight]
    is_active: bool = True  # Default value

# Creating a user without an email
[highlight]
user = User(name="Charlie", age=28)
[/highlight]
print(user)
```

When you run the script, you'll see:

```text
[output]
name='Charlie' age=28 email=None is_active=True
```

Since the `email` field is optional, it defaults to `None` when not provided. This makes the model more adaptable to real-world scenarios where certain fields may not always be required.
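
Note that the version above drops the email pattern check. If you want a field that can be omitted but is still validated when present, one possible approach is to combine `Optional` with `Field(default=None, ...)`. This is a sketch on a trimmed-down `User` model; `email_rejected` is an illustrative variable name.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    name: str
    # Optional, but still checked against the pattern when a value is given
    email: Optional[str] = Field(default=None, pattern=r"^\S+@\S+\.\S+$")


print(User(name="Charlie"))
print(User(name="Charlie", email="charlie@example.com"))

try:
    User(name="Charlie", email="not-an-email")
    email_rejected = False
except ValidationError:
    email_rejected = True
print("Invalid email rejected:", email_rejected)
```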

You can design more flexible and reliable data structures by incorporating validation rules, default values, and optional fields.

Next, you'll learn how to use custom validators to implement more advanced validation logic.


## Custom validators for advanced data validation  

While Pydantic provides built-in validation for basic data types and constraints, some cases call for more complex validation logic. This is where custom validators come in.

A custom validator allows you to define specific rules that go beyond standard type checking. 

For example, you might want to:

- Ensure a username contains only alphanumeric characters  
- Validate a password for minimum complexity requirements  
- Check that a date falls within a specific range  

With Pydantic, you can create custom validation functions using the `@field_validator` decorator.  


Extend the `User` model by adding a `password` field with custom validation:


```python
[label main.py]
[highlight]
from pydantic import BaseModel, Field, field_validator
import re
[/highlight]

class User(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    age: int = Field(..., gt=0, lt=120)
[highlight]
    email: str = Field(..., pattern=r"^\S+@\S+\.\S+$")
    password: str = Field(..., min_length=8)  # Password must be at least 8 characters

    @field_validator("password")
    @classmethod
    def password_complexity(cls, value):
        """Ensure password has at least one uppercase letter, one lowercase letter, and one number."""
        if not (
            re.search(r"[A-Z]", value)
            and re.search(r"[a-z]", value)
            and re.search(r"\d", value)
        ):
            raise ValueError(
                "Password must contain at least one uppercase letter, one lowercase letter, and one number"
            )
        return value


# Valid user
user = User(name="Alice123", age=30, email="alice@example.com", password="Secure123")
[/highlight]
print(user)
```
The `@field_validator("password")` function ensures that the password meets complexity requirements by requiring at least one uppercase letter, one lowercase letter, and one number.


When you run the file with valid input, you'll see the following output:

```text
[output]
name='Alice123' age=30 email='alice@example.com' password='Secure123'
```


Now, test what happens when an invalid password is provided:

```python
[label main.py]
[highlight]
user = User(name="Alice!", age=25, email="alice@example.com", password="weakpass")
[/highlight]
print(user)

```

Since `"weakpass"` lacks an uppercase letter and a number, Pydantic raises a validation error:

```text
[output]
1 validation error for User
password
  Value error, Password must contain at least one uppercase letter, one lowercase letter, and one number [type=value_error, input_value='weakpass', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
```

This ensures that only correctly formatted data enters the system, reducing the risk of errors and security vulnerabilities.
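
The same decorator can cover the username rule mentioned earlier. The sketch below assumes a separate, hypothetical `Account` model with a `username` field (neither is part of the `User` model in this guide) and uses `str.isalnum()` for the check.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Account(BaseModel):  # hypothetical model for illustration
    username: str

    @field_validator("username")
    @classmethod
    def username_alphanumeric(cls, value: str) -> str:
        """Reject usernames that contain anything other than letters and digits."""
        if not value.isalnum():
            raise ValueError("Username must contain only letters and numbers")
        return value


print(Account(username="Alice123"))

try:
    Account(username="Alice!")
except ValidationError as exc:
    username_error = exc.errors()[0]["msg"]
    print(username_error)
```

A validator must return the value (possibly transformed) for the field to be populated, which is why the function ends with `return value`.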

Now that you've mastered custom validators, it's time to explore data serialization and transformation, converting Pydantic models into dictionaries and JSON.


## Data serialization and transformation

Beyond validation, Pydantic provides serialization and transformation features, allowing you to convert models into different formats such as dictionaries, JSON, and custom data structures.

This is useful when storing data in databases, sending API responses, or interacting with external services.

Update `main.py` with the following code:

```python
[label main.py]
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    age: int = Field(..., gt=0, lt=120)
    email: str = Field(..., pattern=r"^\S+@\S+\.\S+$")
    is_active: bool = True

user = User(name="Alice", age=30, email="alice@example.com")

# Convert to dictionary
user_dict = user.model_dump()
print(user_dict)
```

Most of these concepts should feel familiar, except for the call to `user.model_dump()`. The `.model_dump()` method converts the model into a native Python dictionary while preserving all field values.


When you run this program, you'll see:

```text
[output]
{'name': 'Alice', 'age': 30, 'email': 'alice@example.com', 'is_active': True}
```
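
The conversion also works in the other direction. As a small sketch, `model_validate()` rebuilds and re-validates a model from a dictionary; the simplified `User` model and the `restored` variable are assumptions for this example.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    is_active: bool = True


user = User(name="Alice", age=30)
user_dict = user.model_dump()

# model_validate() builds and validates a model from a dictionary
restored = User.model_validate(user_dict)

# Models with the same type and field values compare as equal
print(restored == user)
```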


Another neat feature is the ability to serialize data into JSON format, which is essential when working with web APIs or storing data in document databases. 

To convert your model to JSON, update your code with the `model_dump_json()` method:


```python
[label main.py]
...
user = User(name="Alice", age=30, email="alice@example.com")

# Convert to JSON
[highlight]
user_json = user.model_dump_json()
print(user_json)
[/highlight]
```
The `.model_dump_json()` method automatically handles the conversion of Python types to their JSON equivalents, including proper formatting of boolean values (like `true` instead of `True`) and ensuring the output is a valid JSON string.

Running the file produces:

```json
[output]
{"name":"Alice","age":30,"email":"alice@example.com","is_active":true}
```
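
For logs or configuration files, indented JSON is often easier to read, and JSON received from elsewhere can be parsed straight back into a model. The following sketch uses the `indent` argument of `model_dump_json()` and the `model_validate_json()` class method on a simplified `User` model; `restored` is an illustrative name.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    is_active: bool = True


user = User(name="Alice", age=30)

# indent=2 produces human-readable, multi-line JSON
user_json = user.model_dump_json(indent=2)
print(user_json)

# model_validate_json() parses and validates a JSON string in one step
restored = User.model_validate_json(user_json)
print(restored == user)
```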

Pydantic also allows you to filter fields during serialization. This is particularly useful when hiding sensitive information or customizing the output for different purposes.


Update the code to incorporate field filtering:

```python
[label main.py]
...
user = User(name="Alice", age=30, email="alice@example.com")

[highlight]
# Exclude specific fields
filtered_dict = user.model_dump(exclude={"is_active"})
print(filtered_dict)

# Include only specific fields
partial_json = user.model_dump_json(include={"name", "email"})
print(partial_json)
[/highlight]
```

The `exclude` parameter removes specified fields from the output, while `include` lets you select only the fields you want to keep. 

Running this shows:

```text
[output]
{'name': 'Alice', 'age': 30, 'email': 'alice@example.com'}
{"name":"Alice","email":"alice@example.com"}
```

As you can see, the first output excludes the `is_active` field, while the second output includes only the `name` and `email` fields.
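
If a field should never appear in serialized output, you can also declare that on the field itself instead of passing `exclude` on every call. This is a minimal sketch using `Field(exclude=True)`; the `password` field on this trimmed-down `User` model is an assumption for the example.

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    name: str
    email: str
    # exclude=True keeps this field out of model_dump() and model_dump_json()
    password: str = Field(..., exclude=True)


user = User(name="Alice", email="alice@example.com", password="Secure123")

print(user.model_dump())
print(user.model_dump_json())

# The value is still available on the model instance itself
print(user.password)
```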



Now that you can serialize Pydantic models into dictionaries and JSON while filtering specific fields, you’re ready to explore JSON Schema generation.

## Working with JSON Schemas in Pydantic

Pydantic not only helps with data validation and serialization but also provides JSON Schema generation for your models. JSON Schema is a widely used format for defining JSON data's structure, validation rules, and constraints. 


Pydantic allows you to automatically generate JSON Schema representations of your models, ensuring that your data format is well-defined and compatible with external applications.


Pydantic models come with a built-in method called `.model_json_schema()`, which allows you to generate the corresponding JSON Schema for any model.

To see how this works, update your `main.py` file:

```python
[label main.py]
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    age: int = Field(..., gt=0, lt=120)
    email: str = Field(..., pattern=r"^\S+@\S+\.\S+$")
    is_active: bool = True

user_schema = User.model_json_schema()
print(user_schema)
```

When you run this script, it prints the schema as a Python dictionary. Formatted as JSON, the schema looks similar to:

```json
[output]
{
    "title": "User",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "title": "Name",
            "minLength": 2,
            "maxLength": 50
        },
        "age": {
            "type": "integer",
            "title": "Age",
            "exclusiveMinimum": 0,
            "exclusiveMaximum": 120
        },
        "email": {
            "type": "string",
            "title": "Email",
            "pattern": "^\\S+@\\S+\\.\\S+$"
        },
        "is_active": {
            "type": "boolean",
            "title": "Is Active",
            "default": true
        }
    },
    "required": ["name", "age", "email"]
}
```

This JSON Schema describes the structure of the `User` model, including field types, constraints, required fields, and default values. 


This schema can be used for API documentation, client-side validation, or schema validation in databases.
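
Because `model_json_schema()` returns a plain Python dictionary, you can pass it to the standard library's `json.dumps()` to get indented JSON text. The sketch below uses a shortened `User` model (without the `email` field) to keep the example small.

```python
import json

from pydantic import BaseModel, Field


class User(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)
    age: int = Field(..., gt=0, lt=120)
    is_active: bool = True


user_schema = User.model_json_schema()

# The schema is a regular dict, so json.dumps() can format it as JSON text
print(json.dumps(user_schema, indent=4))
```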


Pydantic allows customization of the generated JSON Schema. You can modify field descriptions, add metadata, and adjust the schema structure to suit your needs.

To enhance the schema with descriptions, update the `User` model:

```python
[label main.py]
...
class User(BaseModel):
[highlight]
    name: str = Field(
        ..., min_length=2, max_length=50, description="The full name of the user"
    )
    age: int = Field(..., gt=0, lt=120, description="The user's age in years")
    email: str = Field(
        ..., pattern=r"^\S+@\S+\.\S+$", description="A valid email address"
    )
    is_active: bool = Field(default=True, description="Indicates if the user is active")
[/highlight]


user_schema = User.model_json_schema()
print(user_schema)
```

This modification ensures that the JSON Schema includes descriptions, making it more informative:

```json
[output]
{
    "properties": {
        "name": {
            "description": "The full name of the user",
            ...
        },
        "age": {
            "description": "The user's age in years",
            ...
        },
        "email": {
            "description": "A valid email address",
            ...
        },
        "is_active": {
            "description": "Indicates if the user is active",
            ...
        }
    },
    ...
}
```

Adding descriptions makes the schema easier to understand when shared with others or used in API documentation.
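
Descriptions are not the only metadata you can attach. As a rough sketch, the class docstring becomes the schema's top-level `description`, and `Field` accepts an `examples` list; the docstring text and example value below are made up for illustration.

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    """A registered user of the application."""

    email: str = Field(
        ...,
        description="A valid email address",
        examples=["alice@example.com"],
    )


schema = User.model_json_schema()

# The class docstring is used as the model-level description
print(schema["description"])

# Field-level metadata appears under the field's entry in "properties"
print(schema["properties"]["email"])
```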


## Final thoughts

In this article, you learned how to use Pydantic for data validation and serialization in Python. With these skills, you can now build robust data models, ensure input correctness, and simplify API development. 

For more advanced features and best practices, check out the [Pydantic Documentation](https://docs.pydantic.dev/).
