# Working with YAML Files in Python

YAML is a widely used format for configuration files due to its readable syntax and support for nested data through indentation. It’s ideal for defining settings, serializing data, and sharing structured information between systems.

Python provides reliable libraries that make it easy to read, write, and modify YAML files, helping you integrate configuration data directly into your applications.

This guide covers how to work with YAML in Python, starting with basic parsing and moving into more advanced data manipulation.

## Prerequisites

Make sure you have Python 3.7 or later installed. You should know Python basics like dictionaries, lists, and file operations to get the most from this guide.

## Getting started with YAML in Python

YAML support isn't included in Python's standard library. You'll need to use the PyYAML package, which provides comprehensive functionality for YAML processing.

Working with YAML in Python is straightforward. You parse YAML files into Python objects, manipulate the data as needed, and then serialize it back to YAML when you want to save changes.

![Diagram showing how Python works with YAML](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/d1f9aa24-71c1-4288-a41e-7ce43c87ab00/md1x =2014x1194)

To begin, create a project directory:

```command
mkdir yaml-python && cd yaml-python
```

Next, set up a virtual environment:

```command
python3 -m venv venv
```

Activate the virtual environment:

```command
source venv/bin/activate
```

Install the PyYAML package:

```command
pip install pyyaml
```

Now, create a file named `app.py` that you'll use throughout this tutorial.

## Reading YAML files in Python

Loading data from YAML files is typically your first step when working with configurations. The PyYAML library makes this process straightforward.

First, create a sample YAML file named `config.yaml` with this data:

```text
[label config.yaml]
# Server configuration
server:
  host: 192.168.1.100
  port: 8080
  debug: true
  workers: 4
  allowed_origins:
    - https://admin.example.com
    - https://app.example.com

# Database settings
database:
  connection_string: "postgresql://user:password@localhost:5432/mydb"
  pool_size: 5
  timeout: 30

# Logging configuration
logging:
  level: INFO
  file: "/var/log/application.log"
  rotation: daily
```

Now add this code to your `app.py` file:

```python
[label app.py]
import yaml

# Reading a YAML file
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

    # Print the whole configuration
    print("Complete configuration:")
    print(config)

    # Access specific sections and values
    print("\nServer configuration:")
    print(f"  Host: {config['server']['host']}")
    print(f"  Port: {config['server']['port']}")
    print(f"  Debug mode: {config['server']['debug']}")

    # Access database settings
    print("\nDatabase settings:")
    print(f"  Connection: {config['database']['connection_string']}")
    print(f"  Pool size: {config['database']['pool_size']}")
```

The `yaml.safe_load()` function parses the YAML content and converts it into Python objects. YAML's structure naturally maps to Python dictionaries, lists, and scalar values.
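`safe_load()` can also fail: PyYAML raises a subclass of `yaml.YAMLError` when the input is not well-formed YAML. As a standalone aside (separate from the tutorial's `app.py`), here is a minimal sketch of catching that error. The helper name `parse_config_text` and both sample strings are hypothetical and exist only for illustration.

```python
import yaml


def parse_config_text(text):
    """Return the parsed data, or None if the text is not well-formed YAML."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # PyYAML's parsing errors derive from yaml.YAMLError
        print(f"Could not parse YAML: {exc}")
        return None


good = parse_config_text("server:\n  port: 8080\n")
bad = parse_config_text("origins: [a, b\n")  # the flow sequence is never closed

print(good)
print(bad)
```

Returning `None` is just one possible design; in a real application you might prefer to re-raise the error or exit with a clear message. The tutorial's `app.py` stays unchanged for the next step.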

Run your script:

```command
python app.py
```

```text
[output]
Complete configuration:
{'server': {'host': '192.168.1.100', 'port': 8080, 'debug': True, 'workers': 4, 'allowed_origins': ['https://admin.example.com', 'https://app.example.com']}, 'database': {'connection_string': 'postgresql://user:password@localhost:5432/mydb', 'pool_size': 5, 'timeout': 30}, 'logging': {'level': 'INFO', 'file': '/var/log/application.log', 'rotation': 'daily'}}

Server configuration:
  Host: 192.168.1.100
  Port: 8080
  Debug mode: True

Database settings:
  Connection: postgresql://user:password@localhost:5432/mydb
  Pool size: 5
```

Notice how PyYAML automatically converts YAML's native data types to their Python equivalents: strings remain strings, numbers become integers or floats, and boolean values like `true` become Python's `True`.

![Diagram showing direct mapping between YAML syntax and Python data types. The left column shows YAML examples including strings, integers, booleans, lists, and nested objects, while the right column shows the equivalent Python representation. Arrows connect each YAML example to its corresponding Python code.](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/b2c44c94-be1a-42d5-73f8-fc1cd1b2b000/lg2x =1896x1320)

This automatic type conversion means you can immediately use Python's native operations on the data without additional type casting or conversion steps.
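To check the mapping yourself, the following sketch parses a small inline document and prints the Python type chosen for each value. The sample keys (`ratio`, `unset`, and so on) are assumptions made for this illustration and are not part of `config.yaml`.

```python
import yaml

# Hypothetical inline sample; the keys are illustrative only
sample = """
host: 192.168.1.100
port: 8080
ratio: 0.75
debug: true
origins:
  - https://admin.example.com
unset: null
"""

data = yaml.safe_load(sample)

# Print each value next to the Python type PyYAML chose for it
for key, value in data.items():
    print(f"{key}: {value!r} -> {type(value).__name__}")
```

Note that the IP address stays a string even though it contains digits and dots, because it does not match YAML's number syntax.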

## Modifying and writing YAML files

Once you've loaded a YAML file into Python, you can modify the data just like any regular Python dictionary or list. This allows you to update settings, add new values, or even generate entirely new configurations dynamically. After making changes, you can write the updated data back to a file using `yaml.safe_dump()`.

Let’s modify the existing server configuration by changing the number of worker processes and adding a new setting under the `logging` section.

Extend your `app.py` with the following code:

```python
[label app.py]
import yaml

# Reading a YAML file
with open('config.yaml', 'r') as file:
    ...
    print(f"  Pool size: {config['database']['pool_size']}")
[highlight]
# Modify values
config['server']['workers'] = 6  # Increase worker count
config['logging']['retention_days'] = 7  # Add a new logging setting

# Write the updated config to a new YAML file
with open('updated_config.yaml', 'w') as file:
    yaml.safe_dump(config, file, sort_keys=False)
[/highlight]
```

The `yaml.safe_dump()` function converts Python objects back into a properly formatted YAML string and writes it to the specified file.

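By default, `yaml.safe_dump()` sorts mapping keys alphabetically. The `sort_keys=False` argument keeps the keys in the order they have in the dictionary, so the output follows the structure of the original file, which is often important for readability in configuration files.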
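To see the effect of `sort_keys` in isolation, this standalone sketch dumps the same dictionary twice, once with the default and once with `sort_keys=False`. The `settings` dictionary is a made-up example, and the snippet assumes PyYAML 5.1 or later, where the `sort_keys` parameter is available.

```python
import yaml

# Hypothetical data, with keys deliberately out of alphabetical order
settings = {
    'server': {'port': 8080, 'host': 'localhost'},
    'database': {'timeout': 30},
}

# Without a stream argument, safe_dump returns the YAML as a string
sorted_output = yaml.safe_dump(settings)
ordered_output = yaml.safe_dump(settings, sort_keys=False)

print("Default (sorted keys):")
print(sorted_output)
print("With sort_keys=False:")
print(ordered_output)
```

The first dump starts with `database`, while the second keeps `server` first, as in the dictionary. The tutorial's `app.py` needs no change for this.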

Run the script:

```command
python app.py
```

This will create a new file named `updated_config.yaml` with the updated content:

```yaml
server:
  host: 192.168.1.100
  port: 8080
  debug: true
  workers: 6
  allowed_origins:
  - https://admin.example.com
  - https://app.example.com
database:
  connection_string: postgresql://user:password@localhost:5432/mydb
  pool_size: 5
  timeout: 30
logging:
  level: INFO
  file: /var/log/application.log
[highlight]
  rotation: daily
  retention_days: 7
[/highlight]
```

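As you can see, the `workers` count has been updated, and a new `retention_days` field has been added under the `logging` section. The comments from `config.yaml` do not appear in the new file because PyYAML discards comments when it parses a document.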

## Creating YAML files from scratch

In some cases, you may need to generate a YAML file without starting from an existing one. This is common when exporting configuration settings, building templates, or generating output based on user input or API data.

Since YAML maps naturally to Python dictionaries and lists, you can define your data structure in code and write it directly to a file using `yaml.safe_dump()`.

Clear the `app.py` contents and add the following code to the file:

```python
[label app.py]
import yaml

site_config = {
    'site': {
        'name': 'MyBlog',
        'url': 'https://myblog',
        'maintenance_mode': False,
        'features': {'comments': True, 'search': True, 'newsletter': False}
    },
    'admin': {
        'username': 'admin_user',
        'email': 'admin@mail.com',
        'notifications': ['errors', 'updates']
    },
    'theme': {
        'name': 'minimal',
        'version': '2.1.0',
        'custom_css': './themes/minimal/custom.css'
    }
}

with open('site_config.yaml', 'w') as file:
    yaml.safe_dump(site_config, file, sort_keys=False)

print("YAML file created.")
```

In this code, you define a structured configuration for a website using a Python dictionary. The data includes `site` settings, `admin` preferences, and `theme` customization options.

The `yaml.safe_dump()` function then serializes this structure and writes it to a file named `site_config.yaml`. Because `sort_keys=False` is set, the keys appear in the same order as in the dictionary.

Run the script:

```command
python app.py
```

This will create a file named `site_config.yaml` with the following content:

```yaml
site:
  name: MyBlog
  url: https://myblog
  maintenance_mode: false
  features:
    comments: true
    search: true
    newsletter: false
admin:
  username: admin_user
  email: admin@mail.com
  notifications:
  - errors
  - updates
theme:
  name: minimal
  version: 2.1.0
  custom_css: ./themes/minimal/custom.css
```

Defining configuration data in Python and writing it to a YAML file lets you generate clean, structured output programmatically.

This approach works well in automation workflows, deployment scripts, and any situation where configuration needs to be exported or shared.
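When you generate YAML programmatically, a quick round trip is one way to confirm that what you wrote reads back as the same data. The sketch below is an illustration with hypothetical data, not part of the tutorial's `app.py`; the version string `'2.1'` is chosen on purpose because it looks like a number.

```python
import yaml

# Hypothetical data; '2.1' is deliberately a string that looks like a number
original = {
    'theme': {'name': 'minimal', 'version': '2.1'},
    'features': ['comments', 'search'],
}

# Serialize to a string, then parse that string back into Python objects
text = yaml.safe_dump(original, sort_keys=False)
restored = yaml.safe_load(text)

print(text)
print("Round trip preserved the data:", restored == original)
```

PyYAML quotes a string in the output when leaving it unquoted would cause it to be read back as another type, which is why `version` is still a string after reloading.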

## Working with lists and nested data

YAML files often contain lists and deeply nested structures to represent items such as user roles, enabled features, plugins, or settings. PyYAML converts these YAML elements into Python lists and dictionaries, so you can work with them using standard Python operations.

Let’s extend the example by adding a list of plugins and nested theme settings in the YAML file.

Update your `site_config.yaml` with the following content:

```yaml
[label site_config.yaml]
site:
  name: MyBlog
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
    - name: analytics
      enabled: true
    - name: seo
      enabled: false
theme:
  name: minimal
  settings:
    color: dark
    font: sans-serif
```

Now update your `app.py` file to read and modify this nested structure:

```python
[label app.py]
import yaml

with open('site_config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Access a list of plugins
print("Installed plugins:")
for plugin in config['site']['plugins']:
    status = "enabled" if plugin['enabled'] else "disabled"
    print(f"  - {plugin['name']} ({status})")

# Modify a nested value
config['theme']['settings']['color'] = 'light'

# Add a new plugin to the list
config['site']['plugins'].append({'name': 'backup', 'enabled': True})

# Write changes to a new file
with open('updated_site_config.yaml', 'w') as file:
    yaml.safe_dump(config, file, sort_keys=False)

print("Updated configuration saved.")
```

In this code:

- `config['site']['plugins']` accesses the list of plugin dictionaries.
- The `for` loop iterates through that list, reading each plugin’s `name` and `enabled` status.
- `config['theme']['settings']['color'] = 'light'` updates a nested dictionary value directly using key access.
- The `append()` method adds a new dictionary to the existing list of plugins.
- `yaml.safe_dump()` is used again to serialize the updated data structure and write it to a new YAML file.

This approach demonstrates how YAML's nested data structures map cleanly to Python’s native data types, enabling you to make modifications using familiar syntax. Whether you're toggling a feature flag or inserting a new configuration block, you’re just interacting with Python dictionaries and lists.

Run the script:

```command
python app.py
```

```text
[output]
Installed plugins:
  - analytics (enabled)
  - seo (disabled)
Updated configuration saved.
```

After running, open `updated_site_config.yaml` and you’ll see the modified `color` value and the new `backup` plugin added at the end of the list:

```text
[label updated_site_config.yaml]
site:
  name: MyBlog
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
  - name: analytics
    enabled: true
  - name: seo
    enabled: false
[highlight]
  - name: backup
    enabled: true
[/highlight]
theme:
  name: minimal
  settings:
[highlight]
    color: light
[/highlight]
    font: sans-serif
```

Understanding how to work with lists and nested data is essential for managing real-world YAML files, which often include multiple layers of configuration.
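As a further illustration of working with a list of dictionaries, here is a small sketch that toggles a plugin by name. The helper `set_plugin_enabled` is hypothetical, and the inline YAML is a trimmed copy of the plugin list so the snippet runs on its own.

```python
import yaml

# Inline copy of part of the plugin list, so the sketch runs on its own
config = yaml.safe_load("""
site:
  plugins:
    - name: analytics
      enabled: true
    - name: seo
      enabled: false
""")


def set_plugin_enabled(config, name, enabled):
    """Set the 'enabled' flag of the named plugin; return True if it was found."""
    for plugin in config.get('site', {}).get('plugins', []):
        if plugin.get('name') == name:
            plugin['enabled'] = enabled
            return True
    return False


found_seo = set_plugin_enabled(config, 'seo', True)
found_cache = set_plugin_enabled(config, 'cache', True)  # not in the list

print(found_seo, found_cache)
print(yaml.safe_dump(config, sort_keys=False))
```

Using `.get()` with defaults means a missing `site` or `plugins` key results in `False` rather than a `KeyError`; whether that is preferable to failing loudly depends on your application.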

## Validating YAML with PyKwalify

When working with configuration files, it is essential to verify that the structure and data types are correct before using them in your application. PyKwalify is a library that validates YAML files against a schema that is itself written in YAML.

First, install PyKwalify:

```command
pip install pykwalify
```

Next, create a schema file named `site_schema.yaml` to define the expected structure:

```yaml
[label site_schema.yaml]
type: map
mapping:
  site:
    type: map
    mapping:
      name:
        type: str
        required: true
      url:
        type: str
        required: true
      maintenance_mode:
        type: bool
        required: true
      features:
        type: map
        required: true
        mapping:
          comments: {type: bool, required: true}
          search: {type: bool, required: true}
          newsletter: {type: bool, required: true}
      plugins:
        type: seq
        required: true
        sequence:
          - type: map
            mapping:
              name: {type: str, required: true}
              enabled: {type: bool, required: true}
  admin:
    type: map
    mapping:
      username: {type: str, required: true}
      email: {type: str, required: true}
      notifications:
        type: seq
        sequence:
          - type: str
  theme:
    type: map
    mapping:
      name: {type: str, required: true}
      version: {type: str, required: true}
      custom_css: {type: str, required: true}
```

This schema defines the required keys and their expected types in your configuration. It ensures:

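- The top-level keys (`site`, `admin`, `theme`) are mappings, and keys not defined in the schema are rejected. Because none of the three is marked `required: true`, a file that omits one of these sections entirely still passes.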
- Nested fields like `site.name` and `theme.version` are strings.
- Booleans are used where expected (e.g., `maintenance_mode`, `features.comments`).
- Lists like `plugins` and `notifications` contain properly structured items.

With this schema in place, PyKwalify can verify that your YAML configuration file has the correct structure and data types before your code uses it.

Now, validate your YAML file using the following Python code in `app.py`:

```python
[label app.py]
from pykwalify.core import Core
from pykwalify.errors import SchemaError

# Validate site_config.yaml against the schema
core = Core(source_file="site_config.yaml", schema_files=["site_schema.yaml"])

try:
    core.validate()
    print("YAML file is valid.")
except SchemaError as e:
    # Raised when the data does not satisfy the schema
    print("Validation failed:")
    print(e)
```

Here is how it works:

- `Core(source_file, schema_files)` loads both your YAML data and its corresponding schema.
- `core.validate()` checks whether the YAML file meets the schema’s rules.
- If validation fails, PyKwalify raises a `SchemaError` that lists each rule the file violated.

Now run the file:

```command
python app.py
```

The `site_config.yaml` from the previous section is missing several required fields and contains a `settings` key that the schema doesn't define, so you'll see output like this:

```text
[output]
validation.invalid
 --- All found errors ---
["Cannot find required key 'url'. Path: '/site'", "Cannot find required key 'maintenance_mode'. Path: '/site'", "Cannot find required key 'version'. Path: '/theme'", "Cannot find required key 'custom_css'. Path: '/theme'", "Key 'settings' was not defined. Path: '/theme'"]
Validation failed:
<SchemaError: error code 2: Schema validation failed:
 - Cannot find required key 'url'. Path: '/site'.
 - Cannot find required key 'maintenance_mode'. Path: '/site'.
 - Cannot find required key 'version'. Path: '/theme'.
 - Cannot find required key 'custom_css'. Path: '/theme'.
 - Key 'settings' was not defined. Path: '/theme'.: Path: '/'>
```

To fix these validation errors, ensure your `site_config.yaml` file includes all required fields and doesn't have any keys not defined in the schema:

```yaml
[label site_config.yaml]
site:
  name: "MyBlog"
  url: "https://myblog.example.com"
  maintenance_mode: false
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
    - name: "analytics"
      enabled: true
admin:
  username: "admin_user"
  email: "admin@example.com"
  notifications:
    - "errors"

theme:
  name: "minimal"
  version: "2.1.0"
  custom_css: "./themes/minimal/custom.css"
```

After correcting your YAML file, running the validation again should produce:

```text
[output]
YAML file is valid.
```

Validating with PyKwalify helps keep your configuration files complete, well-structured, and consistent with your application's expectations.

This validation step is especially valuable in larger projects, automated pipelines, or systems where incorrect configuration can lead to deployment issues or runtime failures.

## Final thoughts

YAML provides a clear and structured format for configuration files, and Python makes it easy to work with that format using libraries like PyYAML and PyKwalify. With these tools, you can read, modify, create, and validate YAML data directly in your applications.

Adding validation ensures that your configuration files are accurate before use, helping prevent errors in automated processes and deployments.

For more details, refer to the official [PyYAML documentation](https://pyyaml.org/wiki/PyYAMLDocumentation) and [PyKwalify documentation](https://github.com/Grokzen/pykwalify).
