YAML is a widely used format for configuration files due to its readable syntax and support for nested data through indentation. It’s ideal for defining settings, serializing data, and sharing structured information between systems.
Python provides reliable libraries that make it easy to read, write, and modify YAML files, helping you integrate configuration data directly into your applications.
This guide covers how to work with YAML in Python, starting with basic parsing and moving into more advanced data manipulation.
Prerequisites
Make sure you have Python 3.7 or later installed. You should know Python basics like dictionaries, lists, and file operations to get the most from this guide.
Getting started with YAML in Python
YAML support isn't included in Python's standard library. You'll need to use the PyYAML package, which provides comprehensive functionality for YAML processing.
Working with YAML in Python is straightforward. You parse YAML files into Python objects, manipulate the data as needed, and then serialize it back to YAML when you want to save changes.
To begin, create a project directory:
Next, set up a virtual environment:
Activate the virtual environment:
Install the PyYAML package from PyPI with pip install pyyaml.
Now, create a file named app.py that you'll use throughout this tutorial.
Reading YAML files in Python
Loading data from YAML files is typically your first step when working with configurations. The PyYAML library makes this process straightforward.
First, create a sample YAML file named config.yaml with this data:
Now add this code to your app.py file:
The yaml.safe_load() function parses the YAML content and converts it into Python objects. YAML's structure naturally maps to Python dictionaries, lists, and scalar values.
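As a minimal sketch of this step, the snippet below writes a small YAML file and reads it back with yaml.safe_load(). The keys shown (server, workers, logging) are assumptions chosen to fit the server configuration discussed in this guide, not the actual contents of config.yaml, and the temporary directory is only there so the example runs on its own.

```python
import tempfile
from pathlib import Path

import yaml

# Hypothetical stand-in for config.yaml; the real file's keys may differ.
SAMPLE_CONFIG = """\
server:
  host: localhost
  port: 8080
  workers: 4
logging:
  level: INFO
  enabled: true
"""


def load_config(path):
    """Open a YAML file and parse it into plain Python objects."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


# Write the sample to a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.yaml"
    config_path.write_text(SAMPLE_CONFIG, encoding="utf-8")
    config = load_config(config_path)

print(config["server"]["workers"])
print(config["logging"]["level"])
```

In your own app.py, you would call load_config("config.yaml") directly and skip the temporary file.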
Run your script:
Notice how PyYAML automatically converts YAML's native data types to their Python equivalents: strings remain strings, numbers become integers or floats, and boolean values like true become Python's True.
This automatic type conversion means you can immediately use Python's native operations on the data without additional type casting or conversion steps.
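To make the type mapping concrete, here is a small sketch that parses an inline YAML document and reports the Python type of each value. The field names are illustrative only. Note that quoting a value in YAML keeps it a string even if it looks like a number.

```python
import yaml

# Illustrative document; each key demonstrates one YAML scalar or collection type.
doc = yaml.safe_load("""\
name: demo
port: 8080
ratio: 0.75
debug: true
quoted: "8080"
tags: [web, api]
missing: null
""")

# Map each key to the name of the Python type PyYAML produced for it.
types = {key: type(value).__name__ for key, value in doc.items()}

for key, type_name in types.items():
    print(f"{key}: {type_name}")
```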
Modifying and writing YAML files
Once you've loaded a YAML file into Python, you can modify the data just like any regular Python dictionary or list. This allows you to update settings, add new values, or even generate entirely new configurations dynamically. After making changes, you can write the updated data back to a file using yaml.safe_dump().
Let’s modify the existing server configuration by changing the number of worker processes and adding a new setting under the logging section.
Extend your app.py with the following code:
The yaml.safe_dump() function converts Python objects back into YAML text. When you pass it an open file object, it writes the result to that file; without one, it returns the YAML as a string.
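By default, yaml.safe_dump() sorts mapping keys alphabetically. The sort_keys=False argument (available in PyYAML 5.1 and later) keeps keys in the dictionary's insertion order, so the output matches the original structure, which is often important for readability in configuration files.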
Run the script:
This will create a new file named updated_config.yaml with the updated content:
As you can see, the workers count has been updated, and a new retention_days field has been added under the logging section.
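The modify-and-write cycle described in this section can be sketched as follows. The starting dictionary and the values 8 and 30 are assumptions standing in for the parsed config.yaml, and a temporary directory replaces the project folder so the snippet is self-contained. It also dumps the same data without sort_keys=False to show how the key order differs.

```python
import tempfile
from pathlib import Path

import yaml

# Hypothetical starting data, standing in for the parsed config.yaml.
config = {
    "server": {"host": "localhost", "workers": 4},
    "logging": {"level": "INFO"},
}

# Modify the data with ordinary dictionary operations.
config["server"]["workers"] = 8
config["logging"]["retention_days"] = 30  # illustrative value

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "updated_config.yaml"
    with open(out_path, "w", encoding="utf-8") as f:
        # Passing a file object makes safe_dump() write to it directly.
        yaml.safe_dump(config, f, sort_keys=False)
    written = out_path.read_text(encoding="utf-8")

# Without sort_keys=False, top-level keys come out alphabetically instead.
sorted_text = yaml.safe_dump(config)

print(written)
```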
Creating YAML files from scratch
In some cases, you may need to generate a YAML file without starting from an existing one. This is common when exporting configuration settings, building templates, or generating output based on user input or API data.
Since YAML maps naturally to Python dictionaries and lists, you can define your data structure in code and write it directly to a file using yaml.safe_dump().
Clear the app.py contents and add the following code to the file:
In this code, you define a structured configuration for a website using a Python dictionary. The data includes site settings, admin preferences, and theme customization options.
The yaml.safe_dump() function then serializes this structure and writes it to a file named site_config.yaml. Keys appear in the order you defined them as long as sort_keys=False is passed; otherwise PyYAML sorts them alphabetically.
Run the script:
This will create a file named site_config.yaml with the following content:
Defining configuration data in Python and writing it to a YAML file lets you generate clean, structured output programmatically.
This approach works well in automation workflows, deployment scripts, and any situation where configuration needs to be exported or shared.
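One way to generate configuration programmatically is to build the dictionary in a function and serialize the result. The sketch below assumes a hypothetical build_site_config() helper and made-up field values; only the top-level site, admin, and theme sections come from this guide. It also reloads the output to check that the data survives a round trip.

```python
import yaml


def build_site_config(site_name, theme_name, theme_version, maintenance=False):
    """Assemble a configuration dictionary from a few inputs (hypothetical fields)."""
    return {
        "site": {"name": site_name, "maintenance_mode": maintenance},
        "admin": {"notifications": ["email"]},
        "theme": {"name": theme_name, "version": theme_version},
    }


site_config = build_site_config("My Site", "aurora", "1.2")

# Serialize to a string; pass a file object as the second argument to write a file.
yaml_text = yaml.safe_dump(site_config, sort_keys=False)
print(yaml_text)

# Loading the text back should reproduce the original structure,
# including the version, which stays a string rather than becoming a float.
round_tripped = yaml.safe_load(yaml_text)
```

Because the version is a Python string, PyYAML quotes it on output so that it is not read back as a number.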
Working with lists and nested data
YAML files often contain lists and deeply nested structures to represent items such as user roles, enabled features, plugins, or settings. PyYAML converts these YAML elements into Python lists and dictionaries, so you can work with them using standard Python operations.
Let’s extend the example by adding a list of plugins and nested theme settings in the YAML file.
Update your site_config.yaml with the following content:
Now update your app.py file to read and modify this nested structure:
In this code:
- config['site']['plugins'] accesses the list of plugin dictionaries.
- The for loop iterates through that list, reading each plugin’s name and enabled status.
- config['theme']['settings']['color'] = 'light' updates a nested dictionary value directly using key access.
- The append() method adds a new dictionary to the existing list of plugins.
- yaml.safe_dump() is used again to serialize the updated data structure and write it to a new YAML file.
This approach demonstrates how YAML's nested data structures map cleanly to Python’s native data types, enabling you to make modifications using familiar syntax. Whether you're toggling a feature flag or inserting a new configuration block, you’re just interacting with Python dictionaries and lists.
Run the script:
After running, open updated_site_config.yaml and you’ll see the modified color value and the new backup plugin added at the end of the list:
Understanding how to work with lists and nested data is essential for managing real-world YAML files, which often include multiple layers of configuration.
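The operations covered in this section can be sketched in a few lines. The YAML excerpt below is an assumption (the plugin names seo and analytics are made up), and it is parsed from a string rather than from site_config.yaml so the example runs without any files.

```python
import yaml

# Hypothetical excerpt; the real site_config.yaml may contain other keys.
SITE_YAML = """\
site:
  name: My Site
  plugins:
    - name: seo
      enabled: true
    - name: analytics
      enabled: false
theme:
  settings:
    color: dark
"""

config = yaml.safe_load(SITE_YAML)

# Collect the names of enabled plugins with a list comprehension.
enabled = [p["name"] for p in config["site"]["plugins"] if p["enabled"]]
print(enabled)

# Update a nested value and append a new entry to the list.
config["theme"]["settings"]["color"] = "light"
config["site"]["plugins"].append({"name": "backup", "enabled": True})

updated_text = yaml.safe_dump(config, sort_keys=False)
print(updated_text)
```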
Validating YAML with PyKwalify
When working with configuration files, it is important to verify that the structure and data types are correct before using them in your application. PyKwalify is a tool that allows you to validate YAML files using a schema written in YAML itself.
First, install PyKwalify from PyPI with pip install pykwalify.
Next, create a schema file named site_schema.yaml to define the expected structure:
This schema defines the required keys and their expected types in your configuration. It ensures:
- The top-level keys (site, admin, theme) are present.
- Nested fields like site.name and theme.version are strings.
- Booleans are used where expected (e.g., maintenance_mode, features.comments).
- Lists like plugins and notifications contain properly structured items.
With this schema in place, PyKwalify can verify that your YAML configuration file has the correct structure and data types before your code uses it.
Now, validate your YAML file using the following Python code in app.py:
Here is how it works:
- Core(source_file, schema_files) loads both your YAML data and its corresponding schema.
- core.validate() checks whether the YAML file meets the schema’s rules.
- If validation fails, PyKwalify raises an exception with a clear explanation of what went wrong.
Now run the file:
If your YAML file is missing required fields or contains unexpected keys, you'll see output like this:
To fix these validation errors, ensure your site_config.yaml file includes all required fields and doesn't have any keys not defined in the schema:
After correcting your YAML file, running the validation again should produce:
Using PyKwalify helps ensure your configuration files are complete, well-structured, and consistent with your application's expectations.
This validation step is especially valuable in larger projects, automated pipelines, or systems where incorrect configuration can lead to deployment issues or runtime failures.
Final thoughts
YAML provides a clear and structured format for configuration files, and Python makes it easy to work with that format using libraries like PyYAML and PyKwalify. With these tools, you can read, modify, create, and validate YAML data directly in your applications.
Adding validation catches missing keys and wrong data types before your configuration files are used, helping prevent errors in automated processes and deployments.
For more details, refer to the official PyYAML documentation and PyKwalify documentation.