YAML is a widely used format for configuration files due to its readable syntax and support for nested data through indentation. It’s ideal for defining settings, serializing data, and sharing structured information between systems.
Python provides reliable libraries that make it easy to read, write, and modify YAML files, helping you integrate configuration data directly into your applications.
This guide covers how to work with YAML in Python, starting with basic parsing and moving into more advanced data manipulation.
Prerequisites
Make sure you have Python 3.7 or later installed. You should know Python basics like dictionaries, lists, and file operations to get the most from this guide.
Getting started with YAML in Python
YAML support isn't included in Python's standard library. This guide uses the PyYAML package, which provides comprehensive functionality for YAML processing.
Working with YAML in Python is straightforward. You parse YAML files into Python objects, manipulate the data as needed, and then serialize it back to YAML when you want to save changes.
To begin, create a project directory:
mkdir yaml-python && cd yaml-python
Next, set up a virtual environment:
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
Install the PyYAML package:
pip install pyyaml
Now, create a file named app.py that you'll use throughout this tutorial.
Reading YAML files in Python
Loading data from YAML files is typically your first step when working with configurations. The PyYAML library makes this process straightforward.
First, create a sample YAML file named config.yaml with this data:
# Server configuration
server:
  host: 192.168.1.100
  port: 8080
  debug: true
  workers: 4
  allowed_origins:
    - https://admin.example.com
    - https://app.example.com

# Database settings
database:
  connection_string: "postgresql://user:password@localhost:5432/mydb"
  pool_size: 5
  timeout: 30

# Logging configuration
logging:
  level: INFO
  file: "/var/log/application.log"
  rotation: daily
Now add this code to your app.py file:
import yaml

# Reading a YAML file
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Print the whole configuration
print("Complete configuration:")
print(config)

# Access specific sections and values
print("\nServer configuration:")
print(f" Host: {config['server']['host']}")
print(f" Port: {config['server']['port']}")
print(f" Debug mode: {config['server']['debug']}")

# Access database settings
print("\nDatabase settings:")
print(f" Connection: {config['database']['connection_string']}")
print(f" Pool size: {config['database']['pool_size']}")
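The yaml.safe_load() function parses the YAML content and converts it into Python objects. It only constructs standard types such as dictionaries, lists, strings, and numbers, which makes it the safer choice over yaml.load() with an unrestricted loader when you don't fully control the file. YAML's structure naturally maps to Python dictionaries, lists, and scalar values.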
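Real configuration files are sometimes missing or malformed, so it can help to wrap the load step. The following is a minimal sketch that is separate from app.py; the `load_config` helper and the `broken.yaml` and `missing.yaml` file names are hypothetical, introduced only for illustration. It relies on `yaml.YAMLError`, the base class PyYAML uses for its parsing errors.

```python
import tempfile
from pathlib import Path

import yaml


def load_config(path):
    """Return the parsed YAML document, or None if it cannot be read or parsed.

    Hypothetical helper, not part of PyYAML.
    """
    try:
        with open(path, "r", encoding="utf-8") as file:
            return yaml.safe_load(file)
    except FileNotFoundError:
        print(f"Config file not found: {path}")
    except yaml.YAMLError as exc:
        # yaml.YAMLError is the base class for PyYAML's scanner and parser errors
        print(f"Could not parse {path}: {exc}")
    return None


# Try the helper on a valid file, a malformed file, and a missing file
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.yaml"
    good.write_text("port: 8080\n", encoding="utf-8")

    broken = Path(tmp) / "broken.yaml"
    broken.write_text("server: [unclosed\n", encoding="utf-8")  # list never closed

    good_result = load_config(good)
    bad_result = load_config(broken)
    missing_result = load_config(Path(tmp) / "missing.yaml")
```

Returning None keeps the sketch short; in an application you may prefer to re-raise or exit so that a bad configuration is never silently ignored.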
Run your script:
python app.py
Complete configuration:
{'server': {'host': '192.168.1.100', 'port': 8080, 'debug': True, 'workers': 4, 'allowed_origins': ['https://admin.example.com', 'https://app.example.com']}, 'database': {'connection_string': 'postgresql://user:password@localhost:5432/mydb', 'pool_size': 5, 'timeout': 30}, 'logging': {'level': 'INFO', 'file': '/var/log/application.log', 'rotation': 'daily'}}

Server configuration:
 Host: 192.168.1.100
 Port: 8080
 Debug mode: True

Database settings:
 Connection: postgresql://user:password@localhost:5432/mydb
 Pool size: 5
Notice how PyYAML automatically converts YAML's native data types to their Python equivalents: strings remain strings, numbers become integers or floats, and boolean values like true become Python's True.
This automatic type conversion means you can immediately use Python's native operations on the data without additional type casting or conversion steps.
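To see the conversion rules in isolation, the sketch below parses a small inline document and prints the Python type of each value. The keys in `sample` are made up for illustration and are not part of config.yaml.

```python
import yaml

# Illustrative document; none of these keys come from config.yaml
sample = """
name: web
port: 8080
ratio: 0.75
debug: true
tags: [a, b]
empty:
version: "1.10"
"""

data = yaml.safe_load(sample)

# Print each value alongside the Python type PyYAML chose for it
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Note the quoted `version` value: without the quotes, 1.10 would be read as the float 1.1, so quoting is the usual way to keep version-like values as strings.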
Modifying and writing YAML files
Once you've loaded a YAML file into Python, you can modify the data just like any regular Python dictionary or list. This allows you to update settings, add new values, or even generate entirely new configurations dynamically. After making changes, you can write the updated data back to a file using yaml.safe_dump().
Let’s modify the existing server configuration by changing the number of worker processes and adding a new setting under the logging section.
Extend your app.py with the following code:
import yaml

# Reading a YAML file
with open('config.yaml', 'r') as file:
...
print(f" Pool size: {config['database']['pool_size']}")

# Modify values
config['server']['workers'] = 6  # Increase worker count
config['logging']['retention_days'] = 7  # Add a new logging setting

# Write the updated config to a new YAML file
with open('updated_config.yaml', 'w') as file:
    yaml.safe_dump(config, file, sort_keys=False)
The yaml.safe_dump() function converts Python objects back into a properly formatted YAML string and writes it to the specified file.
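By default, yaml.safe_dump() sorts mapping keys alphabetically. The sort_keys=False argument keeps the dictionary's insertion order instead, so key order in the output matches the original structure, which is often important for readability in configuration files.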
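The effect of sort_keys is easiest to see side by side. This is a small sketch, assuming PyYAML 5.1 or later (the release that introduced the `sort_keys` argument); the `settings` dictionary is illustrative only.

```python
import yaml

# Illustrative data, deliberately not in alphabetical order
settings = {
    "server": {"port": 8080, "host": "localhost"},
    "database": {"pool_size": 5},
}

# Without a stream argument, safe_dump() returns the YAML text as a string
preserved = yaml.safe_dump(settings, sort_keys=False)
alphabetical = yaml.safe_dump(settings)  # default: keys are sorted

print("sort_keys=False:")
print(preserved)
print("default:")
print(alphabetical)
```

With sort_keys=False, `server` stays ahead of `database` and `port` ahead of `host`; with the default, both levels are reordered alphabetically.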
Run the script:
python app.py
This will create a new file named updated_config.yaml with the updated content:
server:
  host: 192.168.1.100
  port: 8080
  debug: true
  workers: 6
  allowed_origins:
  - https://admin.example.com
  - https://app.example.com
database:
  connection_string: postgresql://user:password@localhost:5432/mydb
  pool_size: 5
  timeout: 30
logging:
  level: INFO
  file: /var/log/application.log
  rotation: daily
  retention_days: 7
As you can see, the workers count has been updated, and a new retention_days field has been added under the logging section.
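The output above also shows something easy to miss: the comments from config.yaml are gone. PyYAML discards comments while parsing, so a load-modify-dump cycle cannot bring them back, which is one reason to write to a new file rather than overwrite a hand-maintained one. The sketch below demonstrates this on an inline string; the comment text is made up for illustration.

```python
import yaml

# Inline stand-in for a commented configuration file
original = """\
# Server configuration
server:
  workers: 4  # tuned for a small VM
"""

config = yaml.safe_load(original)
config["server"]["workers"] = 6

# Dump back to a string; comments were dropped at load time
rewritten = yaml.safe_dump(config, sort_keys=False)
print(rewritten)
```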
Creating YAML files from scratch
In some cases, you may need to generate a YAML file without starting from an existing one. This is common when exporting configuration settings, building templates, or generating output based on user input or API data.
Since YAML maps naturally to Python dictionaries and lists, you can define your data structure in code and write it directly to a file using yaml.safe_dump().
Clear the app.py contents and add the following code to the file:
import yaml

site_config = {
    'site': {
        'name': 'MyBlog',
        'url': 'https://myblog',
        'maintenance_mode': False,
        'features': {'comments': True, 'search': True, 'newsletter': False}
    },
    'admin': {
        'username': 'admin_user',
        'email': 'admin@mail.com',
        'notifications': ['errors', 'updates']
    },
    'theme': {
        'name': 'minimal',
        'version': '2.1.0',
        'custom_css': './themes/minimal/custom.css'
    }
}

with open('site_config.yaml', 'w') as file:
    yaml.safe_dump(site_config, file, sort_keys=False)

print("YAML file created.")
In this code, you define a structured configuration for a website using a Python dictionary. The data includes site settings, admin preferences, and theme customization options.
The yaml.safe_dump() function then serializes this structure and writes it to a file named site_config.yaml, preserving formatting and key order for readability.
Run the script:
python app.py
This will create a file named site_config.yaml with the following content:
site:
  name: MyBlog
  url: https://myblog
  maintenance_mode: false
  features:
    comments: true
    search: true
    newsletter: false
admin:
  username: admin_user
  email: admin@mail.com
  notifications:
  - errors
  - updates
theme:
  name: minimal
  version: 2.1.0
  custom_css: ./themes/minimal/custom.css
Defining configuration data in Python and writing it to a YAML file lets you generate clean, structured output programmatically.
This approach works well in automation workflows, deployment scripts, and any situation where configuration needs to be exported or shared.
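When the YAML is headed somewhere other than a file (an HTTP response, a log message, a template), you can leave out the stream argument and yaml.safe_dump() returns the text instead. The sketch below also passes `indent=4` to show one of the formatting options; the `site` dictionary is a trimmed, illustrative version of the configuration above.

```python
import yaml

# Trimmed, illustrative version of the site configuration
site = {
    "site": {"name": "MyBlog", "features": {"comments": True}},
    "admin": {"notifications": ["errors", "updates"]},
}

# No stream argument: the YAML document comes back as a string
text = yaml.safe_dump(site, sort_keys=False, indent=4)
print(text)

# Parsing the string again yields the same data structure
round_tripped = yaml.safe_load(text)
```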
Working with lists and nested data
YAML files often contain lists and deeply nested structures to represent items such as user roles, enabled features, plugins, or settings. PyYAML converts these YAML elements into Python lists and dictionaries, so you can work with them using standard Python operations.
Let’s extend the example by adding a list of plugins and nested theme settings in the YAML file.
Update your site_config.yaml with the following content:
site:
  name: MyBlog
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
    - name: analytics
      enabled: true
    - name: seo
      enabled: false
theme:
  name: minimal
  settings:
    color: dark
    font: sans-serif
Now update your app.py file to read and modify this nested structure:
import yaml

with open('site_config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Access a list of plugins
print("Installed plugins:")
for plugin in config['site']['plugins']:
    status = "enabled" if plugin['enabled'] else "disabled"
    print(f" - {plugin['name']} ({status})")

# Modify a nested value
config['theme']['settings']['color'] = 'light'

# Add a new plugin to the list
config['site']['plugins'].append({'name': 'backup', 'enabled': True})

# Write changes to a new file
with open('updated_site_config.yaml', 'w') as file:
    yaml.safe_dump(config, file, sort_keys=False)

print("Updated configuration saved.")
In this code:
- config['site']['plugins'] accesses the list of plugin dictionaries.
- The for loop iterates through that list, reading each plugin’s name and enabled status.
- config['theme']['settings']['color'] = 'light' updates a nested dictionary value directly using key access.
- The append() method adds a new dictionary to the existing list of plugins.
- yaml.safe_dump() is used again to serialize the updated data structure and write it to a new YAML file.
This approach demonstrates how YAML's nested data structures map cleanly to Python’s native data types, enabling you to make modifications using familiar syntax. Whether you're toggling a feature flag or inserting a new configuration block, you’re just interacting with Python dictionaries and lists.
Run the script:
python app.py
Installed plugins:
- analytics (enabled)
- seo (disabled)
Updated configuration saved.
After running, open updated_site_config.yaml and you’ll see the modified color value and the new backup plugin added at the end of the list:
site:
  name: MyBlog
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
  - name: analytics
    enabled: true
  - name: seo
    enabled: false
  - name: backup
    enabled: true
theme:
  name: minimal
  settings:
    color: light
    font: sans-serif
Understanding how to work with lists and nested data is essential for managing real-world YAML files, which often include multiple layers of configuration.
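Direct key access such as config['theme']['settings']['color'] raises a KeyError as soon as one level is missing, which is common with optional configuration sections. One way to guard against that is a small lookup helper. The sketch below is an assumption-laden illustration: `get_nested` is a hypothetical helper, not a PyYAML function, and the inline document stands in for a real file.

```python
import yaml

# Inline stand-in for a config file; note that theme has no "settings" key
document = """
site:
  plugins:
    - name: analytics
      enabled: true
theme:
  name: minimal
"""


def get_nested(data, *keys, default=None):
    """Walk dictionaries (by key) and lists (by index); return default if a step is missing.

    Hypothetical helper for illustration only.
    """
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
    return current


config = yaml.safe_load(document)

first_plugin = get_nested(config, "site", "plugins", 0, "name")
font = get_nested(config, "theme", "settings", "font", default="sans-serif")
print(first_plugin, font)
```

Here the plugin name is found by walking a dictionary, a list index, and another dictionary, while the missing theme.settings.font falls back to the supplied default instead of raising.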
Validating YAML with PyKwalify
When working with configuration files, it is essential to verify that the structure and data types are correct before your application uses them. PyKwalify is a library that validates YAML files against a schema that is itself written in YAML.
First, install PyKwalify:
pip install pykwalify
Next, create a schema file named site_schema.yaml to define the expected structure:
type: map
mapping:
  site:
    type: map
    mapping:
      name:
        type: str
        required: true
      url:
        type: str
        required: true
      maintenance_mode:
        type: bool
        required: true
      features:
        type: map
        required: true
        mapping:
          comments: {type: bool, required: true}
          search: {type: bool, required: true}
          newsletter: {type: bool, required: true}
      plugins:
        type: seq
        required: true
        sequence:
          - type: map
            mapping:
              name: {type: str, required: true}
              enabled: {type: bool, required: true}
  admin:
    type: map
    mapping:
      username: {type: str, required: true}
      email: {type: str, required: true}
      notifications:
        type: seq
        sequence:
          - type: str
  theme:
    type: map
    mapping:
      name: {type: str, required: true}
      version: {type: str, required: true}
      custom_css: {type: str, required: true}
This schema defines the expected keys and their types in your configuration. It ensures:
- Only the top-level keys site, admin, and theme are accepted. None of them is marked required: true, so a file that omits one of these sections (such as admin) still passes; add required: true to a section if it must be present.
- Nested fields like site.name and theme.version are strings.
- Booleans are used where expected (e.g., maintenance_mode, features.comments).
- Lists like plugins and notifications contain properly structured items.
With this schema in place, PyKwalify can verify that your YAML configuration file has the correct structure and data types before your code uses it.
Now, validate your YAML file using the following Python code in app.py:
from pykwalify.core import Core

# Validate site_config.yaml against the schema
core = Core(source_file="site_config.yaml", schema_files=["site_schema.yaml"])

try:
    core.validate()
    print("YAML file is valid.")
except Exception as e:
    print("Validation failed:")
    print(e)
Here is how it works:
- Core(source_file, schema_files) loads both your YAML data and its corresponding schema.
- core.validate() checks whether the YAML file meets the schema’s rules.
- If validation fails, PyKwalify raises an exception with a clear explanation of what went wrong.
Now run the file:
python app.py
If your YAML file is missing required fields or contains unexpected keys, you'll see output like this:
validation.invalid
--- All found errors ---
["Cannot find required key 'url'. Path: '/site'", "Cannot find required key 'maintenance_mode'. Path: '/site'", "Cannot find required key 'version'. Path: '/theme'", "Cannot find required key 'custom_css'. Path: '/theme'", "Key 'settings' was not defined. Path: '/theme'"]
Validation failed:
<SchemaError: error code 2: Schema validation failed:
- Cannot find required key 'url'. Path: '/site'.
- Cannot find required key 'maintenance_mode'. Path: '/site'.
- Cannot find required key 'version'. Path: '/theme'.
- Cannot find required key 'custom_css'. Path: '/theme'.
- Key 'settings' was not defined. Path: '/theme'.: Path: '/'>
To fix these validation errors, ensure your site_config.yaml file includes all required fields and doesn't have any keys not defined in the schema:
site:
  name: "MyBlog"
  url: "https://myblog.example.com"
  maintenance_mode: false
  features:
    comments: true
    search: true
    newsletter: false
  plugins:
    - name: "analytics"
      enabled: true
admin:
  username: "admin_user"
  email: "admin@example.com"
  notifications:
    - "errors"
theme:
  name: "minimal"
  version: "2.1.0"
  custom_css: "./themes/minimal/custom.css"
After correcting your YAML file, running the validation again should produce:
YAML file is valid.
Using PyKwalify helps ensure your configuration files are complete, well-structured, and consistent with your application's expectations.
This validation step is especially valuable in larger projects, automated pipelines, or systems where incorrect configuration can lead to deployment issues or runtime failures.
Final thoughts
YAML provides a clear and structured format for configuration files, and Python makes it easy to work with that format using libraries like PyYAML and PyKwalify. With these tools, you can read, modify, create, and validate YAML data directly in your applications.
Adding validation ensures that your configuration files are accurate before use, helping prevent errors in automated processes and deployments.
For more details, refer to the official PyYAML documentation and PyKwalify documentation.