Ansible has become one of the most popular tools for automation, configuration management, and infrastructure as code.
Its agentless architecture and YAML-based playbooks make it relatively easy to learn, but that doesn't mean you won't encounter errors.
In fact, as you progress from simple playbooks to more complex automation scenarios, troubleshooting becomes an increasingly important skill.
This article covers the most common errors you'll encounter when working with Ansible, why they occur, and how to fix them.
Understanding these common pitfalls will help you build more robust automation and save time when issues arise.
Syntax and YAML errors
YAML forms the foundation of Ansible playbooks. Its human-readable format makes playbooks easy to write, but its strict syntax rules can also lead to frustrating errors.
YAML indentation errors
Perhaps the most common error in Ansible is related to YAML indentation. YAML uses indentation to establish the structure and hierarchy of data. Unlike some languages where indentation is merely a matter of style, in YAML, indentation is syntactically significant.
Common error message:
...
Syntax Error while loading YAML.
mapping values are not allowed in this context
The error appears to be in '/home/ayo/dev/betterstack/demo/ansible-errors/playbook.yml': line 5, column 10, but may
be elsewhere in the file depending on the exact syntax problem.
...
Why it happens:
This error typically occurs when you mix tabs and spaces or use inconsistent indentation levels. YAML forbids tabs for indentation and requires sibling keys to line up at the same level.
How to fix it:
- Use spaces instead of tabs for indentation.
- Maintain consistent indentation (2 spaces is the common standard).
- Use a YAML validator or linter to check your files.
Here's an example of incorrect indentation:
- name: Install web server
  hosts: webservers
  tasks:
  - name: Install Apache
      apt:
        name: apache2
       state: present
  - name: Start Apache service
     service:
       name: apache2
        state: started
And here's the corrected version:
- name: Install web server
  hosts: webservers
  tasks:
    - name: Install Apache
      apt:
        name: apache2
        state: present
    - name: Start Apache service
      service:
        name: apache2
        state: started
To validate your YAML files, you can use the yamllint tool:
yamllint playbook.yml
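Ansible itself can also catch many of these problems before a run with its built-in syntax check (this validates playbook syntax only, not whether the tasks will succeed):
ansible-playbook playbook.yml --syntax-check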
Missing or invalid quotes
Another common syntax error involves string quoting, especially when your strings contain special characters.
Common error message:
ERROR! Syntax Error while loading YAML.
found unacceptable character #: mapping values are not allowed in this context
The error appears to be in '/path/to/playbook.yml': line 12, column 15, but may
be elsewhere in the file depending on the exact syntax problem.
Why it happens:
YAML requires quoting for strings that contain characters with special meaning, such as a colon followed by a space, a hash symbol preceded by a space (which starts a comment), or values that begin with characters like asterisks, brackets, or braces. Additionally, strings containing Jinja2 expressions often need quoting so YAML doesn't misinterpret the braces.
How to fix it:
- Quote strings containing special characters.
- Remember that single quotes keep the string literal, while double quotes also process escape sequences such as \n.
- Quote any value that begins with {{ so YAML doesn't parse it as an inline dictionary.
Here's an example that would cause errors:
- name: Configure application
  hosts: app_servers
  vars:
    app_config:
      url: http://example.com:8080  # Risky: colons in values are safer quoted
      comment: This server handles #1 priority tasks  # Problem: ' #' starts a YAML comment and truncates the value
      command: ls -la  # Safer to quote commands containing spaces and flags
  tasks:
    - name: Create configuration file
      template:
        src: config.j2
        dest: /etc/app/config.yaml
Corrected version:
- name: Configure application
  hosts: app_servers
  vars:
    app_config:
      url: "http://example.com:8080"
      comment: "This server handles #1 priority tasks"
      command: "ls -la"
  tasks:
    - name: Create configuration file
      template:
        src: config.j2
        dest: /etc/app/config.yaml
Jinja2 template syntax errors
Ansible uses Jinja2 templating extensively, which provides powerful capabilities but can also introduce errors, especially when mixing it with YAML syntax.
Common error message:
ERROR! template error while templating string: unexpected '{'. String: {{ item }}{{ ansible_facts['hostname'] }}
Why it happens:
These errors often occur due to missing spaces in Jinja2 expressions, incorrect filter syntax, or confusing Jinja2 with YAML syntax.
How to fix it:
- Ensure proper spacing in Jinja2 expressions ({{ variable }}, not {{variable}}).
- Use proper syntax for filters ({{ variable | filter }}).
- Properly quote templated strings in YAML.
Incorrect example:
- name: Configure hosts
  hosts: all
  tasks:
    - name: Create file with hostname
      file:
        path: /tmp/{{item}}{{ansible_facts['hostname']}}.txt
        state: touch
      loop:
        - server_
        - host_
Corrected version:
- name: Configure hosts
  hosts: all
  tasks:
    - name: Create file with hostname
      file:
        path: "/tmp/{{ item }}{{ ansible_facts['hostname'] }}.txt"
        state: touch
      loop:
        - server_
        - host_
For complex Jinja2 expressions, you can use the debug module to test your syntax:
- name: Debug Jinja2 expressions
  hosts: localhost
  vars:
    my_string: "Hello World"
    my_list: [1, 2, 3, 4, 5]
  tasks:
    - name: Test Jinja2 expression
      debug:
        msg: "{{ my_string | upper }} {{ my_list | sum }}"
Connection errors
Since Ansible operates by connecting to remote hosts, connection problems are a common source of errors. Understanding these issues is crucial for effective troubleshooting.
SSH connection failures
SSH is Ansible's primary method for connecting to managed nodes, and SSH-related issues are among the most common errors.
Common error message:
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.100 port 22: Connection timed out", "unreachable": true}
Why it happens: SSH connection failures can result from network connectivity issues, firewall configurations, incorrect credentials, SSH service not running, or incorrect SSH configuration.
How to fix it:
- Verify network connectivity with a ping test.
- Check that the SSH service is running on the target.
- Verify firewall rules allow SSH connections.
- Ensure SSH credentials are correct.
- Configure SSH options in ansible.cfg.
To test basic connectivity:
ping <ip_address>
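Beyond a raw ICMP check, you can confirm that Ansible can actually reach and authenticate to the hosts with its ping module. A quick sketch, assuming your inventory file is named inventory:
ansible all -i inventory -m ping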
You can configure SSH options in your ansible.cfg file:
[defaults]
inventory = ./inventory
remote_user = deploy
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ConnectTimeout=10
pipelining = True
For more verbose SSH debugging, increase Ansible's verbosity:
ansible-playbook playbook.yml -vvv
Privilege escalation errors
Ansible often needs to run commands with elevated privileges, which can lead to permission-related errors.
Common error message:
FAILED! => {"msg": "Missing sudo password"}
Or:
FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
Why it happens: These errors occur when Ansible attempts to execute tasks that require elevated privileges without providing the necessary sudo password or having passwordless sudo configured.
How to fix it:
- Use the --ask-become-pass option to prompt for the sudo password.
- Configure passwordless sudo on the target hosts.
- Specify the become method in your playbook.
- Store the sudo password securely using Ansible Vault (a sketch follows below).
Example playbook with become (sudo) configured:
- name: Configure system
  hosts: webservers
  become: true
  become_method: sudo
  become_user: root
  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - curl
          - python3
        state: present
To run the playbook with a sudo password prompt:
ansible-playbook playbook.yml --ask-become-pass
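If you'd rather not be prompted on every run, another option from the list above is keeping the password in an encrypted variables file with Ansible Vault. A minimal sketch, assuming a group_vars layout for the webservers group (the file path is just one possible location):
ansible-vault create group_vars/webservers/vault.yml
Inside the encrypted file, define the become password variable:
ansible_become_password: "your-sudo-password"
Then run the playbook with --ask-vault-pass (or --vault-password-file) so Ansible can decrypt it at runtime.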
Host key verification failures
SSH relies on host key verification for security, which can sometimes cause connection issues with Ansible.
Common error message:
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Host key verification failed.", "unreachable": true}
Why it happens: This error occurs when the SSH host key of the target system isn't in your known_hosts file, or when the host key has changed (which could indicate a potential security issue).
How to fix it:
- Manually connect to the host via SSH to add it to known_hosts.
- Disable host key checking in ansible.cfg (less secure but convenient for testing).
- Use ssh-keyscan to add the host key to known_hosts programmatically.
To disable host key checking for testing:
[defaults]
host_key_checking = False
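You can also disable it for a single run through an environment variable instead of editing ansible.cfg:
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook playbook.yml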
For a more secure approach, add the host key programmatically:
ssh-keyscan 192.168.1.100 >> ~/.ssh/known_hosts
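If you manage many hosts, you can also let Ansible record the keys itself with the known_hosts module. A sketch run against the control node, reusing the example address above (the ed25519 key type is an assumption; adjust it to whatever your hosts offer):
- name: Record host keys on the control node
  hosts: localhost
  tasks:
    - name: Add the target's host key to known_hosts
      known_hosts:
        name: 192.168.1.100
        key: "{{ lookup('pipe', 'ssh-keyscan -t ed25519 192.168.1.100') }}"
        state: present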
Inventory and variable errors
Properly managing inventory and variables is crucial for Ansible. Errors in these areas can be particularly confusing because they may not manifest until specific tasks are executed.
Undefined variables
Using a variable that hasn't been defined is a common error, especially in complex playbooks with multiple variable sources.
Common error message:
FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'app_version' is undefined"}
Why it happens: This error occurs when you reference a variable that hasn't been defined in any of Ansible's variable sources (playbook vars, inventory, group_vars, etc.) or when you misspell a variable name.
How to fix it:
- Check variable definitions across all relevant files.
- Use the default filter to provide fallback values.
- Use debug tasks to inspect variable content.
- Use ansible-inventory --list to see all inventory variables.
Example using the default filter:
- name: Deploy application
  hosts: app_servers
  tasks:
    - name: Create app directory
      file:
        path: "/opt/app/{{ app_version | default('latest') }}"
        state: directory
Adding debug tasks to inspect variables:
- name: Debug variables
  hosts: app_servers
  tasks:
    - name: Display all variables
      debug:
        var: hostvars[inventory_hostname]
    - name: Display specific variable
      debug:
        var: app_version
      ignore_errors: yes
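To fail fast with a clearer message than a generic undefined-variable error, you can also validate required variables up front with the assert module. A sketch, reusing app_version from the earlier example:
- name: Validate required variables
  hosts: app_servers
  tasks:
    - name: Ensure app_version is defined
      assert:
        that:
          - app_version is defined
        fail_msg: "app_version must be set in inventory, group_vars, or via --extra-vars"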
Inventory parsing issues
Problems with inventory file syntax can prevent Ansible from properly recognizing and connecting to hosts.
Common error message:
ERROR! Attempted to read "/path/to/inventory" as YAML: Syntax Error while loading YAML.
Or:
[WARNING]: Could not match supplied host pattern, ignoring: webservers
Why it happens: These errors occur due to syntax errors in inventory files, incorrect group definitions, or typos in host patterns.
How to fix it:
- Validate inventory syntax.
- Check group names and hierarchy.
- Use ansible-inventory --graph to visualize your inventory structure.
Example of a problematic inventory file:
[webservers]
web1.example.com
web2.example.com
[dbservers]
db1.example.com
db2.example.com
[production:children]
webservers
database # This should be dbservers
Corrected version:
[webservers]
web1.example.com
web2.example.com
[dbservers]
db1.example.com
db2.example.com
[production:children]
webservers
dbservers
To validate your inventory structure:
ansible-inventory --graph
You should see output like:
@all:
|--@dbservers:
| |--db1.example.com
| |--db2.example.com
|--@production:
| |--@dbservers:
| | |--db1.example.com
| | |--db2.example.com
| |--@webservers:
| | |--web1.example.com
| | |--web2.example.com
|--@ungrouped:
|--@webservers:
| |--web1.example.com
| |--web2.example.com
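To inspect the variables Ansible resolves for a single host, including those inherited from its groups, you can also run (web1.example.com is the example host from the inventory above):
ansible-inventory --host web1.example.com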
Variable precedence confusion
Ansible has a complex variable precedence system that can lead to unexpected values being used.
Common error message: There's rarely a specific error message for precedence issues. Instead, you'll notice variables having unexpected values.
Why it happens: Ansible has a specific order in which it processes variables from different sources. When the same variable is defined in multiple places, the value from the highest-precedence source wins.
How to fix it:
- Review Ansible's variable precedence documentation.
- Use debug tasks to find where variables are coming from.
- Use ansible-inventory --list to see all inventory variables.
- Consider where you define variables based on their scope and purpose.
Example debug task to help trace variable sources:
- name: Trace variable precedence
  hosts: app_servers
  vars:
    app_port: 8080
  tasks:
    - name: Show app_port from different sources
      debug:
        msg: |
          Playbook vars: {{ app_port }}
          Group vars: {{ hostvars[inventory_hostname].app_port | default('undefined') }}
          Host vars: {{ hostvars[inventory_hostname].app_port | default('undefined') }}
          Extra vars: {{ app_port }}
Run this with:
ansible-playbook trace_vars.yml --extra-vars "app_port=9000"
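In this run, the --extra-vars value of 9000 wins because extra vars sit at the top of the precedence order. If the same variable also lived in a group_vars file (a hypothetical path shown below), it would be overridden by both the playbook vars and the extra vars:
# group_vars/app_servers.yml (hypothetical)
app_port: 8081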
Module-specific errors
Ansible modules are the workhorses that perform actual tasks on managed nodes. Each module has its own set of potential errors.
Command/shell module failures
The command and shell modules are among the most commonly used, but they can fail for various reasons.
Common error message:
FAILED! => {"changed": true, "cmd": "service apache2 status", "delta": "0:00:00.005625", "end": "2023-04-11 14:30:12.125799", "msg": "non-zero return code", "rc": 3, "start": "2023-04-11 14:30:12.120174", "stderr": "", "stderr_lines": [], "stdout": "apache2 is not running", "stdout_lines": ["apache2 is not running"]}
Why it happens: Command module failures typically occur when the executed command exits with a non-zero status. This could be due to the command not existing, insufficient permissions, or the command itself failing.
How to fix it:
- Add ignore_errors: true for commands that may legitimately fail
- Use become: true for commands requiring elevated privileges
- Use the failed_when directive to customize failure conditions
- Consider using specialized modules instead of raw commands
Example with improved error handling:
- name: Check and restart services
  hosts: webservers
  become: true
  tasks:
    - name: Check if Apache is running
      command: systemctl status apache2
      register: apache_status
      ignore_errors: true
      changed_when: false  # Status check doesn't change anything
    - name: Restart Apache if not running
      service:
        name: apache2
        state: restarted
      when: apache_status.rc != 0
Using failed_when for custom failure conditions:
- name: Check disk space
  hosts: all
  tasks:
    - name: Get disk usage
      command: df -h /
      register: df_output
      changed_when: false
    - name: Parse disk usage
      set_fact:
        disk_usage_pct: "{{ df_output.stdout_lines[1].split()[4] | replace('%', '') }}"
    - name: Check if disk space is critical
      debug:
        msg: "Disk usage is {{ disk_usage_pct }}%"
      failed_when: disk_usage_pct | int > 90
Package management errors
Package installation issues are common, especially when dealing with different distributions or repository configurations.
Common error message:
FAILED! => {"changed": false, "msg": "No package matching 'apache2' available."}
Or:
FAILED! => {"changed": false, "msg": "Failed to update apt cache: E: Could not get lock /var/lib/apt/lists/lock"}
Why it happens: These errors occur when packages are not available in the configured repositories, repositories are not accessible, or there are locking issues with the package manager.
How to fix it:
- Verify package name and availability for the target distribution
- Ensure repositories are properly configured
- Update package cache before installation
- Handle lock files appropriately
Example with improved package handling:
- name: Install packages robustly
  hosts: webservers
  become: true
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600  # Use cached results if updated within the last hour
      when: ansible_os_family == "Debian"
    - name: Install Apache (Debian/Ubuntu)
      apt:
        name: apache2
        state: present
      when: ansible_os_family == "Debian"
    - name: Install Apache (RedHat/CentOS)
      yum:
        name: httpd
        state: present
      when: ansible_os_family == "RedHat"
For lock file issues, you can add retry logic:
- name: Install packages with retry
  hosts: all
  become: true
  tasks:
    - name: Install required packages
      apt:
        name: nginx
        state: present
        update_cache: yes
      register: apt_result
      retries: 5
      delay: 10
      until: apt_result is success
File operation errors
File operations can fail due to permissions, path issues, or disk space constraints.
Common error message:
FAILED! => {"changed": false, "msg": "Error creating file /etc/app/config.ini: [Errno 13] Permission denied: '/etc/app/config.ini'"}
Or:
FAILED! => {"changed": false, "msg": "Error creating directory /var/www/html/uploads: [Errno 2] No such file or directory: '/var/www/html/uploads'"}
Why it happens: File operation errors typically occur due to insufficient permissions, non-existent parent directories, or full filesystems.
How to fix it:
- Use become: true to get elevated privileges.
- Ensure parent directories exist (use state: directory with the file module).
- Check file ownership and permissions.
- Verify available disk space.
Example with robust file operations:
- name: Create application directories
  hosts: webservers
  become: true
  tasks:
    - name: Check available disk space
      command: df -h /var
      register: df_output
      changed_when: false
    - name: Ensure parent directory exists
      file:
        path: /var/www/myapp
        state: directory
        mode: '0755'
        owner: www-data
        group: www-data
    - name: Create nested directories
      file:
        path: /var/www/myapp/uploads/images
        state: directory
        mode: '0775'
        owner: www-data
        group: www-data
        recurse: yes  # Apply ownership/permissions recursively; missing parents are created by state: directory
For copying files with proper permissions:
- name: Copy configuration files
  hosts: app_servers
  become: true
  tasks:
    - name: Copy app configuration
      copy:
        src: files/app.conf
        dest: /etc/app/app.conf
        owner: app_user
        group: app_group
        mode: '0640'
        backup: yes  # Create backup of existing file
Performance and scalability issues
As your Ansible deployments grow, you may encounter performance issues that aren't strictly errors but can significantly impact usability.
Playbook execution timeouts
Long-running tasks can timeout, especially when Ansible's default timeouts are insufficient.
Common error message:
FAILED! => {"msg": "The async task did not complete within the requested time (300s)."}
Why it happens: Ansible has various timeout settings that can cause tasks to fail when they take too long to complete, such as large file transfers, database migrations, or package installations.
How to fix it:
- Use async/poll for long-running tasks.
- Adjust timeout settings in ansible.cfg.
- Break large tasks into smaller components.
Example using async for long-running tasks:
- name: Run long operations
  hosts: webservers
  become: true
  tasks:
    - name: Update all packages
      apt:
        upgrade: dist
        update_cache: yes
      async: 3600  # Allow this task to run for up to 1 hour
      poll: 30  # Check status every 30 seconds
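For jobs that run even longer, you can fire the task off without blocking and poll it separately with the async_status module. A sketch, where the migration script path is only a placeholder:
- name: Run a long migration without blocking
  hosts: webservers
  become: true
  tasks:
    - name: Start the migration in the background
      command: /usr/local/bin/run_migration.sh  # Placeholder script
      async: 3600  # Allow up to 1 hour
      poll: 0      # Don't wait; return immediately
      register: migration_job
    - name: Wait for the migration to finish
      async_status:
        jid: "{{ migration_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 120  # Check up to 120 times
      delay: 30     # 30 seconds apart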
Configure timeouts in ansible.cfg:
[defaults]
timeout = 60 # Default SSH timeout in seconds
[ssh_connection]
ssh_args = -o ConnectTimeout=60 -o ServerAliveInterval=30
Memory issues with large inventories
When working with large inventories, Ansible can consume significant memory, potentially causing performance problems or failures.
Common error message: Memory issues typically manifest as the ansible process being killed by the OS or extremely slow performance.
Why it happens: Ansible loads the entire inventory into memory and collects facts from all hosts by default. With large inventories, this can consume substantial memory resources.
How to fix it:
- Use fact caching to reduce repeated fact gathering
- Limit fact gathering when possible
- Use the --limit option to target specific hosts
- Break large playbooks into smaller components
Configure fact caching in ansible.cfg:
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching_timeout = 86400 # 24 hours in seconds
Limit fact gathering in playbooks:
- name: Minimal facts playbook
  hosts: all
  gather_facts: no  # Don't gather facts at all
  tasks:
    - name: Gather only the facts we need
      setup:
        gather_subset:
          - '!all'
          - '!min'
          - 'network'
          - 'hardware'
    - name: Display only network facts
      debug:
        var: ansible_default_ipv4
Parallelism problems
Ansible's parallel execution can sometimes lead to resource contention and unpredictable behavior.
Common error message: There's usually no specific error message, but you might see inconsistent results or timeouts when running against many hosts simultaneously.
Why it happens: By default, Ansible runs tasks in parallel across hosts. This can cause issues when tasks compete for resources or when order matters between different host groups.
How to fix it:
- Adjust the forks parameter to control parallelism
- Use the serial directive to limit simultaneous execution
- Apply throttling to resource-intensive tasks
Configure lower parallelism in ansible.cfg:
[defaults]
forks = 10 # Default is 5
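The same setting can be overridden for a single run from the command line:
ansible-playbook playbook.yml --forks 20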
Use serial execution for critical tasks:
- name: Update database servers in sequence
  hosts: db_servers
  serial: 1  # Run on one host at a time
  become: true
  tasks:
    - name: Stop database service
      service:
        name: postgresql
        state: stopped
    - name: Update database packages
      apt:
        name: postgresql
        state: latest
    - name: Start database service
      service:
        name: postgresql
        state: started
Using throttle for specific tasks:
- name: Run resource-intensive operations
  hosts: all
  become: true
  tasks:
    - name: Rebuild search index
      command: rebuild_search_index
      throttle: 3  # Only run on 3 hosts at a time
Final thoughts
Understanding common Ansible errors and their solutions is crucial for effective automation. By recognizing patterns in YAML syntax issues, connection failures, variable handling, and module-specific errors, you can quickly diagnose and resolve problems.
Thanks for reading!