Ansible has become a cornerstone technology for infrastructure automation due to its agentless architecture and declarative approach.
However, when playbooks fail or behave unexpectedly, troubleshooting can be challenging without the right techniques. Effective debugging is essential not just for resolving immediate issues, but for maintaining reliable, long-term automation solutions.
Common challenges in Ansible workflows include:
- Unpredictable task failures.
- Inconsistent behavior across environments.
- Variable interpolation issues.
- Connection and authentication problems.
- Complex dependencies between roles and tasks.
This guide aims to provide you with a systematic approach to debugging these issues, from basic techniques to advanced strategies and tools.
Understanding the Ansible execution flow
Before diving into specific debugging techniques, it's important to understand how Ansible processes playbooks, as this knowledge forms the foundation for effective troubleshooting.
Anatomy of an Ansible playbook
An Ansible playbook consists of one or more plays, each targeting specific hosts. Each play contains tasks that execute modules with specific parameters. When a playbook fails, knowing this hierarchy helps you pinpoint where the issue might be occurring.
Consider this simple example of a playbook structure:
```yaml
---
- name: First play
  hosts: webservers
  vars:
    http_port: 80
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Configure nginx site
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/sites-available/default
      notify: Restart nginx

  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```
When debugging, you need to consider which hosts are being targeted, what variables are available at each level of the playbook, how tasks depend on one another, and when handlers are triggered and executed. Each of these elements can be a potential source of issues.
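Before debugging a failure, it often helps to confirm what Ansible would actually target and run. Standard `ansible-playbook` flags can report this without executing anything:

```shell
# Validate playbook syntax without connecting to any host
ansible-playbook playbook.yml --syntax-check

# List the hosts that would be targeted and the tasks that would run
ansible-playbook playbook.yml --list-hosts --list-tasks
```

This is a quick way to catch targeting mistakes, such as an empty host pattern or a task that never appears in the run at all.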
How Ansible processes tasks and handles errors
Ansible processes tasks sequentially for each host in the inventory. By default, if a task fails on a particular host, that host is removed from the remainder of the play. This behavior can be modified with settings like `ignore_errors: yes` or `any_errors_fatal: true`.
The typical task execution proceeds through several phases. First, Ansible gathers facts about the target system to provide context for the task. Next, it checks if conditions for the task are met using the `when` clause if present. Then, it executes the module with the provided parameters and determines if changes were made, assigning either a `changed` or `ok` status. The output is registered for potential use in later tasks, and failures are handled according to configured error policies.
Understanding this flow is crucial because different phases can fail for different reasons. For instance, a task might fail during the condition check due to an undefined variable, or during module execution due to permission issues on the remote system.
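To guard against failures during the condition-evaluation phase specifically, a `when` clause can check that a variable exists before comparing it. A minimal sketch (the `app_mode` variable here is hypothetical):

```yaml
- name: Deploy only when explicitly enabled
  ansible.builtin.debug:
    msg: "Deploying in {{ app_mode }} mode"
  # Testing "is defined" first prevents an undefined-variable error
  # while the when condition itself is being evaluated.
  when: app_mode is defined and app_mode == "production"
```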
The importance of idempotency in debugging contexts
Idempotency—the property where repeated executions produce the same result—is central to Ansible's design. During debugging, understanding idempotency helps identify why certain tasks run differently on subsequent executions.
A well-written task should make changes only when necessary, report "changed" only when actual changes occur, and produce consistent results given the same inputs. Non-idempotent tasks can create frustrating debugging scenarios where problems appear intermittently or only during specific runs.
For example, a task that uses the `command` module to run a script without proper `creates` or `removes` parameters will run every time, regardless of whether it needs to. This can mask issues and make debugging more difficult. In contrast, using the appropriate module (like `file` or `template`) ensures that Ansible only takes action when the current state doesn't match the desired state.
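As a sketch of the difference, the `creates` argument makes an otherwise non-idempotent `command` task skip itself once its marker file exists (the script and marker paths here are illustrative):

```yaml
- name: Run setup script only once
  ansible.builtin.command: /opt/app/setup.sh
  args:
    # If this file already exists, the task is skipped and reports "ok"
    # instead of re-running the script and reporting "changed".
    creates: /opt/app/.setup_done
```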
Essential debugging techniques
Ansible offers several built-in features that are invaluable for troubleshooting. Mastering these techniques will significantly reduce your debugging time.
Using increased verbosity
Ansible's verbosity options provide increasingly detailed information about execution. You can use one to four v's depending on how much information you need:
```shell
ansible-playbook playbook.yml -v
```
Each level provides additional information about what's happening during execution. With a single `-v`, you'll see the results of each task, such as what changes were made and return values. With `-vv`, you'll also see task configuration details, including how variables were interpolated. The `-vvv` level adds connection information, showing how Ansible connects to remote hosts. Finally, `-vvvv` includes low-level SSH connection debugging, which is invaluable for connection problems.
Here's what the output might look like at the first verbosity level:
```text
TASK [Ensure nginx is installed] ***********************************************
changed: [web01] => {"changed": true, "name": "nginx", "state": "present"}
```

And with increased verbosity (`-vvv`), you'd see much more detail:

```text
TASK [Ensure nginx is installed] ***********************************************
<web01> ESTABLISH SSH CONNECTION FOR USER: ansible
<web01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no ansible@192.168.1.10 '/bin/sh -c '"'"'echo ~ansible && sleep 0'"'"''
<web01> (255, b'', b'Permission denied (publickey).\r\n')
fatal: [web01]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host..."}
```
This information can quickly reveal issues like SSH connection problems or module execution errors. The verbosity flags are likely the first tool you'll reach for when something goes wrong, as they provide immediate insight without modifying your playbooks.
Leveraging the `--step` option
The `--step` option allows you to execute a playbook task by task, confirming each step before proceeding. This is particularly useful when you want to observe how each task affects the system:
```shell
ansible-playbook playbook.yml --step
```
This produces prompts between each task:
```text
TASK [Ensure nginx is installed] ***********************************************
Perform task? (y/n/c): y
changed: [web01]

TASK [Configure nginx site] ****************************************************
Perform task? (y/n/c):
```
You have three response options: `y` (yes) to execute the current task, `n` (no) to skip the current task, or `c` (continue) to proceed without further prompts.
This technique is particularly useful for identifying which specific task causes a failure, testing changes to a playbook without running the entire workflow, or simply learning how a complex playbook functions.
The `--step` option gives you fine-grained control over execution, allowing you to pause before potentially problematic tasks or skip tasks that you know aren't relevant to the issue you're investigating. It's like having a debugger's "step through" function for your infrastructure code.
Implementing the `debug` module effectively
The `debug` module is one of the most useful tools for inspecting variables and expressions during playbook execution. It allows you to print values to the console without making any changes to the target systems:
```yaml
- name: Debug variable values
  hosts: webservers
  vars:
    webapp_port: 8080
    # "environment" is a reserved keyword in Ansible, so use a
    # distinct name for the deployment-environment variable.
    deploy_env: "{{ lookup('env', 'DEPLOY_ENV') | default('development', true) }}"
  tasks:
    - name: Display all variables
      ansible.builtin.debug:
        var: hostvars[inventory_hostname]
        verbosity: 2

    - name: Check specific variable
      ansible.builtin.debug:
        msg: "Web application will run on port {{ webapp_port }} in {{ deploy_env }} environment"

    - name: Complex expression evaluation
      ansible.builtin.debug:
        msg: "Config file should be at {{ '/etc/' + deploy_env + '/app.conf' }}"
```
The `debug` module serves a central purpose in troubleshooting: it displays the value of variables or the result of Jinja2 expressions, helping you verify that they contain what you expect.
The `verbosity` parameter is particularly useful, as it allows you to leave debug tasks in your playbooks that only execute when running with the corresponding verbosity level. This means you can build debugging into your playbooks without cluttering standard output during normal runs.
Working with `register` and `when` for conditional debugging
Combining `register` with conditional execution provides a powerful debugging technique. The `register` directive captures the output of a task, allowing you to inspect it and make decisions based on the results:
```yaml
- name: Conditionally debug based on task results
  hosts: webservers
  tasks:
    - name: Check if config file exists
      ansible.builtin.stat:
        path: /etc/nginx/sites-available/default
      register: config_file

    - name: Show config file details
      ansible.builtin.debug:
        msg: "Config file exists: {{ config_file.stat.exists }}, Size: {{ config_file.stat.size }}"
      when: config_file.stat.exists

    - name: Show error if file missing
      ansible.builtin.debug:
        msg: "WARNING: Config file does not exist!"
      when: not config_file.stat.exists
```
This approach lets you capture and inspect the results of operations, make debugging conditional on specific situations, create detailed audit trails of complex operations, and build self-diagnosing playbooks that report their own issues.
The content of `register` variables often contains detailed information beyond what's displayed in the standard output. For instance, a registered result from the `command` module will include the return code, standard output, and standard error, giving you complete visibility into what happened during execution.
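Those registered fields can also drive custom success and failure logic. As a sketch (the health-check script path is hypothetical), `failed_when` and `changed_when` let you interpret the return code and output yourself:

```yaml
- name: Run application health check
  ansible.builtin.command: /usr/local/bin/healthcheck.sh
  register: health
  # A registered command result exposes .rc, .stdout, and .stderr.
  # A read-only check should never report "changed".
  changed_when: false
  # Fail on a non-zero exit code or an error string in the output.
  failed_when: health.rc != 0 or "ERROR" in health.stdout
```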
Advanced debugging strategies
As your Ansible infrastructure grows in complexity, you'll need more sophisticated debugging approaches to handle intricate issues.
Using ansible.cfg configuration for debugging
The Ansible configuration file provides several options to enhance debugging capabilities. By customizing these settings, you can gain more insight into playbook execution:
```ini
[defaults]
# Increase timeout for slow-responding hosts
timeout = 60

# Enable task profiling to identify slow tasks
# (on Ansible < 2.11 this setting is called callback_whitelist)
callbacks_enabled = profile_tasks

# Improve error display
stdout_callback = yaml
display_skipped_hosts = True
display_args_to_stdout = True

# Don't abort the whole run on the first host failure
any_errors_fatal = False

[ssh_connection]
# Keep SSH connections open between tasks for debugging
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
```
These settings provide several debugging advantages. The `yaml` callback formats output to be more readable, making complex return values easier to understand. The `profile_tasks` callback shows execution time for each task, helping identify performance bottlenecks. Displaying passed arguments makes it easier to verify what values are being used, while connection settings can be tuned to provide more reliable access to remote systems for debugging.
Configuration changes can be made at different levels: system-wide in `/etc/ansible/ansible.cfg`, per-user in `~/.ansible.cfg`, or per-project in a local `ansible.cfg` file. This flexibility allows you to have different settings for different environments, such as more verbose output for development but more concise logs in production.
Implementing custom callback plugins
For advanced debugging, custom callback plugins can capture and display information in formats tailored to your needs. Callbacks intercept specific events during playbook execution and can process them in custom ways:
```python
from ansible.plugins.callback import CallbackBase


class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'debug_logger'

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.task_ok_counter = 0
        self.task_failed_counter = 0

    def v2_runner_on_ok(self, result):
        self.task_ok_counter += 1
        self._display.display(
            f"SUCCESS: Task '{result._task.name}' on {result._host.name} [{self.task_ok_counter}]"
        )

    def v2_runner_on_failed(self, result, ignore_errors=False):
        self.task_failed_counter += 1
        self._display.display(
            f"FAILURE: Task '{result._task.name}' on {result._host.name} [{self.task_failed_counter}]"
        )
        self._display.display(f"Error: {result._result.get('msg', 'No error message')}")
```
This simple callback plugin counts successful and failed tasks, displaying a custom message for each. More sophisticated plugins could send notifications to external systems, log detailed information to files, or format output in custom ways for easier analysis.
To use a custom callback, place it in a `callback_plugins` directory in your Ansible project and enable it in your configuration. Callbacks can respond to various events like playbook start and end, task execution, or host unreachable notifications, giving you complete visibility into the Ansible execution lifecycle.
Creating targeted test playbooks
For complex problems, creating focused test playbooks can isolate issues without the complexity of full production playbooks:
```yaml
- name: Isolated test of problematic task
  hosts: problem_host
  gather_facts: no
  tasks:
    - name: Show environment
      ansible.builtin.debug:
        msg: "Testing on {{ inventory_hostname }} with ansible_connection={{ ansible_connection | default('undefined') }}"

    - name: Run isolated version of problem task
      ansible.builtin.template:
        src: problem_template.j2
        dest: /tmp/test_output.conf
      register: test_result

    - name: Show detailed results
      ansible.builtin.debug:
        var: test_result
        verbosity: 1
```
This approach lets you focus on a single host or task, remove dependencies that might mask the real problem, and gather comprehensive information about a specific operation. By simplifying the context, you can more easily identify what's causing an issue without the noise of an entire complex playbook.
Test playbooks are especially useful when you encounter intermittent issues that are difficult to reproduce. By creating a minimal reproduction case, you can run it repeatedly until the issue occurs, then examine the conditions in detail.
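For even quicker isolation, an ad-hoc `ansible` command can exercise a single module against one host without any playbook at all (the template paths here are illustrative):

```shell
# Run just the template module against the problem host, with full verbosity
ansible problem_host -m ansible.builtin.template \
  -a "src=problem_template.j2 dest=/tmp/test_output.conf" -vvv
```

This strips the reproduction down to a single module invocation, which makes it easy to rerun while you vary one input at a time.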
Utilizing check mode effectively
Ansible's check mode (`--check`) predicts changes without making them, which is valuable for debugging:
```shell
ansible-playbook playbook.yml --check --diff
```
The `--diff` flag enhances check mode by showing exactly what changes would be made to files. This combination allows you to see exactly what changes Ansible would make without actually applying them, which is invaluable for verifying that a playbook will do what you expect.
You can also control check mode behavior within individual tasks:
```yaml
- name: Demo check mode control
  hosts: webservers
  tasks:
    - name: Task that always runs, even in check mode
      ansible.builtin.command: hostname
      check_mode: false
      register: hostname_result

    - name: Task that reports a change even in check mode
      ansible.builtin.debug:
        msg: "This would create a new config"
      changed_when: true

    - name: Task that always runs in check mode (never changes the system)
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      check_mode: true
      diff: true
```
This technique helps identify which tasks would make changes before running a full playbook. It's particularly useful for verifying changes in sensitive environments where downtime must be minimized.
Check mode doesn't work perfectly with all modules, particularly those that execute commands or scripts. Some modules might need to make actual connections or queries to determine what would change. Understanding these limitations is important when using check mode for debugging.
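One way to work around those limitations is the `ansible_check_mode` magic variable, which lets a task branch on whether the run is in check mode:

```yaml
- name: Validate nginx configuration
  ansible.builtin.command: nginx -t
  # ansible_check_mode is true when running with --check, so this
  # command task is skipped instead of producing a misleading result.
  when: not ansible_check_mode
  changed_when: false
```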
Developing debugging strategies for complex roles
Roles introduce another layer of complexity for debugging. Effective approaches include enabling conditional debugging outputs within roles:
```yaml
- name: Play with role debugging
  hosts: webservers
  vars:
    nginx_role_debug: true
  roles:
    - role: nginx
```
Within the role itself, you can add debug tasks that activate conditionally:
```yaml
- name: Show role variables
  ansible.builtin.debug:
    msg: |
      Port: {{ nginx_port | default('80') }}
      Document root: {{ nginx_docroot | default('/var/www/html') }}
      Worker processes: {{ nginx_workers | default('auto') }}
  when: nginx_role_debug | default(false)

- name: Include installation tasks
  ansible.builtin.include_tasks: install.yml
```
This approach lets you toggle debugging output for roles without modifying the role files themselves. The `nginx_role_debug` variable acts as a switch that can be set at the playbook level, making it easy to enable verbose output when needed and disable it during normal operation.
Another powerful technique is using tags to selectively run parts of complex roles:
```shell
ansible-playbook playbook.yml --tags nginx-config
```
By tagging different sections of your roles, you can focus on specific functionality during debugging. This is particularly valuable in complex roles with many tasks, as it lets you isolate the specific area where issues are occurring.
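For this to work, the role's tasks need tags applied. A sketch of how a `nginx-config` tag might be attached inside the role (task names are illustrative):

```yaml
- name: Deploy nginx site configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/sites-available/default
  tags:
    - nginx-config

- name: Install nginx packages
  ansible.builtin.package:
    name: nginx
    state: present
  tags:
    - nginx-install
```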
Integration with logging systems
For production environments, integrating Ansible with centralized logging provides valuable debugging information even after playbook execution has completed:
```yaml
- name: Playbook with centralized logging
  hosts: all
  vars:
    log_server: "logs.example.com"
  pre_tasks:
    - name: Record playbook start
      # Post a JSON event to an HTTP log collector
      # (for example, a Logstash HTTP input listening on port 5000)
      ansible.builtin.uri:
        url: "http://{{ log_server }}:5000"
        method: POST
        body_format: json
        body:
          event: "playbook_start"
          playbook: "{{ ansible_play_name }}"
          hosts: "{{ ansible_play_hosts | join(',') }}"
      delegate_to: localhost
      run_once: true
  tasks:
    - name: Sample task
      ansible.builtin.debug:
        msg: "Running task"
  post_tasks:
    - name: Record playbook completion
      ansible.builtin.uri:
        url: "http://{{ log_server }}:5000"
        method: POST
        body_format: json
        body:
          event: "playbook_complete"
          playbook: "{{ ansible_play_name }}"
      delegate_to: localhost
      run_once: true
```
This approach creates audit trails for automation that can be searched and analyzed later. It centralizes debugging information from multiple playbook runs, enabling correlation with other system events. It also provides historical data for troubleshooting, which is invaluable when issues occur infrequently or are related to specific environmental conditions.
Centralized logging becomes increasingly important as your Ansible usage scales. When multiple teams run playbooks across numerous systems, having a central repository of execution information helps identify patterns and common issues that might not be apparent from individual runs.
Final thoughts
Effective Ansible debugging is a blend of art and science, combining technical tools with systematic problem-solving approaches.
Throughout this guide, we've explored strategies ranging from basic verbosity increases to sophisticated custom plugins and comprehensive testing frameworks.
The key to successful debugging lies in understanding Ansible's execution model, leveraging the right tools for each situation, and developing a systematic approach to isolating and resolving issues.
Thanks for reading!