# Debugging Ansible Workflows: A Comprehensive Guide

[Ansible](https://betterstack.com/community/guides/linux/ansible-getting-started/) has become a cornerstone technology for
infrastructure automation due to its agentless architecture and declarative
approach.

However, when playbooks fail or behave unexpectedly, troubleshooting can be
challenging without the right techniques. Effective debugging is essential not
just for resolving immediate issues, but for maintaining reliable, long-term
automation solutions.

Common challenges in Ansible workflows include:

- Unpredictable task failures.
- Inconsistent behavior across environments.
- Variable interpolation issues.
- Connection and authentication problems.
- Complex dependencies between roles and tasks.

This guide aims to provide you with a systematic approach to debugging these
issues, from basic techniques to advanced strategies and tools.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/iq94jL3t8gs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

## Understanding the Ansible execution flow

Before diving into specific debugging techniques, it's important to understand
how Ansible processes playbooks, as this knowledge forms the foundation for
effective troubleshooting.

### Anatomy of an Ansible playbook

An Ansible playbook consists of one or more plays, each targeting specific
hosts. Each play contains tasks that execute modules with specific parameters.
When a playbook fails, knowing this hierarchy helps you pinpoint where the issue
might be occurring.

Consider this simple example of a playbook structure:

```yaml
- name: First play
  hosts: webservers
  vars:
    http_port: 80
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Configure nginx site
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/sites-available/default
      notify: Restart nginx
  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

When debugging, you need to consider which hosts are being targeted, what
variables are available at each level of the playbook, how tasks depend on one
another, and when handlers are triggered and executed. Each of these elements
can be a potential source of issues.

### How Ansible processes tasks and handles errors

Ansible processes tasks sequentially for each host in the inventory. By default,
if a task fails on a particular host, that host is removed from the remainder of
the play. This behavior can be modified with settings like `ignore_errors: yes`
or `any_errors_fatal: true`.
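
As a sketch, these two controls look like this in a play (the command paths here are placeholders, not real scripts):

```yaml
- name: Demonstrate error handling controls
  hosts: webservers
  # Abort the entire play on all hosts if any host fails a task
  any_errors_fatal: true
  tasks:
    - name: This failure is tolerated and the play continues
      ansible.builtin.command: /usr/bin/might-fail   # hypothetical path
      ignore_errors: true

    - name: A failure here stops the play everywhere
      ansible.builtin.command: /usr/bin/must-succeed   # hypothetical path
```

`ignore_errors` still reports the failure in the output, so you keep the diagnostic information while allowing the run to proceed.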

Task execution proceeds through several phases. At the start of each play, Ansible gathers facts about the target systems (unless fact gathering is disabled) to provide context for subsequent tasks. For each task, it then evaluates any `when` conditions to decide whether the task should run.

Next, it executes the module with the provided parameters and determines whether changes were made, assigning either a `changed` or `ok` status. The output is registered for potential use in later tasks, and failures are handled according to the configured error policies.

Understanding this flow is crucial because different phases can fail for
different reasons. For instance, a task might fail during the condition check
due to an undefined variable, or during module execution due to permission
issues on the remote system.

### The importance of idempotency in debugging contexts

Idempotency—the property where repeated executions produce the same result—is
central to Ansible's design. During debugging, understanding idempotency helps
identify why certain tasks run differently on subsequent executions.

A well-written task should make changes only when necessary, report "changed"
only when actual changes occur, and produce consistent results given the same
inputs. Non-idempotent tasks can create frustrating debugging scenarios where
problems appear intermittently or only during specific runs.

For example, a task that uses the `command` module to run a script without
proper `creates` or `removes` parameters will run every time, regardless of
whether it needs to. This can mask issues and make debugging more difficult. In
contrast, using the appropriate module (like `file` or `template`) ensures that
Ansible only takes action when the current state doesn't match the desired
state.
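
For instance, guarding a `command` task with `creates` restores idempotency. In this sketch, the script path and marker file are illustrative:

```yaml
- name: Non-idempotent, runs on every execution
  ansible.builtin.command: /usr/local/bin/setup-app.sh   # hypothetical script

- name: Idempotent, skipped once the marker file exists
  ansible.builtin.command: /usr/local/bin/setup-app.sh
  args:
    creates: /etc/app/.setup-complete   # hypothetical marker written by the script
```

On the second run, the first task reports `changed` again even though nothing happened, while the second reports `ok` and is skipped entirely.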

## Essential debugging techniques

Ansible offers several built-in features that are invaluable for
troubleshooting. Mastering these techniques will significantly reduce your
debugging time.

### Using increased verbosity

Ansible's verbosity options provide increasingly detailed information about
execution. You can use one to four v's depending on how much information you
need:

```text
[command]
ansible-playbook playbook.yml -v
```

Each level provides additional information about what's happening during
execution. With a single `-v`, you'll see the results of each task, such as what
changes were made and return values.

With `-vv`, you'll also see task configuration details, including how variables
were interpolated. The `-vvv` level adds connection information, showing how
Ansible connects to remote hosts. Finally, `-vvvv` includes low-level SSH
connection debugging, which is invaluable for connection problems.

Here's what the output might look like at the first verbosity level:

```text
[output]
TASK [Ensure nginx is installed] ***********************************************
changed: [web01] => {"changed": true, "name": "nginx", "state": "present"}
```

And with increased verbosity (`-vvv`), you'd see much more detail:

```text
[output]
TASK [Ensure nginx is installed] ***********************************************
<web01> ESTABLISH SSH CONNECTION FOR USER: ansible
<web01> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no ansible@192.168.1.10 '/bin/sh -c '"'"'echo ~ansible && sleep 0'"'"''
<web01> (255, b'', b'Permission denied (publickey).\r\n')
fatal: [web01]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host..."}
```

This information can quickly reveal issues like SSH connection problems or
[module execution errors](https://betterstack.com/community/guides/linux/ansible-errors/). The verbosity flags are likely the
first tool you'll reach for when something goes wrong, as they provide immediate
insight without modifying your playbooks.

### Leveraging the `--step` option

The `--step` option allows you to execute a playbook task by task, confirming
each step before proceeding. This is particularly useful when you want to
observe how each task affects the system:

```text
[command]
ansible-playbook playbook.yml --step
```

This produces prompts between each task:

```text
[output]
TASK [Ensure nginx is installed] ***********************************************
Perform task? (y/n/c): y
changed: [web01]

TASK [Configure nginx site] ****************************************************
Perform task? (y/n/c):
```

You have three response options: `y` (yes) to execute the current task, `n` (no)
to skip the current task, or `c` (continue) to proceed without further prompts.

This technique is particularly useful for identifying which specific task causes
a failure, testing changes to a playbook without running the entire workflow, or
simply learning how a complex playbook functions.

The `--step` option gives you fine-grained control over execution, allowing you
to pause before potentially problematic tasks or skip tasks that you know aren't
relevant to the issue you're investigating. It's like having a debugger's "step
through" function for your infrastructure code.

### Implementing the `debug` module effectively

The `debug` module is one of the most useful tools for inspecting variables and
expressions during playbook execution. It allows you to print values to the
console without making any changes to the target systems:

```yaml
- name: Debug variable values
  hosts: webservers
  vars:
    webapp_port: 8080
    # "environment" is a reserved play keyword, so use a different variable name
    deploy_env: "{{ lookup('env', 'DEPLOY_ENV') | default('development', true) }}"
  tasks:
    - name: Display all variables
      ansible.builtin.debug:
        var: hostvars[inventory_hostname]
        verbosity: 2
    - name: Check specific variable
      ansible.builtin.debug:
        msg: "Web application will run on port {{ webapp_port }} in {{ deploy_env }} environment"
    - name: Complex expression evaluation
      ansible.builtin.debug:
        msg: "Config file should be at {{ '/etc/' + deploy_env + '/app.conf' }}"
```

The `debug` module displays the value of variables and the results of Jinja2 expressions, helping you verify that they contain what you expect before later tasks depend on them.

The `verbosity` parameter is particularly useful, as it allows you to leave
debug tasks in your playbooks that only execute when running with the
corresponding verbosity level. This means you can build debugging into your
playbooks without cluttering standard output during normal runs.

### Working with `register` and `when` for conditional debugging

Combining `register` with conditional execution provides a powerful debugging
technique. The `register` directive captures the output of a task, allowing you
to inspect it and make decisions based on the results:

```text
[label register_debug.yml]
- name: Conditionally debug based on task results
  hosts: webservers
  tasks:
    - name: Check if config file exists
      ansible.builtin.stat:
        path: /etc/nginx/sites-available/default
      register: config_file

    - name: Show config file details
      ansible.builtin.debug:
        msg: "Config file exists: {{ config_file.stat.exists }}, Size: {{ config_file.stat.size }}"
      when: config_file.stat.exists

    - name: Show error if file missing
      ansible.builtin.debug:
        msg: "WARNING: Config file does not exist!"
      when: not config_file.stat.exists
```

This approach lets you capture and inspect the results of operations, make
debugging conditional on specific situations, create detailed audit trails of
complex operations, and build self-diagnosing playbooks that report their own
issues.

The content of `register` variables often contains detailed information beyond
what's displayed in the standard output. For instance, a registered result from
the `command` module will include the return code, standard output, and standard
error, giving you complete visibility into what happened during execution.
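
For example, registering the result of a `command` task exposes those fields directly (the command here is just illustrative):

```yaml
- name: Check service status
  ansible.builtin.command: systemctl is-active nginx
  register: nginx_status
  ignore_errors: true   # a stopped service returns a non-zero exit code

- name: Inspect the registered result
  ansible.builtin.debug:
    msg: "rc={{ nginx_status.rc }}, stdout={{ nginx_status.stdout }}, stderr={{ nginx_status.stderr }}"
```

Running `debug` with `var: nginx_status` instead would dump every field of the registered result, which is often the quickest way to discover what data is available.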

## Advanced debugging strategies

As your Ansible infrastructure grows in complexity, you'll need more
sophisticated debugging approaches to handle intricate issues.

### Using ansible.cfg configuration for debugging

The Ansible configuration file provides several options to enhance debugging
capabilities. By customizing these settings, you can gain more insight into
playbook execution:

```text
[label ansible.cfg]
[defaults]
# Increase timeout for slow-responding hosts
timeout = 60

# Enable task profiling to identify slow tasks
# (use callback_whitelist instead on Ansible < 2.11)
callbacks_enabled = profile_tasks

# Improve error display
stdout_callback = yaml
display_skipped_hosts = True
display_args_to_stdout = True

# Keep all hosts in the play after individual failures
any_errors_fatal = False

[ssh_connection]
# Keep SSH connections for debugging
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
```

These settings provide several debugging advantages. The `yaml` callback formats
output to be more readable, making complex return values easier to understand.
The `profile_tasks` callback shows execution time for each task, helping
identify performance bottlenecks. Displaying passed arguments makes it easier to
verify what values are being used, while connection settings can be tuned to
provide more reliable access to remote systems for debugging.

Configuration changes can be made at different levels: system-wide in
`/etc/ansible/ansible.cfg`, per-user in `~/.ansible.cfg`, or per-project in a
local `ansible.cfg` file. This flexibility allows you to have different settings
for different environments, such as more verbose output for development but more
concise logs in production.
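
Most settings can also be overridden per run with environment variables, which is convenient for one-off debugging sessions without touching any config file:

```text
[command]
ANSIBLE_STDOUT_CALLBACK=yaml ANSIBLE_DISPLAY_SKIPPED_HOSTS=true ansible-playbook playbook.yml
```

This applies the readable YAML output format for a single invocation only, leaving your project's `ansible.cfg` unchanged.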

### Implementing custom callback plugins

For advanced debugging, custom callback plugins can capture and display
information in formats tailored to your needs. Callbacks intercept specific
events during playbook execution and can process them in custom ways:

```text
[label custom_debug_callback.py]
from ansible.plugins.callback import CallbackBase

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'debug_logger'

    def __init__(self):
        super(CallbackModule, self).__init__()
        self.task_ok_counter = 0
        self.task_failed_counter = 0

    def v2_runner_on_ok(self, result):
        self.task_ok_counter += 1
        self._display.display(
            f"SUCCESS: Task '{result._task.name}' on {result._host.name} [{self.task_ok_counter}]"
        )

    def v2_runner_on_failed(self, result, ignore_errors=False):
        self.task_failed_counter += 1
        self._display.display(
            f"FAILURE: Task '{result._task.name}' on {result._host.name} [{self.task_failed_counter}]"
        )
        self._display.display(f"Error: {result._result.get('msg', 'No error message')}")
```

This simple callback plugin counts successful and failed tasks, displaying a
custom message for each. More sophisticated plugins could send notifications to
external systems, log detailed information to files, or format output in custom
ways for easier analysis.

To use a custom callback, place it in a `callback_plugins` directory in your
Ansible project and enable it in your configuration. Callbacks can respond to
various events like playbook start and end, task execution, or host unreachable
notifications, giving you complete visibility into the Ansible execution
lifecycle.
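
Assuming the plugin file above is saved as `callback_plugins/custom_debug_callback.py` alongside your playbooks, enabling it might look like this:

```text
[label ansible.cfg]
[defaults]
# Directory containing custom callback plugins
callback_plugins = ./callback_plugins

# Notification-type callbacks must be enabled explicitly by name
callbacks_enabled = debug_logger
```

The name in `callbacks_enabled` must match the plugin's `CALLBACK_NAME`, not its filename.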

### Creating targeted test playbooks

For complex problems, creating focused test playbooks can isolate issues without
the complexity of full production playbooks:

```text
[label test_specific_task.yml]
- name: Isolated test of problematic task
  hosts: problem_host
  gather_facts: no
  tasks:
    - name: Show environment
      ansible.builtin.debug:
        msg: "Testing on {{ inventory_hostname }} with ansible_connection={{ ansible_connection | default('undefined') }}"

    - name: Run isolated version of problem task
      ansible.builtin.template:
        src: problem_template.j2
        dest: /tmp/test_output.conf
      register: test_result

    - name: Show detailed results
      ansible.builtin.debug:
        var: test_result
        verbosity: 1
```

This approach lets you focus on a single host or task, remove dependencies that
might mask the real problem, and gather comprehensive information about a
specific operation. By simplifying the context, you can more easily identify
what's causing an issue without the noise of an entire complex playbook.

Test playbooks are especially useful when you encounter intermittent issues that
are difficult to reproduce. By creating a minimal reproduction case, you can run
it repeatedly until the issue occurs, then examine the conditions in detail.
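
A simple shell loop can automate those repeated runs, stopping at the first failure so you can examine its verbose output:

```text
[command]
while ansible-playbook test_specific_task.yml -vv; do echo "Run succeeded, retrying..."; done
```

The loop continues while the playbook exits successfully and terminates on the first non-zero exit code, leaving the failing run's output on screen.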

### Utilizing check mode effectively

Ansible's check mode (`--check`) predicts changes without making them, which is
valuable for debugging:

```text
[command]
ansible-playbook playbook.yml --check --diff
```

The `--diff` flag enhances check mode by showing exactly what changes would be
made to files. This combination allows you to see exactly what changes Ansible
would make without actually applying them, which is invaluable for verifying
that a playbook will do what you expect.

You can also control check mode behavior within individual tasks:

```text
[label check_mode_control.yml]
- name: Demo check mode control
  hosts: webservers
  tasks:
    - name: Task that always runs even in check mode
      ansible.builtin.command: hostname
      check_mode: false
      register: hostname_result

    - name: Task that reports changes even in check mode
      ansible.builtin.debug:
        msg: "This would create a new config"
      check_mode: false
      changed_when: true

    - name: Task that never makes changes, even in a normal run
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      check_mode: true
      diff: true
```

This technique helps identify which tasks would make changes before running a
full playbook. It's particularly useful for verifying changes in sensitive
environments where downtime must be minimized.

Check mode doesn't work perfectly with all modules, particularly those that
execute commands or scripts. Some modules might need to make actual connections
or queries to determine what would change. Understanding these limitations is
important when using check mode for debugging.
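
For such tasks, the built-in `ansible_check_mode` variable lets you skip operations that can't be meaningfully predicted under `--check` (the script path below is a placeholder):

```yaml
- name: Skip a command whose effect cannot be predicted in check mode
  ansible.builtin.command: /usr/local/bin/migrate-db.sh   # hypothetical script
  when: not ansible_check_mode
```

This keeps check-mode runs clean rather than failing or producing misleading results for tasks that check mode can't simulate.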

### Developing debugging strategies for complex roles

Roles introduce another layer of complexity for debugging. Effective approaches
include enabling conditional debugging outputs within roles:

```text
[label role_debug.yml]
- name: Play with role debugging
  hosts: webservers
  vars:
    nginx_role_debug: true
  roles:
    - role: nginx
```

Within the role itself, you can add debug tasks that activate conditionally:

```text
[label roles/nginx/tasks/main.yml]
- name: Show role variables
  ansible.builtin.debug:
    msg: |
      Port: {{ nginx_port | default('80') }}
      Document root: {{ nginx_docroot | default('/var/www/html') }}
      Worker processes: {{ nginx_workers | default('auto') }}
  when: nginx_role_debug | default(false)

- name: Include installation tasks
  ansible.builtin.include_tasks: install.yml
```

This approach lets you toggle debugging output for roles without modifying the
role files themselves. The `nginx_role_debug` variable acts as a switch that can
be set at the playbook level, making it easy to enable verbose output when
needed and disable it during normal operation.

Another powerful technique is using tags to selectively run parts of complex
roles:

```text
[command]
ansible-playbook playbook.yml --tags nginx-config
```

By tagging different sections of your roles, you can focus on specific
functionality during debugging. This is particularly valuable in complex roles
with many tasks, as it lets you isolate the specific area where issues are
occurring.
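
Inside the role, tags are attached to tasks or included task files. A sketch following the nginx role layout above (`configure.yml` is a hypothetical file name):

```yaml
- name: Include configuration tasks
  ansible.builtin.include_tasks: configure.yml
  tags:
    - nginx-config

- name: Include installation tasks
  ansible.builtin.include_tasks: install.yml
  tags:
    - nginx-install
```

With these tags in place, `--tags nginx-config` runs only the configuration portion, and `--skip-tags nginx-install` excludes installation entirely.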

## Integration with logging systems

For production environments, integrating Ansible with centralized logging
provides valuable debugging information even after playbook execution has
completed:

```text
[label logging_playbook.yml]
- name: Playbook with centralized logging
  hosts: all
  vars:
    log_server: "logs.example.com"

  pre_tasks:
    - name: Record playbook start
      # Assumes a Logstash HTTP input (or similar endpoint) listening on port 5000
      ansible.builtin.uri:
        url: "http://{{ log_server }}:5000"
        method: POST
        body_format: json
        body:
          event: "playbook_start"
          playbook: "{{ ansible_play_name }}"
          hosts: "{{ ansible_play_hosts | join(',') }}"
      delegate_to: localhost
      run_once: true

  tasks:
    - name: Sample task
      ansible.builtin.debug:
        msg: "Running task"

  post_tasks:
    - name: Record playbook completion
      ansible.builtin.uri:
        url: "http://{{ log_server }}:5000"
        method: POST
        body_format: json
        body:
          event: "playbook_complete"
          playbook: "{{ ansible_play_name }}"
      delegate_to: localhost
      run_once: true
```

This approach creates audit trails for automation that can be searched and
analyzed later. It centralizes debugging information from multiple playbook
runs, enabling correlation with other system events. It also provides historical
data for troubleshooting, which is invaluable when issues occur infrequently or
are related to specific environmental conditions.

[Centralized logging](https://betterstack.com/community/guides/logging/log-aggregation/) becomes increasingly important as your Ansible usage scales.
When multiple teams run playbooks across numerous systems, having a central
repository of execution information helps identify patterns and common issues
that might not be apparent from individual runs.
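
Even without a full logging pipeline, Ansible can write a local execution log as a first step (the log path here is an example; the directory must exist and be writable):

```text
[label ansible.cfg]
[defaults]
log_path = /var/log/ansible/ansible.log
```

Every subsequent `ansible-playbook` run appends its output to this file, giving you a searchable record of past executions on the control node.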


## Final thoughts

Effective Ansible debugging is a blend of art and science, combining technical
tools with systematic problem-solving approaches.

Throughout this guide, we've explored strategies ranging from basic verbosity
increases to sophisticated custom plugins and comprehensive testing frameworks.

The key to successful debugging lies in understanding Ansible's execution model,
leveraging the right tools for each situation, and developing a systematic
approach to isolating and resolving issues.

Thanks for reading!