# Common Ansible Errors and How to Fix Them

[Ansible](https://betterstack.com/community/guides/linux/ansible-getting-started/) has become one of the most popular tools for automation, configuration
management, and infrastructure as code.

Its agentless architecture and YAML-based playbooks make it relatively easy to
learn, but that doesn't mean you won't encounter errors.

In fact, as you progress from simple playbooks to more complex automation
scenarios, troubleshooting becomes an increasingly important skill.

This article covers the most common errors you'll encounter when working with
Ansible, why they occur, and how to fix them.

Understanding these common pitfalls will help you build more robust automation
and save time when issues arise.

## Syntax and YAML errors

YAML forms the foundation of Ansible playbooks. Its human-readable format makes
playbooks easy to write, but its strict syntax rules can also lead to
frustrating errors.

### YAML indentation errors

Perhaps the most common error in Ansible is related to YAML indentation. YAML
uses indentation to establish the structure and hierarchy of data. Unlike some
languages where indentation is merely a matter of style, in YAML, indentation is
syntactically significant.

**Common error message**:

```text
. . .
Syntax Error while loading YAML.
  mapping values are not allowed in this context

The error appears to be in '/home/ayo/dev/betterstack/demo/ansible-errors/playbook.yml': line 5, column 10, but may
be elsewhere in the file depending on the exact syntax problem.
. . .
```

![Ansible syntax error](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/ce61f199-f243-467f-cc1e-0d703841c900/lg2x =2374x1162)

**Why it happens**:

This error typically occurs when you indent with tabs or use inconsistent
indentation levels. YAML forbids tab characters for indentation, and every item
at the same level must start in the same column.

**How to fix it**:

1. Use spaces instead of tabs for indentation.
2. Maintain consistent indentation (2 spaces is the common standard).
3. Use a [YAML validator](https://www.yamllint.com/) or linter to check your
   files.

Here's an example of incorrect indentation:

```text
- name: Install web server
  hosts: webservers
  tasks:
  - name: Install Apache
      apt:
        name: apache2
        state: present
    - name: Start Apache service
      service:
        name: apache2
        state: started
```

And here's the corrected version:

```yaml
- name: Install web server
  hosts: webservers
  tasks:
    - name: Install Apache
      apt:
        name: apache2
        state: present
    - name: Start Apache service
      service:
        name: apache2
        state: started
```

To validate your YAML files, you can use the
[yamllint tool](https://github.com/adrienverge/yamllint):

```command
yamllint playbook.yml
```

![Yamllint errors](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/730ab99e-6b0b-4174-2843-1a8af42eb700/md1x =1996x806)
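
To enforce these indentation rules automatically, yamllint can read a
configuration file. The following is a minimal sketch, assuming a `.yamllint`
file in the project root; the limits chosen here are illustrative rather than
recommended values:

```yaml
# .yamllint (assumed location: project root, next to playbook.yml)
extends: default

rules:
  # Require 2-space indentation, with list items indented under their parent key
  indentation:
    spaces: 2
    indent-sequences: true
  # Long lines are common in playbooks, so only warn about them (assumed limit)
  line-length:
    max: 120
    level: warning
  # Accept the yes/no booleans that appear in many playbooks
  truthy:
    allowed-values: ['true', 'false', 'yes', 'no']
```

With this file in place, `yamllint playbook.yml` should pick up the
configuration without extra flags.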

### Missing or invalid quotes

Another common syntax error involves string quoting, especially when your
strings contain special characters.

**Common error message**:

```text
[output]
ERROR! Syntax Error while loading YAML.
  mapping values are not allowed in this context

The error appears to be in '/path/to/playbook.yml': line 12, column 15, but may
be elsewhere in the file depending on the exact syntax problem.
```

**Why it happens**:

YAML treats some characters as syntax, so an unquoted string must be quoted
when it contains a colon followed by a space, contains a hash preceded by a
space, or starts with a character such as `*`, `&`, `!`, `{`, `[`, `%`, or `@`.
In Ansible, this also applies to any value that begins with a Jinja2 expression
like `{{ variable }}`.

**How to fix it**:

1. Quote strings that contain these special characters.
2. Quote any value that starts with `{{`. Ansible templates variables inside
   both single and double quotes.
3. Use single quotes for literal text and double quotes when you need escape
   sequences such as `\n`.

Here's an example that causes problems:

```yaml
- name: Configure application
  hosts: app_servers
  vars:
    app_config:
      banner: Warning: maintenance tonight  # Error: colon followed by a space
      comment: This server handles #1 priority tasks  # No error, but everything from " #1" is dropped as a comment
      log_file: {{ inventory_hostname }}.log  # Error: unquoted value starts with {{
  tasks:
    - name: Create configuration file
      template:
        src: config.j2
        dest: /etc/app/config.yaml
```

Corrected version:

```yaml
- name: Configure application
  hosts: app_servers
  vars:
    app_config:
      banner: "Warning: maintenance tonight"
      comment: "This server handles #1 priority tasks"
      log_file: "{{ inventory_hostname }}.log"
  tasks:
    - name: Create configuration file
      template:
        src: config.j2
        dest: /etc/app/config.yaml
```
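
If you're unsure how a quoting style behaves, a throwaway playbook can show it.
This is a small sketch that assumes a hypothetical `greeting` variable and runs
against `localhost` only:

```yaml
- name: Compare quoting styles
  hosts: localhost
  gather_facts: false
  vars:
    greeting: hello  # Hypothetical variable used only for this demonstration
  tasks:
    - name: Print the same kind of value with different quotes
      debug:
        msg:
          # Ansible templates Jinja2 expressions in both quote styles
          - 'single: {{ greeting }}'
          - "double: {{ greeting }}"
          # Only double quotes process YAML escape sequences
          - 'single quotes keep \n as a backslash and an n'
          - "double quotes turn \n into a newline"
```

Both of the first two items render the variable, which shows that the choice of
quotes affects escape sequences rather than templating.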

### Jinja2 template syntax errors

Ansible uses Jinja2 templating extensively, which provides powerful capabilities
but can also introduce errors, especially when mixing it with YAML syntax.

**Common error message**:

```text
[output]
ERROR! template error while templating string: unexpected '{'. String: {{ item }}{{ ansible_facts['hostname'] }}
```

**Why it happens**:

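These errors usually come from unbalanced `{{` and `}}` delimiters, nesting one
`{{ }}` expression inside another, incorrect filter syntax, or leaving a YAML
value that starts with `{{` unquoted. Whitespace inside the delimiters is a
readability convention, not a requirement: `{{variable}}` and `{{ variable }}`
render identically. The exact message varies with the mistake.

**How to fix it**:

1. Check that every `{{` has a matching `}}`, and never nest `{{ }}` inside
   another expression.
2. Use proper syntax for filters (`{{ variable | filter }}`).
3. Quote any YAML value that starts with `{{`.
4. Prefer `{{ variable }}` over `{{variable}}` for readability.

Incorrect example, which nests one expression inside another:

```text
- name: Configure hosts
  hosts: all
  tasks:
    - name: Create file with hostname
      file:
        path: "/tmp/{{ item + {{ ansible_facts['hostname'] }} }}.txt"
        state: touch
      loop:
        - server_
        - host_
```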

Corrected version:

```yaml
- name: Configure hosts
  hosts: all
  tasks:
    - name: Create file with hostname
      file:
        path: "/tmp/{{ item }}{{ ansible_facts['hostname'] }}.txt"
        state: touch
      loop:
        - server_
        - host_
```

For complex Jinja2 expressions, you can use the debug module to test your
syntax:

```yaml
- name: Debug Jinja2 expressions
  hosts: localhost
  vars:
    my_string: "Hello World"
    my_list: [1, 2, 3, 4, 5]
  tasks:
    - name: Test Jinja2 expression
      debug:
        msg: "{{ my_string | upper }} {{ my_list | sum }}"
```

## Connection errors

Since Ansible operates by connecting to remote hosts, connection problems are a
common source of errors. Understanding these issues is crucial for effective
troubleshooting.

### SSH connection failures

SSH is Ansible's primary method for connecting to managed nodes, and SSH-related
issues are among the most common errors.

**Common error message**:

```text
[output]
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.100 port 22: Connection timed out", "unreachable": true}
```

![Ansible unreachable error](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/bd190125-48e0-4912-0e69-4c7d88d28c00/md2x =2374x1162)


**Why it happens**: SSH connection failures can result from network connectivity
issues, firewall configurations, incorrect credentials, SSH service not running,
or incorrect SSH configuration.

**How to fix it**:

1. Verify network connectivity with a `ping` test.
2. Check that SSH service is running on the target.
3. Verify firewall rules allow SSH connections.
4. Ensure SSH credentials are correct.
5. Configure SSH options in `ansible.cfg`.

To test basic connectivity:

```command
ping <ip_address>
```
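
A `ping` test can fail simply because ICMP is filtered, so it can help to check
the SSH port itself. The sketch below runs the `wait_for` module on the control
node for every host in the inventory; it assumes the default port 22 unless
`ansible_port` is set, and the 10-second timeout is an arbitrary choice:

```yaml
- name: Check SSH reachability from the control node
  hosts: all
  gather_facts: false  # Avoid connecting to the targets before the check
  tasks:
    - name: Wait for the SSH port to accept connections
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: "{{ ansible_port | default(22) }}"
        timeout: 10
      delegate_to: localhost  # Run the check locally, not on the target
```

Hosts that fail this task have a network, firewall, or SSH service problem
rather than a credentials problem.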

You can configure SSH options in your `ansible.cfg` file:

```text
[label ansible.cfg]
[defaults]
inventory = ./inventory
remote_user = deploy

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ConnectTimeout=10
pipelining = True
```

For more verbose SSH debugging, increase Ansible's verbosity:

```command
ansible-playbook playbook.yml -vvv
```

### Privilege escalation errors

Ansible often needs to run commands with elevated privileges, which can lead to
permission-related errors.

**Common error message**:

```text
[output]
FAILED! => {"msg": "Missing sudo password"}
```

Or:

```text
[output]
FAILED! => {"changed": false, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
```

![Privilege escalation errors](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/6b0927c0-3a68-469d-5823-32631c4bc600/lg2x =2374x1162)

**Why it happens**: These errors occur when Ansible attempts to execute tasks
that require elevated privileges without providing the necessary `sudo` password
or having passwordless `sudo` configured.

**How to fix it**:

1. Use the `--ask-become-pass` option to prompt for the `sudo` password.
2. Configure passwordless `sudo` on the target hosts.
3. Specify the become method in your playbook.
4. Store the `sudo` password securely using [Ansible Vault](https://betterstack.com/community/guides/linux/ansible-vault/).

Example playbook with become (`sudo`) configured:

```yaml
- name: Configure system
  hosts: webservers
  become: true
  become_method: sudo
  become_user: root
  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - curl
          - python3
        state: present
```

To run the playbook with `sudo` password prompt:

```command
ansible-playbook playbook.yml --ask-become-pass
```
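
For unattended runs, the become password can come from a vaulted variable
instead of a prompt. This is a minimal sketch: the file paths and the
`vault_become_password` name are hypothetical, while `ansible_become_password`
is the connection variable Ansible reads:

```yaml
# group_vars/webservers/vars.yml (hypothetical path)
# Reference the vaulted value so the password never appears in plain text here
ansible_become_password: "{{ vault_become_password }}"

# group_vars/webservers/vault.yml (hypothetical path) would define the secret:
#   vault_become_password: "your-sudo-password"
# Encrypt that file with: ansible-vault encrypt group_vars/webservers/vault.yml
```

You would then run the playbook with `--ask-vault-pass` instead of
`--ask-become-pass`.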

### Host key verification failures

SSH relies on host key verification for security, which can sometimes cause
connection issues with Ansible.

**Common error message**:

```text
[output]
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Host key verification failed.", "unreachable": true}
```

**Why it happens**: This error occurs when the SSH host key of the target system
isn't in your `known_hosts` file, or when the host key has changed (which could
indicate a potential security issue).

**How to fix it**:

1. Manually connect to the host via SSH to add it to `known_hosts`.
2. Disable host key checking in `ansible.cfg` (less secure but convenient for
   testing).
3. Use `ssh-keyscan` to add the host key to `known_hosts` programmatically.

To disable host key checking for testing:

```text
[label ansible.cfg]
[defaults]
host_key_checking = False
```

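To keep host key checking enabled, add the host key with `ssh-keyscan` instead.
Because `ssh-keyscan` accepts whatever key the host presents, verify the
fingerprint through a trusted channel before relying on it: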

```command
ssh-keyscan 192.168.1.100 >> ~/.ssh/known_hosts
```
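
The same step can be sketched as a playbook with the `known_hosts` module. This
example assumes each inventory hostname is resolvable from the control node and
offers an ed25519 key; like the command above, it trusts the key returned by
`ssh-keyscan`, so it suits freshly provisioned hosts on a trusted network:

```yaml
- name: Register host keys on the control node
  hosts: all
  gather_facts: false  # No SSH connection to the targets is needed
  tasks:
    - name: Add each host's ed25519 key to known_hosts
      known_hosts:
        name: "{{ inventory_hostname }}"
        # ssh-keyscan prints "<host> ssh-ed25519 <key>" on stdout
        key: "{{ lookup('pipe', 'ssh-keyscan -t ed25519 ' ~ inventory_hostname) }}"
        state: present
      delegate_to: localhost
      throttle: 1  # Write to the known_hosts file one host at a time
```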

## Inventory and variable errors

Properly managing inventory and variables is crucial for Ansible. Errors in
these areas can be particularly confusing because they may not manifest until
specific tasks are executed.

### Undefined variables

Using a variable that hasn't been defined is a common error, especially in
complex playbooks with multiple variable sources.

**Common error message**:

```text
[output]
FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'app_version' is undefined"}
```

**Why it happens**: This error occurs when you reference a variable that hasn't
been defined in any of Ansible's variable sources (playbook vars, inventory,
group_vars, etc.) or when you misspell a variable name.

**How to fix it**:

1. Check variable definitions across all relevant files.
2. Use the default filter to provide fallback values.
3. Use debug tasks to inspect variable content.
4. Use `ansible-inventory --list` to see all inventory variables.

Example using the default filter:

```yaml
- name: Deploy application
  hosts: app_servers
  tasks:
    - name: Create app directory
      file:
        path: "/opt/app/{{ app_version | default('latest') }}"
        state: directory
```

Adding `debug` tasks to inspect variables:

```yaml
- name: Debug variables
  hosts: app_servers
  tasks:
    - name: Display all variables
      debug:
        var: hostvars[inventory_hostname]

    - name: Display specific variable
      debug:
        var: app_version  # Prints "VARIABLE IS NOT DEFINED!" instead of failing when unset
```
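
When a variable has no sensible default, it can be clearer to stop early with an
explicit message. A minimal sketch using the `assert` module, with `app_version`
as the required variable:

```yaml
- name: Validate required variables
  hosts: app_servers
  gather_facts: false
  tasks:
    - name: Stop early if app_version is missing or empty
      assert:
        that:
          - app_version is defined
          - app_version | string | length > 0
        fail_msg: "Set app_version in group_vars, host_vars, or with --extra-vars"
        quiet: true  # Suppress the verbose success output
```

Placing a play like this first makes the failure point to the missing variable
rather than to whichever task happens to use it.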

### Inventory parsing issues

Problems with inventory file syntax can prevent Ansible from properly
recognizing and connecting to hosts.

**Common error message**:

```text
[output]
ERROR! Attempted to read "/path/to/inventory" as YAML: Syntax Error while loading YAML.
```

Or:

```text
[output]
[WARNING]: Could not match supplied host pattern, ignoring: webservers
```

**Why it happens**: These errors occur due to syntax errors in inventory files,
incorrect group definitions, or typos in host patterns.

**How to fix it**:

1. Validate inventory syntax.
2. Check group names and hierarchy.
3. Use `ansible-inventory --graph` to visualize your inventory structure.

Example of a problematic inventory file:

```text
[webservers]
web1.example.com
web2.example.com

[dbservers]
db1.example.com
db2.example.com

[production:children]
webservers
database  # This should be dbservers
```

Corrected version:

```text
[webservers]
web1.example.com
web2.example.com

[dbservers]
db1.example.com
db2.example.com

[production:children]
webservers
dbservers
```

To validate your inventory structure:

```command
ansible-inventory --graph
```

You should see output similar to the following (the exact ordering and nesting
can vary between Ansible versions):

```text
[output]
@all:
  |--@dbservers:
  |  |--db1.example.com
  |  |--db2.example.com
  |--@production:
  |  |--@dbservers:
  |  |  |--db1.example.com
  |  |  |--db2.example.com
  |  |--@webservers:
  |  |  |--web1.example.com
  |  |  |--web2.example.com
  |--@ungrouped:
  |--@webservers:
  |  |--web1.example.com
  |  |--web2.example.com
```
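
The same structure can also be written as a YAML inventory, the format Ansible
is attempting to parse when it reports `Attempted to read ... as YAML`. A sketch
of the equivalent file, assuming it is saved as `inventory.yml`:

```yaml
# inventory.yml (assumed filename)
all:
  children:
    production:
      children:
        webservers:
          hosts:
            web1.example.com:
            web2.example.com:
        dbservers:
          hosts:
            db1.example.com:
            db2.example.com:
```

You can check it the same way with `ansible-inventory -i inventory.yml --graph`.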

### Variable precedence confusion

Ansible has a complex variable precedence system that can lead to unexpected
values being used.

**Common error message**: There's rarely a specific error message for precedence
issues. Instead, you'll notice variables having unexpected values.

**Why it happens**: Ansible has a specific order in which it processes variables
from different sources. When the same variable is defined in multiple places,
the value from the highest-precedence source wins.

**How to fix it**:

1. Review Ansible's variable precedence documentation.
2. Use debug tasks to check the value a variable actually resolves to.
3. Use `ansible-inventory --list` to see all inventory variables.
4. Consider where you define variables based on their scope and purpose.

Example `debug` task that prints the value of `app_port` after precedence has
been applied:

```yaml
- name: Trace variable precedence
  hosts: app_servers
  vars:
    app_port: 8080  # Play vars override inventory, group_vars, and host_vars
  tasks:
    - name: Show the resolved value of app_port
      debug:
        msg: "Resolved app_port: {{ app_port }}"
```

Run it with `--extra-vars`, which has the highest precedence, so the task
reports `9000` instead of `8080`:

```command
ansible-playbook trace_vars.yml --extra-vars "app_port=9000"
```

## Module-specific errors

Ansible modules are the workhorses that perform actual tasks on managed nodes.
Each module has its own set of potential errors.

### Command/shell module failures

The command and shell modules are among the most commonly used, but they can
fail for various reasons.

**Common error message**:

```text
[output]
FAILED! => {"changed": true, "cmd": "service apache2 status", "delta": "0:00:00.005625", "end": "2023-04-11 14:30:12.125799", "msg": "non-zero return code", "rc": 3, "start": "2023-04-11 14:30:12.120174", "stderr": "", "stderr_lines": [], "stdout": "apache2 is not running", "stdout_lines": ["apache2 is not running"]}
```

**Why it happens**: Command module failures typically occur when the executed
command exits with a non-zero status. This could be due to the command not
existing, insufficient permissions, or the command itself failing.

**How to fix it**:

1. Add `ignore_errors: true` for commands that may legitimately fail.
2. Use `become: true` for commands requiring elevated privileges.
3. Use the `failed_when` directive to customize failure conditions.
4. Consider using specialized modules instead of raw commands.

Example with improved error handling:

```yaml
- name: Check and restart services
  hosts: webservers
  become: true
  tasks:
    - name: Check if Apache is running
      command: systemctl status apache2
      register: apache_status
      ignore_errors: true
      changed_when: false  # Status check doesn't change anything

    - name: Restart Apache if not running
      service:
        name: apache2
        state: restarted
      when: apache_status.rc != 0
```

Using `failed_when` for custom failure conditions:

```yaml
- name: Check disk space
  hosts: all
  tasks:
    - name: Fail if root filesystem usage exceeds 90%
      command: df -P /
      register: df_output
      changed_when: false
      # Column 5 of the second output line is the usage percentage, e.g. "42%"
      failed_when: >-
        df_output.rc != 0 or
        (df_output.stdout_lines[1].split()[4] | replace('%', '') | int) > 90
```
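
As an alternative to parsing command output, a dedicated module can report
service state directly. The sketch below uses `service_facts` and assumes a
systemd host where the unit is named `apache2.service`:

```yaml
- name: Restart Apache only when it is not running
  hosts: webservers
  become: true
  tasks:
    - name: Collect the state of all services
      service_facts:

    - name: Restart Apache if it is installed but not running
      service:
        name: apache2
        state: restarted
      when:
        # Both conditions must hold; the first guards against a missing unit
        - "'apache2.service' in ansible_facts.services"
        - ansible_facts.services['apache2.service'].state != 'running'
```

Because no command exit code is involved, this approach needs neither
`ignore_errors` nor `changed_when`.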

### Package management errors

Package installation issues are common, especially when dealing with different
distributions or repository configurations.

**Common error message**:

```text
[output]
FAILED! => {"changed": false, "msg": "No package matching 'apache2' available."}
```

Or:

```text
[output]
FAILED! => {"changed": false, "msg": "Failed to update apt cache: E: Could not get lock /var/lib/apt/lists/lock"}
```

**Why it happens**: These errors occur when packages are not available in the
configured repositories, repositories are not accessible, or there are locking
issues with the package manager.

**How to fix it**:

1. Verify package name and availability for the target distribution.
2. Ensure repositories are properly configured.
3. Update package cache before installation.
4. Handle lock files appropriately.

Example with improved package handling:

```yaml
- name: Install packages robustly
  hosts: webservers
  become: true
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600  # Use cached results if updated within the last hour
      when: ansible_os_family == "Debian"

    - name: Install Apache (Debian/Ubuntu)
      apt:
        name: apache2
        state: present
      when: ansible_os_family == "Debian"

    - name: Install Apache (RedHat/CentOS)
      yum:
        name: httpd
        state: present
      when: ansible_os_family == "RedHat"
```
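
If the only difference between distributions is the package name, the generic
`package` module can replace the per-distribution tasks. A sketch, assuming only
Debian and Red Hat family hosts are targeted; the `apache_package` mapping is a
name introduced here for illustration:

```yaml
- name: Install Apache on mixed distributions
  hosts: webservers
  become: true
  vars:
    # Map each OS family to its Apache package name (assumed to cover all hosts)
    apache_package:
      Debian: apache2
      RedHat: httpd
  tasks:
    - name: Install Apache with the generic package module
      package:
        name: "{{ apache_package[ansible_os_family] }}"
        state: present
```

A host from any other OS family fails on the lookup, which makes unsupported
targets visible immediately.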

For lock file issues, you can add retry logic:

```yaml
- name: Install packages with retry
  hosts: all
  become: true
  tasks:
    - name: Install required packages
      apt:
        name: nginx
        state: present
        update_cache: yes
      register: apt_result
      retries: 5
      delay: 10
      until: apt_result is success
```

### File operation errors

File operations can fail due to permissions, path issues, or disk space
constraints.

**Common error message**:

```text
[output]
FAILED! => {"changed": false, "msg": "Error creating file /etc/app/config.ini: [Errno 13] Permission denied: '/etc/app/config.ini'"}
```

Or:

```text
[output]
FAILED! => {"changed": false, "msg": "Error creating directory /var/www/html/uploads: [Errno 2] No such file or directory: '/var/www/html/uploads'"}
```

**Why it happens**: File operation errors typically occur due to insufficient
permissions, non-existent parent directories, or full filesystems.

**How to fix it**:

1. Use `become: true` to get elevated privileges.
2. Ensure parent directories exist (use `state: directory` with the `file`
   module).
3. Check file ownership and permissions.
4. Verify available disk space.

Example with robust file operations:

```yaml
- name: Create application directories
  hosts: webservers
  become: true
  tasks:
    - name: Check available disk space
      command: df -h /var
      register: df_output
      changed_when: false

    - name: Ensure parent directory exists
      file:
        path: /var/www/myapp
        state: directory
        mode: '0755'
        owner: www-data
        group: www-data

    - name: Create nested directories
      file:
        path: /var/www/myapp/uploads/images
        state: directory  # Also creates missing parent directories, like mkdir -p
        mode: '0775'
        owner: www-data
        group: www-data
```

For copying files with proper permissions:

```yaml
- name: Copy configuration files
  hosts: app_servers
  become: true
  tasks:
    - name: Copy app configuration
      copy:
        src: files/app.conf
        dest: /etc/app/app.conf
        owner: app_user
        group: app_group
        mode: '0640'
        backup: yes  # Create backup of existing file
```
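
To turn the disk space check into a hard precondition, gathered facts can be
compared against a threshold. This is a sketch that assumes `/` appears as its
own entry in the gathered mount facts; `min_free_bytes` and `root_mount` are
names introduced here for illustration:

```yaml
- name: Verify free space before writing files
  hosts: webservers
  vars:
    min_free_bytes: 1073741824  # 1 GiB, an assumed threshold
  tasks:
    - name: Require enough free space on the root filesystem
      assert:
        that:
          - root_mount | length > 0
          - root_mount[0].size_available > (min_free_bytes | int)
        fail_msg: "Less than {{ min_free_bytes }} bytes free on /"
      vars:
        # Select the mount fact entry whose mount point is exactly "/"
        root_mount: "{{ ansible_facts.mounts | selectattr('mount', 'equalto', '/') | list }}"
```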

## Performance and scalability issues

As your Ansible deployments grow, you may encounter performance issues that
aren't strictly errors but can significantly impact usability.

### Playbook execution timeouts

Long-running tasks can time out, especially when Ansible's default timeouts are
insufficient.

**Common error message**:

```text
[output]
FAILED! => {"msg": "The async task did not complete within the requested time (300s)."}
```

**Why it happens**: Ansible applies several timeouts, including ones for SSH
connections and async jobs. Long-running operations such as large file
transfers, database migrations, or package installations fail when they exceed
these limits.

**How to fix it**:

1. Use async/poll for long-running tasks.
2. Adjust timeout settings in `ansible.cfg`.
3. Break large tasks into smaller components.

Example using async for long-running tasks:

```yaml
- name: Run long operations
  hosts: webservers
  become: true
  tasks:
    - name: Update all packages
      apt:
        upgrade: dist
        update_cache: yes
      async: 3600  # Allow this task to run for up to 1 hour
      poll: 30     # Check status every 30 seconds
```
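
For jobs that should not hold the play open while they run, setting `poll: 0`
starts the task in the background, and `async_status` checks on it later. A
sketch under the same one-hour budget; the `upgrade_job` and `upgrade_result`
names are illustrative:

```yaml
- name: Run a long task in the background
  hosts: webservers
  become: true
  tasks:
    - name: Start the upgrade without waiting for it
      apt:
        upgrade: dist
        update_cache: true
      async: 3600  # Allow up to 1 hour
      poll: 0      # Return immediately instead of polling
      register: upgrade_job

    - name: Wait for the upgrade to finish
      async_status:
        jid: "{{ upgrade_job.ansible_job_id }}"
      register: upgrade_result
      until: upgrade_result.finished
      retries: 120  # 120 checks x 30 seconds = 1 hour
      delay: 30
```

Other tasks can be placed between the two steps so that useful work continues
while the upgrade runs.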

Configure timeouts in `ansible.cfg`:

```text
[label ansible.cfg]
[defaults]
# SSH connection timeout in seconds (the default is 10)
timeout = 60

[ssh_connection]
ssh_args = -o ConnectTimeout=60 -o ServerAliveInterval=30
```

### Memory issues with large inventories

When working with large inventories, Ansible can consume significant memory,
potentially causing performance problems or failures.

**Common error message**: Memory issues typically manifest as the ansible
process being killed by the OS or extremely slow performance.

**Why it happens**: Ansible loads the entire inventory into memory and collects
facts from all hosts by default. With large inventories, this can consume
substantial memory resources.

**How to fix it**:

1. Use fact caching to reduce repeated fact gathering.
2. Limit fact gathering when possible.
3. Use the `--limit` option to target specific hosts.
4. Break large playbooks into smaller components.

Configure fact caching in `ansible.cfg`:

```text
[label ansible.cfg]
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache
# Cache lifetime in seconds (24 hours)
fact_caching_timeout = 86400
```

Limit fact gathering in playbooks:

```yaml
- name: Minimal facts playbook
  hosts: all
  gather_facts: no  # Don't gather facts at all

  tasks:
    - name: Gather only the facts we need
      setup:
        gather_subset:
          - '!all'
          - '!min'
          - 'network'
          - 'hardware'

    - name: Display only network facts
      debug:
        var: ansible_default_ipv4
```

### Parallelism problems

Ansible's parallel execution can sometimes lead to resource contention and
unpredictable behavior.

**Common error message**: There's usually no specific error message, but you
might see inconsistent results or timeouts when running against many hosts
simultaneously.

**Why it happens**: By default, Ansible runs tasks in parallel across hosts.
This can cause issues when tasks compete for resources or when order matters
between different host groups.

**How to fix it**:

1. Adjust the `forks` parameter to control parallelism.
2. Use the `serial` directive to limit simultaneous execution.
3. Apply throttling to resource-intensive tasks.

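Set the level of parallelism in `ansible.cfg`. Lower the value if the control
node or the targets are overloaded:

```text
[label ansible.cfg]
[defaults]
# Number of hosts to process in parallel (the default is 5)
forks = 10
```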

Use serial execution for critical tasks:

```yaml
- name: Update database servers in sequence
  hosts: db_servers
  serial: 1  # Run on one host at a time
  become: true
  tasks:
    - name: Stop database service
      service:
        name: postgresql
        state: stopped

    - name: Update database packages
      apt:
        name: postgresql
        state: latest

    - name: Start database service
      service:
        name: postgresql
        state: started
```
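
`serial` also accepts a list of batch sizes, which allows a cautious rollout
that starts small and then widens. The following sketch assumes a `webservers`
group and an nginx package; the batch sizes are arbitrary:

```yaml
- name: Roll out a web server update in growing batches
  hosts: webservers
  become: true
  serial:
    - 1      # Start with a single canary host
    - "25%"  # Then continue in batches of 25% of the play's hosts
  max_fail_percentage: 0  # Abort the rollout as soon as any host in a batch fails
  tasks:
    - name: Update nginx
      apt:
        name: nginx
        state: latest
        update_cache: true
```

If the canary host fails, the remaining batches never run, which limits the
impact of a bad update.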

Using throttle for specific tasks:

```yaml
- name: Run resource-intensive operations
  hosts: all
  become: true
  tasks:
    - name: Rebuild search index
      command: rebuild_search_index
      throttle: 3  # Only run on 3 hosts at a time
```

## Final thoughts

Understanding common Ansible errors and their solutions is crucial for effective
automation. By recognizing patterns in YAML syntax issues, connection failures,
variable handling, and module-specific errors, you can quickly diagnose and
resolve problems.

Thanks for reading!