# Scrapling: Adaptive Web Scraping with Self-Healing Selectors in Python

[Scrapling](https://github.com/D4Vinci/Scrapling) is an **all-in-one Python web scraping framework built around an adaptive parser that can relocate extracted elements** after a website redesign. When a selector is run with `auto_save=True`, Scrapling records a structural profile of the matched element. When the same selector fails later because class names or structure changed, running it with `adaptive=True` causes Scrapling to find the closest matching element using the stored profile.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/q-uj7wk0LRI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>


## The problem with CSS-selector-based scrapers

Traditional scrapers extract data using CSS selectors or XPath expressions:

```python
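from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html holds the fetched page source
price_element = soup.find("div", class_="price")
price = price_element.text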
```

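This breaks as soon as the site renames the class from `price` to `product-cost` or changes the element's tag: `soup.find` returns `None`, and the `.text` access raises `AttributeError`. Selectors that encode a path break the same way when the element moves to a different parent or the page is restructured. Code that guards against `None` fails silently instead, passing a missing value downstream. Either way, one upstream change breaks the entire downstream pipeline.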
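The silent variant of this failure is easy to reproduce without any third-party package. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup (an assumption made only to keep the example dependency-free). The `PAGE_V1` and `PAGE_V2` snippets and the `extract_price` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page fragment.
PAGE_V1 = '<body><div class="price">$129</div></body>'
PAGE_V2 = '<body><div class="product-cost">$129</div></body>'


def extract_price(markup):
    """Return the price text, or None when the hard-coded selector misses."""
    root = ET.fromstring(markup)
    node = root.find(".//div[@class='price']")
    return node.text if node is not None else None


print(extract_price(PAGE_V1))  # $129
print(extract_price(PAGE_V2))  # None
```

The second call does not raise. It returns `None`, which a pipeline will write downstream as a missing price unless something checks for it.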

![Graphic illustrating how a website change breaks a data pipeline](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/cb730ee3-5ac2-43e8-3359-e9c2e3353900/public =1280x720)

## Adaptive parsing: how it works

When an element is found with `auto_save=True`, Scrapling records a profile containing the element's tag, all attributes, parent information, child information, neighboring text, DOM position relative to siblings, and structural shape in the DOM tree.

When the selector is run again with `adaptive=True` on a changed page, Scrapling compares this stored profile against all elements on the new page and selects the one with the highest similarity score.

![Animation showing the various clues Scrapling records for an element](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/8420cd74-0356-49ef-695b-366ea1037600/lg2x =1280x720)
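To make the idea of a similarity score concrete, here is a toy sketch using only the standard library. It is not Scrapling's algorithm: the `profiles`, `similarity`, and `relocate` helpers, the chosen signals, and the equal weighting are all assumptions for illustration. It records a few clues per element, then picks the element on the new page whose clues agree most with the saved ones.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def profiles(markup):
    """Yield a small profile dict for every element in the markup."""
    root = ET.fromstring(markup)
    stack = [(root, "", 0, 0)]  # (node, parent tag, sibling index, depth)
    while stack:
        node, parent, index, depth = stack.pop()
        yield {
            "tag": node.tag,
            "attrs": " ".join(f"{k}={v}" for k, v in sorted(node.attrib.items())),
            "text": (node.text or "").strip(),
            "parent": parent,
            "index": index,
            "depth": depth,
        }
        for i, child in enumerate(node):
            stack.append((child, node.tag, i, depth + 1))


def _ratio(a, b):
    """Fuzzy string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved, candidate):
    """Average several independent signals into a score between 0 and 1."""
    signals = [
        1.0 if saved["tag"] == candidate["tag"] else 0.0,
        _ratio(saved["attrs"], candidate["attrs"]),
        _ratio(saved["text"], candidate["text"]),
        _ratio(saved["parent"], candidate["parent"]),
        1.0 if saved["index"] == candidate["index"] else 0.0,
        1.0 if saved["depth"] == candidate["depth"] else 0.0,
    ]
    return sum(signals) / len(signals)


def relocate(saved, new_markup):
    """Return the profile on the new page that best matches the saved one."""
    return max(profiles(new_markup), key=lambda c: similarity(saved, c))


OLD = ('<html><body><div class="product-card">'
       '<h2 class="product-title">Mechanical Keyboard</h2>'
       '<span class="product-price">$129</span></div></body></html>')
NEW = ('<html><body><section class="catalog-item">'
       '<h2 class="item-heading">Mechanical Keyboard</h2>'
       '<span class="pricing-value">$129</span></section></body></html>')

# Record the profile of the price element on the old page.
saved = next(p for p in profiles(OLD) if p["attrs"] == "class=product-price")

# Find its closest counterpart on the redesigned page.
best = relocate(saved, NEW)
print(best["tag"], best["attrs"], best["text"])  # span class=pricing-value $129
```

No single signal decides the match here. The class name changed completely, yet the tag, text, sibling position, and depth still point to the same element.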

## Practical example

The following demonstrates the adaptive parser against two versions of a product page HTML.

### Setup

```command
python -m venv scrapling-env && source scrapling-env/bin/activate
```

```command
pip install scrapling
```

### The two HTML versions

```python
[label main.py]
from scrapling.parser import Selector

URL = "https://example-shop.test/products"

# Version 1
SHOP_V1 = """
<html><body>
    <div class="product-card">
        <h2 class="product-title">Mechanical Keyboard</h2>
        <span class="product-price">$129</span>
    </div>
</body></html>
"""

# Version 2 - redesigned class names and structure
SHOP_V2 = """
<html><body>
    <section class="catalog-item">
        <h2 class="item-heading">Mechanical Keyboard</h2>
        <span class="pricing-value">$129</span>
    </section>
</body></html>
"""
```

![Python code showing the two HTML versions SHOP_V1 and SHOP_V2](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/fa263e37-e595-41f5-2e0b-f8fe16fbca00/orig =1280x720)

Both versions contain the same data but use completely different class names and container elements.

### Initial scrape with auto_save

```python
[label main.py]
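# Run against V1 with auto_save=True to record element profiles.
# adaptive=True on the Selector turns the adaptive feature on so auto_save is not ignored.
page = Selector(content=SHOP_V1, url=URL, adaptive=True)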

name = page.css(".product-title", identifier="name", auto_save=True).first
price = page.css(".product-price", identifier="price", auto_save=True).first

print({"name": name.text, "price": price.text})
# Output: {'name': 'Mechanical Keyboard', 'price': '$129'}
```

![Terminal output showing the successful initial scrape](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/f1dfad1b-9fe3-498a-09c5-e0e29fdf6400/lg2x =1280x720)

`identifier` gives each extraction a stable name under which its profile is stored, so the later adaptive lookup must pass the same identifier to find it. `auto_save=True` tells Scrapling to record the structural profile of the matched element.
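One way to picture why the identifier has to stay stable is a small key-value store in which each profile is filed under the site and the identifier. The sketch below is only an illustration of that idea: the `ProfileStore` class, its table layout, and the choice to key by domain are assumptions, not Scrapling's actual storage schema.

```python
import json
import sqlite3
from urllib.parse import urlparse


class ProfileStore:
    """Hypothetical store: one saved profile per (domain, identifier) pair."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS profiles ("
            "domain TEXT, identifier TEXT, profile TEXT, "
            "PRIMARY KEY (domain, identifier))"
        )

    def save(self, url, identifier, profile):
        # INSERT OR REPLACE keeps only the most recent profile for a key.
        self.db.execute(
            "INSERT OR REPLACE INTO profiles VALUES (?, ?, ?)",
            (urlparse(url).netloc, identifier, json.dumps(profile)),
        )
        self.db.commit()

    def load(self, url, identifier):
        row = self.db.execute(
            "SELECT profile FROM profiles WHERE domain = ? AND identifier = ?",
            (urlparse(url).netloc, identifier),
        ).fetchone()
        return json.loads(row[0]) if row else None


store = ProfileStore()
store.save("https://example-shop.test/products", "price", {"tag": "span", "text": "$129"})

# Same domain and identifier: the profile is found again.
print(store.load("https://example-shop.test/other-page", "price"))
# {'tag': 'span', 'text': '$129'}

# A different identifier finds nothing.
print(store.load("https://example-shop.test/products", "name"))
# None
```

In a store shaped like this, renaming an identifier between runs would orphan the saved profile, which is why the later lookup reuses `"name"` and `"price"` unchanged.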

### Standard selectors break on V2

```python
[label main.py]
# Same selectors against V2 without adaptive mode
dead_page = Selector(content=SHOP_V2, url=URL)

dead_name = dead_page.css(".product-title").first
dead_price = dead_page.css(".product-price").first

print(f".product-title -> {dead_name}")   # None
print(f".product-price -> {dead_price}")  # None
```

Both lookups return `None` because no element in V2 carries the old class names. In production, calling `.text` on that `None` would raise `AttributeError`.

### Adaptive mode recovers the data

```python
[label main.py]
# Same original selectors against V2 with adaptive=True
page = Selector(content=SHOP_V2, url=URL, adaptive=True)

name = page.css(".product-title", identifier="name", adaptive=True).first
price = page.css(".product-price", identifier="price", adaptive=True).first

print({"name": name.text if name else None, "price": price.text if price else None})
# Output: {'name': 'Mechanical Keyboard', 'price': '$129'}
```

![Terminal output showing that adaptive mode successfully extracted the data](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/d65980cc-e16e-47ee-687f-17ecf3f54000/md1x =1280x720)

`adaptive=True` on the `Selector` and on each `.css()` call enables the matching logic. When `.product-title` finds no direct match in V2, Scrapling consults the stored profile: the element was an `h2`, contained the text "Mechanical Keyboard", was a sibling of a price-containing `span`, and was the first child of a card-like container. It finds the `h2.item-heading` as the closest match and returns it.

## Fetchers

Scrapling provides three fetchers, so a project does not have to wire up `requests` and Playwright separately.

![Comparison table of Scrapling's three main fetchers from the official documentation](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/0fe05161-1d7c-4e8b-76f4-40eb2c0f2900/orig =1280x720)

**`Fetcher`** makes plain HTTP requests. It suits static pages with no JavaScript rendering and minimal bot protection.

**`DynamicFetcher`** uses a Chromium browser via Playwright. It loads the page, executes JavaScript, waits for content to render, then returns the final HTML. It is required for single-page applications and dynamically loaded content.

**`StealthyFetcher`** extends `DynamicFetcher` with anti-detection techniques: browser fingerprint management, automatic Cloudflare Turnstile solving, and other measures that make requests resemble human browser traffic. Use it when requests are met with CAPTCHAs or blocks.

All three share a consistent API. Switching from `Fetcher` to `StealthyFetcher` is a one-line change in the code.
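The benefit of a shared interface can be sketched without any network access. In the example below, `plain_http` and `stealth_browser` are hypothetical stand-ins for real fetchers (they are not Scrapling calls), and `fetch_with_escalation` is an assumed helper that tries the cheapest option first and escalates only when a request is blocked:

```python
from typing import Callable, Iterable, Optional

# A fetcher here is any callable that takes a URL and returns HTML,
# or None when the request was blocked. These are hypothetical stand-ins.
FetchFn = Callable[[str], Optional[str]]


def plain_http(url: str) -> Optional[str]:
    return None  # pretend the site blocked the plain request


def stealth_browser(url: str) -> Optional[str]:
    return "<html><body><h1>Catalog</h1></body></html>"


def fetch_with_escalation(url: str, fetchers: Iterable[FetchFn]) -> Optional[str]:
    """Try each fetcher in order and return the first non-blocked result."""
    for fetch in fetchers:
        html = fetch(url)
        if html is not None:
            return html
    return None


def extract_heading(html: str) -> str:
    """Extraction logic that never needs to know which fetcher ran."""
    start = html.index("<h1>") + len("<h1>")
    return html[start:html.index("</h1>")]


html = fetch_with_escalation(
    "https://example-shop.test/products", [plain_http, stealth_browser]
)
print(extract_heading(html))  # Catalog
```

Because every fetcher has the same call shape, changing which ones are used means editing the list passed to `fetch_with_escalation`, while `extract_heading` stays untouched.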

## Spider framework

For multi-page crawls, Scrapling provides a spider framework with asynchronous requests, session management, proxy rotation, crawl checkpointing (pause and resume), and configurable output formats.

![Data flow diagram showing the components of the Spider architecture](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/9fb7a251-0b7e-493d-be69-618cc57c2500/orig =1280x720)

This is the appropriate tool when a script needs to follow links, paginate through results, or manage a large-scale extraction job.

## When to use Scrapling

Scrapling is most valuable for long-running data pipelines where selector maintenance is a recurring cost, for price monitoring or other time-sensitive extractions where uptime matters, and for feeding data to AI pipelines or RAG jobs where consistent data quality is required.

![Diagram showing ideal use cases for Scrapling including data pipelines and AI agents](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/2f19dad2-8b3f-434a-cb07-ea4a3735b500/lg2x =1280x720)

For one-off scripts against simple, stable pages, `requests` and `BeautifulSoup` remain a lighter choice. Scrapling's overhead in setup and storage pays off when the alternative is regular manual selector rewrites after site redesigns.

## Final thoughts

The adaptive parser is Scrapling's core differentiator. The **structural profile approach is more resilient than matching on a single class name or text string because it combines multiple independent signals** (tag, content, parent, position) rather than relying on one identifier. A site can change class names, restructure containers, and rename attributes, and as long as enough of the structural context is preserved, Scrapling can still locate the element.

The integrated fetcher hierarchy (plain HTTP → browser → stealth browser) with a consistent API means the same parsing code works regardless of which fetcher is needed, and upgrading the fetcher when a site adds bot protection requires no changes to the extraction logic.

Source code and documentation are at [github.com/D4Vinci/Scrapling](https://github.com/D4Vinci/Scrapling).