
Scrapling: Adaptive Web Scraping with Self-Healing Selectors in Python

Stanley Ulili
Updated on June 8, 2026

Scrapling is an all-in-one Python web scraping framework built around an adaptive parser that can relocate extracted elements after a website redesign. When a selector is run with auto_save=True, Scrapling records a structural profile of the matched element. When the same selector fails later because class names or structure changed, running it with adaptive=True causes Scrapling to find the closest matching element using the stored profile.

The problem with CSS-selector-based scrapers

Traditional scrapers extract data using CSS selectors or XPath expressions:

 
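from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the fetched page source
price_element = soup.select_one("div.price")
price = price_element.text  # AttributeError if nothing matched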

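If the site renames the class from price to product-cost, moves the element to a different parent, or restructures the page, the lookup returns None and the .text access raises AttributeError. If the code guards against None, the failure is silent instead and the pipeline records a missing price. Either way, one upstream change breaks the entire downstream pipeline.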
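Both failure modes can be reproduced with a minimal sketch that uses only the standard library's xml.etree.ElementTree. The two markup snippets and the helper name extract_price are assumptions made up for illustration; they do not come from a real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same price before and after a class rename.
PAGE_BEFORE = '<body><div class="price">$129</div></body>'
PAGE_AFTER = '<body><div class="product-cost">$129</div></body>'


def extract_price(markup: str):
    """Look up the price by its original class name; None if it is gone."""
    root = ET.fromstring(markup)
    return root.find(".//div[@class='price']")


before = extract_price(PAGE_BEFORE)
after = extract_price(PAGE_AFTER)

print(before.text)  # the original selector still matches this version
print(after)        # nothing matches once the class has been renamed

# The loud failure: attribute access on the missing element.
try:
    after.text
except AttributeError as exc:
    print(f"pipeline error: {exc}")

# The silent failure: a guard hides the break and records a missing price.
price = after.text if after is not None else None
```

The guarded version is arguably the more dangerous one, because nothing alerts the pipeline owner that prices have stopped arriving.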

Graphic illustrating how a website change breaks a data pipeline

Adaptive parsing: how it works

When an element is found with auto_save=True, Scrapling records a profile containing the element's tag, all attributes, parent information, child information, neighboring text, DOM position relative to siblings, and structural shape in the DOM tree.

When the selector is run again with adaptive=True on a changed page, Scrapling compares this stored profile against all elements on the new page and selects the one with the highest similarity score.
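The idea of comparing a stored profile against every element can be sketched with the standard library alone. This is not Scrapling's actual algorithm: the profile fields, the equal weighting, and the use of difflib are all assumptions chosen to keep the illustration short.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

OLD = """<body><div class="product-card">
<h2 class="product-title">Mechanical Keyboard</h2>
<span class="product-price">$129</span></div></body>"""

NEW = """<body><section class="catalog-item">
<h2 class="item-heading">Mechanical Keyboard</h2>
<span class="pricing-value">$129</span></section></body>"""


def profiles(root):
    """Yield (element, profile) for every element below the root."""
    for parent in root.iter():
        for index, child in enumerate(parent):
            yield child, {
                "tag": child.tag,
                "attrs": " ".join(f"{k}={v}" for k, v in sorted(child.attrib.items())),
                "text": (child.text or "").strip(),
                "parent": parent.tag,
                "position": index,  # index among siblings
                "children": " ".join(c.tag for c in child),
            }


def ratio(a: str, b: str) -> float:
    """Fuzzy string similarity between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved: dict, candidate: dict) -> float:
    """Average several independent signals into one score between 0 and 1."""
    signals = [
        saved["tag"] == candidate["tag"],
        ratio(saved["attrs"], candidate["attrs"]),
        ratio(saved["text"], candidate["text"]),
        saved["parent"] == candidate["parent"],
        saved["position"] == candidate["position"],
        ratio(saved["children"], candidate["children"]),
    ]
    return sum(float(s) for s in signals) / len(signals)


# Record the profile of the title on the old page.
saved = next(
    profile
    for element, profile in profiles(ET.fromstring(OLD))
    if element.get("class") == "product-title"
)

# Pick the element on the new page whose profile scores highest.
best, _ = max(profiles(ET.fromstring(NEW)), key=lambda pair: similarity(saved, pair[1]))
print(best.tag, best.get("class"), best.text)
```

Even though the class name and the parent tag both changed, the tag, text, sibling position, and child structure still agree, so the renamed h2 outscores the other candidates.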

Animation showing the various clues Scrapling records for an element

Practical example

The following example runs the adaptive parser against two versions of a product page's HTML.

Setup

 
python -m venv scrapling-env && source scrapling-env/bin/activate
 
pip install scrapling

The two HTML versions

main.py
from scrapling.parser import Selector

URL = "https://example-shop.test/products"

# Version 1
SHOP_V1 = """
<html><body>
    <div class="product-card">
        <h2 class="product-title">Mechanical Keyboard</h2>
        <span class="product-price">$129</span>
    </div>
</body></html>
"""

# Version 2 - redesigned class names and structure
SHOP_V2 = """
<html><body>
    <section class="catalog-item">
        <h2 class="item-heading">Mechanical Keyboard</h2>
        <span class="pricing-value">$129</span>
    </section>
</body></html>
"""


Both versions contain the same data but use completely different class names and container elements.

Initial scrape with auto_save

main.py
# Run against V1 with auto_save=True to record element profiles.
# adaptive=True on the Selector enables the storage that auto_save writes to.
page = Selector(content=SHOP_V1, url=URL, adaptive=True)

name = page.css(".product-title", identifier="name", auto_save=True).first
price = page.css(".product-price", identifier="price", auto_save=True).first

print({"name": name.text, "price": price.text})
# Output: {'name': 'Mechanical Keyboard', 'price': '$129'}


identifier gives each extraction a stable name for internal storage. auto_save=True tells Scrapling to record the structural profile of the matched element.
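One way to picture why a stable identifier matters is a small key-value store. The sketch below is a hypothetical layout built on sqlite3 and json, not Scrapling's real schema: the table name, the columns, and the choice to key profiles by site domain plus identifier are assumptions for illustration.

```python
import json
import sqlite3
from urllib.parse import urlparse

db = sqlite3.connect(":memory:")  # a real pipeline would use a file on disk
db.execute(
    "CREATE TABLE profiles (site TEXT, identifier TEXT, profile TEXT, "
    "PRIMARY KEY (site, identifier))"
)


def save_profile(url: str, identifier: str, profile: dict) -> None:
    """Insert or overwrite the profile stored for this site and identifier."""
    db.execute(
        "INSERT OR REPLACE INTO profiles VALUES (?, ?, ?)",
        (urlparse(url).netloc, identifier, json.dumps(profile)),
    )


def load_profile(url: str, identifier: str):
    """Return the stored profile, or None if nothing was saved under that name."""
    row = db.execute(
        "SELECT profile FROM profiles WHERE site = ? AND identifier = ?",
        (urlparse(url).netloc, identifier),
    ).fetchone()
    return json.loads(row[0]) if row else None


URL = "https://example-shop.test/products"
save_profile(URL, "price", {"tag": "span", "text": "$129", "parent": "div"})

# A later run on another page of the same site finds the profile by name.
restored = load_profile("https://example-shop.test/products?page=2", "price")
missing = load_profile(URL, "rating")
print(restored, missing)
```

Because the lookup key is the identifier rather than the selector string, the selector in the scraping code could later be rewritten without orphaning the stored profile.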

Standard selectors break on V2

main.py
# Same selectors against V2 without adaptive mode
dead_page = Selector(content=SHOP_V2, url=URL)

dead_name = dead_page.css(".product-title").first
dead_price = dead_page.css(".product-price").first

print(f".product-title -> {dead_name}")   # None
print(f".product-price -> {dead_price}")  # None

The selectors return None. Calling .text on None would raise AttributeError in production.

Adaptive mode recovers the data

main.py
# Same original selectors against V2 with adaptive=True
page = Selector(content=SHOP_V2, url=URL, adaptive=True)

name = page.css(".product-title", identifier="name", adaptive=True).first
price = page.css(".product-price", identifier="price", adaptive=True).first

print({"name": name.text if name else None, "price": price.text if price else None})
# Output: {'name': 'Mechanical Keyboard', 'price': '$129'}


adaptive=True on the Selector and on each .css() call enables the matching logic. When .product-title finds no direct match in V2, Scrapling consults the stored profile: the element was an h2, contained the text "Mechanical Keyboard", was a sibling of a price-containing span, and was the first child of a card-like container. It finds the h2.item-heading as the closest match and returns it.

Fetchers

Scrapling provides three fetchers, so a project does not need to combine a separate HTTP client such as requests with a browser-automation library such as Playwright.

Comparison table of Scrapling's three main fetchers from the official documentation

Fetcher makes plain HTTP requests. It is suitable for static pages with no JavaScript rendering and minimal bot protection.

DynamicFetcher uses a Chromium browser via Playwright. It loads the page, executes JavaScript, waits for content to render, then returns the final HTML. It is required for single-page applications and dynamically loaded content.

StealthyFetcher extends DynamicFetcher with anti-detection techniques: browser fingerprint management, automatic Cloudflare Turnstile solving, and other measures that make requests resemble human browser traffic. Use it when requests are met with CAPTCHAs or blocks.

All three share a consistent API. Switching from Fetcher to StealthyFetcher is a one-line change in the code.
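A small wrapper makes the swap explicit. The helper names fetch_page and scrape_title and the mode strings are hypothetical. The fetcher calls follow the project's README as best understood here (Fetcher.get for plain HTTP, .fetch for the two browser-based classes), so check them against the installed version. The fetchers may also need the optional extras and browser setup described in the README.

```python
FETCH_MODES = ("http", "browser", "stealth")


def fetch_page(url: str, mode: str = "http"):
    """Fetch `url` with the fetcher selected by `mode` (hypothetical helper)."""
    if mode not in FETCH_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {FETCH_MODES}")

    # Imported lazily so the check above works even without the fetcher extras.
    from scrapling.fetchers import DynamicFetcher, Fetcher, StealthyFetcher

    if mode == "http":
        return Fetcher.get(url)           # plain HTTP request
    if mode == "browser":
        return DynamicFetcher.fetch(url)  # renders JavaScript in a browser
    return StealthyFetcher.fetch(url)     # browser plus anti-detection


def scrape_title(url: str, mode: str = "http"):
    """Extraction logic stays the same whichever fetcher produced the page."""
    page = fetch_page(url, mode)
    heading = page.css("h1").first
    return heading.text if heading is not None else None
```

With this layout, moving a job from plain HTTP to the stealth browser means changing the mode argument, for example scrape_title(url, mode="stealth"), while the selector code is left alone.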

Spider framework

For multi-page crawls, Scrapling provides a spider framework with asynchronous requests, session management, proxy rotation, crawl checkpointing (pause and resume), and configurable output formats.

Data flow diagram showing the components of the Spider architecture

This is the appropriate tool when a script needs to follow links, paginate through results, or manage a large-scale extraction job.
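The pause-and-resume idea behind crawl checkpointing can be sketched without any framework. The code below is not Scrapling's spider API: the in-memory SITE dictionary stands in for real responses, and the state layout (pending, seen, items) is an assumption for illustration. Pages are fetched one at a time to keep the example short.

```python
import asyncio
import json

# Hypothetical three-page catalog standing in for real HTTP responses.
SITE = {
    "/products?page=1": {"items": ["keyboard"], "next": "/products?page=2"},
    "/products?page=2": {"items": ["mouse"], "next": "/products?page=3"},
    "/products?page=3": {"items": ["monitor"], "next": None},
}


async def fetch(path: str) -> dict:
    """Stand-in for a network request."""
    await asyncio.sleep(0)
    return SITE[path]


async def crawl(state: dict, max_pages: int) -> dict:
    """Process up to max_pages pending URLs, then return the updated state."""
    for _ in range(max_pages):
        if not state["pending"]:
            break
        path = state["pending"].pop(0)
        if path in state["seen"]:
            continue
        state["seen"].append(path)
        page = await fetch(path)
        state["items"].extend(page["items"])
        if page["next"]:
            state["pending"].append(page["next"])  # follow pagination
    return state


start = {"pending": ["/products?page=1"], "seen": [], "items": []}

# The first run stops after two pages; its state is serialized as a checkpoint.
checkpoint = json.dumps(asyncio.run(crawl(start, max_pages=2)))

# A later run restores the checkpoint and finishes the crawl.
final = asyncio.run(crawl(json.loads(checkpoint), max_pages=10))
print(final["items"])
```

Keeping the whole crawl state in one JSON-serializable object is what makes the pause cheap: the second run needs nothing from the first except that string.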

When to use Scrapling

Scrapling is most valuable for long-running data pipelines where selector maintenance is a recurring cost, for price monitoring or other time-sensitive extractions where uptime matters, and for feeding data to AI pipelines or RAG jobs where consistent data quality is required.

Diagram showing ideal use cases for Scrapling including data pipelines and AI agents

For one-off scripts against simple, stable pages, requests and BeautifulSoup remain a lighter choice. Scrapling's overhead in setup and storage pays off when the alternative is regular manual selector rewrites after site redesigns.

Final thoughts

The adaptive parser is Scrapling's core differentiator. The structural profile approach is more resilient than text matching or line-number anchors because it combines multiple independent signals (tag, content, parent, position) rather than relying on a single identifier. A site can change class names, restructure containers, and rename attributes, and as long as enough of the structural context is preserved, Scrapling can still locate the element.

The integrated fetcher hierarchy (plain HTTP → browser → stealth browser) with a consistent API means the same parsing code works regardless of which fetcher is needed, and upgrading the fetcher when a site adds bot protection requires no changes to the extraction logic.

Source code and documentation are at github.com/D4Vinci/Scrapling.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.