Scrapling: Adaptive Web Scraping with Self-Healing Selectors in Python
Scrapling is an all-in-one Python web scraping framework built around an adaptive parser that can relocate extracted elements after a website redesign. When a selector is run with auto_save=True, Scrapling records a structural profile of the matched element. When the same selector fails later because class names or structure changed, running it with adaptive=True causes Scrapling to find the closest matching element using the stored profile.
The problem with CSS-selector-based scrapers
Traditional scrapers extract data using CSS selectors or XPath expressions, for example a selector that targets an element by its price class.
Such a selector fails silently or raises an error if the site renames the class from price to product-cost, moves the element to a different parent, or restructures the page. One upstream change breaks the entire downstream pipeline.
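The failure mode described above can be reproduced with nothing but the standard library. The sketch below is not Scrapling code: it uses xml.etree.ElementTree and its limited XPath support as a stand-in for any selector-based scraper, and the markup and the $89.00 price are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical product markup before and after a redesign.
PAGE_V1 = '<div class="product"><span class="price">$89.00</span></div>'
PAGE_V2 = '<div class="item"><span class="product-cost">$89.00</span></div>'

# XPath-style selector tied to the class name "price".
PRICE_PATH = ".//span[@class='price']"


def extract_price(html):
    """Return the price text, or None when the selector matches nothing."""
    root = ET.fromstring(html)
    node = root.find(PRICE_PATH)
    # Compare against None explicitly: an Element without children is falsy.
    return node.text if node is not None else None


print(extract_price(PAGE_V1))  # $89.00
print(extract_price(PAGE_V2))  # None
```

The second call is the silent failure: the data is still on the page, but the function returns None. Without the explicit None check, the same lookup would raise AttributeError instead.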
Adaptive parsing: how it works
When an element is found with auto_save=True, Scrapling records a profile containing the element's tag, all attributes, parent information, child information, neighboring text, DOM position relative to siblings, and structural shape in the DOM tree.
When the selector is run again with adaptive=True on a changed page, Scrapling compares this stored profile against all elements on the new page and selects the one with the highest similarity score.
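To make the idea of a stored profile and a similarity score concrete, here is a toy version built on the standard library. It is a sketch of the general technique only, not Scrapling's actual algorithm: the profile fields, the equal weighting, the use of difflib, and the sample markup are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical markup for the same product before and after a redesign.
OLD_HTML = """
<div class="product-card">
  <h2 class="product-title">Mechanical Keyboard</h2>
  <span class="price">$89.00</span>
</div>
"""
NEW_HTML = """
<section class="item-box">
  <h2 class="item-heading">Mechanical Keyboard</h2>
  <p class="item-cost">$89.00</p>
</section>
"""


def profiles(root):
    """Yield (element, profile) pairs for every element under root."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        parent = parent_of.get(el)
        siblings = list(parent) if parent is not None else [el]
        yield el, {
            "tag": el.tag,
            "attrs": " ".join(f"{k}={v}" for k, v in sorted(el.attrib.items())),
            "text": (el.text or "").strip(),
            "parent_tag": parent.tag if parent is not None else "",
            "child_tags": [child.tag for child in el],
            "position": siblings.index(el),
        }


def similarity(old, new):
    """Average several independent signals into one score between 0 and 1."""
    signals = [
        1.0 if old["tag"] == new["tag"] else 0.0,
        SequenceMatcher(None, old["attrs"], new["attrs"]).ratio(),
        SequenceMatcher(None, old["text"], new["text"]).ratio(),
        1.0 if old["parent_tag"] == new["parent_tag"] else 0.0,
        1.0 if old["child_tags"] == new["child_tags"] else 0.0,
        1.0 if old["position"] == new["position"] else 0.0,
    ]
    return sum(signals) / len(signals)


def relocate(stored, new_root):
    """Return the element on the new page whose profile scores highest."""
    best, _ = max(profiles(new_root), key=lambda pair: similarity(stored, pair[1]))
    return best


old_root = ET.fromstring(OLD_HTML.strip())
new_root = ET.fromstring(NEW_HTML.strip())

# "Save" the profile of the title element while the old selector still works.
target = old_root.find(".//h2[@class='product-title']")
stored = next(profile for el, profile in profiles(old_root) if el is target)

match = relocate(stored, new_root)
print(match.tag, match.get("class"))  # h2 item-heading
```

Even though the class name and the parent tag both changed, the tag, text, child structure, and sibling position still agree, so the renamed heading scores well above every other element.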
Practical example
The following demonstrates the adaptive parser against two versions of a product page HTML.
Setup
The two HTML versions
Both versions contain the same data but use completely different class names and container elements.
Initial scrape with auto_save
identifier gives each extraction a stable name for internal storage. auto_save=True tells Scrapling to record the structural profile of the matched element.
Standard selectors break on V2
Run against V2, the original selectors match nothing and return None. In production, calling .text on that None would raise AttributeError.
Adaptive mode recovers the data
Passing adaptive=True to the Selector and to each .css() call enables the matching logic. When .product-title finds no direct match in V2, Scrapling consults the stored profile: the element was an h2, contained the text "Mechanical Keyboard", was a sibling of a price-containing span, and was the first child of a card-like container. It identifies h2.item-heading as the closest match and returns it.
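The whole round trip might look like the sketch below. Several things here are assumptions rather than confirmed details of the walkthrough: that the installed Scrapling version exposes the parser as scrapling.parser.Selector and accepts the HTML as its first argument, that .css() returns a list-like collection, and that the url argument is used to key the stored profiles. The markup, the placeholder URL, and the identifier name product_title are invented for illustration. The import is guarded so the script does nothing harmful if the library is missing or older.

```python
# Assumption: a Scrapling release that provides scrapling.parser.Selector.
try:
    from scrapling.parser import Selector
except ImportError:
    Selector = None

# Placeholder URL; both versions share it so they share stored profiles.
PAGE_URL = "https://shop.example.com/keyboard"

# Hypothetical markup: same data, different class names and containers.
HTML_V1 = """
<html><body>
  <div class="product-card">
    <h2 class="product-title">Mechanical Keyboard</h2>
    <span class="price">$89.00</span>
  </div>
</body></html>
"""
HTML_V2 = """
<html><body>
  <section class="item-box">
    <h2 class="item-heading">Mechanical Keyboard</h2>
    <p class="item-cost">$89.00</p>
  </section>
</body></html>
"""


def first_text(matches):
    """Return the text of the first match, or None when nothing matched."""
    return matches[0].text if matches else None


def run_demo():
    """Scrape V1 with auto_save, then V2 with and without adaptive mode."""
    if Selector is None:
        return None  # Scrapling is not available; nothing to demonstrate.

    # First visit: the selector matches, and auto_save records the profile.
    old_page = Selector(HTML_V1, url=PAGE_URL, adaptive=True)
    saved = old_page.css(".product-title", identifier="product_title", auto_save=True)

    # After the redesign: the plain selector finds nothing ...
    new_page = Selector(HTML_V2, url=PAGE_URL, adaptive=True)
    plain = new_page.css(".product-title")

    # ... while adaptive mode falls back to the stored profile.
    recovered = new_page.css(".product-title", identifier="product_title", adaptive=True)

    return {
        "v1": first_text(saved),
        "v2_plain": first_text(plain),
        "v2_adaptive": first_text(recovered),
    }


result = run_demo()
print(result)
```

Using the same identifier on both calls is what ties the adaptive lookup to the profile saved earlier. The first_text helper also avoids the AttributeError that an unguarded .text access would raise when nothing matches.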
Fetchers
Scrapling provides three fetchers, so a project does not need to depend on requests and Playwright separately.
Fetcher makes plain HTTP requests. It is suitable for static pages with no JavaScript rendering and minimal bot protection.
DynamicFetcher drives a Chromium browser via Playwright. It loads the page, executes JavaScript, waits for content to render, then returns the final HTML. It is required for single-page applications and dynamically loaded content.
StealthyFetcher extends DynamicFetcher with anti-detection techniques: browser fingerprint management, automatic Cloudflare Turnstile solving, and other measures that make requests resemble human browser traffic. Use it when plain scraping triggers CAPTCHAs or blocks.
All three share a consistent API. Switching from Fetcher to StealthyFetcher is a one-line change in the code.
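One way to take advantage of a shared interface is to escalate through the tiers only when a cheaper one is blocked. The sketch below shows that pattern with stand-in functions rather than Scrapling's fetchers: Page, the three stub fetchers, and the block-detection heuristic (status 403 or 429, or the word "captcha" in the body) are all hypothetical and exist only to keep the example runnable offline.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """Minimal stand-in for a fetched response."""
    status: int
    html: str


def looks_blocked(page):
    """Crude heuristic (an assumption): block status codes or a CAPTCHA page."""
    return page.status in (403, 429) or "captcha" in page.html.lower()


def fetch_with_escalation(url, tiers):
    """Try each (name, fetch) tier in order; stop at the first unblocked page."""
    if not tiers:
        raise ValueError("at least one fetcher is required")
    for name, fetch in tiers:
        page = fetch(url)
        if not looks_blocked(page):
            break
    # If every tier was blocked, the last attempt is returned as-is.
    return name, page


def parse_title(page):
    """Parsing code that does not care which tier produced the page."""
    found = re.search(r"<h2[^>]*>(.*?)</h2>", page.html)
    return found.group(1) if found else None


# Stub fetchers simulating the three tiers against a protected site.
def plain_http(url):
    return Page(403, "Access denied")


def browser(url):
    return Page(200, "<p>Please complete the CAPTCHA</p>")


def stealth_browser(url):
    return Page(200, "<h2>Mechanical Keyboard</h2>")


TIERS = [("plain", plain_http), ("browser", browser), ("stealth", stealth_browser)]

used, page = fetch_with_escalation("https://shop.example.com/keyboard", TIERS)
print(used, parse_title(page))  # stealth Mechanical Keyboard
```

Because parse_title only sees a Page, swapping or reordering the tiers never touches the extraction logic.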
Spider framework
For multi-page crawls, Scrapling provides a spider framework with asynchronous requests, session management, proxy rotation, crawl checkpointing (pause and resume), and configurable output formats.
This is the appropriate tool when a script needs to follow links, paginate through results, or manage a large-scale extraction job.
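Crawl checkpointing is easiest to understand in miniature. The sketch below is not the Scrapling spider API: it is a synchronous, standard-library illustration of pause and resume, in which the crawl frontier and the set of seen URLs are written to a JSON file after every page. The SITE link graph and the file name crawl_state.json are invented for the example.

```python
import json
import os
import tempfile
from collections import deque

# Hypothetical in-memory site: each path maps to the links found on that page.
SITE = {
    "/page/1": ["/page/2", "/item/a"],
    "/page/2": ["/page/3", "/item/b"],
    "/page/3": ["/item/c"],
    "/item/a": [],
    "/item/b": [],
    "/item/c": [],
}


def crawl(start, checkpoint_path, max_pages=None):
    """Breadth-first crawl that resumes from checkpoint_path if it exists."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
        queue, seen = deque(state["queue"]), set(state["seen"])
    else:
        queue, seen = deque([start]), {start}

    visited = []
    while queue and (max_pages is None or len(visited) < max_pages):
        url = queue.popleft()
        visited.append(url)
        for link in SITE[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Persist progress after every page so the job can stop at any time.
        with open(checkpoint_path, "w") as fh:
            json.dump({"queue": list(queue), "seen": sorted(seen)}, fh)
    return visited


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crawl_state.json")
    first_run = crawl("/page/1", path, max_pages=2)  # "pause" after two pages
    second_run = crawl("/page/1", path)              # resume from the checkpoint

print(first_run)   # ['/page/1', '/page/2']
print(second_run)  # ['/item/a', '/page/3', '/item/b', '/item/c']
```

The second run picks up the queued pages without revisiting the first two, which is the property that matters for long extraction jobs.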
When to use Scrapling
Scrapling is most valuable for long-running data pipelines where selector maintenance is a recurring cost, for price monitoring or other time-sensitive extractions where uptime matters, and for feeding data to AI pipelines or RAG jobs where consistent data quality is required.
For one-off scripts against simple, stable pages, requests and BeautifulSoup remain a lighter choice. Scrapling's overhead in setup and storage pays off when the alternative is regular manual selector rewrites after site redesigns.
Final thoughts
The adaptive parser is Scrapling's core differentiator. The structural profile approach is more resilient than text matching or line-number anchors because it combines multiple independent signals (tag, content, parent, position) rather than relying on a single identifier. A site can change class names, restructure containers, and rename attributes, and as long as enough of the structural context is preserved, Scrapling can still locate the element.
The integrated fetcher hierarchy (plain HTTP → browser → stealth browser) with a consistent API means the same parsing code works regardless of which fetcher is needed, and upgrading the fetcher when a site adds bot protection requires no changes to the extraction logic.
Source code and documentation are at github.com/D4Vinci/Scrapling.