
BettaFish Explained: How This Multi-Agent AI Analyzes Public Opinion

Stanley Ulili
Updated on December 7, 2025

In the rapidly evolving landscape of artificial intelligence, a new class of powerful tools is emerging that can sift through the noise of the digital world to find meaningful signals. One of the most impressive and ambitious examples of this is BettaFish, a multi-agent public opinion analysis system designed to automatically collect, process, and synthesize insights from an ocean of online data. Created by a single, talented Chinese student and released for free on GitHub, BettaFish offers capabilities that rival commercial platforms charging over a thousand dollars per month.

This powerful tool can analyze videos, images, news articles, and even private databases by employing sophisticated web scraping and multi-agent AI collaboration. It promises to help users break out of information silos, understand the true state of public opinion, predict future trends, and assist in critical decision-making. However, the project's GitHub page is filled with extensive disclaimers, raising important questions about its use. Is it legal? What are the ethical implications of wielding such a powerful analysis engine?

In this article, you will gain a comprehensive understanding of BettaFish from the inside out. It dissects the system's intricate architecture, explores the roles of its intelligent agents, and examines the legal and ethical questions surrounding this groundbreaking tool.

What is BettaFish? The philosophy and functionality

Before diving into the technical nuts and bolts, it's essential to understand the core concept and purpose behind the BettaFish project. The name itself is a clever piece of symbolism that reveals the developer's vision.

The "small but powerful" concept

The name "BettaFish" plays on the project's Chinese name, "WeiYu" (微舆), which pairs "micro" (微) with the character used for public opinion (舆) and is a homophone of "small fish" (微鱼). The betta fish, also known as the Siamese fighting fish, is renowned for being small yet incredibly aggressive, vibrant, and resilient. This symbolism perfectly encapsulates the project's ethos: it is a seemingly small, open-source project created by one individual, yet it is designed to be incredibly powerful and unafraid of tackling the immense challenge of analyzing global public opinion.

This "small but powerful" philosophy is evident throughout its design. It leverages open-source technologies and a modular architecture to create a system that can process data from over 30 mainstream social media platforms and distill insights from millions of public comments, a task typically reserved for large corporations with significant resources.

Core functionality at a glance

BettaFish is not just a web scraper but a complete end-to-end analysis pipeline. Its primary goal is to provide users with a clear, synthesized understanding of complex topics by observing the digital discourse surrounding them. Here are its core functions:

Automated Data Collection: It automatically scrapes a wide array of sources, including major Chinese social media platforms like Weibo, Douyin (China's version of TikTok), and Zhihu, as well as news sites and technical forums.

Multimodal Analysis: It goes beyond simple text analysis. The system is equipped to process and understand insights from images and videos, recognizing that modern communication is increasingly visual.

Multi-Agent System: Instead of a single monolithic AI, BettaFish employs a team of specialized AI agents that work in parallel, each focusing on a specific task like database mining, media analysis, or web searching.

AI-Powered Debate and Synthesis: In a truly innovative step, the agents communicate their findings in a moderated "Agent Forum," where they debate differing perspectives and work collaboratively to form a comprehensive, nuanced conclusion.

Trend Prediction and Reporting: The final output is not just a collection of data points but a structured report that identifies key trends, analyzes sentiment, and even attempts to predict future developments.

The user experience is designed to be as simple as a chat interface. A user submits a query, the intelligent agents begin their automated analysis, and the process culminates in a detailed report that would otherwise take a team of human analysts weeks to compile.

Deconstructing the BettaFish architecture

The true power of BettaFish lies in its sophisticated and well-thought-out architecture. It functions as a multi-stage pipeline where data is collected, processed, analyzed, debated, and finally synthesized into a final report.

The complete architecture diagram of the BettaFish system, showing the flow from User Query to the Final Report

The user query and the Flask orchestrator

The entire process begins with a single point of entry: the User Query. This is where a user inputs the topic they want to analyze, for example, "What is the current sentiment toward the latest iPhone release?"

This query is not sent directly to an AI model. Instead, it is first received by a Flask Main Application. Flask is a lightweight and flexible Python web framework. In this context, it acts as the central nervous system or "orchestrator" of the entire operation. Its job is to receive the user's request and intelligently delegate the necessary tasks to the various specialized agents. This orchestration layer is crucial for managing the complex workflow and ensuring that each part of the system receives the instructions it needs to function correctly.

The three-agent system: A parallel processing powerhouse

Once the Flask orchestrator receives the query, it simultaneously dispatches the task to three distinct, specialized AI agents. This parallel processing approach is a cornerstone of BettaFish's efficiency, as it allows for different types of data collection and analysis to happen all at once.

A focused view of the three core agents: Insight Agent, Media Agent, and Query Agent, detailing their individual process flows
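To make the orchestration concrete, here is a minimal sketch of what such a Flask entry point could look like, assuming a fan-out to three agent functions. The file name, route, and agent helpers are illustrative, not the project's actual code:

orchestrator_sketch.py
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stubs standing in for the real agents, which are far more involved.
def run_insight_agent(query): return f"insight findings for {query!r}"
def run_media_agent(query): return f"media findings for {query!r}"
def run_query_agent(query): return f"web findings for {query!r}"

@app.post("/analyze")
def analyze():
    query = request.get_json()["query"]
    # Fan the same query out to all three agents at once; parallelism is the point.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in [
            ("insight", run_insight_agent),
            ("media", run_media_agent),
            ("query", run_query_agent),
        ]}
        findings = {name: f.result() for name, f in futures.items()}
    return jsonify(findings)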

The Insight Agent: Mining for structured data

The Insight Agent is responsible for proprietary database mining. Its primary role is to extract information from structured, pre-existing databases. Upon receiving the analysis topic, the Insight Agent generates complex SQL (Structured Query Language) queries. It then executes these queries against connected databases, which could be MySQL or PostgreSQL instances. These databases are continuously populated by the system's backend web crawler with data from social media.

Public web content can be chaotic and unstructured. The Insight Agent allows the system to tap into a clean, organized, and pre-processed source of data. This is particularly useful for longitudinal analysis (tracking a topic over time) and for accessing specific data points that have already been categorized, such as posts with a high "hotness score" or a specific sentiment.
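As a sketch, the kind of query the Insight Agent generates might look like the following. The table, column names, and connection details are hypothetical stand-ins for the project's actual schema:

insight_query_sketch.py
import psycopg2  # the default Docker setup ships PostgreSQL

TOP_POSTS_SQL = """
    SELECT platform, content, sentiment, hotness_score, published_at
    FROM posts
    WHERE content ILIKE %s
      AND published_at >= NOW() - INTERVAL '30 days'
    ORDER BY hotness_score DESC
    LIMIT 100
"""

def fetch_top_posts(topic: str):
    """Pull the most influential recent posts on a topic from the crawler's database."""
    conn = psycopg2.connect(host="db", dbname="bettafish", user="bettafish", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute(TOP_POSTS_SQL, (f"%{topic}%",))
        return cur.fetchall()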

The Media Agent: Analyzing the visual web

The Media Agent performs multimodal content analysis. Its expertise lies in understanding the vast amount of non-textual information on the web, primarily images and videos. This agent uses browser automation tools like Playwright to navigate websites, interact with media elements, and analyze their content. It can process visual information to understand memes, interpret the sentiment of a video, or identify key visual themes related to a topic.

In the age of TikTok, Instagram, and YouTube, public opinion is heavily influenced by visual media. A purely text-based analysis would miss the rich context, sentiment, and narratives embedded in images, memes, and videos. The Media Agent ensures this critical dimension of the digital discourse is not overlooked.
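A minimal sketch of this pattern with Playwright might look like the following; the real Media Agent is far more elaborate, and the target URL and selector here are illustrative:

media_agent_sketch.py
from playwright.sync_api import sync_playwright

def capture_video_frame(url: str, out_path: str = "frame.png") -> None:
    """Load a page and screenshot its first video element so a
    vision-capable LLM can analyze the frame downstream."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.locator("video").first.screenshot(path=out_path)
        browser.close()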

The Query Agent: Scouring the public web

The Query Agent is tasked with broad global search. It acts like a super-powered traditional search engine, designed to gather information from news articles, blogs, forums, and other public web content. It conducts extensive web searches related to the user's query. Crucially, it doesn't just collect links but performs diverse search and verification to cross-reference information and build a more reliable picture of the public narrative.

This agent provides the broad context for the analysis. While the Insight Agent looks at curated social media data and the Media Agent looks at visuals, the Query Agent gathers the reports from news organizations, official statements, and in-depth articles that frame the conversation.
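The cross-referencing step can be sketched as a simple voting scheme. Here, `search` and `extract_claims` are hypothetical placeholders for whatever search backend and claim extraction the system actually uses:

query_agent_sketch.py
from collections import Counter

def cross_reference(search, extract_claims, topic, engines=("engine_a", "engine_b", "engine_c")):
    """Run the same query through several engines and keep only claims
    that at least two independent sources agree on."""
    votes = Counter()
    for engine in engines:
        for result in search(topic, engine=engine):
            for claim in extract_claims(result["snippet"]):
                votes[claim] += 1
    return [claim for claim, count in votes.items() if count >= 2]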

The backend data pipeline: The foundation of analysis

A significant part of BettaFish's work happens even before a user types in a query. The system runs a continuous backend process to crawl and organize data, creating the rich database that the Insight Agent relies on.

MindSpider Web Crawler: This is a custom AI web crawler designed specifically for public opinion analysis. It actively scrapes over 13 global social media platforms, with a heavy focus on major Chinese platforms.

Deep Sentiment Crawling: The crawler doesn't just grab text. It performs deep analysis on the collected data, identifying hot topics and assessing public opinion in near real-time.

Database Population: All this collected data is then structured and fed into a MySQL or PostgreSQL database. This turns the chaotic mess of the live web into an organized, queryable resource.

Hotness Score Calculation: To prioritize influential content, the system calculates a weighted "hotness score" for each piece of content. This algorithm assigns different weights to various forms of engagement.

Here is the weighting logic from the project's code, with the elided calculation filled in as a plain weighted sum (a sketch; the project's exact formula may differ):

hotness_calculator.py
def calculate_hotness_score(engagement):
    """Calculate weighted hotness score"""
    W_LIKE = 1.0
    W_COMMENT = 5.0
    W_SHARE = 10.0
    W_VIEW = 0.1
    W_FAVORITE = 8.0
    # The elided logic is a weighted sum of engagement counts
    # (a sketch; the project's exact formula may differ).
    return (W_LIKE * engagement.get("likes", 0)
            + W_COMMENT * engagement.get("comments", 0)
            + W_SHARE * engagement.get("shares", 0)
            + W_VIEW * engagement.get("views", 0)
            + W_FAVORITE * engagement.get("favorites", 0))

This simple but effective method ensures that a post with many shares and comments (high-intent engagement) is prioritized over a post with many low-intent views.

The Agent Forum: Where AI agents debate and collaborate

This is arguably the most innovative and fascinating component of the BettaFish architecture. After the three agents have completed their initial data gathering and analysis, they don't simply pass their raw findings along. Instead, they convene in the Agent Forum.

An example of the Forum Engine's output, showing a "Forum Moderator's Summary" with a timeline analysis

Here, the agents engage in a form of structured debate:

Share Key Findings: Each agent presents the most critical information it has discovered. The Query Agent might present a news timeline, while the Insight Agent presents sentiment data from social media.

Unfold a Debate: The system encourages the agents to identify and discuss discrepancies. For instance, if the news media (found by the Query Agent) reports a positive official stance on a topic, but social media sentiment (found by the Insight Agent) is overwhelmingly negative, the agents will highlight and attempt to explain this conflict.

Moderation by an LLM Host: The entire debate is moderated by a fourth AI, the LLM Host. This moderator's job is to guide the discussion, ask clarifying questions, and ensure the agents stay on topic and work towards a productive, synthesized conclusion. It acts as the impartial chairman of a board meeting of AIs.

This process of debate and reconciliation is what elevates BettaFish from a simple data aggregator to a true analysis platform. It mimics the process of human analysts debating a topic to arrive at a nuanced conclusion, and it results in a much richer and more reliable final output.
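One moderated round of such a debate can be sketched in a few lines. Here, `llm` is a placeholder for a chat-completion call, and the prompt is illustrative rather than the project's actual one:

forum_round_sketch.py
def forum_round(llm, findings: dict) -> str:
    """Have the LLM Host summarize agent findings and surface contradictions."""
    transcript = "\n".join(f"[{agent}] {text}" for agent, text in findings.items())
    prompt = (
        "You are moderating a debate between analysis agents. Summarize their "
        "findings, point out where they contradict one another, and pose one "
        "clarifying question per conflict.\n\n" + transcript
    )
    return llm(prompt)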

The Final Report Agent

The last step in the pipeline is the Report Agent. This agent takes the synthesized conclusions from the Agent Forum, along with all the supporting data from the other agents, and generates a comprehensive final report. This process involves several sub-steps:

  • Timeline Analysis
  • Chunked Analysis and Reflection
  • Information Retrieval (IR) Scoring and Validation
  • Binding and Rendering into a final, readable report, often presented in a dashboard format

A dashboard-style report generated by the Insight Engine, showing sentiment analysis and market correlation data

The result is a professional-grade analysis, complete with charts, key performance indicators, and narrative summaries, all generated automatically from a single user query.
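The "Chunked Analysis and Reflection" step, for instance, can be approximated with a summarize-then-critique loop like the one below; again, `llm` and the prompts are illustrative assumptions, not the project's code:

report_agent_sketch.py
def chunked_analysis(llm, documents, chunk_size=5):
    """Summarize evidence in small batches, then have the model critique
    and rewrite its own combined draft (the 'reflection' pass)."""
    partials = []
    for i in range(0, len(documents), chunk_size):
        batch = "\n---\n".join(documents[i:i + chunk_size])
        partials.append(llm(f"Summarize the key findings in:\n{batch}"))
    draft = "\n\n".join(partials)
    return llm(f"Review this draft for gaps and contradictions, then rewrite it:\n{draft}")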

Setting up and running BettaFish

Understanding how BettaFish can be deployed provides valuable insight into its practical operation and requirements.

Prerequisites

Before deployment, several components are required:

  • A Linux Server: The system runs effectively on servers like a Hetzner CAX31 with at least 16GB of RAM and adequate CPU cores
  • Docker and Docker Compose: The project is containerized, which simplifies deployment
  • Git: For cloning the project repository from GitHub
  • LLM API Keys: BettaFish requires access to Large Language Models to power its agents. API keys from providers are necessary. A service like OpenRouter is recommended as it provides a unified interface to access many different models (like GPT, Claude, Gemini) with a single API key

Installation process

The installation demonstrates the system's modular architecture. First, the repository is cloned from GitHub:

 
git clone https://github.com/6666-dev/weiyu.git
cd weiyu

The project uses a .env file to manage secret keys and configuration. An example file is provided:

 
cp .env.example .env

The .env file contains the configuration for each agent's LLM access:

.env
# For Insight Agent
INSIGHT_ENGINE_API_KEY=<your_openrouter_api_key>
INSIGHT_ENGINE_BASE_URL=https://openrouter.ai/api/v1
INSIGHT_ENGINE_MODEL_NAME=google/gemini-pro

# For Media Agent
MEDIA_ENGINE_API_KEY=<your_openrouter_api_key>
MEDIA_ENGINE_BASE_URL=https://openrouter.ai/api/v1
MEDIA_ENGINE_MODEL_NAME=google/gemini-pro

You can customize which model each agent uses, allowing for a more powerful model for the Report Agent and a faster, cheaper model for initial data processing.

The same .env file includes database settings. The default Docker Compose setup creates a PostgreSQL database, and the default settings work out of the box.
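For reference, the database portion of the file usually takes a shape like the following; the variable names here are illustrative, so check the bundled .env.example for the exact keys:

.env
# Database connection (illustrative key names; see .env.example)
DB_HOST=db
DB_PORT=5432
DB_USER=bettafish
DB_PASSWORD=changeme
DB_NAME=bettafish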

Once configuration is complete, all services can be launched with a single command:

 
docker compose up -d

This command pulls all necessary container images and starts the Flask application, the database, and all backend services.

Running an analysis

With the services running, the BettaFish web interface becomes accessible through a browser at the server's IP address. The interface provides a simple input box where users can enter their analysis query, such as "What do the Chinese media REALLY think of Donald Trump."

The BettaFish web interface showing the various engine tabs and the live log output as the system processes a query

After submitting a query, the UI displays different tabs for each engine (Insight Engine, Media Engine, Query Engine, Forum Engine) with live logs showing their work. You can observe the agents planning their tasks, executing searches, analyzing data, and communicating in the forum.

The legal and ethical gray area

BettaFish is undeniably powerful, but with great power comes great responsibility, along with significant legal questions. The developer has included numerous disclaimers for very good reasons.

Why so many disclaimers?

The primary reason for the extensive disclaimers is the use of web scraping. The legality of web scraping exists in a gray area. While accessing publicly available data is not inherently illegal, it can violate the Terms of Service of the websites being scraped. Companies often take legal action against automated scraping of their platforms.

Furthermore, the project operates in the complex domain of international data privacy laws. Regulations like Europe's GDPR and China's Personal Information Protection Law (PIPL) impose strict rules on the collection and processing of personal data. Even data that users post publicly can be considered personal information. By stating that the project is for "learning, academic research, and educational purposes only," and that users "shall bear all legal consequences," the developer is shifting the legal liability for how the tool is used onto the end-user.

The potential for misuse

The ethical concerns are just as significant as the legal ones. A tool that can analyze public opinion with this level of detail and automation could be used for malicious purposes, including:

Amplifying Misinformation: By identifying divisive topics and effective narrative strategies, a bad actor could use BettaFish to design and spread highly effective propaganda or misinformation campaigns.

Advanced Phishing and Social Engineering: The detailed insights into group psychology and sentiment could be used to craft highly convincing and targeted phishing attacks.

Market Manipulation: In the financial sector, real-time sentiment analysis could be used to unethically influence market behavior.

Surveillance: The tool could be adapted for mass surveillance of public or private conversations to monitor dissent or track individuals.

Final thoughts

BettaFish is more than just a clever open-source project. It's a powerful statement about the democratization of advanced AI technology. It demonstrates that a single dedicated developer can create a tool with capabilities that were, until recently, the exclusive domain of intelligence agencies and multinational corporations. The multi-agent architecture, and particularly the innovative AI-moderated "Agent Forum," provides a fascinating glimpse into a future where teams of AI agents collaborate and debate to solve complex problems.

However, BettaFish also serves as a critical case study in the ethical challenges that accompany such powerful tools. Its reliance on web scraping and its potential for misuse highlight the urgent need for an ethical framework to guide the development and deployment of AI.

While its current focus is on Chinese social media, the underlying architecture is universally applicable. With some modification, it could be pointed at any data source in any language, opening up a world of possibilities for market research, academic study, and social science. BettaFish is a small but powerful fish in a rapidly growing ocean of AI, and its journey is one that is well worth watching.
