Vibe Testing with Vibetest-use: A Comprehensive Guide
Introduction to Vibetest-use
Welcome to the future of web testing! If you're tired of manually clicking through your website, hunting for elusive UI bugs, broken links, or accessibility glitches, then Vibetest-use is about to become your new best friend. This powerful tool leverages AI-driven browser automation to conduct comprehensive "vibe tests" on your web applications, ensuring they not only work correctly but also provide a seamless user experience.
What is Vibetest-use?
At its core, Vibetest-use is an MCP (Model Context Protocol) server that unleashes a team of intelligent Browser-Use agents upon your website. These agents, powered by large language models (LLMs), autonomously explore your site, interact with its elements, and report back on any issues they discover. It's like having a dedicated QA team on call, ready to test your site 24/7.
What problems does it solve?
Vibetest-use is designed to catch a wide array of common web development headaches:
- UI Bugs: Buttons that don't click, forms that don't submit, visual glitches.
- Broken Links: Dead ends and 404 errors that frustrate users.
- Accessibility Issues: Problems that prevent users with disabilities from using your site effectively (e.g., missing alt text, poor contrast), as guided by the LLM's analysis.
- Other Technical Problems: JavaScript errors, slow loading times, and unexpected behavior.
Whether you're testing a live production website or a work-in-progress on your `localhost` development server, Vibetest-use can handle it.
Key Features:
- Scout Agent System: A unique preliminary "scout" agent first maps out your webpage's interactive elements. This information is then used to generate targeted testing tasks for the subsequent QA agents.
- Parallel Testing: Multiple agents can run concurrently, significantly speeding up the testing process.
- LLM-Powered Analysis: Google's Gemini models are used not only for agent navigation and interaction but also for intelligently summarizing bug reports and classifying their severity.
- Headless & Non-Headless Modes: Run tests silently in the background (headless) or watch the agents work their magic in real browser windows (non-headless).
- Easy Integration: Works seamlessly with development environments like Cursor and Claude Code that support MCP.
Who is this tutorial for?
This guide is for web developers, QA engineers, and anyone involved in building and maintaining websites who wants to automate and enhance their testing process. Whether you're new to AI-driven testing or looking for advanced techniques, you'll find valuable insights here.
The philosophy is simple: "Vibecode and vibetest until your website works." Let's dive in!
Core Concepts
Before we get our hands dirty, let's clarify a few key concepts that underpin Vibetest-use:
MCP (Model Context Protocol) Server:
MCP is a protocol that allows AI assistants (like those in Cursor or Claude Code) to interact with external tools and services. Vibetest-use runs as an MCP server, exposing its testing capabilities (`start` a test, get `results`) to the AI assistant you're working with. This enables you to initiate complex testing workflows using natural language prompts.
Browser-Use Agents (and the `browser-use` library):
Vibetest-use is "Powered by Browser Use." The `browser-use` library provides the foundation for creating AI agents that can intelligently operate web browsers. These agents can understand tasks, perceive web content (including using vision capabilities), and perform actions like clicking, typing, and navigating, much like a human user would. Vibetest-use customizes and orchestrates these agents for the specific purpose of QA testing.
Vibe Coding & Vibe Testing (Context):
While not strictly defined by this tool, "vibe coding" generally refers to a more intuitive, goal-oriented approach to development, often assisted by AI. "Vibe testing," in the context of Vibetest-use, extends this by using AI to assess not just the functional correctness but the overall "vibe" or quality of user experience on a website. Does it feel right? Is it smooth? Are there jarring issues?
The Role of LLMs (Gemini Models):
Large Language Models are the brains behind the operation. Vibetest-use utilizes Google's Gemini models (specifically `gemini-2.0-flash` for agent actions and `gemini-1.5-flash` for scouting, task generation, and results analysis) for several critical functions:
- Understanding Instructions: Interpreting your natural language prompts to start tests.
- Scouting & Task Generation: The scout agent's observations are processed by an LLM to create a structured list of specific testing tasks for the QA agents.
- Agent Decision Making: Each QA agent uses an LLM to decide how to execute its assigned task on the webpage.
- Bug Report Analysis & Summarization: After tests are complete, an LLM analyzes the raw findings from all agents, identifies genuine issues, describes them in detail, classifies their severity (high, medium, low), and provides an overall status.
Getting Started
Let's get Vibetest-use set up and ready to roll.
Prerequisites:
- Python 3.11+: Ensure you have a compatible Python version installed.
- Google API Key: You'll need a Google API key that has access to the Gemini models (specifically `gemini-2.0-flash` and `gemini-1.5-flash`). You can obtain one from the Google AI Studio or Google Cloud Console. This is crucial, as the LLM interactions are central to Vibetest-use.
- Cursor or Claude Code with MCP Support: Vibetest-use is designed to be used within these AI-assisted development environments.
- `uv` (Optional but Recommended): The README.md suggests using `uv` for Python packaging and virtual environment management. It's a fast and modern alternative to `pip` and `venv`. If you don't have it, you can install it or use your preferred Python environment tools.
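Before installing, it can be worth confirming that the interpreter you're about to use meets the 3.11+ requirement. A two-line check (illustrative only, not part of the repository) is enough:

```python
import sys

# Vibetest-use expects Python 3.11+; this confirms the interpreter you're using.
assert sys.version_info >= (3, 11), f"Python 3.11+ required, found {sys.version.split()[0]}"
print("Python version OK:", sys.version.split()[0])
```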
Installation:
- Clone the Repository (if you haven't already):

```bash
git clone https://github.com/your-username/vibetest-use.git  # Or the correct URL
cd vibetest-use
```

- Set up Python Environment (using `uv`):

```bash
uv venv                     # Create a virtual environment
source .venv/bin/activate   # Activate it (macOS/Linux)
# For Windows: .venv\Scripts\activate
```

- Install Dependencies (using `uv`):

```bash
uv pip install -e .
```

The `-e .` installs the package in "editable" mode, which is good for development.
Configuration: Setting the GOOGLE_API_KEY
Vibetest-use needs your Google API key to function. You'll provide this when you add it as an MCP server.

For Claude Code (CLI): When adding the MCP server, use the `-e` flag to set the environment variable. Make sure to replace `/full/path/to/vibetest-use/` with the actual absolute path to your cloned repository.

```bash
claude mcp add vibetest /full/path/to/vibetest-use/.venv/bin/vibetest-mcp -e GOOGLE_API_KEY="YOUR_API_KEY_HERE"
```

You can then verify by typing `/mcp` in Claude and checking if `vibetest` shows as connected.

For Cursor (MCP Settings UI):
- Open Cursor Settings (Cmd+, or Ctrl+,).
- Click on "MCP" in the left sidebar.
- Click "Add Server" or the "+" button.
- Manually edit the configuration JSON to include `vibetest`. Replace `/full/path/to/vibetest-use/` with the actual absolute path.

```json
{
  "mcpServers": {
    "vibetest": {
      "command": "/full/path/to/vibetest-use/.venv/bin/vibetest-mcp",
      "env": {
        "GOOGLE_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
```

- Save the configuration. Cursor should then connect to your `vibetest-mcp` server.
With these steps completed, Vibetest-use is installed and configured!
Running Your First Vibe Test
Now for the exciting part: launching your first automated vibe test. You'll do this by invoking the `start` tool via your MCP-enabled AI assistant.

The `start` command:
The primary way to initiate a test is by using a prompt that your AI assistant can translate into an MCP call to the `vibetest` server's `start` tool.
Full Syntax (as an MCP call):

```
/mcp vibetest start url="<your_url>" num_agents=<N> headless=<true_or_false>
```
Parameters Explained:
- `url` (string, required): The full URL of the website you want to test. This can be a live site (e.g., `"https://example.com"`) or a local development server (e.g., `"http://localhost:3000"`).
- `num_agents` (integer, optional, default: 3): The number of concurrent QA agents to deploy. More agents generally means more thorough testing, as they can cover different aspects or execute different generated tasks simultaneously. The system can run up to 10 agents in parallel due to a semaphore in the code.
- `headless` (boolean, optional, default: `false`):
  - `false`: Agents run in visible browser windows. This is great for watching what they're doing, debugging, or just for the sheer coolness factor. The `agents.py` code even includes logic to tile these windows on your screen and apply a 0.25x zoom to fit more in.
  - `true`: Agents run in headless mode (no visible UI). This is ideal for automated CI/CD pipelines or when you don't need to see the browsers.
Natural Language Examples (how you'd prompt your AI):
> Vibetest my website with 5 agents: browser-use.com

This would translate to: `/mcp vibetest start url="browser-use.com" num_agents=5 headless=false` (assuming the default headless setting)

> Run vibetest on localhost:3000

This would translate to: `/mcp vibetest start url="localhost:3000" num_agents=3 headless=false` (default agents and headless)

> Run a headless vibetest on localhost:8080 with 10 agents

This would translate to: `/mcp vibetest start url="localhost:8080" num_agents=10 headless=true`
What happens behind the scenes? (The Magic Unveiled)
When you issue that command, a sophisticated process kicks off:
MCP Invocation: Your AI assistant (Cursor/Claude) recognizes your intent and calls the `start` tool on your `vibetest` MCP server.

`run_pool` Initiated: `mcp_server.py` receives the call and triggers the `run_pool` asynchronous function in `agents.py`. This function orchestrates the entire testing process. A unique `test_id` (a UUID) is generated for this run.

Phase 1: The Scout Mission (Intelligence Gathering)
- Before the main QA agents are deployed, a special scout agent is dispatched. This is a clever optimization!
- The `scout_page(base_url)` function is called.
- Scout's Task: This agent is given a very specific instruction: "Visit {base_url} and identify ALL interactive elements on the page. Do NOT click anything, just observe and catalog what's available. List buttons, links, forms, input fields, menus, dropdowns, and any other clickable elements you can see. Provide a comprehensive inventory."
- The scout agent (using `browser-use` with vision and `gemini-1.5-flash`) navigates to your URL in headless mode and "looks" at the page to identify all potential points of interaction.
- Task Generation via LLM: The scout's raw findings (a textual list of elements) are then fed back into `gemini-1.5-flash` with another prompt. This prompt asks the LLM to: "Create a list of specific testing tasks, each focusing on different elements... Aim for 6-8 distinct tasks... Format as JSON array: ["Test the [specific element description]...", ...]"
- The output is a list of targeted QA task strings, like:
  - "Test the main navigation links in the header - click on 'Products'"
  - "Test the search bar functionality - type 'test query' and submit"
- Fallback: If the scouting or LLM partitioning fails, a predefined list of general tasks is used (e.g., "Test navigation elements," "Test main content links"). A sketch of this partition-with-fallback step follows this list.
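To make the scout-to-tasks hand-off concrete, here is a minimal sketch of the partitioning step and its fallback. The helper names (`partition_into_tasks`, `ask_llm`, `FALLBACK_TASKS`) are hypothetical stand-ins for illustration, not the actual functions in `agents.py`:

```python
import json

# General-purpose tasks used when scouting or LLM partitioning fails.
FALLBACK_TASKS = [
    "Test navigation elements",
    "Test main content links",
    "Test any forms or input fields",
]

def partition_into_tasks(scout_inventory: str, ask_llm) -> list[str]:
    """Turn a scout's element inventory into a handful of targeted QA tasks.

    `ask_llm` is assumed to be a callable that sends a prompt to a Gemini model
    and returns its text response.
    """
    prompt = (
        "Based on this inventory of interactive elements:\n"
        f"{scout_inventory}\n\n"
        "Create a list of specific testing tasks, each focusing on different "
        "elements. Aim for 6-8 distinct tasks. "
        'Format as JSON array: ["Test the [specific element description]...", ...]'
    )
    try:
        tasks = json.loads(ask_llm(prompt))
        # Only accept a non-empty list; anything else falls back to the defaults.
        if isinstance(tasks, list) and tasks:
            return [str(t) for t in tasks]
    except (json.JSONDecodeError, TypeError):
        pass
    return FALLBACK_TASKS
```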
Phase 2: The Agent Testing Swarm
- With a list of specific tasks in hand, `run_pool` now prepares to launch the main QA agents.
- It uses `asyncio.gather` to run multiple `run_single_agent` coroutines concurrently. A semaphore limits true parallelism to `min(num_agents, 10)` to avoid overwhelming your system or the target server (see the concurrency sketch after this list).
- For each agent (`run_single_agent`):
  - A task is assigned from the list generated in the scout phase (tasks are distributed cyclically if `num_agents` > number of scouted tasks).
  - Browser Setup:
    - A `BrowserProfile` is configured. This includes arguments like `--disable-gpu` and `--no-sandbox`. Crucially, `headless` is set based on your input.
    - Non-Headless Specifics: If `headless=false`:
      - Screen dimensions are fetched.
      - Window size (300x400) and viewport (280x350) are set.
      - An attempt is made to tile the agent browser windows neatly on your screen.
      - A 0.25x zoom is applied to the webpage content within each browser window (`document.body.style.zoom = '0.25'`). This allows agents to "see" more of the page at once, especially on smaller viewports, and helps fit multiple agent windows.
    - A `BrowserSession` is created, which is the agent's interface to the web browser.
  - Agent Initialization: A `browser_use.Agent` is instantiated with:
    - `task`: The specific QA task string.
    - `llm`: The `gemini-2.0-flash` model for in-task decision-making.
    - `browser_session`: The configured browser.
    - `use_vision=True`: Enabling the agent to see and interpret the page visually.
  - Execution: `await agent.run()` is called. The agent now attempts to complete its task (e.g., "Click the 'Submit' button on the contact form and verify success message"). It will navigate, click, type, and observe outcomes based on its LLM's understanding.
  - Result Collection: The agent's final result (a textual description of what it found or if it succeeded/failed) is captured.
  - Cleanup: The browser session for that agent is closed. Any errors during the agent's run are caught and recorded.
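Here is a minimal, self-contained sketch of the concurrency pattern described above: a semaphore capping parallel browsers at 10, cyclic task assignment, and `asyncio.gather`. The agent work is stubbed out with a sleep; the real implementation drives a `browser-use` agent instead:

```python
import asyncio

async def run_pool_sketch(tasks: list[str], num_agents: int) -> list[str]:
    # Cap true parallelism at 10 browsers, mirroring the semaphore described above.
    semaphore = asyncio.Semaphore(min(num_agents, 10))

    async def run_single_agent(i: int) -> str:
        # Tasks are handed out cyclically when num_agents exceeds the task count.
        task = tasks[i % len(tasks)]
        async with semaphore:
            await asyncio.sleep(0.1)  # placeholder for the real browser agent run
            return f"agent {i}: completed '{task}'"

    # return_exceptions=True keeps one failed agent from cancelling the rest.
    results = await asyncio.gather(
        *(run_single_agent(i) for i in range(num_agents)),
        return_exceptions=True,
    )
    return [str(r) for r in results]

# Example: asyncio.run(run_pool_sketch(["Test the navigation", "Test the search bar"], 5))
```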
Completion and Storage:
- Once all agents have completed (or timed out/erred), `run_pool` records the end time and calculates the total duration.
- There's a cleanup step to kill any lingering Chromium processes (specifically on macOS using `pkill -f chromium`), which is good practice.
- All results, including individual agent reports, timings, and the initial parameters, are stored in the in-memory `_test_results` dictionary, keyed by the `test_id`.
Return `test_id`: The `start` tool finally returns the unique `test_id` to you via the MCP interface. You'll use this ID to fetch the summarized results.
Phew! That's a lot happening under the hood, all automated for you.
Understanding the Results
Once the `start` command has returned a `test_id`, your vibe test is complete (or has at least finished its execution phase). Now it's time to see what the agents found.
The `results` command:
You'll use the `results` tool, providing the `test_id` you received.

Syntax (as an MCP call):

```
/mcp vibetest results test_id="<your_test_id>"
```

Example:

> /mcp vibetest results test_id="a1b2c3d4-e5f6-7890-1234-567890abcdef"

(Replace with your actual ID)
Interpreting the Output (The `summarize_bug_reports` function):
When you call the `results` tool, the `summarize_bug_reports(test_id)` function in `agents.py` is invoked. This function is responsible for taking all the raw data collected by the agents and turning it into a human-readable, actionable summary. Here's what to look for in the JSON response:
Basic Test Info:
- `test_id`: The ID of this test run.
- `total_agents`: The number of agents you requested.
- `successful_agents`: How many agents completed their tasks without critical errors.
- `failed_agents`: How many agents encountered errors during their operation.
- `errors`: An array of error details if any agents failed.
- `duration_seconds`: Total time the test took in seconds.
- `duration_formatted`: A human-friendly duration (e.g., "1m 30s" or "45s"); a tiny helper that produces this format is sketched after this list.
- `summary_generated`: Timestamp of when this summary was created.
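As a small illustration of how a value like `duration_formatted` can be derived from `duration_seconds`, here is a hypothetical helper (not part of the project's code):

```python
def format_duration(seconds: float) -> str:
    """Render a duration the way duration_formatted reads: "45s" or "1m 30s"."""
    total = int(round(seconds))
    minutes, secs = divmod(total, 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"

print(format_duration(45))   # "45s"
print(format_duration(90))   # "1m 30s"
```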
LLM-Generated Analysis (The Core Intelligence - if bugs were reported and API key is valid): This is where Vibetest-use truly shines. If agents reported potential issues, their findings are passed to `gemini-1.5-flash` with a very specific prompt. The LLM is instructed to:

- Act as an objective QA analyst.
- Identify only actual functional issues, broken features, or technical problems.
- Ignore subjective opinions, missing features (that might be intentional), or design preferences.
- Focus on: broken functionality, technical errors (404s, JS errors), accessibility violations, performance problems.
- Provide specific and detailed descriptions: exact element, action taken, result/error observed, and context.
- Format output as JSON with severity levels.

You'll see these fields:

- `overall_status`: A quick assessment:
  - `"high-severity"` (Emoji: 🔴): Critical issues found that need immediate attention.
  - `"medium-severity"` (Emoji: 🟠): Moderate issues found that should be addressed.
  - `"low-severity"` (Emoji: 🟡): Minor issues found that could be improved.
  - `"passing"` (Emoji: ✅): No technical issues detected during testing.
- `status_emoji`: The corresponding emoji for a quick visual cue.
- `status_description`: A brief text summary of the overall status.
- `total_issues`: The total number of distinct issues identified by the LLM.
- `severity_breakdown` (JSON object):
  - `high_severity`: An array of objects, each representing a high-severity issue: `[{"category": "e.g., Navigation", "description": "Upon clicking the 'Contact Us' button in the header navigation, the page redirected to a 404 error."}]`
  - `medium_severity`: Similar array for medium-severity issues.
  - `low_severity`: Similar array for low-severity issues. The LLM is explicitly prompted not to use vague descriptions.
- `llm_analysis`:
  - `raw_response`: The full, raw text response from the Gemini model (useful for debugging or deeper inspection).
  - `model_used`: Confirms `"gemini-1.5-flash"` was used for this analysis.
Fallback Analysis (If LLM analysis couldn't run or no bugs were initially flagged by agents): If no potential bugs were reported by the agents, or if the `GOOGLE_API_KEY` was invalid/missing for the summarization step, or if the LLM analysis itself failed:

- The `overall_status` will be `"low-severity"` (🟡) if there were any agent reports at all, or `"passing"` (✅) if agents reported nothing.
- `status_description` will indicate something like `"Found X potential issues requiring manual review"` or `"No technical issues detected"`.
- `total_issues` will reflect the count of raw agent reports.
- `severity_breakdown` will likely put all potential issues into `low_severity` with a generic description.
- `llm_analysis_error`: May contain an error message if the LLM part failed.
The goal of this structured output is to give you a clear, prioritized list of potential problems with your website, backed by detailed descriptions that help you quickly locate and address them.
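To make the shape of that payload easier to picture, here is a hypothetical example assembled from the fields described above; the values are placeholders, not output from a real run:

```python
# Illustrative only: field names follow the article, values are made-up placeholders.
example_summary = {
    "test_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
    "total_agents": 3,
    "successful_agents": 3,
    "failed_agents": 0,
    "duration_seconds": 95,
    "duration_formatted": "1m 35s",
    "overall_status": "medium-severity",
    "status_emoji": "🟠",
    "status_description": "Moderate issues found that should be addressed",
    "total_issues": 2,
    "severity_breakdown": {
        "high_severity": [],
        "medium_severity": [
            {
                "category": "Navigation",
                "description": "Clicking 'Contact Us' in the header redirected to a 404 page.",
            }
        ],
        "low_severity": [
            {
                "category": "Forms",
                "description": "The newsletter form accepted an empty email without any validation feedback.",
            }
        ],
    },
}
```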
Advanced Usage and Tips
Choosing `num_agents` Wisely:

- More agents (`num_agents`) can lead to more comprehensive testing because the initial `scout_page` function aims to generate 6-8 distinct tasks. If you use, say, 8 agents, each agent can potentially focus on a different identified area or element.
- However, each agent consumes resources (CPU, memory, network bandwidth, and LLM API calls). The system is internally limited to running a maximum of 10 browser instances concurrently via a semaphore.
- Start with 3-5 agents for moderately complex pages. For very large or complex sites, you might increase this, keeping an eye on performance and API costs.

Headless vs. Non-Headless Revisited:

- Non-Headless (`headless=false`): This is invaluable for initial setup, debugging your tests, or when you want to visually confirm what the agents are doing. The automatic window tiling and 0.25x zoom in `agents.py` are thoughtful features that try to make multiple agent browsers observable. Remember, these visible browsers will pop up on the machine where the `vibetest-mcp` server is running.
- Headless (`headless=true`): The preferred mode for automated runs, CI/CD integration, or when running on a server without a graphical display. Tests generally run faster in headless mode.
Testing `localhost`:

- Perfectly supported! Just ensure your local development server (e.g., `npm start`, `hugo server`, `python -m http.server`) is running and accessible at the URL you provide (e.g., `http://localhost:3000`).
- Firewalls or network configurations could potentially block access if the MCP server is running in a different environment (e.g., a Docker container) than your dev server. Usually, if you can access `localhost:3000` in your own browser, Vibetest-use should be able to as well; a quick pre-flight check is sketched below.
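If you want to rule out a dead dev server before spending agent time and API calls, a tiny check like the following (a hypothetical helper, not part of Vibetest-use) does the trick:

```python
import urllib.error
import urllib.request

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the dev server answers at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError:
        return True   # The server responded, even if with an error page.
    except (urllib.error.URLError, OSError):
        return False  # Nothing is listening, or the host can't be resolved.

print(is_reachable("http://localhost:3000"))
```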
Interpreting LLM Analysis - Trust but Verify:

- The LLM-driven analysis (using `gemini-1.5-flash`) is powerful and guided by a detailed prompt to be objective and specific.
- However, LLMs can still make mistakes or misinterpret complex situations. Always treat the reported issues as strong leads, not infallible judgments.
- Review the `description` for each issue. Does it make sense in the context of your site? Can you reproduce it manually? The goal is to save you time, not replace human oversight entirely.
Understanding Agent Tasks (The Scout Phase):
- The `scout_page` function's goal is to intelligently divide the testing labor. It first identifies interactive elements and then asks an LLM to create 6-8 distinct testing tasks.
- For very complex UIs with hundreds of interactive elements, these 6-8 tasks will necessarily be somewhat high-level or focused on key areas. The agents will then explore within the scope of their assigned task.
- If you have a critical, complex workflow that needs very specific end-to-end coverage, keep in mind that Vibetest-use provides general component-level testing; for hyper-specific scripted E2E tests, you might still use traditional tools alongside it.
Troubleshooting Common Issues:
- `GOOGLE_API_KEY` Errors:
  - "ValueError: GOOGLE_API_KEY environment variable is required." This means the key wasn't set correctly when adding the MCP server or is missing from the environment where `vibetest-mcp` runs. Double-check your MCP server configuration in Claude Code or Cursor. (A quick way to sanity-check the key is sketched after this list.)
  - API errors from Google (e.g., 401, 403, 429): Ensure your key is valid, has the Gemini API enabled, and you haven't exceeded quotas.
- Browser Launch Issues: The `browser-use` library is generally robust in managing browser instances. If you see persistent errors related to browser startup, ensure Chrome/Chromium is correctly installed and accessible in the `PATH`, or consider issues with sandboxing if running in highly restricted environments (though `--no-sandbox` is used). The `pkill -f chromium` on macOS after a run is a defensive measure against zombie processes.
- "Test ID not found" from the `results` command:
  - Did the `start` command complete and return a `test_id`?
  - Was there a critical error during `run_pool` that prevented results from being stored? Check the console output of your `vibetest-mcp` server if possible.
  - Remember, results are stored in-memory (`_test_results` in `agents.py`). If the MCP server restarts, previous test results will be lost.
- Long Test Durations: Complex pages, many agents, or slow target websites can lead to longer test times. Monitor agent actions if running non-headless to see if they are getting stuck.
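For the most common of these problems, the API key, a quick sanity check outside of Vibetest-use can save a debugging round-trip. The sketch below assumes the `google-generativeai` package is installed; it simply confirms the key is set in the current environment and that a Gemini call succeeds:

```python
import os

import google.generativeai as genai  # assumes: pip install google-generativeai

def check_gemini_access() -> None:
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise SystemExit("GOOGLE_API_KEY is not set in this environment.")
    genai.configure(api_key=api_key)
    # A tiny request is enough to surface 401/403/429-style problems early.
    reply = genai.GenerativeModel("gemini-1.5-flash").generate_content("ping")
    print("Gemini reachable:", reply.text[:40])

if __name__ == "__main__":
    check_gemini_access()
```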
How Vibetest-use Works Internally (Deeper Dive)
For those curious about the mechanics, let's briefly revisit the key Python files:
`vibetest/mcp_server.py`:

- `FastMCP`: This is the foundation for creating the MCP server. `mcp = FastMCP("vibetest")` initializes your server under the name "vibetest".
- `@mcp.tool()` decorator: This exposes Python functions as callable tools for the AI assistant (sketched below).
  - `async def start(...)`: The entry point for initiating tests. It primarily calls `run_pool` from `agents.py` and returns the `test_id`. Includes basic error handling.
  - `def results(...)`: The entry point for fetching test summaries. It calls `summarize_bug_reports` from `agents.py` and also adds the test duration to the summary.
- `run()` function: The main function to start the `FastMCP` server, typically called when `vibetest-mcp` is executed.
- Logging & Output Suppression: Note the initial lines that disable logging and redirect `stderr`. This is often done in MCP tools to ensure that only clean JSON-RPC communication happens over stdout, preventing interference from debug prints or library logs.
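For a rough feel of that structure, here is a stripped-down sketch using the MCP Python SDK's `FastMCP` class. The tool bodies are stubs; in the real project they call `run_pool` and `summarize_bug_reports` from `agents.py`:

```python
# A stripped-down sketch of the mcp_server.py pattern described above.
# Assumes the MCP Python SDK (`pip install mcp`); not the project's actual code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("vibetest")

@mcp.tool()
async def start(url: str, num_agents: int = 3, headless: bool = False) -> str:
    """Kick off a vibe test and return a test_id (stubbed here)."""
    return "stub-test-id"

@mcp.tool()
def results(test_id: str) -> dict:
    """Return the stored summary for a finished test (stubbed here)."""
    return {"test_id": test_id, "overall_status": "passing"}

def run() -> None:
    mcp.run()  # serves the tools over stdio for Cursor / Claude Code

if __name__ == "__main__":
    run()
```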
`vibetest/agents.py` (The Engine Room):

This is where the core logic of scouting, agent execution, and result summarization resides.

- `_test_results = {}`: A simple in-memory Python dictionary to store results. No external database is used by default.
- `scout_page(base_url)`:
  - Uses a `BrowserSession` (headless) and an `Agent` with `gemini-1.5-flash` to perform the initial reconnaissance of `base_url`.
  - The task given to this scout is to list interactive elements.
  - The scout's output is then fed again to `gemini-1.5-flash` with a different prompt to parse that list and generate 6-8 distinct, actionable QA tasks in JSON format. This two-LLM-call approach (one for scouting, one for task partitioning) is a sophisticated way to derive structured tasks from an unstructured page exploration.
  - Provides robust fallback tasks if any step in this chain fails.
- `run_pool(base_url, num_agents, headless)`:
  - The main orchestrator.
  - Calls `scout_page` to get tasks.
  - Sets up `asyncio.Semaphore(min(num_agents, 10))` to limit true browser concurrency.
  - Uses `asyncio.gather` to run `num_agents` instances of `run_single_agent` concurrently.
  - Handles overall timing and storage of results into `_test_results`.
  - Includes the `pkill -f chromium` for post-test cleanup on macOS.
- `async def run_single_agent(i)` (within `run_pool`):
  - Configures `BrowserProfile` (headless status, arguments like `--no-sandbox`, window size/positioning/zoom for non-headless).
  - Creates and manages a `BrowserSession`.
  - The 0.25x zoom for non-headless mode is applied via JavaScript evaluation directly on the page: `document.body.style.zoom = '0.25';`. This happens on "load" and "domcontentloaded" events.
  - Instantiates the `browser_use.Agent` with its specific task (from the scout phase), `gemini-2.0-flash`, the browser session, and `use_vision=True`.
  - `await agent.run()` is the call that makes the agent perform its web interactions.
  - Captures the `history.final_result()` or any exceptions.
  - Ensures `browser_session.close()` is called.
- `summarize_bug_reports(test_id)`:
  - Retrieves raw agent results from `_test_results`.
  - Constructs a detailed prompt for `gemini-1.5-flash` (a different, potentially more powerful model than the agent's `gemini-2.0-flash` for this analytical task).
  - The prompt strongly guides the LLM on what constitutes an issue, the level of detail required, and the JSON output format (high/medium/low severity).
  - Parses the LLM's JSON response (including a regex fallback to find the JSON block if the LLM adds extra text; sketched after this list).
  - Calculates overall status, emoji, and counts based on the severity analysis.
  - Provides a fallback summary if LLM analysis fails or if no initial bug reports were made by agents.
- `GOOGLE_API_KEY`: Checked at the beginning; operations requiring LLMs will fail or be skipped if not present.
- `get_screen_dimensions()`: Uses `screeninfo` if available, otherwise defaults to 1920x1080, for non-headless window positioning.
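The "regex fallback" idea mentioned for `summarize_bug_reports` can be sketched in a few lines of plain Python; this is an illustrative helper, not the project's exact code:

```python
import json
import re

def extract_json_block(llm_text: str) -> dict:
    """Parse the LLM's reply, falling back to a regex if prose surrounds the JSON."""
    try:
        return json.loads(llm_text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", llm_text, re.DOTALL)  # grab the outermost {...}
        if match:
            return json.loads(match.group(0))
        raise

reply = 'Here is the analysis you asked for:\n{"high_severity": [], "low_severity": []}'
print(extract_json_block(reply))
```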
The use of `asyncio` throughout `agents.py` is critical for efficiently managing multiple browser agents and LLM calls without blocking; the sketch below shows how the pieces named above might fit together for a single agent.
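As a closing illustration, here is a hedged sketch of a single-agent run wired together from the classes and parameters described in this section. The constructor arguments follow the article's description; treat the exact `browser-use` signatures and the `make_gemini_llm()` factory as assumptions to verify against your installed version:

```python
import asyncio

from browser_use import Agent, BrowserProfile, BrowserSession  # classes named above

async def run_one_task(task: str, headless: bool, llm) -> str:
    """Run one QA task along the lines the article describes and return the agent's summary."""
    # Parameter names mirror the article; check them against your browser-use version.
    profile = BrowserProfile(headless=headless, args=["--disable-gpu", "--no-sandbox"])
    session = BrowserSession(browser_profile=profile)
    try:
        agent = Agent(task=task, llm=llm, browser_session=session, use_vision=True)
        history = await agent.run()
        return history.final_result() or "no result reported"
    finally:
        # close() is awaited here; confirm whether your version exposes it as a coroutine.
        await session.close()

# llm = make_gemini_llm("gemini-2.0-flash")  # hypothetical factory for a Gemini chat model
# print(asyncio.run(run_one_task("Test the main navigation links", headless=True, llm=llm)))
```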
Conclusion
Vibetest-use offers a significant leap forward in automating website quality assurance. By combining the intelligent browsing capabilities of multiple AI agents with a smart scouting system and LLM-powered analysis, it moves beyond simple scripted tests to provide a more holistic "vibe check" of your web applications.
It helps you catch functional bugs, broken experiences, and technical glitches with greater efficiency, allowing you to focus on building great features. The detailed, categorized, and severity-rated reports give you actionable starting points for debugging.
So, integrate Vibetest-use into your workflow, run it on your live sites and local development servers, and embrace the power of AI-driven testing. As the motto goes: "Vibecode and vibetest until your website works."
Happy testing!