a Sagan program · currently in private beta

Website QA Agent

Your QA report shows which pages are missing meta tags or tracking scripts, which forms fail to redirect correctly, and which mobile layouts break at 375px, with screenshots and confidence scores for every issue.
before

After a site build is visually complete, the team opens a 50-question Google Form and manually checks each item: Does this page have a meta description? Does the contact form redirect to the thank-you page? Does the mobile header fit at 375px without horizontal scroll? The work takes 2-3 hours per site, catches issues inconsistently, and sites still launch with missing scripts or broken forms.

after

Paste the client URL into the dashboard. The agent crawls the site, runs deterministic checks on every page, submits test forms, captures screenshots at four viewports plus Safari, and assesses visual quality against your best-practices library. In under ten minutes, you have a report structured exactly like your current form, with pass/fail/warning status, one-line reasons, and screenshots of every failure. Your team reviews the report instead of filling out the form.

marketing agencies / web development / local services / dental / roofing / propane / law / tree care / home services / quality assurance / testing automation / post-launch verification
the problem

Every site launch finds the same preventable mistakes.

Web development teams manually check 50+ QA items after each site goes live: meta tags, form submissions, mobile layouts across viewports, Safari rendering.

01
Deterministic checks get skipped

Meta descriptions, Google Analytics, CallRail tracking, and GTM are easy to verify but time-consuming to check on every page.

02
Functional tests require manual interaction

Forms must be submitted, CTAs clicked, redirects verified. Each test takes minutes across a 15-20 page site.

03
Visual QA is subjective and viewport-heavy

Mobile layouts, whitespace, alignment, and Safari rendering need human eyes at 375px, 1366px, 1440px, and 1920px.

the math, if you want to look

Automated QA that matches your checklist.

proof 01
Deterministic checks in seconds

Meta titles, descriptions, Google Analytics, CallRail, GTM, canonical URLs, Open Graph tags, and console errors, verified across every page in one run.

proof 02
Functional tests with Playwright

Forms submit with test data, CTAs redirect to the right pages, and tracking numbers render correctly. Failures include the specific page and element.

proof 03
Subjective visual assessment with confidence scores

Screenshots at four viewports plus Safari rendering are reviewed by a multimodal model that references your Beyond Brand and Quality library. Issues are flagged with severity, a confidence score, and natural-language reasoning.

proof 04
Report mirrors your current form

Same sections, same question order, same workflow. Your team reads results the way they read QA today.

The agent crawls your client site, runs deterministic HTML checks for tags and scripts, submits forms to verify redirects, and uses multimodal AI to assess mobile layouts and visual quality against your internal best-practices library. Results land in a report structured exactly like your current Google Form, with pass/fail/warning status, one-line reasons, and screenshots of every failure.

how it works

Three test engines, one report.

Paste a client URL into the dashboard. The agent crawls the site across a viewport matrix, runs three categories of tests in parallel, and returns a structured report with pass/fail results, screenshots, and confidence-scored observations.

step 01
Crawl and discover pages

The agent discovers all pages on the site, typically 15-20 pages for a Marion client site.

step 02
Run deterministic HTML checks

DOM inspection verifies meta tags, analytics scripts, tracking codes, canonical URLs, and Open Graph tags. Results aggregate across pages so you see site-wide gaps at a glance.
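As a rough illustration of a deterministic check, the sketch below parses one page's rendered HTML with Python's standard-library parser and reports a few of the items named above. The class, function, and check names are hypothetical. Detecting GTM by its loader URL in a script src is an assumption, not a description of the agent's actual rules.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collects the tags a deterministic check might look for."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta name/property -> content
        self.canonical = None   # href of <link rel="canonical">
        self.script_srcs = []   # src of every external <script>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("name") or a.get("property")
            if key:
                self.meta[key.lower()] = a.get("content") or ""
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "script" and a.get("src"):
            self.script_srcs.append(a["src"])


def check_page(html):
    """Return {check_name: passed} for one page's HTML."""
    parser = HeadTagCollector()
    parser.feed(html)
    return {
        "meta_description": bool(parser.meta.get("description", "").strip()),
        "canonical_url": bool(parser.canonical),
        "og_title": bool(parser.meta.get("og:title", "").strip()),
        # Assumption: GTM counts as present when its loader URL is a script src.
        "gtm": any("googletagmanager.com/gtm.js" in s for s in parser.script_srcs),
    }
```

Running `check_page` on every crawled page gives one small pass/fail map per URL, which is the shape a site-wide summary can be built from.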

step 03
Execute functional tests with Playwright

Forms submit with labeled test data, CTAs are clicked, redirects are verified, and tracking numbers are confirmed to render on every page.

step 04
Capture screenshots across viewports

Mobile (375px), small laptop (1366px), medium laptop (1440px), large laptop (1920px), and Safari (webkit) for home and primary CTA pages.

step 05
Run multimodal visual assessment

A state-of-the-art multimodal model reviews each screenshot against your Beyond Brand and Quality library, identifying mobile layout issues, whitespace problems, alignment gaps, and Safari-specific rendering differences. Each observation includes a confidence score.

step 06
Generate your QA report

Results are structured exactly like your current Google Form, with sections, question order, pass/fail/warning status, one-line reasons, and links to failure screenshots.

inputs
Client website URL

Live or staging site URL. The agent crawls all discoverable pages.

Beyond Brand and Quality library

Marion's internal best-practices corpus (Google Doc, Notion page, or similar). The multimodal model references this when assessing visual quality.

Test library configuration

Deterministic checks (meta tags, scripts, canonical URLs), functional checks (form submissions, CTA redirects, tracking numbers), and subjective checks (mobile layout, whitespace, alignment, Safari rendering). Seeded from your current Google Form.

Viewport matrix

Mobile (375px), small laptop (1366px), medium laptop (1440px), large laptop (1920px), and webkit (Safari) for home and primary CTA pages.
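One way to express this matrix as configuration is sketched below. The four widths come from the list above. The labels, the `screenshot_jobs` helper, and the width used for the webkit pass are assumptions, since the page does not say which width Safari is captured at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    label: str
    width: int
    engine: str  # "chromium" or "webkit"


# Widths are taken from the viewport matrix; labels are illustrative.
CHROMIUM_VIEWPORTS = [
    Viewport("mobile", 375, "chromium"),
    Viewport("small-laptop", 1366, "chromium"),
    Viewport("medium-laptop", 1440, "chromium"),
    Viewport("large-laptop", 1920, "chromium"),
]


def screenshot_jobs(pages, critical_pages, webkit_width=1440):
    """List (page, viewport) pairs to capture.

    Every page gets the four Chromium widths. Only critical pages
    (home and primary CTA pages) get the extra webkit pass.
    Assumption: webkit_width defaults to 1440; the source gives no width.
    """
    jobs = [(page, vp) for page in pages for vp in CHROMIUM_VIEWPORTS]
    safari = Viewport("safari", webkit_width, "webkit")
    jobs += [(page, safari) for page in pages if page in critical_pages]
    return jobs
```

Under these assumptions, a three-page site with two critical pages produces 3 x 4 + 2 = 14 screenshot jobs.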

transformation
Site crawl and page discovery

Playwright discovers all pages on the site. Typical Marion sites yield 15-20 pages.
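The discovery logic can be sketched independently of the browser. The example below is a minimal breadth-first crawl over same-host links. It assumes a caller-supplied `fetch(url)` function that returns a page's HTML (in the real agent Playwright would play that role), and every name in it is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(page_url, html):
    """Return the sorted same-host page URLs linked from html."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so each page appears once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return sorted(found)


def discover(start_url, fetch, limit=50):
    """Breadth-first page discovery, capped at limit pages."""
    queue, seen = [start_url], {start_url}
    while queue:
        url = queue.pop(0)
        for link in same_site_links(url, fetch(url)):
            if link not in seen and len(seen) < limit:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

The `limit` cap is a safety assumption for sites with calendar or pagination links. A typical 15-20 page site would finish well under it.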

Deterministic HTML inspection

DOM queries check for meta titles, descriptions, Google Analytics, CallRail, GTM, canonical URLs, Open Graph tags, and console errors. Results aggregate by page and site-wide.

Functional test execution

Playwright submits forms with labeled test data, clicks CTAs and nav items, verifies redirects, and confirms tracking numbers render. Failures capture the specific page and element.

Screenshot capture across viewports

Chromium captures pages at the four viewport widths, and webkit adds Safari rendering for critical pages (home and primary CTA pages).

Multimodal visual assessment

A state-of-the-art multimodal model reviews each screenshot against your Beyond Brand and Quality library, identifying mobile layout issues, whitespace problems, alignment gaps, and Safari-specific rendering differences. Each observation is confidence-scored.

Report assembly and structuring

Results are organized into sections matching your current Google Form, with pass/fail/warning status, one-line reasons, severity levels, and links to failure screenshots.

outputs
QA report matching your Google Form

Same sections, same question order, same workflow. Aggregate summary shows passed/failed/warning counts. Each item includes status, reason, and failure screenshots.
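A report item and its aggregate summary could be modeled roughly as follows. The field names and the `summarize` helper are hypothetical. Only the three statuses and the per-item contents (status, reason, failure screenshots) come from the description above.

```python
from collections import Counter
from dataclasses import dataclass, field

STATUSES = ("pass", "fail", "warning")


@dataclass
class ReportItem:
    section: str    # section heading copied from the existing form
    question: str   # checklist question, kept in form order
    status: str     # one of STATUSES
    reason: str = ""                                  # one-line reason
    screenshots: list = field(default_factory=list)   # paths or links to failure screenshots


def summarize(items):
    """Count items per status, always reporting all three statuses."""
    counts = Counter(item.status for item in items)
    return {status: counts.get(status, 0) for status in STATUSES}
```

Keeping the items in a plain list preserves the form's section and question order, so the summary is the only derived data.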

Per-page deterministic results

Meta tags, analytics scripts, tracking codes, canonical URLs, and Open Graph tags verified on every page. Site-wide aggregation shows which pages are missing which checks.
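The site-wide aggregation amounts to inverting per-page results into a per-check list of failing pages. The sketch below assumes results are stored as `{url: {check_name: passed}}`. That shape and the function name are illustrative, not the agent's actual schema.

```python
def missing_by_check(page_results):
    """Invert {url: {check: passed}} into {check: [urls that failed it]}."""
    gaps = {}
    for url in sorted(page_results):
        for check, passed in page_results[url].items():
            if not passed:
                gaps.setdefault(check, []).append(url)
    return gaps
```

A check that passes on every page does not appear in the result, so an empty dictionary means the site has no deterministic gaps.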

Functional test results with redirects

Form submissions, CTA clicks, and redirect destinations verified. Failures include the specific page URL and element that failed.

Visual assessment observations with confidence scores

Natural-language feedback on mobile layouts, whitespace, alignment, and Safari rendering. Each observation includes severity (none/minor/major) and a confidence score (0-5).
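A visual observation with these two scales could be represented as below. The severity levels and the 0-5 confidence range come from the description above. The field names and the `prioritize` ordering (most severe first, then most confident) are assumptions added for illustration.

```python
from dataclasses import dataclass

SEVERITIES = ("none", "minor", "major")  # ordered from least to most severe


@dataclass(frozen=True)
class Observation:
    page: str
    viewport: str
    note: str          # natural-language feedback
    severity: str      # one of SEVERITIES
    confidence: float  # 0 (low) to 5 (high)

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0 <= self.confidence <= 5:
            raise ValueError(f"confidence out of range: {self.confidence!r}")


def prioritize(observations):
    """Sort observations by severity (major first), then by confidence (high first)."""
    return sorted(
        observations,
        key=lambda o: (-SEVERITIES.index(o.severity), -o.confidence),
    )
```

Validating at construction time means a malformed model response is rejected before it reaches the report.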

Screenshots of failures

Viewport-specific screenshots linked to each failed or warning item so your team can jump straight to the fix.

tech used
  • Playwright browser automation
  • OpenRouter multimodal AI
  • Railway hosting
  • SQLite database
  • Google Workspace SSO
tool alternatives
  • Cypress or Puppeteer instead of Playwright for browser automation
  • Claude or Gemini API instead of OpenRouter for multimodal visual assessment
honest qualification

Is this for you?

built for you if
  • + Web development agencies - Teams that build 5-10 client sites per month and need consistent post-launch QA without hiring a dedicated QA person.
  • + In-house web teams at marketing agencies - Marketing agencies that build and maintain client websites alongside ad and content services. Marion's use case.
  • + Teams using Playwright and modern stacks - Shops already invested in Playwright, Railway, or similar infrastructure. Ryan's existing codebase is the starting point.
  • + Organizations with documented best practices - Teams that have a written or documented internal quality standard (Marion's Beyond Brand and Quality library). The agent references this standard when assessing visual quality.
not for you if
  • - Teams without a documented QA checklist or best-practices library - The agent mirrors your current QA process. If you don't have a checklist or internal quality standard, you'll need to build one first.
  • - Sites requiring authentication to access - v1 targets live or staging sites that can be accessed without login. Sites behind authentication require additional setup (test accounts, session management).
  • - Organizations that want to generate website copy - This agent runs QA checks only. Website body-copy generation is out of scope per Marion's explicit preference.
  • - Teams needing real-time monitoring or scheduled re-runs - v1 is on-demand: paste a URL, run the tests, get a report. Scheduled post-launch monitoring or continuous re-runs are future phases.
pricing

Scoped build plus usage-based runs.

to build

The agent is a custom build tailored to Marion's test library, best-practices corpus, and CMS stacks. Pricing covers the initial build, dashboard, admin UI, and Playwright infrastructure. Per-run costs depend on multimodal AI calls and screenshot storage.

then
  • Initial build includes dashboard, admin UI for test management, and integration with your Beyond Brand and Quality library.
  • Per-run costs scale with site size (page count) and viewport matrix. A typical Marion site (15-20 pages, four viewports) runs at minimal cost.
  • Multimodal AI calls for subjective visual assessment are the primary per-run variable. Deterministic and functional tests use lightweight DOM inspection and Playwright interaction.
  • Screenshot storage on Railway volume is included. Retention policy (keep last N runs per site) can be configured.
  • Form-submission email verification (live mode) requires a dedicated test inbox or webhook endpoint. Setup and maintenance are customer-provided.
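The "keep last N runs per site" retention policy mentioned in the list above can be sketched as a pure function that decides which runs to delete. The tuple layout and the function name are hypothetical, and timestamps are assumed to be ISO 8601 strings so that they sort chronologically as text.

```python
def runs_to_delete(runs, keep_last):
    """Return run ids to delete so each site keeps its newest keep_last runs.

    runs is an iterable of (site, run_id, started_at) tuples, where
    started_at is an ISO 8601 string (assumption).
    """
    by_site = {}
    for site, run_id, started_at in runs:
        by_site.setdefault(site, []).append((started_at, run_id))
    doomed = []
    for site in sorted(by_site):
        newest_first = sorted(by_site[site], reverse=True)
        # Everything past the first keep_last entries is older and can go.
        doomed += [run_id for _, run_id in newest_first[keep_last:]]
    return doomed
```

Returning ids instead of deleting directly keeps the policy easy to test. The caller would then remove the matching screenshots and database rows.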
FAQ
How long does a full QA run take?

The agent crawls your site, runs deterministic HTML checks, submits test forms, captures screenshots across four viewports plus Safari, and assesses visual quality. A typical Marion site (15-20 pages) completes in under ten minutes. Your team reviews the report instead of filling out the manual form.

What happens if a form submission fails or redirects to the wrong page?

The agent captures the specific page URL, the form element that failed, and a screenshot showing what happened. Your team gets a one-line reason (e.g., 'contact form redirects to /thank-you-old instead of /thank-you') and can jump straight to the fix.
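The comparison behind a reason like the one quoted above can be sketched as a small function. It assumes the browser automation has already submitted the form and reports the final URL. The function name and the trailing-slash normalization are illustrative choices.

```python
from urllib.parse import urlparse


def check_redirect(form_name, expected_path, final_url):
    """Compare the landing path with the expected one and return (status, reason)."""
    # Treat /thank-you and /thank-you/ as the same page.
    actual = urlparse(final_url).path.rstrip("/") or "/"
    expected = expected_path.rstrip("/") or "/"
    if actual == expected:
        return ("pass", f"{form_name} redirects to {expected}")
    return ("fail", f"{form_name} redirects to {actual} instead of {expected}")
```

For example, `check_redirect("contact form", "/thank-you", "https://client.example/thank-you-old")` returns a fail status with the reason "contact form redirects to /thank-you-old instead of /thank-you".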

Does the agent check for missing analytics scripts and tracking codes?

Yes. The deterministic test suite verifies Google Analytics, Google Tag Manager, CallRail tracking scripts, and canonical URLs across every page. Results aggregate site-wide so you see which pages are missing which checks at a glance.

Can I add custom QA tests without writing code?

Yes. The admin UI lets you type a new test in plain English and pick the type: deterministic (DOM checks), functional (form/link interaction), or subjective (visual assessment). For deterministic tests, the system generates the DOM query for your review before saving.

What does the report look like?

The report mirrors your current Google Form exactly: same sections, same question order, same workflow. Each item shows pass/fail/warning status, a one-line reason, and links to failure screenshots so your team can jump straight to the fix.

Does the agent assess mobile layout quality?

Yes. The agent captures screenshots at four viewports (mobile at 375px, small laptop at 1366px, medium laptop at 1440px, large laptop at 1920px) plus Safari rendering for critical pages. A multimodal AI model reviews each screenshot against your internal best-practices library and flags mobile layout issues, whitespace problems, alignment gaps, and Safari-specific rendering differences, each with a confidence score.

What if my site requires login to access?

Version 1 targets live or staging sites that can be crawled without authentication. Sites behind login require additional setup (test accounts, session management). Contact us to discuss authentication requirements for your specific site.

Can the agent verify that form submissions actually arrive in my inbox?

In the prototype, form submission verification is stubbed. After approval, you provide a dedicated test inbox address or webhook endpoint, and the agent confirms that test submissions land correctly. This is optional for the initial build but recommended for live mode.

related builds
  • deckclose-21664542
  • bidscout-3dfbd88a
  • machine-hunter-prd-v1-0-3172c3d9
  • rent-roll-parser-agent-prd-v1-0-a6644eb0
next step

Stop launching sites with missing tags and broken forms.

The agent runs your QA checklist in under ten minutes, catching deterministic mistakes, functional failures, and visual issues before your team reviews the report. Paste a URL, get a report structured exactly like your current Google Form.