
The dev team is shipping fast. Their sprint velocity has genuinely jumped.
But the QA queue is three days long. The pipeline takes 45 minutes to run. Every fast PR is sitting in a holding pattern, waiting for a human tester who has a backlog of 23 tickets. The testers are working harder than ever, and they’re still the constraint.
The bottleneck didn’t disappear. It just moved to a different part of the org.
This is the pattern that shows up in every organisation that installs AI coding tools and leaves QA unchanged. The developer transformation is real, but it runs ahead of the testing and pipeline model it depends on. The sprint doesn’t speed up; the queue just builds up somewhere new.
This is Post 3 in the Vibe Thinking series. The posts covered so far are:
- Post 0 - Vibe Thinking - The Full Org Transformation
Why developer-only vibe coding doesn’t change the sprint, and what full transformation actually requires.
- Post 1 - Vibe Thinking - The Developer Who Codes at the Speed of Thought
The developer layer, the discipline required to make fast output safe.
- Post 2 - Vibe Thinking - The PM Who Writes Requirements That an AI Can Actually Use
The PM layer, and how ambiguous requirements become faster garbage in a vibe coding workflow.
This post is about what happens when the code reaches testing.
The Queue Migration
Vibe coding doesn’t eliminate bottlenecks. It relocates them.
Before AI coding tools, the constraint was typically writing code - developers were the limiting factor. The sprint was sized around how long it took to build. When vibe coding works well, that constraint moves. Code output per developer can multiply significantly. PRs arrive faster. More tickets hit “Dev Done” per week than ever before.
But the pipeline doesn’t know that. The testing environment still runs the same regression suite it always did. The QA team still has the same headcount. The CI/CD pipeline still takes the same time to run. The release cadence still assumes the same weekly output that justified it.
The result is predictable: faster code flowing into a pipe built for slower code, and the pipe becomes the constraint. It gets misread as a QA problem or a DevOps problem. The actual issue is transformation completeness: the org changed one layer and left everything downstream unchanged.
The good news: this is the most solvable bottleneck in the sequence. Most of what QA teams do manually today can be automated or shifted earlier in the pipeline - using tooling that didn’t exist five years ago, without reducing quality. Often the quality improves.
The constraint moved. The queue didn’t disappear. It just got a different label.
Why Manual Regression Breaks Under Vibe Coding Volume
Manual regression testing has always been a compromise. Thorough in theory, chronically under-resourced in practice. Most teams run partial regression at best - covering the critical paths and hoping the edge cases don’t surface in production.
That compromise was sustainable when code moved slowly. When a sprint produced twenty changed components, a QA team of three could cover it - barely, but reliably. When vibe coding doubles or triples the output, the same team now faces forty or sixty changed components per sprint. The math doesn’t work.
More code arriving faster with the same human review bandwidth makes the situation worse, regardless of how fast the code ships.
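The arithmetic behind that claim can be sketched in a few lines. This is a rough model, not measured data: it assumes the team of three can clear exactly twenty changed components per sprint (the "barely, but reliably" case above) and that anything not covered carries over. The function name `backlog_after` is hypothetical.

```python
def backlog_after(sprints: int, changed_per_sprint: int, capacity_per_sprint: int) -> int:
    """Return the number of untested components left after a number of sprints.

    Assumption: uncovered components carry over to the next sprint, and the
    team's capacity stays flat while output grows.
    """
    backlog = 0
    for _ in range(sprints):
        backlog = max(0, backlog + changed_per_sprint - capacity_per_sprint)
    return backlog


# Capacity of 20 per sprint is an assumed figure for a three-person QA team.
for output in (20, 40, 60):
    print(output, backlog_after(sprints=4, changed_per_sprint=output, capacity_per_sprint=20))
```

Under these assumptions, doubling the output leaves eighty components untested after four sprints, and tripling it leaves one hundred and sixty. The team is not getting slower; the gap simply compounds.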
There’s a second problem specific to AI-generated code: the patterns are harder to spot manually. AI tends to produce output that is syntactically clean and structurally plausible, which means it reads well in a quick review. The issues tend to be semantic: incorrect assumptions about state, edge cases handled incorrectly, security-relevant behaviours that look fine at a glance. Manual testing that worked well for hand-typed code misses more of what AI generates, for exactly the same effort.
The answer is a different testing model.
Why “QA Sign-Off” Needs to Be Redesigned
The definition of done hasn’t changed in most organisations since they adopted sprints: dev complete, QA sign-off, deploy.
That model was designed around a world where testing is a phase - something that happens after the code is written, before the release. It made sense when code came out slowly enough that QA had time to run the suite, log findings, hand back to dev, and cycle through again.
In a vibe coding org, that sequence breaks in two ways.
First: The cycle time assumption is wrong. If a developer can build a feature in a morning, a two-day QA cycle for that feature is a 4:1 delay ratio. The feature sits finished, waiting. The developer moves to the next thing. By the time QA comes back with findings, the developer has moved context three tasks forward. Context-switching back is expensive.
Second: The ownership assumption is wrong. “QA owns quality” is the default in a sequential model. In a vibe coding world (where AI is producing code the developer hasn’t fully reasoned through), quality has to be everyone’s responsibility from the first line, not one function’s job at the end of the sequence. Pushing quality to the end of the pipeline is how OWASP-class issues reach production without anyone catching them.
QA sign-off isn’t going away, but what it means is changing. The real question is at what point in the flow QA gets embedded, and what “QA” actually means when testing can be automated and AI-generated at every stage.
What Is Shift Left, and Why It Means Something Different with AI
Most QA engineers know the term, and most organisations say they practise it. Few have actually closed the gap between the principle and the pipeline.
Shift left means moving quality activities earlier in the development lifecycle, to the left of a timeline where code moves from requirements through to release. The earlier a defect is found, the cheaper it is to fix. The principle is sound; the challenge has always been execution.
There are three distinct models, and they are not equivalent. To make the differences concrete, the same feature runs through all three: “Add a CSV export button to the filtered report view - users can export up to 10,000 rows of their report data.” What changes is how much of the calendar gets consumed by iteration loops, and how much slips through without anyone catching it.
Model 1: Traditional QA (Test Last)
Requirements → Development → Lead code review ↺ → QA ↺ → Release.

The cost of context-switching back to fix code you wrote weeks ago is real, and this is the model most organisations are still running, regardless of what their job postings say.
reportId is taken from the URL without validating it belongs to the authenticated user (any logged-in user can pull another user's report data by guessing the ID). Happy path works, no tests written. PR opened end of Day 3.
reportId is not reviewed because it was never documented. Requests changes.
reportId in the URL exposes other users' report data. 4 bugs, including an OWASP A01 data exposure issue that has been live in the codebase since Day 3.
- Lead reviews without a spec - relies on experience alone, misses auth scoping because it was never written down
- QA writes every test case manually, from scratch, for every feature
- Every defect triggers a full three-role context switch to resolve
- OWASP A01 data exposure bug existed for 7 days before a human happened to test for it
- No automated test, no security scan, no pipeline gate - only QA thoroughness under pressure
- At AI-assisted development output volumes, manual QA as the only safety net does not hold
Model 2: Traditional shift left (TDD, BDD, CI, pre-AI)
Requirements + AC → QA drafts test cases from AC + Dev builds with unit tests in parallel → CI on every commit → Lead code review ↺ → QA executes pre-drafted cases ↺ → Release.

Better than Model 1 - test cases are ready when the PR lands, and CI catches regressions early. But defects QA finds still trigger the same context-switch loop. Two dependencies also remain: developers writing unit tests manually under sprint pressure, and the AC being complete enough for QA to test against.
In practice, coverage is the first thing to get cut - and gaps in the AC become gaps in the test suite.
reportId not included; assumed obvious, never written.
// TODO and skipped. Auth scoping has no test, because it wasn't in the AC. PR opened with partial coverage.
// TODO on the 10k warning test. Comment: "Please add this before ship." Auth scoping gap doesn't surface; no criterion to check against. Does not block the PR.
- QA drafts test cases from the AC on Day 2 - execution is faster once the PR lands, not starting from scratch
- Lead manually verifies whether unit tests cover the AC cases - a check the pipeline should enforce, not a reviewer
- QA exploratory remains the safety net for what the AC didn't specify
- Both roles doing more useful work than Model 1, but still compensating for gaps the process hasn't closed
- Auth scoping vulnerability shipped - not in the AC, no automated scan looked for it
- "Tests for what was documented" is not the same as "coverage of what matters"
- The gap between what was specified and what was assumed is precisely where security issues live
Model 3: Shift left with AI (the model this post is about)
AI drafts brief + AC (PM reviews) → AI generates test cases + Gherkin specs (QA reviews, commits specs to repo) + AI generates code + unit tests (Dev reviews) in parallel → AI code review tool flags issues + suggests fixes (Dev applies/modifies) ↺ → Quality gates green → Lead verifies gate summary + Architectural sign-off ↺ → QA wires E2E automation from Gherkin specs + merged implementation → Automated E2E + QA exploratory ↺ → release.

When AI enters the pipeline, the testing model can’t stay the same. Output volume changes the math: AI generates a full feature in the time it takes a developer to write three test cases. Teams that add AI coding without updating the test model hit a coverage cliff - output accelerates but coverage stays manual. This model resolves that by having AI generate code and unit tests together, both automated, neither optional.
The risk profile shifts too. AI-generated code introduces predictable, pattern-specific vulnerabilities that functional testing alone doesn’t catch. Shift left with AI requires security scanning embedded in the pipeline, not just functional coverage.
And the constraint that made shift left hard in a manual world (writing tests is time-consuming and gets deprioritised) disappears when test generation is automated. Shift left with AI becomes a default state enforced by the pipeline, not a discipline imposed on developers.
(1) reportId URL param not validated against the session user's owned reports (OWASP A01) - tool presents a suggested fix with apply/modify options. (2) Export response should use chunked streaming for datasets over 5k rows - tool presents a suggested refactor. Dev selects apply on the auth fix, modifies the streaming threshold to 8k rows. Commits. Pipeline re-runs - all quality gates pass. PR opened end of Day 1.
- PM - business completeness judgment, not AC formatting
- QA - reviews AI-generated test cases, Gherkin specs, and automation scripts for coverage and correctness, not authoring them
- Dev - reviews AI-generated code and applies or modifies AI-suggested fixes - correctness and intent, not syntax
- Lead - reviews only after quality gates pass - architecture and technical strategy, not bug-hunting
- Auth scoping caught by the AI code review tool on the first commit - before the PR was opened
- Unit test coverage enforced by the pipeline, not developer discipline under pressure
- E2E automation suite catches regressions on every future deploy
- The 3-second performance baseline is a living gate in the pipeline, not a judgment call made under pressure
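To make "quality gates green" concrete, here is a minimal sketch of a gate check a pipeline step could run before the Lead is asked to review. The 3-second performance baseline comes from the example above. The 80% coverage floor, the field names, and the `failed_gates` function are assumptions for illustration, not the configuration of any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResults:
    """Numbers a pipeline run would collect for one PR (hypothetical shape)."""
    unit_tests_passed: bool
    line_coverage: float             # fraction between 0.0 and 1.0
    critical_security_findings: int
    p95_export_seconds: float


def failed_gates(
    results: GateResults,
    min_coverage: float = 0.80,      # assumed threshold
    max_p95_seconds: float = 3.0,    # the 3-second baseline from the example
) -> list[str]:
    """Return the names of the gates that failed; an empty list means green."""
    failures = []
    if not results.unit_tests_passed:
        failures.append("unit tests")
    if results.line_coverage < min_coverage:
        failures.append("coverage")
    if results.critical_security_findings > 0:
        failures.append("security")
    if results.p95_export_seconds > max_p95_seconds:
        failures.append("performance")
    return failures
```

The point of the design is that the check returns a list of named failures rather than a single pass/fail flag, so the gate summary the Lead verifies says which dimension blocked the PR.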
Across all three models, the developer effort is the same. What differs is how much of that effort gets multiplied into calendar days by manual process, and how much of what matters gets missed by the people reviewing and testing manually. The sections below cover the tools that make Model 3’s pipeline possible.
What to Let Go Of
| Let Go Of | Replace With |
|---|---|
| Manual regression as the primary safety net | AI-generated test coverage, automated on every commit |
| Testing that only starts once the code is already written | Shift-left quality thinking into the brief stage |
| QA as a pipeline phase at the end | QA embedded in every increment, not just at the gate |
| CI/CD pipelines sized for a weekly release cadence | Pipelines built for continuous delivery |
| Test writing as a separate, manually scheduled activity | Unit tests auto-generated and auto-run on every push |
| Security reviews as a quarterly audit phase | Security scanning embedded in every mini-feature pipeline |
| Quality ownership sitting with QA alone | Quality as every function’s responsibility from day one |
| Lead review as the first layer of bug detection | Lead review reserved for architecture, only after quality gates pass |
The Security Dimension
The security implications of vibe coding at scale aren’t getting enough attention, and QA is the function best positioned to own the response.
When nearly half of AI-generated code arrives with known security vulnerabilities, the question is where in the pipeline those vulnerabilities are being caught.
In most organisations today, the honest answer sits somewhere between code review (occasionally) and production (often). Neither is the right place.
Take a common pattern. An AI coding agent, given the prompt “Add an endpoint that returns user order history”, will generate a working endpoint. It will also, in a significant proportion of cases, do one or more of the following: fail to scope the returned data to the authenticated user (returning other users’ orders if the user ID is passed directly), skip input sanitisation on the order ID parameter, or expose fields that contain PII beyond what the use case requires. The endpoint works and passes a happy-path test. It ships.
This is the default output pattern of a model working from a prompt without security constraints baked into the brief.
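The difference between the two versions is small in code and large in consequence. The sketch below uses an in-memory list in place of a database. The `Order` fields, the sample data, and both function names are hypothetical and chosen only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    user_id: int
    total_cents: int
    card_last4: str  # sensitive field an order-history view does not need


# Stand-in for a database table (illustrative data only).
ORDERS = [
    Order(1, 101, 2500, "4242"),
    Order(2, 101, 1200, "4242"),
    Order(3, 202, 9900, "1111"),
]


def order_history_unscoped(requested_user_id: int) -> list[Order]:
    """Risky pattern: trusts a caller-supplied user ID and returns every field."""
    return [o for o in ORDERS if o.user_id == requested_user_id]


def order_history_scoped(session_user_id: int) -> list[dict]:
    """Safer pattern: the user ID comes from the authenticated session,
    and only the fields the use case needs are returned."""
    return [
        {"order_id": o.order_id, "total_cents": o.total_cents}
        for o in ORDERS
        if o.user_id == session_user_id
    ]
```

Both functions pass a happy-path test for user 101. Only a test that asks for someone else's data, or inspects the returned fields, tells them apart.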
When accountability is spread across three parties with no explicit owner, the vulnerability travels the full distance to production.
The fragmented ownership problem makes this worse. When the prompt author, the AI, and the reviewer are separate parties (or when the reviewer is doing a light pass), nobody truly owns the security posture of what shipped. Diffuse accountability is how OWASP vulnerabilities stay in production for months.
Shadow IT extends the problem further. As vibe coding lowers the technical barrier to building, non-engineering functions (operations teams, marketing, analytics) will start shipping their own tools and automations, often outside any security governance model. QA and AppSec teams need to extend their remit to account for this, before a self-built internal tool becomes the attack surface for a production system it touches.
What QA’s new security remit includes:
- Security test coverage as a standard pipeline component, not a separate audit phase
- Automated scanning for OWASP Top 10 patterns on every PR - not post-release
- A governance policy for AI-built tools created outside the core engineering team
- Explicit ownership assigned for every AI-generated component that handles sensitive data
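The last item on that list is the easiest to enforce mechanically. As a sketch, assuming the team keeps some kind of component registry (the `Component` fields and the function name here are hypothetical), a pipeline step could refuse to pass while any AI-generated component that handles sensitive data has no named owner:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One entry in an assumed component registry."""
    name: str
    ai_generated: bool
    handles_sensitive_data: bool
    owner: Optional[str] = None


def unowned_sensitive_components(components: list[Component]) -> list[str]:
    """Return names of AI-generated, sensitive-data components with no owner."""
    return [
        c.name
        for c in components
        if c.ai_generated and c.handles_sensitive_data and not c.owner
    ]
```

An empty result means every component in scope has an accountable person attached. Anything else is a list of names to assign before the next release.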
TESTR - Automated Unit Test Coverage at Every Commit

Without TESTR
- Unit tests are written manually - or deferred entirely
- Coverage is partial; edge cases and security paths are skipped
- New code builds on top of an untested foundation each sprint
- No visibility into which functions have coverage and which don’t - gaps are invisible until they surface
- A regression surfaces in production three sprints later - tracing back to an edge case in the export function no one wrote a test for
With TESTR
- On the first commit, TESTR reads every function signature and code path
- Tests are generated automatically - 10k row threshold, Free tier restriction, empty dataset state, auth scoping
- Tests run immediately; PR is clean if they pass, blocked if they fail
- Developer sees failures before the PR is opened, not after QA picks it up
- Coverage stays current on every commit, automatically - no scheduling needed
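For readers who want to picture the four cases named above, here is a hand-written illustration of what such tests look like for the CSV export feature. It is not TESTR output. The `export_csv` function, its parameters, and the rule that Free tier accounts cannot export are all assumptions made for the example.

```python
import csv
import io

MAX_EXPORT_ROWS = 10_000  # row limit from the CSV export feature brief


class ExportError(Exception):
    """Raised when an export request is not allowed."""


def export_csv(rows, *, owner_id, session_user_id, tier):
    """Hypothetical export function used only to give the tests a target."""
    if owner_id != session_user_id:
        raise ExportError("report does not belong to the session user")
    if tier == "free":  # assumed meaning of the Free tier restriction
        raise ExportError("export is not available on the Free tier")
    if len(rows) > MAX_EXPORT_ROWS:
        raise ExportError("export exceeds the row limit")
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def rejected(**kwargs) -> bool:
    """Return True if export_csv refuses the request."""
    try:
        export_csv(**kwargs)
    except ExportError:
        return True
    return False


def test_row_threshold():
    base = dict(owner_id=1, session_user_id=1, tier="pro")
    assert not rejected(rows=[(i,) for i in range(MAX_EXPORT_ROWS)], **base)
    assert rejected(rows=[(i,) for i in range(MAX_EXPORT_ROWS + 1)], **base)


def test_free_tier_restriction():
    assert rejected(rows=[(1,)], owner_id=1, session_user_id=1, tier="free")


def test_empty_dataset():
    assert export_csv([], owner_id=1, session_user_id=1, tier="pro") == ""


def test_auth_scoping():
    assert rejected(rows=[(1,)], owner_id=1, session_user_id=2, tier="pro")
```

None of these four tests is hard to write. The problem the section describes is that under sprint pressure they are the first to be skipped, and the auth scoping case is the one nobody thinks to ask for.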
PASSR - Automated Engineering Review Across Every Commit

Without PASSR
- The Lead reviews the PR - code works, happy-path tests pass, nothing looks obviously wrong
- No automated check runs across performance, availability, security, or scalability dimensions
- QA runs functional tests; the right data comes back. It ships.
- Two weeks later: a security researcher finds the endpoint returns data scoped to a URL parameter, not the authenticated session - classic IDOR, OWASP A01
- It passed code review. It passed QA. Nobody was looking across all the right dimensions.
With PASSR
- PR lands - PASSR runs automatically across all 8 review dimensions: Performance, Availability, Security, Scalability, Correctness, Architecture, Code Quality, Testing
- Complete resolution package delivered: issue description, impact rating, and ready-to-apply fix inline
- Developer applies the fix in the PR - QA never sees the unpatched version; critical issues block merge automatically
- PASSR portal logs the finding, fix, and outcome - full lifecycle visibility across every repo and every PR
The Pipeline Has to Catch Up Too
Test coverage and security scanning address what’s in the code. The pipeline itself is a separate constraint.
A CI/CD configuration designed for weekly releases creates artificial latency that compounds across every fast PR. If the pipeline runs for 45 minutes and a team is merging six PRs per day, that’s four and a half hours of pipeline time per day - which means PRs are waiting in queue, not running in parallel, and the feedback loop between code and validated deployment is measured in hours rather than minutes.
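The numbers above can be checked with a small model. This sketch assumes a single pipeline runner that handles one PR at a time. The arrival times in the usage note are invented for illustration, and both function names are hypothetical.

```python
def total_pipeline_hours(prs_per_day: int, minutes_per_run: float) -> float:
    """Total runner time consumed per day if every PR triggers one full run."""
    return prs_per_day * minutes_per_run / 60


def serial_wait_minutes(arrival_minutes: list[float], run_minutes: float) -> list[float]:
    """Minutes each PR waits before its run starts on a single serial runner.

    Arrival times are minutes since the start of the working day.
    """
    waits = []
    runner_free_at = 0.0
    for arrived in sorted(arrival_minutes):
        start = max(arrived, runner_free_at)
        waits.append(start - arrived)
        runner_free_at = start + run_minutes
    return waits
```

`total_pipeline_hours(6, 45)` gives the four and a half hours quoted above. Feeding `serial_wait_minutes` six arrivals at minutes 0, 10, 20, 120, 125 and 300 with a 45-minute run shows waits of 0, 35, 70, 15, 55 and 0 minutes: PRs that land close together queue behind each other even though the runner is idle for much of the day.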
Pipeline evolution in a vibe coding org works along a few structural dimensions.
Parallelisation. Tests that used to run sequentially can run in parallel. A suite that takes 45 minutes sequentially can often run in under 10 when the test jobs are properly split. Most teams have never parallelised their pipelines because the weekly release cadence didn’t make the latency painful enough to fix. Vibe coding makes it painful enough.
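One common way to split a suite is to assign the longest jobs first, always to the least-loaded shard. The sketch below is a generic greedy balancer, not the behaviour of any particular CI product. The function names and the suite durations in the usage note are assumptions.

```python
import heapq


def split_into_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedily assign test jobs to shards, longest job first.

    `durations` maps a job name to its historical run time in minutes.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Min-heap of (current load in minutes, shard index).
    loads = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(loads)
    longest_first = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, minutes in longest_first:
        load, index = heapq.heappop(loads)
        shards[index].append(name)
        heapq.heappush(loads, (load + minutes, index))
    return shards


def wall_clock_minutes(durations: dict[str, float], shards: list[list[str]]) -> float:
    """Pipeline time when shards run in parallel: the slowest shard decides."""
    return max((sum(durations[name] for name in shard) for shard in shards), default=0.0)
```

As a usage example, fifteen jobs of three minutes each add up to the 45-minute sequential run. Split across five shards, each shard gets three jobs and the wall-clock time drops to nine minutes. Real suites are less evenly sized, which is why the balancer works from measured durations rather than job counts.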
Incremental testing. Running the full suite on every commit is expensive. Running only the tests relevant to the changed code paths - with a full suite scheduled less frequently - dramatically reduces per-commit pipeline time without reducing coverage.
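A minimal version of test selection is a lookup from changed files to the tests that exercise them. The mapping itself has to come from somewhere (coverage data or a build graph). Here it is simply assumed to exist, and the function name and file paths are hypothetical.

```python
def select_tests(
    changed_files: set[str],
    test_map: dict[str, set[str]],
    full_suite: set[str],
) -> set[str]:
    """Pick the tests to run for a commit.

    `test_map` maps a source file to the tests known to exercise it.
    If any changed file is missing from the map, its impact is unknown,
    so the function falls back to the full suite rather than guessing.
    """
    selected: set[str] = set()
    for path in changed_files:
        if path not in test_map:
            return set(full_suite)
        selected |= test_map[path]
    return selected
```

The fallback is the design choice that matters: an incomplete mapping should cost pipeline time, not coverage. The scheduled full-suite run then catches anything the mapping got wrong.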
Environment parity. Builds that pass in one environment and fail in another are a sign that the environments have drifted. Containerised environments with infrastructure-as-code eliminate the most common source of “works on my machine” pipeline failures - which are even more common when the code was generated by AI rather than hand-typed by a developer who knows the environment.
The pipeline is infrastructure, and it needs to be treated as a first-class engineering concern rather than an operational afterthought. In a vibe coding org, a slow pipeline causes as much friction as a slow developer.
In a vibe coding org, parallelisation, incremental testing, and environment parity are the baseline pipeline configuration, not the optional extras.
Working with Flytebit
At FLYTEBIT TECHNOLOGIES, Vibe Coding Transformation is a structured engagement.
QA processes and pipeline health are part of every transformation feasibility study we run. In most engagements, the QA layer is where we find the biggest gap between what teams believe is happening and what the pipeline metrics actually show. Teams describe their QA process as “solid,” and then the pipeline data shows a three-day average from PR open to QA sign-off, a 60% manual regression rate, and security scanning that runs quarterly rather than on every commit.
We map the current testing model, identify where test coverage falls below the risk level of the code, and establish what an automated-first pipeline looks like for that team’s specific codebase and deployment model. TESTR and PASSR are part of that picture, as is the DevOps configuration that feeds them.
Not sure where your organisation stands today? The Vibe Coding Transformation Readiness Quick Check takes five minutes and gives you a per-function view of where your pipeline is most exposed.
If your team is shipping faster with vibe coding and the QA queue is growing to match, that’s the conversation to start.
Related Reading
The series intro:
👉 Vibe Thinking - The Full Org Transformation
Why developer-only vibe coding doesn’t change the sprint - and what full transformation actually requires across every function.
The PM layer:
👉 Vibe Thinking - The PM Who Writes Requirements That an AI Can Actually Use
Precision requirements reduce the surface area for AI to improvise - and that includes the security surface area that QA has to catch.
On AI rollout failure patterns:
👉 The 5 Biggest AI Implementation Mistakes - and How to Avoid Them
The patterns that cause AI rollouts to underdeliver - including the QA and pipeline assumptions that get left unchanged.
Key takeaways
- ✅ Vibe coding relocates the bottleneck, it doesn't remove it: If QA and DevOps are unchanged, the pipeline becomes the new constraint almost immediately after the developer transformation takes hold.
- ✅ Manual regression doesn't scale with AI-generated output volume: The testing model has to change at the same time as the development model - not after the queue builds up.
- ✅ Shift-left quality means acceptance signals defined in the brief: Testing at the end of the sequence is the most expensive time to do it. Quality has to enter the workflow at the requirements stage, not the QA stage.
- ✅ 45% of AI-generated code fails security tests (Veracode Spring 2026): This is a pipeline governance problem, not just a developer problem. QA owns the catch point - and most teams don't have one in the right place.
- ✅ Fragmented ownership is how vulnerabilities stay in production: When the prompt author, AI, and reviewer are separate parties without explicit accountability, nobody truly owns the security posture of what shipped.
- ✅ Shadow IT risk grows with vibe coding adoption: Non-engineering teams will build their own tools. AppSec governance needs to extend to everything that touches production systems - not just what engineering ships.
- ✅ TESTR keeps unit test coverage current as output volume increases: Auto-generated, auto-run on every commit - coverage scales with the team instead of lagging behind it, without anyone having to schedule it.
- ✅ PASSR catches Performance, Availability, Security, and Scalability issues on every PR: Before human review, with description, impact, and a ready fix. The PASSR portal makes quality trends visible across all repos over time.
- ✅ Pipeline configuration is a first-class engineering concern: Parallelisation, incremental testing, and environment parity aren't optional refinements. In a vibe coding org, a slow pipeline is as much a bottleneck as a slow developer.
