Case Study · 2022–2025

The AI Workflow That Cut Review Production from 90 Minutes to 6

How rethinking the research layer — not just adding AI tools — freed up 4,200 hours of labor annually and enabled 6× software brand growth without proportional headcount increases.

Role: Director of Content, Technology
Company: Black & White Zebra
Scale: 3,000 reviews / year
Timeline: 2022–2025

The Number That Matters

Every B2B software review used to take up to 90 minutes to produce. After redesigning the workflow around an AI research and drafting agent, that same review takes 6 minutes — without trading accuracy for speed.

At 3,000 reviews per year, that's 4,200 hours of labor recovered annually (84 minutes saved × 3,000 reviews ÷ 60). At a fully loaded company cost of ~$50/hour for a content manager (an $80K average base salary plus benefits and overhead), that's approximately $210,000 in annual labor savings from a single workflow change.
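The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch using only the numbers stated above; the variable names are illustrative, and the 2,080-hour working year (40 hours × 52 weeks) used for the FTE comparison is an assumption, not a figure from the case study.

```python
# Figures stated in the case study
minutes_before = 90        # per review, manual workflow ("up to 90 minutes")
minutes_after = 6          # per review, AI-augmented workflow
reviews_per_year = 3_000
hourly_cost = 50           # USD, fully loaded (approximate)

# Assumption for the FTE comparison: 40 h/week x 52 weeks
fte_hours_per_year = 2_080

minutes_saved_per_review = minutes_before - minutes_after
hours_recovered = minutes_saved_per_review * reviews_per_year / 60
annual_savings = hours_recovered * hourly_cost
fte_equivalent = hours_recovered / fte_hours_per_year

print(minutes_saved_per_review)   # 84
print(hours_recovered)            # 4200.0
print(annual_savings)             # 210000.0
print(round(fte_equivalent, 2))   # 2.02
```

Because 90 minutes is described as an upper bound, the result is best read as a ceiling on the savings rather than an exact figure.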

But the more telling number came later: software brand applications went from fewer than 100 per month in early 2024 to 600–800 per month by early 2026. A 6–8× increase in demand — absorbed without a proportional increase in headcount — because the production system had been rebuilt to scale.

Fig. 01 · The Results · Black & White Zebra · 2022–2025
Per review: 90 → 6 min. Production time per B2B software review, before and after redesign.
Recovered annually: 4,200 hrs. Across 3,000 reviews / year, equivalent to ~2 FTEs returned to the team.
Annual labor savings: $210K. At ~$50/hr fully loaded ($80K base, benefits + overhead).

What a B2B Software Review Actually Is

Before the workflow change makes sense, it helps to understand what this content type actually demands — because “software review” undersells it.

Each review had to accomplish three things simultaneously: build genuine trust with a software buyer who is actively evaluating options, accurately represent the platform's features, pricing, and positioning, and hold its own within a list of 10 or more alternatives covering the same use case. A buyer reading a “Best Project Management Software for Small Teams” list needs every tool reviewed against that specific context — not as a generic product, but as a solution to their particular problem.

That meant every review required:

The research phase alone — the effort to find, reconcile, and synthesize information from genuinely inconsistent sources — took anywhere from 25 to 60 minutes per review depending on how well the software brand had documented their own product. The drafting and verification added another 20–30 minutes. At the volumes we were running, and with revenue directly tied to how many live listings were on the page, this was a production problem with a direct business cost.

The Before: Where the Time Was Going

When I mapped the workflow, three stages were consuming the most time:

Research

The core bottleneck. Editors were doing open-ended information hunts — checking the software brand website, cross-referencing the help center, searching review sites, pulling YouTube screenshots, and manually consolidating everything into a working document before they could write a single word. Time varied unpredictably based on how much the brand had made publicly available.

Verification and QA

Even after a draft existed, editors were re-researching to confirm pricing, validate integrations, and check that the USP they'd chosen wasn't already in use elsewhere on the live list. These checks required human judgment but were eating time in proportion to the volume, not the complexity.

Screenshot sourcing and formatting

Every review needed a compliant product screenshot — correct dimensions, not from a competitor's site, not a marketing render, captioned accurately. Trivial at 10 reviews. At 3,000, it's a recurring logistical drain.

The human cost wasn't just time. The variability in research sources meant quality was inconsistent across the team — not because editors weren't skilled, but because there was no standardized input feeding their work. The team was hitting a throughput ceiling. Not from lack of effort, but from a workflow designed for a different volume.

Fig. 02 · Workflow comparison
Where the 90 minutes went, and where it didn't go after.

Before: manual research-first workflow (~90 min per review)

  01. Identify software brand & category context (3–5 min)
  02. Research: website, help center, YouTube, Reddit, review sites (25–60 min)
  03. Consolidate research into working doc (10–15 min)
  04. Draft full review from scratch (20–30 min)
  05. Verify pricing, integrations, USP (manual lookups)
  06. Source, resize, caption screenshot (5–8 min)
  07. Publish (1 min)

After: AI-augmented research & draft (~6 min per review)

  01. AI agent: website extraction + G2 / Capterra / Forbes queries + Reddit sentiment (automated)
  02. AI agent: structured first draft (intro, Why I Picked, features, integrations, USP ×5, pros/cons) (automated)
  03. Editor: validate accuracy & apply USP judgment (~5 min)
  04. Editor: polish copy & editorial voice (included)
  05. Editor: QA against live page (~1 min)
  06. Screenshot pipeline, standardized (batched)
  07. Publish (< 1 min)

The Diagnosis: What Actually Needed to Change

The fix wasn't obvious at first, because the obvious solution — hire more editors — would have replicated the problem at scale. The bottleneck wasn't people. It was the research input layer.

Every review followed a predictable structure. The sections were fixed. The set of sources to check was the same each time. The information being gathered was the same type of information for every tool, every time. The only variable was how easy or difficult a particular software brand had made that information to find. That variability made the time per review unpredictable, and the manual effort to navigate it was the reason the clock read 90 minutes.

What the workflow actually needed was a system that could absorb the research phase entirely: extract all available information from the brand's website, pull user sentiment from G2, Capterra, Forbes, and Reddit, and synthesize it into a structured baseline document an editor could validate and write from — rather than starting from nothing.

My role wasn't building the system. The tech team at Black & White Zebra built a custom AI agent — pulling from multiple LLMs depending on the nature of the research task — that became the engine of the new workflow. My role was defining exactly what that system needed to produce, what quality standard its output had to meet to be useful, and where humans needed to stay in the loop — and building the editorial infrastructure around it that made the output reliable at scale.

The Redesign: Where AI Stopped and Humans Took Over

The principle behind the redesign was straightforward: AI handles research collection and initial content generation. Humans handle information validation and final editorial judgment.

In practice, that meant defining the system's outputs in enough detail that editors didn't have to reinvent the wheel on every review, and that the AI agent's output met a consistent enough standard that the polish pass could genuinely happen in minutes.

What the AI agent produces

  • Full research report: website, help center, G2, Capterra, Forbes, Reddit
  • Structured first draft: intro, “Why I Picked,” features, integrations, pros/cons
  • Five USP options ranked by fit
  • “Tool Fit” assessment flagging listicle relevance
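One way to picture that output is as a single structured record with a completeness check the editor sees first. The sketch below is hypothetical: the case study does not describe the agent's actual data format, so the class name, field names, and key names are all assumptions derived from the bullets above.

```python
from dataclasses import dataclass, field

# Assumed key names, derived from the bullets above
REQUIRED_SOURCES = ("website", "help_center", "g2", "capterra", "forbes", "reddit")
REQUIRED_SECTIONS = ("intro", "why_i_picked", "features", "integrations", "pros_cons")
USP_OPTION_COUNT = 5


@dataclass
class AgentOutput:
    """Hypothetical container for what the agent hands to an editor."""
    tool_name: str
    research_report: dict = field(default_factory=dict)  # source -> notes
    draft_sections: dict = field(default_factory=dict)   # section -> copy
    usp_options: list = field(default_factory=list)      # best fit first
    fits_listicle: bool = True                           # "Tool Fit" flag

    def gaps(self):
        """List whatever is missing so the editor sees it up front."""
        missing = [f"source:{s}" for s in REQUIRED_SOURCES
                   if not self.research_report.get(s)]
        missing += [f"section:{s}" for s in REQUIRED_SECTIONS
                    if not self.draft_sections.get(s)]
        if len(self.usp_options) != USP_OPTION_COUNT:
            missing.append("usp_options")
        return missing


output = AgentOutput(
    tool_name="ExampleTool",  # placeholder, not a real product
    research_report={s: "notes" for s in REQUIRED_SOURCES if s != "reddit"},
    draft_sections={s: "draft copy" for s in REQUIRED_SECTIONS},
    usp_options=["usp 1", "usp 2", "usp 3", "usp 4", "usp 5"],
)
print(output.gaps())  # ['source:reddit']
```

Surfacing gaps explicitly, rather than leaving empty fields for the editor to notice, is one way to keep a short validation pass from missing a source that returned nothing.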

What editors own

  • Verifying AI output accuracy against the research baseline
  • USP selection — strongest option, not already taken
  • Editorial voice and copy polish
  • Final QA against the live page and publishing
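The USP rule ("strongest option, not already taken") can be written down as a small function. In the workflow described here this remains an editor's judgment call, so the sketch below only illustrates the rule; the function name, the example USP strings, and the exact-match comparison are assumptions.

```python
from typing import Iterable, Optional, Sequence


def pick_usp(ranked_options: Sequence[str],
             taken_on_page: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked USP not already used on the live list.

    ranked_options is ordered best fit first. Matching is a simple
    case-insensitive comparison, which is a simplification: two USPs
    can overlap in meaning without matching as strings.
    """
    taken = {t.strip().lower() for t in taken_on_page}
    for option in ranked_options:
        if option.strip().lower() not in taken:
            return option
    return None  # every option is taken; the editor writes a new one


# Hypothetical example values
options = [
    "Best for Gantt charts",
    "Best for client portals",
    "Best for time tracking",
]
live_page = {"best for gantt charts"}

print(pick_usp(options, live_page))  # Best for client portals
```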

The quality standard — what “good enough to publish” actually looked like — was built from real human examples. Editors who had been doing this work the hard way contributed the style guide and the library of good and bad examples that calibrated the AI output and made the validation pass fast rather than comprehensive.

The result was a shift in what human time was spent on: not gathering information, but applying judgment to information that had already been gathered.

Fig. 04 · Where AI stops, where humans take over
AI handles collection & generation. Humans handle validation & judgment.

AI owns (information at scale) → handoff → Humans own (judgment & polish)

  01. Website + review-site extraction → Accuracy validation against source
  02. Sentiment synthesis from Reddit & forums → USP judgment & differentiation
  03. Structured first-draft generation → Editorial voice & polish
  04. Five ranked USP options → Final QA + publish

The boundary isn't fixed. It moved as the system improved and editors built confidence in the output.
Calibrated against editor style guide

The Results

Production time: 90 minutes → 6 minutes per review. That's an 84-minute reduction per review. Across 3,000 annual reviews, the math produces 4,200 hours recovered per year — the equivalent of roughly two full-time employees' annual working hours returned to the business.

At a fully loaded company cost of approximately $50/hour (based on an $80K average content manager salary inclusive of benefits, payroll taxes, and overhead), that's $210,000 in recovered labor annually.

Vendor volume: fewer than 100 applications/month → 600–800/month. Between early 2024 and early 2026, software brand demand for listings increased 6–8×. The team absorbed that growth without proportional headcount expansion because the production system had been rebuilt to scale. More live listings meant more reader clicks — and revenue at Black & White Zebra grew 50% year-over-year for three consecutive years, reaching $20M+ by 2025.

Fig. 03 · Demand absorbed without proportional headcount
Software brand applications per month. Growth multiple: 6–8×.
[Chart: y-axis 0–800 applications per month; x-axis Jan 2024 to Oct 2025. Early 2024: ~80 / mo. Early 2026: 600–800 / mo.]
The slope is the story. The production system was rebuilt to scale before the demand curve started bending.
Source: internal application ledger

What Made It Work — and What Most Teams Get Wrong

The thing that made this work wasn't the AI tool. It was the research architecture underneath it.

The system succeeded because of a specific design decision: building a research component that ran a full extraction of the software brand's website, structured web queries against G2, Capterra, and Forbes, and a user sentiment pass over Reddit, then synthesized all of that into a consistent baseline document that became the single source of truth for both the AI draft and the editor's validation. That moved the editor's job from “find information in 10 different places” to “confirm this information is accurate.” That's why 6 minutes is achievable. The wild goose chase was eliminated, not accelerated.
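To make the "single source of truth" idea concrete, here is a minimal sketch of merging per-source findings into one baseline record that keeps provenance and flags disagreements for the editor. It is an illustration under assumptions, not the agent the tech team built: the function name, field names, and all example values are invented.

```python
from collections import defaultdict


def build_baseline(findings):
    """Merge per-source findings into one record per field.

    findings maps source name -> {field name -> value found there}.
    Each baseline field keeps every source's value and is flagged
    for review when the sources disagree.
    """
    by_field = defaultdict(dict)
    for source, fields in findings.items():
        for field_name, value in fields.items():
            by_field[field_name][source] = value

    baseline = {}
    for field_name, values in by_field.items():
        baseline[field_name] = {
            "values": values,                                # provenance per source
            "needs_review": len(set(values.values())) > 1,   # sources disagree
        }
    return baseline


# Invented example values, for illustration only
findings = {
    "website": {"starting_price": "$10/user/month", "free_plan": "yes"},
    "g2": {"starting_price": "$12/user/month"},
    "capterra": {"free_plan": "yes"},
}
baseline = build_baseline(findings)

print(baseline["starting_price"]["needs_review"])  # True
print(baseline["free_plan"]["needs_review"])       # False
```

Keeping each source's value next to the flag means the editor confirms a specific discrepancy instead of re-researching the field from scratch.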

What most teams get wrong is treating AI as a step-swap rather than a workflow redesign. The standard approach: buy AI licenses for the team, expect immediate productivity gains. The reality that follows: uneven adoption, inconsistent output quality, and the realization that AI inserted into a broken workflow produces broken output faster. The actual sequence that works is: train the team on AI capabilities, audit the steps in every process workflow, categorize which steps are automatable and which require human judgment, then rebuild the workflow and standardize how it's used. That's a content operations project, not a software purchase.

The honest caveat: AI still gets things wrong. The 6-minute time isn't zero-error — it's faster-with-review. The AI agent misidentifies features, occasionally cites incorrect pricing, and sometimes flags tools as a poor fit when they aren't. The editorial review step exists because of this, not despite it. The goal was never to remove human judgment from the process. It was to ensure humans were applying their judgment to the right things.

Fig. 05 · Pull quote
Most teams treat AI as a step-swap. The system that actually works requires auditing your workflow first, categorizing which steps are automatable and which require human judgment, then rebuilding — not just inserting AI into the process you already had.
Rocco Brudno
Director of Content, Black & White Zebra
What teams get wrong

Takeaways

01. Define the output standard before you build the system.

The style guide and good/bad example library — created by the editors who were doing the work manually — came first. Without that, the AI has no calibration point and editors have no fast way to determine whether the output is usable.

02. Measure what the time is actually worth.

Most teams know their workflows are slow; few have calculated the labor cost. Running the math — time per task × annual volume × fully loaded hourly rate — makes the ROI case for investment concrete and defensible, and it clarifies which workflow problems are worth solving first.
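Applied step by step, the same formula shows which part of a workflow to fix first. The sketch below uses the upper-bound durations from the "Before" column of Fig. 02 with the 3,000 reviews per year and ~$50/hour stated earlier; the function and variable names are illustrative. Because these are upper bounds, the per-step costs are ceilings and are not meant to add up to the 90-minute figure.

```python
# Upper-bound minutes per step, from the "Before" column of Fig. 02.
# Verification is omitted because the figure gives no duration for it.
STEP_MINUTES = {
    "Identify brand & category context": 5,
    "Research across sources": 60,
    "Consolidate research": 15,
    "Draft review": 30,
    "Source, resize, caption screenshot": 8,
    "Publish": 1,
}
REVIEWS_PER_YEAR = 3_000
HOURLY_RATE = 50  # USD, fully loaded (approximate)


def annual_labor_cost(minutes_per_task, tasks_per_year, hourly_rate):
    """time per task x annual volume x fully loaded hourly rate."""
    return minutes_per_task / 60 * tasks_per_year * hourly_rate


# Most expensive step first
ranked = sorted(
    ((name, annual_labor_cost(minutes, REVIEWS_PER_YEAR, HOURLY_RATE))
     for name, minutes in STEP_MINUTES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, cost in ranked:
    print(f"{name}: ${cost:,.0f}")
# Research ranks first, at up to roughly $150,000 per year
```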

03. The boundary between AI and human isn't fixed.

We moved it over time as the system improved and editors built confidence in the output. Starting with a conservative human-review layer and pulling it back as trust developed was slower than going all-in from the start — but it was how we preserved quality through the transition.

Rocco Brudno is a content and brand marketing leader with 16+ years in B2B SaaS. He led content strategy and operations at Black & White Zebra from 2022 to 2025, scaling the team from 3 generalists to 18 specialists and growing review revenue 50% year-over-year for three consecutive years.