Program Analysis · Jan 2024 – Dec 2024

Building a real software review operation from nothing

Black & White Zebra · 3 Publications · 3 Analysts Hired & Trained · 12 Months

Forbes, Capterra, G2, PCMag — none of them are hiring practitioners to use software, evaluate it without a vendor briefing, and follow up with real users to validate what they found. We decided to build that instead.

120+

Software analyses completed across 3 publications

Major industry market reports, 10 tools each

1,500+

Outreach contacts across vendors, users & SMEs

120+

Expert interviews recorded — 50%+ with industry SMEs

1k+

Original product assets (screenshots & video)

2–3×

Increase in on-page time for content infused with original screenshots

01 — Context

The market gap we were solving for

Black & White Zebra is a digital media company with 15 publications covering specific professional verticals. Three of those — The Digital Project Manager, People Managing People, and CPO Club — compete directly for B2B SaaS tool review traffic. Project management software, HR and payroll platforms, product management tools. High commercial intent, high competition.

The problem: most software review content is produced by writers who've never logged into the product. They're synthesizing vendor websites, aggregating G2 scores, and calling it research. The results are interchangeable — the same seven criteria, the same four pros, the same vague caveat about “it depends on your needs.”

The thesis was simple: if we could build an operation where analysts used the software, talked to the people buying and implementing it, and captured that perspective at scale — we'd have something no one else in our competitive set had. Original data. Real authority. Content that could support our advertising relationships because we could say, with specificity, who each tool was right for.

I designed and ran this program end-to-end over the course of 2024 — from the hiring brief to the review methodology to the weekly management cadence to the editorial handoff workflow.

02 — Methodology

How we reviewed software

Each review was built in three distinct phases. The combination of first-hand experience, structured user conversations, and community-level sentiment data was what separated this from a typical editorial operation.

Phase 01

Hands-on Product Trial

Each analyst either set aside budget for a paid plan or worked directly with vendors to secure account access — disclosing upfront that gifted access wouldn't influence the review, which every vendor agreed to. They then built a populated demo environment using AI-generated dummy data (names, tasks, projects, employee records — realistic for the vertical) and ran a structured evaluation against 7 criteria: onboarding, core functionality, standout features, usability, customer support, value, and customer reviews. Two business days per product, max.

Phase 02

ICP Conversations

Analysts reached 3 distinct buyer profiles per product via cold LinkedIn and email outreach. These weren't vendor-supplied references — analysts sourced people who had left public reviews on G2, posted in relevant communities, or were actively talking about the software on LinkedIn, then reached out directly.

Phase 03

Sentiment Analysis

Analysts scanned Reddit forums, Slack communities, and user groups to surface recurring themes around pricing, feature gaps, implementation friction, and competitive comparisons. Smaller tools yielded around 20 data points; market leaders like Monday.com, BambooHR, and ClickUp yielded 100–200.

Output

Market Report

All three phases fed into a structured market report: a single research asset per product containing scores, original screenshots, call recordings, and quantitative sentiment data. This report became the source of truth for content writers and editors upgrading existing review pages.

03 — Research Design

Three buyer profiles, per product

The goal wasn't just to collect opinions — it was to understand how the same software performs at different organizational sizes and use cases. Each analyst targeted three conversation types per review.

Startup / SMB

Owner-operators or department leads at smaller organizations — typically evaluating for the first time, weighing cost and implementation lift heavily, often moving off a manual or legacy solution.

Mid-Market

Practitioners five to ten years into their field, often managing a transition from one platform to another. Most useful for surfacing why people switch, what breaks down in practice, and what workarounds users develop on their own.

Enterprise

Rapidly scaling or established enterprise accounts where the questions shift — not "does this work" but "does this scale, integrate, and survive procurement?" Higher stakes, more nuanced on implementation complexity, globalization, and total cost of ownership.

04 — Team & Management

How I built and ran the analyst team

I hired three analysts, each aligned to one publication and vertical: one for project management (DPM), one for HR and people management (PMP), and one for product management and UX (CPO Club). None of them came in with a ready-made review methodology — that was part of the design. Building the muscles was the point.

The first two weeks were structured onboarding: identifying best-in-class exemplars to study, defining evaluation criteria, designing their dummy testing scenario, building their outreach contact list, writing interview scripts, and building out their analyst bio and profile. I graded each deliverable and gave written feedback before they started live software reviews.

Management cadence: Monday standups to validate progress and priorities, and Wednesday reviews to discuss work quality, interview outcomes, and any scope adjustments. I managed the team directly, with no middle layer. That proximity mattered because the methodology was still being figured out as we went, and I needed direct feedback loops to course-correct quickly.

Tool selection was a shared decision. I worked with the editorial director to identify which software categories and specific products would generate the most value — mixing market leaders (the high-traffic targets) with emerging tools worth getting ahead of. From there, each analyst selected 10 triable products per batch, with preference for free trials over vendor-gated demos.

05 — Program Review

What worked. What didn't.

I kept internal retrospectives on this program throughout the year. Here's the honest read.

✓ What Worked

The two-day review cycle

One day wasn't enough to set up a real test scenario, evaluate against criteria, and identify anything non-obvious. Two days gave us enough depth to write with authority without becoming an inefficient crawl.

Vendor & SME interviews at scale

Jan through April was a learning curve on outreach mechanics. By June the team was consistently booking 5–10 calls per week. Year-end: 1,500+ outreach contacts, 120+ recorded interviews, more than half with industry SMEs. The calls with real users were consistently the richest research in any market report.

AI as investigation scaffolding

Early on we tested whether AI could do the research. It couldn't; it returned vendor marketing copy. The pivot was to use AI differently: as a starting-point investigator that surfaces where analysts should focus, identifies relevant sources, and assesses their quality, with humans taking it from there. That worked well.

Analyst-built quote bank tool

One analyst built a custom ChatGPT workflow that followed specific instructions to scrape an interview transcript and pull key quotes into a structured quote bank. All three analysts adopted it. It's now the team's default process for converting interview recordings into usable blog content and surfacing the best moments from any conversation.

Screenshot infusion → time on page

Pages upgraded with original screenshots from the program saw 2–3× increases in on-page time on The Digital Project Manager. The data was directionally consistent enough that it justified the research overhead.

✗ What Didn't

Video production

We spent nearly two months before accepting the creative team didn't have a clear enough brief to execute video reviews reliably. Multiple reshoots, inconsistent output, and one analyst who delivered the only fully finished video reviews. We cut scope to screen walkthroughs and moved on — two months I'd approach differently.

Editorial handoff for content infusion

Originally analysts were supposed to infuse their own research into existing content pages. That was too slow and misaligned with what they were hired to do. We pivoted to handing market reports to editors — but adoption was inconsistent. Infusion needed a clearer owner and a defined process before the first review shipped, not after.

Misreading where AI belongs in a research workflow

The early instinct was to automate as much as possible. That failed in consistent and instructive ways. Personalized connection outreach written by AI didn't convert — the messages were detectable and response rates confirmed it. Scraping a product's website to generate a review produced a polished summary of the vendor's own marketing copy. Nothing original. AI had no ability to form an opinion on whether a product was a good or bad fit for a specific user — that judgment requires domain experience, real context, and time inside the software. We also tried using AI to summarize qualitative data in a vacuum: it stripped out everything that made the data worth having. We eventually built a structured quote bank, scored entries for relevance and quality, and used that as the basis for the model — which worked, but it's a different solution to a different problem than what we started with.

Where AI earned its place

Finding investigation starting points — surfacing contacts, community discussions, and reference material for an analyst to evaluate, not use wholesale. Translating messy interview recordings and transcripts into structured first drafts that analysts could shape into op-eds or summaries. Building dummy data quickly: realistic names, task structures, employee records, and project scenarios ready to load into a platform before a walkthrough. And when information on a product was genuinely scarce, identifying new angles and sources to investigate — not filling gaps with invented data, but pointing analysts toward where real data might exist. The consistent pattern: AI worked as a force multiplier for human judgment. It didn't work as a substitute for it.

06 — Output

What the program shipped

120+

Individual software analyses

Across project management, HR & people ops, and product management

Industry-level market reports

Each covering 10 products with original research, screenshots, and interview data

120+

Expert interviews recorded

Vendors, power users & SMEs — 50%+ SME conversations

1k+

Original product assets

Screenshots and screen-recorded walkthroughs, catalogued by product

The market reports fed multiple content formats downstream: individual in-depth reviews, comparison articles (X vs. Y), roundup lists, and five state-of-the-industry reports that synthesized findings across all reviewed tools in each vertical. Those larger reports became the highest-effort, highest-differentiation content we published that year.

Note on metrics: Traffic and SEO ranking data from content infusion isn't included here; the 2–3× on-page time figure is from The Digital Project Manager during the first content infusion cycle.

07 — Tools & Stack

What we used to run it

No custom infrastructure. The operation ran on a practical stack that an editorial team could maintain.

Apollo · Lead data & outreach enrichment

Airtable · Research tracking & market report dependencies

LinkedIn Sales Nav · Contact sourcing & outreach

Claude · Data analysis & content writing

ChatGPT · Research discovery, interview prep & quote bank

Google Drive · Asset library & report storage

Upwork · Consultant sourcing for specialist interviews

08 — Reflection

What I took from it

This was a program I built without a blueprint. There was no internal model to reference, no industry equivalent doing it quite this way. The first half of the year was heavy on iteration — figuring out what the two-day review cycle should produce, how much structure the analysts needed versus how much they could define themselves, where the editorial team fit in the workflow.

The thing that held up was the core thesis. Software review content that comes from people who've used the product — who've talked to three different kinds of buyers, who've read the forums — is categorically different from content that doesn't. Advertisers felt it. The on-page data confirmed it. The analysts who got in front of real users consistently came back with material that couldn't have been generated any other way.

What I'd do differently: front-load the editorial infusion workflow design before the first review ships, not six weeks after. And build the outreach playbook on day one — the analysts figured out cold outreach mechanics, but it took four months to hit a reliable weekly cadence. That's time you could recover with the right training structure from the start.