How We Run Growth Ops at BiClaw: AI Agents Doing the Work
An inside look at how BiClaw uses AI agents for content, SEO, and growth operations — what works, what breaks, and what it actually costs.

TL;DR
- BiClaw runs its entire content and growth operation with a small human team plus AI agents; there is no dedicated content team.
- The agent pipeline covers: keyword research → outline → draft → QA → publish → revalidate → verify.
- Daily cost target: under $20/day across all agents. Main challenge: keeping context lean.
- Key lesson: agents fail on ambiguity. Clear SOPs + good guardrails = reliable output.
Why We Built This
BiClaw is a pre-revenue startup. We can't hire 5 content writers, a social media manager, and a growth analyst. But we still need to compete for organic traffic against companies with real teams.
So we built the operation we could afford: a small team of AI agents with clear roles, tight guardrails, and daily human oversight.
This post is a transparent look at how it works, what we've learned, and what still breaks.
The Stack
We run on OpenClaw — the same platform BiClaw is built on. Each agent has:
- A dedicated workspace with its own SKILL.md (instructions + constraints)
- A model assigned to the task type (cheaper for utility, stronger for writing)
- A daily token budget with hard stops
- Logging to usage.jsonl and quality scoring to scores.jsonl
Four agents currently active:
| Agent | Role | Model |
|---|---|---|
| Growth | Blog content, outreach drafts | DeepSeek V3.2 |
| Ops | Infrastructure health, monitoring | GPT-4o-mini |
| Optimizer | Landing page experiments, conversion | GPT-4o-mini |
| Main | Orchestration, daily review, operator comms | Claude Sonnet 4.6 |
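The per-agent setup above can be sketched as a config structure. The field names and the per-agent budget split are illustrative assumptions; only the $20/day total cap and the log file names come from this post:

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """One agent's workspace settings (field names are illustrative)."""
    name: str
    skill_file: str          # SKILL.md: instructions + constraints
    model: str               # routed by task type
    daily_budget_usd: float  # hard stop per agent (split is hypothetical)
    usage_log: str = "usage.jsonl"
    scores_log: str = "scores.jsonl"

AGENTS = [
    AgentConfig("Growth", "growth/SKILL.md", "deepseek-v3.2", 10.0),
    AgentConfig("Ops", "ops/SKILL.md", "gpt-4o-mini", 3.0),
    AgentConfig("Optimizer", "optimizer/SKILL.md", "gpt-4o-mini", 3.0),
    AgentConfig("Main", "main/SKILL.md", "claude-sonnet-4.6", 4.0),
]

# The per-agent budgets must sum to the daily cap.
assert sum(a.daily_budget_usd for a in AGENTS) == 20.0
```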
The Content Pipeline (End to End)
Every blog post goes through this flow:
1. Keyword brief → Growth agent
2. GET /api/content/related (internal link candidates)
3. Outline (gpt-5-mini: fast, cheap)
4. Full draft (~1,800 words, DeepSeek V3.2)
5. QA: validate links, word count, MDX safety
6. Publish via publish-with-verify.sh
7. POST /api/revalidate (blog + slug + sitemap)
8. Live verify: web_fetch checks H1 + TL;DR present
9. Log: usage.jsonl + quality score
Steps 1–9 happen without human involvement. Human review comes after: we spot-check 2–3 posts per batch for tone, accuracy, and competitive positioning.
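Steps 7–8 compress to a cache bust plus a content check. A minimal sketch of what that pair might look like; the endpoint path comes from the list above, but the request payload shape and the string heuristics are illustrative assumptions, and `fetch` is injectable so the logic can be tested without a network:

```python
import json
import urllib.request

def revalidate_and_verify(base_url: str, slug: str, fetch=None) -> bool:
    """POST /api/revalidate for the new slug, then confirm the live page
    renders an H1 and a TL;DR block (heuristics are simplifications)."""
    fetch = fetch or (lambda url, data=None: urllib.request.urlopen(
        urllib.request.Request(url, data=data)).read().decode())
    # Payload shape is hypothetical: bust the blog index, the post, the sitemap.
    fetch(f"{base_url}/api/revalidate",
          data=json.dumps({"paths": ["/blog", f"/blog/{slug}", "/sitemap.xml"]}).encode())
    html = fetch(f"{base_url}/blog/{slug}")
    return "<h1" in html and "TL;DR" in html
```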
Quality Controls We Actually Use
Minimum bar (server-side enforced):
- 900 words minimum
- 3 internal links
- 2 external links (HTTP 200 verified)
- MDX pre-compilation (broken content stays draft)
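The hard gate reduces to a handful of countable checks. A sketch using the thresholds above; the link-counting regex is a simplification, and `link_ok` stands in for the real HTTP 200 verification (MDX pre-compilation is omitted):

```python
import re

MIN_WORDS, MIN_INTERNAL, MIN_EXTERNAL = 900, 3, 2

def qa_gate(markdown: str, link_ok=lambda url: True) -> list[str]:
    """Return a list of failures; an empty list means the post may publish."""
    failures = []
    words = len(markdown.split())
    if words < MIN_WORDS:
        failures.append(f"word count {words} < {MIN_WORDS}")
    # Pull markdown link targets; split into internal (/path) and external (http...).
    links = re.findall(r"\]\((\S+?)\)", markdown)
    internal = [u for u in links if u.startswith("/")]
    external = [u for u in links if u.startswith("http")]
    if len(internal) < MIN_INTERNAL:
        failures.append(f"internal links {len(internal)} < {MIN_INTERNAL}")
    live = [u for u in external if link_ok(u)]
    if len(live) < MIN_EXTERNAL:
        failures.append(f"verified external links {len(live)} < {MIN_EXTERNAL}")
    return failures
```

A post that fails any check stays in draft rather than publishing broken.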
Soft quality checklist (agent self-review before publish):
- TL;DR with 4–6 bullets
- At least 1 table
- At least 1 concrete example with numbers
- H1 ≠ page title (question or outcome-first)
- Meta description 140–155 chars
Human review triggers:
- Quality score below 3.5 (auto-flagged)
- Post touches pricing or competitor comparisons (sensitive)
- External link returns non-200 (quarantined)
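The three triggers are cheap to evaluate mechanically before routing to a human. A sketch with the thresholds from the list above; the post dict's field names are illustrative:

```python
def review_triggers(post: dict) -> list[str]:
    """Reasons a post gets routed to human review (empty list = no flags)."""
    reasons = []
    if post.get("quality_score", 5.0) < 3.5:
        reasons.append("quality score below 3.5")
    # Sensitive topics always get a human pass.
    if any(t in post.get("topics", []) for t in ("pricing", "competitor")):
        reasons.append("touches pricing or competitor comparison")
    if any(code != 200 for code in post.get("external_link_status", [])):
        reasons.append("external link returned non-200")
    return reasons
```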
What the Daily Rhythm Looks Like
07:30 VN — Morning brief delivered to Telegram: experiment results, GA4 top pages, cost vs cap, overnight publishes.
Morning (manual) — Tuan reviews briefs, calls out priority topics or corrections.
During the day — Growth agent runs content batches (max 5 posts/run). Ops agent monitors infra. Main orchestrates.
18:00 — Daily review cron: reads all agent logs, compiles consolidated report, saves to reviews/daily-YYYY-MM-DD.md.
Monday 18:00 — Weekly synthesis: model performance, cost trends, quality scores. Adjusts model routing if needed.
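The 18:00 review is mostly log folding. A sketch of how the usage.jsonl / scores.jsonl records might roll up into the consolidated report; the record shapes (`cost_usd`, `score`) are assumptions:

```python
import json

def daily_summary(usage_lines, scores_lines) -> dict:
    """Fold JSONL log lines into the headline numbers for the daily review."""
    cost = sum(json.loads(line)["cost_usd"] for line in usage_lines)
    scores = [json.loads(line)["score"] for line in scores_lines]
    return {
        "total_cost_usd": round(cost, 2),
        "avg_quality": round(sum(scores) / len(scores), 2) if scores else None,
        "posts_scored": len(scores),
    }
```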
Cost Control (The Hard Part)
We hit $68/day in early March from a runaway context window issue: the Growth agent was sending ~40k tokens per request, which gets expensive fast at GPT-4 pricing.
Fixes that worked:
- Compaction threshold lowered (sessions compact earlier)
- Context trim: tool results pruned after 15 min
- Model swap: gpt-5 → DeepSeek V3.2 for content (same quality, 80% cheaper)
- Hard stop at $20/day with 80% warning alert
Current daily spend: tracking toward $10–15/day.
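The hard stop itself is a few lines. A sketch of the cap check, using the $20 cap and 80% warning threshold described above:

```python
DAILY_CAP_USD = 20.0
WARN_FRACTION = 0.8  # alert the operator at 80% of the cap

def budget_check(spend_today: float) -> str:
    """'ok' | 'warn' (send alert) | 'stop' (hard stop: no further API calls)."""
    if spend_today >= DAILY_CAP_USD:
        return "stop"
    if spend_today >= WARN_FRACTION * DAILY_CAP_USD:
        return "warn"
    return "ok"
```

The point is that the stop is enforced in code before each agent run, not by a human watching a dashboard.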
What Still Breaks
Infra errors confuse agents. Before we added the "don't debug infra" rule, the Growth agent would spend 30 minutes trying to fix a CDN cache issue that needed one line from a developer. Now: report the error + URL, move on.
Ambiguous tasks produce mediocre output. "Write about AI agents" returns generic content. "Write a 1,800-word guide on how AI agents sort emails, targeting 'email management software' (2.9k vol, KD 34), with a mini-case from an e-commerce store" returns something publishable.
Quality score gaming. An agent that scores its own output will score itself highly. We cross-check: if the quality score is 4.5 but the post has no table and a weak TL;DR, the score gets manually corrected and the prompt updated.
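The cross-check can be partly automated: cap the self-assigned score whenever objective signals disagree with it. A sketch of that correction (the cap value and signal set are illustrative, not our exact rules):

```python
def cross_check(self_score: float, has_table: bool, tldr_bullets: int) -> float:
    """Cap an agent's self-assigned score when objective checks disagree."""
    score = self_score
    if not has_table:
        score = min(score, 3.5)      # missing table: can't be 4.5
    if tldr_bullets < 4:
        score = min(score, 3.5)      # weak TL;DR: same cap
    return score
```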
External link rot. Links that were valid at publish time break later. We don't have automated re-verification yet — it's on the dev roadmap.
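A re-verification job of the kind the roadmap describes might look like this; `fetch_status(url)` would do a real HEAD request in production, and is injected here so the logic runs without a network (post field names are illustrative):

```python
def recheck_links(posts, fetch_status) -> dict:
    """Return {slug: [broken_urls]} for posts whose external links no longer
    return HTTP 200, so they can be quarantined for repair."""
    broken = {}
    for post in posts:
        dead = [u for u in post["external_links"] if fetch_status(u) != 200]
        if dead:
            broken[post["slug"]] = dead
    return broken
```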
Mini-Case: Week 1 Content Batch
Target: 5 posts, Monday publish.
- Keywords: best-ai-agents-2026 (KD 15), agentic-ai-news-2026 (74k vol), ai-executive-assistant-guide, ai-automation-agency-guide, ai-email-management-software-2026
- Time from brief to all 5 published: ~4 hours (agent run time + human review)
- Human time spent: ~45 min (reviewing drafts, approving publishes)
- Cost: $3.20 for the batch
All 5 passed server-side QA on first attempt. Two needed minor fixes (wrong internal link slug). None needed full rewrites.
What We'd Do Differently
- Build the QA layer first. We spent a week fixing MDX errors that a preflight checker would have caught in seconds.
- Start with 3 posts/batch, not 10. More posts = more errors = more context = more cost. Smaller batches are more reliable.
- Version control content from day 1. We lost some early drafts. Content versioning is now in the DB — every update snapshots the previous version.
- Instrument before you scale. You can't cut costs if you don't know where they're going.
The Honest Assessment
Is this production-grade? Not yet. The failure rate on automated publishes is around 5–10%, though human spot-checks catch most issues before they matter.
But for a pre-revenue startup competing for organic traffic, it works. We're publishing 10–15 posts/week with a combined 3–4 hours of human involvement. The quality floor is enforced by tooling, not discipline.
The goal isn't to remove humans from the loop. It's to put humans where they add the most value: strategy, tone calibration, competitive positioning — not copy-pasting content into a CMS.
Related reading
- What is an agentic AI architecture? A practical guide
- Best business process automation tools in 2026
- From SOP to autopilot: using AI agents for business workflows
We build BiClaw on OpenClaw. If you're curious about the underlying platform, see OpenClaw's documentation.