
Building an AI-Powered Content Pipeline with Claude

Tags: ai, automation, claude, python, notion

The Problem

As someone building a bilingual personal brand (English + Traditional Chinese), I was spending hours every week on a repetitive loop: scan tech news, find interesting topics, draft posts for Threads/LinkedIn/Instagram, then translate for the other language. The core insight was that most of this process is pattern recognition and synthesis — exactly what LLMs are good at.

Architecture Overview

The pipeline has two main stages:

1. Ideation Pipeline

# Simplified flow
sources = fetch_all_sources()       # 100+ sources in parallel
items = deduplicate(sources)        # URL hash + Notion dedup
scored = claude_score(items)        # 5-dimension scoring
ideas = generate_angles(scored)     # Bilingual angle generation
write_to_notion(ideas)              # Structured Notion database
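
The `deduplicate` step in the flow above could look something like the following. This is a minimal sketch, not the pipeline's actual code: the `url_key` helper, the `"url"` field on each item, and the choice to ignore query strings and fragments when hashing are all assumptions made for illustration. The `seen_keys` argument stands in for hashes already stored in Notion.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def url_key(url: str) -> str:
    """Return a stable hash for a URL (hypothetical helper).

    Assumption: query strings and fragments are dropped so tracking
    parameters do not create duplicates. This can also merge pages that
    differ only by query string, so adjust it for your sources.
    """
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), "", "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items, seen_keys=frozenset()):
    """Keep the first item per URL hash, skipping hashes already seen.

    `seen_keys` represents hashes already stored elsewhere (for example
    in a Notion database); how they are loaded is out of scope here.
    """
    seen = set(seen_keys)
    unique = []
    for item in items:
        key = url_key(item["url"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique
```

Hashing a normalized URL keeps the in-run check cheap, while the stored hashes catch items that were already written on a previous run.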

The scoring system evaluates each item on 5 dimensions:

  • Relevance (0.25 weight) — How well it fits my content pillars
  • Virality (0.20) — Social discussion potential
  • Timeliness (0.15) — How recent and time-sensitive
  • Uniqueness (0.20) — How novel the angle is
  • Brand fit (0.20) — Can I write from a unique perspective?
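
To make the weighting concrete, here is a small sketch of how the five per-dimension scores could be combined into one number. The weights come from the list above. The 0-10 scale for each dimension, the dictionary keys, and the `weighted_score` function name are assumptions for illustration.

```python
# Weights from the list above; they sum to 1.0.
WEIGHTS = {
    "relevance": 0.25,
    "virality": 0.20,
    "timeliness": 0.15,
    "uniqueness": 0.20,
    "brand_fit": 0.20,
}


def weighted_score(scores: dict) -> float:
    """Combine per-dimension scores into a single weighted score.

    Assumption: each dimension is scored on the same scale (0-10 here),
    so the result stays on that scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


example = {
    "relevance": 8,
    "virality": 6,
    "timeliness": 9,
    "uniqueness": 7,
    "brand_fit": 8,
}
# 0.25*8 + 0.20*6 + 0.15*9 + 0.20*7 + 0.20*8 = 7.55
print(round(weighted_score(example), 2))
```

Because the weights sum to 1.0, the combined score is a weighted average and can be compared directly against a threshold on the same scale.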

2. Creation Pipeline

ideas = fetch_picked_ideas()        # Status = "Picked" in Notion
drafts = generate_drafts(ideas)     # Platform-specific (Threads/LinkedIn/IG)
validated = quality_check(drafts)   # Auto-regenerate if score < 6
final = transcreate(validated)      # EN → ZH-TW trans-creation
write_to_pipeline(final)            # Ready for review

The key distinction: this is trans-creation, not translation. The Chinese version adapts cultural references, adjusts tone for Taiwan's tech community, and restructures sentences for natural readability.
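
The "auto-regenerate if score < 6" rule in the `quality_check` step can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the actual implementation: `generate` and `score` are hypothetical callables standing in for the Claude calls, and the cap of three attempts is an assumed safeguard so a stubborn draft cannot loop forever.

```python
from typing import Callable, Tuple


def generate_until_valid(
    generate: Callable[[], str],
    score: Callable[[str], float],
    threshold: float = 6.0,
    max_attempts: int = 3,
) -> Tuple[str, float]:
    """Regenerate a draft until it scores at least `threshold`.

    Returns the best (draft, score) pair seen. If no attempt reaches the
    threshold, the highest-scoring draft is still returned so a human
    reviewer has something to work with.
    """
    best_draft, best_score = "", float("-inf")
    for _ in range(max_attempts):
        draft = generate()
        draft_score = score(draft)
        if draft_score > best_score:
            best_draft, best_score = draft, draft_score
        if draft_score >= threshold:
            break
    return best_draft, best_score
```

Keeping the best draft rather than the last one avoids discarding a near-miss when a later regeneration happens to score lower.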

Key Design Decisions

Claude CLI as Primary Backend

I use Claude's CLI (via Max subscription) as the primary LLM backend, with the Anthropic API as a fallback. Because CLI usage is covered by the subscription, scoring and generation run without per-token costs, subject to the plan's usage limits.

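import os

class ClaudeClient:
    def __init__(self):
        # CLI is the default backend; set CLAUDE_USE_CLI=false to use the API
        self.use_cli = os.getenv("CLAUDE_USE_CLI", "true") == "true"

    async def generate(self, prompt: str) -> str:
        # _cli_generate and _api_generate are backend-specific calls (not shown)
        if self.use_cli:
            return await self._cli_generate(prompt)
        return await self._api_generate(prompt)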
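
One way to get "CLI first, API as fallback" behavior at call time is to try the primary backend and switch only when it fails. The sketch below is an assumption about how that could be wired, not the code behind `ClaudeClient`: `run_cli` and `generate_with_fallback` are hypothetical names, and the exact CLI command (for example `["claude", "-p"]` for non-interactive output) should be checked against your installed version.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Backend = Callable[[str], Awaitable[str]]


async def run_cli(prompt: str, command: Sequence[str], timeout: float = 120.0) -> str:
    """Run a CLI command with the prompt as its last argument.

    `command` is supplied by the caller, e.g. ["claude", "-p"]
    (an assumption; verify the flag for your CLI version).
    """
    proc = await asyncio.create_subprocess_exec(
        *command,
        prompt,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"CLI exited with {proc.returncode}: {stderr.decode().strip()}")
    return stdout.decode().strip()


async def generate_with_fallback(prompt: str, primary: Backend, fallback: Backend) -> str:
    """Try the primary backend; on any failure, use the fallback instead."""
    try:
        return await primary(prompt)
    except Exception:
        return await fallback(prompt)
```

Passing the backends in as callables keeps the fallback logic independent of how each backend is implemented, which also makes it easy to test with stubs.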

Tiered Source Architecture

Not all sources are created equal. The system uses a 4-tier hierarchy:

Tier  Example                 Filter   Bonus
1     Direct signals          None     +2.0
2     OpenAI/Anthropic blogs  None     +1.5
3     Google AI, LangChain    None     +0.5
4     HackerNews, Reddit      Keyword  +0.0

Tier 4 sources go through keyword filtering against an AI_KEYWORDS set to avoid noise.
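
The tier table and the keyword filter can be sketched together as follows. The bonus values come from the table above. Everything else is an assumption for illustration: the contents of `AI_KEYWORDS` here are placeholders, matching is done on whole lowercase words in the title, and the bonus is added directly to the base score.

```python
import re

# Bonus per tier, taken from the table above.
TIER_BONUS = {1: 2.0, 2: 1.5, 3: 0.5, 4: 0.0}

# Placeholder keywords; the real AI_KEYWORDS set is not shown in this post.
AI_KEYWORDS = {"ai", "llm", "claude", "gpt", "agent"}


def passes_filter(title: str, tier: int) -> bool:
    """Tiers 1-3 pass unfiltered; tier 4 needs at least one keyword match."""
    if tier < 4:
        return True
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return bool(words & AI_KEYWORDS)


def apply_tier_bonus(base_score: float, tier: int) -> float:
    """Add the tier bonus to a base score (assumed to be additive)."""
    return base_score + TIER_BONUS[tier]
```

Matching on whole words rather than substrings avoids false positives such as "ai" inside "maintain", at the cost of missing hyphenated or compound terms.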

What I Learned

  1. Scoring beats filtering. Early versions used simple keyword matching. Adding multi-dimensional scoring with Claude dramatically improved content quality.

  2. Trans-creation > Translation. Direct translation produces awkward content. Having Claude understand both cultures and adapt the message produces natural-sounding posts.

  3. Source health matters. Silent source failures are the worst bugs. I added health tracking that warns after 3 consecutive empty fetches.

  4. Cost control is essential. Capping scoring at 300 items per run and defaulting to the CLI backend keep costs predictable.
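
The health tracking from point 3 can be sketched as a per-source counter of consecutive empty fetches. This is a minimal illustration, not the tracker used in the pipeline: the `SourceHealth` class and its `record` method are hypothetical names, and state is kept in memory rather than persisted between runs.

```python
from collections import defaultdict


class SourceHealth:
    """Track consecutive empty fetches per source (hypothetical helper)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.empty_streak = defaultdict(int)

    def record(self, source: str, item_count: int) -> bool:
        """Record one fetch result; return True if the source needs a warning.

        Any non-empty fetch resets the streak for that source.
        """
        if item_count > 0:
            self.empty_streak[source] = 0
            return False
        self.empty_streak[source] += 1
        return self.empty_streak[source] >= self.threshold


health = SourceHealth()
for count in (0, 0, 0):
    if health.record("example-feed", count):
        print("warning: example-feed returned nothing 3 times in a row")
```

Counting consecutive empties rather than total empties means a source that is quiet now and then does not trigger a warning, while one that has silently broken does.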

What's Next

The pipeline currently stops at draft generation — I still manually review and post. Next steps include building a personal website (you're reading it!) to house longer-form versions of this content, and eventually automating the publish step.