Hi there 👋

Welcome to Gerald Chen's tech blog — notes and deep dives on frontend development, JavaScript, AI agents, and the modern web.

Feel free to look around, or visit the About page to learn more.

Featured

In Agent Land, Errors Are Scheduled Work: From Error Boundary to Verifier✓

30 Jul, 2026

The model confidently declares "done" while the work isn't — an error that doesn't error, and no layer of try/catch will catch it. Mechanical failures get retry with exponential backoff plus a hard max_steps brake; semantic failures get an independent verifier — and verification must run in an isolated context. Part 6 of the fe2agent series, with tiny-agent v0.6 (withRetry + MAX_STEPS + verifier).
Context Is RAM, Memory Is Disk: Give Your Agent Lazy Loading✓

27 Jul, 2026

You think the agent remembered your rule; really that sentence lives in yesterday's session's messages array, and today it wakes up brand new. Context is RAM, memory is disk — "write it down" is the only persistence, and retrieve-and-backfill is agent-land's lazy loading. Part 5 of the fe2agent series, with tiny-agent v0.5 (a markdown memory file + naive keyword retrieval + two new tools, save_memory / search_memory).
Prompt Is the New CSS: Declarative, Cascading, No Devtools✓

24 Jul, 2026

Change one line of the system prompt and the agent behaves like a different animal — prompts are like CSS: declarative, cascading, overridable, but with no devtools to show which rule is in effect. Part 4 of the fe2agent series — prompt layers, instruction dilution, and structured prompting, with tiny-agent v0.4 (same tools, two system prompts, run the behavior diff).
From OpenClaw to Claude Code: A Half-Year Config Evolution Ledger✓

22 Jul, 2026

Two TODOs and one regret from three months ago are all settled, plus four things I never imagined: memory as the rule hub, Skills in the pipeline, bilingual i18n as a second publishing line, and AdSense reshaping the writing discipline. Seven evolutions, one section each — a ledger.
Tool Call: Let the AI Dispatch, You Stay the Reducer✓

21 Jul, 2026

In frontend you dispatch and you write the reducer. In an agent the AI dispatches; the reducer half is still yours. Part 3 of the fe2agent series — the tool schema trio, separation of execution rights, validating the AI like an untrusted client, with tiny-agent v0.3.
The Model Has No Memory: Context Is State, and Every Turn Is a Full Re-render✓

18 Jul, 2026

You think the agent remembers you; really it re-sends the full messages array every turn. LLM calls are stateless — "conversation" is a client illusion.
An Agent Is Just a While Loop? From UI = f(state) to action = LLM(context)✓

15 Jul, 2026

An agent is a while loop whose function isn't pure. Series opener from UI=f(state) to action=LLM(context), with a 32-line runnable tiny-agent.
3 Principles Fiber Teaches Us About AI Agent State Design✓

14 Jul, 2026

How Fiber's three mechanisms (double buffering, time slicing, lanes priority) map to AI Agent state design — three principles I stole and dogfooded.
Kimi K2.7 Code Lands in GitHub Copilot: After Benchmarks and Buzz, a Chinese Model Finally Wins the Channel✓

11 Jul, 2026

GitHub Copilot added Kimi K2.7 Code to its model picker on 2026-07-01 — the first open-weight and first Chinese model there. Hand-verified data, plus the "open weights + Azure hosting + off-by-default" trust structure.
WorkBuddy Hit #1 in China in 90 Days: What messaging-first AI Agents Teach the Claude Code / Codex CLI World✓

Updated: 7 Jul, 2026

Tencent WorkBuddy launched 2026-03-09 and hit 8.85M monthly visits (+831% MoM) with a DAU 3-4x the
GLM 5.2 Beats Claude on a Cyber Benchmark: Is China's 'Scenario-Specific Superiority' the New Inflection Point?✓

Updated: 7 Jul, 2026

Zhipu's GLM 5.2 hit 39% F1 on Semgrep's IDOR vulnerability detection benchmark, beating every Claude Code Opus version at roughly 1/6 the per-finding cost — while still trailing about 2 points on aggregate benchmarks. Kimi and Qwen reproduce the same pattern. Is "scenario-specific superiority" the 2026 inflection point for Chinese models?
AI Model Comparison Mid-2026 Sequel: 52 Days After blog166, Here's What I Got Wrong in May✓

6 Jul, 2026

A line-by-line self-audit 52 days after blog166: of 5 predictions, 2 retracted (the Sonnet/Opus advice is stale, the open-source gap was underestimated), 1 still pending (Anthropic pricing), 2 held up. Plus six months of real model-routing observations from this blog and the checkpoints for the next audit.
Ponytail Hit 73k Stars in Three Weeks: A Side Project Cured AI Agents' Over-Coding Disease✓

4 Jul, 2026

DietrichGebert's ponytail launched on 2026-06-12 and climbed to 73k stars / 3,848 forks in three weeks, covering 16 agent ecosystems. Its core is a 7-step decision ladder that forces AI to obey YAGNI. I ran an A/B test against a 196-line TableOfContents component from this blog; after installing ponytail, the review recommended cutting it to 25-30 lines, a reduction of about 84%. Inside: firsthand test data, three structural reasons for the viral growth, a step-by-step breakdown of the ladder, and three counter-boundaries.
WeRead's Official Agent Skill: The First Chinese Content Platform to Ship One — But Is the Indie Dev Monetization Window Actually Open?✓

3 Jul, 2026

微信读书 (WeRead) shipped its official Agent Skill on 2026-05-17 — the first "official player" among Chinese content platforms. Community forks exploded in the first week; six weeks later there is still not a single commercialized product. This post unpacks the three structural blockers behind the "official access + community boom + zero commercialization" pattern, and what indie devs can actually cash in on right now.
ZCode on HN's Front Page: When Silicon Valley Actually Read a Chinese Dev Tool's Docs✓

2 Jul, 2026

Zhipu's official GLM-5.2 harness "ZCode" hit the Hacker News front page (155/192 when I first opened it, 264/236 now). This isn't "yet another AI IDE" — it's the first time a Chinese dev tool has been seriously evaluated by mainstream Silicon Valley engineers. This post breaks down the timeline, what ZCode actually is (a desktop app, not a CLI), the three real objections on HN, and a 30-line TS snippet for wiring up GLM Coding API yourself.
The 90-Day Spaghetti Point: Why Maintenance Is Vibe Coding's Real Exam✓

27 Jun, 2026

When Karpathy coined Vibe Coding in Feb 2025, he only talked about how good it felt to write. GitClear's 211M-line study and the 2026 wave of rescue jobs both map that feeling onto a 4x maintenance bill in year two. This post unpacks the three mechanisms that detonate around day 90 — the Spaghetti Point — and how to keep AI speed without paying that bill.
The CLI's Second Spring in the AI Era: Three Structural Reasons Behind the Claude Code / Codex / Charm / Ink Surge✓

25 Jun, 2026

Starting in 2025, Claude Code, Codex CLI, Charm Bubble Tea, and Ink all took off at once. After 30 years of GUI dominance, the CLI suddenly became the default form factor for AI tooling — Claude Code itself even ships as a Bun binary. This isn't nostalgia. Three structural constraints — LLM I/O, controllable long-running tasks, and cross-tool interop — are pushing engineers back to the terminal. Includes a 30-line Claude Agent SDK sample.
After the Loop Is Running: A Playbook for Verification, Comprehension, and Cognitive-Surrender Debt✓

23 Jun, 2026

blog191 walked through the five components of Loop Engineering and named three debts (verification, comprehension, cognitive surrender) without offering a treatment plan. This post fills that gap with the full playbook in one place: a 1,372-person Wharton experiment, Anthropic's 52-engineer study with a 17% comprehension gap, a real $4,200 overnight bill, Claude Code's PreToolUse rejection hook, and three checkpoint patterns.
Stop Re-Introducing Your Project to AI Every Session: A Project Passport with AGENTS.md, CLAUDE.md, and memory✓

22 Jun, 2026

AGENTS.md is the de facto standard in 2026, CLAUDE.md is still Claude Code's richer format, and project memory holds a different class of information altogether. Stitch them into a "Project Passport" so every new session begins with the AI clearing customs in seconds instead of asking you to paste context again.
A 2026 Monorepo Setup From Zero to Production: pnpm catalogs, Turborepo 2.x, changesets✓

15 Jun, 2026

The previous post argued monorepos are an organizational problem. This one walks through the actual setup, end to end. pnpm catalogs for version alignment, Turborepo 2.x task graphs, changesets + OIDC publishing, remote cache, CODEOWNERS — every command and config you need, current as of 2026.
Monorepo Is an Org Decision, Not a Tech One: Lessons from Babel, Lerna, and Mercari✓

15 Jun, 2026

Monorepo isn't a yes/no technical choice. Babel went in and then quietly walked half of it back, Lerna spent a stretch with no maintainer, and Mercari's cross-region CI burned real money for a year. All three cases point at the same thing: what decides whether a monorepo works isn't Turborepo or Nx, it's your team and how it's organized.
Loop Engineering: From Writing Prompts to Designing Loops That Run Agents for You✓

12 Jun, 2026

The "Loop Engineering" term Addy Osmani popularized in June isn't a replacement for prompt engineering — it's about swapping you out as "the person who hits enter" and turning you into "the person who designs the loop". Walks through the five components plus a state, and the three debts Osmani is really worth remembering for (verification debt / comprehension debt / cognitive surrender) — and along the way, why I disagree with him placing loop above the harness.
Two Days with Claude Fable 5: The 5 Things Every API Integrator Actually Has to Change✓

11 Jun, 2026

Anthropic shipped Fable 5 on 6/9, swapping the Opus/Sonnet/Haiku naming for Fable/Mythos. But the things that actually force every Claude API integrator to touch code aren't the names — they're the new stop_reason refusal, the forced adaptive thinking, and the "you must wire up a fallback model" architecture. After two days driving it inside Claude Code, here's the integration-side detail you need.
Prompt, Context, Harness, Agentic: The Four Nested Layers of LLM Apps — and Knowing Which One You're Stuck In✓

8 Jun, 2026

Prompt engineer, context engineer, harness engineer, agentic engineer — these aren't four competing job titles. They're nested layers of concern, from a single instruction to an entire autonomous system. Understand how the four layers relate, and you'll know exactly which one you're optimizing every time you get stuck.
Electron for React Developers: 9 Things, Ranked by How Hard They Hit✓

4 Jun, 2026

This isn't a tutorial on installing electron-builder. It's a primer for web developers with a few years of React experience, organized by actual impact from biggest to smallest — from the fundamental process model, to the native-module pit everyone falls into, to tooling and debugging. Reading it should save you about a week of trial and error.
WebContentsView in Electron 30: How to Build Multi-View Apps After BrowserView's Deprecation✓

4 Jun, 2026

BrowserView is officially deprecated in Electron 30; the new API is the WebContentsView + BaseWindow combo. This post breaks down the new model: key differences from iframe/webview/BrowserView, migration code diffs, how to lay out multi-view apps, and a few easy-to-hit pitfalls.
Version Management for Electron Apps in Practice: Auto Updates, Version Checks, and a Better User Experience✓

2 Jun, 2026

A deep dive into the full version management lifecycle for Electron apps — from versioning conventions to auto-update mechanics, user notifications, and rollback strategies. Built on real-world practices with electron-updater and semantic versioning to deliver a user-friendly update experience.
How Electron Desktop Wallets Should Store Private Keys: safeStorage Isn't Enough — Learn from MetaMask and Phantom✓

1 Jun, 2026

A few days ago I wrote a teardown of safeStorage — the conclusion was that it's fine for ordinary API keys. But what if you need to store a wallet private key worth tens of thousands of dollars? safeStorage falls short. This post looks at how real desktop wallets like MetaMask and Phantom do it, and why the trio of "encryption + master password + short-lived unlock" is unavoidable.
Letting Claude Code Touch My Real-Money Trading Code: The Lines I Refused to Cross✓

1 Jun, 2026

I've spent 10 months using Claude Code on a real-money futures trading project. This is an honest retrospective: the AI never touched the money directly (I'm not that bold), but it did write code on the critical order/stop-loss/close-position paths. Here are the boundaries I held, where AI genuinely helped, and the moments I had to take over.
Flutter Desktop vs Electron: What Migration Patterns in 2026 Tell Us About Choosing a Desktop Framework✓

28 May, 2026

No boring comparison tables here. This post reverse-engineers the decision logic from what real products did in 2026 — why VS Code / Slack / Claude Desktop are still betting on Electron, why the Ubuntu 26.04 desktop went all-in on Flutter, and why Teams and Zed walked away.
Cracking Open the Electron safeStorage Black Box: AES-128-CBC, a Hardcoded IV, and the Things Nobody Tells You✓

25 May, 2026

safeStorage is Electron's recommended API for storing secrets, but its implementation details are rarely discussed. This post cracks open the source: roughly 100 lines of C++ wrapping Chromium's OSCrypt, AES-128-CBC, an IV hardcoded to 16 spaces, and PBKDF2 with a single iteration. Paired with real cases — VS Code credentials read directly by extensions, VoidStealer grabbing the master key with a hardware breakpoint — it ends with a threat-model-based storage decision table.
7 "Anti-AI-Tone" Principles I Distilled After Writing 80+ Blog Posts with Claude Code✓

16 May, 2026

Roughly 70% of this blog's posts from blog080 to blog166 were written with Claude Code's help, yet readers almost never notice. Here are the 7 "anti-AI-tone" principles I distilled. The goal isn't to make AI sound less like AI; it's to make AI sound like you. Includes the automated check script from my blog-preflight Skill.
AI Model Comparison, Mid-2026 Edition: Two Months After blog080, the Model Layer Has Turned Over✓

15 May, 2026

blog080 was written in early March 2026. Two-plus months later, GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro have all shipped, and open-source flagships GLM-5.1/Qwen 3 Coder have closed the gap to within 5-15 points of closed models. This is the May update: what changed, and how to adjust your March picks.
AI Tooling Supply Chain Security Checklist: 8 Defense Principles Distilled from the Vercel and Nx Console Incidents✓

15 May, 2026

Neither the Vercel breach nor the Nx Console incident was a protocol vulnerability—both were credential governance failures. This post distills these two AI tooling supply chain attacks into 8 defense principles plus a 1-hour audit checklist, covering OAuth least privilege, secret tiering, managed device isolation, and IDE extension credential isolation—a security playbook indie developers and small teams can act on immediately.
Claude Code Multi-Agent Orchestration Plugins Compared 2026: Choosing Between Ruflo, Maestro, Claude Octopus, and Codex Peer Review✓

11 May, 2026

A head-to-head comparison of multi-agent orchestration plugins: Ruflo calls itself the "leading Claude orchestration platform" but underdelivers in execution, Maestro stays lightweight, Claude Octopus runs reviews across 8 models in parallel, and Codex Peer Review gates merges behind three sequential reviewers. From architecture to measured token costs — a decision framework for indie developers.
Claude Code Workflow Plugins Compared (2026): Superpowers, Shipyard, Ralph Loop, Maestro, or Karpathy CLAUDE.md?✓

11 May, 2026

The Claude Code ecosystem has splintered into 100+ plugins as of May. This post zooms in on the "workflow methodology" category—Superpowers, Shipyard, Ralph Loop, Maestro, and Karpathy CLAUDE.md. Design philosophy, context overhead, fit, and combination strategies, plus a decision tree for indie developers.
Astro 5 to 6, Fully Documented: Real Migration Data from a 48-Page Blog — the Official "2x Faster" Claim Doesn't Hold for Small Blogs✓

3 May, 2026

I upgraded my own blog (48 pages, Astro 5.16.6) to Astro 6.3.1 and recorded what actually changed, whether builds got faster, and what broke. Verdict: near-zero migration cost for a small blog, but the official "2x faster" claim doesn't hold at 48 pages — measured build times were essentially flat.
AI Agent Persistent Memory Architectures Compared: File-Based vs Vector Retrieval, Benchmarked with a blog-preflight Subagent✓

2 May, 2026

I hooked the same Subagent up to both Claude Code's built-in file-based memory and mem0's vector retrieval, then compared token cost, recall quality, and cross-session learning. The result: concrete thresholds for which approach fits which data scale, plus a look at procedural memory—the weakest but most promising direction.
Claude Code's Five-Layer Architecture Explained: How MCP, Skills, Agent, Subagents, and Agent Teams Work Together✓

2 May, 2026

Anthropic officially describes Claude Code as a five-layer architecture: MCP for connectivity, Skills for task knowledge, Agent as the main worker, Subagents for parallel isolation, and Agent Teams for coordination. This post breaks down each layer's role and collaboration patterns, with a real-world example from my blog's blog-preflight Skill showing three layers working together.
Frontman Deep Dive: What an AI Agent Can Do When It Sees Your Code from the Browser, Paired with Frontend Skills✓

1 May, 2026

Cursor and Claude Code both start from source code, but a frontend engineer's real work happens in the browser—the actual color on hover, the real DOM after SSR, the re-render triggered by the third useState. Frontman works in the opposite direction: from the browser back to the code. This post breaks down its architecture and combines it with Anthropic's frontend-design Skill and others into a complete frontend AI workflow.
Claude Code Skills in Practice: Building a Reusable Cross-Project Skill from Scratch✓

1 May, 2026

Pull your repetitive playbooks, checklists, and multi-step workflows out of CLAUDE.md and turn them into Skills. Using a "pre-publish blog check" as the running example, this post covers the SKILL.md structure, every frontmatter field, context fork isolation, the boundaries with Slash Commands and Sub Agents, plus debugging and sharing.
GPT-5.5 vs Claude Opus 4.6 vs Gemini 2.5 Pro: Coding Capability Comparison 2026✓

29 Apr, 2026

A 2026 head-to-head coding comparison of the leading large language models: benchmark numbers, pricing, and real-world coding performance for GPT-5.5, Claude Opus 4.6, and Gemini 2.5 Pro — to help you pick the right model for everyday development.
Choosing a React Chart Library: Recharts vs. ECharts vs. Nivo vs. Lightweight Charts✓

27 Apr, 2026

An in-depth comparison of the leading React chart libraries in 2026—Recharts, Apache ECharts, Nivo, and TradingView Lightweight Charts—across three core scenarios (candlestick charts, bar charts, treemaps), with recommendations based on performance, bundle size, and ease of use.
The 2026 AI Coding Tools Scorecard: An Honest Review of Claude Code, Cursor, Copilot, Windsurf, and Gemini CLI✓

26 Apr, 2026

An in-depth comparison of the leading AI coding tools in 2026: Claude Code, Cursor, GitHub Copilot, Windsurf, Trae, Cline, Gemini CLI, and Aider — covering real-world data, current pricing, and use cases to help developers pick the right tool.
AI Agent Success Rates Jumped from 12% to 66%: How Frontend Developers Should Prepare for the Era of 'Usable' Agents✓

25 Apr, 2026

Stanford's 2026 AI Index report shows AI agent success rates on real computer tasks jumped from 12% to 66% in a single year—just 6 percentage points shy of the human baseline. Here's what that means for frontend developers and how to adjust your workflow to take advantage of this inflection point.
AI Toolchain Supply Chain Security: A Full Post-Mortem of the Vercel Breach✓

24 Apr, 2026

A post-mortem of the April 2026 Vercel breach and its full attack chain: Roblox cheat script → Lumma Stealer → over-permissive OAuth → SSO lateral movement → leaked environment variables. We break down the security blind spot behind each link and the defenses developers should put in place.
One CLAUDE.md File, 44K Stars in a Week: Karpathy's Four Principles for AI Coding✓

21 Apr, 2026

A breakdown of how the forrestchang/andrej-karpathy-skills repo gained 44K stars in a single week: Karpathy's four principles for AI coding (think before coding, simplicity first, surgical changes, goal-driven execution), and how to use them directly in Claude Code.
Two Claude Code Environment Variables You've Probably Never Used: EFFORT_LEVEL and ADDITIONAL_DIRECTORIES_CLAUDE_MD✓

20 Apr, 2026

A deep dive into two underrated Claude Code environment variables: CLAUDE_CODE_EFFORT_LEVEL controls the reasoning effort tier, and CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD enables sharing rules across projects — with complete configuration examples and use cases.
Inside the Axios Poisoning: How a North Korean APT Infected Millions of Developer Environments in 3 Hours✓

17 Apr, 2026

In March 2026, the axios npm package was hijacked by a North Korean state-level APT, planting a RAT into millions of developer environments within 3 hours. This post breaks down two separate but related incidents: the full supply-chain poisoning attack chain, and the technical mechanics and real-world exploitability debate around CVE-2026-40175 (CVSS 10.0).
The AI Agent Security Landscape: From the ClawHavoc Poisoning to Cisco DefenseClaw and Microsoft's Governance Toolkit✓

17 Apr, 2026

A ClawHavoc-style supply chain attack poisons 1,184 agent skills and hits 300,000 users; within two weeks, Cisco and Microsoft ship agent security tooling. This post breaks down the threat model, compares the two defense architectures, and walks through real integration code.
Getting Started with Claude Managed Agents: Let Anthropic Run Your Agent Loop✓

15 Apr, 2026

Claude Managed Agents, which entered public beta in April 2026, moves the agent loop, tool execution, and sandboxed runtime entirely into Anthropic's cloud—three API calls are all it takes to get an autonomous agent running. This post walks through the core concepts, demonstrates the full workflow with real code, and compares it against building your own.
Hermes Agent in Practice: Embedding an AI Assistant into Your Development Workflow✓

14 Apr, 2026

Not another feature rundown of Hermes — this is what it's actually like after wiring it into a real development workflow: code review, requirement breakdown, doc generation, scheduled monitoring. Which scenarios genuinely help, and which ones will bite you.
A Deep Dive into Claude Code Hooks: Making the AI Coding Tool Truly Fit Your Workflow✓

13 Apr, 2026

Claude Code Hooks might be the most underrated AI coding feature out there. This post starts with how the three hook types fire, then walks through 10+ real configurations from my blog agent, tooling site, and daily work to show how Hooks can make Claude Code truly part of your workflow.
Hermes Agent Review: OpenClaw's Successor, a Multi-Platform AI Assistant with a Built-In Learning Loop✓

10 Apr, 2026

Hermes Agent is an open-source AI assistant framework from Nous Research, featuring a self-learning loop, cross-platform messaging integration, and cron scheduling, with one-command migration from OpenClaw configs. This post covers its core features, where it fits, and its limitations.
After OpenClaw Shut Down: Rebuilding a Multi-Agent Automation Setup with the Claude Code CLI✓

9 Apr, 2026

When the third-party AI agent framework OpenClaw shut down, I rebuilt the entire multi-agent automation experience with the Claude Code CLI: Telegram bots, scheduled tasks, session persistence — plus every pitfall I hit along the way.
What Qwen3.6-Plus Tells Us: Chinese LLMs Can Now Compete in Specific Domains✓

7 Apr, 2026

Alibaba's Qwen3.6-Plus topped OpenRouter's global daily leaderboard just one day after launch, and Chinese AI token usage has now outpaced the US for five straight weeks. What does this actually mean? This post breaks it down across technology, data, and ecosystem.
Flash-MoE: Running a 397B-Parameter Model on a MacBook at 4.4 token/s✓

31 Mar, 2026

A developer built Flash-MoE in 24 hours: it runs the 397B-parameter Qwen3.5 model on a 48GB MacBook Pro at 4.4 token/s, using only about 6GB of RAM and no cloud GPUs. We break down how it works: SSD streaming, Metal shader optimization, and MoE sparse activation.
Apifox Supply Chain Attack Post-Mortem: Your SSH Keys May Already Be Compromised✓

28 Mar, 2026

In March 2026, the Apifox desktop client was hit by a supply chain attack: a JS file on the official CDN was replaced with a malicious version that stole users' SSH keys, Git credentials, and other sensitive data. A technical breakdown of the attack chain, blast radius, and how to check if you were affected.
GitHub Squad: Drop an AI Dev Team Straight into Your Repo✓

26 Mar, 2026

Squad, an open-source project, lets you spin up an AI dev team inside your repo with two commands—an architect, frontend dev, backend dev, and tester, each with their own job, collaborating on top of Copilot. A look at its architecture, what it's like to use, and the multi-agent collaboration patterns behind it.
Computer-Use: When AI Agents No Longer Need APIs✓

24 Mar, 2026

AI agents are learning to operate computers the way humans do—reading the screen, clicking the mouse, typing on the keyboard. From Anthropic's Claude Computer Use to Microsoft's CUA to OpenAI's Operator, Computer-Use is redefining what "software integration" means.
Google Stitch's Big Update: UI Design in Natural Language — Should Figma Be Worried?✓

20 Mar, 2026

Google Stitch just got a major update, evolving from a simple prompt-to-mockup tool into an AI-native design canvas. Infinite canvas, voice interaction, the DESIGN.md design system format, MCP integration — five big features shipped at once. Free, powered by Gemini, and aimed squarely at Figma Make.
Karpathy's AutoResearch: Letting an AI Agent Run 700 ML Experiments on Its Own✓

19 Mar, 2026

A deep dive into Karpathy's open-source AutoResearch project: how a 630-line Python script lets an AI agent run ML experiments autonomously on a single GPU, completing 700 experiments in two days and finding 20 effective optimizations. From architecture to practical applications, here's why every developer should pay attention to the "Karpathy Loop" pattern.
Reading the MCP 2026 Roadmap: From Local Tools to Production-Grade Agent Infrastructure✓

16 Mar, 2026

MCP (Model Context Protocol) has published its 2026 roadmap with four priority areas: transport evolution, agent communication, governance maturity, and enterprise readiness. A technical breakdown of the concrete problems and proposed solutions in each area.
MiroFish: A Swarm Intelligence Prediction Engine a College Senior Built in 10 Days with Vibe Coding✓

15 Mar, 2026

A deep dive into the architecture and design of MiroFish, a swarm intelligence prediction engine. Built by a college senior in 10 days with Vibe Coding, it topped GitHub Trending and landed a 30M RMB investment from Shanda. Here's what it got right, technically.
When AI Agents Learn to Pay: A Deep Dive into the x402 Protocol and Agent Payment Infrastructure✓

12 Mar, 2026

How Coinbase's x402 protocol revived the HTTP 402 status code to let AI agents pay for API calls with stablecoins. From protocol design to hands-on code, from the Stripe/Mastercard competitive landscape to the reality of $28K daily volume — a full breakdown of the agent payments space.
Browser-Native AI in Practice: The Complete Guide to Chrome Built-in AI APIs✓

11 Mar, 2026

A deep dive into Chrome's built-in Gemini Nano model and the browser-native AI APIs (Prompt, Translation, Summarization) — from technical architecture to production practice, showing how to build privacy-first local AI apps, with complete code examples and performance analysis.
OpenClaw Automation in Practice: Building a 24/7 AI Assistant with Cron + Heartbeat✓

9 Mar, 2026

A deep dive into OpenClaw's Cron Job and Heartbeat mechanisms—from choosing between them to engineering practice—with a real production case study covering error handling, state management, and cost optimization.
The 2026 AI Model Landscape: Hands-On Comparison of 12 Leading Models from China and Abroad✓

6 Mar, 2026

Hands-on benchmarks of 12 AI models (GPT-4o, Claude 3.5, Gemini 2.0, Qwen 2.5, GLM-4, Kimi, and more) across 6 real-world scenarios including code generation, Chinese writing, and reasoning. Includes performance scores, monthly cost comparisons, and a decision tree to help you pick the right model.
Building AI Agents with Long-Term Memory: From Design Patterns to Production✓

2 Mar, 2026

A deep dive into building episodic, semantic, and procedural long-term memory for AI Agents, with a complete technical architecture, code implementation, and production optimization strategies.
AI Agent-Driven Development: The Paradigm Shift from Tools to Workflows✓

1 Mar, 2026

How AI Agents are evolving from "assistive tools" into "collaborative partners" and reshaping the modern developer workflow. Based on real project experience, a deep dive into three core capabilities — context awareness, proactive execution, and tool orchestration — with complete code examples.
The AI Agent Skills Standardization War: Architecture, Security, and Ecosystem Evolution✓

26 Feb, 2026

A deep dive into the technical architecture, security models, and governance mechanisms of MCP, Agent Skills, and Skills.sh — unpacking the design-philosophy clash behind real-world security incidents, with a practical selection guide for enterprises and developers.
AI Agent Multi-Task Collaboration in Practice: From Monolith to Distributed Workflows✓

25 Feb, 2026

How to design and build a multi-task collaboration system for AI Agents, covering task decomposition, state management, and error recovery. A hands-on look at agent collaboration architecture through a real blog-publishing workflow.
Why Do AI Agents Ignore Your "Hard Rules"? A Deep Postmortem of Two Real Incidents✓

23 Feb, 2026

A deep postmortem of two real production incidents, analyzing why AI agents systematically ignore explicit rules and how to design constraint mechanisms that actually hold. Covers the technical root causes, common patterns, and actionable solutions.
AI Agent Memory Systems in Practice: OpenClaw Memory Best Practices✓

23 Feb, 2026

A deep dive into OpenClaw's memory architecture, from file layout to retrieval tuning, with actionable best practices for managing AI Agent memory.
From Command Line to Conversational Programming: Building a Personal Dev Assistant with AI Agents✓

22 Feb, 2026

A deep dive into building an AI dev assistant from scratch with memory, tool calling, and task planning: a hands-on look at AI-native development patterns.
The Complete Guide to AI Agent Skills: Give Your AI Assistant Superpowers✓

21 Feb, 2026

What are Skills? How do you install and use them in Claude Code, Codex, OpenClaw, and other AI tools? Explore skills.sh and awesome-openclaw-skills, tap into 3,000+ community Skills, and turn your AI Agent from a generalist assistant into a domain expert.
A Lightweight Electron Alternative: A Deep Dive into electrobun✓

21 Feb, 2026

12MB vs 150MB, 14KB incremental updates, full-stack TypeScript. How does electrobun redefine desktop app development with Bun + Zig? A complete breakdown of its architecture, performance, and hands-on usage.
AI Agent Frontend Workflow (Part 3): Cost Optimization and Team Collaboration Best Practices✓

15 Feb, 2026

How do you keep AI Agent token costs under control? How do you deal with hallucinations? How do you roll it out across a team? This post shares battle-tested optimization strategies and collaboration practices, backed by real cost data.
AI Agent Frontend Workflow (Part 4): What's Next, and the Open Source Tool Landscape✓

15 Feb, 2026

The series finale. From Copilot to autonomous agents, from closed to open source — this post maps where AI agents are heading, compares the major tools, and explores how the developer's role is changing. Complete learning roadmap included.
Hard-Won Lessons from Multi-Agent Collaboration: How One config.patch Nearly Took Down the Whole System✓

15 Feb, 2026

Real-world lessons from a week of running a multi-agent system: a config management incident, TypeScript import errors, a wrong publish date, cost optimization in practice, and best practices for team collaboration.
AI Agent Frontend Workflow (Part 2): Intelligent Code Review and Automated Testing✓

14 Feb, 2026

Use AI Agents for intelligent code review and automated test case generation. From Git Hook integration to E2E testing, this post shares a complete hands-on setup with real project metrics.
Safely Exposing a Local AI Assistant to the Internet: SSH Reverse Tunnels in Practice✓

13 Feb, 2026

Expose a locally running OpenClaw Gateway to the public internet with an SSH reverse tunnel plus an Nginx reverse proxy, so it's reachable at your own domain. All data stays local, behind multiple layers of security, at zero cost.
AI Agent Frontend Workflow (Part 1): Understanding Agents and Automated Component Generation in Practice✓

13 Feb, 2026

An AI Agent is not just a chatbot — it's an intelligent assistant that can invoke tools and manage context on its own. This post breaks down how Agents actually work, then walks through a hands-on React component generator to show how AI can reshape the frontend development workflow.
OpenClaw Multi-Agent Setup in Practice: Pitfalls and Best Practices✓

13 Feb, 2026

Setting up multiple OpenClaw agents and multiple Telegram accounts comes with plenty of pitfalls. Based on hands-on experience, this post covers every common problem and its fix so you can skip the pain.
AI Agent Dev Tools Compared 2026: Claude Code vs OpenClaw vs Cursor — Which One Should You Pick?✓

12 Feb, 2026

A hands-on comparison of Claude Code, OpenClaw, and Cursor — the three big AI coding tools. From runtime model, memory systems, and model support to skill mechanisms, this guide covers how to choose an AI agent dev tool in 2026, plus a deep dive into their config systems and a cross-tool migration guide.
