From Tools to System: How to Build AI Skills for Your Workflow

How to turn your expertise into reusable knowledge modules that your AI agent loads automatically. A system that works with any stack.

Published on March 14, 2026

Reading time: 13 min

In January I showed you the AI tools I use for design and development. In February I explained why they lose the thread after 30 minutes — tokens, context windows, compaction — and how to fix it with orchestration and persistent memory.

But there was one problem I didn't address: AI starts every session from scratch.

It doesn't matter if you use Claude Code, Cursor, Copilot, or any other agent. It doesn't matter if you work with React, Python, Rails, Go, or WordPress. Every time you open a new conversation, your agent doesn't know how you structure your code, what patterns you prefer, or what conventions your team has been using for months. It has to rediscover all of it. And you have to re-explain it.

What if your AI could work like a team — with specialists who know your design principles, master your stack, understand your writing voice, and enforce your quality standards? Not as different people, but as specialized knowledge that activates automatically when you need it.

That's exactly what I built. They're called skills. And what I'm going to explain isn't what skills I have — it's how to build your own, regardless of your stack or language.

The problem: every session starts from zero

Imagine your team showing up to the office every morning with no memory of the day before. No architecture decisions, no project standards, no lessons learned. That's what happens every time you start a new conversation with your AI agent.

The result is predictable:

  • You repeat instructions — "Use strict TypeScript", "Follow the project's naming convention", "Don't use that dependency, we have our own abstraction"
  • Inconsistent quality — one day it generates flawless code, the next day it ignores patterns you'd already established
  • Lost context — decisions you made in previous sessions don't exist for the agent
  • Generic feedback — you ask for a code review and get "looks good" instead of analysis against your concrete standards

This problem isn't specific to any one agent. It's structural. Language models don't have memory between sessions. And while some tools save history, that's not the same as organized, activatable knowledge.

The obvious solution — copy-pasting a block of instructions at the start of every session — works but doesn't scale. When you have 5 principles it's manageable. When you have 50, spread across design, development, content, and processes, it becomes unsustainable. You need something that kicks in on its own, at the right moment, without you doing anything.

What is a skill (and why it's not a prompt)

A skill is a Markdown file — SKILL.md — that contains specialized instructions, principles, processes, and examples. It lives in your project or your global configuration, and your agent loads it automatically when it detects that your task matches the skill's triggers.

The difference from a prompt is fundamental:

  • A prompt is an instruction you write every time: "Review this code following SOLID principles, with unit tests and documentation."
  • A skill activates on its own. When you say "review this module", the agent detects it's a code review task and loads the corresponding skill. You don't have to remember what to ask for.

The basic structure is simple:

---
name: code-review
description: Activates when the user asks to review code,
  do a PR review, or audit quality. Triggers: "review",
  "audit", "code quality", "check this code".
---

## Principles
- Readability over cleverness
- Each function does one thing
- Tests cover happy paths and edge cases

## Process
1. Read the full code before commenting
2. Identify architecture issues (the most expensive ones)
3. Review naming, types, and contracts
4. Verify error handling
5. Suggest tests if none exist

## Examples
- Good: 15-line function with descriptive name and explicit types
- Bad: 200-line function called `handleStuff` with `any` in the parameters

The description is the key. The agent reads it and decides whether to activate the skill based on your task. A good description with precise triggers means the right knowledge gets applied at the right moment, without you intervening.
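To make the idea concrete: activation can be thought of as matching the task against the triggers in the description. Real agents judge relevance with the model itself, but a minimal sketch of the mechanic — with function names and matching logic invented for illustration — looks like this:

```python
# Illustrative sketch only: how an agent might decide whether a skill's
# triggers match the user's task. Plain keyword matching shown for clarity;
# this is not any agent's actual implementation.

def matches(task: str, triggers: list[str]) -> bool:
    """Return True if any trigger phrase appears in the task."""
    task_lower = task.lower()
    return any(t.lower() in task_lower for t in triggers)

# Triggers from the code-review skill above
code_review_triggers = ["review", "audit", "code quality", "check this code"]

print(matches("Can you review this module?", code_review_triggers))    # True
print(matches("Write a landing page headline", code_review_triggers))  # False
```

The point of the sketch: the richer and more precise the trigger list, the more reliably the right knowledge loads without you asking for it.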

What's interesting about this system is that any developer can build it. It doesn't matter if you work with React, Django, Laravel, Spring Boot, or Bash scripts. If you have principles you apply repeatedly, you can turn them into a skill.

Types of skills you can build

After months of building and refining skills, I've identified five categories that cover most needs. I'm not sharing them so you can copy mine — but so you can decide which categories you need.

How you work: debugging, testing, code reviews. Methodology turned into instructions.

  1. Debugging: reproduce → isolate → hypothesize → validate
  2. Code review: architecture, types, edge cases
  3. Testing: TDD with happy paths and edge cases
  4. Refactoring: detect code smells, apply patterns

Example categories — your skills will reflect your own stack and workflows

Process skills

These define how you work. Debugging, TDD, code review, feature planning, refactoring. They don't depend on any specific framework — they're universal.

A debugging skill, for example, can codify your mental process: "First reproduce the bug consistently. Then isolate the variable. Then form a hypothesis and design an experiment to validate it." Sounds basic, but the difference between asking the agent for this every time and having it happen automatically is enormous.

Another universal example: a feature planning skill. Before writing any code, the agent breaks the feature into tasks with dependencies, identifies risks, defines acceptance criteria, and proposes an implementation order. Not because it's smart — but because you gave it the exact instructions you'd follow yourself.

Domain skills

These are specific to your stack and your project. Your framework, your design system, your architecture patterns, your team's conventions.

I have skills for my particular stack. You'd have yours: Django patterns if you work with Python, Spring conventions if you use Java, your team's style guide if you work at a company. The point isn't which framework — it's that your standards stop being tacit knowledge and become executable instructions.

Content skills

Good writing isn't some mysterious talent. It's process. A copywriting skill can codify principles like "clarity over creativity" and "benefits over features." An editing skill can apply sequential passes — first clarity, then voice, then relevance, then evidence.

The most valuable part: every time you write with the skill active and correct the output, you're refining the process. Skills improve with use because you update them whenever they encounter a case they don't cover.

Growth skills

Technical SEO, AI search optimization, conversion psychology, form optimization. These are skills that apply proven analysis frameworks — not vague opinions, but structured checklists with measurable criteria.

A technical SEO skill, for example, can review a page across five layers ordered by priority: crawlability, technical foundations, on-page optimization, content quality, and authority. Each layer has specific criteria and generates actionable recommendations. What used to require a 4-hour manual audit, the agent executes in minutes — but with your criteria, not some generic internet checklist.
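The layered structure translates naturally into ordered checklists the agent walks through in priority order. A hypothetical sketch — the layer names come from above, but the criteria and function are invented for illustration:

```python
# Illustrative: run audit layers in priority order and collect findings.
# Layer names match the article; the individual checks are placeholders.

AUDIT_LAYERS = [
    ("crawlability", ["robots.txt allows key pages", "no orphan pages"]),
    ("technical foundations", ["HTTPS everywhere", "Core Web Vitals pass"]),
    ("on-page optimization", ["one H1 per page", "descriptive title tags"]),
    ("content quality", ["answers search intent", "no thin pages"]),
    ("authority", ["internal links to key pages", "relevant backlinks"]),
]

def audit(page_passes: dict[str, bool]) -> list[str]:
    """Return actionable recommendations, highest-priority layer first."""
    recommendations = []
    for layer, criteria in AUDIT_LAYERS:
        for criterion in criteria:
            if not page_passes.get(criterion, False):
                recommendations.append(f"[{layer}] fix: {criterion}")
    return recommendations

# A page that only passes two checks: everything else becomes a recommendation,
# with crawlability issues listed first.
results = audit({"HTTPS everywhere": True, "one H1 per page": True})
print(len(results))  # 8
```

Because the layers are ordered, the output is already prioritized — the agent never buries a crawlability problem under typography nitpicks.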

Meta skills

The most interesting layer: skills that create other skills, that find relevant skills for a task, or that orchestrate complex flows by combining multiple skills. This is where the system becomes recursive and truly starts to scale.

A skill-creator, for example, guides you through creating new skills: it captures the intent, structures the SKILL.md with precise triggers, and defines an evaluation framework to measure whether the skill improves results. Another skill can work as a finder — you say "I need to optimize a registration form" and it suggests CRO, form UX, and onboarding skills you might not have associated with that task.
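A finder like that can be little more than a search over skill descriptions and triggers. A minimal sketch — the catalog entries here are invented examples, not my actual skills:

```python
# Illustrative skill finder: score each skill by how many of its trigger
# keywords appear in the task, and return the best matches first.

SKILLS = {
    "cro": ["conversion", "optimize", "landing", "form"],
    "form-ux": ["form", "fields", "registration", "signup"],
    "onboarding": ["registration", "signup", "activation", "first run"],
    "pr-review": ["review", "pull request", "diff"],
}

def find_skills(task: str) -> list[str]:
    task_lower = task.lower()
    scores = {
        name: sum(kw in task_lower for kw in keywords)
        for name, keywords in SKILLS.items()
    }
    # Sort by score (descending), drop skills with no match at all
    return [n for n, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

print(find_skills("I need to optimize a registration form"))
# ['cro', 'form-ux', 'onboarding']
```

Notice that "pr-review" doesn't appear — the finder surfaces only the skills relevant to the task, including ones like onboarding that you might not have thought to load yourself.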

Before and after: how work changes

The theory sounds good, but what's the difference in practice? The short answer: the difference between generic feedback and actionable analysis.

  • Prompt: "Review this landing. I want you to check spacing, WCAG AA color contrast, typographic hierarchy and CTA proximity. Give me specific values, not generic opinions. Use multiples of 8 for spacing..."
  • Time: ~5 min
  • Consistency: Variable
  • Effort: 85%

Without skills, you ask the agent to review something and get correct but shallow responses. With skills, the agent applies your standards, detects violations against your principles, and generates recommendations with specific values you can apply on the spot.

This isn't magic. It's that the agent has access to the same criteria you'd use if you reviewed it manually — but it applies them exhaustively and consistently, without getting tired or skipping steps.

Think of it this way: without a code review skill, the agent applies "general best practices." With a skill, it applies your checklist: Does the PR do one thing? Do the names communicate intent? Are there tests for edge cases? Is error handling explicit? The difference is between "looks good" and a structured analysis with severity levels and questions for the author.

Building your own skill system

You don't need 60 skills to start. You need one. The one that solves your most recurring problem.

Step 1: Identify the repetition

What do you explain over and over to your AI agent? What do you correct after every output? What principles do you apply manually that the agent should already know?

Some common patterns:

  • "I always have to remind it to use strict TypeScript"
  • "Every time it generates CSS it ignores my spacing system"
  • "Code reviews are too shallow"
  • "The copy it generates sounds like a corporate robot"

Each of those is a skill waiting to be created.

Step 2: Write your first SKILL.md

Start simple. A useful skill doesn't need 500 lines — it needs 3-5 clear principles, a step-by-step process, and a couple of good vs bad examples.

---
name: my-first-skill
description: Describe when it activates and what it does.
  Include keywords the agent can detect.
---

## Principles
The 3-5 fundamental principles you always apply.

## Process
The steps you follow, in order.

## Examples
Good vs bad. Concrete, not abstract.

Step 3: Test and observe

Use the skill on real tasks. There are three things to watch for:

  1. Does it activate when it should? If not, improve the description with more triggers
  2. Does the output improve? If not, your principles need to be more specific
  3. Does it apply the examples? If not, add more examples with the exact format you expect

Step 4: Iterate relentlessly

The best skills are the ones that have gone through dozens of iterations. Every time you find a case your skill doesn't cover, you add it. Every time the agent misinterprets an instruction, you rephrase it.

A mature skill might have sub-documents, expanded checklists, and hundreds of lines. But it started as a 20-line file that solved a single problem.

A complete example

Imagine you work on a team where PR reviews are inconsistent. Some days the review is thorough, other days it's "LGTM." You could create this:

---
name: pr-review
description: Activates when asked to review a PR, pull request,
  merge request, or code diff. Triggers: "review PR",
  "check this PR", "code review", "check this diff".
---

## Philosophy
Every PR review is an opportunity to teach and learn.
It's not about finding errors — it's about improving
the code AND the author.

## Required checklist
1. Does the PR do ONE thing? If not, suggest splitting it
2. Do names communicate intent? (not `data`, `temp`, `x`)
3. Are there tests for new paths? And for edge cases?
4. Is error handling explicit or silently swallowed?
5. Are there API changes that need documentation?

## Feedback format
- CRITICAL: Potential bugs or security issues
- IMPORTANT: Architecture or team pattern violations
- SUGGESTION: Optional readability or performance improvements
- QUESTION: To understand the author's decisions

## Anti-patterns
- Do NOT approve just because "it works"
- Do NOT block over style preferences
- Do NOT rewrite the entire PR in the comments

That skill transforms every PR review from an ad-hoc exercise into a consistent process. And because it's documented, it scales — anyone new to the team gets the same quality review from day one.
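The feedback format in that skill also maps naturally onto structured output you can sort and filter. A hypothetical sketch of what a review might look like as data — the findings themselves are made up:

```python
# Illustrative: review findings tagged with the skill's severity levels,
# sorted so CRITICAL items surface first. The findings are invented.

SEVERITY_ORDER = {"CRITICAL": 0, "IMPORTANT": 1, "SUGGESTION": 2, "QUESTION": 3}

findings = [
    ("SUGGESTION", "extract the retry loop into a helper"),
    ("CRITICAL", "user input reaches the SQL query unescaped"),
    ("QUESTION", "why cache here instead of at the gateway?"),
    ("IMPORTANT", "new endpoint bypasses the team's error middleware"),
]

# Python's sort is stable, so findings within a level keep their order
findings.sort(key=lambda f: SEVERITY_ORDER[f[0]])

for severity, note in findings:
    print(f"{severity}: {note}")
```

Structured severity is what separates "looks good" from a review the author can act on in priority order.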

Composition: when skills work together

A single skill is useful. Multiple skills that compose are a system.

The idea is simple: real tasks aren't atomic. Writing an article involves content strategy, writing, editing, and SEO. Developing a feature involves design, implementation, review, and deployment. Each phase can activate a different skill.

If you read the previous article on tokens and orchestration, you already know the orchestrator + sub-agents pattern. Skills add a fourth layer:

  • Orchestration divides work into phases with clean context
  • Persistent memory maintains decisions across sessions
  • Output compression reduces noise in the context window
  • Skills give specialized knowledge to each phase

When an orchestrator launches a sub-agent to design, that agent loads the design skills. When it launches one to implement, it loads the development ones. When it launches verification, it loads testing and SEO skills. Each agent works with the knowledge relevant to its phase — not with a generic dump of "best practices."
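In code terms, that mapping is just phase → skill list: the orchestrator resolves which skills each sub-agent loads before it starts. A hypothetical sketch — the phase and skill names are illustrative, not my actual configuration:

```python
# Illustrative orchestration: each phase of a feature gets its own set of
# skills, so no sub-agent carries knowledge it doesn't need.

PHASE_SKILLS = {
    "design": ["design-system", "accessibility"],
    "implement": ["code-style", "project-conventions"],
    "verify": ["testing", "technical-seo"],
}

def skills_for(phase: str) -> list[str]:
    """Return the skills a sub-agent for this phase would load."""
    return PHASE_SKILLS.get(phase, [])

for phase in ["design", "implement", "verify"]:
    print(phase, "->", skills_for(phase))
```

The design choice worth noting: each sub-agent gets a small, relevant slice of knowledge instead of the whole catalog, which keeps its context window clean.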

The result: a system where each piece amplifies the others. Orchestration provides structure. Memory provides continuity. Skills provide depth.

A concrete example: writing this article. The strategy phase activates a content skill that defines the type of piece, the audience, and SEO positioning. The writing phase activates a copywriting skill that applies principles like "one idea per section" and "specificity over vagueness." The editing phase activates a skill that runs seven sequential passes — clarity, voice, relevance, evidence, specificity, emotion, and risk. Each phase works with exactly the knowledge it needs, without noise.

This is what it means to go from "using tools" to "having a system."

Organizing your catalog

When you have more than 10-15 skills, you need organization. Not because it's bureaucracy — but because if you can't find a skill when you need it, it might as well not exist.

Two approaches that work:

By category: group skills into thematic folders (process, domain, content, growth). It's the most intuitive and works well for small teams.

With a registry: a central file that lists all skills with their name, description, and triggers. This way you can search by task ("I need to optimize a form") and find skills you might not have associated with that task.

The key is that the organizational system scales with you. Start simple and add structure when you need it, not before.

It's not about replacing, it's about amplifying

After months of building this system, what I've learned is simple: AI doesn't replace your judgment. It amplifies your ability to apply it.

Without skills, your agent is a brilliant generalist that starts from zero every session. With skills, it's a team of specialists that knows your standards, applies your principles, and lets you focus on the decisions that truly matter.

And the best part: the system is yours. You don't depend on someone publishing a plugin or an extension. Your skills codify your experience, in your format, for your problems. They're portable, editable, and improve with every iteration.

A freelancer with the right system can produce with the consistency and depth of a team. A team with shared skills can onboard new members in hours instead of weeks — because the standards aren't in anyone's head, they're in files that anyone can read and the agent applies automatically.

Not because AI does the work for you — but because you codify your experience in a format that scales.

The first article was about what tools to use. The second, about why they fail and how to fix it. This third one closes the arc: tools are the beginning. The system is what matters.

Start with one skill. The one it hurts most not to have.