PROBLEM

An enterprise can build an agent in a week. Getting it to production takes months.

IMAGINE…

A 5,000-person company is building IT Genie, an AI agent that handles everything IT used to: provisioning software, resetting passwords, answering policy questions. The prototype works in a week. Then production exposes what the prototype hid: permissions, access, dependency changes, observability.

Workato had the automation primitives competitors were bolting on after the fact. Agent Studio was the bet that governability, not model access, was the moat.

SOLUTION

Permission as a primitive, not a settings page

Most agent builder products make governance a separate tool nobody opens. I built three permission primitives directly into the skill builder: user consent, manager approval, and per-user authentication, now the standard pattern across other Workato AI surfaces.

Builders pick the right governance posture, including user confirmation, while writing the prompt for each skill IT Genie needs to call.

When IT Genie provisions an app, it first routes the request to that employee's manager of record for approval automatically, then runs as the requesting employee to take the action on their behalf.
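A minimal sketch of the idea of governance as a skill-level primitive. All names here are hypothetical, not the actual Workato API: the point is that consent, approval, and per-user auth live in the skill definition itself, where the runtime can enforce them uniformly.

```python
# Illustrative sketch only (hypothetical names, not Workato's real schema):
# a skill declares its governance posture alongside its prompt.
from dataclasses import dataclass, field

@dataclass
class Governance:
    user_consent: bool = False      # ask the end user before acting
    manager_approval: bool = False  # route to the manager of record first
    run_as_requester: bool = True   # per-user auth: act with the user's own credentials

@dataclass
class Skill:
    name: str
    prompt: str
    governance: Governance = field(default_factory=Governance)

# IT Genie's provisioning skill: approval routes to the manager first,
# then the action runs under the requesting employee's identity.
provision = Skill(
    name="provision_app_seat",
    prompt="Provision the requested application for the employee.",
    governance=Governance(user_consent=True, manager_approval=True),
)
```

Because governance is part of the skill object rather than a separate settings page, every surface that renders or executes a skill sees the same posture.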

Agent knowledge base access control that inherits from data sources

Agents are only as useful as the docs they've been handed, and every enterprise's SOPs are different. I designed a drag-and-drop knowledge flow so builders can add docs and begin testing immediately.

If the end user doesn't have permission to see a doc in the source, the agent can't either. Permissions inherit from the source app automatically.
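The inheritance rule reduces to one check. This is an illustrative sketch with hypothetical data, not the production ACL model: the agent holds no permissions of its own and simply defers to the source app's access list for the asking user.

```python
# Sketch of permission inheritance (hypothetical data, not the real ACL model):
# the agent never owns an ACL; it defers to the source app's per-doc access list.
source_acl = {
    "vpn-setup.pdf": {"alice", "bob"},
    "exec-comp-policy.pdf": {"carol"},
}

def agent_can_cite(doc: str, user: str) -> bool:
    """The agent may reference a doc only if the asking user can see it in the source app."""
    return user in source_acl.get(doc, set())

# Alice can get VPN help, but the agent can't leak the exec comp policy to her.
assert agent_can_cite("vpn-setup.pdf", "alice")
assert not agent_can_cite("exec-comp-policy.pdf", "alice")
```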

Probabilistic agents inside deterministic workflows

Enterprises don't run on agents. They run on deterministic, conditional, and auditable workflows. I designed the orchestration layer for how genies call each other and how they embed within larger automation workflows.

A network of genies — Onboarding, Scheduling, Meeting Prep, Agenda, etc. — splits the work that one HR coordinator would normally chase across five systems and three days, completing it before the new hire's first morning.

An agent-to-agent call is a step inside the skill editor, same surface as any other skill. This ensures that the handoff carries permissions, context, and a return contract, so a multi-genie flow stays debuggable in one trace.
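One way to picture why the handoff stays debuggable: the call reuses the caller's context instead of opening a new one. A minimal sketch with hypothetical names, assuming a context object that carries the acting user, a trace id, and a running step log.

```python
# Sketch of an agent-to-agent call as an ordinary skill step (hypothetical
# names). The handoff reuses the caller's context, so permissions and the
# trace id travel with it and the whole multi-genie flow lands in one trace.
from dataclasses import dataclass, field
import uuid

@dataclass
class CallContext:
    user: str                                              # the employee everything runs as
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: list = field(default_factory=list)              # one entry per step, any genie

def call_genie(ctx: CallContext, genie: str, request: dict) -> dict:
    """Invoke another genie; it runs under ctx.user and logs into the same trace."""
    ctx.steps.append((genie, request))
    # ...the callee executes its own skills here, appending to ctx.steps...
    return {"genie": genie, "status": "done", "trace_id": ctx.trace_id}

ctx = CallContext(user="new.hire@example.com")
result = call_genie(ctx, "Scheduling", {"task": "book orientation"})
```

The return value is the "return contract": the caller always gets a structured result tied to the shared trace id rather than free-form text.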

BUT WAIT... THERE'S MORE

Building a genie is half the work. Running it is the other half.

Most agent builders stop here. The control plane — evals, versioning, observability — is where Agent Studio earned its place in the suite.

Versioning, defined for a system that's never been versioned before

An agent isn't one thing you can diff. It's a graph of skills, knowledge, prompts, and connector configs, each versioning on its own clock. I co-authored the dependency and versioning specs from scratch with my PM and engineering lead, so builders get clear, intuitive visibility into both their own active changes and dependency changes happening under the hood.

When IT updates the provisioning policy and IT Genie starts approving seats it shouldn't, the version log shows what changed, who changed it, and what depends on it, allowing builders to roll back to a specific configuration or A/B test.
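The mechanics behind "what changed, who changed it, and what depends on it" can be sketched as an append-only log plus a dependency walk. Hypothetical model and names, assuming a simple parent-to-children dependency map.

```python
# Sketch of a dependency-aware version log (hypothetical model): each
# component versions on its own clock, and the dependency map lets the
# log answer which agents a low-level change actually touches.
versions = []  # append-only change log

deps = {  # parent -> components it depends on
    "IT Genie": ["provisioning_skill", "policy_kb"],
    "provisioning_skill": ["okta_connector"],
}

def record_change(component: str, version: int, author: str) -> None:
    versions.append({"component": component, "version": version, "author": author})

def impacted_by(component: str) -> set:
    """Everything that depends, directly or transitively, on a component."""
    hit = set()
    for parent, children in deps.items():
        if component in children:
            hit.add(parent)
            hit |= impacted_by(parent)
    return hit

record_change("policy_kb", 2, "it-admin")
# impacted_by("policy_kb") surfaces IT Genie as the agent a KB edit touches;
# rolling back is replaying the log to an earlier recorded configuration.
```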

Eval sets you can grade by hand, or hand off to an agent

Builders can define their own eval sets to test whether an agent is production-ready for high-stakes flows.
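The shape of an eval set, sketched with hypothetical names: each case pairs an input with a grading slot, and that slot is what can be filled by a human verdict, a cheap automatic check, or an agent judge.

```python
# Illustrative sketch (hypothetical, not the product's eval API): an eval
# set is a list of cases, each with a pluggable grader.
def keyword_grader(expected: str):
    """A cheap automatic grader; this slot could equally be a human or an agent judge."""
    return lambda response: expected.lower() in response.lower()

eval_set = [
    {"input": "Reset my VPN password", "grade": keyword_grader("reset link")},
    {"input": "Provision Figma",       "grade": keyword_grader("manager approval")},
]

def run_evals(agent, cases) -> float:
    """Run every case through the agent and return the pass rate."""
    results = [case["grade"](agent(case["input"])) for case in cases]
    return sum(results) / len(results)

# A stand-in agent for illustration only.
fake_agent = lambda q: ("A reset link has been sent." if "VPN" in q
                        else "Routing for manager approval.")
```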

A test chat for before launch, conversation logs for after

Before launch, builders test genies in a chat panel beside the builder. They can iterate against the agent in a sandbox without waiting on IT to provision a Slack workspace or a Teams tenant just to find out a prompt is broken.

After launch, conversation logs (RBAC-gated) capture every turn. Each agent response links to the skills it called, the prompts it used, the knowledge it referenced, and the version of the agent that produced all of it. When something goes wrong in production, the trace from symptom to cause is already there.
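What makes the symptom-to-cause trace possible is the shape of each logged turn. A sketch of that shape with a hypothetical schema: every response carries references back to the skills, knowledge, and agent version that produced it.

```python
# Sketch of a conversation-log turn (hypothetical schema): each agent
# response links back to everything that produced it, so debugging walks
# from the bad answer to its causes instead of starting from scratch.
turn = {
    "conversation_id": "c-1042",
    "response": "Your Figma seat is pending manager approval.",
    "skills_called": ["provision_app_seat"],
    "knowledge_refs": ["software-request-policy.pdf"],
    "agent_version": "IT Genie v14",
}

def trace(log: list, conversation_id: str) -> list:
    """Pull every turn of a conversation, causes attached, for a postmortem."""
    return [t for t in log if t["conversation_id"] == conversation_id]

log = [turn]
# trace(log, "c-1042") returns the full turn, version and all, ready to diff
# against the version log when a production answer goes wrong.
```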

REFLECTION

Working as the first designer on a 0→1 product meant every primitive I shipped became load-bearing for whoever came after: other designers extending the surface, engineers building on the manifest, PMs scoping the next quarter.


What it taught me:

Thinking in systems. Everything touches everything. A change to how permissions are defined in one skill reshapes how every other skill has to handle consent, approval, and audit.


Prototype fast, polish later. The old sequence of design, build, test is backwards now that AI-assisted tools let us prototype directly. Break things early, then invest polish in what survived.


Designing on shifting ground. AI changes daily. The question stopped being "how do we design systems that scale" and became "how do we design systems that absorb change." That's an infrastructure problem as much as a design one, and it's the most interesting work I've done.