Step-by-Step: Deploy the Architecture Review Agent Using AZD AI CLI
Microsoft Tech Community | March 24, 2026
Motive / Why I Wrote This
Most agent tutorials end at the point where the agent works locally. That is the wrong place to stop. The gap between “it works on my machine” and “it is running in production” is where most agent projects stall — not because the AI is wrong, but because the infrastructure work is genuinely painful.
I wrote this article because after shipping the Architecture Review Agent, I wanted to document what it actually looks like to take a Foundry-hosted agent from zero to a live cloud deployment with Teams integration — and how the azd ai extension makes that a fundamentally different experience than the traditional path.
The real motivation was not to write a feature walkthrough. It was to show that the infrastructure tax on agent development is not a necessary cost anymore — and that teams can spend that recovered time iterating on the agent logic that actually matters.
Problem Context: Why Existing Agent Deployment Workflows Break
The typical agent development loop is not a single problem. It is a sequence of compounding friction points that each consume disproportionate time relative to their engineering value.
Across real projects, agent deployment almost never looks clean. It usually involves:
- Writing the agent code and testing it by copy-pasting inputs into a local REPL
- Manually building a Docker container and pushing it to a registry
- Configuring RBAC and managed identities by hand
- Writing Bicep or ARM templates that duplicate the intent already expressed in the agent code
- Starting the whole cycle over again after every meaningful logic change
That sequence creates a practical problem. Before teams can actually evaluate whether the agent is useful, they first have to survive the deployment pipeline. That step is slow, inconsistent, and highly dependent on whoever happens to know the infrastructure setup best.
The underlying problem is not lack of DevOps knowledge. It is that the deployment tooling for agents was designed for generic containerised services, not for the iterative, logic-heavy development cycle that AI agents require. The result: an agent that is 100 lines of clean Python surrounded by 400 lines of Bicep and a 12-step deployment guide.
What I Built
The azd ai extension for the Azure Developer CLI collapses the entire build-deploy-iterate cycle for Foundry-hosted agents into a handful of commands. This article walks through applying it to the Architecture Review Agent — an open-source tool that converts messy architectural notes into structured risk assessments and interactive Excalidraw diagrams.
In practical terms, the workflow does three things:
- Spins up a local Foundry-compatible server so agent invocations during development are identical to production
- Deploys the full cloud infrastructure — ACR, Foundry AI Services, App Insights, RBAC — from a single azd up
- Publishes the live agent to Microsoft Teams with individual scope, without requiring tenant admin approval
This is directly relevant to teams building enterprise AI tooling where the ability to iterate quickly on agent logic — without re-solving infrastructure on each cycle — directly affects how fast the agent improves.
What you actually get back
One of the reasons the AZD AI approach resonates is that the outputs at each stage are immediately useful. When each step completes, you receive:
- Local stage: A localhost:8088 server running the same OpenAI Responses API protocol as the deployed agent — meaning the invocation you write now works identically against the cloud endpoint later
- After azd up: A live Agent Playground URL, a production-ready API endpoint, auto-scaling (0–5 replicas), and Managed Identity auth — all provisioned without manual configuration
- After Teams publish: A shareable agent link you can send directly to teammates for a workshop or demo, live within minutes
That combination is what makes the AZD AI approach feel practical. It does not stop at “here is a container”. It produces a deployed, integrated, shareable agent.
Visual overview
flowchart TD
A[Local Python Agent Code] --> B[azd ai agent run\nLocal Foundry server · localhost:8088]
B --> C[azd ai agent invoke --local\nTest: YAML · arrow notation · markdown]
C --> D{Results look right?}
D -- Iterate --> A
D -- Ready to ship --> E[azd up]
E --> F[Infrastructure Provisioned\nACR · Foundry AI Services · RBAC]
F --> G[Container Built & Pushed to ACR]
G --> H[Hosted Agent Live\nPlayground URL + API endpoint]
H --> I[Publish to Teams\nIndividual scope · no admin approval needed]
The diagram shows why the local-to-cloud gap is smaller here than in traditional deployments: the same protocol and conversation persistence used at localhost:8088 is what the cloud endpoint exposes. Nothing changes between the left side and the right side of the azd up boundary except where the compute runs.
The End-to-End Process, Properly Explained
One of the reasons I wanted this documented well is that the workflow is more interesting than “run a few commands” suggests. Each stage is designed to eliminate a specific category of friction.
1. Install the extension and set up the environment
# Install AZD
winget install microsoft.azd
# Install the AI Agents extension
azd extension install azure.ai.agents
git clone https://github.com/Azure-Samples/agent-architecture-review-sample
cd agent-architecture-review-sample
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
azd auth login
azd env new arch-review-dev
azd env set AZURE_AI_PROJECT_ENDPOINT "https://<your-resource>.services.ai.azure.com/api/projects/<your-project>"
azd env set AZURE_AI_MODEL_DEPLOYMENT_NAME "gpt-4.1"
This matters because the environment configuration here is the same configuration azd up will use. You are not setting up a local mock — you are pointing at your real Foundry project from the start.
2. Run the agent locally
azd ai agent run
This starts a local server on localhost:8088 that implements the OpenAI Responses API. The important thing here is that this is not a stub or a simulator. It is the same runtime that your deployed agent will use, with the same conversation persistence model. That means any invocation pattern that works here will work against the cloud endpoint without modification.
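To make the protocol claim concrete, here is a minimal sketch of calling that server over plain HTTP. It assumes the local server accepts the standard OpenAI Responses API request body (a `model` plus an `input` field) served under a `/responses` route; the exact route and the deployment name `gpt-4.1` are assumptions to verify against your setup, not details confirmed by the sample repo.

```python
import json
import urllib.request

LOCAL_BASE_URL = "http://localhost:8088"  # started by `azd ai agent run`

def build_request(model: str, user_input: str) -> dict:
    # Minimal Responses API body. Assumption: the local server accepts the
    # standard OpenAI Responses API fields; real requests may carry more.
    return {"model": model, "input": user_input}

def invoke(base_url: str, body: dict) -> dict:
    # The same call targets the cloud endpoint after `azd up`; only
    # base_url (and, in the cloud, an auth header) changes.
    req = urllib.request.Request(
        f"{base_url}/responses",  # route is an assumption; check your server
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_request("gpt-4.1", "LB -> 3 API servers -> PostgreSQL primary")
print(json.dumps(body))
# invoke(LOCAL_BASE_URL, body)  # run with the local server up
```

The point of the sketch is the boundary: the request body never changes between local and cloud, only the base URL and credentials do.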
3. Invoke the agent — three supported input formats
Open a second terminal. The agent accepts three input patterns, each designed for a different kind of architecture context engineers actually have:
YAML — formal service specifications:
azd ai agent invoke --local "scenarios/ecommerce.yaml"
The rule-based parser processes this without an LLM call — the structure is deterministic. It returns a 12-component architecture diagram from a well-formed YAML spec instantly.
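For orientation, a deterministic service spec in this style might look like the sketch below. This is a hypothetical shape, not the actual schema of scenarios/ecommerce.yaml; check the repo for the real format.

```yaml
# Hypothetical service spec - the real schema in scenarios/ecommerce.yaml may differ
services:
  - name: api
    type: service
    replicas: 3
    depends_on: [postgres, redis]
  - name: postgres
    type: database
    replicas: 2          # primary plus read replica
  - name: redis
    type: cache
```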
Arrow notation — whiteboard shorthand:
azd ai agent invoke --local "LB -> 3 API servers -> PostgreSQL primary with read replica -> Redis cache"
This is the format engineers actually use on whiteboards and in Slack. The parser extracts replica count, infers component types (LB becomes a Gateway), builds a valid connection graph, and surfaces single points of failure without any additional input.
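The parsing idea can be sketched in a few lines. This is an illustrative reimplementation, not the sample's parser; the inference rules (for example, treating "LB" as a gateway, or flagging single replicas as single points of failure) are assumptions based on the behaviour described above.

```python
import re

# Illustrative reimplementation of the arrow-notation idea; the sample's
# actual parser and its inference rules are richer than this sketch.
TYPE_HINTS = {"lb": "gateway", "postgres": "database", "redis": "cache"}

def parse_arrow_notation(text: str) -> list[dict]:
    """Split 'A -> B -> C' into ordered components with inferred metadata."""
    components = []
    for segment in (s.strip() for s in text.split("->")):
        # A leading integer is read as a replica count: "3 API servers" -> 3
        m = re.match(r"(\d+)\s+(.+)", segment)
        replicas = int(m.group(1)) if m else 1
        name = m.group(2) if m else segment
        kind = next(
            (t for hint, t in TYPE_HINTS.items() if hint in name.lower()),
            "service",
        )
        components.append({
            "name": name,
            "type": kind,
            "replicas": replicas,
            # Naive heuristic: one replica means a single point of failure.
            "spof": replicas == 1,
        })
    return components

for comp in parse_arrow_notation(
    "LB -> 3 API servers -> PostgreSQL primary with read replica -> Redis cache"
):
    print(comp)
```

Consecutive components implicitly form the connection graph, which is why a one-line string is enough input to reason about.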
Markdown / prose — existing design docs:
azd ai agent invoke --local "scenarios/event_driven.md"
Point it at any existing README or design document. It returns an 8-component event-driven streaming architecture from unstructured text.
For all three patterns the agent returns a structured Markdown risk report in the terminal and writes both an interactive architecture.excalidraw file and a high-res PNG to the local /output folder.
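The .excalidraw output stays editable because it is just JSON. As a rough illustration, here is a minimal writer for the document shape; real Excalidraw files carry many more per-element fields (seeds, versions, stroke styles), so treat this as a sketch of the structure, not the full schema or the sample's actual generator.

```python
import json

def write_minimal_excalidraw(components: list[str], path: str) -> dict:
    """Lay out one rectangle per component in a row and write an .excalidraw file.

    Sketch only: real Excalidraw elements carry many more fields than shown here.
    """
    elements = []
    for i, name in enumerate(components):
        elements.append({
            "id": f"comp-{i}",
            "type": "rectangle",
            "x": i * 220, "y": 100,
            "width": 180, "height": 80,
        })
    doc = {
        "type": "excalidraw",
        "version": 2,
        "source": "architecture-review-sketch",
        "elements": elements,
        "appState": {},
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=2)
    return doc

doc = write_minimal_excalidraw(["Gateway", "API", "PostgreSQL"], "architecture.excalidraw")
print(doc["type"], len(doc["elements"]))
```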
4. Deploy with a single command
azd up
azd up orchestrates three things in sequence — and does not require you to understand the order or manage the dependencies between them:
- Provisions infrastructure — Foundry AI Services account, Azure Container Registry, Application Insights, and managed identities with correct RBAC. The azure.yaml in the repo declares all of this in about 30 lines.
- Builds and pushes the container — Packages the Dockerfile and pushes the image to ACR using the managed identity, not a stored credential.
- Deploys the agent — Registers the container image and creates a hosted agent version in Foundry Agent Service with auto-scaling configured.
The output is a live Agent Playground URL and a production-ready API endpoint. The agent scales from 0 to 5 replicas automatically and authenticates via Managed Identity — no secrets stored anywhere.
5. Publish to Microsoft Teams
In the Foundry portal, navigate to your deployed agent and select Publish to Teams and Microsoft 365 Copilot. Fill in the name and description, then select Individual scope.
The individual scope is significant: no M365 tenant admin approval is required. The portal provisions the Azure Bot Service, packages the app manifest, and registers the app automatically. Within two minutes the agent appears in the Teams Copilot agent store. You can generate a share link and send it directly to teammates — useful for demos, workshops, or early feedback rounds without any IT overhead.
Why You Should Add It to Your Workflow
This is useful not because it uses AZD but because it removes the parts of agent development that consume the most time for the least engineering return.
1. The local-to-cloud gap is eliminated, not bridged
Traditional agent development requires maintaining two separate runtime configurations — one for local testing (mocked or simplified) and one for cloud. Every time something behaves differently between them, you debug the gap rather than the agent. With azd ai agent run, the local server is the cloud runtime. There is no gap to debug.
2. Infrastructure becomes a byproduct, not a prerequisite
The azure.yaml declaration is ~30 lines. azd up translates that into a fully provisioned, correctly secured Foundry environment. Teams that previously spent a day on deployment scripting before they could test their first cloud invocation can now start iterating on agent logic instead.
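To give a sense of scale, an azd service declaration has roughly the shape below. This is an illustrative sketch, not the sample's file: `host: containerapp` is a standard azd host type, and the hosted-agent sample may use a different, agent-specific host declaration, so refer to the repo for the real azure.yaml.

```yaml
# Illustrative shape only - see the sample repo for the real azure.yaml.
name: arch-review-agent
services:
  agent:
    project: .
    language: python
    host: containerapp   # the hosted-agent sample may declare a different host
```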
3. Teams integration without the usual blockers
Publishing an agent to Teams typically involves bot registration, Azure App registrations, manifest packaging, and — in most organisations — tenant admin approval. The individual scope option removes that final blocker. For anyone running workshops, building internal tooling, or running a proof of concept, this changes the distribution model from “file an IT ticket” to “share a link.”
A quick note on what this does not replace
AZD AI handles infrastructure and deployment. It does not evaluate agent output quality, manage prompt versioning, or provide observability into what the agent is doing at inference time. Those remain engineering concerns that live in your agent code and your Foundry configuration. The tool eliminates the deployment tax — it does not substitute for the engineering work.
Why This Matters in Real Teams
Agent deployment usually breaks at the handoff between “works locally” and “works for the team.” That handoff involves credential management, container builds, cloud configuration, and — often — a dependency on someone who knows the infrastructure setup. Each of those is a blocker that has nothing to do with whether the agent logic is good.
The AZD AI workflow helps by making the outputs at each stage immediately shareable:
- The local server runs on a standard API protocol, so other team members can invoke it the same way you do
- azd up produces a playground URL the team can use before the integration is wired in
- The Teams publish produces a shareable link that works without any additional setup on the recipient's side
That combination lowers the cost of agent iteration while increasing the likelihood that teams catch quality issues earlier — when they are cheap to fix — rather than after the infrastructure is locked.
What Changed for Me as a Builder
One of the most useful realisations from this workflow was that the hard problem was not the deployment command. It was the azure.yaml contract — specifically, getting the declaration right so that azd up produces an environment that is genuinely equivalent to what the agent expects at runtime.
It was not enough to get the infrastructure provisioned. It had to be provisioned with the right Managed Identity bindings or the agent would fail silently on first invocation. It was not enough to get the container pushed. The startup probe had to be configured correctly or the replica would scale to zero before the first request arrived.
The lesson is that declarative infrastructure is only as clean as the declarations are precise. Thirty lines is not simple — it is compressed. Every line carries a constraint that would otherwise show up as a runtime failure.
It also changed how I think about the local-to-cloud gap more broadly. The gap is not a deployment problem. It is a contract problem: the local environment and the cloud environment need to honour the same interface. AZD AI solves this structurally. Most other approaches solve it by hoping the two environments stay in sync.
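One practical way to honour that contract, illustrative rather than taken from the sample, is to validate the same required configuration at startup in both environments, so a bad declaration fails loudly on the first run instead of silently at invocation time. The variable names below are the ones set with azd env set earlier; the helper itself is hypothetical.

```python
import os

# Illustrative startup check, not code from the sample repo. The variable
# names match the `azd env set` configuration earlier in this article.
REQUIRED = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")

def validate_contract(env=os.environ) -> list[str]:
    """Return the required settings that are missing or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Both local and cloud read the same names, so both fail the same way.
missing = validate_contract({"AZURE_AI_PROJECT_ENDPOINT": "https://example.test"})
print(missing)
```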
Key Takeaways
- The azd ai extension addresses the specific failure mode of agent development: the deployment cycle is too expensive relative to the logic iteration cycle it is supposed to serve
- azd ai agent run is not a local mock — it is the same runtime, locally. That distinction eliminates an entire category of “works here, breaks there” debugging
- azd up turns 30 lines of azure.yaml into a fully provisioned, correctly secured Foundry environment — infrastructure as a byproduct of declaring intent
- Teams publishing with individual scope removes the admin approval blocker, changing agent distribution from an IT process to a share link
- The declarative contract in azure.yaml is the hard part — get it right and everything downstream is reliable; get it wrong and failures are silent
Learn More & Explore Further
- Architecture Review Agent GitHub repo — the complete sample this article is based on
- Previous article: Stop Drawing Architecture Diagrams Manually — context on why the agent exists and what it produces
- Install the Azure Developer CLI — official Microsoft Learn guide
- Hosted agents in Foundry Agent Service — reference for the hosted agent model this workflow deploys to
Links
- GitHub: Azure-Samples/agent-architecture-review-sample
- Live article: Microsoft Tech Community