The hardest part of building AI systems today is no longer getting access to a capable model. It is knowing how to choose, validate, optimize, and run the right model across the full lifecycle of a real application.
Take a retrieval-augmented generation (RAG)-based customer support copilot or a tool-calling agent that helps employees complete business workflows. In a prototype, it may be enough to pick a strong model, connect a few data sources, and get a useful response. In production, the system needs to retrieve the right context, call the right tools, meet quality and safety thresholds, stay within latency targets, and run at a cost the business can sustain.
Models evolve, costs shift, and production requirements often arrive after the first version is already working. Success depends less on choosing the most powerful model and more on building a disciplined operating approach around the application.
That is where Microsoft Foundry comes in: a unified platform to select, evaluate, optimize, operate, and continuously improve AI applications at production scale.
What’s new
Microsoft Foundry continues to expand the model ecosystem and operating surface for developers building production AI systems.
Fireworks AI on Microsoft Foundry is now generally available, giving developers access to production-grade open model inference through a single Azure endpoint, with enterprise service-level agreements (SLAs) and zero-setup onboarding.
Foundry is also adding new model families and capabilities across modalities, including Microsoft AI models, partner models, open-source models, custom models, and post-trained variants. Together, these updates give developers more choice while keeping selection, evaluation, deployment, and operations in one consistent workflow.
The challenge is no longer access. It is operations.
In a prototype, the questions are simple: Can the model answer the prompt? Can it connect to my data? Can it complete the happy path?
In production, the questions change. Which model fits each task? How do I validate it on my own data? What latency budget does this experience require? How much throughput do I need at peak? What happens when quota is constrained, costs spike, or a newer model becomes available? How do I monitor quality, detect eval drift, roll back safely, and prove the system is governed?
Agentic systems often fail when the model is mismatched, evaluation is incomplete, costs run unchecked, or governance arrives too late. Teams that rely on a single provider face another risk: lock-in, with no escape hatch when a model degrades, pricing changes, or capacity becomes constrained.
Foundry is built on the opposite philosophy. It is a model-agnostic platform spanning Microsoft, open-source, and independent software vendor (ISV) partner models, all on the same operating surface.
The answer is to treat model selection and optimization as a continuous operating discipline:

1. Select the right model for the task
Model selection is about workload fit, not leaderboard rank. Before choosing a model, define the task contract: what the model needs to do, what good looks like, what constraints it must operate within, and which failure modes are unacceptable.
A routing step may need low latency. A policy question may need grounded reasoning with citations. A coding agent may need deeper reasoning and tool use. A customer-facing copilot may need strong safety boundaries, predictable latency, and cost efficiency at scale.
A simple model selection framework:
| Workload | Recommended approach | Why |
| --- | --- | --- |
| Classification, routing, extraction, or high-volume chat | Smaller, lower-latency model | Keeps cost and latency low |
| Complex reasoning, coding, or planning | Stronger reasoning model | Improves quality for harder tasks |
| Image, speech, voice, or physical AI | Modality-specific model | Matches the model to the input and output type |
| Mixed workloads with different complexity | Model Router | Routes each request based on quality, cost, and latency |
| Domain-specific behavior, tone, or format | Fine-tuned or custom model | Improves consistency for your scenario |
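The framework above can also be written down as a small lookup, which makes the starting point for each workload explicit and reviewable. This is a minimal sketch: the `Workload` enum, the tier labels, and `pick_model_tier` are hypothetical names for illustration and are not Foundry identifiers.

```python
from enum import Enum


class Workload(Enum):
    HIGH_VOLUME = "classification, routing, extraction, or high-volume chat"
    COMPLEX_REASONING = "complex reasoning, coding, or planning"
    MULTIMODAL = "image, speech, voice, or physical AI"
    MIXED = "mixed workloads with different complexity"
    DOMAIN_SPECIFIC = "domain-specific behavior, tone, or format"


# Hypothetical tier labels mirroring the table; replace them with the
# deployment names used in your own project.
TIER_BY_WORKLOAD = {
    Workload.HIGH_VOLUME: "small-low-latency",
    Workload.COMPLEX_REASONING: "strong-reasoning",
    Workload.MULTIMODAL: "modality-specific",
    Workload.MIXED: "model-router",
    Workload.DOMAIN_SPECIFIC: "fine-tuned-or-custom",
}


def pick_model_tier(workload: Workload) -> str:
    """Return the starting-point tier for a workload type."""
    return TIER_BY_WORKLOAD[workload]
```

A lookup like this is only a starting point; the tier it returns still needs to be validated against your own evaluation data.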
Effective model choice depends on four dimensions: capability, safety, latency, and cost.
Foundry helps developers make these tradeoffs through a broad model ecosystem and a consistent operating surface. Developers can access Microsoft models, leading base models, partner models like Fireworks AI, open-source models, custom models, and post-trained variants through one selection, evaluation, and deployment workflow.
Developer tip: For developers who want to bypass manual selection, Foundry provides Model Router in Foundry Models. Model Router automatically routes each request to the most appropriate model based on workload characteristics, cost targets, and latency requirements.
2. Validate with your own evals and data
Benchmarks are not enough. A model that leads a public leaderboard may still underperform on your prompts, your data, your users, and your business rules. Production confidence comes from evaluating against the workloads your application will actually run.
With Foundry, developers can bring their own evaluation inputs, including CSV or JSONL datasets with prompts, expected outputs, labels, or ground-truth answers. They can run side-by-side comparisons across models and prompts, evaluate agents and multi-step workflows, and inspect results across datasets, traces, and production-like scenarios.
Built-in quality and safety evaluators help measure signals such as relevance, groundedness, coherence, fluency, safety, and policy adherence. Custom evaluators can capture application-specific rules, formats, and business logic.
A strong evaluation covers:
- Quality: Did the model complete the task correctly?
- Accuracy and groundedness: Did it produce reliable answers based on the right context?
- Safety: Did it follow policies and avoid unacceptable responses?
- Performance: Did it meet latency, throughput, and reliability requirements?
- Cost: Did it deliver the right outcome at the right price?
Evaluation should run continuously as new model versions, fine-tuned variants, agent changes, or new model families become available.
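To make the idea of a side-by-side comparison on your own data concrete, here is a minimal, service-free sketch. It assumes a JSONL dataset with hypothetical `prompt` and `expected` fields and uses a simple exact-match function as the custom evaluator. The two "models" are plain stand-in functions, not Foundry calls, so the example runs as written.

```python
import json
from statistics import mean
from typing import Callable, Iterable


def load_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL text lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def exact_match(output: str, expected: str) -> float:
    """Custom evaluator: 1.0 when the normalized strings match, else 0.0."""
    return float(output.strip().lower() == expected.strip().lower())


def evaluate(model: Callable[[str], str], rows: list[dict],
             evaluator: Callable[[str, str], float] = exact_match) -> float:
    """Average evaluator score over rows with 'prompt' and 'expected' fields."""
    return mean(evaluator(model(r["prompt"]), r["expected"]) for r in rows)


# Stand-in models: plain functions so the sketch needs no external service.
def model_a(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"


def model_b(prompt: str) -> str:
    answers = {"Capital of France?": "paris", "Capital of Japan?": "Tokyo"}
    return answers.get(prompt, "unknown")


DATASET = load_jsonl([
    '{"prompt": "Capital of France?", "expected": "Paris"}',
    '{"prompt": "Capital of Japan?", "expected": "Tokyo"}',
])

# Same dataset, same evaluator, different models: a side-by-side comparison.
scores = {name: evaluate(fn, DATASET)
          for name, fn in [("model_a", model_a), ("model_b", model_b)]}
```

In a real pipeline the stand-in functions would be replaced by calls to deployed models, and exact match would usually be one of several evaluators alongside groundedness, safety, latency, and cost checks.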
Developer tip: Define success criteria before opening the model catalog. Criteria-first evaluation prevents anchoring on model reputation instead of workload fit.
3. Optimize cost and performance
Cost is a first-class architectural concern, not an afterthought. In prototypes, it may be acceptable to send every task to the most capable model. In production, that approach breaks down quickly.
A simple classification task, a RAG response, a long-context reasoning workflow, and a multi-step agentic process should not always use the same model or deployment strategy.
Foundry gives developers levers to optimize across quality, cost, and latency at the system level:
- Intelligent routing: Send each task to the right model based on complexity and budget.
- Batching: Use asynchronous processing for workloads that do not require real-time responses.
- Caching: Avoid paying repeatedly for identical or near-identical requests.
- Provisioned throughput: Use dedicated capacity for predictable performance at scale.
- Quota management: Scale more predictably with quota tiering, global customer quota, and data zone customer quota.
- Model optimization: Use model compression, fine-tuning, or distillation where appropriate.
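As an illustration of the caching lever, the sketch below wraps a model call in an application-side cache keyed on the model name and a normalized prompt. It is an assumption-laden example rather than a description of how Foundry caches: `ResponseCache` and the `call_model` callable are hypothetical, and normalizing by case and whitespace only catches the simplest near-identical requests.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """In-memory cache in front of a model call (illustrative only)."""

    def __init__(self, call_model: Callable[[str, str], str]):
        self._call_model = call_model  # hypothetical (model, prompt) -> response
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response
```

Lowercasing is not safe for every workload (for example, case-sensitive code or identifiers), so the normalization step should be chosen per task type.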
Fireworks AI on Foundry is now generally available, giving developers access to a high-performance open model catalog through a single Azure endpoint, with enterprise SLAs, no separate infrastructure, and no separate contracts.
Developer tip: Profile cost by task type before optimizing globally. Routing decisions are workload-specific, not one-size-fits-all.
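Profiling cost by task type can start from nothing more than logged token counts. The sketch below is a minimal example under stated assumptions: the `Call` record and the per-1,000-token prices are hypothetical placeholders, not actual Foundry pricing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    task_type: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens; substitute your actual rates.
PRICE_PER_1K_INPUT = 0.001
PRICE_PER_1K_OUTPUT = 0.002


def cost_by_task(calls: list[Call]) -> dict[str, float]:
    """Sum estimated spend per task type from logged token counts."""
    totals: dict[str, float] = defaultdict(float)
    for c in calls:
        totals[c.task_type] += (
            c.input_tokens / 1000 * PRICE_PER_1K_INPUT
            + c.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
        )
    return dict(totals)
```

Sorting the result by spend shows which task types are worth routing to a smaller model, batching, or caching first.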
4. Operate at scale with enterprise confidence
Deploying an endpoint is not the same as operating a production AI system. Teams need to understand how the system behaves, enforce policies, monitor usage and cost, test model changes safely, and roll back when quality or performance regresses.
Foundry brings these operating capabilities into one surface: versioning, SLA-backed reliability, security, governance, access controls, audit logging, usage monitoring, and controlled upgrades.
Teams can monitor token usage and throughput, inspect logs and traces, evaluate model and agent behavior, enforce policies, and compare changes before rolling them out broadly. As new model versions become available, they can test against evaluation datasets and traces, validate quality, latency, and cost impact, and reduce risk with versioning and rollback strategies.
The Fireworks AI on Foundry generally available (GA) release is a concrete example of this operating model, with enterprise SLAs, provisioned throughput unit (PTU) Data Zone support, SOC2 readiness, and the same access controls and audit logging that govern Foundry.
Production adopters span AI-native and traditional enterprise workloads, including Perplexity, Motif, UiPath, and StackBlitz. During preview, the platform processed more than 176 cardinal tokens across 17 S&P 500 enterprises.
Developer tip: Treat model upgrades like dependency upgrades: test against baselines, stage rollouts, monitor regressions, and keep a rollback plan.
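One common way to stage a rollout is to send a fixed percentage of requests to the new model version and keep the rest on the current one. The sketch below shows a deterministic, hash-based split. The `use_candidate` helper is a hypothetical application-side function, not a Foundry feature.

```python
import hashlib


def use_candidate(request_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a request to the candidate model version.

    The same request_id always lands in the same bucket (0-99), so raising
    rollout_percent only adds traffic to the candidate and never reshuffles it.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Setting `rollout_percent` back to 0 acts as the rollback: every request returns to the current version without a redeploy.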
5. Continuously improve as models and workloads evolve
AI systems are dynamic. Models improve, workloads shift, user behavior changes, pricing evolves, and new model families arrive. The best system today may not be the best system six months from now.
That is why the lifecycle loop matters:
- Select the right model for the task.
- Evaluate it against your own data and production baselines.
- Optimize for quality, cost, latency, and throughput.
- Operate with governance, observability, and reliability.
- Improve as new models, tools, and customization options emerge.
For engineering teams, every model, prompt, tool, agent, or workflow change should be treated like a production change. New model versions should be tested automatically against regression datasets, production traces, and known edge cases before rollout.
A model may improve quality but increase latency, reduce cost but weaken groundedness, or perform better on common cases while regressing on high-risk scenarios. Automated evaluations help teams detect those tradeoffs early.
Developer tip: Automate your evaluation pipeline so every new model version is compared against production baselines for quality, safety, latency, throughput, and cost before deployment.
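An automated comparison against a baseline can be reduced to a small gate that a deployment pipeline runs before promoting a new version. The following is a sketch under stated assumptions: the `Metrics` fields and the tolerance defaults are hypothetical and should be replaced by the criteria your team has agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    quality: float               # higher is better, e.g. mean evaluator score
    p95_latency_ms: float        # lower is better
    cost_per_1k_requests: float  # lower is better


def regressions(baseline: Metrics, candidate: Metrics,
                max_quality_drop: float = 0.01,
                max_latency_increase: float = 0.10,
                max_cost_increase: float = 0.10) -> list[str]:
    """Return the dimensions on which the candidate regresses past tolerance."""
    failures = []
    if candidate.quality < baseline.quality - max_quality_drop:
        failures.append("quality")
    if candidate.p95_latency_ms > baseline.p95_latency_ms * (1 + max_latency_increase):
        failures.append("latency")
    if candidate.cost_per_1k_requests > baseline.cost_per_1k_requests * (1 + max_cost_increase):
        failures.append("cost")
    return failures
```

For example, a candidate that improves quality and cost but raises p95 latency from 800 ms to 1,000 ms would be flagged on latency only, which makes the tradeoff visible instead of hiding it behind a single score.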
What this means for developers
The next phase of AI development will not be won by teams that simply have access to the biggest models. It will be won by teams that know how to operate models well.
That means choosing by workload fit, validating with real data, optimizing cost and performance, deploying with governance, and improving as the landscape shifts.
Microsoft Foundry is designed for exactly this reality: a model-agnostic platform spanning Microsoft, open-source, and ISV models, all on one operating surface. No lock-in. No re-architecture. No guesswork.
The future of AI development is not about guessing which model might work. It is about building an operating discipline that lets you know.
Get started
- Microsoft Foundry portal
- Microsoft Foundry documentation
- Fireworks AI on Foundry (now generally available)
- Evaluation quickstart
- Quota management docs
- Watch BRK230: Build smarter AI systems in Foundry as models and costs evolve
- Claude Foundry Skilling Learning Path
The post A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry appeared first on Microsoft Azure Blog.