
§ 01 · The New Line Item
Three years ago this number was zero. Today it’s a real budget conversation.
The CFO of a thirty-partner regional accounting firm in the American Midwest pulled up her IT budget on a video call recently and pointed to a line that hadn’t existed eighteen months earlier. AI infrastructure: $14,200 per month. She wasn’t complaining about the number, exactly. She was complaining that nobody on the executive committee, including herself, had any idea whether $14,200 was the right number, whether it would be $30,000 next year, and whether there was anything the firm could be doing to manage it that they weren’t already doing. The firm had grown its use of generative AI tools across tax research, document summarization, and bookkeeping automation through 2025, and the inference costs had grown with it — quietly at first, then visibly, then suddenly into territory that demanded a board-level conversation.
This conversation is happening across the profession in 2026 in ways that nobody planned for in 2022. AI inference cost has gone, in three years, from a non-existent budget category to one of the faster-growing operational expenses inside the typical mid-market CPA firm. It is not yet large in absolute terms compared to occupancy, technology licensing, or salaries. But it is large enough to deserve management attention, growing fast enough to alarm the partners who track it, and structured in a way — consumption-based pricing, prepaid commitments, end-of-quarter true-ups — that creates real operational headaches for finance teams used to predictable software subscriptions.
The most consequential thing about this shift is not the absolute size of the spend. It is the way the spend behaves. AI inference costs scale with usage in a way traditional software subscriptions do not. They are denominated in tokens, processed in real time, billed in arrears, and subject to mid-quarter price changes from the underlying providers. A firm that quietly doubled its document-summarization workflow in Q3 will see the consequences in its Q4 invoice. A firm that committed to a discounted enterprise tier in January will discover by July whether its consumption forecast was accurate. The line item is volatile by design, and most CPA firms’ financial controls were built for a different kind of spend.
— Where the spend actually goes —
| Firm Size | Monthly AI Inference | Primary Drivers |
|---|---|---|
| Sole practitioner / 1–3 staff | $50–$300 | Tax research, document summarization, drafting |
| Small firm (5–15 staff) | $300–$1,500 | Above plus bookkeeping classification, audit prep |
| Mid-market firm (20–100 staff) | $2,000–$15,000 | Workflow automation, client portal AI, advisory |
| Upper mid-market (100–400 staff) | $15,000–$60,000 | Multi-engagement scaling, audit analytics, tax engine |
| Big Four / large national | Six to seven figures | Proprietary platforms, enterprise commitments, R&D |
Practitioner-observed ranges. Actual spend varies sharply with engagement mix, client size, and how much workflow has been migrated to AI-assisted tools. Mid-market growth rates of 50–150 percent year-over-year are common in firms actively expanding their AI footprint.
§ 02 · Why The Curve Is Going Up
Three structural drivers, all pushing the same direction.
The first driver is workflow expansion. Firms that started in 2024 with a narrow GenAI deployment — perhaps tax research and contract summarization — have widened the use cases through 2025 into bookkeeping classification, audit anomaly detection, advisory memo drafting, internal knowledge management, and client communications. Each new workflow that successfully demonstrates ROI is a permanent addition to the firm’s monthly inference bill. The Thomson Reuters Institute’s 2025 Generative AI in Professional Services Report found that 44 percent of accounting and tax firms using GenAI now use it daily or multiple times daily — the threshold at which the technology stops being experimental and starts being structural. Firms past that threshold do not reduce their spend year-over-year. They expand it.
The second driver is model upgrade pressure. The frontier of capability has moved every six to nine months for the past three years, and the frontier tier of each generation typically carries a higher per-token price than the mid-tier models most firms started on, even as the cost of equivalent capability keeps falling. Firms that started on cheaper, smaller models in 2024 have largely migrated up the model stack as accuracy expectations have risen. Tax memo drafting that worked acceptably on a mid-tier model in early 2025 is increasingly being routed to a frontier model like Claude Opus or GPT-5 because the partners reviewing the output have grown accustomed to the higher quality. The per-token cost of that quality is real, and it shows up in the monthly bill regardless of whether the procurement conversation acknowledged it.
The third driver is enterprise commitment dynamics. The major model providers — Anthropic, OpenAI, Google, Microsoft for Azure-hosted variants — offer significant discounts in exchange for prepaid annual commitments. A firm that commits to $200,000 of Anthropic API spend over twelve months will pay materially less per token than one paying month-to-month. The discount math creates pressure to forecast spend optimistically and over-commit, which produces a different kind of operational problem: firms ending the commitment period with significant unused credit balances that expire without being recouped. This is now a common enough pattern that it has produced its own emerging market response.
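The discount math can be made concrete with a rough sketch. It assumes a flat discount applied to the whole commitment and that any unused balance expires worthless; the function names are illustrative and not drawn from any provider's contract terms. Under those assumptions, a commitment only beats month-to-month pricing if the firm consumes at least (1 - discount) of what it prepaid.

```python
def breakeven_utilization(discount: float) -> float:
    """Fraction of a prepaid commitment that must be consumed for the
    commitment to cost no more than paying rack rate month-to-month.
    Assumes a flat discount and that unused balance expires worthless."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def effective_cost_ratio(discount: float, utilization: float) -> float:
    """Effective price per unit actually consumed, relative to rack rate.
    Below 1.0 the commitment was cheaper; above 1.0 rack rate would have won."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return (1.0 - discount) / utilization


# Hypothetical case: a 20% discount, with a forecast 40% above actual use,
# so only 1/1.4 (about 71%) of the commitment is consumed.
ratio = effective_cost_ratio(discount=0.20, utilization=1 / 1.4)
# ratio is about 1.12: the firm paid roughly 12% more per unit than rack rate.
```

On these assumptions a 20 percent discount needs at least 80 percent utilization to pay off, which is why an optimistic forecast can erase the discount entirely.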
Most firms over-commit to lock in the discount, then under-use the commitment, and end the year sitting on credit balances they cannot recover from the provider. The structural inefficiency is now large enough to have produced its own secondary market.
AWSCPA Journal · Editorial
§ 03 · The Emerging Secondary Market
When unused capacity meets unmet demand, a marketplace forms.
Wherever there is enterprise software with prepaid commitments, expiry dates, and a meaningful gap between contracted capacity and actual usage, secondary markets emerge to clear the inefficiency. Cloud computing went through this in the 2010s with the AWS Reserved Instance Marketplace. Software licenses have had secondary trading for decades. AI inference is the newest category to develop one, and it has done so with unusual speed because the underlying inefficiency is unusually large — firms routinely commit to AI spend forecasts that turn out to be 30 to 60 percent higher than their actual consumption, and the unused balance has historically just expired.
Marketplaces have started to emerge that match buyers and sellers of these unused balances directly. AI Credit Mart is one of the more visible operators in this space, providing a venue where firms with unused Anthropic, OpenAI, Azure OpenAI, and other major-provider credits can sell the balances they will not consume before expiry, and where firms running into their commitment ceiling can buy credits at a discount to the rack rate. The economic logic is straightforward: a seller recovers some portion of value that would otherwise expire worthless; a buyer obtains inference capacity below sticker price; the marketplace takes a small spread for matching the two sides. For mid-market accounting firms running into either side of this problem (sitting on unused commitments or hitting the ceiling on a smaller plan), the marketplace mechanism provides a financial control lever that did not exist eighteen months ago.
Whether a CPA firm should engage with a credit marketplace depends on the structure of its existing AI procurement. Firms paying month-to-month at retail rates have less reason to participate as buyers because their volume probably does not justify the operational overhead, but they may have reason to participate as sellers if a particular project ends earlier than forecast. Firms operating at the upper mid-market scale, with annual commitments in the tens or hundreds of thousands of dollars, frequently have reason to participate on both sides over the course of a year — selling unused balances from one provider while topping up commitments at another. The procurement function is still maturing, and most accounting firms have not yet built the internal expertise to manage it well. Watching how the secondary market develops over the next eighteen months is a reasonable use of a CFO’s attention.
— Cost-management levers, ranked by impact —
| Lever | Typical Saving | Operational Effort |
|---|---|---|
| Model routing (cheaper model for routine tasks) | 20–40% | Moderate; requires workflow audit |
| Caching & prompt optimization | 15–35% | Higher; requires technical implementation |
| Secondary market: buying discounted credits | 10–30% | Low; requires familiarity with marketplace mechanics |
| Prepaid annual commitments at discounted tier | 10–25% | Low; requires accurate consumption forecast |
| Workflow consolidation across vendors | 5–15% | High; meaningful procurement work |
| Secondary market: selling unused commitments | Recovers 50–90% of expiring balance | Low; relevant only at end of commitment period |
The levers compound. A firm applying three or four in combination can realistically reduce effective AI inference cost by 40 to 60 percent versus pure rack-rate month-to-month spending.
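A minimal sketch of how that compounding works, assuming each lever acts independently on whatever spend remains after the others. That is a simplification: in practice levers overlap (cached requests, for example, also shrink what routing can save), so the result is better read as a ceiling. The three percentages are illustrative picks from within the table's ranges, not measured results.

```python
from math import prod


def combined_saving(savings: list[float]) -> float:
    """Total fractional saving when each lever reduces the spend
    left over after the previous ones (multiplicative, not additive)."""
    if any(not 0.0 <= s < 1.0 for s in savings):
        raise ValueError("each saving must be a fraction in [0, 1)")
    return 1.0 - prod(1.0 - s for s in savings)


# Illustrative combination: routing 30%, prepaid commitment 15%, caching 20%.
total = combined_saving([0.30, 0.15, 0.20])
# total is about 0.524, i.e. roughly 52%, not the 65% a simple sum would suggest.
```

Because each lever applies to a smaller base than the one before it, the combined figure always comes in below the sum of the individual savings.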
§ 04 · The Practical Implication
AI procurement is becoming a real finance function, not just an IT function.
The most important practical shift inside CPA firms managing AI cost is that responsibility for the procurement is migrating out of IT and into finance. In the early adoption phase, AI tooling was bought the way other software was bought — an IT manager evaluated vendors, the partners approved the budget, and the line item ran through the same procurement process as the practice-management system or the document portal. That model has stopped working as the spend has grown and as the consumption-based pricing dynamics have started behaving like a real operating cost rather than a fixed software fee. The firms that handle this transition well are the ones whose CFO has taken direct ownership of the AI inference budget the same way they would own any other variable input cost. The firms that handle it poorly are the ones still treating it as a technology line item that the partners do not need to think about until the invoice arrives.
The Midwestern CFO mentioned at the start of this article eventually built a quarterly review process that tracks AI spend by use case, models out commitment versus consumption, identifies underused balances early enough to do something about them, and considers secondary-market participation as a routine cost-management tool alongside vendor renegotiation and workflow optimization. None of this was on the firm’s radar two years ago. All of it is normal practice management today. The firms that figure out the new operating model first will have a real cost advantage over the ones that do not, and that advantage will compound at every renewal cycle and every new engagement won.
— Reader Questions —
Eight questions on AI cost management, answered plainly.
How big does AI spend need to get before it warrants this level of attention?
Roughly when it crosses 1 to 2 percent of total firm operating expense, or when monthly inference spend exceeds about $5,000 — whichever comes first. Below those thresholds, the operational overhead of active management probably exceeds the savings. Above them, the cost-management levers start producing real money.
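As a rough sketch, the rule of thumb above can be written as a single check. The default of 1 percent uses the lower end of the 1 to 2 percent range, which is a choice made here for illustration; the function and parameter names are hypothetical.

```python
def warrants_active_management(
    monthly_ai_spend: float,
    monthly_opex: float,
    opex_share_threshold: float = 0.01,  # lower end of the 1-2% range (assumption)
    spend_threshold: float = 5_000.0,    # "about $5,000" per month
) -> bool:
    """True once either threshold is crossed, whichever comes first."""
    if monthly_opex <= 0:
        raise ValueError("monthly_opex must be positive")
    share_of_opex = monthly_ai_spend / monthly_opex
    return monthly_ai_spend > spend_threshold or share_of_opex >= opex_share_threshold


# The $14,200-per-month firm from the opening clears the dollar threshold
# regardless of its total operating expense (the opex figure is made up).
needs_attention = warrants_active_management(14_200, monthly_opex=2_000_000)
```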
Are AI inference costs going to keep rising, or will they fall as models become more efficient?
Both will happen simultaneously. Per-token cost on equivalent capability has fallen consistently over the past three years and will probably continue falling. But firms keep migrating to higher-capability models and expanding their use cases at a pace that more than offsets the per-token efficiency gains. Net spend per firm has been rising and is likely to keep rising through at least 2027.
What is the biggest mistake firms make in AI procurement?
Forecasting consumption optimistically to qualify for the largest discount tier, then under-using the commitment and watching the unused balance expire. The next biggest mistake is the opposite — refusing to commit at all and paying month-to-month rack rates that are 20 to 30 percent above the discounted tier. Both errors have the same root cause: not building a real consumption forecast based on workflow data.
How do secondary marketplaces for AI credits actually work?
A seller with unused credits on their account lists them on the marketplace at a discount to the original purchase price. A buyer who needs inference capacity purchases the credits at the discount, and the marketplace facilitates the transfer through whatever mechanism the underlying provider supports — sometimes account-level transfer, sometimes invoice reassignment, sometimes structured resale. The marketplace takes a small spread for matching the parties and verifying the legitimacy of the credits.
Is buying or selling credits on a secondary market compliant with the original provider’s terms of service?
It depends on the provider and the structure. Some providers explicitly permit account-level credit transfers, others require specific approval, and others prohibit resale entirely. Reputable marketplaces structure their transactions in a way that respects the underlying provider’s terms, but firms should verify the specific arrangement before participating. Talk to the marketplace operator and review the relevant provider’s terms of service.
Should small firms worry about AI cost management?
Probably not yet, beyond basic awareness. A sole practitioner spending $200 a month on a generative AI subscription does not need a CFO-level cost-management process. The general advice is to track the spend, understand the use cases driving it, and revisit when the monthly number starts approaching $1,000, well before the roughly $5,000 level at which active management starts to pay for itself. Until then, the operational overhead of formal cost management exceeds the savings.
How does this interact with the broader CPA shortage?
The CPA shortage — an estimated 75,000 fewer accountants entering the US profession than the industry needs — is the structural force pushing firms to adopt AI in the first place. Firms that cannot hire are deploying technology to do more with the staff they have. That deployment generates the inference cost that this article is about. The two trends are tightly connected, and managing AI cost effectively is part of converting the shortage from a constraint into a competitive advantage.
What should a CFO do this quarter to get ahead of the curve?
Three things. First, build a complete inventory of every AI tool and subscription the firm is using, including ones individual staff bought without going through procurement. Second, compare actual consumption to commitment for each contracted vendor and identify any balances at risk of expiring unused. Third, evaluate at least one cost-management lever (commitment renegotiation, model routing, or secondary-market participation) in enough detail to estimate the savings. None of these requires major investment; all of them shift AI cost from a passive line item to an actively managed one.
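The second step can be sketched as a straight-line projection: take consumption to date, extend the run rate to the end of the term, and compare it with the commitment. This assumes usage stays flat for the rest of the term, which will understate consumption at a firm that is still expanding its workflows. The `Commitment` record and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    vendor: str
    committed: float         # total prepaid amount for the term, in dollars
    consumed_to_date: float  # dollars of the commitment used so far
    months_elapsed: int
    term_months: int = 12


def projected_unused(c: Commitment) -> float:
    """Balance expected to remain at term end if the current
    average monthly run rate continues unchanged."""
    if c.months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    run_rate = c.consumed_to_date / c.months_elapsed
    projected_total = run_rate * c.term_months
    return max(0.0, c.committed - projected_total)


# Hypothetical: $200,000 committed, $70,000 consumed after six months.
at_risk = projected_unused(Commitment("Vendor A", 200_000, 70_000, 6))
# at_risk is about 60,000: the balance on course to expire unused.
```

Running this each quarter for every contracted vendor gives early warning while there is still time to shift workloads, renegotiate, or sell the balance.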
— Editor’s Note —
On the new operating discipline for an old profession.
CPA firms are accustomed to managing variable input costs — staff utilization, occupancy, technology licensing — with mature financial controls developed over decades of practice. AI inference cost is the newest entry on the variable-cost ledger, and the controls for managing it are still being built. The firms that build them well will have a structural cost advantage over the firms that treat the line item as a passive technology expense. The good news is that none of the cost-management levers described in this article require deep technical expertise. They require only the same operational discipline a well-run firm already applies to every other meaningful budget category.
AWSCPA Journal is editorially independent and does not accept compensation from vendors mentioned in our coverage. References to specific platforms, marketplaces, or tooling reflect our editorial judgement about what serves our readers, not commercial relationships. The framings, interpretations, and structural reads in this article are our own. Firms making procurement decisions on the basis of this analysis should treat it as a starting framework rather than a substitute for direct due diligence on the specific vendors and contracts involved.
