We Put Claude in Charge of Our Store’s Margins. The Experiment Escaped the Lab

By Heorhi Tratsiak

There’s a silent killer in ecommerce. It’s not in your analytics dashboard. It’s in the pricing decisions made by humans.

While your manager opens 15 tabs, parses competitor prices, checks exchange rates, and updates an Excel sheet, money is leaking out. Not because they’re bad at their job. Because they’re human. Humans sleep. Humans get sick. Humans cannot recalculate 500 SKUs every thirty minutes without losing their minds — or making mistakes.

We decided to test a hypothesis: what if we handed margin management to an AI? Not a dumb script that applies a pricing formula, but an agent that thinks. That makes judgment calls. That owns the outcome.

That’s how an experiment code-named “The Digital Director” was born.

The Shell Around the Brain: How We Fed Claude Real Data

Let’s start with the architecture. The brain is Claude, by Anthropic. We picked it for two reasons. First, at the time of the experiment, it was the only model we tried that didn’t hallucinate numbers when working with financial data. Second, Claude’s safety systems let us set hard guardrails, like “never drop the price below cost.” Without that, the experiment would have been corporate suicide.

But Claude can’t do anything on its own. It needs a shell — an environment that gives it access to real data and the ability to act.

We built that shell:

The agent lives on a dedicated server and talks to our WooCommerce store via REST API. It reads the catalog, prices, and stock levels, and can update them just as easily.
Supplier purchase prices are parsed from price lists every three hours.
Competitor prices come through the Keepa API.
Ad spend data is pulled from the Facebook Ads API.
The agent has its own task manager. It plans its own work. Every 30 minutes it wakes up, collects fresh data from all sources, and decides what to do next.
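To make the WooCommerce part concrete, here is a minimal sketch of what a single price update could look like. It is not our production code. It uses the public WooCommerce REST API v3 route for products and HTTP Basic auth with a consumer key and secret; the store URL, product ID, keys, and the function name build_price_update are placeholders invented for this example. The function only builds the request and does not send it.

```python
import base64
import json
import urllib.request

API_ROOT = "/wp-json/wc/v3"  # WooCommerce REST API v3 path


def build_price_update(store_url, product_id, new_price, key, secret):
    """Build (but do not send) a PUT request that sets a product's regular price."""
    url = f"{store_url.rstrip('/')}{API_ROOT}/products/{product_id}"
    # WooCommerce expects prices as strings, e.g. "24.50".
    body = json.dumps({"regular_price": f"{new_price:.2f}"}).encode("utf-8")
    # Basic auth over HTTPS with the consumer key and secret (placeholders here).
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    req = build_price_update("https://shop.example", 794, 24.5, "ck_xxx", "cs_xxx")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Actually sending it would be: urllib.request.urlopen(req, timeout=10)
```

Keeping "build the request" separate from "send the request" makes it easy to log exactly what the agent wants to change before anything reaches the store.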

The agent has memory. Every decision and its justification is logged. Once a day it generates a summary: what it changed, why, and what risks it sees. We read it over coffee.

Crucially, the agent never applies changes directly. It drafts a list of proposed actions — what, why, and by how much. The system sends them to me on Telegram. I can reject, approve, or ask for clarification. Only after my confirmation does the agent touch the REST API to update prices. This is non-negotiable: the final decision always belongs to a human. The agent is an advisor, not an executioner.
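The approval gate is simple enough to sketch in a few lines. This is an illustration under assumptions, not our actual implementation: the names Proposal, Status, review, and apply_if_approved are hypothetical, the Telegram side is left out entirely, and "ask for clarification" is modeled as simply leaving the proposal pending. The point is that the function that touches prices refuses to run for anything a human has not approved.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    sku: str
    old_price: float
    new_price: float
    reason: str
    status: Status = Status.PENDING


def review(proposal, approve):
    """Record the human decision; only a PENDING proposal can be reviewed."""
    if proposal.status is not Status.PENDING:
        raise ValueError("proposal already reviewed")
    proposal.status = Status.APPROVED if approve else Status.REJECTED
    return proposal


def apply_if_approved(proposal, update_price):
    """Call update_price(sku, new_price) only for approved proposals."""
    if proposal.status is not Status.APPROVED:
        return False
    update_price(proposal.sku, proposal.new_price)
    return True


if __name__ == "__main__":
    p = Proposal("SKU-1", 20.0, 21.6, "supplier cost up")
    print(apply_if_approved(p, print))  # nothing is applied while pending
    review(p, approve=True)
    print(apply_if_approved(p, print))  # now update_price is called
```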

What We Actually Handed Over to Claude

We started small and expanded the remit gradually. Here’s the list of functions that transitioned to the agent:

Monitoring purchase prices. Every three hours, the agent parses supplier price lists. If a supplier changes a price, the agent instantly recalculates the minimum viable selling price and proposes an adjustment. Previously, this happened once a week. By day eight, we were already losing money.
Controlling ad math. Every 30 minutes, the agent checks the ad cost-to-revenue ratio for each SKU against the actual cost of goods. If a campaign goes negative, the agent suggests either correcting the price or pausing the ad. Before, zombie campaigns bled money for weeks.
Reacting to competitors. The agent tracks the prices of the top five competitors for each SKU. If a competitor dumps, the agent doesn’t blindly join the race to the bottom. It evaluates whether we should react at all. Sometimes it’s smarter to step aside than to cut margin chasing someone liquidating inventory.
Forecasting stockouts. The agent analyzes sales velocity and warns: “This SKU will run out in five days. Reorder takes four days. Order today.” We stopped losing sales to out-of-stock items.
Daily human-readable report. Every morning, the agent sends a brief: which SKUs got more expensive, which got cheaper, where margins are under threat, and what it suggests. All in plain language, not analyst jargon.
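Two of the checks above are plain arithmetic, and a rough sketch shows how little is needed. The assumptions are mine: sales velocity is a simple average over a recent window, the reorder rule adds a one-day safety buffer to the supplier lead time, and a campaign counts as "negative" when ad-driven revenue minus cost of goods minus ad spend falls below zero. All function names are hypothetical.

```python
import math


def days_of_cover(stock_units, units_sold, period_days):
    """Days until stock runs out at the recent average sales rate."""
    daily_rate = units_sold / period_days
    if daily_rate <= 0:
        return math.inf  # nothing is selling, so no stockout is forecast
    return stock_units / daily_rate


def should_reorder(stock_units, units_sold, period_days, lead_time_days, buffer_days=1):
    """True when remaining cover is within lead time plus a safety buffer."""
    cover = days_of_cover(stock_units, units_sold, period_days)
    return cover <= lead_time_days + buffer_days


def ad_contribution(revenue, unit_cost, units_sold, ad_spend):
    """Money left from ad-driven sales after cost of goods and ad spend."""
    return revenue - unit_cost * units_sold - ad_spend


if __name__ == "__main__":
    # 50 units in stock, 70 sold over the last 7 days: five days of cover.
    print(days_of_cover(50, 70, 7))
    # Reorder takes four days, so with a one-day buffer the answer is "order today".
    print(should_reorder(50, 70, 7, lead_time_days=4))
    # A campaign that sold 10 units but still lost money after ad spend.
    print(ad_contribution(300.0, 20.0, 10, 120.0))
```

A real forecast would also account for trend and seasonality; the average is only the simplest baseline.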

What We Did NOT Hand Over (And Why)

This is even more important than the list above:

The final call. We deliberately kept approval with a human. The agent proposes. A person decides. This isn’t distrust of the model. It’s math: an LLM can hallucinate, and one pricing mistake costs more than a month of manual work.
Flagship brands. Products that define the store’s identity and are our main discovery channel. Their prices are only changed manually. The agent can advise, but not touch them.
Loss-leader products. Items deliberately kept at zero or negative margin to drive traffic. The agent ignores them.
Sales events. The decision to start a sale and how deep to go remains human. The agent lacks the context for why we’re liquidating this inventory or what our strategic goal is.

A Real Decision: What Claude Did Yesterday

Let me walk you through a specific case from the logs so you understand how the agent thinks.

Situation. The supplier raised the cost price of product X by 14%. Competitor prices stayed flat. Our margin collapsed from 18% to 3%.

What the agent did:

Detected the purchase price change from the supplier price list.
Checked five competitor prices — no changes.
Calculated the minimum selling price to maintain a 10% margin — we’d need to raise the price by 12%.
Analyzed demand elasticity: it reviewed sales history during past price increases for this product. It found that a 10% price hike leads to a 4% drop in sales. Conclusion: a 12% increase would drop sales by 4–5%.
Proposed raising the price by 8% instead of 12%. It consciously accepted a margin decline from 18% to 9%, because a sharper hike, by its math, would hurt sales more than the margin gain would help.
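The two formulas behind steps like these can be sketched as follows. This is a simplified illustration, not the agent's actual reasoning. It assumes margin is measured as a share of the selling price (a store that measures markup on cost would use a different formula), and it assumes elasticity is constant, so the sales drop scales linearly with the size of the price change. The cost of 90 is a made-up number; only the 10% hike and 4% sales drop come from the log.

```python
def margin(price, cost):
    """Margin as a share of the selling price."""
    return (price - cost) / price


def price_for_margin(cost, target_margin):
    """Lowest price that still yields target_margin on the selling price."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return cost / (1 - target_margin)


def elasticity(pct_price_change, pct_volume_change):
    """Point estimate: relative change in volume per relative change in price."""
    return pct_volume_change / pct_price_change


def expected_volume_change(pct_price_change, elasticity_value):
    """Linear extrapolation of the volume change for a new price change."""
    return elasticity_value * pct_price_change


if __name__ == "__main__":
    e = elasticity(0.10, -0.04)  # from the log: +10% price, -4% sales
    print(round(e, 2))
    # Extrapolated sales change for a 12% hike, in line with the "4-5%" above.
    print(round(expected_volume_change(0.12, e), 3))
    # Hypothetical unit cost of 90 and a 10% target margin.
    print(round(price_for_margin(90.0, 0.10), 2))
```

With these pieces, comparing two candidate price increases comes down to multiplying the per-unit margin by the expected volume for each and seeing which leaves more money.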

What I did. Approved the decision.

Before this experiment, the scenario looked like this: the manager noticed the supplier change at the end of the week, mechanically raised the price by 14%, sales dropped, and we didn’t know why. The agent handled the same event in 12 minutes, elasticity check included.

Results: What Changed in Three Months

We launched the experiment in early 2026 on 500 SKUs. Three months later, the picture looks like this:

Average portfolio margin increased by 4.2 percentage points. Not “2x” or “to the moon.” Just 4.2 p.p. But in our business, this is the difference between “operating at break-even” and “reinvesting in growth.”
Average response time to a supplier price change dropped from three days to 12 minutes. That’s the key number of the entire experiment. We stopped bleeding cash in the gap between “supplier raised the price” and “we realized it.”
Zero zombie ad campaigns. The agent flags them faster than any human can spot the problem, and pausing them takes one approval.
Zero hallucination errors. In three months, Claude didn’t propose a single absurd price. Once it suggested raising a price by 80% — but flagged it itself as “needs verification,” and we rejected it. No change happens without human approval.

Important note: we tracked costs. The Claude API runs about $120 per month under our load. The server shell is another $40. Total: $160 per month. Compare that to the salary of a person who partially did this work, and did it poorly, because you can’t do this well manually.

Where We Got Burned

I don’t want you to think we just pressed a button and it worked. There were three serious problems:

The first: data. Claude is only as good as the data it consumes. For the first week, we didn’t let the agent act. We just cleaned the product catalog. Bad descriptions, duplicate SKUs, incorrect cost prices — each error would have multiplied across hundreds of product pages.

The second: guardrails. In the second week, the agent proposed dumping the price on a product that was one of our flagships. We hadn’t told it that product was an anchor. We urgently had to write the rules: a list of exclusions, minimum profitability thresholds for each category, a ban on changing prices of certain brands. This was the most important work — not technical, but methodological.
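One way to enforce rules like these is a plain check that runs outside the model, so the rules hold no matter what the model proposes. The sketch below is an assumption about how such a check could look, not a description of our rule engine: the Guardrails fields, the check_proposal function, and the sample SKUs, brands, and thresholds are all invented for illustration, and margin is again measured on the selling price.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    excluded_skus: set = field(default_factory=set)   # e.g. flagships, loss leaders
    locked_brands: set = field(default_factory=set)   # brands priced by hand only
    min_margin_by_category: dict = field(default_factory=dict)
    default_min_margin: float = 0.0                    # 0.0 means "never below cost"


def check_proposal(rules, sku, brand, category, cost, new_price):
    """Return a list of violated rules; an empty list means the proposal may proceed."""
    violations = []
    if sku in rules.excluded_skus:
        violations.append("sku is on the exclusion list")
    if brand in rules.locked_brands:
        violations.append("brand is priced manually")
    if new_price <= 0:
        violations.append("price must be positive")
        return violations
    floor = rules.min_margin_by_category.get(category, rules.default_min_margin)
    if (new_price - cost) / new_price < floor:
        violations.append(f"margin below {floor:.0%} floor for {category}")
    return violations


if __name__ == "__main__":
    rules = Guardrails(
        excluded_skus={"FLAG-1"},
        locked_brands={"BrandX"},
        min_margin_by_category={"cosmetics": 0.10},
    )
    print(check_proposal(rules, "SKU-2", "Other", "cosmetics", 90.0, 120.0))
    print(check_proposal(rules, "FLAG-1", "Other", "cosmetics", 90.0, 120.0))
```

Returning every violated rule, rather than stopping at the first, makes the rejection message more useful to whoever reviews it.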

The third: seasonality. The agent doesn’t naturally understand that before International Women’s Day, cosmetics should get more expensive not because suppliers raised prices, but because demand is about to explode. We spent a month teaching it. Essentially, we transferred unwritten knowledge that existed only in the founder’s head.

Who This Is For

We built this agent on WooCommerce for small and mid-sized ecommerce. Why? Because we don’t have pricing departments, armies of analysts, or the IT budget for a custom ERP. We have a store, 500–2000 SKUs, and two or three people on the team. We desperately need tools that take over the grunt work of recalculating prices, so we can focus on strategy.

A tool like our “Digital Director” isn’t a toy and isn’t hype. It’s a weapon. A weapon that lets a small business play on the same field as giants with dedicated pricing teams.

The Big Question: Can an LLM Manage Pricing?

It can. But not alone. The agent is an ideal advisor: fast, precise, doesn’t sleep, doesn’t make math errors. But the final call must stay with a human.

We tested this. Claude didn’t replace our manager. It replaced 70% of his busywork. The manager stopped being a monkey with an Excel sheet and became a commander who sets tasks and monitors outcomes.

And yes, as a side note, Anthropic recently ran its own experiment, Project Vend. They put Claude Sonnet 3.7 in charge of a small automated store in their office. The agent sourced suppliers, set prices, ordered restocks. The result? The store lost money. Claude gave discounts too eagerly and learned from mistakes too slowly. That’s exactly why we never gave our agent the final say.

While your competitor wakes up, opens Excel, and loads 15 tabs, our Claude has already recalculated 500 SKUs, checked ad campaigns, compared competitor prices, and sent me a report on Telegram.

The future of ecommerce isn’t “human vs. AI.” It’s human + AI vs. human without AI. And the human without AI has slimmer chances by the day.