We get asked to build AI into things. A lot. And the honest answer to about 40% of those requests is “you probably shouldn’t.”

Not because AI isn’t powerful; it is. But because the gap between “AI could theoretically do this” and “AI is the best way to solve this specific problem given your constraints” is wider than most people realize. And the cost of getting it wrong isn’t just the failed implementation. It’s the six months your team spent on it instead of something that would’ve moved the needle.

So before we write code, before we touch a model, before we even open a notebook, we run through a framework. It’s not proprietary or complicated. It’s just the set of questions we’ve learned to ask after building enough of these things to know which ones succeed and which ones become expensive lessons.

Sharing it here because even if you never work with us, thinking through these questions will save you time and money.

Question 1: What decision or action does this enable?

This sounds obvious. It isn’t.

“We want to use AI to analyze customer feedback” isn’t an answer. It’s a technology looking for a purpose. The answer needs to be specific: “We want to automatically route incoming support tickets to the right team based on issue type, reducing misrouted tickets from 30% to under 5%.”

That second version tells you something useful. It tells you the current failure rate (30% misrouting). It tells you the target (under 5%). It tells you what success looks like in operational terms. And critically, it lets you ask the next question: is AI actually the best way to get from 30% to 5%, or would a better taxonomy and some keyword rules get you to 12% in two weeks while the AI approach takes three months?
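To make the comparison concrete, here’s a minimal sketch of what that “better taxonomy and some keyword rules” baseline could look like. The team names and keywords are hypothetical placeholders; a real rule set would come from the people who handle the tickets today.

```python
# Hypothetical keyword-rule router: first team whose keywords
# appear in the ticket text wins; no match goes to human triage.
ROUTING_RULES = [
    ("billing", {"invoice", "refund", "charge", "payment"}),
    ("auth", {"password", "login", "2fa", "locked"}),
    ("infra", {"outage", "timeout", "latency", "downtime"}),
]

def route_ticket(text: str, default: str = "triage") -> str:
    """Return the first team whose keywords appear in the ticket text."""
    words = set(text.lower().split())
    for team, keywords in ROUTING_RULES:
        if words & keywords:
            return team
    return default  # no match: send to a human triage queue

print(route_ticket("I need a refund on my last invoice"))   # billing
print(route_ticket("Account locked after password reset"))  # auth
```

A baseline like this takes an afternoon, not a quarter, and measuring its misroute rate tells you exactly how much room a model would have to earn its keep.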

If you can’t articulate the specific decision or action the AI enables, you’re not ready to build it. Full stop. Go spend another week with the people who do the work and understand the problem at the operational level.

Question 2: What’s the baseline without AI?

Every AI solution is competing against the status quo and against simpler alternatives. You need to know what those alternatives look like.

We worked with a team that wanted ML-based anomaly detection for their infrastructure monitoring. Interesting problem. Before building anything, we asked what they were doing now. The answer: an engineer manually reviewed dashboards twice a day. We asked what would happen if they just set static thresholds on the twelve metrics that mattered most. They hadn’t tried that.

We helped them set up the thresholds. It took a day. It caught 80% of what the ML approach would have caught. The remaining 20% was real, but the question became: is the delta between static thresholds and ML-based detection worth six months of model development, ongoing training pipeline maintenance, and the operational complexity of a system that nobody on the team fully understands?
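For reference, the whole static-threshold approach fits in a few lines. The metric names and bounds below are illustrative assumptions; the team’s real twelve metrics and limits came from their own operational knowledge.

```python
# Hypothetical static bounds per metric: (min, max), None = unbounded.
THRESHOLDS = {
    "cpu_pct":      (None, 90.0),
    "error_rate":   (None, 0.05),
    "free_disk_gb": (10.0, None),
}

def check_metrics(sample: dict) -> list:
    """Return an alert string for every metric outside its static bounds."""
    alerts = []
    for name, (lo, hi) in THRESHOLDS.items():
        value = sample.get(name)
        if value is None:
            continue  # metric not reported in this sample
        if lo is not None and value < lo:
            alerts.append(f"{name}={value} below {lo}")
        if hi is not None and value > hi:
            alerts.append(f"{name}={value} above {hi}")
    return alerts

print(check_metrics({"cpu_pct": 97.0, "error_rate": 0.01, "free_disk_gb": 4.2}))
```

Nothing to train, nothing to drift, and anyone on the team can read it and know exactly why an alert fired.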

Sometimes the answer is yes. For that team, it wasn’t. They shipped the thresholds, solved their immediate problem, and moved on to work that actually needed their engineering capacity.

The baseline question isn’t about being anti-AI. It’s about being honest. If a rules engine, a lookup table, or a well-structured SQL query gets you 80% of the way there, you should know that before you commit to the other path.

Question 3: Can you get the data you need, and can you keep getting it?

Model performance is a function of data quality. Everyone knows this in the abstract. Far fewer teams internalize what it means in practice.

The questions to ask aren’t just “do we have data?” but: Where does the data live? Who owns it? Is it labeled? If not, what would labeling cost? How often does the underlying distribution change? When the model starts degrading (and it will), what’s the feedback loop for catching that and retraining?

We’ve seen projects die not because the initial model didn’t work, but because the data pipeline that fed it was someone’s side project. The person who built it left. The pipeline broke. Nobody noticed for two months. By the time they did, the model was making decisions based on stale data and nobody trusted it anymore.

Data isn’t a one-time problem. It’s an ongoing operational commitment. If you’re not prepared to treat data quality with the same seriousness you treat uptime, the model will degrade and the investment will be wasted.
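One piece of that ongoing commitment can be automated cheaply. The sketch below shows one possible feedback loop: compare a recent window of a feature against its training-time baseline and flag a shift. The numbers and the z-score cutoff are illustrative assumptions, not a recommendation.

```python
import statistics

def drift_alert(baseline: list, recent: list, z_cutoff: float = 3.0) -> bool:
    """Flag drift when the recent mean sits far outside the baseline spread."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.mean(recent) != mu  # degenerate baseline: any shift counts
    z = abs(statistics.mean(recent) - mu) / sigma
    return z > z_cutoff

training = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
print(drift_alert(training, [10.0, 10.1, 9.9]))   # stable window: False
print(drift_alert(training, [14.7, 15.2, 14.9]))  # shifted window: True
```

A check like this won’t catch every kind of drift, but it would have caught the two-months-of-stale-data failure above in days instead.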

Question 4: What happens when it’s wrong?

Every model is wrong sometimes. The question is what “wrong” costs.

For a product recommendation engine, wrong means a user sees a suggestion they’re not interested in. The cost is negligible. For a medical triage system, wrong means someone with a serious condition gets deprioritized. The cost is potentially catastrophic.

Most business applications fall somewhere in between, and the failure mode analysis is where we spend the most time before committing to an approach. What’s the blast radius of a bad prediction? Is there a human in the loop? How quickly can a bad output be caught and corrected? What’s the reputational cost if this fails publicly?

The answers to these questions shape the entire architecture. High-stakes predictions need confidence thresholds, fallback paths, human review queues, and monitoring that goes well beyond accuracy metrics. Low-stakes predictions can ship with simpler guardrails and iterate based on user feedback.

If you haven’t mapped your failure modes before you start building, you’re going to discover them in production. That’s a significantly more expensive way to learn.

Question 5: Who operates this after it ships?

This is the one that kills the most AI projects and gets the least attention.

A model in production is not a feature you build and forget. It requires monitoring for drift, retraining when performance degrades, debugging when outputs don’t make sense, and someone who understands the system well enough to make those calls. That person needs to exist on your team, not on a consultant’s team.

We build AI systems for clients. We also make sure there’s a clear handoff plan before we start. Who on your team will own this? Do they have the skills? If not, what’s the training plan? What monitoring will be in place? What’s the runbook for when the model’s performance drops below the acceptable threshold?

If the answer to “who operates this” is “we’ll figure that out later,” the project will ship and slowly die. We’ve seen it happen enough times that we now treat operational ownership as a prerequisite, not a follow-up.

The meta-point

None of these questions require AI expertise to ask. They require discipline: the willingness to slow down before speeding up, to validate assumptions before investing in solutions, and to be honest about whether the problem actually warrants the complexity.

The best AI implementations we’ve been part of weren’t the most technically impressive. They were the ones where the team spent more time on the questions than on the code, and where the decision to build was made after, not before, the problem was fully understood.

That’s not a sales pitch. That’s just how good engineering works, regardless of whether AI is involved.


If you’re evaluating whether AI is the right fit for a specific problem, we’re happy to talk through it. No pitch, no commitment, just a second opinion from people who’ve made the call both ways. Reach out.