AI search visibility has become the new vanity metric, and most teams are measuring the wrong number. The AI visibility tools multiplying across the market count how often a model mentions you, or cites you, when someone types a prompt. That number feels like progress because it looks like the rank tracking we have done for twenty years. It is not the same thing, and the gap between what these tools count and what actually moves your business keeps widening. This is a guide to the metrics that matter in AI search, the ones that only look like they matter, and how to tell them apart, built from six No Hacks podcast conversations with people who do this for a living and the data that landed this month.
Contents
- Why AI visibility is the wrong number
- Being cited is not being recommended
- Ask once, measure noise
- What to measure instead: presence and recommendation share
- The same vanity metric we already lived through
- Start with brand accuracy
- Two blind spots: training cutoff and platform data
- Know who you are
Why AI visibility is the wrong number
The dominant way to measure AI search right now is prompt tracking, and as most teams use it, it is the wrong instrument. The pitch is familiar: a tool types a set of prompts into ChatGPT, Perplexity, and Google's AI Overviews, and reports how often your brand shows up. It looks like rank tracking, so it sells. I get more pitches every week from companies that built one and want to come on the podcast. There are far more of them than the market needs, and the attention is rushing toward the most visible, most obvious thing to count, whether or not it is the thing that matters.
Jono Alderson, a technical SEO consultant, put the objection this way when he came on the show. "We need to instead try and influence how the machine perceives us. And that's not prompt tracking, which is what everyone is doing at the moment," he said. "There is a place for that, but it's far smaller than I think." His diagnosis of why these dashboards feel off is sharper still: "it's copy paste the current modality of rank tracking into a new thing. It doesn't really fit, but it's better than nothing."
Prompt tracking also rests on a guess about user behavior that is mostly ungrounded. You invent a list of prompts you hope your customers type, then measure yourself against them. For most brands that list has little to do with what real people are actually asking, and an AI prompt is not a keyword. Ground those prompts in real search data instead, and you hit a second problem: AI is corrupting that data faster than you can read it.
I have seen that corruption first-hand. Last year I helped investigate a strange leak: real users' ChatGPT prompts were turning up inside Google Search Console, the report website owners use to track search traffic. I worked on it with the analytics consultant Jason Packer, who published the findings on Quantable, and Ars Technica and a long list of outlets covered it. We traced it to a bugged prompt box that made ChatGPT run a web search almost every time, with a ChatGPT URL prepended to the queries, which Google then tokenized. As a result, websites ranking for those terms saw strangers' private prompts in their dashboards. My own concern at the time, which Ars Technica quoted, was that the leak was contributing to "crocodile mouth" in Search Console, the pattern where impressions spike while clicks dip.
The leak was the visible version of a problem that is now everywhere and impossible to see. AI systems hit Google constantly to ground their answers, fanning a single prompt out into several parallel queries and reading results no human ever lays eyes on. Every one of those searches lands as an impression on the pages that rank for it. So when your impressions climb while your clicks do not, that is not rising human demand. A growing share of it is machines searching on someone's behalf and consuming the answer without ever clicking through. You can watch the same distortion in search-trend and keyword-volume data, where the curves climb and stop telling you how much of the demand is even human. Search Console is a noisy, leaky place to read AI behavior, and the instruments most people reach for are measuring a number that AI is busy inflating. Underneath nearly all of this is still web search, with grounding and query fan-out doing the work. Google built AI visibility reporting straight into Search Console, which tells you where it thinks this is heading, but the report shows impressions, not AI clicks. It hands you the number AI inflates and withholds the one that would let you check it.
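One rough way to watch for that divergence in your own data is to compare each reporting period with the one before it. The sketch below is a minimal illustration, not a Search Console integration: the function name `crocodile_mouth`, the tuple layout, and the sample numbers are all assumptions made up for this example. It flags periods where impressions rose while clicks fell.

```python
def crocodile_mouth(periods):
    """Flag periods where impressions rose while clicks fell.

    `periods` is a chronological list of (label, impressions, clicks)
    tuples. This layout is a hypothetical one chosen for the example.
    """
    flagged = []
    # Pair each period with the one immediately before it.
    for (_, prev_imp, prev_clk), (label, imp, clk) in zip(periods, periods[1:]):
        if imp > prev_imp and clk < prev_clk:
            flagged.append(label)
    return flagged


# Invented sample data, for illustration only.
sample = [
    ("Jan", 1000, 100),
    ("Feb", 1500, 90),   # impressions up, clicks down
    ("Mar", 1600, 95),   # both up, so not flagged
]
print(crocodile_mouth(sample))
```

A flag here only says the two curves pulled apart. It cannot tell you how much of the extra impression volume came from machines rather than people.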
Being cited is not being recommended
The single most important distinction in AI search measurement is that a citation is not a recommendation. A citation is when a model names your page as a source under its answer. A recommendation is when the model tells the user to choose you. Most tools count the first and let you assume it means the second. It does not.
Lily Ray pulled the AI Overview answers for 100 business software "best of" queries at three checkpoints, in April, May, and June 2026. When a brand's own self-promotional listicle was cited as a source, that brand was left out of the actual recommendation 69% of the time: 224 of the 323 self-promotional listicles cited. Google was reading the page and then recommending the competitors named inside it. Separately, Jeff Oxford's team at Visibility Labs tested 20,000 ChatGPT responses and found product recommendations changed in 80.2% of cases once search was switched on, with only a weak 0.4 correlation between being cited and being recommended. BrightEdge, looking across five engines, found the same shape from another angle: source overlap between engine pairs ranged from 16 to 59 percent, but the set of recommended brands stayed in a tighter 36 to 55 percent band. And Kevin Indig's analysis of 3.7 million citations found that 91% of cited URLs appear in only one engine, so even your citation footprint does not travel.
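The 69% figure is a simple ratio, and the same ratio can be computed on your own tracked answers if you record citation and recommendation as two separate fields. The sketch below is an illustration under assumptions: the `exclusion_rate` name and the `(cited, recommended)` pair layout are hypothetical, and the sample list only reconstructs the 224-of-323 count quoted above, not the underlying study data.

```python
def exclusion_rate(observations):
    """Share of cited cases in which the brand was NOT recommended.

    `observations` is a list of (cited, recommended) booleans, one per
    brand per answer. The layout is a hypothetical one for this example.
    """
    recommended_when_cited = [rec for cited, rec in observations if cited]
    if not recommended_when_cited:
        raise ValueError("no cited observations to score")
    left_out = sum(1 for rec in recommended_when_cited if not rec)
    return left_out / len(recommended_when_cited)


# Rebuild the quoted count: 224 of 323 cited listicles left out.
observations = [(True, False)] * 224 + [(True, True)] * (323 - 224)
rate = exclusion_rate(observations)
print(f"{rate:.1%}")  # 69.3%
```

Keeping the two fields separate is the point of the exercise: a single "mentioned" flag cannot produce this number.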
Alisa Scharf, the Chief AI Officer at Seer Interactive, had been arguing the same thing for longer. "Citations are an even worse metric than page one visibility," she told me, "because they don't necessarily indicate that your brand is mentioned in that response. We think of it as a leading indicator, akin to being on page two or page three of Google." She laid the levels out as a hierarchy: "There's the citation where your webpage is mentioned. There's the mention where you've got your brand in the response. But rarely is ChatGPT or Claude specifically saying, you should go with X." That last step, the recommendation, is the one that pays, and it is the one prompt-tracking scores treat as the same thing as a footnote.
Malte Landwehr, who runs product and marketing at the AI search platform Peec AI, had the cleanest illustration of how far citation and recommendation can drift apart. He described a case where a now-defunct tool became one of the most-cited sources behind ChatGPT's answers in its category. "They didn't gain visibility as a brand," he said. "But they now have power over what brands are recommended by LLMs." Being the source and being the choice are not the same measurement.
Ask once, measure noise
A single measurement of an AI answer is close to worthless, because the answer is different every time you ask. This is the part prompt-tracking dashboards paper over: they show you a number as if it were stable.
Rand Fishkin, who runs the audience-research firm SparkToro, ran the study that quantified it. "You are not getting an answer when you ask," he told me. "You are getting one of thousands or potentially millions of answers, and every time you ask it's gonna be different. Every different person who asks is gonna get a different list, a different number of items, a different order, and a different set of recommendations." How different is that? "In order to get two lists of brands that are the same in an answer, on average you would need to ask Claude or ChatGPT 1,500 times before you get two answers with the same list of brands in the same order."
That number is the whole case against single-shot measurement. It does not mean AI visibility is unmeasurable. It means it has to be measured the way you measure a poll, not the way you check a rank. Fishkin was explicit that the signal is there if you do the work: "If you ask the right number of prompts, the right number of times, with some variability, you can get a statistical number that's basically plus or minus 5%, or plus or minus 1% if you go really hard." The instrument is fine. The problem is that most tools run it once and call the result a ranking.
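To give the poll analogy some scale, here is a minimal sketch using the textbook margin-of-error formula for a proportion. It rests on assumptions that are not from the study itself: a 95% confidence level (`z = 1.96`), the worst-case share of 50%, and prompt runs that behave like independent samples. The function names are made up for this example.

```python
import math


def required_samples(margin, p=0.5, z=1.96):
    """Runs needed for an observed share to land within +/- margin.

    Assumes independent runs and a normal approximation. p=0.5 is the
    worst case; z=1.96 corresponds to roughly 95% confidence.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(hits, runs, z=1.96):
    """Approximate margin of error for an observed share of hits/runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    share = hits / runs
    return z * math.sqrt(share * (1 - share) / runs)


print(required_samples(0.05))  # 385
print(required_samples(0.01))  # 9604
```

Under those assumptions, plus or minus 5% takes a few hundred runs and plus or minus 1% takes nearly ten thousand, which is one way to read "if you go really hard". Real prompt runs are not perfectly independent, so treat these as lower bounds rather than targets.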
What to measure instead: presence and recommendation share
The number that replaces prompt tracking is presence: how often you are named across the answer space, read against whether that presence turns into a recommendation and an action.
Rand Fishkin called it the only honest version of the number. "Percent of visibility is the number that's real, that an AI tracking tool should be giving you," he said. "It's not like Google rank tracking. It's more like when brands in the 20th century used to survey consumers and they would say, have you heard of Nike shoes, have you heard of Adidas shoes." Wil Reynolds, the founder of Seer Interactive, added the instrumentation detail that almost no one tracks: not only whether you appear, but the composition of the answer over time. "If you're tracking visibility and you don't also track things like the number of words or brands mentioned per model per prompt over time," he said, "you would not know that back in November ChatGPT doubled the length of the answer." When the answer doubles, your raw visibility can rise while nothing real changed. You did not get more valuable. The same person is seeing more words.
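A minimal sketch of what tracking both numbers could look like, assuming you already store each repeated run with its answer text and the brands extracted from it. The `AnswerRun` record, the `summarize` function, and the sample answers are hypothetical names and data invented for this illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnswerRun:
    model: str
    prompt: str
    text: str
    brands: list[str]  # brands already extracted from the answer text


def summarize(runs, brand):
    """Presence share for `brand`, plus the shape of the answers themselves."""
    if not runs:
        raise ValueError("need at least one run")
    named = sum(1 for run in runs if brand in run.brands)
    return {
        "presence_share": named / len(runs),
        "avg_words": mean(len(run.text.split()) for run in runs),
        "avg_brands": mean(len(run.brands) for run in runs),
    }


# Invented sample runs, for illustration only.
runs = [
    AnswerRun("model_a", "best crm", "Try Acme or Bolt today", ["Acme", "Bolt"]),
    AnswerRun("model_a", "best crm", "Bolt is a solid pick", ["Bolt"]),
]
print(summarize(runs, "Acme"))
```

Compute the summary per model, per prompt, and per time window. If `avg_words` or `avg_brands` jumps between windows, a change in `presence_share` over the same window may reflect longer answers rather than anything you did.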
There is a harder caveat under all of this, and it is the one Reynolds was most blunt about: visibility only counts if it is tied to the outcome. "You can be visible. That's great," he said. "But somebody's gotta actually take an action for you to make any money from that visibility. If you don't track those two metrics against each other, you're the sucker."
I run these measurements on my own brand, and the result is the clearest proof I have that the work moves the metric. Right now, Google's AI Overviews recommend No Hacks as the best podcast for AI web strategy. A month ago they did not. I did not get there by climbing a prompt-tracking dashboard. I got there by changing what the systems know about my entity, which is a different exercise entirely. And the honest scoreboard has to account for which prompts you are even measuring. With Search Console you have a rough, grounded idea of the queries you rank for. With prompt tracking you start with prompts you made up and put in a list, which is not grounded in what is actually happening for most brands. Measure recommendation share if you can, but know that its quality depends entirely on whether the prompts behind it are real.
The same vanity metric we already lived through
It took the search industry the better part of two decades to accept that impressions and clicks were, on their own, vanity numbers, because in the end they did not have to mean revenue. We have been here before. AI visibility for the sake of visibility is the same trap wearing new clothes. It might be branding. It is not the metric that matters, and it is a very good vanity metric precisely because it is easy to make go up.
Wil Reynolds, whose episode was titled around this exact point, traced the loop directly. "The vanity metric early was rankings," he said, "and then people went, wait, I gotta get traffic from those rankings, and then I need that traffic to turn into a business. So to me it's just a regurgitation of what we did years ago." Jono Alderson went further and argued the precise attribution we comforted ourselves with was never real to begin with: "the crutch and the lies that we've told ourselves for the last decade, that we can neatly attribute impression share through to clicks, through to actions, through to revenue. It's never been true and it's getting less true." The job, restated, is older than any of the tooling. As he put it, two decades ago we could have decided our work was to influence how people perceive the brand.
Start with brand accuracy
The metric to build first is brand accuracy: whether the AI describes your entity correctly at all. Recommendation share comes after. If the model holds wrong facts about you, every downstream number is built on sand, because it is recommending, or refusing to recommend, a version of you that is not real.
This is where clarity starts. It means brand consistency across every platform, other people describing you the same way, and a coherent answer to the basic questions of who you are and what you do. Duane Forrester, who helped launch Schema.org and built Bing Webmaster Tools, framed the goal as being the trusted source rather than the high ranker. "Your goal should be to be seen as the canonical for whatever your question is," he said. "Not rankings, but that you are the source of knowledge." His read on why this compounds was that the machine is lazy in a useful way: "It costs money and cycles and tokens to go build trust. So if I've done all that work and I trust you, and you're a good answer, and my consumer is happy with that answer, why would I change?"
Alisa Scharf turned this into a measurement you can actually run, which she calls a brand accuracy audit. "You come up with a list of objective criteria," she said. "It can't be, we want to rank for best X for Y. It's got to be: when were you founded, where are you based, what do you sell, who do you compete against. And you take that list of queries and see, for each model, what is it consistently getting right, what is it consistently getting wrong." That is the audit. Run your own non-negotiable facts through each engine on a schedule, and score the model on accuracy, not on whether it flattered you with a mention.
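The audit she describes can be sketched as a small scoring loop. This is an illustration under stated assumptions, not her method: the `accuracy_audit` name, the data layout, and the sample facts are hypothetical, and the scoring is a naive case-insensitive substring check that a real audit would likely replace with human review or a stricter matcher.

```python
def accuracy_audit(facts, answers):
    """Score each model on how often it states each objective fact.

    facts:   {question_key: expected_value}, e.g. {"founded": "2015"}
    answers: {model: {question_key: [answer_text, ...]}} from repeated runs
    Returns  {model: {question_key: share_correct or None}}
    """
    report = {}
    for model, by_question in answers.items():
        scores = {}
        for key, expected in facts.items():
            texts = by_question.get(key, [])
            if not texts:
                scores[key] = None  # this model was never asked this question
                continue
            # Naive check: does the expected value appear in the answer?
            hits = sum(1 for t in texts if expected.casefold() in t.casefold())
            scores[key] = hits / len(texts)
        report[model] = scores
    return report


# Invented facts and answers, for illustration only.
facts = {"founded": "2015", "based": "Berlin"}
answers = {
    "model_a": {
        "founded": ["Founded in 2015.", "The company started in 2016."],
        "based": ["Based in Berlin.", "Headquartered in berlin, Germany."],
    }
}
print(accuracy_audit(facts, answers))
```

Because each fact is asked several times per model, the score separates what a model consistently gets wrong from what it gets wrong once, which is the distinction the audit is after.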
Two blind spots: training cutoff and platform data
Honest measurement has to name its blind spots, and AI search has two big ones. The first is the training-data cutoff. A meaningful share of answers come from what the model already learned before any live grounding kicks in, frozen at a date you do not control, and we do not yet have a clean way to measure whether the work we do today is moving those baked-in answers at all. You can be optimizing hard against a version of the model's knowledge that is months stale.
The second is platform data, and here the shape of the market matters. The frontier model companies have little reason to hand you usage data. There is no obvious world where OpenAI or Anthropic exposes how its model decided what to recommend. The companies that might give you something are the ones with a larger ecosystem to protect: Google is adding AI impressions to Search Console, and Microsoft surfaces a version of this through Bing Webmaster Tools, because they run both the model and a measurement surface that benefits from your attention. It is weak data, and it is something rather than nothing. Whether the pure-play model companies ever open this up is a genuine open question, and a lot of how measurable AI search becomes depends on the answer.
Know who you are
Do you know who you are, and what you want people to think about you? Are you communicating that clearly, and in enough different places, that AI systems can form an accurate picture of your entity rather than a guess? Be clear and be consistent. It sounds almost too simple to write down, and yet schema, the pages on your website, every social profile, and every place you are mentioned all have to say the same unambiguous thing about what you are, what your company is called, and who is behind it.
There is a reason this is becoming the deterministic core of the work, not the soft branding edge of it. A German court recently held Google liable for false statements its AI Overview generated about a business, on the reasoning that the AI answer is Google's own speech. Here is a thesis I cannot prove yet, and I want to be clear it is speculation: a platform that is now legally on the hook for what its AI says about you has a strong incentive to surface only the entities it is confident about. Picture a confidence threshold, some internal certainty score, where if the system is sure enough about who you are it includes you, and if it is not, it leaves you out rather than risk being wrong. The exact number is invented. The direction feels right. And if it is right, then the most important thing you can measure is not how often you appear. It is how certain the machine is that it knows you, because that certainty is about to decide whether you appear at all.

