Principles
- Real queries, real responses. Every check calls the actual API of ChatGPT, Claude, Perplexity, or Gemini — no synthetic prompts, no scraped training data.
- Queries come from customers. We don't invent keywords for our reports. Queries are the same ones Aeonic customers track against their own brands.
- Aggregate only. Individual checks are private to the customer who owns them. Public endpoints expose only aggregates; per-site identifiers are never surfaced.
- Pre-registered methodology. This page is the spec. When we update the pipeline we version this doc and note the change.
1. Query generation
Queries come from three sources, in priority order:
- Customer-tracked keywords. Every site adds a keyword set (either manually or via our keyword intelligence pipeline from GSC data). Each keyword is expanded into ~3–5 natural-language variations — the phrasings a human would actually type into a chatbot.
- Topic cluster expansion. Clusters found by our NLP clustering pipeline generate comparison and recommendation queries ("best X for Y", "alternatives to Z").
- Competitor gap queries. When a competitor is cited and the customer isn't, we re-run the query against all four engines for a fair comparison.
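To make the expansion step above concrete, here is a minimal sketch of how one tracked keyword might be turned into ~3–5 chatbot-style phrasings. The templates and function name are illustrative assumptions; the production pipeline generates variations from customer keyword data rather than a fixed list.

```python
# Hypothetical sketch of keyword expansion: turning a tracked keyword into
# the natural-language phrasings a person would actually type into a chatbot.
def expand_keyword(keyword: str, n: int = 4) -> list[str]:
    templates = [
        "what is the best {kw}?",
        "can you recommend a {kw}?",
        "which {kw} should I use and why?",
        "what are good alternatives for {kw}?",
        "compare the top options for {kw}",
    ]
    return [t.format(kw=keyword) for t in templates[:n]]

print(expand_keyword("project management tool for agencies"))
```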
2. Engine calls
Each query is sent to each configured engine through its official API with a fixed, versioned prompt template. We currently measure:
- ChatGPT: OpenAI API · gpt-4 class model, browsing enabled where supported
- Claude: Anthropic API · Claude 3.5 / 4 class model
- Perplexity: Perplexity API · sonar-class model with live web retrieval
- Gemini: Google AI Studio API · latest Gemini flagship
Each response is stored along with engine name, prompt template version, cost, and raw text. Retries are bounded and idempotent: if a call fails, the check is marked failed rather than retried silently.
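A minimal sketch of what a stored check and a bounded retry could look like. The fields follow the description above (engine name, prompt template version, cost, raw text); the dataclass, the retry limit, and the call_engine helper are illustrative assumptions, not the production implementation.

```python
import time
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumption: a small bounded retry budget, then mark the check failed


@dataclass
class CitationCheck:
    engine: str                   # "chatgpt" | "claude" | "perplexity" | "gemini"
    prompt_template_version: str  # fixed, versioned prompt template
    query: str
    raw_response: str | None
    cost_usd: float
    status: str                   # "completed" | "failed"


def run_check(engine: str, query: str, call_engine) -> CitationCheck:
    # call_engine is a stand-in for the per-engine API client.
    for attempt in range(MAX_ATTEMPTS):
        try:
            text, cost = call_engine(engine, query)
            return CitationCheck(engine, "v3", query, text, cost, "completed")
        except Exception:
            time.sleep(2 ** attempt)  # simple backoff between bounded retries
    # retries exhausted: record an explicit failure instead of retrying silently
    return CitationCheck(engine, "v3", query, None, 0.0, "failed")
```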
3. Brand detection
For every stored response, we ask two questions: is the brand mentioned? and how is it mentioned?
- Exact-name match. First pass: does the brand name or any registered alias appear in the response, case insensitive, with word-boundary matching? Aliases are configured per site to catch common misspellings and alternate brand forms.
- URL match. If the response contains a link to any page on the customer's domain, that's a higher-confidence signal than an unlinked mention.
- Confidence scoring. Each detection gets a citation_confidence between 0 and 1. Factors: exact vs. fuzzy match, URL vs. unlinked, unambiguous vs. generic term, and surrounding sentiment.
- Position. Where in the response does the mention occur? We normalize to a 0–1 scale (0 = top, 1 = bottom). Top-of-response citations carry more weight in the aggregate reports because they're more likely to be read.
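A minimal sketch of the detection pass under the assumptions above: case-insensitive word-boundary matching over the brand name and its aliases, a domain check as the URL signal, and a normalized 0–1 position. The confidence weights are placeholders, not the production scoring.

```python
import re


def detect_brand(response: str, brand: str, aliases: list[str], domain: str) -> dict:
    text = response.lower()
    names = [brand] + aliases

    # earliest name-or-alias hit, word-boundary and case-insensitive
    match = None
    for name in names:
        m = re.search(rf"\b{re.escape(name.lower())}\b", text)
        if m and (match is None or m.start() < match.start()):
            match = m

    url_cited = domain.lower() in text  # crude stand-in for a proper link parse
    if match is None and not url_cited:
        return {"mentioned": False}

    # 0 = top of response, 1 = bottom
    position = round(match.start() / max(len(text), 1), 3) if match else 0.0
    # placeholder weighting: URL-backed detections score higher than unlinked ones
    confidence = 0.9 if url_cited else 0.6
    return {
        "mentioned": True,
        "position": position,
        "citation_confidence": confidence,
    }
```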
4. Mention classification
Mentions are classified into five discrete types, stored on the check row as mention_type:
- recommended: The engine explicitly recommends the brand as the answer. Highest-value citation.
- mentioned: The brand is listed among options but not explicitly recommended.
- compared: The brand is named in a comparison against a peer (still valuable; you're in the consideration set).
- negative: The brand is mentioned negatively (lost customers, poor reviews, limitations). Still captured, and still counts against you.
- absent: The brand is not mentioned. The default state for every check that doesn't match a detector.
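A minimal sketch of the five mention_type values and one possible precedence for picking a single label per check. The input flags and the ordering are illustrative assumptions; the classifier itself is described above only in terms of its outputs.

```python
from enum import Enum


class MentionType(str, Enum):
    RECOMMENDED = "recommended"  # explicitly recommended as the answer
    MENTIONED = "mentioned"      # listed among options
    COMPARED = "compared"        # named in a comparison against a peer
    NEGATIVE = "negative"        # mentioned negatively
    ABSENT = "absent"            # not mentioned at all


def classify_mention(found: bool, recommended: bool, compared: bool, negative: bool) -> MentionType:
    # illustrative precedence: absence first, then the strongest remaining signal wins
    if not found:
        return MentionType.ABSENT
    if negative:
        return MentionType.NEGATIVE
    if recommended:
        return MentionType.RECOMMENDED
    if compared:
        return MentionType.COMPARED
    return MentionType.MENTIONED
```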
5. Source type
A citation can come from two very different places, and they're worth different amounts:
- url_cited (live retrieval): The engine fetched the page at query time (Perplexity, ChatGPT with browsing, Gemini with search grounding). This is the highest-quality citation: it proves your content is being ranked by the engine's retrieval layer. Optimize structure, schema, and llms.txt to win more of these.
- brand_memory (training recall): The engine mentioned the brand from its training data, without fetching a URL. This is still valuable (it means your brand is in the model's latent knowledge), but it's a lagging indicator: influenced by long-term mentions across the web, not by anything you ship this week.
- not_found: No mention of the brand in the response. Counted in the denominator of the citation-rate metric.
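Under the same assumptions as the detection sketch above, source_type could be derived roughly as follows: a domain link maps to url_cited, an unlinked name match to brand_memory, and anything else to not_found. Function and parameter names are illustrative.

```python
def classify_source(response: str, brand_found: bool, domain: str) -> str:
    text = response.lower()
    if domain.lower() in text:
        return "url_cited"     # live retrieval: the engine surfaced a page on the domain
    if brand_found:
        return "brand_memory"  # training recall: named without a fetched URL
    return "not_found"         # counted in the citation-rate denominator
```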
6. Aggregation and privacy
Raw checks are stored per-site in ai_citation_checks and rolled up daily into citation_trends. Public endpoints (/api/public/state-of-ai, /api/public/proof-study) only ever read aggregated columns:
- Per-engine counts, citation rates, avg confidence, avg position
- Distribution over mention types and source types
- Distinct cited-site counts (no site ids, just counts)
Per-engine stats in public reports require a minimum of 20 checks before the engine appears. The proof correlation study requires at least 6 paired sites before publishing a number. These thresholds are deliberately conservative: a single dramatic case study is cheap; a real population correlation requires a real sample size.
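A minimal sketch of the threshold rule applied before an engine appears in a public report. The column names mirror the aggregates listed above (per-engine counts, citation rates, average confidence and position); the 20-check minimum is the figure stated here, everything else is illustrative.

```python
MIN_CHECKS_PER_ENGINE = 20  # engines below this sample size are withheld from public reports


def publishable_engines(daily_rollup: list[dict]) -> list[dict]:
    """daily_rollup rows are aggregates only: counts and averages, no per-site identifiers."""
    return [
        {
            "engine": row["engine"],
            "checks": row["checks"],
            "citation_rate": row["cited"] / row["checks"],
            "avg_confidence": row["avg_confidence"],
            "avg_position": row["avg_position"],
        }
        for row in daily_rollup
        if row["checks"] >= MIN_CHECKS_PER_ENGINE
    ]
```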
What we don't do
- We don't scrape training data. Every check is a live API call.
- We don't normalize, smooth, or weight numbers before publishing. If a bucket is noisy, it's noisy.
- We don't use sentiment analysis for anything except the negative mention type. Sentiment models are opinionated; we'd rather publish raw counts and let readers draw conclusions.
- We don't change methodology quietly. Updates are versioned here, and reports generated under the old methodology stay available.
Replicate our numbers
You can hit the public aggregation endpoints directly:
curl https://aeonic.pro/api/public/state-of-ai
curl https://aeonic.pro/api/public/proof-study
Both responses include generatedAt, full sample sizes, and a methodology string. For custom cuts (by industry, region, time window) reach out to press@aeonic.pro.
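If you prefer a script to curl, here is a minimal Python equivalent. The generatedAt and methodology fields are the ones named in the response description above; any other keys you inspect will depend on the endpoint's actual payload.

```python
import json
import urllib.request

for slug in ("state-of-ai", "proof-study"):
    with urllib.request.urlopen(f"https://aeonic.pro/api/public/{slug}") as resp:
        data = json.load(resp)
    # generatedAt and methodology are documented above; print them as a quick sanity check
    print(slug, data.get("generatedAt"), data.get("methodology"))
```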
See also
- How AI-Readiness scoring works — the 13 factors behind the score used in the proof study
- State of AI Citations — the live aggregate report built on this pipeline
- Proof study — score delta vs. citation-rate delta correlation