Principles
- Real queries, real responses. Every check calls the actual API of ChatGPT, Claude, Perplexity, or Gemini — no synthetic prompts, no scraped training data.
- Queries come from customers. We don't invent keywords for our reports. Queries are the same ones Aeonic customers track against their own brands.
- Aggregate only. Individual checks are private to the customer who owns them. Public endpoints expose only aggregates; per-site identifiers are never surfaced.
- Pre-registered methodology. This page is the spec. When we update the pipeline we version this doc and note the change.
1. Query generation
Queries come from three sources, in priority order:
- Customer-tracked keywords. Every site adds a keyword set (either manually or via our keyword intelligence pipeline from GSC data). Each keyword is expanded into ~3–5 natural-language variations — the phrasings a human would actually type into a chatbot.
- Topic cluster expansion. Clusters found by our NLP clustering pipeline generate comparison and recommendation queries ("best X for Y", "alternatives to Z").
- Competitor gap queries. When a competitor is cited and the customer isn't, we re-run the query against all four engines for a fair comparison.
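To make the expansion step above concrete, here is a minimal sketch of how one tracked keyword might be turned into ~3–5 chatbot-style phrasings. The templates and function name are illustrative assumptions; the production pipeline generates variations from customer keyword data rather than a fixed list.

```python
# Hypothetical sketch of keyword expansion: turning a tracked keyword into
# the natural-language phrasings a person would actually type into a chatbot.
def expand_keyword(keyword: str, n: int = 4) -> list[str]:
    templates = [
        "what is the best {kw}?",
        "can you recommend a {kw}?",
        "which {kw} should I use and why?",
        "what are good alternatives for {kw}?",
        "compare the top options for {kw}",
    ]
    return [t.format(kw=keyword) for t in templates[:n]]

print(expand_keyword("project management tool for agencies"))
```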
2. Engine calls
Each query is sent to each configured engine through its official API with a fixed, versioned prompt template. We currently measure:
- ChatGPT: OpenAI API · gpt-4 class model, browsing enabled where supported
- Claude: Anthropic API · Claude 3.5 / 4 class model
- Perplexity: Perplexity API · sonar-class model with live web retrieval
- Gemini: Google AI Studio API · latest Gemini flagship
Each response is stored along with engine name, prompt template version, cost, and raw text. Retries are bounded and idempotent: if a call fails, the check is marked failed rather than retried silently.
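A minimal sketch of what a stored check and a bounded retry could look like. The fields follow the description above (engine name, prompt template version, cost, raw text); the dataclass, the retry limit, and the call_engine helper are illustrative assumptions, not the production implementation.

```python
import time
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumption: a small bounded retry budget, then mark the check failed


@dataclass
class CitationCheck:
    engine: str                   # "chatgpt" | "claude" | "perplexity" | "gemini"
    prompt_template_version: str  # fixed, versioned prompt template
    query: str
    raw_response: str | None
    cost_usd: float
    status: str                   # "completed" | "failed"


def run_check(engine: str, query: str, call_engine) -> CitationCheck:
    # call_engine is a stand-in for the per-engine API client.
    for attempt in range(MAX_ATTEMPTS):
        try:
            text, cost = call_engine(engine, query)
            return CitationCheck(engine, "v3", query, text, cost, "completed")
        except Exception:
            time.sleep(2 ** attempt)  # simple backoff between bounded retries
    # retries exhausted: record an explicit failure instead of retrying silently
    return CitationCheck(engine, "v3", query, None, 0.0, "failed")
```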
3. Brand detection
For every stored response, we ask two questions: is the brand mentioned? and how is it mentioned?
- Exact-name match. First pass: does the brand name or any registered alias appear in the response, case insensitive, with word-boundary matching? Aliases are configured per site to catch common misspellings and alternate brand forms.
- URL match. If the response contains a link to any page on the customer's domain, that's a higher-confidence signal than an unlinked mention.
- Confidence scoring. Each detection gets a citation_confidence between 0 and 1. Factors: exact vs. fuzzy match, URL vs. unlinked, unambiguous vs. generic term, and surrounding sentiment.
- Position. Where in the response does the mention occur? We normalize to a 0–1 scale (0 = top, 1 = bottom). Top-of-response citations carry more weight in the aggregate reports because they're more likely to be read.
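A minimal sketch of the detection pass under the assumptions above: case-insensitive word-boundary matching over the brand name and its aliases, a domain check as the URL signal, and a normalized 0–1 position. The confidence weights are placeholders, not the production scoring.

```python
import re


def detect_brand(response: str, brand: str, aliases: list[str], domain: str) -> dict:
    text = response.lower()
    names = [brand] + aliases

    # earliest name-or-alias hit, word-boundary and case-insensitive
    match = None
    for name in names:
        m = re.search(rf"\b{re.escape(name.lower())}\b", text)
        if m and (match is None or m.start() < match.start()):
            match = m

    url_cited = domain.lower() in text  # crude stand-in for a proper link parse
    if match is None and not url_cited:
        return {"mentioned": False}

    # 0 = top of response, 1 = bottom
    position = round(match.start() / max(len(text), 1), 3) if match else 0.0
    # placeholder weighting: URL-backed detections score higher than unlinked ones
    confidence = 0.9 if url_cited else 0.6
    return {
        "mentioned": True,
        "position": position,
        "citation_confidence": confidence,
    }
```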
4. Mention classification
Mentions are classified into five discrete types, stored on the check row as mention_type:
- recommended: The engine explicitly recommends the brand as the answer. Highest-value citation.
- mentioned: The brand is listed among options but not explicitly recommended.
- compared: The brand is named in a comparison against a peer (still valuable; you're in the consideration set).
- negative: The brand is mentioned negatively (lost customers, poor reviews, limitations). Still captured, and still counts against you.
- absent: The brand is not mentioned. The default state for every check that doesn't match a detector.
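A minimal sketch of the five mention_type values and one possible precedence for picking a single label per check. The input flags and the ordering are illustrative assumptions; the classifier itself is described above only in terms of its outputs.

```python
from enum import Enum


class MentionType(str, Enum):
    RECOMMENDED = "recommended"  # explicitly recommended as the answer
    MENTIONED = "mentioned"      # listed among options
    COMPARED = "compared"        # named in a comparison against a peer
    NEGATIVE = "negative"        # mentioned negatively
    ABSENT = "absent"            # not mentioned at all


def classify_mention(found: bool, recommended: bool, compared: bool, negative: bool) -> MentionType:
    # illustrative precedence: absence first, then the strongest remaining signal wins
    if not found:
        return MentionType.ABSENT
    if negative:
        return MentionType.NEGATIVE
    if recommended:
        return MentionType.RECOMMENDED
    if compared:
        return MentionType.COMPARED
    return MentionType.MENTIONED
```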
5. Source type
A citation can come from two very different places, and they're worth different amounts:
- url_cited (live retrieval): The engine fetched the page at query time (Perplexity, ChatGPT with browsing, Gemini with search grounding). This is the highest-quality citation: it proves your content is being ranked by the engine's retrieval layer. Optimize structure, schema, and llms.txt to win more of these.
- brand_memory (training recall): The engine mentioned the brand from its training data, without fetching a URL. This is still valuable (it means your brand is in the model's latent knowledge), but it's a lagging indicator: influenced by long-term mentions across the web, not by anything you ship this week.
- not_found: No mention of the brand in the response. Counted in the denominator of the citation-rate metric.
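Under the same assumptions as the detection sketch above, source_type could be derived roughly as follows: a domain link maps to url_cited, an unlinked name match to brand_memory, and anything else to not_found. Function and parameter names are illustrative.

```python
def classify_source(response: str, brand_found: bool, domain: str) -> str:
    text = response.lower()
    if domain.lower() in text:
        return "url_cited"     # live retrieval: the engine surfaced a page on the domain
    if brand_found:
        return "brand_memory"  # training recall: named without a fetched URL
    return "not_found"         # counted in the citation-rate denominator
```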
6. Aggregation and privacy
Raw checks are stored per-site in ai_citation_checks and rolled up daily into citation_trends. Public endpoints (/api/public/state-of-ai, /api/public/proof-study) only ever read aggregated columns:
- Per-engine counts, citation rates, avg confidence, avg position
- Distribution over mention types and source types
- Distinct cited-site counts (no site ids, just counts)
Per-engine stats in public reports require a minimum of 20 checks before the engine appears. The proof correlation study requires at least 6 paired sites before publishing a number. These thresholds are deliberately conservative: a single dramatic case study is cheap; a real population correlation requires a real sample size.
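A minimal sketch of the threshold rule applied before an engine appears in a public report. The column names mirror the aggregates listed above (per-engine counts, citation rates, average confidence and position); the 20-check minimum is the figure stated here, everything else is illustrative.

```python
MIN_CHECKS_PER_ENGINE = 20  # engines below this sample size are withheld from public reports


def publishable_engines(daily_rollup: list[dict]) -> list[dict]:
    """daily_rollup rows are aggregates only: counts and averages, no per-site identifiers."""
    return [
        {
            "engine": row["engine"],
            "checks": row["checks"],
            "citation_rate": row["cited"] / row["checks"],
            "avg_confidence": row["avg_confidence"],
            "avg_position": row["avg_position"],
        }
        for row in daily_rollup
        if row["checks"] >= MIN_CHECKS_PER_ENGINE
    ]
```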
What we don't do
- We don't scrape training data. Every check is a live API call.
- We don't normalize, smooth, or weight numbers before publishing. If a bucket is noisy, it's noisy.
- We don't use sentiment analysis for anything except the negative mention type. Sentiment models are opinionated; we'd rather publish raw counts and let readers draw conclusions.
- We don't change methodology quietly. Updates are versioned here, and reports generated under the old methodology stay available.
Replicate our numbers
You can hit the public aggregation endpoints directly:
curl https://aeonic.pro/api/public/state-of-ai
curl https://aeonic.pro/api/public/proof-study
Both responses include generatedAt, full sample sizes, and a methodology string. For custom cuts (by industry, region, time window) reach out to press@aeonic.pro.
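If you prefer a script to curl, here is a minimal Python equivalent. The generatedAt and methodology fields are the ones named in the response description above; any other keys you inspect will depend on the endpoint's actual payload.

```python
import json
import urllib.request

for slug in ("state-of-ai", "proof-study"):
    with urllib.request.urlopen(f"https://aeonic.pro/api/public/{slug}") as resp:
        data = json.load(resp)
    # generatedAt and methodology are documented above; print them as a quick sanity check
    print(slug, data.get("generatedAt"), data.get("methodology"))
```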
See also
- How AI-Readiness scoring works — the 13 factors behind the score used in the proof study
- State of AI Citations — the live aggregate report built on this pipeline
- Proof study — score delta vs. citation-rate delta correlation