Bottom Line

Quick-reference glossary of 60+ AI visibility terms organized by category.

Reference Article 13 of 14

Reputation Glossary

A plain-language reference for every key term used across the AI Advisory Learn series.

This glossary defines the terms you will encounter throughout the AI Advisory Learn series. Each definition is written for non-technical readers and includes context about why the term matters for your brand's AI visibility. Terms are organized by category, and where relevant, each entry points to the article in this series where the concept is discussed in detail.

Where this fits: This is a reference article in the AI Advisory Learn series. You can use it as a standalone resource or as a companion to any other article in the series. If you encounter an unfamiliar term while reading, check here for a clear explanation.

How to Use This Glossary

Use the category links below to jump to a specific section, or press Ctrl+F (Cmd+F on Mac) to search for any term. Each definition includes plain-language context and links back to the full article where the concept is covered in depth. If you are reading another article in the series and encounter an unfamiliar term, bookmark this page as your quick-reference companion.

1. AI and Language Model Fundamentals

Category at a Glance

  • LLMs and Generative AI — The engines behind AI-generated responses your customers see
  • Training Data and Knowledge Cutoffs — Why some brands appear in AI answers and others do not
  • Hallucination and Grounding — How AI engines verify facts and why source quality matters
  • Fine-Tuning and Inference — The processes that shape what AI says about your brand in real time

Large Language Model (LLM)

A large language model is an AI system trained on enormous amounts of text data to understand and generate human-like language. LLMs work by predicting what word or phrase comes next based on patterns learned from their training data. They do not "know" things the way humans do — they produce responses based on statistical patterns in the text they were trained on. Examples include OpenAI's GPT series (which powers ChatGPT), Google's Gemini, and Anthropic's Claude. These models are what power the AI engines discussed throughout this series.

Discussed in: How AI Engines Select Sources

Generative AI

Generative AI refers to AI systems that can create new content — text, images, video, audio, and code — based on patterns learned from training data. Unlike traditional software that follows rules to process existing data, generative AI produces original output in response to user prompts. When someone asks ChatGPT a question and it writes a multi-paragraph response, that response is generated on the spot — it is not retrieved from a database of pre-written answers.

Natural Language Processing (NLP)

Natural Language Processing is the branch of AI focused on enabling computers to understand, interpret, and generate human language. NLP powers the technology behind translation services, sentiment analysis, chatbots, and search engines. When an AI engine reads a review of your product and determines whether the reviewer's experience was positive or negative, it is using NLP to interpret the meaning of the text.

Discussed in: Review Platforms & Ratings

Training Data

Training data is the collection of text, documents, and web content that an AI model learns from during its development. Think of it as the textbook that the AI studied before being deployed. For large language models, training data typically includes billions of pages of web content, books, articles, and other text sources. The quality and breadth of training data directly affects what the model "knows" and how it responds. If your brand's information was not well-represented in the training data, the model may not know about you or may have incomplete information.

Discussed in: How AI Engines Select Sources

Knowledge Cutoff

A knowledge cutoff is the date beyond which an AI model's training data does not extend. For example, if a model has a knowledge cutoff of June 2024, it was trained on text published up to that date. Events, products, or companies that appeared only after this date will not be in the model's training data. However, many AI engines now supplement their training data with real-time web search (as ChatGPT does with its browse mode), which means a knowledge cutoff does not necessarily prevent your brand from being cited.

Discussed in: How AI Engines Select Sources

Tokens and Tokenization

Tokens are the basic units that AI models use to process text. A token might be a whole word, part of a word, or a punctuation mark. When an AI model reads your content, it breaks the text into tokens and processes them. The concept matters for AI visibility because AI models have a limited number of tokens they can process at once (called a context window). Content that is clearly structured and efficiently organized is easier for AI models to process and cite.
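
As a rough illustration, tokenization can be sketched in a few lines of Python. This is a simplification: real LLM tokenizers (BPE, WordPiece) learn subword vocabularies from data, so a single word may split into several tokens, but the idea of counting tokens against a fixed context window is the same.

```python
import re

def toy_tokenize(text):
    # Illustrative only: splits on words and punctuation. Production
    # tokenizers learn subword units, so counts will differ.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("AI visibility matters.")
print(tokens)  # ['AI', 'visibility', 'matters', '.']

# A context window caps how many tokens the model can process at once.
CONTEXT_WINDOW = 8  # hypothetical limit for this sketch
fits = len(tokens) <= CONTEXT_WINDOW
```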

Hallucination

In AI terminology, a hallucination occurs when an AI model generates information that sounds plausible but is factually incorrect. This can include citing sources that do not exist, stating incorrect statistics, or confidently presenting false information as true. Hallucinations happen because AI models predict text based on patterns rather than verifying facts against reality. This is one reason why AI engines increasingly rely on real-time retrieval from trusted sources — it reduces the risk of hallucinating by grounding responses in actual, current content.

Discussed in: How AI Engines Select Sources

Grounding

Grounding is the process of anchoring an AI model's responses to verifiable information from external sources. Instead of generating answers purely from its training data (which may be outdated or incomplete), a grounded AI system retrieves current information from the web or a database and uses that information to construct its response. Grounding dramatically reduces hallucinations and is the mechanism behind features like ChatGPT's browse mode and Perplexity's real-time search.

Discussed in: How AI Engines Select Sources

Fine-Tuning

Fine-tuning is the process of taking a pre-trained AI model and training it further on a smaller, specialized dataset to improve its performance for a specific task or domain. For example, a general-purpose language model might be fine-tuned on medical literature to make it better at answering healthcare questions. Fine-tuning affects AI visibility because it means different AI engines may have different strengths and knowledge areas depending on how they were fine-tuned.

Inference

Inference is the process of an AI model generating a response to a prompt. When you type a question into ChatGPT and it produces an answer, that generation process is called inference. Unlike training (which happens once and takes weeks or months), inference happens in real time every time someone uses the model.

2. AI Search and Retrieval

Commonly Confused Terms

  • GEO vs. AEO: These terms are often used interchangeably, but GEO (Generative Engine Optimization) is the broader practice of optimizing for all AI-generated responses, while AEO (Answer Engine Optimization) focuses specifically on structuring content for direct-answer formats.
  • AI Citation vs. Brand Mention: A citation is a specific source reference with a link in an AI response; a brand mention is any reference to your brand name in online content, whether or not a link is included. Both matter, but they are measured differently.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation is a technique that combines an AI model's language generation capabilities with real-time information retrieval from external sources. When you ask Perplexity a question, it searches the web for relevant pages, reads them, and then generates a response based on what it found. This is RAG in action. RAG is the reason that maintaining fresh, well-structured, and accessible content matters — RAG-powered engines actively search for and read your content when generating responses.

Discussed in: How AI Engines Select Sources
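
The retrieve-then-generate loop can be sketched with a toy keyword retriever. Real systems use semantic search over embeddings and an actual language model for the generation step; the documents, brand names, and `generate` function below are stand-ins for illustration.

```python
def retrieve(query, documents, top_k=2):
    # Stand-in for semantic retrieval: score documents by word overlap
    # with the query and keep the best matches.
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(query, sources):
    # Stand-in for the LLM step: a real system conditions its answer
    # on the retrieved text rather than on training data alone.
    return f"Answer to '{query}' grounded in {len(sources)} source(s)."

docs = [
    "Acme CRM pricing starts at $29 per user.",
    "Weather in Paris is mild in spring.",
    "Acme CRM offers a free trial for teams.",
]
sources = retrieve("Acme CRM pricing", docs)
answer = generate("Acme CRM pricing", sources)
```

This is why accessible, well-structured pages matter: a RAG engine can only ground its answer in content it can retrieve and read.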

AI Overview (Google)

An AI Overview is a Google Search feature that displays an AI-generated summary at the top of search results. Instead of just showing a list of links, Google generates a synthesized answer to the query and displays it prominently. AI Overviews now appear in over 60% of all Google searches, and 85% of the citations they include come from content published within the last two years.

Discussed in: Core Ranking Signals Explained, Freshness & Update Strategy

Answer Engine

An answer engine is an AI-powered system that directly answers user questions rather than returning a list of links for the user to browse. Perplexity is the clearest example of a pure answer engine. ChatGPT, Google AI Overviews, and Gemini also function as answer engines. The distinction from traditional search engines is critical: in a traditional search engine, your brand can appear as one of ten links. In an answer engine, your brand is either mentioned in the response or it is absent entirely.

Generative Engine Optimization (GEO)

Generative Engine Optimization is the practice of optimizing your content, website, and brand presence to appear in AI-generated responses. GEO is similar in spirit to SEO (search engine optimization) but targets AI engines instead of traditional search engines. The key difference is that traditional SEO focuses on ranking your page in a list, while GEO focuses on getting your brand mentioned, cited, or recommended in an AI-written response. A study published at the KDD 2024 conference showed that GEO techniques can boost visibility in generative engine responses by up to 40%.

Answer Engine Optimization (AEO)

Answer Engine Optimization is closely related to GEO and is sometimes used interchangeably. AEO specifically focuses on structuring your content to provide clear, direct answers that AI engines can easily extract and cite. This includes using question-and-answer formats, leading with direct answers before providing detail, and formatting content in digestible passages.

Discussed in: Content That AI Trusts

AI Citation

An AI citation occurs when an AI engine references a specific source in its response. For example, when Perplexity answers a question and includes a numbered reference to your website, that is an AI citation. AI citations are the primary metric for measuring AI visibility — they indicate that an AI engine considered your content trustworthy and relevant enough to reference.

Citation Rate

Citation rate measures how frequently a brand or domain is cited in AI-generated responses for queries relevant to its industry. A higher citation rate means the brand appears more often when potential customers ask AI engines questions about the brand's category. Citation rates can be tracked using specialized AI visibility monitoring tools.

Citation Drift

Citation drift refers to the natural fluctuation in which sources AI engines cite over time. Research shows 40-60% monthly drift in AI citation patterns, meaning the sources cited for the same query can change substantially from month to month. This is why ongoing optimization — rather than a one-time effort — is necessary for maintaining AI visibility.

Discussed in: Freshness & Update Strategy

Source Consensus

Source consensus occurs when multiple independent sources agree on the same information about a brand, product, or topic. AI engines treat information with high source consensus as more trustworthy than information that appears in only one place. If five different review sites, three industry publications, and your Wikipedia article all describe your company consistently, AI engines are more confident in presenting that information.

Discussed in: Core Ranking Signals Explained, Third-Party Validation

Vector Embedding

A vector embedding is a mathematical representation of a word, phrase, or piece of content as a series of numbers that captures its meaning. AI models use vector embeddings to understand how similar different pieces of content are to each other and to user queries. When an AI engine evaluates whether your content is relevant to a question, it compares the vector embeddings of the question and your content. Content that is semantically close to the question — covering the same concepts in depth — will be more likely to be selected.

Discussed in: Query Intent & Brand Matching
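
Similarity between embeddings is typically measured with cosine similarity. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (similar meaning);
    # values near 0 mean the content is unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.0]  # hypothetical embedding of the user's question
on_topic  = [0.8, 0.2, 0.1]  # page covering the same concepts
off_topic = [0.0, 0.1, 0.9]  # page about something unrelated
```

Content whose embedding sits close to the query's embedding, as `on_topic` does here, is the content an engine is more likely to select.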

3. Brand and Reputation

Category at a Glance

  • E-E-A-T — The quality framework that determines whether AI engines trust your content enough to cite it
  • Entities and Disambiguation — How AI engines identify your brand and distinguish it from others with similar names
  • Sentiment and Share of Voice — The metrics that reveal how AI engines perceive and position your brand versus competitors

E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)

E-E-A-T is Google's framework for evaluating content quality. It stands for Experience (has the creator used or experienced what they are writing about?), Expertise (does the creator have knowledge or credentials in the field?), Authoritativeness (is the creator or website recognized as a go-to source?), and Trustworthiness (is the content accurate, honest, and safe?). While E-E-A-T was originally a Google concept, all AI engines implicitly evaluate similar factors when deciding which sources to cite. Research shows that 96% of AI Overview citations come from sources with strong E-E-A-T signals.

Discussed in: Core Ranking Signals Explained, Content That AI Trusts

Brand Mention

A brand mention is any reference to your brand name in online content — whether or not it includes a hyperlink to your website. Recent research has found that brand mentions correlate three times more strongly with AI visibility than backlinks (0.664 vs 0.218 correlation), making them one of the most important factors for AI citation rates.

Discussed in: Backlink Authority Building

Entity

In the context of knowledge graphs and AI, an entity is a distinct, identifiable thing — a person, company, product, place, or concept — that can be referenced and described with structured data. When AI engines try to understand "which Mercury?" in a query, they are performing entity disambiguation — determining which entity the user is referring to. Having your brand established as a recognized entity in knowledge graphs like Wikidata and Google's Knowledge Graph makes it much easier for AI engines to understand and reference your brand correctly.

Discussed in: Wikipedia & Knowledge Graphs, Third-Party Validation

Entity Disambiguation

Entity disambiguation is the process by which AI engines determine which specific entity a name or term refers to when the same name could mean multiple things. For example, "Apple" could refer to the technology company, the fruit, or Apple Records. AI engines use context clues and knowledge graph data to resolve these ambiguities. Maintaining clear, consistent entity information across platforms helps AI engines correctly identify your brand.

Discussed in: Wikipedia & Knowledge Graphs

Named Entity Recognition (NER)

Named Entity Recognition is the NLP technique that AI engines use to identify and categorize important proper nouns in text — company names, product names, person names, locations, and other specific entities. When an AI engine processes a web page about your industry, NER helps it identify which brands and products are being discussed. Research has found that pages with 15 or more recognized entities show 4.8 times higher selection probability in AI citations.

Digital Footprint

A digital footprint is the totality of your brand's presence across the internet — your website, social media profiles, directory listings, review platform entries, news mentions, forum discussions, and every other place where your brand appears online. AI engines build their understanding of your brand by aggregating information from your entire digital footprint. The broader and more consistent your footprint, the more confident AI engines are in citing you.

Online Reputation Management (ORM)

Online Reputation Management is the practice of monitoring and influencing how your brand is perceived across the internet. In the AI era, ORM extends beyond traditional search results to include how AI engines describe and recommend your brand. Effective ORM now requires monitoring AI-generated responses, maintaining consistent information across platforms, and building the signals (reviews, mentions, content quality) that influence AI citation behavior.

Sentiment Analysis

Sentiment analysis is the automated process of determining whether text expresses a positive, negative, or neutral opinion. AI engines use sentiment analysis when processing reviews, forum discussions, and news coverage about your brand. Strong positive sentiment across multiple sources strengthens AI confidence in recommending your brand, while negative sentiment may cause AI engines to either avoid mentioning you or to include caveats.

Discussed in: Review Platforms & Ratings
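
A minimal lexicon-based scorer shows the principle. Production systems use trained models and handle negation, sarcasm, and context; the word lists here are invented for the sketch.

```python
POSITIVE = {"great", "excellent", "love", "reliable", "helpful"}
NEGATIVE = {"poor", "slow", "broken", "disappointing", "buggy"}

def sentiment(text):
    # Toy approach: count positive words minus negative words.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```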

Share of Voice (AI Context)

Share of voice in the AI context measures how often your brand is mentioned or cited in AI-generated responses compared to your competitors for the same set of queries. If your category has five major competitors and your brand appears in 30% of relevant AI responses, your share of voice is 30%. Tracking share of voice over time reveals whether your AI visibility efforts are improving your competitive position.
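
Computed over a sampled set of AI responses, share of voice reduces to a simple ratio. The brand names and responses below are hypothetical.

```python
def share_of_voice(responses, brand):
    # Fraction of sampled AI responses that mention the brand at all.
    mentions = sum(brand.lower() in r.lower() for r in responses)
    return mentions / len(responses)

sampled = [
    "Top CRMs include Acme and Globex.",
    "Globex leads the market.",
    "Acme offers the best trial.",
    "Consider Initech for enterprise.",
]
print(share_of_voice(sampled, "Acme"))  # 0.5
```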

4. Technical SEO and Structured Data

Category at a Glance

  • Schema Markup and JSON-LD — The code-level labels that make your content machine-readable for AI crawlers
  • Knowledge Graphs and Wikidata — The structured databases AI engines query to verify your brand identity
  • robots.txt and AI Crawlers — The access controls that determine whether AI engines can read your site at all

Schema.org / Schema Markup

Schema.org is a standardized vocabulary of structured data markup that helps AI engines and search engines understand the content on web pages. By adding Schema.org markup to your pages, you explicitly tell AI engines what your content is about — for example, that a page describes an Organization with a specific name, address, and founding date, or that a page contains an FAQ with specific questions and answers. Pages with Schema.org markup are 36% more likely to appear in AI-generated summaries.

Discussed in: Technical Optimization for AI

JSON-LD

JSON-LD (JavaScript Object Notation for Linked Data) is the format recommended by Google and other platforms for implementing Schema.org markup on web pages. It is added as a script block in the HTML of your page and does not affect how the page looks to visitors — it only provides machine-readable data for AI engines and search crawlers. JSON-LD is the most widely supported and recommended format for structured data.

Discussed in: Technical Optimization for AI
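
A minimal Organization snippet, generated here with Python for illustration. The organization details are placeholders; in practice the script block is pasted into your page's HTML, where it stays invisible to visitors.

```python
import json

# Placeholder details: swap in your own organization's values.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Software",
    "url": "https://www.example.com",
    "foundingDate": "2015-04-01",
}

# Wrap the JSON-LD payload in the script tag that goes in your HTML.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(org, indent=2)
    + "\n</script>"
)
```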

Knowledge Graph

A knowledge graph is a database that stores information about entities (people, companies, products, places) and the relationships between them in a structured, machine-readable format. Google's Knowledge Graph, for example, contains billions of entries and powers Knowledge Panels in search results, AI Overviews, and other features. Being represented accurately in knowledge graphs is fundamental to AI visibility because AI engines query these graphs when generating responses.

Discussed in: Wikipedia & Knowledge Graphs

Wikidata

Wikidata is a free, open knowledge base that serves as the structured data backbone for Wikipedia and is widely used by AI engines for entity verification. Wikidata contains over 112 million entries, each identified by a unique QID (for example, Q95 for Google). Creating a Wikidata entry for your brand establishes it as a recognized entity and provides a structured, machine-readable identity that AI engines can reference directly.

Discussed in: Wikipedia & Knowledge Graphs

Structured Data

Structured data is information organized in a standardized, machine-readable format. Unlike the freeform text on a web page (which requires AI to interpret meaning), structured data explicitly labels what each piece of information represents. For example, structured data can explicitly state that "John Smith" is the "author" of a page, that "$49.99" is the "price" of a product, or that "4.7 out of 5" is the "aggregate rating" based on "523 reviews." AI engines can process structured data far more accurately than unstructured text.

Canonical URL

A canonical URL is the preferred version of a web page when the same content is accessible at multiple addresses. Setting a canonical URL tells search engines and AI crawlers which version of the page should be treated as the original. This prevents duplicate content issues that could dilute your page's authority signals.

robots.txt

The robots.txt file is a text file at the root of your website that tells web crawlers which pages they are allowed and not allowed to access. For AI visibility, your robots.txt file must allow access to AI-specific crawlers: GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), and Google-Extended (Google AI). If your robots.txt blocks these crawlers, AI engines cannot read your content and cannot cite you.

Discussed in: Technical Optimization for AI
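
The policy can be checked programmatically with Python's standard-library robots.txt parser. The rules below are an example that admits AI crawlers while blocking a hypothetical unwanted bot.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: allow the major AI crawlers, block a
# hypothetical scraper from the whole site.
rules = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: BadScraper
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://example.com/pricing"))      # True
print(parser.can_fetch("BadScraper", "https://example.com/pricing"))  # False
```

Running the same check against your live robots.txt (via `parser.set_url` and `parser.read`) is a quick way to confirm you have not accidentally locked AI engines out.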

AI Crawler / AI Bot

An AI crawler (or AI bot) is an automated program that AI companies use to visit and read web pages. Unlike traditional search engine crawlers that index pages for search results, AI crawlers collect content that may be used for AI training data or real-time retrieval. The major AI crawlers include GPTBot and OAI-SearchBot (OpenAI), PerplexityBot (Perplexity), ClaudeBot (Anthropic), and Google-Extended (Google). Ensuring these crawlers can access your content is a prerequisite for AI visibility.

Discussed in: Technical Optimization for AI

Crawl Budget

Crawl budget refers to the number of pages that a web crawler will visit on your site within a given time period. AI crawlers, like search engine crawlers, allocate limited resources to each website. Sites with fast load times, clean architecture, and proper sitemaps make it easier for crawlers to use their budget efficiently, ensuring your most important pages are crawled regularly.

Discussed in: Technical Optimization for AI, Freshness & Update Strategy

Indexing

Indexing is the process by which a search engine or AI system stores and organizes the content it has crawled. A page must be indexed before it can appear in search results or be referenced by AI engines. You can check whether your pages are indexed using Google Search Console or Bing Webmaster Tools. Pages that are blocked by robots.txt, marked with a noindex tag, or have other technical issues may fail to be indexed.

5. Content and Strategy

Content Freshness

Content freshness refers to how recently a piece of content was published or last updated. AI engines strongly prefer fresh content — pages updated within two months are 28% more likely to be cited in AI responses than older content, and 76.4% of ChatGPT's most-cited pages were updated within the last 30 days.

Discussed in: Freshness & Update Strategy

Semantic Completeness

Semantic completeness measures how thoroughly a piece of content covers all aspects of its topic. Research has found that content scoring 8.5 out of 10 or higher on semantic completeness is 4.2 times more likely to be cited by AI engines. A page about "CRM software" that covers features, pricing, comparisons, use cases, implementation, and common questions scores higher on semantic completeness than a page that only covers features and pricing.

Discussed in: Content That AI Trusts, Core Ranking Signals Explained

Content Decay

Content decay is the gradual decline in a page's visibility and performance over time. In the AI era, content decays faster because AI engines prioritize fresh sources. A page can still rank in traditional search results while being completely absent from AI-generated responses — creating a hidden form of decay that is only visible if you track AI citations separately.

Discussed in: Freshness & Update Strategy

Query Deserves Freshness (QDF)

Query Deserves Freshness is an algorithmic principle that temporarily increases the weight given to content freshness when a topic is trending or rapidly evolving. When a spike in published articles and user searches signals that something significant is happening with a topic, AI engines shift their preference toward the most recently published content about that topic.

Discussed in: Freshness & Update Strategy

SERP (Search Engine Results Page)

A SERP is the page displayed by a search engine in response to a query. Modern SERPs include traditional organic listings, paid advertisements, featured snippets, Knowledge Panels, local results, and increasingly, AI-generated summaries (AI Overviews). Visibility on SERPs — and particularly within AI Overview sections — is a key indicator of both traditional and AI visibility.

Long-Tail Keyword

A long-tail keyword is a specific, multi-word search phrase (usually three or more words) that has lower search volume but higher specificity. For example, "best CRM for real estate agents under 10 employees" is a long-tail keyword compared to "best CRM." Long-tail queries are particularly important for AI visibility because they often align with the conversational questions people ask AI engines.

Topical Authority

Topical authority is the degree to which your website is recognized as a comprehensive, authoritative source on a particular topic. Rather than creating isolated pages about many different subjects, building topical authority means creating a deep, interconnected set of content that thoroughly covers all aspects of a topic. AI engines evaluate topical authority when deciding which sources to cite — sites with demonstrated depth in a subject area are more likely to be selected.

Internal Linking

Internal linking is the practice of linking from one page on your website to another page on your website. Strategic internal linking helps AI crawlers discover your content, understand the relationships between your pages, and evaluate your topical authority. Research has shown that adding three to five contextually relevant internal links can produce a 100-150% boost in traffic from AI search tools.

Discussed in: Backlink Authority Building

Co-Citation and Co-Occurrence

Co-citation occurs when two brands or sources are mentioned together in the same piece of content. Co-occurrence is the broader pattern of your brand appearing alongside specific topics, concepts, or other brands across the web. AI models learn from co-occurrence patterns — when your brand repeatedly appears in discussions about a specific topic, the model builds an association between your brand and that topic.

Discussed in: Backlink Authority Building

6. Platforms and Tools

Google Business Profile

Google Business Profile (formerly Google My Business) is Google's free platform for managing how your business appears in Google Search and Maps. It includes your business name, address, hours, photos, reviews, and other information. Google Business Profile is essential for local AI visibility because it feeds directly into Google AI Overviews for local and commercial queries.

NAP Consistency

NAP stands for Name, Address, and Phone number. NAP consistency means ensuring that your business's name, address, and phone number are listed identically across every online platform where they appear — directories, review sites, social profiles, and your website. Inconsistencies in NAP data reduce the confidence that AI engines have in your business information and can hurt your AI visibility.

Discussed in: Third-Party Validation
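
A toy consistency check illustrates why small variations matter. The listings and the normalization rules here are invented for the sketch; real citation tools also normalize abbreviations such as "St." versus "Street".

```python
def normalize(listing):
    # Toy normalization: lowercase, drop periods, collapse whitespace.
    return tuple(" ".join(v.lower().replace(".", "").split()) for v in listing)

listings = [
    ("Acme Software Inc.", "12 Main St.", "555-0100"),
    ("Acme Software Inc",  "12  Main St", "555-0100"),    # matches after normalization
    ("ACME Software Inc.", "12 Main Street", "555-0100"), # address variant remains
]

consistent = len({normalize(l) for l in listings}) == 1
print(consistent)  # False: "St" vs "Street" is still a mismatch
```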

Data Aggregator

A data aggregator is a service that collects your business information and distributes it to hundreds or thousands of online directories, mapping services, and platforms. Major data aggregators include Yext, BrightLocal, and Data Axle. Using a data aggregator helps ensure consistent business information across the web, which in turn supports the cross-platform consistency that AI engines evaluate.

Discussed in: Third-Party Validation

Third-Party Validation

Third-party validation is the process by which AI engines confirm your brand's credibility by checking what independent sources say about you. Rather than relying solely on your own website, AI engines cross-reference information from directories, review platforms, news outlets, and knowledge bases. Research shows that brands are 6.5 times more likely to be cited through third-party sources than through their own domains.

Discussed in: Third-Party Validation

Earned Media

Earned media is press coverage, reviews, mentions, and other publicity that you receive organically rather than paying for. Research has found that 82% of AI citations come from earned media, making it the dominant source of AI-referenced brand information. Earned media includes news articles, editorial reviews, industry analysis, and organic mentions in relevant publications.

Discussed in: Industry Publications & PR

Domain Rating / Domain Authority

Domain Rating (DR, an Ahrefs metric) and Domain Authority (DA, a Moz metric) are third-party scores that estimate a website's overall credibility based on its backlink profile. While AI engines do not use these scores directly, the underlying factors they measure — the quality and quantity of other sites linking to you — do influence AI citation behavior. Research shows a threshold effect: sites with DR above 88 receive dramatically more AI citations than those below.

Discussed in: Backlink Authority Building

7. Metrics and Measurement

Category at a Glance

  • AI Visibility — The umbrella metric that captures how present your brand is across AI-generated responses
  • CTR and Conversion Rate — The business-impact metrics that connect AI citations to revenue
  • DR Threshold and Content Health — The benchmarks that tell you when your authority is strong enough to trigger AI citation gains

AI Visibility

AI visibility is the overall measure of how present and prominent your brand is in AI-generated responses. It encompasses citation rates, share of voice, sentiment in AI responses, and the breadth of queries for which your brand appears. AI visibility is the central metric that the strategies in this entire series aim to improve.

Click-Through Rate (CTR)

Click-through rate is the percentage of people who click on a link after seeing it in search results. In the context of AI visibility, CTR also applies to the citation links that AI engines include in their responses. Research shows that pages cited in AI responses earn 35% more organic clicks than uncited competitors, demonstrating the commercial value of AI citations.

Conversion Rate

Conversion rate is the percentage of website visitors who complete a desired action — purchasing a product, filling out a form, signing up for a trial. Tracking conversion rates from AI-referred traffic helps you measure the business impact of your AI visibility efforts, not just the traffic impact.

Domain Rating (DR) Threshold

Based on research into AI citation patterns, there is a practical threshold effect where sites with DR 88-100 receive dramatically more AI citations (over 6,000 on average) than sites with lower ratings. Understanding this threshold helps you assess whether domain authority improvements are likely to meaningfully affect your AI visibility.

Discussed in: Backlink Authority Building

Content Health Score

A content health score is a composite metric that evaluates the overall condition of a piece of content by aggregating factors like traffic trend, ranking position, backlink profile, content age, and AI citation rate. Content health scores help prioritize which pages most urgently need refreshing.

Discussed in: Freshness & Update Strategy

Using This Glossary

This glossary covers the key terms used across all fourteen articles in the AI Advisory Learn series. For the full context and strategy behind any concept, follow the article links included with each definition.

If you are new to AI visibility, start with the foundational articles in the series and return here whenever you encounter an unfamiliar term.

Sources

Definitions in this glossary are drawn from the research and sources cited across the full AI Advisory Learn series. For specific citations supporting individual statistics and findings, refer to the Sources section of the relevant article.

Next in the Reference Layer

Explore practical tools to monitor and improve your AI visibility.