Glossary — AI Knowledge Signal

A

Term	Direct Answer	Why it Matters
Accessibility	Whether AI crawlers, search engines, and answer systems can technically reach and read your website content.	If content is blocked, hidden, or poorly served, AI systems may never see it.
AI Crawler	A bot used by AI companies, search engines, or data providers to discover, visit, and collect web content.	Crawlers are often the first step in whether your content can enter AI search, retrieval, or training pipelines. If they can't access or interpret it, you effectively don't exist to AI systems.
AI Knowledge Signal	The set of signals your organisation sends across the web that help AI systems understand, trust, retrieve, and cite you.	Strong knowledge signals increase the chance your organisation is accurately represented in AI-generated answers.
AI Visibility	How often and how accurately your organisation appears in AI-generated responses.	Visibility is becoming as important as traditional search ranking because users increasingly receive answers without clicking through to websites.
Answer Engine	A system that gives users a direct answer rather than only a list of search results.	ChatGPT, Perplexity, Gemini, Copilot, and Google AI Overviews are shifting discovery from ranked links to generated answers.
Answer Search Optimisation	The practice of optimising content so AI answer engines can find, understand, and use it in responses.	It moves beyond traditional SEO by focusing on answer inclusion, representation, and citation — not just ranking on a search results page.
Authority	The perceived credibility, expertise, and trustworthiness of your organisation, website, or content.	AI systems are more likely to rely on sources that appear authoritative, consistent, and well-supported.

C

Term	Direct Answer	Why it Matters
Citation Probability	The likelihood that an AI system will cite, reference, or draw from your content in an answer.	Higher citation probability means greater influence, visibility, and trust in AI-mediated discovery.
Common Crawl	A large, publicly available archive of web crawl data collected from across the internet.	Many AI datasets and research pipelines have used Common Crawl as a raw input, so being present and well-structured on the open web can affect downstream AI visibility.
Conversational Search	A mode of search where users ask natural-language questions and receive synthesised answers rather than a list of keyword-matched results.	Discovery is shifting from "search and click" to "ask and receive" — meaning organisations must optimise for answer inclusion, not just search ranking. As conversational search becomes the dominant interface for knowledge retrieval, organisations whose content is structured for synthesis — rather than for click-through — gain representation advantages that traditional SEO does not address.
Corpus Survival Likelihood (coined term)	The estimated probability that a piece of content survives AI training-data quality filters rather than being removed, ignored, or down-weighted before model training.	Content can be crawled but still fail to influence AI systems if it is filtered out for quality, duplication, low authority, poor structure, or weak provenance. The AI Knowledge Signal Audit scores content against the dimensions that determine this likelihood.
Crawlability	Whether bots can technically access, navigate, and index your website.	Poor crawlability means important pages may be missed, even if the content itself is strong.

D

Term	Direct Answer	Why it Matters
Direct Answer	A clear, concise answer to a specific question, usually placed near the top of a page or section.	AI systems favour content that answers questions directly and can be extracted cleanly.

E

Term	Direct Answer	Why it Matters
Entity	A distinct person, organisation, place, product, concept, or topic that machines can identify and connect to other information.	Clear entity signals help AI systems understand who you are, what you do, and how you relate to your market.
Epistemic Presence Strategy (coined term)	The deliberate design of organisational knowledge so it remains visible, trusted, and influential inside machine-mediated knowledge systems.	Reframes content strategy around AI-era influence: not just publishing information, but ensuring that information survives, is understood, and shapes answers. Distinct from SEO (optimising for search ranking) and authority optimisation (optimising for credibility signals). Epistemic presence strategy targets accurate representation within AI training corpora and retrieval pipelines.
Epistemic Risk (coined term)	The risk that organisational knowledge changes meaning as it moves through AI systems due to summarisation, distortion, circular authority, provenance loss, or outdated source reuse.	AI systems can misstate, flatten, or recontextualise organisational knowledge, creating reputational, strategic, legal, and trust risks. This risk does not arise from malicious intent. It arises from how machine learning systems ingest, compress, and re-express knowledge during training and retrieval.
Extractability	How easily AI systems can pull useful meaning, facts, answers, entities, and relationships from your content.	If content is vague, buried, visual-only, or poorly structured, AI systems may ignore or misunderstand it.

G

Term	Direct Answer	Why it Matters
Generative Engine Optimisation (GEO)	The practice of structuring digital content and managing online presence to improve visibility in responses generated by generative AI systems, including ChatGPT, Google Gemini, Claude, and Perplexity AI. Related terms: answer engine optimisation (AEO), artificial intelligence optimisation (AIO).	Where SEO targets ranking in search result pages, GEO targets accurate representation in AI-synthesised answers. GEO is a whole-of-entity strategy: AI systems synthesise signals from your entire digital presence, not a single URL. See also: Wikipedia, "Generative engine optimization".

J

Term	Direct Answer	Why it Matters
JSON-LD	A structured data format used to describe webpages, organisations, articles, FAQs, products, and other entities in machine-readable form.	JSON-LD helps machines interpret your content more accurately and can strengthen your schema markup.
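
As a rough illustration of the JSON-LD entry above, the following Python sketch builds a small schema.org Organization description and wraps it in the script tag that pages commonly use to carry JSON-LD. The organisation name, URLs, and the helper name `to_json_ld_script` are hypothetical placeholders, not part of the AI Knowledge Signal framework.

```python
import json

# Hypothetical organisation details, used only for illustration.
organisation = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Org",
    "url": "https://www.example.com",
    "sameAs": ["https://www.linkedin.com/company/example-org"],
}


def to_json_ld_script(data: dict) -> str:
    """Serialise a dictionary and wrap it in a JSON-LD script tag."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


json_ld_tag = to_json_ld_script(organisation)
print(json_ld_tag)
```

The printed tag would normally sit in the page's HTML. Which properties are worth including depends on the entity being described, so treat the fields here as a starting point only.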

K

Term	Direct Answer	Why it Matters
Knowledge Base	A structured collection of important information, definitions, FAQs, services, evidence, and organisational knowledge.	A strong knowledge base gives AI systems a clear source of truth about your organisation.

L

Term	Direct Answer	Why it Matters
Large Language Model (LLM)	An AI model trained on large volumes of text and other data to generate, summarise, classify, and reason with language.	LLMs increasingly mediate how users discover, interpret, and trust organisational information.
llms.txt	A proposed website file that points AI systems to useful pages, policies, or machine-readable guidance about your content.	It may become a useful signal for guiding AI crawlers and answer engines to high-value content.

M

Term	Direct Answer	Why it Matters
Machine Knowledge Readiness (MKR)	The degree to which an organisation's public web presence exposes the technical, structural, and semantic signals that support AI-mediated discovery, retrieval, citation, and representation.	MKR reframes web presence around what machines can read, not just what humans can see. An organisation can rank well in traditional search yet have low MKR — invisible to the AI systems through which audiences increasingly discover and decide. Introduced in the AI Knowledge Signal Framework as the organisational-readiness counterpart to Corpus Survival Likelihood (which scores individual content).
Measurement	Tracking how your organisation appears across AI systems over time.	Without measurement, you cannot tell whether your AI visibility is improving, declining, or being captured by competitors.
Metadata	Descriptive information about a webpage, such as title, description, author, date, topic, and structured tags.	Metadata helps search engines and AI systems understand what a page is about.
Model Collapse	The progressive degradation of AI model quality that occurs when models are trained on AI-generated content rather than original human knowledge. As AI-generated text enters the web at scale, the distinction between primary and derivative knowledge erodes.	Model collapse is a structural risk for AI training pipelines: when Common Crawl fills with AI-generated content, each generation of models trains on the distorted output of the last. Organisations that publish original, epistemically rigorous content act as anchors against this degradation. See: Shumailov et al. (2024), Nature, "AI models collapse when trained on recursively generated data."

P

Term	Direct Answer	Why it Matters
Presence	The breadth, depth, and consistency of your organisation's appearance across the web.	AI systems often rely on repeated, corroborated signals from multiple sources.
Prompt Testing	Testing common user questions across AI systems to see how your organisation, competitors, and key topics appear.	It helps identify gaps, inaccuracies, missed citations, competitor visibility, and opportunities for improvement.
Provenance	Information about where content came from, who produced it, when it was created, and how it has changed.	Clear provenance helps AI systems and users assess reliability, authority, and freshness.

R

Term	Direct Answer	Why it Matters
Retrieval	The process where an AI system finds relevant information before generating an answer.	If your content is not retrievable, it is unlikely to influence AI-generated responses.
Retrieval-Augmented Generation (RAG)	An AI pattern where a model retrieves external information before generating an answer.	RAG makes source quality, structure, and retrievability central to whether your content appears in AI answers.
robots.txt	A website file that gives instructions to crawlers about what they can or cannot access.	It affects whether search, AI, and data crawlers are allowed to visit parts of your site.
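
To make the robots.txt entry above concrete, here is a minimal Python sketch that checks whether a given crawler may fetch a URL, using the standard library's `urllib.robotparser`. The rules, the crawler name `ExampleAIBot`, and the URLs are made up for illustration; real AI crawlers publish their own user-agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named bot is kept out of /private/,
# every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("ExampleAIBot", "OtherBot"):
    allowed = can_crawl(ROBOTS_TXT, agent, "https://www.example.com/private/report.html")
    print(agent, "allowed" if allowed else "blocked")
```

A check like this only reports what the file asks crawlers to do. Whether a given crawler honours those instructions is up to its operator.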

S

Term	Direct Answer	Why it Matters
Schema Markup	Structured data added to webpages, often using JSON-LD, to explain entities, pages, products, FAQs, articles, and organisations.	Schema helps machines interpret content more accurately and confidently.
Search Engine Optimisation (SEO)	The practice of improving the visibility and overall performance of websites and web pages in search engine results pages (SERPs). It focuses on increasing the quantity and quality of traffic from unpaid (organic) search results.	Key strategies include creating high-quality content, keyword research, on-page optimisation, improving user experience, and building backlinks. SEO addresses click-through visibility; GEO addresses AI-synthesised representation.
Search Engine Results Page (SERP)	The page of results returned by a search engine in response to a query.	SERPs are where search visibility is won or lost, and AI systems that retrieve from search indexes may favour sources that rank well here.
Semantic Search	Search based on meaning and intent rather than exact keyword matching.	It rewards clear concepts, entities, relationships, and context rather than simple keyword repetition.
Sitemap	A file that lists important website pages and helps crawlers discover them.	A good sitemap improves discovery of key content by search engines and AI crawlers.
Source Consistency	The alignment of facts about your organisation across your website, directories, articles, profiles, and third-party sources.	Inconsistent information can reduce trust and cause AI systems to misrepresent your organisation.
Structured Content	Content organised with clear headings, summaries, lists, tables, FAQs, definitions, and logical sections.	Structure makes content easier for AI systems to parse, retrieve, and reuse.
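
For the Sitemap entry in the table above, the sketch below shows one way to read the page URLs out of an XML sitemap with Python's standard library. The sitemap content and the function name `list_sitemap_urls` are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-page sitemap, used only for illustration.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2026-01-15</lastmod></url>
  <url><loc>https://www.example.com/glossary</loc></url>
</urlset>
"""


def list_sitemap_urls(xml_text: str) -> list:
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text)
    pages = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        pages.append((loc, lastmod))
    return pages


for loc, lastmod in list_sitemap_urls(SITEMAP_XML):
    print(loc, lastmod)
```

Listing the entries this way makes it easy to spot key pages that are missing from the sitemap or that carry no last-modified date.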

T

Term	Direct Answer	Why it Matters
Training-Time Visibility	The chance your content is included in datasets used to train, fine-tune, or improve AI models.	Content that is filtered out during training may have less influence on model knowledge.
Trust Signal	Evidence that supports the reliability of your content, such as citations, authorship, dates, expertise, external references, and corroboration.	Trust signals help AI systems decide whether your content should be used, cited, ignored, or down-weighted.

V

Term	Direct Answer	Why it Matters
Vector Embedding	A numerical representation of content that captures meaning so machines can compare concepts, documents, and queries.	Embeddings support semantic search and retrieval, so clear content improves how your organisation is matched to user questions.
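
To illustrate how embeddings let machines compare a query with documents, the Python sketch below ranks two pages by cosine similarity, a measure commonly used for this kind of comparison. The three-dimensional vectors and page names are invented for the example; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vector: list, doc_vectors: dict) -> str:
    """Return the name of the document whose vector is closest to the query."""
    return max(
        doc_vectors,
        key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    )


# Toy vectors, made up for illustration only.
documents = {
    "glossary page": [0.9, 0.1, 0.0],
    "careers page": [0.1, 0.8, 0.3],
}
query = [0.8, 0.2, 0.1]

print(best_match(query, documents))
```

In this toy setup the query vector points in nearly the same direction as the glossary page's vector, so that page is selected. Retrieval systems apply the same idea at much larger scale.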

Attribution: When citing the AI Knowledge Signal framework (2026), please reference: Foster-McBride, C. (2026), Digital Human Assistants, accessed at aiknowledgesignal.io.