
SEO for LLMs is Not SEO

  • Writer: Jeff Nolan
  • Jul 7
  • 9 min read

Updated: Jul 9


SEO does not work for Large Language Model (LLM) AI systems, and content creators who apply SEO tactics to this new world will be disappointed with their results. It is critical to understand the difference: SEO is about getting the click; LLM optimization is about getting your message into a response.


Everyone, B2B and B2C, needs the click to get to the sale. In the SEO model, keywords and placement drive the click. In the new world, inclusion in the response is what drives the action that leads to a sale.


What Is So Different About LLMs?


The answer is that LLMs are not search engines. They operate as knowledge engines that seek to understand information at a semantic level. Because they largely disregard keyword density and link structures, your brilliant SEO tactics are, quite literally, useless when it comes to how LLMs are trained.


What is ironic is that the conversational queries on which LLMs rely first surfaced in search engines. According to various researchers, over 50% of search queries are phrased as questions. This fact drives a lot of SEO today, which is why "what is the best podcaster mic for my iPhone" returns a litany of "The best podcaster mics for 2025" links in Google. Search is being gamed to the point that results are an endless scroll of content structured to please the algorithm, which means it's all the same content.


Search engines understand content at a technical level through meta tags, alt text, domain performance, accessibility, and other factors collectively referred to as Search Engine Results Page (SERP) factors. External signals, known as backlinks, enhance domain authority, leading to higher placement in search engine results. The point is, Google doesn't know or care what the underlying content means as long as it meets the criteria for inclusion in the results page. Google and others want you to engage with search results, because each click generates revenue, and then to conduct related searches, which creates more ad placement opportunities.


LLMs differ significantly in their approach and monetization. These services generate narrative responses, often without citing sources. They don't value content structure or backlinks; they prioritize content clarity, factual accuracy, and contextual relevance. They try to understand the intent and context of the user's conversation, building on preceding queries and responses to make each interaction more relevant.


What You Must Do Differently for LLMs


To thrive in the age of Large Language Models (LLMs) like ChatGPT, Gemini, Grok, and Llama, content creators must move beyond traditional SEO tactics and embrace strategies that align with how these AI systems process and prioritize information.


Here’s how to adapt your content creation to ensure LLMs surface your work in their narrative responses, while still complementing SEO efforts.


Craft Content as Direct Answers to User Questions:

LLMs prioritize content that mirrors how users ask questions, such as “What’s the best microphone for podcasting on an iPhone?” Instead of keyword-stuffed articles, write clear, conversational responses that address intent. For example, start a blog post with a concise answer: “The best iPhone podcasting mic in 2025 is the Shure MV88, offering plug-and-play simplicity and crisp audio.” Use FAQs or Q&A sections to capture long-tail queries, like “Can I use a USB mic with an iPhone?” This approach not only aligns with the conversational nature of LLMs but also enhances SEO by targeting featured snippets.


Structure Content for Easy Parsing:

LLMs, trained on platforms like Reddit (ChatGPT) and YouTube (Gemini), favor structured formats that simplify data extraction. Use headings, bullet points, and numbered lists to organize content, making it scannable for both AI and humans. For instance, a YouTube video script on “LLM Optimization” should include timestamps (e.g., “2:00 Why Transcripts Matter”) and clear narration verbalizing key points, as Gemini relies heavily on transcripts. On Reddit, format posts with markdown headings (e.g., ## Step 1: Research Questions) to appeal to ChatGPT’s training pipeline. Structured data, like schema markup for FAQs, doubles as an SEO win.
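
Schema markup is straightforward to generate programmatically. Below is a minimal sketch in Python that emits FAQPage JSON-LD; the question and answer text are placeholders to swap for your own Q&A pairs, and the output is the body you would embed in a <script type="application/ld+json"> tag on your page.

```python
import json

# Minimal FAQPage JSON-LD sketch. The question and answer text below are
# placeholders; replace them with the actual Q&A pairs from your article.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Can I use a USB mic with an iPhone?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Most USB mics work with an iPhone through a Lightning or USB-C adapter.",
            },
        }
    ],
}

# Print the JSON-LD body to embed in a <script type="application/ld+json"> tag.
print(json.dumps(faq_schema, indent=2))
```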


Prioritize Factual Accuracy and Depth:

LLMs filter out speculative or vague content to deliver reliable answers, so anchor your work in verifiable information. For example, a blog claiming “ChatGPT trains on Reddit” should cite the 2024 OpenAI-Reddit partnership. Provide in-depth explanations for complex topics—e.g., a 500-word section on “How Grok Uses X Data”—to increase the chance of being referenced by LLMs. This depth also positions your content as authoritative for SEO, potentially earning backlinks. Avoid clickbait or unverified claims, as LLMs like Claude prioritize curated, high-quality data.


Tailor Content to LLM Training Platforms:

Each LLM draws from specific social media datasets, so customize your approach:

  • Reddit (ChatGPT): Post detailed, community-driven content in relevant subreddits (e.g., r/SEO). A post titled “How I Optimized My Blog for ChatGPT” with a step-by-step guide and comment engagement increases training data relevance.

  • X (Grok): Share concise, trending threads with hashtags (e.g., #AI #ContentCreation) and pin them to your profile. For example, tweet: “3 LLM optimization tips: Clear answers, structured posts, factual data. #SEO [blog link].”

  • YouTube (Gemini): Upload videos with accurate transcripts and verbalized key points. A tutorial like “SEO vs. LLM Optimization 2025” should narrate steps clearly, as Gemini can’t process on-screen text without narration.

  • Instagram (Llama): Create public visual posts (e.g., infographics on “5 LLM Tips”) with descriptive captions, as Llama trains on public Meta data.

By aligning with these platforms, you maximize visibility in LLM responses.


Engage Actively to Amplify Reach:

LLMs like ChatGPT and Grok value dynamic, user-engaged content. Respond to comments on Reddit within 24 hours to enrich discussion data for ChatGPT. Reply to X mentions to boost Grok’s real-time training. On YouTube, encourage viewer comments (e.g., “What’s your LLM strategy?”) to enhance Gemini’s data pool. Engagement signals also improve SEO by increasing dwell time and social shares.


Test and Refine with LLM Tools:

Use tools like DeepEval (deepeval.com) or Promptfoo (promptfoo.dev) to test if your content appears in LLM responses. For example, query ChatGPT with “How to optimize for LLMs?” to check if your blog’s key points are reflected. If not, refine your content with clearer answers or repost on X to align with Grok’s real-time data.
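
As a rough illustration of that manual check, here is a minimal sketch using the OpenAI Python SDK; the model name, the query, and the key phrases are assumptions you would replace with your own, and dedicated tools like DeepEval or Promptfoo automate this far more thoroughly.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The question you expect your audience to ask, and key points from your own content.
query = "How do I optimize content for LLMs?"
key_points = ["direct answers", "structured format", "factual accuracy"]

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; use whichever model you target
    messages=[{"role": "user", "content": query}],
)
answer = response.choices[0].message.content.lower()

# Crude presence check: which of your key points surface in the response?
for point in key_points:
    status = "reflected" if point in answer else "missing"
    print(f"{point}: {status}")
```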


This iterative approach ensures your content stays relevant as LLM training evolves.

By blending these LLM-focused strategies with SEO best practices—like keyword-rich titles and schema markup—you create content that ranks on Google and resonates with AI-driven systems. The key is to think like a user asking questions, not a search engine crawling pages, while leveraging the platforms where LLMs learn.

 

Use the LLMs to Improve Your Content


An essential step in the content creation process is to ask your favorite LLM to score your draft content against your website using custom criteria that you define. Using the custom GPT feature in ChatGPT (a scripted alternative is sketched after step 3):

1) Define your criteria:

  • Clarity (0–10)

  • Grammar/Spelling (Pass/Fail or score)

  • Tone Consistency (on-brand?)

  • SEO Optimization (meta tags, keyword use)

  • Readability (Flesch score or grade level)

  • Call to Action Quality

  • Compliance (e.g., legal disclaimers present)


2) Create a Custom GPT

Go to chat.openai.com/gpts → “Explore GPTs” → “Create” → walk through the builder. The builder is straightforward: you tell it what to do when a user provides website copy, e.g., "analyze it and return a structured score". Add the scoring criteria you defined above, then establish the action you want it to take, e.g., "return a summary and suggest 5 improvements".


3) Add custom actions

  • Connect it to your website or CMS using APIs

  • Add file or URL upload support to evaluate live content
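
If you prefer to script the scoring step rather than use the Custom GPT builder, here is a minimal sketch using the OpenAI Chat Completions API, assuming the criteria from step 1; the model name and prompt wording are assumptions, not a definitive implementation.

```python
from openai import OpenAI

client = OpenAI()

# The criteria from step 1, passed to the model as part of the scoring prompt.
CRITERIA = [
    "Clarity (0-10)",
    "Grammar/Spelling (pass/fail)",
    "Tone consistency (on-brand?)",
    "SEO optimization (meta tags, keyword use)",
    "Readability (Flesch score or grade level)",
    "Call-to-action quality",
    "Compliance (e.g., legal disclaimers present)",
]

def score_copy(draft: str) -> str:
    """Ask the model to score draft website copy against the criteria above."""
    prompt = (
        "Score the following website copy against each criterion, then return "
        "a short summary and suggest 5 improvements.\n\n"
        "Criteria:\n- " + "\n- ".join(CRITERIA) + "\n\nCopy:\n" + draft
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(score_copy("Your draft landing-page copy goes here."))
```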


A new generation of tools has emerged to test LLM responses and verify whether your content is cited or reflected in answers. These tools help you check performance, accuracy, and relevance; a short sketch using one of them follows the list below.


  • DeepEval: An open-source framework for evaluating LLM outputs, including metrics like answer relevance and contextual recall. Creators can test if their content (e.g., Reddit posts) is reflected in ChatGPT responses by assessing semantic alignment.

    • Use: Run tests to compare LLM outputs against expected responses from your content.

    • Example: Test if a blog’s key points appear in ChatGPT’s answer to “How to optimize for LLMs?”

  • Promptfoo: A CLI tool for systematically testing LLM prompts and responses. Creators can input their content (e.g., YouTube transcripts) and check if Gemini generates relevant answers.

    • Use: Evaluate response quality across multiple prompts to ensure platform-specific content (e.g., X threads) is cited.

    • Example: Test Gemini’s summarization of a YouTube video against your transcript.

  • OpenAI Evals: A framework for benchmarking LLM accuracy and coherence. Creators can test if their content influences LLM responses by designing custom test cases.

    • Use: Create test cases with questions like “What’s the best SEO strategy?” to check if your blog is referenced.

    • Example: Assess ChatGPT’s output for alignment with your Reddit post’s advice.

  • Ragas: An open-source tool for evaluating Retrieval-Augmented Generation (RAG) systems, useful for testing LLM responses in context. Creators can verify if their content (e.g., Instagram posts) is accurately represented in Llama’s answers.

    • Use: Measure response relevance and faithfulness to your content’s intent.

    • Example: Test Llama’s response to a query about Instagram content strategies.

  • Chatbot Arena (LMSYS): An open platform for comparing LLM responses via human votes. Creators can input prompts to see if their content shapes responses from models like Grok or ChatGPT, though it’s less automated.

    • Use: Manually test responses to gauge if X posts influence Grok’s answers.

    • Example: Query “What’s trending on X?” to check if your thread is reflected.
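
To make the DeepEval option concrete, here is a rough sketch based on its documented quick-start pattern; the class names, arguments, and threshold are from memory and may differ by version, so treat it as a starting point rather than a definitive recipe.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# API names here (LLMTestCase, AnswerRelevancyMetric, evaluate) follow DeepEval's
# quick-start as best recalled; check the current docs before relying on them.
test_case = LLMTestCase(
    input="How do I optimize content for LLMs?",
    # Paste the answer you actually got back from ChatGPT (or another model).
    actual_output="Write direct answers, structure content clearly, and cite verifiable facts.",
    # Your own content, supplied as the context the answer should align with.
    retrieval_context=["Blog post: clear answers, structured posts, factual data."],
)

metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])
```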


About the Platforms


LLM platforms are expanding the reach of their training data, and it is crucial to inventory these platforms when prioritizing your social media strategy. The full list of training data sets is far too extensive to cover here and would likely be out of date by week's end.


Here is a summary of the data sets LLMs are training on:

  • ChatGPT (OpenAI):

    • Reddit: Confirmed via 2024 partnership with Reddit, accessing Data API for posts, comments, and discussions to enhance conversational training.

    • X: Likely included as part of publicly available internet data; 2025 OpenAI proposals suggest interest in real-time X integration, implying some current use.

    • Other Platforms: Public content from Facebook, Instagram, or Telegram may be included in broad web scraping; however, no specific partnerships have been confirmed.

    • News-Related Social Data: Partnerships with Axel Springer (Business Insider, Politico) and Associated Press include user comments or shared posts, indirectly providing social media-like data.

  • Gemini (Google):

    • YouTube: Confirmed training on transcripts, audio, and select video data from public or creator-permitted YouTube videos.

    • Other Google Platforms: Public content from Google Blogs or Google News comments may be included; however, no specific social media partnerships have been confirmed beyond YouTube.

    • Web Data: Gemini’s “double check” feature uses Google Search results, which may include social media content (e.g., Reddit, X) indirectly, but not as primary training data.

  • Grok (xAI):

    • X: Confirmed training on public X posts, leveraging xAI’s integration with the platform to access real-time, user-generated content for conversational and trending topics.

    • Note: Grok’s focus on X aligns with its mission to provide real-time, truth-seeking responses, making X posts a core training source.

  • Llama (Meta AI):

    • Facebook and Instagram: Meta has stated that its AI models, including Llama, train on public posts and comments from Facebook and Instagram, subject to user privacy settings.

    • Note: Meta’s internal platforms provide a vast, proprietary social dataset, reducing reliance on external sources.

  • Claude (Anthropic):

    • No Specific Social Media Platforms Confirmed: Anthropic does not disclose specific social media datasets, citing proprietary training processes. Likely includes public web data, which may encompass Reddit, X, or YouTube content, but no partnerships are confirmed.

    • Speculative Sources: Given Anthropic’s focus on safe and interpretable AI, it may use curated social media data (e.g., Reddit discussions, public X posts) filtered for quality and safety.

  • Mistral (Mistral AI):

    • No Specific Social Media Platforms Confirmed: Mistral does not publicly detail its training data. As an open-source-focused model, it likely uses publicly available web data, including social media platforms like Reddit, X, or YouTube, but no specific agreements are known.

    • Note: Mistral’s European base may limit data use due to GDPR, potentially restricting social media scraping compared to U.S.-based models.


It's SEO and LLM Optimization - Together


The simple fact is that marketers must add new tools to their kit. Search will remain a foundation of marketing programs for the foreseeable future; abandoning SEO would do irreparable harm to your marketing efforts. However, it is also true that LLMs don't care about your SEO strategy. Re-evaluating how you create and promote content, so that you improve your probability of being featured in conversational responses, is an essential growth area for 2025-26.


The strategic imperative for search companies is to integrate LLMs into search results, and Google is invested in this with AI Overviews, which sit at the top of the results page. You will begin to see immediate benefits from LLM optimization show up in this Google feature.


SEO is not a dead end, but it is an area of investment with diminishing returns, precisely because SEO is a well-traveled skillset within marketing teams. LLM optimization represents the next growth wave for performance marketing.



BONUS: Glossary of Key Terminology for Optimizing Content for LLMs


  • Conversational Tone: A natural, human-like writing style that mimics how users ask questions, improving LLM compatibility by matching query patterns (e.g., “What are good mics for my iPhone for podcasting in outdoor weather?”).

  • DeepSearch Mode: A feature in some LLMs (e.g., Grok) that iteratively searches the web for real-time data, prioritizing fresh, relevant content from platforms like X.

  • Multimodal Training: The process of training LLMs on diverse data types (text, audio, video), as seen in Gemini’s use of YouTube transcripts and audio, enabling video summarization and analysis.

  • Question-Based Queries: Long-tail search phrases or questions (e.g., “What is noise cancelling in iPhone mics?”) that LLMs prioritize, overlapping with SEO featured snippets and user intent.

  • Schema Markup: Structured data (e.g., JSON-LD for FAQs, articles) that enhances search engine visibility and helps LLMs parse content for direct answers.

  • Semantic Clarity: Clear, contextually rich content that LLMs can easily interpret, using structured formats like headings, lists, and Q&A to align with training data from Reddit or YouTube.

  • Social Media Training Datasets: Platforms like Reddit (ChatGPT), X (Grok), YouTube (Gemini), and Facebook/Instagram (Llama) where LLMs source user-generated content for conversational and multimodal training.

  • Transcripts: Text versions of video or audio content, critical for Gemini’s YouTube training, as they allow LLMs to process narrated information accurately.



 
 
 
