Do AI engines really read robots.txt?

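Yes. Crawlers such as GPTBot, ClaudeBot and PerplexityBot are documented by their operators as honouring robots.txt directives. Google-Extended works differently: it is a robots.txt control token rather than a separate crawler, and it governs whether Google may use your content for its Gemini models. If you disallow an engine's user-agent, that engine does not fetch your page, so it cannot cite you no matter how good the content is. Check your robots.txt and server logs before anything else.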

Can a page rank on Google but still be invisible to ChatGPT?

Absolutely, and it is common. Classic ranking rewards links, depth and relevance, while AI citation rewards a clean, extractable, well-attributed answer near the top of the page. A page can rank well yet bury its answer or contain no quotable fact, which is why it gets skipped by answer engines.

How fast can fixing these mistakes get me cited?

Crawl-and-citation cycles for AI engines often run faster than traditional ranking. Once a previously blocked or restructured page is recrawled, you can see new citations within days to a few weeks, especially for lower-competition questions. Total-invisibility fixes like unblocking crawlers move fastest, while authority and original-data gains compound over months.

Which schema type matters most for AEO?

FAQPage and Article schema do the most work for AEO because they label your questions, answers and publish dates as clean, extractable data. Add Organization schema with sameAs links for entity and trust signals, and HowTo for step-based content. Validate everything in Google's Rich Results Test with zero errors.

How do I check if GPTBot or other AI bots crawled my site?

Look in your raw server access logs and filter by user-agent strings such as GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot. Google-Extended will not show up there, because it is a robots.txt token and Google crawls with its existing user agents. Most analytics tools hide bot traffic, so the server log is the reliable source. Zero hits over the last 30 days usually means a robots.txt block or a discovery problem.

Do I need original data to get cited by AI?

Not for every page, but it is the strongest single lever for competitive topics. Engines preferentially credit the source of a unique survey, benchmark or result over sites that merely rehash it. Publishing one first-party statistic per pillar page can earn citations across many AI answers and is worth the effort.

11 AEO Mistakes Killing Your AI Search Visibility

By the Lenoretech SEO Strategy Team · Reviewed by a senior SEO strategist · Last updated: June 2026

The most common AEO mistakes are burying the answer below the fold, padding pages with fluff that contains no extractable facts, omitting structured data, and accidentally blocking AI crawlers in robots.txt. AI engines extract short, factual, well-attributed passages. If your page hides its answer or has no quotable sentence, it gets skipped no matter how well it ranks on classic Google. This is exactly why a page can sit on page one of Search and still be invisible inside ChatGPT, a gap we unpack in AEO vs SEO. Below are the 11 reasons pages never get cited, each with a concrete fix and a measurable signal.

1. The answer is buried 800 words down

Answer engines reward proximity. When a model assembles a response, it pulls the cleanest sentence that directly answers the question, and it strongly favours passages near the top of a page or section. If your "what is X" answer arrives only after a long intro, the model usually grabs a competitor who stated it in line one.

The fix: Lead every page and every H2 with a direct answer of one or two sentences, then expand. Put the definition or the number first and the storytelling second. We cover the exact passage structure that gets pulled in how to optimise content for ChatGPT, Perplexity and AI Overviews.

Measurable signal: The first 40 words under each heading should contain the full answer to that heading's question. Read your top sections aloud. If you cannot answer the heading in the opening sentence, rewrite it.
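The 40-word check can be roughly automated. The sketch below is only an illustration: it assumes the page has been exported as Markdown with `##` and `###` headings, and the function name `opening_words` is made up for this example. It does not judge whether the opening actually answers the heading; it only pulls the text a reviewer should read.

```python
import re


def opening_words(markdown_text: str, limit: int = 40) -> dict[str, str]:
    """Map each H2/H3 heading to the first `limit` words of text beneath it.

    Assumption: headings are written as '## ...' or '### ...' (Markdown).
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown_text.splitlines():
        heading = re.match(r"^#{2,3}\s+(.*)", line)
        if heading:
            # Start collecting words for a new section.
            current = heading.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].extend(line.split())
    return {title: " ".join(words[:limit]) for title, words in sections.items()}
```

Reading each returned snippet next to its heading makes it easy to spot sections where the answer arrives too late.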

2. Walls of fluff with zero extractable facts

"In today's fast-paced digital landscape" tells a model nothing. AI engines cite passages that carry a discrete, checkable fact: a number, a date, a step, a definition, or a comparison. A paragraph that could appear on any website in any industry has nothing to extract, so it gets ignored.

The fix: Every paragraph should contain at least one specific claim, such as a stat, a price, a threshold, a named tool, or a measured result. Cut any sentence you could delete without changing the meaning.

Measurable signal: As a working rule we use at Lenoretech, scan each page for "facts per paragraph" and aim for at least one concrete, checkable claim in every paragraph. If you read a full section and cannot point to a single number, name or specific outcome, that section will not get cited and needs a rewrite.

3. You blocked the AI crawlers in robots.txt

This is the one that silently kills entire sites. Many CMS templates and "AI-protection" plugins now disallow GPTBot, ClaudeBot, PerplexityBot, Google-Extended and others by default. If those user-agents are blocked, you are invisible to the exact engines you are trying to win.

The fix: Open /robots.txt and confirm you are not disallowing GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Amazonbot or Bytespider unless you have a deliberate reason. Allow the bots whose engines you want citations from.
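One way to run this check without reading the file by eye is Python's standard `urllib.robotparser`. This is a minimal sketch, not a definitive audit: the `blocked_tokens` helper and the `https://example.com/` URL are placeholders for illustration, and Python's parser may resolve edge cases (such as overlapping Allow and Disallow rules) differently from how each crawler does.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this article; adjust the list to the engines you care about.
AI_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Amazonbot", "Bytespider",
]


def blocked_tokens(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI tokens that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in AI_TOKENS if not parser.can_fetch(token, url)]
```

Paste the contents of your own robots.txt into `blocked_tokens` and pass a URL you want cited; an empty list means none of the listed tokens is blocked for that URL.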

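Measurable signal: Check your server logs for hits from those crawlers in the last 30 days. Google-Extended is the exception, because it is a robots.txt token with no user-agent string of its own and will not appear in logs. Zero crawler visits means zero chance of citation.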
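A rough way to count those visits is to scan the access log for the bot names. The sketch below assumes a plain-text log with one request per line that includes the user-agent string, already trimmed to the period you care about; the `count_bot_hits` name and the `access.log` path in the comment are hypothetical. Google-Extended is left out of the list because it is a robots.txt token rather than a user-agent string that appears in requests.

```python
from collections import Counter

# Substrings to look for in each log line (case-insensitive).
BOT_MARKERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Amazonbot", "Bytespider",
]


def count_bot_hits(log_lines) -> Counter:
    """Count log lines per AI crawler, matching on the user-agent substring."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for marker in BOT_MARKERS:
            if marker.lower() in lowered:
                hits[marker] += 1
                break  # count each request once
    return hits


# Usage (the path is an assumption, adjust to your server):
# with open("access.log", encoding="utf-8", errors="replace") as handle:
#     print(count_bot_hits(handle))
```

A substring match can be spoofed, so treat the counts as a first signal rather than proof of a genuine crawler visit.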

4. No schema markup, so machines guess your meaning

Structured data is how you hand a machine the answer instead of making it infer one. FAQPage, HowTo, Article, Product and Organization schema turn your content into labelled, extractable data. Pages with clean schema give engines such as AI Overviews an answer that is already structured, which makes them easier to surface.

The fix: Add JSON-LD for the content type you publish. For Q&A blocks use FAQPage, for processes use HowTo, and for the brand use Organization with sameAs links. Our 15 schema markup examples walk through the exact code.

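Measurable signal: Every key page validates with zero errors in Google's Rich Results Test, and at least FAQPage or Article schema is present. For schema types the Rich Results Test does not report on, the Schema Markup Validator at validator.schema.org checks the syntax.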
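To make the FAQPage case concrete, here is a minimal sketch that builds the JSON-LD from a list of question and answer pairs. The `faq_jsonld` helper is a hypothetical name for this example; the property names (`mainEntity`, `acceptedAnswer`, `text`) follow the schema.org FAQPage vocabulary.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)
```

The returned string goes inside a `<script type="application/ld+json">` tag on the page, and the questions and answers should match the text visitors can actually see.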

5. Questions are not phrased the way people ask them

People prompt AI in full natural-language questions, such as "how much does dental SEO cost in India," not "dental seo cost." If your headings and content never mirror the conversational query, the semantic match weakens and a competitor who used the real phrasing wins the citation.

The fix: Turn primary headings into the literal questions your buyers type, then answer each in the line below. Mine People Also Ask, Reddit and your own sales-call transcripts for the exact wording.

Measurable signal: At least half your H2 and H3 headings are full questions, not keyword fragments.
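The "at least half" threshold is easy to measure on rendered HTML. The following sketch uses only Python's standard `html.parser`; the `HeadingCollector` class and `question_ratio` function are illustrative names, and treating "ends with a question mark" as "is a question" is a simplifying assumption.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every H2 and H3 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # list of text pieces while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._buffer is not None:
            self.headings.append("".join(self._buffer).strip())
            self._buffer = None


def question_ratio(html: str) -> float:
    """Share of H2/H3 headings that end with a question mark."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    if not collector.headings:
        return 0.0
    questions = sum(1 for text in collector.headings if text.endswith("?"))
    return questions / len(collector.headings)
```

A result of 0.5 or higher meets the signal above; anything lower points to headings worth rephrasing.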

6. Thin entity and authorship signals

AI engines weight who is saying something. A page with a named author, real credentials, an About page, and consistent business details across the web reads as trustworthy. Anonymous content from an unverifiable brand gets discounted, especially for money or health topics.

The fix: Add real author bylines with bios and credentials, a strong About page, and Organization schema with sameAs links to your verified social profiles. Keep your name, address and phone identical everywhere. This is core answer engine optimization groundwork.

Measurable signal: Every commercial page names a credentialed author and links to a bio, and your brand resolves to a single consistent entity across the top 10 mentions of it online.

7. One giant page instead of crisp, scannable sections

Models chunk content. A 3,000-word block with no clear structure forces the engine to slice arbitrarily, often mid-thought, and the resulting passage is too messy to quote. Clean H2 and H3 hierarchy gives the model clean chunks.

The fix: Break content into self-contained sections under descriptive headings, each answering one question completely so it can be lifted out without surrounding context. Use lists and short paragraphs.

Measurable signal: Any single section, copied alone, still makes complete sense as a standalone answer.

8. Stale content with no visible dates

AI engines favour fresh, dated information for anything time-sensitive, including pricing, statistics, "best of" lists and regulations. A page with no published or updated date, or one that is clearly two years old, loses to a competitor showing a current date and current numbers.

The fix: Display a clear "updated" date, refresh the stats and prices on a schedule, and add datePublished and dateModified to your Article schema so machines see the recency.

Measurable signal: Every fact-heavy page shows an update within the last 6 to 12 months and the schema dates match the visible date.
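The date check can be sketched in a few lines. This example assumes the page's Article JSON-LD has already been extracted as a string and carries an ISO 8601 `dateModified` value; the `schema_dates_ok` name and the 365-day default are illustrative choices based on the 6 to 12 month window above.

```python
import json
from datetime import date


def schema_dates_ok(jsonld: str, visible_updated: date, today: date,
                    max_age_days: int = 365) -> bool:
    """True if dateModified matches the visible date and is recent enough."""
    data = json.loads(jsonld)
    # Keep only the YYYY-MM-DD part so full timestamps also parse.
    modified = date.fromisoformat(data["dateModified"][:10])
    is_consistent = modified == visible_updated
    is_fresh = (today - modified).days <= max_age_days
    return is_consistent and is_fresh
```

Passing `today` in explicitly keeps the check repeatable; in a scheduled audit it would typically be `date.today()`.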

9. No original data, only rehashed opinions

Engines preferentially cite sources that contribute something unique, such as a survey, a benchmark, a case result or a proprietary number. If your page only summarises what everyone else already published, there is no reason for a model to credit you over the original. Original assets are also the single strongest lever in our generative engine optimization guide.

The fix: Publish at least one first-party data point per pillar page. Survey your own customers, share anonymised account results, run a small experiment, or compile pricing across vendors. A single original number that others want to quote can earn citations across dozens of AI answers.

Measurable signal: Each pillar page contains at least one statistic, chart or result that exists nowhere else on the web, and ideally one that other sites start linking back to.

10. Orphaned answer pages with no internal links

A brilliant answer page that nothing links to is hard for any crawler to discover and weak on topical context. AI engines lean on the surrounding link graph to understand what a page is about and how authoritative it is within your site. An orphan page sends none of those signals, so even a perfect answer can sit uncrawled and uncited.

The fix: Link every new answer page from at least three relevant existing pages using descriptive anchor text, and link out from it to your related cluster pages. Build deliberate topic clusters rather than isolated posts, the approach we detail in our internal linking strategy guide.

Measurable signal: No important page has fewer than three internal inbound links, and every cluster page links to its pillar and at least two siblings.
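The three-link minimum can be checked against a crawl export. The sketch below assumes you already have a mapping from each page URL to the internal URLs it links to (for example from a site crawler); the `under_linked` function and the input shape are assumptions for illustration.

```python
from collections import Counter


def under_linked(links: dict[str, list[str]], minimum: int = 3) -> list[str]:
    """Return pages with fewer than `minimum` distinct internal inbound links.

    `links` maps a source page to the internal pages it links to.
    """
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                inbound[target] += 1
    pages = set(links) | set(inbound)
    return sorted(page for page in pages if inbound[page] < minimum)
```

Pages returned with zero inbound links are the orphans described above, and they are the first candidates for new links from related cluster pages.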

11. Inconsistent formatting that breaks passage extraction

Even a fact-rich page fails if its formatting is hostile to extraction. Prices written three different ways, definitions hidden inside long run-on sentences, key numbers locked in images, or tables rendered as unlabelled divs all make it harder for a model to lift a clean, confident passage. The engine wants a tidy unit it can quote verbatim, not text it has to reconstruct.

The fix: Standardise how you present facts. Use real HTML tables for comparisons, bold the key term in each definition, write numbers as plain text rather than baking them into graphics, and keep one idea per sentence. If you want a number cited, make it copy-pasteable.

Measurable signal: Every important fact, price and definition appears as selectable text (not an image), and your comparison data sits in real table or list markup.

How to prioritise these 11 fixes

Start with the mistakes that cause total invisibility before the ones that cost a few citations. Check robots.txt first, because a blocked crawler makes every other fix pointless. Next, surface your direct answers to the top of each section and confirm each page carries real, extractable facts. Then layer on schema, dates, authorship and internal links. Most sites we audit are losing AI visibility on three or four of these at once, and clearing the top two or three usually moves the needle within a crawl cycle or two. For the complete positive playbook rather than the mistake list, see our guide on how to appear in AI search. If you would rather have a senior team diagnose and fix all 11 across your site, that is exactly what our AEO services deliver.
