AI Chatbot Training Data Limits Compared in 2026 (10 Platforms)

Ruben Buijs, Feb 23, 2026, 11 min read

TL;DR: AI chatbot platforms measure training data in four different ways: characters, pages, megabytes, and words. There is no industry standard. Among entry-level plans priced at $40/month or less, Dante AI offers the most raw data per dollar (862K chars/$1), while Boei at $14/month provides 5M characters across 1,000 pages with aggressive content cleaning. CustomGPT works out to more characters per dollar (~3.3M chars/$1), but its entry plan starts at $99/month. Most small business websites fit comfortably on any entry-level paid plan.

When choosing an AI chatbot platform, one of the most confusing aspects is understanding how much training data you actually get. Every platform measures and advertises its limits differently, making apples-to-apples comparisons nearly impossible.

I spent a few hours going through pricing pages, documentation, and support chatbots of 10 leading AI chatbot platforms to understand exactly how they measure training data. The platforms I reviewed: Boei, Chatbase, Dante AI, FastBots, SiteGPT, CustomGPT, DocsBot AI, ChatNode, Wonderchat, and LiveChatAI.

For each one, I checked their pricing pages, read through help docs, and when something wasn't clear, I asked their own chatbot or support team. Then I normalized everything into comparable units so you can actually see what you get for your money.

Four Different Measurement Approaches

AI chatbot platforms use four distinct approaches to measure training data limits:

  1. Characters per page - converting characters into a "page" count (SiteGPT, DocsBot, Boei)
  2. Raw character limits - total characters with no abstraction (Dante AI, FastBots, ChatNode, Wonderchat)
  3. File size in MB/KB - measuring storage by data size (Chatbase)
  4. Word-based limits - total words stored (CustomGPT)

This inconsistency makes it hard for customers to compare platforms at a glance. So I did the work of normalizing everything into one comparable unit.

Platform-by-Platform Breakdown

Platforms That Use "Pages" (Characters Per Page)

Platform | Page Definition | Entry Paid Price
SiteGPT | 2,500 cleaned chars = 1 page | $39/mo
DocsBot AI | ~5,000 cleaned chars = 1 page | $19/mo (Hobby)
Boei | 5,000 cleaned chars = 1 page | $14/mo (Start)

SiteGPT uses the smallest page definition at 2,500 characters. This means the exact same content would show as 200 pages on SiteGPT but only 100 pages on DocsBot or Boei. When a platform advertises a page limit, always check the underlying character definition.

DocsBot also notes that languages such as Chinese, Japanese, and Korean may use more characters per page because of how their text is encoded in UTF-8.
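To see how the page definition changes the advertised number, here is a minimal Python sketch. It uses the characters-per-page figures from the table above. The 500,000-character site size is a hypothetical example, and rounding up partial pages is my assumption, not a documented rule for every platform.

```python
import math

# Characters-per-page definitions quoted in the table above.
CHARS_PER_PAGE = {"SiteGPT": 2_500, "DocsBot AI": 5_000, "Boei": 5_000}


def pages_used(cleaned_chars: int, chars_per_page: int) -> int:
    """Convert cleaned characters to pages (assumption: partial pages round up)."""
    return math.ceil(cleaned_chars / chars_per_page)


content_chars = 500_000  # hypothetical site size after cleaning
for platform, page_size in CHARS_PER_PAGE.items():
    print(f"{platform}: {pages_used(content_chars, page_size)} pages")
```

The same content reports twice as many pages under the 2,500-character definition as under the 5,000-character one.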

Platforms Using Raw Character Limits

Platform | Free Tier | Entry Paid Tier | Mid Tier | High Tier
Dante AI | 500K chars | 25M chars ($29/mo) | 50M chars ($149/mo) | 200M chars ($299/mo)
FastBots | 500K chars | 12M chars ($39/mo) | 15M chars ($89/mo) | 25M chars ($399/mo)
ChatNode | 500K chars | 12.5M chars ($35/mo) | 12.5M chars ($89/mo) | 12.5M chars ($377/mo)
Wonderchat | N/A | 500K chars ($29/mo) | 3M chars ($99/mo) | 12M chars ($299/mo)

These platforms are the most transparent since you see the exact character count you are working with.

Chatbase: File Size (MB/KB)

Chatbase stands out by measuring training data in file size rather than characters or pages:

Plan | Storage Per Agent | Price
Free | 400 KB | $0
Hobby | 33 MB | $40/mo
Standard | 40 MB | $150/mo
Pro | 60 MB | $500/mo

This is a different approach from most competitors. MB-based limits factor in file overhead, encoding, and metadata, not just the cleaned text content. Roughly speaking, 1 MB of plain text equals approximately 1 million characters, but actual results vary depending on file format, language, and encoding. A PDF or DOCX file will "cost" more MB than raw text because of formatting metadata.
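A quick way to see why byte size and character count diverge is to compare the two for the same string. The sketch below uses only Python's built-in string encoding. The sample sentences are my own, and it says nothing about how Chatbase itself accounts for storage.

```python
def utf8_stats(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample sentences, chosen only to contrast the two scripts.
samples = {
    "English": "Hello, how can I help you today?",
    "Japanese": "こんにちは、今日はどのようなご用件ですか？",
}

for label, sample in samples.items():
    chars, size = utf8_stats(sample)
    print(f"{label}: {chars} chars, {size} bytes, {size / chars:.1f} bytes per char")
```

Plain English text comes out at one byte per character, which is where the "1 MB is about 1 million characters" rule of thumb comes from. Text in scripts that need several bytes per character uses up a byte-based allowance faster.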

Some older Chatbase reviews still reference character counts (like 11M or 33M characters), likely from their previous measurement system before they switched to MB/KB.

CustomGPT: Word-Based Limits

CustomGPT uses a completely different system, measuring in "words stored" and counting "pages" as individual resources (one webpage, one document, one file):

Plan | Words Stored | Pages (Resources) | Price
Standard | 60M words | 5,000 | $99/mo
Premium | 300M words | 20,000 | $499/mo

Words stored are shared across all bots on the account. A rough conversion: 1 word averages about 5-6 characters, so 60 million words translates to roughly 300-360 million characters.

The Normalized Comparison

To make a fair comparison, I converted every platform's entry-level paid plan into estimated millions of characters. For MB-based platforms, I used the approximation that 1 MB of cleaned text is roughly 1 million characters. For word-based platforms, I used 5.5 characters per word on average.
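The conversion rules in the paragraph above can be sketched in a few lines of Python. The constants are the approximations stated in this article, and the plan figures come from its pricing tables. The function names are my own, and the results are estimates rather than vendor-confirmed numbers.

```python
CHARS_PER_MB = 1_000_000  # approximation used in this article for cleaned plain text
CHARS_PER_WORD = 5.5      # average used in this article
CHARS_PER_PAGE = 5_000    # page definition quoted for Boei and DocsBot AI

FACTORS = {"chars": 1, "mb": CHARS_PER_MB, "words": CHARS_PER_WORD, "pages": CHARS_PER_PAGE}


def to_chars(amount: float, unit: str) -> float:
    """Convert an advertised limit into estimated characters."""
    return amount * FACTORS[unit]


def chars_per_dollar(price: float, amount: float, unit: str) -> float:
    """Estimated characters per dollar of monthly price."""
    return to_chars(amount, unit) / price


# (platform, monthly price in USD, advertised amount, unit)
plans = [
    ("Dante AI", 29, 25_000_000, "chars"),
    ("CustomGPT", 99, 60_000_000, "words"),
    ("Chatbase", 40, 33, "mb"),
    ("Boei", 14, 1_000, "pages"),
]

for name, price, amount, unit in plans:
    print(f"{name}: ~{chars_per_dollar(price, amount, unit):,.0f} chars per $1")
```

Changing any of the three constants shifts the ranking, which is why the results should be read as rough estimates.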

Platform | Entry Paid Price | Metric Shown | Estimated Characters | Est. Chars per Dollar
Dante AI | $29/mo | 25M chars | ~25M | ~862K chars/$1
CustomGPT | $99/mo | 60M words | ~330M | ~3.3M chars/$1
Chatbase | $40/mo | 33 MB | ~33M | ~825K chars/$1
FastBots | $39/mo | 12M chars | ~12M | ~308K chars/$1
ChatNode | $35/mo | 12.5M chars | ~12.5M | ~357K chars/$1
Boei | $14/mo | 1,000 pages | ~5M | ~357K chars/$1
DocsBot AI | $19/mo | 1,000 pages | ~5M | ~263K chars/$1
SiteGPT | $39/mo | Pages (varies) | Varies by plan | Varies
Wonderchat | $29/mo | 500K chars | ~0.5M | ~17K chars/$1

Important caveats about this comparison:

  • Chatbase's MB limit includes file formatting overhead, so the actual usable text may be lower than the raw conversion suggests
  • CustomGPT's massive word count looks generous, but their starting price is also significantly higher ($99/mo vs $14-40/mo for others)
  • "Cleaned characters" varies by platform. Some strip only HTML tags, while others go further. Boei, for example, strips HTML, navigation, headers, footers, and boilerplate content, then converts the remaining content into clean Markdown before counting characters. This means the character count reflects only the actual readable content that matters for chatbot training, not the surrounding page structure. Other platforms may be less aggressive in their cleaning, meaning more of your character allowance gets consumed by non-content markup
  • These numbers reflect per-agent/per-bot limits for most platforms, but some share limits across all bots. For instance, Chatbase and FastBots set storage limits per individual agent, meaning each bot gets its own allowance. Boei takes a different approach with a single shared limit per account, so your total page budget is spread across all your chatbots. CustomGPT also shares word storage at the account level. This distinction matters if you plan to run multiple chatbots: per-bot limits give each bot a guaranteed allocation, while per-account limits offer more flexibility to distribute data unevenly across bots based on need

Mid-Tier Plan Comparison (~$49-150/mo range)

Platform | Price | Metric Shown | Estimated Characters
Dante AI | $149/mo | 50M chars | ~50M
Chatbase | $150/mo | 40 MB | ~40M
FastBots | $89/mo | 15M chars | ~15M
ChatNode | $89/mo | 12.5M chars | ~12.5M
Wonderchat | $99/mo | 3M chars | ~3M
DocsBot AI | $49/mo | 5,000 pages | ~25M

How Platforms Count Crawled Pages

There is another layer of complexity that the headline numbers do not reveal: how platforms count pages when they crawl your website.

Some platforms count every URL as exactly 1 page, regardless of how much content is on it. A 200-word "About Us" page and a 10,000-word comprehensive guide would both count as 1 page. This can work in your favor for content-heavy sites, but it can also waste your quota on thin pages like privacy policies or nearly-empty category pages.

Other platforms, like SiteGPT, ignore the URL entirely and convert all crawled content into character-based pages using their conversion ratio (2,500 chars = 1 page in their case).

Boei uses a hybrid approach: 1 crawled URL counts as a minimum of 1 page. If the cleaned content exceeds 5,000 characters, it counts as additional pages. So a 12,000-character guide would cost 3 pages, while a short contact page with 800 characters still costs 1 page. This approach is straightforward and predictable, though it means thin pages use a full page credit.
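The hybrid rule is easy to express in code. This is a minimal sketch of the rule as described above, not Boei's actual implementation. The function name is hypothetical, and rounding up is inferred from the 12,000-character example.

```python
import math

PAGE_SIZE = 5_000  # cleaned characters per page, per the definition above


def pages_for_url(cleaned_chars: int) -> int:
    """One crawled URL costs at least 1 page; longer content rounds up to whole pages."""
    return max(1, math.ceil(cleaned_chars / PAGE_SIZE))


for chars in (800, 5_000, 12_000):
    print(f"{chars} chars -> {pages_for_url(chars)} page(s)")
```

The `max(1, ...)` is what makes a thin page cost a full credit.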

Re-crawling and syncing

A hidden cost on some platforms is how re-crawling works. Certain platforms track cumulative data indexed over time. If you re-crawl 100 pages because your content changed, they might count that as 200 total pages used (the old version plus the new one). Boei and others that track total stored content avoid this problem entirely. A re-sync simply replaces the old content, so your page count stays the same.
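The difference between the two accounting models can be sketched as follows. This is a simplified model under my own assumptions: every re-crawl covers the full site, and the function name is hypothetical. Real platforms may count changes more selectively.

```python
def usage_after_recrawls(pages: int, recrawls: int, cumulative: bool) -> int:
    """Pages counted against the quota after one initial crawl plus full re-crawls."""
    if cumulative:
        # Cumulative accounting: every crawl adds to the running total.
        return pages * (1 + recrawls)
    # Stored-content accounting: a re-sync overwrites the previous version.
    return pages


print(usage_after_recrawls(100, recrawls=1, cumulative=True))   # cumulative model
print(usage_after_recrawls(100, recrawls=1, cumulative=False))  # stored-content model
```

With weekly re-crawls, the cumulative model would exhaust a fixed quota even if the site never grows.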

What 5,000 Characters Actually Looks Like

Since Boei uses 5,000 characters as its page definition, here is some context for what that represents:

  • A typical blog post or help article: 1 page
  • A product landing page (cleaned text): 0.5-1 page
  • A detailed FAQ page: 1-3 pages
  • A 10-page PDF document: 3-6 pages (after cleaning)
  • A full website with 50 content pages: roughly 30-80 pages

The 5,000-character definition aligns well with what most people intuitively consider "a page of content." A standard printed page with normal margins contains about 2,000-3,000 characters, but web content tends to include more text per logical page. The 5,000-character unit captures one complete piece of content, whether that is a blog post, a help article, or a product description.

How Much Training Data Does Your Business Actually Need?

The raw numbers in the tables above can feel abstract. Here are some real-world scenarios to help you estimate what plan you would need across platforms.

Small business marketing website (30 pages)

A typical small business site with a homepage, services pages, about page, contact page, blog posts, and FAQ. After cleaning, this usually adds up to around 100,000-200,000 characters.

  • On Boei (5,000 chars/page): ~30-60 pages used, since each of the 30 URLs costs at least 1 page, well within the Start plan's 1,000 pages ($14/mo)
  • On SiteGPT (2,500 chars/page): ~40-80 pages used
  • On Chatbase: well under 1 MB
  • On character-based platforms: well under 1M characters

Verdict: Almost any paid plan on any platform handles this comfortably. Even most free tiers would cover it.

SaaS product with 100 help docs

A software company with a knowledge base of 100 help articles, each averaging 4,000 characters after cleaning. Total: roughly 400,000 characters.

  • On Boei (5,000 chars/page): ~100 pages used, fits easily in the Start plan's 1,000 pages ($14/mo)
  • On SiteGPT (2,500 chars/page): ~160 pages used
  • On Chatbase: under 1 MB
  • On character-based platforms: under 1M characters

Verdict: Still fits comfortably on entry-level paid plans. Free tiers would be tight on some platforms.

E-commerce store with 500 products

An online store with 500 product pages, each containing a description, specs, and reviews. Average cleaned content per product: 2,000-3,000 characters. Plus 20 informational pages (shipping, returns, FAQ, about). Total: roughly 1.2-1.7 million characters.

  • On Boei (5,000 chars/page): ~520-540 pages used (each product and each informational page is 1 page minimum), fits in the Start plan's 1,000 pages ($14/mo)
  • On SiteGPT (2,500 chars/page): ~480-680 pages used
  • On Chatbase: roughly 1.5-2 MB
  • On character-based platforms: 1.2-1.7M characters

Verdict: Entry-level paid plans on most platforms still cover this. However, if product descriptions are lengthy or you have thousands of SKUs, you will start hitting limits on cheaper plans.

Large documentation site (1,000+ pages)

Enterprise software documentation, government sites, or content-heavy publishers with 1,000+ pages averaging 5,000 characters each. Total: roughly 5 million+ characters.

  • On Boei (5,000 chars/page): ~1,000+ pages used, would need a second chatbot ($20/mo total) for 2,000 pages
  • On SiteGPT (2,500 chars/page): ~2,000+ pages used
  • On Chatbase: roughly 5+ MB
  • On character-based platforms: 5M+ characters

Verdict: You will need mid-tier or higher plans on most platforms. This is where the differences in pricing and limits start to matter significantly.
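To run these estimates against your own site, the two page-counting rules described earlier can be applied to a list of per-URL character counts. This is a rough sketch under my own assumptions: the function names are hypothetical, partial pages round up, and the sample store uses made-up averages within the ranges above.

```python
import math


def boei_pages(url_char_counts: list[int]) -> int:
    """Hybrid rule: each URL costs at least 1 page of 5,000 cleaned chars."""
    return sum(max(1, math.ceil(chars / 5_000)) for chars in url_char_counts)


def sitegpt_pages(url_char_counts: list[int]) -> int:
    """Pooled rule: total cleaned chars divided into 2,500-char pages."""
    return math.ceil(sum(url_char_counts) / 2_500)


# Hypothetical inputs: 100 help docs at 4,000 chars each, and a store with
# 500 products at 2,500 chars plus 20 informational pages at 6,000 chars.
help_docs = [4_000] * 100
store = [2_500] * 500 + [6_000] * 20

for label, site in (("Help docs", help_docs), ("Store", store)):
    print(f"{label}: {sum(site):,} chars, "
          f"{boei_pages(site)} Boei pages, {sitegpt_pages(site)} SiteGPT pages")
```

Swap in the cleaned character counts from your own crawl to see which plan tier each rule would push you into.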

Training Data Is Only Half the Equation

This article focuses on training data limits, but most platforms also cap the number of messages or responses your chatbot can handle per month. A platform might offer generous training data but restrict you to 1,000 messages per month, which could be a bigger bottleneck than the data limit for high-traffic sites.

When comparing platforms, make sure to evaluate both dimensions: how much data you can train on AND how many conversations your chatbot can handle. A deep comparison of message credit pricing across platforms is a topic for a separate analysis, but keep it in mind as you evaluate your options.

Key Findings

1. There Is No Industry Standard

Four completely different measurement systems exist across just 10 platforms. Characters, pages, megabytes, and words all represent the same underlying concept but are presented differently to customers.

2. "Pages" Can Mean Very Different Things

SiteGPT's page (2,500 chars) is half the size of Boei's and DocsBot's (5,000 chars). A plan advertising "500 pages" on SiteGPT contains the same amount of data as a plan advertising "250 pages" on Boei.

3. MB-Based Limits Add Complexity

Chatbase's MB-based system introduces a variable that pure character counts avoid: file format overhead. A 1 MB PDF might contain only 500K characters of actual text because the rest is formatting data. This makes it harder for customers to predict how much content they can actually store.

4. Free Tiers Converge Around 400-500K Characters

Nearly every platform with a free tier lands in the 400K-500K character range. This appears to be the industry consensus for a meaningful test, enough for a small site with 50-100 pages.

5. Dante AI Offers the Most Raw Data Per Dollar at Entry Level

At $29/month for 25 million characters, Dante AI provides significantly more raw training capacity than competitors at similar price points. However, data capacity is only one factor. Response quality, integrations, and support matter too.

6. Transparency Varies Wildly

Platforms showing raw character counts (Dante, FastBots, ChatNode) are the easiest to understand and compare. Page-based systems require knowing the conversion factor. MB-based systems require understanding file format overhead. Word-based systems require a characters-per-word assumption.

What to Look For When Comparing Platforms

When evaluating AI chatbot platforms for training data limits, ask these questions:

  1. What unit do they use? Characters, pages, MB, or words? Find the underlying definition.
  2. Are characters "cleaned" or raw? Cleaned characters (without HTML/code) give you more usable content per unit.
  3. Are limits per bot or per account? Some platforms share limits across all bots, others give each bot its own allocation.
  4. Does file format matter? On MB-based platforms like Chatbase, a PDF costs more storage than raw text for the same content.
  5. What happens when you hit the limit? Some platforms stop indexing new data, others stop responding entirely.
  6. How does retraining/syncing count? Some platforms count retraining against your limit, others treat it as a refresh of existing data.

Understanding these details before committing will help you avoid surprises and choose the platform that genuinely fits your needs.

Article by Ruben Buijs

Ruben is the founder of Boei, with 12+ years of experience in conversion optimization. Former IT consultant at Ernst & Young and Accenture, where he helped product teams at Shell, ING, Rabobank, Aegon, NN, and AirFrance/KLM optimize their digital experiences. Now building tools to help businesses convert more website visitors into customers.
