TL;DR: AI chatbot platforms measure training data in four different ways: characters, pages, megabytes, and words. There is no industry standard. Among platforms that publish raw character limits, Dante AI offers the most data per dollar at entry level (~862K chars/$1), while Boei at $14/month provides 5M characters across 1,000 pages with aggressive content cleaning. Most small business websites fit comfortably on any entry-level paid plan.
When choosing an AI chatbot platform, one of the most confusing aspects is understanding how much training data you actually get. Every platform measures and advertises its limits differently, making apples-to-apples comparisons nearly impossible.
I spent a few hours reviewing 10 leading AI chatbot platforms to understand exactly how they measure training data: Boei, Chatbase, Dante AI, FastBots, SiteGPT, CustomGPT, DocsBot AI, ChatNode, Wonderchat, and LiveChatAI.
For each one, I checked the pricing page, read through the help docs, and, when something wasn't clear, asked the platform's own chatbot or support team. Then I normalized everything into comparable units so you can actually see what you get for your money.
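AI chatbot platforms use four distinct approaches to measure training data limits: raw characters (Dante AI, FastBots, ChatNode, Wonderchat), pages defined as a fixed number of cleaned characters (SiteGPT, DocsBot AI, Boei), file size in MB or KB (Chatbase), and words stored (CustomGPT).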
This inconsistency makes it hard for customers to compare platforms at a glance, which is why the comparisons further down convert everything into estimated characters.
| Platform | Definition | Entry Paid Price |
|---|---|---|
| SiteGPT | 2,500 cleaned chars = 1 page | $39/mo |
| DocsBot AI | ~5,000 cleaned chars = 1 page | $19/mo (Hobby) |
| Boei | 5,000 cleaned chars = 1 page | $14/mo (Start) |
SiteGPT uses the smallest page definition at 2,500 characters. This means the exact same content would show as 200 pages on SiteGPT but only 100 pages on DocsBot or Boei. When a platform advertises a page limit, always check the underlying character definition.
DocsBot also notes that text in languages like Chinese, Japanese, and Korean may use more characters per page due to UTF-8 encoding differences.
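To see how much the page definition matters, here is a minimal Python sketch that converts one hypothetical site into pages using the definitions from the table. The 500,000-character site is a number I picked for illustration, and the sketch assumes a pure character-based conversion rounded up to whole pages. It ignores any per-URL minimum a platform may apply.

```python
import math

# Cleaned characters per "page", as stated in the table above.
CHARS_PER_PAGE = {"SiteGPT": 2_500, "DocsBot AI": 5_000, "Boei": 5_000}


def pages_needed(cleaned_chars: int, chars_per_page: int) -> int:
    """Convert a cleaned character count into whole pages, rounding up."""
    return math.ceil(cleaned_chars / chars_per_page)


site_chars = 500_000  # hypothetical site size after cleaning (assumption)
for platform, page_size in CHARS_PER_PAGE.items():
    print(f"{platform}: {pages_needed(site_chars, page_size)} pages")
# SiteGPT: 200 pages
# DocsBot AI: 100 pages
# Boei: 100 pages
```

The same content consumes twice as many "pages" on SiteGPT, even though the underlying amount of text is identical.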
| Platform | Free Tier | Entry Paid Tier | Mid Tier | High Tier |
|---|---|---|---|---|
| Dante AI | 500K chars | 25M chars ($29/mo) | 50M chars ($149/mo) | 200M chars ($299/mo) |
| FastBots | 500K chars | 12M chars ($39/mo) | 15M chars ($89/mo) | 25M chars ($399/mo) |
| ChatNode | 500K chars | 12.5M chars ($35/mo) | 12.5M chars ($89/mo) | 12.5M chars ($377/mo) |
| Wonderchat | N/A | 500K chars ($29/mo) | 3M chars ($99/mo) | 12M chars ($299/mo) |
These platforms are the most transparent since you see the exact character count you are working with.
Chatbase stands out by measuring training data in file size rather than characters or pages:
| Plan | Storage Per Agent | Price |
|---|---|---|
| Free | 400 KB | $0 |
| Hobby | 33 MB | $40/mo |
| Standard | 40 MB | $150/mo |
| Pro | 60 MB | $500/mo |
This is a different approach from most competitors. MB-based limits factor in file overhead, encoding, and metadata, not just the cleaned text content. Roughly speaking, 1 MB of plain text equals approximately 1 million characters, but actual results vary depending on file format, language, and encoding. A PDF or DOCX file will "cost" more MB than raw text because of formatting metadata.
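The "1 MB is roughly 1 million characters" rule only holds for plain ASCII text. As a rough sketch, the Python snippet below measures the UTF-8 size of two strings with the same character count. It assumes 1 MB means 1,000,000 bytes and that the limit is applied to raw text bytes. How Chatbase actually computes storage is not something I verified, so treat this as an illustration of encoding overhead only.

```python
def utf8_size_mb(text: str) -> float:
    """Size of the text in megabytes (10**6 bytes) when encoded as UTF-8."""
    return len(text.encode("utf-8")) / 1_000_000


english = "a" * 1_000_000   # 1M ASCII characters, 1 byte each in UTF-8
japanese = "あ" * 1_000_000  # 1M hiragana characters, 3 bytes each in UTF-8

print(utf8_size_mb(english))   # 1.0
print(utf8_size_mb(japanese))  # 3.0
```

So the same number of characters can take up to three times the storage for Japanese text, before any PDF or DOCX overhead is added.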
Some older Chatbase reviews still reference character counts (like 11M or 33M characters), likely from their previous measurement system before they switched to MB/KB.
CustomGPT uses a completely different system, measuring in "words stored" and counting "pages" as individual resources (one webpage, one document, one file):
| Plan | Words Stored | Pages (Resources) | Price |
|---|---|---|---|
| Standard | 60M words | 5,000 | $99/mo |
| Premium | 300M words | 20,000 | $499/mo |
Words stored are shared across all bots on the account. A rough conversion: 1 word averages about 5-6 characters, so 60 million words translates to roughly 300-360 million characters.
To make a fair comparison, I converted every platform's entry-level paid plan into estimated millions of characters. For page-based platforms, I multiplied the page limit by the platform's stated characters per page. For MB-based platforms, I used the approximation that 1 MB of cleaned text is roughly 1 million characters. For word-based platforms, I used 5.5 characters per word on average.
| Platform | Entry Paid Price | Metric Shown | Estimated Characters | Est. Chars per Dollar |
|---|---|---|---|---|
| Dante AI | $29/mo | 25M chars | ~25M | ~862K chars/$1 |
| CustomGPT | $99/mo | 60M words | ~330M | ~3.3M chars/$1 |
| Chatbase | $40/mo | 33 MB | ~33M | ~825K chars/$1 |
| FastBots | $39/mo | 12M chars | ~12M | ~308K chars/$1 |
| ChatNode | $35/mo | 12.5M chars | ~12.5M | ~357K chars/$1 |
| Boei | $14/mo | 1,000 pages | ~5M | ~357K chars/$1 |
| DocsBot AI | $19/mo | 1,000 pages | ~5M | ~263K chars/$1 |
| SiteGPT | $39/mo | Pages (varies) | Varies by plan | Varies |
| Wonderchat | $29/mo | 500K chars | ~0.5M | ~17K chars/$1 |
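The normalization behind this table can be sketched in a few lines of Python. The conversion factors are the approximations described above (1 MB ≈ 1 million characters, 5.5 characters per word, 5,000 characters per page for Boei and DocsBot AI). They are estimates, not figures published by the platforms. SiteGPT is left out because its page allowance varies by plan.

```python
# Entry-level paid plans from the table: (price in USD/month, amount, unit).
PLANS = {
    "Dante AI": (29, 25_000_000, "chars"),
    "CustomGPT": (99, 60_000_000, "words"),
    "Chatbase": (40, 33, "mb"),
    "FastBots": (39, 12_000_000, "chars"),
    "ChatNode": (35, 12_500_000, "chars"),
    "Boei": (14, 1_000, "pages"),
    "DocsBot AI": (19, 1_000, "pages"),
    "Wonderchat": (29, 500_000, "chars"),
}

# Approximate characters per unit (assumptions stated in the text).
CHARS_PER_UNIT = {"chars": 1, "words": 5.5, "mb": 1_000_000, "pages": 5_000}


def estimated_chars(amount: float, unit: str) -> float:
    """Convert a plan's advertised limit into estimated characters."""
    return amount * CHARS_PER_UNIT[unit]


def chars_per_dollar(price: float, amount: float, unit: str) -> float:
    """Estimated characters of training data per dollar per month."""
    return estimated_chars(amount, unit) / price


ranked = sorted(PLANS.items(), key=lambda item: chars_per_dollar(*item[1]), reverse=True)
for name, (price, amount, unit) in ranked:
    print(f"{name}: ~{chars_per_dollar(price, amount, unit):,.0f} chars/$1")
```

Changing a single factor, for example 5 instead of 5.5 characters per word, shifts the ranking inputs noticeably, which is why the per-dollar figures should be read as estimates.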
Important caveats about this comparison:
| Platform | Price | Metric Shown | Estimated Characters |
|---|---|---|---|
| Dante AI | $149/mo | 50M chars | ~50M |
| Chatbase | $150/mo | 40 MB | ~40M |
| FastBots | $89/mo | 15M chars | ~15M |
| ChatNode | $89/mo | 12.5M chars | ~12.5M |
| Wonderchat | $99/mo | 3M chars | ~3M |
| DocsBot AI | $49/mo | 5,000 pages | ~25M |
There is another layer of complexity that the headline numbers do not reveal: how platforms count pages when they crawl your website.
Some platforms count every URL as exactly 1 page, regardless of how much content is on it. A 200-word "About Us" page and a 10,000-word comprehensive guide would both count as 1 page. This can work in your favor for content-heavy sites, but it can also waste your quota on thin pages like privacy policies or nearly-empty category pages.
Other platforms, like SiteGPT, ignore the URL entirely and convert all crawled content into character-based pages using their conversion ratio (2,500 chars = 1 page in their case).
Boei uses a hybrid approach: 1 crawled URL counts as a minimum of 1 page. If the cleaned content exceeds 5,000 characters, it counts as additional pages. So a 12,000-character guide would cost 3 pages, while a short contact page with 800 characters still costs 1 page. This approach is straightforward and predictable, though it means thin pages use a full page credit.
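The three counting methods can be compared with a short Python sketch. The function names are mine, and the rounding rule for the hybrid method (round up per URL, minimum 1 page) is inferred from the 12,000-character example above rather than taken from any platform's code.

```python
import math


def per_url_pages(chars_per_url: list) -> int:
    """Every crawled URL counts as exactly 1 page, whatever its length."""
    return len(chars_per_url)


def char_based_pages(chars_per_url: list, chars_per_page: int = 2_500) -> int:
    """Ignore URLs: pool all cleaned characters, then divide into pages."""
    return math.ceil(sum(chars_per_url) / chars_per_page)


def hybrid_pages(chars_per_url: list, chars_per_page: int = 5_000) -> int:
    """Each URL costs at least 1 page, plus 1 page per started 5,000 characters."""
    return sum(max(1, math.ceil(chars / chars_per_page)) for chars in chars_per_url)


site = [12_000, 800]  # a long guide and a short contact page
print(per_url_pages(site))     # 2
print(char_based_pages(site))  # 6
print(hybrid_pages(site))      # 4
```

The guide costs 3 pages and the contact page costs 1 under the hybrid rule, which matches the example above.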
A hidden cost on some platforms is how re-crawling works. Certain platforms track cumulative data indexed over time. If you re-crawl 100 pages because your content changed, they might count that as 200 total pages used (the old version plus the new one). Boei and others that track total stored content avoid this problem entirely. A re-sync simply replaces the old content, so your page count stays the same.
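The difference between the two accounting models is easy to simulate. In this Python sketch, each crawl is a mapping from URL to the pages it costs. Both functions are simplified models I wrote for illustration, not any platform's actual billing logic.

```python
def cumulative_pages(crawls: list) -> int:
    """Cumulative model: every crawl adds to the total, even for unchanged URLs."""
    return sum(sum(crawl.values()) for crawl in crawls)


def stored_pages(crawls: list) -> int:
    """Stored-content model: a re-sync overwrites the previous copy of each URL.

    Simplification: URLs that disappear from a later crawl are not removed here.
    """
    stored = {}
    for crawl in crawls:
        stored.update(crawl)
    return sum(stored.values())


first_crawl = {f"/page-{i}": 1 for i in range(100)}  # 100 URLs, 1 page each
re_crawl = dict(first_crawl)                          # same URLs, updated content

print(cumulative_pages([first_crawl, re_crawl]))  # 200
print(stored_pages([first_crawl, re_crawl]))      # 100
```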
Since Boei uses 5,000 characters as its page definition, here is some context for what that represents:
The 5,000-character definition aligns well with what most people intuitively consider "a page of content." A standard printed page with normal margins contains about 2,000-3,000 characters, but web content tends to include more text per logical page. The 5,000-character unit captures one complete piece of content, whether that is a blog post, a help article, or a product description.
The raw numbers in the tables above can feel abstract. Here are some real-world scenarios to help you estimate what plan you would need across platforms.
A typical small business site with a homepage, services pages, about page, contact page, blog posts, and FAQ. After cleaning, this usually adds up to around 100,000-200,000 characters.
Verdict: Almost any paid plan on any platform handles this comfortably. Even most free tiers would cover it.
A software company with a knowledge base of 100 help articles, each averaging 4,000 characters after cleaning. Total: roughly 400,000 characters.
Verdict: Still fits comfortably on entry-level paid plans. Free tiers would be tight on some platforms.
An online store with 500 product pages, each containing a description, specs, and reviews. Average cleaned content per product: 2,000-3,000 characters. Plus 20 informational pages (shipping, returns, FAQ, about). Total: roughly 1.2-1.7 million characters.
Verdict: Entry-level paid plans on most platforms still cover this. However, if product descriptions are lengthy or you have thousands of SKUs, you will start hitting limits on cheaper plans.
Enterprise software documentation, government sites, or content-heavy publishers with 1,000+ pages averaging 5,000 characters each. Total: roughly 5 million+ characters.
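Verdict: You will reach or exceed the entry-level limit on page-based platforms like Boei and DocsBot AI (about 5 million characters each) and on Wonderchat, so expect to need a mid-tier or higher plan there. Entry plans with larger character allowances, such as Dante AI, FastBots, and ChatNode, still have headroom. This is where the differences in pricing and limits start to matter significantly.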
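If you want to run the same estimate for your own site, a back-of-the-envelope check looks like this in Python. The capacities are the estimated entry-level character figures from the comparison table. The page counts and averages are the scenario numbers from this section, and the check ignores per-URL page minimums and message limits.

```python
# Estimated entry-level capacity in characters, from the comparison table.
ENTRY_CAPACITY = {
    "Dante AI": 25_000_000,
    "Chatbase": 33_000_000,
    "ChatNode": 12_500_000,
    "FastBots": 12_000_000,
    "Boei": 5_000_000,
    "DocsBot AI": 5_000_000,
    "Wonderchat": 500_000,
}


def site_chars(pages: int, avg_chars_per_page: int) -> int:
    """Rough total of cleaned characters for a site."""
    return pages * avg_chars_per_page


def plans_that_fit(total_chars: int) -> list:
    """Entry-level plans whose estimated capacity covers the site."""
    return [name for name, cap in ENTRY_CAPACITY.items() if total_chars <= cap]


knowledge_base = site_chars(100, 4_000)  # 400,000 characters
store_upper_bound = 1_700_000            # upper estimate for the online store

print(plans_that_fit(knowledge_base))     # all seven plans
print(plans_that_fit(store_upper_bound))  # every plan except Wonderchat
```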
This article focuses on training data limits, but most platforms also cap the number of messages or responses your chatbot can handle per month. A platform might offer generous training data but restrict you to 1,000 messages per month, which could be a bigger bottleneck than the data limit for high-traffic sites.
When comparing platforms, make sure to evaluate both dimensions: how much data you can train on AND how many conversations your chatbot can handle. A deep comparison of message credit pricing across platforms is a topic for a separate analysis, but keep it in mind as you evaluate your options.
Four completely different measurement systems exist across just 10 platforms. Characters, pages, megabytes, and words all represent the same underlying concept but are presented differently to customers.
SiteGPT's page (2,500 chars) is half the size of Boei's and DocsBot's (5,000 chars). A plan advertising "500 pages" on SiteGPT contains the same amount of data as a plan advertising "250 pages" on Boei.
Chatbase's MB-based system introduces a variable that pure character counts avoid: file format overhead. A 1 MB PDF might contain only 500K characters of actual text because the rest is formatting data. This makes it harder for customers to predict how much content they can actually store.
Nearly every platform with a free tier lands in the 400K-500K character range. This appears to be the industry consensus for a meaningful test, enough for a small site with 50-100 pages.
At $29/month for 25 million characters, Dante AI provides significantly more raw training capacity than competitors at similar price points. However, data capacity is only one factor. Response quality, integrations, and support matter too.
Platforms showing raw character counts (Dante, FastBots, ChatNode) are the easiest to understand and compare. Page-based systems require knowing the conversion factor. MB-based systems require understanding file format overhead. Word-based systems require a characters-per-word assumption.
When evaluating AI chatbot platforms for training data limits, ask these questions:
Understanding these details before committing will help you avoid surprises and choose the platform that genuinely fits your needs.
Article by
Ruben is the founder of Boei, with 12+ years of experience in conversion optimization. Former IT consultant at Ernst & Young and Accenture, where he helped product teams at Shell, ING, Rabobank, Aegon, NN, and AirFrance/KLM optimize their digital experiences. Now building tools to help businesses convert more website visitors into customers.