A technical deep-dive into embeddings, vector search, RAG architecture, and how we prevent hallucinations while delivering accurate answers from your content.
AI chatbot features like automatic learning are what set modern chatbots apart from traditional rule-based bots. When you connect your website to Boei, a sophisticated process transforms your content into something an AI can understand and search through. These features are perfect for ecommerce websites looking to automate support. Here's what happens under the hood:
Our crawler visits every page of your website and extracts the meaningful content. This isn't just copying HTML. We strip out navigation, footers, sidebars, ads, forms, scripts, styles, comments, pagination, and breadcrumbs.
What remains is the actual content your visitors care about: product descriptions, FAQs, policies, articles, and documentation. We also extract metadata like page titles, H1 headings, and descriptions.
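To make the cleaning step concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not Boei's actual crawler: the `ContentExtractor` class, the `extract` helper, and the `SKIP_TAGS` set are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate in this sketch (an assumption).
SKIP_TAGS = {"nav", "footer", "aside", "script", "style", "form"}


class ContentExtractor(HTMLParser):
    """Collects visible text while skipping boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a skipped tag
        self.current_tag = None  # most recently opened tag
        self.title = ""
        self.h1 = ""
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.current_tag = tag

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        self.current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth > 0:
            return
        if self.current_tag == "title":
            self.title = text            # metadata only, not body text
        else:
            if self.current_tag == "h1":
                self.h1 = text
            self.text_parts.append(text)


def extract(html):
    """Return page metadata and cleaned body text."""
    parser = ContentExtractor()
    parser.feed(html)
    return {
        "title": parser.title,
        "h1": parser.h1,
        "text": " ".join(parser.text_parts),
    }
```

HTML comments are dropped automatically because the parser routes them to a separate handler that this sketch does not override.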
Long pages are split into smaller, semantic chunks that preserve meaning. Our chunking algorithm respects semantic boundaries and keeps code blocks, tables, and pricing content intact.
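A simplified way to picture chunking is to pack whole paragraphs into chunks up to a size limit, never cutting a paragraph in half. The sketch below assumes paragraphs are separated by blank lines; `chunk_text` and the 500-character limit are hypothetical, and a production chunker would add rules for code blocks and tables.

```python
def chunk_text(text, max_chars=500):
    """Greedily pack paragraphs into chunks of at most max_chars.

    A paragraph longer than max_chars is kept whole rather than split,
    so no chunk ever ends mid-paragraph.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate      # paragraph fits (or chunk is empty)
        else:
            chunks.append(current)   # close the full chunk
            current = para           # start a new one with this paragraph
    if current:
        chunks.append(current)
    return chunks
```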
Embeddings are numerical representations of text that capture semantic meaning. Think of them as coordinates in a multi-dimensional space where similar concepts are close together. The sentence "What's your return policy?" and "Can I send items back?" have different words but nearly identical embeddings because they mean the same thing.
We use advanced embedding models to convert each chunk of your content into these numerical vectors, typically with 1,536 dimensions, which capture nuance, context, and meaning.
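Closeness between embeddings is commonly measured with cosine similarity. The sketch below uses hand-made 3-dimensional toy vectors in place of real 1,536-dimensional embeddings; the numbers are invented purely to illustrate the idea and do not come from any model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not real model output).
return_policy = [0.9, 0.1, 0.2]   # "What's your return policy?"
send_back = [0.8, 0.2, 0.3]       # "Can I send items back?"
shipping_cost = [0.1, 0.9, 0.1]   # "How much is shipping?"

similar = cosine_similarity(return_policy, send_back)
different = cosine_similarity(return_policy, shipping_cost)
print(similar > different)
```

With these toy values, the two return-related questions score far closer to each other than either does to the shipping question.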
These embeddings are stored in a specialized vector database optimized for similarity search. Unlike traditional databases that match exact keywords, vector databases find content based on meaning. This is why the chatbot understands questions even when visitors don't use the exact words from your website.
From raw content to searchable knowledge base
Crawl your website via sitemap or domain discovery. Support for JavaScript-rendered pages using our custom scraper.
Clean HTML, remove navigation/ads/scripts, extract meaningful content and metadata.
Split content into semantic units while preserving code blocks, tables, and context.
Convert text chunks into 1,536-dimensional vectors that capture semantic meaning.
Save embeddings in Weaviate vector database with hybrid BM25 + vector search.
Hybrid BM25 + vector search finds the most relevant content for any question.
One of the most important AI chatbot features is intelligent question handling. When a visitor types a question, a multi-stage process ensures they get the most accurate answer possible:
The raw question is analyzed and enhanced before searching, which improves retrieval accuracy.
We don't rely on just one search method. Instead, we combine BM25 keyword matching with vector similarity search.
This hybrid approach catches both semantic matches ("refund" matches "return policy") and exact matches (specific product names, model numbers).
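One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion. The sketch below is a generic illustration of that technique, not a description of how Boei or Weaviate fuses results internally; the function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both methods rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the question "refund for SKU-123".
keyword_hits = ["sku-123", "returns", "shipping"]   # exact-match ranking
vector_hits = ["returns", "refunds", "sku-123"]     # semantic ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents found by both methods ("returns" and "sku-123") end up ahead of documents found by only one.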
Results from both search methods are combined and re-ranked based on content type, page importance, and relevance.
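Re-ranking can be pictured as adjusting each result's relevance score with a few extra signals. Everything specific in this sketch is an assumption made for illustration: the `Hit` fields, the boost values in `CONTENT_TYPE_BOOST`, and the scoring formula are hypothetical, not Boei's actual weights.

```python
from dataclasses import dataclass

# Hypothetical boosts per content type (assumed values).
CONTENT_TYPE_BOOST = {"faq": 1.2, "product": 1.1, "article": 1.0}


@dataclass
class Hit:
    url: str
    content_type: str
    page_importance: float  # 0.0 to 1.0, assumed scale
    relevance: float        # 0.0 to 1.0, score from the search step


def rerank(hits):
    """Sort hits by relevance adjusted for content type and page importance."""
    def score(hit):
        boost = CONTENT_TYPE_BOOST.get(hit.content_type, 1.0)
        return hit.relevance * boost * (1.0 + 0.5 * hit.page_importance)

    return sorted(hits, key=score, reverse=True)
```

Under these assumed weights, an FAQ entry on an important page can outrank a blog article with a slightly higher raw relevance score.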
The top-ranked content chunks are sent to the LLM (GPT-5 or Claude) along with the original question. The AI synthesizes an answer using only the provided content, never its general training data. This is called Retrieval-Augmented Generation (RAG).
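The core of RAG is the prompt that pairs retrieved content with the question. A minimal sketch of assembling such a prompt follows; the `build_prompt` function, the instruction wording, and the chunk dictionary shape are assumptions for illustration, and the call to the LLM itself is left out.

```python
def build_prompt(question, chunks):
    """Build a grounded prompt from retrieved chunks and the visitor's question.

    Each chunk is a dict with "text" and "url" keys (an assumed shape).
    """
    context = "\n\n".join(
        f"[{i}] {chunk['text']} (source: {chunk['url']})"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks and keeping each source URL next to its text is what makes it possible to cite sources in the final answer.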
Every answer includes links to the source pages used. Visitors can verify information themselves, and you can see exactly what content informed each response.
AI hallucinations happen when models generate plausible-sounding but incorrect information. This is the #1 concern businesses have about AI chatbots. Here's how Boei solves it:
How We Prevent Hallucinations
How to create your AI chatbot from scratch
All the AI chatbot features included with Boei. No hidden costs or add-ons
Create a bot using AI from a prompt or one-click from your domain
Upload sitemap or use our crawler to learn your entire site
Learn from FAQ, text, Excel, PDF, PPT, and other documents
Show sources to customers. No hallucinations, fully verifiable
Get leads via email, webhook, or Boei inbox with full transcripts
Track page visits, bot opens, interactions, and conversion to leads
Configure which fields to collect: email, name, phone, custom fields
Full searchable history of all conversations with AI summaries
Suggested responses for visitors to click instead of typing
Interface automatically translates to visitor's language (95+ languages)
Widget on your site or standalone page/landing page
Fully adjust bot behavior to match your exact use case
GPT-5, Claude 4 Sonnet, GPT-4o, and o3-mini available
Match your brand colors, fonts, and styling
Adjust all interface text and messages
Hand off conversations to human agents when needed
Custom system prompts for power users
Set up test cases to review bot performance automatically
Override AI responses for specific questions, bulk-import via Excel template
Structured question flows inside the chatbot to capture name, email, phone, company
Lead collectors can appear after N chat messages, not just start/end
Email/phone/name auto-sent to Google Analytics, Meta Pixel for conversion tracking
Tag training data by topic. Pre-chat questions let visitors self-select, so the AI only searches matching content
Your chatbot doesn't just answer questions. It takes actions: API calls, calculations, data lookups, and webhook triggers. Configure actions conversationally in the Agent tab.
Full email inbox with AI that replies automatically. Customizable greeting and signature. Configurable reply limits. BCC support.
Bidirectional SMS with AI auto-reply. Messages appear in your shared inbox alongside chat and WhatsApp.
Clean, Minimal, or WhatsApp theme. Dark mode auto-detects your visitor's OS. Gradient support and branded header.
Track leads from first conversation to closed deal. Kanban board with custom stages, follow-up dates, and activity tracking.
Send follow-up emails and WhatsApp messages on a smart schedule. Working days, send times, timezone aware. Never lose a lead.
Built-in hallucination prevention. The AI verifies answers against your actual content before responding. Safety constraints and fabrication detection.
Chat history survives page reloads. Visitors pick up where they left off, even after navigating your site.
Suggested conversation starters that guide visitors. Show always, once, or never. Configurable per chatbot.
The AI models and infrastructure powering your chatbot
GPT-5 (Latest, fastest) • Claude 4 Sonnet (Most human-like responses) • GPT-4o (Reliable workhorse) • o3-mini (Budget-friendly option). Choose based on your needs. Switch anytime.
Powered by Weaviate, a production-ready vector database that handles millions of documents. Supports hybrid BM25 + vector search for optimal retrieval accuracy.
Scrape → Process → Chunk → Embed → Store → Search. Intelligent chunking preserves semantic boundaries. Custom content rules for pricing, tables, and code blocks.
Automatic removal of nav, footer, sidebar, ads, forms, scripts, styles, comments, pagination, and breadcrumbs. Metadata extraction for title, H1, and description.
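Flexible knowledge sources are among the most powerful AI chatbot features available. Your AI chatbot can learn from multiple types of content, all processed through the same embedding pipeline: your website (via sitemap or crawler), FAQs, plain text, and documents such as Excel, PDF, and PPT files.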
Tagging training data by topic prevents answer mix-ups when a single chatbot serves content that overlaps or contradicts across topics, like an insurance company where "Am I covered?" has a different answer for every product.
All sources are combined into a unified knowledge base. The bot searches across everything when answering questions, filtered by tags when a topic has been selected.
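Tag filtering can be sketched as narrowing the candidate chunks before any search runs. The chunk shape and the `filter_by_tag` helper below are hypothetical and only illustrate the idea.

```python
def filter_by_tag(chunks, selected_tag=None):
    """Return only chunks carrying selected_tag, or all chunks if no tag is set."""
    if selected_tag is None:
        return list(chunks)
    return [chunk for chunk in chunks if selected_tag in chunk["tags"]]


# Hypothetical knowledge base for an insurer with two products.
knowledge_base = [
    {"text": "Car insurance covers collision damage.", "tags": ["car"]},
    {"text": "Home insurance covers water damage.", "tags": ["home"]},
    {"text": "Contact support by email.", "tags": ["car", "home"]},
]

car_chunks = filter_by_tag(knowledge_base, "car")
```

With the "car" topic selected, "Am I covered?" is answered only from car-related content, so the home policy can never leak into the answer.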
Combines BM25 keyword matching with vector similarity search for best results
Enhances queries before search for better retrieval accuracy
Results re-ranked by content type, page importance, and relevance
Every answer includes clickable links to original source pages
Try the demo chatbot or start your free trial. Setup takes 5 minutes.