Unstructured Data Sources

An unstructured data source stores all of its content together in one combined dataset, with no way to separate, target, or isolate individual pieces of information.

That means you cannot:

  • select one specific policy
  • select one specific page
  • exclude a single section
  • tell the chatbot which part of the content applies

Everything is merged, and the AI must decide relevance on its own.
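
To make this concrete, here is a minimal Python sketch of retrieval over an unstructured source. The chunk texts, the `words` and `retrieve` helpers, and the word-overlap scoring are all hypothetical stand-ins chosen for illustration (a real platform would more likely use embeddings). The only point is that the chunks carry no labels, so there is nothing to select or exclude on.

```python
# Hypothetical unstructured source: plain text chunks with no policy,
# page, or section labels attached.
chunks = [
    "Parking permits cost 50 per year for residents.",
    "Bin collection happens every Tuesday.",
    "Parking fines must be paid within 28 days.",
]


def words(text: str) -> set[str]:
    """Lowercase the text, strip basic punctuation, and split into words."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Rank every chunk by word overlap with the question.

    There is no way to say "only search the parking permit policy",
    because the chunks have no metadata to filter on. Similarity alone
    decides what comes back.
    """
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_k]


results = retrieve("How much are parking permits?")
```

In this sketch the parking fines chunk comes back alongside the permit chunk simply because it shares the word "parking", and nothing in the data lets you rule it out.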

Why Unstructured Data Sources Happen

This usually occurs when you ingest something large as one chunk, such as:

  • scraping a full website into one dataset
  • loading all PDFs into a single knowledge base
  • indexing an entire folder without categorising
  • feeding hundreds of pages of mixed topics into one RAG source

You get one giant pool of text.

You cannot tell the chatbot where one topic ends and another begins.
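
As a rough illustration of how the boundaries get lost, the sketch below merges a few pages into one pool and cuts it into fixed-size chunks. The page paths, the page texts, the `ingest_unstructured` function, and the chunk size of 40 characters are all assumptions made up for this example, not how any particular platform works.

```python
# Hypothetical pages from a scraped website: (path, text) pairs.
pages = [
    ("/parking/permits", "Residents can apply for a parking permit online."),
    ("/waste/bins", "Bins are collected weekly."),
    ("/planning/apply", "Planning applications take eight weeks."),
]


def ingest_unstructured(
    pages: list[tuple[str, str]], chunk_size: int = 40
) -> list[str]:
    """Merge every page into one string, then cut it into fixed-size chunks.

    The paths are discarded and the cuts ignore page boundaries, so a
    single chunk can contain text from two unrelated topics.
    """
    pool = " ".join(text for _path, text in pages)
    return [pool[i:i + chunk_size] for i in range(0, len(pool), chunk_size)]


dataset = ingest_unstructured(pages)

# Print each chunk to see where the cuts fall.
for chunk in dataset:
    print(repr(chunk))
```

With these inputs, one chunk runs from the end of the parking page straight into the bins page, and the resulting dataset holds no record of which page any piece of text came from.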

Chatbot Example

Most chatbots I have seen start with the platform ingesting all the content from the council's website. While this can work, the chatbot can easily get bogged down in thousands of pages of content, and the AI has trouble identifying which content to use and when.

In the model below, you can see what an average chatbot with unstructured data looks like.

[Image: MacBook Pro with images of computer language codes]