The bug report is always the same. The RAG pipeline answers questions about the contracts, the policy docs, the wiki — and then someone uploads the quarterly financials, asks "what was APAC revenue in Q3," and gets back a confident paragraph about something on a different page. The retrieval looks fine. The embedding model is the good one. Nothing is broken, exactly. The system just cannot read a table.

This is not a tuning problem, and it is the single most common place a working text-RAG system falls over when it meets real enterprise data. Tables are where the fresh, reliable, domain-specific numbers live — and they break two assumptions that text-RAG quietly depends on. Understanding which two is the whole job, because they need opposite fixes.

A table isn't text, and your chunker treats it like one

The first assumption is that meaning survives chunking. For prose it mostly does: split a paragraph at the wrong sentence and each half still says something. Split a table and you get carnage. A fixed-size chunker doesn't know where the rows are; it slices at token 512, lands in the middle of row forty, and produces a chunk of bare numbers whose column headers were left behind three chunks ago. The text being embedded is now "4.2, 11.8, APAC, 2024," with nothing to say that 4.2 is revenue in millions. The grid was the meaning, and the chunker flattened it away.
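A minimal sketch of that failure, assuming a toy four-row table and using characters as a stand-in for tokens (the table, the chunk size, and the `fixed_size_chunks` helper are all illustrative, not taken from any particular library):

```python
# Hypothetical toy table rendered as plain text, one row per line.
table_text = "\n".join([
    "Region | Year | Revenue ($M) | Growth (%)",
    "EMEA | 2024 | 3.9 | 7.5",
    "AMER | 2024 | 6.1 | 9.2",
    "APAC | 2024 | 4.2 | 11.8",
])


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring row boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Characters stand in for tokens here; a real chunker counts tokens.
chunks = fixed_size_chunks(table_text, size=50)

for index, chunk in enumerate(chunks):
    label = "has header" if "Revenue" in chunk else "NO HEADER"
    print(index, label, repr(chunk))
```

Only the first chunk still contains the column names. The last one is a fragment of the APAC row with the region name cut off, which is roughly what the embedder sees in the real failure.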

The fix is to stop treating the table as a string and start treating each row as a record. Serialize every row into a self-contained sentence that carries its own headers ("Region: APAC; Year: 2024; Revenue: \$4.2M; Growth: 11.8%") and embed that. When a table is too wide or too long to fit one chunk, repeat the header on every piece so no data is ever orphaned from its labels. This isn't a clever trick; it is now a first-class feature in document-parsing stacks. Docling's HybridChunker has a repeat_table_header flag (on by default) for exactly this, plus a contextualize step that prepends the chunk's section headings and captions to its text before it goes to the embedder. The whole feature exists because detached headers were silently destroying retrieval.
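Both ideas fit in a few lines of plain Python. This is a sketch under simple assumptions (string cells, a single header row), not Docling's implementation; `serialize_row` and `split_with_header` are hypothetical helper names:

```python
def serialize_row(headers: list[str], row: list[str]) -> str:
    """Turn one table row into a self-contained 'Header: value' record."""
    if len(headers) != len(row):
        raise ValueError("row width does not match header width")
    return "; ".join(f"{h}: {v}" for h, v in zip(headers, row))


def split_with_header(header_line: str, row_lines: list[str],
                      rows_per_chunk: int) -> list[str]:
    """Split a long table into pieces, repeating the header on each piece."""
    pieces = []
    for start in range(0, len(row_lines), rows_per_chunk):
        body = row_lines[start:start + rows_per_chunk]
        pieces.append("\n".join([header_line, *body]))
    return pieces


headers = ["Region", "Year", "Revenue", "Growth"]
rows = [
    ["EMEA", "2024", "$3.9M", "7.5%"],
    ["AMER", "2024", "$6.1M", "9.2%"],
    ["APAC", "2024", "$4.2M", "11.8%"],
]

# One embeddable record per row, each carrying its own labels.
records = [serialize_row(headers, row) for row in rows]

# Alternative for tables kept in grid form: every piece starts with the header.
pieces = split_with_header(
    " | ".join(headers),
    [" | ".join(row) for row in rows],
    rows_per_chunk=2,
)
```

Per-row records suit lookup questions about a single entity. Header-repeated grid pieces keep neighbouring rows together, which can help when the question compares adjacent rows.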

There's a sharp piece of evidence for how different tabular retrieval is. On ordinary text, BM25 — plain keyword matching — is a strong, hard-to-beat baseline that dense embeddings only edge out. The TARGET benchmark ran the same comparison on tables and found the gap inverts: BM25 is markedly worse on tables, and dense retrievers win by a wide margin. The reason is intuitive once you see it — a paragraph is full of descriptive words to match on; a table cell is a number and a two-word label, so lexical search has almost nothing to grip. TARGET's other finding is the one to act on: the descriptive metadata around a table — its title, caption, the sentence that introduces it — often matters more for retrieval than the cells themselves. Embed the table's description, not just its contents.
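One way to act on that finding is to build the text you embed from the table's description plus the row, rather than the row alone. The sketch below assumes your parser can hand you a title, a caption, and the introducing sentence; the `TableContext` type and `embedding_text` function are hypothetical names for illustration, and the sample strings are made up:

```python
from dataclasses import dataclass


@dataclass
class TableContext:
    """Descriptive metadata found around a table (all fields illustrative)."""
    title: str
    caption: str = ""
    intro_sentence: str = ""


def embedding_text(context: TableContext, serialized_row: str) -> str:
    """Prepend the table's description to a serialized row before embedding."""
    parts = [context.title, context.caption, context.intro_sentence, serialized_row]
    # Skip empty metadata fields so the text has no blank lines.
    return "\n".join(part.strip() for part in parts if part.strip())


context = TableContext(
    title="Quarterly revenue by region",
    intro_sentence="The table below breaks revenue down by region.",
)
text = embedding_text(context, "Region: APAC; Year: 2024; Revenue: $4.2M")
print(text)
```

The descriptive words in the title and intro sentence give the retriever something to match on that the bare cells do not provide.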

Similarity can't do arithmetic

Fix the chunking and lookup questions start working. Then someone asks "what was total revenue across all regions," and the system fails again — and this time no amount of better embedding will save it.

Here is the second, deeper assumption text-RAG makes: that the answer is in the corpus, waiting to be matched. For prose it is — the sentence you want exists somewhere. But "total revenue across all regions" is not a row in the table. It is a computation over rows. No vector, however good, retrieves the sum of a column, because the sum was never stored. Most real questions people ask of tables are like this: aggregations, filters, rankings, comparisons. They are arithmetic, and semantic similarity cannot do arithmetic. "\$4.2M" and "\$3.9M" sit almost on top of each other in embedding space, yet the question "which is bigger" is a comparison, not a similarity.

The answer to "what's the total" is not a row you can retrieve — it's a computation you have to run.

The working pattern for these questions retrieves nothing from the data at all. It retrieves the schema — the table's columns and types — hands that to the model, and asks it to write a SQL or Python query, which an execution layer then runs against the real table. This is the architecture behind TableRAG, the EMNLP 2025 framework that decomposes a question, programs SQL for the structured part, executes it, and composes the result. Its headline finding is the one that should reset your mental model: on million-token tables, generating-and-executing beats both reading the whole table as text and retrieving individual rows or columns. You couldn't fit the table in context anyway — and even if you could, reading it as flattened text loses exactly the grid you needed. SQL sidesteps both walls. (This is the same engine behind the text-to-SQL tools you may already be evaluating; the insight is that it belongs inside your RAG router, not in a separate product.)
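The shape of that loop can be sketched with the standard library's sqlite3 module. This is an illustration of the general generate-and-execute pattern, not TableRAG's code: the table, its column names, and the `generate_sql` function are assumptions, and `generate_sql` returns a canned query where a real system would call a model with the question and the schema.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue "
    "(region TEXT, year INTEGER, revenue_musd REAL, growth_pct REAL)"
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?, ?)",
    [("EMEA", 2024, 3.9, 7.5), ("AMER", 2024, 6.1, 9.2), ("APAC", 2024, 4.2, 11.8)],
)


def describe_schema(conn: sqlite3.Connection, table: str) -> str:
    """Return 'table(col type, ...)': the only thing shown to the model."""
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated, so it must come from trusted code.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return f"{table}(" + ", ".join(f"{c[1]} {c[2]}" for c in columns) + ")"


def generate_sql(question: str, schema: str) -> str:
    """Placeholder for the model call; returns a canned query for this demo."""
    return "SELECT SUM(revenue_musd) FROM revenue WHERE year = 2024"


def run_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute generated SQL, refusing anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are executed")
    return conn.execute(sql).fetchall()


schema = describe_schema(conn, "revenue")
sql = generate_sql("What was total revenue across all regions in 2024?", schema)
total = run_read_only(conn, sql)[0][0]
print(schema)
print(round(total, 1))
```

The model only ever sees the schema string, so the size of the table stops mattering to the context window. The SELECT-only check is a token gesture at safety; a production execution layer would want a read-only connection and stricter validation.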

The question decides the pipeline, not the table

The mistake almost every team makes is picking one pipeline for "tables." They either embed everything — and watch every aggregation question fail — or route everything to SQL and watch fuzzy lookups ("find the row about the Singapore office") return nothing, because there's no clean WHERE clause for "about."

The load-bearing variable is the question type, not the table. A million-row table still wants SQL for a "count where" query but embeddings for a "which row mentions X" query. A tiny ten-row table needs the same fork. So the router runs first: classify the query as retrieval-shaped or computation-shaped, and send it down the matching path, embedding serialized rows for the former and generating and running code for the latter. The hardest real questions are multi-hop across both modalities ("how does the region with the highest growth describe its strategy"), which is why TableRAG's benchmark, HeteQA, is built specifically on questions that span a table and its surrounding prose. Those you decompose: SQL finds the region, retrieval finds the strategy.
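A router can start out very small. The sketch below uses a keyword heuristic purely as a stand-in; the cue list and the `route` function are assumptions for illustration, and a production router would more likely ask a model to classify the question:

```python
import re

# Illustrative cue words that suggest the answer must be computed, not matched.
COMPUTATION_CUES = re.compile(
    r"\b(total|sum|average|mean|count|how many|highest|lowest|most|least|"
    r"top \d+|rank|more than|less than|greater than|compare|difference)\b",
    re.IGNORECASE,
)


def route(question: str) -> str:
    """Classify a question as 'computation' (run code) or 'retrieval' (embed)."""
    return "computation" if COMPUTATION_CUES.search(question) else "retrieval"


for question in [
    "What was total revenue across all regions?",
    "Find the row about the Singapore office",
    "How many regions grew more than 10%?",
]:
    print(route(question), "<-", question)
```

A binary label like this does not handle the multi-hop case. A question such as "how does the region with the highest growth describe its strategy" would be tagged as computation because of "highest", and it still needs a decomposition step that sends the second hop to retrieval.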

Everything else about your stack — the embedding model, the chunking strategy, the parent-document tricks — is tuning on top of that one decision. Get the routing right and tables stop being the thing that breaks your demo. Get it wrong and the best embedding model in the world will keep confidently summarizing the wrong page.