Building a knowledge graph from a pile of documents looks, in the demo, like a solved problem: feed the text to an LLM, ask for entities and relationships, draw the result. The demo always works. The graph it produces is usually junk — not because the extraction is wrong, but because of a step that doesn't appear in the demo at all.

Here's the pipeline as it actually runs, in four stages.

Stages 1 and 2: chunk, then extract

You split the documents into text units and, for each unit, prompt an LLM to pull out the entities and the relationships connecting them — subject, predicate, object triples. "Anthropic — released — Claude." "Claude — is-a — language model." This is the part every tutorial shows, and modern models are genuinely good at it. Microsoft's GraphRAG frames its first phase exactly this way: an LLM analyzes each text unit to identify entities (with a title, type, and description) plus the relationships among them.
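Here is a minimal sketch of stages 1 and 2 in Python. It assumes a word-window chunker and a model that replies with JSON. The prompt wording, the JSON shape, chunk_words, and fake_llm are all hypothetical, and fake_llm stands in for a real model call so the sketch runs offline. The point is structural: every chunk gets its own independent call, and nothing carries over from one call to the next.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Entity:
    title: str
    type: str
    description: str

@dataclass(frozen=True)
class Relationship:
    source: str     # subject entity title
    predicate: str  # relation label, e.g. "released"
    target: str     # object entity title

# Hypothetical prompt; a production prompt would add instructions and examples.
PROMPT = (
    "Extract entities (title, type, description) and relationships "
    "(source, predicate, target) from the text below. Reply with JSON shaped like "
    '{{"entities": [...], "relationships": [...]}}.\n\n{text}'
)

def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

def extract(chunk: str, llm: Callable[[str], str]) -> tuple[list[Entity], list[Relationship]]:
    """One independent LLM call per chunk; no state is shared between chunks."""
    data = json.loads(llm(PROMPT.format(text=chunk)))
    entities = [Entity(e["title"], e["type"], e.get("description", ""))
                for e in data.get("entities", [])]
    relationships = [Relationship(r["source"], r["predicate"], r["target"])
                     for r in data.get("relationships", [])]
    return entities, relationships

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: returns the same canned JSON every time."""
    return json.dumps({
        "entities": [
            {"title": "Anthropic", "type": "Company", "description": "AI company"},
            {"title": "Claude", "type": "Model", "description": "Language model"},
        ],
        "relationships": [
            {"source": "Anthropic", "predicate": "released", "target": "Claude"},
        ],
    })

text = "Anthropic released Claude. Claude is a language model."
results = [extract(c, fake_llm) for c in chunk_words(text, size=5, overlap=1)]
```

Each element of results is one chunk's (entities, relationships) pair, produced with no knowledge of the other chunks.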

The one real decision here is schema-guided versus open extraction. Schema-guided means you hand the model an allowed list (entity types like Company, Model, Person; relation types like released, acquired, works_at) and forbid anything else. Open extraction lets the model invent types as it goes. The tradeoff is clean: a fixed schema buys precision and consistency and keeps the graph queryable, at the cost of recall and cross-domain flexibility; open extraction finds more, but its output is noisier and harder to trust. Both LangChain (via the allowed_nodes and allowed_relationships parameters of LLMGraphTransformer) and LlamaIndex (via SchemaLLMPathExtractor) let you constrain extraction, and both generally nudge you to, because a schema is the cheapest consistency you'll ever buy.
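To make the recall cost concrete, here is a sketch of schema enforcement as a post-hoc filter over typed triples. The schema, the TypedTriple shape, and the per-relation endpoint types are illustrative assumptions. LangChain and LlamaIndex apply their constraints through their own mechanisms, and this is not their code.

```python
from dataclasses import dataclass

# Hypothetical schema, echoing the example types above.
ALLOWED_NODES = {"Company", "Model", "Person"}
ALLOWED_RELATIONS = {
    # relation -> (required subject type, required object type)
    "released": ("Company", "Model"),
    "acquired": ("Company", "Company"),
    "works_at": ("Person", "Company"),
}

@dataclass(frozen=True)
class TypedTriple:
    subject: str
    subject_type: str
    predicate: str
    obj: str
    object_type: str

def conforms(t: TypedTriple) -> bool:
    """True only if both node types are allowed and the relation fits its endpoint types."""
    if t.subject_type not in ALLOWED_NODES or t.object_type not in ALLOWED_NODES:
        return False
    return ALLOWED_RELATIONS.get(t.predicate) == (t.subject_type, t.object_type)

def split_by_schema(triples: list[TypedTriple]) -> tuple[list[TypedTriple], list[TypedTriple]]:
    kept = [t for t in triples if conforms(t)]
    rejected = [t for t in triples if not conforms(t)]
    return kept, rejected

raw = [
    TypedTriple("Anthropic", "Company", "released", "Claude", "Model"),
    TypedTriple("Claude", "Model", "is-a", "language model", "Concept"),  # open-extraction style
    TypedTriple("Ada", "Person", "works_at", "Anthropic", "Company"),     # hypothetical person
]
kept, rejected = split_by_schema(raw)
```

The is-a triple is the one that gets dropped: that is the recall you trade for a graph whose node and edge types you can query against.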

Stage 3: the step the demo skips

Now the problem. Each chunk was extracted independently. The model that processed chunk 4 has no memory of chunk 9. So when chunk 4 mentions "OpenAI", chunk 9 mentions "OpenAI Inc.", and chunk 12 says "the company," you don't get one node with three mentions. You get three nodes.
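Keying nodes on the exact extracted string reproduces the problem directly. A cheap normalization pass (lowercase, drop punctuation, strip legal-form suffixes) is a common first line of defense. The suffix list and canonical_key below are hypothetical, and the sketch shows both what such a pass catches and what it cannot.

```python
import re

# Hypothetical, deliberately short list of legal-form suffixes.
LEGAL_SUFFIXES = {"inc", "corp", "ltd", "llc", "co"}

def canonical_key(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

mentions = ["OpenAI", "OpenAI Inc.", "the company"]
naive_nodes = set(mentions)                         # exact strings: three nodes
keyed_nodes = {canonical_key(m) for m in mentions}  # normalized keys: two nodes
```

Normalization folds "OpenAI Inc." into "OpenAI", but a spelling like "Open AI" still gets its own key ("open ai"), and "the company" is a coreference problem that no string rule will solve.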

The graph is only as good as its entity resolution. Skip it and you haven't built a knowledge graph — you've built a pile of disconnected sentences that happen to be shaped like one.

Entity resolution is the quality-determining step, and tool builders treat it as a known problem. Neo4j calls duplicated entities "a common challenge with knowledge graphs constructed from unstructured data with the help of LLMs," and its LLM Graph Builder ships node-similarity merging to fix it. Two strategies dominate: embedding-similarity merge (embed the node names and descriptions, then cluster or KNN-match the pairs whose similarity clears a threshold) and LLM-based matching (prompt a model to decide whether two candidate records describe the same entity, then fold them together). A common production pattern combines them: a cheap embedding pass proposes candidate pairs, and an LLM adjudicates the close calls.
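The hybrid pattern can be sketched with the standard library alone. The assumptions are deliberate: character-trigram counts stand in for a real embedding model, the two thresholds are made-up values, and fake_judge is a placeholder for the LLM call, with its answers hard-coded so the sketch runs offline. Pairs above AUTO_MERGE merge directly, pairs in the middle band go to the judge, and a union-find folds accepted pairs into clusters.

```python
import math
from collections import Counter
from itertools import combinations

AUTO_MERGE = 0.90  # assumed: at or above this, merge without asking the LLM
CANDIDATE = 0.40   # assumed: between CANDIDATE and AUTO_MERGE, ask the LLM

def embed(name: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts."""
    text = name.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_judge(a: str, b: str) -> bool:
    """Placeholder for an LLM call ("Do these two records describe the same entity?").
    Answers are hard-coded so the sketch runs offline."""
    same = {frozenset({"OpenAI", "OpenAI Inc."}), frozenset({"OpenAI", "Open AI"})}
    return frozenset({a, b}) in same

def resolve(names: list[str], judge=fake_judge) -> list[set[str]]:
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    vecs = {n: embed(n) for n in names}
    for a, b in combinations(names, 2):
        if find(a) == find(b):
            continue  # already in one cluster; skip the LLM call
        sim = cosine(vecs[a], vecs[b])
        if sim >= AUTO_MERGE or (sim >= CANDIDATE and judge(a, b)):
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=sorted)

names = ["OpenAI", "OpenAI Inc.", "Open AI", "OPENAI", "Anthropic"]
clusters = resolve(names)
```

The all-pairs loop is quadratic, which is fine for a sketch; at scale the candidate step would use a nearest-neighbor index so that only plausible pairs are ever compared.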

The academic framing makes the point even sharper. The Extract, Define, Canonicalize (EDC) framework from EMNLP 2024 splits construction into three explicit phases — and canonicalize is its own phase, merging semantically similar schema elements via vector similarity plus an LLM verification step. When researchers give a stage its own name in the pipeline, it's because that stage is where the quality lives. GraphRAG agrees by construction: after extraction it merges entities and relationships with identical identifiers, then runs an LLM summarization pass to consolidate the multiple descriptions a merged node accumulates into one.
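GraphRAG's exact-match merge followed by description summarization looks roughly like this. Treating (title, type) as the identifier is an assumption for the sketch, and fake_summarize is a placeholder that simply joins unique descriptions where GraphRAG would call an LLM.

```python
from collections import defaultdict
from typing import Callable

def merge_by_identifier(entities: list[tuple[str, str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group (title, type, description) records sharing an identical (title, type) key,
    keeping every description the duplicates carried."""
    merged: dict[tuple[str, str], list[str]] = defaultdict(list)
    for title, etype, desc in entities:
        merged[(title, etype)].append(desc)
    return dict(merged)

def fake_summarize(descriptions: list[str]) -> str:
    """Placeholder for the LLM summarization call: de-duplicates and joins, in order."""
    return " ".join(dict.fromkeys(descriptions))

def consolidate(entities, summarize: Callable[[list[str]], str] = fake_summarize):
    """One description per merged node; only nodes with several descriptions are summarized."""
    return {key: descs[0] if len(descs) == 1 else summarize(descs)
            for key, descs in merge_by_identifier(entities).items()}

extracted = [
    ("CLAUDE", "MODEL", "A language model released by Anthropic."),
    ("CLAUDE", "MODEL", "An AI assistant."),
    ("ANTHROPIC", "ORGANIZATION", "An AI safety company."),
]
result = consolidate(extracted)
```

Because the grouping key is an exact match, "OpenAI" and "OpenAI Inc." would still land in separate groups. This step consolidates duplicates that already share an identifier; it does not replace fuzzy resolution.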

Stage 4: store — and what you do after

Writing nodes and edges to a graph store (Neo4j, FalkorDB, Memgraph — a decision worth its own analysis) is the easy part. The interesting work is what some systems do on top of the resolved graph. GraphRAG runs Leiden hierarchical community detection to cluster densely connected entities, then has an LLM write a summary report per community — which is what lets it answer global, "what are the themes across this whole corpus" questions that plain vector RAG can't. In its evaluation, the global graph approach reports comprehensiveness win rates of roughly 72–83% over naive RAG on million-token datasets. That payoff is real, but it sits entirely on top of a clean, well-resolved graph; run community detection over un-deduplicated nodes and you get communities of phantom duplicates.
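Leiden isn't in Python's standard library, so this sketch substitutes connected components, the coarsest grouping a graph supports, to show the phantom-duplicate effect. It is not community detection in GraphRAG's sense. The edges and the alias map (standing in for the output of a resolution pass) are hypothetical. Before resolution, the three OpenAI spellings anchor three separate groups; after resolution, they collapse into one.

```python
from collections import defaultdict

def components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Connected components via iterative DFS over an undirected adjacency map."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(adj):  # sorted for a deterministic group order
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups

raw_edges = [("OpenAI", "GPT-4"), ("OpenAI Inc.", "ChatGPT"), ("Open AI", "Whisper")]
alias = {"OpenAI Inc.": "OpenAI", "Open AI": "OpenAI"}  # hypothetical resolution output
resolved_edges = [(alias.get(a, a), alias.get(b, b)) for a, b in raw_edges]

before = components(raw_edges)
after = components(resolved_edges)
```

A real community-detection algorithm is far more nuanced than this, but it inherits the same dependency: it can only group the nodes it is given, so unresolved duplicates split one entity's neighborhood across several communities.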

The tools, and what they actually give you

The main tools each pair an extraction step with some form of merging, and differ mostly in how much structure and lifecycle they manage. LangChain's LLMGraphTransformer is the lightweight, framework-native path. LlamaIndex's PropertyGraphIndex runs a configurable pipeline of extractors and supersedes its older triple-based KnowledgeGraphIndex. Neo4j's LLM Graph Builder is a full app that turns PDFs, web pages, and transcripts into a combined lexical-and-entity graph, with deduplication built in. And Graphiti is worth knowing if your graph changes over time: it's a temporal knowledge-graph engine that ingests data incrementally and tracks when each fact was valid (superseded facts are marked invalid rather than deleted), and it's the engine behind Zep's agent memory.
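The invalidate-instead-of-delete idea behind Graphiti can be sketched as a fact list with validity intervals. This is not Graphiti's API or its contradiction logic: Fact, TemporalStore, and the rule that a new object for the same subject and predicate supersedes the old one are all hypothetical simplifications.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means the fact is still valid

class TemporalStore:
    """Hypothetical in-memory store; facts are closed off, never deleted."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        # Assumed rule: each (subject, predicate) holds one current object,
        # so a different object supersedes whatever was valid before.
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (fact.subject, fact.predicate)
            if same_slot and old.invalid_at is None and old.obj != fact.obj:
                old.invalid_at = fact.valid_at  # end the old interval; keep the record
        self.facts.append(fact)

    def as_of(self, when: datetime) -> list[Fact]:
        """Facts whose validity interval contains `when`."""
        return [f for f in self.facts
                if f.valid_at <= when and (f.invalid_at is None or when < f.invalid_at)]

store = TemporalStore()
store.add(Fact("Ada", "works_at", "Acme", datetime(2021, 1, 1)))    # hypothetical names
store.add(Fact("Ada", "works_at", "Globex", datetime(2024, 3, 1)))
```

Both facts survive in storage, so a query as of any date returns what was true then rather than only the latest state.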

The throughline across all of them: extraction is the part that demos, resolution is the part that matters. Treat "get the LLM to output triples" as 20% of the work, budget the other 80% for deciding which of those triples are secretly about the same thing, and you'll build a graph someone can actually query.