Knowledge Graphs and Graph RAG
Leveraging graph-based relationships for improved context and retrieval in RAG systems
Introduction
Traditional Retrieval-Augmented Generation (RAG) systems typically use vector-based similarity searches to find relevant documents. While effective for straightforward queries, vector searches often struggle with more nuanced information needs that involve understanding connections between entities dispersed across multiple documents.
That’s where knowledge graphs come into play. Unlike traditional vector-based approaches, knowledge graphs explicitly capture entities and their relationships, uncovering connections that otherwise might be missed.
Consider three simple documents:
- “Elon Musk is the CEO of SpaceX.”
- “Starship is a spacecraft developed by SpaceX, designed for missions to Mars.”
- “Tesla produces electric vehicles, and Elon Musk serves as its CEO.”
If a user asks, “Who leads the companies involved in Mars exploration, and what other companies does this individual lead?”, a traditional vector search might retrieve only the second document, which mentions Mars and spacecraft, and overlook Elon Musk’s relationships with SpaceX and Tesla. In contrast, a knowledge graph explicitly represents these relationships and can give a complete, context-rich answer by traversing connections across all three documents. Let’s dig into why and how.
Core Concepts
What is a Knowledge Graph?
A knowledge graph is a structured representation of information that consists of:
- Entities: Distinct objects, concepts, or things (e.g., people, organizations, products, technologies)
- Relationships: Connections between entities that describe how they relate to each other
- Properties (optional): Additional attributes that describe entities or relationships
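As a minimal sketch, the three example documents from the introduction can be encoded with plain Python dataclasses. The `Entity` and `Relationship` names, their fields, the type labels, and the relation names are illustrative assumptions, not Morphik's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative types; the field names are assumptions, not Morphik's schema.
@dataclass
class Entity:
    label: str   # e.g. "Elon Musk"
    type: str    # e.g. "PERSON"
    properties: dict[str, str] = field(default_factory=dict)  # optional attributes

@dataclass
class Relationship:
    source: str  # label of the source entity
    type: str    # how the source relates to the target
    target: str  # label of the target entity

entities = [
    Entity("Elon Musk", "PERSON"),
    Entity("SpaceX", "ORGANIZATION"),
    Entity("Tesla", "ORGANIZATION", {"industry": "electric vehicles"}),
    Entity("Starship", "PRODUCT", {"kind": "spacecraft"}),
    Entity("Mars", "LOCATION"),
]
relationships = [
    Relationship("Elon Musk", "ceo_of", "SpaceX"),
    Relationship("Elon Musk", "ceo_of", "Tesla"),
    Relationship("SpaceX", "developed", "Starship"),
    Relationship("Starship", "designed_for_missions_to", "Mars"),
]

def companies_led_by(person: str) -> list[str]:
    """Return the organizations connected to `person` by a ceo_of edge."""
    return sorted(
        r.target for r in relationships
        if r.source == person and r.type == "ceo_of"
    )

print(companies_led_by("Elon Musk"))  # ['SpaceX', 'Tesla']
```

Once the facts are stored as explicit edges, the Tesla link is found by following relationships rather than by text similarity to the query.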
The graph we build for this example will ultimately look like this:
Implementation in Morphik
Morphik’s knowledge graph implementation is built on several core components:
Entity and Relationship Extraction
When you create a knowledge graph, Morphik processes your documents to extract entities and relationships. Extraction runs on every chunk of each document selected for the graph. This is implemented in the GraphService class.
The system uses language models to identify entities and relationships between entities of various types:
- People (e.g., “Sam Altman”)
- Organizations (e.g., “OpenAI”)
- Locations (e.g., “San Francisco”)
- Technologies (e.g., “Machine Learning”)
- Concepts (e.g., “Retrieval Augmented Generation”)
- Products (e.g., “GPT-4”)
- Events (e.g., “AI Conference 2025”)
- And more…
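The per-chunk extraction step can be sketched roughly as follows. This is not the GraphService code: the prompt wording, the JSON reply shape, and the `fake_llm` stand-in are all assumptions made so the example runs offline.

```python
import json

# Entity types taken from the list above; the upper-case labels are an assumption.
ENTITY_TYPES = ["PERSON", "ORGANIZATION", "LOCATION", "TECHNOLOGY",
                "CONCEPT", "PRODUCT", "EVENT"]

def build_extraction_prompt(chunk_text: str) -> str:
    """Build a prompt asking a language model for entities and relationships as JSON."""
    return (
        "Extract entities and relationships from the text below.\n"
        f"Allowed entity types: {', '.join(ENTITY_TYPES)}.\n"
        'Reply with JSON: {"entities": [{"label": "...", "type": "..."}], '
        '"relationships": [{"source": "...", "type": "...", "target": "..."}]}\n\n'
        f"Text: {chunk_text}"
    )

def parse_extraction(raw: str) -> tuple[list[dict], list[dict]]:
    """Parse the reply and drop relationships whose endpoints were not extracted."""
    data = json.loads(raw)
    entities = data.get("entities", [])
    labels = {e["label"] for e in entities}
    relationships = [
        r for r in data.get("relationships", [])
        if r["source"] in labels and r["target"] in labels
    ]
    return entities, relationships

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return json.dumps({
        "entities": [
            {"label": "Sam Altman", "type": "PERSON"},
            {"label": "OpenAI", "type": "ORGANIZATION"},
        ],
        "relationships": [
            {"source": "Sam Altman", "type": "ceo_of", "target": "OpenAI"},
            # Dangling edge: "San Francisco" was not returned as an entity.
            {"source": "OpenAI", "type": "located_in", "target": "San Francisco"},
        ],
    })

prompt = build_extraction_prompt("Sam Altman is the CEO of OpenAI.")
entities, relationships = parse_extraction(fake_llm(prompt))
```

Filtering out edges with unknown endpoints is one simple way to keep the graph consistent when a model reply is incomplete.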
Entity Resolution
One challenge with extracting entities from text is that the same entity might be referenced in different ways. For example, “Sam Altman”, “Samuel H. Altman”, and “OpenAI CEO” might all refer to the same person.
Morphik addresses this with entity resolution, implemented in the EntityResolver class.
This ensures that the knowledge graph accurately represents unique entities and their relationships, even when they’re referenced inconsistently across documents.
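To illustrate the effect of resolution, here is a small sketch that applies an alias mapping to entity labels and relationships. It assumes some earlier step (for example a language model) has already grouped the variants; `alias_groups` and both helper functions are hypothetical and are not part of the EntityResolver class.

```python
# Hypothetical output of a resolution step: canonical label -> known variants.
alias_groups = {
    "Sam Altman": ["Samuel H. Altman", "OpenAI CEO"],
}

def build_canonical_map(groups: dict[str, list[str]]) -> dict[str, str]:
    """Flatten alias groups into a variant -> canonical lookup table."""
    mapping = {}
    for canonical, variants in groups.items():
        mapping[canonical] = canonical
        for variant in variants:
            mapping[variant] = canonical
    return mapping

def resolve(labels, triples, mapping):
    """Replace every label with its canonical form and remove duplicates."""
    def canon(label: str) -> str:
        return mapping.get(label, label)  # unknown labels stay unchanged

    unique_labels = sorted({canon(label) for label in labels})
    unique_triples = sorted({(canon(s), rel, canon(t)) for s, rel, t in triples})
    return unique_labels, unique_triples

labels = ["Sam Altman", "Samuel H. Altman", "OpenAI CEO", "OpenAI"]
triples = [
    ("Samuel H. Altman", "leads", "OpenAI"),
    ("OpenAI CEO", "leads", "OpenAI"),
]
resolved_labels, resolved_triples = resolve(
    labels, triples, build_canonical_map(alias_groups)
)
print(resolved_labels)   # ['OpenAI', 'Sam Altman']
print(resolved_triples)  # [('Sam Altman', 'leads', 'OpenAI')]
```

Four labels collapse to two entities, and the two duplicate edges collapse to one.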
Graph Construction and Storage
The extracted entities and relationships are stored in a graph structure.
Each entity and relationship maintains references to the documents and chunks where they were found, enabling the system to retrieve the original context when needed.
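One way to keep those references is to record, for every entity, the documents and chunk numbers where it was mentioned. The sketch below assumes a `chunk_sources` mapping from document ID to chunk numbers; the class names and layout are illustrative, not Morphik's storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class GraphEntity:
    label: str
    type: str
    # document_id -> chunk numbers where the entity was mentioned (assumed layout)
    chunk_sources: dict[str, set[int]] = field(
        default_factory=lambda: defaultdict(set)
    )

class SimpleGraph:
    """Toy in-memory graph that tracks where each entity was found."""

    def __init__(self) -> None:
        self.entities: dict[str, GraphEntity] = {}

    def add_mention(self, label: str, type_: str,
                    document_id: str, chunk_number: int) -> None:
        """Create the entity on first sight, then record the mention."""
        entity = self.entities.setdefault(label, GraphEntity(label, type_))
        entity.chunk_sources[document_id].add(chunk_number)

    def sources_for(self, label: str) -> list[tuple[str, int]]:
        """Return (document_id, chunk_number) pairs for an entity."""
        entity = self.entities.get(label)
        if entity is None:
            return []
        return sorted(
            (doc, chunk)
            for doc, chunks in entity.chunk_sources.items()
            for chunk in chunks
        )

graph = SimpleGraph()
graph.add_mention("SpaceX", "ORGANIZATION", "doc1", 0)
graph.add_mention("SpaceX", "ORGANIZATION", "doc2", 0)
graph.add_mention("Starship", "PRODUCT", "doc2", 0)
print(graph.sources_for("SpaceX"))  # [('doc1', 0), ('doc2', 0)]
```

With this bookkeeping, any entity reached during traversal can be mapped back to the original text chunks.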
Graph-Enhanced Retrieval
When you query with a knowledge graph, Morphik uses the graph to enhance the retrieval process.
This process is implemented in the query_with_graph method.
The hop_depth parameter controls how far to traverse the graph from the initial entities, letting you balance focused retrieval against comprehensive retrieval.
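A rough sketch of how graph results might be combined with vector results is shown below. It is not the query_with_graph implementation: the substring-based entity matching, the single-hop expansion, the chunk ID format, and the merge order are all simplifying assumptions.

```python
def find_query_entities(query: str, known_entities: list[str]) -> list[str]:
    """Stand-in for model-based entity extraction: case-insensitive substring match."""
    lowered = query.lower()
    return [e for e in known_entities if e.lower() in lowered]

def graph_enhanced_retrieve(query, vector_hits, entity_chunks, neighbors):
    """Append chunks reached through the graph to the vector search results."""
    seeds = find_query_entities(query, list(entity_chunks))
    related = set(seeds)
    for seed in seeds:  # one hop only, to keep the sketch short
        related.update(neighbors.get(seed, []))
    graph_hits = sorted(
        {chunk for entity in related for chunk in entity_chunks.get(entity, [])}
    )
    merged = list(vector_hits)  # vector results keep their ranking
    merged += [chunk for chunk in graph_hits if chunk not in merged]
    return merged

# Chunks where each entity appears, using a hypothetical "doc#chunk" ID format.
entity_chunks = {
    "Elon Musk": ["doc1#0", "doc3#0"],
    "SpaceX": ["doc1#0", "doc2#0"],
    "Starship": ["doc2#0"],
    "Mars": ["doc2#0"],
    "Tesla": ["doc3#0"],
}
neighbors = {
    "SpaceX": ["Elon Musk", "Starship"],
    "Elon Musk": ["SpaceX", "Tesla"],
    "Starship": ["SpaceX", "Mars"],
    "Mars": ["Starship"],
    "Tesla": ["Elon Musk"],
}

results = graph_enhanced_retrieve(
    "Who leads SpaceX, and what else do they lead?",
    vector_hits=["doc1#0"],
    entity_chunks=entity_chunks,
    neighbors=neighbors,
)
print(results)  # ['doc1#0', 'doc2#0', 'doc3#0']
```

The Tesla document (`doc3#0`) is surfaced only because Elon Musk is one hop away from SpaceX in the graph.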
Using Knowledge Graphs in Morphik
Creating a Knowledge Graph
You can create a knowledge graph from your documents using the Python SDK or the UI component. I’ll show the SDK below; the UI should be simpler.
Behind the scenes, Morphik:
- Retrieves the matching documents
- Processes each document to extract entities and relationships
- Performs entity resolution to eliminate duplicates
- Constructs the graph and saves it
Querying with a Knowledge Graph
Once you’ve created a knowledge graph, you can use it to enhance your queries.
The hop_depth parameter determines how far to traverse the graph from the initial entities found in the query. A higher value casts a wider net but may include less relevant information.
When include_paths=True, the response includes the paths through the graph that led to the retrieved documents, which explains why certain information was included.
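To make the idea of returned paths concrete, here is a minimal sketch that records one shortest path to every entity within a given hop depth. The adjacency layout, relation names, and `find_paths` function are assumptions for illustration; the format Morphik returns may differ.

```python
from collections import deque

def find_paths(adjacency, start, hop_depth):
    """Breadth-first search that records one shortest path per reachable entity.

    Each path alternates entity, relation, entity, ... starting at `start`.
    """
    paths = {start: [start]}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hop_depth:
            continue  # do not expand beyond the requested depth
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [relation, neighbor]
                queue.append((neighbor, depth + 1))
    return paths

# Hypothetical directed edges: entity -> [(relation, neighbor), ...]
adjacency = {
    "Mars": [("destination_of", "Starship")],
    "Starship": [("developed_by", "SpaceX")],
    "SpaceX": [("led_by", "Elon Musk")],
    "Elon Musk": [("leads", "Tesla")],
}

paths = find_paths(adjacency, "Mars", hop_depth=2)
print(paths["SpaceX"])
# ['Mars', 'destination_of', 'Starship', 'developed_by', 'SpaceX']
```

With `hop_depth=2` the search stops at SpaceX; raising it to 4 also reaches Tesla, which shows the wider-net tradeoff described above.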
Example: Building a Healthcare Knowledge Graph
Let’s walk through a complete example of using knowledge graphs for a healthcare application.
In this example, the knowledge graph might identify entities like:
- Conditions: “Diabetes”, “Hypertension”
- Treatments: “Insulin”, “ACE inhibitors”, “Lifestyle modifications”
- Outcomes: “Blood sugar control”, “Blood pressure reduction”
And relationships like:
- “Insulin” -> “treats” -> “Diabetes”
- “ACE inhibitors” -> “treats” -> “Hypertension”
- “Diabetes” -> “comorbid with” -> “Hypertension”
- “Lifestyle modifications” -> “improves” -> “Blood sugar control”
- “Lifestyle modifications” -> “improves” -> “Blood pressure reduction”
The graph traversal might find that “Lifestyle modifications” is effective for both conditions, even if that connection wasn’t explicitly stated in a single document.
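The relationships listed above can be written as triples and queried directly. This sketch only groups the "improves" edges; linking each outcome back to its condition would need additional edges that are not in the list, so that step is left out.

```python
# The relationships listed above, as (source, relation, target) triples.
triples = [
    ("Insulin", "treats", "Diabetes"),
    ("ACE inhibitors", "treats", "Hypertension"),
    ("Diabetes", "comorbid with", "Hypertension"),
    ("Lifestyle modifications", "improves", "Blood sugar control"),
    ("Lifestyle modifications", "improves", "Blood pressure reduction"),
]

def outcomes_improved_by(triples):
    """Group the targets of 'improves' edges by their source entity."""
    improved = {}
    for source, relation, target in triples:
        if relation == "improves":
            improved.setdefault(source, []).append(target)
    return improved

# Entities that improve more than one outcome are candidates for shared benefit.
shared = {
    source: outcomes
    for source, outcomes in outcomes_improved_by(triples).items()
    if len(outcomes) > 1
}
print(shared)
# {'Lifestyle modifications': ['Blood sugar control', 'Blood pressure reduction']}
```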
Graph Visualization
When working with knowledge graphs, visualization can provide valuable insights into the structure and connections within your data.
Updating Existing Graphs
As your document collection grows, you can update existing graphs.
When called without any arguments, the update function checks whether any documents matching the graph’s filter have been added or updated and, if so, adds them to the graph.
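The update check can be pictured as a set difference between the documents that match the graph's filter and the documents already in the graph. The sketch below is an assumption about the general idea, not Morphik's update logic; the document layout and the `docs_to_add` helper are hypothetical.

```python
def docs_to_add(graph_doc_ids: set[str], documents: list[dict],
                filters: dict) -> list[str]:
    """Return IDs of documents that match the filters but are not in the graph yet."""
    matching = [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in filters.items())
    ]
    return [doc["id"] for doc in matching if doc["id"] not in graph_doc_ids]

graph_doc_ids = {"doc1", "doc2"}          # documents already in the graph
documents = [
    {"id": "doc1", "metadata": {"category": "tech"}},
    {"id": "doc2", "metadata": {"category": "tech"}},
    {"id": "doc3", "metadata": {"category": "tech"}},     # newly ingested
    {"id": "doc4", "metadata": {"category": "finance"}},  # does not match
]

new_ids = docs_to_add(graph_doc_ids, documents, {"category": "tech"})
graph_doc_ids.update(new_ids)  # only the new documents need extraction
print(new_ids)  # ['doc3']
```

Processing only the difference avoids re-running extraction on documents the graph already covers.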
Even More Implementation Details (for the nerds)
Graph Traversal Algorithm
The core of knowledge graph querying is the entity expansion algorithm, which traverses the graph to find related entities.
This algorithm efficiently expands from the initial entities found in the query to related entities, gathering relevant context for the retrieval process.
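A hop-limited expansion of this kind can be sketched as a level-by-level frontier search. The version below treats relationships as undirected and uses the triples from the introductory example; both choices are assumptions, and the function is an illustration rather than Morphik's algorithm.

```python
def expand_entities(triples, seeds, hop_depth):
    """Collect all entities within `hop_depth` hops of the seed entities."""
    # Build an undirected neighbor index (assumption: direction is ignored).
    neighbors = {}
    for source, _relation, target in triples:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hop_depth):
        next_frontier = set()
        for entity in frontier:
            next_frontier |= neighbors.get(entity, set()) - seen
        if not next_frontier:
            break  # nothing new to visit
        seen |= next_frontier
        frontier = next_frontier
    return seen

triples = [
    ("Elon Musk", "ceo_of", "SpaceX"),
    ("Elon Musk", "ceo_of", "Tesla"),
    ("SpaceX", "developed", "Starship"),
    ("Starship", "designed_for_missions_to", "Mars"),
]

print(sorted(expand_entities(triples, {"Mars"}, hop_depth=1)))
# ['Mars', 'Starship']
print(sorted(expand_entities(triples, {"Mars"}, hop_depth=3)))
# ['Elon Musk', 'Mars', 'SpaceX', 'Starship']
```

Tracking the `seen` set means each entity is visited at most once, so the cost grows with the number of edges touched rather than the number of paths.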
Entity Resolution
The entity resolution process is crucial for maintaining a clean, accurate knowledge graph.
This approach allows the system to recognize different references to the same entity, improving retrieval accuracy.
Performance Considerations
Knowledge graph operations involve several performance considerations:
- Graph Creation Time: Creating a graph involves processing all documents with LLMs, which can be time-consuming for large document collections.
- Query Processing Overhead: Graph-enhanced retrieval requires extra processing compared to standard vector search but often produces more comprehensive results.
- Graph Size: As the number of entities and relationships grows, memory usage increases, and traversal operations may become more expensive.
Performance tips:
- Use metadata filters to create focused graphs rather than one large graph for all documents
- Start with smaller hop depths (1 or 2) and increase only if needed
- Consider the tradeoff between processing time and retrieval quality
Conclusion
Knowledge graphs in Morphik provide a powerful way to enhance retrieval by capturing and leveraging relationships between entities in your documents. By combining traditional vector search with graph-based retrieval, Morphik delivers more comprehensive and contextually relevant information for complex queries.
Whether you’re building applications in healthcare, finance, research, or any domain with complex information relationships, knowledge graphs can significantly improve the quality of information retrieval and generation.
Next Steps
To get started with knowledge graphs in your Morphik applications:
- Review your document collection and identify domains that would benefit from relationship-aware retrieval
- Create focused knowledge graphs for these domains
- Experiment with different hop depths and query formulations
- Consider including path information to understand how the system connects information
For more advanced use cases, explore combining knowledge graphs with other Morphik features like rules processing or ColPali for multi-modal retrieval.