Implementation Stories
When Multimodal Models Go Blind
A technical exploration of why even natively multimodal LLMs struggle with diagram interpretation in documents
Here’s the sequence for o4-mini-high:
You hand o4‑mini‑high a technical patent with an embedded IRR vs Frequency graph and ask:
“At what frequency does IRR peak?”
It thinks for 30 seconds, and instead of just reading the chart, it hits you with:
“Which page is that on?”
Cue dramatic facepalm. 🤦
Even after I grumbled “Page 6,” it pulled out the Python tool-use gun (my favorite as well) and proclaimed the peak was “the highest point on the line.” Technically wrong and hilariously sure of itself.
Here’s the sequence for Morphik:
We treat each page like one giant image+text puzzle (rough sketch in code right after this list):
- Snap the whole page as an image (diagrams, tables, doodles included)
- Extract text blocks with their exact positions (headings, captions, footnotes)
- Blend vision & text embeddings into a multi-vector cocktail 🍹
- Retrieve the full region (text+diagram) as a unit—no more orphaned charts
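To make that concrete, here’s a minimal sketch of the ingestion side in Python. It assumes PyMuPDF for page rendering and positioned text blocks, with placeholder `embed_page_image` / `embed_text_block` functions standing in for whatever vision and text encoders you plug in; it illustrates the idea, not Morphik’s actual code.

```python
# Ingestion sketch (illustrative only, not Morphik's actual implementation).
# Assumes PyMuPDF (`pip install pymupdf`) plus placeholder encoders.
import fitz  # PyMuPDF


def embed_page_image(png_bytes: bytes) -> list[list[float]]:
    """Placeholder: a vision encoder returning one vector per image patch."""
    raise NotImplementedError


def embed_text_block(text: str) -> list[float]:
    """Placeholder: any text encoder (same dimension as the image vectors)."""
    raise NotImplementedError


def ingest(pdf_path: str) -> list[dict]:
    index = []
    for page_no, page in enumerate(fitz.open(pdf_path), start=1):
        # 1. Snap the whole page as an image (diagrams, tables, doodles included).
        png = page.get_pixmap(dpi=150).tobytes("png")

        # 2. Extract text blocks with their exact positions (x0, y0, x1, y1).
        blocks = [
            {"bbox": b[:4], "text": b[4]}
            for b in page.get_text("blocks")
            if b[6] == 0  # block_type 0 = text; skip embedded-image blocks
        ]

        # 3. Blend vision & text embeddings into one multi-vector entry, so the
        #    page is later retrieved as a unit (text + diagram together).
        index.append({
            "page": page_no,
            "image_vectors": embed_page_image(png),
            "text_vectors": [embed_text_block(b["text"]) for b in blocks],
            "blocks": blocks,
        })
    return index
```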
Result? The same question returns:
“IRR peaks at 0 MHz.” Boom. 🎯
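Under the hood, answering that question is just a query against the same multi-vector index: score every page entry with ColBERT/ColPali-style late interaction (MaxSim) and hand the top page to the model. Again a hedged sketch with a placeholder query encoder, not the production retrieval code:

```python
# Retrieval sketch: ColBERT/ColPali-style late interaction (MaxSim) over the
# multi-vector index built above. Placeholder query encoder, illustrative only.
import numpy as np


def embed_query(question: str) -> np.ndarray:
    """Placeholder: returns one vector per query token, shape (t, d)."""
    raise NotImplementedError


def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    # For each query token, keep its best-matching page vector, then sum.
    sims = query_vecs @ page_vecs.T  # (t, n) dot-product similarities
    return float(sims.max(axis=1).sum())


def retrieve(index: list[dict], question: str, k: int = 1) -> list[dict]:
    q = embed_query(question)
    scored = []
    for entry in index:
        # Assumes image and text vectors share one embedding dimension.
        page_vecs = np.array(entry["image_vectors"] + entry["text_vectors"])
        scored.append((maxsim_score(q, page_vecs), entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Each winner carries the full page image *and* its positioned text,
    # so the chart never arrives without its caption and axis labels.
    return [entry for _, entry in scored[:k]]
```

Because the winning entry ships the whole page image alongside its text blocks, the IRR curve and its axis labels land in the model’s context together, and the peak can be read straight off the diagram.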