Direct Installation

Prerequisites

Cloning the Repository

To get started with Morphik, we first need to set up the server. This involves cloning the repository, installing the dependencies, and running the server. You are just a few steps away from accurate, agentic RAG over your multi-modal data!

First, let’s clone the repository from GitHub.

git clone https://github.com/morphik-org/morphik-core.git

After cloning the repository, you will notice a morphik-core folder in your current directory.

ls morphik-core

If you see an error like ls: morphik-core: No such file or directory, the repository was not cloned successfully. Check the output of the git clone command and try again.

Once you have cloned the repository, navigate into the morphik-core folder.

cd morphik-core

Next, you need to set up a virtual environment called .venv.

Setting Up the Environment

Installing Python Dependencies

While it is not required, we highly recommend using a virtual environment to ensure dependencies from other projects do not conflict with Morphik. You may use managers like uv or poetry, but for this guide, we will use the built-in venv module.

Python 3.12 is the latest version that Morphik currently supports, and we recommend using it for the best compatibility.

python3.12 -m venv .venv
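If you are unsure which version a given interpreter reports, a short check like the sketch below can help. The function name check_python and its status strings are illustrative choices and are not part of Morphik; the guide also does not state a minimum version, so anything below 3.12 is only labeled "older".

```python
import sys


def check_python(version_info=sys.version_info):
    """Classify an interpreter version against the Python 3.12 recommendation."""
    major_minor = (version_info[0], version_info[1])
    if major_minor == (3, 12):
        return "recommended"
    if major_minor > (3, 12):
        # Newer than the latest version this guide says is supported.
        return "too new"
    # The guide does not state a minimum version, so this is only a label.
    return "older"


if __name__ == "__main__":
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {check_python()}")
```

Run it with the same interpreter you plan to use for the virtual environment, so the result reflects that interpreter rather than the system default.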

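Now, you need to activate the virtual environment. The activation command differs based on your operating system. The command below works in macOS and Linux shells; on Windows, the activation scripts live under .venv\Scripts instead (activate.bat for cmd.exe, Activate.ps1 for PowerShell).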

source .venv/bin/activate

After activation, your command prompt should be prefixed with (.venv), indicating that the virtual environment is active. Once your virtual environment is activated, you can install the required dependencies.

pip install -r requirements.txt
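To confirm that the packages went into the virtual environment rather than the system interpreter, you can compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv. The helper name in_virtualenv below is an illustrative choice, not something Morphik provides.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a venv-style environment.

    Inside a virtual environment, sys.prefix points at the environment
    directory while sys.base_prefix still points at the base installation.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate .venv first.")
```

The parameters exist only so the comparison can be exercised with example paths; calling in_virtualenv() with no arguments inspects the running interpreter.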

Python Dependencies for Document Processing

You may also need additional Python packages or NLTK resources for processing various document types:

# If using Python 3.12+, you might need a specific version of unstructured
pip install unstructured==0.16.10

# Download required NLTK resources
python -m nltk.downloader averaged_perceptron_tagger punkt

Setting Up the Server Parameters

At this point, you may want to customize the server, for example by using a different embedding model or by enabling or disabling certain features. You can do so by editing the databridge.toml file. You can find more details about configuration here.

Morphik uses environment variables to manage secrets and API keys. To make any pre-set keys available to the server, copy the .env.example file to .env:

cp .env.example .env
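To see which variables in the copied file still have no value, a small script along these lines can list them. This is a minimal sketch that assumes simple KEY=VALUE lines; it does not implement full dotenv syntax, and the function name empty_env_keys is hypothetical.

```python
from pathlib import Path


def empty_env_keys(text):
    """Return the names of variables whose value is empty in .env-style text."""
    empty = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat a pair of empty quotes as an empty value too.
        if not value.strip().strip('"').strip("'"):
            empty.append(key.strip())
    return empty


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for key in empty_env_keys(env_path.read_text()):
            print(f"{key} has no value")
    else:
        print("No .env file found in the current directory.")
```

Not every empty variable is necessarily a problem; which keys you actually need depends on the models and services you configure.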

If you’re using external models, you may need to add the necessary API keys to the .env file. Finally, run the setup script to install dependencies and set up the database and vector store.

python quick_setup.py

Launching the Server

You are now ready to launch the Morphik server! Run the following command to start it.

python start_server.py

You should see output similar to the following (the process ID in brackets will differ):

INFO:     Started server process [15169]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

This means that the server is running on http://localhost:8000. You can now interact with the server using the API or the Python SDK.
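As a quick sanity check from another terminal, the sketch below tests whether anything accepts TCP connections on the host and port shown in the log. It only opens and closes a socket; it does not call any Morphik API endpoint, and the helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:8000.")
    else:
        print("Nothing is reachable on localhost:8000; is the server running?")
```

A successful connection only shows that the port is in use, so treat it as a first check before making real requests through the API or the Python SDK.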

Next Steps

Now that you have the server running, you can explore the different ways to interact with it.

Special thanks to rex777 from the Morphik Discord community for help with troubleshooting and improving these setup instructions!