Vector Database Applications with BuildShip

Vector databases are a powerful tool for efficiently storing and retrieving unstructured data like text, images, and audio using vector embeddings and similarity-based searches.

Unlike traditional databases that rely on exact keyword matching, vector databases use vector embeddings to represent data in high-dimensional vector spaces. This approach enables similarity-based searches, making it easier to find relevant information even when exact matches are not available.

Vector Database Applications with BuildShip

One of the key applications of vector databases is in the field of Natural Language Processing (NLP), where they are used in conjunction with Retrieval-Augmented Generation (RAG) models. These models leverage vector databases to retrieve relevant information from large text corpora, which is then used to generate more informed and context-aware responses.

With BuildShip, you can quickly and easily create vector database applications, streamlining the process of ingesting text, creating vector embeddings, and performing efficient vector searches.

SECTION 1: Creating a Vector Database

Vector databases are designed to store and retrieve vector embeddings, which are high-dimensional numerical representations of text or other data. These embeddings capture the semantic meaning of the data, allowing for efficient similarity searches based on vector distances.

In a vector database, each entry typically consists of an ID, metadata, and the vector embedding itself. For example, a JSON object representing a text entry might look like this:

{
  "id": "text_1",
  "metadata": {
    "source": "example.com",
    "title": "Introduction to Vector Databases"
  },
  "embedding": [0.021, -0.032, 0.112, ... ]
}

The process of generating vector embeddings involves using a machine learning model trained on large text corpora. One popular option is OpenAI's ada embedding model, which can generate embeddings for text up to 8,192 tokens (approximately 8,000 words).

BuildShip offers a pre-built node for generating embeddings using the OpenAI ada model, making it easy to integrate into your workflows.

💡

OpenAI provides comprehensive documentation on their embeddings API, learn more (opens in a new tab).

Several vector database services are available, each with its own features and pricing models. BuildShip provides pre-built, ready-to-use CRUD (Create, Read, Update, Delete) nodes for Pinecone, a managed vector database service. Know more. (opens in a new tab).

Additionally, MongoDB, a popular NoSQL database, can also be used as a vector database by storing vector embeddings as data fields.

💡

BuildShip let’s you easily build custom nodes for MongoDB or other databases using Generate with AI.

EXAMPLE: Syncing Documentation to a Vector Index

Here's an example of how you can use BuildShip to sync documentation to a vector index:

In this example we're using the sitemap for the documentation website to get all the URL paths for the content present on the website. We have an "XML to JSON" node that converts the sitemap XML to JSON format.
Next, we use the "Concat Property Values" nodes to extract the paths from the JSON object to get an array of URLs.
Then we pass the list of the URLs to a loop node. The loop node will iterate through each URL and fetch the content from the URL using the "Scrape Web URL" node. Inside the loop, we can generate the vector embeddings using the "Generate Embeddings" node, and parse it into a JSON format which we'll be able to use for upserting the data into the vector index.
Finally, outside the loop, we can use the "Upsert Document" node to add the documents to the vector index. For this example, we've used the Pinecone vector database.

This process allows you to quickly and easily ingest text data, generate vector embeddings, and add documents to the vector index.

SECTION 2: Querying the Vector Database

Once you have ingested your data into a vector database, you can perform efficient similarity-based searches to retrieve relevant information. This process involves comparing the vector embeddings of a query with the embeddings of the stored documents to find the most similar matches.

How to Query the Vector Database

Convert the question/prompt to a vector embedding: The first step is to convert your question or prompt into a vector embedding using the same model and process used for generating embeddings during the data ingestion stage.
Query the vector database collection: With the vector embedding of your prompt, you can query your vector database collection to retrieve the top relevant documents based on vector similarity. This is typically achieved by specifying a topK parameter, which determines the number of closest vector matches to return.

💡

BuildShip offers a bunch of pre-built nodes for querying vector databases available in the Node Library, making it super easy to integrate vector searches into your workflows.

Utilize the retrieved data: The top relevant documents retrieved from the vector database can be used in various ways, depending on your requirements. For example, you could display the relevant documents directly to the user, extract specific information from the documents, or pass the data to an AI model for further processing.

EXAMPLE: Creating a Chat Bot

One compelling use case for querying vector databases is the creation of a chat bot trained on your specific data. This approach is significantly more efficient and straightforward compared to alternatives like fine-tuning language models, which can be time-consuming and resource-intensive.

By leveraging a vector database, you can create a chat bot that provides precise and relevant responses based on the data stored in the vector index. Here's an example of how you can query a vector index, retrieve relevant data, and pass it to an AI model to generate a response:

In this example, we first convert the prompt to a vector embedding. Then, we query the vector index to retrieve the top 3 most relevant documents based on vector similarity. The text content of these documents is concatenated to form the context, which is passed along with the prompt to an AI model. The AI model can then generate a precise response by leveraging both the prompt and the relevant context from the vector database.

Need Help?

💬
Join BuildShip Community
An active and large community of no-code / low-code builders. Ask questions, share feedback, showcase your project and connect with other BuildShip enthusiasts.
🙋
Hire a BuildShip Expert
Need personalized help to build your product fast? Browse and hire from a range of independent freelancers, agencies and builders - all well versed with BuildShip.
🛟
Send a Support Request
Got a specific question on your workflows / project or want to report a bug? Send a us a request using the "Support" button directly from your BuildShip Dashboard.
⭐️
Feature Request
Something missing in BuildShip for you? Share on the #FeatureRequest channel on Discord. Also browse and cast your votes on other feature requests.