Use Cases And Anit-Use Cases for Vector Databases
10 Examples of Proper Activities with a Vector Database
-
Semantic Search: Powering a search engine that understands the meaning behind a query, not just the keywords. For example, a search for “best pizza restaurant” could return results for “top-rated pizzerias”.
- Reason: Vector databases excel at finding results that are conceptually similar by comparing the proximity of vector embeddings, which capture semantic meaning.
-
Recommendation Engines: Suggesting products, movies, or music to users based on their preferences and behavior.
- Reason: User preferences and item attributes can be converted into vectors. The database can then quickly find items whose vectors are closest to a user’s preference vector.
-
Image and Visual Search: Allowing a user to search for visually similar images by uploading a picture.
- Reason: Images are converted into vector embeddings that represent their visual features. The database can then perform a similarity search to find other images with comparable vector representations.
-
Retrieval-Augmented Generation (RAG): Providing a Large Language Model (LLM) with relevant, up-to-date, or domain-specific context to improve its answers and reduce “hallucinations”.
- Reason: A user’s query is used to perform a similarity search on a vector database containing an organization’s private documents. The most relevant results are fed to the LLM as context, grounding its response in factual data.
-
Anomaly and Fraud Detection: Identifying unusual patterns of behavior that may indicate fraudulent activity.
- Reason: Normal behavior patterns are stored as vectors. New activities are compared against these patterns, and significant deviations (distant vectors) are flagged as potential anomalies.
-
Multimodal Search: Searching across different types of data with a single query, such as using a text description to find a relevant image or audio clip.
- Reason: Different data types (text, images, audio) can be represented in the same vector space, allowing for cross-modal similarity comparisons.
-
Drug Discovery and Genomics: Finding molecular structures or genetic patterns that are similar to a target compound.
- Reason: Complex biological structures can be represented as high-dimensional vectors, enabling rapid similarity searches that can accelerate research.
-
Personalized Advertising: Matching user profiles with the most relevant advertisements.
- Reason: User behaviors and ad characteristics are converted into vectors, and the database finds the closest matches to serve personalized ads.
-
Question-Answering Systems: Finding the most relevant passages of text from a large knowledge base to answer a user’s specific question.
- Reason: The system vectorizes the question and searches for text chunks with the most similar vector embeddings, which are likely to contain the answer.
-
Clustering and Classification: Automatically grouping similar data points together without pre-defined labels.
- Reason: The database’s ability to measure similarity allows it to efficiently group data points based on the proximity of their vectors.
10 Examples of What Not to Do with a Vector Database
-
Processing Financial Transactions: Using it to manage a bank transfer where money is debited from one account and credited to another.
- Reason: Vector databases lack ACID (Atomicity, Consistency, Isolation, Durability) compliance, which is essential for guaranteeing that transactions are processed reliably and data integrity is maintained.
-
Serving as a Primary System of Record: Storing critical, structured application data like user accounts, order histories, or inventory levels.
- Reason: They are not designed to be general-purpose databases and lack the robust data management, integrity, and consistency guarantees of traditional SQL or NoSQL databases.
-
Performing Complex Relational Joins: Combining data from multiple tables based on foreign key relationships, such as linking a
customers
table with anorders
table.- Reason: The data model is not relational. Vector databases lack the concepts of tables, rows, and foreign keys required to perform joins.
-
Running Aggregation Queries: Calculating the total sales per region (
SUM
andGROUP BY
) for a business report.- Reason: Their indexes are optimized to avoid scanning the entire dataset. Forcing an aggregation would bypass this optimization, leading to highly inefficient performance compared to a traditional database.
-
Exact Keyword Matching: Searching for a specific, unique product ID like “A2848” or a rare technical term where precision is critical.
- Reason: Vector search is designed for semantic similarity, not lexical identity. It may generalize the term and return conceptually similar but inexact results.
-
High-Throughput Transactional (OLTP) Workloads: Powering an e-commerce checkout system that processes thousands of concurrent orders.
- Reason: They are optimized for read-heavy similarity searches, not the high volume of short, write-heavy transactions that characterize OLTP systems.
-
Ensuring Strong Data Consistency: Guaranteeing that a write operation is instantly visible across all system replicas, as required in a stock trading platform.
- Reason: Vector databases often prioritize availability and low latency, operating on an “eventual consistency” model where data may be temporarily stale across replicas.
-
Handling Negative or Exclusionary Queries: Trying to find “all animals except cats.”
- Reason: The nature of semantic search means including the term “cats” in the query will actually increase the similarity score for cat-related results, making it difficult to exclude them effectively.
-
Storing and Querying Highly Structured Data: Managing a traditional employee database with columns for name, ID, salary, and start date.
- Reason: The architecture is optimized for handling high-dimensional vector data, not for efficiently storing and querying structured data in rows and columns.
-
Replacing a Full-Text Search Engine for Keyword-Heavy Tasks: Building a legal document search where finding every instance of a specific keyword is mandatory.
- Reason: While they can be part of a hybrid search system, their core strength is semantic relevance. For tasks demanding pure, high-precision keyword retrieval, a dedicated full-text search engine like Elasticsearch is better suited.