🌱 (seedling) | Literature note |

Use Cases And Anit-Use Cases for Vector Databases

10 Examples of Proper Activities with a Vector Database

  1. Semantic Search: Powering a search engine that understands the meaning behind a query, not just the keywords. For example, a search for “best pizza restaurant” could return results for “top-rated pizzerias”.  

    • Reason: Vector databases excel at finding results that are conceptually similar by comparing the proximity of vector embeddings, which capture semantic meaning.  
  2. Recommendation Engines: Suggesting products, movies, or music to users based on their preferences and behavior.  

    • Reason: User preferences and item attributes can be converted into vectors. The database can then quickly find items whose vectors are closest to a user’s preference vector.  
  3. Image and Visual Search: Allowing a user to search for visually similar images by uploading a picture.  

    • Reason: Images are converted into vector embeddings that represent their visual features. The database can then perform a similarity search to find other images with comparable vector representations.  
  4. Retrieval-Augmented Generation (RAG): Providing a Large Language Model (LLM) with relevant, up-to-date, or domain-specific context to improve its answers and reduce “hallucinations”.  

    • Reason: A user’s query is used to perform a similarity search on a vector database containing an organization’s private documents. The most relevant results are fed to the LLM as context, grounding its response in factual data.  
  5. Anomaly and Fraud Detection: Identifying unusual patterns of behavior that may indicate fraudulent activity.  

    • Reason: Normal behavior patterns are stored as vectors. New activities are compared against these patterns, and significant deviations (distant vectors) are flagged as potential anomalies.  
  6. Multimodal Search: Searching across different types of data with a single query, such as using a text description to find a relevant image or audio clip.  

    • Reason: Different data types (text, images, audio) can be represented in the same vector space, allowing for cross-modal similarity comparisons.  
  7. Drug Discovery and Genomics: Finding molecular structures or genetic patterns that are similar to a target compound.  

    • Reason: Complex biological structures can be represented as high-dimensional vectors, enabling rapid similarity searches that can accelerate research.  
  8. Personalized Advertising: Matching user profiles with the most relevant advertisements.  

    • Reason: User behaviors and ad characteristics are converted into vectors, and the database finds the closest matches to serve personalized ads.  
  9. Question-Answering Systems: Finding the most relevant passages of text from a large knowledge base to answer a user’s specific question.  

    • Reason: The system vectorizes the question and searches for text chunks with the most similar vector embeddings, which are likely to contain the answer.  
  10. Clustering and Classification: Automatically grouping similar data points together without pre-defined labels.  

    • Reason: The database’s ability to measure similarity allows it to efficiently group data points based on the proximity of their vectors.

10 Examples of What Not to Do with a Vector Database

  1. Processing Financial Transactions: Using it to manage a bank transfer where money is debited from one account and credited to another.

    • Reason: Vector databases lack ACID (Atomicity, Consistency, Isolation, Durability) compliance, which is essential for guaranteeing that transactions are processed reliably and data integrity is maintained.  
  2. Serving as a Primary System of Record: Storing critical, structured application data like user accounts, order histories, or inventory levels.

    • Reason: They are not designed to be general-purpose databases and lack the robust data management, integrity, and consistency guarantees of traditional SQL or NoSQL databases.  
  3. Performing Complex Relational Joins: Combining data from multiple tables based on foreign key relationships, such as linking a customers table with an orders table.

    • Reason: The data model is not relational. Vector databases lack the concepts of tables, rows, and foreign keys required to perform joins.  
  4. Running Aggregation Queries: Calculating the total sales per region (SUM and GROUP BY) for a business report.

    • Reason: Their indexes are optimized to avoid scanning the entire dataset. Forcing an aggregation would bypass this optimization, leading to highly inefficient performance compared to a traditional database.  
  5. Exact Keyword Matching: Searching for a specific, unique product ID like “A2848” or a rare technical term where precision is critical.

    • Reason: Vector search is designed for semantic similarity, not lexical identity. It may generalize the term and return conceptually similar but inexact results.  
  6. High-Throughput Transactional (OLTP) Workloads: Powering an e-commerce checkout system that processes thousands of concurrent orders.

    • Reason: They are optimized for read-heavy similarity searches, not the high volume of short, write-heavy transactions that characterize OLTP systems.  
  7. Ensuring Strong Data Consistency: Guaranteeing that a write operation is instantly visible across all system replicas, as required in a stock trading platform.

    • Reason: Vector databases often prioritize availability and low latency, operating on an “eventual consistency” model where data may be temporarily stale across replicas.  
  8. Handling Negative or Exclusionary Queries: Trying to find “all animals except cats.”

    • Reason: The nature of semantic search means including the term “cats” in the query will actually increase the similarity score for cat-related results, making it difficult to exclude them effectively.  
  9. Storing and Querying Highly Structured Data: Managing a traditional employee database with columns for name, ID, salary, and start date.

    • Reason: The architecture is optimized for handling high-dimensional vector data, not for efficiently storing and querying structured data in rows and columns.  
  10. Replacing a Full-Text Search Engine for Keyword-Heavy Tasks: Building a legal document search where finding every instance of a specific keyword is mandatory.

    • Reason: While they can be part of a hybrid search system, their core strength is semantic relevance. For tasks demanding pure, high-precision keyword retrieval, a dedicated full-text search engine like Elasticsearch is better suited.

Notes mentioning this note