Using the Chroma Vector Database

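Using the Chroma vector database mainly involves installation, configuration, data operations (create, read, update, and delete), and advanced features. The following is a detailed guide: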

1. Installation

Install with pip:

pip install chromadb

2. Initialization and Configuration

Creating a Client

import chromadb
chroma_client = chromadb.Client()  # By default, data is kept in memory and is lost when the program exits

If you need the data to be stored persistently, create a PersistentClient instead:

client = chromadb.PersistentClient(path="/path/to/save/to")  # Directory where the data is stored

3. Data Operations

Creating a Collection

collection = chroma_client.create_collection(name="my_collection")  # Create a collection named "my_collection"

Adding Data

collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"],
)

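When you pass only documents, as above, Chroma computes their embeddings with the collection's embedding function. Alternatively, if you already have vector representations (embeddings) of the documents, you can add them directly. The three-dimensional vectors below are toy values; every embedding in a collection must have the same dimension.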

collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"],
)

Querying Data

results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,  # Return the 2 most similar results
)

Alternatively, query with embedding vectors:

results = collection.query(query_embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4]], n_results=2)

Updating Data

collection.update(
    ids=["id1", "id2"],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4]],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}],
    documents=["doc1", "doc2"],
)

Alternatively, use upsert, which updates records whose IDs already exist and adds the ones that do not:

collection.upsert(
    ids=["id1", "id2"],
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4]],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}],
    documents=["doc1", "doc2"],
)

Deleting Data

collection.delete(ids=["id1", "id2"])

4. Advanced Features

Persistent Storage

As mentioned above, PersistentClient stores data on disk so that it persists beyond the lifetime of the program.

Client/Server Deployment

For real-world projects, Chroma also supports a client/server deployment. Start the Chroma server with:

chroma run --path db_path

Then connect from the client:

client = chromadb.HttpClient(host='localhost', port=8000)

Custom Embedding Functions

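When creating a collection, you can specify a custom embedding function, where emb_fn is an embedding function object you define yourself: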

collection = client.create_collection(name="my_collection", embedding_function=emb_fn)

Notes

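Make sure your Python version is compatible with Chroma.
When using a custom embedding function, make sure the vectors it produces have the same dimension as the embeddings already in the collection.
For more complex queries, see the official Chroma documentation for additional filtering and sorting options.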

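These are the basic steps and caveats for using the Chroma vector database. In practice, you may need additional configuration and tuning depending on your requirements.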
