Ollama JavaScript Library

Ollama Quick Start: Ollama JavaScript Library

The Ollama JavaScript library provides the easiest way to integrate your JavaScript project with Ollama.

Prerequisites

You will need a local Ollama server running before you can continue.
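If you do not already have a server running, a typical setup looks like this (assuming the Ollama CLI is installed and llama2 is the model you plan to use):

ollama serve
ollama pull llama2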

Install dependencies

npm i ollama

Usage

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)

Browser usage

To use the library without Node.js, import the browser module.

import ollama from 'ollama/browser'

Streaming responses

Response streaming can be enabled by setting stream: true, which changes the function call to return an AsyncGenerator where each part is an object in the stream.

import ollama from 'ollama'

const message = { role: 'user', content: 'Why is the sky blue?' }
const response = await ollama.chat({ model: 'llama2', messages: [message], stream: true })
for await (const part of response) {
  process.stdout.write(part.message.content)
}

Create

import ollama from 'ollama'

const modelfile = `
FROM llama2
SYSTEM "You are mario from super mario bros."
`
await ollama.create({ model: 'example', modelfile: modelfile })

API

The API of the Ollama JavaScript library is designed around the Ollama REST API.

chat

ollama.chat(request)
  • request <Object>: The request object containing chat parameters.
    • model <string>: The name of the model to use for the chat.
    • messages <Message[]>: Array of message objects representing the chat history.
      • role <string>: The role of the message sender ('user', 'system', or 'assistant').
      • content <string>: The content of the message.
      • images <Uint8Array[] | string[]>: (Optional) Images to include in the message, either as Uint8Array or base64-encoded strings.
    • format <string>: (Optional) Set the expected format of the response (json).
    • stream <boolean>: (Optional) When true, returns an AsyncGenerator.
    • keep_alive <string | number>: (Optional) How long to keep the model loaded.
    • tools <Tool[]>: (Optional) A list of tool calls the model may make.
    • options <Options>: (Optional) Options to configure the runtime.
  • Returns: <ChatResponse>
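
For example, a chat call that uses a couple of the optional fields above; the prompt, the JSON format request, and the temperature value are illustrative choices, not requirements:

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'List three primary colors as a JSON array.' }],
  format: 'json',              // ask for a JSON-formatted reply
  options: { temperature: 0 }, // runtime option for more deterministic output
})
console.log(response.message.content)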

generate

ollama.generate(request)
  • request <Object>: The request object containing generate parameters.
    • model <string>: The name of the model to use for generation.
    • prompt <string>: The prompt to send to the model.
    • suffix <string>: (Optional) The text that comes after the inserted text.
    • system <string>: (Optional) Override the model system prompt.
    • template <string>: (Optional) Override the model template.
    • raw <boolean>: (Optional) Bypass the prompt template and pass the prompt directly to the model.
    • images <Uint8Array[] | string[]>: (Optional) Images to include, either as Uint8Array or base64-encoded strings.
    • format <string>: (Optional) Set the expected format of the response (json).
    • stream <boolean>: (Optional) When true, returns an AsyncGenerator.
    • keep_alive <string | number>: (Optional) How long to keep the model loaded.
    • options <Options>: (Optional) Options to configure the runtime.
  • Returns: <GenerateResponse>
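
A minimal generate call might look like this (the prompt is just an example):

import ollama from 'ollama'

const response = await ollama.generate({
  model: 'llama2',
  prompt: 'Write a haiku about the ocean.',
})
// the generated text is returned on the response field of GenerateResponse
console.log(response.response)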

pull

ollama.pull(request)
  • request <Object>: The request object containing pull parameters.
    • model <string>: The name of the model to pull.
    • insecure <boolean>: (Optional) Pull from servers whose identity cannot be verified.
    • stream <boolean>: (Optional) When true, returns an AsyncGenerator.
  • Returns: <ProgressResponse>
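
For example, pulling a model with progress streaming enabled; each part of the stream is a ProgressResponse, and its status field is printed here:

import ollama from 'ollama'

const progress = await ollama.pull({ model: 'llama2', stream: true })
for await (const part of progress) {
  console.log(part.status) // the current stage of the download
}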

push

ollama.push(request)
  • request <Object>: The request object containing push parameters.
    • model <string>: The name of the model to push.
    • insecure <boolean>: (Optional) Push to servers whose identity cannot be verified.
    • stream <boolean>: (Optional) When true, returns an AsyncGenerator.
  • Returns: <ProgressResponse>
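
Pushing works the same way; the namespaced model name below is a hypothetical placeholder for a model you have created and are allowed to publish:

import ollama from 'ollama'

// 'my-namespace/example' is a placeholder; pushing requires a registry account
await ollama.push({ model: 'my-namespace/example' })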

create

ollama.create(request)
  • request <Object>: The request object containing create parameters.
    • model <string>: The name of the model to create.
    • path <string>: (Optional) The path to the Modelfile of the model to create.
    • modelfile <string>: (Optional) The content of the Modelfile to create.
    • stream <boolean>: (Optional) When true, returns an AsyncGenerator.
  • Returns: <ProgressResponse>

delete

ollama.delete(request)
  • request <Object>: The request object containing delete parameters.
    • model <string>: The name of the model to delete.
  • Returns: <StatusResponse>

copy

ollama.copy(request)
  • request <Object>: The request object containing copy parameters.
    • source <string>: The name of the model to copy from.
    • destination <string>: The name of the model to copy to.
  • Returns: <StatusResponse>
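
A small sketch combining copy and delete; the destination name is a placeholder:

import ollama from 'ollama'

// copy an existing local model to a new name, then remove the copy again
await ollama.copy({ source: 'llama2', destination: 'llama2-backup' })
await ollama.delete({ model: 'llama2-backup' })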

list

ollama.list()
  • Returns: <ListResponse>
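
For example, listing the locally available models and printing their names (ListResponse exposes a models array):

import ollama from 'ollama'

const { models } = await ollama.list()
for (const model of models) {
  console.log(model.name)
}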

show

ollama.show(request)
  • request <Object>: The request object containing show parameters.
    • model <string>: The name of the model to show.
    • system <string>: (Optional) Override the model system prompt returned.
    • template <string>: (Optional) Override the model template returned.
    • options <Options>: (Optional) Options to configure the runtime.
  • Returns: <ShowResponse>
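
A minimal show call; which fields you read off the ShowResponse depends on what you need, and the Modelfile content printed here is one example:

import ollama from 'ollama'

const info = await ollama.show({ model: 'llama2' })
// among other details, the response includes the model's Modelfile content
console.log(info.modelfile)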

embeddings

ollama.embeddings(request)
  • request <Object>: The request object containing embedding parameters.
    • model <string>: The name of the model used to generate the embeddings.
    • input <string>: The input used to generate the embeddings.
    • truncate <boolean>: (Optional) Truncate the input to fit within the maximum context length supported by the model.
    • keep_alive <string | number>: (Optional) How long to keep the model loaded.
    • options <Options>: (Optional) Options to configure the runtime.
  • Returns: <EmbeddingsResponse>
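
A sketch of an embeddings call following the parameter list above; the text field is named input here to match that list, but some versions of the library call it prompt, so treat the field name as an assumption to check against your installed version:

import ollama from 'ollama'

const response = await ollama.embeddings({
  model: 'llama2',
  input: 'The sky is blue because of Rayleigh scattering.',
})
// the embedding comes back as an array of numbers
console.log(response.embedding.length)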

ps

ollama.ps()
  • Returns: <ListResponse>
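
ps reports the models currently loaded into memory; a quick sketch that prints their names:

import ollama from 'ollama'

const running = await ollama.ps()
for (const model of running.models) {
  console.log(model.name)
}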

Custom client

A custom client can be created with the following fields:

  • host <string>: (Optional) The Ollama host address. Default: "http://127.0.0.1:11434".
  • fetch <Object>: (Optional) The fetch library used to make requests to the Ollama host.

import { Ollama } from 'ollama'

const ollama = new Ollama({ host: 'http://localhost:11434' })
const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

Build

To build the project files, run:

npm run build

Using LangChain with Ollama using JavaScript

In this tutorial, we are going to use JavaScript with LangChain and Ollama to learn about something just a touch more recent. In August 2023, there was a series of wildfires on Maui. There is no way an LLM trained before that time can know about this, since its training data would not include anything that recent. So we can find the Wikipedia article about the fires and ask questions about its contents.

To get started, let’s just use LangChain to ask a simple question to a model. To do this with JavaScript, we need to install LangChain:

npm install @langchain/community

Now we can start building out our JavaScript:

import { Ollama } from "@langchain/community/llms/ollama";

const ollama = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "llama3",
});

const answer = await ollama.invoke(`why is the sky blue?`);

console.log(answer);

That will get us the same thing as if we ran ollama run llama3 "why is the sky blue" in the terminal. But we want to load a document from the web to ask a question against. Cheerio is a great library for ingesting a webpage, and LangChain uses it in their CheerioWebBaseLoader. So let’s install Cheerio and build that part of the app.

npm install cheerio

import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader("https://en.wikipedia.org/wiki/2023_Hawaii_wildfires");
const data = await loader.load();

That will load the document. Although this page is smaller than the Odyssey, it is certainly bigger than the context size for most LLMs. So we are going to need to split it into smaller pieces, and then select just the pieces relevant to our question. This is a great use for a vector datastore. In this example, we will use the MemoryVectorStore that is part of LangChain. But there is one more thing we need to get the content into the datastore: we have to run an embeddings process that converts the tokens in the text into a series of vectors. And for that, we are going to use TensorFlow. There is a lot going on in this step. First, install the TensorFlow components that we need.

npm install @tensorflow/tfjs-core@3.6.0 @tensorflow/tfjs-converter@3.6.0 @tensorflow-models/universal-sentence-encoder@1.3.3 @tensorflow/tfjs-node@4.10.0

If you just install those components without the version numbers, it will install the latest versions, but there are conflicts within TensorFlow, so you need to install the compatible versions.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import "@tensorflow/tfjs-node";
import { TensorFlowEmbeddings } from "langchain/embeddings/tensorflow";

// Split the text into 500-character chunks, overlapping each chunk by 20 characters
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 20
});
const splitDocs = await textSplitter.splitDocuments(data);

// Then use the TensorFlow Embedding to store these chunks in the datastore
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, new TensorFlowEmbeddings());

To connect the datastore to a question asked of an LLM, we need to use the concept at the heart of LangChain: the chain. Chains are a way to connect a number of activities together to accomplish a particular task. There are a number of chain types available, but for this tutorial we are using the RetrievalQAChain.

import { RetrievalQAChain } from "langchain/chains";

const retriever = vectorStore.asRetriever();
const chain = RetrievalQAChain.fromLLM(ollama, retriever);
const result = await chain.call({ query: "When was Hawaii's request for a major disaster declaration approved?" });
console.log(result.text);

So we created a retriever, which is a way to return the chunks that match a query from a datastore. Then we connected the retriever and the model via a chain. Finally, we sent a query to the chain, which results in an answer using our document as a source. The answer it returned was correct: August 10, 2023.

And that is a simple introduction to what you can do with LangChain and Ollama.
