mirror of https://github.com/abetlen/llama-cpp-python.git synced 2023-09-07 17:34:22 +03:00


Lucas Doyle b9098b0ef7 llama_cpp server: prompt is a string

Not sure why this union type was here but taking a look at llama.py, prompt is only ever processed as a string for completion

This was breaking types when generating an openapi client

2023-05-02 14:47:07 -07:00

.github/workflows

Fix tests

2023-05-01 15:28:46 -04:00

docs

Update docs

2023-04-24 19:56:57 -04:00

examples

Detect multi-byte responses and wait

2023-04-28 12:50:30 +02:00

llama_cpp

llama_cpp server: prompt is a string

2023-05-02 14:47:07 -07:00

tests

Fix

2023-05-01 22:41:54 -04:00

vendor

Update llama.cpp

2023-05-01 15:23:01 -04:00

.gitignore

Ignore ./idea folder

2023-04-05 18:23:17 -04:00

.gitmodules

Add llama.cpp to vendor folder

2023-03-23 05:37:26 -04:00

CMakeLists.txt

Add FORCE_CMAKE option

2023-04-25 01:36:37 -04:00

LICENSE.md

Initial commit

2023-03-23 05:33:06 -04:00

mkdocs.yml

Add search to mkdocs

2023-03-31 00:01:53 -04:00

poetry.lock

tests: simple test for server module

2023-04-29 11:42:20 -07:00

pyproject.toml

Bump version

2023-05-01 15:23:59 -04:00

README.md

Fix whitespace

2023-05-01 18:07:45 -04:00

setup.py

Bump version

2023-05-01 15:23:59 -04:00

README.md

🦙 Python Bindings for `llama.cpp`

Simple Python bindings for @ggerganov's llama.cpp library. This package provides:

- Low-level access to the C API via a ctypes interface
- High-level Python API for text completion
  - OpenAI-like API
  - LangChain compatibility

Installation

Install from PyPI (requires a C compiler):

pip install llama-cpp-python

The above command installs the package and builds llama.cpp from source. This is the recommended installation method because it ensures that llama.cpp is built with the optimizations available on your system.

This method defaults to using make to build llama.cpp on Linux / macOS and cmake on Windows. You can force the use of cmake on Linux / macOS by setting the FORCE_CMAKE=1 environment variable before installing.
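If the install is driven from a script, the FORCE_CMAKE setting described above can be passed through the environment of the pip subprocess. The following is a minimal sketch under that assumption; the helper names `cmake_install_env` and `install_with_cmake` are hypothetical and not part of this package.

```python
import os
import subprocess
import sys


def cmake_install_env(base_env=None):
    """Return a copy of the environment with FORCE_CMAKE=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["FORCE_CMAKE"] = "1"
    return env


def install_with_cmake():
    """Run `pip install llama-cpp-python` with cmake forced (not called here)."""
    command = [sys.executable, "-m", "pip", "install", "llama-cpp-python"]
    subprocess.run(command, env=cmake_install_env(), check=True)


# Demonstrate the environment that would be handed to pip, without installing.
env = cmake_install_env({"PATH": "/usr/bin"})
```

Calling `install_with_cmake()` performs the actual build, so it is left for the reader to invoke.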

High-level API

>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/ggml-model.bin",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}
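The returned object is a plain dictionary shaped like an OpenAI completion response, so the generated text and token counts can be read with ordinary key access. The sketch below works on a shortened copy of the output shown above rather than on a live model; the helper names `completion_text` and `token_usage` are hypothetical.

```python
def completion_text(output: dict) -> str:
    """Return the text of the first choice in a completion response."""
    return output["choices"][0]["text"]


def token_usage(output: dict) -> tuple:
    """Return (prompt_tokens, completion_tokens, total_tokens)."""
    usage = output["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]


# Shortened copy of the example response above (not produced by a real model run).
sample = {
    "object": "text_completion",
    "choices": [
        {
            "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, "
                    "Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 28, "total_tokens": 42},
}

text = completion_text(sample)
prompt_tokens, completion_tokens, total_tokens = token_usage(sample)
```

Because `echo=True` was passed in the example, the text includes the prompt as well as the answer.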

Web Server

llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).

To install the server package and get started:

pip install llama-cpp-python[server]
export MODEL=./models/7B/ggml-model.bin
python3 -m llama_cpp.server

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
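Once the server is running, a completion can be requested over HTTP. The sketch below uses only the standard library and assumes the server exposes an OpenAI-style `/v1/completions` route on the default port; check the OpenAPI page above for the routes and fields your version actually provides. The helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000", max_tokens=32):
    """Build a POST request for the (assumed) /v1/completions route."""
    # The prompt is sent as a single string, matching the server's request type.
    payload = {"prompt": prompt, "max_tokens": max_tokens, "stop": ["Q:", "\n"]}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as response:
        return json.load(response)["choices"][0]["text"]


# Building the request does not touch the network; complete() does.
request = build_completion_request("Q: Name the planets in the solar system? A: ")
```

With the server started as shown above, `complete("Q: Name the planets in the solar system? A: ")` would return the generated text.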

Low-level API

The low-level API is a direct ctypes binding to the C API provided by llama.cpp. The entire API can be found in llama_cpp/llama_cpp.py and should mirror llama.h.
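Working at this level means handling C types directly. As a rough illustration of the ctypes patterns involved, the sketch below allocates a token buffer of the kind a C tokenizer function would fill. It assumes that llama.h defines a token as a plain C int, and it does not load or call llama.cpp itself; the helper names are hypothetical.

```python
import ctypes

# Assumption: llama.h declares `typedef int llama_token;`, so a token maps to c_int.
llama_token = ctypes.c_int


def make_token_buffer(max_tokens: int):
    """Allocate a zero-initialised C array that can hold max_tokens tokens."""
    return (llama_token * max_tokens)()


def read_tokens(buffer, n_tokens: int) -> list:
    """Copy the first n_tokens entries of the C array into a Python list."""
    return list(buffer[:n_tokens])


buf = make_token_buffer(8)
# Stand-in for what a C tokenizer call would write into the buffer.
buf[0], buf[1] = 1, 15043
tokens = read_tokens(buf, 2)
```

A real call would pass `buf` and its length to the bound C function and use the returned count as `n_tokens`.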

Documentation

Documentation is available at https://abetlen.github.io/llama-cpp-python. If you find any issues with the documentation, please open an issue or submit a PR.

Development

This package is under active development and I welcome any contributions.

To get started, clone the repository and install the package in development mode:

git clone --recurse-submodules git@github.com:abetlen/llama-cpp-python.git
# Will need to be re-run any time vendor/llama.cpp is updated
python3 setup.py develop

How does this compare to other Python bindings of `llama.cpp`?

I originally wrote this package for my own use with two goals in mind:

- Provide a simple process to install llama.cpp and access the full C API in llama.h from Python
- Provide a high-level Python API that can be used as a drop-in replacement for the OpenAI API so existing apps can be easily ported to use llama.cpp

Any contributions and changes to this package will be made with these goals in mind.

License

This project is licensed under the terms of the MIT license.

Languages

Python 96.6%

Dockerfile 1.7%

Shell 0.6%

Makefile 0.6%

CMake 0.5%
