11 Commits

724393d70a  Update llama.cpp  (dane madsen, 2023-11-29 22:30:49 +10:00)
a120e632d5  Switch llama.cpp fork to support GGUFv1  (dane madsen, 2023-11-05 22:45:57 +10:00)
f5eb782a29  refactor  (dane madsen, 2023-11-04 19:27:04 +10:00)
7532265514  Refactor  (dane madsen, 2023-11-04 08:46:35 +10:00)
c832bb9d82  refactor  (dane madsen, 2023-11-04 08:46:34 +10:00)
5a72239a92  Move stuff around  (dane madsen, 2023-11-02 18:27:38 +10:00)
d6dc71e45a  Fix issue with madvise()  (Gardner, 2023-10-19 00:14:13 +13:00)
c6730be3c2  Fix android not compiling  (dane madsen, 2023-10-15 20:03:56 +10:00)
5b0144cc76  Update .gitmodules  (dane madsen, 2023-10-12 19:34:53 +10:00)
e4834993f6  Update llama.cpp and move core processing to native code  (Daniel Drake, 2023-07-01 21:22:38 +02:00)
Update llama.cpp to the latest version as part of an effort to make this
app usable on my Samsung Galaxy S10 smartphone.

The newer llama.cpp includes a fix for a double-close bug that was causing
the app to crash immediately upon starting the AI conversation (llama.cpp
commit 47f61aaa5f76d04).

It also adds support for 3B models, which are considerably smaller. The
llama-7B models were causing Android's low memory killer to terminate
Sherpa after just a few words of conversation, whereas new models such as
orca-mini-3b.ggmlv3.q4_0.bin work on this device without quickly exhausting
all available memory.

llama.cpp's model compatibility has changed with this update, so ggml
files that worked in the previous version are unlikely to work now and
need converting. However, the orca-mini model is already in the new
format and works out of the box.

llama.cpp's API has changed in this update. Rather than rework the Dart
code, I opted to leave it in C++, using llama.cpp's example code as a base.
This solution lives in a new "llamasherpa" library, which calls into
llama.cpp. Since lots of data is passed around in large arrays, I expect
that running this in Dart carried significant overhead; the native
approach should perform considerably faster.
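
As a rough illustration of this approach (the actual llamasherpa entry
points are not shown in this log, so every llamasherpa name below is
hypothetical), a minimal bridge written against llama.cpp's C API as it
stood in mid-2023 might look like this:

    // Hypothetical sketch of a llamasherpa-style bridge; the function and
    // parameter names are illustrative, not the library's actual API.
    #include <vector>
    #include "llama.h"

    // Invoked once per generated token; on the Dart side this can be a
    // function pointer obtained through dart:ffi.
    typedef void (*token_callback)(const char * piece);

    extern "C" int llamasherpa_run(const char * model_path,
                                   const char * prompt,
                                   int n_predict,
                                   token_callback on_token) {
        llama_backend_init(false /* numa */);

        llama_context_params params = llama_context_default_params();
        llama_model * model = llama_load_model_from_file(model_path, params);
        if (!model) return 1;
        llama_context * ctx = llama_new_context_with_model(model, params);
        if (!ctx) { llama_free_model(model); return 2; }

        // Tokenize on the native side: only short C strings cross the FFI
        // boundary, never the large token and logit arrays.
        std::vector<llama_token> tokens(params.n_ctx);
        const int n = llama_tokenize(ctx, prompt, tokens.data(),
                                     (int) tokens.size(), true /* add BOS */);
        if (n < 0) { llama_free(ctx); llama_free_model(model); return 3; }
        tokens.resize(n);

        int n_past = 0;
        for (int i = 0; i < n_predict; ++i) {
            // Evaluate pending tokens: the whole prompt on the first pass,
            // then the single sampled token on each later pass.
            if (llama_eval(ctx, tokens.data(), (int) tokens.size(),
                           n_past, 4 /* threads */) != 0) break;
            n_past += (int) tokens.size();

            // Greedy sampling over the logits of the last evaluated token.
            float * logits = llama_get_logits(ctx);
            const int n_vocab = llama_n_vocab(ctx);
            std::vector<llama_token_data> cand;
            cand.reserve(n_vocab);
            for (llama_token t = 0; t < n_vocab; ++t)
                cand.push_back({t, logits[t], 0.0f});
            llama_token_data_array arr = {cand.data(), cand.size(), false};

            const llama_token id = llama_sample_token_greedy(ctx, &arr);
            if (id == llama_token_eos()) break;
            on_token(llama_token_to_str(ctx, id)); // text only back to Dart
            tokens.assign(1, id);
        }

        llama_free(ctx);
        llama_free_model(model);
        llama_backend_free();
        return 0;
    }

On the Dart side, a symbol like this can be bound through dart:ffi
(DynamicLibrary.open and lookupFunction), so only the prompt and the
generated text cross the FFI boundary rather than the token and logit
arrays.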

This eliminates the need for Sherpa's Dart code to call llama.cpp directly,
so there's no need to separately maintain a modified version of llama.cpp
and we can use the official upstream.

ac2b4dd7b9  add submodule  (Maxime GUERIN, 2023-03-27 20:17:39 +02:00)