TPI-LLM: A High-Performance Tensor Parallelism Inference System for Edge LLM Services
Updated 2024-10-04 22:25:48 +03:00
To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Updated 2024-01-23 02:05:47 +03:00
Inference code for Facebook LLaMA models with Wrapyfi support
Updated 2023-09-17 17:00:28 +03:00
Large Language Model Text Generation Inference
Updated 2023-08-23 10:47:54 +03:00