# ThinkMesh
ThinkMesh is a Python library for running diverse reasoning paths in parallel, scoring them with internal confidence signals, reallocating compute to promising branches, and fusing outcomes with verifiers and reducers. It works with offline Hugging Face Transformers and vLLM/TGI, and with hosted APIs.
Note: This library is still in its early development phase, and breaking changes may occur.
## Highlights
- Parallel reasoning with DeepConf‑style confidence gating and budget reallocation
- Offline‑first with Transformers; optional vLLM/TGI for server‑side batching
- Hosted adapters for OpenAI and Anthropic
- Async execution with dynamic micro‑batches
- Reducers (majority/judge) and pluggable verifiers (regex/numeric/custom)
- Caching, metrics, and JSON traces
## Install
```bash
git clone https://github.com/martianlantern/thinkmesh.git
cd thinkmesh
pip install -e ".[dev,transformers]"
```
## Quickstart: Offline DeepConf
```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="transformers", model_name="Qwen2.5-7B-Instruct",
                    max_tokens=256, temperature=0.7, seed=42, extra={"device": "cuda:0"}),
    strategy=StrategySpec(name="deepconf", parallel=8, max_steps=2,
                          deepconf={"k": 5, "tau_low": -1.25, "tau_ent": 2.2, "realloc_top_p": 0.4}),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 4000},
)
ans = think("Show that the product of any three consecutive integers is divisible by 3.", cfg)
print(ans.content, ans.confidence)
```
## Quickstart: OpenAI Self-Consistency
```python
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="openai", model_name="gpt-4o-mini", max_tokens=256, temperature=0.6),
    strategy=StrategySpec(name="self_consistency", parallel=6, max_steps=1),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 15, "tokens": 3000},
)
print(think("List three creative uses for a paperclip.", cfg).content)
```
## CLI
```bash
thinkmesh think -m Qwen2.5-7B-Instruct --backend transformers --strategy deepconf "What is 37*43?"
```
## Examples

### Debate Strategy (hosted)
```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="openai", model_name="gpt-4o-mini", max_tokens=256, temperature=0.7),
    strategy=StrategySpec(name="debate", parallel=4, max_steps=2, debate={"rounds": 2}),
    reducer={"name": "judge"},
    budgets={"wall_clock_s": 25, "tokens": 5000},
)
print(think("Argue whether every even integer > 2 is the sum of two primes.", cfg).content)
```
### vLLM Local Server
```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="vllm", model_name="Qwen2.5-7B-Instruct",
                    max_tokens=256, temperature=0.7,
                    extra={"base_url": "http://localhost:8000/v1", "api_key": "sk-"}),
    strategy=StrategySpec(name="deepconf", parallel=8, max_steps=2, deepconf={"k": 5}),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 4000},
)
print(think("Give a constructive proof for the Pigeonhole Principle on a simple case.", cfg).content)
```
### Custom Verifier
```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="transformers", model_name="Qwen2.5-7B-Instruct", max_tokens=128),
    strategy=StrategySpec(name="self_consistency", parallel=5, max_steps=1),
    reducer={"name": "majority"},
    verifier={"type": "regex", "pattern": r"Final Answer\s*:\s*.+$"},
    budgets={"wall_clock_s": 10, "tokens": 1500},
)
print(think("Answer with 'Final Answer: <value>' for 19*21.", cfg).content)
```
### Tree of Thought (offline)
```python
from thinkmesh import think, ThinkConfig, ModelSpec, StrategySpec

cfg = ThinkConfig(
    model=ModelSpec(backend="transformers", model_name="Qwen2.5-7B-Instruct", max_tokens=192),
    strategy=StrategySpec(name="tree", parallel=6, max_steps=2, tree={"branches": 3, "depth": 2}),
    reducer={"name": "majority"},
    budgets={"wall_clock_s": 20, "tokens": 3500},
)
print(think("Sketch a plan to prove that sqrt(2) is irrational.", cfg).content)
```
## Traces, Metrics, Caching
Traces are emitted as JSON graphs inside the returned structure. Prometheus metrics and OpenTelemetry spans can be enabled via config extras. A local disk cache deduplicates repeated generations by hashing adapter, model, prompt, and params.
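As an illustration of the deduplication idea (a sketch, not ThinkMesh's actual implementation), a cache key over those four fields might be built like this:

```python
import hashlib
import json

def cache_key(adapter: str, model: str, prompt: str, params: dict) -> str:
    # Illustrative only: hash a canonical JSON encoding of the fields
    # the README says generations are deduplicated on. sort_keys ensures
    # identical configs produce identical keys regardless of dict order.
    payload = json.dumps(
        {"adapter": adapter, "model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```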
Extending
- Implement a new backend by providing a
Thinker.generatemethod that returns token text and optional token logprobs - Add a new strategy by wiring a function in
thinkmesh/strategiesand registering by name - Add reducers/verifiers under
thinkmesh/reduce
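The exact `Thinker` protocol is not documented in this README; as a rough sketch, a new backend might look like the following (the class shape and `generate` signature are assumptions drawn from the first bullet):

```python
class EchoThinker:
    # Hypothetical backend for illustration; the real Thinker protocol
    # in ThinkMesh may differ in names and signature.
    def __init__(self, model_name: str):
        self.model_name = model_name

    def generate(self, prompt: str, max_tokens: int = 256):
        # Return generated text plus optional per-token logprobs
        # (None when the backend cannot supply them).
        text = f"echo: {prompt[:max_tokens]}"
        logprobs = None
        return text, logprobs
```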
## License
MIT
## References
```bibtex
@misc{deepconf2025,
  title        = {DeepConf: Deep Think with Confidence},
  year         = {2025},
  howpublished = {\url{https://jiaweizzhao.github.io/deepconf/}}
}

@misc{wang2022selfconsistency,
  title         = {Self-Consistency Improves Chain-of-Thought Reasoning in Language Models},
  author        = {Wang, Xuezhi and Wei, Jason and others},
  year          = {2022},
  eprint        = {2203.11171},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}

@misc{yao2023tree,
  title         = {Tree of Thoughts: Deliberate Problem Solving with Large Language Models},
  author        = {Yao, Shunyu and others},
  year          = {2023},
  eprint        = {2305.10601},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI}
}
```
## Citation
If you use this library in your work, please cite:
```bibtex
@software{thinkmesh2025,
  title  = {ThinkMesh: Parallel Thinking for LLMs},
  author = {martianlantern},
  year   = {2025},
  note   = {Version 0.1.1}
}
```