Mirror of https://github.com/langchain-ai/mcpdoc.git
Synced 2025-10-19 03:18:14 +03:00

Compare commits: 29 commits (mcpdoc==0. ... main)
| SHA1 |
|---|
| 538df6d05c |
| a429dc788b |
| d1db6319b9 |
| 74237e7714 |
| b0f7a8e2ad |
| c9b45f098b |
| 3f859a3fc9 |
| 6a0d649d30 |
| 35e5481ada |
| 53479ff021 |
| 7e62344a91 |
| bac98dc41a |
| a885a655cc |
| c2977b3602 |
| 1bc11f5ea1 |
| ef4d6b08ab |
| a9e1b14d43 |
| 71ddda1d09 |
| bb3328b0c3 |
| f7556c9bd6 |
| 921fe07dd0 |
| 0e688fee9a |
| 677facfb64 |
| 465e69ffcb |
| be13a215f4 |
| 6e2221fd5b |
| 19d45109c2 |
| 5d15b6c113 |
| fd354128ce |
README.md
# MCP LLMS-TXT Documentation Server

The MCP LLMS-TXT Documentation Server is a specialized Model Context Protocol (MCP) server that delivers documentation directly from `llms.txt` files. It serves as a testbed for integrating documentation into IDEs via external **tools**, rather than relying solely on built-in features. While future IDEs may offer robust native support for `llms.txt` files, this server allows us to experiment with alternative methods, giving us full control over how documentation is retrieved and displayed.

## Overview

[llms.txt](https://llmstxt.org/) is a website index for LLMs, providing background information, guidance, and links to detailed markdown files. IDEs like Cursor and Windsurf, or apps like Claude Code/Desktop, can use `llms.txt` to retrieve context for tasks. However, these apps use different built-in tools to read and process files like `llms.txt`. The retrieval process can be opaque, and there is not always a way to audit the tool calls or the context returned.

[MCP](https://github.com/modelcontextprotocol) offers a way for developers to have *full control* over the tools used by these applications. Here, we create [an open source MCP server](https://github.com/modelcontextprotocol) to provide MCP host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with (1) a user-defined list of `llms.txt` files and (2) a simple `fetch_docs` tool to read URLs within any of the provided `llms.txt` files. This allows the user to audit each tool call as well as the context returned.

## Usage

### Cursor

1. Install Cursor: https://www.cursor.com/en
2. Launch the MCP server in **SSE** transport:

```shell
uvx --from mcpdoc mcpdoc \
    --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt \
    --transport sse \
    --port 8081 \
    --host localhost
```

<img src="https://github.com/user-attachments/assets/736f8f55-833d-4200-b833-5fca01a09e1b" width="60%">

3. Add the MCP server to Cursor. Remember to enter the URL as **[host]/sse**, for example **http://localhost:8081/sse**.
4. Cursor needs to be in **agent** mode for this to work.
5. You should be able to use it within Composer now.

## llms-txt

You can find `llms.txt` files for LangGraph and LangChain here:

| Library | llms.txt |
|------------------|------------------------------------------------------------------------------------------------------------|
| LangGraph Python | [https://langchain-ai.github.io/langgraph/llms.txt](https://langchain-ai.github.io/langgraph/llms.txt) |
| LangGraph JS | [https://langchain-ai.github.io/langgraphjs/llms.txt](https://langchain-ai.github.io/langgraphjs/llms.txt) |
| LangChain Python | [https://python.langchain.com/llms.txt](https://python.langchain.com/llms.txt) |
| LangChain JS | [https://js.langchain.com/llms.txt](https://js.langchain.com/llms.txt) |
### Claude Code

1. Install Claude Code: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
2. Install [uv](https://github.com/astral-sh/uv). This step is required if you want to run the MCP server using the `uvx` command. This is generally recommended, as it simplifies dependency management for you.
3. Configure the MCP server with Claude Code:

```shell
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt"]}' -s user
```

4. Launch Claude Code:

```shell
claude code
```

5. Verify that the server is running by typing `/mcp` in the chat window:

```
> /mcp
```

## Quickstart

#### Install uv

* Please see the [official uv docs](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) for other ways to install `uv`.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

#### Choose an `llms.txt` file to use.

* For example, [here's](https://langchain-ai.github.io/langgraph/llms.txt) the LangGraph `llms.txt` file.

> **Note: Security and Domain Access Control**
>
> For security reasons, mcpdoc implements strict domain access controls:
>
> 1. **Remote llms.txt files**: When you specify a remote llms.txt URL (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`), mcpdoc automatically adds only that specific domain (`langchain-ai.github.io`) to the allowed domains list. This means the tool can only fetch documentation from URLs on that domain.
>
> 2. **Local llms.txt files**: When using a local file, NO domains are automatically added to the allowed list. You MUST explicitly specify which domains to allow using the `--allowed-domains` parameter.
>
> 3. **Adding additional domains**: To allow fetching from domains beyond those automatically included:
>    - Use `--allowed-domains domain1.com domain2.com` to add specific domains
>    - Use `--allowed-domains '*'` to allow all domains (use with caution)
>
> This security measure prevents unauthorized access to domains not explicitly approved by the user, ensuring that documentation can only be retrieved from trusted sources.
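The allow-list behavior described in the note above can be sketched in a few lines of Python. This is an illustrative sketch, not mcpdoc's exact code; `is_url_allowed` is a hypothetical helper name, while `extract_domain` mirrors the documented behavior of deriving the allowed prefix from a remote `llms.txt` URL.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the scheme://host/ prefix treated as the allowed domain."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}/"


def is_url_allowed(url: str, allowed_domains: set[str]) -> bool:
    """A URL is allowed if '*' is present or it starts with an allowed prefix."""
    return "*" in allowed_domains or any(url.startswith(d) for d in allowed_domains)


# Registering a remote llms.txt allows only that domain:
allowed = {extract_domain("https://langchain-ai.github.io/langgraph/llms.txt")}
print(is_url_allowed("https://langchain-ai.github.io/langgraph/concepts.md", allowed))  # True
print(is_url_allowed("https://evil.example.com/page", allowed))  # False
```

Passing `--allowed-domains '*'` corresponds to the `"*" in allowed_domains` branch, which short-circuits the prefix check entirely.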
#### (Optional) Test the MCP server locally with your `llms.txt` file(s) of choice:

```bash
uvx --from mcpdoc mcpdoc \
    --urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
    --transport sse \
    --port 8082 \
    --host localhost
```

* This should run at: http://localhost:8082



* Run the [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector) and connect to the running server:

```bash
npx @modelcontextprotocol/inspector
```



* Here, you can test the `tool` calls.
* Then, test it out! For example:

```
> Write a langgraph application with two agents that debate the merits of taking a shower.
```

This MCP server was only configured with LangGraph documentation, but you can add more documentation sources by adding more `--urls` arguments or by loading them from a JSON or YAML file.
#### Connect to Cursor

* Open `Cursor Settings` and `MCP` tab.
* This will open the `~/.cursor/mcp.json` file.



* Paste the following into the file (we use the `langgraph-docs-mcp` name and link to the LangGraph `llms.txt`).

```
{
  "mcpServers": {
    "langgraph-docs-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "mcpdoc",
        "mcpdoc",
        "--urls",
        "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt LangChain:https://python.langchain.com/llms.txt",
        "--transport",
        "stdio"
      ]
    }
  }
}
```

* Confirm that the server is running in your `Cursor Settings/MCP` tab.
* Best practice is to then update the Cursor Global (User) rules.
* Open Cursor `Settings/Rules` and update `User Rules` with the following (or similar):

```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```

* `CMD+L` (on Mac) to open chat.
* Ensure `agent` is selected.



Then, try an example prompt, such as:

```
what are types of memory in LangGraph?
```


### Connect to Windsurf

* Open Cascade with `CMD+L` (on Mac).
* Click `Configure MCP` to open the config file, `~/.codeium/windsurf/mcp_config.json`.
* Update with the `langgraph-docs-mcp` config as noted above.



* Update `Windsurf Rules/Global rules` with the following (or similar):

```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```



Then, try the example prompt:

* It will perform your tool calls.


### Connect to Claude Desktop

* Open `Settings/Developer` to update `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
* Update with the `langgraph-docs-mcp` config as noted above.
* Restart the Claude Desktop app.

> [!Note]
> If you run into issues with Python version incompatibility when trying to add MCPDoc tools to Claude Desktop, you can explicitly specify the file path to the `python` executable in the `uvx` command.
>
> <details>
> <summary>Example configuration</summary>
>
> ```
> {
>   "mcpServers": {
>     "langgraph-docs-mcp": {
>       "command": "uvx",
>       "args": [
>         "--python",
>         "/path/to/python",
>         "--from",
>         "mcpdoc",
>         "mcpdoc",
>         "--urls",
>         "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
>         "--transport",
>         "stdio"
>       ]
>     }
>   }
> }
> ```
> </details>

> [!Note]
> Currently (3/21/25) it appears that Claude Desktop does not support `rules` for global rules, so append the following to your prompt:

```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```



* You will see your tools visible in the bottom right of your chat input.



Then, try the example prompt:

* It will ask you to approve tool calls as it processes your request.


### Connect to Claude Code

* In a terminal, after installing [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), run this command to add the MCP server to your project:

```
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```

* You will see `~/.claude.json` updated.
* Test by launching Claude Code and running the following to view your tools:

```
$ claude
$ /mcp
```



> [!Note]
> Currently (3/21/25) it appears that Claude Code does not support `rules` for global rules, so append the following to your prompt:

```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```



Then, try the example prompt:

* It will ask you to approve tool calls.


## Command-line Interface

The `mcpdoc` command provides a simple CLI for launching the documentation server.

You can specify documentation sources in three ways, and these can be combined:

1. Using a YAML config file:

```bash
mcpdoc --yaml sample_config.yaml
```

This will load the LangGraph Python documentation from the `sample_config.yaml` file in this repo.

2. Using a JSON config file:

```bash
mcpdoc --json sample_config.json
```

This will load the LangGraph Python documentation from the `sample_config.json` file in this repo.

3. Directly specifying llms.txt URLs with optional names:

```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```

* URLs can be specified either as plain URLs or with optional names using the format `name:url`.
* You can specify multiple URLs by using the `--urls` parameter multiple times.
* This is how we loaded `llms.txt` for the MCP server above.

You can also combine these methods to merge documentation sources:

```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```

## Additional Options

- `--follow-redirects`: Follow HTTP redirects (defaults to False)
- `--timeout SECONDS`: HTTP request timeout in seconds (defaults to 10.0)
Example usage with the additional options:

```bash
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
```

This will load the LangGraph Python documentation with a 15-second timeout and follow any HTTP redirects if necessary.

## Configuration Format

Both YAML and JSON configuration files should contain a list of documentation sources. Each source must include an `llms_txt` URL and can optionally include a `name`:

### YAML Configuration Example (sample_config.yaml)

```yaml
# Sample configuration for mcp-mcpdoc server
- name: LangGraph Python
  llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```

### JSON Configuration Example (sample_config.json)

```json
[
  {
    "name": "LangGraph Python",
    "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"
  }
]
```
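A config in this shape can be validated programmatically along these lines. This is a minimal sketch under the format described above; `parse_config` is a hypothetical helper, not part of mcpdoc's public API.

```python
import json


def parse_config(text: str) -> list[dict]:
    """Parse a sample_config.json-style document: a list of sources,
    each with a required 'llms_txt' key and an optional 'name'."""
    sources = json.loads(text)
    if not isinstance(sources, list):
        raise ValueError("Config must be a list of documentation sources")
    for source in sources:
        if "llms_txt" not in source:
            raise ValueError("Each documentation source must include 'llms_txt'")
    return sources


config = '[{"name": "LangGraph Python", "llms_txt": "https://langchain-ai.github.io/langgraph/llms.txt"}]'
sources = parse_config(config)
print(sources[0]["name"])  # LangGraph Python
```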
## Programmatic Usage

```python
from mcpdoc.main import create_server
```
@@ -25,6 +25,9 @@ Examples:

    # Directly specifying llms.txt URLs with optional names
    mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt

    # Using a local file (absolute or relative path)
    mcpdoc --urls LocalDocs:/path/to/llms.txt --allowed-domains '*'

    # Using a YAML config file
    mcpdoc --yaml sample_config.yaml

@@ -42,6 +45,12 @@ Examples:

    # Using SSE transport with additional HTTP options
    mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15 --transport sse --host localhost --port 8080

    # Allow fetching from additional domains. The domains hosting the llms.txt files are always allowed.
    mcpdoc --yaml sample_config.yaml --allowed-domains https://example.com/ https://another-example.com/

    # Allow fetching from any domain
    mcpdoc --yaml sample_config.yaml --allowed-domains '*'
"""

@@ -66,7 +75,7 @@ def parse_args() -> argparse.Namespace:

        "-u",
        type=str,
        nargs="+",
        help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
    )

    parser.add_argument(

@@ -74,6 +83,12 @@ def parse_args() -> argparse.Namespace:

        action="store_true",
        help="Whether to follow HTTP redirects",
    )
    parser.add_argument(
        "--allowed-domains",
        type=str,
        nargs="*",
        help="Additional allowed domains to fetch documentation from. Use '*' to allow all domains.",
    )
    parser.add_argument(
        "--timeout", type=float, default=10.0, help="HTTP request timeout in seconds"
    )

@@ -151,10 +166,11 @@ def load_config_file(file_path: str, file_format: str) -> List[Dict[str, str]]:

def create_doc_sources_from_urls(urls: List[str]) -> List[DocSource]:
    """Create doc sources from a list of URLs or file paths with optional names.

    Args:
        urls: List of llms.txt URLs or file paths with optional names
            (format: 'url_or_path' or 'name:url_or_path')

    Returns:
        List of DocSource objects

@@ -229,6 +245,7 @@ def main() -> None:

        follow_redirects=args.follow_redirects,
        timeout=args.timeout,
        settings=settings,
        allowed_domains=args.allowed_domains,
    )

    if args.transport == "sse":
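The argument surface shown in the fragments above can be reproduced with a minimal self-contained `argparse` sketch. This is illustrative only; the real `parse_args` defines more options (YAML/JSON config files, transport, host, port).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the mcpdoc CLI flags documented above."""
    parser = argparse.ArgumentParser(
        description="Serve llms.txt documentation over MCP (sketch)"
    )
    parser.add_argument(
        "--urls", "-u", type=str, nargs="+",
        help="llms.txt URLs or file paths, 'url_or_path' or 'name:url_or_path'",
    )
    parser.add_argument(
        "--follow-redirects", action="store_true",
        help="Whether to follow HTTP redirects",
    )
    parser.add_argument(
        "--allowed-domains", type=str, nargs="*",
        help="Additional allowed domains; '*' allows all",
    )
    parser.add_argument(
        "--timeout", type=float, default=10.0,
        help="HTTP request timeout in seconds",
    )
    return parser


args = build_parser().parse_args(
    ["--urls", "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
     "--allowed-domains", "*"]
)
print(args.timeout)  # 10.0
```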
mcpdoc/main.py

@@ -1,6 +1,8 @@
"""MCP Llms-txt server for docs."""

import os
import re
from urllib.parse import urlparse, urljoin

import httpx
from markdownify import markdownify

@@ -34,56 +36,255 @@ def extract_domain(url: str) -> str:
    return f"{parsed.scheme}://{parsed.netloc}/"


def _is_http_or_https(url: str) -> bool:
    """Check if the URL is an HTTP or HTTPS URL."""
    return url.startswith(("http:", "https:"))


def _get_fetch_description(has_local_sources: bool) -> str:
    """Get fetch docs tool description."""
    description = [
        "Fetch and parse documentation from a given URL or local file.",
        "",
        "Use this tool after list_doc_sources to:",
        "1. First fetch the llms.txt file from a documentation source",
        "2. Analyze the URLs listed in the llms.txt file",
        "3. Then fetch specific documentation pages relevant to the user's question",
        "",
    ]

    if has_local_sources:
        description.extend(
            [
                "Args:",
                "  url: The URL or file path to fetch documentation from. Can be:",
                "    - URL from an allowed domain",
                "    - A local file path (absolute or relative)",
                "    - A file:// URL (e.g., file:///path/to/llms.txt)",
            ]
        )
    else:
        description.extend(
            [
                "Args:",
                "  url: The URL to fetch documentation from.",
            ]
        )

    description.extend(
        [
            "",
            "Returns:",
            "  The fetched documentation content converted to markdown, or an error message",  # noqa: E501
            "  if the request fails or the URL is not from an allowed domain.",
        ]
    )

    return "\n".join(description)


def _normalize_path(path: str) -> str:
    """Accept paths in file:/// or relative format and map to absolute paths."""
    return (
        os.path.abspath(path[7:])
        if path.startswith("file://")
        else os.path.abspath(path)
    )


def _get_server_instructions(doc_sources: list[DocSource]) -> str:
    """Generate server instructions with available documentation source names."""
    # Extract source names from doc_sources
    source_names = []
    for entry in doc_sources:
        if "name" in entry:
            source_names.append(entry["name"])
        elif _is_http_or_https(entry["llms_txt"]):
            # Use domain name as fallback for HTTP sources
            domain = extract_domain(entry["llms_txt"])
            source_names.append(domain.rstrip("/").split("//")[-1])
        else:
            # Use filename as fallback for local sources
            source_names.append(os.path.basename(entry["llms_txt"]))

    instructions = [
        "Use the list_doc_sources tool to see available documentation sources.",
        "This tool will return a URL for each documentation source.",
    ]

    if source_names:
        if len(source_names) == 1:
            instructions.append(
                f"Documentation URLs are available from this tool "
                f"for {source_names[0]}."
            )
        else:
            names_str = ", ".join(source_names[:-1]) + f", and {source_names[-1]}"
            instructions.append(
                f"Documentation URLs are available from this tool for {names_str}."
            )

    instructions.extend(
        [
            "",
            "Once you have a source documentation URL, use the fetch_docs tool "
            "to get the documentation contents. ",
            "If the documentation contents contains a URL for additional documentation "
            "that is relevant to your task, you can use the fetch_docs tool to "
            "fetch documentation from that URL next.",
        ]
    )

    return "\n".join(instructions)


def create_server(
    doc_sources: list[DocSource],
    *,
    follow_redirects: bool = False,
    timeout: float = 10,
    settings: dict | None = None,
    allowed_domains: list[str] | None = None,
) -> FastMCP:
    """Create the server and generate documentation retrieval tools.

    Args:
        doc_sources: List of documentation sources to make available
        follow_redirects: Whether to follow HTTP redirects when fetching docs
        timeout: HTTP request timeout in seconds
        settings: Additional settings to pass to FastMCP
        allowed_domains: Additional domains to allow fetching from.
            Use ['*'] to allow all domains.
            The domain hosting the llms.txt file is always appended to the list
            of allowed domains.

    Returns:
        A FastMCP server instance configured with documentation tools
    """
    settings = settings or {}
    server = FastMCP(
        name="llms-txt",
        instructions=_get_server_instructions(doc_sources),
        **settings,
    )
    httpx_client = httpx.AsyncClient(follow_redirects=follow_redirects, timeout=timeout)

    local_sources = []
    remote_sources = []

    for entry in doc_sources:
        url = entry["llms_txt"]
        if _is_http_or_https(url):
            remote_sources.append(entry)
        else:
            local_sources.append(entry)

    # Let's verify that all local sources exist
    for entry in local_sources:
        path = entry["llms_txt"]
        abs_path = _normalize_path(path)
        if not os.path.exists(abs_path):
            raise FileNotFoundError(f"Local file not found: {abs_path}")

    # Parse the domain names in the llms.txt URLs and identify local file paths
    domains = set(extract_domain(entry["llms_txt"]) for entry in remote_sources)

    # Add additional allowed domains if specified
    if allowed_domains:
        if "*" in allowed_domains:
            domains = {"*"}  # Special marker for allowing all domains
        else:
            domains.update(allowed_domains)

    allowed_local_files = set(
        _normalize_path(entry["llms_txt"]) for entry in local_sources
    )

    @server.tool()
    def list_doc_sources() -> str:
        """List all available documentation sources.

        This is the first tool you should call in the documentation workflow.
        It provides URLs to llms.txt files or local file paths that the user has made available.

        Returns:
            A string containing a formatted list of documentation sources with their URLs or file paths
        """
        content = ""
        for entry_ in doc_sources:
            url_or_path = entry_["llms_txt"]

            if _is_http_or_https(url_or_path):
                name = entry_.get("name", extract_domain(url_or_path))
                content += f"{name}\nURL: {url_or_path}\n\n"
            else:
                path = _normalize_path(url_or_path)
                name = entry_.get("name", path)
                content += f"{name}\nPath: {path}\n\n"
        return content

    fetch_docs_description = _get_fetch_description(
        has_local_sources=bool(local_sources)
    )

    @server.tool(description=fetch_docs_description)
    async def fetch_docs(url: str) -> str:
        nonlocal domains, follow_redirects
        url = url.strip()
        # Handle local file paths (either as file:// URLs or direct filesystem paths)
        if not _is_http_or_https(url):
            abs_path = _normalize_path(url)
            if abs_path not in allowed_local_files:
                raise ValueError(
                    f"Local file not allowed: {abs_path}. Allowed files: {allowed_local_files}"
                )
            try:
                with open(abs_path, "r", encoding="utf-8") as f:
                    content = f.read()
                return markdownify(content)
            except Exception as e:
                return f"Error reading local file: {str(e)}"
        else:
            # Otherwise treat as URL
            if "*" not in domains and not any(
                url.startswith(domain) for domain in domains
            ):
                return (
                    "Error: URL not allowed. Must start with one of the following domains: "
                    + ", ".join(domains)
                )

            try:
                response = await httpx_client.get(url, timeout=timeout)
                response.raise_for_status()
                content = response.text

                if follow_redirects:
                    # Check for meta refresh tag which indicates a client-side redirect
                    match = re.search(
                        r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
                        content,
                        re.IGNORECASE,
                    )

                    if match:
                        redirect_url = match.group(1)
                        new_url = urljoin(str(response.url), redirect_url)

                        if "*" not in domains and not any(
                            new_url.startswith(domain) for domain in domains
                        ):
                            return (
                                "Error: Redirect URL not allowed. Must start with one of the following domains: "
                                + ", ".join(domains)
                            )

                        response = await httpx_client.get(new_url, timeout=timeout)
                        response.raise_for_status()
                        content = response.text

                return markdownify(content)
            except (httpx.HTTPStatusError, httpx.RequestError) as e:
                return f"Encountered an HTTP error: {str(e)}"

    return server
@@ -1,8 +1,9 @@
[project]
name = "mcpdoc"
version = "0.0.10"
description = "Serve llms-txt documentation over MCP"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
dependencies = [
    "httpx>=0.28.1",

@@ -31,3 +32,18 @@ test = [
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
minversion = "8.0"
# -ra: Report all extra test outcomes (passed, skipped, failed, etc.)
# -q: Enable quiet mode for less cluttered output
# -v: Enable verbose output to display detailed test names and statuses
# --durations=5: Show the 5 slowest tests after the run (useful for performance tuning)
addopts = "-ra -q -v --durations=5"
testpaths = [
    "tests",
]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
tests/unit_tests/test_main.py (new file, 71 lines)

@@ -0,0 +1,71 @@
"""Tests for mcpdoc.main module."""

import pytest

from mcpdoc.main import (
    _get_fetch_description,
    _is_http_or_https,
    extract_domain,
)


def test_extract_domain() -> None:
    """Test extract_domain function."""
    # Test with https URL
    assert extract_domain("https://example.com/page") == "https://example.com/"

    # Test with http URL
    assert extract_domain("http://test.org/docs/index.html") == "http://test.org/"

    # Test with URL that has a port
    assert extract_domain("https://localhost:8080/api") == "https://localhost:8080/"

    # Check trailing slash
    assert extract_domain("https://localhost:8080") == "https://localhost:8080/"

    # Test with URL that has a subdomain
    assert extract_domain("https://docs.python.org/3/") == "https://docs.python.org/"


@pytest.mark.parametrize(
    "url,expected",
    [
        ("http://example.com", True),
        ("https://example.com", True),
        ("/path/to/file.txt", False),
        ("file:///path/to/file.txt", False),
        (
            "ftp://example.com",
            False,
        ),  # Not HTTP or HTTPS, even though it's not a local file
    ],
)
def test_is_http_or_https(url, expected):
    """Test _is_http_or_https function."""
    assert _is_http_or_https(url) is expected


@pytest.mark.parametrize(
    "has_local_sources,expected_substrings",
    [
        (True, ["local file path", "file://"]),
        (False, ["URL to fetch"]),
    ],
)
def test_get_fetch_description(has_local_sources, expected_substrings):
    """Test _get_fetch_description function."""
    description = _get_fetch_description(has_local_sources)

    # Common assertions for both cases
    assert "Fetch and parse documentation" in description
    assert "Returns:" in description

    # Specific assertions based on has_local_sources
    for substring in expected_substrings:
        if has_local_sources:
            assert substring in description
        else:
            # For the False case, we only check that "local file path"
            # and "file://" are NOT present
            if substring in ["local file path", "file://"]:
                assert substring not in description