Compare commits


29 Commits

Author SHA1 Message Date
Eugene Yurtsev
538df6d05c Release 0.10.0 (#41) 2025-07-22 16:23:31 -04:00
Aliyan Ishfaq
a429dc788b Fix: handle client-side meta refresh redirects (#40)
Fixes the "Redirecting..." response issue by adding support for HTML
meta refresh redirects in `mcpdoc/main.py`.

- Parses `<meta http-equiv="refresh">` tags to follow client-side
redirects
- Consistent with existing `--follow-redirects` flag behavior
- Resolves cases where documentation sites use meta refresh instead of
HTTP redirects

Modified: `mcpdoc/main.py`
2025-07-22 16:22:39 -04:00
Eugene Yurtsev
d1db6319b9 Update README.md (#37)
fix typo in readme with claude code
2025-07-07 17:12:08 -04:00
Eugene Yurtsev
74237e7714 Release 0.0.9 (#36) 2025-07-07 17:03:56 -04:00
Eugene Yurtsev
b0f7a8e2ad mcpdoc: update server description based on available tools (#35) 2025-07-07 17:03:14 -04:00
Eugene Yurtsev
c9b45f098b ci: configure pytest (#24) 2025-04-05 13:52:21 -04:00
Larsen Weigle
3f859a3fc9 fix(mcpdoc): update readme cli example and mcp json. (#22)
See [this issue thread](https://github.com/langchain-ai/mcpdoc/issues/21).

Update examples in the readme to match the arg parser in `cli.py` which
is configured to append multiple urls:

```python
    parser.add_argument(
        "--urls",
        "-u",
        type=str,
        nargs="+",
        help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
    )
```

The current examples in the readme use multiple `--url` flags, so
each new flag overrides the previous URL.
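The overriding behavior this commit fixes can be reproduced with a minimal argparse sketch (a hypothetical standalone script mirroring the parser above, not mcpdoc's actual CLI entry point):

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the mcpdoc parser: nargs="+" collects every value that follows
# a single flag, but repeating the flag replaces the earlier list because
# the default "store" action overwrites rather than appends.
parser.add_argument("--urls", "-u", type=str, nargs="+")

# One flag, many values: all URLs are collected.
combined = parser.parse_args(["--urls", "a.txt", "b.txt"]).urls
print(combined)  # ['a.txt', 'b.txt']

# Repeated flags: the second occurrence silently overrides the first.
repeated = parser.parse_args(["--urls", "a.txt", "--urls", "b.txt"]).urls
print(repeated)  # ['b.txt']
```

This is why the readme examples had to be changed to pass all URLs after a single `--urls` flag.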
2025-04-05 13:23:50 -04:00
Eugene Yurtsev
6a0d649d30 fix: settings propagation (#19)
Fixes: https://github.com/langchain-ai/mcpdoc/issues/17
2025-03-31 11:43:56 -04:00
Eugene Yurtsev
35e5481ada README: add multiple url examples (#16) 2025-03-28 13:41:03 -04:00
Eugene Yurtsev
53479ff021 Update README.md (#15) 2025-03-28 13:23:14 -04:00
Eugene Yurtsev
7e62344a91 docs: scale down image 2025-03-27 13:56:14 -04:00
Eugene Yurtsev
bac98dc41a Update README.md 2025-03-27 13:53:37 -04:00
Eugene Yurtsev
a885a655cc Release 0.0.7 2025-03-27 13:23:57 -04:00
Lance Martin
c2977b3602 Add local llms.txt file reading (#14)
Add ability to read llms.txt from local files.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-27 10:22:42 -07:00
Eugene Yurtsev
1bc11f5ea1 update uv lock file (#13) 2025-03-24 10:29:59 -04:00
Eugene Yurtsev
ef4d6b08ab release 0.0.6 (#12) 2025-03-24 10:18:49 -04:00
Eugene Yurtsev
a9e1b14d43 add allowed domains cli option (#11) 2025-03-24 10:17:45 -04:00
Vadym Barda
71ddda1d09 use set for allowed domains (#9) 2025-03-24 10:08:42 -04:00
Vadym Barda
bb3328b0c3 update config README (#10) 2025-03-24 10:08:33 -04:00
Lance Martin
f7556c9bd6 Update README with rules 2025-03-21 13:29:08 -07:00
Lance Martin
921fe07dd0 release 0.0.5: Update / improve the tool descriptions (#8)
Currently I add this workflow to Cursor. We should embed this in the
tool itself.

```
use the langgraph-docs-mcp server to answer any LangGraph questions -- 
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt 
+ reflect on the input question 
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-21 14:40:35 -04:00
Lance Martin
0e688fee9a Merge branch 'main' of https://github.com/langchain-ai/mcpdoc 2025-03-19 15:03:10 -07:00
Lance Martin
677facfb64 Minor update 2025-03-19 15:02:45 -07:00
Lance Martin
465e69ffcb Update README.md 2025-03-19 11:03:27 -07:00
Lance Martin
be13a215f4 Update 2025-03-19 11:00:35 -07:00
Lance Martin
6e2221fd5b Update README.md 2025-03-18 15:45:21 -07:00
Lance Martin
19d45109c2 Minor update 2025-03-18 15:44:04 -07:00
Eugene Yurtsev
5d15b6c113 Update README.md (#6) 2025-03-18 17:54:08 -04:00
Lance Martin
fd354128ce Update README (#5) 2025-03-18 17:50:25 -04:00
6 changed files with 586 additions and 92 deletions

README.md

@@ -1,99 +1,286 @@
# MCP LLMS-TXT Documentation Server
The MCP LLMS-TXT Documentation Server is a specialized Model Context Protocol (MCP) server that delivers documentation directly from llms.txt files. It serves as a testbed for integrating documentation into IDEs via external **tools**, rather than relying solely on built-in features. While future IDEs may offer robust native support for llms.txt files, this server allows us to experiment with alternative methods, giving us full control over how documentation is retrieved and displayed.
## Overview
## Usage
[llms.txt](https://llmstxt.org/) is a website index for LLMs, providing background information, guidance, and links to detailed markdown files. IDEs like Cursor and Windsurf or apps like Claude Code/Desktop can use `llms.txt` to retrieve context for tasks. However, these apps use different built-in tools to read and process files like `llms.txt`. The retrieval process can be opaque, and there is not always a way to audit the tool calls or the context returned.
### Cursor
[MCP](https://github.com/modelcontextprotocol) offers a way for developers to have *full control* over tools used by these applications. Here, we create [an open source MCP server](https://github.com/modelcontextprotocol) to provide MCP host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with (1) a user-defined list of `llms.txt` files and (2) a simple `fetch_docs` tool to read URLs within any of the provided `llms.txt` files. This allows the user to audit each tool call as well as the context returned.
1. Install Cursor: https://www.cursor.com/en
2. Launch the MCP server in **SSE** transport.
```shell
uvx --from mcpdoc mcpdoc \
--urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt \
--transport sse \
--port 8081 \
--host localhost
```
<img src="https://github.com/user-attachments/assets/736f8f55-833d-4200-b833-5fca01a09e1b" width="60%">
3. Add the MCP server to Cursor. Remember to use the URL **[host]/sse**, for example **http://localhost:8081/sse**.
## llms-txt
Cursor needs to be in **agent** mode for this to work.
You can find llms.txt files for langgraph and langchain here:
5. You should be able to use it within composer now.
| Library | llms.txt |
|------------------|------------------------------------------------------------------------------------------------------------|
| LangGraph Python | [https://langchain-ai.github.io/langgraph/llms.txt](https://langchain-ai.github.io/langgraph/llms.txt) |
| LangGraph JS | [https://langchain-ai.github.io/langgraphjs/llms.txt](https://langchain-ai.github.io/langgraphjs/llms.txt) |
| LangChain Python | [https://python.langchain.com/llms.txt](https://python.langchain.com/llms.txt) |
| LangChain JS | [https://js.langchain.com/llms.txt](https://js.langchain.com/llms.txt) |
### Claude Code
## Quickstart
1. Install Claude Code: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
2. Install [uv](https://github.com/astral-sh/uv). This step is required if you want to run the MCP server using the `uvx` command. This is generally recommended, as it simplifies all the dependency management for you.
3. Configure the MCP server with claude code
#### Install uv
* Please see [official uv docs](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) for other ways to install `uv`.
```shell
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt"]}' -s user
```
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
4. Launch claude code
#### Choose an `llms.txt` file to use.
* For example, [here's](https://langchain-ai.github.io/langgraph/llms.txt) the LangGraph `llms.txt` file.
```shell
claude code
```
Verify that the server is running by typing `/mcp` in the chat window.
> **Note: Security and Domain Access Control**
>
> For security reasons, mcpdoc implements strict domain access controls:
>
> 1. **Remote llms.txt files**: When you specify a remote llms.txt URL (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`), mcpdoc automatically adds only that specific domain (`langchain-ai.github.io`) to the allowed domains list. This means the tool can only fetch documentation from URLs on that domain.
>
> 2. **Local llms.txt files**: When using a local file, NO domains are automatically added to the allowed list. You MUST explicitly specify which domains to allow using the `--allowed-domains` parameter.
>
> 3. **Adding additional domains**: To allow fetching from domains beyond those automatically included:
> - Use `--allowed-domains domain1.com domain2.com` to add specific domains
> - Use `--allowed-domains '*'` to allow all domains (use with caution)
>
> This security measure prevents unauthorized access to domains not explicitly approved by the user, ensuring that documentation can only be retrieved from trusted sources.
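The domain check described above can be sketched as a small helper (a hypothetical function illustrating the prefix-matching rule, not mcpdoc's exact code):

```python
def is_url_allowed(url: str, allowed_domains: set[str]) -> bool:
    """Prefix-match a URL against the allowed-domain set; '*' allows everything."""
    if "*" in allowed_domains:
        return True
    return any(url.startswith(domain) for domain in allowed_domains)

# The domain hosting a remote llms.txt is allowed automatically.
allowed = {"https://langchain-ai.github.io/"}
print(is_url_allowed("https://langchain-ai.github.io/langgraph/llms.txt", allowed))  # True
print(is_url_allowed("https://evil.example.com/llms.txt", allowed))  # False
```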
```
> /mcp
```
#### (Optional) Test the MCP server locally with your `llms.txt` file(s) of choice:
```bash
uvx --from mcpdoc mcpdoc \
--urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
--transport sse \
--port 8082 \
--host localhost
```
5. Test it out!
* This should run at: http://localhost:8082
```
> Write a langgraph application with two agents that debate the merits of taking a shower.
```
This MCP server was only configured with LangGraph documentation, but you can add more documentation sources by adding more `--urls` arguments or by loading them from a JSON or YAML file.
![Screenshot 2025-03-18 at 3 29 30 PM](https://github.com/user-attachments/assets/24a3d483-cd7a-4c7e-a4f7-893df70e888f)
* Run [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector) and connect to the running server:
```bash
npx @modelcontextprotocol/inspector
```
![Screenshot 2025-03-18 at 3 30 30 PM](https://github.com/user-attachments/assets/14645d57-1b52-4a5e-abfe-8e7756772704)
* Here, you can test the `tool` calls.
#### Connect to Cursor
* Open `Cursor Settings` and `MCP` tab.
* This will open the `~/.cursor/mcp.json` file.
### Command-line Interface
![Screenshot 2025-03-19 at 11 01 31 AM](https://github.com/user-attachments/assets/3d1c8eb3-4d40-487f-8bad-3f9e660f770a)
The `mcpdoc` command provides a simple CLI for launching the documentation server. You can specify documentation sources in three ways, and these can be combined:
* Paste the following into the file (we use the `langgraph-docs-mcp` name and link to the LangGraph `llms.txt`).
```
{
"mcpServers": {
"langgraph-docs-mcp": {
"command": "uvx",
"args": [
"--from",
"mcpdoc",
"mcpdoc",
"--urls",
"LangGraph:https://langchain-ai.github.io/langgraph/llms.txt LangChain:https://python.langchain.com/llms.txt",
"--transport",
"stdio"
]
}
}
}
```
* Confirm that the server is running in your `Cursor Settings/MCP` tab.
* Best practice is to then update Cursor Global (User) rules.
* Open Cursor `Settings/Rules` and update `User Rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```
* `CMD+L` (on Mac) to open chat.
* Ensure `agent` is selected.
![Screenshot 2025-03-18 at 1 56 54 PM](https://github.com/user-attachments/assets/0dd747d0-7ec0-43d2-b6ef-cdcf5a2a30bf)
Then, try an example prompt, such as:
```
what are types of memory in LangGraph?
```
![Screenshot 2025-03-18 at 1 58 38 PM](https://github.com/user-attachments/assets/180966b5-ab03-4b78-8b5d-bab43f5954ed)
### Connect to Windsurf
* Open Cascade with `CMD+L` (on Mac).
* Click `Configure MCP` to open the config file, `~/.codeium/windsurf/mcp_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
![Screenshot 2025-03-19 at 11 02 52 AM](https://github.com/user-attachments/assets/d45b427c-1c1e-4602-820a-7161a310af24)
* Update `Windsurf Rules/Global rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```
![Screenshot 2025-03-18 at 2 02 12 PM](https://github.com/user-attachments/assets/5a29bd6a-ad9a-4c4a-a4d5-262c914c5276)
Then, try the example prompt:
* It will perform your tool calls.
![Screenshot 2025-03-18 at 2 03 07 PM](https://github.com/user-attachments/assets/0e24e1b2-dc94-4153-b4fa-495fd768125b)
### Connect to Claude Desktop
* Open `Settings/Developer` to update `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
* Restart Claude Desktop app.
> [!Note]
> If you run into issues with Python version incompatibility when trying to add MCPDoc tools to Claude Desktop, you can explicitly specify the filepath to the `python` executable in the `uvx` command.
>
> <details>
> <summary>Example configuration</summary>
>
> ```
> {
> "mcpServers": {
> "langgraph-docs-mcp": {
> "command": "uvx",
> "args": [
> "--python",
> "/path/to/python",
> "--from",
> "mcpdoc",
> "mcpdoc",
> "--urls",
> "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
> "--transport",
> "stdio"
> ]
> }
> }
> }
> ```
> </details>
> [!Note]
> Currently (3/21/25) it appears that Claude Desktop does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
![Screenshot 2025-03-18 at 2 05 54 PM](https://github.com/user-attachments/assets/228d96b6-8fb3-4385-8399-3e42fa08b128)
* You will see your tools visible in the bottom right of your chat input.
![Screenshot 2025-03-18 at 2 05 39 PM](https://github.com/user-attachments/assets/71f3c507-91b2-4fa7-9bd1-ac9cbed73cfb)
Then, try the example prompt:
* It will ask to approve tool calls as it processes your request.
![Screenshot 2025-03-18 at 2 06 54 PM](https://github.com/user-attachments/assets/59b3a010-94fa-4a4d-b650-5cd449afeec0)
### Connect to Claude Code
* In a terminal after installing [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), run this command to add the MCP server to your project:
```
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```
* You will see `~/.claude.json` updated.
* Test by launching Claude Code and running the following to view your tools:
```
$ claude
$ /mcp
```
![Screenshot 2025-03-18 at 2 13 49 PM](https://github.com/user-attachments/assets/eb876a0e-27b4-480e-8c37-0f683f878616)
> [!Note]
> Currently (3/21/25) it appears that Claude Code does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
Then, try the example prompt:
* It will ask to approve tool calls.
![Screenshot 2025-03-18 at 2 14 37 PM](https://github.com/user-attachments/assets/5b9a2938-ea69-4443-8d3b-09061faccad0)
## Command-line Interface
The `mcpdoc` command provides a simple CLI for launching the documentation server.
You can specify documentation sources in three ways, and these can be combined:
1. Using a YAML config file:
* This will load the LangGraph Python documentation from the `sample_config.yaml` file in this repo.
```bash
mcpdoc --yaml sample_config.yaml
```
This will load the LangGraph Python documentation from the sample_config.yaml file.
2. Using a JSON config file:
* This will load the LangGraph Python documentation from the `sample_config.json` file in this repo.
```bash
mcpdoc --json sample_config.json
```
This will load the LangGraph Python documentation from the sample_config.json file.
3. Directly specifying llms.txt URLs with optional names:
```bash
mcpdoc --urls https://langchain-ai.github.io/langgraph/llms.txt LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
```
* URLs can be specified either as plain URLs or with optional names using the format `name:url`.
* You can specify multiple URLs by using the `--urls` parameter multiple times.
* This is how we loaded `llms.txt` for the MCP server above.
URLs can be specified either as plain URLs or with optional names using the format `name:url`.
```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
You can also combine these methods to merge documentation sources:
```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls https://langchain-ai.github.io/langgraph/llms.txt
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
### Additional Options
## Additional Options
- `--follow-redirects`: Follow HTTP redirects (defaults to False)
- `--timeout SECONDS`: HTTP request timeout in seconds (defaults to 10.0)
@@ -106,11 +293,13 @@ mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
This will load the LangGraph Python documentation with a 15-second timeout and follow any HTTP redirects if necessary.
### Configuration Format
## Configuration Format
Both YAML and JSON configuration files should contain a list of documentation sources. Each source must include an `llms_txt` URL and can optionally include a `name`:
Both YAML and JSON configuration files should contain a list of documentation sources.
#### YAML Configuration Example (sample_config.yaml)
Each source must include an `llms_txt` URL and can optionally include a `name`:
### YAML Configuration Example (sample_config.yaml)
```yaml
# Sample configuration for mcp-mcpdoc server
@@ -119,7 +308,7 @@ Both YAML and JSON configuration files should contain a list of documentation so
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
#### JSON Configuration Example (sample_config.json)
### JSON Configuration Example (sample_config.json)
```json
[
@@ -130,7 +319,7 @@ Both YAML and JSON configuration files should contain a list of documentation so
]
```
### Programmatic Usage
## Programmatic Usage
```python
from mcpdoc.main import create_server


@@ -25,6 +25,9 @@ Examples:
# Directly specifying llms.txt URLs with optional names
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using a local file (absolute or relative path)
mcpdoc --urls LocalDocs:/path/to/llms.txt --allowed-domains '*'
# Using a YAML config file
mcpdoc --yaml sample_config.yaml
@@ -42,6 +45,12 @@ Examples:
# Using SSE transport with additional HTTP options
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15 --transport sse --host localhost --port 8080
# Allow fetching from additional domains. The domains hosting the llms.txt files are always allowed.
mcpdoc --yaml sample_config.yaml --allowed-domains https://example.com/ https://another-example.com/
# Allow fetching from any domain
mcpdoc --yaml sample_config.yaml --allowed-domains '*'
"""
@@ -66,7 +75,7 @@ def parse_args() -> argparse.Namespace:
"-u",
type=str,
nargs="+",
help="List of llms.txt URLs with optional names (format: 'url' or 'name:url')",
help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
)
parser.add_argument(
@@ -74,6 +83,12 @@ def parse_args() -> argparse.Namespace:
action="store_true",
help="Whether to follow HTTP redirects",
)
parser.add_argument(
"--allowed-domains",
type=str,
nargs="*",
help="Additional allowed domains to fetch documentation from. Use '*' to allow all domains.",
)
parser.add_argument(
"--timeout", type=float, default=10.0, help="HTTP request timeout in seconds"
)
@@ -151,10 +166,11 @@ def load_config_file(file_path: str, file_format: str) -> List[Dict[str, str]]:
def create_doc_sources_from_urls(urls: List[str]) -> List[DocSource]:
"""Create doc sources from a list of URLs with optional names.
"""Create doc sources from a list of URLs or file paths with optional names.
Args:
urls: List of llms.txt URLs with optional names (format: 'url' or 'name:url')
urls: List of llms.txt URLs or file paths with optional names
(format: 'url_or_path' or 'name:url_or_path')
Returns:
List of DocSource objects
@@ -229,6 +245,7 @@ def main() -> None:
follow_redirects=args.follow_redirects,
timeout=args.timeout,
settings=settings,
allowed_domains=args.allowed_domains,
)
if args.transport == "sse":


@@ -1,6 +1,8 @@
"""MCP Llms-txt server for docs."""
from urllib.parse import urlparse
import os
import re
from urllib.parse import urlparse, urljoin
import httpx
from markdownify import markdownify
@@ -34,56 +36,255 @@ def extract_domain(url: str) -> str:
return f"{parsed.scheme}://{parsed.netloc}/"
def _is_http_or_https(url: str) -> bool:
"""Check if the URL is an HTTP or HTTPS URL."""
return url.startswith(("http:", "https:"))
def _get_fetch_description(has_local_sources: bool) -> str:
"""Get fetch docs tool description."""
description = [
"Fetch and parse documentation from a given URL or local file.",
"",
"Use this tool after list_doc_sources to:",
"1. First fetch the llms.txt file from a documentation source",
"2. Analyze the URLs listed in the llms.txt file",
"3. Then fetch specific documentation pages relevant to the user's question",
"",
]
if has_local_sources:
description.extend(
[
"Args:",
" url: The URL or file path to fetch documentation from. Can be:",
" - URL from an allowed domain",
" - A local file path (absolute or relative)",
" - A file:// URL (e.g., file:///path/to/llms.txt)",
]
)
else:
description.extend(
[
"Args:",
" url: The URL to fetch documentation from.",
]
)
description.extend(
[
"",
"Returns:",
" The fetched documentation content converted to markdown, or an error message", # noqa: E501
" if the request fails or the URL is not from an allowed domain.",
]
)
return "\n".join(description)
def _normalize_path(path: str) -> str:
"""Accept paths in file:/// or relative format and map to absolute paths."""
return (
os.path.abspath(path[7:])
if path.startswith("file://")
else os.path.abspath(path)
)
def _get_server_instructions(doc_sources: list[DocSource]) -> str:
"""Generate server instructions with available documentation source names."""
# Extract source names from doc_sources
source_names = []
for entry in doc_sources:
if "name" in entry:
source_names.append(entry["name"])
elif _is_http_or_https(entry["llms_txt"]):
# Use domain name as fallback for HTTP sources
domain = extract_domain(entry["llms_txt"])
source_names.append(domain.rstrip("/").split("//")[-1])
else:
# Use filename as fallback for local sources
source_names.append(os.path.basename(entry["llms_txt"]))
instructions = [
"Use the list_doc_sources tool to see available documentation sources.",
"This tool will return a URL for each documentation source.",
]
if source_names:
if len(source_names) == 1:
instructions.append(
f"Documentation URLs are available from this tool "
f"for {source_names[0]}."
)
else:
names_str = ", ".join(source_names[:-1]) + f", and {source_names[-1]}"
instructions.append(
f"Documentation URLs are available from this tool for {names_str}."
)
instructions.extend(
[
"",
"Once you have a source documentation URL, use the fetch_docs tool "
"to get the documentation contents. ",
"If the documentation contents contains a URL for additional documentation "
"that is relevant to your task, you can use the fetch_docs tool to "
"fetch documentation from that URL next.",
]
)
return "\n".join(instructions)
def create_server(
doc_source: list[DocSource],
doc_sources: list[DocSource],
*,
follow_redirects: bool = False,
timeout: float = 10,
settings: dict | None = None,
allowed_domains: list[str] | None = None,
) -> FastMCP:
"""Create the server and generate tools."""
"""Create the server and generate documentation retrieval tools.
Args:
doc_sources: List of documentation sources to make available
follow_redirects: Whether to follow HTTP redirects when fetching docs
timeout: HTTP request timeout in seconds
settings: Additional settings to pass to FastMCP
allowed_domains: Additional domains to allow fetching from.
Use ['*'] to allow all domains
The domain hosting the llms.txt file is always appended to the list
of allowed domains.
Returns:
A FastMCP server instance configured with documentation tools
"""
settings = settings or {}
server = FastMCP(
name="llms-txt",
instructions=(
"Use the list doc sources tool to see available documentation "
"sources. Once you have a source, use fetch docs to get the "
"documentation"
),
instructions=_get_server_instructions(doc_sources),
**settings,
)
httpx_client = httpx.AsyncClient(follow_redirects=follow_redirects, timeout=timeout)
local_sources = []
remote_sources = []
for entry in doc_sources:
url = entry["llms_txt"]
if _is_http_or_https(url):
remote_sources.append(entry)
else:
local_sources.append(entry)
# Let's verify that all local sources exist
for entry in local_sources:
path = entry["llms_txt"]
abs_path = _normalize_path(path)
if not os.path.exists(abs_path):
raise FileNotFoundError(f"Local file not found: {abs_path}")
# Parse the domain names in the llms.txt URLs and identify local file paths
domains = set(extract_domain(entry["llms_txt"]) for entry in remote_sources)
# Add additional allowed domains if specified, or set to '*' if we have local files
if allowed_domains:
if "*" in allowed_domains:
domains = {"*"} # Special marker for allowing all domains
else:
domains.update(allowed_domains)
allowed_local_files = set(
_normalize_path(entry["llms_txt"]) for entry in local_sources
)
@server.tool()
def list_doc_sources() -> str:
"""List all available doc sources. Always use this first."""
"""List all available documentation sources.
This is the first tool you should call in the documentation workflow.
It provides URLs to llms.txt files or local file paths that the user has made available.
Returns:
A string containing a formatted list of documentation sources with their URLs or file paths
"""
content = ""
for entry in doc_source:
name = entry.get("name", "") or extract_domain(entry["llms_txt"])
content += f"{name}\n"
content += "URL: " + entry["llms_txt"] + "\n\n"
for entry_ in doc_sources:
url_or_path = entry_["llms_txt"]
if _is_http_or_https(url_or_path):
name = entry_.get("name", extract_domain(url_or_path))
content += f"{name}\nURL: {url_or_path}\n\n"
else:
path = _normalize_path(url_or_path)
name = entry_.get("name", path)
content += f"{name}\nPath: {path}\n\n"
return content
# Parse the domain names in the llms.txt URLs
allowed_domains = [extract_domain(entry["llms_txt"]) for entry in doc_source]
fetch_docs_description = _get_fetch_description(
has_local_sources=bool(local_sources)
)
@server.tool()
@server.tool(description=fetch_docs_description)
async def fetch_docs(url: str) -> str:
"""Use this to fetch documentation from a given URL.
nonlocal domains, follow_redirects
url = url.strip()
# Handle local file paths (either as file:// URLs or direct filesystem paths)
if not _is_http_or_https(url):
abs_path = _normalize_path(url)
if abs_path not in allowed_local_files:
raise ValueError(
f"Local file not allowed: {abs_path}. Allowed files: {allowed_local_files}"
)
try:
with open(abs_path, "r", encoding="utf-8") as f:
content = f.read()
return markdownify(content)
except Exception as e:
return f"Error reading local file: {str(e)}"
else:
# Otherwise treat as URL
if "*" not in domains and not any(
url.startswith(domain) for domain in domains
):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
Always use list doc sources before fetching documents.
"""
nonlocal allowed_domains
if not any(url.startswith(domain) for domain in allowed_domains):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(allowed_domains)
)
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
content = response.text
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
return markdownify(response.text)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error with code {e.response.status_code}"
if follow_redirects:
# Check for meta refresh tag which indicates a client-side redirect
match = re.search(
r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
content,
re.IGNORECASE,
)
if match:
redirect_url = match.group(1)
new_url = urljoin(str(response.url), redirect_url)
if "*" not in domains and not any(
new_url.startswith(domain) for domain in domains
):
return (
"Error: Redirect URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
response = await httpx_client.get(new_url, timeout=timeout)
response.raise_for_status()
content = response.text
return markdownify(content)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error: {str(e)}"
return server
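The meta refresh handling added in commit a429dc788b can be exercised in isolation. This sketch reuses the same regex and `urljoin` resolution as the diff above, with a hypothetical HTML snippet and base URL standing in for a real response:

```python
import re
from urllib.parse import urljoin

# Same pattern as in mcpdoc/main.py: capture the redirect target from a
# <meta http-equiv="refresh"> tag, e.g. content="0; url=/docs/index.html".
META_REFRESH_RE = re.compile(
    r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
    re.IGNORECASE,
)

html = (
    '<html><head>'
    '<meta http-equiv="refresh" content="0; url=/docs/index.html">'
    '</head><body>Redirecting...</body></html>'
)

match = META_REFRESH_RE.search(html)
if match:
    # Resolve the (possibly relative) target against the page that served it.
    target = urljoin("https://example.com/llms.txt", match.group(1))
    print(target)  # https://example.com/docs/index.html
```

In the server itself, the resolved target is re-checked against the allowed-domain list before a second fetch, so a meta refresh cannot escape the security boundary.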


@@ -1,8 +1,9 @@
[project]
name = "mcpdoc"
version = "0.0.4"
version = "0.0.10"
description = "Server llms-txt documentation over MCP"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
dependencies = [
"httpx>=0.28.1",
@@ -31,3 +32,18 @@ test = [
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
minversion = "8.0"
# -ra: Report all extra test outcomes (passed, skipped, failed, etc.)
# -q: Enable quiet mode for less cluttered output
# -v: Enable verbose output to display detailed test names and statuses
# --durations=5: Show the 5 slowest tests after the run (useful for performance tuning)
addopts = "-ra -q -v --durations=5"
testpaths = [
"tests",
]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"


@@ -0,0 +1,71 @@
"""Tests for mcpdoc.main module."""
import pytest
from mcpdoc.main import (
_get_fetch_description,
_is_http_or_https,
extract_domain,
)
def test_extract_domain() -> None:
"""Test extract_domain function."""
# Test with https URL
assert extract_domain("https://example.com/page") == "https://example.com/"
# Test with http URL
assert extract_domain("http://test.org/docs/index.html") == "http://test.org/"
# Test with URL that has port
assert extract_domain("https://localhost:8080/api") == "https://localhost:8080/"
# Check trailing slash
assert extract_domain("https://localhost:8080") == "https://localhost:8080/"
# Test with URL that has subdomain
assert extract_domain("https://docs.python.org/3/") == "https://docs.python.org/"
@pytest.mark.parametrize(
"url,expected",
[
("http://example.com", True),
("https://example.com", True),
("/path/to/file.txt", False),
("file:///path/to/file.txt", False),
(
"ftp://example.com",
False,
), # Not HTTP or HTTPS, even though it's not a local file
],
)
def test_is_http_or_https(url, expected):
"""Test _is_http_or_https function."""
assert _is_http_or_https(url) is expected
@pytest.mark.parametrize(
"has_local_sources,expected_substrings",
[
(True, ["local file path", "file://"]),
(False, ["URL to fetch"]),
],
)
def test_get_fetch_description(has_local_sources, expected_substrings):
"""Test _get_fetch_description function."""
description = _get_fetch_description(has_local_sources)
# Common assertions for both cases
assert "Fetch and parse documentation" in description
assert "Returns:" in description
# Specific assertions based on has_local_sources
for substring in expected_substrings:
if has_local_sources:
assert substring in description
else:
# For the False case, we only check that "local file path"
# and "file://" are NOT present
if substring in ["local file path", "file://"]:
assert substring not in description

uv.lock (generated)

@@ -259,7 +259,7 @@ cli = [
[[package]]
name = "mcpdoc"
version = "0.0.3"
version = "0.0.8"
source = { editable = "." }
dependencies = [
{ name = "httpx" },