Compare commits


29 Commits

Author SHA1 Message Date
Eugene Yurtsev
538df6d05c Release 0.10.0 (#41) 2025-07-22 16:23:31 -04:00
Aliyan Ishfaq
a429dc788b Fix: handle client-side meta refresh redirects (#40)
Fixes the "Redirecting..." response issue by adding support for HTML
meta refresh redirects in `mcpdoc/main.py`.

- Parses `<meta http-equiv="refresh">` tags to follow client-side
redirects
- Consistent with existing `--follow-redirects` flag behavior
- Resolves cases where documentation sites use meta refresh instead of
HTTP redirects

Modified: `mcpdoc/main.py`
2025-07-22 16:22:39 -04:00
Eugene Yurtsev
d1db6319b9 Update README.md (#37)
fix typo in readme with claude code
2025-07-07 17:12:08 -04:00
Eugene Yurtsev
74237e7714 Release 0.0.9 (#36) 2025-07-07 17:03:56 -04:00
Eugene Yurtsev
b0f7a8e2ad mcpdoc: update server description based on available tools (#35) 2025-07-07 17:03:14 -04:00
Eugene Yurtsev
c9b45f098b ci: configure pytest (#24) 2025-04-05 13:52:21 -04:00
Larsen Weigle
3f859a3fc9 fix(mcpdoc): update readme cli example and mcp json. (#22)
See [this issue thread](https://github.com/langchain-ai/mcpdoc/issues/21).

Update examples in the readme to match the arg parser in `cli.py` which
is configured to append multiple urls:

```python
    parser.add_argument(
        "--urls",
        "-u",
        type=str,
        nargs="+",
        help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
    )
```

The current examples in the readme use multiple `--url` flags, so
each new flag overrides the previous URL.
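The overriding behavior this commit fixes can be reproduced with a minimal argparse sketch (a hypothetical standalone script mirroring the parser above, not mcpdoc's actual CLI entry point):

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the mcpdoc parser: nargs="+" collects every value that follows
# a single flag, but repeating the flag replaces the earlier list because
# the default "store" action overwrites rather than appends.
parser.add_argument("--urls", "-u", type=str, nargs="+")

# One flag, many values: all URLs are collected.
combined = parser.parse_args(["--urls", "a.txt", "b.txt"]).urls
print(combined)  # ['a.txt', 'b.txt']

# Repeated flags: the second occurrence silently overrides the first.
repeated = parser.parse_args(["--urls", "a.txt", "--urls", "b.txt"]).urls
print(repeated)  # ['b.txt']
```

This is why the readme examples had to be changed to pass all URLs after a single `--urls` flag.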
2025-04-05 13:23:50 -04:00
Eugene Yurtsev
6a0d649d30 fix: settings propagation (#19)
Fixes: https://github.com/langchain-ai/mcpdoc/issues/17
2025-03-31 11:43:56 -04:00
Eugene Yurtsev
35e5481ada README: add multiple url examples (#16) 2025-03-28 13:41:03 -04:00
Eugene Yurtsev
53479ff021 Update README.md (#15) 2025-03-28 13:23:14 -04:00
Eugene Yurtsev
7e62344a91 docs: scale down image 2025-03-27 13:56:14 -04:00
Eugene Yurtsev
bac98dc41a Update README.md 2025-03-27 13:53:37 -04:00
Eugene Yurtsev
a885a655cc Release 0.0.7 2025-03-27 13:23:57 -04:00
Lance Martin
c2977b3602 Add local llms.txt file reading (#14)
Add ability to read llms.txt from local files.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-27 10:22:42 -07:00
Eugene Yurtsev
1bc11f5ea1 update uv lock file (#13) 2025-03-24 10:29:59 -04:00
Eugene Yurtsev
ef4d6b08ab release 0.0.6 (#12) 2025-03-24 10:18:49 -04:00
Eugene Yurtsev
a9e1b14d43 add allowed domains cli option (#11) 2025-03-24 10:17:45 -04:00
Vadym Barda
71ddda1d09 use set for allowed domains (#9) 2025-03-24 10:08:42 -04:00
Vadym Barda
bb3328b0c3 update config README (#10) 2025-03-24 10:08:33 -04:00
Lance Martin
f7556c9bd6 Update README with rules 2025-03-21 13:29:08 -07:00
Lance Martin
921fe07dd0 release 0.0.5: Update / improve the tool descriptions (#8)
Currently I add this workflow to Cursor. We should embed this in the
tool itself.

```
use the langgraph-docs-mcp server to answer any LangGraph questions -- 
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt 
+ reflect on the input question 
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-21 14:40:35 -04:00
Lance Martin
0e688fee9a Merge branch 'main' of https://github.com/langchain-ai/mcpdoc 2025-03-19 15:03:10 -07:00
Lance Martin
677facfb64 Minor update 2025-03-19 15:02:45 -07:00
Lance Martin
465e69ffcb Update README.md 2025-03-19 11:03:27 -07:00
Lance Martin
be13a215f4 Update 2025-03-19 11:00:35 -07:00
Lance Martin
6e2221fd5b Update README.md 2025-03-18 15:45:21 -07:00
Lance Martin
19d45109c2 Minor update 2025-03-18 15:44:04 -07:00
Eugene Yurtsev
5d15b6c113 Update README.md (#6) 2025-03-18 17:54:08 -04:00
Lance Martin
fd354128ce Update README (#5) 2025-03-18 17:50:25 -04:00
6 changed files with 586 additions and 92 deletions

README.md

@@ -1,99 +1,286 @@
# MCP LLMS-TXT Documentation Server
The MCP LLMS-TXT Documentation Server is a specialized Model Context Protocol (MCP) server that delivers documentation directly from llms.txt files. It serves as a testbed for integrating documentation into IDEs via external **tools**, rather than relying solely on built-in features. While future IDEs may offer robust native support for llms.txt files, this server allows us to experiment with alternative methods, giving us full control over how documentation is retrieved and displayed.
## Overview
## Usage
[llms.txt](https://llmstxt.org/) is a website index for LLMs, providing background information, guidance, and links to detailed markdown files. IDEs like Cursor and Windsurf or apps like Claude Code/Desktop can use `llms.txt` to retrieve context for tasks. However, these apps use different built-in tools to read and process files like `llms.txt`. The retrieval process can be opaque, and there is not always a way to audit the tool calls or the context returned.
### Cursor
[MCP](https://github.com/modelcontextprotocol) offers a way for developers to have *full control* over tools used by these applications. Here, we create [an open source MCP server](https://github.com/modelcontextprotocol) to provide MCP host applications (e.g., Cursor, Windsurf, Claude Code/Desktop) with (1) a user-defined list of `llms.txt` files and (2) a simple `fetch_docs` tool to read URLs within any of the provided `llms.txt` files. This allows the user to audit each tool call as well as the context returned.
1. Install Cursor: https://www.cursor.com/en
2. Launch the MCP server in **SSE** transport.
```shell
uvx --from mcpdoc mcpdoc \
--urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt \
--transport sse \
--port 8081 \
--host localhost
```
<img src="https://github.com/user-attachments/assets/736f8f55-833d-4200-b833-5fca01a09e1b" width="60%">
3. Add the MCP server to Cursor. Remember to use the URL **[host]/sse**, for example **http://localhost:8081/sse**.
## llms-txt
Cursor needs to be in **agent** mode for this to work.
You can find llms.txt files for langgraph and langchain here:
5. You should be able to use it within composer now.
| Library | llms.txt |
|------------------|------------------------------------------------------------------------------------------------------------|
| LangGraph Python | [https://langchain-ai.github.io/langgraph/llms.txt](https://langchain-ai.github.io/langgraph/llms.txt) |
| LangGraph JS | [https://langchain-ai.github.io/langgraphjs/llms.txt](https://langchain-ai.github.io/langgraphjs/llms.txt) |
| LangChain Python | [https://python.langchain.com/llms.txt](https://python.langchain.com/llms.txt) |
| LangChain JS | [https://js.langchain.com/llms.txt](https://js.langchain.com/llms.txt) |
### Claude Code
## Quickstart
1. Install Claude Code: https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview
2. Install [uv](https://github.com/astral-sh/uv). This step is required if you want to run the MCP server using the `uvx` command. This is generally recommended, as it simplifies all the dependency management for you.
3. Configure the MCP server with claude code
#### Install uv
* Please see [official uv docs](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) for other ways to install `uv`.
```shell
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt"]}' -s user
```
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
4. Launch claude code
#### Choose an `llms.txt` file to use.
* For example, [here's](https://langchain-ai.github.io/langgraph/llms.txt) the LangGraph `llms.txt` file.
```shell
claude code
```
Verify that the server is running by typing `/mcp` in the chat window.
> **Note: Security and Domain Access Control**
>
> For security reasons, mcpdoc implements strict domain access controls:
>
> 1. **Remote llms.txt files**: When you specify a remote llms.txt URL (e.g., `https://langchain-ai.github.io/langgraph/llms.txt`), mcpdoc automatically adds only that specific domain (`langchain-ai.github.io`) to the allowed domains list. This means the tool can only fetch documentation from URLs on that domain.
>
> 2. **Local llms.txt files**: When using a local file, NO domains are automatically added to the allowed list. You MUST explicitly specify which domains to allow using the `--allowed-domains` parameter.
>
> 3. **Adding additional domains**: To allow fetching from domains beyond those automatically included:
> - Use `--allowed-domains domain1.com domain2.com` to add specific domains
> - Use `--allowed-domains '*'` to allow all domains (use with caution)
>
> This security measure prevents unauthorized access to domains not explicitly approved by the user, ensuring that documentation can only be retrieved from trusted sources.
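The domain check described above can be sketched as a small helper (a hypothetical function illustrating the prefix-matching rule, not mcpdoc's exact code):

```python
def is_url_allowed(url: str, allowed_domains: set[str]) -> bool:
    """Prefix-match a URL against the allowed-domain set; '*' allows everything."""
    if "*" in allowed_domains:
        return True
    return any(url.startswith(domain) for domain in allowed_domains)

# The domain hosting a remote llms.txt is allowed automatically.
allowed = {"https://langchain-ai.github.io/"}
print(is_url_allowed("https://langchain-ai.github.io/langgraph/llms.txt", allowed))  # True
print(is_url_allowed("https://evil.example.com/llms.txt", allowed))  # False
```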
```
> /mcp
```
#### (Optional) Test the MCP server locally with your `llms.txt` file(s) of choice:
```bash
uvx --from mcpdoc mcpdoc \
--urls "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt" "LangChain:https://python.langchain.com/llms.txt" \
--transport sse \
--port 8082 \
--host localhost
```
5. Test it out!
* This should run at: http://localhost:8082
```
> Write a langgraph application with two agents that debate the merits of taking a shower.
```
This MCP server was only configured with LangGraph documentation, but you can add more documentation sources by adding more `--urls` arguments or by loading them from a JSON or YAML file.
![Screenshot 2025-03-18 at 3 29 30 PM](https://github.com/user-attachments/assets/24a3d483-cd7a-4c7e-a4f7-893df70e888f)
* Run [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector) and connect to the running server:
```bash
npx @modelcontextprotocol/inspector
```
![Screenshot 2025-03-18 at 3 30 30 PM](https://github.com/user-attachments/assets/14645d57-1b52-4a5e-abfe-8e7756772704)
* Here, you can test the `tool` calls.
#### Connect to Cursor
* Open `Cursor Settings` and `MCP` tab.
* This will open the `~/.cursor/mcp.json` file.
### Command-line Interface
![Screenshot 2025-03-19 at 11 01 31 AM](https://github.com/user-attachments/assets/3d1c8eb3-4d40-487f-8bad-3f9e660f770a)
The `mcpdoc` command provides a simple CLI for launching the documentation server. You can specify documentation sources in three ways, and these can be combined:
* Paste the following into the file (we use the `langgraph-docs-mcp` name and link to the LangGraph `llms.txt`).
```
{
"mcpServers": {
"langgraph-docs-mcp": {
"command": "uvx",
"args": [
"--from",
"mcpdoc",
"mcpdoc",
"--urls",
"LangGraph:https://langchain-ai.github.io/langgraph/llms.txt LangChain:https://python.langchain.com/llms.txt",
"--transport",
"stdio"
]
}
}
}
```
* Confirm that the server is running in your `Cursor Settings/MCP` tab.
* Best practice is to then update Cursor Global (User) rules.
* Open Cursor `Settings/Rules` and update `User Rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
+ use this to answer the question
```
* `CMD+L` (on Mac) to open chat.
* Ensure `agent` is selected.
![Screenshot 2025-03-18 at 1 56 54 PM](https://github.com/user-attachments/assets/0dd747d0-7ec0-43d2-b6ef-cdcf5a2a30bf)
Then, try an example prompt, such as:
```
what are types of memory in LangGraph?
```
![Screenshot 2025-03-18 at 1 58 38 PM](https://github.com/user-attachments/assets/180966b5-ab03-4b78-8b5d-bab43f5954ed)
### Connect to Windsurf
* Open Cascade with `CMD+L` (on Mac).
* Click `Configure MCP` to open the config file, `~/.codeium/windsurf/mcp_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
![Screenshot 2025-03-19 at 11 02 52 AM](https://github.com/user-attachments/assets/d45b427c-1c1e-4602-820a-7161a310af24)
* Update `Windsurf Rules/Global rules` with the following (or similar):
```
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
```
![Screenshot 2025-03-18 at 2 02 12 PM](https://github.com/user-attachments/assets/5a29bd6a-ad9a-4c4a-a4d5-262c914c5276)
Then, try the example prompt:
* It will perform your tool calls.
![Screenshot 2025-03-18 at 2 03 07 PM](https://github.com/user-attachments/assets/0e24e1b2-dc94-4153-b4fa-495fd768125b)
### Connect to Claude Desktop
* Open `Settings/Developer` to update `~/Library/Application\ Support/Claude/claude_desktop_config.json`.
* Update with `langgraph-docs-mcp` as noted above.
* Restart Claude Desktop app.
> [!Note]
> If you run into issues with Python version incompatibility when trying to add MCPDoc tools to Claude Desktop, you can explicitly specify the filepath to the `python` executable in the `uvx` command.
>
> <details>
> <summary>Example configuration</summary>
>
> ```
> {
> "mcpServers": {
> "langgraph-docs-mcp": {
> "command": "uvx",
> "args": [
> "--python",
> "/path/to/python",
> "--from",
> "mcpdoc",
> "mcpdoc",
> "--urls",
> "LangGraph:https://langchain-ai.github.io/langgraph/llms.txt",
> "--transport",
> "stdio"
> ]
> }
> }
> }
> ```
> </details>
> [!Note]
> Currently (3/21/25) it appears that Claude Desktop does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
![Screenshot 2025-03-18 at 2 05 54 PM](https://github.com/user-attachments/assets/228d96b6-8fb3-4385-8399-3e42fa08b128)
* You will see your tools visible in the bottom right of your chat input.
![Screenshot 2025-03-18 at 2 05 39 PM](https://github.com/user-attachments/assets/71f3c507-91b2-4fa7-9bd1-ac9cbed73cfb)
Then, try the example prompt:
* It will ask to approve tool calls as it processes your request.
![Screenshot 2025-03-18 at 2 06 54 PM](https://github.com/user-attachments/assets/59b3a010-94fa-4a4d-b650-5cd449afeec0)
### Connect to Claude Code
* In a terminal after installing [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview), run this command to add the MCP server to your project:
```
claude mcp add-json langgraph-docs '{"type":"stdio","command":"uvx","args":["--from", "mcpdoc", "mcpdoc", "--urls", "langgraph:https://langchain-ai.github.io/langgraph/llms.txt", "LangChain:https://python.langchain.com/llms.txt"]}' -s local
```
* You will see `~/.claude.json` updated.
* Test by launching Claude Code and running the following to view your tools:
```
$ claude
$ /mcp
```
![Screenshot 2025-03-18 at 2 13 49 PM](https://github.com/user-attachments/assets/eb876a0e-27b4-480e-8c37-0f683f878616)
> [!Note]
> Currently (3/21/25) it appears that Claude Code does not support `rules` for global rules, so append the following to your prompt.
```
<rules>
for ANY question about LangGraph, use the langgraph-docs-mcp server to help answer --
+ call list_doc_sources tool to get the available llms.txt file
+ call fetch_docs tool to read it
+ reflect on the urls in llms.txt
+ reflect on the input question
+ call fetch_docs on any urls relevant to the question
</rules>
```
Then, try the example prompt:
* It will ask to approve tool calls.
![Screenshot 2025-03-18 at 2 14 37 PM](https://github.com/user-attachments/assets/5b9a2938-ea69-4443-8d3b-09061faccad0)
## Command-line Interface
The `mcpdoc` command provides a simple CLI for launching the documentation server.
You can specify documentation sources in three ways, and these can be combined:
1. Using a YAML config file:
* This will load the LangGraph Python documentation from the `sample_config.yaml` file in this repo.
```bash
mcpdoc --yaml sample_config.yaml
```
This will load the LangGraph Python documentation from the sample_config.yaml file.
2. Using a JSON config file:
* This will load the LangGraph Python documentation from the `sample_config.json` file in this repo.
```bash
mcpdoc --json sample_config.json
```
This will load the LangGraph Python documentation from the sample_config.json file.
3. Directly specifying llms.txt URLs with optional names:
```bash
mcpdoc --urls https://langchain-ai.github.io/langgraph/llms.txt LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
```
* URLs can be specified either as plain URLs or with optional names using the format `name:url`.
* You can specify multiple URLs by using the `--urls` parameter multiple times.
* This is how we loaded `llms.txt` for the MCP server above.
URLs can be specified either as plain URLs or with optional names using the format `name:url`.
```bash
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
You can also combine these methods to merge documentation sources:
```bash
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls https://langchain-ai.github.io/langgraph/llms.txt
mcpdoc --yaml sample_config.yaml --json sample_config.json --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt --urls LangChain:https://python.langchain.com/llms.txt
```
### Additional Options
## Additional Options
- `--follow-redirects`: Follow HTTP redirects (defaults to False)
- `--timeout SECONDS`: HTTP request timeout in seconds (defaults to 10.0)
@@ -106,11 +293,13 @@ mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15
This will load the LangGraph Python documentation with a 15-second timeout and follow any HTTP redirects if necessary.
### Configuration Format
## Configuration Format
Both YAML and JSON configuration files should contain a list of documentation sources. Each source must include an `llms_txt` URL and can optionally include a `name`:
Both YAML and JSON configuration files should contain a list of documentation sources.
#### YAML Configuration Example (sample_config.yaml)
Each source must include an `llms_txt` URL and can optionally include a `name`:
### YAML Configuration Example (sample_config.yaml)
```yaml
# Sample configuration for mcp-mcpdoc server
@@ -119,7 +308,7 @@ Both YAML and JSON configuration files should contain a list of documentation so
llms_txt: https://langchain-ai.github.io/langgraph/llms.txt
```
#### JSON Configuration Example (sample_config.json)
### JSON Configuration Example (sample_config.json)
```json
[
@@ -130,7 +319,7 @@ Both YAML and JSON configuration files should contain a list of documentation so
]
```
### Programmatic Usage
## Programmatic Usage
```python
from mcpdoc.main import create_server


@@ -25,6 +25,9 @@ Examples:
# Directly specifying llms.txt URLs with optional names
mcpdoc --urls LangGraph:https://langchain-ai.github.io/langgraph/llms.txt
# Using a local file (absolute or relative path)
mcpdoc --urls LocalDocs:/path/to/llms.txt --allowed-domains '*'
# Using a YAML config file
mcpdoc --yaml sample_config.yaml
@@ -42,6 +45,12 @@ Examples:
# Using SSE transport with additional HTTP options
mcpdoc --yaml sample_config.yaml --follow-redirects --timeout 15 --transport sse --host localhost --port 8080
# Allow fetching from additional domains. The domains hosting the llms.txt files are always allowed.
mcpdoc --yaml sample_config.yaml --allowed-domains https://example.com/ https://another-example.com/
# Allow fetching from any domain
mcpdoc --yaml sample_config.yaml --allowed-domains '*'
"""
@@ -66,7 +75,7 @@ def parse_args() -> argparse.Namespace:
"-u",
type=str,
nargs="+",
help="List of llms.txt URLs with optional names (format: 'url' or 'name:url')",
help="List of llms.txt URLs or file paths with optional names (format: 'url_or_path' or 'name:url_or_path')",
)
parser.add_argument(
@@ -74,6 +83,12 @@ def parse_args() -> argparse.Namespace:
action="store_true",
help="Whether to follow HTTP redirects",
)
parser.add_argument(
"--allowed-domains",
type=str,
nargs="*",
help="Additional allowed domains to fetch documentation from. Use '*' to allow all domains.",
)
parser.add_argument(
"--timeout", type=float, default=10.0, help="HTTP request timeout in seconds"
)
@@ -151,10 +166,11 @@ def load_config_file(file_path: str, file_format: str) -> List[Dict[str, str]]:
def create_doc_sources_from_urls(urls: List[str]) -> List[DocSource]:
"""Create doc sources from a list of URLs with optional names.
"""Create doc sources from a list of URLs or file paths with optional names.
Args:
urls: List of llms.txt URLs with optional names (format: 'url' or 'name:url')
urls: List of llms.txt URLs or file paths with optional names
(format: 'url_or_path' or 'name:url_or_path')
Returns:
List of DocSource objects
@@ -229,6 +245,7 @@ def main() -> None:
follow_redirects=args.follow_redirects,
timeout=args.timeout,
settings=settings,
allowed_domains=args.allowed_domains,
)
if args.transport == "sse":


@@ -1,6 +1,8 @@
"""MCP Llms-txt server for docs."""
from urllib.parse import urlparse
import os
import re
from urllib.parse import urlparse, urljoin
import httpx
from markdownify import markdownify
@@ -34,56 +36,255 @@ def extract_domain(url: str) -> str:
return f"{parsed.scheme}://{parsed.netloc}/"
def _is_http_or_https(url: str) -> bool:
"""Check if the URL is an HTTP or HTTPS URL."""
return url.startswith(("http:", "https:"))
def _get_fetch_description(has_local_sources: bool) -> str:
"""Get fetch docs tool description."""
description = [
"Fetch and parse documentation from a given URL or local file.",
"",
"Use this tool after list_doc_sources to:",
"1. First fetch the llms.txt file from a documentation source",
"2. Analyze the URLs listed in the llms.txt file",
"3. Then fetch specific documentation pages relevant to the user's question",
"",
]
if has_local_sources:
description.extend(
[
"Args:",
" url: The URL or file path to fetch documentation from. Can be:",
" - URL from an allowed domain",
" - A local file path (absolute or relative)",
" - A file:// URL (e.g., file:///path/to/llms.txt)",
]
)
else:
description.extend(
[
"Args:",
" url: The URL to fetch documentation from.",
]
)
description.extend(
[
"",
"Returns:",
" The fetched documentation content converted to markdown, or an error message", # noqa: E501
" if the request fails or the URL is not from an allowed domain.",
]
)
return "\n".join(description)
def _normalize_path(path: str) -> str:
"""Accept paths in file:/// or relative format and map to absolute paths."""
return (
os.path.abspath(path[7:])
if path.startswith("file://")
else os.path.abspath(path)
)
def _get_server_instructions(doc_sources: list[DocSource]) -> str:
"""Generate server instructions with available documentation source names."""
# Extract source names from doc_sources
source_names = []
for entry in doc_sources:
if "name" in entry:
source_names.append(entry["name"])
elif _is_http_or_https(entry["llms_txt"]):
# Use domain name as fallback for HTTP sources
domain = extract_domain(entry["llms_txt"])
source_names.append(domain.rstrip("/").split("//")[-1])
else:
# Use filename as fallback for local sources
source_names.append(os.path.basename(entry["llms_txt"]))
instructions = [
"Use the list_doc_sources tool to see available documentation sources.",
"This tool will return a URL for each documentation source.",
]
if source_names:
if len(source_names) == 1:
instructions.append(
f"Documentation URLs are available from this tool "
f"for {source_names[0]}."
)
else:
names_str = ", ".join(source_names[:-1]) + f", and {source_names[-1]}"
instructions.append(
f"Documentation URLs are available from this tool for {names_str}."
)
instructions.extend(
[
"",
"Once you have a source documentation URL, use the fetch_docs tool "
"to get the documentation contents. ",
"If the documentation contents contains a URL for additional documentation "
"that is relevant to your task, you can use the fetch_docs tool to "
"fetch documentation from that URL next.",
]
)
return "\n".join(instructions)
def create_server(
doc_source: list[DocSource],
doc_sources: list[DocSource],
*,
follow_redirects: bool = False,
timeout: float = 10,
settings: dict | None = None,
allowed_domains: list[str] | None = None,
) -> FastMCP:
"""Create the server and generate tools."""
"""Create the server and generate documentation retrieval tools.
Args:
doc_sources: List of documentation sources to make available
follow_redirects: Whether to follow HTTP redirects when fetching docs
timeout: HTTP request timeout in seconds
settings: Additional settings to pass to FastMCP
allowed_domains: Additional domains to allow fetching from.
Use ['*'] to allow all domains
The domain hosting the llms.txt file is always appended to the list
of allowed domains.
Returns:
A FastMCP server instance configured with documentation tools
"""
settings = settings or {}
server = FastMCP(
name="llms-txt",
instructions=(
"Use the list doc sources tool to see available documentation "
"sources. Once you have a source, use fetch docs to get the "
"documentation"
),
instructions=_get_server_instructions(doc_sources),
**settings,
)
httpx_client = httpx.AsyncClient(follow_redirects=follow_redirects, timeout=timeout)
local_sources = []
remote_sources = []
for entry in doc_sources:
url = entry["llms_txt"]
if _is_http_or_https(url):
remote_sources.append(entry)
else:
local_sources.append(entry)
# Let's verify that all local sources exist
for entry in local_sources:
path = entry["llms_txt"]
abs_path = _normalize_path(path)
if not os.path.exists(abs_path):
raise FileNotFoundError(f"Local file not found: {abs_path}")
# Parse the domain names in the llms.txt URLs and identify local file paths
domains = set(extract_domain(entry["llms_txt"]) for entry in remote_sources)
# Add additional allowed domains if specified, or set to '*' if we have local files
if allowed_domains:
if "*" in allowed_domains:
domains = {"*"} # Special marker for allowing all domains
else:
domains.update(allowed_domains)
allowed_local_files = set(
_normalize_path(entry["llms_txt"]) for entry in local_sources
)
@server.tool()
def list_doc_sources() -> str:
"""List all available doc sources. Always use this first."""
"""List all available documentation sources.
This is the first tool you should call in the documentation workflow.
It provides URLs to llms.txt files or local file paths that the user has made available.
Returns:
A string containing a formatted list of documentation sources with their URLs or file paths
"""
content = ""
for entry in doc_source:
name = entry.get("name", "") or extract_domain(entry["llms_txt"])
content += f"{name}\n"
content += "URL: " + entry["llms_txt"] + "\n\n"
for entry_ in doc_sources:
url_or_path = entry_["llms_txt"]
if _is_http_or_https(url_or_path):
name = entry_.get("name", extract_domain(url_or_path))
content += f"{name}\nURL: {url_or_path}\n\n"
else:
path = _normalize_path(url_or_path)
name = entry_.get("name", path)
content += f"{name}\nPath: {path}\n\n"
return content
# Parse the domain names in the llms.txt URLs
allowed_domains = [extract_domain(entry["llms_txt"]) for entry in doc_source]
fetch_docs_description = _get_fetch_description(
has_local_sources=bool(local_sources)
)
@server.tool()
@server.tool(description=fetch_docs_description)
async def fetch_docs(url: str) -> str:
"""Use this to fetch documentation from a given URL.
nonlocal domains, follow_redirects
url = url.strip()
# Handle local file paths (either as file:// URLs or direct filesystem paths)
if not _is_http_or_https(url):
abs_path = _normalize_path(url)
if abs_path not in allowed_local_files:
raise ValueError(
f"Local file not allowed: {abs_path}. Allowed files: {allowed_local_files}"
)
try:
with open(abs_path, "r", encoding="utf-8") as f:
content = f.read()
return markdownify(content)
except Exception as e:
return f"Error reading local file: {str(e)}"
else:
# Otherwise treat as URL
if "*" not in domains and not any(
url.startswith(domain) for domain in domains
):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
Always use list doc sources before fetching documents.
"""
nonlocal allowed_domains
if not any(url.startswith(domain) for domain in allowed_domains):
return (
"Error: URL not allowed. Must start with one of the following domains: "
+ ", ".join(allowed_domains)
)
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
content = response.text
try:
response = await httpx_client.get(url, timeout=timeout)
response.raise_for_status()
return markdownify(response.text)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error with code {e.response.status_code}"
if follow_redirects:
# Check for meta refresh tag which indicates a client-side redirect
match = re.search(
r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
content,
re.IGNORECASE,
)
if match:
redirect_url = match.group(1)
new_url = urljoin(str(response.url), redirect_url)
if "*" not in domains and not any(
new_url.startswith(domain) for domain in domains
):
return (
"Error: Redirect URL not allowed. Must start with one of the following domains: "
+ ", ".join(domains)
)
response = await httpx_client.get(new_url, timeout=timeout)
response.raise_for_status()
content = response.text
return markdownify(content)
except (httpx.HTTPStatusError, httpx.RequestError) as e:
return f"Encountered an HTTP error: {str(e)}"
return server
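The meta refresh handling added in commit a429dc788b can be exercised in isolation. This sketch reuses the same regex and `urljoin` resolution as the diff above, with a hypothetical HTML snippet and base URL standing in for a real response:

```python
import re
from urllib.parse import urljoin

# Same pattern as in mcpdoc/main.py: capture the redirect target from a
# <meta http-equiv="refresh"> tag, e.g. content="0; url=/docs/index.html".
META_REFRESH_RE = re.compile(
    r'<meta http-equiv="refresh" content="[^;]+;\s*url=([^"]+)"',
    re.IGNORECASE,
)

html = (
    '<html><head>'
    '<meta http-equiv="refresh" content="0; url=/docs/index.html">'
    '</head><body>Redirecting...</body></html>'
)

match = META_REFRESH_RE.search(html)
if match:
    # Resolve the (possibly relative) target against the page that served it.
    target = urljoin("https://example.com/llms.txt", match.group(1))
    print(target)  # https://example.com/docs/index.html
```

In the server itself, the resolved target is re-checked against the allowed-domain list before a second fetch, so a meta refresh cannot escape the security boundary.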


@@ -1,8 +1,9 @@
[project]
name = "mcpdoc"
version = "0.0.4"
version = "0.0.10"
description = "Server llms-txt documentation over MCP"
readme = "README.md"
license = "MIT"
requires-python = ">=3.10"
dependencies = [
"httpx>=0.28.1",
@@ -31,3 +32,18 @@ test = [
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
minversion = "8.0"
# -ra: Report all extra test outcomes (passed, skipped, failed, etc.)
# -q: Enable quiet mode for less cluttered output
# -v: Enable verbose output to display detailed test names and statuses
# --durations=5: Show the 5 slowest tests after the run (useful for performance tuning)
addopts = "-ra -q -v --durations=5"
testpaths = [
"tests",
]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"


@@ -0,0 +1,71 @@
"""Tests for mcpdoc.main module."""
import pytest
from mcpdoc.main import (
_get_fetch_description,
_is_http_or_https,
extract_domain,
)
def test_extract_domain() -> None:
"""Test extract_domain function."""
# Test with https URL
assert extract_domain("https://example.com/page") == "https://example.com/"
# Test with http URL
assert extract_domain("http://test.org/docs/index.html") == "http://test.org/"
# Test with URL that has port
assert extract_domain("https://localhost:8080/api") == "https://localhost:8080/"
# Check trailing slash
assert extract_domain("https://localhost:8080") == "https://localhost:8080/"
# Test with URL that has subdomain
assert extract_domain("https://docs.python.org/3/") == "https://docs.python.org/"
@pytest.mark.parametrize(
"url,expected",
[
("http://example.com", True),
("https://example.com", True),
("/path/to/file.txt", False),
("file:///path/to/file.txt", False),
(
"ftp://example.com",
False,
), # Not HTTP or HTTPS, even though it's not a local file
],
)
def test_is_http_or_https(url, expected):
"""Test _is_http_or_https function."""
assert _is_http_or_https(url) is expected
@pytest.mark.parametrize(
"has_local_sources,expected_substrings",
[
(True, ["local file path", "file://"]),
(False, ["URL to fetch"]),
],
)
def test_get_fetch_description(has_local_sources, expected_substrings):
"""Test _get_fetch_description function."""
description = _get_fetch_description(has_local_sources)
# Common assertions for both cases
assert "Fetch and parse documentation" in description
assert "Returns:" in description
# Specific assertions based on has_local_sources
for substring in expected_substrings:
if has_local_sources:
assert substring in description
else:
# For the False case, we only check that "local file path"
# and "file://" are NOT present
if substring in ["local file path", "file://"]:
assert substring not in description

uv.lock (generated)

@@ -259,7 +259,7 @@ cli = [
[[package]]
name = "mcpdoc"
version = "0.0.3"
version = "0.0.8"
source = { editable = "." }
dependencies = [
{ name = "httpx" },