Docsmith MCP Server

Process Excel, Word, PDF, and PowerPoint documents with Python


docsmith-mcp

Python-powered document processing MCP server with MCP Apps support. Process Excel, Word, PDF, and PowerPoint documents using Python, and view them through an interactive MCP App.

Features

  • Excel: Read/write .xlsx files with sheet support and pagination
  • Word: Read/write .docx files with paragraph and table support
  • PDF: Read .pdf files with text extraction and pagination
  • PowerPoint: Read .pptx files with slide content extraction
  • Text Files: Read/write .txt, .csv, .md, .json, .yaml, .yml with pagination support
  • Run Python: Execute Python code for flexible file operations and data processing
  • MCP App: Beautiful React + Tailwind CSS app for viewing all document types
  • Flexible Reading Modes: Raw (full) reads, or paginated reads for large files
  • Powered by Pyodide: Runs in a secure WebAssembly sandbox via code-runner-mcp

Quick Start

MCP Configuration

Add to your MCP client configuration (e.g., Claude Desktop, Cline, etc.):

Via npx (recommended):

{
  "mcpServers": {
    "docsmith": {
      "command": "npx",
      "args": ["-y", "docsmith-mcp"],
      "env": {
        "DOC_PAGE_SIZE": "100"
      }
    }
  }
}

Via global installation:

npm install -g docsmith-mcp

Then point your client at the installed command:

{
  "mcpServers": {
    "docsmith": {
      "command": "docsmith-mcp",
      "env": {
        "DOC_PAGE_SIZE": "100"
      }
    }
  }
}

Via local path:

{
  "mcpServers": {
    "docsmith": {
      "command": "node",
      "args": ["/path/to/docsmith-mcp/dist/index.js"]
    }
  }
}

Then use the read_document tool:

{
  "file_path": "/path/to/document.xlsx",
  "mode": "paginated",
  "page": 1,
  "page_size": 50
}

In clients that support MCP Apps, the MCP App opens automatically to display the document content.

Supported Formats

Format      Extensions     Read  Write  Notes
Excel       .xlsx          Yes   Yes    Multi-sheet support, pagination
Word        .docx          Yes   Yes    Paragraphs and tables
PDF         .pdf           Yes   No     Text extraction with pagination
PowerPoint  .pptx          Yes   No     Slide content extraction
CSV         .csv           Yes   Yes    -
Text        .txt, .md      Yes   Yes    Pagination support
JSON        .json          Yes   Yes    -
YAML        .yaml, .yml    Yes   Yes    -

Tools

read_document

Read document content with automatic format detection.

Parameters:

  • file_path (string, required): Path to the document
  • mode (string, optional): "paginated" or "raw" (default: "paginated")
  • page (number, optional): Page number for paginated mode (default: 1)
  • page_size (number, optional): Items per page (default: 100)
  • sheet_name (string, optional): Sheet name for Excel files

Example:

{
  "file_path": "/path/to/document.xlsx",
  "mode": "paginated",
  "page": 1,
  "page_size": 50,
  "sheet_name": "Sheet1"
}
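To make page and page_size concrete, here is a minimal sketch of 1-based pagination over a list of rows. The paginate helper and its response fields (total_pages, has_more, and so on) are hypothetical and are not docsmith-mcp's actual response format; in "raw" mode you would simply return every row.

```python
import math
from typing import Any, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 100) -> dict:
    """Return one 1-based page of `items` plus paging metadata (hypothetical shape)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    total = len(items)
    # An empty input still counts as one (empty) page.
    total_pages = max(1, math.ceil(total / page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "page_size": page_size,
        "total_items": total,
        "total_pages": total_pages,
        "items": list(items[start:start + page_size]),
        "has_more": page < total_pages,
    }


rows = [[f"row {i}"] for i in range(1, 121)]  # 120 rows
result = paginate(rows, page=3, page_size=50)
print(result["total_pages"], len(result["items"]), result["has_more"])  # 3 20 False
```

With 120 rows and page_size 50, page 3 holds the last 20 rows and has_more is False.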

write_document

Write document content.

Parameters:

  • file_path (string, required): Output path
  • format (string, required): "excel", "word", "csv", "txt", "json", "yaml"
  • data (array/object, required): Document content

Example:

{
  "file_path": "/path/to/output.xlsx",
  "format": "excel",
  "data": [
    ["Product", "Q1", "Q2"],
    ["Laptop", 100, 150],
    ["Mouse", 500, 600]
  ]
}
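The excel example above passes data as a list of rows, with the header row first. If your data starts out as a list of records, a small helper like the hypothetical records_to_rows below produces that shape. This assumes write_document treats the first row as column headers, as the example suggests.

```python
from typing import Any


def records_to_rows(records: list[dict[str, Any]]) -> list[list[Any]]:
    """Convert dicts into a header row followed by value rows (hypothetical helper)."""
    # Preserve first-seen column order across all records.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Missing keys become None so every row has the same width.
    return [columns] + [[record.get(col) for col in columns] for record in records]


payload = {
    "file_path": "/path/to/output.xlsx",
    "format": "excel",
    "data": records_to_rows([
        {"Product": "Laptop", "Q1": 100, "Q2": 150},
        {"Product": "Mouse", "Q1": 500, "Q2": 600},
    ]),
}
print(payload["data"])
```

The resulting data array matches the Product/Q1/Q2 example above row for row.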

get_document_info

Get document metadata without reading full content.

Parameters:

  • file_path (string, required): Path to the document

Example:

{
  "file_path": "/path/to/document.pdf"
}

run_python

Execute Python code for flexible file operations, data processing, and custom tasks. Works with any file format; additional Python libraries can be requested through the packages parameter.

Parameters:

  • code (string, required): Python code to execute
  • packages (object, optional): Package mappings (import_name -> pypi_name) for required dependencies
  • file_paths (array, optional): File paths that the code needs to access
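Hand-escaping newlines and indentation inside the code string is error-prone. If you generate run_python calls from Python, a sketch like the following lets json.dumps do the escaping. The arguments dict mirrors the parameters listed above; the file path is a placeholder.

```python
import json
import textwrap

# Write the script as ordinary multi-line Python...
code = textwrap.dedent("""
    import json
    with open('/path/to/file.json') as f:
        data = json.load(f)
    print(json.dumps({'count': len(data)}))
""").strip()

# ...and let json.dumps escape newlines and quotes for the tool arguments.
arguments = {
    "code": code,
    "file_paths": ["/path/to/file.json"],
}
print(json.dumps(arguments, indent=2))
```

The printed "code" value contains \n escapes, in the same form as the examples below.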

Examples:

Read and process any file:

{
  "code": "import json\nwith open('/path/to/file.json') as f:\n    data = json.load(f)\n    result = len(data)\n    print(json.dumps({'count': result}))",
  "file_paths": ["/path/to/file.json"]
}

Batch rename files with regex:

{
  "code": "import os, re, json\nfolder = '/path/to/files'\nfor name in os.listdir(folder):\n    new_name = re.sub(r'old_', 'new_', name)\n    os.rename(os.path.join(folder, name), os.path.join(folder, new_name))\nprint(json.dumps({'success': True}))",
  "file_paths": ["/path/to/files"]
}
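The rename example above renames every entry, and the pattern r'old_' matches anywhere in a name; if a target name already exists, os.rename may overwrite it or fail depending on the platform. A more defensive sketch plans the renames first and skips collisions. plan_renames and apply_renames are hypothetical helpers you could paste into a run_python code string; the ^old_ anchor matches only names that start with old_.

```python
import os
import re
import tempfile


def plan_renames(folder: str, pattern: str, replacement: str) -> list[tuple[str, str]]:
    """Return (old, new) name pairs, skipping no-op renames and name collisions."""
    names = sorted(os.listdir(folder))
    taken = set(names)  # names that exist now or are already claimed by the plan
    plan = []
    for name in names:
        new_name = re.sub(pattern, replacement, name)
        if new_name == name or new_name in taken:
            continue  # unchanged, or the target name is already in use
        plan.append((name, new_name))
        taken.add(new_name)
    return plan


def apply_renames(folder: str, plan: list[tuple[str, str]]) -> None:
    """Perform the planned renames inside `folder`."""
    for old, new in plan:
        os.rename(os.path.join(folder, old), os.path.join(folder, new))


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ["old_a.txt", "old_b.txt", "new_b.txt", "keep.txt"]:
        open(os.path.join(folder, name), "w").close()
    plan = plan_renames(folder, r"^old_", "new_")
    print(plan)  # [('old_a.txt', 'new_a.txt')]: old_b.txt is skipped because new_b.txt exists
    apply_renames(folder, plan)
    print(sorted(os.listdir(folder)))
```

Printing the plan before applying it also gives you a dry run for free.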

Process data with pandas:

{
  "code": "import json\nimport pandas as pd\ndf = pd.read_csv('/path/to/data.csv')\nsummary = df.describe().to_dict()\nprint(json.dumps(summary))",
  "packages": {"pandas": "pandas"},
  "file_paths": ["/path/to/data.csv"]
}

Extract archive files:

{
  "code": "import zipfile, os\nwith zipfile.Zi


Environment Variables

  • DOC_PAGE_SIZE: Default number of items per page for paginated reading
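How DOC_PAGE_SIZE interacts with the page_size parameter is not spelled out here. A plausible precedence, sketched below as an assumption rather than confirmed server behavior: an explicit page_size wins, then DOC_PAGE_SIZE, then the documented default of 100. resolve_page_size is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

FALLBACK_PAGE_SIZE = 100  # the documented default for page_size


def resolve_page_size(requested: Optional[int], env: Mapping[str, str] = os.environ) -> int:
    """Pick page_size: explicit argument, then DOC_PAGE_SIZE, then 100 (assumed order)."""
    if requested is not None:
        return requested
    raw = env.get("DOC_PAGE_SIZE", "")
    try:
        value = int(raw)
    except ValueError:
        return FALLBACK_PAGE_SIZE  # unset or non-numeric
    return value if value > 0 else FALLBACK_PAGE_SIZE


print(resolve_page_size(None, {"DOC_PAGE_SIZE": "250"}))  # 250
print(resolve_page_size(50, {"DOC_PAGE_SIZE": "250"}))    # 50
print(resolve_page_size(None, {}))                        # 100
```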

Try it

Read the contents of my sales_report.xlsx file and summarize the data.
Extract the text from the document at /path/to/manual.pdf.
Use Python to rename all files in my documents folder that start with 'old_'.
Create a new Excel file at output.xlsx with the following data: [['Name', 'Age'], ['Alice', 30], ['Bob', 25]].
Get the metadata for the document at /path/to/project_specs.docx.

Frequently Asked Questions

What are the key features of Docsmith MCP?

Read and write Excel, Word, CSV, and text files. Extract text from PDF and PowerPoint documents. Execute custom Python code in a secure Pyodide WebAssembly sandbox. Paginated reading support for large files. Interactive MCP App for beautiful document viewing.

What can I use Docsmith MCP for?

Automating data extraction from large Excel reports. Batch processing and renaming of local files using Python scripts. Generating structured reports in Excel or Word format from AI analysis. Quickly previewing and summarizing PDF or PowerPoint content.

How do I install Docsmith MCP?

Run Docsmith MCP directly with npx -y docsmith-mcp (no separate install step), or install it globally with npm install -g docsmith-mcp.

What MCP clients work with Docsmith MCP?

Docsmith MCP works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
