Real-time access to over 200 million scientific papers from 6 academic sources.
Scientific Paper Harvester MCP Server
A comprehensive Model Context Protocol (MCP) server that provides LLMs with real-time access to scientific papers from 6 major academic sources: arXiv, OpenAlex, PMC (PubMed Central), Europe PMC, bioRxiv/medRxiv, and CORE.
🚀 Features
**Comprehensive Source Coverage**
- arXiv: Computer science, physics, mathematics preprints and papers
- OpenAlex: Open catalog of scholarly papers with citation data
- PMC: PubMed Central biomedical and life science literature
- Europe PMC: European life science literature database
- bioRxiv/medRxiv: Biology and medical preprint servers
- CORE: World's largest collection of open access research papers
**Advanced Capabilities**
- Paper Fetching: Get latest papers from any source by category/concept
- Paper Search: Search papers by title, abstract, author, or full text across 4 major sources
- Full-Text Extraction: Extract complete text content with intelligent fallback strategies
- Citation Analysis: Find top cited papers from OpenAlex since a specific date
- Paper Lookup: Retrieve full metadata for specific papers by ID
- Category Discovery: Browse available categories from all sources
- Smart Rate Limiting: Respectful API usage with per-source rate limiting
- DOI Resolution: Advanced DOI resolver with Unpaywall → Crossref → Semantic Scholar fallback
- Dual Interface: Both MCP protocol and CLI access
- TypeScript: Full type safety with ESM modules
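The DOI Resolution bullet names a fallback order (Unpaywall → Crossref → Semantic Scholar). The following is a minimal sketch of that kind of chain, not the server's actual resolver. `Resolver`, `resolveWithFallback`, and the stub resolvers are hypothetical names introduced for illustration; real resolvers would call each service's HTTP API instead of returning canned values.

```typescript
// Hypothetical sketch of a "try each source in order" resolver chain.
// None of these names come from the server's code base.

interface ResolvedPaper {
  doi: string;
  fullTextUrl: string;
  resolvedBy: string;
}

interface Resolver {
  name: string;
  // Returns a full-text URL, or null when this source has nothing for the DOI.
  lookup: (doi: string) => Promise<string | null>;
}

async function resolveWithFallback(
  doi: string,
  resolvers: Resolver[],
): Promise<ResolvedPaper | null> {
  for (const resolver of resolvers) {
    try {
      const url = await resolver.lookup(doi);
      if (url !== null) {
        return { doi, fullTextUrl: url, resolvedBy: resolver.name };
      }
    } catch {
      // A failing source must not abort the chain; move on to the next one.
    }
  }
  return null; // every source was tried without success
}

// Stub resolvers standing in for Unpaywall, Crossref, and Semantic Scholar.
const stubChain: Resolver[] = [
  { name: "unpaywall", lookup: async () => null },
  { name: "crossref", lookup: async () => { throw new Error("timeout"); } },
  { name: "semantic-scholar", lookup: async (doi) => `https://example.org/pdf/${doi}` },
];
```

With the stubs above, `resolveWithFallback(doi, stubChain)` skips the empty first source, survives the error thrown by the second, and reports the third as `resolvedBy`. Keeping the order in a plain array makes it easy to reorder or extend the chain.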
📊 Coverage Statistics
- Total Sources: 6 academic databases
- Category Coverage: 100+ categories across all disciplines
- Paper Access: 200M+ papers with intelligent text extraction
- Text Extraction Success: >90% for supported paper types
- Response Time: <15 seconds average for paper fetching
🛠 Installation
npm install
npm run build
📋 MCP Client Configuration
To use this server with an MCP client (like Claude Desktop), add the following to your MCP client configuration:
For the published package (available on npm):
Option 1: Using npx (recommended for AI tools like Claude)
{
  "mcpServers": {
    "scientific-papers": {
      "command": "npx",
      "args": [
        "-y",
        "@futurelab-studio/latest-science-mcp@latest"
      ]
    }
  }
}
Option 2: Global installation
npm install -g @futurelab-studio/latest-science-mcp
Then configure:
{
  "mcpServers": {
    "scientific-papers": {
      "command": "latest-science-mcp"
    }
  }
}
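If the client configuration already lists other servers, the entry has to be merged into the existing `mcpServers` object rather than pasted over the whole file. A small sketch of that merge follows. `addMcpServer` and the `other-server` entry are hypothetical and exist only for illustration; the `mcpServers` shape and the npx entry are taken from Option 1.

```typescript
// Hypothetical helper: merge one server entry into an existing MCP client
// config without dropping servers that are already configured.

interface McpServerEntry {
  command: string;
  args?: string[];
}

interface McpClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // other client settings pass through untouched
}

function addMcpServer(
  config: McpClientConfig,
  name: string,
  entry: McpServerEntry,
): McpClientConfig {
  // Spread copies keep the input object unmodified.
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

// The npx-based entry from Option 1.
const scientificPapers: McpServerEntry = {
  command: "npx",
  args: ["-y", "@futurelab-studio/latest-science-mcp@latest"],
};

// Example: a config that already contains another (made-up) server.
const existing: McpClientConfig = {
  mcpServers: { "other-server": { command: "other-command" } },
};
const merged = addMcpServer(existing, "scientific-papers", scientificPapers);
const mergedJson = JSON.stringify(merged, null, 2); // text to write back to the config file
```

The helper returns a new object instead of mutating its input, so the original config can still be written back unchanged if anything goes wrong.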
📖 Usage
CLI Interface
List Categories
# List arXiv categories
node dist/cli.js list-categories --source=arxiv
# List OpenAlex concepts
node dist/cli.js list-categories --source=openalex
# List PMC biomedical categories
node dist/cli.js list-categories --source=pmc
# List Europe PMC life science categories
node dist/cli.js list-categories --source=europepmc
# List bioRxiv/medRxiv categories (includes both servers)
node dist/cli.js list-categories --source=biorxiv
# List CORE academic categories
node dist/cli.js list-categories --source=core
Fetch Latest Papers
# Get latest AI papers from arXiv
node dist/cli.js fetch-latest --source=arxiv --category=cs.AI --count=10
# Get latest biology papers from bioRxiv
node dist/cli.js fetch-latest --source=biorxiv --category="biorxiv:biology" --count=5
# Get latest immunology papers from PMC
node dist/cli.js fetch-latest --source=pmc --category=immunology --count=3
# Get latest papers from CORE by subject
node dist/cli.js fetch-latest --source=core --category=computer_science --count=5
# Get latest papers by concept name (OpenAlex)
node dist/cli.js fetch-latest --source=openalex --category="machine learning" --count=3
Fetch Top Cited Papers
# Get the 20 most-cited machine learning papers since 2024-01-01
node dist/cli.js fetch-top-cited --concept="machine learning" --since=2024-01-01 --count=20
# Get top cited papers by concept ID
node dist/cli.js fetch-top-cited --concept=C41008148 --since=2023-06-01 --count=10
Search Papers
# Search by keywords across all fields
node dist/cli.js search-papers --source=arxiv --query="machine learning" --count=10
# Search by paper title
node dist/cli.js search-papers --source=openalex --query="neural networks" --field=title --count=5
# Search by author name
node dist/cli.js search-papers --source=europepmc --query="John Smith" --field=author --count=10
# Search full-text content sorted by citations
node dist/cli.js search-papers --source=core --query="climate change" --field=fulltext --sortBy=citations --count=20
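Every CLI example in this section has the same shape: a command name followed by `--key=value` flags. When the CLI is driven from a script, the argument list can be assembled from an options object. The sketch below assumes that pattern holds for all commands. `buildCliArgs` is a hypothetical helper and not part of the package; the flag names are copied from the examples.

```typescript
// Hypothetical helper: build the argument list for `node dist/cli.js ...`
// from an options object. Flag names mirror the CLI examples in this section.

type CliCommand =
  | "list-categories"
  | "fetch-latest"
  | "fetch-top-cited"
  | "search-papers"
  | "fetch-content";

type CliOptions = Record<string, string | number | undefined>;

function buildCliArgs(command: CliCommand, options: CliOptions): string[] {
  const flags = Object.entries(options)
    .filter(([, value]) => value !== undefined) // skip options that were not set
    .map(([key, value]) => `--${key}=${value}`);
  return ["dist/cli.js", command, ...flags];
}

// Equivalent to the full-text search example above.
const searchArgs = buildCliArgs("search-papers", {
  source: "core",
  query: "climate change",
  field: "fulltext",
  sortBy: "citations",
  count: 20,
});
```

An array like `searchArgs` can be handed to Node's `child_process.spawn("node", searchArgs)`. Because each element is passed as one argument, values containing spaces such as `climate change` need no shell quoting.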
Fetch Specific Paper Content
# Get arXiv paper by ID
node dist/cli.js fetch-content --source=arxiv --id=2401.12345
# Get bioRxiv paper by DOI
node dist/cli.js fetch-content --source=biorxiv --id="10.1101/2021.01.01.425001"
# Get PMC paper by ID
node dist/cli.js fetch-content --source=pmc --id=PMC8245678
# Get
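The fetch-content examples use three identifier shapes: an arXiv ID (`2401.12345`), a DOI (`10.1101/2021.01.01.425001`), and a PMC ID (`PMC8245678`). A caller that receives an identifier of unknown origin could classify it before choosing a `--source`. The sketch below is one way to do that. `classifyPaperId` is a hypothetical helper, and the patterns are common approximations of each format, not the server's own validation rules.

```typescript
// Hypothetical helper: guess which kind of identifier a string is.
// The regular expressions approximate each format and are assumptions.

type PaperIdKind = "arxiv" | "doi" | "pmc" | "unknown";

function classifyPaperId(id: string): PaperIdKind {
  const trimmed = id.trim();
  // New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
  if (/^\d{4}\.\d{4,5}(v\d+)?$/.test(trimmed)) return "arxiv";
  // DOIs start with "10.", a registrant code, a slash, and a suffix.
  if (/^10\.\d{4,9}\/\S+$/.test(trimmed)) return "doi";
  // PubMed Central IDs: "PMC" followed by digits.
  if (/^PMC\d+$/i.test(trimmed)) return "pmc";
  return "unknown";
}

const arxivKind = classifyPaperId("2401.12345");
const doiKind = classifyPaperId("10.1101/2021.01.01.425001");
const pmcKind = classifyPaperId("PMC8245678");
```

Old-style arXiv identifiers (for example `hep-th/9901001`) are deliberately left out and fall through to `"unknown"`.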
Tools (5)
- list-categories: List available categories or concepts from a specific academic source.
- fetch-latest: Fetch the latest papers from a source by category or concept.
- fetch-top-cited: Get top cited papers from OpenAlex based on a concept and date.
- search-papers: Search for papers across sources by query, field, or citation count.
- fetch-content: Retrieve full metadata and content for a specific paper by ID or DOI.
Configuration
{
  "mcpServers": {
    "scientific-papers": {
      "command": "npx",
      "args": [
        "-y",
        "@futurelab-studio/latest-science-mcp@latest"
      ]
    }
  }
}
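Over the MCP protocol, a client invokes the five tools listed above by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message. The `tools/call` method and the `{ name, arguments }` parameter shape come from the MCP specification. The argument names (`source`, `category`, `count`) are assumed to mirror the CLI flags and are not confirmed by this document, and `makeToolCall` is a hypothetical helper.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// The argument names passed below are assumptions based on the CLI flags.

type ToolName =
  | "list-categories"
  | "fetch-latest"
  | "fetch-top-cited"
  | "search-papers"
  | "fetch-content";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

function makeToolCall(
  name: ToolName,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++, // each request gets a fresh id so responses can be matched
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = makeToolCall("fetch-latest", {
  source: "arxiv",
  category: "cs.AI",
  count: 10,
});
const wireMessage = JSON.stringify(request); // single-line JSON, ready to send
```

In practice an MCP client library builds and sends these messages itself. The sketch is mainly useful for reading protocol logs or debugging a server by hand.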