ResearchTwin: Federated Agentic Web of Research Knowledge
ResearchTwin is an open-source, federated platform that turns a researcher's publications, datasets, and code repositories into a conversational Digital Twin. Built on a Bimodal Glial-Neural Optimization (BGNO) architecture, it supports dual discovery: humans explore the research through chat, AI agents query it through a machine-readable API, and both work from the same body of knowledge to accelerate scientific discovery.
Live at researchtwin.net | Join the Network
Project Vision
The exponential growth of scientific outputs has created a "discovery bottleneck." Traditional static PDFs and siloed repositories limit knowledge synthesis and reuse. ResearchTwin addresses this by:
- Integrating multi-modal research artifacts from Semantic Scholar, Google Scholar, GitHub, and Figshare
- Computing a real-time S-Index metric (Quality × Impact × Collaboration) across all output types
- Providing a conversational chatbot interface for interactive research exploration
- Exposing an Inter-Agentic Discovery API with Schema.org types for machine-to-machine research discovery
- Enabling a federated, Discord-like architecture supporting local nodes, hubs, and hosted edges
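The S-Index bullet above gives only the product form (Quality × Impact × Collaboration). The following is a minimal sketch of that arithmetic, not the project's actual qic_index.py logic. The `Artifact` class, its field names, the example numbers, and the choice to sum per-artifact scores into one total are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical container for one research output (paper, dataset, or repo)."""
    kind: str
    quality: float
    impact: float
    collaboration: float


def qic_score(a: Artifact) -> float:
    # Product form stated in the README: Quality x Impact x Collaboration.
    return a.quality * a.impact * a.collaboration


def s_index(artifacts: list[Artifact]) -> float:
    # Assumption: the researcher-level score is the sum of per-artifact scores.
    return sum(qic_score(a) for a in artifacts)


# Made-up example values, purely for illustration.
outputs = [
    Artifact("paper", quality=2.0, impact=3.0, collaboration=1.0),
    Artifact("dataset", quality=1.0, impact=4.0, collaboration=2.0),
]
print(s_index(outputs))
```

Because the score is a product, a zero in any one factor zeroes the whole artifact; whether the real metric behaves this way depends on how each factor is normalized, which the README does not specify.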
Architecture Overview
BGNO (Bimodal Glial-Neural Optimization)
Data Sources Glial Layer Neural Layer Interface
┌──────────────┐ ┌─────────────┐ ┌──────────────┐ ┌────────────┐
│Semantic Scholar│───▶│ │ │ │ │ Web Chat │
│Google Scholar │───▶│ SQLite │───▶│ RAG with │───▶│ Discord │
│GitHub API │───▶│ Cache + │ │ Claude API │ │ Agent API │
│Figshare API │───▶│ Rate Limit │ │ │ │ Embed │
└──────────────┘ └─────────────┘ └──────────────┘ └────────────┘
- Connector Layer: Pulls papers (S2+GS with deduplication), repos (GitHub), datasets (Figshare), and ORCID metadata
- Glial Layer: SQLite caching with 24h TTL, rate limiting, S2+GS title-similarity merge (0.85 threshold)
- Neural Layer: RAG with Claude — context assembly, prompt engineering, conversational synthesis
- Interface Layer: D3.js knowledge graph, chat widget, Discord bot, REST API
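The S2+GS title-similarity merge in the Glial Layer can be sketched as follows. This assumes similarity is measured with the standard library's `difflib.SequenceMatcher` on normalized titles; the README states only the 0.85 threshold, so the similarity measure, the normalization, and the function names here are assumptions rather than the project's implementation.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # merge threshold quoted in the README


def normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not matter.
    return " ".join(title.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized titles are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def merge_titles(s2_titles, gs_titles, threshold=THRESHOLD):
    """Keep every S2 title, then add each GS title that is not a
    near-duplicate of anything already kept."""
    merged = list(s2_titles)
    for gs in gs_titles:
        if not any(similarity(gs, kept) >= threshold for kept in merged):
            merged.append(gs)
    return merged


# Made-up titles for illustration.
s2 = ["Deep Learning for Fetal Heart Rate Analysis"]
gs = ["Deep learning for fetal heart rate analysis.", "A Survey of Graph Databases"]
merged = merge_titles(s2, gs)
print(merged)
```

Here the first Google Scholar title differs from the Semantic Scholar one only in case and a trailing period, so it is treated as a duplicate, while the unrelated title is kept as a new entry.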
Federated Network Tiers
| Tier | Name | Description | Status |
|---|---|---|---|
| Tier 1 | Local Nodes | Researchers run python run_node.py locally | Live |
| Tier 2 | Hubs | Lab aggregators federating multiple nodes | Planned |
| Tier 3 | Hosted Edges | Cloud-hosted at researchtwin.net | Live |
Inter-Agentic Discovery API
Machine-readable endpoints with Schema.org @type annotations:
| Endpoint | Schema.org Type | Purpose |
|---|---|---|
| GET /api/researcher/{slug}/profile | Person | Researcher profile with HATEOAS links |
| GET /api/researcher/{slug}/papers | ItemList of ScholarlyArticle | Papers with citations |
| GET /api/researcher/{slug}/datasets | ItemList of Dataset | Datasets with QIC scores |
| GET /api/researcher/{slug}/repos | ItemList of SoftwareSourceCode | Repos with QIC scores |
| GET /api/discover?q=keyword&type=paper | SearchResultSet | Cross-researcher search |
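An agent calling these endpoints might build its requests as in the sketch below, which uses only the Python standard library. The base URL (the hosted site's origin), the slug `jane-doe`, and the helper names are assumptions. The response schema is not documented in this section, so `fetch_json` simply returns the parsed JSON without interpreting it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the hosted instance serves the API from this origin.
BASE_URL = "https://researchtwin.net"

# Per-researcher resources listed in the endpoint table.
RESOURCES = {"profile", "papers", "datasets", "repos"}


def researcher_url(slug: str, resource: str, base: str = BASE_URL) -> str:
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    return f"{base}/api/researcher/{quote(slug, safe='')}/{resource}"


def discover_url(query: str, type_: str = "paper", base: str = BASE_URL) -> str:
    # urlencode escapes the query so spaces and symbols are URL-safe.
    return f"{base}/api/discover?{urlencode({'q': query, 'type': type_})}"


def fetch_json(url: str, timeout: float = 10.0):
    # Performs a real HTTP GET and returns the decoded JSON body.
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# "jane-doe" is a hypothetical slug used only to show the URL shape.
print(researcher_url("jane-doe", "papers"))
print(discover_url("fetal heart rate"))
```

Keeping URL construction separate from the network call makes the request shapes easy to check offline; `fetch_json(researcher_url(...))` would then retrieve a live record from a running node.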
Getting Started
Hosted (Tier 3) — Zero Setup
- Visit researchtwin.net/join.html
- Register with your name, email, and research identifiers
- Your Digital Twin is live immediately
Local Node (Tier 1) — Full Control
git clone https://github.com/martinfrasch/researchtwin.git
cd researchtwin
pip install -r backend/requirements.txt
cp node_config.json.example node_config.json
# Edit node_config.json with your details
python run_node.py --config node_config.json
Docker Deployment
cp .env.example .env # Add your API keys
docker-compose up -d --build
Required API keys: ANTHROPIC_API_KEY (for Claude RAG)
Optional: S2_API_KEY, GITHUB_TOKEN, DISCORD_BOT_TOKEN, SMTP credentials
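Before starting a node, it can help to confirm which of these keys are set. The sketch below is a hypothetical pre-flight check, not part of the repository. It covers only the named variables and leaves out SMTP credentials, because their variable names are not given here.

```python
import os

REQUIRED = ["ANTHROPIC_API_KEY"]
OPTIONAL = ["S2_API_KEY", "GITHUB_TOKEN", "DISCORD_BOT_TOKEN"]


def check_env(env=os.environ):
    """Return (missing_required, unset_optional) for the given mapping."""
    missing = [k for k in REQUIRED if not env.get(k)]
    unset = [k for k in OPTIONAL if not env.get(k)]
    return missing, unset


missing, unset = check_env()
if missing:
    print("Missing required keys:", ", ".join(missing))
if unset:
    print("Optional keys not set:", ", ".join(unset))
```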
Repository Structure
researchtwin/
├── backend/
│ ├── main.py # FastAPI endpoints (REST + Discovery API)
│ ├── researchers.py # SQLite researcher CRUD + token management
│ ├── database.py # SQLite schema, WAL mode, migrations
│ ├── models.py # Pydantic models for all endpoints
│ ├── rag.py # RAG context assembly for Claude
│ ├── qic_index.py # S-Index / QIC computation engine
│ ├
Tools (5)
- get_researcher_profile: Retrieves a researcher's profile and metadata.
- get_researcher_papers: Fetches a list of scholarly articles and citations for a researcher.
- get_researcher_datasets: Retrieves datasets associated with a researcher, including QIC scores.
- get_researcher_repos: Fetches software source code repositories for a researcher.
- discover_research: Performs a cross-researcher search for papers or datasets based on keywords.
Environment Variables
- ANTHROPIC_API_KEY (required): API key for Claude RAG operations
- S2_API_KEY: API key for Semantic Scholar
- GITHUB_TOKEN: GitHub personal access token
- DISCORD_BOT_TOKEN: Token for Discord bot integration