HF Dataset MCP Server

Add it to Claude Code

Run this in a terminal.

claude mcp add hf-dataset -- npx -y @cfahlgren1/hf-dataset-mcp

HF Dataset MCP

MCP server for the Hugging Face Dataset Viewer API. Search datasets, fetch rows, filter data, and more.

Installation

npx @cfahlgren1/hf-dataset-mcp

Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "hf-datasets": {
      "command": "npx",
      "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
      "env": {
        "HF_TOKEN": "hf_..."
      }
    }
  }
}
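If the config file already lists other servers, the entry above can be merged in programmatically instead of pasted by hand. The following is a minimal sketch, assuming the file is plain JSON; `add_server` is a hypothetical helper name, and the token stays a placeholder.

```python
import json


def add_server(config: dict, token: str) -> dict:
    """Return a copy of a Claude Desktop config with the hf-datasets entry added."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["hf-datasets"] = {
        "command": "npx",
        "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
        "env": {"HF_TOKEN": token},
    }
    return merged


# Illustrative existing config with one unrelated (made-up) server entry.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_server(existing, "hf_...")
print(json.dumps(updated, indent=2))
```

Working on a copy leaves the original dictionary untouched, so the result can be inspected before anything is written back to disk.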

Environment Variables

| Variable | Description |
| --- | --- |
| HF_TOKEN | Hugging Face API token (required for private/gated datasets) |
| HF_DATASETS_SERVER | Custom Dataset Viewer API URL (default: https://datasets-server.huggingface.co) |
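To make the role of the two variables concrete, here is a rough Python sketch of how a client could turn them into a base URL and request headers. How the server itself reads them is not shown in this README, so `resolve_settings` is a hypothetical helper; the `Authorization: Bearer` header is the usual way to pass a Hugging Face token.

```python
import os

DEFAULT_SERVER = "https://datasets-server.huggingface.co"


def resolve_settings(env: dict) -> tuple:
    """Derive (base_url, headers) from environment-style key/value pairs."""
    base_url = env.get("HF_DATASETS_SERVER", DEFAULT_SERVER).rstrip("/")
    headers = {}
    token = env.get("HF_TOKEN")
    if token:
        # Only needed for private or gated datasets.
        headers["Authorization"] = f"Bearer {token}"
    return base_url, headers


base_url, headers = resolve_settings(dict(os.environ))
print(base_url)
```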

Tools

`search_datasets`

Find datasets on the Hugging Face Hub by name, tag, or author.

search_datasets(search?: string, author?: string, filter?: string[], sort?: string, limit?: number)
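The parameters resemble the query parameters of the Hub's dataset listing endpoint. As an illustration only, and assuming the tool forwards them to `https://huggingface.co/api/datasets` (the README does not say so), the request URL could be assembled like this; `build_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode

HUB_API = "https://huggingface.co/api/datasets"


def build_search_url(search=None, author=None, tags=None, sort=None, limit=None):
    """Assemble a Hub dataset search URL; each tag becomes its own filter parameter."""
    params = []
    if search:
        params.append(("search", search))
    if author:
        params.append(("author", author))
    for tag in tags or []:
        params.append(("filter", tag))
    if sort:
        params.append(("sort", sort))
    if limit is not None:
        params.append(("limit", limit))
    return f"{HUB_API}?{urlencode(params)}"


url = build_search_url(tags=["task_categories:text-classification"], sort="downloads", limit=10)
print(url)
```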

`validate_dataset`

Check if a dataset is accessible and which viewer features are available.

validate_dataset(dataset: string)
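A sketch of how the result might be read, assuming the tool returns the same boolean flags as the Dataset Viewer API's `/is-valid` endpoint (`preview`, `viewer`, `search`, `filter`, `statistics`). The sample response below is made up for illustration.

```python
def available_features(is_valid_response: dict) -> list:
    """List the viewer features that are flagged as available."""
    return sorted(name for name, ok in is_valid_response.items() if ok is True)


# Illustrative response, not real output for any dataset.
sample = {"preview": True, "viewer": True, "search": False, "filter": True, "statistics": False}
features = available_features(sample)
print(features)
```

Checking these flags first avoids calling `search_dataset` or `filter_rows` on a dataset that does not support them.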

`list_splits`

Get all available configurations and splits for a dataset.

list_splits(dataset: string)
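The config and split names returned here are the values the other tools expect. A small sketch of grouping them, assuming the response follows the Dataset Viewer API's `/splits` shape (a `splits` list of objects with `config` and `split` keys); the sample data is illustrative.

```python
from collections import defaultdict


def splits_by_config(response: dict) -> dict:
    """Group split names under their configuration name."""
    grouped = defaultdict(list)
    for entry in response.get("splits", []):
        grouped[entry["config"]].append(entry["split"])
    return dict(grouped)


# Illustrative response shape.
sample = {
    "splits": [
        {"dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train"},
        {"dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "test"},
    ]
}
grouped = splits_by_config(sample)
print(grouped)
```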

`get_dataset_info`

Get the schema, metadata, and row counts for a dataset configuration.

get_dataset_info(dataset: string, config: string)

`get_rows`

Fetch a slice of rows from a dataset split.

get_rows(dataset: string, config: string, split: string, offset?: number, length?: number)
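Reading more than one slice means stepping `offset` forward by `length`. The sketch below generates the windows, assuming a page size of 100, which is the maximum slice length the Dataset Viewer API documents for `/rows`; whether this tool enforces the same cap is an assumption.

```python
def page_windows(total_rows: int, page_size: int = 100):
    """Yield (offset, length) pairs that cover total_rows in order."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)


windows = list(page_windows(250))
print(windows)
```

Each pair can be passed as the `offset` and `length` arguments of one `get_rows` call. The total row count can come from `get_dataset_size`.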

`search_dataset`

Full-text search within a dataset split using BM25 ranking.

search_dataset(dataset: string, config: string, split: string, query: string, offset?: number, length?: number)
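For readers unfamiliar with BM25, the following is a textbook scoring sketch over whitespace-tokenized text. It only illustrates how the ranking behaves; the tokenization, the parameters `k1` and `b`, and the IDF variant used by the actual search backend are not documented here and may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score each document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


docs = ["an amazing movie with an amazing cast", "a dull movie", "great soundtrack"]
scores = bm25_scores("amazing movie", docs)
print(scores)
```

Documents matching more query terms, and rarer ones, score higher, while documents with no matching term score zero.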

`filter_rows`

Filter dataset rows using SQL-like WHERE conditions.

filter_rows(dataset: string, config: string, split: string, where: string, orderby?: string, offset?: number, length?: number)

WHERE syntax: put column names in double quotes and string values in single quotes. Supported operators: =, <>, >, <, >=, <=, AND, OR, NOT.

Example: "label"=1 AND "text" LIKE '%hello%'
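Building the `where` string by hand is error-prone because of the two quoting styles. Here is a hedged sketch of a small builder; the helper names are hypothetical, and escaping an embedded quote by doubling it is standard SQL practice that this README does not confirm for the API.

```python
COMPARISON_OPS = {"=", "<>", ">", "<", ">=", "<="}


def quote_column(name: str) -> str:
    """Wrap a column name in double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value) -> str:
    """Leave numbers bare; wrap everything else in single quotes."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def condition(column: str, op: str, value) -> str:
    """Build one comparison such as "label"=1."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{quote_column(column)}{op}{quote_value(value)}"


where = " AND ".join([condition("label", "=", 1), condition("text", "<>", "it's fine")])
print(where)
```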

`get_dataset_size`

Get row counts and byte sizes for all configs and splits.

get_dataset_size(dataset: string)
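A sketch of turning the result into a readable summary, assuming it mirrors the Dataset Viewer API's `/size` response, where `size.splits` entries carry `config`, `split`, `num_rows`, and `num_bytes_parquet_files`. The numbers in the sample are invented.

```python
def format_bytes(n: int) -> str:
    """Render a byte count with binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def summarize_splits(response: dict) -> list:
    """Produce one summary line per split."""
    return [
        f'{s["config"]}/{s["split"]}: {s["num_rows"]} rows, '
        f'{format_bytes(s["num_bytes_parquet_files"])}'
        for s in response["size"]["splits"]
    ]


# Illustrative response with made-up numbers.
sample = {"size": {"splits": [
    {"config": "plain_text", "split": "train", "num_rows": 1000, "num_bytes_parquet_files": 1536},
]}}
summary = summarize_splits(sample)
print(summary)
```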

`list_parquet_files`

Get URLs for the dataset's Parquet files for direct download or processing.

list_parquet_files(dataset: string)

`get_statistics`

Get descriptive statistics for each column in a dataset split.

get_statistics(dataset: string, config: string, split: string)

Examples

Find text classification datasets

search_datasets(filter: ["task_categories:text-classification"], sort: "downloads", limit: 10)

Get IMDB dataset info

list_splits(dataset: "stanfordnlp/imdb")
get_dataset_info(dataset: "stanfordnlp/imdb", config: "plain_text")

Fetch rows from a dataset

get_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 10)

Search for specific content

search_dataset(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", query: "amazing movie")

Filter rows

filter_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", where: "\"label\"=1", length: 10)
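MCP clients normally issue these calls for you, but it can help to see what travels over the wire. In the Model Context Protocol a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds that message for the filter example; `tool_call` is a hypothetical helper, and transport details (stdio framing, initialization handshake) are left out.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call("filter_rows", {
    "dataset": "stanfordnlp/imdb",
    "config": "plain_text",
    "split": "train",
    "where": '"label"=1',
    "length": 10,
})
print(json.dumps(request))
```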

License

MIT

Try it

Find the top 5 most downloaded text classification datasets on Hugging Face.
Get the schema and row count for the plain_text configuration of the stanfordnlp/imdb dataset.
Fetch the first 10 rows from the training split of the stanfordnlp/imdb dataset.
Search for rows containing 'amazing movie' in the stanfordnlp/imdb dataset.
Filter the IMDB dataset to show rows where the label is 1.

Frequently Asked Questions

What are the key features of HF Dataset?

HF Dataset can search and discover datasets on the Hugging Face Hub, fetch rows from specific dataset splits, run full-text search within a dataset using BM25, filter rows with SQL-like syntax, and retrieve dataset statistics and schema information.

What can I use HF Dataset for?

Use it to explore dataset contents without downloading large files, filter and search public ML datasets programmatically, check dataset accessibility and viewer feature support, and retrieve direct download links for Parquet files.

How do I install HF Dataset?

Install HF Dataset by running: npx @cfahlgren1/hf-dataset-mcp

What MCP clients work with HF Dataset?

HF Dataset works with any MCP-compatible client including Claude Desktop, Claude Code, Cursor, and other editors with MCP support.
