How to Download Polymarket Data

Updated March 2026 · 8 min read
TL;DR: There are three ways to get Polymarket data. The Polymarket REST API gives you market info and recent activity. For full historical trade data (865M+ events since 2022), you need either on-chain indexing or pre-built Parquet dumps.

Polymarket is the largest prediction market by volume, running on the Polygon blockchain. Whether you want to backtest a trading strategy, analyze whale behavior, or build a dashboard, you need the underlying data. This guide covers every way to get it, from free API calls to full historical dumps.

What Polymarket data exists

Polymarket trades happen on-chain through two smart contracts on Polygon: the CTF Exchange and the NegRisk Exchange. Every trade emits an OrderFilled event with the maker, taker, amounts, and token IDs.

Beyond fills, positions change through splits, merges, redemptions, and ERC-1155 transfers. A complete picture of any wallet's P&L requires tracking all of these events, not just trades.
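To see why every event type matters, here's a toy sketch of one wallet's YES-token balance replayed across all four event kinds. The event shapes and numbers are made up for illustration, not the on-chain schema:

```python
# Hypothetical event stream for one wallet's YES token on a single market.
# Field names and values are illustrative only.
events = [
    {"type": "fill", "side": "buy", "tokens": 100},  # bought 100 YES on the exchange
    {"type": "split", "tokens": 50},                 # split collateral into +50 YES (+50 NO)
    {"type": "merge", "tokens": 30},                 # merged 30 YES + 30 NO back to collateral
    {"type": "redemption", "tokens": 120},           # redeemed 120 winning YES at resolution
]

def yes_balance(events):
    """Net YES-token balance after replaying every event type."""
    balance = 0
    for e in events:
        if e["type"] == "fill":
            balance += e["tokens"] if e["side"] == "buy" else -e["tokens"]
        elif e["type"] == "split":
            balance += e["tokens"]
        elif e["type"] in ("merge", "redemption"):
            balance -= e["tokens"]
    return balance

print(yes_balance(events))  # 100 + 50 - 30 - 120 = 0
```

Looking at fills alone, this wallet appears to hold 100 tokens; replaying all four event types shows it actually ended flat.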

Here's what's available:

| Data | Records | Since |
|---|---|---|
| Order fills (trades) | 865M+ | 2022 |
| Wallet positions | 150M+ | 2022 |
| Splits & merges | Millions | 2022 |
| Redemptions (payouts) | Millions | 2022 |
| Position conversions | Millions | 2023 |
| Markets (metadata) | Thousands | 2022 |

Option 1: Polymarket REST API (free, limited)

Polymarket runs a public API at https://clob.polymarket.com and a Gamma API at https://gamma-api.polymarket.com. These are useful for current market data but have significant limitations for historical analysis.

What you can get

Market metadata (question, slug, condition and token IDs), current prices, order books, and recent trade activity: the data an app needs to display live markets.

Example: fetch active markets

# Get active markets
curl "https://gamma-api.polymarket.com/markets?active=true&limit=10" | python3 -m json.tool

# Get a specific market
curl "https://gamma-api.polymarket.com/markets?slug=will-trump-win-2024"
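The same requests are easy to script in Python. This sketch only builds the paginated URLs; the limit parameter appears in the curl example above, while offset is an assumption based on common REST pagination, so verify the exact parameter names against the Gamma API docs:

```python
BASE = "https://gamma-api.polymarket.com/markets"

def page_urls(pages, limit=100):
    """URLs for the first `pages` pages of active markets, `limit` per page."""
    return [f"{BASE}?active=true&limit={limit}&offset={p * limit}" for p in range(pages)]

for url in page_urls(3):
    print(url)
```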

What you can't do

The API is designed for app integrations, not data analysis. It returns paginated JSON with fixed fields and no way to filter, aggregate, or query across the full dataset. Here are real examples of questions you simply cannot answer with it:

Not possible with the API

Get all trades in a date range. There's no from_date or to_date parameter. You can't say "give me every trade from October 2024" to analyze the US election period. You'd have to paginate through the entire history and filter client-side.

Find all positions larger than $10,000. There's no amount_gt filter. You can't ask "show me every wallet holding more than $10K on any market." The API has no concept of aggregated positions at all.

Filter by average entry price. Want to find wallets that bought YES above $0.80? The API doesn't track average cost basis. That requires replaying every fill, split, and merge for every wallet — something only an indexer can compute.

Rank wallets by realized P&L. "Who made the most money on the 2024 election?" requires computing P&L across fills, redemptions, and merges for every wallet on every related market. The API returns individual trades, not portfolio-level analytics.

Cross-market analysis. "What percentage of wallets that traded the Trump market also traded the Fed rate market?" Impossible — you can't join across markets, and there's no way to query by wallet across all markets at once.

With Parquet dumps and DuckDB, each of these becomes a single SQL query. Filter by date, aggregate by wallet, compute P&L, join across markets: it's all just SQL on local files.

-- Trades in October 2024 (election month)
-- Half-open range: BETWEEN would cut off everything after midnight on Oct 31
SELECT * FROM 'order_filled_events/*.parquet'
WHERE timestamp >= '2024-10-01' AND timestamp < '2024-11-01';

-- Wallets holding more than $10K on any market
SELECT user_address, token_id, amount / 1e6 as amount_usdc
FROM 'positions/positions_*.parquet'
WHERE amount > 10000000000  -- $10K in raw units (USDC has 6 decimals)
ORDER BY amount DESC;

-- Top 20 wallets by realized P&L
SELECT user_address, SUM(realized_pnl) / 1e6 as total_pnl
FROM 'positions/positions_*.parquet'
GROUP BY user_address
ORDER BY total_pnl DESC
LIMIT 20;
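The cross-market question works the same way once the files are local. Here's a minimal Python sketch of the overlap computation itself, using made-up wallet sets standing in for the results of two distinct-trader queries on the Parquet files:

```python
# Hypothetical distinct-trader sets for two markets (in practice, the result of
# a SELECT DISTINCT maker query per market against the order_filled_events files)
trump_traders = {"0xaaa1", "0xbbb2", "0xccc3", "0xddd4"}
fed_traders = {"0xbbb2", "0xddd4", "0xeee5"}

# Share of Trump-market traders who also traded the Fed market
overlap = len(trump_traders & fed_traders) / len(trump_traders)
print(f"{overlap:.0%}")  # 50%
```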

Other limitations

The Polymarket API works well for building a live dashboard or getting current prices. It does not work for backtesting, P&L analysis, or anything that requires the complete trade history.

Option 2: Index the blockchain yourself (free, complex)

All Polymarket data lives on Polygon. You can index it yourself by running a node or using an RPC provider to fetch event logs from the relevant contracts.

The contracts

| Contract | Address | Events |
|---|---|---|
| CTF Exchange | 0x4bFb...982E | OrderFilled |
| NegRisk Exchange | 0xC5d5...0f80a | OrderFilled |
| ConditionalTokens | 0x4D97...0045 | Split, Merge, Redeem, Transfer |
| NegRisk Adapter | 0xd91E...5296 | Conversions |

Example: fetch OrderFilled logs with Python

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://polygon-rpc.com"))

# CTF Exchange OrderFilled topic
topic = "0xd0a08e8c493f9c94f29311604c9de1d4e1f89571..."

logs = w3.eth.get_logs({
    "fromBlock": 55000000,
    "toBlock": 55001000,
    "address": "0x4bFb41d5B3570DeFd03C39a9A4D8dE6Bd8B8982E",
    "topics": [topic]
})

print(f"Found {len(logs)} fills in 1000 blocks")

The challenge

This approach is free but comes with real engineering costs:

- Backfilling 865M+ events means millions of get_logs calls; public RPC endpoints rate-limit you and cap the block range per request.
- Raw logs must be decoded against each contract's ABI, and chain reorgs have to be handled.
- Positions, average entry prices, and P&L are not on-chain fields; you have to compute them by replaying every fill, split, merge, and redemption.
- New markets and contracts keep appearing, so the pipeline needs ongoing maintenance.

If you have the engineering resources and want full control, this is the way. Budget 2-4 weeks for a production-quality indexer.
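One concrete piece of that work: RPC providers typically cap how many blocks a single get_logs call may scan, so a backfill has to be chunked into many small range queries. A minimal sketch, where the 2,000-block chunk size is an assumption (caps vary by provider):

```python
def block_chunks(start, end, size=2000):
    """Yield inclusive (from_block, to_block) ranges for chunked get_logs calls."""
    for lo in range(start, end + 1, size):
        yield lo, min(lo + size - 1, end)

# e.g. a 5,000-block window becomes three get_logs calls
print(list(block_chunks(55_000_000, 55_004_999)))
```

Each yielded pair plugs straight into the fromBlock/toBlock fields of the get_logs request shown above.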

Option 3: Pre-built Parquet dumps

The fastest path from zero to analysis. Instead of indexing the chain yourself, download the complete dataset as Parquet files and load them into whatever tool you use — Python, DuckDB, PostgreSQL, ClickHouse, or even Excel.

Example: query Parquet directly with DuckDB

-- Query local Parquet files with DuckDB
SELECT maker, SUM(maker_amount_filled) / 1e6 as volume_usdc
FROM 'order_filled_events/20250115.parquet'
GROUP BY maker
ORDER BY volume_usdc DESC
LIMIT 20;

Example: load into Python with pandas

import pandas as pd

# Read daily Parquet files
df = pd.read_parquet("order_filled_events/20250115.parquet")

# Top markets by trade count
df.groupby("taker_asset_id").size().sort_values(ascending=False).head(10)

What's in the dumps

| Table | Records | Description |
|---|---|---|
| order_filled_events | 865M+ | Every fill from both exchanges. Maker, taker, amounts in USDC. |
| positions | 150M+ | Daily snapshot of every wallet's position. Amount, avg price, realized P&L. |
| position_splits | Millions | When users split collateral into outcome tokens. |
| position_merges | Millions | When users merge outcome tokens back into collateral. |
| payout_redemptions | Millions | Winning payouts after market resolution. |
| position_conversions | Millions | NegRisk position conversions. |
| markets | Thousands | Market metadata: question, slug, condition ID, token IDs. |

Files are split by day (e.g., order_filled_events/20250115.parquet) so you can download only the period you need.

All amounts are in raw on-chain units. Divide by 1e6 for USDC values. Token IDs are stored as strings to preserve uint256 precision.
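As a concrete check on those conventions (the values below are made up):

```python
# A raw amount from a Parquet row (6-decimal USDC units, made-up value)
raw_amount = 2_500_000
usdc = raw_amount / 1e6
print(usdc)  # 2.5

# Token IDs are stored as strings; parse to int only when you need the value
token_id = "123456789012345678901234567890123456789012345678901234567890"  # hypothetical uint256
assert int(token_id) > 2**64  # wouldn't fit a 64-bit integer, hence string storage
```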

Comparison: which option should you pick?

| | Polymarket API | Self-index | Parquet dumps |
|---|---|---|---|
| Cost | Free | Free + infra | From $49/mo |
| Historical data | Limited | Full | Full (since 2022) |
| Setup time | Minutes | Weeks | Minutes |
| Maintenance | None | Ongoing | None |
| Positions / P&L | No | You build it | Pre-computed |
| Date filters | No | You build it | SQL WHERE clause |
| Cross-market joins | No | You build it | SQL JOIN |
| Real-time | Yes | You build it | Pro plan: sub-second |
| Best for | Dashboards, current prices | Full control, custom logic | Backtesting, research, analytics |

Getting started with DuckDB (fastest path)

DuckDB queries Parquet files directly, local or over HTTP, with no database server to run and no import step. This is the fastest way to start exploring Polymarket data.

# Install DuckDB (macOS)
brew install duckdb

# Or with pip
pip install duckdb
# Launch the DuckDB shell, then paste the query below
duckdb

-- Total USDC volume per month
SELECT
  strftime(timestamp, '%Y-%m') as month,
  round(SUM(maker_amount_filled) / 1e6, 2) as volume_usdc,
  COUNT(*) as trades
FROM 'order_filled_events/*.parquet'
GROUP BY month
ORDER BY month;

Ready to download Polymarket data?

865M+ trades, 150M+ positions. Parquet. Updated daily.

Get started

FAQ

Can I get Polymarket data for free?

Current market data, yes — through the Polymarket REST API. Full historical trade data requires either building your own indexer (free but weeks of engineering) or using a data provider.

How far back does the data go?

The CTF Exchange launched in 2022; the NegRisk Exchange followed in 2023. The dumps include all events from both exchanges since deployment.

What format are the dumps in?

Everything is Parquet (ZSTD-compressed, columnar). Use DuckDB for the fastest experience — it reads Parquet natively over HTTP with zero setup. pandas, Spark, and ClickHouse also work out of the box.

How do I track a specific wallet's P&L?

You need to replay all fills, splits, merges, and redemptions for that wallet's token IDs. The positions table in the dumps has this pre-computed as a daily snapshot with avg_price and realized_pnl.
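A toy version of that replay, using average-cost accounting. The fills, prices, and method details here are illustrative assumptions; the positions table ships the actual computed values:

```python
# Hypothetical buys of one YES token: (tokens, price in USDC)
buys = [(100, 0.40), (100, 0.60)]

qty = sum(q for q, _ in buys)
cost = sum(q * p for q, p in buys)
avg_price = cost / qty            # (40 + 60) / 200 = 0.50

# Market resolves YES: redeem 150 of the tokens at $1.00 each
realized_pnl = 150 * (1.00 - avg_price)
print(avg_price, realized_pnl)  # 0.5 75.0
```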

Is this the same data as Polymarket's API?

No. Polymarket's API serves market metadata and recent activity. The on-chain data includes every transaction that ever happened — including trades through aggregators, splits, merges, and direct transfers that never touch Polymarket's frontend.