How to Download Polymarket Data

Updated March 2026 · 8 min read
TL;DR: There are three ways to get Polymarket data. The Polymarket REST API gives you market info and recent activity. For full historical trade data (865M+ events since 2022), you need either on-chain indexing or pre-built Parquet dumps.

Polymarket is the largest prediction market by volume, running on the Polygon blockchain. Whether you want to backtest a trading strategy, analyze whale behavior, or build a dashboard, you need the underlying data. This guide covers every way to get it, from free API calls to full historical dumps.

What Polymarket data exists

Polymarket trades happen on-chain through two smart contracts on Polygon: the CTF Exchange and the NegRisk Exchange. Every trade emits an OrderFilled event with the maker, taker, amounts, and token IDs.

Beyond fills, positions change through splits, merges, redemptions, and ERC-1155 transfers. A complete picture of any wallet's P&L requires tracking all of these events, not just trades.
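To see why every event type matters, here's a toy sketch of one wallet's YES-token balance replayed across all four event kinds. The event shapes and numbers are made up for illustration, not the on-chain schema:

```python
# Hypothetical event stream for one wallet's YES token on a single market.
# Field names and values are illustrative only.
events = [
    {"type": "fill", "side": "buy", "tokens": 100},  # bought 100 YES on the exchange
    {"type": "split", "tokens": 50},                 # split collateral into +50 YES (+50 NO)
    {"type": "merge", "tokens": 30},                 # merged 30 YES + 30 NO back to collateral
    {"type": "redemption", "tokens": 120},           # redeemed 120 winning YES at resolution
]

def yes_balance(events):
    """Net YES-token balance after replaying every event type."""
    balance = 0
    for e in events:
        if e["type"] == "fill":
            balance += e["tokens"] if e["side"] == "buy" else -e["tokens"]
        elif e["type"] == "split":
            balance += e["tokens"]
        elif e["type"] in ("merge", "redemption"):
            balance -= e["tokens"]
    return balance

print(yes_balance(events))  # 100 + 50 - 30 - 120 = 0
```

Looking at fills alone, this wallet appears to hold 100 tokens; replaying all four event types shows it actually ended flat.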

Here's what's available:

| Data | Records | Since |
|---|---|---|
| Order fills (trades) | 865M+ | 2022 |
| Wallet positions | 150M+ | 2022 |
| Splits & merges | Millions | 2022 |
| Redemptions (payouts) | Millions | 2022 |
| Position conversions | Millions | 2023 |
| Markets (metadata) | Thousands | 2022 |

Option 1: Polymarket REST API (free, limited)

Polymarket runs a public API at https://clob.polymarket.com and a Gamma API at https://gamma-api.polymarket.com. These are useful for current market data but have significant limitations for historical analysis.

What you can get

Market metadata (question, slug, condition and token IDs), current prices, order books, and recent trade activity: the data an app needs to display live markets.

Example: fetch active markets

# Get active markets
curl "https://gamma-api.polymarket.com/markets?active=true&limit=10" | python3 -m json.tool

# Get a specific market
curl "https://gamma-api.polymarket.com/markets?slug=will-trump-win-2024"
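The same requests are easy to script in Python. This sketch only builds the paginated URLs; the limit parameter appears in the curl example above, while offset is an assumption based on common REST pagination, so verify the exact parameter names against the Gamma API docs:

```python
BASE = "https://gamma-api.polymarket.com/markets"

def page_urls(pages, limit=100):
    """URLs for the first `pages` pages of active markets, `limit` per page."""
    return [f"{BASE}?active=true&limit={limit}&offset={p * limit}" for p in range(pages)]

for url in page_urls(3):
    print(url)
```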

What you can't do

The API is designed for app integrations, not data analysis. It returns paginated JSON with fixed fields and no way to filter, aggregate, or query across the full dataset. Here are real examples of questions you simply cannot answer with it:

Not possible with the API

Get all trades in a date range. There's no from_date or to_date parameter. You can't say "give me every trade from October 2024" to analyze the US election period. You'd have to paginate through the entire history and filter client-side.

Find all positions larger than $10,000. There's no amount_gt filter. You can't ask "show me every wallet holding more than $10K on any market." The API has no concept of aggregated positions at all.

Filter by average entry price. Want to find wallets that bought YES above $0.80? The API doesn't track average cost basis. That requires replaying every fill, split, and merge for every wallet — something only an indexer can compute.

Rank wallets by realized P&L. "Who made the most money on the 2024 election?" requires computing P&L across fills, redemptions, and merges for every wallet on every related market. The API returns individual trades, not portfolio-level analytics.

Cross-market analysis. "What percentage of wallets that traded the Trump market also traded the Fed rate market?" Impossible — you can't join across markets, and there's no way to query by wallet across all markets at once.

With Parquet dumps and DuckDB, each of these becomes a single SQL query. Filter by date, aggregate by wallet, compute P&L, join across markets: it's all just SQL on local files.

-- Trades in October 2024 (election month)
-- Half-open range: BETWEEN would cut off everything after midnight on Oct 31
SELECT * FROM 'order_filled_events/*.parquet'
WHERE timestamp >= '2024-10-01' AND timestamp < '2024-11-01';

-- Wallets holding more than $10K on any market
SELECT user_address, token_id, amount / 1e6 as amount_usdc
FROM 'positions/positions_*.parquet'
WHERE amount > 10000000000  -- $10K in raw units (USDC has 6 decimals)
ORDER BY amount DESC;

-- Top 20 wallets by realized P&L
SELECT user_address, SUM(realized_pnl) / 1e6 as total_pnl
FROM 'positions/positions_*.parquet'
GROUP BY user_address
ORDER BY total_pnl DESC
LIMIT 20;
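The cross-market question works the same way once the files are local. Here's a minimal Python sketch of the overlap computation itself, using made-up wallet sets standing in for the results of two distinct-trader queries on the Parquet files:

```python
# Hypothetical distinct-trader sets for two markets (in practice, the result of
# a SELECT DISTINCT maker query per market against the order_filled_events files)
trump_traders = {"0xaaa1", "0xbbb2", "0xccc3", "0xddd4"}
fed_traders = {"0xbbb2", "0xddd4", "0xeee5"}

# Share of Trump-market traders who also traded the Fed market
overlap = len(trump_traders & fed_traders) / len(trump_traders)
print(f"{overlap:.0%}")  # 50%
```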

Other limitations

The Polymarket API works well for building a live dashboard or getting current prices. It does not work for backtesting, P&L analysis, or anything that requires the complete trade history.

Option 2: Index the blockchain yourself (free, complex)

All Polymarket data lives on Polygon. You can index it yourself by running a node or using an RPC provider to fetch event logs from the relevant contracts.

The contracts

| Contract | Address | Events |
|---|---|---|
| CTF Exchange | 0x4bFb...982E | OrderFilled |
| NegRisk Exchange | 0xC5d5...0f80a | OrderFilled |
| ConditionalTokens | 0x4D97...0045 | Split, Merge, Redeem, Transfer |
| NegRisk Adapter | 0xd91E...5296 | Conversions |

Example: fetch OrderFilled logs with Python

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://polygon-rpc.com"))

# CTF Exchange OrderFilled topic
topic = "0xd0a08e8c493f9c94f29311604c9de1d4e1f89571..."

logs = w3.eth.get_logs({
    "fromBlock": 55000000,
    "toBlock": 55001000,
    "address": "0x4bFb41d5B3570DeFd03C39a9A4D8dE6Bd8B8982E",
    "topics": [topic]
})

print(f"Found {len(logs)} fills in 1000 blocks")

The challenge

This approach is free but comes with real engineering costs:

- Backfilling 865M+ events means millions of get_logs calls; public RPC endpoints rate-limit you and cap the block range per request.
- Raw logs must be decoded against each contract's ABI, and chain reorgs have to be handled.
- Positions, average entry prices, and P&L are not on-chain fields; you have to compute them by replaying every fill, split, merge, and redemption.
- New markets and contracts keep appearing, so the pipeline needs ongoing maintenance.

If you have the engineering resources and want full control, this is the way. Budget 2-4 weeks for a production-quality indexer.
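One concrete piece of that work: RPC providers typically cap how many blocks a single get_logs call may scan, so a backfill has to be chunked into many small range queries. A minimal sketch, where the 2,000-block chunk size is an assumption (caps vary by provider):

```python
def block_chunks(start, end, size=2000):
    """Yield inclusive (from_block, to_block) ranges for chunked get_logs calls."""
    for lo in range(start, end + 1, size):
        yield lo, min(lo + size - 1, end)

# e.g. a 5,000-block window becomes three get_logs calls
print(list(block_chunks(55_000_000, 55_004_999)))
```

Each yielded pair plugs straight into the fromBlock/toBlock fields of the get_logs request shown above.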

Option 3: Pre-built Parquet dumps

The fastest path from zero to analysis. Instead of indexing the chain yourself, download the complete dataset as Parquet files and load them into whatever tool you use — Python, DuckDB, PostgreSQL, ClickHouse, or even Excel.

Example: query Parquet directly with DuckDB

-- Query local Parquet files with DuckDB
SELECT maker, SUM(maker_amount_filled) / 1e6 as volume_usdc
FROM 'order_filled_events/20250115.parquet'
GROUP BY maker
ORDER BY volume_usdc DESC
LIMIT 20;

Example: load into Python with pandas

import pandas as pd

# Read daily Parquet files
df = pd.read_parquet("order_filled_events/20250115.parquet")

# Top markets by trade count
df.groupby("taker_asset_id").size().sort_values(ascending=False).head(10)

What's in the dumps

| Table | Records | Description |
|---|---|---|
| order_filled_events | 865M+ | Every fill from both exchanges. Maker, taker, amounts in USDC. |
| positions | 150M+ | Daily snapshot of every wallet's position. Amount, avg price, realized P&L. |
| position_splits | Millions | When users split collateral into outcome tokens. |
| position_merges | Millions | When users merge outcome tokens back into collateral. |
| payout_redemptions | Millions | Winning payouts after market resolution. |
| position_conversions | Millions | NegRisk position conversions. |
| markets | Thousands | Market metadata: question, slug, condition ID, token IDs. |

Files are split by day (e.g., order_filled_events/20250115.parquet) so you can download only the period you need.

All amounts are in raw on-chain units. Divide by 1e6 for USDC values. Token IDs are stored as strings to preserve uint256 precision.
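As a concrete check on those conventions (the values below are made up):

```python
# A raw amount from a Parquet row (6-decimal USDC units, made-up value)
raw_amount = 2_500_000
usdc = raw_amount / 1e6
print(usdc)  # 2.5

# Token IDs are stored as strings; parse to int only when you need the value
token_id = "123456789012345678901234567890123456789012345678901234567890"  # hypothetical uint256
assert int(token_id) > 2**64  # wouldn't fit a 64-bit integer, hence string storage
```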

Comparison: which option should you pick?

| | Polymarket API | Self-index | Parquet dumps |
|---|---|---|---|
| Cost | Free | Free + infra | From $49/mo |
| Historical data | Limited | Full | Full (since 2022) |
| Setup time | Minutes | Weeks | Minutes |
| Maintenance | None | Ongoing | None |
| Positions / P&L | No | You build it | Pre-computed |
| Date filters | No | You build it | SQL WHERE clause |
| Cross-market joins | No | You build it | SQL JOIN |
| Real-time | Yes | You build it | Pro plan: sub-second |
| Best for | Dashboards, current prices | Full control, custom logic | Backtesting, research, analytics |

Getting started with DuckDB (fastest path)

DuckDB queries Parquet files directly, local or over HTTP, with no database server to run and no import step. This is the fastest way to start exploring Polymarket data.

# Install DuckDB (macOS)
brew install duckdb

# Or with pip
pip install duckdb
# Launch the DuckDB shell, then paste the query below
duckdb

-- Total USDC volume per month
SELECT
  strftime(timestamp, '%Y-%m') as month,
  round(SUM(maker_amount_filled) / 1e6, 2) as volume_usdc,
  COUNT(*) as trades
FROM 'order_filled_events/*.parquet'
GROUP BY month
ORDER BY month;

Ready to download Polymarket data?

865M+ trades, 150M+ positions. Parquet. Updated daily.

Get started

FAQ

Can I get Polymarket data for free?

Current market data, yes — through the Polymarket REST API. Full historical trade data requires either building your own indexer (free but weeks of engineering) or using a data provider.

How far back does the data go?

The CTF Exchange launched in 2022; the NegRisk Exchange followed in 2023. The dumps include all events from both exchanges since deployment.

What format are the dumps in?

Everything is Parquet (ZSTD-compressed, columnar). Use DuckDB for the fastest experience — it reads Parquet natively over HTTP with zero setup. pandas, Spark, and ClickHouse also work out of the box.

How do I track a specific wallet's P&L?

You need to replay all fills, splits, merges, and redemptions for that wallet's token IDs. The positions table in the dumps has this pre-computed as a daily snapshot with avg_price and realized_pnl.
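A toy version of that replay, using average-cost accounting. The fills, prices, and method details here are illustrative assumptions; the positions table ships the actual computed values:

```python
# Hypothetical buys of one YES token: (tokens, price in USDC)
buys = [(100, 0.40), (100, 0.60)]

qty = sum(q for q, _ in buys)
cost = sum(q * p for q, p in buys)
avg_price = cost / qty            # (40 + 60) / 200 = 0.50

# Market resolves YES: redeem 150 of the tokens at $1.00 each
realized_pnl = 150 * (1.00 - avg_price)
print(avg_price, realized_pnl)  # 0.5 75.0
```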

Is this the same data as Polymarket's API?

No. Polymarket's API serves market metadata and recent activity. The on-chain data includes every transaction that ever happened — including trades through aggregators, splits, merges, and direct transfers that never touch Polymarket's frontend.