# predmktdata

> On-chain Polymarket data feed. Parquet dumps + real-time API. Full history + daily updates.

predmktdata indexes every Polymarket event on Polygon and serves it as Parquet. Two plans:

- **Snapshot ($49/mo)**: Full history since 2022 as daily Parquet dumps. 865M+ fills, 150M+ positions, markets lookup. End-of-day updates (05:00 UTC). No real-time API, no per-wallet queries. 500 dump downloads/day.
- **Pro ($149/mo)**: Everything in Snapshot + real-time API (GET /events, GET /user/*) + WebSocket feed (WS /ws/feed) + firehose stream (WS /ws/stream). Poll for new data every second, subscribe to the WebSocket for push notifications, or stream all events to keep your own database in sync. Freshness: <1 second (indexes at the chain tip with automatic reorg recovery). 1,000 dump downloads/day.

Pending users (signed up but not subscribed) can access sample data only (50 dump downloads/day).

All paid plans include: full history, positions with avg_price and realized_pnl, markets lookup, Parquet format.

## Authentication

All endpoints require an `x-api-key` header. Get a key by signing in with Google at predmktdata.com.

To check your plan, try GET /dumps — if you get 403, you need Snapshot or Pro. If GET /events returns 403, you need Pro.

## API

**Base URL: https://api.predmktdata.com**

All endpoints below are relative to this base URL. For example, GET /status means GET https://api.predmktdata.com/status.

### GET /health

Unauthenticated health check for Docker/load balancer probes.

Response:

```json
{"ok": true, "block": 84570000}
```

### GET /status

Returns current indexer state and row counts (JSON).

Response:

```json
{
  "last_block": 84570000,
  "head_block": 84570000,
  "tables": {
    "order_filled_events": 854000000,
    "positions": 150000000,
    "payout_redemptions": 93000000,
    "position_splits": 7700000,
    "position_merges": 7800000,
    "position_conversions": 1800000
  }
}
```

### GET /events (Pro only)

Fetch events as CSV. One table per request.
Supports gzip and zstd compression.

Parameters:

- `after_block` (required): Return events after this block number
- `limit` (default: 5000, max: 5000): Max blocks to include
- `tables` (required): Exactly one table name per request

Available tables: order_filled_events, position_splits, position_merges, payout_redemptions, position_conversions, positions

Note: Only one table can be requested per call. To sync multiple tables, make separate requests for each.

Response headers:

- `x-after-block`: The after_block you requested
- `x-last-block`: Last block included in this response (use as the next after_block)
- `x-head-block`: Current chain head
- `x-reorg`: Present if the client is ahead of the indexer (roll back to this block)

Response body: CSV with header row.

Example:

```
curl --compressed -H "x-api-key: YOUR_KEY" \
  "https://api.predmktdata.com/events?after_block=84420000&limit=500&tables=order_filled_events"
```

### GET /user/{address}/positions (Pro only)

All current positions for a wallet address, ordered by most recent activity. CSV response.

Includes market metadata (question, outcome, market_slug) via an automatic join with the markets table — empty for tokens not in markets.

Parameters:

- `limit` (default: 10000, max: 10000): Max rows to return
- `offset` (default: 0): Skip this many rows (for pagination)
- `min_amount` (default: 0): Minimum position amount (raw units). Use `min_amount=1` to get only active (non-zero) positions — this filters out closed positions and can dramatically reduce response size and latency for wallets with a large history.

Columns: user_address, token_id, amount, avg_price, realized_pnl, unrealized_pnl, total_pnl, total_bought, last_block, block_timestamp, question, outcome, market_slug

unrealized_pnl and total_pnl are computed from current market prices, which are updated every minute via marketsync. unrealized_pnl is the mark-to-market gain/loss on open positions. total_pnl = realized_pnl + unrealized_pnl.
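As a concrete illustration of the scaling (raw i64 units, with amounts and avg_price both scaled by 1e6 — see the Notes section), a small helper might convert a positions row to decimals like this. This is a sketch; the row values are illustrative, not real API output:

```python
# Sketch: convert a raw positions row to human-readable decimal values.
# Assumes the 1e6 scaling described in this document; the example row
# below is illustrative, not real API output.
SCALE = 1e6

def humanize_position(row: dict) -> dict:
    """Convert raw i64 fields from /user/{address}/positions to decimals."""
    return {
        "amount": row["amount"] / SCALE,        # outcome tokens held
        "avg_price": row["avg_price"] / SCALE,  # decimal price in [0, 1]
        "realized_pnl": row["realized_pnl"] / SCALE,
        "unrealized_pnl": row["unrealized_pnl"] / SCALE,
        # total_pnl = realized_pnl + unrealized_pnl, per the definition above
        "total_pnl": (row["realized_pnl"] + row["unrealized_pnl"]) / SCALE,
    }

example = {"amount": 5_000_000, "avg_price": 450_000,
           "realized_pnl": 1_250_000, "unrealized_pnl": -250_000}
print(humanize_position(example))
```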
Example:

```
# All positions (including closed)
curl --compressed -H "x-api-key: YOUR_KEY" \
  "https://api.predmktdata.com/user/0xabc.../positions"

# Only active positions (much faster for wallets with large history)
curl --compressed -H "x-api-key: YOUR_KEY" \
  "https://api.predmktdata.com/user/0xabc.../positions?min_amount=1"

# Paginate
curl --compressed -H "x-api-key: YOUR_KEY" \
  "https://api.predmktdata.com/user/0xabc.../positions?limit=1000&offset=0"
```

### GET /user/{address}/pnl (Pro only)

Current PnL summary across all positions for a wallet address.

realized_pnl is locked-in profit/loss from closed trades. unrealized_pnl is mark-to-market on open positions using live market prices (updated every minute via marketsync). portfolio_value is the current market value of all open positions. All values are in USD (raw amounts divided by 1e6).

Response (JSON):

```json
{
  "address": "0x...",
  "realized_pnl": 12345.67,
  "unrealized_pnl": -234.56,
  "total_pnl": 12111.11,
  "portfolio_value": 5678.90,
  "total_volume": 98765.43,
  "active_positions": 42,
  "markets_traded": 156
}
```

Example:

```
curl --compressed -H "x-api-key: YOUR_KEY" \
  "https://api.predmktdata.com/user/0xabc.../pnl"
```

### GET /user/{address}/fills (Pro only)

All fills where the address is maker or taker. CSV response. Includes market metadata. Results are not ordered — use after_block to paginate chronologically.

Columns: transaction_hash, log_index, block_number, exchange, maker, taker, maker_asset_id, taker_asset_id, maker_amount_filled, taker_amount_filled, fee, side, block_timestamp, question, outcome, market_slug

Parameters:

- `after_block` (default: 0): Only fills after this block
- `limit` (default: 50000, max: 500000): Max rows

### WS /ws/feed (Pro only)

Real-time WebSocket feed of fills and position changes. Connect and subscribe to specific wallets or thresholds — data pushes to you the moment the indexer commits a batch.
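For orientation, a minimal consumer might look like the sketch below (the connect URL and subscribe/message shapes are detailed in this section). The third-party `websockets` package is an assumption — any WebSocket client works:

```python
# Minimal /ws/feed consumer sketch. The "websockets" package is a
# third-party choice (pip install websockets); any WebSocket client works.
import asyncio
import json

def summarize(msg: dict) -> list[str]:
    """Render one feed message as human-readable lines (raw i64 units, /1e6)."""
    lines = [f"fill {f['transaction_hash']}" for f in msg.get("fills", [])]
    lines += [f"position {p['user_address']} amount={p['amount'] / 1e6}"
              for p in msg.get("positions", [])]
    return lines

async def watch_wallet(api_key: str, address: str) -> None:
    import websockets  # third-party: pip install websockets
    url = f"wss://api.predmktdata.com/ws/feed?x_api_key={api_key}"
    async with websockets.connect(url) as ws:
        # Subscribe to one wallet's fills and position changes
        await ws.send(json.dumps(
            {"action": "subscribe", "type": "user", "address": address}))
        async for raw in ws:
            for line in summarize(json.loads(raw)):
                print(line)

# asyncio.run(watch_wallet("YOUR_KEY", "0xabc..."))
```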
Connect: `wss://api.predmktdata.com/ws/feed?x_api_key=YOUR_KEY`

Subscribe messages (send as JSON after connecting):

```json
// Watch a wallet's fills and positions
{"action": "subscribe", "type": "user", "address": "0xabc..."}

// Watch large fills (raw units, ÷1e6 for USDC)
{"action": "subscribe", "type": "threshold", "min_fill_amount": 10000000000}

// Watch cheap positions (low avg_price)
{"action": "subscribe", "type": "threshold", "max_avg_price": 50000}

// Watch large positions (high amount)
{"action": "subscribe", "type": "threshold", "min_position_amount": 1000000000}

// Unsubscribe
{"action": "unsubscribe", "type": "user", "address": "0xabc..."}

// Keepalive
{"action": "ping"}
```

Messages received:

```json
{
  "fills": [{"transaction_hash": "0x...", "maker": "0x...", ...}],
  "positions": [{"user_address": "0x...", "amount": 5000000, ...}]
}
```

Limits: 100 total connections, 5 per API key, 20 user subscriptions per connection. Values are raw i64 (same as the REST API — divide by 1e6).

### WS /ws/stream (Pro only)

Firehose WebSocket for keeping your own database in sync. Connects, catches up from a recent block, then streams all new events and positions as the indexer commits them.

Connect: `wss://api.predmktdata.com/ws/stream?x_api_key=YOUR_KEY&start_block=84650000&tables=order_filled_events,positions`

Parameters (query string):

- `x_api_key` (required): Your API key
- `start_block` (required): Block to start from (max ~900 blocks / 30 min behind head)
- `tables` (optional): Comma-separated list of tables to stream. Default: all. Valid: order_filled_events, payout_redemptions, position_conversions, position_merges, position_splits, positions

Protocol:

1. Server sends catchup batches (100 blocks each) from start_block to head
2. Server sends `{"type": "caught_up", "block": N}` when catchup is complete
3. Server streams live batches as the indexer commits new blocks
4. Client can send `{"action": "ping"}` to get `{"pong": true}`

Each batch message:

```json
{
  "type": "batch",
  "from_block": 84650000,
  "to_block": 84650100,
  "order_filled_events": [{"transaction_hash": "0x...", "maker": "0x...", ...}],
  "positions": [{"user_address": "0x...", "amount": 5000000, ...}]
}
```

Only the tables you subscribed to are included. Empty arrays mean no events in that block range for that table.

Close codes:

- 4001: Unauthorized (missing or invalid API key)
- 4003: Pro plan required
- 4008: Slow consumer (client can't keep up — reconnect with a newer start_block)
- 4029: Connection limit reached
- 4400: start_block too old or invalid tables

Limits: 10 concurrent stream connections, shared with the /ws/feed pool (100 total, 5 per API key).

Recommended usage: Use dumps for the initial backfill, then connect to /ws/stream with start_block = your last processed block to stay in sync. On disconnect, reconnect with the last block you received.

### GET /dumps (Snapshot + Pro)

List available dump files for fast backfill.

Response (JSON):

```json
{
  "files": [
    {"path": "order_filled_events/20250615.parquet", "size": 52428800},
    {"path": "positions/positions_20260320.parquet", "size": 8033986444}
  ],
  "base_url": "/dumps/"
}
```

### GET /dumps/{path} (Snapshot + Pro)

Download a dump file. Returns a 302 redirect to a time-limited download URL (20 min expiry).

Dump file types:

- **Daily events**: `/YYYYMMDD.parquet` — one file per day, full history since 2022
- **Positions**: `positions/positions_YYYYMMDD_.parquet` — full daily snapshot, 2 copies kept
- **Markets**: `markets/markets.parquet` — token_id → question/outcome lookup, overwritten daily
- **UMA resolution**: `uma/resolution_proposals.parquet`, `uma/resolution_disputes.parquet`, `uma/resolution_settlements.parquet` — full snapshots, overwritten daily

## When to use dumps vs the API

Use **dumps** when you need bulk historical data. Dumps are Parquet files (one per day per table).
They are the fastest way to backfill — downloading all history via dumps takes minutes, while syncing the same data through the API would take hours due to rate limits and pagination. Dumps are updated daily at 05:00 UTC.

Use the **real-time API** when you need fresh data (seconds old), per-wallet queries, or streaming updates. The API serves incremental chunks — you poll /events with after_block to get only what changed since your last request, or connect to the WebSocket for push notifications.

**Most Pro users should combine both**: dumps for the initial backfill, then the API to stay current in real time.

| Use case | Best approach | Plan needed |
|----------|--------------|-------------|
| Load all Polymarket history into my database | Dumps | Snapshot |
| Daily batch analytics (end-of-day) | Dumps | Snapshot |
| Real-time trading bot or alert system | API + WebSocket | Pro |
| One-time research / data export | Dumps | Snapshot |
| Live dashboard tracking specific wallets | API (/user/*) + WebSocket | Pro |
| Backfill + stay current | Dumps first, then API polling | Pro |

## Working with dump files

Dumps are Parquet files. DuckDB is the recommended tool for querying them.

### Parquet

Parquet files have proper column types built in (VARCHAR for addresses and token IDs, BIGINT for amounts). No manual type casting is needed:

```sql
-- Query a local Parquet file
SELECT * FROM read_parquet('order_filled_events/20260315.parquet')
WHERE taker = '0xabc...'
LIMIT 100;

-- Load all daily files
CREATE TABLE fills AS
SELECT * FROM read_parquet('order_filled_events/*.parquet');

-- Join with markets
SELECT f.*, m.question, m.outcome_yes
FROM read_parquet('order_filled_events/20260322.parquet') f
LEFT JOIN read_parquet('markets/markets.parquet') m
  ON f.taker_asset_id = m.yes_token_id OR f.taker_asset_id = m.no_token_id
LIMIT 10;
```

### Remote querying (Parquet only, zero download)

Parquet supports HTTP range requests — DuckDB reads only the columns and row groups needed for your query. A filtered query on a 2GB file might transfer only 5-50MB.

To query remotely, get a presigned URL from the API, then pass it to DuckDB:

```python
import requests, duckdb

# Step 1: get a presigned URL (valid 20 min)
r = requests.get(
    "https://api.predmktdata.com/dumps/order_filled_events/20260315.parquet",
    headers={"x-api-key": "YOUR_KEY"},
    allow_redirects=False,
)
url = r.headers["Location"]

# Step 2: query remotely — DuckDB only downloads what it needs
duckdb.sql(f"""
    SELECT taker, count(*) AS fills, sum(maker_amount_filled) / 1e6 AS volume_usdc
    FROM read_parquet('{url}')
    WHERE maker_amount_filled > 1000000000
    GROUP BY 1 ORDER BY 3 DESC LIMIT 20
""").show()
```

From the DuckDB CLI:

```sql
INSTALL httpfs;
LOAD httpfs;
SELECT count(*) FROM read_parquet('PRESIGNED_URL_HERE');
```

DuckDB works from Python (`pip install duckdb`), from the CLI, or as a library in most languages.

## Recommended setup

### Snapshot plan: Parquet dumps only

1. GET /dumps to list available files
2. Download all daily event files (`/YYYYMMDD.parquet`) in parallel
3. Download the latest positions snapshot (`positions/positions_YYYYMMDD_.parquet`)
4. Query directly with DuckDB, or INSERT into your database and UPSERT positions
5. Run daily: download new daily files + a fresh positions snapshot

This gives you all of Polymarket history, updated to yesterday. No API polling needed.

### Pro plan: dumps + real-time API

1. Do the Snapshot setup above for a fast backfill
2. Find MAX(last_block) from the positions Parquet — this is your sync cursor
3. Poll GET /events?after_block={max_last_block}&tables=order_filled_events to sync the gap
4. Do the same for positions and any other tables you need
5. Continue polling every few seconds for near real-time updates

Note: The positions snapshot is not atomic — different rows may have different last_block values. Use MAX(last_block) as your sync cursor. The API will send any updates you missed, and your UPSERT will reconcile them.

### API-only sync (Pro, no dumps)

1. Start with after_block=0 (or any starting block)
2. Request /events?after_block=0&limit=5000&tables=order_filled_events
3. Read the `x-last-block` response header — use it as after_block for the next request
4. Repeat until `x-last-block` == `x-head-block` (caught up)
5. Continue polling every few seconds for near real-time updates
6. Repeat for each table you need (positions, position_splits, etc.)

IMPORTANT: Event tables are append-only — use INSERT. Positions are mutable — the same (user_address, token_id) may appear in multiple chunks. Always use UPSERT (INSERT ... ON CONFLICT (user_address, token_id) DO UPDATE) for positions.

## Data schema

### order_filled_events

Exchange fills from CTF Exchange and NegRisk Exchange.

Columns: timestamp, transaction_hash, block_number, maker, taker, maker_asset_id, taker_asset_id, maker_amount_filled, taker_amount_filled, fee, side (buy|sell)

### position_splits

USDC to outcome token splits from the ConditionalTokens contract.
Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, amount

### position_merges

Outcome tokens to USDC merges from the ConditionalTokens contract.

Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, amount

### payout_redemptions

Post-resolution payouts from the ConditionalTokens contract.

Columns: timestamp, transaction_hash, block_number, redeemer, condition_id, payout

### position_conversions

NegRisk position conversions from the NegRisk Adapter.

Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, index_set, amount

### positions

Computed current state per (user, token_id). Derived from all event types.

Columns: user_address, token_id, amount, avg_price, realized_pnl, total_bought, last_block

The /events API and /user endpoints add `block_timestamp`. The /user/*/positions endpoint also adds `unrealized_pnl` and `total_pnl` (computed from live market prices).

### UMA resolution data (uma/ folder)

Market resolution events from UMA OptimisticOracleV2. Track who proposes outcomes, who disputes, and final settlements. Updated daily.

**resolution_proposals** — Who proposed each market's outcome.

Columns: transaction_hash, block_number, requester, proposer, identifier, request_timestamp, ancillary_data (question text), proposed_price (1e18=YES, 0=NO), expiration_timestamp, currency

**resolution_disputes** — Who disputed a proposed outcome.

Columns: transaction_hash, block_number, requester, proposer, disputer, identifier, request_timestamp, ancillary_data, proposed_price

**resolution_settlements** — Final outcome and bond payouts.

Columns: transaction_hash, block_number, requester, proposer, disputer (0x0 if undisputed), identifier, request_timestamp, ancillary_data, settled_price, payout

Dump paths: `uma/resolution_proposals.parquet`, `uma/resolution_disputes.parquet`, `uma/resolution_settlements.parquet`

### markets (lookup table)

Maps token IDs to human-readable market info.
Join with event or position tables on yes_token_id / no_token_id to see which question and outcome a trade or position belongs to. Overwritten daily.

Columns:

- condition_id — unique identifier for the market condition
- question — the market question (e.g. "Will Bitcoin hit $100k?")
- outcome_yes / outcome_no — human-readable labels (e.g. "Yes"/"No", or "Bitcoin"/"Ethereum")
- yes_token_id / no_token_id — token IDs for each outcome (use these to join with events and positions)
- market_slug — URL slug on polymarket.com
- end_date_iso — market end date (ISO 8601)
- neg_risk — true if the market uses the NegRisk exchange (multi-outcome markets)

To look up which market a trade belongs to, join the event's token_id (or taker_asset_id for fills) against yes_token_id or no_token_id.

## Compression

The API supports two compression algorithms via the Accept-Encoding header:

- `gzip` — universally supported
- `zstd` — ~30% smaller, faster compression/decompression

Use `--compressed` with curl for automatic gzip, or send `Accept-Encoding: zstd` for zstd.

## Error responses

All errors return JSON with a `detail` field.

| Status | Meaning | Example |
|--------|---------|---------|
| 401 | Missing or invalid API key | `{"detail": "Unauthorized"}` |
| 403 | Key valid but plan doesn't include this endpoint | `{"detail": "Pro plan required"}` |
| 400 | Bad request (invalid table name, etc.) | `{"detail": "Specify exactly one table per request."}` |
| 422 | Missing required parameter | `{"detail": [{"type": "missing", "loc": ["query", "after_block"], "msg": "Field required"}]}` |
| 429 | Rate limit exceeded (Pro: 5 req/s) | `{"detail": "Rate limit exceeded"}` |
| 500 | Internal server error | `{"detail": "Internal server error"}` |

## Rate limits

- Pro plan: 5 requests/second per API key
- Dump downloads: 50/day (pending), 500/day (Snapshot), 1,000/day (Pro)
- WebSocket /ws/feed: 100 total connections, 5 per API key, 20 user subscriptions per connection
- WebSocket /ws/stream: 10 concurrent stream connections (shared with the /ws/feed pool)

If you exceed the rate limit, back off and retry after 1 second.

## Notes

- All amounts are in raw on-chain units (not human-readable). Divide by 1e6 for USDC.
- avg_price is scaled by 1e6 (divide by 1e6 for a decimal price 0-1).
- Data covers Polygon from block 35,800,000 to present.
- 865M+ fills, 150M+ positions indexed.
- The indexer runs at the chain tip (<1s delay) with automatic reorg detection and recovery.
- Daily event dumps cover one day each, full history since 2022, kept forever.
- The positions dump is a daily full snapshot updated at 05:00 UTC (not atomic — rows may have different last_block values).
- Dump downloads return 302 redirects to time-limited URLs (20 min expiry). Use `curl -L` to follow redirects.
- Daily dump download limits: 50/day (pending), 500/day (Snapshot), 1,000/day (Pro).
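Putting the pieces together, the API-only sync loop described under Recommended setup might be sketched as follows. This uses the third-party `requests` package; the endpoint, parameters, and header names are as documented above, and `store` is a stub you would replace with your own INSERT/UPSERT logic:

```python
# Sketch of the "API-only sync" loop. Third-party "requests" package;
# header names (x-last-block, x-head-block, x-reorg) as documented.
import time

import requests

BASE = "https://api.predmktdata.com"

def next_cursor(headers) -> tuple[int, bool]:
    """Return (next after_block, caught up?) from /events response headers."""
    last = int(headers["x-last-block"])
    head = int(headers["x-head-block"])
    return last, last >= head

def store(table: str, csv_body: str) -> None:
    """INSERT for event tables; UPSERT on (user_address, token_id) for positions."""
    ...

def sync_table(api_key: str, table: str, after_block: int = 0) -> int:
    while True:
        r = requests.get(
            f"{BASE}/events",
            params={"after_block": after_block, "limit": 5000, "tables": table},
            headers={"x-api-key": api_key},
        )
        r.raise_for_status()
        if "x-reorg" in r.headers:
            # Client is ahead of the indexer: roll back to this block first
            after_block = int(r.headers["x-reorg"])
            continue
        store(table, r.text)
        after_block, caught_up = next_cursor(r.headers)
        if caught_up:
            return after_block  # then keep polling every few seconds
        time.sleep(0.25)  # stay under the Pro 5 req/s rate limit
```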