# predmktdata
> On-chain Polymarket data feed. Parquet dumps + real-time API. Full history + daily updates.
predmktdata indexes every Polymarket event on Polygon and serves it as Parquet. Two plans:
- **Snapshot ($49/mo)**: Full history since 2022 as daily Parquet dumps. 865M+ fills, 150M+ positions, markets lookup. End-of-day updates (05:00 UTC). No real-time API, no per-wallet queries. 500 dump downloads/day.
- **Pro ($149/mo)**: Everything in Snapshot + real-time API (GET /events, GET /user/*) + WebSocket feed (WS /ws/feed) + firehose stream (WS /ws/stream). Poll for new data every second, subscribe to WebSocket for push notifications, or stream all events to keep your own database in sync. Freshness: <1 second (indexes at the chain tip with automatic reorg recovery). 1,000 dump downloads/day.
Pending users (signed up but not subscribed) can access sample data only (50 dump downloads/day).
All paid plans include: full history, positions with avg_price and realized_pnl, markets lookup, Parquet format.
## Authentication
All endpoints require `x-api-key` header. Get a key by signing in with Google at predmktdata.com.
To check your plan, try GET /dumps — if you get 403, you need Snapshot or Pro. If GET /events returns 403, you need Pro.
## API
**Base URL: https://api.predmktdata.com**
All endpoints below are relative to this base URL. For example, GET /status means GET https://api.predmktdata.com/status.
### GET /health
Unauthenticated health check for Docker/load balancer probes.
Response:
```json
{"ok": true, "block": 84570000}
```
### GET /status
Returns current indexer state and row counts (JSON).
Response:
```json
{
"last_block": 84570000,
"head_block": 84570000,
"tables": {
"order_filled_events": 854000000,
"positions": 150000000,
"payout_redemptions": 93000000,
"position_splits": 7700000,
"position_merges": 7800000,
"position_conversions": 1800000
}
}
```
### GET /events (Pro only)
Fetch events as CSV. One table per request. Supports gzip and zstd compression.
Parameters:
- `after_block` (required): Return events after this block number
- `limit` (default: 5000, max: 5000): Max blocks to include
- `tables` (required): Exactly one table name per request
Available tables: order_filled_events, position_splits, position_merges, payout_redemptions, position_conversions, positions
Note: Only one table can be requested per call. To sync multiple tables, make separate requests for each.
Response headers:
- `x-after-block`: The after_block you requested
- `x-last-block`: Last block included in this response (use as next after_block)
- `x-head-block`: Current chain head
- `x-reorg`: Present if client is ahead of indexer (rollback to this block)
Response body: CSV with header row.
Example:
```
curl --compressed -H "x-api-key: YOUR_KEY" \
"https://api.predmktdata.com/events?after_block=84420000&limit=500&tables=order_filled_events"
```
### GET /user/{address}/positions (Pro only)
All current positions for a wallet address, ordered by most recent activity. CSV response. Includes market metadata (question, outcome, market_slug) via automatic join with the markets table — empty for tokens not in markets.
Parameters:
- `limit` (default: 10000, max: 10000): Max rows to return
- `offset` (default: 0): Skip this many rows (for pagination)
- `min_amount` (default: 0): Minimum position amount (raw units). Use `min_amount=1` to get only active (non-zero) positions — filters out closed positions and can dramatically reduce response size and latency for wallets with large history.
Columns: user_address, token_id, amount, avg_price, realized_pnl, unrealized_pnl, total_pnl, total_bought, last_block, block_timestamp, question, outcome, market_slug
unrealized_pnl and total_pnl are computed from current market prices, which are updated every minute via marketsync. unrealized_pnl is the mark-to-market gain/loss on open positions. total_pnl = realized_pnl + unrealized_pnl.
Example:
```
# All positions (including closed)
curl --compressed -H "x-api-key: YOUR_KEY" \
"https://api.predmktdata.com/user/0xabc.../positions"
# Only active positions (much faster for wallets with large history)
curl --compressed -H "x-api-key: YOUR_KEY" \
"https://api.predmktdata.com/user/0xabc.../positions?min_amount=1"
# Paginate
curl --compressed -H "x-api-key: YOUR_KEY" \
"https://api.predmktdata.com/user/0xabc.../positions?limit=1000&offset=0"
```
### GET /user/{address}/pnl (Pro only)
Current PnL summary across all positions for a wallet address. realized_pnl is locked-in profit/loss from closed trades. unrealized_pnl is mark-to-market on open positions using live market prices (updated every minute via marketsync). portfolio_value is the current market value of all open positions. All values in USD (raw amounts divided by 1e6).
Response (JSON):
```json
{
"address": "0x...",
"realized_pnl": 12345.67,
"unrealized_pnl": -234.56,
"total_pnl": 12111.11,
"portfolio_value": 5678.90,
"total_volume": 98765.43,
"active_positions": 42,
"markets_traded": 156
}
```
Example:
```
curl --compressed -H "x-api-key: YOUR_KEY" \
"https://api.predmktdata.com/user/0xabc.../pnl"
```
### GET /user/{address}/fills (Pro only)
All fills where address is maker or taker. CSV response. Includes market metadata. Results are not ordered — use after_block to paginate chronologically.
Columns: transaction_hash, log_index, block_number, exchange, maker, taker, maker_asset_id, taker_asset_id, maker_amount_filled, taker_amount_filled, fee, side, block_timestamp, question, outcome, market_slug
Parameters:
- `after_block` (default: 0): Only fills after this block
- `limit` (default: 50000, max: 500000): Max rows
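Since fills are returned unordered, a client pages by tracking the highest `block_number` it has seen. A minimal Python sketch using `requests` (the helper names and the short-page stop condition are our own conventions, not part of the API):

```python
import csv
import io

import requests

BASE = "https://api.predmktdata.com"

def max_block(rows):
    """Highest block_number in a page — the after_block cursor for the next request."""
    return max(int(r["block_number"]) for r in rows)

def iter_fills(address, api_key, limit=50000):
    """Yield all fills for a wallet, paging chronologically via after_block."""
    after_block = 0
    while True:
        r = requests.get(
            f"{BASE}/user/{address}/fills",
            headers={"x-api-key": api_key},
            params={"after_block": after_block, "limit": limit},
        )
        r.raise_for_status()
        rows = list(csv.DictReader(io.StringIO(r.text)))
        if not rows:
            return
        yield from rows
        if len(rows) < limit:
            return  # short page: nothing left past this cursor
        # Assumes one block's fills fit on a page (safe with limit=50000);
        # advancing to the max block skips everything already yielded.
        after_block = max_block(rows)
```
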
### WS /ws/feed (Pro only)
Real-time WebSocket feed of fills and position changes. Connect and subscribe to specific wallets or thresholds — data pushes to you the moment the indexer commits a batch.
Connect: `wss://api.predmktdata.com/ws/feed?x_api_key=YOUR_KEY`
Subscribe messages (send as JSON after connecting):
```json
// Watch a wallet's fills and positions
{"action": "subscribe", "type": "user", "address": "0xabc..."}
// Watch large fills (raw units, ÷1e6 for USDC)
{"action": "subscribe", "type": "threshold", "min_fill_amount": 10000000000}
// Watch cheap positions (low avg_price)
{"action": "subscribe", "type": "threshold", "max_avg_price": 50000}
// Watch large positions (high amount)
{"action": "subscribe", "type": "threshold", "min_position_amount": 1000000000}
// Unsubscribe
{"action": "unsubscribe", "type": "user", "address": "0xabc..."}
// Keepalive
{"action": "ping"}
```
Messages received:
```json
{
"fills": [{"transaction_hash": "0x...", "maker": "0x...", ...}],
"positions": [{"user_address": "0x...", "amount": 5000000, ...}]
}
```
Limits: 100 total connections, 5 per API key, 20 user subscriptions per connection. Values are raw i64 (same as REST API — divide by 1e6).
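Threshold values on the wire are raw units. Two small helpers (hypothetical, not part of the API) for building subscribe messages from human-readable amounts:

```python
import json

def sub_large_fills(min_usdc):
    """Subscribe message for fills of at least min_usdc (converted to raw x1e6 units)."""
    return json.dumps({
        "action": "subscribe",
        "type": "threshold",
        "min_fill_amount": int(min_usdc * 1_000_000),
    })

def sub_user(address):
    """Subscribe message for one wallet's fills and positions."""
    return json.dumps({"action": "subscribe", "type": "user", "address": address})
```

For example, `sub_large_fills(10_000)` produces the `min_fill_amount: 10000000000` message shown above.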
### WS /ws/stream (Pro only)
Firehose WebSocket for keeping your own database in sync. Connects, catches up from a recent block, then streams all new events and positions as the indexer commits them.
Connect: `wss://api.predmktdata.com/ws/stream?x_api_key=YOUR_KEY&start_block=84650000&tables=order_filled_events,positions`
Parameters (query string):
- `x_api_key` (required): Your API key
- `start_block` (required): Block to start from (max ~900 blocks / 30 min behind head)
- `tables` (optional): Comma-separated list of tables to stream. Default: all. Valid: order_filled_events, payout_redemptions, position_conversions, position_merges, position_splits, positions
Protocol:
1. Server sends catchup batches (100 blocks each) from start_block to head
2. Server sends `{"type": "caught_up", "block": N}` when catchup is complete
3. Server streams live batches as the indexer commits new blocks
4. Client can send `{"action": "ping"}` to get `{"pong": true}`
Each batch message:
```json
{
"type": "batch",
"from_block": 84650000,
"to_block": 84650100,
"order_filled_events": [{"transaction_hash": "0x...", "maker": "0x...", ...}],
"positions": [{"user_address": "0x...", "amount": 5000000, ...}]
}
```
Only the tables you subscribed to are included. Empty arrays mean no events in that block range for that table.
Close codes:
- 4001: Unauthorized (missing or invalid API key)
- 4003: Pro plan required
- 4008: Slow consumer (client can't keep up — reconnect with a newer start_block)
- 4029: Connection limit reached
- 4400: start_block too old or invalid tables
Limits: 10 concurrent stream connections, shared with /ws/feed pool (100 total, 5 per API key).
Recommended usage: Use dumps for initial backfill, then connect /ws/stream with start_block = your last processed block to stay in sync. On disconnect, reconnect with the last block you received.
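The reconnect-with-last-block pattern can be sketched as follows, using the third-party `websockets` package (`pip install websockets`). The `apply_batch` helper and the bare-bones retry loop are our own conventions; the message shapes follow the protocol above:

```python
import asyncio
import json

STREAM = ("wss://api.predmktdata.com/ws/stream"
          "?x_api_key=YOUR_KEY&start_block={block}&tables=order_filled_events")

def apply_batch(msg, cursor):
    """Advance the sync cursor from a server message; returns the new cursor."""
    if msg.get("type") == "batch":
        return msg["to_block"]
    if msg.get("type") == "caught_up":
        return msg["block"]
    return cursor  # pong or other message: cursor unchanged

async def run(start_block):
    import websockets  # third-party: pip install websockets
    cursor = start_block
    while True:
        try:
            async with websockets.connect(STREAM.format(block=cursor)) as ws:
                async for raw in ws:
                    msg = json.loads(raw)
                    cursor = apply_batch(msg, cursor)
                    for fill in msg.get("order_filled_events", []):
                        ...  # INSERT into your database here
        except Exception:
            await asyncio.sleep(1)  # reconnect from the last block received

# asyncio.run(run(84650000))
```

On a 4008 (slow consumer) close, the same loop reconnects; if the cursor has fallen more than ~900 blocks behind head, restart from a dump backfill instead.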
### GET /dumps (Snapshot + Pro)
List available dump files for fast backfill.
Response (JSON):
```json
{
"files": [
{"path": "order_filled_events/20250615.parquet", "size": 52428800},
{"path": "positions/positions_20260320.parquet", "size": 8033986444}
],
"base_url": "/dumps/"
}
```
### GET /dumps/{path} (Snapshot + Pro)
Download a dump file. Returns a 302 redirect to a time-limited download URL (20 min expiry).
Dump file types:
- **Daily events**: one Parquet file per table per day (e.g. `order_filled_events/20250615.parquet`) — full history since 2022
- **Positions**: `positions/positions_YYYYMMDD.parquet` — full daily snapshot, 2 copies kept
- **Markets**: `markets/markets.parquet` — token_id → question/outcome lookup, overwritten daily
- **UMA resolution**: `uma/resolution_proposals.parquet`, `uma/resolution_disputes.parquet`, `uma/resolution_settlements.parquet` — full snapshots, overwritten daily
## When to use dumps vs the API
Use **dumps** when you need bulk historical data. Dumps are Parquet files (one per day per table). They are the fastest way to backfill — downloading all history via dumps takes minutes, while syncing the same data through the API would take hours due to rate limits and pagination. Dumps are updated daily at 05:00 UTC.
Use the **real-time API** when you need fresh data (seconds-old), per-wallet queries, or streaming updates. The API serves incremental chunks — you poll /events with after_block to get only what changed since your last request, or connect to the WebSocket for push notifications.
**Most Pro users should combine both**: dumps for the initial backfill, then the API to stay current in real-time.
| Use case | Best approach | Plan needed |
|----------|--------------|-------------|
| Load all Polymarket history into my database | Dumps | Snapshot |
| Daily batch analytics (end-of-day) | Dumps | Snapshot |
| Real-time trading bot or alert system | API + WebSocket | Pro |
| One-time research / data export | Dumps | Snapshot |
| Live dashboard tracking specific wallets | API (/user/*) + WebSocket | Pro |
| Backfill + stay current | Dumps first, then API polling | Pro |
## Working with dump files
Dumps are Parquet files. DuckDB is the recommended tool for querying them.
### Parquet
Parquet files have proper column types built in (VARCHAR for addresses and token IDs, BIGINT for amounts). No manual type casting needed:
```sql
-- Query a local Parquet file
SELECT * FROM read_parquet('order_filled_events/20260315.parquet')
WHERE taker = '0xabc...' LIMIT 100;
-- Load all daily files
CREATE TABLE fills AS SELECT * FROM read_parquet('order_filled_events/*.parquet');
-- Join with markets
SELECT f.*, m.question, m.outcome_yes
FROM read_parquet('order_filled_events/20260322.parquet') f
LEFT JOIN read_parquet('markets/markets.parquet') m
ON f.taker_asset_id = m.yes_token_id OR f.taker_asset_id = m.no_token_id
LIMIT 10;
```
### Remote querying (Parquet only, zero download)
Parquet supports HTTP range requests — DuckDB reads only the columns and row groups needed for your query. A filtered query on a 2GB file might transfer only 5-50MB.
To query remotely, get a presigned URL from the API, then pass it to DuckDB:
```python
import requests, duckdb
# Step 1: get presigned URL (valid 20 min)
r = requests.get(
"https://api.predmktdata.com/dumps/order_filled_events/20260315.parquet",
headers={"x-api-key": "YOUR_KEY"}, allow_redirects=False,
)
url = r.headers["Location"]
# Step 2: query remotely — DuckDB only downloads what it needs
duckdb.sql(f"""
SELECT taker, count(*) as fills, sum(maker_amount_filled) / 1e6 as volume_usdc
FROM read_parquet('{url}')
WHERE maker_amount_filled > 1000000000
GROUP BY 1 ORDER BY 3 DESC LIMIT 20
""").show()
```
From the DuckDB CLI:
```sql
INSTALL httpfs; LOAD httpfs;
SELECT count(*) FROM read_parquet('PRESIGNED_URL_HERE');
```
DuckDB works from Python (`pip install duckdb`), CLI, or as a library in most languages.
## Recommended setup
### Snapshot plan: Parquet dumps only
1. GET /dumps to list available files
2. Download all daily event files (one per table per day, e.g. `order_filled_events/20250615.parquet`) in parallel
3. Download the latest positions snapshot (`positions/positions_YYYYMMDD.parquet`)
4. Query directly with DuckDB, or INSERT into your database and UPSERT positions
5. Run daily: download new daily files + fresh positions snapshot
This gives you all of Polymarket history, updated to yesterday. No API polling needed.
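Steps 1-3 can be sketched in Python with `requests` (the `daily_event_files` helper and local directory layout are our own conventions):

```python
import os

import requests

BASE = "https://api.predmktdata.com"
HEADERS = {"x-api-key": "YOUR_KEY"}

def daily_event_files(files, table="order_filled_events"):
    """Pick one table's daily dump paths from the /dumps listing, oldest first."""
    return sorted(f["path"] for f in files if f["path"].startswith(table + "/"))

def download(path, dest_dir="dumps"):
    """Download one dump file; /dumps/{path} 302-redirects to a short-lived URL."""
    out = os.path.join(dest_dir, path)
    os.makedirs(os.path.dirname(out), exist_ok=True)
    with requests.get(f"{BASE}/dumps/{path}", headers=HEADERS, stream=True) as r:
        r.raise_for_status()
        with open(out, "wb") as fh:
            for chunk in r.iter_content(chunk_size=1 << 20):
                fh.write(chunk)

# listing = requests.get(f"{BASE}/dumps", headers=HEADERS).json()
# for p in daily_event_files(listing["files"]):
#     download(p)
```

For true parallelism, wrap the `download` calls in a thread pool — just stay under your plan's daily download limit.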
### Pro plan: dumps + real-time API
1. Do the Snapshot setup above for fast backfill
2. Find MAX(last_block) from the positions Parquet — this is your sync cursor
3. Poll GET /events?after_block={max_last_block}&tables=order_filled_events to sync the gap
4. Do the same for positions and any other tables you need
5. Continue polling every few seconds for near real-time updates
Note: The positions snapshot is not atomic — different rows may have different last_block values. Use MAX(last_block) as your sync cursor. The API will send any updates you missed, and your UPSERT will reconcile them.
### API-only sync (Pro, no dumps)
1. Start with after_block=0 (or any starting block)
2. Request /events?after_block=0&limit=5000&tables=order_filled_events
3. Read `x-last-block` response header — use it as after_block for the next request
4. Repeat until `x-last-block` == `x-head-block` (caught up)
5. Continue polling every few seconds for near real-time updates
6. Repeat for each table you need (positions, position_splits, etc.)
IMPORTANT: Event tables are append-only — use INSERT. Positions are mutable — the same (user_address, token_id) may appear in multiple chunks. Always use UPSERT (INSERT ... ON CONFLICT (user_address, token_id) DO UPDATE) for positions.
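The polling loop above, sketched with `requests` (the `caught_up` helper and handler wiring are ours; the header names and one-second poll interval follow the documentation):

```python
import csv
import io
import time

import requests

BASE = "https://api.predmktdata.com"
HEADERS = {"x-api-key": "YOUR_KEY"}

def caught_up(resp_headers):
    """True once the last block in the response has reached the chain head."""
    return int(resp_headers["x-last-block"]) >= int(resp_headers["x-head-block"])

def sync(table, handle_rows, after_block=0):
    """Sync one table forever: backfill to head, then poll for fresh blocks."""
    while True:
        r = requests.get(f"{BASE}/events", headers=HEADERS,
                         params={"after_block": after_block,
                                 "limit": 5000, "tables": table})
        r.raise_for_status()
        handle_rows(list(csv.DictReader(io.StringIO(r.text))))
        after_block = int(r.headers["x-last-block"])  # cursor for next request
        if caught_up(r.headers):
            time.sleep(1)  # caught up — wait for new blocks

# sync("order_filled_events", handle_rows=my_insert)   # append-only: INSERT
# sync("positions", handle_rows=my_upsert)             # mutable: UPSERT
```

A production version would also check the `x-reorg` header and roll back its cursor when present.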
## Data schema
### order_filled_events
Exchange fills from CTF Exchange and NegRisk Exchange.
Columns: timestamp, transaction_hash, block_number, maker, taker, maker_asset_id, taker_asset_id, maker_amount_filled, taker_amount_filled, fee, side (buy|sell)
### position_splits
USDC to outcome token splits from ConditionalTokens contract.
Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, amount
### position_merges
Outcome tokens to USDC merges from ConditionalTokens contract.
Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, amount
### payout_redemptions
Post-resolution payouts from ConditionalTokens contract.
Columns: timestamp, transaction_hash, block_number, redeemer, condition_id, payout
### position_conversions
NegRisk position conversions from NegRisk Adapter.
Columns: timestamp, transaction_hash, block_number, stakeholder, condition_id, index_set, amount
### positions
Computed current state per (user, token_id). Derived from all event types.
Columns: user_address, token_id, amount, avg_price, realized_pnl, total_bought, last_block
The /events API and /user endpoints add `block_timestamp`. The /user/*/positions endpoint also adds `unrealized_pnl` and `total_pnl` (computed from live market prices).
### UMA resolution data (uma/ folder)
Market resolution events from UMA OptimisticOracleV2. Track who proposes outcomes, who disputes, and final settlements. Updated daily.
**resolution_proposals** — Who proposed each market's outcome.
Columns: transaction_hash, block_number, requester, proposer, identifier, request_timestamp, ancillary_data (question text), proposed_price (1e18=YES, 0=NO), expiration_timestamp, currency
**resolution_disputes** — Who disputed a proposed outcome.
Columns: transaction_hash, block_number, requester, proposer, disputer, identifier, request_timestamp, ancillary_data, proposed_price
**resolution_settlements** — Final outcome and bond payouts.
Columns: transaction_hash, block_number, requester, proposer, disputer (0x0 if undisputed), identifier, request_timestamp, ancillary_data, settled_price, payout
Dump paths: `uma/resolution_proposals.parquet`, `uma/resolution_disputes.parquet`, `uma/resolution_settlements.parquet`
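A tiny decoder for the `proposed_price` / `settled_price` encoding — only the two documented values are mapped; anything else is passed through for the caller to interpret:

```python
YES = 10**18  # documented encoding: 1e18 = YES, 0 = NO

def decode_outcome(price):
    """Map a proposed/settled price to a label; undocumented values pass through."""
    if price == YES:
        return "YES"
    if price == 0:
        return "NO"
    return price
```
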
### markets (lookup table)
Maps token IDs to human-readable market info. Join with event or position tables on yes_token_id / no_token_id to see which question and outcome a trade or position belongs to. Overwritten daily.
Columns:
- condition_id — unique identifier for the market condition
- question — the market question (e.g. "Will Bitcoin hit $100k?")
- outcome_yes / outcome_no — human-readable labels (e.g. "Yes"/"No", or "Bitcoin"/"Ethereum")
- yes_token_id / no_token_id — token IDs for each outcome (use these to join with events and positions)
- market_slug — URL slug on polymarket.com
- end_date_iso — market end date (ISO 8601)
- neg_risk — true if market uses the NegRisk exchange (multi-outcome markets)
To look up which market a trade belongs to, join the event's token_id (or taker_asset_id for fills) against yes_token_id or no_token_id.
## Compression
The API supports two compression algorithms via Accept-Encoding header:
- `gzip` — universally supported
- `zstd` — ~30% smaller, faster compression/decompression
Use `--compressed` with curl for automatic gzip, or send `Accept-Encoding: zstd` for zstd.
## Error responses
All errors return JSON with a `detail` field.
| Status | Meaning | Example |
|--------|---------|---------|
| 401 | Missing or invalid API key | `{"detail": "Unauthorized"}` |
| 403 | Key valid but plan doesn't include this endpoint | `{"detail": "Pro plan required"}` |
| 400 | Bad request (invalid table name, etc.) | `{"detail": "Specify exactly one table per request."}` |
| 422 | Missing required parameter | `{"detail": [{"type": "missing", "loc": ["query", "after_block"], "msg": "Field required"}]}` |
| 429 | Rate limit exceeded (Pro: 5 req/s) | `{"detail": "Rate limit exceeded"}` |
| 500 | Internal server error | `{"detail": "Internal server error"}` |
## Rate limits
- Pro plan: 5 requests/second per API key
- Dump downloads: 50/day (pending), 500/day (Snapshot), 1,000/day (Pro)
- WebSocket /ws/feed: 100 total connections, 5 per API key, 20 user subscriptions per connection
- WebSocket /ws/stream: 10 concurrent stream connections (shared with /ws/feed pool)
If you exceed the rate limit, back off and retry after 1 second.
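A simple 429 retry wrapper (the linear backoff schedule is our choice, not an API requirement — the doc only asks for at least a one-second wait):

```python
import time

def backoff_delays(attempts=5, base=1.0):
    """Seconds to wait before each retry: 1, 2, 3, ..."""
    return [base * (i + 1) for i in range(attempts)]

def get_with_retry(session, url, **kwargs):
    """GET via a requests.Session, retrying on 429 with increasing delays."""
    for delay in backoff_delays():
        r = session.get(url, **kwargs)
        if r.status_code != 429:
            return r
        time.sleep(delay)
    return r  # still rate-limited after all attempts; caller decides
```
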
## Notes
- All amounts are in raw on-chain units (not human-readable). Divide by 1e6 for USDC.
- avg_price is scaled by 1e6 (divide by 1e6 for decimal price 0-1).
- Data covers Polygon from block 35,800,000 to present.
- 865M+ fills, 150M+ positions indexed.
- Indexer runs at the chain tip (<1s delay) with automatic reorg detection and recovery.
- Daily event dumps cover one day each, full history since 2022, kept forever.
- Positions dump is a daily full snapshot updated at 05:00 UTC (not atomic — rows may have different last_block values).
- Dump downloads return 302 redirects to time-limited URLs (20 min expiry). Use `curl -L` to follow redirects.
- Daily dump download limits: 50/day (pending), 500/day (Snapshot), 1,000/day (Pro).