Scaling Data Mining with API Efficiency Under TPM Limits
Issue
We needed to mine and transform large text datasets into structured formats like:
- Structured JSON from raw text
- Graphs from code or document relationships
However, GPT-4’s 2M tokens-per-minute (TPM) rate limit and high per-request latency posed a scaling bottleneck.
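To make the target concrete, here is the kind of record a raw paragraph was transformed into. The schema below is a hypothetical illustration, not the exact one from our pipeline:

```python
# Hypothetical target shape: one raw paragraph in, one structured record out.
raw_text = "LangChain wraps the GPT-4 API to orchestrate prompt pipelines."
structured = {
    "source_id": "paragraph_0042",
    "entities": [
        {"name": "LangChain", "type": "library"},
        {"name": "GPT-4", "type": "model"},
    ],
    "relations": [
        {"from": "LangChain", "to": "GPT-4", "kind": "wraps"},
    ],
}
```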
Solution
We implemented a parallel, optimized API pipeline using LangChain, with full token tracking and queue management:
1. Query Batching with Subtasks
- Split tasks into independent subtasks (e.g., paragraph → entity list)
- Batched multiple prompts per request using `LangChain.map_reduce`
- Used the OpenAI function-calling API to enforce structured output (a sketch follows below)
- Compressed prompts before each call to fit more work per request
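For the structured-output piece, here is a minimal sketch of a function-calling request, assuming the `openai>=1.0` Python client; the `extract_entities` schema is illustrative, not the exact one we used:

```python
# A minimal sketch of enforcing structured output via OpenAI function calling.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "extract_entities",
        "description": "Extract named entities from a paragraph.",
        "parameters": {
            "type": "object",
            "properties": {
                "entities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "type": {"type": "string"},
                        },
                        "required": ["name", "type"],
                    },
                }
            },
            "required": ["entities"],
        },
    },
}]

def extract(paragraph: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": paragraph}],
        tools=tools,
        # Force the model to call our function so the reply is always
        # JSON arguments rather than free text.
        tool_choice={"type": "function", "function": {"name": "extract_entities"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)
```

Pinning `tool_choice` to the function means every reply parses as JSON arguments, which makes downstream extraction predictable instead of relying on prompt-level formatting instructions.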
2. TPM-Conscious Scheduling
- Used LangChain’s token-aware throttling to avoid rate-limit breaches
- Distributed load across multiple API keys/orgs to parallelize requests
- Tracked token usage per key using a sliding `token_window` structure (sketched below)
- Maintained a pending job queue with TTL and retries
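A minimal sketch of the per-key sliding window, assuming a 60-second window against the 2M TPM budget; the class name and headroom logic are our illustration of the `token_window` idea, not a LangChain API:

```python
# A minimal sketch of the per-key sliding token window described above.
import time
from collections import deque

TPM_LIMIT = 2_000_000  # GPT-4 tokens-per-minute budget per key/org
WINDOW_SECONDS = 60

class TokenWindow:
    """Tracks tokens spent per API key over a sliding 60-second window."""

    def __init__(self):
        # api_key -> deque of (timestamp, token_count) entries
        self.usage = {}

    def log(self, api_key: str, tokens: int) -> None:
        self.usage.setdefault(api_key, deque()).append((time.time(), tokens))

    def tokens_in_window(self, api_key: str) -> int:
        window = self.usage.get(api_key, deque())
        cutoff = time.time() - WINDOW_SECONDS
        # Evict entries that have aged out of the window.
        while window and window[0][0] < cutoff:
            window.popleft()
        return sum(count for _, count in window)

    def within_limit(self, api_key: str, upcoming_tokens: int = 0) -> bool:
        return self.tokens_in_window(api_key) + upcoming_tokens <= TPM_LIMIT
```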
3. Throttling Logic with Queue and Token Tracker
- Used a `deque`-based job queue for pending transformations
- Created a `token_usage_log = {api_key: [(timestamp, token_count), ...]}` structure
- In each loop:

```python
while job_queue:
    job = job_queue.popleft()
    if not token_within_limit(api_key):
        job_queue.append(job)  # Requeue if the key is over its TPM budget
        sleep(1)               # Brief back-off before retrying
        continue
    output, tokens_used = call_openai(job)  # Returns result plus tokens spent
    log_token_use(api_key, tokens_used)
    save_output(output)
```
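The loop above assumes a few helpers (`token_within_limit`, `log_token_use`, `call_openai`, `save_output`). Here is one way to wire them, reusing the function-calling and `TokenWindow` sketches from steps 1 and 2; any name not in the post is hypothetical:

```python
# Hypothetical glue for the loop above. Assumes TokenWindow, client, and tools
# are defined as in the earlier sketches.
import json
from collections import deque
from time import sleep

window = TokenWindow()
job_queue = deque()
api_key = "key-1"  # in practice, rotated across several keys/orgs

def token_within_limit(key: str) -> bool:
    # Keep headroom for one typical request so a call never pushes the
    # sliding window past the TPM budget (the 4,000-token estimate is made up).
    return window.within_limit(key, upcoming_tokens=4_000)

def log_token_use(key: str, tokens_used: int) -> None:
    window.log(key, tokens_used)

def call_openai(job):
    # Reuse the step-1 function-calling request; read the actual token
    # spend from the response's usage field.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": job["text"]}],
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "extract_entities"}},
    )
    output = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    return output, response.usage.total_tokens

def save_output(output) -> None:
    print(output)  # stand-in for persisting to a datastore
```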