Raw API responses are bloating your agent’s context window and burning through tokens. Here’s how to strip the noise and feed your agents only what they need.
- Raw API responses are **burning tokens**.
- The fix: A **Cleaning Layer** to strip the noise.
- High-density signal = lower costs.
- Fewer hallucinations, sharper output.
- Works across n8n, Make, or custom scripts.
- Stop feeding your agents 'junk food'.
If you are building agentic AI that interacts with third-party APIs, you have likely noticed a recurring problem: the more tools you give your agent, the more expensive and erratic it becomes.
This is usually caused by context bloat. It happens when an agent calls a tool and receives a massive, raw JSON response containing metadata, headers, and nesting that the model doesn’t actually need. When that raw data is injected into the context window, it doesn’t just burn tokens; it creates noise that distracts the model from the actual task.
The Signal vs. Noise Problem
Most API responses are designed for developers, not for LLMs. They include exhaustive detail to ensure every possible edge case is covered. However, an agent typically only needs a fraction of that data.
For example, a raw Gmail API response for a single email might include MIME encoding details, complex threading metadata, and full header strings. To an AI agent trying to summarise a message, the vast majority of that is noise. When the agent carries that noise into the next turn of the conversation, the bloat compounds. By the time you reach the fifth or sixth interaction, you are paying for thousands of tokens of irrelevant metadata in every single request.
Let’s look at this in practice. Pulling 10 messages from Discord, the raw payload came in at roughly 28,000 characters / 8,800 tokens. After cleaning, the same conversation was 6,700 characters / 2,000 tokens — a 4x reduction with zero loss of conversational meaning.
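To see how that per-call gap compounds over a conversation, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is mine, not a measurement: the agent makes one tool call per turn, and every earlier tool result stays in context and is re-sent with each request. The function name `cumulativeToolTokens` is hypothetical; the per-call figures are the Discord numbers above.

```javascript
// Rough model (an assumption, not a measurement): one tool call per turn,
// and every earlier tool result is re-sent with each new request.
function cumulativeToolTokens(tokensPerResult, turns) {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // On turn N the request carries N tool results.
    total += tokensPerResult * turn;
  }
  return total;
}

const turns = 6;
const raw = cumulativeToolTokens(8800, turns);     // 184800
const cleaned = cumulativeToolTokens(2000, turns); // 42000
console.log({ raw, cleaned, saved: raw - cleaned }); // saved: 142800
```

Under those assumptions, six turns of raw payloads cost roughly 185,000 input tokens of tool output against about 42,000 for the cleaned version. Real agents will differ (caching, truncation, fewer tool calls), so treat this as an illustration of the shape of the curve rather than a forecast.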
Here is the shape of the raw Discord message schema, where every message in the array carries this much structural weight:
[
{
"type": "integer",
"content": "string",
"mentions": [],
"mention_roles": [],
"attachments": [],
"embeds": [
{
"type": "string",
"url": "string",
"title": "string",
"description": "string",
"color": "integer",
"provider": { "name": "string", "url": "string" },
"thumbnail": {
"url": "string", "proxy_url": "string",
"width": "integer", "height": "integer",
"content_type": "string", "placeholder": "string",
"placeholder_version": "integer", "flags": "integer"
},
"content_scan_version": "integer",
"author": { "name": "string", "url": "string" },
"image": { "url": "string", "proxy_url": "string", "width": "integer", "height": "integer", "content_type": "string", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" },
"footer": { "text": "string", "icon_url": "string", "proxy_icon_url": "string" },
"video": { "url": "string", "width": "integer", "height": "integer", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" }
}
],
"timestamp": "string",
"edited_timestamp": "null | string",
"flags": "integer",
"components": [],
"id": "string",
"channel_id": "string",
"author": {
"id": "string", "username": "string", "avatar": "string",
"discriminator": "string", "public_flags": "integer", "flags": "integer",
"bot": "boolean", "banner": "null", "accent_color": "null",
"global_name": "null | string", "avatar_decoration_data": "null",
"collectibles": "null", "display_name_styles": "null",
"banner_color": "null", "clan": "null", "primary_guild": "null"
},
"pinned": "boolean",
"mention_everyone": "boolean",
"tts": "boolean",
"reactions": [
{
"emoji": { "id": "null", "name": "string" },
"count": "integer",
"count_details": { "burst": "integer", "normal": "integer" },
"burst_colors": [], "me_burst": "boolean", "burst_me": "boolean",
"me": "boolean", "burst_count": "integer"
}
]
}
]
By stripping the profile and guild metadata, user flags, embeds, reactions, and empty arrays, we can reduce the payload to the only things the model actually needs (author name, timestamp, message text, and channel grouping), flattened into a single chronological conversation string:
[
{
"channel_id": "string",
"conversation": "string"
}
]
The table below shows what was dropped, what was kept, and why each call matters when the destination is an LLM rather than a developer console.
| Area | Raw Discord-style JSON | Cleaned API response | Why it matters for AI input |
|---|---|---|---|
| Top-level structure | Array of message objects | Array of conversation objects | Both are valid, but the cleaned version is simpler and easier for the model to reason over |
| Core fields | type, content, mentions, attachments, embeds, timestamp, author, flags, components, id, channel_id, etc. | channel_id, conversation | Removes low-value metadata and keeps only what is needed |
| Message content | Split across many individual content fields | Combined into one chronological conversation string | Preserves meaning while reducing structural noise |
| Author data | Large nested author object with IDs, avatar, flags, bot status, profile metadata | Author name is embedded inline in the conversation text | Keeps speaker context without carrying unnecessary account metadata |
| Embeds | Full nested embeds objects with previews, thumbnails, proxy URLs, dimensions, placeholders, scan versions | Links remain inside the conversation text only | Avoids sending duplicate link preview data that rarely helps the model |
| Reactions | Nested reactions array with emoji metadata and counts | Removed | Usually irrelevant unless analysing engagement |
| Empty fields | Many empty arrays: mentions, mention_roles, attachments, components | Removed | Empty fields consume tokens without adding information |
| Null fields | Many null fields: banner, accent_color, collectibles, etc. | Removed | Null profile fields add no useful context |
| Timestamps | Separate timestamp and edited_timestamp fields per message | Timestamps preserved inline before each message | Keeps chronological context in a compact, readable format |
| IDs | Message IDs, channel IDs, author IDs | Only channel_id kept | Retains useful grouping key while removing identifiers the model does not need |
| Token efficiency | High token usage due to repeated nested structures and metadata | Much lower token usage | More context can fit in the prompt, reducing cost and improving available working memory |
| Model comprehension | Model must filter noise before understanding the actual conversation | Model can focus directly on the conversation | Less distraction means better summarisation, extraction, classification, and reasoning |
| Risk of irrelevant output | Higher, because the model may latch onto embeds, thumbnails, avatars, flags, or internal metadata | Lower, because only meaningful conversational content remains | Cleaner input guides the model toward cleaner output |
| Best use case | Auditing raw platform/API payloads, debugging integrations, preserving every field | Feeding conversation history into an AI model | Use raw data for systems, cleaned data for model context |
| Overall | Complete but noisy | Compact and purpose-built | Cleaning API responses before AI injection saves tokens and improves output quality |
The Solution: The Cleaning Layer Pattern
The most effective way to handle this is to implement a cleaning layer between your API and your agent. Instead of the agent calling the API directly, it calls a middleware function that handles the request, cleans the response, and returns only the essential data.
I use n8n as the middleware in front of my agents (exposed over MCP) because it lets me visually map and strip data without writing a new deployment for every single tool. When my agent needs data from Gmail, n8n makes the call, runs the response through a transformation node, and returns a lean JSON object containing only the sender, subject, and body text.
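If you would rather do that Gmail transformation in code than in a visual node, a minimal sketch might look like the following. It assumes the message was fetched from the Gmail API in its full format, where headers sit in `payload.headers` as name/value pairs and body data is base64url-encoded. The helper names (`findHeader`, `findPlainText`, `cleanGmailMessage`) are hypothetical, and it only looks for a plain-text part, so HTML-only emails fall back to the short `snippet`.

```javascript
// Sketch only: assumes `message` is a Gmail API message resource in full
// format (headers in payload.headers, body data base64url-encoded).
function findHeader(headers, name) {
  const hit = (headers || []).find(
    h => h.name.toLowerCase() === name.toLowerCase()
  );
  return hit ? hit.value : '';
}

// Depth-first search for the first text/plain part that carries body data.
function findPlainText(part) {
  if (!part) return '';
  if (part.mimeType === 'text/plain' && part.body?.data) {
    // Node's 'base64' decoder also accepts the URL-safe alphabet.
    return Buffer.from(part.body.data, 'base64').toString('utf8');
  }
  for (const child of part.parts || []) {
    const text = findPlainText(child);
    if (text) return text;
  }
  return '';
}

function cleanGmailMessage(message) {
  const headers = message.payload?.headers;
  return {
    sender: findHeader(headers, 'From'),
    subject: findHeader(headers, 'Subject'),
    // Fall back to the short snippet when no plain-text part exists.
    body: findPlainText(message.payload) || message.snippet || '',
  };
}
```

Everything else in the response (MIME structure, threading metadata, the remaining headers) is simply never returned to the agent.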
The result is a massive reduction in overhead. For our 10-message Discord pull:
- Raw response: ~28,000 characters / ~8,800 tokens of headers, embeds, author objects, and reactions.
- Cleaned response: ~6,700 characters / ~2,000 tokens of high-signal conversation.
While n8n is my preferred tool for this, the pattern is stack-agnostic. You can achieve the same result using Make.com, a custom Node.js/Python middleware, or even simple serverless functions (AWS Lambda / Google Cloud Functions). The goal is the same: never let raw API output touch your agent’s context window.
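For the custom-middleware route, the pattern can be sketched as a small wrapper. This is an illustration under my own assumptions rather than a prescribed implementation: `withCleaner` is a hypothetical helper, and the calendar payload below is stubbed sample data, not a real API response.

```javascript
// Hypothetical helper: wraps any async tool call so the agent only ever sees
// the cleaned result. `callApi` and `clean` are supplied per tool.
function withCleaner(callApi, clean) {
  return async function cleanedTool(...args) {
    const raw = await callApi(...args);
    return clean(raw);
  };
}

// Example wiring with a stubbed API call (no network access needed).
const fetchCalendarEvent = async () => ({
  id: 'evt_1',
  summary: 'Design review',
  start: { dateTime: '2025-01-15T10:00:00Z', timeZone: 'UTC' },
  reminders: { useDefault: true },
  etag: '"abc"',
});

const getEvent = withCleaner(fetchCalendarEvent, e => ({
  title: e.summary,
  start: e.start.dateTime,
}));

getEvent().then(event => console.log(event));
// { title: 'Design review', start: '2025-01-15T10:00:00Z' }
```

Registering only the wrapped function as the agent's tool means there is no code path where the raw response can reach the context window.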
How to Implement a Cleaner
You don’t need to spend hours manually mapping every field in a complex API. The most efficient way to build a cleaning layer is to use an LLM to write the transformation logic for you.
If you have a raw API response and you aren’t sure what to strip, use this prompt with a frontier model like Claude Sonnet 4.6:
“I have this raw JSON response from the [Name of API]. I am feeding this into an AI agent, and I want to eliminate context bloat. Please analyse the JSON and identify the 5-10 most critical fields for a task involving [describe your task]. Then, write a [JavaScript/Python] function that extracts only those fields and returns a clean, minimal JSON object.”
Once the LLM gives you the logic, the implementation is short. Here is the cleaner we used for the Discord example above — it collapses an array of raw message objects into a single chronological conversation string, keyed by channel:
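// Discord message array -> minimal conversation object.
// Assumes every message comes from the same channel (one channel per pull).
function cleanDiscordMessages(rawMessages) {
  // Guard against null, undefined, or non-array input as well as empty arrays
  if (!Array.isArray(rawMessages) || rawMessages.length === 0) return [];
  const channelId = rawMessages[0].channel_id;
  // Copy before sorting so the caller's array is not mutated
  const conversation = [...rawMessages]
    // Oldest first, so the string reads chronologically
    .sort((a, b) => new Date(a.timestamp) - new Date(b.timestamp))
    .map(m => {
      // Prefer the display name, then the username
      const author = m.author?.global_name || m.author?.username || 'unknown';
      // Embed-only messages can have no text, so avoid printing "undefined"
      return `[${m.timestamp}] ${author}: ${m.content ?? ''}`;
    })
    .join('\n');
  return [{ channel_id: channelId, conversation }];
}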
Drop a script like this into your workflow between the API call and the agent. This ensures the agent receives a high-density signal, which reduces the likelihood of hallucinations and significantly lowers your API spend.
Key Takeaways for Agent Design
When building out your agent’s toolset, keep these three principles in mind:
- Define the minimal data set: Before building a tool, write down exactly which fields the agent needs to complete the task. If the agent is checking a calendar, it needs the event title and time, not the entire timezone configuration object.
- Enforce the filter: Make the cleaning layer mandatory. If you add a new API, the workflow is incomplete until the filter is implemented.
- Monitor token usage: Regularly check your agent’s turn-by-turn token growth. If you see a spike after a specific tool call, that tool is a candidate for a tighter cleaning layer.
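The monitoring step can be as simple as diffing prompt sizes between turns. Here is a minimal sketch, assuming you already log the prompt token count your model provider reports for each request. The log shape, the `findTokenSpikes` name, the tool names, and the 2,000-token threshold are all hypothetical choices for illustration.

```javascript
// Hypothetical log format: one entry per turn, with the prompt token count
// reported for that request and the tool (if any) called on that turn.
function findTokenSpikes(turnLog, thresholdTokens = 2000) {
  const spikes = [];
  for (let i = 1; i < turnLog.length; i++) {
    const growth = turnLog[i].promptTokens - turnLog[i - 1].promptTokens;
    // Attribute growth to the tool called on the previous turn, since its
    // result first appears in the next request's prompt.
    const tool = turnLog[i - 1].tool;
    if (tool && growth > thresholdTokens) {
      spikes.push({ turn: i + 1, tool, growth });
    }
  }
  return spikes;
}

const log = [
  { promptTokens: 1200, tool: 'discord_messages' },
  { promptTokens: 10100, tool: 'calendar_lookup' },
  { promptTokens: 10600, tool: null },
];
console.log(findTokenSpikes(log));
// [ { turn: 2, tool: 'discord_messages', growth: 8900 } ]
```

Any tool that shows up in this list repeatedly is the first place to tighten the filter.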
Build Leaner Agent Workflows
If your agents are burning tokens and losing focus, we can help you implement professional cleaning layers and optimised orchestration.