Clean Your AI Food | Fixing Context Bloat in Agent Workflows

If you are building agentic AI that interacts with third-party APIs, you have likely noticed a recurring problem: the more tools you give your agent, the more expensive and erratic it becomes.

This is usually caused by context bloat. It happens when an agent calls a tool and receives a massive, raw JSON response containing metadata, headers, and nesting that the model doesn’t actually need. When that raw data is injected into the context window, it doesn’t just burn tokens; it creates noise that distracts the model from the actual task.

The Signal vs. Noise Problem

Most API responses are designed for developers, not for LLMs. They include exhaustive detail to ensure every possible edge case is covered. However, an agent typically only needs a fraction of that data.

For example, a raw Gmail API response for a single email might include MIME encoding details, complex threading metadata, and full header strings. To an AI agent trying to summarise a message, the vast majority of that is noise. When the agent carries that noise into the next turn of the conversation, the bloat compounds. By the time you reach the fifth or sixth interaction, you are paying for thousands of tokens of irrelevant metadata in every single request.

Let’s look at this in practice. Pulling 10 messages from Discord, the raw payload came in at roughly 28,000 characters / 8,800 tokens. After cleaning, the same conversation was 6,700 characters / 2,000 tokens. That is a reduction of more than 4x, with the conversational content (who said what, and when) intact.
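To put a number on how that difference compounds, here is a back-of-envelope sketch. It assumes, purely for illustration, that the tool result stays in the conversation history and is re-sent unchanged with every later request; `cumulativeTokens` is a hypothetical helper, and the token counts are the ones measured above.

```javascript
// Rough estimate: tokens billed for one tool result that stays in history
// and is re-sent with every request for `turns` consecutive turns.
function cumulativeTokens(payloadTokens, turns) {
  return payloadTokens * turns;
}

const turns = 6;
const raw = cumulativeTokens(8800, turns);     // raw Discord payload
const cleaned = cumulativeTokens(2000, turns); // cleaned payload

console.log(`raw: ${raw}, cleaned: ${cleaned}, saved: ${raw - cleaned}`);
// raw: 52800, cleaned: 12000, saved: 40800
```

Real agents often add more tool results on top of this, so treat the figure as a lower bound for a single call rather than a forecast.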

Here is the shape of the raw Discord message schema, where every message in the array carries this much structural weight:

[
  {
    "type": "integer",
    "content": "string",
    "mentions": [],
    "mention_roles": [],
    "attachments": [],
    "embeds": [
      {
        "type": "string",
        "url": "string",
        "title": "string",
        "description": "string",
        "color": "integer",
        "provider": { "name": "string", "url": "string" },
        "thumbnail": {
          "url": "string", "proxy_url": "string",
          "width": "integer", "height": "integer",
          "content_type": "string", "placeholder": "string",
          "placeholder_version": "integer", "flags": "integer"
        },
        "content_scan_version": "integer",
        "author": { "name": "string", "url": "string" },
        "image": { "url": "string", "proxy_url": "string", "width": "integer", "height": "integer", "content_type": "string", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" },
        "footer": { "text": "string", "icon_url": "string", "proxy_icon_url": "string" },
        "video": { "url": "string", "width": "integer", "height": "integer", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" }
      }
    ],
    "timestamp": "string",
    "edited_timestamp": "null | string",
    "flags": "integer",
    "components": [],
    "id": "string",
    "channel_id": "string",
    "author": {
      "id": "string", "username": "string", "avatar": "string",
      "discriminator": "string", "public_flags": "integer", "flags": "integer",
      "bot": "boolean", "banner": "null", "accent_color": "null",
      "global_name": "null | string", "avatar_decoration_data": "null",
      "collectibles": "null", "display_name_styles": "null",
      "banner_color": "null", "clan": "null", "primary_guild": "null"
    },
    "pinned": "boolean",
    "mention_everyone": "boolean",
    "tts": "boolean",
    "reactions": [
      {
        "emoji": { "id": "null", "name": "string" },
        "count": "integer",
        "count_details": { "burst": "integer", "normal": "integer" },
        "burst_colors": [], "me_burst": "boolean", "burst_me": "boolean",
        "me": "boolean", "burst_count": "integer"
      }
    ]
  }
]

By stripping the account metadata, user flags, embeds, reactions, and empty or null fields, we can reduce the payload to the only information the model actually needs: author, timestamp, message text, and channel grouping, flattened into a single chronological conversation string:

[
  {
    "channel_id": "string",
    "conversation": "string"
  }
]

The table below shows what was dropped, what was kept, and why each choice matters when the destination is an LLM rather than a developer console.

Area	Raw Discord-style JSON	Cleaned API response	Why it matters for AI input
Top-level structure	Array of message objects	Array of conversation objects	Both are valid, but the cleaned version is simpler and easier for the model to reason over
Core fields	`type`, `content`, `mentions`, `attachments`, `embeds`, `timestamp`, `author`, `flags`, `components`, `id`, `channel_id`, etc.	`channel_id`, `conversation`	Removes low-value metadata and keeps only what is needed
Message content	Split across many individual `content` fields	Combined into one chronological `conversation` string	Preserves meaning while reducing structural noise
Author data	Large nested `author` object with IDs, avatar, flags, bot status, profile metadata	Author name is embedded inline in the conversation text	Keeps speaker context without carrying unnecessary account metadata
Embeds	Full nested `embeds` objects with previews, thumbnails, proxy URLs, dimensions, placeholders, scan versions	Links remain inside the conversation text only	Avoids sending duplicate link preview data that rarely helps the model
Reactions	Nested `reactions` array with emoji metadata and counts	Removed	Usually irrelevant unless analysing engagement
Empty fields	Many empty arrays: `mentions`, `mention_roles`, `attachments`, `components`	Removed	Empty fields consume tokens without adding information
Null fields	Many null fields: `banner`, `accent_color`, `collectibles`, etc.	Removed	Null profile fields add no useful context
Timestamps	Separate `timestamp` and `edited_timestamp` fields per message	Timestamps preserved inline before each message	Keeps chronological context in a compact, readable format
IDs	Message IDs, channel IDs, author IDs	Only `channel_id` kept	Retains useful grouping key while removing identifiers the model does not need
Token efficiency	High token usage due to repeated nested structures and metadata	Much lower token usage	More context can fit in the prompt, reducing cost and improving available working memory
Model comprehension	Model must filter noise before understanding the actual conversation	Model can focus directly on the conversation	Less distraction means better summarisation, extraction, classification, and reasoning
Risk of irrelevant output	Higher, because the model may latch onto embeds, thumbnails, avatars, flags, or internal metadata	Lower, because only meaningful conversational content remains	Cleaner input guides the model toward cleaner output
Best use case	Auditing raw platform/API payloads, debugging integrations, preserving every field	Feeding conversation history into an AI model	Use raw data for systems, cleaned data for model context
Overall	Complete but noisy	Compact and purpose-built	Cleaning API responses before AI injection saves tokens and improves output quality
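The "Empty fields" and "Null fields" rows can be handled generically, before any API-specific mapping. The following is a minimal sketch of that idea; `pruneEmpty` is a hypothetical helper, not part of any library, and it deliberately keeps `false` and `0` because those can carry meaning.

```javascript
// Recursively drop null values, empty strings, empty arrays, and empty objects.
function pruneEmpty(value) {
  if (Array.isArray(value)) {
    const items = value.map(pruneEmpty).filter(v => v !== undefined);
    return items.length ? items : undefined;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value)
      .map(([k, v]) => [k, pruneEmpty(v)])
      .filter(([, v]) => v !== undefined);
    return entries.length ? Object.fromEntries(entries) : undefined;
  }
  // Keep meaningful primitives such as false and 0.
  return value === null || value === undefined || value === '' ? undefined : value;
}

// Hypothetical sample shaped like a fragment of a Discord message.
const message = {
  content: 'hello',
  mentions: [],
  pinned: false,
  author: { username: 'sam', banner: null, clan: null },
};
console.log(JSON.stringify(pruneEmpty(message)));
// {"content":"hello","pinned":false,"author":{"username":"sam"}}
```

A pass like this only removes fields that carry no information. Populated but irrelevant fields, such as embeds or reactions, still need an explicit field selection.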

Figure: Architectural pattern: Intercept → Strip → Forward. Source: external API (raw JSON, ~28k chars) → The fix: cleaning layer (field extraction / mapping) → Destination: AI agent (clean context, ~2k tokens).

The Solution: The Cleaning Layer Pattern

The most effective way to handle this is to implement a cleaning layer between your API and your agent. Instead of the agent calling the API directly, it calls a middleware function that handles the request, cleans the response, and returns only the essential data.

I use n8n as the middleware in front of my agents (exposed over MCP) because it lets me visually map and strip data without writing a new deployment for every single tool. When my agent needs data from Gmail, n8n makes the call, runs the response through a transformation node, and returns a lean JSON object containing only the sender, subject, and body text.
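For readers who prefer code to a visual node, the Gmail transformation might look roughly like the sketch below. It is not the author's n8n workflow. It assumes the response comes from the Gmail API's `users.messages.get` in its full format, where headers arrive as a list of name/value pairs and body data is base64url-encoded, and it assumes a Node.js version whose `Buffer` supports the `'base64url'` encoding. The function names are hypothetical.

```javascript
// Find a header value by name (header names are case-insensitive).
function getHeader(headers, name) {
  const match = (headers || []).find(
    h => h.name.toLowerCase() === name.toLowerCase()
  );
  return match ? match.value : '';
}

// Depth-first search for the first text/plain part that has body data.
function findPlainText(part) {
  if (!part) return '';
  if (part.mimeType === 'text/plain' && part.body?.data) {
    return Buffer.from(part.body.data, 'base64url').toString('utf8');
  }
  for (const child of part.parts || []) {
    const text = findPlainText(child);
    if (text) return text;
  }
  return '';
}

// Reduce a raw message to sender, subject, and body text.
function cleanGmailMessage(raw) {
  const headers = raw.payload?.headers;
  return {
    sender: getHeader(headers, 'From'),
    subject: getHeader(headers, 'Subject'),
    // Fall back to the short snippet if no plain-text part is found.
    body: findPlainText(raw.payload) || raw.snippet || '',
  };
}

// Hypothetical sample message.
const sample = {
  snippet: 'Noon works',
  payload: {
    mimeType: 'multipart/alternative',
    headers: [
      { name: 'From', value: 'Ada <ada@example.com>' },
      { name: 'Subject', value: 'Lunch?' },
    ],
    parts: [
      {
        mimeType: 'text/plain',
        body: { data: Buffer.from('Noon works for me.').toString('base64url') },
      },
    ],
  },
};
console.log(JSON.stringify(cleanGmailMessage(sample)));
// {"sender":"Ada <ada@example.com>","subject":"Lunch?","body":"Noon works for me."}
```

HTML-only emails would fall through to the snippet here. Whether to convert HTML to text instead is a judgment call that depends on the task.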

The result is a massive reduction in overhead. For our 10-message Discord pull:

Raw response: ~28,000 characters / ~8,800 tokens of headers, embeds, author objects, and reactions.
Cleaned response: ~6,700 characters / ~2,000 tokens of high-signal conversation.

While n8n is my preferred tool for this, the pattern is stack-agnostic. You can achieve the same result using Make.com, a custom Node.js/Python middleware, or even simple serverless functions (AWS Lambda / Google Cloud Functions). The goal is the same: never let raw API output touch your agent’s context window.
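In a custom Node.js middleware, the pattern can be as small as a wrapper that pairs each API call with its cleaner. This is a minimal sketch under stated assumptions: `withCleaner` is a hypothetical helper, and `fakeCalendarApi` is a stand-in with made-up data rather than a real API client.

```javascript
// Wrap a tool so its raw output never reaches the agent.
// `callApi` fetches the raw response; `clean` reduces it to what the agent needs.
function withCleaner(callApi, clean) {
  return async function cleanedTool(...args) {
    const raw = await callApi(...args);
    return clean(raw);
  };
}

// Stand-in for a real API call (hypothetical response shape).
async function fakeCalendarApi() {
  return {
    summary: 'Standup',
    start: { dateTime: '2025-01-06T09:00:00Z', timeZone: 'UTC' },
    etag: '"abc"',
    reminders: { useDefault: true },
  };
}

// The agent is only ever given `getEvent`, never `fakeCalendarApi`.
const getEvent = withCleaner(fakeCalendarApi, e => ({
  title: e.summary,
  start: e.start.dateTime,
}));

getEvent().then(event => console.log(event));
// { title: 'Standup', start: '2025-01-06T09:00:00Z' }
```

Registering only the wrapped function as the tool means the raw call has no direct path into the context window.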

How to Implement a Cleaner

You don’t need to spend hours manually mapping every field in a complex API. The most efficient way to build a cleaning layer is to use an LLM to write the transformation logic for you.

If you have a raw API response and you aren’t sure what to strip, use this prompt with a frontier model like Claude Sonnet 4.6:

“I have this raw JSON response from the [Name of API]. I am feeding this into an AI agent, and I want to eliminate context bloat. Please analyse the JSON and identify the 5-10 most critical fields for a task involving [describe your task]. Then, write a [JavaScript/Python] function that extracts only those fields and returns a clean, minimal JSON object.”

Once the LLM gives you the logic, the implementation is short. Here is the cleaner we used for the Discord example above — it collapses an array of raw message objects into a single chronological conversation string, keyed by channel:

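// Discord message array -> minimal conversation object.
// Assumes every message comes from the same channel, as in a single-channel pull.
function cleanDiscordMessages(rawMessages) {
  // Guard against null, undefined, or non-array input as well as empty arrays.
  if (!Array.isArray(rawMessages) || rawMessages.length === 0) return [];

  const channelId = rawMessages[0].channel_id;

  // Copy before sorting so the caller's array is not reordered.
  const conversation = [...rawMessages]
    // Oldest message first.
    .sort((a, b) => new Date(a.timestamp) - new Date(b.timestamp))
    .map(m => {
      // Prefer the display name, then fall back to the username.
      const author = m.author?.global_name || m.author?.username || 'unknown';
      // Embed-only messages may have no text content.
      return `[${m.timestamp}] ${author}: ${m.content ?? ''}`;
    })
    .join('\n');

  return [{ channel_id: channelId, conversation }];
}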

Drop a script like this into your workflow between the API call and the agent. This ensures the agent receives a high-density signal, which reduces the likelihood of hallucinations and significantly lowers your API spend.
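The cleaner above takes the channel from the first message, which fits a single-channel pull. If a workflow ever merges messages from several channels, a variant along these lines could return one conversation object per channel. It is a sketch, and `cleanDiscordMessagesByChannel` is a hypothetical name.

```javascript
// Variant: group a mixed array of Discord messages by channel_id,
// returning one conversation object per channel.
function cleanDiscordMessagesByChannel(rawMessages) {
  if (!Array.isArray(rawMessages)) return [];

  // Sort a copy oldest-first so each channel's lines end up chronological.
  const sorted = [...rawMessages].sort(
    (a, b) => new Date(a.timestamp) - new Date(b.timestamp)
  );

  const byChannel = new Map();
  for (const m of sorted) {
    const author = m.author?.global_name || m.author?.username || 'unknown';
    const line = `[${m.timestamp}] ${author}: ${m.content ?? ''}`;
    if (!byChannel.has(m.channel_id)) byChannel.set(m.channel_id, []);
    byChannel.get(m.channel_id).push(line);
  }

  return [...byChannel].map(([channel_id, lines]) => ({
    channel_id,
    conversation: lines.join('\n'),
  }));
}

// Hypothetical sample: three messages across two channels.
const result = cleanDiscordMessagesByChannel([
  { channel_id: 'B', timestamp: '2025-01-01T10:05:00Z', content: 'second', author: { username: 'kim' } },
  { channel_id: 'A', timestamp: '2025-01-01T10:00:00Z', content: 'first', author: { global_name: 'Lee' } },
  { channel_id: 'A', timestamp: '2025-01-01T10:10:00Z', content: 'third', author: { username: 'kim' } },
]);
console.log(result.length); // 2
```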

Key Takeaways for Agent Design

When building out your agent’s toolset, keep these three principles in mind:

Define the minimal data set: Before building a tool, write down exactly which fields the agent needs to complete the task. If the agent is checking a calendar, it needs the event title and time, not the entire timezone configuration object.
Enforce the filter: Make the cleaning layer mandatory. If you add a new API, the workflow is incomplete until the filter is implemented.
Monitor token usage: Regularly check your agent’s turn-by-turn token growth. If you see a spike after a specific tool call, that tool is a candidate for a tighter cleaning layer.
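The monitoring principle can be automated with a few lines. The sketch below assumes you already log, per turn, the prompt token count reported by your model provider and the name of the last tool called; the field names `promptTokens` and `lastTool`, the helper `findTokenSpikes`, and the default threshold are all hypothetical.

```javascript
// Flag turns whose prompt-token count jumped by more than `threshold`
// compared with the previous turn.
function findTokenSpikes(turns, threshold = 3000) {
  const spikes = [];
  for (let i = 1; i < turns.length; i++) {
    const growth = turns[i].promptTokens - turns[i - 1].promptTokens;
    if (growth > threshold) {
      spikes.push({ turn: i, tool: turns[i].lastTool, growth });
    }
  }
  return spikes;
}

// Hypothetical per-turn log.
const log = [
  { promptTokens: 1200, lastTool: null },
  { promptTokens: 1900, lastTool: 'calendar_lookup' },
  { promptTokens: 10700, lastTool: 'discord_fetch' },
  { promptTokens: 11300, lastTool: null },
];
console.log(findTokenSpikes(log));
// [ { turn: 2, tool: 'discord_fetch', growth: 8800 } ]
```

Any tool that shows up in this list repeatedly is a candidate for a tighter cleaner.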

Shah