---
title: "Clean Your AI Food: Fixing Context Bloat in Agent Workflows"
url: "https://bravr.ai/blog/clean-your-ai-food-fixing-context-bloat-in-agent-workflows"
description: "Raw API responses bloat your agent's context and burn tokens. Learn the Cleaning Layer pattern to strip noise, cut costs, and sharpen agent accuracy."
---

# Clean Your AI Food: Fixing Context Bloat in Agent Workflows

## Context bloat is costing you money and confusing your agents. Here's the fix.

Raw API responses are bloating your agent’s context window and burning through tokens. Here’s how to strip the noise and feed your agents only what they need.

By Shah · 9 May 2026 · AI

**Key Takeaways**

*   Raw API responses are **burning tokens**.
*   The fix: A **Cleaning Layer** to strip the noise.
*   High-density signal = lower costs.
*   Fewer hallucinations, sharper output.
*   Works across n8n, Make, or custom scripts.
*   Stop feeding your agents 'junk food'.

If you are building [agentic AI](/labs/agentic/) that interacts with third-party APIs, you have likely noticed a recurring problem: the more tools you give your agent, the more expensive and erratic it becomes.

This is usually caused by context bloat. It happens when an agent calls a tool and receives a massive, raw JSON response containing metadata, headers, and nesting that the model doesn’t actually need. When that raw data is injected into the context window, it doesn’t just burn tokens; it creates noise that distracts the model from the actual task.

## The Signal vs. Noise Problem

Most API responses are designed for developers, not for LLMs. They include exhaustive detail to ensure every possible edge case is covered. However, an agent typically only needs a fraction of that data.

For example, a raw Gmail API response for a single email might include MIME encoding details, complex threading metadata, and full header strings. To an AI agent trying to summarise a message, the vast majority of that is noise. When the agent carries that noise into the next turn of the conversation, the bloat compounds. By the time you reach the fifth or sixth interaction, you are paying for thousands of tokens of irrelevant metadata in every single request.

Let’s look at this in practice. When I pulled **10 messages from Discord**, the raw payload came in at roughly **28,000 characters / 8,800 tokens**. After cleaning, the same conversation was **6,700 characters / 2,000 tokens**: roughly a 4x reduction with no loss of conversational meaning.

Here is the shape of the raw Discord message schema, where every message in the array carries this much structural weight:

```
[
  {
    "type": "integer",
    "content": "string",
    "mentions": [],
    "mention_roles": [],
    "attachments": [],
    "embeds": [
      {
        "type": "string",
        "url": "string",
        "title": "string",
        "description": "string",
        "color": "integer",
        "provider": { "name": "string", "url": "string" },
        "thumbnail": {
          "url": "string", "proxy_url": "string",
          "width": "integer", "height": "integer",
          "content_type": "string", "placeholder": "string",
          "placeholder_version": "integer", "flags": "integer"
        },
        "content_scan_version": "integer",
        "author": { "name": "string", "url": "string" },
        "image": { "url": "string", "proxy_url": "string", "width": "integer", "height": "integer", "content_type": "string", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" },
        "footer": { "text": "string", "icon_url": "string", "proxy_icon_url": "string" },
        "video": { "url": "string", "width": "integer", "height": "integer", "placeholder": "string", "placeholder_version": "integer", "flags": "integer" }
      }
    ],
    "timestamp": "string",
    "edited_timestamp": "null | string",
    "flags": "integer",
    "components": [],
    "id": "string",
    "channel_id": "string",
    "author": {
      "id": "string", "username": "string", "avatar": "string",
      "discriminator": "string", "public_flags": "integer", "flags": "integer",
      "bot": "boolean", "banner": "null", "accent_color": "null",
      "global_name": "null | string", "avatar_decoration_data": "null",
      "collectibles": "null", "display_name_styles": "null",
      "banner_color": "null", "clan": "null", "primary_guild": "null"
    },
    "pinned": "boolean",
    "mention_everyone": "boolean",
    "tts": "boolean",
    "reactions": [
      {
        "emoji": { "id": "null", "name": "string" },
        "count": "integer",
        "count_details": { "burst": "integer", "normal": "integer" },
        "burst_colors": [], "me_burst": "boolean", "burst_me": "boolean",
        "me": "boolean", "burst_count": "integer"
      }
    ]
  }
]
```

By stripping the guild metadata, user flags, embeds, reactions, and empty arrays, we can reduce the payload to the only things the model actually needs: author, timestamp, message text, and channel grouping. These are flattened into a single chronological conversation string:

```
[
  {
    "channel_id": "string",
    "conversation": "string"
  }
]
```

The table below shows what was dropped, what was kept, and why each call matters when the destination is an LLM rather than a developer console.

| Area | Raw Discord-style JSON | Cleaned API response | Why it matters for AI input |
| --- | --- | --- | --- |
| Top-level structure | Array of message objects | Array of conversation objects | Both are valid, but the cleaned version is simpler and easier for the model to reason over |
| Core fields | `type`, `content`, `mentions`, `attachments`, `embeds`, `timestamp`, `author`, `flags`, `components`, `id`, `channel_id`, etc. | `channel_id`, `conversation` | Removes low-value metadata and keeps only what is needed |
| Message content | Split across many individual `content` fields | Combined into one chronological `conversation` string | Preserves meaning while reducing structural noise |
| Author data | Large nested `author` object with IDs, avatar, flags, bot status, profile metadata | Author name is embedded inline in the conversation text | Keeps speaker context without carrying unnecessary account metadata |
| Embeds | Full nested `embeds` objects with previews, thumbnails, proxy URLs, dimensions, placeholders, scan versions | Links remain inside the conversation text only | Avoids sending duplicate link preview data that rarely helps the model |
| Reactions | Nested `reactions` array with emoji metadata and counts | Removed | Usually irrelevant unless analysing engagement |
| Empty fields | Many empty arrays: `mentions`, `mention_roles`, `attachments`, `components` | Removed | Empty fields consume tokens without adding information |
| Null fields | Many null fields: `banner`, `accent_color`, `collectibles`, etc. | Removed | Null profile fields add no useful context |
| Timestamps | Separate `timestamp` and `edited_timestamp` fields per message | Timestamps preserved inline before each message | Keeps chronological context in a compact, readable format |
| IDs | Message IDs, channel IDs, author IDs | Only `channel_id` kept | Retains useful grouping key while removing identifiers the model does not need |
| Token efficiency | High token usage due to repeated nested structures and metadata | Much lower token usage | More context can fit in the prompt, reducing cost and improving available working memory |
| Model comprehension | Model must filter noise before understanding the actual conversation | Model can focus directly on the conversation | Less distraction means better summarisation, extraction, classification, and reasoning |
| Risk of irrelevant output | Higher, because the model may latch onto embeds, thumbnails, avatars, flags, or internal metadata | Lower, because only meaningful conversational content remains | Cleaner input guides the model toward cleaner output |
| Best use case | Auditing raw platform/API payloads, debugging integrations, preserving every field | Feeding conversation history into an AI model | Use raw data for systems, cleaned data for model context |
| **Overall** | Complete but noisy | Compact and purpose-built | Cleaning API responses before AI injection saves tokens and improves output quality |

**Architectural pattern: Intercept → Strip → Forward**

**External API** (source: raw JSON, ~28k chars) → **Cleaning Layer** (the fix: field extraction / mapping) → **AI Agent** (destination: clean context, ~2k tokens)
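The "Empty fields" and "Null fields" rows in the table above can often be handled generically, before any API-specific mapping. The sketch below is one way to do that, assuming it is safe to drop every `null`, empty string, empty array, and empty object in the payload. `pruneEmpty` is a hypothetical helper name, not part of any library.

```javascript
// Recursively drop null/undefined values, empty strings, empty arrays and
// empty objects. `pruneEmpty` is an illustrative helper, not a library API.
function pruneEmpty(value) {
  if (Array.isArray(value)) {
    const items = value.map(pruneEmpty).filter(v => v !== undefined);
    return items.length ? items : undefined;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value)
      .map(([key, v]) => [key, pruneEmpty(v)])
      .filter(([, v]) => v !== undefined);
    return entries.length ? Object.fromEntries(entries) : undefined;
  }
  // Keep meaningful primitives, including false and 0.
  return value === null || value === '' ? undefined : value;
}

// Hypothetical message fragment, shaped like the raw schema shown earlier.
const sample = {
  content: 'Deploy is done',
  mentions: [],
  attachments: [],
  author: { username: 'sam', banner: null, accent_color: null },
};
console.log(JSON.stringify(pruneEmpty(sample)));
```

A generic pass like this only removes structurally empty values. Fields that are populated but irrelevant (flags, proxy URLs, avatar hashes) still need an explicit field selection step.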

## The Solution: The Cleaning Layer Pattern

The most effective way to handle this is to implement a cleaning layer between your API and your agent. Instead of the agent calling the API directly, it calls a middleware function that handles the request, cleans the response, and returns only the essential data.

I use n8n as the middleware in front of my agents (exposed over MCP) because it lets me visually map and strip data without writing a new deployment for every single tool. When my agent needs data from Gmail, n8n makes the call, runs the response through a transformation node, and returns a lean JSON object containing only the sender, subject, and body text.
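For readers doing this in code rather than in an n8n transformation node, here is a minimal sketch of the same Gmail reduction. It is not the n8n workflow itself. It assumes the message was fetched in the Gmail API's full format, where `payload.headers` is a list of `{ name, value }` pairs and body data is base64url-encoded, and that it runs in Node.js. The function names are hypothetical.

```javascript
// Minimal sketch: reduce a Gmail API message resource to sender, subject, body.
// Assumes the full message format and a Node.js runtime (uses Buffer).

// Case-insensitive header lookup; returns '' when the header is missing.
function findHeader(headers, name) {
  const match = (headers || []).find(
    h => String(h.name).toLowerCase() === name.toLowerCase()
  );
  return match ? match.value : '';
}

// Depth-first search for the first text/plain part that carries body data.
function findPlainText(part) {
  if (!part) return '';
  if (part.mimeType === 'text/plain' && part.body?.data) {
    return Buffer.from(part.body.data, 'base64url').toString('utf8');
  }
  for (const child of part.parts || []) {
    const text = findPlainText(child);
    if (text) return text;
  }
  return '';
}

function cleanGmailMessage(raw) {
  const headers = raw?.payload?.headers;
  return {
    sender: findHeader(headers, 'From'),
    subject: findHeader(headers, 'Subject'),
    body: findPlainText(raw?.payload).trim(),
  };
}
```

Messages that only contain an HTML part come back with an empty `body` in this sketch. Converting HTML to text would be a separate step.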

**The result is a massive reduction in overhead. For our 10-message Discord pull:**

*   **Raw response:** ~28,000 characters / ~8,800 tokens of headers, embeds, author objects, and reactions.
*   **Cleaned response:** ~6,700 characters / ~2,000 tokens of high-signal conversation.
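To see how this compounds, here is a rough worked calculation using the Discord figures. It assumes the agent makes one similar tool call per turn, that the full history is re-sent on every request, and that there is no prompt caching or history truncation. Real billing will differ, so treat the numbers as an illustration only.

```javascript
// Total input tokens billed for tool output across a multi-turn exchange.
// Assumption: each turn adds one payload and the whole history is re-sent.
function cumulativeInputTokens(payloadTokens, turns) {
  let contextSize = 0; // tokens of tool output currently in context
  let billed = 0;      // input tokens billed across all requests so far
  for (let turn = 1; turn <= turns; turn++) {
    contextSize += payloadTokens; // this turn's tool result joins the history
    billed += contextSize;        // the whole history is sent again
  }
  return billed;
}

const turns = 6; // assumed length of the exchange
console.log({
  raw: cumulativeInputTokens(8800, turns),     // 184800
  cleaned: cumulativeInputTokens(2000, turns), // 42000
});
```

Under these assumptions, six turns of raw payloads cost about 185k input tokens of tool output, against 42k for the cleaned version.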

While n8n is my preferred tool for this, the pattern is stack-agnostic. You can achieve the same result using Make.com, a custom Node.js/Python middleware, or even simple serverless functions (AWS Lambda / Google Cloud Functions). The goal is the same: never let raw API output touch your agent’s context window.

## How to Implement a Cleaner

You don’t need to spend hours manually mapping every field in a complex API. The most efficient way to build a cleaning layer is to use an LLM to write the transformation logic for you.

If you have a raw API response and you aren’t sure what to strip, use this prompt with a frontier model like Claude Sonnet 4.6:

> “I have this raw JSON response from the \[Name of API\]. I am feeding this into an AI agent, and I want to eliminate context bloat. Please analyse the JSON and identify the 5-10 most critical fields for a task involving \[describe your task\]. Then, write a \[JavaScript/Python\] function that extracts only those fields and returns a clean, minimal JSON object.”

Once the LLM gives you the logic, the implementation is short. Here is the cleaner we used for the Discord example above — it collapses an array of raw message objects into a single chronological conversation string, keyed by channel:

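```
// Discord message array -> minimal conversation object.
// Assumes every message in the array comes from the same channel.
function cleanDiscordMessages(rawMessages) {
  // Guard against null/undefined input as well as an empty array
  if (!Array.isArray(rawMessages) || rawMessages.length === 0) return [];

  const channelId = rawMessages[0].channel_id;

  const conversation = [...rawMessages] // copy so the caller's array is not reordered
    .sort((a, b) => new Date(a.timestamp) - new Date(b.timestamp)) // oldest first
    .map(m => {
      // Prefer the display name, fall back to the username
      const author = m.author?.global_name || m.author?.username || 'unknown';
      return `[${m.timestamp}] ${author}: ${m.content}`;
    })
    .join('\n');

  return [{ channel_id: channelId, conversation }];
}
```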

Drop a script like this into your workflow between the API call and the agent. The agent then receives a high-density signal, which reduces the likelihood of hallucinations and lowers your token spend.
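If your tools live in custom code rather than a visual workflow, one way to put the cleaner "between the API call and the agent" is a small wrapper. This is a sketch under assumptions: `withCleaner`, `fetchRaw`, and `clean` are illustrative names rather than part of n8n, MCP, or any SDK, and the fetcher is stubbed.

```javascript
// Wrap any async fetcher so its result always passes through a cleaner
// before it is returned to the agent. All names here are illustrative.
function withCleaner(fetchRaw, clean) {
  return async function cleanedTool(...args) {
    const raw = await fetchRaw(...args);
    const cleaned = clean(raw);
    // Serialised because a string is what ends up in the context window.
    return JSON.stringify(cleaned);
  };
}

// Stubbed fetcher standing in for a real API call (hypothetical data).
const fetchMessages = async () => [
  { id: '1', channel_id: 'c1', content: 'hi', author: { username: 'sam', flags: 0 }, embeds: [] },
];

// The agent is only ever handed `getMessages`, never `fetchMessages`.
const getMessages = withCleaner(fetchMessages, msgs =>
  msgs.map(m => ({ author: m.author.username, text: m.content }))
);

getMessages().then(result => console.log(result));
```

Registering only the wrapped function as a tool means the raw response has no path into the context window.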

## Key Takeaways for Agent Design

When building out your agent’s toolset, keep these three principles in mind:

1.  **Define the minimal data set:** Before building a tool, write down exactly which fields the agent needs to complete the task. If the agent is checking a calendar, it needs the event title and time, not the entire timezone configuration object.
2.  **Enforce the filter:** Make the cleaning layer mandatory. If you add a new API, the workflow is incomplete until the filter is implemented.
3.  **Monitor token usage:** Regularly check your agent’s turn-by-turn token growth. If you see a spike after a specific tool call, that tool is a candidate for a tighter cleaning layer.
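
The third principle can be sketched as a small check over your own usage logs. The record shape here (`turn`, `lastTool`, `inputTokens`) is an assumed logging format, with `lastTool` being whichever tool's output was appended since the previous request. The threshold is arbitrary and the sample data is hypothetical.

```javascript
// Flag turns where input tokens grew by more than `threshold` compared with
// the previous turn. The log record shape is an assumption, not a standard.
function findTokenSpikes(turns, threshold = 3000) {
  const spikes = [];
  for (let i = 1; i < turns.length; i++) {
    const growth = turns[i].inputTokens - turns[i - 1].inputTokens;
    if (growth > threshold) {
      spikes.push({ turn: turns[i].turn, lastTool: turns[i].lastTool, growth });
    }
  }
  return spikes;
}

// Hypothetical usage log for a four-turn exchange.
const usageLog = [
  { turn: 1, lastTool: null, inputTokens: 1200 },
  { turn: 2, lastTool: 'calendar.list', inputTokens: 1900 },
  { turn: 3, lastTool: 'discord.fetch', inputTokens: 10700 },
  { turn: 4, lastTool: null, inputTokens: 11000 },
];
console.log(findTokenSpikes(usageLog));
```

In this made-up log, only turn 3 is flagged, which points at the Discord tool as the one that needs a tighter cleaner.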

## Build Leaner Agent Workflows

If your agents are burning tokens and losing focus, we can help you implement professional cleaning layers and optimised orchestration.

[Get in touch](/contact/)
