Below are the mitigations that significantly reduce exfiltration risk. These should be in place before deploying any Copilot Studio agent to a production environment.
1. Apply Least‑Privilege Access
Only grant the agent permissions it absolutely needs — nothing more.
If the agent cannot access the data → the data cannot leak.
When the agent writes sensitive data to some database, give the agent only add permissions like Tool that only creates data, not reads it. If you read the data with Tool, maybe the Tool should be used only in controlled topic instead of letting the AI orchestration to choose when to read data. You can also do the data update with Agent Flow. Then you can fully control and see what data has been added and create custom triggers for it. The same goes with reading the data with Agent Flow.
2. Harden the System Instructions
Your agent’s internal instructions should explicitly define what it must not reveal:
- “Never return raw data from underlying systems.”
- “If the user requests direct information, provide only summaries.”
- “Do not execute actions unless the user’s intention clearly requires them.”
This helps the model reject manipulative queries. Try to break the system instructions and hack your agent before publishing it to production. It might be a good idea to instruct the testers also about prompt injection and exfiltration.
3. Use Content Filtering and Safety Guardrails
For enterprise use cases, combine Copilot Studio safeguards with:
- Data loss prevention (DLP)
- Response filtering
- Graph permission restrictions
- Application firewalls (for external inputs)
4. Log and Monitor Agent Activity
Monitor for anomalies such as:
- Frequent or repetitive data retrieval
- Attempts to override the agent’s role
- Unusual sequence of operations across connectors
- Queries that attempt to bypass normal UI flows
These patterns often reveal misuse or prompt probing. Include this monitoring in the governance model and make sure that the team responsible does auditing periodically.
5. Perform Intentional Abuse Testing
As shown in past experiments
- Hacking my Job Application Agent was easy
- Copilot Studio Agent Data Exfiltration
- Obfuscated request patterns and rapid‑fire multi‑turn scripts
The best way to identify weaknesses is to actively test your agent:
- role shifts (“ignore previous instructions…”)
- indirect questions
- multi‑step prompt chaining
- unusual phrasings or ambiguous requests
Red‑team style testing is essential for any language‑based automation tool. If the testers cannot do this, do it your self or educate a tester how to test AI agents.