Copilot Studio Agent Security

Below are the mitigations that significantly reduce exfiltration risk. These should be in place before deploying any Copilot Studio agent to a production environment.

1. Apply Least‑Privilege Access

Only grant the agent permissions it absolutely needs — nothing more.
If the agent cannot access the data → the data cannot leak.

When the agent writes sensitive data to some database, give the agent only add permissions like Tool that only creates data, not reads it. If you read the data with Tool, maybe the Tool should be used only in controlled topic instead of letting the AI orchestration to choose when to read data. You can also do the data update with Agent Flow. Then you can fully control and see what data has been added and create custom triggers for it. The same goes with reading the data with Agent Flow.

2. Harden the System Instructions

Your agent’s internal instructions should explicitly define what it must not reveal:

“Never return raw data from underlying systems.”
“If the user requests direct information, provide only summaries.”
“Do not execute actions unless the user’s intention clearly requires them.”

This helps the model reject manipulative queries. Try to break the system instructions and hack your agent before publishing it to production. It might be a good idea to instruct the testers also about prompt injection and exfiltration.

3. Use Content Filtering and Safety Guardrails

For enterprise use cases, combine Copilot Studio safeguards with:

Data loss prevention (DLP)
Response filtering
Graph permission restrictions
Application firewalls (for external inputs)

4. Log and Monitor Agent Activity

Monitor for anomalies such as:

Frequent or repetitive data retrieval
Attempts to override the agent’s role
Unusual sequence of operations across connectors
Queries that attempt to bypass normal UI flows

These patterns often reveal misuse or prompt probing. Include this monitoring in the governance model and make sure that the team responsible does auditing periodically.

5. Perform Intentional Abuse Testing

As shown in past experiments

The best way to identify weaknesses is to actively test your agent:

role shifts (“ignore previous instructions…”)
indirect questions
multi‑step prompt chaining
unusual phrasings or ambiguous requests

Red‑team style testing is essential for any language‑based automation tool. If the testers cannot do this, do it your self or educate a tester how to test AI agents.

Action is the most beautiful form of speech

1. Apply Least‑Privilege Access

2. Harden the System Instructions

3. Use Content Filtering and Safety Guardrails

4. Log and Monitor Agent Activity

5. Perform Intentional Abuse Testing

Karl-Johan Spiik

Contact