
If you’ve used ChatGPT, Microsoft Copilot, Google Gemini (formerly Bard), or other generative AI tools, you know how powerful and helpful they can be.
While effective for productivity, these tools also introduce critical security and data protection issues. The risks are recognized even at the congressional level: last year the U.S. House of Representatives banned congressional staff from using Copilot over concerns that sensitive House data could leak to unauthorized cloud services.
Copilot is both more powerful and more dangerous than other AI tools like ChatGPT because it dives deep into your organization’s Microsoft 365 content. It can draw on virtually everything you have worked on and deliver a summary, spreadsheet, or complete document quickly and efficiently. It’s early days, but its potential seems endless.
For example, Copilot can draft a client proposal in Word in seconds, pulling elements from your notes and past presentations. It can also summarize Teams meetings and report the key points and to-dos, help Outlook users sort through their inboxes, and serve as a data analyst in Excel.
Copilot Data Security Risks Explained
While the productivity benefits abound, there are serious data security risks that must be addressed. Copilot is built to access all the data a user can reach, and it can also create new sensitive data quickly and in large quantities.
The biggest data security concern with Copilot is overly permissive data access. The exposure is at least contained to your own tenant: Copilot works only with your M365 tenant’s data and cannot access other companies’ data, and your data does not train the AI for other companies to leverage. Even so, there are several serious issues to consider:
- Copilot leverages all data that users have access to, and more often than not those permissions are far broader than users actually need.
- Copilot results do not inherit the security labels from the source files, nor are those classifications surfaced in query results (see the sketch following this list).
- As a result, it is up to the user’s discretion whether outputs are appropriate to share with colleagues or external contacts.
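To make that label gap concrete, here is a minimal Python sketch that flags generated documents whose source files carry a stronger sensitivity label than the output itself. It is illustrative only: the label names and `Document` records are assumptions, and a real audit would pull labels and file lineage from Microsoft Purview or the Graph API rather than from in-memory objects.

```python
from dataclasses import dataclass

# Rank labels so we can tell when an output is classified more loosely
# than the material it was generated from. Label names are illustrative.
LABEL_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Highly Confidential": 3}

@dataclass
class Document:
    name: str
    sensitivity_label: str | None = None  # None = unlabeled, the default for new Copilot output

def label_gaps(output: Document, sources: list[Document]) -> list[str]:
    """Flag cases where a generated document carries a weaker label than its sources."""
    findings = []
    out_rank = LABEL_RANK.get(output.sensitivity_label, -1)
    for src in sources:
        if LABEL_RANK.get(src.sensitivity_label, -1) > out_rank:
            findings.append(
                f"'{output.name}' ({output.sensitivity_label or 'unlabeled'}) draws on "
                f"'{src.name}' ({src.sensitivity_label}) but does not inherit its label"
            )
    return findings

if __name__ == "__main__":
    sources = [
        Document("Q3-earnings-draft.xlsx", "Highly Confidential"),
        Document("press-kit.docx", "Public"),
    ]
    output = Document("copilot-quarterly-summary.docx")  # unlabeled Copilot output
    for finding in label_gaps(output, sources):
        print(finding)
```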
Microsoft even provides this warning about access rights in its Copilot data security documentation: “It’s important that you’re using the permission models available in Microsoft 365 services, such as SharePoint, to help ensure the right users or groups have the right access to the right content within your organization.”
When it comes to data permissions, Zero Trust is the recommended approach: access to information is granted strictly on a need-to-know basis. Microsoft suggests using M365’s permission models to keep things locked down, but in practice most tenants are far from that ideal.
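As a rough illustration of what a need-to-know review involves, the sketch below compares who is entitled to a file against who has actually opened it recently and flags stale access that could be revoked. The file names, groups, and data structures are hypothetical; in a real tenant this information would come from SharePoint permission reports and the unified audit log.

```python
from datetime import date, timedelta

# Hypothetical inventory: who is entitled to each file vs. who has actually opened it.
# In practice this data would come from permission reports and audit logs.
entitlements = {
    "board-compensation.xlsx": {"cfo", "hr-director", "all-staff"},  # over-shared
    "2026-roadmap.pptx": {"product-team"},
}
access_log = {
    "board-compensation.xlsx": {
        "cfo": date.today() - timedelta(days=7),
        "hr-director": date.today() - timedelta(days=12),
    },
    "2026-roadmap.pptx": {"product-team": date.today() - timedelta(days=3)},
}

def unused_entitlements(window_days: int = 90) -> dict[str, set[str]]:
    """Return, per file, the principals entitled to it who have not used it recently."""
    cutoff = date.today() - timedelta(days=window_days)
    findings = {}
    for doc, principals in entitlements.items():
        recent = {who for who, when in access_log.get(doc, {}).items() if when >= cutoff}
        stale = principals - recent
        if stale:
            findings[doc] = stale
    return findings

if __name__ == "__main__":
    for doc, stale in unused_entitlements().items():
        print(f"{doc}: consider revoking access for {sorted(stale)}")
```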
Asking users to apply labels and classifications to keep data protected can get messy, and AI-generated data will only make things worse. With so much data to manage, organizations should not expect their users to be perfect stewards of data risk, especially when even dedicated security teams struggle to keep up.
Real-World Examples of Copilot Security Risks
Organizations need a clear understanding of the data security risks before fully deploying Copilot. Here are four potential real-world scenarios that highlight these risks across departments:
Finance: A financial analyst uses Copilot to generate a quarterly financial report. The input data includes a mix of public financial figures and unreleased earnings numbers. If the confidential unreleased earnings are not properly classified at the input stage, Copilot may include those private figures in reports generated for unauthorized users, who could then share them externally.
HR: An HR manager uses Copilot to compile an internal report on employee performance, which includes sensitive personal information. If the underlying data has overly permissive access controls, Copilot may surface confidential employee details to anyone in the department, or possibly anyone in the company. This not only violates privacy policies but could also create internal disruption and expose the organization to legal risk.
R&D: A product development team uses Copilot to brainstorm new product ideas. The team’s input includes existing intellectual property (IP) and R&D data, but also confidential information about upcoming developments. Copilot lacks context about the sensitivity of this IP and could include those details in its results. That output could then be shared with a broader audience, including external partners and ultimately competitors, inadvertently exposing future product plans and inviting IP theft.
Marketing: A marketing team uses Copilot to analyze feedback from a focus group and produce a report on its findings. The report includes sensitive information about the participants and their praise or criticism of unreleased products. Since Copilot outputs are unclassified by default, the report would not be labeled as confidential, meaning it could easily be shared with unauthorized internal or external users, exposing the company to consumer scrutiny or competitive disadvantage.
How to Best Secure Copilot Deployments
To effectively manage data risk, sensitive information — whether it’s financial data, PII/PHI/PCI, intellectual property, or other confidential business information — must be identified, classified, protected, and remediated. Sensitive data can live anywhere — in the cloud, on-premises, and in both structured and unstructured formats.
While having some form of classification is better than none, many traditional approaches — such as end-user tagging, centralized models, or metadata-driven methods — are often slow, inefficient, and burdened with unnecessary complexity.
There are technologies available that can support more secure Copilot deployments. Advanced data security governance solutions, in particular, leverage sophisticated natural language processing (NLP) to accurately and autonomously categorize and classify sensitive content such as personal data, intellectual property, financial records, legal agreements, HR files, sales strategies, partnership plans, and other business-critical information.
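To show the underlying idea, here is a minimal sketch that uses an open-source zero-shot model from the Hugging Face transformers library to bucket text into sensitive-content categories. The category list and score threshold are assumptions made for illustration; purpose-built data security governance products use far more sophisticated classifiers than a general-purpose model like this.

```python
# Minimal prototype of NLP-based content classification with an open-source
# zero-shot model. Illustrative only; not how any particular product works.
from transformers import pipeline  # pip install transformers

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

CATEGORIES = [
    "personal data", "intellectual property", "financial records",
    "legal agreement", "HR record", "sales strategy", "partnership plan",
]

def classify(text: str, threshold: float = 0.5) -> list[str]:
    """Return the candidate categories the model considers likely for this text."""
    result = classifier(text, candidate_labels=CATEGORIES, multi_label=True)
    return [label for label, score in zip(result["labels"], result["scores"]) if score >= threshold]

if __name__ == "__main__":
    sample = "Draft earn-out terms for the acquisition, including employee retention bonuses."
    print(classify(sample))
```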
Data security governance platforms can analyze the output from Copilot to discover sensitive information and label the data accordingly to ensure that only authorized users have access to it. They can also autonomously identify risks due to inappropriate permissioning, risky sharing, and misclassification. Remediation actions — such as changing entitlements, adjusting access controls, or preventing the data from being shared — can also be taken centrally to fix issues and prevent data loss.
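To tie those steps together, here is a deliberately simplified sketch of an analyze, label, and remediate loop for Copilot output. Every function here is a hypothetical stand-in: a real platform would classify content with dedicated models and apply labels and permission changes through Microsoft Purview and Graph APIs rather than print statements.

```python
# Illustrative only: an end-to-end flow for governing Copilot output.
# classify_content(), apply_label(), and restrict_sharing() are hypothetical
# stand-ins for what a real data security governance platform would do.

def classify_content(text: str) -> str:
    # Placeholder keyword classifier; a real deployment would use the NLP approach shown earlier.
    keywords = {"salary": "HR record", "earnings": "financial records", "patent": "intellectual property"}
    for word, category in keywords.items():
        if word in text.lower():
            return category
    return "general"

def apply_label(doc_id: str, label: str) -> None:
    print(f"[label] {doc_id} -> {label}")  # would call a sensitivity-labeling API

def restrict_sharing(doc_id: str) -> None:
    print(f"[remediate] external sharing blocked on {doc_id}")  # would adjust entitlements

def govern_copilot_output(doc_id: str, text: str) -> None:
    """Classify a Copilot-generated document, label it, and remediate if sensitive."""
    category = classify_content(text)
    if category != "general":
        apply_label(doc_id, "Confidential")
        restrict_sharing(doc_id)
    else:
        apply_label(doc_id, "Internal")

if __name__ == "__main__":
    govern_copilot_output(
        "copilot-q3-summary.docx",
        "Summary of unreleased Q3 earnings and revenue guidance.",
    )
```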
Data security governance can help organizations enjoy Copilot’s many benefits without having to worry about it sharing sensitive data with the wrong people. These solutions are a proven approach to managing and protecting GenAI output data, and they should be evaluated if you are considering a Copilot rollout in your organization, or if Copilot is already in use.