Jon Barclay · 8 min read
Open Sourcing ShareSentinel: A File Sharing Monitor for Higher Education
Microsoft SharePoint, OneDrive, and Teams have become the backbone of how universities operate. Research gets shared, policies get drafted, committees collaborate, and day-to-day administrative work moves through these platforms constantly.
That is exactly how it should be.
The problem is usually not malicious insiders. More often, it is a share button that is just a little too easy to click.
A faculty member shares a file “to anyone with the link” so an external collaborator can review it, without realizing the document also contains student records. A staff member shares a folder organization-wide so a committee can access one file, not noticing that the folder also includes HR documents from another project. These are not dramatic breaches or sophisticated attacks. They are ordinary mistakes, and at large institutions, they happen all the time.
That problem has become much more urgent in the age of AI.
Microsoft has been unusually direct about this: Microsoft 365 Copilot follows the permissions that already exist in your tenant, which means oversharing in SharePoint and OneDrive can now be amplified rather than merely sitting unnoticed in the background. Microsoft’s own deployment guidance says oversharing is one of the most common risks organizations encounter when deploying Copilot, and its readiness guidance specifically tells administrators to reduce accidental oversharing in SharePoint and govern OneDrive before enabling Copilot.[1][2]
This is not a niche concern. Around the world, organizations have been slowing or limiting Copilot deployments because they are not confident in their underlying permissions. In a 2024 Gartner survey reported by Computerworld, 64% of respondents said information governance and security risks required significant time and resources during deployment, 40% delayed rollouts by three months or more because of oversharing concerns, and 57% limited deployment to lower-risk or trusted users.[3]
Other industry research points in the same direction. Zscaler’s 2025 Data@Risk Report said AI apps such as ChatGPT and Microsoft Copilot contributed to 4.2 million data loss violations in 2024, while file-sharing platforms including Microsoft OneDrive were involved in 212 million transactions with data loss incidents.[4] The lesson is straightforward: Copilot is not creating a new permissions model. It is exposing the consequences of the permissions model organizations already have.
At Utah Valley University, we have more than 48,000 students, over 5,000 employees, and nearly 100,000 Microsoft 365 groups. At that scale, many commercial solutions stop being practical. Several products we evaluated had hard limits on the number of groups they could scan, and none could handle our environment without significant architectural compromises or costs that simply were not viable.
So we built our own.
Why We Open Sourced It
Higher education has a very specific collaboration problem.
We want sharing to be easy. Faculty collaborate with outside researchers. Staff work across departments. Students need access to files from multiple systems and groups. But universities also carry FERPA-regulated student data, HR records, financial information, health-related documentation, and a long tail of other sensitive content. In practice, that means one overly broad link in SharePoint or OneDrive can have consequences far beyond the person who clicked “Share.”
Many institutions are trying to solve this problem in parallel, often with limited budgets and small security teams. We open sourced ShareSentinel because this feels like the kind of problem higher education should not have to solve from scratch over and over again.
What ShareSentinel Does
ShareSentinel monitors the Microsoft 365 audit log for two high-risk sharing events: AnonymousLinkCreated and CompanyLinkCreated. These matter because they grant access either to anyone with the link or to everyone in the organization, rather than to specifically named people.
Every 15 minutes, ShareSentinel queries the Microsoft Graph Audit Log Query API for new events. When it finds one, it places the event on a queue for processing. A worker service pulls jobs from that queue, downloads the shared file into an in-memory (tmpfs) mount, extracts its content, and submits it to an AI model for sensitivity analysis. The file never touches persistent disk.
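The filtering step of that polling loop can be sketched as follows. This is a minimal illustration, not the project's actual code: the record field names (`operation`, `objectId`, `userId`) and the job shape are assumptions about what the audit records carry.

```python
# Sketch of the poller's filtering logic: given a batch of audit records,
# keep only the two high-risk sharing operations and turn them into queue jobs.
# Field names and job structure are illustrative assumptions.

HIGH_RISK_OPERATIONS = {"AnonymousLinkCreated", "CompanyLinkCreated"}

def select_jobs(audit_records):
    """Return queue jobs for high-risk sharing events only."""
    jobs = []
    for record in audit_records:
        if record.get("operation") in HIGH_RISK_OPERATIONS:
            jobs.append({
                "operation": record["operation"],
                "object_id": record.get("objectId"),  # path to the shared item
                "user": record.get("userId"),         # who created the link
            })
    return jobs
```

Everything else in the batch (file reads, edits, ordinary named-person shares) is ignored, which keeps the queue small even in a large tenant.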
The sensitivity model is intentionally simple and tiered:
Tier 1 (urgent): government ID numbers, financial account data, FERPA-protected student records, HIPAA-covered health information, and security credentials.
Tier 2 (normal): HR and personnel data, legal or confidential documents, and contact information.
Tier 3 (no escalation): coursework, casual personal content, or no sensitive content detected.
If the analysis identifies any Tier 1 or Tier 2 content, analysts are notified with the file metadata, the AI findings, and a direct link to the shared item in the Microsoft 365 admin portal. Escalation is deterministic: the system either escalates the event or it does not. There is no sensitivity score to tune and no threshold to guess at.
ShareSentinel supports a broad range of content types. Text-heavy files such as PDFs, Word documents, spreadsheets, and presentations go through extraction-first workflows. Images and scanned documents can go through OCR or multimodal analysis. Audio and video files are transcribed before evaluation.
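That routing can be pictured as a dispatch table keyed on file type. The extension lists, workflow names, and the fallback to manual review are assumptions for illustration; the real project may route differently.

```python
# Illustrative content-type routing: extraction-first for documents,
# OCR for images, transcription for audio/video. Extensions, workflow
# names, and the manual-review fallback are assumptions.
import os

WORKFLOWS = {
    ".pdf": "extract", ".docx": "extract", ".xlsx": "extract", ".pptx": "extract",
    ".png": "ocr", ".jpg": "ocr", ".tiff": "ocr",
    ".mp3": "transcribe", ".mp4": "transcribe", ".wav": "transcribe",
}

def route(filename: str) -> str:
    """Pick a processing workflow from the file extension."""
    ext = os.path.splitext(filename.lower())[1]
    return WORKFLOWS.get(ext, "manual-review")  # unknown types go to an analyst
```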
Beyond Detection: Enforcing a Sharing Lifecycle
Detection is only part of the problem. Once a file is broadly shared, it often stays that way until someone deliberately removes access.
ShareSentinel addresses that with a 180-day sharing link lifecycle tracker.
Every anonymous or organization-wide sharing link that ShareSentinel encounters is enrolled in a countdown. As the expiration date approaches, the file owner receives reminder emails explaining that the link will expire and giving them the option to extend it or let it expire naturally. At day 180, ShareSentinel removes the link through the Graph API and sends a confirmation.
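The lifecycle processor's decision for each tracked link reduces to a date calculation, sketched below. The 180-day lifetime is from the article; the specific reminder schedule (30, 7, and 1 days before expiration) is an assumption for illustration.

```python
# Sketch of the lifecycle countdown. LIFETIME_DAYS comes from the article;
# the reminder schedule is an assumed example, not the project's actual one.
from datetime import date, timedelta

LIFETIME_DAYS = 180
REMINDER_DAYS = (30, 7, 1)  # days-remaining marks at which the owner is emailed

def lifecycle_action(created: date, today: date) -> str:
    """Decide what the lifecycle processor should do for one tracked link."""
    remaining = LIFETIME_DAYS - (today - created).days
    if remaining <= 0:
        return "remove-link"    # revoke via the Graph API and send confirmation
    if remaining in REMINDER_DAYS:
        return "send-reminder"  # owner may extend or let the link expire
    return "wait"
```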
This is not only a security control. It is also a behavior-shaping mechanism. Over time, those reminders teach users that broad sharing links should be temporary, not permanent. The goal is not just to catch risky sharing, but to gradually improve how people think about sharing in the first place.
Architecture
ShareSentinel runs as a Docker Compose application with five containers:
- lifecycle-cron runs two concurrent loops: an audit log poller that finds new sharing events, and a lifecycle processor that manages link expiration countdowns.
- worker processes queued events, including downloads, extraction, AI analysis, notifications, and cleanup.
- dashboard provides a React + FastAPI web interface where analysts can review events, inspect AI verdicts, track sharing lifecycle status, and manually handle content types that are delegated to human review, such as Loop files.
- redis manages the job queue, deduplication, and rate-limiting state.
- postgres stores event records, AI verdicts, analyst dispositions, lifecycle tracking data, and audit poll state.
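A docker-compose.yml for this layout might look roughly like the sketch below. The five service names match the article; the build paths, images, ports, tmpfs mount point, and volumes are assumptions, not the project's actual file.

```yaml
# Hypothetical compose sketch; consult the repository for the real file.
services:
  lifecycle-cron:
    build: ./lifecycle-cron        # audit poller + lifecycle processor
    depends_on: [redis, postgres]
  worker:
    build: ./worker
    depends_on: [redis, postgres]
    tmpfs:
      - /scratch                   # in-memory mount for downloaded files
  dashboard:
    build: ./dashboard             # React + FastAPI analyst interface
    ports: ["8080:8080"]
    depends_on: [postgres]
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

Note the worker's tmpfs mount: it is what keeps downloaded files off persistent disk, as described above.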
The AI provider is swappable through a single configuration value. ShareSentinel currently supports Anthropic Claude, OpenAI, and Google Gemini through the same interface and structured output format, so changing providers is a one-line configuration change.
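A provider abstraction like that can be sketched with a protocol and a registry. The class and function names here are assumptions; only the idea of a shared interface selected by one configuration value comes from the project.

```python
# Sketch of a swappable provider interface: every provider implements the
# same analyze() signature and returns the same structured output, so a
# single config value picks the implementation. Names are illustrative.
from typing import Protocol

class SensitivityProvider(Protocol):
    def analyze(self, text: str) -> dict: ...

class AnthropicProvider:
    def analyze(self, text: str) -> dict:
        # Would call the Anthropic API and return e.g.
        # {"tier": 1, "findings": [...]}; stubbed here.
        raise NotImplementedError

PROVIDERS: dict[str, type] = {
    "anthropic": AnthropicProvider,
    # "openai": OpenAIProvider, "gemini": GeminiProvider, ...
}

def get_provider(name: str) -> SensitivityProvider:
    """One configuration value selects the implementation."""
    return PROVIDERS[name]()
```

Because every provider emits the same structured verdict, the escalation logic downstream never needs to know which model produced it.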
Where possible, the design favors simplicity. Text extraction is preferred over multimodal analysis because it is usually cheaper and often more accurate for document-heavy workflows. Files are processed entirely in RAM and removed immediately after analysis. Escalation logic is intentionally opinionated rather than endlessly configurable, because missed alerts are often more costly than extra review.
Getting Started
To run ShareSentinel, you will need:
- Docker
- A Microsoft 365 tenant with an Entra ID app registration configured with the required Graph API permissions
- At least one AI provider API key for Anthropic, OpenAI, or Gemini
The GitHub README walks through the Entra ID setup, including required permissions, certificate authentication, and optional Teams transcript access. The docs/ directory contains more detailed technical documentation for each component.
git clone https://github.com/jonbarclay/ShareSentinel.git
cd ShareSentinel
cp .env.example .env
# Configure your Entra ID credentials, AI provider keys, and SMTP settings
docker compose up --build -d

The project is available at github.com/jonbarclay/ShareSentinel. If this sounds useful for your institution, take a look. Stars, issues, and pull requests are welcome.
Sources
1. Microsoft Learn, Prevent oversharing in Microsoft 365 Copilot: deployment blueprint
2. Microsoft Learn, Microsoft 365 Copilot data and compliance readiness
3. Computerworld, Microsoft 365 Copilot rollouts slowed by data security, ROI concerns
4. Zscaler ThreatLabz, 2025 Data@Risk Report Blog