Jon Barclay · 8 min read
Open sourcing ShareSentinel: a file sharing monitor for higher education
Microsoft SharePoint, OneDrive, and Teams are how universities run. Research gets shared, policies get drafted, committees collaborate, admin work flows through these platforms all day. That’s how it should be.
The problem usually isn’t malicious insiders. It’s a share button that’s just a little too easy to click.
A faculty member shares a file “to anyone with the link” so an external collaborator can review it, not realizing the document also contains student records. A staff member shares a folder org-wide so a committee can access one file, missing that the folder also includes HR documents from another project. These aren’t dramatic breaches. They’re ordinary mistakes, and at large institutions they happen constantly.
AI has made this worse.
Microsoft has been unusually direct about it: Microsoft 365 Copilot follows whatever permissions already exist in your tenant. Oversharing that used to sit unnoticed in some forgotten SharePoint site can now get surfaced and amplified by Copilot queries. Microsoft’s own deployment guidance calls oversharing one of the most common risks when deploying Copilot, and tells administrators to reduce accidental oversharing and govern OneDrive before turning it on.[1][2]
Organizations everywhere have been slowing or limiting Copilot rollouts because they don’t trust their own permissions. In a 2024 Gartner survey reported by Computerworld, 64% of respondents said governance and security risks required significant time and resources during deployment, 40% delayed rollouts by three months or more over oversharing concerns, and 57% limited deployment to lower-risk users.[3]
Zscaler’s 2025 Data@Risk Report found AI apps like ChatGPT and Copilot contributed to 4.2 million data loss violations in 2024, while file sharing platforms including OneDrive were involved in 212 million transactions with data loss incidents.[4] Copilot isn’t creating a new permissions model. It’s exposing the one organizations already have.
At Utah Valley University we have more than 48,000 students, over 5,000 employees, and nearly 100,000 Microsoft 365 groups. At that scale, most commercial solutions fall apart. Several products we evaluated had hard limits on the number of groups they could scan. None could handle our environment without compromises or costs that weren’t viable.
So we built our own.
Why we open sourced it
Higher education has a specific collaboration problem.
We want sharing to be easy. Faculty collaborate with outside researchers, staff work across departments, students need access to files from multiple systems and groups. But universities also carry FERPA-regulated student data, HR records, financial information, health-related documentation, and a long tail of other sensitive content. One overly broad link in SharePoint can have consequences far beyond the person who clicked “Share.”
A lot of institutions are trying to solve this in parallel, often with limited budgets and small security teams. We open sourced ShareSentinel because it felt like the kind of problem higher education shouldn’t have to solve from scratch over and over.
What ShareSentinel does
ShareSentinel monitors the Microsoft 365 audit log for two sharing event types: AnonymousLinkCreated and CompanyLinkCreated. These matter because they grant access either to anyone with the link or to everyone in the organization, rather than to named individuals.
Every 15 minutes, ShareSentinel queries the Microsoft Graph Audit Log Query API for new events. When it finds one, it queues the event for processing. A worker pulls jobs from the queue, downloads the shared file into an in-memory (tmpfs) mount, extracts the content, and sends it to an AI model for sensitivity analysis. The file never touches persistent disk.
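The poll-and-enqueue step can be sketched as follows. This is an illustrative sketch, not the project's actual code: the real service would fetch `records` from the Microsoft Graph Audit Log Query API and use Redis for deduplication, which is modeled here as a plain Python set.

```python
# Sketch of the 15-minute poller's filtering step. Assumptions: audit
# record IDs are unique, and `seen_ids` stands in for the Redis-backed
# dedup set. Fetching `records` from Graph is out of scope here.

SHARING_OPS = {"AnonymousLinkCreated", "CompanyLinkCreated"}

def select_new_events(records, seen_ids):
    """Filter one poll's records down to unseen broad-sharing events."""
    fresh = []
    for rec in records:
        if rec.get("operation") not in SHARING_OPS:
            continue  # ignore everything that is not a broad share
        if rec["id"] in seen_ids:
            continue  # already queued on a previous poll
        seen_ids.add(rec["id"])
        fresh.append(rec)
    return fresh
```

Because polls overlap slightly in time, dedup by record ID keeps the same sharing event from being queued twice.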
The sensitivity model is simple and tiered:
Tier 1 (urgent) covers government ID numbers, financial account data, FERPA-protected student records, HIPAA-covered health information, and security credentials.
Tier 2 (normal) covers HR and personnel data, legal or confidential documents, and contact information.
Tier 3 (no escalation) is coursework, casual personal content, or nothing sensitive detected.
If the analysis flags Tier 1 or Tier 2 content, analysts get notified with the file metadata, AI findings, and a direct link to the shared item in the Microsoft 365 admin portal. Escalation is binary: the system either escalates or it doesn’t. No sensitivity score to tune, no threshold to guess at.
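The binary escalation rule is simple enough to sketch directly. The `Verdict` shape and function names below are illustrative, not the project's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    tier: int                  # 1 = urgent, 2 = normal, 3 = no escalation
    findings: list = field(default_factory=list)

def should_escalate(verdict: Verdict) -> bool:
    """Binary decision: Tier 1 and Tier 2 notify analysts, Tier 3 does not."""
    return verdict.tier in (1, 2)

def notification_priority(verdict: Verdict) -> str:
    """Tier 1 alerts are flagged urgent; Tier 2 goes to routine review."""
    return "urgent" if verdict.tier == 1 else "normal"
```

There is deliberately no score or threshold here; the only tunable surface is which categories map to which tier.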
ShareSentinel handles a broad range of file types. PDFs, Word documents, spreadsheets, and presentations go through text extraction. Images and scanned documents can go through OCR or multimodal analysis. Audio and video files get transcribed before evaluation.
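Routing by file type amounts to a lookup table. The extensions and route names below are a hypothetical subset for illustration; the project's actual extractor list lives in the repo:

```python
from pathlib import Path

# Hypothetical routing table: documents get text extraction, images get
# OCR or multimodal analysis, audio/video gets transcription first.
ROUTES = {
    ".pdf": "text", ".docx": "text", ".xlsx": "text", ".pptx": "text",
    ".png": "ocr", ".jpg": "ocr", ".tiff": "ocr",
    ".mp3": "transcribe", ".wav": "transcribe", ".mp4": "transcribe",
}

def extraction_route(filename: str) -> str:
    """Pick the pipeline stage for a file, or flag it for manual review."""
    return ROUTES.get(Path(filename).suffix.lower(), "manual-review")
```

Anything without a known extractor (Loop files, for example) falls through to manual review in the dashboard rather than being silently skipped.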
Beyond detection: enforcing a sharing lifecycle
Detection is only part of it. Once a file is broadly shared, it tends to stay that way until someone deliberately removes access. Nobody goes back and cleans up old links.
ShareSentinel handles that with a 180-day sharing link lifecycle tracker.
Every anonymous or org-wide sharing link starts a countdown. As the expiration date approaches, the file owner gets reminder emails explaining that the link will expire and giving them the option to extend or let it lapse. At day 180, ShareSentinel removes the link through the Graph API and sends a confirmation.
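The countdown logic reduces to one decision per link per run. A minimal sketch, with the caveat that the reminder day offsets below are illustrative, not the project's actual schedule:

```python
from datetime import datetime, timedelta, timezone

LIFECYCLE_DAYS = 180
REMINDER_DAYS = (30, 7, 1)   # hypothetical reminder schedule

def lifecycle_action(created_at, now=None):
    """Decide what the lifecycle processor should do for one link."""
    now = now or datetime.now(timezone.utc)
    expires = created_at + timedelta(days=LIFECYCLE_DAYS)
    if now >= expires:
        return "remove-link"      # revoke via Graph API, confirm by email
    if (expires - now).days in REMINDER_DAYS:
        return "send-reminder"    # owner can extend or let it lapse
    return "wait"
```

An extension would simply push `created_at` (or a stored expiry) forward, restarting the countdown.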
It’s a security control, but the reminders also gradually change behavior. People start treating broad sharing links as temporary rather than permanent. That shift in habit is worth as much as the automated cleanup.
Architecture
ShareSentinel runs as a Docker Compose application with five containers.
- lifecycle-cron runs two loops: an audit log poller that finds new sharing events, and a lifecycle processor that manages link expiration countdowns.
- worker processes queued events: downloads, extraction, AI analysis, notifications, cleanup.
- dashboard is a React + FastAPI web interface where analysts review events, inspect AI verdicts, track lifecycle status, and manually review content types the system can't process automatically (Loop files, for example).
- redis manages the job queue, deduplication, and rate limiting.
- postgres stores event records, AI verdicts, analyst dispositions, lifecycle tracking, and audit poll state.
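A minimal Compose sketch of that five-container layout, with illustrative service names, build paths, and images (the real file is in the repo):

```yaml
services:
  lifecycle-cron:
    build: ./cron            # audit poller + lifecycle processor loops
    env_file: .env
    depends_on: [redis, postgres]
  worker:
    build: ./worker          # download, extract, analyze, notify, clean up
    tmpfs:
      - /scratch             # in-memory mount; shared files never hit disk
    depends_on: [redis, postgres]
  dashboard:
    build: ./dashboard       # React + FastAPI analyst interface
    ports: ["8080:8080"]
  redis:
    image: redis:7           # job queue, dedup, rate limiting
  postgres:
    image: postgres:16       # events, verdicts, dispositions, poll state
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

The tmpfs mount on the worker is the piece that enforces the "never touches persistent disk" guarantee.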
The AI provider is swappable through a single config value. ShareSentinel supports Anthropic Claude, OpenAI, and Google Gemini through the same interface and structured output format. Changing providers is a one-line change.
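The provider abstraction can be sketched as a registry behind one interface. The class and function names here are hypothetical, and the stub stands in for the real Anthropic/OpenAI/Gemini wrappers:

```python
from abc import ABC, abstractmethod

class SensitivityProvider(ABC):
    """Every backend returns the same structured verdict shape."""
    @abstractmethod
    def classify(self, text: str) -> dict:
        """Return {"tier": int, "findings": [...]} for the given content."""

class StubProvider(SensitivityProvider):
    # Stand-in for the per-vendor implementations, each of which would
    # wrap its SDK behind this same method and output format.
    def classify(self, text):
        return {"tier": 3, "findings": []}

PROVIDERS = {"stub": StubProvider}

def make_provider(name: str) -> SensitivityProvider:
    """Resolve the one-line config value to a provider instance."""
    return PROVIDERS[name.lower()]()
```

Because every provider emits the same structured output, the rest of the pipeline never needs to know which model produced a verdict.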
The design favors simplicity where it can. Text extraction over multimodal analysis, because it’s cheaper and usually more accurate for documents. Files processed entirely in RAM, removed immediately after analysis. Opinionated escalation logic rather than endlessly configurable rules, because missed alerts tend to cost more than extra review.
Getting started
To run ShareSentinel you need:
- Docker
- A Microsoft 365 tenant with an Entra ID app registration configured with the required Graph API permissions
- At least one AI provider API key (Anthropic, OpenAI, or Gemini)
The GitHub README walks through Entra ID setup, including required permissions, certificate authentication, and optional Teams transcript access. The docs/ directory has more detailed technical documentation for each component.
```shell
git clone https://github.com/jonbarclay/ShareSentinel.git
cd ShareSentinel
cp .env.example .env
# Configure your Entra ID credentials, AI provider keys, and SMTP settings
docker compose up --build -d
```

The project is at github.com/jonbarclay/ShareSentinel. If it sounds useful for your institution, take a look. Stars, issues, and pull requests are welcome.
Sources
- [1] Microsoft Learn, Prevent oversharing in Microsoft 365 Copilot: deployment blueprint
- [2] Microsoft Learn, Microsoft 365 Copilot data and compliance readiness
- [3] Computerworld, Microsoft 365 Copilot rollouts slowed by data security, ROI concerns
- [4] Zscaler ThreatLabz, 2025 Data@Risk Report