The Story You’re Being Told

Ask any edtech sales team what their platform does with student data, and you’ll get a version of the same reassuring answer: “We use it to personalize your students’ learning experience.” Clean. Purposeful. Focused entirely on the child sitting at that desk. It’s a compelling story, and in many cases, it’s partially true. The problem is what the word “use” is quietly doing in that sentence.

Over the past several years, the dominant media narrative around AI in education has orbited a simple premise: algorithms are tools, and tools serve whoever wields them. Teachers use AI to save time. Students use AI to get faster feedback. Administrators use AI to spot struggling learners before they fall through the cracks. The power relationship, in this framing, always runs from human to machine — and the machine is a neutral instrument in service of learning.

But educators in growing numbers are starting to ask a different kind of question. Not “what does the AI do?” but who taught it, who owns it, and where does all that student data actually go? Homeschool communities, long accustomed to protecting family privacy, have been asking these questions longer than most — and are finding answers that complicate the clean sales narrative considerably.

Public perception still lags behind the technical reality. When parents imagine their child’s data, they picture a grade in a gradebook or a photo on a class roster. What most don’t picture is a continuous stream of behavioral signals — every hesitation, every wrong turn, every idle minute — flowing through a layered system of vendors, sub-processors, and AI model pipelines they’ve never heard of, governed by terms of service written by legal teams specifically to keep those details opaque.

The Three Misconceptions Driving the Narrative Gap

Misconception 1 — “The school controls the data.” In reality, once data flows to a third-party vendor through an API integration, the school’s control is largely contractual — meaning it depends entirely on what the vendor’s terms of service actually say, not what the salesperson told you.

Misconception 2 — “FERPA protects everything.” FERPA is a powerful law — but it was written in 1974 for paper records, and its “school official” exception has been stretched to cover a sprawling network of commercial vendors in ways Congress never anticipated.

Misconception 3 — “If it’s free, the data stays private.” Free edtech tools are among the highest-risk data environments in K–12 education. When a product has no licensing fee, the data your students generate is frequently the product itself.

What’s Actually Happening Inside the Pipeline

Every time a student logs into an adaptive learning platform, an LMS, a digital assessment tool, or an AI tutoring system, they are not simply answering questions or watching videos. They are generating a continuous stream of behavioral data — click paths, response times, error rates, session durations, correction patterns, vocabulary choices in open-response fields — and that data is being processed in real time by systems most educators have never been shown.

This is what technologists call the data pipeline: the infrastructure that collects raw inputs at one end, processes and transforms them through a series of steps, and produces outputs at the other — personalized content recommendations, intervention alerts, predictive risk scores, and increasingly, inputs into the AI models the platform uses to improve itself over time.
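To make that abstract description concrete, here is a minimal sketch of a pipeline in that spirit: raw behavioral events go in, summary features are extracted, and a single opaque score comes out the other end. Every name, field, and weight below is invented for illustration; this is not any vendor's actual schema or scoring rule.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class LearningEvent:
    """One hypothetical behavioral event; field names are invented for illustration."""
    student_id: str
    response_time_sec: float   # how long the student took to answer
    correct: bool              # whether the answer was right
    idle_sec: float            # idle time before the next action

def extract_features(events: list[LearningEvent]) -> dict:
    """Collapse a raw event stream into per-student summary features."""
    return {
        "avg_response_time": mean(e.response_time_sec for e in events),
        "error_rate": sum(not e.correct for e in events) / len(events),
        "avg_idle": mean(e.idle_sec for e in events),
    }

def risk_score(features: dict) -> float:
    """Toy heuristic: slower, more error-prone, more idle means a higher score.
    The weights here are arbitrary; real platforms do not publish theirs."""
    return round(
        0.4 * min(features["avg_response_time"] / 60, 1.0)
        + 0.4 * features["error_rate"]
        + 0.2 * min(features["avg_idle"] / 120, 1.0),
        2,
    )

events = [
    LearningEvent("s-001", 42.0, False, 95.0),
    LearningEvent("s-001", 18.5, True, 10.0),
    LearningEvent("s-001", 61.0, False, 140.0),
]
print(risk_score(extract_features(events)))  # one opaque number, built from many signals
```

The point of the sketch is not the math; it is that every step involves a human design choice about which signals to collect and how to weight them, exactly the decisions the "neutral tool" framing hides.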

Here’s what makes it genuinely complex: modern edtech tools almost never operate in isolation. They integrate with other tools. A reading platform syncs with your LMS. Your LMS connects to a student information system. That student information system feeds a learning analytics dashboard. The dashboard pulls from a rostering service. Each connection point is a potential data-sharing relationship — and each one comes with its own terms, its own data retention policies, and its own set of sub-processors that the original vendor is permitted to share data with.
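One way to see why those connection points multiply is to treat the integrations as a directed graph and ask which systems a single student record can transitively reach. The sketch below does exactly that; the tool names and edges are hypothetical.

```python
# Hypothetical integration map: each edge is a data-sharing relationship,
# typically established via an API key or rostering sync.
integrations = {
    "reading_platform": ["lms"],
    "lms": ["student_info_system", "analytics_dashboard"],
    "student_info_system": ["analytics_dashboard", "rostering_service"],
    "analytics_dashboard": ["vendor_subprocessor_cloud"],
    "rostering_service": ["lms"],  # cycles are common in real ecosystems
}

def reachable(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Every system that data entered at `start` can transitively reach."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# A record created in the reading platform can end up in all of these:
print(sorted(reachable("reading_platform", integrations)))
```

In a real district, each edge would also carry its own terms, retention policy, and sub-processor list, which is exactly why visibility degrades as the graph grows.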

  • 1,449: average number of EdTech tools in use per school district (Secure Privacy, 2025)
  • 43%: share of districts lacking formal AI use policies (CoSN, 2025)
  • 1,600+: data breaches recorded in U.S. school districts (MIT RAISE, 2024)

Dr. Neil Selwyn, a prominent researcher in the sociology of education and technology at Monash University, has written extensively about the “datafication” of schooling — the process by which educational experience is progressively converted into data points that can be measured, monetized, and acted upon by automated systems. His concern is not that data is collected, but that the framing of collection as neutral and technical conceals the deeply human decisions embedded in every design choice: what to measure, how to weight it, and who benefits from the resulting predictions (Selwyn, 2019).

The business world has noticed. Venture capital poured more than $20 billion into edtech globally between 2020 and 2024, with a substantial share flowing toward platforms that monetize learning data through analytics, model licensing, and premium insights sold to districts, publishers, and policymakers (HolonIQ, 2024). The companies collecting your students’ behavioral data are not primarily in the education business. They’re in the data business. Education is their customer acquisition channel.

Visual 1: How Student Data Flows Through a Typical Vendor Ecosystem
[Diagram: student interactions (clicks, pauses, answers, time on task) flow into the edtech tool (LMS, adaptive platform, or tutor) as behavioral logs, API keys, and PII fields; from there into the vendor data pipeline (primary vendor, sub-processors, integration partners), with the terms of service governing all of it; and onward to AI model training on students' data, district analytics (risk scores, dashboards, predictions, reports), and data licensing to research partners, publishers, and advertisers. Some flows are standard; others are often undisclosed or in ToS fine print.]
Illustrative diagram of student data flow from interaction to downstream use. Actual pipelines vary by vendor; sub-processor disclosure is contractually variable and not always surfaced in public-facing privacy policies.

Inside the Vendor Ecosystem: Who’s Really in the Room

A Texas district's data governance audit revealed something administrators hadn't expected: active data-sharing agreements with 47 EdTech companies, nearly double what anyone thought (Secure Privacy, 2025). That's not an anomaly. It's a representative portrait of what the modern K–12 data landscape actually looks like. With an average of 1,449 different EdTech tools in use per district (Secure Privacy, 2025), the idea that any single administrator has meaningful visibility into all active data relationships is, to be blunt, a polite fiction.

This is what researchers and practitioners call the vendor ecosystem: the network of companies, platforms, integrations, and sub-processors that collectively handle student data from collection to output. At the core of the ecosystem sit a handful of dominant platforms — major LMS providers, student information systems, and adaptive learning suites. Around them orbit dozens of smaller tools that integrate via API keys, each one plugging into the data stream in ways that may or may not be surfaced in the district’s vendor contracts.

Sub-Processors: The Hidden Third Party

When a district signs a contract with an edtech vendor, that contract typically includes a provision allowing the vendor to share data with “sub-processors” — third-party companies that perform specific technical functions on behalf of the vendor. Cloud hosting, analytics infrastructure, AI model providers, email delivery — these are all common sub-processor categories. The challenge is that most vendor contracts don’t require proactive disclosure of which sub-processors are used, or how their data handling practices compare to the primary vendor’s stated policies.

This matters enormously when it comes to generative AI. Many edtech platforms are quietly integrating large language model APIs — from providers like OpenAI, Google, or Anthropic — into their products using commercial API connections. Under FERPA’s “school official” exception, these integrations may be technically permissible. But the critical question — whether student personally identifiable information (PII) is being used to train the underlying model — remains disturbingly murky in many vendor agreements (Future of Privacy Forum, 2024).

“This is not just an IT problem; it’s a shared responsibility across the institution to ask hard questions about AI privacy, governance, and risk before we turn features on.”

EdTech Magazine, Higher Education IT Leadership Interview, January 2026

The Homeschool Dimension: More Exposure, Less Protection

For families choosing to homeschool, the vendor ecosystem question carries a different weight. Most of the federal privacy protections families assume apply to their children’s learning data — FERPA chief among them — were designed for institutional settings and apply specifically to “educational agencies and institutions” receiving federal funds. Private homeschool families accessing commercial learning platforms directly, without institutional licensing, may find themselves in a data-rights gray zone where FERPA doesn’t apply and the only binding document is a consumer terms of service they scrolled past at sign-up.

Free platforms popular in homeschool communities — curriculum supplement apps, math drill tools, language learning platforms — are often the ones with the most aggressive data collection practices. When there’s no subscription revenue, behavioral data is frequently how the company sustains its business model. Homeschool families exercising the greatest autonomy over their children’s education may, paradoxically, be extending the least oversight over what happens to their children’s learning data.

The Risks That Don’t Make the Brochure

EdTech marketing is extraordinarily good at leading with benefits and burying risks. Personalized learning. Reduced teacher workload. Early identification of struggling students. These are real potential benefits — and none of them require a dishonest pitch. But a complete picture of the AI classroom stack includes a set of risks that deserve plain language, not footnotes.

Algorithmic Bias: When the Model Learns the Wrong Lesson

AI systems in education learn from data. That’s the whole point. But when the data they learn from reflects historical inequities — and most educational data does, because most educational history does — the models that emerge from that data tend to perpetuate and sometimes amplify those inequities in their predictions and recommendations.

A particularly striking body of research involves predictive “at-risk” models — algorithms used to flag students likely to struggle academically so that interventions can be triggered early. These tools sound like exactly what equity-focused education should want. The problem is the evidence on how they actually perform: studies have found that predictive success models produced false negatives for 19% of Black students and 21% of Latino/a students, meaning the algorithm predicted failure for students who went on to earn bachelor’s degrees (Diverse Education, as cited in Schiller University, 2025). These weren’t near-misses. They were students who succeeded despite being coded as likely to fail — and who may have received fewer resources, less encouragement, or different academic tracks as a result.
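For readers who want the definition pinned down: a false negative here means the model predicted failure for a student who actually succeeded, so the rate is false negatives divided by the number of students who succeeded. A quick worked example on an invented cohort (these numbers are illustrative, not drawn from the cited studies):

```python
# Hypothetical cohort of 1,000 students who ultimately earned degrees.
# A "success model" labels each one either likely-to-succeed or at-risk.
actually_succeeded = 1000
flagged_at_risk = 190  # predicted failure, but the student succeeded anyway

false_negative_rate = flagged_at_risk / actually_succeeded
print(f"False negative rate: {false_negative_rate:.0%}")  # 19%

# Each of those 190 students may have been routed to different supports,
# tracks, or expectations on the basis of a prediction that turned out wrong.
```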

  • 80%: share of AI education systems showing measurable bias when not independently audited (Springer, 2021, as cited in Schiller University, 2025)
  • 92%: increase in ransomware attacks on K–12 schools, 2022–2023 (ThreatDown, 2024)
  • 4,388: cyberattacks per education organization per week in Q2 2025, up 31% year over year (Secure Privacy, 2025)

Shadow AI: The Tools No One Approved

In 2024 and 2025, a category emerged that IT administrators are calling “shadow AI” — unapproved AI tools being accessed by teachers, students, or staff that have never been vetted by the district, whose data handling practices are entirely unknown, and whose inputs may be stored indefinitely or used to train commercial models (Secure Privacy, 2025). The same dynamic that created shadow IT a decade ago — well-meaning users bypassing slow procurement processes because a free tool gets the job done — is now playing out with AI, but with considerably higher data-risk stakes.

Free AI browser extensions deserve particular scrutiny. Research flagged by multiple cybersecurity analysts has identified extensions that collect keystroke data, enable unauthorized access to browser sessions, or quietly export content entered in web forms — which, in a classroom context, could mean student-written essays, test responses, and teacher-entered gradebook data (eSchool News, 2025). When 43% of districts still lack formal AI use policies (CoSN, 2025), there is no shared definition of what “approved” means — and therefore no shared understanding of what using an unapproved tool actually violates.

Vendor Lock-In: When the Data Can’t Leave

There’s a structural risk that rarely gets discussed in budget meetings: what happens to all that personalized learning data when a district decides to switch platforms? Vendor lock-in in edtech operates differently from enterprise software lock-in. The concern isn’t just that switching is expensive and operationally disruptive — though it is. The concern is that a student’s entire adaptive learning profile, behavioral history, and individualized content pathways may be entirely non-portable. When a student moves between schools, or a district transitions between vendors, that learning history often doesn’t travel with them. The algorithm that “knew” the student resets to zero.

This isn’t a privacy problem — it’s a power problem. The data is being used to benefit the vendor’s platform more than it’s being used to benefit the student. And the contractual frameworks governing data portability in education remain inconsistent and often unfavorable to the institutional buyer, let alone the individual learner.

The PowerSchool Breach — A Case Study in Scale

In December 2024, PowerSchool — a student information system used by tens of thousands of schools across North America — suffered a breach that potentially exposed demographic data, attendance records, and grades for an as-yet-undisclosed number of students (Secure Privacy, 2025). PowerSchool is not an obscure niche product. It is the infrastructure that holds the most sensitive administrative data in thousands of districts. The breach was not the result of a sophisticated nation-state attack. It was the result of compromised credentials. A username and a password. The lesson is not that schools should use less technology — it is that the concentration of sensitive student data in a small number of dominant vendor platforms creates systemic risk at a scale that has no equivalent in pre-digital education.

What Teachers Can Do Right Now

It would be easy to walk away from the data and think: this is above my pay grade. But the reality is that teachers are the first and most influential line of decision-making when it comes to which tools enter the classroom. Every time a teacher bookmarks a new app, shares a tool in a department meeting, or recommends a platform to a struggling student, they are making a data governance decision — whether they recognize it as one or not. Here’s how to make those decisions more intentionally.

  1. Ask the data question before the demo. Before agreeing to pilot any new tool, ask one simple question: “Does this platform share student data with sub-processors, and can you give me a list of who they are?” The vendor’s answer — or inability to answer — tells you something important before you’ve seen a single feature.
  2. Read the privacy policy for the “model training” clause. Specifically look for language about whether your students’ inputs are used to train or improve the underlying AI model. This is the highest-stakes data use question in generative AI edtech, and it’s frequently buried in section 8 of a 12-section policy. (A rough keyword-scan sketch follows this list.)
  3. Check your district’s approved tool list. If one doesn’t exist, that’s important information — and a conversation worth having with your technology coordinator. Using tools outside an approved list doesn’t just create personal liability; it creates unmonitored data exposure for your students.
  4. Teach students about their own data footprint. AI literacy for students isn’t just about using tools responsibly. It includes understanding that their interactions with digital learning tools generate data, and that data has a life beyond the assignment they were working on. This is increasingly considered a core component of digital citizenship education.
  5. Name the algorithm when you see it. When an adaptive platform serves a student a particular content path, say it out loud: “The platform suggested this because of how you’ve been answering these types of questions.” Naming algorithmic recommendations helps students and parents understand that a system is making choices — and that those choices can be questioned.
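As referenced in step 2, here is a rough keyword-scan sketch for triaging a privacy policy before reading it line by line. The phrase list is an illustrative starting point rather than a legal test, and the absence of a match proves nothing; it only tells you which sentences to read first.

```python
import re

# Illustrative phrases that often signal model-training or data-reuse clauses.
# This is a triage aid, not a legal analysis.
FLAG_PHRASES = [
    r"train(?:ing)?\s+(?:our|the)\s+model",
    r"improve\s+our\s+(?:services|models|algorithms)",
    r"de-?identified\s+data",
    r"aggregate[d]?\s+data",
    r"sub-?processors?",
    r"third[- ]part(?:y|ies)",
]

def flag_clauses(policy_text: str) -> list[str]:
    """Return the sentences that contain any of the flag phrases."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text)
    pattern = re.compile("|".join(FLAG_PHRASES), re.IGNORECASE)
    return [s.strip() for s in sentences if pattern.search(s)]

sample = (
    "We use student inputs to improve our services and models. "
    "Data may be shared with sub-processors listed upon request. "
    "Grades are visible only to the enrolling teacher."
)
for clause in flag_clauses(sample):
    print("REVIEW:", clause)
```

Anything the scan flags still needs a human read; the goal is to find the model-training clause faster, not to outsource the judgment.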

What Leaders Should Be Considering

For district administrators, curriculum directors, and school leaders, the algorithmic question is not a technology issue wearing an education costume. It is a governance issue, a civil rights issue, and in an era of rapidly evolving federal enforcement, an increasingly serious legal issue. The Department of Education’s intensified FERPA enforcement posture in 2025 — including an unprecedented mandate requiring state agencies to certify compliance by April 30, 2025 — signals that the era of permissive hand-waving on student data is ending (Secure Privacy, 2025).

Build a Vendor Ecosystem Map

The Texas district that discovered 47 active data-sharing agreements only found out because they went looking. Most districts haven’t looked. The foundational act of data governance is understanding what you actually have — which vendors are active, what data they access, how long they retain it, and who they share it with. Districts that have invested in data governance platforms report being able to surface this information in hours rather than weeks. Those that haven’t are managing their data relationships in scattered spreadsheets that almost certainly contain gaps (Secure Privacy, 2025).
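A vendor ecosystem map can start as nothing more than a structured inventory that flags unanswered governance questions. A minimal sketch follows, with hypothetical vendor names and field labels; a real inventory would be assembled from contracts, data processing agreements, and SSO or rostering logs rather than typed by hand.

```python
import csv
import io

# Hypothetical inventory rows with invented vendor names and fields.
INVENTORY_CSV = """vendor,data_categories,retention_period,subprocessors_disclosed
ReadingApp Co,behavioral logs;PII,unknown,no
LMS Corp,grades;rosters;PII,3 years,yes
QuizTool,open responses,unknown,no
"""

def audit(csv_text: str) -> list[str]:
    """Flag vendors whose inventory record leaves governance questions open."""
    findings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["retention_period"].strip().lower() == "unknown":
            findings.append(f'{row["vendor"]}: retention period unknown')
        if row["subprocessors_disclosed"].strip().lower() != "yes":
            findings.append(f'{row["vendor"]}: sub-processor list not disclosed')
    return findings

for finding in audit(INVENTORY_CSV):
    print(finding)
```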

Develop a Shadow AI Detection Protocol

Given that well-meaning educators are introducing unapproved AI tools into classrooms at a rate that outpaces most procurement processes, leaders need a protocol that’s more agile than the traditional annual tech audit. This doesn’t mean building a punitive compliance culture. It means creating fast, accessible pathways for teachers to flag tools they want to try — and building the evaluation infrastructure to respond quickly enough that teachers don’t feel they have to go around the system to help their students.
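One lightweight starting point, well short of a full monitoring platform, is to cross-reference the tools actually being reached (for example, from web filter or DNS logs) against the district's approved list, so unapproved AI tools surface as a review queue rather than as a disciplinary finding. A minimal sketch with hypothetical domains:

```python
# Hypothetical inputs: an approved-tool list maintained by the district,
# and domains observed in web filter or DNS logs over some window.
APPROVED = {"lms.example.com", "readingapp.example.com"}

observed_domains = [
    "lms.example.com",
    "free-ai-essay-helper.example.net",   # never vetted
    "chat-tutor-extension.example.org",   # never vetted
    "readingapp.example.com",
]

def shadow_candidates(observed: list[str], approved: set[str]) -> list[str]:
    """Domains seen in use that are not on the approved list, deduplicated."""
    return sorted(set(observed) - approved)

for domain in shadow_candidates(observed_domains, APPROVED):
    print("Review:", domain)
```

The output is a starting list for conversations with teachers, not evidence of wrongdoing.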

Demand Algorithmic Transparency From Vendors

Contract language matters. Districts should be requiring vendors to disclose: what data inputs drive their recommendation algorithms, how they test for and mitigate bias, what their data portability policy is for students transferring between schools or districts switching platforms, and what happens to stored data when a contract ends. Many vendors will push back. That pushback is itself diagnostic information about how seriously they take data stewardship.

Include Homeschool Families in the Conversation

Many districts run dual-enrollment programs, cooperative arrangements, or resource-sharing agreements with homeschool families in their area. These families are making the same tool decisions with even less institutional support. Including homeschool parent organizations in community data literacy conversations — even informally — builds the kind of shared understanding that protects children regardless of their educational setting.

The Philosophical Question at the Core of All of This

Here is the question that keeps circling back, no matter how deep you go into data pipelines and vendor contracts: Whose interests is the algorithm actually serving?

It is a question worth sitting with, because the honest answer is complicated. The algorithm serves the student — sometimes, genuinely, and with measurable effectiveness. It also serves the vendor’s product improvement pipeline. It serves the investors whose return depends on platform growth metrics. It may serve the policymakers or researchers who license aggregate insights. In most cases, it serves all of these interests simultaneously, and the order of priority is not disclosed — because disclosing it would be commercially uncomfortable.

This is not a counsel of despair. Algorithms can be designed with student interests as the primary constraint. Vendors can be held to disclosure standards that make their optimization targets visible. Districts can negotiate contracts that give students meaningful data rights. These things are happening — slowly, unevenly, and often only in districts with the resources to insist on them. But they are possible.

What’s required, more than any particular technical fix, is a willingness to stop treating the question as purely technical. The decision about what an algorithm is allowed to optimize for in a classroom is a values decision. It is an educational philosophy decision. It is, in the deepest sense, a decision about what school is for — and who gets to decide that. Those are not questions the algorithm can answer. They have to be answered by the humans in the room, before the algorithm is ever installed.

In the next — and final — post in this series, we turn from diagnosis to design. What would an AI classroom stack built from principled foundations actually look like? What guardrails work, what frameworks hold up, and what questions should every educator and administrator be carrying into any conversation about AI adoption? That’s what we’ll be mapping in Case File 04: Designing an AI-Ready Classroom.

References

  1. CoSN. (2025). State of EdTech district leadership report. Consortium for School Networking. https://www.cosn.org
  2. eSchool News. (2025, July 30). Data, privacy, and cybersecurity in schools: A 2025 wake-up call. eSchool News. https://www.eschoolnews.com
  3. Future of Privacy Forum. (2024, October). Vetting generative AI tools for use in schools. https://fpf.org
  4. HolonIQ. (2024). Global EdTech investment report. HolonIQ Analytics.
  5. MIT RAISE. (2024). Securing student data in the age of generative AI. Proceedings, AIED 2024. https://raise.mit.edu
  6. Schiller University. (2025). Risks of AI algorithmic bias in higher education. Schiller University Blog. https://www.schiller.edu
  7. Secure Privacy. (2025). School data governance software: Compliance, security & privacy for K–12. https://secureprivacy.ai
  8. Selwyn, N. (2019). Should robots replace teachers? AI and the future of education. Polity Press.
  9. ThreatDown. (2024). State of ransomware in education report. Malwarebytes.
  10. World Journal of Advanced Research and Reviews. (2025). Algorithmic bias in educational systems: Examining the impact of AI-driven decision making in modern education. WJARR, 25(1), 2012–2017. https://doi.org/10.30574/wjarr.2025.25.1.0253

Additional Reading

  1. Selwyn, N., & Facer, K. (Eds.). (2021). The politics of education and technology. Palgrave Macmillan.
  2. Future of Privacy Forum. (2024). Student privacy compass: AI in K–12 guide. https://studentprivacycompass.org
  3. U.S. Department of Education, Office of Educational Technology. (2023). Artificial intelligence and the future of teaching and learning. https://www2.ed.gov
  4. SchoolDay. (2025, December). Data governance in K–12: Building trust through transparency. https://www.schoolday.com
  5. Common Sense Media. (2024). Teens, trust and technology in the age of AI. Common Sense Media Research.
JR
JR is the founder of AI Innovations Unleashed—an educational podcast and consulting platform helping educators, leaders, and curious minds harness AI to build smarter learning environments. He has 22 years of project management experience (PMP certified) and is an AI strategist who translates complex tech into practical, future-focused insights. Connect with him on LinkedIn, Medium, Substack, and X—or visit him @ aiinnovationsunleashed.com.