How to choose AI tools for your school: A practical procurement framework

Teaching | For Teachers | 11 min read | By Tom Mercer

Most schools in England have already had a sales call from at least one AI vendor by now. Some have signed up to several. The market is moving faster than any procurement cycle was designed to handle, and the temptation to either say yes to everything or no to everything is real.

Neither extreme tends to work well. A blanket yes leaves you with overlapping tools, data going to places nobody mapped, and teachers using whatever they downloaded last week with no shared approach. A blanket no usually fails too, because staff will use AI tools on personal accounts whether or not the school has approved any. The middle path is a clear procurement framework: A small number of questions you run every product through, a short list of things you will not compromise on, and a decision-maker who actually owns the choice.

This guide is aimed at heads, deputies, business managers, and anyone else with budget sign-off. It draws on current DfE guidance and on what the schools we have spoken to are finding works in practice. It is not a list of recommended products, because that list would be out of date within months.


Around 44% of teachers reported using generative AI for work tasks in the 2024-25 academic year (DfE Technology in Schools survey 2024-25). Earlier DfE research on generative AI found that around 42% of teachers had used it for work tasks in late 2023. Most schools are now making decisions about tools that are already being used unofficially in classrooms.


Start with the problem, not the product

One of the most common mistakes we see is starting from a product demo and working backwards to find a use case. A vendor shows you what their tool does, the demo is impressive, and the implicit question becomes "how could we use this?" rather than "what problem are we trying to solve?"

Flip the order. Before any product conversation, sit down with your SLT and identify the two or three workload or outcome problems where AI might genuinely help. Common candidates include lesson planning and resource generation, marking and feedback, SEND scaffolding, parent communication, and admin tasks like report writing. Be specific. "Reduce time spent on weekly resource creation in the science department" is something you can evaluate against. "Use AI more" is not.

Once you have the problem, the procurement question becomes much sharper. You are no longer asking whether a tool is good. You are asking whether it solves your specific problem better than the alternatives, including the alternative of doing nothing different.

The non-negotiables: Data protection and safety

Before you evaluate features, run every product through a short set of compliance checks. If a tool fails any of these, the conversation ends regardless of how good the demo was.

First, UK GDPR compliance. The vendor needs to offer a data processing agreement that meets UK GDPR, be clear about where data is stored (ideally in the UK or EU), and state whether your data will be used to train their models. That last point matters a lot. Some tools train on your inputs by default, which means pupil names, work, or behavioural information could end up influencing the model. If the vendor cannot give you a clear no, or a clear opt-out, treat that as a red flag.

Second, age-appropriate design. If pupils will use the tool directly, it needs to comply with the ICO's Age Appropriate Design Code. Tools built for general consumers (including the consumer versions of ChatGPT, Claude, or Gemini) were not designed with school-age users in mind and should not be deployed to pupils without significant guardrails.

Third, DfE alignment. The DfE's Generative AI in Education position paper (current version June 2025) and the AI product safety expectations first published in January 2025 are now the closest thing the sector has to a baseline. Ask vendors directly whether they have read these documents and how their product addresses them. The answers will tell you a lot.

Good to know

If a vendor cannot give you a straight answer on whether your data is used for model training, where it is stored, and how long it is kept, that is not a paperwork issue. It is a sign the product was not built with schools in mind. Walk away.

Evaluating the actual product

Once a tool clears the compliance bar, the next stage is evaluating whether it does what it claims. Demos are designed to impress. Real classrooms are messier, so the evaluation needs to look more like a small trial than a sales meeting.

The most reliable signal comes from letting a few teachers use the tool on their own real work for two to three weeks. Pick a small group that spans a range of subjects and experience levels, give them a clear brief about what they are testing for, and ask them to log time saved (or lost), the quality of outputs, and any issues they notice. A trial like this will surface problems that a 30-minute demo never could.

Look particularly carefully at the quality of subject-specific outputs. A general-purpose AI tool that produces brilliant English literature lesson plans may produce shaky maths content from the same prompt. Ask your trial teachers to test the tool on the less familiar corners of their specification, not just the easy bits. Hallucinations and accuracy issues tend to cluster in the less common topics.

Procurement questions to ask every vendor

The table below is a starter set of questions to run through with any AI vendor before signing. You will not get clean answers to all of them, and that itself is useful information. Vendors with mature school products tend to have these answers ready. Vendors building primarily for other markets will struggle.

Area | Question | What you are listening for
Data | Where is data stored, and is it used for model training? | UK or EU storage. A clear no on training, or a robust opt-out that applies by default.
Pupil safety | How does the tool handle inappropriate content or prompts from pupils? | Filtering, logging, and a process for review. Not just "the underlying model is safe."
Curriculum alignment | How is content aligned to UK exam specifications? | Specific specifications named (AQA, OCR, Edexcel, CIE) and an explanation of how alignment is verified.
Accuracy | What is your hallucination rate, and how do you measure it? | Honest acknowledgement that hallucinations occur, plus mitigations such as citations, human review, or restricted domains.
Evidence | What evidence do you have that this tool reduces workload or improves outcomes? | School case studies with named partners, not just anecdotes. Independent evaluation is a strong signal.
Pricing | How does pricing scale with users, and what happens to our data if we cancel? | Clear per-seat or per-school pricing. Data exportable, and deleted on cancellation.
Roadmap | What changes are planned in the next 12 months that might affect us? | An honest answer rather than vague promises. Watch for swaps of the underlying AI model that could change behaviour.
Core questions to put to any AI vendor during procurement.

Looking for evidence, not enthusiasm

The evidence base for AI tools in schools is still thin. That is partly because the field is genuinely new and partly because vendors have strong incentives to publish positive case studies and weak incentives to publish negative ones. Treat enthusiasm in marketing materials as the default, not as evidence.

The strongest evidence at the moment concerns workload reduction. Teachers using tools like Aila, the Oak National Academy AI assistant, have reported time savings on lesson planning and resource creation in Oak's own early research, with NFER's independent trial expected to report in 2026. Workload is measurable, the comparison group is the teacher's own previous practice, and the impact tends to be visible within weeks. If a vendor cannot point to time-saving evidence, ask why.

Evidence on pupil outcomes is much weaker. The studies that exist tend to be small, short-term, and confounded by novelty effects. Be cautious of vendors who claim AI directly improves grades. The research is not yet at the point where any single product can credibly make that claim. Workload reduction that frees teachers to do higher-impact work in lessons is a more defensible argument.

Good to know

Workload reduction is currently the strongest evidence-based reason to adopt AI tools. Outcome claims should be treated more sceptically, especially when they come without published methodology or independent evaluation.

Decide who owns the decision

AI procurement falls awkwardly between IT, teaching and learning, data protection, and finance. Most schools we have spoken to find it works best when one named person owns the decision and pulls in others for input.

The owner is usually a deputy head or assistant head with a workload or teaching and learning brief. They liaise with the DPO on data protection, with IT on integration, with subject leads on curriculum fit, and with the business manager on cost. Without a single owner, decisions tend to drift, vendors get inconsistent answers from different staff, and procurement takes months longer than it should.

A written AI tools policy helps, and it does not need to be long. Two or three pages covering which tools are approved, what data can and cannot be input, how teachers should disclose AI use to pupils and parents, and where to go for advice is enough. Review the policy every term, because the landscape genuinely is moving that fast.

How to pilot before rolling out

Even after a tool passes compliance and evaluation, jumping straight to a whole-school rollout tends to go badly. A small pilot with clear criteria, run over a half-term, gives you the data you need to make a sensible call.

Choose a department or year group rather than a random sample of teachers. Coherent groups make the pilot easier to support and easier to learn from. Set two or three success criteria upfront, such as time saved per week per teacher, quality of resources produced, and pupil engagement with any pupil-facing features, and agree before the pilot starts what result on each would count as success.

At the end of the pilot, write a one-page evaluation covering what worked, what did not, what would need to change to roll out more widely, and what the ongoing cost would be. This document protects you in two ways. It forces honest reflection on whether the tool actually delivered, and it gives whoever scrutinises the decision next (governors, your MAT board, parents) a clear answer to "why are we using this?"

Where Cognito tends to come up

For science and maths departments specifically, schools sometimes look at platforms like Cognito's quiz library and exam-aligned content alongside more general AI tools. The trade-off is that subject-specific platforms tend to have tighter spec alignment and better-vetted content, while general AI tools are more flexible. Most schools we have seen settle on a small mix rather than committing to a single tool, which keeps options open as the market matures.

Common procurement mistakes to avoid

Three patterns come up repeatedly in conversations with heads who feel their AI rollout did not go as planned.

The first is signing a long contract early. AI products are improving and changing quickly, and a three-year contract signed today may lock you into a tool that has been overtaken by month nine. Prefer annual contracts with clear exit terms until the market settles.

The second is underestimating training. Teachers cannot get value from a tool they do not know how to prompt well. Budget for proper training, ideally led by someone who has used the tool in a real classroom rather than delivered from the vendor's slide deck. A single two-hour INSET session is rarely enough.

The third is treating AI as a substitute for teacher judgement rather than an aid to it. Tools that work tend to fit into existing teacher workflows. Tools that fail tend to be sold as replacements for tasks teachers actually need to do themselves, like assessment of pupil work or curriculum decisions.

A procurement checklist

AI tool procurement checklist

Run through this before signing any contract. If you cannot tick most items, you are probably not ready to commit.

  • Specific workload or outcomes problem the tool is solving, written down
  • Data protection impact assessment (DPIA) completed or scheduled, with DPO sign-off in progress
  • Vendor has confirmed UK or EU data storage and a clear position on training data
  • Tool aligns with current DfE guidance on generative AI in education
  • Subject-specific output quality tested by at least three teachers on real work
  • Pricing modelled for the actual number of users, including any growth scenarios
  • Pilot scope agreed with success criteria and a defined evaluation point
  • Named decision-owner identified, with input routes from DPO, IT, and curriculum leads
  • Annual contract preferred over multi-year unless commercial terms are exceptional
  • Training plan in place, ideally led by a teacher who has used the tool in class
