Hardy-Weinberg equation for A-Level Biology
The Hardy-Weinberg principle states that in a large, randomly mating population with no evolutionary forces acting on it, the allele and genotype frequencies will stay constant from generation to generation. The two equations are p + q = 1 (for allele frequencies) and p^2 + 2pq + q^2 = 1 (for genotype frequencies). Real populations rarely meet all the assumptions, which is why Hardy-Weinberg is mainly used as a baseline to detect when evolution is happening.
This guide covers what each letter means, how to apply the equations to AQA-style questions, the five assumptions, and the typical exam pitfalls that examiner reports flag every June. It is a standard A-Level Biology Topic 7 calculation that you must be able to do from scratch.
Two equations, one principle
p + q = 1 for alleles. p^2 + 2pq + q^2 = 1 for genotypes. You need both.
Used to detect evolution
If allele frequencies change between generations, the population is evolving. Hardy-Weinberg is the baseline test.
Five assumptions
Large population, random mating, no mutation, no migration, no selection. All five must hold for the equations to work exactly.
What the principle says
The Hardy-Weinberg principle, published independently by G. H. Hardy and Wilhelm Weinberg in 1908, says that allele frequencies in a population stay constant across generations unless something is acting on the population to change them. If a population meets the five assumptions, the genotype frequencies in the next generation can be predicted using p^2 + 2pq + q^2 = 1.
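The claim that genotype frequencies can be predicted from allele frequencies is easy to check with a short calculation. This Python sketch assumes one gene with two alleles, random mating and none of the evolutionary forces acting. The starting population (half AA, half aa, no heterozygotes) is hypothetical. The allele frequency stays at 0.5, and after a single generation of random mating the genotypes settle into p^2, 2pq, q^2 and then stay there.

```python
# A minimal sketch, assuming two alleles (A, a), random mating and no
# mutation, migration, selection or drift. Frequencies are decimals (0 to 1).

def allele_frequency_p(f_AA: float, f_Aa: float) -> float:
    """Frequency of allele A: all of AA's alleles plus half of Aa's."""
    return f_AA + f_Aa / 2


def next_generation(f_AA: float, f_Aa: float, f_aa: float) -> tuple[float, float, float]:
    """Genotype frequencies after one round of random mating."""
    assert abs(f_AA + f_Aa + f_aa - 1) < 1e-9, "genotype frequencies must sum to 1"
    p = allele_frequency_p(f_AA, f_Aa)
    q = 1 - p
    return p * p, 2 * p * q, q * q  # p^2, 2pq, q^2


# Hypothetical starting population: half AA, half aa, no heterozygotes.
gen0 = (0.5, 0.0, 0.5)
gen1 = next_generation(*gen0)
gen2 = next_generation(*gen1)

print(gen1)  # (0.25, 0.5, 0.25)
print(gen2)  # (0.25, 0.5, 0.25)
```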
The value of the principle is not that real populations follow it (they almost never do exactly), but that it gives you a baseline. If you sample a population and the genotypes do not fit the predicted ratios, something is causing evolution: Selection, drift, migration, mutation, or non-random mating.
Hardy-Weinberg equilibrium is a null hypothesis
The equations describe what should happen if nothing is causing change. When real data deviate from Hardy-Weinberg, that is evidence of evolution. So examiners often phrase questions as: "Do these data show that the population is in Hardy-Weinberg equilibrium, and what does the answer tell you?"
The two equations and what each letter means
The first equation, p + q = 1, gives allele frequencies. For a gene with two alleles, p is the frequency of the dominant allele and q is the frequency of the recessive allele. Because every allele in the population must be one or the other, the two frequencies add to 1.
The second equation, p^2 + 2pq + q^2 = 1, gives genotype frequencies. p^2 is the frequency of homozygous dominant individuals. 2pq is the frequency of heterozygous individuals. q^2 is the frequency of homozygous recessive individuals. The three frequencies add to 1 because every individual must be one of the three genotypes.
| Term | Meaning | Example |
|---|---|---|
| p | Frequency of the dominant allele (A) | 0.8 means 80% of alleles are A |
| q | Frequency of the recessive allele (a) | 0.2 means 20% of alleles are a |
| p^2 | Frequency of homozygous dominant individuals (AA) | 0.64 means 64% are AA |
| 2pq | Frequency of heterozygous individuals (Aa) | 0.32 means 32% are Aa |
| q^2 | Frequency of homozygous recessive individuals (aa) | 0.04 means 4% are aa |
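As a quick check on the table, here is a minimal Python sketch that turns a dominant-allele frequency p into all five quantities. The function name hardy_weinberg is an illustrative choice, and the code assumes one gene with two alleles.

```python
# A minimal sketch, assuming one gene with two alleles:
# A (dominant, frequency p) and a (recessive, frequency q).

def hardy_weinberg(p: float) -> dict[str, float]:
    """Return allele and genotype frequencies for a dominant-allele frequency p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a decimal between 0 and 1")
    q = 1 - p  # p + q = 1
    return {
        "p": p,
        "q": q,
        "p^2 (AA)": p ** 2,
        "2pq (Aa)": 2 * p * q,
        "q^2 (aa)": q ** 2,
    }


freqs = hardy_weinberg(0.8)
for term, value in freqs.items():
    print(f"{term}: {value:.2f}")
```

Running it with p = 0.8 reproduces the Example column: 0.80, 0.20, 0.64, 0.32 and 0.04.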
How to use the equations
AQA Paper 3 questions almost always start by telling you the frequency of the recessive phenotype (the homozygous recessive, aa). This is because aa is the only genotype you can identify by looking: AA and Aa individuals have the same phenotype. From q^2 you work backwards.
The standard four-step recipe is: Start from q^2, take the square root to get q, calculate p = 1 – q, then compute p^2 and 2pq (the carrier, or heterozygous, frequency). This gives you the full breakdown of the population. Practise it until it is automatic.
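The four-step recipe translates directly into code. This is a sketch, assuming the input is the homozygous recessive frequency written as a decimal; the function name from_recessive_phenotype is hypothetical. It rejects values above 1, which catches a percentage that has not been converted.

```python
import math

# A minimal sketch of the four-step recipe, assuming Hardy-Weinberg equilibrium
# and that the input is the homozygous recessive (aa) frequency as a decimal.

def from_recessive_phenotype(q_squared: float) -> dict[str, float]:
    if not 0 <= q_squared <= 1:
        raise ValueError("q^2 must be a decimal between 0 and 1, not a percentage")
    q = math.sqrt(q_squared)  # Step 2: square-root q^2 to get q
    p = 1 - q                 # Step 3: p = 1 - q
    return {                  # Step 4: genotype frequencies
        "p": p,
        "q": q,
        "p^2": p ** 2,
        "2pq": 2 * p * q,
        "q^2": q_squared,
    }


result = from_recessive_phenotype(1 / 2500)  # Step 1: q^2 for cystic fibrosis
print(round(result["q"], 4), round(result["p"], 4))  # 0.02 0.98
```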
Always start from q^2 if given the recessive phenotype
The recessive phenotype is the only one whose frequency can be directly counted. So if the question says 1 in 2,500 people has cystic fibrosis (the homozygous recessive condition), that frequency is q^2 = 1/2500 = 0.0004. From there, q = 0.02 and p = 0.98.
Worked example: Cystic fibrosis carriers
Cystic fibrosis affects about 1 in 2,500 people in the UK and is caused by two copies of a recessive allele. Estimate the frequency of carriers (heterozygous individuals) in the UK population, assuming Hardy-Weinberg equilibrium.
Step 1: q^2 = 1 / 2,500 = 0.0004. So q = root(0.0004) = 0.02.
Step 2: p = 1 – q = 1 – 0.02 = 0.98.
Step 3: Frequency of carriers = 2pq = 2 x 0.98 x 0.02 = 0.0392, or about 3.92%. That means roughly 1 in 25 people carries one copy of the cystic fibrosis allele. This is a striking result and a classic AQA Paper 3 example: Far more people are carriers than have the disease, because the recessive allele is hidden in heterozygotes.
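To put a number on "far more people are carriers than have the disease", divide 2pq by q^2. The ratio simplifies to 2p/q, which for cystic fibrosis is 2 x 0.98 / 0.02 = 98. A minimal Python sketch, assuming Hardy-Weinberg equilibrium and the 1 in 2,500 figure:

```python
import math

# A minimal sketch, assuming Hardy-Weinberg equilibrium and the
# 1 in 2,500 incidence quoted above for cystic fibrosis.
affected = 1 / 2500        # q^2, frequency of homozygous recessives (aa)
q = math.sqrt(affected)    # about 0.02
p = 1 - q                  # about 0.98
carriers = 2 * p * q       # 2pq, frequency of heterozygotes (Aa)

print(f"Carrier frequency: {carriers:.4f} (about 1 in {1 / carriers:.1f})")
print(f"Carriers per affected person: {carriers / affected:.0f}")
```

So for every person with cystic fibrosis there are roughly 98 unaffected carriers.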
The five assumptions
For the Hardy-Weinberg equations to predict frequencies exactly, the population must meet five assumptions. If any one of them is violated, the frequencies can change between generations, and a change in allele frequency is the definition of evolution.
| Assumption | Why it matters |
|---|---|
| Large population | Small populations are affected by genetic drift (random change in allele frequency) |
| Random mating | Mate choice based on phenotype skews genotype frequencies away from p^2, 2pq, q^2 |
| No mutation | Mutation introduces new alleles and changes p and q |
| No migration | Migrants bring or remove alleles, changing the frequency |
| No natural selection | Selection favours some genotypes over others, changing frequencies |
Memorise the five assumptions
A quick mnemonic is Large, Random, no MMS (mutation, migration, selection). Examiners regularly ask students to state the assumptions, and full marks need all five. Forgetting random mating is a particularly common slip.
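Genetic drift is the least intuitive of the five assumptions. The sketch below uses a simple random-sampling model (an illustration chosen here, not something from the specification): each generation, the 2N alleles of a hypothetical population of N diploid individuals are drawn at random from the previous generation's allele pool, with no mutation, migration or selection. The population sizes, starting frequency and seed are arbitrary.

```python
import random

# A minimal sketch of genetic drift, assuming a hypothetical population of N
# diploid individuals whose 2N alleles are redrawn at random every generation
# (no mutation, migration or selection).

def simulate_drift(n_individuals: int, p_start: float, generations: int, seed: int = 1) -> list[float]:
    """Return the frequency of allele A in each generation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p = p_start
    history = [p]
    for _ in range(generations):
        # Each of the 2N alleles in the next generation is A with probability p.
        copies_of_A = sum(1 for _ in range(2 * n_individuals) if rng.random() < p)
        p = copies_of_A / (2 * n_individuals)
        history.append(p)
    return history


small = simulate_drift(n_individuals=10, p_start=0.5, generations=30)
large = simulate_drift(n_individuals=10_000, p_start=0.5, generations=30)
print(f"N = 10: p went from 0.50 to {small[-1]:.2f}")
print(f"N = 10,000: p went from 0.50 to {large[-1]:.2f}")
```

With only 10 individuals, p can wander a long way from 0.5 and may reach 0 or 1 (one allele lost entirely); with 10,000 it stays close to 0.5. Changing the seed gives a different path each time, which is the point: the change is random.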
When the equations break down
Real populations rarely meet all five assumptions. Humans certainly do not: Mate choice is not random, migration happens constantly, mutations occur in every generation, and natural selection still acts on traits like resistance to disease.
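When the equations break down, it tells you something useful. To spot a deviation you need counts of all three genotypes: Work out q by counting alleles, then compare the observed number of homozygous recessives with the number q^2 predicts. (If you calculate q from the aa count alone, the prediction matches the data by definition.) If the homozygous recessive count is lower than q^2 predicts, the recessive allele is probably being selected against. If it is higher, the recessive may be under positive selection or the population may not be mating randomly. AQA Paper 3 will often give you data and ask you to interpret a deviation from Hardy-Weinberg.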
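The comparison can be sketched in Python. This assumes a sample in which all three genotypes have been counted; the counts are hypothetical and expected_counts is an illustrative name. Because p comes from counting alleles rather than from the aa count, the expected numbers are a genuine prediction to test the data against.

```python
# A minimal sketch, assuming counts of all three genotypes from one sample.
# Allele frequencies come from counting alleles, so the expected genotype
# numbers are an independent Hardy-Weinberg prediction.

def expected_counts(n_AA: int, n_Aa: int, n_aa: int) -> dict[str, float]:
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # each AA carries two A alleles, each Aa one
    q = 1 - p
    return {"AA": p ** 2 * total, "Aa": 2 * p * q * total, "aa": q ** 2 * total}


# Hypothetical sample of 200 individuals.
observed = {"AA": 90, "Aa": 60, "aa": 50}
expected = expected_counts(observed["AA"], observed["Aa"], observed["aa"])
for genotype in ("AA", "Aa", "aa"):
    print(f"{genotype}: observed {observed[genotype]}, expected {expected[genotype]:.1f}")
```

For these numbers p = 0.6, so the expected counts are 72 AA, 96 Aa and 32 aa. The sample has more aa and fewer Aa than predicted, which could mean selection favouring aa or non-random mating.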
Worked example: Sickle-cell allele in West Africa
In parts of West Africa about 2% of children are born with sickle-cell anaemia (homozygous recessive, q^2 = 0.02). Estimate the carrier frequency, then explain why the data are unusual.
Step 1: q^2 = 0.02, so q = root(0.02) = 0.141.
Step 2: p = 1 – q = 0.859.
Step 3: Carrier frequency = 2pq = 2 x 0.859 x 0.141 = 0.242, or about 24%. So roughly one in four people is a heterozygous carrier.
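Rounding is where small slips creep in. This sketch, assuming the q^2 = 0.02 figure above, compares keeping full precision with rounding q to three decimal places before working out 2pq.

```python
import math

# A minimal sketch, assuming Hardy-Weinberg proportions at birth and q^2 = 0.02.
# It compares full precision with rounding q to three decimal places first.
q_squared = 0.02
q = math.sqrt(q_squared)
p = 1 - q

carriers_exact = 2 * p * q
q_rounded = round(q, 3)  # 0.141
carriers_rounded = 2 * (1 - q_rounded) * q_rounded

print(f"q = {q:.4f}, p = {p:.4f}")
print(f"2pq (full precision): {carriers_exact:.4f}")
print(f"2pq (q rounded first): {carriers_rounded:.4f}")
```

Both give 0.24 to two decimal places (0.2428 against 0.2422), so rounding q to three decimal places does no harm here. Rounding q to 0.1 would give 2 x 0.9 x 0.1 = 0.18, a clearly wrong answer.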
The sickle-cell allele would normally be eliminated by natural selection because homozygous recessives often die young. The carrier frequency is so high because heterozygotes are protected against severe malaria. This is heterozygote advantage, and it is a classic example of Hardy-Weinberg being violated: Selection is acting, so the no-selection assumption fails. What you see in these populations is a balanced polymorphism, where two opposing selection pressures (sickle-cell anaemia acting against aa, malaria acting against AA) hold the allele frequency at a stable point that looks constant across generations even though selection is operating. The Hardy-Weinberg maths still lets you estimate carriers from q^2, but the population is not in true Hardy-Weinberg equilibrium.
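The balanced polymorphism can be illustrated with a standard one-gene selection model, which goes beyond what the specification requires. The relative fitnesses here are hypothetical, not measured values for sickle cell: AA has fitness 1 – s (malaria risk), Aa has fitness 1 (protected) and aa has fitness 1 – t (sickle-cell anaemia). Starting from a rare or a common allele, q settles at the same value, s/(s + t), where the two pressures balance.

```python
# A minimal sketch of heterozygote advantage, assuming hypothetical relative
# fitnesses (not measured sickle-cell values) and random mating:
# AA = 1 - s (malaria risk), Aa = 1 (protected), aa = 1 - t (anaemia).

def next_q(q: float, s: float, t: float) -> float:
    """Frequency of allele a after one generation of selection."""
    p = 1 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # a alleles come from half of the surviving Aa plus all of the surviving aa
    return (p * q + q * q * (1 - t)) / mean_fitness


s, t = 0.1, 0.8  # hypothetical selection coefficients
for q_start in (0.01, 0.5):
    q = q_start
    for _ in range(5000):
        q = next_q(q, s, t)
    print(f"start q = {q_start}: settles at q = {q:.4f}")

print(f"predicted equilibrium s / (s + t) = {s / (s + t):.4f}")
```

With s = 0.1 and t = 0.8 both runs settle at q ≈ 0.111. From then on the allele frequency is constant from one generation to the next even though selection acts every generation.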
Where students lose marks
AQA examiner reports flag the same issues every year. Most are calculation slips or sloppy use of p and q. Hardy-Weinberg is high-yield because the maths is straightforward but the bookkeeping is unforgiving.
Mistakes that lose marks on Paper 3
- Squaring the given value instead of square-rooting it: if the question gives you q^2, take the square root to get q.
- Calculating 2pq with a genotype frequency (such as q^2) in place of an allele frequency (q).
- Forgetting to convert percentages to decimals before substituting.
- Writing only three of the five assumptions in a state-the-assumptions question.
- Confusing the allele frequency p with the genotype frequency p^2.
- Forgetting to check that p + q = 1 as a sanity check.
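Several of these checks can be automated when you practise. The sketch below is a hypothetical helper, check_answer, assuming all values are decimals. It flags the slips listed above by testing p + q = 1, p^2 + 2pq + q^2 = 1, whether q^2 really is q squared, and whether 2pq was built from p and q. The tolerance of 0.005 is an arbitrary allowance for rounding.

```python
# A minimal sketch of a self-check for hand-worked answers, assuming all
# values are decimals. The function name and tolerance are hypothetical.

def check_answer(p: float, q: float, p2: float, two_pq: float, q2: float,
                 tol: float = 0.005) -> list[str]:
    problems = []
    if any(v > 1 for v in (p, q, p2, two_pq, q2)):
        problems.append("a value is above 1: convert percentages to decimals")
    if abs(p + q - 1) > tol:
        problems.append("p + q does not equal 1")
    if abs(p2 + two_pq + q2 - 1) > tol:
        problems.append("p^2 + 2pq + q^2 does not equal 1")
    if abs(q * q - q2) > tol:
        problems.append("q is not the square root of q^2")
    if abs(2 * p * q - two_pq) > tol:
        problems.append("2pq was not calculated from the allele frequencies")
    return problems


# Plant example worked correctly: no problems reported.
print(check_answer(0.4, 0.6, 0.16, 0.48, 0.36))  # []

# Slip: q^2 = 0.36 was squared instead of square-rooted, giving q = 0.1296.
print(check_answer(0.8704, 0.1296, 0.7576, 0.2256, 0.36))
```

The second call reproduces the first slip in the list, and the helper reports that the genotype frequencies no longer add to 1 and that q is not the square root of q^2.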
Worked example: Plant height
In a population of 1,000 plants, 360 are short (homozygous recessive, tt). Calculate the frequency of each genotype, assuming Hardy-Weinberg equilibrium.
Step 1: q^2 = 360 / 1,000 = 0.36. So q = root(0.36) = 0.6.
Step 2: p = 1 – q = 0.4.
Step 3: p^2 = 0.16, 2pq = 2 x 0.4 x 0.6 = 0.48, q^2 = 0.36. Check: 0.16 + 0.48 + 0.36 = 1.00. Good.
Step 4: Multiply by 1,000 to get numbers: 160 TT, 480 Tt, 360 tt. The heterozygotes outnumber each homozygous class. Under Hardy-Weinberg this happens whenever q lies between 1/3 and 2/3, which is why it is typical of intermediate allele frequencies.
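The link between heterozygote numbers and allele frequency can be checked numerically. This sketch, assuming Hardy-Weinberg proportions, scans q from 0 to 1 in steps of 0.001 and records where 2pq is larger than both p^2 and q^2. Algebraically, 2pq > q^2 when q < 2/3 and 2pq > p^2 when q > 1/3, so the scan should find the band between 1/3 and 2/3.

```python
# A minimal sketch that scans the recessive allele frequency q and reports where
# heterozygotes (2pq) outnumber both homozygotes, assuming Hardy-Weinberg proportions.

def heterozygotes_most_common(q: float) -> bool:
    p = 1 - q
    return 2 * p * q > p ** 2 and 2 * p * q > q ** 2


qs = [i / 1000 for i in range(1001)]
winning = [q for q in qs if heterozygotes_most_common(q)]
print(f"2pq is largest for q from {min(winning):.3f} to {max(winning):.3f}")
print(heterozygotes_most_common(0.6))  # plant example: True
```

The scan reports 0.334 to 0.666, and the plant example (q = 0.6) falls inside that band.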
Hardy-Weinberg revision checklist
- p + q = 1 (allele frequencies sum to 1)
- p^2 + 2pq + q^2 = 1 (genotype frequencies sum to 1)
- p is dominant allele frequency, q is recessive allele frequency
- p^2 is homozygous dominant, 2pq is heterozygous, q^2 is homozygous recessive
- Start from q^2 (the recessive phenotype frequency) and work back
- Five assumptions: Large population, random mating, no mutation, no migration, no selection
- Deviation from Hardy-Weinberg indicates evolution is happening
- Always sanity-check that your three genotype frequencies add to 1