The Logic of Two Variables#
In Grades 10 and 11, you worked with one variable at a time (univariate data: mean, median, mode, box plots). In Grade 12, we look at the relationship between two variables measured on the same subjects (bivariate data).
The question we ask: Is a change in one variable associated with a change in the other? Typical pairings:
- Hours studied vs. marks obtained
- Temperature vs. ice cream sales
- Age of car vs. resale value
1. Drawing a Scatter Plot#
A scatter plot places each data pair $(x; y)$ as a dot on a Cartesian plane.
The Steps#
- Identify the variables: The independent variable (the “cause”) goes on the x-axis. The dependent variable (the “effect”) goes on the y-axis.
- Choose appropriate scales: Look at the minimum and maximum values for each variable.
- Plot each point: Each $(x; y)$ pair in the data table becomes one dot.
- Do NOT connect the dots: Scatter plots show individual data points, not a continuous function.
Example Data#
| Hours Studied ($x$) | 2 | 3 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| Test Mark ($y$) | 35 | 40 | 55 | 60 | 68 | 72 | 80 | 85 |
Plot each pair as a point: $(2; 35)$, $(3; 40)$, $(5; 55)$, etc.
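As a rough sketch of the steps above, the table can be turned into plottable pairs in a few lines of Python. The variable names (`hours`, `marks`, `points`) are illustrative choices, not part of the curriculum notation.

```python
# Example data from the table above.
hours = [2, 3, 5, 6, 7, 8, 9, 10]          # x: hours studied (independent)
marks = [35, 40, 55, 60, 68, 72, 80, 85]   # y: test mark (dependent)

# Each (x; y) pair from the table becomes one dot on the scatter plot.
points = list(zip(hours, marks))

# The minimum and maximum of each variable guide the choice of axis scale.
x_range = (min(hours), max(hours))
y_range = (min(marks), max(marks))

print(points)
print("x-axis must cover", x_range[0], "to", x_range[1])
print("y-axis must cover", y_range[0], "to", y_range[1])
```

The pairs are kept as separate points and never joined, which mirrors the rule that the dots are not connected.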
2. Describing the Correlation#
After plotting, describe the pattern:
Direction#
| Pattern | Name |
|---|---|
| Dots trend upward (↗) | Positive correlation |
| Dots trend downward (↘) | Negative correlation |
| Dots are scattered randomly | No correlation |
Strength#
| Pattern | Strength |
|---|---|
| Dots are close to a straight line | Strong correlation |
| Dots are loosely grouped around a trend | Moderate correlation |
| Dots are widely scattered | Weak correlation |
Form#
| Pattern | Form |
|---|---|
| Trend follows a straight line | Linear |
| Trend follows a curve | Non-linear (exponential, quadratic, etc.) |
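
The direction and strength of a linear pattern can also be summarised by a single number, the correlation coefficient $r$. The sketch below computes it for the hours/marks example using the standard Pearson formula; the function name `pearson_r` is an illustrative choice, and in an exam you would normally get $r$ from your calculator.

```python
from math import sqrt

hours = [2, 3, 5, 6, 7, 8, 9, 10]
marks = [35, 40, 55, 60, 68, 72, 80, 85]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, marks)

# The sign of r gives the direction; how close |r| is to 1 gives the strength.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(r, 3), direction)
```

For this data $r$ comes out very close to 1, which agrees with what the scatter plot shows: the dots trend upward and lie close to a straight line.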
3. The Line of Best Fit (by Eye)#
Before learning the formal regression formula, you should be able to draw a line of best fit by eye:
- The line should pass through the middle of the data cloud.
- Roughly equal numbers of points should be above and below the line.
- The line should pass through the point $(\bar{x}; \bar{y})$ — the mean of both variables.
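A minimal sketch of these three checks, using the hours/marks example: it finds the mean point, builds a line through it, and counts the points on each side. The slope `6.4` is a hypothetical value standing in for a line judged by eye.

```python
hours = [2, 3, 5, 6, 7, 8, 9, 10]
marks = [35, 40, 55, 60, 68, 72, 80, 85]

n = len(hours)
mean_x = sum(hours) / n   # x-coordinate of the mean point
mean_y = sum(marks) / n   # y-coordinate of the mean point

# Hypothetical slope of a by-eye line (an assumption for illustration).
slope = 6.4

def line(x):
    """By-eye line that is forced to pass through the mean point."""
    return mean_y + slope * (x - mean_x)

# Count how many data points lie above and below the line.
above = sum(1 for x, y in zip(hours, marks) if y > line(x))
below = sum(1 for x, y in zip(hours, marks) if y < line(x))

print("mean point:", (mean_x, mean_y))
print("above:", above, "below:", below)
```

If the two counts are very different, the line is probably too steep, too flat, or shifted, and should be redrawn.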
4. Outliers#
An outlier is a data point that lies far away from the general trend.
How to identify: A point that is clearly separated from the rest of the scatter plot.
Impact: Outliers can significantly affect the:
- Mean (pulled toward the outlier)
- Regression line (tilted toward the outlier)
- Correlation coefficient (weakened or artificially strengthened)
What to do: Note the outlier. If the question asks you to recalculate after removing it, exclude that data pair from your calculations.
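To see the impact in numbers, the sketch below recalculates the correlation coefficient and the mean mark with and without one extra point. The outlier `(4, 90)` is a made-up data pair added to the hours/marks example purely for illustration.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

points = [(2, 35), (3, 40), (5, 55), (6, 60), (7, 68), (8, 72), (9, 80), (10, 85)]
outlier = (4, 90)  # hypothetical: a high mark after very few hours of study

r_without = pearson_r(points)
r_with = pearson_r(points + [outlier])

mean_y_without = sum(y for _, y in points) / len(points)
mean_y_with = sum(y for _, y in points + [outlier]) / (len(points) + 1)

print("r without outlier:", round(r_without, 3))
print("r with outlier:   ", round(r_with, 3))
print("mean mark:", mean_y_without, "->", mean_y_with)
```

A single stray point noticeably weakens the correlation here and pulls the mean mark upward, which is why the outlier should be mentioned whenever you interpret the data.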
5. Revision: Univariate Data Concepts#
These Grade 10–11 concepts may still appear in Paper 2:
Measures of Central Tendency#
- Mean: $\bar{x} = \frac{\sum x}{n}$
- Median: Middle value when data is ordered
- Mode: Most frequent value
Measures of Spread#
- Range: Maximum − Minimum
- Interquartile Range (IQR): $Q_3 - Q_1$
- Standard Deviation: A measure of how far data points typically lie from the mean, $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
- Variance: The square of the standard deviation, $\sigma^2$
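These measures can be checked quickly with Python's built-in `statistics` module. The sketch below uses the test marks from the example data and assumes the population versions of standard deviation and variance (dividing by $n$). The mode is left out because every mark in this data set occurs only once.

```python
from statistics import mean, median, pstdev, pvariance

marks = [35, 40, 55, 60, 68, 72, 80, 85]

# Measures of central tendency
mean_mark = mean(marks)
median_mark = median(marks)   # average of the two middle values, since n is even

# Measures of spread
data_range = max(marks) - min(marks)
std_dev = pstdev(marks)       # population standard deviation (divides by n)
variance = pvariance(marks)   # population variance = std_dev squared

print("mean:", mean_mark, "median:", median_mark)
print("range:", data_range)
print("standard deviation:", round(std_dev, 2), "variance:", round(variance, 2))
```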
Five Number Summary#
Minimum, $Q_1$, Median, $Q_3$, Maximum → used to draw Box-and-Whisker plots.
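A short sketch of the five number summary for the example test marks. It assumes the common school convention of taking $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the ordered data. Calculators and textbooks sometimes use slightly different quartile rules, so treat the convention as an assumption.

```python
from statistics import median

marks = sorted([35, 40, 55, 60, 68, 72, 80, 85])
n = len(marks)

# Split the ordered data into halves; the middle value is excluded when n is odd.
lower_half = marks[: n // 2]
upper_half = marks[(n + 1) // 2 :]

five_number_summary = (
    marks[0],            # minimum
    median(lower_half),  # Q1
    median(marks),       # median
    median(upper_half),  # Q3
    marks[-1],           # maximum
)
iqr = five_number_summary[3] - five_number_summary[1]

print(five_number_summary)
print("IQR:", iqr)
```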
Ogive (Cumulative Frequency Curve)#
- Plot cumulative frequencies against upper class boundaries.
- Use the ogive to estimate the median, quartiles, and percentiles.
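The sketch below builds the points of an ogive from a grouped frequency table and reads off an estimate of the median. The class boundaries and frequencies are invented for illustration, and the helper `read_off` joins the ogive points with straight lines, which only approximates a curve drawn and read by hand.

```python
from itertools import accumulate

# Hypothetical grouped marks (not from the example data above).
boundaries = [30, 40, 50, 60, 70, 80, 90]   # 30 is the lower boundary of the first class
frequencies = [3, 5, 8, 10, 3, 1]           # one frequency per class interval

cumulative = list(accumulate(frequencies))

# Ogive points: start at (lowest boundary; 0), then (upper boundary; cumulative frequency).
ogive = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

def read_off(ogive_points, position):
    """Estimate the data value at a cumulative-frequency position
    by linear interpolation between neighbouring ogive points."""
    for (x0, c0), (x1, c1) in zip(ogive_points, ogive_points[1:]):
        if c0 <= position <= c1:
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position lies outside the ogive")

total = cumulative[-1]
median_estimate = read_off(ogive, total / 2)   # median sits at half the total frequency
q1_estimate = read_off(ogive, total / 4)       # lower quartile at one quarter

print(ogive)
print("median estimate:", median_estimate, "Q1 estimate:", q1_estimate)
```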
🚨 Common Mistakes#
- Swapping x and y: The independent variable (what you control or the “cause”) goes on the x-axis. Getting this wrong changes the entire regression equation.
- Confusing correlation with causation: Just because two variables correlate doesn’t mean one causes the other. Ice cream sales and drownings both increase in summer — but ice cream doesn’t cause drowning!
- Drawing the line of best fit through $(0; 0)$: The line of best fit does NOT have to pass through the origin. Let the data determine where it crosses the y-axis.
- Ignoring outliers in interpretation: If there’s a clear outlier, mention it in your answer and explain its potential effect.
💡 Pro Tip: The Mean Point#
The calculated (least-squares) regression line always passes through the mean point $(\bar{x}; \bar{y})$, so a line of best fit drawn by eye should pass through it too. If your line misses this point, adjust it.
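This property can be checked numerically. The sketch below fits a line $y = a + bx$ to the hours/marks example using the standard least-squares formulas (not derived in this section, so treat them as given) and confirms that the line's value at $\bar{x}$ equals $\bar{y}$.

```python
hours = [2, 3, 5, 6, 7, 8, 9, 10]
marks = [35, 40, 55, 60, 68, 72, 80, 85]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Standard least-squares formulas for the line y = a + b*x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)
b = sxy / sxx             # gradient
a = mean_y - b * mean_x   # y-intercept

# Evaluate the fitted line at the mean of x; it should return the mean of y.
predicted_at_mean = a + b * mean_x

print("a =", round(a, 2), "b =", round(b, 2))
print("line at mean x:", round(predicted_at_mean, 3), "mean y:", mean_y)
```

The intercept is defined as $a = \bar{y} - b\bar{x}$, so the line is built to go through the mean point, whatever the data are.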
