Statistics in the CDS exam is a scoring chapter — the questions are short, formula-driven and rarely tricky if your basics are firm. This Cavalier guide covers the three averages (mean, median, mode), measures of spread, and how to handle frequency and grouped-data tables. Master these and you bank two to four near-certain marks in Paper II.
Why Statistics is a Must-Score Topic
Every CDS Elementary Mathematics paper carries a few questions from Statistics, and they sit among the easiest marks in the entire paper. Unlike trigonometry or coordinate geometry, the methods here are mechanical: identify the data type, pick the right formula, substitute and compute. There is no hidden construction or unexpected identity to spot — just clean, repeatable arithmetic. For a candidate racing against the clock, that reliability is gold.
The data you face comes in three forms. Raw or ungrouped data is a plain list of values, such as the marks of a few students. Discrete frequency data attaches a frequency to each distinct value — how many students scored 40, how many scored 45, and so on. Grouped or continuous data bundles values into class intervals such as 0–10 and 10–20. Recognising the type in the first five seconds tells you exactly which formula to reach for, so train your eye to classify before you calculate.
The two big ideas in this chapter are central tendency — a single number that represents the whole data set — and dispersion, which measures how spread out the values are. Mean, median and mode answer the first; range answers the second. Almost every CDS question is a variation on these themes.
Statistics is the science of collecting, organising, presenting and interpreting data. In CDS you only need descriptive statistics — averages and spread — not probability theory, which is examined as a separate topic.
Arithmetic Mean of Raw Data
The arithmetic mean (or average) is the most common measure of central tendency. For n observations x1, x2, …, xn:
Mean x̄ = (Sum of all observations) ÷ (Number of observations) = Σxi ÷ n
So if a cadet scores 60, 72, 55, 80 and 63 in five tests, the mean is (60 + 72 + 55 + 80 + 63) ÷ 5 = 330 ÷ 5 = 66. The mean uses every single observation, which makes it the most representative average — but also the one most disturbed by an extreme value, or outlier.
Two facts the examiner loves to test follow directly from the definition. First, the sum of all observations equals n × mean; this rearrangement of the formula unlocks almost every word problem where values are added, removed or corrected. Second, the sum of deviations of each value from the mean is always zero, that is Σ(xi − x̄) = 0, because the positive and negative deviations cancel exactly. A third handy property: if every observation is increased by a constant k, the mean also increases by k; if every observation is multiplied by k, the mean is multiplied by k too.
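These properties are easy to confirm numerically. Below is a minimal Python sketch using the five test scores from the example above; the variable names and the constant k = 5 are illustrative assumptions, not part of the syllabus.

```python
# Check the mean properties on the five test scores from the example above.
scores = [60, 72, 55, 80, 63]
n = len(scores)
mean = sum(scores) / n

# Property 1: the sum of the observations equals n * mean.
total_from_mean = n * mean

# Property 2: the deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in scores)

# Property 3: adding k to every value adds k to the mean;
# multiplying every value by k multiplies the mean by k.
k = 5  # illustrative constant (assumption)
shifted_mean = sum(x + k for x in scores) / n
scaled_mean = sum(x * k for x in scores) / n

print(mean, total_from_mean, deviation_sum, shifted_mean, scaled_mean)
```

In an exam you would never compute these by machine, but seeing the deviations cancel makes the rule Σ(xi − x̄) = 0 easier to trust.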
Mean of a Frequency Distribution
When each value xi repeats fi times, multiply value by frequency before adding.
Mean x̄ = Σfᵢxᵢ ÷ Σfᵢ, where Σfᵢ = N is the total frequency.
For grouped (continuous) data, replace each class by its class mark — the midpoint — given by (lower limit + upper limit) ÷ 2, and treat that as xi. The logic is that we assume every value inside a class is concentrated at its midpoint; this introduces a tiny approximation but keeps the calculation tractable. Build a neat table with columns for class, class mark, frequency and the product f x; the disciplined layout prevents arithmetic slips under exam pressure.
Class mark is the secret to grouped data. The moment you see class intervals like 0–10, 10–20, write down the midpoints 5, 15, 25… first. Everything else follows the simple frequency formula.
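The class-mark routine can be sketched in a few lines of Python. The helper names `class_marks` and `frequency_mean` and the three-class table are hypothetical, chosen only to illustrate the direct method.

```python
def class_marks(intervals):
    """Midpoint of each (lower, upper) class interval."""
    return [(lower + upper) / 2 for lower, upper in intervals]


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Hypothetical grouped table (illustrative data only).
intervals = [(0, 10), (10, 20), (20, 30)]
freqs = [2, 5, 3]

marks = class_marks(intervals)        # midpoints 5, 15, 25
mean = frequency_mean(marks, freqs)   # (2*5 + 5*15 + 3*25) / 10
print(marks, mean)
```

The same `frequency_mean` helper works for discrete frequency data: pass the distinct values themselves instead of class marks.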
Assumed-Mean and Step-Deviation Shortcuts
When class marks are large, the assumed-mean method cuts your arithmetic. Choose a convenient class mark a as the assumed mean and let dᵢ = xᵢ − a.
Assumed mean: x̄ = a + (Σfᵢdᵢ ÷ Σfᵢ)
Step-deviation: x̄ = a + h × (Σfᵢuᵢ ÷ Σfᵢ), where uᵢ = (xᵢ − a) ÷ h and h is the common class width.
All three methods (direct, assumed-mean and step-deviation) give exactly the same mean, so there is no question of one being more accurate than another. The choice is purely about speed. In the exam, pick whichever keeps the numbers smallest: the direct method for small, friendly class marks, the assumed-mean method when class marks are large but the class widths are unequal, and the step-deviation method when the class width is constant. The step-deviation u-values are usually tiny integers such as −2, −1, 0, 1, 2, which is why it is the fastest route for most CDS grouped-data tables.
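To see that the three methods agree, here is a short Python sketch. The function names and the table with large class marks (110, 130, …, 190) are hypothetical, picked to show where the shortcuts pay off.

```python
def direct_mean(marks, freqs):
    """Direct method: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)


def assumed_mean(marks, freqs, a):
    """Assumed-mean method with deviations d = x - a."""
    return a + sum(f * (x - a) for x, f in zip(marks, freqs)) / sum(freqs)


def step_deviation_mean(marks, freqs, a, h):
    """Step-deviation method with u = (x - a) / h; needs a common width h."""
    sum_fu = sum(f * (x - a) / h for x, f in zip(marks, freqs))
    return a + h * sum_fu / sum(freqs)


# Hypothetical table: classes 100-120, 120-140, ..., 180-200.
marks = [110, 130, 150, 170, 190]
freqs = [4, 6, 10, 7, 3]

m_direct = direct_mean(marks, freqs)
m_assumed = assumed_mean(marks, freqs, a=150)
m_step = step_deviation_mean(marks, freqs, a=150, h=20)
print(m_direct, m_assumed, m_step)
```

All three calls return the same value up to floating-point rounding; only the size of the intermediate numbers differs.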
Median: The Middle Value
The median divides ordered data into two equal halves. Always arrange the data in ascending order first; skipping this step is the single most common slip.
For n observations in order:
• if n is odd, median = value of the ((n+1)÷2)th term;
• if n is even, median = average of the (n÷2)th and ((n÷2)+1)th terms.
A great strength of the median is that it ignores extreme values. If one cadet in a group of ten earns a freak salary far above the rest, the mean is dragged upwards but the median stays put — which is why income and other skewed data are usually summarised by the median.
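The odd/even rule and the outlier effect can both be checked with a small Python sketch. The `median` helper and the ten `pay` values are hypothetical illustrations; the five scores come from the mean example earlier.

```python
def median(values):
    """Median of raw data: sort first, then pick the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n+1)/2)th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle terms


scores = [60, 72, 55, 80, 63]                     # odd n
print(median(scores))

# Hypothetical group of ten, then the same group with one freak value.
pay = [20, 22, 21, 23, 25, 24, 22, 26, 23, 24]
pay_with_outlier = pay[:-1] + [500]

print(sum(pay) / len(pay), median(pay))
print(sum(pay_with_outlier) / len(pay_with_outlier), median(pay_with_outlier))
```

The outlier drags the mean far upwards while the median does not move, which is the behaviour described above.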
For grouped data, build a cumulative frequency column of running totals, then find the class where the cumulative frequency first reaches or crosses N÷2. That is the median class, and you apply the interpolation formula below:
Median = l + [(N÷2 − cf) ÷ f] × h
where l = lower limit of median class, cf = cumulative frequency before it, f = its frequency, h = class width.
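A minimal Python sketch of the interpolation formula follows. The function name `grouped_median` is hypothetical, and the table is the one used in the grouped-mean worked example later in this guide.

```python
def grouped_median(intervals, freqs):
    """Median = l + ((N/2 - cf) / f) * h for the median class."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("empty distribution")
    half = total / 2
    cumulative = 0                      # cf of the classes before the current one
    for (lower, upper), f in zip(intervals, freqs):
        if cumulative + f >= half:      # running total first reaches N/2 here
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 9, 3]

# N = 40, N/2 = 20; running totals are 5, 13, 28, so the median class is 20-30.
result = grouped_median(intervals, freqs)
print(round(result, 2))
```

Here l = 20, cf = 13, f = 15 and h = 10, so the median is 20 + (7 ÷ 15) × 10, roughly 24.67.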
Mode: The Most Frequent Value
The mode is the observation that occurs most often. It is the only average that can be used for non-numerical data, such as the most common shoe size, the favourite subject or the bestselling rifle calibre. Raw data can be unimodal (one mode), bimodal (two modes) or multimodal (several modes), or have no mode at all when every value appears equally often. For grouped data, the class with the highest frequency is the modal class, and the formula below estimates the mode within it.
Mode = l + [(f1 − f0) ÷ (2f1 − f0 − f2)] × h
where l = lower limit of modal class, f1 = its frequency, f0 = frequency of the preceding class, f2 = frequency of the following class, h = class width.
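The modal-class formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical, and treating a missing neighbour class as frequency 0 is an assumption of this sketch (a common convention when the modal class is first or last). The table is the one from the grouped-mean worked example later in this guide.

```python
def grouped_mode(intervals, freqs):
    """Mode = l + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    # Index of the first class with the highest frequency.
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    lower, upper = intervals[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # assumption: no previous class -> 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumption: no next class -> 0
    # Note: the denominator is zero when f1 == f0 == f2; this sketch does not handle that case.
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 9, 3]

# Modal class is 20-30 with f1 = 15, f0 = 8, f2 = 9.
result = grouped_mode(intervals, freqs)
print(round(result, 2))
```

With l = 20 and h = 10 the estimate is 20 + (7 ÷ 13) × 10, roughly 25.38.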
The three averages are linked by the empirical relation:
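Mode = 3 × Median − 2 × Mean. The relation is an approximation that holds for moderately skewed data, but exam questions apply it directly: given any two of the averages, you can find the third instantly.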
Range and Measures of Spread
Averages tell you the centre of the data, but two very different data sets can share the same mean. Spread, or dispersion, tells you how scattered the values are around that centre, and it is what distinguishes a consistent performer from an erratic one. The simplest and most exam-friendly measure of spread is the range.
Range = Highest value − Lowest value.
So for the marks 60, 72, 55, 80, 63 the range is 80 − 55 = 25. A small range signals tightly clustered, consistent data; a large range warns that the values are widely scattered. Be careful with grouped data: there the range is taken as the upper limit of the highest class minus the lower limit of the lowest class.
CDS also tests the combined mean, used when two separate groups are merged into one, for example the average marks of a whole class formed from a boys’ section and a girls’ section:
Combined mean = (n₁x̄₁ + n₂x̄₂) ÷ (n₁ + n₂), where n₁, n₂ are the group sizes and x̄₁, x̄₂ their means.
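Both formulas fit in a short Python sketch. The helper names and the section sizes and means (30 boys averaging 60, 20 girls averaging 70) are hypothetical figures used only for illustration.

```python
def value_range(values):
    """Range of raw data: highest value minus lowest value."""
    return max(values) - min(values)


def grouped_range(intervals):
    """Range of grouped data: top upper limit minus bottom lower limit."""
    lowest = min(lower for lower, _ in intervals)
    highest = max(upper for _, upper in intervals)
    return highest - lowest


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two merged groups, each weighted by its size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


print(value_range([60, 72, 55, 80, 63]))
print(grouped_range([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]))
print(combined_mean(30, 60, 20, 70))   # hypothetical boys' and girls' sections
```

Note that the combined mean (64 here) is not the simple average of 60 and 70, because the larger group pulls the result towards its own mean.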
Worked Example: Mean of Grouped Data
Let us apply the step-deviation method to a typical grouped table.
Find the mean of: classes 0–10, 10–20, 20–30, 30–40, 40–50 with frequencies 5, 8, 15, 9, 3.
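One way to lay out the step-deviation working for this table is sketched below in Python. The choice of assumed mean a = 25 (the class mark of 20–30) is an assumption of this sketch; any other class mark gives the same final mean.

```python
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 9, 3]

a = 25   # assumed mean: class mark of 20-30 (a choice made for this sketch)
h = 10   # common class width

print("class    x   f   u   f*u")
sum_f = 0
sum_fu = 0
for (lower, upper), f in zip(intervals, freqs):
    x = (lower + upper) // 2      # class mark
    u = (x - a) // h              # step deviation, an exact integer here
    sum_f += f
    sum_fu += f * u
    print(f"{lower:>2}-{upper:<2}  {x:>3} {f:>3} {u:>3} {f * u:>5}")

# Step-deviation formula: mean = a + h * (sum of f*u) / (sum of f)
mean = a + h * sum_fu / sum_f
print("sum f =", sum_f, " sum f*u =", sum_fu, " mean =", mean)
```

The u-values are −2, −1, 0, 1, 2, giving Σf = 40 and Σfu = −3, so the mean is 25 + 10 × (−3 ÷ 40) = 25 − 0.75 = 24.25.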
The same data by the direct method gives Σfx = 970 and 970 ÷ 40 = 24.25 — a quick check that the shortcut is reliable.
Worked Example: Median and the Empirical Rule
Combining the median with the empirical relation is a frequent CDS pattern, because it lets the examiner test two ideas with a single short calculation.
For a distribution the mean is 24.5 and the median is 25.5. Find the mode.
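The working is a single substitution into the empirical relation. The Python sketch below uses hypothetical helper names and also shows the two rearrangements for when the median or the mean is the unknown.

```python
def mode_from(mean, median):
    """Empirical relation: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Rearranged: Median = (Mode + 2 * Mean) / 3."""
    return (mode + 2 * mean) / 3


def mean_from(median, mode):
    """Rearranged: Mean = (3 * Median - Mode) / 2."""
    return (3 * median - mode) / 2


mode = mode_from(mean=24.5, median=25.5)
print(mode)
```

Substituting, Mode = 3 × 25.5 − 2 × 24.5 = 76.5 − 49 = 27.5.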
If a question gives data that is nearly symmetric, mean ≈ median ≈ mode. A wildly different answer usually means an arithmetic error — recheck before marking.
Common Mistakes to Avoid
Computing the median without first sorting the data. The median formula assumes the values are in ascending order, so an unsorted list almost always gives a wrong answer.
Confusing frequency with cumulative frequency in the median-class formula. The term ‘cf’ is the cumulative frequency of the class before the median class, not the median class itself.
Forgetting that the empirical relation is Mode = 3 Median − 2 Mean, not 2 Median − 3 Mean. Memorise the coefficients 3 and 2 in that order.
Using upper class limits instead of class marks when finding the mean of grouped data. Always take the midpoint of each interval.
Previous-Year Style Question
Q. The mean of 50 observations is 36. If two observations 30 and 42 are removed, what is the mean of the remaining observations?
Answer: Total of 50 observations = 50 × 36 = 1800. Remove 30 and 42: new total = 1800 − 72 = 1728, over 48 observations. New mean = 1728 ÷ 48 = 36. (Since the two removed values averaged 36, the mean is unchanged.)
Whenever values are added or removed, work with totals (n × mean), adjust the total, then divide by the new count. This single idea solves most CDS mean-based word problems.
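The totals idea can be written as one small Python function. The name `mean_after_change` and its `removed`/`added` parameters are hypothetical; the first call reproduces the question above, and the second is an illustrative correction problem with made-up numbers.

```python
def mean_after_change(n, mean, removed=(), added=()):
    """Recover the total as n * mean, adjust it, then divide by the new count."""
    total = n * mean - sum(removed) + sum(added)
    count = n - len(removed) + len(added)
    return total / count


# The question above: 50 observations with mean 36, then 30 and 42 are removed.
print(mean_after_change(50, 36, removed=[30, 42]))

# Hypothetical correction: 10 values with mean 20, where 25 was misread as 15.
print(mean_after_change(10, 20, removed=[15], added=[25]))
```

A misread value is handled as one removal plus one addition, so the count stays the same and only the total shifts.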
Quick Revision
- Mean = Σx ÷ n; for frequency data, Σfx ÷ Σf.
- Use class marks (midpoints) and the step-deviation shortcut for grouped data.
- Median: sort first; odd n → middle term, even n → average of two middle terms.
- Mode = most frequent value; modal-class formula for grouped data.
- Empirical relation: Mode = 3 Median − 2 Mean.
- Range = Highest − Lowest; combined mean weights each group by its size.
Drill five to ten previous-year statistics questions and these formulas become automatic. In the CDS hall, recognise the data type, pick the formula, and convert near-certain marks into your score.
Frequently asked questions
How many questions come from Statistics in CDS Maths?
Typically two to four questions appear in the Elementary Mathematics paper, covering mean, median, mode and range. They are among the most scoring questions because the methods are direct and formula-based.
What is the difference between mean, median and mode?
The mean is the arithmetic average of all values, the median is the middle value of ordered data, and the mode is the most frequently occurring value. All three are measures of central tendency but respond differently to extreme values.
When should I use the step-deviation method?
Use it for grouped data with equal class widths and large class marks. It replaces big numbers with small deviations u = (x − a)/h, cutting your arithmetic while giving exactly the same mean as the direct method.
Do I need the empirical relation Mode = 3 Median − 2 Mean for CDS?
Yes. It is a quick way to find any one average when the other two are given, and it appears regularly in CDS objective questions. Memorise the coefficients 3 and 2 in that exact order.
Is the median affected if I forget to sort the data?
Yes, completely. The median depends on the position of values in ascending order, so an unsorted list almost always gives a wrong middle value. Always arrange the data first.