Probability & Statistics
Counting, axioms, conditional probability, distributions, CLT, hypothesis testing.
Why this unit matters
Probability and statistics show up directly in about 15 to 18% of GATE DA questions, and indirectly in machine learning, AI inference, and data preprocessing topics. If you can reason about randomness precisely (distinguishing independence from mutual exclusivity, reading a Bayes' theorem problem without panic, computing a confidence interval from scratch), you have a huge advantage over students who treat this unit as secondary. Every ML algorithm assumes something about data distributions. Understanding those assumptions starts here.
Syllabus map
| Sub-topic | Key concepts |
|---|---|
| Counting | Permutations, combinations, multinomial |
| Probability foundations | Axioms, sample space, events |
| Event relationships | Independent, mutually exclusive, complementary |
| Multi-variable probability | Marginal, conditional, joint; Bayes' theorem |
| Descriptive statistics | Mean, median, mode, std dev, correlation, covariance |
| Random variables | Discrete and continuous; CDF, PDF |
| Named distributions | Uniform, Bernoulli, Binomial, Poisson, Exponential, Normal, t, chi-squared |
| Limit theorems | CLT, LLN |
| Inference | Confidence intervals, z-test, t-test, chi-squared test |
Counting: permutations and combinations
Permutations count ordered arrangements; combinations count unordered selections.
P(n, r) = n! / (n - r)! (ordered, without replacement)
C(n, r) = n! / (r! * (n - r)!) (unordered, without replacement)
Common trap. Students confuse "arrangements" with "selections". If the problem says "how many ways to choose a committee", order does not matter: use C. If it says "how many ways to assign roles", order matters: use P.
Worked example. A password uses 3 distinct digits from {0,1,...,9}. How many passwords are possible?
Answer: P(10, 3) = 10 * 9 * 8 = 720. Order matters because "123" and "321" are different passwords.
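As a quick sketch, both counting formulas can be checked in Python using the standard library (the factorial-based helpers below are illustrative; `math.perm` and `math.comb` do the same thing directly):

```python
import math

# Ordered selection without replacement: P(n, r) = n! / (n - r)!
def permutations(n, r):
    return math.factorial(n) // math.factorial(n - r)

# Unordered selection without replacement: C(n, r) = n! / (r! * (n - r)!)
def combinations(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Passwords of 3 distinct digits: order matters, so use P(10, 3).
print(permutations(10, 3))   # 720
# If order were ignored, only the sets of digits would count:
print(combinations(10, 3))   # 120
```

The ratio 720 / 120 = 3! = 6 is exactly the number of orderings of each 3-digit set, which is why P(n, r) = C(n, r) * r!.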
Probability axioms and event relationships
The three axioms (Kolmogorov):
- P(A) >= 0 for all events A.
- P(sample space) = 1.
- For mutually exclusive A and B: P(A union B) = P(A) + P(B).
For non-exclusive events: P(A union B) = P(A) + P(B) - P(A intersection B).
Independent vs. mutually exclusive. These are different ideas that students regularly confuse.
- Independent: P(A intersection B) = P(A) * P(B). Knowing A happened tells you nothing about B.
- Mutually exclusive: P(A intersection B) = 0. They cannot both happen.
Two non-trivial events cannot be both independent and mutually exclusive (if they are mutually exclusive, P(A intersection B) = 0, but P(A) * P(B) > 0 for non-trivial events, a contradiction).
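The distinction can be made concrete by enumerating a small sample space. The two-dice events below are my own illustrative choice, not from the text: A = "first die is even" and B = "the sum is 7" turn out to be independent but not mutually exclusive.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    # Exact probability as a fraction of favourable outcomes.
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] % 2 == 0        # first die is even, P(A) = 1/2
B = lambda w: w[0] + w[1] == 7     # sum is 7, P(B) = 1/6
both = lambda w: A(w) and B(w)

# Independent: P(A and B) equals P(A) * P(B)...
print(prob(both) == prob(A) * prob(B))   # True
# ...but not mutually exclusive: they can happen together.
print(prob(both))                        # 1/12, not 0
```

Using exact `Fraction` arithmetic avoids floating-point noise when testing equalities like P(A and B) = P(A) * P(B).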
Conditional probability and Bayes' theorem
Conditional probability: P(A | B) = P(A intersection B) / P(B).
Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
The denominator P(B) is expanded using total probability: P(B) = sum over all partitions Ai of P(B | Ai) * P(Ai).
Classic GATE-style trap. A disease affects 1% of a population. A test is 99% sensitive (detects true positives) and 99% specific (correctly rejects negatives). You test positive. What is the probability you have the disease?
P(disease | positive) = P(positive | disease) * P(disease) / P(positive) = 0.99 * 0.01 / (0.99 * 0.01 + 0.01 * 0.99) = 0.0099 / 0.0198 = 0.5
Only 50%, not 99%. This is the base-rate fallacy. When prevalence is low, even a highly accurate test produces many false positives.
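A minimal sketch of the same calculation, parameterised so you can vary the prevalence and watch the posterior move (the function name `posterior` is my own):

```python
def posterior(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity              # false positive rate
    # Total probability: P(positive) over the disease / no-disease partition.
    p_pos = (p_pos_given_disease * prevalence
             + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_disease * prevalence / p_pos

# The example above: 1% prevalence, 99% sensitive, 99% specific.
print(posterior(0.01, 0.99, 0.99))   # ≈ 0.5
# Raise prevalence to 10% and the posterior jumps above 0.9.
print(posterior(0.10, 0.99, 0.99))
```

This is exactly the base-rate effect: the accuracy numbers are unchanged, only the prior moves.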
Descriptive statistics
For a dataset x1, x2, ..., xn:
- Mean (mu) = sum(xi) / n
- Variance = sum((xi - mu)^2) / n (population); divide by n - 1 for sample variance
- Standard deviation = sqrt(variance)
- Covariance(X, Y) = E[(X - mu_X)(Y - mu_Y)]
- Correlation = Covariance(X, Y) / (std_X * std_Y)
Correlation is bounded: -1 <= r <= 1. Zero correlation does not mean independence (only for jointly normal variables does it imply independence).
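The definitions above translate line by line into Python. The two small datasets here are made up purely for illustration:

```python
import math

xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 5.0]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Population variance divides by n; sample variance divides by n - 1.
var_pop = sum((x - mean_x) ** 2 for x in xs) / n
var_sample = sum((x - mean_x) ** 2 for x in xs) / (n - 1)

# Covariance and correlation, using population moments throughout.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_y_pop = sum((y - mean_y) ** 2 for y in ys) / n
corr = cov / (math.sqrt(var_pop) * math.sqrt(var_y_pop))

print(var_pop, var_sample)    # 5.0 vs 20/3: the n-1 divisor inflates the estimate
print(round(corr, 3))         # bounded by -1 <= r <= 1
```

Note that correlation is the same whichever divisor you use, because the n (or n - 1) cancels between numerator and denominator.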
Distributions you must know cold
Binomial B(n, p). n independent Bernoulli trials, each with success probability p. P(X = k) = C(n, k) * p^k * (1-p)^(n-k) Mean = np, Variance = np(1-p).
Poisson(lambda). Counts of rare events in a fixed interval when n is large and p is small. P(X = k) = e^(-lambda) * lambda^k / k! Mean = Variance = lambda. This equality is a quick check in exams.
Exponential(lambda). Time between Poisson events. Memoryless property: P(X > s + t | X > s) = P(X > t). Mean = 1/lambda, Variance = 1/lambda^2.
Normal N(mu, sigma^2). Symmetric, bell-shaped. Standard normal Z = (X - mu) / sigma. About 68% of values fall within 1 sigma of the mean, 95% within 2 sigma, 99.7% within 3 sigma.
t-distribution. Used when population std dev is unknown and sample is small. Heavier tails than normal; approaches normal as degrees of freedom increase.
Chi-squared. Sum of squares of k independent standard normals. Used in goodness-of-fit and contingency table tests.
Central Limit Theorem
If X1, X2, ..., Xn are i.i.d. with mean mu and variance sigma^2, then the sample mean X_bar has:
X_bar ~ N(mu, sigma^2 / n) approximately, for large n.
The CLT does not require the underlying distribution to be normal. This is why z-tests are valid for large samples regardless of the original distribution.
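A simulation makes this concrete. The sketch below draws sample means from an exponential distribution (deliberately non-normal; with rate 1 its mean and standard deviation are both 1) and checks that their spread matches the CLT prediction sigma / sqrt(n). The sample size and trial count are arbitrary choices:

```python
import random
import statistics

random.seed(0)  # reproducible run

# Many sample means from an exponential(1) population: mu = 1, sigma = 1.
n, trials = 50, 20000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# CLT prediction: X_bar has mean ≈ mu and std dev ≈ sigma / sqrt(n).
print(round(statistics.fmean(means), 3))                    # ≈ 1.0
print(round(statistics.stdev(means), 3),
      round(1 / n ** 0.5, 3))                               # both ≈ 0.141
```

A histogram of `means` would look bell-shaped even though the underlying exponential is heavily skewed; that is the entire content of the theorem.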
Hypothesis testing in brief
- State null hypothesis H0 and alternative H1.
- Choose significance level alpha (commonly 0.05).
- Compute test statistic.
- Reject H0 if test statistic falls in the rejection region (|z| > z_critical for two-tailed).
- z-test: population variance known, or n >= 30.
- t-test: population variance unknown, small n.
- chi-squared test: categorical data (goodness of fit, independence of attributes).
Common trap. A p-value of 0.03 does not mean the probability that H0 is true is 3%. It means: if H0 were true, there is a 3% chance of observing data at least as extreme as what you got.
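The z-test mechanics fit in a few lines. The numbers below (sample mean 52, hypothesised mean 50, sigma = 10, n = 100) are made up for illustration:

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic for H0: mu = mu0, with known population sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(52, 50, 10, 100)
# Two-tailed test at alpha = 0.05: critical value 1.96.
print(z, abs(z) > 1.96)   # 2.0 True -> reject H0
```

The decision rule compares |z| with the critical value; it says nothing about P(H0 is true), only about how surprising the data would be if H0 held.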
Worked examples
Example 1. X ~ Poisson(3). Find P(X = 2).
P(X = 2) = e^(-3) * 3^2 / 2! = e^(-3) * 9 / 2 = 9 * 0.0498 / 2 ≈ 0.224.
Example 2. X and Y are independent. E[X] = 2, E[Y] = 3, Var(X) = 4, Var(Y) = 5. Find Var(2X - Y + 1).
Var(aX + bY + c) = a^2 * Var(X) + b^2 * Var(Y) when independent. Var(2X - Y + 1) = 4 * 4 + 1 * 5 = 16 + 5 = 21.
Note: constants (like +1) do not affect variance.
Example 3. A 95% confidence interval for the mean of a normal population (sigma = 10) is computed from n = 100 samples. Find the margin of error.
z_0.025 = 1.96. Margin = 1.96 * (10 / sqrt(100)) = 1.96 * 1 = 1.96.
Example 4. Correlation between X and Y is 0.6. Var(X) = 9, Var(Y) = 16. Find Cov(X, Y).
Cov(X, Y) = r * std_X * std_Y = 0.6 * 3 * 4 = 7.2.
Example 5. A fair coin is tossed 4 times. What is the probability of getting exactly 3 heads?
X ~ B(4, 0.5). P(X = 3) = C(4,3) * 0.5^3 * 0.5^1 = 4 * 0.125 * 0.5 = 0.25.
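All five worked examples can be verified in one short Python script, a useful habit when checking your own practice answers:

```python
import math

# Example 1: Poisson(3), P(X = 2) = e^(-3) * 3^2 / 2!
p_poisson = math.exp(-3) * 3**2 / math.factorial(2)

# Example 2: Var(2X - Y + 1) = 2^2 * Var(X) + (-1)^2 * Var(Y), independent X, Y
var_combo = 2**2 * 4 + (-1)**2 * 5

# Example 3: margin of error = z * sigma / sqrt(n)
margin = 1.96 * 10 / math.sqrt(100)

# Example 4: Cov(X, Y) = r * std_X * std_Y
cov = 0.6 * math.sqrt(9) * math.sqrt(16)

# Example 5: B(4, 0.5), P(X = 3) = C(4, 3) * 0.5^3 * 0.5
p_binom = math.comb(4, 3) * 0.5**3 * 0.5

print(round(p_poisson, 3), var_combo, margin, round(cov, 1), p_binom)
```

Each line is a direct transcription of the formula used in the corresponding example above.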
Quick-revision summary
- P(A union B) = P(A) + P(B) - P(A intersect B). Add back the overlap.
- Independent: P(AB) = P(A)P(B). Mutually exclusive: P(AB) = 0. Never both (for non-trivial events).
- Bayes' theorem expresses posterior = likelihood * prior / marginal.
- Poisson mean = variance = lambda. Exponential is memoryless.
- CLT: X_bar ~ N(mu, sigma^2/n) for large n, regardless of original distribution.
- Correlation = 0 does not imply independence (except jointly normal).
- t-test for unknown population variance / small n; z-test otherwise.
- p-value is not P(H0 is true). It is P(data | H0).
How to study this unit
- Start with counting (2 days): work through 10 permutation/combination problems until you never confuse ordered vs. unordered selection again.
- Build the Bayes' theorem intuition with the disease-test example: vary the prevalence and watch the posterior change dramatically.
- Memorise the mean and variance formulas for all named distributions. Flash-card them. GATE regularly asks you to compute these directly.
- Do at least 5 CLT problems where you are asked to find the probability that a sample mean falls in some range.
- Practice writing out a full hypothesis test (state hypotheses, compute test statistic, compare to critical value) for z, t, and chi-squared scenarios.
- In the last week before the exam, solve two or three previous GATE DA papers covering this unit under time pressure.