Unit 3 of 7 · 7 min read

Calculus & Optimization

Limits, continuity, differentiability, Taylor series, single-variable optimization.


Why this unit matters

Calculus is a smaller unit in GATE DA compared to probability or linear algebra, but it punches above its weight in two ways. First, direct questions on limits, continuity, and derivatives do appear and are usually quick marks for prepared students. Second, calculus connects everything else: gradient descent (ML), the exponential distribution (probability), the Taylor series (approximation methods), and convexity (optimization). If you understand why a function has a minimum and how to find it analytically, you also understand logistic regression, SVM, and ridge regression at a deeper level.

Syllabus map

Sub-topic Key concepts
Limits Left/right limits, standard limits, L'Hopital
Continuity Epsilon-delta, types of discontinuity
Differentiability Definition, chain rule, implicit differentiation
Taylor series Maclaurin series, remainder term, approximations
Maxima and minima First derivative test, second derivative test
Optimization Constrained vs unconstrained, convexity

Limits and continuity

The limit of f(x) as x approaches a is L if f(x) can be made arbitrarily close to L by taking x close enough to a (but not equal to a).

A function is continuous at a if:

  1. f(a) is defined.
  2. lim_{x -> a} f(x) exists.
  3. lim_{x -> a} f(x) = f(a).

All three conditions must hold. Polynomials, exponentials, logarithms, and trigonometric functions are continuous everywhere in their domains.

L'Hopital's rule. If lim_{x -> a} f(x)/g(x) gives 0/0 or infinity/infinity, then:

lim_{x -> a} f(x)/g(x) = lim_{x -> a} f'(x)/g'(x)

provided the right-hand limit exists.

Trap. L'Hopital applies only to 0/0 or inf/inf forms. For other indeterminate forms (0 * inf, inf - inf, 1^inf, and so on), you must first algebraically transform the expression into one of those two forms.

Standard limits to memorise.

  • lim_{x -> 0} sin(x) / x = 1
  • lim_{x -> 0} (1 - cos x) / x^2 = 1/2
  • lim_{x -> 0} (e^x - 1) / x = 1
  • lim_{x -> 0} ln(1 + x) / x = 1
  • lim_{x -> inf} (1 + 1/x)^x = e
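These standard limits can be sanity-checked numerically. A minimal sketch using only the Python standard library; the test point x = 1e-4 and the tolerance are arbitrary choices (small enough to be near the limit, large enough to avoid floating-point cancellation):

```python
import math

# Evaluate each expression at a small x and compare to the claimed limit.
x = 1e-4

checks = {
    "sin(x)/x -> 1":        (math.sin(x) / x,           1.0),
    "(1-cos x)/x^2 -> 1/2": ((1 - math.cos(x)) / x**2,  0.5),
    "(e^x-1)/x -> 1":       ((math.exp(x) - 1) / x,     1.0),
    "ln(1+x)/x -> 1":       (math.log(1 + x) / x,       1.0),
}

for name, (value, limit) in checks.items():
    print(f"{name}: {value:.6f}")
    assert abs(value - limit) < 1e-3

# (1 + 1/n)^n -> e for large n
n = 1e6
print((1 + 1 / n) ** n, math.e)
```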

Differentiability

f is differentiable at a if f'(a) = lim_{h -> 0} [f(a + h) - f(a)] / h exists.

Differentiability implies continuity; the converse is false. f(x) = |x| is continuous at 0 but not differentiable there (left derivative = -1, right derivative = +1).
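The mismatched one-sided derivatives of |x| at 0 are easy to see numerically. A small sketch (the step size h is an arbitrary choice):

```python
def one_sided_derivatives(f, a, h=1e-6):
    """Approximate the left and right derivatives of f at a."""
    right = (f(a + h) - f(a)) / h
    left = (f(a) - f(a - h)) / h
    return left, right

# For f(x) = |x| at 0 the two one-sided derivatives disagree,
# so the (two-sided) derivative does not exist there.
left, right = one_sided_derivatives(abs, 0.0)
print(left, right)   # -1.0 1.0
```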

Key differentiation rules.

  • Power rule: d/dx [x^n] = n * x^(n-1)
  • Chain rule: d/dx [f(g(x))] = f'(g(x)) * g'(x)
  • Product rule: d/dx [f * g] = f'g + fg'
  • Quotient rule: d/dx [f/g] = (f'g - fg') / g^2
  • d/dx [e^x] = e^x; d/dx [ln x] = 1/x
  • d/dx [sin x] = cos x; d/dx [cos x] = -sin x

Trap. The chain rule is misapplied constantly. When differentiating e^(x^2), the answer is e^(x^2) * 2x, not just e^(x^2).
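A numerical derivative exposes this mistake immediately. A sketch using central differences at an arbitrary test point x = 1:

```python
import math

def central_diff(f, x, h=1e-5):
    """Second-order accurate numerical derivative (f(x+h) - f(x-h)) / 2h."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: math.exp(x ** 2)

x = 1.0
numeric = central_diff(f, x)
analytic = math.exp(x ** 2) * 2 * x   # chain rule: e^(x^2) * 2x
wrong = math.exp(x ** 2)              # forgetting the inner derivative 2x

print(numeric, analytic, wrong)       # numeric matches analytic, not wrong
assert abs(numeric - analytic) < 1e-5
```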

Taylor series

The Taylor series of f around a is:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! + ...

When a = 0, this is the Maclaurin series.

Maclaurin series to memorise.

  • e^x = 1 + x + x^2/2! + x^3/3! + ...
  • sin x = x - x^3/3! + x^5/5! - ...
  • cos x = 1 - x^2/2! + x^4/4! - ...
  • ln(1+x) = x - x^2/2 + x^3/3 - ... (valid for |x| <= 1, x ≠ -1)
  • 1/(1-x) = 1 + x + x^2 + x^3 + ... (valid for |x| < 1)

Taylor approximation. For small epsilon: f(x + epsilon) ≈ f(x) + f'(x) * epsilon. This first-order approximation is the foundation of gradient descent. You move x in the direction that decreases f the fastest.
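That first-order view translates directly into the gradient descent update x <- x - lr * f'(x). A minimal single-variable sketch; the objective f(x) = (x - 3)^2 and the hyperparameters are arbitrary choices:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the derivative: x <- x - lr * f'(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2 has f'(x) = 2(x - 3) and a unique minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)   # converges to 3
```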

Maxima and minima

First derivative test. Find critical points where f'(x) = 0 or f'(x) is undefined. Then, at a critical point c:

  • If f' changes from + to - at c: local maximum.
  • If f' changes from - to + at c: local minimum.
  • If f' does not change sign: neither (inflection point).

Second derivative test. At a critical point c:

  • f''(c) > 0: local minimum (function is concave up).
  • f''(c) < 0: local maximum (function is concave down).
  • f''(c) = 0: test is inconclusive; use the first derivative test.

Global extrema. On a closed interval [a, b], the global maximum/minimum is the largest/smallest among all local extrema and the endpoint values f(a) and f(b).

Trap. GATE sometimes asks about open intervals. On an open interval, a function need not attain its supremum or infimum (for example, f(x) = x on (0,1) has no maximum).
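The closed-interval recipe (evaluate f at every in-range critical point and at both endpoints, then compare) can be sketched as code. The example f(x) = x^3 - 3x on [0, 2] is an arbitrary choice, with critical points found analytically from f'(x) = 3x^2 - 3 = 0:

```python
def global_extrema(f, critical_points, a, b):
    """Global min/max of f on the closed interval [a, b]."""
    # Candidates: endpoints plus critical points that lie inside [a, b].
    candidates = [a, b] + [c for c in critical_points if a <= c <= b]
    values = {x: f(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])

f = lambda x: x ** 3 - 3 * x
# f'(x) = 3x^2 - 3 = 0 at x = ±1; only x = 1 lies in [0, 2].
minimum, maximum = global_extrema(f, critical_points=[-1.0, 1.0], a=0.0, b=2.0)
print(minimum, maximum)   # min (1.0, -2.0) at the interior critical point; max (2.0, 2.0) at an endpoint
```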

Convexity and optimization

A function f is convex if for all x, y and t in [0,1]:

f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)

Equivalently (for twice-differentiable f): f''(x) >= 0 everywhere.

Why convexity matters. For a convex function, every local minimum is a global minimum. This is why gradient descent (with a suitable step size) converges to the global minimum on convex objectives, and it is key to understanding why linear regression (quadratic loss, convex) is well-behaved while neural network training (non-convex) carries no such guarantee.

Jensen's inequality. For a convex function f:

f(E[X]) <= E[f(X)]

For concave f the inequality flips. Jensen's inequality appears directly in GATE questions and indirectly in the derivation of the EM algorithm.
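Jensen's inequality is easy to observe by simulation. A sketch with the convex function f(x) = x^2; the uniform distribution and sample size are arbitrary choices:

```python
import random

random.seed(0)

f = lambda x: x ** 2   # convex: f'' = 2 > 0
xs = [random.uniform(-1, 1) for _ in range(100_000)]

mean_x = sum(xs) / len(xs)
f_of_mean = f(mean_x)                          # f(E[X]), about 0
mean_of_f = sum(f(x) for x in xs) / len(xs)    # E[f(X)], about 1/3

print(f_of_mean, mean_of_f)
assert f_of_mean <= mean_of_f   # Jensen: f(E[X]) <= E[f(X)]
```

For f(x) = x^2 the gap E[f(X)] - f(E[X]) is exactly the (sample) variance, which is why it is always non-negative.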

Worked examples

Example 1. Evaluate lim_{x -> 0} (e^x - 1 - x) / x^2.

Both numerator and denominator go to 0. Apply L'Hopital: numerator derivative e^x - 1, denominator derivative 2x. Still 0/0. Apply again: e^x / 2. At x = 0: 1/2.

Alternatively, use Taylor: e^x = 1 + x + x^2/2 + ..., so e^x - 1 - x = x^2/2 + ..., and (x^2/2)/x^2 = 1/2.

Example 2. Find the maximum of f(x) = x * e^(-x) for x > 0.

f'(x) = e^(-x) - x * e^(-x) = e^(-x)(1 - x). Critical point at x = 1. f''(x) = -e^(-x)(1 - x) + e^(-x)(-1) = e^(-x)(x - 2). f''(1) = e^(-1)(1 - 2) = -e^(-1) < 0, so x = 1 is a local maximum. Maximum value = 1 * e^(-1) = 1/e.
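The result can be cross-checked numerically by scanning a grid over the domain; a sketch (the grid range and resolution are arbitrary choices):

```python
import math

f = lambda x: x * math.exp(-x)

# Scan a fine grid over (0, 10] and pick the point with the largest value.
xs = [i / 10_000 for i in range(1, 100_001)]
x_best = max(xs, key=f)

print(x_best, f(x_best))   # close to x = 1 and f(1) = 1/e ≈ 0.3679
assert abs(x_best - 1.0) < 1e-3
assert abs(f(x_best) - 1 / math.e) < 1e-6
```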

Example 3. The function f(x) = 3x^4 - 4x^3 has a local minimum at x = 1. Verify using the second derivative test.

f'(x) = 12x^3 - 12x^2 = 12x^2(x - 1). Critical points: x = 0 and x = 1. f''(x) = 36x^2 - 24x. f''(1) = 36 - 24 = 12 > 0. Confirmed: local minimum at x = 1. f''(0) = 0, so the test is inconclusive at x = 0 (it is actually an inflection point).

Example 4. Write the first three terms of the Maclaurin expansion of ln(1 + x) / x and evaluate the limit as x -> 0.

ln(1+x) = x - x^2/2 + x^3/3 - ..., so ln(1+x)/x = 1 - x/2 + x^2/3 - ... As x -> 0, the limit is 1.

Example 5. Is f(x) = e^x convex? Is g(x) = ln(x) convex or concave?

f''(x) = e^x > 0 for all x: strictly convex. g''(x) = -1/x^2 < 0 for x > 0: strictly concave.

Quick-revision summary

  • Differentiability implies continuity; |x| is the classic example of the converse failing.
  • L'Hopital: only for 0/0 or inf/inf. Apply repeatedly if needed.
  • e^x Maclaurin: 1 + x + x^2/2! + x^3/3! + ...
  • At a critical point: f'' > 0 means minimum, f'' < 0 means maximum, f'' = 0 means the test fails.
  • Convex function: f'' >= 0. Every local min is a global min.
  • Jensen: f(E[X]) <= E[f(X)] for convex f.
  • The Taylor first-order approximation is the basis of gradient descent.

How to study this unit

  1. Spend one day on limits and continuity: do 10 limit problems including L'Hopital cases and the standard limit table.
  2. Spend one day on differentiation: practice the chain rule intensively (this is where marks are lost). Do 10 chain-rule problems.
  3. Memorise all five Maclaurin series. Test yourself by writing them from scratch without looking.
  4. Do 10 maxima/minima problems mixing first and second derivative tests, including cases where the second derivative test fails.
  5. Read about convexity and Jensen's inequality with an eye on their ML implications: you will not need proofs, but you will need to identify convex functions and apply the inequality.
  6. This unit can be completed in 4-5 days of focused study. Do not over-invest at the cost of probability or ML.
