Properties of Subgradients for Non-Differentiable Convex Functions

Exploring the cinematic intuition of Properties of Subgradients for Non-Differentiable Convex Functions.

Visualizing...

Our institutional research engineers are currently mapping the formal proof for Properties of Subgradients for Non-Differentiable Convex Functions.

Apply for Institutional Early Access →

The Formal Theorem

Given two proper, convex functions \ f_1: \\mathbb{R}^n \\to \\mathbb{R} \\cup \\{\\infty\\} \ and \ f_2: \\mathbb{R}^n \\to \\mathbb{R} \\cup \\{\\infty\\} \, if \ \\text{dom}(f_1) \\cap \\text{dom}(f_2) \\neq \\emptyset \ and \ x \ is a point in \ \\mathbb{R}^n \ where both \ f_1 \ and \ f_2 \ are finite, then for the sum function \ f(x) = f_1(x) + f_2(x) \, the subgradient at \ x \ is given by the sum of their individual subgradients: \
partialf(x)=partialf1(x)+partialf2(x)\begin{aligned} \\partial f(x) = \\partial f_1(x) + \\partial f_2(x) \\\end{aligned}
This property holds under the basic assumption that \ \\text{dom}(f_1) \\cap \\text{dom}(f_2) \\neq \\emptyset \. For more general cases, a constraint qualification such as \ \\text{ri}(\\text{dom}(f_1)) \\cap \\text{ri}(\\text{dom}(f_2)) \\neq \\emptyset \ (where \ \\text{ri} \ denotes the relative interior) ensures this sum rule.

Analytical Intuition.

Imagine you're a cartographer charting a rugged, multi-dimensional energy landscape, \ f(x) \. Unlike a smooth, rolling hill, this terrain is convex but boasts sharp ridges, abrupt valleys, and even jagged peaks—points where the traditional gradient, your compass's needle, simply spins wildly. At these non-differentiable points, the 'subgradient,' \ \\partial f(x) \, steps in. It's not a single vector pointing downhill, but a *set* of vectors, forming a 'fan' of possible downhill directions. Each vector in this fan represents a valid 'supporting hyperplane'—a flat surface that touches the landscape at \ x \ and lies entirely beneath it, like a perfectly balanced plank. Now, envision two such landscapes, \ f_1(x) \ and \ f_2(x) \, merging to form a new, combined terrain, \ f(x) = f_1(x) + f_2(x) \. The profound beauty of the subgradient sum rule is its additivity: the 'fan' of supporting planks for the combined landscape at any point \ x \ is simply the vector sum of the 'fans' of supporting planks from the individual landscapes. It's like combining all possible 'downhill impulses' from \ f_1 \ with all possible 'downhill impulses' from \ f_2 \ to accurately map the collective descent, preserving linearity even in the face of non-smoothness.
CAUTION

Institutional Warning.

Students often struggle to internalize that \ \\partial f(x) \ is a set, not a single vector, at non-differentiable points. The geometric intuition of 'supporting hyperplanes' can also be challenging to visualize in higher dimensions, leading to confusion about its existence and uniqueness.

Academic Inquiries.

01

What is the geometric intuition of a subgradient?

Geometrically, a subgradient \ g \ of a convex function \ f \ at a point \ x \ defines a supporting hyperplane \ y = f(x) + g^T(z-x) \ to the epigraph of \ f \ at \ (x, f(x)) \. This means the hyperplane lies entirely below or touches the function at \ x \, much like a tangent plane for a differentiable function, but it can 'tilt' within a certain range at non-differentiable points.

02

Why is the subgradient a set of vectors instead of a single vector?

At points where a convex function is differentiable, the subgradient set contains only a single vector, which is the gradient. However, at non-differentiable points (e.g., a 'kink' or 'corner'), there can be multiple valid supporting hyperplanes. Each of these hyperplanes corresponds to a different 'slope' that lies below the function, hence the subgradient is a set encompassing all such possible slopes.

03

When is the subgradient equivalent to the gradient?

For a convex function \ f \, the subgradient \ \\partial f(x) \ is equivalent to the gradient \ \\nabla f(x) \ if and only if \ f \ is differentiable at \ x \. In this case, the subgradient set contains exactly one element: \ \\partial f(x) = \\{\\nabla f(x)\\} \.

04

What happens if the functions are not convex?

The concept of a subgradient is primarily defined and useful for convex functions. While generalized gradients exist for non-convex functions (e.g., Clarke subgradients), they have different properties and are defined in a more complex manner. Many of the elegant properties of subgradients, such as the sum rule, do not hold generally for non-convex functions.

Standardized References.

  • Definitive Institutional SourceBoyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Properties of Subgradients for Non-Differentiable Convex Functions: Visual Proof & Intuition. Retrieved from https://nicefa.org/library/fundamentals-of-optimization/properties-of-subgradients-for-non-differentiable-convex-functions

Dominate the Logic.

"Abstract theory is just a movement we haven't seen yet."