Testing Convexity of a Function
The Curious Case of Convex Functions
In-depth understanding of convex functions and ways to test convexity.
- Most online introductions to machine learning kick off by covering the Linear Regression algorithm.
- Typically, the Linear Regression algorithm is explained using Mean Squared Error (MSE) as the loss function.
- MSE is a convex function. Convexity unlocks a crucial advantage: any local minimum is also the global minimum.
- This ensures that a model can be trained such that the loss function is minimized to its global minimum value.
- However, proving the convexity of MSE (or any other loss function) is typically left out of scope.
- In this blog post, we shall work through the concepts needed to prove the convexity of a function.
- So hang in tight, and by the end of this article you will have a better understanding of -
- Square and Symmetric Matrices.
- Hessian Matrices.
- Proving/Testing the convexity of functions.
- Positive and Negative Definite/Semidefinite Matrices.
Without further ado, let’s jump into it.
Basics of Matrices -
We shall review some basic concepts in matrices that will help us prove the convexity of a function. These basics are typically covered in undergraduate courses and should sound familiar to you.
Square Matrix -
Given a matrix X of dimensions m × n, X is a square matrix iff (if and only if) m = n. In other words, matrix X is called a square matrix if the number of matrix rows is equal to the number of columns.
For example, a matrix with 3 rows and 3 columns is a square matrix, since the number of rows (m) equals the number of columns (n).
Symmetric Matrix -
A square matrix, X, is termed symmetric if xᵢⱼ = xⱼᵢ ∀ i, j, where i and j denote the ith row and jth column respectively. In other words, matrix X is symmetric if the transpose of X is equal to X, i.e., Xᵀ = X.
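As a quick aside, here is a minimal NumPy sketch of both checks (the matrix values below are an arbitrary example):

```python
import numpy as np

# A small example matrix; values chosen arbitrarily.
X = np.array([[2, 1, 0],
              [1, 3, 4],
              [0, 4, 5]])

is_square = X.shape[0] == X.shape[1]             # rows == columns
is_symmetric = is_square and np.allclose(X, X.T)  # X equals its transpose

print(is_square, is_symmetric)  # True True
```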
Hessian Matrix
The Hessian of a multivariate function f(x₁, x₂, …, xₙ) is a way of organizing its second-order partial derivatives in the form of a matrix.
Consider a multivariate function f(x₁, x₂, …, xₙ); its Hessian is the n × n matrix whose entry in the ith row and jth column is Hᵢⱼ = ∂²f/∂xᵢ∂xⱼ.
Note that a Hessian matrix is by definition a square matrix, and it is also symmetric whenever the second-order partial derivatives are continuous (by Clairaut’s theorem, ∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ).
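Libraries such as SymPy can assemble the Hessian for us. A small sketch, where the function f is just an example:

```python
import sympy as sp

# Symbolic Hessian of an example function f(x1, x2) = x1^2 + 3*x1*x2 + 2*x2^2.
x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3*x1*x2 + 2*x2**2

H = sp.hessian(f, (x1, x2))
print(H)  # Matrix([[2, 3], [3, 4]]) - square and symmetric, as expected
```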
Proving / Checking Convexity of a function -
With all the relevant basics covered in previous sections, we are now ready to define checks for determining the convexity of functions.
A function f(x) defined over n variables on an open convex set S can be tested for convexity using the following criteria -
If Xᴴ is the Hessian Matrix of f(x) then -
- f(x) is strictly convex in S if Xᴴ is a Positive Definite Matrix.
- f(x) is convex in S if Xᴴ is a Positive Semi-Definite Matrix.
- f(x) is strictly concave in S if Xᴴ is a Negative Definite Matrix.
- f(x) is concave in S if Xᴴ is a Negative Semi-Definite Matrix.
In the next section, let’s discuss ways to test the definiteness of a matrix.
Positive Definite and Semidefinite Matrices -
You may have seen references to these matrices in multiple places, but the definition and the ways to prove definiteness remain elusive to many. Let us start with the textbook definition -
An n × n square matrix, X, is Positive Definite if aᵀXa > 0 ∀ a ∈ ℝⁿ, a ≠ 0.
Similarly,
An n × n square matrix, X, is Positive Semidefinite if aᵀXa ≥ 0 ∀ a ∈ ℝⁿ.
The same logic can be easily extended to Negative Definite and Negative Semidefinite matrices.
An n × n square matrix, X, is Negative Definite if aᵀXa < 0 ∀ a ∈ ℝⁿ, a ≠ 0.
And finally,
An n × n square matrix, X, is Negative Semidefinite if aᵀXa ≤ 0 ∀ a ∈ ℝⁿ.
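To make the quadratic form concrete, here is a tiny sketch that evaluates aᵀXa for a single, arbitrarily chosen vector a. Note that one evaluation can only disprove definiteness, never prove it:

```python
import numpy as np

# Evaluate the quadratic form a^T X a for one sample vector a.
X = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
a = np.array([1.0, 3.0])

print(a @ X @ a)  # 14.0 -> positive for this particular a
```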
- However, while checking for Matrix Definiteness, validating the condition for every possible vector a is definitely not an option.
- So let us try and break this down even further.
Eigenvalues and Eigenvectors -
We know that an eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it; that is, Xa = λa, where a is an eigenvector of X and λ is the corresponding eigenvalue.
Analysis
- For an eigenvector a of X, pre-multiplying both sides of Xa = λa by aᵀ gives aᵀXa = λaᵀa.
- It is easy to see that the dot product aᵀa = ‖a‖² is always positive for any nonzero a.
- Referring back to the definition of a Positive Definite matrix, aᵀXa can be greater than zero if and only if λ is greater than zero.
- Since a symmetric matrix has real eigenvalues and a full set of orthogonal eigenvectors spanning ℝⁿ, this argument extends from eigenvectors to every vector a.
Derived Definition for Matrix Definiteness -
Based on the pointers mentioned in the above Analysis, we can tweak the formal definitions of Matrix Definiteness as follows, where λᵢ denotes the ith eigenvalue of X (a code sketch follows the list) -
- A Symmetric Matrix, X, is Positive Definite if λᵢ > 0 ∀ i
- A Symmetric Matrix, X, is Positive Semi-Definite if λᵢ ≥ 0 ∀ i
- A Symmetric Matrix, X, is Negative Definite if λᵢ < 0 ∀ i
- A Symmetric Matrix, X, is Negative Semi-Definite if λᵢ ≤ 0 ∀ i
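In code, the eigenvalue test is nearly a one-liner with NumPy. A minimal sketch, where the tolerance for floating-point round-off is an arbitrary choice:

```python
import numpy as np

def definiteness(X, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(X)  # real eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semi-definite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semi-definite"
    return "indefinite"

print(definiteness(np.array([[2.0, -1.0],
                             [-1.0, 2.0]])))  # positive definite
```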
Alternate Definition for Determining Matrix Definiteness -
- Computing eigenvalues requires solving the characteristic equation det(X − λI) = 0, which is a polynomial equation of degree n.
- For large matrices, this means finding the roots of a high-degree polynomial, so it may not always be feasible to solve for eigenvalues by hand.
An Alternate Theorem for Determining Definiteness of Matrices comes to the rescue -
Theorems -
- A Symmetric Matrix, X, is Positive Definite if Dₖ > 0 ∀ k, where Dₖ represents the kth Leading Principal Minor.
- A Symmetric Matrix, X, is Positive Semi-Definite if Dₖ ≥ 0 for every Principal Minor Dₖ of order k.
- A Symmetric Matrix, X, is Negative Definite if (−1)ᵏDₖ > 0 ∀ k, i.e., the Leading Principal Minors alternate in sign starting with D₁ < 0.
- A Symmetric Matrix, X, is Negative Semi-Definite if (−1)ᵏDₖ ≥ 0 for every Principal Minor Dₖ of order k.
Principal Minors of a Matrix -
- For an n × n Square Matrix, X, a Principal Submatrix of order k is obtained by deleting any (n − k) rows and the corresponding (n − k) columns.
- The determinant of such a Principal Submatrix is called a Principal Minor of order k of matrix X.
Leading Principal Minors of a Matrix -
- For an n × n Square Matrix, X, a Leading Principal Submatrix of order k is obtained by deleting the last (n − k) rows and the corresponding (n − k) columns.
- The determinant of such a Leading Principal Submatrix is called the kth Leading Principal Minor of matrix X.
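Putting the theorem to work, here is a short sketch of the Leading Principal Minor test (often called Sylvester’s criterion) for positive definiteness, with an arbitrary example matrix:

```python
import numpy as np

def leading_principal_minors(X):
    """Determinants of the top-left k x k submatrices, for k = 1..n."""
    n = X.shape[0]
    return [np.linalg.det(X[:k, :k]) for k in range(1, n + 1)]

X = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

minors = leading_principal_minors(X)
print(minors)                      # approximately [2.0, 3.0, 4.0]
print(all(d > 0 for d in minors))  # True -> positive definite
```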
Testing Convexity of a Quadratic Function -
In this section, let's apply everything we have learned in this blog and put it to the test with a small worked example below.
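Here is an end-to-end sketch for an example quadratic f(x, y) = 3x² + 2xy + 4y² (the function itself is an arbitrary choice), using SymPy to build the Hessian and test its definiteness:

```python
import sympy as sp

# Hessian-based convexity test for f(x, y) = 3x^2 + 2xy + 4y^2.
x, y = sp.symbols('x y')
f = 3*x**2 + 2*x*y + 4*y**2

H = sp.hessian(f, (x, y))
print(H)                       # Matrix([[6, 2], [2, 8]])
print(H.is_positive_definite)  # True -> f is strictly convex everywhere
```

The Leading Principal Minor test agrees: D₁ = 6 and D₂ = 6·8 − 2·2 = 44 are both positive, so the Hessian is Positive Definite and f is strictly convex.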
Properties of Convex Functions -
- Constant functions of the form f(x) = c are both convex and concave.
- f(x) = x is both convex and concave.
- Functions of the form f(x) = xʳ with r ≥ 1 are convex on the interval 0 < x < ∞.
- Functions of the form f(x) = xʳ with 0 < r < 1 are concave on the interval 0 < x < ∞.
- The function f(x) = log(x) is concave on the interval 0 < x < ∞.
- The function f(x) = eˣ is convex everywhere.
- If f(x) is convex, then g(x) = cf(x) is also convex for any positive value of c.
- If f(x) and g(x) are convex then their sum h(x) = f(x) + g(x) is also convex.
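As a sanity check on the last two properties, the sketch below verifies that the sum of two convex functions, x⁴ and eˣ, has a nonnegative second derivative; for a univariate function, that is equivalent to convexity:

```python
import sympy as sp

# The sum of two convex functions should be convex: check x**4 + exp(x)
# via its second derivative (nonnegative everywhere <=> convex in 1-D).
x = sp.symbols('x', real=True)
h = x**4 + sp.exp(x)

h2 = sp.diff(h, x, 2)  # 12*x**2 + exp(x)
print(h2)
print(h2.is_positive)  # True -> h''(x) > 0 for all real x, so h is convex
```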
Final Comments -
- We have investigated convex functions in depth while studying different ways of proving convexity.
- In the next article, we shall prove the convexity of the MSE loss function used in linear regression.
- Convex functions are easy to minimize, which is why they are used so widely across research domains.
- So, the next time you want to plug in a different loss function in place of MSE while training a linear regression model, you might as well check its convexity.