Testing Convexity of a function

The Curious Case of Convex Functions

In-depth understanding of convex functions and ways to test convexity.

Pritish Jadhav
6 min read · Jan 14, 2021
  • Most of the online literature on introduction to machine learning kicks off by covering the Linear Regression algorithm.
  • Typically, the Linear Regression algorithm is detailed out by using Mean Squared Error (MSE) as the loss function.
  • MSE is a convex function. Convexity unlocks a crucial advantage: any local minimum is also the global minimum.
  • This ensures that the model can be trained until the loss function reaches its global minimum value.
  • However, proving the convexity of MSE (or any other loss function) is typically left out of scope.
  • In this blog post, we shall work through the concepts needed to prove the convexity of a function.
  • So hang tight, and by the end of this article, you will have a better understanding of -
  1. Square and Symmetric Matrices.
  2. Hessian Matrices.
  3. Proving/Testing the convexity of functions.
  4. Positive and Negative Definite/Semidefinite Matrices.

Without further ado, let's jump into it.

Basics of Matrices -

We shall touch on some basic matrix concepts that will help us prove the convexity of a function. These basics are typically covered in undergraduate courses and should sound familiar.

Square Matrix -

Given a matrix X of dimensions m × n, X is a square matrix iff (if and only if) m = n. In other words, matrix X is called a square matrix if the number of matrix rows is equal to the number of columns.

[Figure: an example of a square matrix]

In the above example, since the number of rows (m) equals the number of columns (n), X can be termed a square matrix.

Symmetric Matrix -

A square matrix, X, is termed symmetric if xᵢⱼ = xⱼᵢ ∀ i and j, where i, j denote the ith row and jth column respectively. In other words, matrix X is symmetric if the transpose of X is equal to X itself.
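As a quick illustration, here is a minimal NumPy sketch of both checks (the matrix X below is just a made-up example):

```python
import numpy as np

# A made-up 3 x 3 example that equals its own transpose.
X = np.array([[1, 7, 3],
              [7, 4, 5],
              [3, 5, 6]])

print(X.shape[0] == X.shape[1])  # True: X is square
print(np.array_equal(X, X.T))    # True: X is symmetric
```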

Hessian Matrix

The Hessian of a multivariate function f(x₁, x₂, …, xₙ) is a way of organizing its second-order partial derivatives in the form of a matrix.

Consider a multivariate function f(x₁, x₂, …, xₙ); its Hessian is defined as follows -

[Figure: the Hessian matrix; its (i, j)th entry is ∂²f / ∂xᵢ∂xⱼ]

Note that a Hessian matrix is by definition a square matrix, and it is symmetric whenever the second-order partial derivatives are continuous (by Schwarz's theorem).
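If you would rather not take the derivatives by hand, SymPy can build the Hessian for you. Here is a minimal sketch, using a made-up function f(x₁, x₂) = x₁² + 3x₁x₂ + 2x₂²:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3*x1*x2 + 2*x2**2  # a made-up example function

# The Hessian: a 2 x 2 matrix of second-order partial derivatives.
H = sp.hessian(f, (x1, x2))
print(H)  # Matrix([[2, 3], [3, 4]]) - square and symmetric, as expected
```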

Proving / Checking Convexity of a function -

With all the relevant basics covered in previous sections, we are now ready to define checks for determining the convexity of functions.

A function f(x) defined over n variables on an open convex set S can be tested for convexity using the following criteria -

If Xᴴ is the Hessian matrix of f(x), then, at every point of S -

  • f(x) is strictly convex in S if Xᴴ is a Positive Definite Matrix.
  • f(x) is convex in S if Xᴴ is a Positive Semi-Definite Matrix.
  • f(x) is strictly concave in S if Xᴴ is a Negative Definite Matrix.
  • f(x) is concave in S if Xᴴ is a Negative Semi-Definite Matrix.
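As a quick one-dimensional sanity check: for f(x) = x² the Hessian is the 1 × 1 matrix [2], which is positive definite, so f is strictly convex; for f(x) = −x² it is [−2], so f is strictly concave.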

In the next section, let’s discuss ways to test the definiteness of a matrix.

Positive Definite and Semidefinite Matrices -

You may have seen references to these matrices in multiple places, but the definition and the ways to prove definiteness remain elusive to many. Let us start with the textbook definition -

An n × n square matrix, X, is Positive Definite if aᵀXa > 0 ∀ a ∈ ℝⁿ, a ≠ 0.

Similarly,

An n × n square matrix, X, is Positive Semidefinite if aᵀXa ≥ 0 ∀ a ∈ ℝⁿ.

The same logic extends easily to Negative Definite and Negative Semidefinite matrices.
An n × n square matrix, X, is Negative Definite if aᵀXa < 0 ∀ a ∈ ℝⁿ, a ≠ 0.

And finally,
An n × n square matrix, X, is Negative Semidefinite if aᵀXa ≤ 0 ∀ a ∈ ℝⁿ.

  • However, while checking for matrix definiteness, validating the condition for every possible value of a is definitely not an option.
  • So let us try and break this down even further.

Eigenvalues and Eigenvectors -

We know that an eigenvector of X is a vector whose direction remains unchanged when the linear transformation X is applied to it; it is only scaled by its eigenvalue λ, i.e., Xa = λa.

Analysis

  • If a is an eigenvector of X with eigenvalue λ, then Xa = λa, and pre-multiplying both sides by aᵀ gives aᵀXa = λ·aᵀa.
  • It is easy to see that the dot product aᵀa = ‖a‖² is always positive for any nonzero vector a.
  • Referring back to the definition of a positive definite matrix, aᵀXa = λ·aᵀa can therefore be greater than zero if and only if λ is greater than zero.

Derived Definition for Matrix Definiteness -

Based on the analysis above, we can restate the formal definitions of Matrix Definiteness in terms of eigenvalues, where λᵢ denotes the ith eigenvalue of X -

  1. A Symmetric Matrix, X, is Positive Definite if λᵢ > 0 ∀ i
  2. A Symmetric Matrix, X, is Positive Semi-Definite if λᵢ ≥ 0 ∀ i
  3. A Symmetric Matrix, X, is Negative Definite if λᵢ < 0 ∀ i
  4. A Symmetric Matrix, X, is Negative Semi-Definite if λᵢ ≤ 0 ∀ i
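In code, this check is a one-liner once the eigenvalues are available. Here is a minimal NumPy sketch (definiteness is a hypothetical helper name; the matrix at the bottom is a made-up example):

```python
import numpy as np

def definiteness(X: np.ndarray) -> str:
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(X)  # eigenvalues of a symmetric matrix, ascending
    if np.all(eig > 0):
        return "positive definite"
    if np.all(eig >= 0):
        return "positive semidefinite"
    if np.all(eig < 0):
        return "negative definite"
    if np.all(eig <= 0):
        return "negative semidefinite"
    return "indefinite"

X = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(definiteness(X))  # positive definite (eigenvalues are 1 and 3)
```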

An Alternate Test for Determining Matrix Definiteness -

  1. Computing eigenvalues means finding the roots of the characteristic polynomial det(X − λI) = 0.
  2. For a large matrix, this polynomial has a high degree, so solving for the eigenvalues by hand is not always feasible.

An Alternate Theorem for Determining Definiteness of Matrices comes to the rescue -

Theorems -

  1. A Symmetric Matrix, X, is Positive Definite iff Dₖ > 0 ∀ k, where Dₖ represents the kth Leading Principal Minor.
  2. A Symmetric Matrix, X, is Positive Semi-Definite iff Δₖ ≥ 0 for every Principal Minor Δₖ of every order k.
  3. A Symmetric Matrix, X, is Negative Definite iff (−1)ᵏDₖ > 0 ∀ k, i.e., the Leading Principal Minors alternate in sign, starting with D₁ < 0.
  4. A Symmetric Matrix, X, is Negative Semi-Definite iff (−1)ᵏΔₖ ≥ 0 for every Principal Minor Δₖ of every order k.

Principal Minors of a Matrix -

  • For a Square Matrix, X, a Principal Submatrix of order k is obtained by deleting any (n − k) rows and the corresponding (n − k) columns.
  • The determinant of such a Principal Submatrix is called a Principal Minor of matrix X.

Leading Principal Minors of a Matrix -

  • For a Square Matrix X, a Leading Principal Submatrix of order k is obtained by deleting the last (n − k) rows and the corresponding (n − k) columns.
  • The determinant of such a Leading Principal Submatrix is called the Leading Principal Minor of matrix X.
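Here is how the leading-principal-minor test could look in NumPy. The sketch below covers only the definite cases from theorems 1 and 3 (the semidefinite tests need all principal minors, not just the leading ones); leading_principal_minors is a hypothetical helper name:

```python
import numpy as np

def leading_principal_minors(X: np.ndarray) -> list:
    """D_k = determinant of the top-left k x k submatrix, k = 1..n."""
    n = X.shape[0]
    return [np.linalg.det(X[:k, :k]) for k in range(1, n + 1)]

X = np.array([[2.0, 1.0],
              [1.0, 2.0]])
D = leading_principal_minors(X)
print(D)                      # [2.0, 3.0]

# Theorem 1: all D_k > 0  ->  positive definite.
print(all(d > 0 for d in D))  # True

# Theorem 3: signs alternate starting negative ((-1)^k * D_k > 0).
print(all((-1) ** k * d > 0 for k, d in enumerate(D, start=1)))  # False
```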

Testing the Convexity of a Quadratic Function -

In this section, let's apply everything we have learned in this blog post and put it to the test; a sketch of the worked example follows below.

[Figure: worked example of checking the convexity of a quadratic function]
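The original worked example lived in an image; as a stand-in, here is the same end-to-end test on a made-up quadratic, f(x₁, x₂) = 3x₁² + 2x₁x₂ + 3x₂², combining the Hessian and eigenvalue steps from above:

```python
import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 3*x1**2 + 2*x1*x2 + 3*x2**2  # a made-up quadratic

# Step 1: build the Hessian symbolically.
H = sp.hessian(f, (x1, x2))  # Matrix([[6, 2], [2, 6]])

# Step 2: for a quadratic, the Hessian is constant, so a single
# eigenvalue check settles convexity everywhere.
eig = np.linalg.eigvalsh(np.array(H.tolist(), dtype=float))
print(eig)              # [4. 8.]
print(np.all(eig > 0))  # True -> positive definite -> f is strictly convex
```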

Properties of Convex functions -

  • Constant functions of the form f(x) = c are both convex and concave.
  • f(x) = x is both convex and concave.
  • Functions of the form f(x) = xʳ with r ≥ 1 are convex on the interval 0 < x < ∞.
  • Functions of the form f(x) = xʳ with 0 < r < 1 are concave on the interval 0 < x < ∞.
  • The function f(x) = log(x) is concave on the interval 0 < x < ∞.
  • The function f(x) = eˣ is convex everywhere.
  • If f(x) is convex, then g(x) = cf(x) is also convex for any positive value of c.
  • If f(x) and g(x) are convex then their sum h(x) = f(x) + g(x) is also convex.
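For twice-differentiable functions of one variable, the last two properties follow straight from the second-derivative test: if f''(x) ≥ 0 and g''(x) ≥ 0 everywhere, then (cf)'' = c·f'' ≥ 0 for any c > 0, and (f + g)'' = f'' + g'' ≥ 0.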

Final Comments -

  • We have investigated convex functions in depth while studying different ways of proving convexity.
  • In the next article, we shall prove the convexity of the MSE loss function used in linear regression.
  • Convex functions are comparatively easy to minimize, which is why they are used across research domains.
  • So, the next time you want to plug in a different loss function in place of MSE while training a linear regression model, you might as well check its convexity.

Let's have a chat:

https://www.linkedin.com/in/pritish-sunil-jadhav-537075a4/
