I sometimes need a function, and I want it to be both weird and sexy. To define this function, I want to be unstructured; I don’t want to limit my imagination. I want to be able to define a set of ordered points, and then have a function whose graph, when plotted, passes through them. But I need more than just being able to plot the function: I need to know its exact expression, so that I can use it later. Can we do that? That is, can we find a closed form to express such a function?

And as Obama would have said many years ago, yes we can! Given n+1 points, there exists a polynomial of degree (at most) n that meets this condition. Not that I particularly wanted a polynomial, but I can settle for that. The idea sounds even sexier the moment you are told that there is a unique polynomial of degree n complying with such constraints.

Do you want to find it? I want. Let’s go!

A polynomial, what?

A polynomial of degree n is a function P(x) that can be written as

$$P(x) = \sum_{k=0}^{n} a_k x^k = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n \tag{1}$$

The set $\{a_0, a_1, \dots, a_n\}$ gathers the coefficients of the polynomial. As the values of these coefficients change, we obtain a different polynomial each time.

Polynomials of $x$.

If we plot P(x), the exact shape of the curve we obtain depends on the coefficients of the polynomial, as shown in the figure above.
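To make this tangible, here is a minimal sketch (assuming numpy and matplotlib are available, and using arbitrarily chosen coefficients) that evaluates a few polynomials directly from their coefficient sets:

import numpy as np
import matplotlib.pyplot as plt

def eval_poly(coeffs, x):
    # P(x) = a_0 + a_1 x + ... + a_n x^n, with coeffs = [a_0, a_1, ..., a_n]
    return sum(a_k * x**k for k, a_k in enumerate(coeffs))

x = np.linspace(-2, 2, 200)
# Three arbitrary coefficient sets, i.e. three different polynomials.
for coeffs in [(1.0, 0.0, -1.0), (0.0, 1.0, 0.0, 1.0), (-1.0, 2.0, 0.5, -0.3)]:
    plt.plot(x, eval_poly(coeffs, x), label=f"a = {coeffs}")
plt.legend()
plt.show()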

A system of equations

Our problem is to find the values of the coefficients for which the graph of P(x) indeed passes through the requested n+1 points.

And what can we do to find them? Well, the points need to be in the image of P(x). That is, for each point $(x_j, y_j)$ in the set, we know that $P(x_j) = y_j$ must hold. As we have n+1 points, by evaluating P(x) at each of them we can create a system of n+1 equations.

$$P(x_0) = a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_n x_0^n = y_0 \tag{2}$$
$$P(x_1) = a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_n x_1^n = y_1 \tag{3}$$
$$\vdots$$
$$P(x_n) = a_0 + a_1 x_n + a_2 x_n^2 + \dots + a_n x_n^n = y_n \tag{4}$$

To find the answer to our problem, we need to solve the system.

Things look good a priori, as in general we can only find the values of n+1 unknowns (the coefficients) if we have the same number of equations.
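To make the system concrete, here is a minimal sketch (assuming numpy, and using three made-up points, so n = 2) that builds one equation per point, each row holding the powers $[1, x_j, x_j^2, \dots, x_j^n]$ that multiply the unknown coefficients:

import numpy as np

# Three made-up points (x_j, y_j); with n + 1 = 3 points we look for a degree-2 polynomial.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 2.0])

n = len(xs) - 1
# Row j holds [1, x_j, x_j^2, ..., x_j^n].
rows = np.stack([xs**k for k in range(n + 1)], axis=1)
print(rows)   # coefficient matrix of the system (one equation per row)
print(ys)     # right-hand side y_0, ..., y_n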

Bring on the matrices

We switch to matrix notation and define

$$X = \begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^n \end{bmatrix} \qquad a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} \qquad y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix} \tag{5}$$

so that then we can describe our problem more compactly as

$$X a = y \tag{6}$$

In particular, $X \in \mathbb{R}^{(n+1) \times (n+1)}$ is not just any square matrix; it is actually known as the Vandermonde matrix.

A solution, but not THE solution

Thanks to good old algebra, after switching notation we see that, in the end, the solution is simply

$$a = X^{-1} y \tag{7}$$
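In code, this direct approach could look like the following minimal sketch (assuming numpy; it uses np.vander to build X and np.linalg.solve instead of explicitly forming $X^{-1}$, which amounts to the same solution):

import numpy as np

xs = np.array([0.0, 1.0, 2.0])   # made-up x-coordinates (all distinct)
ys = np.array([1.0, 3.0, 2.0])   # made-up y-coordinates

# Vandermonde matrix with rows [1, x_j, x_j^2, ...], exactly as in equation (5).
X = np.vander(xs, increasing=True)
a = np.linalg.solve(X, ys)       # the coefficients a_0, ..., a_n

# Sanity check: the polynomial evaluated at each x_j gives back y_j.
print(np.allclose(X @ a, ys))    # True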

End of story? Well… just fasten your seatbelts!

First, we should ask ourselves whether $X^{-1}$ even exists. As X is a square matrix, it could indeed be the case. In fact, I am sure there is a proof for this, but just by looking at the expression, you can already tell that the inverse of this matrix exists as long as $\forall j \neq k,\ x_j \neq x_k$. That is, there should not be two points requesting an image for the same domain value. Say we had $(x_p, y_a)$ and $(x_p, y_b)$ in our set of points. If $y_a \neq y_b$, then the problem would be messed up from scratch: a well-defined function can only have a unique image per domain element. If instead $y_a = y_b$, then we would have a repeated point in our set, and we would only be able to construct n linearly independent equations. In this case, rather than a unique polynomial of degree n, the solution would consist of a sub-space of polynomials.

Second, to compute $X^{-1}$, how much computational power do we need? Again without proof, but also without any doubt, I’m sorry to break it to you, me, and everyone else: inverting this matrix is an $O(n^3)$ operation… so as soon as n becomes moderately large, good luck waiting for that to run!

Fortunately, Lagrange came to the rescue and saved us all.

Lagrange Interpolation

Lagrange changed a little bit the way we were thinking about the problem, and proposed to think of the polynomial P(x) as a sum of n+1 functions

$$P(x) = \sum_{i=0}^{n} \alpha_i \phi_i(x) = \alpha_0 \phi_0(x) + \alpha_1 \phi_1(x) + \dots + \alpha_n \phi_n(x) \tag{8}$$

and then proved that the solution was to define

$$\alpha_i = y_i \tag{9}$$
$$\phi_i(x) = \frac{\prod_{j \neq i} (x - x_j)}{\prod_{j \neq i} (x_i - x_j)} \tag{10}$$

Analyzing the expressions, we see that $\forall j \neq i,\ \phi_i(x_j) = 0$. In addition, $\phi_i(x_i) = 1$ and, as $\alpha_i = y_i$, then $\alpha_i \phi_i(x_i) = y_i$. This way, for each point $(x_k, y_k)$, out of all the functions composing P(x), only $\alpha_k \phi_k(x)$ returns $y_k$ when evaluated at $x_k$, while the other n functions vanish.
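We can check this key property with a small sketch (a minimal Python version; `lagrange_phi` is a hypothetical helper name that simply transcribes equation (10)):

def lagrange_phi(i, x, nodes):
    # phi_i(x) from equation (10): prod_{j != i} (x - x_j) / prod_{j != i} (x_i - x_j)
    num, den = 1.0, 1.0
    for j, x_j in enumerate(nodes):
        if j != i:
            num *= x - x_j
            den *= nodes[i] - x_j
    return num / den

nodes = [0.0, 1.0, 2.0, 3.0]              # made-up x-coordinates x_0, ..., x_3
print(lagrange_phi(2, nodes[2], nodes))   # 1.0: phi_i evaluated at its own node
print(lagrange_phi(2, nodes[0], nodes))   # 0.0: phi_i evaluated at any other node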

I mean…what a beast, such a clever and elegant solution. Chapeau!

The interpolation in action

Let’s analyse one case to convince ourselves! Say we want a function for the following 5 points.

Points that must be on the image of P(x).

I guess we would all panic if we were asked to solve this by hand, but we just learned about Lagrange’s interpolation, so we know we can find the polynomial of degree 4 that meets this condition.

In particular, let’s gather the functions that compose P(x):

import numpy as np

# The five points (x_i, y_i) that P(x) must pass through:
#   x: 0.36, 0.93, 1.73, 2.57, 3.49
#   y: -2.8, 1.64, -0.5, 2.77, -1.8

# Dense grid on which we evaluate each term of P(x).
xx = np.linspace(0, 4, 200)

# Each term below is alpha_i * phi_i(xx), i.e. y_i times the product formula of equation (10).

a1phi1 = -2.8 * \
    ((xx - 0.93) * (xx - 1.73) * (xx - 2.57) * (xx - 3.49)) / \
    ((0.36 - 0.93) * (0.36 - 1.73) * (0.36 - 2.57) * (0.36 - 3.49))

a2phi2 = 1.64 * \
    ((xx - 0.36) * (xx - 1.73) * (xx - 2.57) * (xx - 3.49)) / \
    ((0.93 - 0.36) * (0.93 - 1.73) * (0.93 - 2.57) * (0.93 - 3.49))

a3phi3 = -0.5 * \
    ((xx - 0.36) * (xx - 0.93) * (xx - 2.57) * (xx - 3.49)) / \
    ((1.73 - 0.36) * (1.73 - 0.93) * (1.73 - 2.57) * (1.73 - 3.49))

a4phi4 = 2.77 * \
    ((xx - 0.36) * (xx - 0.93) * (xx - 1.73) * (xx - 3.49)) / \
    ((2.57 - 0.36) * (2.57 - 0.93) * (2.57 - 1.73) * (2.57 - 3.49))

a5phi5 = -1.8 * \
    ((xx - 0.36) * (xx - 0.93) * (xx - 1.73) * (xx - 2.57)) / \
    ((3.49 - 0.36) * (3.49 - 0.93) * (3.49 - 1.73) * (3.49 - 2.57))

And let’s just plot them to see what these functions look like.
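Here is a minimal plotting sketch (assuming matplotlib is available) for the five terms we just computed, together with the five points:

import matplotlib.pyplot as plt

# One curve per term alpha_i * phi_i(x), plus the five points in black.
terms = [a1phi1, a2phi2, a3phi3, a4phi4, a5phi5]
for k, term in enumerate(terms, start=1):
    plt.plot(xx, term, label=f"a{k} * phi{k}")
plt.scatter([0.36, 0.93, 1.73, 2.57, 3.49], [-2.8, 1.64, -0.5, 2.77, -1.8], color="black")
plt.legend()
plt.show()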

Functions that compose P(x).

The first feeling when you look at this is “oh, it will be hard to prove that this truly adds up to what we need”. But to simplify, recall that each function focuses on one point at a time, while ensuring it does not bother the others. This is why, at each of the highlighted x-coordinates, there is only one function with $y \neq 0$, while all the others give $y = 0$. Things are actually looking good then!

Final step, let’s add these functions and see what we get.
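In code, this last step is just a sum of the five arrays; plotting it together with the points shows the interpolation at work (a small sketch reusing the arrays from above, again assuming matplotlib):

import matplotlib.pyplot as plt

# P(x) is simply the sum of the five terms computed above.
P = a1phi1 + a2phi2 + a3phi3 + a4phi4 + a5phi5

plt.plot(xx, P, label="P(x)")
plt.scatter([0.36, 0.93, 1.73, 2.57, 3.49], [-2.8, 1.64, -0.5, 2.77, -1.8],
            color="black", zorder=3, label="interpolated points")
plt.legend()
plt.show()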

Resulting P(x).

An image is sometimes worth more than a million words, so I will limit myself to simply saying: beautiful.

Generalizing: any and all polynomials?!

A natural question that follows is whether we could not have found a more “compact” solution than the one Lagrange proposed. In short, the answer is no, but let’s reason about it. For example, if we were able to find some linear dependency among the $\phi_i(x)$ functions, by removing it we could maybe come up with another solution for P(x), composed of fewer than n+1 functions. Well, dreaming is ok, but if that is how we come up with solutions, Lagrange clearly had very sweet dreams long ago: his functions actually form a basis for the space of polynomials of degree at most n (note that this space is of dimension n+1).

The fact that the set of Lagrange’s functions $\phi_i(x)$ forms a basis means that, relying on these functions:

  • we can produce all the polynomials of degree n, and;
  • for each polynomial, there is a unique combination of the $\phi_i(x)$ functions that produces it (see the sketch right after this list).
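As a quick numerical sanity check of the first claim (a hedged sketch, assuming numpy and reusing the hypothetical `lagrange_phi` helper defined earlier), we can rebuild the monomial $x^2$ in the Lagrange basis simply by using the coefficients $x_i^2$:

import numpy as np

nodes = [0.0, 1.0, 2.0, 3.0, 4.0]           # five made-up nodes, so n = 4
x = np.linspace(-1.0, 5.0, 50)

# In the Lagrange basis built on these nodes, the coordinates of x^2 are just x_i^2.
reconstructed = sum((x_i**2) * lagrange_phi(i, x, nodes) for i, x_i in enumerate(nodes))
print(np.allclose(reconstructed, x**2))     # True: x^2 is reproduced on the whole grid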

For example, another basis for the same space is the canonical one, formed by the functions $\{1, x, x^2, \dots, x^n\}$, as shown below for n=4.

The functions of the canonical basis of degree 4.

I know, if we compare the functions on the plot we just obtained for the canonical basis with the ones from Lagrange’s interpolation we calculated before, it is human to hesitate …are we sure that the very clever formulation of Lagrange is also a basis? Well, only reason can defeat fear, so let’s prove it to convince ourselves.

To verify that Lagrange’s functions form a basis of the degree-n polynomials, we need to work with a linear combination of these functions

$$L(x) = \beta_0 \phi_0(x) + \beta_1 \phi_1(x) + \dots + \beta_n \phi_n(x) \tag{11}$$

If we can show that $\forall x,\ L(x) = 0 \iff \forall i,\ \beta_i = 0$, then it is true.

As the aforementioned condition must hold for all x, then in particular it must hold for $\{x_0, x_1, \dots, x_n\}$, the x-coordinates of the n+1 points used to define the $\phi_i(x)$ functions. Without losing generality, taking $x_0$ as an example, we can compute

$$L(x_0) = \beta_0 \phi_0(x_0) + \beta_1 \phi_1(x_0) + \dots + \beta_n \phi_n(x_0) = 0 \tag{12}$$

but based on our previous analysis (recall that $\phi_0(x_0) = 1$ while $\phi_i(x_0) = 0$ for every $i \neq 0$), we know that this ends up evaluating to $L(x_0) = \beta_0 = 0$. We can iterate over the n remaining $x_i$ values following a similar procedure to prove that at each step i we need $\beta_i = 0$ to hold. And once this is done, we just need to prove the other direction of the implication, but that proof is trivial. Hence, voilà, we have proved that Lagrange’s solution is indeed a basis of the degree-n polynomial space.
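If you prefer a numerical hint to go along with the proof, the following sketch (again assuming numpy and the hypothetical `lagrange_phi` helper) builds the matrix of values $\phi_i(x_j)$ and confirms it has full rank, so $L(x_j) = 0$ at every node indeed forces every $\beta_i$ to be zero:

import numpy as np

nodes = [0.0, 1.0, 2.0, 3.0, 4.0]                # made-up nodes, n = 4

# M[j, i] = phi_i(x_j); by construction this is the identity matrix.
M = np.array([[lagrange_phi(i, x_j, nodes) for i in range(len(nodes))] for x_j in nodes])
print(np.allclose(M, np.eye(len(nodes))))        # True

# Full rank means M @ beta = 0 only has the solution beta = 0: the phi_i are linearly
# independent, and n+1 independent functions in an (n+1)-dimensional space form a basis.
print(np.linalg.matrix_rank(M) == len(nodes))    # True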

We have to agree that Lagrange’s solution is simply beautiful, don’t we?

Implementing Lagrange’s interpolation

Now that we know many things about Lagrange’s interpolation, the last point I want to address is how to implement it efficiently in code. To reason about this, let’s first write down the final expression of P(x):

$$P(x) = \sum_{i=0}^{n} y_i \, \frac{\prod_{j \neq i} (x - x_j)}{\prod_{j \neq i} (x_i - x_j)} \tag{13}$$
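Just as a naive starting point, before any efficiency concerns, a direct transcription of equation (13) could look like the following sketch (`lagrange_interpolate` is a hypothetical name, not an established API):

def lagrange_interpolate(x, xs, ys):
    # Evaluate P(x) as in equation (13): sum_i y_i * prod_{j!=i}(x - x_j) / prod_{j!=i}(x_i - x_j)
    result = 0.0
    for i, (x_i, y_i) in enumerate(zip(xs, ys)):
        num, den = 1.0, 1.0
        for j, x_j in enumerate(xs):
            if j != i:
                num *= x - x_j
                den *= x_i - x_j
        result += y_i * num / den
    return result

# Quick check with the five points from the example above: P(x_i) recovers each y_i.
xs = [0.36, 0.93, 1.73, 2.57, 3.49]
ys = [-2.8, 1.64, -0.5, 2.77, -1.8]
print([round(lagrange_interpolate(x_i, xs, ys), 2) for x_i in xs])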

And I will continue writing soon…