I have recently worked on a paper about RIP matrices, and decided to learn better about these objects. This post is the adaptation of some notes I wrote on the subject.

Let be natural numbers. A vector is ** -sparse** if it has at most non-zero entries.

Definition 1 (RIP matrices)Let be positive integers and let be a matrix with real entries, let and let be an integer. We say that is-RIPif for every -sparse vector we have

Thus acts almost as an isometry when we restrict our attention to -sparse vectors.

The restricted isometry property is perhaps the `simplest’ property of a matrix which makes it `good’ for compressed sensing. It appeared as part of the outcome of a series of papers by Donoho, and Candès–Romberg–Tao in 2004-2006 where other slightly different properties were studied (see these papers and also this survey). The earliest appearance of the name RIP that I could find was here, but even there the exact property is a little different from the modern one (which first appeared, I believe, here).

In the next section I present a very short introduction to the ideas behind compressed sensing and how RIP matrices naturally arise. A more complete exposition of this material is Tao’s blog post on this subject. In the final section I address the (pure) mathematic interest in RIP matrices, describe some random procedures to construct RIP matrices, and mention some recent efforts to create deterministic algorithms.

** — 1. Digital signals, compressed sensing and RIP — **

A ** signal** is a vector and can represent a digital sound or image. A

**of is the inner product with some given vector . If we want to acquire all the entries (think pixels) of , we need to make measurements, in each taking to be an element of the canonical basis and obtaining one entry (pixel). In this case, the**

*measurement***, i.e., the matrix whose lines are the vectors , is just the identity matrix.**

*measurement matrix*Taking the example when the signal is an image, note that the “interesting” images have a large degree of structure. Indeed, a typical (random) image will have each pixel independent of its neighbors, resulting in something like this picture.

In general, computer scientists realized that, after an orthonormal change of coordinates, an “interesting” signal will have very few non-zero coordinates, when compared to the dimension . In other words, interesting signals are -sparse for a relatively small .

This fact has been used for several years in computers to store signals (which are assumed to be sparse). Instead of storing all the entries of , one instead stores only the non-zero coordinates of together with the positions of those entries. However, during the process of *acquiring* the signal (for instance, taking a photograph) one still needs to capture all the coordinates of .

The idea of compressed sensing is to use much less than measurements and still recover , assuming it is sparse. If one knows which coordinates of are non-zero one only needs measurements to encode . However different signals will have the non-zero entries in different places. Thus what is needed is a measurement matrix such that can be uniquely recovered from , assuming only that is -sparse.

Lemma 2Let be a measurement matrix such that any columns of are linearly independent. Then any -sparse vector can be uniquely determined from .

*Proof:* Assume is as described and for two -sparse signals . Then . Since is -sparse, this implies that there is a linear combination of of the columns of equal to , and by the hypothesis on it follows that .

Thus if any columns of a matrix are linearly independent, one can recover a -sparse signal from only . Unfortunately, this theoretical result gives a very slow algorithm:

Proposition 3Let be a measurement matrix such that any columns of are linearly independent. Let , let and letIf is -sparse, then .

However, to find is an NP-hard problem, and thus impractical in real life problems. An alternative is to find the vector in the hyperplane which minimizes the norm.

There is, in general, no reason to believe that whenever is a sparse vector and we have . However, that holds true for special types of measurement matrices, and in particular when the measurement matrix is RIP:

Theorem 4Let be positive integers, let and let be a matrix satisfying the -RIP. If is -sparse, and is given by (1), then .

It is hard to decide who to credit for this theorem. In this form it appeared in this paper, but it is only the last evolutionary stage of a long collection of similar results, where the RIP is replaced with a slightly different property, some noise is present, or the conclusions are weakened. A good source for more information is this survey.

Theorem 4 tells us that, if one can easily construct RIP matrices, one can use significantly fewer than measurements of a signal , and still recover the signal (assuming it is sparse). Moreover, it is computationally feasible to recover from the measurement .

This is most important in applications where taking measurements is `expensive’ such as an MRI scan or an astronomical observation. Another possible application is in digital cameras, which would not be required to process the signal captured and hence save the battery.

** — 2. Constructing RIP matrices — **

In view of Theorem 4, it is desirable to be able to construct RIP matrices. Specifically, one wants to find a -RIP matrix of dimensions with the number of measurements, , being as small as possible, with respect to the dimension of the signal and the level of sparsity, . For some reason, this is usually expressed as trying to maximize the level of sparsity , depending on the number of measurements and the dimension .

** — 2.1. Random constructions — **

The constructions of RIP matrices with lowest number of measurements use randomness. Here I present just a few examples:

Theorem 5Let be positive integers, let and let . Let be the random matrix defined by one of the following methods.

- (Normal entries)Let the entries of be i.i.d. with a normal distribution .
- (Bernoulli entries)Let the entries of be i.i.d. with a Bernoulli distribution taking the values , each with probability.
- (Random rows of the Discrete Fourier transform)Let be a random subset of size . Let be the matrix obtained from the Discrete Fourier transform matrix (i.e. the matrix with entries for ) by selecting the rows indexed by .
Then is -RIP with probability .

The first two cases of Theorem 5 are from this paper, and the last case is from this one. We are not proving this theorem here, but it is worth noticing that the proof involves an old lemma of W. B. Johnson and J. Lindenstrauss (the father of the fields medalist E. Lindenstrauss) on Lipschitz mappings into Hilbert spaces.

Lemma 6 (Jonhson-Lindenstrauss)Let be positive integers and let be a finite set of cardinality . For every there exists a linear map , with , such thatfor all .

** — 2.2. Deterministic constructions — **

It is somewhat surprising that, while almost every matrix is RIP, constructing one seems to be a very challenging problem. This situation appears in many other places in mathematics, for instance in explicitly finding normal numbers, or finding Ramsey graphs (graphs in vertices without any clique or independent set of size ). The general problem of constructing deterministic objects with a certain property, for which it is know that a random object will have that property has been called finding hay in a haystack.

The problem of finding a deterministic family of matrices of increasing dimensions which satisfies the restricted isometry property was explicitly formulated by Tao in his blog (he uses the term UUP instead of RIP, standing for uniform uncertainty principle). This paper surveys several different approaches that have been used to make progress in this problem.

One of the most successful techniques to construct deterministic RIP matrices is based on the Gershgorin circle theorem:

Theorem 7 (Gershgorin)For each eigenvalue of a matrix , there is an index such that

Using this result one can show the following:

Proposition 8Let be positive integers and let be a matrix where each columns has norm . Then for any we have that is a -RIP, where is the maximum inner product between two columns of .

This proposition has been used to construct RIP matrices with sparsity . However, it has been shown that this is the highest possible level of sparsity one can get from Theorem 7; this follows from the Welch bound.

This `square root bottleneck’ is present in most other deterministic constructions. The only construction (to date) to beat this bottleneck is due to J. Bourgain, S. Dilworth, K. Ford, S. Konyagin and D. Kutzarova, but even there they obtain for an explicit of size

In a series of blog posts, Dustin Mixon and Afonso Bandeira increased this to

which seems to be the best bound currently known.

Reblogged this on Nerdville.

Very nicely summarized. Thanks.