Subspace Embeddings and ℓ_p-Regression Using Exponential Random Variables
Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show that for any real p, 1 ≤ p < ∞, given a matrix M ∈ R^{n×d} with n ≫ d, with constant probability we can choose a matrix Π with max(1, n^{1−2/p})·poly(d) rows and n columns so that simultaneously for all x ∈ R^d, ‖Mx‖_p ≤ ‖ΠMx‖_∞ ≤ poly(d)·‖Mx‖_p. Importantly, ΠM can be computed in the optimal O(nnz(M)) time, where nnz(M) is the number of non-zero entries of M. This generalizes all previous oblivious subspace embeddings, which required p ∈ [1, 2] due to their use of p-stable random variables. Using our new matrices Π, we also improve the best known distortion of oblivious subspace embeddings of ℓ_1 into ℓ_1 with Õ(d) target dimension in O(nnz(M)) time from Õ(d^3) to Õ(d^2), which can further be improved to Õ(d^{3/2}) log^{1/2} n if d = Ω(log n), answering a question of Meng and Mahoney (STOC, 2013). We apply our results to ℓ_p-regression, obtaining a (1+ϵ)-approximation in O(nnz(M) log n) + poly(d/ϵ) time, improving the best known poly(d/ϵ) factors for every p ∈ [1, ∞) ∖ {2}. If one is interested only in a poly(d)-approximation rather than a (1+ϵ)-approximation to ℓ_p-regression, a corollary of our results is that for all p ∈ [1, ∞) we can solve the ℓ_p-regression problem without using general convex programming; that is, since our subspace embeds into ℓ_∞, it suffices to solve a linear programming problem. Finally, we give the first protocols for the distributed ℓ_p-regression problem for every p ≥ 1 which are nearly optimal in communication and computation.
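The abstract does not spell out the construction of Π, but a natural reading consistent with the title is a sketch of the form Π = S·D, where D = diag(1/E_i^{1/p}) rescales each coordinate by an inverse exponential random variable (exploiting the max-stability of exponentials: max_i |y_i|^p / E_i is distributed as ‖y‖_p^p / E for a standard exponential E), and S is a CountSketch-style matrix, so that ΠM costs O(nnz(M)) to compute. The following Python sketch is a minimal illustration under that assumption; the row count m, the d^2 factor standing in for poly(d), and the helper name exp_sketch are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def exp_sketch(M, m, p, rng):
    """Apply Pi = S @ D to M in O(nnz(M)) time, where
    D = diag(1 / E_i^{1/p}) rescales rows by inverse exponentials
    (max-stability) and S is a CountSketch-style matrix: each row
    of M is hashed to one of m buckets with a random sign."""
    n, d = M.shape
    E = rng.exponential(scale=1.0, size=n)        # i.i.d. standard exponentials
    scaled = M / E[:, None] ** (1.0 / p)          # D @ M: row-wise rescaling
    buckets = rng.integers(0, m, size=n)          # hash h: [n] -> [m]
    signs = rng.choice([-1.0, 1.0], size=n)       # random signs sigma_i
    PM = np.zeros((m, d))
    np.add.at(PM, buckets, signs[:, None] * scaled)  # S @ (D @ M), one pass
    return PM

rng = np.random.default_rng(0)
n, d, p = 20000, 5, 3.0
M = rng.standard_normal((n, d))
# Illustrative row count: max(1, n^{1-2/p}) times a poly(d) factor.
m = max(1, int(n ** max(0.0, 1 - 2 / p))) * d ** 2
PM = exp_sketch(M, m, p, rng)
x = rng.standard_normal(d)
# With constant probability the two quantities below should agree
# up to poly(d) factors, per the embedding guarantee.
print(np.linalg.norm(M @ x, ord=p), np.max(np.abs(PM @ x)))
```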
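To illustrate the linear-programming corollary: once ‖Π(Mx − b)‖_∞ approximates ‖Mx − b‖_p up to poly(d) factors, a poly(d)-approximate ℓ_p-regression reduces to the small ℓ_∞-regression problem min_x ‖(ΠM)x − Πb‖_∞, and ℓ_∞-regression is a standard linear program. Below is a minimal sketch of that textbook ℓ_∞-to-LP encoding using scipy.optimize.linprog; the function name and the way it is combined with exp_sketch above are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def linf_regression(A, b):
    """Solve min_x ||A x - b||_inf as the linear program
    minimize t subject to -t <= A x - b <= t."""
    m, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0                                   # variables [x_1..x_d, t]
    A_ub = np.block([[A, -np.ones((m, 1))],       #  A x - t <=  b
                     [-A, -np.ones((m, 1))]])     # -A x - t <= -b
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))  # x and t are free
    return res.x[:d]

# Sketch the stacked instance [M | b] once, then solve the small LP:
# PMb = exp_sketch(np.column_stack([M, b]), m, p, rng)
# x_hat = linf_regression(PMb[:, :-1], PMb[:, -1])
```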