Online PCA with Spectral Bounds
This paper revisits the online PCA problem. Given a stream of $n$ vectors $x_t \in \R^d$ (columns of $X$) the algorithm must output $y_t \in \R^\ell$ (columns of $Y$) before receiving $x_{t+1}$. The goal of online PCA is to simultaneously minimize the target dimension $\ell$ and the error $\|X - (XY^\pinv)Y\|^2$. We describe two simple and deterministic algorithms. The first, receives a parameter $\Delta$ and guaranties that $\|X - (XY^\pinv)Y\|^2$ is not significantly larger than $\Delta$. It requires a target dimension of $\ell = O(k/\eps)$ for any $k,\eps$ such that $\Delta \ge \eps\sigma_1^2 + \sigma_{k+1}^2$. The second receives $k$ and $\eps$ and guaranties that $\|X - (XY^\pinv)Y\|^2 \le \eps\sigma_1^2 + \sigma_{k+1}^2$. It requires a target dimension of $O( k\log n/\eps^2)$. Different models and algorithms for Online PCA were considered in the past. This is the first that achieves a bound on the spectral norm of the residual matrix.