SIP - Binarized SIFT

Binarized Random Projected SIFT descriptors

In this project, we investigate how robust and distinctive SIFT descriptors are, when they are projected and binarized using so called Random Projections and a simple sign function.

The main goal of this work is to provide an analysis of the statistical properties, and thus the discriminative power, of binarized projected SIFT descriptors. We will primarily investigate the binarized projected SIFT fingerprint statistics on real image datasets, and the statistics of the fingerprints under various distortions.

The dataset used, is the Airplane subset from the Caltech 101 dataset.

Architecture
Random Projections
Testing
Example
Results
Download
References

Architecture

Random Projections

The binary fingerprints are obtained via a two staged process: a dimensionality reduction \(\textbf{W}^{L\times N}\) and binarization \(Q(.)\) The dimensionality reduction of the \(k\)-th SIFT descriptor from image \(m\), \(\textbf{x}^{k}(m)\), is done as follows: \[ \begin{align} \tilde{\textbf{x}}^{k}(m) = \textbf{W}^{L\times N} \textbf{x}^{k}(m). \end{align} \] where \(\textbf{W}^{L\times N} \in \mathcal{R}^{L\times N}\). \(L\) is the number of dimensions \(\textbf{W}^{L\times N}\) will map to, \(N\) is the length of the input column vector, which for SIFT vectors is 128. Random matrix \(\textbf{W}^{L\times N} = (\textbf{W}_1, \textbf{W}_2, \ldots, \textbf{W}_N)^T\) consists of a set of approximately orthonormal basis vectors, where all elements are generated as \(W_{i}[j] \sim \mathcal{N}(0, \frac{1}{N})\), \(1 \le i \le N, 1 \le j \le L\), and as such behaves as an approximate orthoprojector. Finally, binarization is simply done by extracting and storing the sign of all individual elements of all projected SIFT vectors: \[ \begin{align} \textbf{b}_{\textbf{x}^{k}(m)} =& \{sign(\tilde{x}^{k}(m)[1]),sign(\tilde{x}^{k}(m)[2]), \notag \\ &\dots,sign(\tilde{x}^{k}(m)[L]) \}, \end{align} \] where for \(i \in \{1 \ldots L\}\), \(b^{k}_{\textbf{x}(m)}[{i}] \in \{0, 1\}\) and \(\forall a, sign(a) = 1\) if \(a \ge 0\) and 0 otherwise. The binarized projected descriptors are then stored in the database in the bin corresponding to image index \(m\).

Implementation

Random projections are implemented in the function m_pr.m. Example usage is W = m_rp(L, N) where L is the Number of rows of W, or input dimension and N is the Number of columns of W, or the number of dimensions to project to.

Testing

There are three basic tests

Testing intra image descriptor statistics.
Testing channel distortion statistics.
Testing inter image descriptor statistics.

Intra Image descriptor statistics

Channel distortions

Inter image descriptor statistics

Results

Correlations within individual descriptors before and after projection

Below are the average normalized Autocorrelation function values of all descriptors before and after projection. This shows the correlation within each individual binarized projected descriptor, which after projection is virtually zero. These descriptors behave as if generated by \(\mathcal{B}(L, 0.5)\).

Some residual correlation is still present between all descriptor vectors. This follows from the fact that Random Projections are fast, but not optimal. Also, as SIFT descriptors are designed for robustness, equal image patches will result in similar SIFT descriptors in the dataset. Correlation between binarized projected SIFT vectors can be modelled as \(\mathcal{B}(L, 0.3)\).

Channel distortions

If we look at the statistics of binarized projected descriptors when matching identical but distorted images against eachother, we see that the behaviour of the original descriptors carries into the binarized projected domain. The matches in the original domain are taken as a ground truth. Here we have transformed all images with a class 2 isometrie transform with scale and angle parameters \( (s = 0.2, \theta = 10) \)

Intra image matching

If we look at the resulting Hamming distances when we match the binarized projected descriptors from a query image against all descriptors from an image repository, we get the left figure. The expected Hamming distance is 30. As this is lower than the average Hamming distance between binarized projected descriptors from identical but distorted images the Hamming distance can not be used solely in this dataset to identify distorted images. Deployment of a simple heuristic that samples points for geometric consistency enhances the situation considerably.

Example usage

Download

Paper

M. Diephuis, S. Voloshynovskiy, O. Koval, and F. Beekhof, "Statistical Analysis of Binarized SIFT Descriptors," in Proc. 7th International Symposium on Image and Signal Processing and Analysis, Dubrovnik, Croatia, September 4-6, 2011. [pdf|bib], [Presentation]

Matlab Code

demo.zip

Dependencies

The code uses the following libraries:

VLFEAT by Andrea Vedaldi and Brian Fulkerson
XML_IO_tools by Jaroslaw Tuszynski

References and Dependencies

Please site as:

@inproceedings { DiephuisSeptember4-62011,
    author = { M. Diephuis and S. Voloshynovskiy and O. Koval and F. Beekhof },
    title = { Statistical Analysis of Binarized SIFT Descriptors },
    booktitle = { 7th International Symposium on Image and Signal Processing and Analysis },
    year = { September 4-6, 2011 },
    address = { Dubrovnik, Croatia },
    timestamp = { 2011.07.28 }
}