Sohrab Ferdowsi

 

Biography

Sohrab Ferdowsi is a post-doctoral fellow with the Stochastic Information Processing (SIP) group of the Department of Computer Science at the University of Geneva. He obtained his PhD in 2018 in the same group, where he had been a research and teaching assistant since 2012 under the supervision of Prof. Slava Voloshynovskiy.

Prior to that, he obtained his MSc in Biomedical Engineering in 2012 from the Department of Electrical Engineering of Sharif University of Technology in Tehran.

Current Activities

As a postdoc hired within an industrial project funded by Innosuisse in partnership with U-NICA, I am responsible for improving the performance of an object authentication system using Convolutional Neural Networks (CNNs). The goal is to authenticate consumer-product packages from images taken with hand-held phone cameras. Along with classical image processing concepts, I’m using CNNs to learn the specificities of the printing and acquisition technologies, which helps distinguish authentic packages from fake ones.
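To give a flavour of the kind of model involved, below is a minimal, purely illustrative sketch of a small binary CNN that scores image patches as authentic or fake; the architecture, patch size and names are placeholders of my own and not the actual system developed in the project.

```python
# Illustrative only: a small binary CNN for "authentic vs. fake" image patches.
# The real system, its architecture and its data pipeline are not shown here.
import torch
import torch.nn as nn

class PackageAuthNet(nn.Module):
    """Toy CNN mapping an RGB patch to a single authenticity score (logit)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, 1)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))       # raw logit; pair with BCEWithLogitsLoss

model = PackageAuthNet()
dummy_patches = torch.randn(8, 3, 64, 64)          # a batch of hypothetical 64x64 patches
logits = model(dummy_patches)
loss = nn.BCEWithLogitsLoss()(logits.squeeze(1), torch.ones(8))  # toy labels: 1 = authentic
```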

Apart from the industrial project, and as a member of the SIP group at UNIGE, I also do academic research at the intersection of signal processing and machine learning. I believe this is a very promising direction, as many related concepts have been developed quite independently within these two disciplines, and their joint treatment can now bring about very powerful synergies. Below I summarize some of the highlights of my research.

Research Interests

Generally speaking, I work with high-dimensional and usually vectorial data such as digital images, and my research sits somewhere at the intersection of signal processing and (deep) machine learning. This is sometimes accompanied by a touch of information theory, which I find very relevant to, and useful for, machine learning research.

More specifically, my focus is on the following four related topics, along with a couple of side and hobby projects:

  • (Unsupervised) representation learning: While human-annotated data is very expensive, unlabelled data is abundant and often easily available from various sources. It is therefore highly desirable to make efficient use of this readily available source of information, a task which, as of today, is still much less successful than supervised machine learning. The idea is to learn task-agnostic representations of such data, so that they are useful (a priori) for a range of other data processing tasks, perhaps in the presence of further side information like class labels. To achieve this, I argue that the most natural objective is “multi-rate lossy compression”, i.e., learning to compress, e.g., images by targeting the best rate-distortion trade-offs. Towards this end, I dedicated a good portion of my PhD thesis to solutions that learn to compress vectorial data (a toy illustration of the rate-distortion objective follows this list).
  • Large-scale similarity search: A fundamental idea in computer science is to associate semantic similarity in the real world with neighbourhood in the space of feature vectors. For example, a content-based image retrieval system maps images with similar semantics to real-valued feature vectors that have small distances to each other. A major difficulty in these systems, however, is that both the number of vectors and their dimensionality are usually large: for each image queried by a user, the retrieval system has to find similar-looking images among, say, a billion registered images in its database, each represented as a vector of around a thousand dimensions. This makes the complexity of direct, exact search prohibitive. In practice, therefore, approximate solutions that sacrifice exactness of search for memory and computational gains have to be adopted. My research focuses on finding the best trade-offs to make this happen. Due to the sheer scale of the problem, many standard techniques are not applicable; for example, classical discriminative machine learning is impractical, as the cost of 1-vs-all discrimination would be very high. In my research, I came up with the framework of Sparse Ternary Codes (STC), an interesting alternative to existing solutions like binary hashing (a toy version of ternary encoding is sketched after this list).
  • Solving (ill-posed) inverse problems: In many applications in science and technology, it is desired to “undo” what nature, physics or a (faulty) instrument has done. For example, a low-quality camera captures an image of a scene contaminated with ambient noise that we want to remove, or a CT scanner measures the X-ray radiation it sends through a patient’s body, and we are interested in reconstructing the image of the body itself. An important difficulty with these problems is that they are usually ill-posed; in other words, we usually have fewer measurements than are mathematically required to invert the process, e.g., we only have one noisy image to denoise, or we must solve an under-determined system of linear equations from the radiation projections. The solution should therefore incorporate a priori knowledge about the phenomenon in question along with the available measurements; e.g., the fact that natural images are smooth, or that CT images are sparse under some bases, should come to our help. The quality of the solutions to inverse problems is therefore largely dependent on the quality of the priors. In my research, I am considering different alternatives; e.g., in line with the representation learning problem discussed above, I am proposing “compressibility as a prior”, i.e., to favour solutions that are more compressible under a network that we have trained beforehand (the idea is outlined in a sketch after this list).
  • Developing more efficient architectures for deep learning: Particularly in the last couple of years, and in spite of its much longer dormant history, deep learning has very rapidly turned into a mature and leading technology in many areas. Its success, among other factors like computational power and the availability of massive data, depends on the development of more efficient architectures and of techniques that make optimization faster. In my opinion, better architectures can be achieved by incorporating better domain knowledge into the standard pipeline. For example, the very fact that natural images have translation-invariant properties makes 2D convolution a very successful operation in image processing. Incorporating this classical operation into the deep learning pipeline resulted in the famous and successful CNN, making learning on images much more efficient and requiring far fewer samples than baseline fully-connected networks. I believe there are potentially many more instances of this general idea that can help us do better deep learning. Inspired by my background in signal processing and the experience I’ve recently gained in deep learning, I’m trying to develop better alternatives to the standard CNN. In my PhD thesis, I designed a pilot network whose weights, rather than being initialized randomly as in the standard approach, are initialized with analytical solutions (illustrated in the last sketch after this list). As a proof of concept, I show that, at least under a particular setup, this helps considerably with sample efficiency and with faster optimization convergence. I’m now trying to make this idea applicable to broader setups and more instances.
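For the representation-learning item above, here is a cartoon of the rate-distortion objective: an autoencoder trained to minimize a distortion term plus a crude rate surrogate (an L1 penalty that encourages sparse, hence cheaply encodable, latents). It is only a toy stand-in for the multi-rate compression schemes developed in the thesis.

```python
# A cartoon of learning to compress with a rate-distortion trade-off.
# Distortion = MSE; "rate" is approximated by an L1 penalty on the latent code.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

lam = 0.01                       # controls where we sit on the rate-distortion curve
x = torch.rand(128, 784)         # placeholder batch of flattened images

for _ in range(10):              # a few toy iterations
    z = encoder(x)
    x_hat = decoder(z)
    distortion = ((x - x_hat) ** 2).mean()
    rate_proxy = z.abs().mean()  # crude surrogate for the bit-rate of the code
    loss = distortion + lam * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
```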
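For the similarity-search item, the next sketch is a toy version of ternary encoding: vectors are projected, only large-magnitude components are kept as +1/-1 and the rest are zeroed, and database items are ranked by correlation of their sparse codes. The projection, threshold and dimensions are illustrative and do not reproduce the exact STC construction.

```python
# Toy ternary encoding for approximate similarity search (not the exact STC framework).
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 512, 256, 10_000                       # input dim, code length, database size
W = rng.standard_normal((m, d)) / np.sqrt(d)     # random projection
tau = 1.0                                        # sparsity threshold

def ternarize(X):
    P = X @ W.T
    return np.sign(P) * (np.abs(P) > tau)        # values in {-1, 0, +1}

database = rng.standard_normal((n, d))
codes = ternarize(database)

query = database[42] + 0.1 * rng.standard_normal(d)  # a noisy version of item 42
q_code = ternarize(query[None, :]).ravel()

scores = codes @ q_code                          # sparse ternary correlation
print(int(np.argmax(scores)))                    # ideally 42
```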
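For the inverse-problems item, the following outlines “compressibility as a prior” on a synthetic under-determined linear problem: a gradient step on the data-fidelity term is alternated with a pass of the iterate through an autoencoder standing in for a network pre-trained to compress. Here the network is untrained, so the snippet only shows the algorithmic skeleton, not a working reconstruction method.

```python
# Skeleton of "compressibility as a prior" for an ill-posed problem y = A x.
import torch
import torch.nn as nn

d, k = 256, 64                          # signal dimension, number of measurements
A = torch.randn(k, d) / d ** 0.5        # under-determined forward operator
x_true = torch.randn(d)
y = A @ x_true                          # noiseless measurements for simplicity

autoencoder = nn.Sequential(            # placeholder for a network trained to compress
    nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d)
)

x = torch.zeros(d)
step = 0.5
for _ in range(50):
    grad = A.T @ (A @ x - y)            # gradient of 0.5 * ||A x - y||^2
    x = x - step * grad                 # data-fidelity step
    with torch.no_grad():
        x = autoencoder(x)              # bias the iterate towards compressible signals
```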
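Finally, for the architectures item, a minimal illustration of analytical rather than random initialization: the weights of a convolutional layer are set to 2D DCT basis filters before any training. The choice of the DCT and the layer sizes are illustrative; the construction in the thesis differs in its details.

```python
# Initializing a conv layer with an analytical (2D DCT) basis instead of random weights.
import math
import torch
import torch.nn as nn

def dct2_filters(k=8):
    """Return the k*k orthonormal 2D DCT-II basis as (k*k, 1, k, k) filters."""
    n = torch.arange(k, dtype=torch.float32)
    basis = torch.cos(math.pi * (n[None, :] + 0.5) * n[:, None] / k)  # basis[u, x]
    basis[0] *= 1 / math.sqrt(2)
    basis *= math.sqrt(2.0 / k)
    filters = torch.einsum('ux,vy->uvxy', basis, basis)   # outer products of 1D bases
    return filters.reshape(k * k, 1, k, k)

conv = nn.Conv2d(1, 64, kernel_size=8, stride=8, bias=False)
with torch.no_grad():
    conv.weight.copy_(dct2_filters(8))       # analytical instead of random init

x = torch.randn(4, 1, 32, 32)                # a toy batch of grayscale images
coeffs = conv(x)                             # block-DCT-like features, shape (4, 64, 4, 4)
```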

Please take a look at my PhD thesis here, where I explain these ideas and concepts in much more detail and with greater rigour.

 

Publications

You may find my publications on my Google Scholar page or through the SIP group’s publication page.

Profiles

Most of my time is spent on coding, and I try to open-source my code as much as possible. I cannot open-source the industrial projects, and much of my ongoing work lives in private repositories. Still, you may find some of my past work on my GitHub page, where new material will be added soon.

I also have profiles on LinkedIn, Google Scholar and ResearchGate.

Teaching Activities

Currently, I'm not doing any teaching. During my PhD, I was a TA for the following courses, where I sometimes took an active part in the curriculum design:

  • Elements of information theory (Éléments de la théorie de l'information) - a bachelor's course (in French) teaching the basic concepts of information theory
  • Analysis and processing of information - a master's course (in English) teaching concepts from signal processing and linear algebra