NRP 75 "Big Data" Project: BigAstro

People from SIP



People from FHNW


Astronomical observations produce data volumes in excess of several TB per day. Clearly, even a fraction of such data cannot be analyzed manually. In this project, we will investigate the use of big data analytics tools, such as machine learning techniques, applied to astronomical data. Specifically, we will consider observations of solar flares: magnetic eruptions that influence the whole solar system and cause space weather phenomena on Earth, such as blackouts and disruptions to aircraft communication and GPS positioning.

So far, flares are neither fully understood nor reliably predictable. The patterns found in flares and in their temporal evolution are diverse and highly complex, and the data volume is too large to analyze manually. A major first step towards a better understanding of the underlying physics and towards space weather forecasting is to systematically identify, collect, and characterize the different spatio-temporal patterns in solar physics data. For this, efficient big data analytics tools such as machine learning techniques are crucial.

We propose an interdisciplinary approach to set up, customize, and optimize analytics capabilities for big data applications in astronomy. The project team consists of astronomers, experts in machine learning and statistical image processing, and specialists in data management systems for Big Data astronomy projects. This combination of expertise will substantially raise the level of the science questions that can be addressed and, in turn, lead to a leap forward in the understanding of the physics of solar flares and in the quality of space weather predictions.

In a first step, various existing state-of-the-art machine learning techniques for clustering, classification, and outlier detection will be applied. In a second step, we will develop algorithms tailored to the science questions to be addressed and to the statistics of the input data. For the processing and analysis of the data, we will set up a big data analytics system suited to optimizing machine learning algorithms for use cases in astronomy as well as in other science domains.

Although developed for solar data, our results could be of interest not only for any kind of astronomical data but also for other applications that share large amounts of unlabeled, unstructured data produced daily by distributed sources. Several domains will benefit from our results:

  1. Solar Physics, by developing models for flare analysis and prediction from data that currently cannot be fully exploited because of their Big Data nature,
  2. Machine Learning and Image Processing by contributing to both theory and practice, and
  3. Applications with Big Data, by setting up a framework to analyze and classify large datasets that is of high value not only to other domains in astronomy but also, for example, to genomics and medical diagnostics.

Video of presentation

Taken at the NRP 75 "Big Data" kick-off meeting, 9 May 2017