When Volunteered Geographic Information consists of sensor data, we often have to deal with readings from inaccurate, noisy and differently calibrated sensors from a variety of manufacturers. In general, the data quality of these sensors cannot compete with that of expensive, professional sensors. However, such low-cost sensors are still needed to keep data collection affordable for a wide range of volunteers, which in turn makes a large amount of data available.

In our group work in Heidelberg, we investigated methods for improving sensor data quality and consistency. We treated the problem as a regression task in which we tried to find a mapping from the readings of one set of sensors to those of another. The sensor readings were modelled as time series, and the task was to predict one series from the other.
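As a minimal illustration of this regression framing, the sketch below fits a simple gain-and-offset mapping from one time-aligned series to another by ordinary least squares. It is not the model used in the group work, which is not specified here; the function names `fit_linear_map` and `predict` and the toy readings are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def fit_linear_map(source: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of target ~ gain * source + offset for two aligned series."""
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two aligned series with at least two samples")
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    var_s = sum((s - mean_s) ** 2 for s in source)
    if var_s == 0:
        raise ValueError("source series is constant; gain is undefined")
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    gain = cov / var_s
    offset = mean_t - gain * mean_s
    return gain, offset


def predict(source: Sequence[float], gain: float, offset: float) -> List[float]:
    """Map every source reading into the target sensor's scale."""
    return [gain * s + offset for s in source]


# Hypothetical toy data: sensor B reports sensor A's signal scaled and shifted.
sensor_a = [0.0, 1.0, 2.0, 3.0, 4.0]
sensor_b = [1.0, 3.0, 5.0, 7.0, 9.0]

gain, offset = fit_linear_map(sensor_a, sensor_b)
predicted_b = predict(sensor_a, gain, offset)
```

A per-sample linear map like this ignores temporal context; a model that also uses neighbouring samples would be a natural next step for time series.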

We had two datasets to test our approaches:

  1. Sensor measurements from an affordable, self-built sensor box together with black carbon values measured by a professional sensor. The data stem from previous research, and we tried to predict the black carbon values from the readings of the self-built sensor box.
  2. Self-recorded time series of acceleration values measured by two smartphones that were kept in the same pocket. Here we tried to predict the sensor data of one smartphone from those of the other. This idea can be used either to improve data quality by means of a reference sensor or to improve the consistency among different sensors.

We tried various machine learning techniques to solve these tasks. A lot of work went into finding the best parameters for these methods and into speeding up our calculations. So far, our results have been promising: for the first dataset we came close to the baseline, while for the second we obtained predictions that are much closer to the ground-truth sensor than the two original sensors were to each other.
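The comparison for the second dataset can be sketched as follows: compute the error between the raw second sensor and the reference, then the error between the prediction and the reference, and check whether the latter is smaller. The error metric actually used in the group work is not stated here; root-mean-square error is assumed for illustration, and all values below are made-up toy numbers, not measured results.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square error between two aligned series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical toy values (not results from the study).
reference = [1.0, 2.0, 3.0, 4.0]   # ground-truth sensor
raw_other = [1.5, 2.5, 3.5, 4.5]   # second sensor, uncorrected
predicted = [1.1, 1.9, 3.1, 3.9]   # model output for the reference sensor

raw_error = rmse(raw_other, reference)     # how far the two sensors disagree
model_error = rmse(predicted, reference)   # how far the prediction is off
prediction_is_closer = model_error < raw_error
```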

In the future we want to beat the baseline for the first dataset by using more elaborate prediction models (e.g. sequential models). We also plan to use these techniques to improve the consistency of sensor data from various smartphone manufacturers.