During the summer of 2013, I took an internship with the Web and Internet Science (WAIS) research group at the University of Southampton. For that summer, it became my responsibility to create the group’s entry for the Placing Task that formed part of the MediaEval multimedia benchmarking initiative. The University hadn’t made any entries to the MediaEval evaluation before, so the pressure was on to create a good first impression!
The main goal of the MediaEval Placing Task was to create a system that could predict where a photograph was taken, first given just the image itself and then with the image combined with metadata such as tags and descriptions. To assess how well the different entries performed, each group was supplied with the same set of 262,000 photographs and had to provide an estimated location for each one. The MediaEval organisers held the actual locations for these photographs, so they could determine which system produced the most accurate guesses.
I was lucky enough to be able to get help and guidance from Jon Hare and Sina Samangooei, the genius researchers behind the OpenIMAJ project. Together we devised a system that would use a probability density function over the surface of the Earth to determine the location with the highest probability for each photograph, according to a number of features.
Flickr provided the participants of this task with a training dataset of approximately 8.5 million images and their metadata, along with their actual locations. After trialling a number of different features extracted from both the images themselves and their associated metadata, we finally settled on what we believed to be the most useful:
- Tags: Every tag in the query image was associated with the coordinates of the training images that had that tag.
- Image features: We decided to use two very different features extracted from the images themselves. We used product-quantised CEDD features to achieve a low-precision/high-recall feature, and we used LSH-hashed SIFT features to achieve a high-precision/low-recall feature. For both features, the locations of the images from the training set with the most similar features were used.
- Location prior: Just by looking at the locations of the images in the training set, it is possible to build an idea of where in the world photographs are taken most often. This was used as a prior, and contributed to the probabilities for each query image.
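To give a rough feel for how these sources of evidence can be combined, here is a minimal sketch in Python. It is a simplified illustration under my own assumptions rather than our actual implementation: the density is approximated by a coarse latitude/longitude grid, and the names `estimate_location`, `cell_deg` and `prior_weight` are hypothetical, as are the weights.

```python
from collections import defaultdict


def estimate_location(feature_points, prior_points, cell_deg=1.0, prior_weight=0.1):
    """Return the centre of the grid cell with the highest accumulated weight.

    feature_points: dict mapping a feature name (e.g. "tags") to a list of
        (lat, lon) pairs taken from matching training images.
    prior_points: list of (lat, lon) pairs sampled from the training set.
    cell_deg and prior_weight are illustrative assumptions, not tuned values.
    """
    def cell(lat, lon):
        # Floor division keeps negative coordinates in the correct cell.
        return (int(lat // cell_deg), int(lon // cell_deg))

    density = defaultdict(float)

    # Every matching training image votes for the cell it was taken in.
    for points in feature_points.values():
        for lat, lon in points:
            density[cell(lat, lon)] += 1.0

    # The location prior adds a smaller amount of weight everywhere
    # that training photographs were taken.
    for lat, lon in prior_points:
        density[cell(lat, lon)] += prior_weight

    if not density:
        return None

    (row, col), _ = max(density.items(), key=lambda item: item[1])
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)
```

A grid is the crudest possible density estimate, but it shows the principle: each feature contributes candidate locations, the prior nudges the result towards popular places, and the estimate is the point where the combined density peaks.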
Each team taking part in the Placing Task was allowed to submit up to 5 separate ‘runs’ to compete with — that is, each team could use their system to generate locations for the set of 262,000 test images up to 5 times using different configurations.
We submitted 5 runs using different combinations of features. In our best run, which used the location prior, tags and LSH-SIFT features, we managed to place 26.17% of the test images within 1km of where they had actually been taken. This was the best result achieved by any team that entered the competition, and so we won the Placing Task!
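For anyone curious how a figure like "within 1km" can be computed, the sketch below shows one way to do it in Python. It assumes the haversine great-circle distance on a spherical Earth with a mean radius of 6371 km; I am not claiming this is the exact formula the organisers used, and the function names are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(predicted, actual, radius_km=1.0):
    """Fraction of predictions that fall within radius_km of the true location."""
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (alat, alon) in zip(predicted, actual)
        if haversine_km(plat, plon, alat, alon) <= radius_km
    )
    return hits / len(predicted)
```

Calling `fraction_within(predicted, actual)` with two equal-length lists of `(lat, lon)` pairs gives the proportion of images placed within 1km, which can be multiplied by 100 to get a percentage.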
At the end of the summer I went to Barcelona and presented our work at the MediaEval 2013 conference in front of all the other participants. Our success was even featured on the University website!