Speeding up the multimedia feature extraction: a comparative study on the big data approach

Authors

MERA PÉREZ, David; BATKO, Michal; ZEZULA, Pavel

Year of publication 2017
Type Article in Periodical
Magazine / Source Multimedia Tools and Applications
MU Faculty or unit

Faculty of Informatics

Citation
Web http://dx.doi.org/10.1007/s11042-016-3415-1
DOI http://dx.doi.org/10.1007/s11042-016-3415-1
Field Informatics
Keywords Big data; Image feature extraction; MapReduce; Apache Storm; Apache Spark; Grid computing
Description The current explosion of multimedia data is significantly increasing the amount of potential knowledge. However, getting to the actual information requires applying novel content-based techniques, which in turn require time-consuming extraction of indexable features from the raw data. To deal with large datasets, this task needs to be parallelized. However, there are multiple approaches to choose from, each with its own benefits and drawbacks. Several parameters must also be taken into consideration, for example the amount of available resources, the size of the data, and their availability. In this paper, we empirically evaluate and compare approaches based on Apache Hadoop, Apache Storm, Apache Spark, and Grid computing, employed to distribute the extraction task over an outsourced and distributed infrastructure.
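
To illustrate what distributing the extraction task means in practice, the following is a minimal sketch of one of the compared approaches (Apache Spark); it is not the paper's implementation, and the HDFS paths and the extractFeatures() helper are hypothetical placeholders for an actual descriptor extractor.

```java
// Minimal sketch (not the paper's implementation) of distributing image
// feature extraction with Apache Spark. The HDFS paths and the
// extractFeatures() helper are hypothetical placeholders.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.input.PortableDataStream;

public class DistributedFeatureExtraction {

    // Hypothetical extractor: turns raw image bytes into a descriptor vector.
    static double[] extractFeatures(byte[] imageBytes) {
        // descriptor computation (e.g., a global or local visual feature) goes here
        return new double[0];
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("feature-extraction");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Load each image as a (path, binary content) pair; the collection
            // is split into partitions that executors process in parallel.
            JavaPairRDD<String, PortableDataStream> images =
                    sc.binaryFiles("hdfs:///images");

            // Apply the extractor to every image independently.
            JavaPairRDD<String, double[]> features =
                    images.mapValues(stream -> extractFeatures(stream.toArray()));

            // Persist the descriptors for later content-based indexing.
            features.saveAsObjectFile("hdfs:///features");
        }
    }
}
```

The same per-image map step could equally be expressed as a Hadoop MapReduce mapper, a Storm bolt, or an independent Grid job, which is the design space the paper compares.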
