RECfusion is a framework for the automatic processing of video data from many devices, such as smartphones, tablets, webcams, and surveillance cameras, all assumed to be connected to a 4G LTE network. Exploiting this mobile ultra-broadband connection, the communication paradigm between users in the social media context can be augmented: at events such as concerts, festivals, and expos, users become both producers and consumers of video data. RECfusion analyzes video streams from several devices and infers their semantics through scene understanding. Key scenes are identified by relating each video stream to all the others; the system then generates a video rendered as a mix of the selected video streams. In Ref. [1], a system based on visual content popularity was implemented in RECfusion. In this work we propose an extension to RECfusion: a novel automatic video cluster tracking algorithm that identifies the different scenes in the gathered video streams and selects the best recording device for each of them.