On-the-fly reconstruction of 3D indoor environments has recently become an important research field, as it provides situational awareness for first responders such as police and defence officers. Their operational protocols prohibit the deployment of active sensors (LiDAR, ToF, IR cameras) to avoid the risk of exposure. Passive sensors, such as stereo cameras or moving monocular cameras, are therefore the only viable options for 3D reconstruction. At present, even the best portable stereo cameras produce inaccurate depth images because of their small camera baseline, which makes reconstructing a complete scene from these depth images a challenging task. In this paper, we present a real-time ROS-based system for first responders that performs semantic 3D indoor reconstruction based purely on stereo camera imaging. The major components of the ROS system are depth estimation, semantic segmentation, SLAM, and 3D point-cloud filtering. First, we improve the semantic segmentation by training the DeepLab V3+ model [9] on a filtered combination of several publicly available semantic segmentation datasets. Second, we propose and evaluate several noise-filtering techniques on both the depth images and the generated point clouds. Finally, we embed semantic information into the mapping procedure to obtain an accurate 3D floor plan. The resulting semantic reconstruction provides important clues about the interior structure of an unseen building, which can be used for navigation.
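The link between a small stereo baseline and inaccurate depth can be illustrated with the standard pinhole stereo model, in which depth is Z = f·B/d and a disparity error δd leads to a first-order depth error of roughly Z²/(f·B)·δd. The following is a minimal sketch of that relation only, not the depth estimation used in the system. The function names, the focal length, the baselines, and the disparity error are hypothetical values chosen for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo model: Z = f * B / d (depth in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_err_px=0.5):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    focal_px = 700.0  # hypothetical focal length in pixels
    depth_m = 4.0     # hypothetical distance to a wall in metres
    # Hypothetical baselines: the error shrinks as the baseline grows.
    for baseline_m in (0.06, 0.12, 0.24):
        err = depth_error(depth_m, focal_px, baseline_m)
        print(f"baseline {baseline_m:.2f} m: depth error at {depth_m} m = {err:.3f} m")
```

Under this model the depth error grows quadratically with distance and is inversely proportional to the baseline, so halving the baseline doubles the expected error at a given range.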