In recent years, several deep learning-based architectures have been proposed to compress Light Field (LF) images as pseudo video sequences. However, most of these techniques employ conventional compression-focused networks. In this paper, we introduce a version of a previously designed deep learning video compression network, adapted and optimized specifically for LF image compression. We enhance this network by incorporating an in-loop filtering block, along with additional adjustments and fine-tuning. By treating LF images as pseudo video sequences and deploying our adapted network, we address the challenges presented by the unique features of LF images, such as high resolution and large data sizes. Our method compresses these images effectively, preserving their quality and unique characteristics. With thorough fine-tuning and the inclusion of the in-loop filtering network, our approach shows improved performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index Measure (MSSIM) compared to existing techniques. Our method provides a feasible path for LF image compression and may contribute to the emergence of new applications and advancements in this field.
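For concreteness, the pseudo-video arrangement mentioned above can be sketched as follows. The 4D array layout, the view-grid size, and the serpentine scan order are illustrative assumptions, since the abstract does not specify the traversal used.

```python
import numpy as np

def lf_to_pseudo_video(lf: np.ndarray) -> np.ndarray:
    """Flatten a 4D light field (U, V, H, W[, C]) of sub-aperture views
    into a frame sequence using a serpentine (zigzag) scan, so that
    consecutive frames stay spatially adjacent and easy to predict."""
    u, v = lf.shape[:2]
    frames = []
    for i in range(u):
        cols = range(v) if i % 2 == 0 else range(v - 1, -1, -1)
        for j in cols:
            frames.append(lf[i, j])
    return np.stack(frames)  # (U*V, H, W[, C]) pseudo video sequence

# Example (illustrative dimensions): an 8x8 grid of RGB views -> 64 frames
video = lf_to_pseudo_video(np.zeros((8, 8, 434, 625, 3), dtype=np.uint8))
```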
Acquisitions of mass-to-charge (m/z) spectrometry data from tissue samples at high spatial resolutions, using Mass Spectrometry Imaging (MSI), can require hours to days. The Deep Learning Approach for Dynamic Sampling (DLADS) and Supervised Learning Approach for Dynamic Sampling with Least-Squares (SLADS-LS) algorithms follow compressed sensing principles to minimize the number of physical measurements performed, generating low-error reconstructions from spatially sparse data. Measurement locations are actively determined during scanning, according to which locations a machine learning model estimates will provide the most relevant information for the intended reconstruction process. Preliminary results for DLADS and SLADS-LS simulations with Matrix-Assisted Laser Desorption/Ionization (MALDI) MSI match the prior 70% throughput improvements achieved in nanoscale Desorption Electro-Spray Ionization (nano-DESI) MSI. A new multimodal DLADS variant incorporates optical imaging for a 5% improvement in final reconstruction quality, with DLADS holding a 4% advantage over SLADS-LS in regression performance. Further, a Forward Feature Selection (FFS) algorithm replaces expert-based determination of the m/z channels targeted during scans, with negligible impact on location selection and reconstruction quality.
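The active measurement-selection loop shared by SLADS-LS and DLADS can be sketched roughly as below. The seed count, the `sample_fn` instrument callback, and the `predict_erd` regressor are placeholders standing in for the trained models and hardware; this is a minimal sketch of the greedy selection principle, not the published implementation.

```python
import numpy as np

def dynamic_sampling(sample_fn, predict_erd, shape, budget):
    """SLADS/DLADS-style greedy loop (illustrative): repeatedly measure
    the location whose Estimated Reduction in Distortion (ERD), given by
    a learned model `predict_erd(values, mask)`, is highest.
    `sample_fn(y, x)` performs one physical measurement."""
    mask = np.zeros(shape, dtype=bool)
    values = np.zeros(shape)
    # Seed with a small random set of measurements (25 is arbitrary here)
    seeds = np.random.default_rng(0).choice(shape[0] * shape[1], 25, replace=False)
    for s in seeds:
        y, x = divmod(s, shape[1])
        mask[y, x], values[y, x] = True, sample_fn(y, x)
    while mask.sum() < budget:
        erd = predict_erd(values, mask)   # per-pixel ERD map
        erd[mask] = -np.inf               # never remeasure a location
        y, x = np.unravel_index(np.argmax(erd), shape)
        mask[y, x], values[y, x] = True, sample_fn(y, x)
    return values, mask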
Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person's expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often ignored Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and adds minimal computational cost compared with per-frame RGB-only methods. This could lead to new, real-time, temporally-aware DeepFake detection methods for video calls and streaming.
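A minimal sketch of how such an MV-based detector might be structured is given below. The network shape and the three-channel input layout (MV dx, MV dy, Information Mask) are assumptions, and extraction of the vectors from the H.264 bitstream (e.g., via FFmpeg's +export_mvs flag) is assumed to happen upstream; the paper's actual architecture is not specified in the abstract.

```python
import torch
import torch.nn as nn

class MVDeepFakeNet(nn.Module):
    """Classify a frame from its H.264 motion-vector field rather than
    its RGB pixels. Input is (B, 3, H, W): MV dx, MV dy, and the
    Information Mask (e.g., 1 where a macroblock carries no MV)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # real vs. fake logits

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# MV grids are much smaller than full RGB frames, hence the low cost
logits = MVDeepFakeNet()(torch.randn(4, 3, 64, 112))
```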
Advances in AI allow for the creation of fake images. These techniques can be used to fake mammograms, which could impact patient care and medicolegal cases. One method to verify that an image is original is to confirm its source. We develop a deep-learning algorithm (DeepMammo), based on CNNs and FCNNs, to identify the machine that created any given mammogram. We analyze mammograms of 1574 patients obtained on 7 different mammography machines and randomly split the dataset by patient into training/validation (80%) and test (20%) datasets. DeepMammo achieves an accuracy of 98.09% and an AUC of 95.96% on the test dataset.
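The patient-wise 80/20 split described above can be sketched with scikit-learn's GroupShuffleSplit, which keeps all images from one patient on the same side of the split so the test set contains only unseen patients. The arrays below are hypothetical stand-ins for the per-image machine labels and patient IDs.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical parallel arrays: one entry per mammogram
rng = np.random.default_rng(0)
n = 1000
machine_labels = rng.integers(0, 7, n)   # which of 7 machines produced it
patient_ids = rng.integers(0, 300, n)    # several images per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
train_idx, test_idx = next(splitter.split(np.zeros(n), machine_labels,
                                          groups=patient_ids))
# All images of a given patient land on exactly one side of the split
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```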
Reflectance Transformation Imaging (RTI) is a technique that provides an enhanced visualization experience. Current RTI acquisition methods are time-consuming and computationally expensive. This work investigates the idea of determining the best light positions for RTI acquisition using surface topography. We propose automating RTI acquisition by estimating the surface topography with a deep learning method, followed by estimating light positions with an unsupervised clustering method. This is a one-shot method that requires only a single image. We also created a synthetic RTI dataset in order to carry out the experiments. We found that surface topography alone is not sufficient to estimate the best light positions for RTI without imposing additional constraints.
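One plausible reading of the clustering step is sketched below: k-means over the per-pixel surface normals, with each cluster centre taken as a candidate lighting direction. The choice of k-means and the number of lights are assumptions, not details given in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def light_positions_from_normals(normals: np.ndarray, n_lights: int = 20):
    """Cluster per-pixel surface normals (H, W, 3), e.g. predicted by a
    deep normal-estimation network from a single image, and treat each
    cluster centre as a candidate light direction for RTI acquisition."""
    n = normals.reshape(-1, 3)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)  # unit vectors
    centres = KMeans(n_clusters=n_lights, n_init=10,
                     random_state=0).fit(n).cluster_centers_
    return centres / np.linalg.norm(centres, axis=1, keepdims=True)

lights = light_positions_from_normals(np.random.rand(64, 64, 3), n_lights=12)
```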
We introduce a physics-guided, data-driven method for image-based multi-material decomposition of dual-energy computed tomography (CT) scans. The method is demonstrated on CT scans of virtual human phantoms containing more than two types of tissues. The method is a physics-driven supervised learning technique. We take advantage of the higher mass attenuation coefficient of dense materials relative to muscle tissue to perform a preliminary extraction of the dense material from the images using unsupervised methods. We then apply supervised deep learning to the images, preprocessed with the extracted dense-material map, to obtain the final multi-material tissue map. The approach is demonstrated on simulated breast models with calcifications as the dense material placed amongst the muscle tissues. The physics-guided machine learning method accurately decomposes the various tissues from the input images, achieving a normalized root-mean-squared error of 2.75%.
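A minimal sketch of the unsupervised extraction step and of the error metric quoted above follows. The abstract does not say which unsupervised method is used, so a simple Otsu intensity threshold stands in here as an assumption; it exploits the same physics, namely that dense materials attenuate far more than muscle.

```python
import numpy as np
from skimage.filters import threshold_otsu

def extract_dense_material(img: np.ndarray) -> np.ndarray:
    """Preliminary unsupervised step (illustrative): isolate
    high-attenuation dense material, e.g. calcifications, by
    thresholding, before the supervised network sees the image."""
    return (img > threshold_otsu(img)).astype(np.float32)

def nrmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Normalized root-mean-squared error, the metric quoted above."""
    return float(np.sqrt(np.mean((pred - target) ** 2))
                 / (target.max() - target.min()))
```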
Scientific user facilities present a unique set of challenges for image processing due to the large volume of data generated from experiments and simulations. Furthermore, developing and implementing algorithms for real-time processing and analysis, while correcting for any artifacts or distortions in images, remains a complex task, given the computational requirements of the processing algorithms. In a collaborative effort across multiple Department of Energy national laboratories, the "MLExchange" project is focused on addressing these challenges. MLExchange is a machine learning framework deploying interactive web interfaces to enhance and accelerate data analysis. The platform allows users to easily upload, visualize, and label data, and to train networks. The resulting models can be deployed on real data, and both results and models can be shared among scientists. The MLExchange web-based application for image segmentation allows for training, testing, and evaluating multiple machine learning models on hand-labeled tomography data. This environment provides users with an intuitive interface for segmenting images using a variety of machine learning algorithms and deep-learning neural networks. Additionally, these tools have the potential to overcome limitations in traditional image segmentation techniques, particularly for complex and low-contrast images.
Scale invariance and high miss-detection rates for small objects are among the challenging issues in object detection and often lead to inaccurate results. This research aims to provide an accurate detection model for crowd counting by focusing on human head detection in natural scenes drawn from the publicly available Casablanca, HollywoodHeads, and SCUT-HEAD datasets. In this study, we fine-tuned YOLOv5, a deep convolutional neural network (CNN)-based object detection architecture, and then evaluated the model using the mean average precision (mAP) score, precision, and recall. A transfer learning approach is used for fine-tuning the architecture. Training on one dataset and testing the model on another leads to inaccurate results due to the different types of heads in different datasets. Another main contribution of our research is therefore combining the three datasets into a single dataset that includes every head size: small, medium, and large. The experimental results show that the YOLOv5 architecture achieves significant improvements in small-head detection in crowded scenes compared to baseline approaches such as Faster R-CNN and the VGG-16-based SSD MultiBox detector.
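The transfer-learning setup might look like the following sketch, which loads a COCO-pretrained YOLOv5 checkpoint via torch.hub as the starting point. The dataset config and test image named here are hypothetical, and the paper's actual fine-tuning procedure may differ.

```python
import torch

# Load a COCO-pretrained YOLOv5 checkpoint as the transfer-learning
# starting point (downloads the repo and weights on first use).
# Fine-tuning on the merged head dataset would use the repo's own
# trainer, e.g.:  python train.py --data heads.yaml --weights yolov5s.pt
# where heads.yaml is a hypothetical config naming a single 'head' class.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.25                      # detection confidence threshold
results = model(['crowd_scene.jpg'])   # hypothetical test image
results.print()                        # per-image detection summary
```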
Speech emotions (SEs) are an essential component of human interactions and an efficient way of influencing human behavior. The recognition of emotions from speech is an emerging but challenging area of digital signal processing (DSP). Healthcare professionals are always looking for better ways to understand patient voices for improved diagnosis and treatment. Speech emotion recognition (SER) from the human voice, particularly in a person with a neurological disorder such as Parkinson's disease (PD), can expedite the diagnostic process. Patients with PD are typically diagnosed through expensive tests and continuous monitoring, which is time-consuming and costly. This research aims to develop a system that can accurately identify common SEs that are important for PD patients, such as anger, happiness, normal, and sadness. We propose a novel lightweight deep model to predict these common SEs. An adaptive wavelet thresholding method is employed for pre-processing the audio data. Furthermore, we generate spectrograms from the speech data instead of processing the voice data directly, in order to extract more discriminative features. The proposed method is trained on spectrograms generated from the IEMOCAP dataset. The deep learning model contains convolution layers for learning discriminative features from the spectrograms. The performance of the proposed framework is evaluated on standard performance metrics and shows promising real-time results for PD patients.
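The preprocessing pipeline, wavelet thresholding followed by spectrogram generation, could be sketched as below. The wavelet family, threshold rule, sampling rate, and mel parameters are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np
import librosa
import pywt

def speech_to_spectrogram(path: str) -> np.ndarray:
    """Wavelet-threshold the waveform, then convert it to a log-mel
    spectrogram for the CNN to consume instead of the raw voice signal."""
    y, sr = librosa.load(path, sr=16000)
    coeffs = pywt.wavedec(y, 'db8', level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(y)))             # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft')
                            for c in coeffs[1:]]
    y_den = pywt.waverec(coeffs, 'db8')[:len(y)]          # denoised speech
    mel = librosa.feature.melspectrogram(y=y_den, sr=sr, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)           # (64, T) input
```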
Videokymographic (VKG) images of the human larynx are often used for automatic vibratory feature extraction for diagnostic purposes. One of the most challenging parameters to evaluate is the presence of the mucosal wave and the sharpness of its lateral peaks. Although these features can be clinically helpful and give insight into the health and pliability of the vocal fold mucosa, identifying and visually estimating the sharpness can be challenging for human examiners and even more so for an automatic process. This work aims to create and validate a method that automatically quantifies lateral peak sharpness from VKG images using a convolutional neural network.
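A minimal sketch of such a network, posed as scalar regression over a VKG image crop, is shown below. The architecture and input size are assumptions; the abstract states only that a convolutional neural network is used.

```python
import torch
import torch.nn as nn

class PeakSharpnessNet(nn.Module):
    """Regress a scalar lateral-peak sharpness score from a
    single-channel VKG image patch (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):          # x: (B, 1, H, W) VKG crop
        return self.body(x)

score = PeakSharpnessNet()(torch.randn(2, 1, 128, 256))
```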