Pages A06-1 - A06-6,  © Society for Imaging Science and Technology 2021
Digital Library: EI
Published Online: January  2021
Pages 301-1 - 301-6,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

For quality inspection in different industries, where objects may be transported at several m/s, acquisition and computation speed for 2D and 3D imaging, even at resolutions in the micrometer (µm) scale, is essential. AIT's well-established Inline Computational Imaging (ICI) system has until now used standard multilinescan cameras to build a linear light field stack. Unfortunately, this image readout mode is supported by only a few camera manufacturers, which effectively limits the application of ICI software. However, industrial-grade area scan cameras now offer frame rates of several hundred FPS, so a novel method has been developed that can match previous speed requirements while upholding and eventually surpassing previous 3D reconstruction results, even for challenging objects. AIT's new area scan ICI can be used with most standard industrial cameras and many different light sources. Nevertheless, AIT has also developed its own light source to illuminate a scene by high-frequency strobing tailored to this application. The new algorithms employ several consistency checks for a range of baselines and feature channels and give robust confidence values that ultimately improve subsequent 3D reconstruction results. Its lean output is well suited for real-time applications while holding information from four different illumination directions. Qualitative comparisons with our previous method in terms of 3D reconstruction, speed, and confidence are shown at a typical sampling of 22 µm/pixel. In the future, this fast and robust inline inspection scheme will be extended to microscopic resolutions and to several orthogonal axes of transport.

Digital Library: EI
Published Online: January  2021
Pages 302-1 - 302-6,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

Understanding human action from visual data is an important computer vision application for video surveillance, sports player performance analysis, and many IoT applications. Traditional approaches to action recognition used hand-crafted visual and temporal features to classify specific actions. In this paper, we follow the standard deep learning framework for action recognition but introduce channel and spatial attention modules sequentially in the network. In a nutshell, our network consists of four main components. First, the input frames are given to a pre-trained CNN that extracts the visual features, and these features are passed through the attention module. The transformed feature maps are given to a bi-directional LSTM network that exploits the temporal dependency among the frames for the underlying action in the scene. The output of the bi-directional LSTM is given to a fully connected layer with a softmax classifier that assigns probabilities to the actions of the subject in the scene. In addition to the cross-entropy loss, a marginal loss function is used that penalizes the network across inter-action classes while complementing it for intra-action variations. The network is trained and validated on a tennis dataset, focusing on six tennis player actions in total. The network is evaluated on standard performance metrics (precision, recall), and promising results are achieved.

Digital Library: EI
Published Online: January  2021
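
As a rough illustration of the sequential channel and spatial attention step mentioned in the action recognition abstract above, the following is a minimal sketch in plain Python. It is not the authors' module: the abstract does not specify the attention design, so this sketch assumes a simplified, parameter-free variant in which each channel is scaled by the sigmoid of its global average and each spatial location is then scaled by the sigmoid of its mean across channels. The function names (`channel_attention`, `spatial_attention`, `attend`) and the toy feature map are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """Scale each channel by the sigmoid of its global average
    (assumed parameter-free stand-in for a learned channel gate)."""
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out

def spatial_attention(fmap):
    """Scale each location by the sigmoid of its mean across channels
    (assumed stand-in for a learned spatial gate)."""
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    weights = [[sigmoid(sum(fmap[c][y][x] for c in range(n_ch)) / n_ch)
                for x in range(width)] for y in range(height)]
    return [[[fmap[c][y][x] * weights[y][x] for x in range(width)]
             for y in range(height)] for c in range(n_ch)]

def attend(fmap):
    # Channel attention first, then spatial attention (order assumed
    # from the abstract's wording "channel and spatial ... sequentially").
    return spatial_attention(channel_attention(fmap))

# Toy feature map: 2 channels of size 2x2 (hypothetical values).
features = [
    [[0.0, 0.0], [0.0, 0.0]],   # channel 0
    [[2.0, 2.0], [2.0, 2.0]],   # channel 1
]
attended = attend(features)
```

In a full pipeline of the kind the abstract describes, the attended feature maps of each frame would then be flattened and passed to the bi-directional LSTM; a learned attention module would replace the fixed gates used here.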
Pages 303-1 - 303-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

Annotation and analysis of sports videos is a challenging task that, once accomplished, could provide various benefits to coaches, players, and spectators. In particular, American football could benefit from such a system to provide assistance in statistics and game strategy analysis. Manual analysis of recorded American football game videos is a tedious and inefficient process. In this paper, as a first step to further our research for this unique application, we focus on locating and labeling individual football players from a single overhead image of a football play immediately before the play begins. A pre-trained deep learning network is used to detect and locate the players in the image. A ResNet is used to label the individual players based on their corresponding player position or formation. Our player detection and labeling algorithms obtain greater than 90% accuracy, especially for the skill positions on offense (Quarterback, Running Back, and Wide Receiver) and defense (Cornerback and Safety). Results from our preliminary studies on player detection, localization, and labeling demonstrate the feasibility of building a complete American football strategy analysis system using artificial intelligence.

Digital Library: EI
Published Online: January  2021
Pages 310-1 - 310-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

In future manufacturing, human-machine interaction will evolve towards flexible and smart collaboration. It will meet requirements from the optimization of assembly processes as well as from motivated and skilled human behavior. Recently, human factors engineering has substantially progressed by means of detailed task analysis. However, there is still a lack of precise measurement of cognitive and sensorimotor patterns for the analysis of long-term mental and physical strain. This work presents a novel methodology that enables real-time measurement of cognitive load based on executive function analyses as well as biomechanical strain from non-obtrusive wearable sensors. The methodology works on 3D information recovery of the working cell using a precise stereo measurement device. The worker is equipped with eye tracking glasses and a set of wearable accelerometers. Wireless connectivity transmits the sensor-based data to a nearby PC for monitoring. Data analytics then recovers the 3D geometry of gaze and viewing frustum within the working cell and furthermore extracts the worker's task switching rate as well as a skeleton-based approximation of the worker's posture, associated with an estimation of biomechanical strain of muscles and joints. First results enhanced by AI-based estimators demonstrate a good match with the results of an activity analysis performed by occupational therapists.

Digital Library: EI
Published Online: January  2021
Pages 311-1 - 311-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

Working in protected workshops places supervisors in a work field with concurrent targets. On the one hand, the workers with disabilities require a safe space that meets their special requirements; on the other hand, customers expect time and quality standards comparable to those in regular industry while maintaining cost pressure. We propose a technical solution to support the supervisors with quality control. We developed a flexible assistance system for people with disabilities working in protected workshops that is based on a Raspberry Pi 4 and uses cameras for perception. It is applicable to packaging and picking processes and is supported by additional step-by-step guidance to reach as many protected workshops as possible. The system aims to support supervisors in quality control and to notify them when action is required, freeing time for interpersonal matters. An automatic pick-by-light system is included which uses hand recognition. To ensure good speed, we used image processing and verified the detections with a machine learning approach for robustness against lighting conditions. In this paper we present the system itself, which is available as open source, along with its features and the development of the machine learning algorithm.

Digital Library: EI
Published Online: January  2021
Pages 313-1 - 313-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

Deep semi-supervised learning (SSL) has been extensively investigated in the past few years due to its broad spectrum of theory, algorithms, and applications. SSL methods are used extensively in the field of computer vision, for example in image classification, human activity recognition, object detection, scene segmentation, and image generation. In spite of the significant success achieved in these domains, critically analyzing SSL methods on benchmark datasets still presents important challenges. In the literature, very few reviews and surveys are available. In this paper, we present a short but focused review of the most significant SSL methods. We analyze the basic theory of SSL and the differences among various SSL methods. Then, we present an experimental analysis to compare these SSL methods using standard datasets. We also provide insight into the challenges of the SSL methods.

Digital Library: EI
Published Online: January  2021
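
To make the basic idea behind semi-supervised learning in the review abstract above more concrete, here is a minimal sketch of confidence-thresholded pseudo-labeling, one common SSL ingredient. The abstract does not state which methods the review covers in detail, so this is an illustrative assumption and not a summary of the paper. The function name `pseudo_label`, the threshold of 0.95, and the example probabilities are hypothetical.

```python
def pseudo_label(probabilities, threshold=0.95):
    """Return (index, label) pairs for unlabeled samples whose top class
    probability reaches the threshold; the rest stay unlabeled."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected

# Hypothetical model outputs (class probabilities) for three unlabeled samples.
unlabeled_predictions = [
    [0.97, 0.02, 0.01],   # confident: class 0
    [0.40, 0.35, 0.25],   # not confident: skipped
    [0.03, 0.01, 0.96],   # confident: class 2
]
pseudo_labels = pseudo_label(unlabeled_predictions)
print(pseudo_labels)  # [(0, 0), (2, 2)]
```

The selected pairs would typically be added to the labeled training set, or used as targets in an extra loss term, in the next training round.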
Pages 314-1 - 314-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

In sheet metal production, the quality of a cut edge determines the quality of the cut itself. Quality criteria such as the roughness, the edge slope, and the burr height are of decisive importance for further application and quality determination. In order to be able to determine these criteria analytically, the depth information of the edge must be determined at great expense. The current methods for obtaining the depth information are very time-consuming, require laboratory environments and are therefore not suitable for a fast evaluation of the quality criteria. Preliminary work has shown that it is possible to make robust and accurate statements about the roughness of a cut edge based on images when using an industrial camera with a standard lens and diffuse incident light, if the model used for this purpose has been trained on appropriate images. In this work, the focus is on the illumination scenarios and their influence on the prediction quality of the models. Images of cut edges are taken under different defined illumination scenarios and it is investigated whether a comprehensive evaluation of the cut edges on the evaluation criteria defined in standards is possible under the given illumination conditions. The results of the obtained model predictions are compared with each other in order to make a statement about the importance of the illumination scenario. In order to investigate the possibility of a mobile low-cost evaluation of cut edges, cheap hardware components for illumination and a smartphone for image acquisition are used.

Digital Library: EI
Published Online: January  2021
Pages 317-1 - 317-3,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

A novel acceleration strategy is presented for the computer vision and machine learning fields from both algorithmic and hardware implementation perspectives. With our approach, complex mathematical functions such as multiplication can be greatly simplified. As a result, an accelerated machine learning method requires no more than ADD operations, which tremendously reduces processing time, hardware complexity, and power consumption. The applicability is illustrated by going through a machine learning example of HOG+SVM, where the accelerated version achieves comparable accuracy on real datasets of human figures and digits.

Digital Library: EI
Published Online: January  2021
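
The acceleration abstract above does not say how multiplication is reduced to ADD operations. One well-known way to get this effect, shown here purely as an assumption and not as the authors' method, is to represent values by power-of-two exponents so that a product becomes a sum of exponents. The function names `to_log2` and `log_multiply` are hypothetical.

```python
import math

def to_log2(value):
    """Round a positive value to the nearest integer power-of-two exponent."""
    return round(math.log2(value))

def log_multiply(exp_a, exp_b):
    """Multiplying two powers of two needs only an ADD of their exponents."""
    return exp_a + exp_b

a, b = 8.0, 0.5
product_exponent = log_multiply(to_log2(a), to_log2(b))
approx_product = 2.0 ** product_exponent  # decoded only for display
print(product_exponent, approx_product)  # 2 4.0
```

The result is exact here because both operands are powers of two; for other values the rounding in `to_log2` makes the product an approximation, which is the usual accuracy trade-off of such schemes.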
Pages 318-1 - 318-7,  © Society for Imaging Science and Technology 2021
Volume 33
Issue 6

Identity verification is ubiquitous in daily life. Its applications range from unlocking a mobile device to accessing an online account, boarding an airplane or other types of transportation, recording times of arrival at and departure from work, controlling access to a restricted area, facility, or vault, and many more. The traditional and most popular form of identity verification is password authentication, but it comes with many challenges. Human biometric identifiers such as fingerprints, retina scans, and 2D or 3D facial features have become popular alternatives. Some applications use two-factor or multi-factor authentication to increase system security, e.g., a password and a login code sent to a mobile device. All these identity verification methods have their challenges, ranging from forgotten or stolen passwords to unaware or unintentional authentication, complexity, and high costs. This paper presents a promising alternative that could improve on the existing identity verification methods. This improved identity verification is a two-factor approach that concurrently analyzes facial features and unique facial actions. The user's facial features and facial actions must both match what has been stored in the system in order to pass identity verification. This two-factor verification requires only the frontal view of the face and authenticates facial features and facial actions concurrently. It generates an embedding of facial features and facial action in a short video for matching. We name this method Concurrent Two-Factor Identity Verification (C2FIV). Two frameworks are presented that use recurrent neural networks to learn the representation of facial features and actions. One uses an auto-encoder, and the other uses metric learning. Experimental results show that the metric learning model performs reliably with an average precision of 98.8%.

Digital Library: EI
Published Online: January  2021
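
The matching step in the identity verification abstract above (comparing the embedding of a short video with a stored one) can be sketched as a distance check. This is a minimal illustration under stated assumptions: the abstract does not give the distance measure or the decision rule, so Euclidean distance, the threshold of 0.5, the three-dimensional embeddings, and the function names are all hypothetical.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(probe_embedding, enrolled_embedding, threshold=0.5):
    """Accept only if the probe embedding lies within the (assumed)
    distance threshold of the enrolled embedding."""
    return euclidean_distance(probe_embedding, enrolled_embedding) <= threshold

# Hypothetical embeddings; a real model would produce them from video.
enrolled = [0.10, 0.80, 0.30]
genuine = [0.12, 0.78, 0.33]    # same face and same facial action
impostor = [0.90, 0.10, 0.70]   # different face or different action

print(verify(genuine, enrolled))   # True
print(verify(impostor, enrolled))  # False
```

Because a single embedding encodes both the facial features and the facial action, a mismatch in either factor should move the probe away from the enrolled embedding and cause a rejection.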
