2D and 3D focusing and tomography of UAV-borne Synthetic Aperture Radar images
Abstract:
This talk introduces the topic of Synthetic Aperture Radar (SAR) from the point of view of the Backprojection algorithm. This imaging method is based on the transmission of radio pulses and the subsequent recording of the received echoes. After software processing, 2D and 3D ground images in different frequency bands and modalities can be obtained.
Emphasis is placed on the challenges and particularities posed by the case under study: Time-Domain Backprojection for a UAV-mounted SAR platform flying arbitrary trajectories. Results are provided for 2D and 3D focusing in the P, L and C bands. GPU parallelization is discussed, and insights are given into performance improvements attainable in similar algorithms.
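As a minimal sketch of the core idea, assuming range-compressed complex echoes and a flat ground grid (the function and parameter names below are illustrative, not from the talk), each pixel is obtained by coherently summing every pulse's echo sample at the pixel's two-way delay:

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)

def backproject_2d(echoes, positions, t0, dt, fc, grid_x, grid_y):
    """Time-domain backprojection onto a flat ground grid (z = 0).

    echoes:    (n_pulses, n_samples) complex range-compressed data
    positions: (n_pulses, 3) antenna position per pulse (arbitrary trajectory)
    t0, dt:    fast-time origin and sample spacing (s)
    fc:        carrier frequency (Hz), used to restore the carrier phase
    """
    image = np.zeros((grid_y.size, grid_x.size), dtype=complex)
    gx, gy = np.meshgrid(grid_x, grid_y)
    for p in range(echoes.shape[0]):
        dx = gx - positions[p, 0]
        dy = gy - positions[p, 1]
        dz = -positions[p, 2]
        r = np.sqrt(dx**2 + dy**2 + dz**2)   # pixel-to-antenna range
        tau = 2.0 * r / C                    # two-way delay
        idx = np.clip(np.round((tau - t0) / dt).astype(int),
                      0, echoes.shape[1] - 1)
        # coherent sum: sample the echo at the delay, compensate carrier phase
        image += echoes[p, idx] * np.exp(1j * 2 * np.pi * fc * tau)
    return np.abs(image)
```

The per-pulse loop is embarrassingly parallel over pixels, which is what makes the algorithm a natural fit for GPU parallelization.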
Back to the Future: Boosting 3D Perception with Privileged Temporal Information
Abstract:
Pseudo-labeling is a powerful tool to tap into large amounts of unlabeled data. The promise of such automatic annotation is especially appealing in safety-critical applications where performance requirements are extreme and data annotation is challenging. Sensing 3D dynamic environments for autonomous robotics is one such domain. In this work, we propose to boost pseudo-labeling for 3D dynamic perception by leveraging the whole temporal information contained in unlabeled sequences. Building on the teacher-student pseudo-labeling paradigm, in which the teacher provides annotations for unlabeled data to improve the training of the student, we leverage both past and future frames in different ways: (1) we boost the performance of the offline teacher through access to richer information, in particular the privileged information (PI) from the future, which the online student will not see at run time; (2) we learn and combine multiple teachers with different temporal horizons; (3) we improve the selection of the final pseudo-labels by including the agreement of the multiple teachers in the confidence-based selection criterion. We demonstrate the merit of our approach on the fundamental perception tasks of 3D semantic segmentation, 3D object detection, and motion forecasting in lidar point clouds.
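A minimal sketch of how multi-teacher agreement can be folded into a confidence-based selection criterion (the function name, thresholds, and exact combination rule are illustrative assumptions, not the paper's specification):

```python
import numpy as np

def select_pseudo_labels(teacher_probs, conf_thresh=0.9, min_agree=2):
    """Combine per-point class probabilities from several teachers (each
    with a different temporal horizon) and keep only the points where
    enough teachers agree on a sufficiently confident ensemble prediction.

    teacher_probs: (n_teachers, n_points, n_classes) softmax outputs
    Returns (labels, mask): ensemble labels and a boolean keep-mask.
    """
    preds = teacher_probs.argmax(axis=-1)         # per-teacher hard labels
    mean_probs = teacher_probs.mean(axis=0)       # ensemble probability
    labels = mean_probs.argmax(axis=-1)
    confidence = mean_probs.max(axis=-1)
    agreement = (preds == labels).sum(axis=0)     # teachers voting with ensemble
    mask = (confidence >= conf_thresh) & (agreement >= min_agree)
    return labels, mask
```

Only points passing the mask would be kept as pseudo-labels for training the online student.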
Incremental learning on large visual datasets
Abstract:
This work presents a streaming learning method for large visual recognition datasets in which models must learn from new data as soon as it becomes available. Two datasets are considered: ImageNet200 for image recognition, and the BIRDS video dataset for risk situations of frail people. We trained models based on ResNet50 for ImageNet200 and on a pooling vision transformer for BIRDS. We then trained our models on the streaming set by passing data points one at a time. Our approach builds on the existing Move-to-Data (MTD) continual learning method, which uses vector projection for weight updates. We introduce MTD with gradient, which fuses MTD with gradient-based weight updates and a buffer. We used ExStream streaming learning as a baseline for comparison. On ImageNet200, the newly proposed MTD with gradient achieves an accuracy of 67.24%, surpassing the baseline by 0.5%. On BIRDS, only ExStream and MTD were evaluated, and MTD did not perform well in this setting.
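A rough sketch in the spirit of a Move-to-Data-style update, assuming a last-layer weight matrix and one incoming sample at a time (the exact projection rule of MTD is not detailed in the abstract, so this simplified nudge-toward-the-feature form and all names are assumptions):

```python
import numpy as np

def move_to_data_update(W, feature, label, lr=0.01):
    """One streaming update: nudge the weight vector of the ground-truth
    class toward the (normalized) feature of the incoming sample,
    without computing a full gradient.

    W:       (n_classes, d) last-layer weights, updated in place
    feature: (d,) backbone feature of the new sample
    label:   ground-truth class index
    """
    f = feature / (np.linalg.norm(feature) + 1e-12)
    W[label] = (1.0 - lr) * W[label] + lr * f
    return W
```

The appeal of such an update for streaming is its cost: one sample triggers one cheap vector operation, with no replay over past data (the gradient-fused variant and buffer described above add a standard gradient step on buffered samples).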
Image Processing in Nuclear Medicine
Abstract:
Nuclear medicine uses radioactive material inside the body to see how organs or tissue are functioning (for diagnosis) or to target and destroy damaged or diseased organs or tissue (for treatment). Image processing has a crucial role during the entire imaging chain, starting from image acquisition, through reconstruction, to the post-processing of tomographic images. The presentation aims to provide insight into the various image processing methods and problems.
Object detection and segmentation for 360 urban stocktaking and Equirectangular projection for data augmentation.
Abstract:
Computer Vision (CV) and Deep Learning (DL) algorithms can play an essential role in urban mobile mapping, allowing large data volumes to be processed by intelligent automatic systems that support decision-making and management. This talk presents the development of two DL-based modules that detect objects at different granularity levels, for Semantic Segmentation (SS) and Object Detection (OD), so that they can form part of a more extensive intelligent stocktaking system for urban public infrastructure, providing essential information for high-level decisions. The work shows the performance of standard SoTA models on a 360 urban-context dataset. The experiments show that SoTA SS models trained on generic datasets can perform remarkably well on 360 images. Moreover, SoTA OD models can be trained on 360 datasets, with some considerations related to the high deformation near the equirectangular poles. Furthermore, to understand the impact of equirectangular geometric deformations on SoTA DL models, a novel research line was pursued that assesses geometrical projections as data augmentation transformations to improve the prediction capabilities of deep models. It shows that the equirectangular projection positively impacts the models' performance.
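One geometric property that makes equirectangular panoramas convenient for augmentation is that a rotation of the camera about the vertical axis corresponds exactly to a horizontal circular shift of the image. A minimal sketch of this distortion-free augmentation (the function name is illustrative; the talk's actual augmentation pipeline may differ):

```python
import numpy as np

def yaw_rotate_equirectangular(img, degrees):
    """Rotate a 360° view about the vertical axis by circularly shifting
    the equirectangular image along its horizontal axis.

    img: (H, W, C) equirectangular image covering 360° horizontally
    """
    w = img.shape[1]
    shift = int(round(degrees / 360.0 * w)) % w
    return np.roll(img, shift, axis=1)
```

Unlike generic crops or affine warps, this transform produces a valid panorama for every angle, so it can enlarge a 360 training set without introducing artificial distortion.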
Analysis and Extensions of Adversarial Training for Video Classification
Abstract:
Adversarial training (AT) is a simple yet effective defense against adversarial attacks on image classification systems, based on augmenting the training set with attacks that maximize the loss. However, the effectiveness of AT as a defense for video classification has not been thoroughly studied. Our first contribution is to show that generating optimal attacks for video requires carefully tuning the attack parameters, especially the step size. Notably, we show that the optimal step size varies linearly with the attack budget. Our second contribution is to show that using a smaller (sub-optimal) attack budget at training time leads to more robust performance at test time. Based on these findings, we propose three defenses against attacks with variable attack budgets. The first, Adaptive AT, draws the attack budget from a distribution that is adapted as training iterations proceed. The second, Curriculum AT, increases the attack budget as training iterations proceed. The third, Generative AT, further couples AT with a denoising generative adversarial network to boost robust performance. Experiments on the UCF101 dataset demonstrate that the proposed methods improve adversarial robustness against multiple attack types.
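A minimal sketch of the step-size/budget coupling, using an L-infinity PGD attack on a linear softmax classifier so the gradient can be written analytically (the `step_ratio` constant and all names are illustrative assumptions, not the paper's tuned values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pgd_linf(W, x, y, budget, n_steps=5, step_ratio=0.5):
    """L-inf PGD on a linear softmax classifier, with the step size set
    as a fixed fraction of the budget, mirroring the observation that
    the optimal step size varies linearly with the attack budget.

    W: (n_classes, d) weights;  x: (n, d) inputs;  y: (n,) labels
    """
    step = step_ratio * budget
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        p = softmax((x + delta) @ W.T)     # (n, n_classes)
        p[np.arange(len(y)), y] -= 1.0     # d loss / d logits (cross-entropy)
        grad = p @ W                       # d loss / d input
        delta = np.clip(delta + step * np.sign(grad), -budget, budget)
    return x + delta
```

With the step tied to the budget this way, the same attack schedule remains well-scaled as Adaptive or Curriculum AT varies the budget during training.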
Classification of Breast Cancer Subtypes and Evaluation of Biomarker Expression in Whole Slide Images
Abstract:
Breast cancer is one of the most commonly diagnosed cancers in women worldwide, and overall survival depends on several factors, including accurate and prompt diagnosis and treatment. Important prognostic factors are its morphological subtype and biomarker status. The shift to the digital era has made it possible to use gigapixel images of breast biopsies to determine the best possible treatment for each patient. However, challenges such as 1) the high morphological heterogeneity, 2) the size of the images, 3) the presence of various artifacts, and 4) the size of the datasets still need to be addressed in order to apply deep learning techniques efficiently. To this end, a multiple instance learning approach was implemented to cope with the problem of weak and noisy labels in breast cancer subtype classification. For biomarker status prediction, on the other hand, the advantages of transfer learning were exploited. Further evaluation of each methodology should be performed to decrease costs and timelines in cancer treatment while increasing patient survival.
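The abstract does not specify the aggregation used, but a common instantiation of multiple instance learning on gigapixel slides is attention-based pooling over tile embeddings, where the slide-level (bag) label supervises a weighted combination of its patches. A minimal sketch under that assumption (all names and parameters are illustrative):

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based pooling over instance (patch) embeddings: patches
    with higher attention dominate the bag embedding used for the final
    slide-level (e.g. subtype) prediction.

    instances: (n_patches, d) patch embeddings
    V: (h, d), w: (h,) attention parameters
    Returns (bag_embedding, attention_weights).
    """
    scores = np.tanh(instances @ V.T) @ w  # (n_patches,) raw attention
    a = np.exp(scores - scores.max())
    a /= a.sum()                           # softmax attention weights
    return a @ instances, a                # (d,), (n_patches,)
```

Because only the bag needs a label, this sidesteps patch-level annotation, which is exactly what makes it attractive for weak and noisy slide labels.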
Triplet networks for cross-modal document image retrieval
Abstract:
A particularly successful approach to detecting fraud in documents consists of comparing a new document with a legitimate one in order to spot anomalies. However, to make this comparison viable, it is necessary to retrieve similar documents from a database. Such similarity can be established by considering textual, structural or visual features. Hybrid models, i.e., those that combine all these kinds of features, have shown the best performance. In view of this, a content-based document image retrieval system with a novel and robust architecture is proposed. It is designed as a triplet neural network whose branches consist of a visual encoder, a text encoder with a spatial-aware self-attention mechanism, and a pooling module. Because each block expects inputs of a different nature, it can be considered a cross-modal neural network, which offers the advantage of leveraging the benefits of the individual sources of information while compensating for each other's flaws. The resulting model achieves a precision comparable to the state of the art, even though the training and test datasets consist of samples captured by end users in non-controlled environments.
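The objective trained by a triplet network is the standard margin-based triplet loss; a minimal sketch (the margin value and names are illustrative, and in the cross-modal setting the three embeddings come from the shared output space of the branches):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss: pull the anchor (e.g. a query document
    embedding) toward a similar document and push it away from a
    dissimilar one, until the negative is at least `margin` farther
    than the positive.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)
```

Once trained this way, retrieval reduces to a nearest-neighbor search in the learned embedding space, which is what makes the comparison against a database of legitimate documents viable.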
Automatic Quantification of Children’s Lived Experience
Abstract:
One core assumption in developmental psychology is that children's cognition develops in interaction with their social and physical environment. One way to study this relation is to code interactions from video recordings of children's daily activities. However, this coding is usually done by hand and is therefore very labor-intensive. Modern Computer Vision (CV) techniques, such as automatic people and object detection, can significantly reduce this effort and thereby facilitate the study of cognitive development. We want to use these techniques to evaluate a dataset of children's daily activities, that is, to automatically quantify children's interactions with people and objects. For this, we are collecting video recordings at home and in kindergartens using small, lightweight bodycams. So far, we have recorded ten hours of video from six children. To evaluate the accuracy of various models, we hand-coded a subset of the videos and then compared 11 state-of-the-art CV detectors for people and objects against this hand-coded subset. The detection accuracy is between 30% and 35%, leaving room for improvement. We identified key limitations of the state-of-the-art models by specifying systematic detection errors (i.e., conditions under which a model fails to detect a person). This is the basis for improving our processing pipeline. Our next step is to improve the state-of-the-art models by fine-tuning or changing the model architecture. Once the detection is sufficiently accurate, we want to use these models to study the effect of children's daily activities on their cognitive development at scale.