Multimodality non-contact human heart rate sensing
Banza Mwape, Josline (2023)
Master's thesis
School of Engineering Science, Computational Engineering
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2023060151537
Abstract
Multimodality non-contact human heart rate sensing is an emerging area of research that aims to leverage the strengths of different modalities to improve the accuracy and reliability of heart rate monitoring systems. However, estimating plethysmography signals from RGB facial videos has been shown to be biased against dark skin tones. In this study, two modalities, RGB facial videos and radio frequency signal measurements, were explored to capture information that can be used to estimate heart rate in a non-contact manner. The work comprises a series of experiments and analyses that evaluate the performance of combining the two modalities for heart rate sensing with a deep learning approach. The models are trained in two ways: self-supervised learning with a contrastive loss alone, and semi-supervised learning in which an additional loss, the negative Pearson loss, is included. The results show that the contrastive loss usefully guides the models to learn joint representations of the RGB and radar data, which leads to robust performance, although it sometimes performs poorly in individual cases. The negative Pearson loss, on the other hand, encourages the model to make predictions that are consistent with the ground truth, and therefore guarantees high overall performance. Overall, the finding of this thesis is that combining the two losses allows the model to be robust and fair across all skin tones.
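For readers unfamiliar with the negative Pearson loss mentioned above, the following is a minimal sketch of one common formulation, 1 - r, where r is the Pearson correlation between a predicted and a reference pulse waveform. The exact definition used in the thesis is not stated in this abstract, so the function name, the 1 - r form, and the `eps` stabilizer are assumptions made purely for illustration.

```python
import numpy as np


def negative_pearson_loss(pred, target, eps=1e-8):
    """Illustrative negative Pearson loss: 1 - r (assumed form, not from the thesis).

    The value is near 0 when the two signals are perfectly positively
    correlated and near 2 when they are perfectly anti-correlated.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Center both signals so that offsets do not affect the result.
    pred_c = pred - pred.mean()
    target_c = target - target.mean()

    # Pearson correlation; eps guards against division by zero for flat signals.
    denom = np.sqrt((pred_c ** 2).sum() * (target_c ** 2).sum()) + eps
    r = (pred_c * target_c).sum() / denom
    return 1.0 - r


# Hypothetical usage with a synthetic 1.2 Hz (72 bpm) waveform.
t = np.linspace(0.0, 10.0, 300)
reference = np.sin(2 * np.pi * 1.2 * t)
scaled_prediction = 2.0 * reference + 3.0  # same shape, different scale and offset
print(negative_pearson_loss(scaled_prediction, reference))
```

Because the loss depends only on correlation, a prediction that matches the shape of the reference waveform is not penalized for differing in amplitude or offset, which is one reason this kind of loss is often used for pulse waveform estimation.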
