Learning metric volume estimation of fruits and vegetables from short monocular video sequences

Jan Steinbrener, Vesna Dimitrievska, Federico Pittino, Frans Starmans, Roland Waldner, Jürgen Holzbauer, Thomas Arnold

Research output: Contribution to journal › Article › peer-review

Abstract

We present a novel approach for extracting metric volume information of fruits and vegetables from short monocular video sequences and associated inertial data recorded with a hand-held smartphone. Estimated segmentation masks from a pre-trained object detector are fused with the predicted change in relative pose obtained from the inertial data to predict the class and volume of the objects of interest. Our approach works with simple RGB video frames and inertial data, which are readily available from modern smartphones, and does not require reference objects of known size in the video frames. Using a balanced validation dataset, we achieve a classification accuracy of 95% and a mean absolute percentage error of 16% for the volume prediction on untrained objects, which is comparable to state-of-the-art results requiring more elaborate data recording setups. A very accurate estimation of the model uncertainty is achieved through ensembling and the use of a Gaussian negative log-likelihood loss. The dataset used in our experiments, including ground-truth volume information, is available at https://sst.aau.at/cns/datasets.
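The abstract notes that model uncertainty is estimated through ensembling combined with a Gaussian negative log-likelihood loss. The sketch below illustrates this general technique in PyTorch; it is not the authors' implementation, and the VolumeHead module, the feature dimension, the ensemble size, and the toy training loop are placeholder assumptions chosen only to make the example self-contained.

# Illustrative sketch (not the authors' code): a regression head trained with
# Gaussian negative log-likelihood loss, with predictive uncertainty obtained
# from an ensemble of independently initialized models.
import torch
import torch.nn as nn

class VolumeHead(nn.Module):
    """Predicts a volume mean and variance from a fused feature vector (placeholder architecture)."""
    def __init__(self, in_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mean = nn.Linear(64, 1)
        # Softplus keeps the predicted variance strictly positive.
        self.var = nn.Sequential(nn.Linear(64, 1), nn.Softplus())

    def forward(self, x):
        h = self.backbone(x)
        return self.mean(h), self.var(h)

def train_step(model, optimizer, features, target_volume):
    """One optimization step with Gaussian negative log-likelihood loss."""
    criterion = nn.GaussianNLLLoss()
    mean, var = model(features)
    loss = criterion(mean, target_volume, var)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def ensemble_predict(models, features):
    """Combine ensemble members into a single predictive mean and total variance."""
    means, variances = [], []
    with torch.no_grad():
        for m in models:
            mu, var = m(features)
            means.append(mu)
            variances.append(var)
    means = torch.stack(means)          # (n_models, batch, 1)
    variances = torch.stack(variances)
    pred_mean = means.mean(dim=0)
    # Total variance = mean predicted (aleatoric) variance + spread of the means (epistemic).
    pred_var = variances.mean(dim=0) + means.var(dim=0, unbiased=False)
    return pred_mean, pred_var

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy fused features (e.g. mask and relative-pose descriptors) and volumes in cm^3.
    features = torch.randn(32, 128)
    volumes = torch.rand(32, 1) * 500.0
    models = [VolumeHead() for _ in range(5)]
    for model in models:
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(100):
            train_step(model, opt, features, volumes)
    mean, var = ensemble_predict(models, features)
    print("predicted volume:", mean[0].item(), "+/-", var[0].sqrt().item())

Averaging the member variances captures the noise each model predicts in the data, while the variance of the member means captures disagreement between models, which is one common way to report a single uncertainty estimate from an ensemble.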
Original language: English
Journal: Heliyon
DOIs
Publication status: Published - 2023

