Mel Frequency Cepstral Coefficients Properties Optimization Due to Ultrasonic Bands and Data Structure: Application to Acoustic Signals Identification


Authors : Bi Tra Jean Claude YOUAN; N’tcho Assoukpou Jean GNAMELE; Digrais Moïse MAMBE

Volume/Issue : Volume 9 - 2024, Issue 11 - November


Google Scholar : https://tinyurl.com/n4dt9yn6

Scribd : https://tinyurl.com/5eep5cc2

DOI : https://doi.org/10.38124/ijisrt/IJISRT24NOV1194

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : In the work presented in this article, we highlight the value of choosing low-frequency ultrasound for the calculation of Mel-frequency cepstral coefficients (MFCCs), combined with a specific restructuring of the study data. These coefficients were used as descriptors in the classification of sound samples produced by chainsaws in forest environments, in order to combat the destruction of Ivorian fauna and flora. Three restructuring methods were compared: the Time Domain Channel Fusion method, the Cepstral Domain Channel Fusion method and the One Channel method. To do this, we first computed the MFCCs on different frequency bands within the acoustic band [170 Hz - 22000 Hz]. The selected bandwidths range from 1 kHz to 21 kHz, increasing by 2 kHz at each new calculation phase. Low-frequency ultrasound produced better classification rates than the other acoustic bands. The best rate of 98.40% was obtained for the 3 kHz bandwidth on the band [21170 Hz - 24170 Hz], combined with the Time Domain Channel Fusion method. A study of the ultrasonic bands derived from the central frequencies of the octave bands was then carried out. A comparison of the sample classification rates led to selecting the band [11313 Hz - 22627 Hz], derived from the central frequency of the 16 kHz octave band, as the best ultrasonic band for the calculation of the MFCCs.
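As a rough illustration (a minimal sketch, not the authors' code) of the band-limited MFCC computation described in the abstract, the Python snippet below derives the octave-band edges from a centre frequency (f_low = fc/√2, f_high = fc·√2, which gives approximately [11313 Hz - 22627 Hz] for the 16 kHz octave band) and restricts the Mel filterbank to that band. The file name, sampling rate and librosa parameters are illustrative assumptions.

```python
import numpy as np
import librosa

# Octave-band edges from the centre frequency: f_low = fc / sqrt(2), f_high = fc * sqrt(2).
fc = 16000.0
f_low, f_high = fc / np.sqrt(2.0), fc * np.sqrt(2.0)   # ~11313 Hz and ~22627 Hz

# Load one recording (illustrative path); the sampling rate must exceed 2 * f_high
# for the band to be representable at all.
y, sr = librosa.load("chainsaw_sample.wav", sr=48000, mono=True)

# MFCCs restricted to the chosen band: fmin/fmax bound the Mel filterbank.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, fmin=f_low, fmax=f_high)
print(mfccs.shape)   # (n_mfcc, n_frames)
```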

Keywords : Ultrasound, Octave Band, KNN, Data Structure.
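For a two-channel recording, the three restructurings named in the abstract could take a form like the minimal sketch below. The fusion operator (a sample-wise mean in the time domain, a coefficient-wise mean in the cepstral domain) is an assumption made for illustration; the paper's exact fusion rule may differ.

```python
import librosa

def band_mfcc(y, sr, fmin, fmax, n_mfcc=13):
    """MFCC matrix (n_mfcc x frames) over a restricted frequency band."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, fmin=fmin, fmax=fmax)

def time_domain_channel_fusion(left, right, sr, fmin, fmax):
    # Fuse the channels in the time domain (here: sample-wise mean),
    # then compute MFCCs once on the fused signal.
    return band_mfcc(0.5 * (left + right), sr, fmin, fmax)

def cepstral_domain_channel_fusion(left, right, sr, fmin, fmax):
    # Compute MFCCs per channel, then fuse in the cepstral domain
    # (here: coefficient-wise mean of the two MFCC matrices).
    return 0.5 * (band_mfcc(left, sr, fmin, fmax) + band_mfcc(right, sr, fmin, fmax))

def one_channel(channel, sr, fmin, fmax):
    # Keep a single channel and compute MFCCs on it alone.
    return band_mfcc(channel, sr, fmin, fmax)
```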

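Finally, a hedged sketch of how such MFCC descriptors might feed the KNN classifier named in the keywords, using scikit-learn. The time-averaged 13-coefficient descriptor, the 80/20 split, n_neighbors=5 and the random placeholder data are all illustrative choices, not the configuration reported in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder descriptors: one fixed-length vector per sample, e.g. time-averaged
# MFCCs (13 coefficients). Random data stands in for the real chainsaw/forest corpus.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))      # 200 samples x 13 averaged MFCCs (illustrative)
y = rng.integers(0, 2, size=200)    # 1 = chainsaw, 0 = other forest sound (illustrative)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # n_neighbors = 5 is an assumed value
knn.fit(X_train, y_train)
print(f"classification rate: {knn.score(X_test, y_test):.2%}")
```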

