Dang Manh Chinh1*, Antoni Martínez-Ballesté2, Pham Ngoc Minh1 and Dang Thanh Trung3
1 Institute of Information Technology, Hanoi, Vietnam
2 Universitat Rovira i Virgili, Catalonia, Spain
3 Electric Power University, Hanoi, Vietnam
*Corresponding Author: Dang Manh Chinh, Institute of Information Technology, Hanoi, Vietnam.
Received: August 07, 2018; Published: September 05, 2018
Citation: Dang Manh Chinh., et al. “On the Robustness of Cry Detection Methods in Real Neonatal Intensive Care Units”. Acta Scientific Paediatrics 1.3 (2018):02-09.
Cry detection is crucial in intelligent computerized systems that aim to assess the wellbeing of neonates during hospitalization. Moreover, a precise characterization of cry allows its classification (e.g. hunger, pain, tiredness). Although several cry detection and characterization techniques can be found in the literature, such techniques have not been tested in real-life environments such as hospital Intensive Care Units. In this article we first summarize the problem of background noise in Intensive Care Units, which may prevent cry detection algorithms from working correctly. Second, we implement a specific cry detection technique based on some of the relevant cry detection proposals found in the literature. Finally, we test this method using audio samples recorded in a real neonatal intensive care unit.
Keywords: Robustness; Cry Detection Methods; Neonatal Intensive Care Units
Some newborns have to spend a period of time in Neonatal Intensive Care Units (NICUs). Most of them are preterm newborns who suffer from some immaturity or disease. During hospitalization, newborns might suffer from pain (for instance, due to invasive procedures), and if this pain is not treated correctly, the child could suffer from neurological disorders in the future. In order to detect whether neonatal patients are suffering from pain or discomfort, health professionals directly observe physiological parameters (e.g. heart rate, temperature) and behavioral parameters (e.g. frowning, crying) and use assessment scales to determine whether pain must be relieved. These procedures are carried out on an hourly basis by a small number of people who also take care of the other typical duties in the NICU.
Moreover, current trends in neonatal nursing focus on personalized observation of the newborn (for instance, the progressively adopted NIDCAP, Newborn Individualized Developmental Care and Assessment Program [1]).
In order to improve the quality of this aspect of healthcare, some attempts to automate the detection of neonatal cry have been published. Accurately characterizing cry in real time could be useful in both automatic pain assessment [2] and NIDCAP procedures. In fact, if a neonate in a large NICU is crying, it could be useful to trigger alarms that request the attention of the nurses.
Regarding methods and techniques to analyze babies' cries, the literature deals with “neat environment tests”: only cry sounds are analyzed and, to the best of our knowledge, proposals do not consider background noise. Background noise does attract the attention of some researchers, but mainly with regard to measuring typical sounds and noises in NICUs and how they affect the well-being of neonates. For our work, we reviewed several of these papers to gain a brief overview of background noise in NICU environments.
For cry analysis in NICUs, background noise is an important aspect that must be taken into account. It can come from several sources, namely equipment or machines in hospital rooms, ventilator systems, sounds generated when interacting with incubators, staff conversations, ringing phones, alarms, etc. There are standards for noise levels, for example that of the American Academy of Pediatrics. If the noise level inside a NICU exceeds these standards for a long period, it can cause problems for newborns, affecting neurodevelopment or other important aspects.
In a recent contribution [3], the authors describe these environments in terms of background noise. Measurements were conducted in both an old and a new hospital in Melbourne. The sound environment was surveyed over 24-hour periods and compared with the Australian recommendation, which states maximum and minimum sound levels. For instance, the maximum sound level should be less than 65 dB, for no more than 6 minutes in an hour. They also consider sound peaks (defined as jumps of more than 12 decibels). Noise is measured at a granularity of 5 seconds, over which decibel averages are computed. The results show that, in general, the recommendation is not met in half of the observation periods, and that the results do not differ between the old building and the new one.
Moreover, in [4] the authors assess background noise according to the American Academy of Pediatrics, which recommends that sound levels should not exceed a maximum acceptable level of 45 dB. The authors conclude that the sound environment in the NICU is louder than most home or office environments and contains disturbing noises of short duration and at irregular intervals. Elevated levels of speech are needed to overcome the noisy environment in the NICU, thereby increasing the negative impacts on staff, newborns, and their families. High noise levels are associated with an increased rate of errors and accidents, leading to decreased performance among staff. The aim of the interventions included in that review is to reduce sound levels to 45 dB or less. This can be achieved by lowering the sound levels in an entire unit, by treating the infant in a section of a NICU (i.e. in a “private” room) or in incubators in which the sound levels are controlled, or by reducing the sound levels that reach the individual infant by using earmuffs or earplugs.
Finally, in [5] the authors describe background noise both quantitatively (measuring sound) and qualitatively (by means of interviews). They compare data collected during day and night periods and conclude that there is no statistically significant difference.
In this paper we analyze the robustness of a cry detection method against real-life sounds that occur in a Neonatal Intensive Care Unit. To that end, we implement a method, described in Section 2, that is representative of the most prominent ones found in the literature. In Section 3, we test this implementation using audio samples from a cry database, mixed with real audio samples of NICU background noise. Finally, Section 4 concludes the paper.
In this section we describe the method implemented for cry detection. The goal of the method is to detect cry sounds in the waveforms recorded in a real neonatal intensive care unit (NICU). For the sake of simplicity, we have implemented software that analyzes recorded waveforms. However, in a real deployment, the system would be able to analyze audio acquired in real time.
Cry analysis methods make use of several concepts or techniques that play a key role in the process. Sampling is the first step in the analysis of audio from real environments. Sounds to be analyzed take the form of samples with b-bit resolution (usually, b = 16 for generic audio and b = 8 for voice-targeted applications). Samples are taken at a specific sampling rate (44,100 Hz being a typical value for generic audio and 8,000 Hz for voice-targeted applications). Hence, sampling consists of the reduction of a continuous-time signal to a discrete-time signal.
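As an illustration of the sampling step, the following minimal Python sketch samples a continuous-time signal at a given rate and quantizes it to b bits. The function names, the 440 Hz test tone and the use of signed integer levels are assumptions made for this example only; they are not taken from the implementation evaluated in this paper.

```python
import math


def sample_and_quantize(signal, duration_s, rate, bits):
    """Sample signal(t), with values in [-1, 1], at `rate` Hz and round to signed `bits`-bit integers."""
    max_level = 2 ** (bits - 1) - 1  # 32767 for b = 16, 127 for b = 8
    count = round(duration_s * rate)  # number of discrete-time samples
    return [round(signal(i / rate) * max_level) for i in range(count)]


def tone(t):
    """Hypothetical continuous-time input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


generic = sample_and_quantize(tone, 0.01, 44100, 16)  # 10 ms of generic audio
voice = sample_and_quantize(tone, 0.01, 8000, 8)      # 10 ms of voice-targeted audio
```

The same 10 ms of signal yields 441 samples at 44,100 Hz but only 80 samples at 8,000 Hz, which shows how the sampling rate determines the amount of data passed to the later analysis steps.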
Most audio analysis applications rely on frequency estimation. In this step, a window is applied to select a certain interval of data to analyze. A cry normally lasts at least 200 ms, so the window size is normally set to 50 ms. Once the window size has been determined, consecutive sets of data are taken from the samples obtained in the sampling step. Then, a time-to-frequency transformation technique, such as the Fast Fourier Transform, is applied.
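The windowing and time-to-frequency step can be sketched as follows. This is a minimal example under stated assumptions: it uses a naive discrete Fourier transform instead of an FFT (only to avoid external dependencies), non-overlapping 50 ms windows, an assumed 8,000 Hz sampling rate and a synthetic 400 Hz tone as input. The function names are hypothetical.

```python
import cmath
import math


def split_into_windows(samples, rate, window_ms=50):
    """Split samples into consecutive, non-overlapping windows of window_ms milliseconds."""
    size = int(rate * window_ms / 1000)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def dominant_frequency(window, rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring the DC component."""
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * rate / n


RATE = 8000  # assumed voice-targeted sampling rate (Hz)
tone = [math.sin(2 * math.pi * 400 * t / RATE) for t in range(RATE // 5)]  # 200 ms at 400 Hz
windows = split_into_windows(tone, RATE)
frequencies = [dominant_frequency(w, RATE) for w in windows]
```

With a 50 ms window at 8,000 Hz, each window holds 400 samples and the frequency resolution is 8000 / 400 = 20 Hz, so a 200 ms sound is covered by four windows.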
Besides sampling and frequency estimation, the following concepts play a key role in the reviewed cry analysis techniques [6-8]:
The method works with a WAVE audio file. We assume that it has been obtained using a high-sensitivity microphone placed next to the incubator. The audio file is recorded at a sampling rate of r samples per second, with b = 16 bits per sample. The file is divided into a succession of N windows {w1, w2, …, wN} of length S, where S is a parameter of the system. Thus, wi is a succession {si1, si2, ..., siS} of samples.
The first step consists of detecting a succession of non-silent windows. Note that several successions might be found in a file, but each succession is analyzed individually. In order to classify a window wi as silent or non-silent, the short-time energy (STE) is used. If STE(wi) > T, where T is a specific threshold, wi is classified as non-silent; otherwise it is considered a silent window and is therefore discarded from further analysis.
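The first step can be sketched in Python as follows. The sketch assumes one common definition of short-time energy, the sum of squared samples in a window; the exact formula, the threshold value and the function names are assumptions for illustration.

```python
def short_time_energy(window):
    """Sum of squared sample values (one common STE definition; an assumption here)."""
    return sum(s * s for s in window)


def non_silent_successions(windows, threshold):
    """Group consecutive windows with STE > threshold into (start, end) index pairs, end exclusive."""
    runs, start = [], None
    for i, w in enumerate(windows):
        if short_time_energy(w) > threshold:
            if start is None:
                start = i  # a new non-silent succession begins
        elif start is not None:
            runs.append((start, i))  # a silent window closes the succession
            start = None
    if start is not None:
        runs.append((start, len(windows)))
    return runs


# Toy example: windows of 4 samples and a hypothetical threshold T = 1.0
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [1.0, -1.0, 1.0, -1.0]
windows = [quiet, loud, loud, quiet, loud]
runs = non_silent_successions(windows, threshold=1.0)
```

Here the file contains two successions of non-silent windows, which would then be analyzed individually in the second step.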
The problem with the first step is that sound samples belonging to a nearby source of background noise may be detected as non-silent windows. For instance, if a breathing machine is working next to the incubator, the microphone will record this noise with a high STE. In order to avoid this problem and, assuming that cry frequencies are between 150 Hz and 900 Hz [6], we might apply a preprocessing step. Preprocessing consists of eliminating from wi all the frequencies that, according to [6], do not belong to cry. Naturally, noise at frequencies inside this range passes the filter and, as a result, could be confused with cries. For this reason, a second analysis step that considers the entire succession of non-silent windows is mandatory.
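One possible way to realize this preprocessing is to zero every spectral component outside the 150-900 Hz band and transform back to the time domain. The sketch below does this with a naive DFT on a single 50 ms window. The filtering technique, the function names, the 60 Hz hum and the 400 Hz cry-like tone are all assumptions for illustration; the text above does not specify how the filter is implemented.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), sufficient for one short window."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse transform; keeps the real part because the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def keep_cry_band(window, rate, low=150.0, high=900.0):
    """Zero every DFT bin whose frequency lies outside [low, high] Hz."""
    n = len(window)
    spectrum = dft(window)
    for k in range(n):
        freq = min(k, n - k) * rate / n  # bins above n/2 mirror the negative frequencies
        if not low <= freq <= high:
            spectrum[k] = 0j
    return inverse_dft(spectrum)


RATE = 8000  # assumed sampling rate (Hz)
N = 400      # one 50 ms window
hum = [math.sin(2 * math.pi * 60 * t / RATE) for t in range(N)]              # out-of-band noise
cry_like = [0.5 * math.sin(2 * math.pi * 400 * t / RATE) for t in range(N)]  # in-band tone
mixed = [h + c for h, c in zip(hum, cry_like)]
filtered = keep_cry_band(mixed, RATE)
```

In this toy case the louder 60 Hz hum is removed and the 400 Hz component survives. A noise component at, say, 400 Hz would survive as well, which is why the second analysis step is needed.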
In the second step we analyze the entire succession of non-silent windows. The length of this succession, L, is variable. The goal of the second step is to compare the wave of the succession of non-silent windows with the waves in a so-called Cry Dictionary.
The Cry Dictionary contains the wave features of several newborn cries of different durations, expressed in terms of fundamental frequencies.
Hence, in this second step, the fundamental frequency of each window is obtained. As a result, a succession F of L fundamental frequencies {f1, f2, …, fL} is obtained. In this step, we apply a filter to eliminate fundamental frequencies below 150 Hz. Because the main purpose of our method is to detect cry units in an environment with background noise, we only care about fundamental frequencies in the range of cry. After the filtering step, the fundamental frequency of background noise can still take values in the range of cry fundamental frequencies, but it never shows the changing shape in frequency that is characteristic of cry.
Finally, the wave in the Cry Dictionary at the nearest distance to F is found. If this distance is below a specific threshold, the succession of L non-silent windows is considered a cry. For the sake of simplicity, our detection technique is based on distances.
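The second step can be sketched as a nearest-neighbor search over fundamental-frequency contours. The text does not state which distance is used or how contours of different lengths are aligned, so the sketch assumes linear resampling to a common length and the mean absolute difference in Hz. The dictionary entries, the threshold of 50 and all function names are hypothetical.

```python
def drop_low_f0(f0_values, min_hz=150.0):
    """Discard fundamental frequencies below min_hz (the lower cry bound used in the text)."""
    return [f for f in f0_values if f >= min_hz]


def resample(values, length):
    """Linearly interpolate a contour to the requested number of points (length >= 2)."""
    if len(values) == 1:
        return [values[0]] * length
    out = []
    for i in range(length):
        pos = i * (len(values) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        out.append(values[lo] + (values[hi] - values[lo]) * (pos - lo))
    return out


def distance(f0_a, f0_b, length=16):
    """Mean absolute difference (Hz) between two contours resampled to a common length."""
    a, b = resample(f0_a, length), resample(f0_b, length)
    return sum(abs(x - y) for x, y in zip(a, b)) / length


def classify(f0_values, dictionary, threshold):
    """Return (is_cry, nearest_entry_name, nearest_distance)."""
    contour = drop_low_f0(f0_values)
    if not contour:
        return False, None, float("inf")
    name, dist = min(((n, distance(contour, e)) for n, e in dictionary.items()),
                     key=lambda pair: pair[1])
    return dist < threshold, name, dist


# Hypothetical Cry Dictionary: F0 contours (Hz) with a changing shape
cry_dictionary = {"rise_fall": [300, 400, 500, 450], "fall": [500, 450, 400, 350]}
candidate = [90, 310, 405, 495, 455]  # first value is below 150 Hz and gets filtered out
steady_noise = [200, 200, 200, 200]   # in the cry range, but with no change in frequency
cry_result = classify(candidate, cry_dictionary, threshold=50.0)
noise_result = classify(steady_noise, cry_dictionary, threshold=50.0)
```

The candidate follows the shape of the "rise_fall" entry closely and is accepted, whereas the steady noise lies within the cry frequency range but is far from every entry and is rejected.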
The aim of the experiments is to assess whether the implemented cry detection algorithm detects sounds not related to cry (e.g. machinery, conversations) as false cries. Hence, we have tested our implementation with neonatal cries in the scenario of a real NICU.
The sound samples used for testing have been generated from separate samples of background noise and newborn cries. It is not straightforward to obtain a representative sample of newborn cries. However, there is a database of neonatal cries: the Baby Chillanto Data Base, which is the property of the Instituto Nacional de Astrofisica Optica y Electronica - CONACYT, Mexico, and has been used in the literature to assess proposals on cry detection and characterization [9]. We have selected four samples from this database, namely “Normal_Cry_1”, “Normal_Cry_2”, “Pain_Cry” and “New_Cry”. The first three samples have been used to build the Cry Dictionary.
Background noises have been recorded in a real NICU scenario: the neonatal unit of Hospital Universitari Joan XXIII. Samples, with a duration of 10 seconds each, were recorded during November 2016 using a high-sensitivity cardioid condenser microphone and an audio interface attached to a MacBook Pro. Samples were recorded with the Audacity software.
We have selected the following background noise samples:
In order to generate the final samples used in the tests in this work, we have mixed cry sounds with the background noise in different proportions: specifically, 80% cry plus 20% background, and 50% cry plus 50% background. As a result, we test our algorithm with 5 x 3 x 2 = 30 samples, each with a duration of 10 seconds.
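The mixing procedure can be sketched as a weighted sum of two signals. Interpreting the proportions as amplitude weights is an assumption, as is reading the factors 5 x 3 x 2 as five background noises, three cry samples and two mixing proportions. The background names below are hypothetical placeholders.

```python
from itertools import product


def mix(cry, background, cry_share):
    """Weighted sum of two equally sampled signals: cry_share * cry + (1 - cry_share) * background."""
    n = min(len(cry), len(background))
    return [cry_share * cry[i] + (1.0 - cry_share) * background[i] for i in range(n)]


backgrounds = [f"background_{i}" for i in range(1, 6)]  # hypothetical names for 5 noise samples
cries = ["Normal_Cry_1", "Normal_Cry_2", "Pain_Cry"]
cry_shares = [0.8, 0.5]  # 80% cry + 20% background, 50% cry + 50% background

test_cases = list(product(backgrounds, cries, cry_shares))
example = mix([1.0, 1.0], [0.0, 0.5], cry_share=0.8)
```

Enumerating every combination of background, cry and proportion reproduces the count of 30 test samples.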
For each sample, we know the exact time at which the cry starts; hence, a false positive (i.e. a false cry) consists of detecting a cry outside the boundaries of the parts of the sample where we know the real cry is.
Tests have been conducted using a laptop equipped with an Intel Core i5-5200U processor and 4 GB of DDR3 RAM. Each sample took around 3-5 seconds to evaluate. In these results, the accuracy (in time) is calculated as (total time of cry detected) / (total time of cry in one benchmark). We also check that the detected cry units are at the correct position. For the sake of brevity, Tables 1, 2 and 3 show only some of the results. The complete list of tests can be obtained from the document downloadable at http://smarthealthresearch.com/docs/cryanalysisreport.pdf.
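The two evaluation measures can be sketched as follows, representing cries as (start, end) intervals in seconds. As a simplification, a false positive is taken here to be a detected interval that does not overlap any real cry, and the intervals within each list are assumed not to overlap one another. The function names and the example intervals are hypothetical.

```python
def overlap(a, b):
    """Length in seconds of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def accuracy_in_time(detected, truth):
    """Correctly positioned detected cry time divided by the total real cry time."""
    total_true = sum(end - start for start, end in truth)
    hit = sum(overlap(d, t) for d in detected for t in truth)
    return hit / total_true


def false_positives(detected, truth):
    """Detected intervals that do not overlap any real cry interval."""
    return [d for d in detected if all(overlap(d, t) == 0.0 for t in truth)]


truth = [(2.0, 4.0), (6.0, 7.0)]     # known cry positions in a 10-second sample
detected = [(2.5, 4.0), (8.0, 8.5)]  # output of a hypothetical detector
accuracy = accuracy_in_time(detected, truth)
false_cries = false_positives(detected, truth)
```

In this example, 1.5 of the 3.0 seconds of real cry are detected at the correct position, giving an accuracy of 50%, and the detection at 8.0-8.5 seconds counts as a false positive.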
For the “Normal_Cry_1” tests (Table 1), no false positives were found and, thus, our method is shown to be valid, since it does not mistake noises for cries. For “Normal_Cry_2” (Table 2), the detection result in the first situation, mixing with a proportion of 20-80%, is quite good: there are no false positives. However, in the second situation, mixing with a proportion of 50-50%, false positives do occur. The minimum frequency distance at which a false positive occurs is 150, and this value could help fine-tune our detection algorithm, which uses the Cry Dictionary and our current simple detection method.
Table 1: Some representative results in the case of “Normal_Cry_1”.
Finally, in the case of “Pain_Cry” (Table 3) the results are optimal since no false positives have been detected.
Table 2: Some representative results in the case of “Normal_Cry_2”.
So far, we have described the success of our method in terms of not detecting false positives. In addition, we assess the validity and robustness of our method in detecting new cries that are not in the dictionary. To that end, we have tested our method using the fourth sample from the database, creating mixes with background noise from the real NICU. Table 4 shows that our method does not detect false positives and detects the new cry as well.
In most cases, our method correctly detects cry regardless of the background noise, even when mixing with 50% noise. False positives occur only in some tests of the second case with 50-50% noise mixing.
In the case of “Normal_Cry_1” and “Pain_Cry”, which we know to be good cry recordings, the method performs well. It gives very high accuracy with 20-80% noise mixing. In the “Normal_Cry_1” case, the accuracy is always above 80%. In the case of “Pain_Cry”, the result is even better, with an accuracy of around 100%.
In the case of “Normal_Cry_2”, we knew that this file (i.e. the sample from the cry database) also contains a noticeable amount of background noise. Thus, the detection result is not as good as the others. With 20-80% background noise mixing, the accuracy is only around 60%, and with 50-50% mixing the result is even worse and false positives appear. This sample is not good enough to serve as a standard for detecting cry. However, the minimum frequency distance obtained is very valuable for estimating a standard fundamental frequency threshold in the future.
Table 3: Some representative results in the case of “Pain_Cry”.
Table 4: Results in the case of “New_Cry”.
The result with 20-80% background noise mixing is always better than the result with 50-50% mixing. In the case of “Normal_Cry_1” and “Pain_Cry”, the 20-80% situation gives results of over 80% and nearly 100%, respectively, whereas the 50-50% situation gives average results of only 60% and 80%, respectively.
In the case of “New_Cry”, our method gives a good cry detection result. There are no false positives in this case, even when the amplitude of the noise is quite high. Although the method still cannot detect the full cry duration in the audio file, it detects the most important cry signals correctly.
In this paper we have addressed the robustness of cry detection techniques in real-life neonatal Intensive Care Units. In most cases, our method has correctly detected cry regardless of background noise. In our work, we determined that fundamental frequency is the most important feature of newborn infant cry. Thus, we focused on characterizing how the fundamental frequency changes and the shape it takes. Compared with other methods, this could reduce the computational cost while still giving good results.
Although we obtain good results in detecting cry units in most cases, some false positives still exist. The reason is the insufficient quality of the Cry Dictionary. The cry sample named “Normal_Cry_2” contains noise, so creating a standard cry unit from it reduces the quality of the Cry Dictionary, which later leads to some false positives in detection. In the future, we will investigate further how to build a good, standard Cry Dictionary and make it a good benchmark for other research.
In our method, the values of some important parameters, namely the power threshold and the minimum distance in fundamental frequency, are still not fixed. Because we do not have a standard scenario for recording cry, the cry samples are mixed using audio software. In the near future, we will design a standard scenario in which the device for recording the cry of the newborn infant is placed near the incubator, to guarantee that the background has lower energy than the cry.
In this work, we implemented only a basic method for classifying cry units, based on calculating similarity in the characteristics of the fundamental frequency. It can still be improved, because fundamental frequency is not the only characteristic of a cry unit. In the future, we will implement artificial intelligence methods such as neural networks and support vector machines to classify cry units based on a set of input attributes including fundamental frequency, some resonance frequencies and the average power frequency ratio. The new method will take into account all the features of the cry signal and, thus, the quality of cry detection should improve.
This work was financially supported by the VAST (Vietnam Academy of Science and Technology) project “Design and development of a remote visual monitoring system applied for security applications”, project code VAST01.10/17-18. The authors also thank Hospital Universitari Joan XXIII and the personnel of its neonatal Intensive Care Unit for allowing them to record audio samples in their facilities. Finally, we would like to thank Dr. Carlos A. Reyes-Garcia, Dr. Emilio Arch-Tirado and his INR-Mexico group, and Dr. Edgar M. Garcia-Tamayo for their dedication to the collection of the Infant Cry data base.
Copyright: © 2018 Dang Manh Chinh., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.