For spatial audio reproduction in the context of virtual and augmented reality, a position-dynamic binaural synthesis can be used to reproduce the ear signals for a moving listener. A set of binaural room impulse responses (BRIRs) is required for each possible position of the listener in the room. The required spatial resolution of the BRIR positions can be estimated by spatial auditory perception. If the resolution is too low, jumps in perception of direction and distance and coloration effects occur. This contribution presents an evaluation of spatial audio quality using different spatial resolutions of the position of the used BRIRs. The evaluation is performed with a moving listener. The test persons evaluate any abnormalities in the spatial audio quality. The result is a comparison of the quality and the spatial resolution of the various methods used.
As binaural systems become increasingly important, understanding user behavior can be critical to designing these applications for ease of use and efficiency. For this purpose, the movement of test subjects is recorded with a motion tracking system. Six degrees of freedom for the position and head orientation of a listener are captured. The movement of 23 people is recorded in five different test scenarios with an exploration time of approximately three minutes. Each of the first four scenarios contains a different specific task in which the listener is asked to find a specific audio object. The last scenario consists of a piece of music in which the test subjects can move freely. The explorable area is 4 m x 4 m, and the binaural filters used lie on a spatial grid with steps of 0.25 m in position and 5° in head rotation. Several features are extracted from the data: yaw angle, inclination angle, yaw angle speed, xyz position, walking speed, path and area walked, time for localization and reaction time. It turns out that most people behave similarly, especially in simpler scenarios. The area walked and the exploration behavior, however, appear to be very individual.
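The grid described above (0.25 m position steps, 5° head-rotation steps, 4 m x 4 m area) suggests a simple nearest-neighbour lookup from a tracked pose to a filter index. The following is a minimal sketch of such a lookup, not the authors' implementation. The function name, the assumption that the grid starts at the origin and includes both edges of the area, and the clamping at the borders are all illustrative assumptions.

```python
import math


def nearest_brir_index(x_m, y_m, yaw_deg, step_m=0.25, step_deg=5.0, area_m=4.0):
    """Map a tracked pose to the nearest (ix, iy, iyaw) grid index.

    Assumptions (not stated in the abstract): the grid starts at (0, 0),
    includes both edges of the area, and yaw wraps around at 360 degrees.
    """
    n_pos = int(round(area_m / step_m)) + 1  # grid points per axis
    n_yaw = int(round(360.0 / step_deg))     # head orientations per position

    def snap(value, step, count):
        # Round half up, then clamp to the valid index range.
        index = math.floor(value / step + 0.5)
        return min(max(index, 0), count - 1)

    ix = snap(x_m, step_m, n_pos)
    iy = snap(y_m, step_m, n_pos)
    # Wrap the yaw angle into [0, 360) before quantizing it.
    iyaw = math.floor((yaw_deg % 360.0) / step_deg + 0.5) % n_yaw
    return ix, iy, iyaw
```

A pose slightly outside the tracked area is clamped to the border of the grid, which is one of several possible design choices; a real system might instead fade out or extrapolate.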
The aim of auditory augmented reality is to create a highly immersive and plausible auditory illusion combining virtual audio objects and scenarios with the real acoustic surrounding. For this use case it is necessary to estimate the acoustics of the current room. A mismatch between real and simulated acoustics will easily be detected by the listener and will probably lead to in-head localization or an unrealistic acoustic envelopment of the virtual sound sources. This publication investigates state-of-the-art algorithms for blind reverberation time estimation, which are commonly used in speech enhancement or speech dereverberation, and applies them to binaural ear signals. The outcome of these algorithms can be used to select the most appropriate room out of a room database. A room database could, for example, include pre-measured or simulated binaural room impulse responses which could directly be used to realize a binaural reproduction. First results are promising and come with low computational effort. Further strategies for enhancing the used method are proposed in order to create a more precise reverberation time estimation.
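The room selection step mentioned above can be pictured as a nearest-match search over stored reverberation times. The sketch below assumes that each database entry is described by a single broadband reverberation time in seconds and that the closest absolute value wins; the abstract does not specify the matching criterion, and the room names and values are purely hypothetical placeholders.

```python
def select_room(estimated_rt60_s, room_db):
    """Return the name of the room whose stored RT60 is closest to the estimate.

    room_db maps a room name to a reverberation time in seconds.
    Matching by absolute difference is an assumption made for this sketch.
    """
    if not room_db:
        raise ValueError("room database is empty")
    return min(room_db, key=lambda name: abs(room_db[name] - estimated_rt60_s))


# Hypothetical database entries, for illustration only.
example_db = {"dry_lab": 0.3, "office": 0.6, "hall": 1.8}
best_match = select_room(0.7, example_db)
```

A frequency-dependent estimate could be handled the same way by replacing the scalar difference with a distance over several frequency bands.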
We compare two algorithms that create diffuse sound fields in terms of the sweet area size they produce in a 10 m by 10 m playback room. One approach employs random frequency-dependent group delays to generate a set of minimally correlated impulse responses used as filters for multi-channel diffusion of a mono sound. Canfield-Dafilou presented a frequency-dependent maximum group delay value as a constraint on the randomness of this method, ensuring minimal audible artifacts in studio environments. We relax those constraints to enable a stable, enveloping, and diffuse listening experience filling the targeted larger audience area, which, however, unavoidably yields the impression of spaciousness and reverberation. Consequently, the new FIR approach competes with diffusion by an IIR feedback delay network as an alternative. We conduct listening experiments to reveal the properties and effectiveness of both methods, in particular regarding sweet area size and sound quality.
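The random group delay idea can be sketched as follows: draw a random group delay per frequency bin, integrate it to obtain a phase response, and transform the resulting unit-magnitude spectrum back to a real impulse response. This is a simplified illustration and not the method evaluated in the paper. It uses a constant upper bound `max_delay_samples` instead of a frequency-dependent one, a small naive inverse DFT, and hypothetical function and parameter names.

```python
import cmath
import math
import random


def random_group_delay_filter(n_fft=64, max_delay_samples=8.0, seed=0):
    """Build one all-pass FIR filter from random per-bin group delays.

    Assumption: the group delay in each bin is uniform in
    [0, max_delay_samples]; a frequency-dependent bound could be used instead.
    """
    rng = random.Random(seed)
    half = n_fft // 2
    d_omega = 2.0 * math.pi / n_fft

    # Group delay is the negative phase derivative, so the phase is the
    # negative running sum of the group delay over frequency.
    phase = [0.0] * (half + 1)
    for k in range(1, half + 1):
        tau = rng.uniform(0.0, max_delay_samples)
        phase[k] = phase[k - 1] - tau * d_omega

    # Unit-magnitude spectrum with Hermitian symmetry (real impulse response).
    spectrum = [0j] * n_fft
    for k in range(half + 1):
        spectrum[k] = cmath.exp(1j * phase[k])
    # The Nyquist bin must be real; keep its sign and force magnitude one.
    spectrum[half] = complex(math.copysign(1.0, spectrum[half].real), 0.0)
    for k in range(1, half):
        spectrum[n_fft - k] = spectrum[k].conjugate()

    # Naive inverse DFT, adequate for the short length used in this sketch.
    impulse_response = []
    for n in range(n_fft):
        acc = sum(
            spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)
        )
        impulse_response.append(acc.real / n_fft)
    return impulse_response


# One filter per loudspeaker channel, each from a different random seed.
channel_filters = [random_group_delay_filter(seed=s) for s in range(4)]
```

Because every bin has unit magnitude, each filter leaves the magnitude spectrum of the mono input unchanged and only alters its phase; relaxing the delay bound, as described above, lengthens the responses and makes them sound more reverberant.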
The expansion of wind energy as part of renewable energy supply has met with increasing scepticism and negative emotions among the public, especially in rural areas where wind farms exist or are planned. Among various social, economic and ecological arguments, objectors worry about noise exposure of the affected population. Besides the overall sound pressure level that might be increased by wind turbines, acoustic properties like tonality and amplitude modulation are perceived as increasingly annoying.
Most research on these properties has been carried out in an isolated, singular and mostly physically motivated manner where overall auditory perception is rarely considered. To get a better understanding of how the interaction of the mentioned properties influences the perception of wind turbine noise, systematic psychoacoustic testing offers the potential to contribute to basic scientific findings.
In order to carry out extensive, comparable and valid perceptual evaluations, a reproduction environment for wind turbine noise was designed, realized and validated. It consists of a recording and processing stage for capturing existing wind turbine noisescapes, a method to model and synthesize plausible sound generation and propagation, as well as a plausible audiovisual reproduction environment that respects ecological validity. This paper presents the interdisciplinary research goal, the general methodology of development and validation, as well as an outlook for current and future psychoacoustic application. It is accompanied by a workshop taking place at ICSA 2019.
Remote music collaboration is ever-relevant in the field of immersive audio. The proliferation of commercial immersive devices for virtual and mixed reality enables today's musicians to experience enhanced forms of virtual presence when remotely connected to their peers. In the presented work, a dancer and percussionists have been recorded with both microphones and an OptiTrack motion capture system. Their audiovisual presence is converted into game character avatars that can be reproduced through VR headsets. During the exhibition, a live percussionist wearing a motion-capture suit enters the performance in the virtual scene. Audience members are also brought into the experience by means of a VR headset and are able to observe and hear the live collaboration between the real musician and the pre-recorded virtual ensemble members.
The goal is to create a compelling and cohesive immersive experience with the real and virtual audio layers blending seamlessly. To match the auditory expectations set by the presence of a real source, the virtual audio material is treated to acoustically match the characteristics that the same instruments would have in the intended exhibition space via measured impulse responses and dynamic binaural rendering.
This paper gives an overview of the proposed method used to create this novel musical experience and discusses the impressions of participants, audience and musicians. Future technical enhancements for the involved elements are discussed along with proposed evaluation procedures and variations to the pipeline.
Binaural localization of speech signals has been widely applied in human-computer interaction systems, communication devices, etc. Traditionally, the binaural cues, i.e., frequency-dependent interaural level difference (ILD) and interaural time difference (ITD), are used to localize binaural signals in the horizontal plane. The spectral information, especially the positive gradient, is an important cue for binaural sound localization in the sagittal plane. It is still unknown whether this cue can also be used for sound localization in the horizontal plane. The mel frequency cepstral coefficient (MFCC) is commonly applied in automatic speech and speaker recognition and can also be used as an acoustic feature to localize speech signals. Furthermore, it is interesting to investigate whether the difference of MFCC between the two ears (DMFCC) can be used for binaural sound localization.
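The two classical cues named above can be illustrated with a short sketch. It computes a broadband ILD as an energy ratio in decibels and an ITD as the lag of the cross-correlation maximum. The abstract refers to frequency-dependent cues, so a per-band version would apply the same steps to band-filtered signals. Function names, the sign conventions and the small regularization constant are assumptions made for illustration.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (positive: left is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    # eps avoids a division by zero or log of zero for silent input.
    return 10.0 * math.log10((energy_left + eps) / (energy_right + eps))


def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes the cross-correlation.

    A positive result means the right-ear signal is delayed relative to the
    left-ear signal. Divide by the sampling rate to obtain seconds.
    """
    best_lag = 0
    best_value = -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, sample in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                value += sample * right[m]
        if value > best_value:
            best_value = value
            best_lag = lag
    return best_lag
```

In a feature-based localizer such as the one described, values like these would be collected per frame and per frequency band and passed to the network as input features.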
In the present study, the above-mentioned acoustic cues, i.e., ILD, ITD, spectral information, MFCC and DMFCC, are used as neural network features for binaural localization of speech signals in the horizontal plane, and the performance of these features in terms of localization accuracy is evaluated.
Binaural room impulse responses were measured with a KEMAR 45BA dummy head. It was placed at different positions along a line 2 m long, with a positional resolution of 25 cm and an azimuth resolution of 4°. Two source positions were considered in the setup, one in front of the line and one at the side. The same arrangement of source and receiver positions was realized in two different rooms, a quite dry listening laboratory and a quite reverberant seminar room. The data set is valuable for realizing, testing and studying dynamic binaural walk-through scenarios in the two different rooms. It is provided online.