Research Directions

Perceptual Computing and Scene Analysis

Date: 2019-06-19

Computational Auditory Scene Analysis

Sounds in the environment form auditory scenes: distributions of "events" produced by sound streams that originate at different locations in space. Through complex processing in the auditory system, a listener sorts the superimposed sound events of a scene into separate sounds, a process known as auditory scene analysis. Studying and modeling this human ability is the task of computational auditory scene analysis, and progress here helps guide engineering problems such as blind source separation and speech enhancement.

Monaural Model

The monaural model distinguishes auditory objects by detecting harmonic structure. For speech signals, the relevant harmonic features mainly include amplitude modulation and frequency modulation characteristics.

Binaural Model

The binaural model localizes the sound source using binaural cues, mainly the interaural level and phase differences, taken from the frequency bands that belong to the source. The location estimate is then used to correct the temporal integration of the separated target speech, which yields a directional enhancement.
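To make the two binaural cues concrete, here is a minimal sketch of how a level difference and a phase difference could be measured from a pair of ear signals. It is not the group's model: the function names, the 16 kHz sample rate, and the 500 Hz test tone are all assumptions chosen for illustration, and the phase is read from a single-bin DFT rather than from an auditory filterbank.

```python
import cmath
import math


def interaural_level_difference(left, right):
    """Return the level difference in dB (positive when the left ear is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)


def interaural_phase_difference(left, right, freq_hz, sample_rate):
    """Return phase(left) - phase(right) at freq_hz, in radians."""
    def bin_value(signal):
        # Single-bin DFT evaluated at the analysis frequency.
        step = -2j * math.pi * freq_hz / sample_rate
        return sum(x * cmath.exp(step * n) for n, x in enumerate(signal))

    return cmath.phase(bin_value(left) * bin_value(right).conjugate())


# Hypothetical test tone: the right-ear signal has half the amplitude
# of the left-ear signal and lags it by 45 degrees.
SAMPLE_RATE = 16000
TONE_HZ = 500.0
omega = 2.0 * math.pi * TONE_HZ / SAMPLE_RATE
left = [math.sin(omega * n) for n in range(320)]  # exactly 10 periods
right = [0.5 * math.sin(omega * n - math.pi / 4) for n in range(320)]

ild_db = interaural_level_difference(left, right)
ipd_rad = interaural_phase_difference(left, right, TONE_HZ, SAMPLE_RATE)
print(f"ILD = {ild_db:.2f} dB, IPD = {math.degrees(ipd_rad):.1f} degrees")
```

A positive level difference together with a positive phase difference points to a source on the left side in this toy setup. A practical system would compute both cues per frequency band and per time frame.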

Auditory scene analysis diagram

Sound Source Localization

Sound source localization is widely used in systems such as intelligent robots and video conferencing. Current methods rely mainly on signal processing with microphone arrays. The group starts from a model of the human auditory localization mechanism and aims to improve the performance of existing algorithms.

The main traditional approaches are steered beamforming, direction finding based on high-resolution spectral estimation, and time delay estimation (TDE). Using time-difference cues, the group has implemented a real-time 3-D sound source localization system based on six-channel time difference of arrival (TDOA) estimation.
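The time-delay idea can be sketched for a single microphone pair as follows. This is a plain cross-correlation estimator, not the six-channel algorithm described above. The function names, the 343 m/s speed of sound, the 16 kHz sample rate, and the 0.2 m microphone spacing are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(reference, delayed, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += value * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(delay_samples, sample_rate, mic_spacing_m):
    """Far-field arrival angle in degrees, measured from the array broadside."""
    path_difference = delay_samples / sample_rate * SPEED_OF_SOUND
    # Clamp so that noisy delay estimates never leave the domain of asin.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical pulse that reaches microphone B five samples after microphone A.
pulse = [1.0, 2.0, 3.0, 2.0, 1.0]
mic_a = [0.0] * 50 + pulse + [0.0] * 50
mic_b = [0.0] * 55 + pulse + [0.0] * 45

lag = estimate_delay_samples(mic_a, mic_b, max_lag=20)
angle = delay_to_angle(lag, sample_rate=16000, mic_spacing_m=0.2)
print(f"delay = {lag} samples, angle = {angle:.1f} degrees")
```

One pair only constrains the source to a cone around the microphone axis. Combining the delays from several pairs with different orientations is what allows a 3-D position estimate.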

Traditional sound source localization methods

The interaural intensity difference (IID) is also an important cue in human auditory localization. If a scatterer that resembles a human head is placed between the microphones, intensity differences between them can be obtained effectively.

A two-microphone sound source localization system based on a spherical scatterer has also been implemented. It uses the interaural time difference (ITD) and IID cues produced by the sphere for sub-band steered beamforming, then localizes the source through a multi-band decision. Its localization ability in the horizontal plane is close to that of human listeners, and it remains robust at a signal-to-noise ratio of 0 dB.
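For intuition about the ITD that a sphere produces, the sketch below evaluates Woodworth's classical approximation for a distant source and a rigid sphere, ITD = (a / c)(θ + sin θ). The page does not say which sphere model the system uses, so this is only an assumed stand-in. The 8.75 cm radius and 343 m/s speed of sound are illustrative defaults.

```python
import math


def woodworth_itd(azimuth_deg, sphere_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a distant source around a rigid sphere.

    Woodworth's approximation, intended for azimuths between -90 and
    90 degrees, measured from straight ahead.
    """
    theta = math.radians(azimuth_deg)
    return sphere_radius_m / speed_of_sound * (theta + math.sin(theta))


for azimuth in (0, 30, 60, 90):
    print(f"{azimuth:3d} degrees -> {woodworth_itd(azimuth) * 1e6:.0f} microseconds")
```

The formula covers the time cue only. The intensity cue depends on frequency-dependent scattering around the sphere and is not captured by this approximation.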

Two-microphone sound source localization system based on a spherical scatterer

Virtual Spatial Sound Reproduction

The spatial position of a sound source is an important attribute of how people perceive sound in real environments. Producing virtual sound with an accurate sense of spatial direction, and thereby building a high-quality virtual acoustic environment, is an important research topic. The technology is widely applicable in film, music, media, telemedicine, and education.

Classification of virtual spatial sound reproduction techniques

Virtual Orchestra

Based on MIDI synthesis and vector base amplitude panning (VBAP), the group has implemented an interactive multi-channel virtual spatial music reproduction system. Users can move the instruments in real time while listening, which makes the performance more interactive.
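The panning step can be illustrated with a minimal two-dimensional VBAP sketch for one loudspeaker pair. The source direction is written as a weighted sum of the two loudspeaker direction vectors, and the weights are scaled to constant power. The function name and the ±30 degree loudspeaker layout are assumptions, not details of the virtual orchestra system.

```python
import math


def vbap_pair_gains(source_deg, speaker1_deg, speaker2_deg):
    """Return constant-power gains (g1, g2) for a source between two loudspeakers."""
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(speaker1_deg)), math.sin(math.radians(speaker1_deg))
    l2x, l2y = math.cos(math.radians(speaker2_deg)), math.sin(math.radians(speaker2_deg))

    # Solve p = g1 * l1 + g2 * l2 with Cramer's rule.
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det

    # Scale so that g1^2 + g2^2 = 1, keeping loudness roughly constant.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Hypothetical stereo layout with loudspeakers at +30 and -30 degrees.
for direction in (30, 15, 0, -30):
    g1, g2 = vbap_pair_gains(direction, 30, -30)
    print(f"{direction:4d} degrees -> g1 = {g1:.3f}, g2 = {g2:.3f}")
```

Moving an instrument then amounts to recomputing the gains for its new direction. A multi-channel layout would first pick the loudspeaker pair, or triplet in 3-D, that encloses the source direction.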

Stereo Dipole

Stereo Dipole synthesizes virtual sound sources with two small, closely spaced loudspeakers. Real-time surround sound reproduction is achieved by improving the crosstalk cancellation network algorithm and optimizing the near-field head-related transfer functions (HRTFs). The approach is well suited to handheld devices.
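The basic principle behind crosstalk cancellation can be sketched for a single frequency bin: the 2x2 matrix of loudspeaker-to-ear transfer functions is inverted so that each ear receives only its intended signal. This is the textbook direct inversion, not the improved algorithm mentioned above, and the transfer values below are made-up numbers rather than measured HRTFs.

```python
def crosstalk_canceller(h_ll, h_lr, h_rl, h_rr):
    """Invert the loudspeaker-to-ear matrix for one frequency bin.

    h_xy is the complex transfer function from loudspeaker y to ear x.
    """
    det = h_ll * h_rr - h_lr * h_rl
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((h_rr / det, -h_lr / det), (-h_rl / det, h_ll / det))


def speaker_feeds(canceller, ear_left, ear_right):
    """Loudspeaker signals that deliver the desired ear signals."""
    (c00, c01), (c10, c11) = canceller
    return (c00 * ear_left + c01 * ear_right, c10 * ear_left + c11 * ear_right)


# Hypothetical symmetric transfer values for one frequency bin.
H_SAME = 1.0 + 0.0j   # loudspeaker to the ear on its own side
H_CROSS = 0.7 - 0.2j  # loudspeaker to the opposite ear

canceller = crosstalk_canceller(H_SAME, H_CROSS, H_CROSS, H_SAME)
# Desired result: the signal is heard in the left ear only.
feed_left, feed_right = speaker_feeds(canceller, 1.0 + 0.0j, 0.0j)

# Signals that actually arrive at the ears after acoustic propagation.
at_left_ear = H_SAME * feed_left + H_CROSS * feed_right
at_right_ear = H_CROSS * feed_left + H_SAME * feed_right
print(abs(at_left_ear), abs(at_right_ear))
```

With closely spaced loudspeakers the direct and cross paths become similar at low frequencies, so the determinant gets small and the inverse demands large gains. Practical designs usually limit or regularize the inversion for that reason.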

Virtual orchestra system

DSP and Embedded Systems

The laboratory is equipped with advanced DSP and embedded development software, hardware, and instruments. The work focuses on embedded audio/video systems and hearing aid ASICs. Systems developed so far include a digital hearing aid development platform, a multi-channel audio capture and playback system, an animal startle reflex apparatus, and a covert communication system based on audio broadcasting. National invention patent applications have been filed for some of them.