2016 Seminars

February 2016

HRC Seminar with Karl Kandler February 5th

Karl Kandler, University of Pittsburgh

Title: Plasticity of auditory midbrain circuits in development and pathology

HRC Seminar – ARO practice talks February 12th

ARO practice talks

 

March 2016

HRC Seminar with Yale Cohen March 3rd

Yale Cohen, University of Pennsylvania

Title: Neural correlates underlying auditory perception and decision-making

HRC Seminar with Robert Baumgartner March 11th

Robert Baumgartner, Austrian Academy of Sciences, visiting Boston University

Title: Role of spectral cues in sound externalization: objective measures and modeling

 

April 2016

HRC Seminar with Michael Long April 1st

Michael Long, NYU School of Medicine

Title: How does the brain generate behavioral sequences?

Abstract: Zebra finches learn to sing in much the same way that infants learn to speak, but the circuit-based mechanisms that underlie the parallels between human speech and song production remain unknown. To address this directly, we use focal cooling to manipulate the circuits underlying birdsong and human speech production, and we examine the effects on the fine structure of these vocalizations to establish a functional map for these behaviors. In the songbird, we then examine dynamics within a key forebrain premotor structure using 2-photon imaging in the singing bird, and we examine the underlying circuitry using a combined electron-light microscopy approach. Using these observations, we have begun to test circuit-level models of sequence generation that are likely to be involved in a range of neural computations.

HRC Seminar with Kenneth Vaden April 8th

Kenneth Vaden, Medical University of South Carolina

Title: Age-Related Differences in Adaptive Control for Speech Recognition in Noise

Abstract: Speech recognition in noise appears to benefit from the engagement of cingulo-opercular activity, but to a smaller extent for older adults compared to younger adults. Differences in benefit from cingulo-opercular activity appear independent of mild presbyacusis, at least when audibility differences are limited for older adults. Preliminary evidence is presented that other forms of attention are increasingly critical to word recognition for older adults.

HRC Seminar with Stephen David April 15th

Stephen David, Oregon Hearing Research Center

Title: How does behavior shape the neural representation of sound?

HRC Seminar with Andrew Oxenham April 22nd

Andrew Oxenham, University of Minnesota

Title: Speech perception and masking release in normal, impaired, and electric hearing: Understanding speech in noise, and vice versa

Abstract: Difficulty understanding speech in noise is the most common complaint of people with hearing impairment and cochlear implants. Differences in intelligibility experienced by people with normal and impaired hearing become even greater when the background noise fluctuates over time: normal-hearing listeners are able to make use of the temporal “dips” in the noise to better understand the speech, whereas hearing-impaired listeners derive less benefit, and cochlear-implant users sometimes experience a decrease in intelligibility in the presence of noise fluctuations. Many reasons for the loss of masking release have been proposed over the past two decades. Our recent work suggests that some of these differences may be accounted for by differences in how inherent fluctuations in a stationary noise masker are processed and perceived. This talk will discuss how these differences reveal important interactions between spectral resolution and temporal processing in the perception of speech in noise.

HRC Seminar with Hari Bharadwaj April 29th

Hari Bharadwaj, Massachusetts General Hospital

Title: What can we learn about auditory scene analysis from autism spectrum disorders?

Abstract: Difficulty communicating in social settings is a core deficit reported in Autism Spectrum Disorders (ASD). Further, many individuals with ASD report sensory “overload”, including being overly sensitive to all sounds in a scene, and not just the sources of interest. Yet, while the social aspects (e.g., emotion, empathy) and higher-order cognitive aspects (e.g., attentional control) of the communication deficits have received considerable attention, deficits in sensory processing are less explored. Here, I’ll describe our attempts to study the *automatic* processes that underlie auditory scene analysis in ASD using magneto- and electroencephalography (MEG and EEG). We find that while peripheral and brainstem measures of sensory coding are indistinguishable between ASD and neurotypical controls, differences emerge at the thalamocortical stages of processing. In particular, in a series of experiments using carefully synthesized sounds, we find that MEG signatures of auditory object formation (i.e., the neural processes that underlie the binding of component sound elements into a perceived unit) are anomalous in ASD. The implications of these results for our understanding of the role of inhibition-mediated oscillations (gamma band) in object formation and early-stages of processing of complex sounds such as speech will be discussed.

 

May 2016

HRC Seminar with Laura D’Aquila May 6th

Laura D’Aquila, MIT

Title: Improving Speech Intelligibility in Fluctuating Background Interference

Abstract: The goal of this research was to develop a signal-processing technique to improve speech intelligibility for hearing-impaired (HI) listeners in fluctuating background interference. Typically, the masking release (MR; i.e., better performance in a fluctuating compared to a continuous noise background) experienced by normal-hearing (NH) listeners is reduced or absent in HI listeners for aided unprocessed speech materials. Recently, greater MR has been observed for various types of speech signals whose processing reduces or removes signal amplitude variations. The work reported here was concerned with implementing a method of signal processing to reduce variations in signal amplitude (thus leading to improved speech intelligibility in fluctuating interference) without suffering a loss in intelligibility in continuous background noise. The signal-processing technique described here was designed to increase the audibility of the lower-energy intervals of a speech-plus-noise signal through normalization of the short-term energy to match that of the long-term average energy. The technique simply compares short-term and long-term estimates of signal energy, increases the level of short-term segments whose energy is below the average signal energy, and normalizes the overall energy of the processed signal to be equivalent to that of the original long-term estimate. The method operates blindly on a speech-plus-noise signal without the need for segmentation, operates on relative levels of the speech and interference, and does not require a reference signal.

Consonant-identification tests were conducted on NH and HI listeners to compare performance on unprocessed speech with that of speech processed with the energy-normalization technique. Performance was measured in backgrounds of continuous and fluctuating noises (including square-wave and sinusoidally modulated noises as well as those derived from speech envelopes). For both NH and HI listeners, similar scores were obtained for processed and unprocessed speech materials in quiet and continuous-noise backgrounds. For HI listeners, superior performance was obtained for the processed speech in some fluctuating background noises, while NH listeners performed similarly with both types of speech materials. These results support the conclusion that greater MR for HI listeners may be related to decreased amplitude variation and increased audibility of the speech during gaps in the noise for energy-normalized signals.
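For readers who want to see the core operation in concrete form, here is a minimal sketch of the compare, boost, and renormalize steps described in the abstract. The frame length, the gain cap, and the absence of smoothing across frame boundaries are illustrative assumptions, not details taken from the talk.

```python
import numpy as np

def normalize_short_term_energy(x, fs, frame_ms=20.0, max_gain_db=20.0):
    """Sketch of the compare/boost/renormalize idea: frames whose short-term
    energy falls below the long-term average are amplified toward that average,
    and the output is rescaled to match the original overall energy."""
    frame_len = max(1, int(fs * frame_ms / 1000))
    n_frames = len(x) // frame_len
    frames = np.array(x[:n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)

    long_term = np.mean(frames ** 2)            # long-term average energy
    short_term = np.mean(frames ** 2, axis=1)   # per-frame (short-term) energy

    # Boost only the frames below the long-term average; never attenuate.
    gain = np.sqrt(long_term / np.maximum(short_term, 1e-12))
    gain = np.clip(gain, 1.0, 10 ** (max_gain_db / 20.0))
    frames = frames * gain[:, None]

    # Renormalize so the processed signal has the same overall energy as the input.
    frames *= np.sqrt(long_term / np.mean(frames ** 2))
    return frames.reshape(-1)
```

In practice the gains would presumably be applied with overlapping, windowed frames so that the level changes do not introduce audible discontinuities; the rectangular framing here is only for clarity.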

Canceled – HRC Seminar with Karen Helfer May 13th

Karen Helfer, UMass Amherst

HRC Seminar with Kelly Harris May 18th

Kelly Harris, Medical University of South Carolina

Title: Dissociable neural mechanisms to explain age-related auditory declines

Abstract: Older adults often experience difficulty understanding speech in degraded conditions, particularly when speech is presented in background noise or with altered temporal characteristics. These difficulties can occur even when hearing is normal. Temporal acuity, or the ability to efficiently and accurately encode the complex temporal information inherent in speech, is thought to deteriorate with age and contribute to speech recognition problems of older adults. We will present a series of studies that examined the extent to which age-related structural and functional changes in auditory and attention-related networks may contribute to auditory temporal processing and speech recognition declines in older adults. Our results suggest that structural and metabolic changes in auditory cortex may limit attention modulation, increasing dependence on an auditory system that is declining with age. Additionally, we highlight specific neural mechanisms that may contribute to these auditory deficits by offering a model in which age-related declines in neural synchrony and slowed processing uniquely affect temporal processing and speech recognition.

 

September 2016

HRC Seminar with Mark Wallace September 9th

Mark Wallace, Vanderbilt University

Title: Development and Plasticity in Multisensory Representations: From Animal Models to Autism

Abstract: We live in a world in which we are continually bombarded with information from the different senses. Given that our perceptual view of the world is a unified one (rather than sense-by-sense), one of the major challenges for the brain is deciding which pieces of information belong together and which should be segregated. To accomplish this, the brain has a number of specialized areas and processing architectures. The first part of the talk will provide an overview of how individual neurons and ensembles of neurons within these areas respond to multisensory stimulus combinations, as well as a view into how these processes develop and their plasticity both during development and in adulthood. This work provides the foundation for the second part of the talk, which will focus on studies examining how human performance and perception are altered under multisensory circumstances, how multisensory function changes across the lifespan, and how multisensory capacity and circuits are altered in clinical conditions such as autism. Finally, the talk will highlight recent work focused on taking advantage of the plasticity that can be engendered within multisensory systems and that may be used as a rehabilitative tool in clinical contexts.

HRC Seminar with Jorg Buchholz September 16th

Jorg Buchholz, National Acoustic Laboratories

Title: Improving the ecological validity of laboratory-based speech tests

Abstract: Laboratory-based performance measures of speech communication ability and hearing device benefit often do not correlate well with the performance reported and experienced by hearing-impaired subjects in the real world. The main reasons are the unrealistic stimuli and the speech tasks that are commonly applied. This talk provides an overview of some of the research being conducted at the National Acoustic Laboratories to overcome these problems. Loudspeaker-based sound reproduction is used to create realistic acoustic environments, and its limitations are evaluated using acoustic measures as well as measures of speech intelligibility. The principal effect of applying more realistic environments on speech outcomes is evaluated in normal-hearing and hearing-impaired listeners by comparing sentence recall ability in a “standard” laboratory environment and an example realistic cafeteria-like environment with different levels of informational masking. A novel speech conversation task is introduced both to derive realistic signal-to-noise ratios for a recently recorded library of real-world environments and to evaluate the impact of the acoustic environments on speech-communication outcomes at an acoustic, linguistic, prosodic, and interactive level. Finally, the impact of reduced audibility on spatial hearing is discussed by comparing sentence recall ability in normal-hearing and hearing-impaired subjects as a function of both sensation level and frequency range using different spatial noise configurations.

Bonus HRC Seminar with Richard Stern Thursday, September 22nd

Richard Stern, Carnegie Mellon University

Title: Modeling Binaural Lateralization, Discrimination, and Detection: The Position-Variable Model at 40

Abstract: Psychoacousticians and physiologists have developed many models of the processes that mediate human binaural perception over the years. These models have differed greatly in the extent to which they are consistent with known attributes of mammalian physiology, in the specific phenomena that they attempt to address, and in the comprehensiveness of their predictions. One such model was the position-variable model developed by the present author more than 40 years ago. In its original formulation, the position-variable model attempted to describe and predict binaural lateralization, discrimination, and detection phenomena at 500 Hz based on the values of a decision variable related to subjective lateral position. This variable was obtained by computing the centroid along the internal-delay axis of the responses of a matrix of interaural coincidence-counting units formulated according to the hypothesis proposed by Jeffress in the 1940s and quantified by Colburn in the 1960s. The model was successful in describing the lateralization phenomena to which it was applied, mostly successful in describing the related discrimination phenomena, and unsuccessful at describing detection phenomena. (It was proposed at the time that binaural detection was mediated by changes in the absolute values of the outputs of the coincidence-counting units rather than changes in the measure of lateral position derived from them.) While work continued on the lateralization predictions of the position-variable model for some time after, there was very little continued attention paid to its discrimination and detection predictions. More importantly, the model had not been applied systematically to most of the binaural phenomena that have emerged since the late 1970s, including at higher frequencies, although some findings by Bernstein, Trahiotis, and others provide some basis for optimism that it could be successful for these data as well. More recently, interest in binaural modeling has been stimulated by a new theory proposed by McAlpine and colleagues concerning the putative distribution of the interaural delays associated with the coincidence-counting units (along with other aspects of binaural processing). The Binaural Model Challenge proposed by researchers at the University of Oldenburg and elsewhere has also provoked renewed interest in modeling binaural phenomena. This talk describes the current state of efforts to update and broaden the predictions of the position-variable model. We will review the principles of the model, and comment on the extent to which it is able to describe and predict a broader range of more recent binaural phenomena, based on the results of preliminary analyses.
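To make the centroid computation concrete, the toy sketch below treats a cross-correlation over candidate internal delays as a stand-in for the outputs of the coincidence-counting units and takes its centroid along the internal-delay axis as the "position variable." The centrality weighting, its Gaussian form, and all parameter values are assumptions made for illustration; they are not features of the actual model.

```python
import numpy as np

def lateral_position_centroid(left, right, fs, max_delay_ms=1.0, sigma_ms=0.2):
    """Toy illustration of the position-variable idea (not the actual model):
    centroid of weighted interaural coincidence activity along the
    internal-delay axis, returned in milliseconds."""
    max_lag = int(fs * max_delay_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)

    # Circular cross-correlation over the plausible range of internal delays,
    # used here as a proxy for coincidence counts (hence the non-negativity).
    cc = np.array([np.dot(left, np.roll(right, lag)) for lag in lags])
    cc = np.maximum(cc, 0.0)

    # Centrality weighting that favors small internal delays (assumed Gaussian).
    weights = np.exp(-0.5 * (lags / (sigma_ms * 1e-3 * fs)) ** 2)
    activity = cc * weights

    # Centroid along the internal-delay axis: the decision variable.
    return 1000.0 * np.sum(lags * activity) / (fs * np.sum(activity) + 1e-12)
```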

HRC Seminar with Richard Stern September 23rd

Richard Stern, Carnegie Mellon University

Title: Applying Physiologically-Motivated Models of Auditory Processing to Automatic Speech Recognition: Recent Progress

Abstract: The human auditory system has been an inspiration for developers of automatic speech recognition systems for many years because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. As a consequence, the use of auditory models for features for speech recognition has become a popular topic of research by groups at numerous locations including the University of Oldenburg. This talk will discuss ongoing work at Carnegie Mellon University directed toward the application of physiologically-motivated and psychophysically-motivated approaches to signal processing that facilitate robust automatic speech recognition in environments with additive noise and reverberation. We will review the structure of selected “classic” auditory models that were first proposed in the 1980s. In recent decades there has been a renaissance of interest in the use of physiologically-based signal processing for speech recognition and related technologies, largely enabled by advances in computational resources and acoustic modeling techniques. We will discuss the application of contemporary auditory-based signal processing approaches from our group to practical automatic speech recognition systems. We will consider pragmatic tradeoffs between faithfulness to actual auditory processing and considerations related to computational cost and system performance, focusing on aspects of nonlinear transduction, synchronization to temporal fine structure, temporal and spectral integration, temporal suppression, and signal separation based on binaural processing that we have found to be useful for improving recognition accuracy. We will evaluate and compare the impact on speech recognition accuracy of selected components of these features. We will discuss some of the general observations and results that have been obtained during the renaissance of activity in auditory-based features over the past 15 years. Finally, we will identify certain attributes of auditory processing that we believe to be generally helpful, and share insights that we have gleaned from recent work at Carnegie Mellon.

HRC Seminar with Sarah Woolley September 30th

Sarah Woolley, Columbia University

Title: Tuning auditory neurons for vocal communication

Abstract: Auditory-vocal communication requires coordinated function of sensory and motor systems that subserve perception and production of sounds that carry social information. Behavioral studies show that vocal learning during development shapes auditory perceptual skills for life. This suggests that social experience can direct the maturation of auditory processing in the central nervous system. I will present studies using songbirds to test the role of early vocal learning in auditory development. Using manipulations in the song acoustics that are learned by young birds, we are beginning to understand how early auditory-vocal experience relates to the tuning properties of adult auditory cortex neurons.
 

October 2016

HRC Seminar with DeLiang Wang October 7th

DeLiang Wang, Ohio State University

Title: Towards solving the cocktail party problem

Abstract: Speech separation, or the cocktail party problem, has evaded a solution for decades in speech and audio processing. Motivated by auditory perception, I have been advocating a new formulation of this old challenge that estimates an ideal time-frequency mask (binary or ratio). An important implication of this formulation is that the speech separation problem becomes open to modern machine learning techniques, and deep neural networks (DNNs) are well-suited for this task due to their representational capacity. I will describe recent algorithms that employ DNNs for supervised speech separation. DNN-based mask estimation elevates speech separation performance to a new level, and produces the first demonstration of substantial speech intelligibility improvements for both hearing-impaired and normal-hearing listeners in background noise. These advances represent major progress towards solving the cocktail party problem.
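As background for the mask-estimation formulation mentioned in the abstract, the two "ideal" training targets (binary and ratio masks) can be written down in a few lines when the clean speech and the noise are available separately, as they are during supervised training. This is a generic sketch; the STFT settings and the -5 dB local criterion are common choices in the literature, not values quoted from the talk.

```python
import numpy as np
from scipy.signal import stft

def ideal_masks(speech, noise, fs, nperseg=512, criterion_db=-5.0):
    """Compute two common training targets on a time-frequency grid:
    the ideal binary mask (IBM) and the ideal ratio mask (IRM)."""
    _, _, S = stft(speech, fs, nperseg=nperseg)
    _, _, N = stft(noise, fs, nperseg=nperseg)
    speech_pow, noise_pow = np.abs(S) ** 2, np.abs(N) ** 2

    # IBM: 1 where the local SNR exceeds a criterion (here -5 dB), else 0.
    local_snr_db = 10.0 * np.log10(speech_pow / np.maximum(noise_pow, 1e-12))
    ibm = (local_snr_db > criterion_db).astype(float)

    # IRM: soft mask in [0, 1] based on the speech-to-mixture energy ratio.
    irm = np.sqrt(speech_pow / (speech_pow + noise_pow + 1e-12))
    return ibm, irm
```

A DNN is then trained to predict such a mask from features of the noisy mixture alone, and the estimated mask is applied to the mixture's time-frequency representation before resynthesis.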

HRC Seminar with Fan-Gang Zeng October 14th

Fan-Gang Zeng, University of California, Irvine

Title: Challenging Neuroscientists and Neuroengineers to Break the Sound of Silence

Abstract: As an engineering marvel, the cochlear implant electrically stimulates the auditory nerve and has introduced or restored hearing to ~500,000 deaf children and adults worldwide. I will summarize the science and engineering behind this achievement and discuss specific and general bottlenecks (e.g., electronic-neural interface, closed-loop control, power consumption) challenging future development of cochlear implants and other neuromodulation devices. In particular, I will describe a personal journey developing a low-cost, high-performance cochlear implant and its socio-economic impact on people and society.

HRC Seminar with Oded Ghitza October 21st

Oded Ghitza, Boston University

Title: Acoustic-driven neuronal oscillations as syllabic and prosodic markers

Abstract: Oscillation-based models of speech perception postulate a cortical computation principle by which decoding is performed within a time-varying window structure, synchronized with the input on multiple time scales. The windows are generated by a segmentation process, implemented by a cascade of oscillators. Staying in sync with the quasi-regular rhythmicity of speech requires that the oscillators be “flexible” (in contrast to autonomous, “rigid” oscillators). Syllabic segmentation is into speech fragments that are multi-phone in duration, and it is realized by a flexible theta oscillator capable of tracking the input syllabic rhythm, with the theta cycles aligned with intervocalic speech fragments termed theta-syllables. Prosodic segmentation is into chunks that are multi-word in duration, and it is realized by a flexible delta oscillator capable of tracking phrase-level prosodic information, with the delta cycles aligned with the chunks. Intelligibility remains high as long as the oscillators are in sync with the input, and it sharply deteriorates once they are out of sync. This talk reviews a model that utilizes this cortical computation principle and is capable of explaining counterintuitive data on the intelligibility of time-compressed speech that are hard to explain with conventional models of speech perception.

HRC Seminar with Matt Goupell October 28th

Matt Goupell, University of Maryland

Title: Spatial hearing with bilateral cochlear implants and their interaural-level-difference dominance in sound localization

Abstract: A primary reason for having access to sound in both ears is to help localize and organize complex auditory scenes with multiple sound sources, which improves speech understanding in these situations. As the number of people who receive bilateral cochlear implants (CIs) increases, it is a goal to provide the best possible understanding of speech in noise. However, clinical sound processors are not designed to faithfully preserve binaural information. Since the neural processing of auditory signals by the binaural system is acutely sensitive to the smallest differences between the ears, this loss of fidelity limits how well bilateral CI users function in complex listening environments with multiple talkers, background noise, and reverberation. Furthermore, while listeners with normal hearing (NH) seem to rely primarily on interaural time differences (ITDs) for these tasks, bilateral CI users seem to rely on interaural level differences (ILDs). Currently, we know little about why bilateral CI users show ILD dominance in sound localization, how the ILDs are being computed and utilized (particularly across frequency), and how we might improve spatial hearing with bilateral CIs through altered/improved ILD and/or ITD encoding.

In this talk, we will discuss experiments that investigate what binaural advantages can be achieved with bilateral CIs for speech stimuli, as well as the barriers that limit those advantages. We will also discuss psychophysical results from simple electrical stimulation patterns that highlight that bilateral CI listeners are still highly sensitive to binaural cues, but complex stimulation patterns eventually limit sensitivity. In addition, we will discuss work on developing a model for processing across-channel ILD cues. Finally, we will discuss how to translate this knowledge and model so that we might overcome spatial-hearing barriers for CI users, either through new devices and speech-processing strategies, or through alternative approaches to clinical device settings.

 

November 2016

HRC Seminar with Pavel Zahorik November 11th

Pavel Zahorik, University of Louisville

Title: Binaural listening situations in which perceived reverberation and speech intelligibility are unrelated

Abstract: It is well known that reverberation can affect the intelligibility of speech. Psychophysical and computational results have demonstrated that the relationship is inverse: an increase in the amount of reverberation results in a decrease in intelligibility. From the architectural acoustics literature, it is also well known that there is a direct relationship between the physical amount of reverberation and perceived reverberation. It therefore might be assumed that perceived reverberation and intelligibility are inversely related, although here situations are described in which the two are effectively unrelated. Using virtual auditory space techniques to simulate reverberant sound field listening, it is shown that when reverberant sound level is artificially decreased in one ear and naturally preserved in the other ear, perceived reverberation is unaffected, but speech intelligibility is markedly improved. This dissociation likely results from the differential monaural and binaural aspects of reverberation, and is consistent with the idea that perceived reverberation is multidimensional. These results also suggest a potential binaural approach to improving speech intelligibility in reverberation that does not limit the positive sound-quality benefits of reverberation.

HRC Seminar with Mathias Dietz November 17th

Mathias Dietz, Western University

Title: Factors limiting spatial hearing performance with cochlear implants and first steps to reduce the shortcomings

Abstract: In an increasing number of countries, the standard treatment for deaf individuals is moving toward the implantation of two cochlear implants, in an effort to increase their speech intelligibility and spatial awareness. Experimental studies over the past 20 years have demonstrated that after careful matching and balancing of left and right stimulation in controlled laboratory settings, most users can detect differences between the left and right inputs, which form the basis for spatial hearing. With their own processors in typical listening conditions, however, many factors work against optimal exploitation of these interaural differences. Today’s device technology and fitting procedures treat the two implants as if they served two independent ears and brains. Experimental and computational work on sound localization with bilateral cochlear implants will be presented to better understand how acoustics, CI signal processing, the electrode-nerve interface, and neural mechanisms limit performance. In the last part, a signal-processing concept designed to overcome some of these limitations will be presented, alongside data from an ongoing study.

 

December 2016

HRC Seminar with Monty Escabi December 9th

Monty Escabi, University of Connecticut

Title: From neurons to machines: neural codes and models for sound recognition

Abstract: The brain’s ability to identify and recognize sounds in the presence of background noise and acoustic uncertainty is critical for everyday audition and vital for survival. How the brain deals with and exploits acoustic noise and uncertainty in sound recognition tasks, however, is poorly understood. In the first part of this talk I will explore the hypothesis that the auditory pathway is organized hierarchically into sequential transformations that are critical for sound recognition in the presence of background noise. Using neural recordings from the auditory nerve, midbrain, thalamus, and cortex of cats and an auditory network model of spiking neurons trained for optimal speech recognition in noise and with multiple talkers, I will explore the computational strategies that enable noise robustness. I will demonstrate that the optimal computational strategy for speech recognition in noise predicts basic transformations performed by the ascending auditory pathway, including a sequential loss of temporal and spectral resolution and increasing sparseness and selectivity. Next, I will explore the hypothesis that the brain utilizes high-level statistical regularities in natural sounds to perform sound category identification tasks. Using a catalogue of natural and man-made sounds (e.g., water sounds, machine sounds, bird song, crowd noise, etc.), texture synthesis procedures to manipulate sound statistics from various sound categories, and neural recordings from the auditory midbrain of awake rabbits, I will show that neural population response statistics (i.e., neuron-to-neuron correlations and modulation response statistics) can be used to identify discrete sound categories. A computational model and a large database of sounds are then used to show that related high-order sound statistics enable identification of sound categories at time scales comparable to perceptual integration.

HRC Seminar with Andrea Hasenstaub December 16th

Andrea Hasenstaub, University of California, San Francisco

Title: Circuit mechanisms of response flexibility in mouse auditory cortex

Abstract: Numerous experimental paradigms, such as forward suppression, critical-band measurements, and stimulus-specific adaptation, have revealed context-dependent computations at multiple stages of the auditory pathway. Such context-dependent phenomena are more pronounced in the auditory cortex than at subcortical stages, suggesting cortical involvement in their generation. What circuit mechanisms underlie these flexible representations? In order to disambiguate the contributions of synaptic depression, synaptic inhibition, and intrinsic adaptation, we manipulate the two main families of cortical interneurons – somatostatin-positive (Sst+) or parvalbumin-positive (Pvalb+) interneurons – while recording neural responses to tones or tone sequences in the auditory cortex of awake mice. We show that cortical responses to a two-tone forward suppression paradigm in awake mice are more diverse than previously appreciated and vary with cell type. We further show that inactivation of Sst+ interneurons increases response gain and reduces the bandwidth of forward suppression, while inactivation of Pvalb+ interneurons weakens tuning and decreases information transfer. As an important caution, and contrary to the common assumption that tonic activation and inactivation are both manipulations that will produce valid, internally consistent insights into interneurons’ computational roles, we show that activating Sst+ and Pvalb+ interneurons reveals no such functional differences. We use a simple feedforward model to understand this asymmetry, and find that relatively small changes in key parameters, such as baseline activity, neural thresholds, or the strength of the light manipulation, determine whether activation and inactivation will produce equivalent conclusions regarding interneurons’ computational functions. This implies that seemingly minor experimental details can qualitatively change the readout of a neural population’s role in computation, and that the conclusions optogenetics enables us to draw regarding neuronal function can be influenced, even distorted, by the precise way in which the neuronal populations are manipulated. Finally, we combine analysis of variability with a simple computational model to link the changes in forward suppression to changes in synaptic inhibition.