2010 Seminars

January 2010

Seminar with Dr. Mitch Day January 15th

Title: Internal Delay and Neural Coding in the Gerbil Medial Superior Olive

Abstract: Humans and other animals precisely localize sound sources along the horizontal plane by detecting the split-second difference in the arrival time of sound at the two ears – the interaural time difference (ITD). Along the auditory neural pathway, sensitivity to ITDs first occurs in the medial superior olive (MSO), whose neurons are tuned to different peak ITDs. The placement of a peak ITD is determined by the difference in the time it takes auditory information to reach an MSO neuron from each side. This internal delay is thought to be created neurally by axonal conduction delay lines and is likely influenced by inhibition. An alternative delay mechanism proposes that MSO neurons receive bilateral inputs that originate from slightly different frequency-tuned locations along the cochlear basilar membranes. Using single-unit data recorded from the gerbil MSO, we show an absence of a topographic map of ITD – as predicted from delay lines – and the first evidence for small, random cochlear internal delays at the MSO. The resulting population distribution of peak ITDs optimized neither a place code nor a population slope code of ITD with respect to sound localization. Our results suggest that the microsecond precision of ITD discrimination may be explained by relatively imprecise mechanisms of internal delay.

Seminar with Prof. George Pollak and Dr. Josh Gittelman January 22nd

Title: Beyond Tonotopy: FM Sweep Discrimination in the Auditory Midbrain

Abstract: Auditory neurons in the inferior colliculus (IC) show remarkable selectivity in that they can distinguish between complex sounds that have identical spectral energy but different temporal structure, such as frequency modulations (FMs) that sweep either upward or downward. Extracellular recordings show that blocking inhibition locally reduces or eliminates response selectivity, suggesting that selectivity is created de novo in the IC, with inhibition playing a prominent role. However, these studies can only infer underlying mechanisms from spike counts. Using in-vivo whole-cell recordings, we examine the mechanisms underlying FM directional selectivity in the IC. We first report that spike threshold can strongly amplify directional selectivity, in that spike directionality was on average more than twice as large as the directionality of the post-synaptic potentials (PSPs). We then show that in our sample of IC cells, PSP directional selectivity is not created de novo. Rather, we found that the preferred and null FMs evoked synaptic conductances of different magnitudes, indicating that the pre-synaptic neurons were themselves directionally selective. Combining conductance data with modeling, we show that directionally dependent magnitude differences, not temporal differences, underlie PSP directionality. Modeling also shows that our results are consistent with extracellular studies in which blocking inhibition reduces or eliminates directionality. Our findings suggest that some IC cells utilize a rate code rather than a time code in their inputs, and that highly selective discharge properties can be created by only minor adjustments in the synaptic strengths evoked by different signals.


February 2010

Seminar with Oded Ghitza February 12th

Title: TEMPO: A model of speech perception inspired by brain rhythmicity

Abstract: A computational model is presented in which the process of decoding speech is temporally guided by an array of hierarchical oscillators operating in the delta (<3 Hz), theta (5–10 Hz), beta (20–40 Hz) and gamma (60–120 Hz) ranges. The model is capable of emulating the psychophysical data described by Ghitza and Greenberg (2009) — data that are difficult to explain with current models of speech perception. Since the data are in terms of word errors, a comprehensive, quantitative validation would require a TEMPO-based word recognition system, which is not available at present. In this talk a qualitative corroboration will be presented, showing a projected performance of TEMPO in line with the human data.

(Ghitza, O. and Greenberg, S. (2009). “On the possible role of brain rhythms in speech perception: Intelligibility of time compressed speech with periodic and aperiodic insertions of silence.” Phonetica 66:113–126. doi:10.1159/000208934)

Seminar with Prof. Yale E. Cohen February 19th

Title: Vocalization processing: categories and decisions


April 2010

Seminar with Yoonseob Lim April 2nd

Title: Contour representation of sound signals

Abstract: Continuous edges, or contours, are powerful features for object recognition, both in neural and machine vision. Similarly, auditory signals are characterized by sharp edges in some classes of time-frequency analysis. Linking these edges to form contours could be relevant for auditory signal processing. However, the mathematical foundations of a general contour representation of sound have not been established. Sinusoidal representations of voiced speech and music have been explored, but these approaches do not represent broadband signals efficiently. Here we construct a two-dimensional contour representation that is generally applicable to any time series, including sound. Starting with the Short-Time Fourier Transform (STFT), the method defines edges by coherent phase structure at local points in the time-frequency plane (zero crossings of a complex reassignment matrix). Continuous edges are grouped to form contours that follow the ridges and valleys of the traditional STFT. Local amplitudes are assigned by calculation of fixed points in an iterated reassignment mapping. The representation is additive; the complex amplitudes of the contours can be directly summed to reproduce the original signal. This re-synthesis matches the original signal with a signal-to-noise ratio of 20 dB or much higher, even in the challenging case of white noise. In practice, this level of precision provides perceptually equivalent representations of speech and music. For many sounds of interest, a subset of the full contour collection can provide an accurate representation. To find this compact subset, an over-complete set of contours is calculated using multiple filter bandwidths. Contours are then ranked by power, length and curvature, and subjected to lateral inhibition from neighboring contours. The top-ranking contours in this distribution provide a sparse representation that emerges without any prior suppositions about the nature of the original signal. By combining contours from multiple bandwidths, the representation achieves high precision in both time and frequency. As such, the method is relevant to a wide range of time-frequency tasks such as constructing receptive fields of auditory neurons, characterizing animal vocalizations, pattern recognition and signal de-noising. We speculate that neural auditory processing involves a similar contour representation. Each stage in the analysis is a plausible operation for neurons: parallel and redundant primary processing in multiple bandwidths, grouping by phase coherence, linking by continuity and lateral inhibition.
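To make the pipeline more concrete, the following heavily simplified Python sketch illustrates the general idea only; it is not the authors' implementation. Ridges are approximated here as local magnitude maxima along frequency (rather than zero crossings of the complex reassignment matrix), and the amplitude re-estimation and re-synthesis stages are omitted.

import numpy as np
from scipy.signal import stft

def extract_contours(x, fs, nperseg=512, max_jump=2, min_len=5):
    """Return a list of contours, each a list of (frame, freq_bin) points."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)

    # Ridge points: bins that are local maxima along the frequency axis
    # (a crude stand-in for the phase-coherence criterion used in the talk).
    ridge = (mag[1:-1, :] > mag[:-2, :]) & (mag[1:-1, :] > mag[2:, :])
    ridge = np.pad(ridge, ((1, 1), (0, 0)))

    contours, active = [], {}          # active: freq_bin -> contour index
    for frame in range(mag.shape[1]):
        bins_here = np.flatnonzero(ridge[:, frame])
        new_active = {}
        for b in bins_here:
            # Continue a contour from the previous frame if a ridge bin lies
            # within max_jump frequency bins; otherwise start a new contour.
            prev = [k for k in active if abs(k - b) <= max_jump]
            if prev:
                idx = active[min(prev, key=lambda k: abs(k - b))]
            else:
                idx = len(contours)
                contours.append([])
            contours[idx].append((frame, b))
            new_active[b] = idx
        active = new_active
    return [c for c in contours if len(c) >= min_len]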

Seminar with Dr. Tobias Overath April 7th

Title: Representation of statistical sound properties in human auditory cortex.

Seminar with Dr. Nace Golding April 9th

Title: Cellular mechanisms underlying binaural coincidence detection in the medial superior olive.

Seminar with Dr. David Griesinger April 16th

Title: The Relationship between Audience Engagement and Our Ability to Perceive the Pitch, Timbre, Azimuth and Envelopment of Multiple Sources

Seminar with Prof. Mark Chertoff April 23rd


May 2010

Seminar with Dr. Chris Halpin May 7th

Dr. Chris Halpin, Massachusetts Eye & Ear Infirmary, Harvard Medical School

Seminar with Dr. Greg Wakefield May 14th

Title: Singing Stradivarius: Aural Shaping of Oral Shapes and Other Enabling Technologies for Singers

Abstract: As they learn to perform their instrument, singers must also learn to build and care for the voice by engaging otherwise unfamiliar and often awkward coordinated patterns of muscular activity, many of which involve fine motor skill and almost all of which are hidden from external view. Accordingly, the training of singers is a challenging process in which both the teacher and student encounter pedagogical roadblocks not typically found when learning other instruments. Teachers have difficulty providing nuanced instruction in the non-obvious physical mechanisms of vocal production, while students have trouble retaining those successful physical adjustments made during their lessons. These roadblocks confuse the engineer and voice scientist as well, which has typically led to poor adoption of otherwise highly appropriate enabling technologies by either the student or teacher. The present talk reviews the author’s musical, scientific, and engineering discoveries of his own voice over the past 10 years and demonstrates how the lessons he has learned have informed his development of enabling technologies for students and teachers of voice.

Bio: Gregory H. Wakefield is a member of the faculty at the University of Michigan with a primary appointment in Electrical Engineering and Computer Science, and courtesy appointments in Performing Arts Technology and Otolaryngology. His current research focuses on music and audio processing, including the development of interactive systems that allow human users the freedom to design within the world of engineering without having to speak the technical language of the engineer. He holds doctorates in Electrical Engineering and Psychology from the University of Minnesota and was awarded the IEEE Millennium Medal in 2000. He has had principal roles in world-premiere performances of three experimental operas and is an active singer in the Comic Opera Guild’s revival of the early works of Jerome Kern and late works of Victor Herbert. He is proud to say that he gave his senior undergraduate recital at the University of Michigan in 2006 and continues his voice studies with his teacher and colleague, Prof. George Shirley.

Seminar with Dr. Keren Haroush May 21st

Dr. Keren Haroush

Seminar with Dr. Beverly Wright May 28th

Dr. Beverly Wright, Department of Communication Sciences and Disorders Northwestern University


July 2010

Seminar with Dr. Oded Ghitza July 23rd

Title: Speech Perception and Brain Rhythms

Abstract: The basic premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. It is argued that nested neuronal oscillations in the theta, beta and gamma frequency bands are crucial for speech intelligibility and that intelligibility is high so long as they remain phase-locked to the auditory input rhythm. A model (TEMPO) is presented that is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of syllabic rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech time-compressed by a factor of 3 (i.e., at a high syllabic rate) is poor (ca. 50% word error rate), but is substantially restored when silent gaps are inserted between successive 40-ms-long compressed-signal intervals – a challenging finding, difficult to explain with current models of speech perception, but one that emerges naturally from the TEMPO architecture.
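The stimulus manipulation described above is simple to reproduce. The short Python sketch below is an assumption about the stimulus construction based only on the abstract's wording (not the original experiment code): it cuts an already time-compressed speech waveform into successive 40-ms segments and inserts a silent gap after each one, restoring a slower packaging rate without restoring the discarded acoustic information.

import numpy as np

def insert_silent_gaps(compressed, fs, segment_ms=40, gap_ms=80):
    """Split a time-compressed waveform into segment_ms pieces and
    append gap_ms of silence after each piece."""
    seg = int(fs * segment_ms / 1000)
    gap = np.zeros(int(fs * gap_ms / 1000))
    pieces = []
    for start in range(0, len(compressed), seg):
        pieces.append(compressed[start:start + seg])
        pieces.append(gap)
    return np.concatenate(pieces)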

August 2010

Seminar with Dr. Cyrus Billimoria August 27th

Title: Neural and behavioral discrimination of natural auditory objects

Abstract: The auditory system of complex organisms is faced with a general problem that all sensory systems must cope with: the problem of perceptual invariance. Perceptual invariance is the ability to discriminate, recognize and categorize objects despite natural variations in stimulus parameters. The auditory system must deal with many such variations, such as intensity, timing, speed, and the presence of background noise. This is a particularly challenging problem for central neural processing, as the sensory periphery is sensitive to such variations. Humans and other animals quickly, robustly, and apparently effortlessly perform tasks requiring perceptual invariance, which remain difficult for current artificial systems (such as speech recognition). In this talk I will discuss several projects related to this problem in the model system of songbirds, which shows striking similarities to humans in the context of speech. In our laboratory we have studied the auditory system of songbirds with a broad range of experimental techniques, from single-unit neural recordings to behavioral experiments, with a focus on processing in a brain area known as field L that is analogous to primary auditory cortex. In this talk I will describe several recent findings: 1) field L neurons span a continuum of invariance to intensity, from high sensitivity to near-perfect invariance; 2) neural responses in field L are able to follow stimuli over a broad range of time-warps (stimulus speed); and 3) preliminary data on surprising interactions between the spatial location of targets and background maskers in field L. Finally, I’ll show neural recordings obtained using a new wireless recording technique from awake, unrestrained songbirds.


September 2010

Seminar with Dr. Alexey Lukin September 17th

Dr. Alexey Lukin, iZotope Inc.; Image Processing Lab, Lomonosov Moscow State University

Title: Multiresolution Short-Time Fourier Transform for Analysis and Processing of Audio

Abstract: Filter banks with fixed time-frequency resolution, such as the Short-Time Fourier Transform (STFT), are a common tool for many audio analysis and processing applications, allowing efficient implementation via the Fast Fourier Transform (FFT). The fixed time-frequency resolution of the STFT can lead to undesirable smearing of events in both time and frequency. In this talk, we suggest adaptively varying the STFT window size (and resolution) in order to reduce filter-bank-specific artifacts while retaining adequate frequency resolution. Several strategies for systematic adaptation of time-frequency resolution are proposed. The introduced approach is demonstrated as applied to spectrogram displays, noise reduction, and spectral effects processing.
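As a rough illustration of the adaptive-resolution idea (a minimal sketch under assumptions, not Dr. Lukin's implementation), one can analyze the signal at several window sizes and choose a resolution per frame from a simple transience measure such as spectral flux, so that transients get short windows and steady tones get long ones.

import numpy as np
from scipy.signal import stft

def multires_choice(x, fs, sizes=(256, 1024, 4096), hop=256):
    """For each hop-sized frame, return the chosen STFT window size."""
    # Spectral flux from the shortest-window analysis marks transients.
    _, _, Z = stft(x, fs=fs, nperseg=sizes[0], noverlap=sizes[0] - hop)
    mag = np.abs(Z)
    flux = np.r_[0, np.sum(np.maximum(np.diff(mag, axis=1), 0), axis=0)]
    flux /= flux.max() + 1e-12

    # High flux -> short window (good time resolution);
    # low flux -> long window (good frequency resolution).
    edges = np.linspace(1, 0, len(sizes) + 1)[1:-1]   # decreasing thresholds
    choice = np.digitize(flux, edges)                  # 0 .. len(sizes) - 1
    return np.array(sizes)[choice]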

Dr. Lukin’s presentation slides: http://audio.rightmark.org/lukin/temp/exchange/LukinTalk.ppt

Seminar with Dr. Stephen David September 24th

Stephen David, Ph.D., Neural Systems Laboratory, University of Maryland

Title: Behavior, reward and the routing of auditory information through cortex

Abstract: A better understanding of auditory brain systems can be used to engineer improved algorithms for speech and other sound processing. We hypothesize two factors that limit current models of auditory processing by the brain: (1) Feed-forward mechanisms are specialized for complex natural sounds rather than the synthetic stimuli used to develop most models and (2) Learned behavioral contingencies have a strong influence over sound representations. We completed neurophysiological studies to investigate both issues. In the first study, we compared spectro-temporal tuning of neurons in ferret primary auditory cortex (A1) measured with a synthetic noise stimulus and with continuous speech. For many neurons, we found significant differences in tuning that could be modeled by a rapid nonlinear depression of synaptic inputs. Dynamic changes in depression are larger during speech stimulation, enabling a richer representation of sound features. In the second study, we compared spectro-temporal tuning of A1 neurons during two different operant tasks that required discriminating between a sequence of broadband noise distracters and a pure tone target. The first task used conditioned avoidance, and the second task used positive reinforcement. During conditioned avoidance, A1 neurons showed enhanced responses to the target frequency. During positive reinforcement, in contrast, A1 neurons showed an opposite pattern of modulation, with a decreased response to the target frequency. Recordings of single neurons and local field potentials in prefrontal cortex (PFC) suggest a gating mechanism by which task-relevant information is propagated from A1 to PFC. These effects are consistent with a matched filter model in which top-down feedback from PFC to A1 enhances responses to stimuli that signal the suppression of basal appetitive behaviors.
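For readers unfamiliar with how spectro-temporal tuning is typically quantified, the sketch below shows a generic ridge-regression STRF fit. This is a common textbook approach offered for orientation only; it is not necessarily the speaker's method and omits the synaptic-depression extension mentioned in the abstract.

import numpy as np

def fit_strf(spectrogram, rate, n_delays=20, ridge=1.0):
    """spectrogram: (n_freq, n_time); rate: (n_time,) firing rate.
    Returns an STRF of shape (n_freq, n_delays)."""
    n_freq, n_time = spectrogram.shape
    # Design matrix of time-lagged spectrogram slices.
    X = np.zeros((n_time, n_freq * n_delays))
    for d in range(n_delays):
        X[d:, d * n_freq:(d + 1) * n_freq] = spectrogram[:, :n_time - d].T
    # Ridge-regularized least squares.
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ rate)
    return w.reshape(n_delays, n_freq).T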

Bio: Stephen David is a postdoctoral fellow with Shihab Shamma in the Institute for Systems Research at the University of Maryland, College Park. He completed an A.B. in Applied Mathematics at Harvard University and a Ph.D. in Bioengineering at the University of California, Berkeley, where he studied attention and representation in the primate visual cortex. More information is available at http://www.ece.umd.edu/~svd/.

October 2010

Seminar with Dr. Jennifer Bizley October 1st

Jennifer Bizley, Ph.D., Royal Society Senior Research Fellow, Department of Physiology, Anatomy and Genetics, University of Oxford

Title: Listening to auditory cortex; searching for neural correlates of complex sound perception

Abstract: We are able to recognize and understand speech across many different speakers, voice pitches and listening conditions. However, the acoustic waveform of a sound (for example, the vowel “ae”) will vary considerably depending on the individual speaker. Moreover, the ear itself will filter the sound in a location-dependent fashion, and the “ae” may be embedded in a cacophony of other, background sounds in our often cluttered acoustic environments. Because we can perceive the pitch, timbre and spatial location of a sound source independently, it seems natural to suppose that cortical processing of sounds might separate out these attributes. However, recordings made in primary and secondary cortical areas of the ferret suggest that neural encoding of pitch, timbre and location is highly interdependent. Moreover, sensitivity to these sound percepts was distributed throughout the cortical fields examined. In order to investigate whether these distributed responses might underlie pitch perception, we compared the performance of ferrets trained in a pitch discrimination task to the pitch discrimination abilities of auditory cortical neurons. To achieve a more robust decoding of the neural responses, we developed a population neurometric analysis, with which we decoded the activity of ensembles of simultaneously recorded units. We found several parameters of the ensemble response to be informative; both spike count vectors and relative response latency vectors encoded stimulus pitch just as effectively. Our results thus suggest that count- or latency-based population codes could equally well account for the animal’s pitch discrimination ability. Neural populations capable of discriminating pitch as well as the animal did could be found throughout all five areas investigated. While these studies show that any of the five areas of ferret cortex could support the animal’s pitch judgment, further work is required to ascertain which, if any, of these fields make an essential contribution to pitch perception. To address this question we are currently recording local field potentials and single neuron spiking activity from the auditory cortex of freely moving ferrets while they discriminate the pitch of artificial vowel sounds.
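A minimal sketch of a population neurometric decoder in this spirit (illustrative only, not the authors' analysis code) treats each trial as a vector of spike counts, or first-spike latencies, across simultaneously recorded units and assigns a held-out trial to the stimulus with the nearest mean population vector.

import numpy as np

def decode_loo(trials, labels):
    """Leave-one-out nearest-centroid decoding.
    trials: (n_trials, n_units) spike-count or latency vectors
    labels: (n_trials,) stimulus identity (e.g. pitch) per trial
    Returns the fraction of correctly decoded trials."""
    trials, labels = np.asarray(trials, float), np.asarray(labels)
    correct = 0
    for i in range(len(trials)):
        keep = np.arange(len(trials)) != i
        classes = np.unique(labels[keep])
        # Mean population vector (centroid) for each stimulus class.
        centroids = np.array([trials[keep][labels[keep] == c].mean(axis=0)
                              for c in classes])
        dists = np.linalg.norm(centroids - trials[i], axis=1)
        correct += classes[np.argmin(dists)] == labels[i]
    return correct / len(trials)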

Seminar with Dr. Laurel Carney October 22nd

Professor Laurel Carney, Department of Biomedical Engineering, University of Rochester

Title: Detection of Amplitude-Modulations in Sounds: Behavioral and Physiological Studies

Abstract: Fluctuations in the amplitudes of sounds are a rich form of temporal information. We are interested in understanding how amplitude modulations are encoded, processed, and perceived. Our approach includes psychophysical studies in human listeners and behavioral and physiological studies in the rabbit. Differences in detection thresholds between these species raise some interesting issues about strategies and neural mechanisms that are involved in encoding and processing this aspect of complex sounds.

Seminar with Dr. Charlotte Reed and Dr. Joseph G. Desloge October 29th

Dr. Charlotte Reed, Senior Research Scientist, Research Laboratory of Electronics, MIT
Dr. Joseph G. Desloge, Research Scientist, Sensimetrics Corporation

Title: Hearing-Loss Simulation as a Tool for Understanding the Role of Audibility in Hearing Impairment

Abstract: Comparisons of performance between normal-hearing (NH) and hearing-impaired (HI) listeners are intrinsically complicated by the difference in absolute thresholds between the two groups. One approach towards making more valid comparisons between NH and HI listeners is through the use of functional simulations of hearing loss in which stimuli are equated for both sound-pressure level and sensation level in both groups of listeners. This talk will describe research being conducted using hearing-loss simulation to examine the role of threshold elevation and audibility in the speech-reception and psychoacoustic abilities of HI listeners. The hearing-loss simulation paradigm employs a combination of spectrally-shaped masking noise and multi-band expansion which is applied to stimuli presented to NH listeners that are matched in age to their hearing-impaired counterparts (AM-SIM). This talk will include a description of the methods used for hearing loss simulation and of results obtained from speech-reception and psychophysical studies comparing the performance of HI and AM-SIM subjects.
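For illustration, the sketch below generates one ingredient of such a simulation: spectrally shaped masking noise whose levels follow a target audiogram, so that normal-hearing listeners hear stimuli through elevated effective thresholds. This is an assumption-laden simplification of the approach described above (the audiogram values are hypothetical and the multi-band expansion stage is omitted), not the authors' AM-SIM code.

import numpy as np

def shaped_masking_noise(duration_s, fs, audiogram_freqs_hz, audiogram_db):
    """White noise re-weighted in the frequency domain so its spectrum
    follows an audiogram-shaped curve (levels in dB, arbitrary reference)."""
    n = int(duration_s * fs)
    noise = np.random.randn(n)
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    # Interpolate the audiogram (dB) onto the FFT bins and convert to gain.
    gain_db = np.interp(freqs, audiogram_freqs_hz, audiogram_db)
    spec *= 10 ** (gain_db / 20)
    return np.fft.irfft(spec, n)

# Example with an illustrative sloping high-frequency loss shape.
fs = 16000
noise = shaped_masking_noise(1.0, fs,
                             audiogram_freqs_hz=[250, 1000, 2000, 4000, 8000],
                             audiogram_db=[10, 15, 30, 50, 60])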


November 2010

Seminar with Dr. Christopher Clark November 5th

Christopher Clark, Ph.D., Bioacoustics Research Program, Cornell Lab of Ornithology, Cornell University

Title: Voices of the Great Whales, Drowning in a Sea of Noise

Abstract: Whales produce and listen to sounds for mating, feeding, navigating, and detecting predators. The US Navy’s anti-submarine listening system has revealed the immense scales over which singing and calling whales can be detected and tracked. Over the last 60 years, the level of low-frequency ocean noise has been steadily rising as a result of commercial shipping, and energy exploration and operational activities. Such activities can have both acute, short-term impacts and chronic, long-term influences on individuals and on populations. One of the most pernicious, yet nearly invisible influences of this man-made noise is acoustic masking that interferes with communication. Ocean acoustic monitoring systems that sample the ocean for periods of months to years are used to map, quantify, and describe the variability of marine acoustic habitats. Results reveal that in some habitats with high levels of vessel traffic and vessel noise, the predicted area over which whales can communicate is routinely reduced to < 10-20% of what it would be under normally quiet conditions. When considered from a large-scale and behavioral ecology perspective, reduction in acoustic habitat likely represents a significant cost for species in which acoustic communication is biologically critical. Government agencies, ocean users, industries, NGOs and scientists now recognize that ocean noise is one in a suite of ocean stressors, and they are working together to find ways to make a difference. It will take a village, and we’re all part of that village.

(Voices of the Great Whales, Drowning in a Sea of Noise. Clark, C.W. Bioacoustics Research Program, Cornell University, 159 Sapsucker Woods Road, Ithaca, New York 14850, USA. Presenter email: cwc2@cornell.edu; 607-254-2408)

Seminar with Dr. Richard Rabbitt November 10th

Dr. Richard Rabbitt, Department of Bioengineering, University of Utah; Marine Biological Laboratory, Woods Hole

Title: Pulsed Infrared Excitability of Inner-Ear Hair Cells and Cardiac Myocytes

Abstract: Pulsed infrared radiation (IR; 1862 nm, 0.2-100 pulses per sec., 10-400 µJ/pulse) evokes transient increases in intracellular [Ca2+] and excitation in neonatal rat myocytes in vitro, and in semicircular canal hair cells in vivo. Myocytes loaded with Fluo-4 AM showed robust intracellular [Ca2+] spikes evoked by each IR pulse. CGP-37157, an inhibitor of the mitochondrial Na+/Ca2+ exchanger (NCX), inhibited myocyte responses to IR. Ruthenium Red, an inhibitor of the mitochondrial Ca2+ uniporter, and 2-APB, an IP3 channel antagonist, also blocked [Ca2+] responses in myocytes. Ryanodine did not block IR-evoked responses. These pharmacological results implicate mitochondria as the most likely source of IR-evoked calcium cycling. Preliminary [Ca2+] imaging and post-synaptic afferent responses in the semicircular canals of the oyster toadfish, O. tau, strongly suggest that a similar mechanism of pulsed IR excitability is at play in hair cells. In a subset of afferent neurons tested, IR stimulation of presynaptic hair cells evoked a phase-locked action potential for each IR pulse. The latency from the IR pulse to the action potential exceeded synaptic delay by ~7 ms, consistent with an intracellular calcium signaling mechanism within hair cells. Responses in vivo were repeatable over numerous stimulus presentations and hours of recording. IR photocontrol of intracellular calcium through an endogenous mechanism(s) may have general implications for applications in basic science, neural prostheses, and therapeutic interventions.

Authors: Suhrud M. Rajguru1, Greg M. Dittami1, Richard D. Rabbitt2,3, Claus-Peter Richter1,4, Stephen M. Highstein3

1Dept. of Otolaryngology, Northwestern University, Chicago, IL,
2Dept. of Bioengineering, University of Utah, Salt Lake City, UT
3Marine Biological Laboratory, Woods Hole, MA
4Dept. of Communication Sciences and Disorders, Northwestern University, Evanston, IL

Seminar with Dr. David McAlpine November 11th

Dr. David McAlpine; UCL Ear Institute

Title: Biophysical limits to binaural temporal coding

Seminar with Dr. Mark A. Parker November 19th

Dr. Mark A. Parker, Department of Communication Sciences & Disorders, Emerson College; Research Associate, Eaton-Peabody Laboratories, MEEI

Title: Genetic Engineering in the Treatment of Hearing Loss: Novel Approaches to Hair Cell Regeneration

Abstract: The overarching aim of this research is to investigate mechanisms by which cochlear supporting cells differentiate into hair cells in mammals. A growing body of evidence suggests that the supporting cells of the organ of Corti maintain the potential to develop hair cell characteristics, including cilia formation, myosin 7a labeling, and proper hair cell function. These studies are based on observations that cochlear supporting cells that over-express the pro-hair-cell gene Atoh1 will develop into hair cells. My current work investigates the creation of novel genetic constructs engineered to control the expression of Atoh1 in specific supporting cell populations. This talk will review this work and provide a background for the utility of genetic engineering as it applies to the treatment of hearing loss.


December 2010

Seminar with Dr. Jayaganesh Swaminathan December 17th

Dr. Jayaganesh Swaminathan; Research Laboratory of Electronics, MIT

Title: Interactions between neural coding of temporal fine structure and envelope are critical for the perception of noise-degraded speech