23935931	Auditory sensory modulation difficulties and problems with automatic re-orienting to sound are well documented in autism spectrum disorders (ASD). Abnormal preattentive arousal processes may contribute to these deficits. In this study, we investigated components of the cortical auditory evoked potential (CAEP) reflecting preattentive arousal in children with ASD and typically developing (TD) children aged 3-8 years. Pairs of clicks ('S1' and 'S2') separated by a 1 sec S1-S2 interstimulus interval (ISI) and much longer (8-10 sec) S1-S1 ISIs were presented monaurally to either the left or right ear. In TD children, the P50, P100 and N1c CAEP components were strongly influenced by temporal novelty of clicks and were much greater in response to the S1 than the S2 click. Irrespective of the stimulation side, the 'tangential' P100 component was rightward lateralized in TD children, whereas the 'radial' N1c component had higher amplitude contralaterally to the stimulated ear. Compared to the TD children, children with ASD demonstrated 1) reduced amplitude of the P100 component under the condition of temporal novelty (S1) and 2) an attenuated P100 repetition suppression effect. The abnormalities were lateralized and depended on the presentation side. They were evident in the case of the left but not the right ear stimulation. The P100 abnormalities in ASD correlated with the degree of developmental delay and with the severity of auditory sensory modulation difficulties observed in early life. The results suggest that some rightward-lateralized brain networks that are crucially important for arousal and attention re-orienting are compromised in children with ASD and that this deficit contributes to sensory modulation difficulties and possibly even other behavioral deficits in ASD.	\N	\N
21823798	In older adults, difficulties processing complex auditory scenes, such as speech comprehension in noisy environments, might be due to a specific impairment of temporal processing at early, automatic processing stages involving auditory sensory memory (ASM). Even though age effects on auditory temporal processing have been well-documented, there is a paucity of research on how ASM processing of more complex tone-patterns is altered by age. In the current study, age effects on ASM processing of temporal and frequency aspects of two-tone patterns were investigated using a passive listening protocol. The P1 component, the mismatch negativity (MMN) and the P3a component of event-related brain potentials (ERPs) to tone frequency and temporal pattern deviants were recorded in younger and older adults as a measure of auditory event detection, ASM processing, and attention switching, respectively. MMN was elicited with smaller amplitude to both frequency and temporal deviants in older adults. Furthermore, P3a was elicited only in the younger adults. In conclusion, the smaller MMN amplitude indicates that automatic processing of both frequency and temporal aspects of two-tone patterns is impaired in older adults. The failure to initiate an attention switch, suggested by the absence of P3a, indicates that impaired ASM processing of patterns may lead to less distractibility in older adults. Our results suggest age-related changes in ASM processing of patterns that cannot be explained by an inhibitory deficit.	\N	\N
19929331	Native language experience plays a critical role in shaping speech categorization, but the exact mechanisms by which it does so are not well understood. Investigating category learning of nonspeech sounds with which listeners have no prior experience allows their experience to be systematically controlled in a way that is impossible to achieve by studying natural speech acquisition, and it provides a means of probing the boundaries and constraints that general auditory perception and cognition bring to the task of speech category learning. In this study, we used a multimodal, video-game-based implicit learning paradigm to train participants to categorize acoustically complex, nonlinguistic sounds. Mismatch negativity (MMN) responses to the nonspeech stimuli were collected before and after training. Results indicate that changes in MMN resulting from the nonspeech category learning closely resemble patterns of change typically observed during speech category learning. This suggests that the often-observed "specialized" neural responses to speech sounds may result, at least in part, from the expertise we develop with speech categories through experience rather than from properties unique to speech (e.g., linguistic or vocal tract gestural information). Furthermore, particular characteristics of the training paradigm may inform our understanding of mechanisms that support natural speech acquisition.	\N	\N
20578033	Understanding the basic neural processes that underlie complex higher-order cognitive operations and functional domains is a fundamental goal of cognitive neuroscience. Electroencephalography (EEG) is a non-invasive and relatively inexpensive method for assessing neurophysiological function that can be used to achieve this goal. EEG measures the electrical activity of large, synchronously firing populations of neurons in the brain with electrodes placed on the scalp. This unit outlines the basics of setting up an EEG experiment with human participants, including equipment, and a step-by-step guide to applying and preparing an electrode cap. Also included are support protocols for two event-related potential (ERP) paradigms, P50 suppression, and mismatch negativity (MMN), which are measures of early sensory processing. These paradigms can be used to assess the integrity of early sensory processing in normal individuals and clinical populations, such as individuals with schizophrenia.	\N	\N
20665718	Subjects detected rarely occurring shifts between two simple tone-patterns, in a paradigm that dissociated the effects of rarity from those of pitch, habituation, and attention. Whole-head magnetoencephalography suggested that rare attended pattern-shifts evoked activity first in the superior temporal plane (sTp, peak ~100 ms), then superior temporal sulcus (sTs, peak ~130 ms), then posteroventral prefrontal (pvpF, peak ~230 ms), and anterior temporal cortices (aT, peak ~370 ms). Activity was more prominent in the right hemisphere. After subtracting the effects of nonshift tones (balanced for pitch and habituation status), weak but consistent differential effects of pattern-shifts began in aT at 90-130 ms, spread to sTs and sTp at ∼130 ms, then pvpF, and finally returned to aT. Cingulate activity resembled prefrontal. Responses to pattern shifts were greatly attenuated when the same stimuli were ignored, suggesting that the initial superior temporal activity reflected an attention-related mismatch negativity. The prefrontal activity at ~230 ms corresponded in latency and task correlates with simultaneously recorded event-related potential components N2b and P3a; the subsequent temporal activity corresponded to the P3b. These results were confirmed in sensors specific for frontal or temporal cortex, and thus are independent of the inverse method used. Overall, these results suggest that auditory working memory for temporal patterns begins with detection of the pattern change by an interaction of anterior and superior temporal structures, followed by identification of the event and its consequences led by posteroventral prefrontal and cingulate cortices, and finally, definitive encoding of the event in anterior temporal areas.	\N	\N
20929535	We investigated the processing of task-irrelevant and unexpected novel sounds and its modulation by working-memory load in children aged 9-10 and in adults. Environmental sounds (novels) were embedded amongst frequently presented standard sounds in an auditory-visual distraction paradigm. Each sound was followed by a visual target. In two conditions, participants evaluated the position of a visual stimulus (0-back, low load) or compared the position of the current stimulus with the one two trials before (2-back, high load). Processing of novel sounds was measured with reaction times, hit rates and the auditory event-related brain potentials (ERPs) Mismatch Negativity (MMN), P3a, Reorienting Negativity (RON) and visual P3b. In both memory load conditions novels impaired task performance in adults whereas they improved performance in children. Auditory ERPs reflect age-related differences in the time-window of the MMN as children showed a positive ERP deflection to novels whereas adults lack an MMN. The attention switch towards the task-irrelevant novel (reflected by P3a) was comparable between the age groups. Adults showed more efficient reallocation of attention (reflected by RON) under load condition than children. Finally, the P3b elicited by the visual target stimuli was reduced in both age groups when the preceding sound was a novel. Our results give new insights into the development of novelty processing as they (1) reveal that task-irrelevant novel sounds can result in contrary effects on the performance in a visual primary task in children and adults, (2) show a positive ERP deflection to novels rather than an MMN in children, and (3) reveal effects of auditory novels on visual target processing.	\N	\N
21368051	Certain features of objects or events can be represented by more than a single sensory system, such as roughness of a surface (sight, sound, and touch), the location of a speaker (audition and sight), and the rhythm or duration of an event (by all three major sensory systems). Thus, these properties can be said to be sensory-independent or amodal. A key question is whether common multisensory cortical regions process these amodal features, or does each sensory system contain its own specialized region(s) for processing common features? We tackled this issue by investigating simple duration-detection mechanisms across audition and touch; these systems were chosen because fine duration discriminations are possible in both. The mismatch negativity (MMN) component of the human event-related potential provides a sensitive metric of duration processing and has been elicited independently during both auditory and somatosensory investigations. Employing high-density electroencephalographic recordings in conjunction with intracranial subdural recordings, we asked whether fine duration discriminations, represented by the MMN, were generated in the same cortical regions regardless of the sensory modality being probed. Scalp recordings pointed to statistically distinct MMN topographies across senses, implying differential underlying cortical generator configurations. Intracranial recordings confirmed these noninvasive findings, showing generators of the auditory MMN along the superior temporal gyrus with no evidence of a somatosensory MMN in this region, whereas a robust somatosensory MMN was recorded from postcentral gyrus in the absence of an auditory MMN. The current data clearly argue against a common circuitry account for amodal duration processing.	\N	\N
21483666	Acute stress is a stereotypical, but multimodal response to a present or imminent challenge overcharging an organism. Among the different branches of this multimodal response, the consequences of glucocorticoid secretion have been extensively investigated, mostly in connection with long-term memory (LTM). However, stress responses comprise other endocrine signaling and altered neuronal activity wholly independent of pituitary regulation. To date, knowledge of the impact of such "paracorticoidal" stress responses on higher cognitive functions is scarce. We investigated the impact of an ecological stressor on the ability to direct selective attention using event-related potentials in humans. Based on research in rodents, we assumed that a stress-induced imbalance of catecholaminergic transmission would impair this ability. The stressor consisted of a single cold pressor test. Auditory negative difference (Nd) and mismatch negativity (MMN) were recorded in a tonal dichotic listening task. A time series of such tasks confirmed an increased distractibility occurring 4-7 minutes after onset of the stressor as reflected by an attenuated Nd. Salivary cortisol began to rise 8-11 minutes after onset when no further modulations in the event-related potentials (ERP) occurred, thus precluding a causal relationship. This effect may be attributed to a stress-induced activation of mesofrontal dopaminergic projections. It may also be attributed to an activation of noradrenergic projections. Known characteristics of the modulation of ERP by different stress-related ligands were used for further disambiguation of causality. The conjuncture of an attenuated Nd and an increased MMN might be interpreted as indicating a dopaminergic influence. The selective effect on the late portion of the Nd provides another tentative clue for this. Prior studies have deliberately tracked the adrenocortical influence on cognition, as it has proven most influential with respect to LTM. However, current cortisol-optimized study designs would have failed to detect the present findings regarding attention.	\N	\N
21750713	In the present study we investigated the capacity of the memory store underlying the mismatch negativity (MMN) response in musicians and nonmusicians for complex tone patterns. While previous studies have focused either on the kind of information that can be encoded or on the decay of the memory trace over time, we studied capacity in terms of the length of tone sequences, i.e., the number of individual tones that can be fully encoded and maintained. By means of magnetoencephalography (MEG) we recorded MMN responses to deviant tones that could occur at any position of standard tone patterns composed of four, six or eight tones during passive, distracted listening. Whereas there was a reliable MMN response to deviant tones in the four-tone pattern in both musicians and nonmusicians, only some individuals showed MMN responses to the longer patterns. This finding of a reliable capacity of the short-term auditory store underlying the MMN response is in line with estimates of a three to five item capacity of the short-term memory trace from behavioural studies, although pitch and contour complexity covaried with sequence length, which might have led to an understatement of the reported capacity. Whereas there was a tendency for an enhancement of the pattern MMN in musicians compared to nonmusicians, a strong advantage for musicians could be shown in an accompanying behavioural task of detecting the deviants while attending to the stimuli for all pattern lengths, indicating that long-term musical training differentially affects the memory capacity of auditory short-term memory for complex tone patterns with and without attention. Also, a left-hemispheric lateralization of MMN responses in the six-tone pattern suggests that additional networks that help structuring the patterns in the temporal domain might be recruited for demanding auditory processing in the pitch domain.	\N	\N
21808660	Have you ever shouted your child's name from the kitchen while they were watching television in the living room to no avail, so you shout their name again, only louder? Yet, still no response. The current study provides evidence that young children process loudness changes differently than pitch changes when they are engaged in another task such as watching a video. Intensity level changes were physiologically detected only when they were behaviorally relevant, but frequency level changes were physiologically detected without task relevance in younger children. This suggests that changes in pitch rather than changes in volume may be more effective in evoking a response when sounds are unexpected. Further, even though behavioral ability may appear to be similar in younger and older children, attention-based physiologic responses differ from automatic physiologic processes in children. Results indicate that 1) the automatic auditory processes leading to more efficient higher-level skills continue to become refined through childhood; and 2) there are different time courses for the maturation of physiological processes encoding the distinct acoustic attributes of sound pitch and sound intensity. The relevance of these findings to sound perception in real-world environments is discussed.	\N	\N
22163029	The detection of deviant sounds is a crucial function of the auditory system and is reflected by the automatically elicited mismatch negativity (MMN), an auditory evoked potential at 100 to 250 ms from stimulus onset. It has recently been shown that rarely occurring frequency and location deviants in an oddball paradigm trigger a more negative response than standard sounds at very early latencies in the middle latency response of the human auditory evoked potential. This fast and early ability of the auditory system is corroborated by the finding of neurons in the animal auditory cortex and subcortical structures, which restore their adapted responsiveness to standard sounds, when a rare change in a sound feature occurs. In this study, we investigated whether the detection of intensity deviants is also reflected at shorter latencies than those of the MMN. Auditory evoked potentials in response to click sounds were analyzed regarding the auditory brain stem response, the middle latency response (MLR) and the MMN. Rare stimuli with a lower intensity level than standard stimuli elicited (in addition to an MMN) a more negative potential in the MLR at the transition from the Na to the Pa component at circa 24 ms from stimulus onset. This finding, together with the studies about frequency and location changes, suggests that the early automatic detection of deviant sounds in an oddball paradigm is a general property of the auditory system.	\N	\N
22213909	Behavioural and electrophysiological studies give differing impressions of when auditory discrimination is mature. Ability to discriminate frequency and speech contrasts reaches adult levels only around 12 years of age, yet an electrophysiological index of auditory discrimination, the mismatch negativity (MMN), is reported to be as large in children as in adults. Auditory ERPs were measured in 30 children (7 to 12 years), 23 teenagers (13 to 16 years) and 32 adults (35 to 56 years) in an oddball paradigm with tone or syllable stimuli. For each stimulus type, a standard stimulus (1000 Hz tone or syllable [ba]) occurred on 70% of trials, and one of two deviants (1030 or 1200 Hz tone, or syllables [da] or [bi]) equiprobably on the remaining trials. For the traditional MMN interval of 100–250 ms post-onset, size of mismatch responses increased with age, whereas the opposite trend was seen for an interval from 300 to 550 ms post-onset, corresponding to the late discriminative negativity (LDN). Time-frequency analysis of single trials revealed that the MMN resulted from phase-synchronization of oscillations in the theta (4–7 Hz) range, with greater synchronization in adults than children. Furthermore, the amount of synchronization was significantly correlated with frequency discrimination threshold. These results show that neurophysiological processes underlying auditory discrimination continue to develop through childhood and adolescence. Previous reports of adult-like MMN amplitudes in children may be artefactual results of using peak measurements when comparing groups that differ in variance.	\N	\N
22221004	Deviations from repetitive auditory stimuli evoke a mismatch negativity (MMN). Counterintuitively, omissions of repetitive stimuli do not. Violations of patterns reflecting complex rules also evoke MMN. To detect a MMN to missing stimuli, we developed an auditory gestalt task using one stimulus. Groups of six pips (50 ms duration, 330 ms stimulus onset asynchrony [SOA], 400 trials), were presented with an intertrial interval (ITI) of 750 ms while subjects (n=16) watched a silent video. Occasional deviant groups had missing 4th or 6th tones (50 trials each). Missing stimuli evoked a MMN (p<.05). The missing 4th (-0.8 µV, p<.01) and the missing 6th stimuli (-1.1 µV, p<.05) were more negative than standard 6th stimuli (0.3 µV). MMN can be elicited by a missing stimulus at long SOAs by violation of a gestalt grouping rule. Patterned stimuli appear more sensitive to omissions and ITI than homogenous streams.	\N	\N
22551948	Recent studies show that electrophysiological markers of auditory processing such as the cortical 100 ms response (M100) and the mismatch field, derived from magnetoencephalography, might be used to identify children with autism spectrum disorders--M100 peak latency--and to stratify children with autism according to the degree of language impairment--mismatch field peak latency. The present study examined the latency of right superior temporal gyrus M100 and mismatch field in a cohort of children and young adolescents with specific language impairment (n=17), in comparison with age-matched and nonverbal intelligence quotient-matched typically developing controls (n=21). Neither group showed symptoms associated with autism. Although M100 latency (reflecting early auditory processing) did not distinguish controls from children with specific language impairment, the later 'change detection' mismatch field response was significantly delayed (by >50 ms) in the specific language impairment group. Linear discriminant analysis confirmed the role of mismatch field latency (92%) but not M100 latency (8%) in distinguishing groups. The present results lend support to the claim that a delayed M100 is specific to autism spectrum disorders (with relative independence of degree of language impairment) and that a delayed mismatch field reflects an abnormality more generally associated with language impairment, suggesting that mismatch field delay in the present specific language impairment group and previously reported in autistic children with language impairment may be indicative of a common neural system dysfunction.	\N	\N
22570723	Multisensory learning and resulting neural brain plasticity have recently become a topic of renewed interest in human cognitive neuroscience. Music notation reading is an ideal stimulus to study multisensory learning, as it allows studying the integration of visual, auditory and sensorimotor information processing. The present study aimed at answering whether multisensory learning alters uni-sensory structures, interconnections of uni-sensory structures or specific multisensory areas. In a short-term piano training procedure musically naive subjects were trained to play tone sequences from visually presented patterns in a music notation-like system [Auditory-Visual-Somatosensory group (AVS)], while another group received audio-visual training only that involved viewing the patterns and attentively listening to the recordings of the AVS training sessions [Auditory-Visual group (AV)]. Training-related changes in cortical networks were assessed by pre- and post-training magnetoencephalographic (MEG) recordings of an auditory, a visual and an integrated audio-visual mismatch negativity (MMN). The two groups (AVS and AV) were differently affected by the training. The results suggest that multisensory training alters the function of multisensory structures, and not the uni-sensory ones along with their interconnections, and thus provide an answer to an important question presented by cognitive models of multisensory training.	\N	\N
22815876	The precise neural mechanisms underlying speech sound representations are still a matter of debate. Proponents of 'sparse representations' assume that on the level of speech sounds, only contrastive or otherwise not predictable information is stored in long-term memory. Here, in a passive oddball paradigm, we challenge the neural foundations of such a 'sparse' representation; we use words that differ only in their penultimate consonant ("coronal" [t] vs. "dorsal" [k] place of articulation) and for example distinguish between the German nouns Latz ([lats]; bib) and Lachs ([laks]; salmon). Changes from standard [t] to deviant [k] and vice versa elicited a discernible Mismatch Negativity (MMN) response. Crucially, however, the MMN for the deviant [lats] was stronger than the MMN for the deviant [laks]. Source localization showed this difference to be due to enhanced brain activity in right superior temporal cortex. These findings reflect a difference in phonological 'sparsity': Coronal [t] segments, but not dorsal [k] segments, are based on more sparse representations and elicit less specific neural predictions; sensory deviations from this prediction are more readily 'tolerated' and accordingly trigger weaker MMNs. The results foster the neurocomputational reality of 'representationally sparse' models of speech perception that are compatible with more general predictive mechanisms in auditory perception.	\N	\N
22916282	Auditory deviance detection in humans is indexed by the mismatch negativity (MMN), a component of the auditory evoked potential (AEP) of the electroencephalogram (EEG) occurring at a latency of 100-250 ms after stimulus onset. However, by using classic oddball paradigms, differential responses to regularity violations of simple auditory features have been found at the level of the middle latency response (MLR) of the AEP occurring within the first 50 ms after stimulus (deviation) onset. These findings suggest the existence of fast deviance detection mechanisms for simple feature changes, but it is not clear whether deviance detection among more complex acoustic regularities could be observed at such early latencies. To test this, we examined the pre-attentive processing of rare stimulus repetitions in a sequence of tones alternating in frequency in both long and middle latency ranges. Additionally, we introduced occasional changes in the interaural time difference (ITD), so that a simple-feature regularity could be examined in the same paradigm. MMN was obtained for both repetition and ITD deviants, occurring at 150 ms and 100 ms after stimulus onset respectively. At the level of the MLR, a difference was observed between standards and ITD deviants at the Na component (20-30 ms after stimulus onset), for 800 Hz tones, but not for repetition deviants. These findings suggest that detection mechanisms for deviants to simple regularities, but not to more complex regularities, are already activated in the MLR range, supporting the view that the auditory deviance detection system is organized in a hierarchical manner.	\N	\N
23028971	For the perception of timbre of a musical instrument, the attack time is known to hold crucial information. The first 50 to 150 ms of sound onset reflect the excitation mechanism, which generates the sound. Since auditory processing and music perception in particular are known to be hampered in cochlear implant (CI) users, we conducted an electroencephalography (EEG) study with an oddball paradigm to evaluate the processing of small differences in musical sound onset. The first 60 ms of a cornet sound were manipulated in order to examine whether these differences are detected by CI users and normal-hearing controls (NH controls), as revealed by auditory evoked potentials (AEPs). Our analysis focused on the N1 as an exogenous component known to reflect physical stimulus properties as well as on the P2 and the Mismatch Negativity (MMN). Our results revealed different N1 latencies as well as P2 amplitudes and latencies for the onset manipulations in both groups. An MMN could be elicited only in the NH control group. Together with additional findings that suggest an impact of musical training on CI users' AEPs, our findings support the view that impaired timbre perception in CI users is at least partly due to altered sound onset feature detection.	\N	\N
23131615	This study investigated whether the mismatch negativity (MMN) event-related brain potential (ERP) could be evoked by purely top-down, attentional control. An infrequently occurring tone was designated as a target prior to presenting a randomized sequence of five equi-probably occurring tones. MMN elicitation to the tones categorized as "high", "medium", or "low" frequency, and designated as the target, would indicate that the change detection process can be driven solely by top-down control. However, MMNs were not elicited by the categorized tones. Only the N2b and P3b attention-driven target detection components were elicited. These results suggest that top-down factors alone cannot generate mismatch negativity. Standard formation by stimulus-driven factors is required.	\N	\N
23241212	Coloured-hearing (CH) synesthesia is a perceptual phenomenon in which an acoustic stimulus (the inducer) initiates a concurrent colour perception (the concurrent). Individuals with CH synesthesia "see" colours when hearing tones, words, or music; this specific phenomenon suggesting a close relationship between auditory and visual representations. To date, it is still unknown whether the perception of colours is associated with a modulation of brain functions in the inducing brain area, namely in the auditory-related cortex and associated brain areas. In addition, there is an on-going debate as to whether attention to the inducer is necessarily required for eliciting a visual concurrent, or whether the latter can emerge in a pre-attentive fashion. By using the EEG technique in the context of a pre-attentive mismatch negativity (MMN) paradigm, we show that the binding of tones and colours in CH synesthetes is associated with increased MMN amplitudes in response to deviant tones supposed to induce novel concurrent colour perceptions. Most notably, the increased MMN amplitudes we revealed in the CH synesthetes were associated with stronger intracerebral current densities originating from the auditory cortex, parietal cortex, and ventral visual areas. The automatic binding of tones and colours in CH synesthetes is accompanied by an early pre-attentive process recruiting the auditory cortex, inferior and superior parietal lobules, as well as ventral occipital areas.	\N	\N
23308266	Computational and experimental research has revealed that auditory sensory predictions are derived from regularities of the current environment by using internal generative models. However, so far, what has not been addressed is how the auditory system handles situations giving rise to redundant or even contradictory predictions derived from different sources of information. To this end, we measured error signals in the event-related brain potentials (ERPs) in response to violations of auditory predictions. Sounds could be predicted on the basis of overall probability, i.e., one sound was presented frequently and another sound rarely. Furthermore, each sound was predicted by an informative visual cue. Participants' task was to use the cue and to discriminate the two sounds as fast as possible. Violations of the probability based prediction (i.e., a rare sound) as well as violations of the visual-auditory prediction (i.e., an incongruent sound) elicited error signals in the ERPs (Mismatch Negativity [MMN] and Incongruency Response [IR]). Particular error signals were observed even in case the overall probability and the visual symbol predicted different sounds. That is, the auditory system concurrently maintains and tests contradictory predictions. Moreover, if the same sound was predicted, we observed an additive error signal (scalp potential and primary current density) equaling the sum of the specific error signals. Thus, the auditory system maintains and tolerates functionally independently represented redundant and contradictory predictions. We argue that the auditory system exploits all currently active regularities in order to optimally prepare for future events.	\N	\N
23585888	To localize the neural generators of the musically elicited mismatch negativity with high temporal resolution we conducted a beamformer analysis (Synthetic Aperture Magnetometry, SAM) on magnetoencephalography (MEG) data from a previous musical mismatch study. The stimuli consisted of a six-tone melodic sequence comprising broken chords in C- and G-major. The musical sequence was presented within an oddball paradigm in which the last tone was lowered occasionally (20%) by a minor third. The beamforming analysis revealed significant right hemispheric neural activation in the superior temporal (STC), inferior frontal (IFC), superior frontal (SFC) and orbitofrontal (OFC) cortices within a time window of 100-200 ms after the occurrence of a deviant tone. IFC and SFC activation was also observed in the left hemisphere. The pronounced early right inferior frontal activation of the auditory mismatch negativity has not been shown in MEG studies so far. The activation in STC and IFC is consistent with earlier electroencephalography (EEG), optical imaging and functional magnetic resonance imaging (fMRI) studies that reveal the auditory and inferior frontal cortices as main generators of the auditory MMN. The observed right hemispheric IFC is also in line with some previous music studies showing similar activation patterns after harmonic syntactic violations. The results demonstrate that a deviant tone within a musical sequence recruits immediately a distributed neural network in frontal and prefrontal areas suggesting that top-down processes are involved when expectation violation occurs within well-known stimuli.	\N	\N
23617597	The human auditory cortex automatically encodes acoustic input from the environment and differentiates regular sound patterns from deviant ones in order to identify important, irregular events. The Mismatch Negativity (MMN) response is a neuronal marker for the detection of sounds that are unexpected, based on the encoded regularities. It is also elicited by violations of more complex regularities and musical expertise has been shown to have an effect on the processing of complex regularities. Using magnetoencephalography (MEG), we investigated the MMN response to salient or less salient deviants by varying the standard probability (70%, 50% and 35%) of a pattern oddball paradigm. To study the effects of musical expertise in the encoding of the patterns, we compared the responses of a group of non-musicians to those of musicians. We observed significant MMN in all conditions, including the least salient condition (35% standards), in response to violations of the predominant tone pattern for both groups. The amplitude of MMN from the right hemisphere was influenced by the standard probability. This effect was modulated by long-term musical training: standard probability changes influenced MMN amplitude in the group of non-musicians only. This study indicates that pattern violations are detected automatically, even if they are of very low salience, both in non-musicians and musicians, with salience having a stronger impact on processing in the right hemisphere of non-musicians. Long-term musical training influences this encoding, in that non-musicians benefit to a greater extent from a good signal-to-noise ratio (i.e. high probability of the standard pattern), while musicians are less dependent on the salience of an acoustic environment.	\N	\N
23708059	The auditory system is organized such that progressively more complex features are represented across successive cortical hierarchical stages. Just when and where the processing of phonemes, fundamental elements of the speech signal, is achieved in this hierarchy remains a matter of vigorous debate. Non-invasive measures of phonemic representation have been somewhat equivocal. While some studies point to a primary role for middle/anterior regions of the superior temporal gyrus (STG), others implicate the posterior STG. Differences in stimulation, task and inter-individual anatomical/functional variability may account for these discrepant findings. Here, we sought to clarify this issue by mapping phonemic representation across left perisylvian cortex, taking advantage of the excellent sampling density afforded by intracranial recordings in humans. We asked whether one or both major divisions of the STG were sensitive to phonemic transitions. The high signal-to-noise characteristics of direct intracranial recordings allowed for analysis at the individual participant level, circumventing issues of inter-individual anatomic and functional variability that may have obscured previous findings at the group level of analysis. The mismatch negativity (MMN), an electrophysiological response elicited by changes in repetitive streams of stimulation, served as our primary dependent measure. Oddball configurations of pairs of phonemes, spectro-temporally matched non-phonemes, and simple tones were presented. The loci of the MMN clearly differed as a function of stimulus type. Phoneme representation was most robust over middle/anterior STG/STS, but was also observed over posterior STG/SMG. These data point to multiple phonemic processing zones along perisylvian cortex, both anterior and posterior to primary auditory cortex. This finding is considered within the context of a dual stream model of auditory processing in which functionally distinct ventral and dorsal auditory processing pathways may be engaged by speech stimuli.	\N	\N
23715097	In this study, we used magnetoencephalography and a mismatch paradigm to investigate speech processing in stroke patients with auditory comprehension deficits and age-matched control subjects. We probed connectivity within and between the two temporal lobes in response to phonemic (different word) and acoustic (same word) oddballs using dynamic causal modelling. We found stronger modulation of self-connections as a function of phonemic differences for control subjects versus aphasics in left primary auditory cortex and bilateral superior temporal gyrus. The patients showed stronger modulation of connections from right primary auditory cortex to right superior temporal gyrus (feed-forward) and from left primary auditory cortex to right primary auditory cortex (interhemispheric). This differential connectivity can be explained on the basis of a predictive coding theory which suggests increased prediction error and decreased sensitivity to phonemic boundaries in the aphasics' speech network in both hemispheres. Within the aphasics, we also found behavioural correlates with connection strengths: a negative correlation between phonemic perception and an inter-hemispheric connection (left superior temporal gyrus to right superior temporal gyrus), and positive correlation between semantic performance and a feedback connection (right superior temporal gyrus to right primary auditory cortex). Our results suggest that aphasics with impaired speech comprehension have less veridical speech representations in both temporal lobes, and rely more on the right hemisphere auditory regions, particularly right superior temporal gyrus, for processing speech. Despite this presumed compensatory shift in network connectivity, the patients remain significantly impaired.	\N	\N
23825422	Hierarchical predictive coding suggests that attention in humans emerges from increased precision in probabilistic inference, whereas expectation biases attention in favor of contextually anticipated stimuli. We test these notions within auditory perception by independently manipulating top-down expectation and attentional precision alongside bottom-up stimulus predictability. Our findings support an integrative interpretation of commonly observed electrophysiological signatures of neurodynamics, namely mismatch negativity (MMN), P300, and contingent negative variation (CNV), as manifestations along successive levels of predictive complexity. Early first-level processing indexed by the MMN was sensitive to stimulus predictability: here, attentional precision enhanced early responses, but explicit top-down expectation diminished it. This pattern was in contrast to later, second-level processing indexed by the P300: although sensitive to the degree of predictability, responses at this level were contingent on attentional engagement and in fact sharpened by top-down expectation. At the highest level, the drift of the CNV was a fine-grained marker of top-down expectation itself. Source reconstruction of high-density EEG, supported by intracranial recordings, implicated temporal and frontal regions differentially active at early and late levels. The cortical generators of the CNV suggested that it might be involved in facilitating the consolidation of context-salient stimuli into conscious perception. These results provide convergent empirical support to promising recent accounts of attention and expectation in predictive coding.	\N	\N
23850664	Over the last four decades, a range of different neuroimaging tools have been used to study human auditory attention, spanning from classic event-related potential studies using electroencephalography to modern multimodal imaging approaches (e.g., combining anatomical information based on magnetic resonance imaging with magneto- and electroencephalography). This review begins by exploring the different strengths and limitations inherent to different neuroimaging methods, and then outlines some common behavioral paradigms that have been adopted to study auditory attention. We argue that in order to design a neuroimaging experiment that produces interpretable, unambiguous results, the experimenter must not only have a deep appreciation of the imaging technique employed, but also a sophisticated understanding of perception and behavior. Only with the proper caveats in mind can one begin to infer how the cortex supports a human in solving the "cocktail party" problem. This article is part of a Special Issue entitled Human Auditory Neuroimaging.	\N	\N
23886958	The purpose of the study was to test the hypothesis that sound context modulates the magnitude of auditory distraction, indexed by behavioral and electrophysiological measures. Participants were asked to identify tone duration, while irrelevant changes occurred in tone frequency, tone intensity, and harmonic structure. Frequency deviants were randomly intermixed with standards (Uni-Condition), with intensity deviants (Bi-Condition), and with both intensity and complex deviants (Tri-Condition). Only in the Tri-Condition did the auditory distraction effect reflect the magnitude difference among the frequency and intensity deviants. The mixture of the different types of deviants in the Tri-Condition modulated the perceived level of distraction, demonstrating that the sound context can modulate the effect of deviance level on processing irrelevant acoustic changes in the environment. These findings thus indicate that perceptual contrast plays a role in change detection processes that leads to auditory distraction.	\N	\N
23920129	A better understanding of melodic pitch perception in cochlear implants (CIs) may guide signal processing and/or rehabilitation techniques to improve music perception and appreciation in CI patients. In this study, the mismatch negativity (MMN) in response to infrequent changes in 5-tone pitch contours was obtained in CI users and normal-hearing (NH) listeners. Melodic contour identification (MCI) was also measured. Results showed that MCI performance was poorer in CI than in NH subjects; the MMNs were missing in all CI subjects for the 1-semitone contours. The MMNs with the 5-semitone contours were observed in a smaller proportion of CI than NH subjects. Results suggest that encoding of pitch contour changes in CI users appears to be degraded, most likely due to the limited pitch cues provided by the CI and deafness-related compromise of brain substrates.	\N	\N
24143195	Unexpected physical increases in the intensity of a frequently occurring "standard" auditory stimulus are experienced as obtrusive. This could either be because of a physical change, the increase in intensity of the "deviant" stimulus, or a psychological change, the violation of the expectancy for the occurrence of the lower intensity standard stimulus. Two experiments were run in which event-related potentials (ERPs) were recorded to determine whether "psychological" increments (violation of an expectancy for a lower intensity) would be processed differently than psychological decrements (violation of an expectancy for a higher intensity). Event-related potentials (ERPs) were recorded while subjects were presented with auditory tones that alternated between low and high intensity. The subjects ignored the auditory stimuli while watching a video. Deviants were created by repeating the same stimulus. In the first experiment, pairs of stimuli alternating in intensity were presented in separate increment (H-L...H-L...H-H...H-L, in which H = 80 dB SPL and L = 60 dB SPL) and decrement conditions (L-H...L-H...L-L... L-H, in which H = 90 dB SPL and L = 80 dB SPL). The paradigm employed in the second experiment consisted of an alternating intensity pattern (H-L-H-L-H-H-H-L) or (H-L-H-L-L-L-H-L). Importantly, the stimulus prior to the deviant (the standard) and the actual deviants in both increment and decrement conditions in both experiments were physically identical (80 dB SPL tones). The repetition of the lower intensity tone therefore acted as a psychological rather than a physical decrement (a higher intensity tone was expected) while the repetition of the higher intensity tone acted as a psychological increment (a lower intensity tone was expected). The psychological increments in both experiments elicited a larger amplitude mismatch negativity (MMN) than the decrements. Thus, regardless of whether an acoustic change signals a physical increase in intensity or violates an expected decrease in intensity, a large MMN will be elicited.	\N	\N
24158725	The goal of this review article is to redefine what the mismatch negativity (MMN) component of event-related potentials reflects in auditory scene analysis, and to provide an overview of how the MMN serves as a valuable tool in Cognitive Neuroscience research. In doing so, some of the old beliefs (five common 'myths') about MMN will be dispelled, such as the notion that MMN is a simple feature discriminator and that attention itself modulates MMN elicitation. A revised description of what MMN truly reflects will be provided, which includes a principal focus onto the highly context-dependent nature of MMN elicitation and new terminology to discuss MMN and attention. This revised framework will help clarify what has been a long line of seemingly contradictory results from studies in which behavioral ability to hear differences between sounds and passive elicitation of MMN have been inconsistent. Understanding what MMN is will also benefit clinical research efforts by providing a new picture of how to design appropriate paradigms suited to various clinical populations.	\N	\N
24366694	One of the major challenges in human brain science is the functional hemispheric asymmetry of auditory processing. Behavioral and neurophysiological studies have demonstrated that speech processing is dominantly handled in the left hemisphere, whereas music processing dominantly occurs in the right. Using magnetoencephalography, we measured the auditory mismatch negativity elicited by band-pass filtered click-trains, which deviated from frequently presented standard sound signals in a spectral or temporal domain. The results showed that spectral and temporal deviants were dominantly processed in the right and left hemispheres, respectively. Hemispheric asymmetry was not limited to high-level cognitive processes, but also originated from the pre-attentive neural processing stage represented by mismatch negativity.	\N	\N
24475052	The brain response to auditory novelty comprises two main EEG components: an early mismatch negativity and a late P300. Whereas the former has been proposed to reflect a prediction error, the latter is often associated with working memory updating. Interestingly, these two proposals predict fundamentally different dynamics: prediction errors are thought to propagate serially through several distinct brain areas, while working memory supposes that activity is sustained over time within a stable set of brain areas. Here we test this temporal dissociation by showing how the generalization of brain activity patterns across time can characterize the dynamics of the underlying neural processes. This method is applied to magnetoencephalography (MEG) recordings acquired from healthy participants who were presented with two types of auditory novelty. Following our predictions, the results show that the mismatch evoked by a local novelty leads to the sequential recruitment of distinct and short-lived patterns of brain activity. In sharp contrast, the global novelty evoked by an unexpected sequence of five sounds elicits a sustained state of brain activity that lasts for several hundreds of milliseconds. The present results highlight how MEG combined with multivariate pattern analyses can characterize the dynamics of human cortical processes.	\N	\N
24548430	Natural sound environments are dynamic, with overlapping acoustic input originating from simultaneously active sources. A key function of the auditory system is to integrate sensory inputs that belong together and segregate those that come from different sources. We hypothesized that this skill is impaired in individuals with phonological processing difficulties. There is considerable disagreement about whether phonological impairments observed in children with developmental language disorders can be attributed to specific linguistic deficits or to more general acoustic processing deficits. However, most tests of general auditory abilities have been conducted with a single set of sounds. We assessed the ability of school-aged children (7-15 years) to parse complex auditory non-speech input, and determined whether the presence of phonological processing impairments was associated with stream perception performance. A key finding was that children with language impairments did not show the same developmental trajectory for stream perception as typically developing children. In addition, children with language impairments required larger frequency separations between sounds to hear distinct streams compared to age-matched peers. Furthermore, phonological processing ability was a significant predictor of stream perception measures, but only in the older age groups. No such association was found in the youngest children. These results indicate that children with language impairments have difficulty parsing speech streams, or identifying individual sound events when there are competing sound sources. We conclude that language group differences may in part reflect fundamental maturational disparities in the analysis of complex auditory scenes.	\N	\N
24771006	Detecting regularity and change in the environment is crucial for survival, as it enables making predictions about the world and informing goal-directed behavior. In the auditory modality, the detection of regularity involves segregating incoming sounds into distinct perceptual objects (stream segregation). The detection of change from this within-stream regularity is associated with the mismatch negativity, a component of auditory event-related brain potentials (ERPs). A central unanswered question is how the detection of regularity and the detection of change are interrelated, and whether attention affects the former, the latter, or both. Here we show that the detection of regularity and the detection of change can be empirically dissociated, and that attention modulates the detection of change without precluding the detection of regularity, and the perceptual organization of the auditory background into distinct streams. By applying frequency spectra analysis on the EEG of subjects engaged in a selective listening task, we found distinct peaks of ERP synchronization, corresponding to the rhythm of the frequency streams, independently of whether the stream was attended or ignored. Our results provide direct neurophysiological evidence of regularity detection in the auditory background, and show that it can occur independently of change detection and in the absence of attention.	\N	\N
25178752	The ability to read passages of information fluently and with comprehension is a basic component of socioeconomic success. Reading ability depends on the integrity of underlying visual and auditory (phonological) systems. This study investigated the integrity of reading ability in schizophrenia relative to the integrity of underlying visual and auditory function. The participants were 45 schizophrenia patients, 19 clinical high-risk patients, and 65 comparison subjects. Reading was assessed using tests sensitive to visual or phonological reading dysfunction. Sensory, neuropsychological, and functional outcome measures were also obtained. Schizophrenia patients displayed reading deficits that were far more severe (effect size >2.0) than would be predicted based on general neurocognitive impairments (effect size 1.0-1.4). The deficits correlated highly with both visual and auditory sensory measures, including impaired mismatch negativity generation (r=0.62, N=51, p=0.0002). Patients with established schizophrenia displayed both visual and phonological impairments, whereas high-risk patients showed isolated visual impairments. More than 70% of schizophrenia patients met criteria for acquired dyslexia, with 50% reading below eighth grade level despite intact premorbid reading ability. Reading deficits also correlated significantly (rp=0.4, N=30, p=0.03) with failure to match parental socioeconomic achievement, over and above contributions of more general cognitive impairment. Patients with schizophrenia display severe deficits in reading ability that represent a potentially remediable cause of impaired socioeconomic function. Such deficits are not presently captured during routine clinical assessment. Deficits most likely develop during the years immediately surrounding illness onset and may contribute to the reduced educational and occupational achievement associated with schizophrenia.	\N	\N
25231619	In animal models, single-neuron response properties such as stimulus-specific adaptation have been described as possible precursors to mismatch negativity, a human brain response to stimulus change. In the present study, we attempted to bridge the gap between human and animal studies by characterising responses to changes in the frequency of repeated tone series in the anesthetised guinea pig using small-animal magnetoencephalography (MEG). We showed that 1) auditory evoked fields (AEFs) qualitatively similar to those observed in human MEG studies can be detected noninvasively in rodents using small-animal MEG; 2) guinea pig AEF amplitudes reduce rapidly with tone repetition, and this AEF reduction is largely complete by the second tone in a repeated series; and 3) differences between responses to the first (deviant) and later (standard) tones after a frequency transition resemble those previously observed in awake humans using a similar stimulus paradigm.	\N	\N
25342520	Neurofeedback is a strong direct training method for brain function, wherein brain activity patterns are measured and displayed as feedback, and trainees try to stabilize the feedback signal onto certain desirable states to regulate their own mental states. Here, we introduce a novel neurofeedback method, using the mismatch negativity (MMN) responses elicited by similar sounds that cannot be consciously discriminated. Through neurofeedback training, without participants' attention to the auditory stimuli or awareness of what was to be learned, we found that the participants could unconsciously achieve a significant improvement in the auditory discrimination of the applied stimuli. Our method has great potential to provide effortless auditory perceptual training. Based on this method, participants do not need to make an effort to discriminate auditory stimuli, and can choose tasks of interest without boredom due to training. In particular, it could be used to train people to recognize speech sounds that do not exist in their native language and thereby facilitate foreign language learning.	\N	\N
25379456	Although sensory processing abnormalities contribute to widespread cognitive and psychosocial impairments in schizophrenia (SZ) patients, scalp-channel measures of averaged event-related potentials (ERPs) mix contributions from distinct cortical source-area generators, diluting the functional relevance of channel-based ERP measures. SZ patients (n = 42) and non-psychiatric comparison subjects (n = 47) participated in a passive auditory duration oddball paradigm, eliciting a triphasic (Deviant-Standard) tone ERP difference complex, here termed the auditory deviance response (ADR), comprised of a mid-frontal mismatch negativity (MMN), P3a positivity, and re-orienting negativity (RON) peak sequence. To identify its cortical sources and to assess possible relationships between their response contributions and clinical SZ measures, we applied independent component analysis to the continuous 68-channel EEG data and clustered the resulting independent components (ICs) across subjects on spectral, ERP, and topographic similarities. Six IC clusters centered in right superior temporal, right inferior frontal, ventral mid-cingulate, anterior cingulate, medial orbitofrontal, and dorsal mid-cingulate cortex each made triphasic response contributions. Although correlations between measures of SZ clinical, cognitive, and psychosocial functioning and standard (Fz) scalp-channel ADR peak measures were weak or absent, for at least four IC clusters one or more significant correlations emerged. In particular, differences in MMN peak amplitude in the right superior temporal IC cluster accounted for 48% of the variance in SZ-subject performance on tasks necessary for real-world functioning and medial orbitofrontal cluster P3a amplitude accounted for 40%/54% of SZ-subject variance in positive/negative symptoms. Thus, source-resolved auditory deviance response measures including MMN may be highly sensitive to SZ clinical, cognitive, and functional characteristics.	\N	\N
21159322	Phonology is a lower-level structural aspect of language involving the sounds of a language and their organization in that language. Numerous behavioral studies utilizing priming, which refers to an increased sensitivity to a stimulus following prior experience with that or a related stimulus, have provided evidence for the role of phonology in visual word recognition. However, most language studies utilizing priming in conjunction with functional magnetic resonance imaging (fMRI) have focused on lexical-semantic aspects of language processing. The aim of the present study was to investigate the neurobiological substrates of the automatic, implicit stages of phonological processing. While undergoing fMRI, eighteen individuals performed a lexical decision task (LDT) on prime-target pairs including word-word homophone and pseudoword-word pseudohomophone pairs with a prime presentation below perceptual threshold. Whole-brain analyses revealed several cortical regions exhibiting hemodynamic response suppression due to phonological priming including bilateral superior temporal gyri (STG), middle temporal gyri (MTG), and angular gyri (AG) with additional region of interest (ROI) analyses revealing response suppression in the left lateralized supramarginal gyrus (SMG). Homophone and pseudohomophone priming also resulted in different patterns of hemodynamic responses relative to one another. These results suggest that phonological processing plays a key role in visual word recognition. Furthermore, enhanced hemodynamic responses for unrelated stimuli relative to primed stimuli were observed in midline cortical regions corresponding to the default-mode network (DMN) suggesting that DMN activity can be modulated by task requirements within the context of an implicit task.	\N	\N
21500313	The effects of neural activity on cerebral hemodynamics underlie human brain imaging with functional magnetic resonance imaging and positron emission tomography. However, the threshold and characteristics of the converse effects, wherein the cerebral hemodynamic and metabolic milieu influence neural activity, remain unclear. We tested whether mild hypercapnia (5% CO2) decreases the magnetoencephalogram response to auditory pattern recognition and visual semantic tasks. Hypercapnia induced statistically significant decreases in event-related fields without affecting behavioral performance. Decreases were observed in early sensory components in both auditory and visual modalities as well as later cognitive components related to memory and language. Effects were distributed across cortical regions. Decreases were comparable in evoked versus spontaneous spectral power. Hypercapnia is commonly used with hemodynamic models to calibrate the blood oxygenation level-dependent response. Modifying model assumptions to incorporate the current findings produces a modest but measurable decrease in the estimated cerebral metabolic rate for oxygen change with activation. Because under normal conditions, low cerebral pH would arise when bloodflow is unable to keep pace with neuronal activity, the cortical depression observed here may reflect a homeostatic mechanism by which neuronal activity is adjusted to a level that can be sustained by available bloodflow. Animal studies suggest that these effects may be mediated by pH-modulating presynaptic adenosine receptors. Although the data is not clear, comparable changes in cortical pH to those induced here may occur during sleep apnea, sleep, and exercise. If so, these results suggest that such activities may in turn have generalized depressive effects on cortical activity.	\N	\N
22496909	Understanding how the brain processes stimuli in a rich natural environment is a fundamental goal of neuroscience. Here, we showed a feature film to 10 healthy volunteers during functional magnetic resonance imaging (fMRI) of hemodynamic brain activity. We then annotated auditory and visual features of the motion picture to inform analysis of the hemodynamic data. The annotations were fitted to both voxel-wise data and brain network time courses extracted by independent component analysis (ICA). Auditory annotations correlated with two independent components (IC) disclosing two functional networks, one responding to a variety of auditory stimulation and another responding preferentially to speech, with parts of the network also responding to non-verbal communication. Visual feature annotations correlated with four ICs delineating visual areas according to their sensitivity to different visual stimulus features. In comparison, a separate voxel-wise general linear model based analysis disclosed brain areas preferentially responding to sound energy, speech, music, visual contrast edges, body motion and hand motion which largely overlapped the results revealed by ICA. Differences between the results of IC- and voxel-based analyses demonstrate that thorough analysis of voxel time courses is important for understanding the activity of specific sub-areas of the functional networks, while ICA is a valuable tool for revealing novel information about functional connectivity which need not be explained by the predefined model. Our results encourage the use of naturalistic stimuli and tasks in cognitive neuroimaging to study how the brain processes stimuli in rich natural environments.	\N	\N
23777481	Numerous studies have provided clues about the ontogeny of lateralization of auditory processing in humans, but most have employed specific subtypes of stimuli and/or have assessed responses in discrete temporal windows. The present study used near-infrared spectroscopy (NIRS) to establish changes in hemodynamic activity in the neocortex of preverbal infants (aged 4-11 months) while they were exposed to two distinct types of complex auditory stimuli (full sentences and musical phrases). Measurements were taken from bilateral temporal regions, including both anterior and posterior superior temporal gyri. When the infant sample was treated as a homogenous group, no significant effects emerged for stimulus type. However, when infants' hemodynamic responses were categorized according to their overall changes in volume, two very clear neurophysiological patterns emerged. A high-responder group showed a pattern of early and increasing activation, primarily in the left hemisphere, similar to that observed in comparable studies with adults. In contrast, a low-responder group showed a pattern of gradual decreases in activation over time. Although age did track with responder type, no significant differences between these groups emerged for stimulus type, suggesting that the high- versus low-responder characterization generalizes across classes of auditory stimuli. These results highlight a new way to conceptualize the variable cortical blood flow patterns that are frequently observed across infants and stimuli, with hemodynamic response volumes potentially serving as an early indicator of developmental changes in auditory-processing sensitivity.	\N	\N
22786953	Auditory spatial perception plays a critical role in day-to-day communication. For instance, listeners utilize acoustic spatial information to segregate individual talkers into distinct auditory "streams" to improve speech intelligibility. However, spatial localization is an exceedingly difficult task in everyday listening environments with numerous distracting echoes from nearby surfaces, such as walls. Listeners' brains overcome this unique challenge by relying on acoustic timing and, quite surprisingly, visual spatial information to suppress short-latency (1-10 ms) echoes through a process known as "the precedence effect" or "echo suppression." In the present study, we employed electroencephalography (EEG) to investigate the neural time course of echo suppression both with and without the aid of coincident visual stimulation in human listeners. We find that echo suppression is a multistage process initialized during the auditory N1 (70-100 ms) and followed by space-specific suppression mechanisms from 150 to 250 ms. Additionally, we find a robust correlate of listeners' spatial perception (i.e., suppressing or not suppressing the echo) over central electrode sites from 300 to 500 ms. Contrary to our hypothesis, vision's powerful contribution to echo suppression occurs late in processing (250-400 ms), suggesting that vision contributes primarily during late sensory or decision making processes. Together, our findings support growing evidence that echo suppression is a slow, progressive mechanism modifiable by visual influences during late sensory and decision making stages. Furthermore, our findings suggest that audiovisual interactions are not limited to early, sensory-level modulations but extend well into late stages of cortical processing.	\N	\N
25710328	To form a veridical percept of the environment, the brain needs to integrate sensory signals from a common source but segregate those from independent sources. Thus, perception inherently relies on solving the "causal inference problem." Behaviorally, humans solve this problem optimally as predicted by Bayesian Causal Inference; yet, the underlying neural mechanisms are unexplored. Combining psychophysics, Bayesian modeling, functional magnetic resonance imaging (fMRI), and multivariate decoding in an audiovisual spatial localization task, we demonstrate that Bayesian Causal Inference is performed by a hierarchy of multisensory processes in the human brain. At the bottom of the hierarchy, in auditory and visual areas, location is represented on the basis that the two signals are generated by independent sources (= segregation). At the next stage, in posterior intraparietal sulcus, location is estimated under the assumption that the two signals are from a common source (= forced fusion). Only at the top of the hierarchy, in anterior intraparietal sulcus, the uncertainty about the causal structure of the world is taken into account and sensory signals are combined as predicted by Bayesian Causal Inference. Characterizing the computational operations of signal interactions reveals the hierarchical nature of multisensory perception in human neocortex. It unravels how the brain accomplishes Bayesian Causal Inference, a statistical computation fundamental for perception and cognition. Our results demonstrate how the brain combines information in the face of uncertainty about the underlying causal structure of the world.	\N	\N
20146608	The neural responses to sensory consequences of a self-produced motor act are suppressed compared with those in response to a similar but externally generated stimulus. Previous studies in the somatosensory and auditory systems have shown that the motor-induced suppression of the sensory mechanisms is sensitive to delays between the motor act and the onset of the stimulus. The present study investigated time-dependent neural processing of auditory feedback in response to self-produced vocalizations. ERPs were recorded in response to normal and pitch-shifted voice auditory feedback during active vocalization and passive listening to the playback of the same vocalizations. The pitch-shifted stimulus was delivered to the subjects' auditory feedback after a randomly chosen time delay between the vocal onset and the stimulus presentation. Results showed that the neural responses to delayed feedback perturbations were significantly larger than those in response to the pitch-shifted stimulus occurring at vocal onset. Active vocalization was shown to enhance neural responsiveness to feedback alterations only for nonzero delays compared with passive listening to the playback. These findings indicated that the neural mechanisms of auditory feedback processing are sensitive to timing between the vocal motor commands and the incoming auditory feedback. Time-dependent neural processing of auditory feedback may be an important feature of the audio-vocal integration system that helps to improve the feedback-based monitoring and control of voice structure through vocal error detection and correction.	\N	\N
20493828	Magnetoencephalography (MEG) is an increasingly popular non-invasive tool used to record, on a millisecond timescale, the magnetic field changes generated by cortical neural activity. MEG has the advantage, over fMRI for example, that it is a direct measure of neural activity. In the current investigation we used MEG to measure cortical responses to tactile and auditory stimuli in the macaque monkey. We had two aims. First, we sought to determine whether MEG, a technique that may have low spatial accuracy, could be used to distinguish the location and organization of sensory cortical fields in macaque monkeys, a species with a relatively small brain compared to that of the human. Second, we wanted to examine the temporal dynamics of cortical responses in the macaque monkey relative to the human. We recorded MEG data from anesthetized monkeys and, for comparison, from awake humans that were presented with simple tactile and auditory stimuli. Neural source reconstruction of MEG data showed that primary somatosensory and auditory cortex could be differentiated and, further, that separate representations of the digit and lip within somatosensory cortex could be identified in macaque monkeys as well as humans. We compared the latencies of activity from monkey and human data for the three stimulation types and proposed a correspondence between the neural responses of the two species. We thus demonstrate the feasibility of using MEG in the macaque monkey and provide a non-human primate model for examining the relationship between external evoked magnetic fields and their underlying neural sources.	\N	\N
20598152	The detection of any abrupt change in the environment is important to survival. Since memory of preceding sensory conditions is necessary for detecting changes, such a change-detection system relates closely to the memory system. Here we used an auditory change-related N1 subcomponent (change-N1) of event-related brain potentials to investigate cortical mechanisms underlying change detection and echoic memory. Change-N1 was elicited by a simple paradigm with two tones, a standard followed by a deviant, while subjects watched a silent movie. The amplitude of change-N1 elicited by a fixed sound pressure deviance (70 dB vs. 75 dB) was negatively correlated with the logarithm of the interval between the standard sound and deviant sound (1, 10, 100, or 1000 ms), while positively correlated with the logarithm of the duration of the standard sound (25, 100, 500, or 1000 ms). The amplitude of change-N1 elicited by a deviance in sound pressure, sound frequency, and sound location was correlated with the logarithm of the magnitude of physical differences between the standard and deviant sounds. The present findings suggest that temporal representation of echoic memory is non-linear and that the Weber-Fechner law holds for the automatic cortical response to sound changes within a suprathreshold range. Since the present results show that the behavior of echoic memory can be understood through change-N1, change-N1 would be a useful tool to investigate memory systems.	\N	\N
20633569	Studies in all sensory modalities have demonstrated amplification of early brain responses to attended signals, but less is known about the processes by which listeners selectively ignore stimuli. Here we use MEG and a new paradigm to dissociate the effects of selectively attending and ignoring in time. Two different tasks were performed successively on the same acoustic stimuli: triplets of tones (A, B, C) with noise-bursts interspersed between the triplets. In the COMPARE task subjects were instructed to respond when tones A and C were of the same frequency. In the PASSIVE task they were instructed to respond as fast as possible to noise-bursts. COMPARE requires attending to A and C and actively ignoring tone B, but PASSIVE involves neither attending to nor ignoring the tones. The data were analyzed separately for frontal and auditory-cortical channels to independently address attentional effects on low-level sensory versus putative control processing. We observe the earliest attend/ignore effects as early as 100 ms post-stimulus onset in auditory cortex. These appear to be generated by modulation of exogenous (stimulus-driven) sensory evoked activity. Specifically related to ignoring, we demonstrate that active-ignoring-induced input inhibition involves early selection. We identified a sequence of early (<200 ms post-onset) auditory cortical effects, comprised of onset response attenuation and the emergence of an inhibitory response, and provide new, direct evidence that listeners actively ignoring a sound can reduce their stimulus related activity in auditory cortex by 100 ms after onset when this is required to execute specific behavioral objectives.	\N	\N
21233780	The neural origins of the cortical response to rare sensory events remain poorly understood. Using simultaneous event-related potentials and magnetic resonance imaging, we investigated the anatomical profile of regional activity at various processing stages during performance of auditory and visual variants of an oddball paradigm. The earliest rarity-detection response was found in sensory-specific cortices, rapidly spreading to tertiary association areas, mesial temporal and frontal cortices by 150-200 ms. P3m-related activity was not found in sensory-specific cortices. On the basis of the anatomic distribution of P3m-related activity, this component is likely to reflect more generalized cognitive abilities hosted by association cortical regions.	\N	\N
21261633	Humans must often focus attention onto relevant sensory signals in the presence of simultaneous irrelevant signals. This type of attention has been explored in vision with the N2pc component, and the present study sought to find an analogous auditory effect. In Experiment 1, two 750-ms sounds were presented simultaneously, one from each of two lateral speakers. On each trial, participants indicated whether one of the two sounds was a pre-defined target. We found that targets elicited an N2ac component: a negativity in the N2 latency range at anterior contralateral electrodes. We also observed a later and more posterior contralateral positivity. Experiment 2 replicated these effects and demonstrated that they arose from competition between attended and unattended tones rather than reflecting lateralized effects of attention for individual tones. The N2ac component may provide a useful tool for studying selective attention within auditory scenes.	\N	\N
21305666	Both sighted and blind individuals can readily interpret meaning behind everyday real-world sounds. In sighted listeners, we previously reported that regions along the bilateral posterior superior temporal sulci (pSTS) and middle temporal gyri (pMTG) are preferentially activated when presented with recognizable action sounds. These regions have generally been hypothesized to represent primary loci for complex motion processing, including visual biological motion processing and audio-visual integration. However, it remained unclear whether, or to what degree, life-long visual experience might impact functions related to hearing perception or memory of sound-source actions. Using functional magnetic resonance imaging (fMRI), we compared brain regions activated in congenitally blind versus sighted listeners in response to hearing a wide range of recognizable human-produced action sounds (excluding vocalizations) versus unrecognized, backward-played versions of those sounds. Here, we show that recognized human action sounds commonly evoked activity in both groups along most of the left pSTS/pMTG complex, though with relatively greater activity in the right pSTS/pMTG by the blind group. These results indicate that portions of the postero-lateral temporal cortices contain domain-specific hubs for biological and/or complex motion processing independent of sensory-modality experience. Contrasting the two groups, the sighted listeners preferentially activated bilateral parietal plus medial and lateral frontal networks, whereas the blind listeners preferentially activated left anterior insula plus bilateral anterior calcarine and medial occipital regions, including what would otherwise have been visual-related cortex. These global-level network differences suggest that blind and sighted listeners may preferentially use different memory retrieval strategies when hearing and attempting to recognize action sounds.	\N	\N
21380858	Most ecologically natural sensory inputs are not limited to a single modality. While it is possible to use real ecological materials as experimental stimuli to investigate the neural basis of multi-sensory experience, parametric control of such tokens is limited. By using artificial bimodal stimuli composed of approximations to ecological signals, we aim to observe the interactions between putatively relevant stimulus attributes. Here we use MEG as an electrophysiological tool and employ as a measure the steady-state response (SSR), an experimental paradigm typically applied to unimodal signals. In this experiment we quantify the responses to a bimodal audio-visual signal with different degrees of temporal (phase) congruity, focusing on stimulus properties critical to audiovisual speech. An amplitude modulated auditory signal ('pseudo-speech') is paired with a radius-modulated ellipse ('pseudo-mouth'), with the envelope of low-frequency modulations occurring in phase or at offset phase values across modalities. We observe (i) that it is possible to elicit an SSR to bimodal signals; (ii) that bimodal signals exhibit greater response power than unimodal signals; and (iii) that the SSR power at specific harmonics and sensors differentially reflects the congruity between signal components. Importantly, we argue that effects found at the modulation frequency and second harmonic reflect differential aspects of neural coding of multisensory signals. The experimental paradigm facilitates a quantitative characterization of properties of multi-sensory speech and other bimodal computations.	\N	\N
21807011	In real-world settings, information from multiple sensory modalities is combined to form a complete, behaviorally salient percept - a process known as multisensory integration. While deficits in auditory and visual processing are often observed in schizophrenia, little is known about how multisensory integration is affected by the disorder. The present study examined auditory, visual, and combined audio-visual processing in schizophrenia patients using high-density electrical mapping. An ecologically relevant task was used to compare unisensory and multisensory evoked potentials from schizophrenia patients to potentials from healthy normal volunteers. Analysis of unisensory responses revealed a large decrease in the N100 component of the auditory-evoked potential, as well as early differences in the visual-evoked components in the schizophrenia group. Differences in early evoked responses to multisensory stimuli were also detected. Multisensory facilitation was assessed by comparing the sum of auditory and visual evoked responses to the audio-visual evoked response. Schizophrenia patients showed a significantly greater absolute magnitude response to audio-visual stimuli than to summed unisensory stimuli when compared to healthy volunteers, indicating significantly greater multisensory facilitation in the patient group. Behavioral responses also indicated increased facilitation from multisensory stimuli. The results represent the first report of increased multisensory facilitation in schizophrenia and suggest that, although unisensory deficits are present, compensatory mechanisms may exist under certain conditions that permit improved multisensory integration in individuals afflicted with the disorder.	\N	\N
21958655	Perceptual sensitivities are malleable via learning, even in adults. We trained adults to discriminate complex sounds (periodic, frequency-modulated sweep trains) using two different training procedures, and used psychoacoustic tests and evoked potential measures (the N1-P2 complex) to assess changes in both perceptual and neural sensitivities. Training took place either on a single day, or daily across eight days, and involved discrimination of pairs of stimuli using a single-interval, forced-choice task. In some participants, training started with dissimilar pairs that became progressively more similar across sessions, whereas in others training was constant, involving only one, highly similar, stimulus pair. Participants were better able to discriminate the complex sounds after training, particularly after progressive training, and the evoked potentials elicited by some of the sounds increased in amplitude following training. Significant amplitude changes were restricted to the P2 peak. Our findings indicate that changes in perceptual sensitivities parallel enhanced neural processing. These results are consistent with the proposal that changes in perceptual abilities arise from the brain's capacity to adaptively modify cortical representations of sensory stimuli, and that different training regimens can lead to differences in cortical sensitivities, even after relatively short periods of training.	\N	\N
22367585	In recent years, it has become evident that neural responses previously considered to be unisensory can be modulated by sensory input from other modalities. In this regard, visual neural activity elicited to viewing a face is strongly influenced by concurrent incoming auditory information, particularly speech. Here, we applied an additive-factors paradigm aimed at quantifying the impact that auditory speech has on visual event-related potentials (ERPs) elicited to visual speech. These multisensory interactions were measured across parametrically varied stimulus salience, quantified in terms of signal to noise, to provide novel insights into the neural mechanisms of audiovisual speech perception. First, we measured a monotonic increase of the amplitude of the visual P1-N1-P2 ERP complex during a spoken-word recognition task with increases in stimulus salience. ERP component amplitudes varied directly with stimulus salience for visual, audiovisual, and summed unisensory recordings. Second, we measured changes in multisensory gain across salience levels. During audiovisual speech, the P1 and P1-N1 components exhibited less multisensory gain relative to the summed unisensory components with reduced salience, while N1-P2 amplitude exhibited greater multisensory gain as salience was reduced, consistent with the principle of inverse effectiveness. The amplitude interactions were correlated with behavioral measures of multisensory gain across salience levels as measured by response times, suggesting that change in multisensory gain associated with unisensory salience modulations reflects an increased efficiency of visual speech processing.	\N	\N
22464943	Limited information is available on the relationship between antisocial personality disorder (ASPD) and early filtering, or gating, of information, even though this could contribute to the repeatedly reported impairment in ASPD of higher-order information processing. In order to investigate early filtering in ASPD, we compared electrophysiological measures of auditory sensory gating assessed by the paired-click paradigm in males with ASPD (n = 37) to healthy controls (n = 28). Stimulus encoding was measured by P50, N100, and P200 auditory evoked potentials; auditory sensory gating (ASG) was measured by a reduction in amplitude of evoked potentials following click repetition. Effects were studied of co-existing past alcohol or drug use disorders, ASPD symptom counts, and trait impulsivity. Controls and ASPD did not differ in P50, N100, or P200 amplitude or ASG. Past alcohol or drug use disorders had no effect. In controls, impulsivity related to improved P50 and P200 gating. In ASPD, P50 or N100 gating was impaired with more symptoms or increased impulsivity, respectively, suggesting impaired early filtering of irrelevant information. In controls the relationship between P50 and P200 gating and impulsivity was reversed, suggesting better gating with higher impulsivity scores. This could reflect different roles of ASG in behavioral regulation in controls versus ASPD.	\N	\N
22547804	Bilingualism profoundly affects the brain, yielding functional and structural changes in cortical regions dedicated to language processing and executive function [Crinion J, et al. (2006) Science 312:1537-1540; Kim KHS, et al. (1997) Nature 388:171-174]. Comparatively, musical training, another type of sensory enrichment, translates to expertise in cognitive processing and refined biological processing of sound in both cortical and subcortical structures. Therefore, we asked whether bilingualism can also promote experience-dependent plasticity in subcortical auditory processing. We found that adolescent bilinguals, listening to the speech syllable [da], encoded the stimulus more robustly than age-matched monolinguals. Specifically, bilinguals showed enhanced encoding of the fundamental frequency, a feature known to underlie pitch perception and grouping of auditory objects. This enhancement was associated with executive function advantages. Thus, through experience-related tuning of attention, the bilingual auditory system becomes highly efficient in automatically processing sound. This study provides biological evidence for system-wide neural plasticity in auditory experts that facilitates a tight coupling of sensory and cognitive functions.	\N	\N
22592306	Poor sensitivity to the interaural time difference (ITD) constrains the ability of human bilateral cochlear implant users to listen in everyday noisy acoustic environments. ITD sensitivity to periodic pulse trains degrades sharply with increasing pulse rate but can be restored at high pulse rates by jittering the interpulse intervals in a binaurally coherent manner (Laback and Majdak. Binaural jitter improves interaural time-difference sensitivity of cochlear implantees at high pulse rates. Proc Natl Acad Sci USA 105: 814-817, 2008). We investigated the neural basis of the jitter effect by recording from single inferior colliculus (IC) neurons in bilaterally implanted, anesthetized cats. Neural responses to trains of biphasic pulses were measured as a function of pulse rate, jitter, and ITD. An effect of jitter on neural responses was most prominent for pulse rates above 300 pulses/s. High-rate periodic trains evoked only an onset response in most IC neurons, but introducing jitter increased ongoing firing rates in about half of these neurons. Neurons that had sustained responses to jittered high-rate pulse trains showed ITD tuning comparable with that produced by low-rate periodic pulse trains. Thus, jitter appears to improve neural ITD sensitivity by restoring sustained firing in many IC neurons. The effect of jitter on IC responses is qualitatively consistent with human psychophysics. Action potentials tended to occur reproducibly at sparse, preferred times across repeated presentations of high-rate jittered pulse trains. Spike triggered averaging of responses to jittered pulse trains revealed that firing was triggered by very short interpulse intervals. This suggests it may be possible to restore ITD sensitivity to periodic carriers by simply inserting short interpulse intervals at select times.	\N	\N
22628458	Successful integration of auditory and visual inputs is crucial for both basic perceptual functions and for higher-order processes related to social cognition. Autism spectrum disorders (ASD) are characterized by impairments in social cognition and are associated with abnormalities in sensory and perceptual processes. Several groups have reported that individuals with ASD are impaired in their ability to integrate socially relevant audiovisual (AV) information, and it has been suggested that this contributes to the higher-order social and cognitive deficits observed in ASD. However, successful integration of auditory and visual inputs also influences detection and perception of nonsocial stimuli, and integration deficits may impair earlier stages of information processing, with cascading downstream effects. To assess the integrity of basic AV integration, we recorded high-density electrophysiology from a cohort of high-functioning children with ASD (7-16 years) while they performed a simple AV reaction time task. Children with ASD showed considerably less behavioral facilitation to multisensory inputs, deficits that were paralleled by less effective neural integration. Evidence for processing differences relative to typically developing children was seen as early as 100 ms poststimulation, and topographic analysis suggested that children with ASD relied on different cortical networks during this early multisensory processing stage.	\N	\N
22735387	Decreased blood hemoglobin (HbB) levels and anemia have been associated with abnormal brainstem auditory evoked responses (BAER). Lead (Pb) exposure has also been associated with anemia and aberrant BAER. This study investigated the relationship between HbB level and BAER wave latency and amplitude in Pb-exposed Andean children. Sixty-six children aged 2 to 15 years (mean age: 9.1; SD: 3.3) living in Pb-contaminated villages were screened for HbB levels, blood Pb (PbB) levels and BAER latencies and amplitudes. The mean HbB level observed in the study group was 11.9 g/dL (SD: 1.4; range: 8.6-14.8 g/dL). The mean HbB level corrected for altitude was 10.3 g/dL (SD: 1.4; range: 6.9-13.1 g/dL), and suggestive of anemia. The mean PbB level was 49.3 μg/dL (SD: 30.1; range: 4.4-119.1 μg/dL) and indicative of Pb poisoning. Spearman rho correlation analyses revealed significant associations between the BAER absolute latencies and HbB level, indicating that as the HbB level decreased, the BAER wave latency increased. Children with low HbB levels (≤11 g/dL) showed significantly prolonged absolute latencies of waves I, II, III, IV and V compared to the children with normal HbB levels. Although a significant relationship between HbB and BAER waves was observed, no significant associations between PbB level and BAER parameters were found. Low hemoglobin levels may diminish auditory sensory-neural function, and are therefore an important variable to consider when assessing BAER in children with anemia and/or Pb exposure.	\N	\N
22773777	Many neurons adapt their spike output to accommodate the prevailing sensory environment. Although such adaptation is thought to improve coding of relevant stimulus features, the relationship between adaptation at the neural and behavioral levels remains to be established. Here we describe improved discrimination performance for an auditory spatial cue (interaural time differences, ITDs) following adaptation to stimulus statistics. Physiological recordings in the midbrain of anesthetized guinea pigs and measurement of discrimination performance in humans both demonstrate improved coding of the most prevalent ITDs in a distribution, but with highest accuracy maintained for ITDs corresponding to frontal locations, suggesting the existence of a fovea for auditory space. A biologically plausible model accounting for the physiological data suggests that neural tuning is stabilized by inhibition to maintain high discriminability for frontal locations. The data support the notion that adaptive coding in the midbrain is a key element of behaviorally efficient sound localization in dynamic acoustic environments.	\N	\N
22885999	Event-related potentials (ERPs) to tones that are self-initiated are reduced in their magnitude in comparison with ERPs to tones that are externally generated. This phenomenon has been taken as evidence for an efference copy of the motor command acting to suppress the sensory response. However, self-initiation provides a strong temporal cue for the stimulus which might also contribute to the ERP suppression for self-initiated tones. The current experiment sought to investigate the suppression of monaural tones by temporal cueing and also whether the addition of self-initiation enhanced this suppression. Lastly, the experiment sought to investigate the lateralisation of the ERP suppression via presenting these monaural tones to each ear respectively. We examined source waveforms extracted from the lateralised auditory cortices and measured the modulation of the N1 and P2 components by cueing and self-initiation. Self-initiation significantly reduced the amplitude of the N1 component. Temporal cueing without self-initiation significantly reduced the P2 component. There were no significant differences in the amplitude of either the N1 or the P2 between self-initiation and temporal cueing. There was a significant lateralisation effect on the N1, which was significantly larger contralateral to the ear of stimulation. There was no interaction between lateralisation and side of the temporal cue or side of self-initiation, suggesting that the effects of self-initiation and temporal cueing are equal bilaterally. We conclude that a significant proportion of ERP suppression by self-initiation is a result of inherent temporal cueing.	\N	\N
23071654	Selectively attending to task-relevant sounds whilst ignoring background noise is one of the most amazing feats performed by the human brain. Here, we studied the underlying neural mechanisms by recording magnetoencephalographic (MEG) responses of 14 healthy human subjects while they performed a near-threshold auditory discrimination task vs. a visual control task of similar difficulty. The auditory stimuli consisted of notch-filtered continuous noise masker sounds, and of 1020-Hz target tones occasionally (p = 0.1) replacing 1000-Hz standard tones of 300-ms duration that were embedded at the center of the notches, the widths of which were parametrically varied. As a control for masker effects, tone-evoked responses were additionally recorded without masker sound. Selective attention to tones significantly increased the amplitude of the onset M100 response at ~100 ms to the standard tones during presence of the masker sounds especially with notches narrower than the critical band. Further, attention modulated sustained response most clearly at 300-400 ms time range from sound onset, with narrower notches than in case of the M100, thus selectively reducing the masker-induced suppression of the tone-evoked response. Our results show evidence of a multiple-stage filtering mechanism of sensory input in the human auditory cortex: 1) one at early (~100 ms) latencies bilaterally in posterior parts of the secondary auditory areas, and 2) adaptive filtering of attended sounds from task-irrelevant background masker at longer latency (~300 ms) in more medial auditory cortical regions, predominantly in the left hemisphere, enhancing processing of near-threshold sounds.	\N	\N
23145143	It has been previously demonstrated by our group that a visual stimulus made of dynamically changing luminance evokes an echo or reverberation at ~10 Hz, lasting up to a second. In this study we aimed to reveal whether similar echoes also exist in the auditory modality. A dynamically changing auditory stimulus equivalent to the visual stimulus was designed and employed in two separate series of experiments, and the presence of reverberations was analyzed based on reverse correlations between stimulus sequences and EEG epochs. The first experiment directly compared visual and auditory stimuli: while previous findings of ~10 Hz visual echoes were verified, no similar echo was found in the auditory modality regardless of frequency. In the second experiment, we tested if auditory sequences would influence the visual echoes when they were congruent or incongruent with the visual sequences. However, the results in that case similarly did not reveal any auditory echoes, nor any change in the characteristics of visual echoes as a function of audio-visual congruence. The negative findings from these experiments suggest that brain oscillations do not equivalently affect early sensory processes in the visual and auditory modalities, and that alpha (8-13 Hz) oscillations play a special role in vision.	\N	\N
23251704	Cognitive task demands in one sensory modality (T1) can have beneficial effects on a secondary task (T2) in a different modality, due to reduced top-down control needed to inhibit the secondary task, as well as crossmodal spread of attention. This contrasts findings of cognitive load compromising a secondary modality's processing. We manipulated cognitive load within one modality (visual) and studied the consequences of cognitive demands on secondary (auditory) processing. 15 healthy participants underwent a simultaneous EEG-fMRI experiment. Data from 8 participants were obtained outside the scanner for validation purposes. The primary task (T1) was to respond to a visual working memory (WM) task with four conditions, while the secondary task (T2) consisted of an auditory oddball stream, which participants were asked to ignore. The fMRI results revealed fronto-parietal WM network activations in response to T1 task manipulation. This was accompanied by significantly higher reaction times and lower hit rates with increasing task difficulty which confirmed successful manipulation of WM load. Amplitudes of auditory evoked potentials, representing fundamental auditory processing showed a continuous augmentation which demonstrated a systematic relation to cross-modal cognitive load. With increasing WM load, primary auditory cortices were increasingly deactivated while psychophysiological interaction results suggested the emergence of auditory cortices connectivity with visual WM regions. These results suggest differential effects of crossmodal attention on fundamental auditory processing. We suggest a continuous allocation of resources to brain regions processing primary tasks when challenging the central executive under high cognitive load.	\N	\N
23281832	If we initiate a sound by our own motor behavior, the N1 component of the auditory event-related brain potential (ERP) that the sound elicits is attenuated compared to the N1 elicited by the same sound when it is initiated externally. It has been suggested that this N1 suppression results from an internal predictive mechanism that is in the service of discriminating the sensory consequences of one's own actions from other sensory input. As the N1-suppression effect is becoming a popular approach to investigate predictive processing in cognitive and social neuroscience, it is important to exclude an alternative interpretation not related to prediction. According to the attentional account, the N1 suppression is due to a difference in the allocation of attention between self- and externally-initiated sounds. To test this hypothesis, we manipulated the allocation of attention to the sounds in different blocks: Attention was directed either to the sounds, to the own motor acts or to visual stimuli. If attention causes the N1-suppression effect, then manipulating attention should affect the effect for self-initiated sounds. We found N1 suppression in all conditions. The N1 per se was affected by attention, but there was no interaction between attention and self-initiation effects. This implies that self-initiation N1 effects are not caused by attention. The present results support the assumption that the N1-suppression effect for self-initiated sounds indicates the operation of an internal predictive mechanism. Furthermore, while attention had an influence on the N1a, N1b, and N1c components, the N1-suppression effect was confined to the N1b and N1c subcomponents suggesting that the major contribution to the auditory N1-suppression effect is circumscribed to late N1 components.	\N	\N
23301775	Using electrophysiology, we have examined two questions in relation to musical training - namely, whether it enhances sensory encoding of the human voice and whether it improves the ability to ignore irrelevant auditory change. Participants performed an auditory distraction task, in which they identified each sound as either short (350 ms) or long (550 ms) and ignored a change in timbre of the sounds. Sounds consisted of a male and a female voice saying a neutral sound [a], and of a cello and a French Horn playing an F3 note. In some blocks, musical sounds occurred on 80% of trials, while voice sounds on 20% of trials. In other blocks, the reverse was true. Participants heard naturally recorded sounds in half of experimental blocks and their spectrally-rotated versions in the other half. Regarding voice perception, we found that musicians had a larger N1 event-related potential component not only to vocal sounds but also to their never before heard spectrally-rotated versions. We therefore conclude that musical training is associated with a general improvement in the early neural encoding of complex sounds. Regarding the ability to ignore irrelevant auditory change, musicians' accuracy tended to suffer less from the change in timbre of the sounds, especially when deviants were musical notes. This behavioral finding was accompanied by a marginally larger re-orienting negativity in musicians, suggesting that their advantage may lie in a more efficient disengagement of attention from the distracting auditory dimension.	\N	\N
23316957	There is an accumulating body of evidence indicating that neuronal functional specificity to basic sensory stimulation is mutable and subject to experience. Although fMRI experiments have investigated changes in brain activity after relative to before perceptual learning, brain activity during perceptual learning has not been explored. This work investigated brain activity related to auditory frequency discrimination learning using a variational Bayesian approach for source localization, during simultaneous EEG and fMRI recording. We investigated whether the practice effects are determined solely by activity in stimulus-driven mechanisms or whether high-level attentional mechanisms, which are linked to the perceptual task, control the learning process. The results of fMRI analyses revealed significant attention and learning related activity in left and right superior temporal gyrus (STG) as well as the left inferior frontal gyrus (IFG). Current source localization of simultaneously recorded EEG data was estimated using a variational Bayesian method. Analysis of current localized to the left inferior frontal gyrus and the right superior temporal gyrus revealed gamma band activity correlated with behavioral performance. Rapid improvement in task performance is accompanied by plastic changes in the sensory cortex as well as superior areas gated by selective attention. Together the fMRI and EEG results suggest that gamma band activity in the right STG and left IFG plays an important role during perceptual learning.	\N	\N
23326548	Auditory distraction is a failure to maintain focus on a stream of sounds. We investigated the neural correlates of distraction in a selective-listening pitch-discrimination task with high (competing speech) or low (white noise) distraction. High-distraction impaired performance and reduced the N1 peak of the auditory Event-Related Potential evoked by probe tones. In a series of simulations, we explored two theories to account for this effect: disruption of sensory gain or a disruption of inter-trial phase consistency. When compared to these simulations, our data were consistent with both effects of distraction. Distraction reduced the gain of the auditory evoked potential and disrupted the inter-trial phase consistency with which the brain responds to stimulus events. Tones at a non-target, unattended frequency were more susceptible to the effects of distraction than tones within an attended frequency band.	\N	\N
23616340	Previous studies on crossmodal spatial orienting typically used simple and stereotyped stimuli in the absence of any meaningful context. This study combined computational models, behavioural measures and functional magnetic resonance imaging to investigate audiovisual spatial interactions in naturalistic settings. We created short videos portraying everyday life situations that included a lateralised visual event and a co-occurring sound, either on the same or on the opposite side of space. Subjects viewed the videos with or without eye-movements allowed (overt or covert orienting). For each video, visual and auditory saliency maps were used to index the strength of stimulus-driven signals, and eye-movements were used as a measure of the efficacy of the audiovisual events for spatial orienting. Results showed that visual salience modulated activity in higher-order visual areas, whereas auditory salience modulated activity in the superior temporal cortex. Auditory salience modulated activity also in the posterior parietal cortex, but only when audiovisual stimuli occurred on the same side of space (multisensory spatial congruence). Orienting efficacy affected activity in the visual cortex, within the same regions modulated by visual salience. These patterns of activation were comparable in overt and covert orienting conditions. Our results demonstrate that, during viewing of complex multisensory stimuli, activity in sensory areas reflects both stimulus-driven signals and their efficacy for spatial orienting; and that the posterior parietal cortex combines spatial information about the visual and the auditory modality.	\N	\N
23624493	Findings in animal models demonstrate that activity within hierarchically early sensory cortical regions can be modulated by cross-sensory inputs through resetting of the phase of ongoing intrinsic neural oscillations. Here, subdural recordings evaluated whether phase resetting by auditory inputs would impact multisensory integration processes in human visual cortex. Results clearly showed auditory-driven phase reset in visual cortices and, in some cases, frank auditory event-related potentials (ERP) were also observed over these regions. Further, when audiovisual bisensory stimuli were presented, this led to robust multisensory integration effects which were observed in both the ERP and in measures of phase concentration. These results extend findings from animal models to human visual cortices, and highlight the impact of cross-sensory phase resetting by a non-primary stimulus on multisensory integration in ostensibly unisensory cortices.	\N	\N
23647558	In the visual modality, perceptual demand on a goal-directed task has been shown to modulate the extent to which irrelevant information can be disregarded at a sensory-perceptual stage of processing. In the auditory modality, the effect of perceptual demand on neural representations of task-irrelevant sounds is unclear. We compared simultaneous ERPs and fMRI responses associated with task-irrelevant sounds across parametrically modulated perceptual task demands in a dichotic-listening paradigm. Participants performed a signal detection task in one ear (Attend ear) while ignoring task-irrelevant syllable sounds in the other ear (Ignore ear). Results revealed modulation of syllable processing by auditory perceptual demand in an ROI in middle left superior temporal gyrus and in negative ERP activity 130-230 msec post stimulus onset. Increasing the perceptual demand in the Attend ear was associated with a reduced neural response in both fMRI and ERP to task-irrelevant sounds. These findings are in support of a selection model whereby ongoing perceptual demands modulate task-irrelevant sound processing in auditory cortex.	\N	\N
23664703	Both faces and voices are rich in socially-relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, personality, etc. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies which suggest that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multi-modal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions.	\N	\N
23827717	Learning and maintaining the sounds we use in vocal communication require accurate perception of the sounds we hear performed by others and feedback-dependent imitation of those sounds to produce our own vocalizations. Understanding how the central nervous system integrates auditory and vocal-motor information to enable communication is a fundamental goal of systems neuroscience, and insights into the mechanisms of those processes will profoundly enhance clinical therapies for communication disorders. Gaining the high-resolution insight necessary to define the circuits and cellular mechanisms underlying human vocal communication is presently impractical. Songbirds are the best animal model of human speech, and this review highlights recent insights into the neural basis of auditory perception and feedback-dependent imitation in those animals. Neural correlates of song perception are present in auditory areas, and those correlates are preserved in the auditory responses of downstream neurons that are also active when the bird sings. Initial tests indicate that singing-related activity in those downstream neurons is associated with vocal-motor performance as opposed to the bird simply hearing itself sing. Therefore, action potentials related to auditory perception and action potentials related to vocal performance are co-localized in individual neurons. Conceptual models of song learning involve comparison of vocal commands and the associated auditory feedback to compute an error signal that is used to guide refinement of subsequent song performances, yet the sites of that comparison remain unknown. Convergence of sensory and motor activity onto individual neurons points to a possible mechanism through which auditory and vocal-motor signals may be linked to enable learning and maintenance of the sounds used in vocal communication. This article is part of a Special Issue entitled "Communication Sounds and the Brain: New Directions and Perspectives".	\N	\N
23988583	Experience-dependent characteristics of auditory function, especially with regard to speech-evoked auditory neurophysiology, have garnered increasing attention in recent years. This interest stems from both pragmatic and theoretical concerns as it bears implications for the prevention and remediation of language-based learning impairment in addition to providing insight into mechanisms engendering experience-dependent changes in human sensory function. Musicians provide an attractive model for studying the experience-dependency of auditory processing in humans due to their distinctive neural enhancements compared to nonmusicians. We have only recently begun to address whether these enhancements are observable early in life, during the initial years of music training when the auditory system is under rapid development, as well as later in life, after the onset of the aging process. Here we review neural enhancements in musically trained individuals across the life span in the context of cellular mechanisms that underlie learning, identified in animal models. Musicians' subcortical physiologic enhancements are interpreted according to a cognitive framework for auditory learning, providing a model in which to study mechanisms of experience-dependent changes in human auditory function.	\N	\N
24072639	Neurobiological underpinnings of unusual sensory features in individuals with autism are unknown. Event-related potentials elicited by task-irrelevant sounds were used to elucidate neural correlates of auditory processing and associations with three common sensory response patterns (hyperresponsiveness; hyporesponsiveness; sensory seeking). Twenty-eight children with autism and 39 typically developing children (4-12 year-olds) completed an auditory oddball paradigm. Results revealed marginally attenuated P1 and N2 to standard tones and attenuated P3a to novel sounds in autism versus controls. Exploratory analyses suggested that within the autism group, attenuated N2 and P3a amplitudes were associated with greater sensory seeking behaviors for specific ranges of P1 responses. Findings suggest that attenuated early sensory as well as later attention-orienting neural responses to stimuli may underlie selective sensory features via complex mechanisms.	\N	\N
24119225	Humans are able to extract regularities from complex auditory scenes in order to form perceptually meaningful elements. It has been shown previously that this process depends critically on both the temporal integration of the sensory input over time and the degree of frequency separation between concurrent sound sources. Our goal was to examine the relationship between these two aspects by means of magnetoencephalography (MEG). To achieve this aim, we combined time-frequency analysis on a sensor space level with source analysis. Our paradigm consisted of asymmetric ABA-tone triplets wherein the B-tones were presented temporally closer to the first A-tones, providing different tempi within the same sequence. Participants attended to the slowest B-rhythm whilst the frequency separation between tones was manipulated (0-, 2-, 4- and 10-semitones). The results revealed that the asymmetric ABA-triplets spontaneously elicited periodic-sustained responses corresponding to the temporal distribution of the A-B and B-A tone intervals in all conditions. Moreover, when attending to the B-tones, the neural representations of the A- and B-streams were both detectable in the scenarios which allow perceptual streaming (2-, 4- and 10-semitones). Alongside this, the steady-state responses tuned to the presentation of the B-tones enhanced significantly with increase of the frequency separation between tones. However, the strength of the B-tones related steady-state responses dominated the strength of the A-tones responses in the 10-semitones condition. Conversely, the representation of the A-tones dominated the B-tones in the cases of 2- and 4-semitones conditions, in which a greater effort was required for completing the task. Additionally, the P1 evoked fields' component following the B-tones increased in magnitude with the increase of inter-tonal frequency difference. The enhancement of the evoked fields in the source space, along with the B-tones related activity of the time-frequency results, likely reflect the selective enhancement of the attended B-stream. The results also suggested a dissimilar efficiency of the temporal integration of separate streams depending on the degree of frequency separation between the sounds. Overall, the present findings suggest that the neural effects of auditory streaming could be directly captured in the time-frequency spectrum at the sensor-space level.	\N	\N
24314010	Auditory perceptual learning persistently modifies neural networks in the central nervous system. Central auditory processing comprises a hierarchy of sound analysis and integration, which transforms an acoustical signal into a meaningful object for perception. Based on latencies and source locations of auditory evoked responses, we investigated which stage of central processing undergoes neuroplastic changes when gaining auditory experience during passive listening and active perceptual training. Young healthy volunteers participated in a five-day training program to identify two pre-voiced versions of the stop-consonant syllable 'ba', which is an unusual speech sound to English listeners. Magnetoencephalographic (MEG) brain responses were recorded during two pre-training and one post-training sessions. Underlying cortical sources were localized, and the temporal dynamics of auditory evoked responses were analyzed. After both passive listening and active training, the amplitude of the P2m wave with latency of 200 ms increased considerably. By this latency, the integration of stimulus features into an auditory object for further conscious perception is considered to be complete. Therefore the P2m changes were discussed in the light of auditory object representation. Moreover, P2m sources were localized in anterior auditory association cortex, which is part of the antero-ventral pathway for object identification. The amplitude of the earlier N1m wave, which is related to processing of sensory information, did not change over the time course of the study. The P2m amplitude increase and its persistence over time constitute a neuroplastic change. The P2m gain likely reflects enhanced object representation after stimulus experience and training, which enables listeners to improve their ability for scrutinizing fine differences in pre-voicing time. Different trajectories of brain and behaviour changes suggest that the preceding effect of a P2m increase relates to brain processes, which are necessary precursors of perceptual learning. Cautious discussion is required when interpreting the finding of a P2 amplitude increase between recordings before and after training and learning.	\N	\N
24423729	Across the animal kingdom, sensations resulting from an animal's own actions are processed differently from sensations resulting from external sources, with self-generated sensations being suppressed. A forward model has been proposed to explain this process across sensorimotor domains. During vocalization, reduced processing of one's own speech is believed to result from a comparison of speech sounds to corollary discharges of intended speech production generated from efference copies of commands to speak. Until now, anatomical and functional evidence validating this model in humans has been indirect. Using EEG with anatomical MRI to facilitate source localization, we demonstrate that inferior frontal gyrus activity during the 300ms before speaking was associated with suppressed processing of speech sounds in auditory cortex around 100ms after speech onset (N1). These findings indicate that an efference copy from speech areas in prefrontal cortex is transmitted to auditory cortex, where it is used to suppress processing of anticipated speech sounds. About 100ms after N1, a subsequent auditory cortical component (P2) was not suppressed during talking. The combined N1 and P2 effects suggest that although sensory processing is suppressed as reflected in N1, perceptual gaps may be filled as reflected in the lack of P2 suppression, explaining the discrepancy between sensory suppression and preserved sensory experiences. These findings, coupled with the coherence between relevant brain regions before and during speech, provide new mechanistic understanding of the complex interactions between action planning and sensory processing that provide for differentiated tagging and monitoring of one's own speech, processes disrupted in neuropsychiatric disorders.	\N	\N
24695717	Detecting the location of salient sounds in the environment rests on the brain's ability to use differences in sounds arriving at both ears. Functional neuroimaging studies in humans indicate that the left and right auditory hemispaces are coded asymmetrically, with a rightward attentional bias that reflects spatial attention in vision. Neuropsychological observations in patients with spatial neglect have led to the formulation of two competing models: the orientation bias and right-hemisphere dominance models. The orientation bias model posits a symmetrical mapping between one side of the sensorium and the contralateral hemisphere, with mutual inhibition of the ipsilateral hemisphere. The right-hemisphere dominance model introduces a functional asymmetry in the brain's coding of space: the left hemisphere represents the right side, whereas the right hemisphere represents both sides of the sensorium. We used Dynamic Causal Modeling of effective connectivity and Bayesian model comparison to adjudicate between these alternative network architectures, based on human electroencephalographic data acquired during an auditory location oddball paradigm. Our results support a hemispheric asymmetry in a frontoparietal network that conforms to the right-hemisphere dominance model. We show that, within this frontoparietal network, forward connectivity increases selectively in the hemisphere contralateral to the side of sensory stimulation. We interpret this finding in light of hierarchical predictive coding as a selective increase in attentional gain, which is mediated by feedforward connections that carry precision-weighted prediction errors during perceptual inference. This finding supports the disconnection hypothesis of unilateral neglect and has implications for theories of its etiology.	\N	\N
24956028	Listening situations with multiple talkers or background noise are common in everyday communication and are particularly demanding for older adults. Here we review current research on auditory perception in aging individuals in order to gain insights into the challenges of listening under noisy conditions. Informationally rich temporal structure in auditory signals--over a range of time scales from milliseconds to seconds--renders temporal processing central to perception in the auditory domain. We discuss the role of temporal structure in auditory processing, in particular from a perspective relevant for hearing in background noise, and focusing on sensory memory, auditory scene analysis, and speech perception. Interestingly, these auditory processes, usually studied in an independent manner, show considerable overlap of processing time scales, even though each has its own 'privileged' temporal regimes. By integrating perspectives on temporal structure processing in these three areas of investigation, we aim to highlight similarities typically not recognized.	\N	\N
25245785	Atypical processing and integration of sensory inputs are hypothesized to play a role in unusual sensory reactions and social-cognitive deficits in autism spectrum disorder (ASD). Reports on the relationship between objective metrics of sensory processing and clinical symptoms, however, are surprisingly sparse. Here we examined the relationship between neurophysiological assays of sensory processing and (1) autism severity and (2) sensory sensitivities, in individuals with ASD aged 6-17. Multiple linear regression indicated significant associations between neural markers of auditory processing and multisensory integration, and autism severity. No such relationships were apparent for clinical measures of visual/auditory sensitivities. These data support that aberrant early sensory processing contributes to autism symptoms, and reveal the potential of electrophysiology to objectively subtype autism.	\N	\N
25259525	Auditory selective attention plays an essential role for identifying sounds of interest in a scene, but the neural underpinnings are still incompletely understood. Recent findings demonstrate that neural activity that is time-locked to a particular amplitude-modulation (AM) is enhanced in the auditory cortex when the modulated stream of sounds is selectively attended to under sensory competition with other streams. However, the target sounds used in the previous studies differed not only in their AM, but also in other sound features, such as carrier frequency or location. Thus, it remains uncertain whether the observed enhancements reflect AM-selective attention. The present study aims at dissociating the effect of AM frequency on response enhancement in auditory cortex by using an ongoing auditory stimulus that contains two competing targets differing exclusively in their AM frequency. Electroencephalography results showed a sustained response enhancement for auditory attention compared to visual attention, but not for AM-selective attention (attended AM frequency vs. ignored AM frequency). In contrast, the response to the ignored AM frequency was enhanced, although a brief trend toward response enhancement occurred during the initial 15 s. Together with the previous findings, these observations indicate that selective enhancement of attended AMs in auditory cortex is adaptive under sustained AM-selective attention. This finding has implications for our understanding of cortical mechanisms for feature-based attentional gain control.	\N	\N
25535356	When social animals communicate, the onset of informative content in one modality varies considerably relative to the other, such as when visual orofacial movements precede a vocalization. These naturally occurring asynchronies do not disrupt intelligibility or perceptual coherence. However, they occur on time scales where they likely affect integrative neuronal activity in ways that have remained unclear, especially for hierarchically downstream regions in which neurons exhibit temporally imprecise but highly selective responses to communication signals. To address this, we exploited naturally occurring face- and voice-onset asynchronies in primate vocalizations. Using these as stimuli we recorded cortical oscillations and neuronal spiking responses from functional MRI (fMRI)-localized voice-sensitive cortex in the anterior temporal lobe of macaques. We show that the onset of the visual face stimulus resets the phase of low-frequency oscillations, and that the face-voice asynchrony affects the prominence of two key types of neuronal multisensory responses: enhancement or suppression. Our findings show a three-way association between temporal delays in audiovisual communication signals, phase-resetting of ongoing oscillations, and the sign of multisensory responses. The results reveal how natural onset asynchronies in cross-sensory inputs regulate network oscillations and neuronal excitability in the voice-sensitive cortex of macaques, a suggested animal model for human voice areas. These findings also advance predictions on the impact of multisensory input on neuronal processes in face areas and other brain regions.	\N	\N
25948269	Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning.	\N	\N
22803512	Existing 67-channel event-related potentials, obtained during recognition and working memory paradigms with words or faces, were used to examine early visual processing in schizophrenia patients prone to auditory hallucinations (AH, n = 26) or not (NH, n = 49) and healthy controls (HC, n = 46). Current source density (CSD) transforms revealed distinct, strongly left- (words) or right-lateralized (faces; N170) inferior-temporal N1 sinks (150 ms) in each group. N1 was quantified by temporal PCA of peak-adjusted CSDs. For words and faces in both paradigms, N1 was substantially reduced in AH compared with NH and HC, who did not differ from each other. The difference in N1 between AH and NH was not due to overall symptom severity or performance accuracy, with both groups showing comparable memory deficits. Our findings extend prior reports of reduced auditory N1 in AH, suggesting a broader early perceptual integration deficit that is not limited to the auditory modality.	\N	\N
25231612	Atypical medial olivocochlear (MOC) feedback from brain stem to cochlea has been proposed to play a role in tinnitus, but even well-constructed tests of this idea have yielded inconsistent results. In the present study, it was hypothesized that low sound tolerance (mild to moderate hyperacusis), which can accompany tinnitus or occur on its own, might contribute to the inconsistency. Sound-level tolerance (SLT) was assessed in subjects (all men) with clinically normal or near-normal thresholds to form threshold-, age-, and sex-matched groups: 1) no tinnitus/high SLT, 2) no tinnitus/low SLT, 3) tinnitus/high SLT, and 4) tinnitus/low SLT. MOC function was measured from the ear canal as the change in magnitude of distortion-product otoacoustic emissions (DPOAE) elicited by broadband noise presented to the contralateral ear. The noise reduced DPOAE magnitude in all groups ("contralateral suppression"), but significantly more reduction occurred in groups with tinnitus and/or low SLT, indicating hyperresponsiveness of the MOC system compared with the group with no tinnitus/high SLT. The results suggest hyperresponsiveness of the interneurons of the MOC system residing in the cochlear nucleus and/or MOC neurons themselves. The present data, combined with previous human and animal data, indicate that neural pathways involving every major division of the cochlear nucleus manifest hyperactivity and/or hyperresponsiveness in tinnitus and/or low SLT. The overactivation may develop in each pathway separately. However, a more parsimonious hypothesis is that top-down neuromodulation is the driving force behind ubiquitous overactivation of the auditory brain stem and may correspond to attentional spotlighting on the auditory domain in tinnitus and hyperacusis.	\N	\N
21909378	Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.	\N	\N
22390292	Human multisensory systems are known to bind inputs from the different sensory modalities into a unified percept, a process that leads to measurable behavioral benefits. This integrative process can be observed through multisensory illusions, including the McGurk effect and the sound-induced flash illusion, both of which demonstrate the ability of one sensory modality to modulate perception in a second modality. Such multisensory integration is highly dependent upon the temporal relationship of the different sensory inputs, with perceptual binding occurring within a limited range of asynchronies known as the temporal binding window (TBW). Previous studies have shown that this window is highly variable across individuals, but it is unclear how these variations in the TBW relate to an individual's ability to integrate multisensory cues. Here we provide evidence linking individual differences in multisensory temporal processes to differences in the individual's audiovisual integration of illusory stimuli. Our data provide strong evidence that the temporal processing of multiple sensory signals and the merging of multiple signals into a single, unified perception, are highly related. Specifically, the width of the right side of an individual's TBW, where the auditory stimulus follows the visual, is significantly correlated with the strength of illusory percepts, as indexed via both an increase in the strength of binding synchronous sensory signals and in an improvement in correctly dissociating asynchronous signals. These findings are discussed in terms of their possible neurobiological basis, relevance to the development of sensory integration, and possible importance for clinical conditions in which there is growing evidence that multisensory integration is compromised.	\N	\N
23664001	The sight and sound of a person speaking or a ball bouncing may seem simultaneous, but their corresponding neural signals are spread out over time as they arrive at different multisensory brain sites. How subjective timing relates to such neural timing remains a fundamental neuroscientific and philosophical puzzle. A dominant assumption is that temporal coherence is achieved by sensory resynchronisation or recalibration across asynchronous brain events. This assumption is easily confirmed by estimating subjective audiovisual timing for groups of subjects, which is on average similar across different measures and stimuli, and approximately veridical. But few studies have examined normal and pathological individual differences in such measures. Case PH, with lesions in pons and basal ganglia, hears people speak before seeing their lips move. Temporal order judgements (TOJs) confirmed this: voices had to lag lip-movements (by ∼200 msec) to seem synchronous to PH. Curiously, voices had to lead lips (also by ∼200 msec) to maximise the McGurk illusion (a measure of audiovisual speech integration). On average across these measures, PH's timing was therefore still veridical. Age-matched control participants showed similar discrepancies. Indeed, normal individual differences in TOJ and McGurk timing correlated negatively: subjects needing an auditory lag for subjective simultaneity needed an auditory lead for maximal McGurk, and vice versa. This generalised to the Stream-Bounce illusion. Such surprising antagonism seems opposed to good sensory resynchronisation, yet average timing across tasks was still near-veridical. Our findings reveal remarkable disunity of audiovisual timing within and between subjects. To explain this we propose that the timing of audiovisual signals within different brain mechanisms is perceived relative to the average timing across mechanisms. Such renormalisation fully explains the curious antagonistic relationship between disparate timing estimates in PH and healthy participants, and how they can still perceive the timing of external events correctly, on average.	\N	\N
24974346	We investigated whether visual speech fills in non-intact auditory speech (excised consonant onsets) in typically developing children from 4 to 14 years of age. Stimuli with the excised auditory onsets were presented in the audiovisual (AV) and auditory-only (AO) modes. A visual speech fill-in effect occurs when listeners experience hearing the same non-intact auditory stimulus (e.g., /-b/ag) as different depending on the presence/absence of visual speech such as hearing /bag/ in the AV mode but hearing /ag/ in the AO mode. We quantified the visual speech fill-in effect by the difference in the number of correct consonant onset responses between the modes. We found that easy visual speech cues /b/ provided greater filling in than difficult cues /g/. Only older children benefited from difficult visual speech cues, whereas all children benefited from easy visual speech cues, although 4- and 5-year-olds did not benefit as much as older children. To explore task demands, we compared results on our new task with those on the McGurk task. The influence of visual speech was uniquely associated with age and vocabulary abilities for the visual speech fill-in effect but was uniquely associated with speechreading skills for the McGurk effect. This dissociation implies that visual speech, as processed by children, is a complicated and multifaceted phenomenon underpinned by heterogeneous abilities. These results emphasize that children perceive a speaker's utterance rather than the auditory stimulus per se. In children, as in adults, there is more to speech perception than meets the ear.	\N	\N
24996043	The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation metaanalysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals.	\N	\N
20541597	Auditory perceptual 'restoration' occurs when the auditory system restores an occluded or masked sound of interest. Behavioral work on auditory restoration in humans began over 50 years ago using it to model a noisy environmental scene with competing sounds. It has become clear that not only humans experience auditory restoration: restoration has been broadly conserved in many species. Behavioral studies in humans and animals provide a necessary foundation to link the insights being obtained from human EEG and fMRI to those from animal neurophysiology. The aggregate of data resulting from multiple approaches across species has begun to clarify the neuronal bases of auditory restoration. Different types of neural responses supporting restoration have been found, supportive of multiple mechanisms working within a species. Yet a general principle has emerged that responses correlated with restoration mimic the response that would have been given to the uninterrupted sound of interest. Using the same technology to study different species will help us to better harness animal models of 'auditory scene analysis' to clarify the conserved neural mechanisms shaping the perceptual organization of sound and to advance strategies to improve hearing in natural environmental settings.	\N	\N
20826671	Processing of complex acoustic scenes depends critically on the temporal integration of sensory information as sounds evolve naturally over time. It has been previously speculated that this process is guided by both innate mechanisms of temporal processing in the auditory system, as well as top-down mechanisms of attention and possibly other schema-based processes. In an effort to unravel the neural underpinnings of these processes and their role in scene analysis, we combine magnetoencephalography (MEG) with behavioral measures in humans in the context of polyrhythmic tone sequences. While maintaining unchanged sensory input, we manipulate subjects' attention to one of two competing rhythmic streams in the same sequence. The results reveal that the neural representation of the attended rhythm is significantly enhanced in both its steady-state power and spatial phase coherence relative to its unattended state, closely correlating with its perceptual detectability for each listener. Interestingly, the data reveal a differential efficiency of rhythmic rates of the order of few hertz during the streaming process, closely following known neural and behavioral measures of temporal modulation sensitivity in the auditory system. These findings establish a direct link between known temporal modulation tuning in the auditory system (particularly at the level of auditory cortex) and the temporal integration of perceptual features in a complex acoustic scene, while mediated by processes of attention.	\N	\N
20844143	Segregation of concurrent sounds in complex acoustic environments is a fundamental feature of auditory scene analysis. A powerful cue used by the auditory system to segregate concurrent sounds, such as speakers' voices at a cocktail party, is inharmonicity. This can be demonstrated when a component of a harmonic complex tone is perceived as a separate tone "popping out" from the complex as a whole when it is sufficiently mistuned from its harmonic value. The neural bases of perceptual "pop out" of mistuned harmonics are unclear. We recorded multiunit activity from primary auditory cortex (A1) of behaving monkeys elicited by harmonic complex tones that were either "in tune" or that contained a mistuned third harmonic set at the best frequency of the neural populations. Responses to mistuned sounds were enhanced relative to responses to "in-tune" sounds, thus correlating with the enhanced perceptual salience of the mistuned component. Consistent with human psychophysics of "pop out," response enhancements increased with the degree of mistuning, were maximal for neural populations tuned to the frequency of the mistuned component, and were not observed under comparable stimulus conditions that do not elicit perceptual "pop out." Mistuning was also associated with changes in neuronal temporal response patterns phase locked to "beats" in the stimuli. Intracortical auditory evoked potentials paralleled noninvasive neurophysiological correlates of perceptual "pop out" in humans, further augmenting the translational relevance of the results. Findings suggest two complementary neural mechanisms for "pop out," based on the detection of local differences in activation level or coherence of temporal response patterns across A1.	\N	\N
20975559	Analysis of the auditory environment, source identification and vocal communication all require efficient brain mechanisms for disambiguating, representing and understanding complex natural sounds as 'auditory objects'. Failure of these mechanisms leads to a diverse spectrum of clinical deficits. Here we review current evidence concerning the phenomenology, mechanisms and brain substrates of auditory agnosias and related disorders of auditory object processing. Analysis of lesions causing auditory object deficits has revealed certain broad anatomical correlations: deficient parsing of the auditory scene is associated with lesions involving the parieto-temporal junction, while selective disorders of sound recognition occur with more anterior temporal lobe or extra-temporal damage. Distributed neural networks have been increasingly implicated in the pathogenesis of such disorders as developmental dyslexia, congenital amusia and tinnitus. Auditory category deficits may arise from defective interaction of spectrotemporal encoding and executive and mnestic processes. Dedicated brain mechanisms are likely to process specialized sound objects such as voices and melodies. Emerging empirical evidence suggests a clinically relevant, hierarchical and modular neuropsychological model of auditory object processing that provides a framework for understanding auditory agnosias and makes specific predictions to direct future work.	\N	\N
21196054	Humans and other animals can attend to one of multiple sounds and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we argue instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed towards a particular feature (e.g. pitch) do all other temporally coherent features of that source (e.g. timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources.	\N	\N
21209201	Auditory figure-ground segregation, listeners' ability to selectively hear out a sound of interest from a background of competing sounds, is a fundamental aspect of scene analysis. In contrast to the disordered acoustic environment we experience during everyday listening, most studies of auditory segregation have used relatively simple, temporally regular signals. We developed a new figure-ground stimulus that incorporates stochastic variation of the figure and background that captures the rich spectrotemporal complexity of natural acoustic scenes. Figure and background signals overlap in spectrotemporal space, but vary in the statistics of fluctuation, such that the only way to extract the figure is by integrating the patterns over time and frequency. Our behavioral results demonstrate that human listeners are remarkably sensitive to the appearance of such figures. In a functional magnetic resonance imaging experiment, aimed at investigating preattentive, stimulus-driven, auditory segregation mechanisms, naive subjects listened to these stimuli while performing an irrelevant task. Results demonstrate significant activations in the intraparietal sulcus (IPS) and the superior temporal sulcus related to bottom-up, stimulus-driven figure-ground decomposition. We did not observe any significant activation in the primary auditory cortex. Our results support a role for automatic, bottom-up mechanisms in the IPS in mediating stimulus-driven, auditory figure-ground segregation, which is consistent with accumulating evidence implicating the IPS in structuring sensory input and perceptual organization.	\N	\N
21355664	The effect of context on the identification of common environmental sounds (e.g., dogs barking or cars honking) was tested by embedding them in familiar auditory background scenes (street ambience, restaurants). Initial results with subjects trained on both the scenes and the sounds to be identified showed a significant advantage of about five percentage points better accuracy for sounds that were contextually incongruous with the background scene (e.g., a rooster crowing in a hospital). Further studies with naive (untrained) listeners showed that this incongruency advantage (IA) is level-dependent: there is no advantage for incongruent sounds lower than a Sound/Scene ratio (So/Sc) of -7.5 dB, but there is about five percentage points better accuracy for sounds with greater So/Sc. Testing a new group of trained listeners on a larger corpus of sounds and scenes showed that the effect is robust and not confined to a specific stimulus set. Modeling using spectral-temporal measures showed that neither analyses based on acoustic features, nor semantic assessments of sound-scene congruency can account for this difference, indicating the IA is a complex effect, possibly arising from the sensitivity of the auditory system to new and unexpected events, under particular listening conditions.	\N	\N
21387016	Important sounds can be easily missed or misidentified in the presence of extraneous noise. We describe an auditory illusion in which a continuous ongoing tone becomes inaudible during a brief, non-masking noise burst more than one octave away, which is unexpected given the frequency resolution of human hearing. Participants strongly susceptible to this illusory discontinuity did not perceive illusory auditory continuity (in which a sound subjectively continues during a burst of masking noise) when the noises were short, yet did so at longer noise durations. Participants who were not prone to illusory discontinuity showed robust early electroencephalographic responses at 40-66 ms after noise burst onset, whereas those prone to the illusion lacked these early responses. These data suggest that short-latency neural responses to auditory scene components reflect subsequent individual differences in the parsing of auditory scenes.	\N	\N
21428515	The precedence effect (PE) describes the ability to localize a direct, leading sound correctly when its delayed copy (lag) is present, though not separately audible. The relative contribution of binaural cues in the temporal fine structure (TFS) of lead-lag signals was compared to that of interaural level differences (ILDs) and interaural time differences (ITDs) carried in the envelope. In a localization dominance paradigm participants indicated the spatial location of lead-lag stimuli processed with a binaural noise-band vocoder whose noise carriers introduced random TFS. The PE appeared for noise bursts of 10 ms duration, indicating dominance of envelope information. However, for three test words the PE often failed even at short lead-lag delays, producing two images, one toward the lead and one toward the lag. When interaural correlation in the carrier was increased, the images appeared more centered, but often remained split. Although previous studies suggest dominance of TFS cues, no image is lateralized in accord with the ITD in the TFS. An interpretation in the context of auditory scene analysis is proposed: By replacing the TFS with that of noise the auditory system loses the ability to fuse lead and lag into one object, and thus to show the PE.	\N	\N
21945789	This study investigates how acoustic change-events are represented in a listener's brain when attention is strongly focused elsewhere. Using magneto-encephalography (MEG) we examine whether cortical responses to different kinds of changes in stimulus statistics are similarly influenced by attentional load, and whether the processing of such acoustic changes in auditory cortex depends on modality-specific or general processing resources. We investigated these issues by examining cortical responses to two basic forms of acoustic transitions: (1) Violations of a simple acoustic pattern and (2) the emergence of a regular pattern from a random one. To simulate a complex sensory environment, these patterns were presented concurrently with streams of auditory and visual decoys. Listeners were required to perform tasks of high- and low-attentional-load in these domains. Results demonstrate that while auditory attentional-load does not influence the cortical representation of simple violations of regularity, it significantly reduces the magnitude of responses to the emergence of a regular acoustic pattern, suggesting a fundamentally skewed representation of the unattended auditory scene. In contrast, visual attentional-load had no effect on either transition response, consistent with the hypothesis that processing resources necessary for change detection are modality-specific.	\N	\N
22036957	Parsing of sound sources in the auditory environment or 'auditory scene analysis' is a computationally demanding cognitive operation that is likely to be vulnerable to the neurodegenerative process in Alzheimer's disease. However, little information is available concerning auditory scene analysis in Alzheimer's disease. Here we undertook a detailed neuropsychological and neuroanatomical characterization of auditory scene analysis in a cohort of 21 patients with clinically typical Alzheimer's disease versus age-matched healthy control subjects. We designed a novel auditory dual stream paradigm based on synthetic sound sequences to assess two key generic operations in auditory scene analysis (object segregation and grouping) in relation to simpler auditory perceptual, task and general neuropsychological factors. In order to assess neuroanatomical associations of performance on auditory scene analysis tasks, structural brain magnetic resonance imaging data from the patient cohort were analysed using voxel-based morphometry. Compared with healthy controls, patients with Alzheimer's disease had impairments of auditory scene analysis, and segregation and grouping operations were comparably affected. Auditory scene analysis impairments in Alzheimer's disease were not wholly attributable to simple auditory perceptual or task factors; however, the between-group difference relative to healthy controls was attenuated after accounting for non-verbal (visuospatial) working memory capacity. These findings demonstrate that clinically typical Alzheimer's disease is associated with a generic deficit of auditory scene analysis. Neuroanatomical associations of auditory scene analysis performance were identified in posterior cortical areas including the posterior superior temporal lobes and posterior cingulate. This work suggests a basis for understanding a class of clinical symptoms in Alzheimer's disease and for delineating cognitive mechanisms that mediate auditory scene analysis both in health and in neurodegenerative disease.	\N	\N
22280585	Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audiovisual environment in which participants were instructed to make auditory width judgments in dynamic bi-modal settings. The results of these psychophysical tests suggest the importance of congruent audio visual presentation to the ecological interpretation of an auditory scene. Supporting data were accumulated in five rooms of ascending volumes and varying reverberation times. Participants were given an audiovisual matching test in which they were instructed to pan the auditory width of a performing ensemble to a varying set of audio and visual cues in rooms. Results show that both auditory and visual factors affect the collected responses and that the two sensory modalities coincide in distinct interactions. The greatest differences between the panned audio stimuli given a fixed visual width were found in the physical space with the largest volume and the greatest source distance. These results suggest, in this specific instance, a predominance of auditory cues in the spatial analysis of the bi-modal scene.	\N	\N
22371616	Auditory streaming and visual plaids have been used extensively to study perceptual organization in each modality. Both stimuli can produce bistable alternations between grouped (one object) and split (two objects) interpretations. They also share two peculiar features: (i) at the onset of stimulus presentation, organization starts with a systematic bias towards the grouped interpretation; (ii) this first percept has 'inertia'; it lasts longer than the subsequent ones. As a result, the probability of forming different objects builds up over time, a landmark of both behavioural and neurophysiological data on auditory streaming. Here we show that first percept bias and inertia are independent. In plaid perception, inertia is due to a depth ordering ambiguity in the transparent (split) interpretation that makes plaid perception tristable rather than bistable: experimental manipulations removing the depth ambiguity suppressed inertia. However, the first percept bias persisted. We attempted a similar manipulation for auditory streaming by introducing level differences between streams, to bias which stream would appear in the perceptual foreground. Here both inertia and first percept bias persisted. We thus argue that the critical common feature of the onset of perceptual organization is the grouping bias, which may be related to the transition from temporally/spatially local to temporally/spatially global computation.	\N	\N
22371619	Recent studies have shown that auditory scene analysis involves distributed neural sites below, in, and beyond the auditory cortex (AC). However, it remains unclear what role each site plays and how they interact in the formation and selection of auditory percepts. We addressed this issue through perceptual multistability phenomena, namely, spontaneous perceptual switching in auditory streaming (AS) for a sequence of repeated triplet tones, and perceptual changes for a repeated word, known as verbal transformations (VTs). An event-related fMRI analysis revealed brain activity timelocked to perceptual switching in the cerebellum for AS, in frontal areas for VT, and the AC and thalamus for both. The results suggest that motor-based prediction, produced by neural networks outside the auditory system, plays essential roles in the segmentation of acoustic sequences both in AS and VT. The frequency of perceptual switching was determined by a balance between the activation of two sites, which are proposed to be involved in exploring novel perceptual organization and stabilizing current perceptual organization. The effect of the gene polymorphism of catechol-O-methyltransferase (COMT) on individual variations in switching frequency suggests that the balance of exploration and stabilization is modulated by catecholamines such as dopamine and noradrenalin. These mechanisms would support the noteworthy flexibility of auditory scene analysis.	\N	\N
22371621	Auditory stream segregation involves linking temporally separate acoustic events into one or more coherent sequences. For any non-trivial sequence of sounds, many alternative descriptions can be formed, only one or very few of which emerge in awareness at any time. Evidence from studies showing bi-/multistability in auditory streaming suggests that some, perhaps many of the alternative descriptions are represented in the brain in parallel and that they continuously vie for conscious perception. Here, based on a predictive coding view, we consider the nature of these sound representations and how they compete with each other. Predictive processing helps to maintain perceptual stability by signalling the continuation of previously established patterns as well as the emergence of new sound sources. It also provides a measure of how well each of the competing representations describes the current acoustic scene. This account of auditory stream segregation has been tested on perceptual data obtained in the auditory streaming paradigm.	\N	\N
22612172	Observers often remember a scene as containing information that was not presented but that would have likely been located just beyond the observed boundaries of the scene. This effect is called boundary extension (BE; e.g., Intraub & Richardson, 1989). Previous studies have observed BE in memory for visual and haptic stimuli, and the present experiments examined whether BE occurred in memory for auditory stimuli (prose, music). Experiments 1 and 2 varied the amount of auditory content to be remembered. BE was not observed, but when auditory targets contained more content, boundary restriction (BR) occurred. Experiment 3 presented auditory stimuli with less content and BR also occurred. In Experiment 4, white noise was added to stimuli with less content to equalize the durations of auditory stimuli, and BR still occurred. Experiments 5 and 6 presented trained stories and popular music, and BR still occurred. This latter finding ruled out the hypothesis that the lack of BE in Experiments 1-4 reflected a lack of familiarity with the stimuli. Overall, memory for auditory content exhibited BR rather than BE, and this pattern was stronger if auditory stimuli contained more content. Implications for the understanding of general perceptual processing and directions for future research are discussed.	\N	\N
22753470	A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers, with each being selectively phase locked to the rhythm of the corresponding speech stream and from which can be exclusively reconstructed the temporal envelope of that speech stream. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.	\N	\N
22829899	In natural environments, sensory information is embedded in temporally contiguous streams of events. This is typically the case when seeing and listening to a speaker or when engaged in scene analysis. In such contexts, two mechanisms are needed to single out and build a reliable representation of an event (or object): the temporal parsing of information and the selection of relevant information in the stream. It has previously been shown that rhythmic events naturally build temporal expectations that improve sensory processing at predictable points in time. Here, we asked to which extent temporal regularities can improve the detection and identification of events across sensory modalities. To do so, we used a dynamic visual conjunction search task accompanied by auditory cues synchronized or not with the color change of the target (horizontal or vertical bar). Sounds synchronized with the visual target improved search efficiency for temporal rates below 1.4 Hz but did not affect efficiency above that stimulation rate. Desynchronized auditory cues consistently impaired visual search below 3.3 Hz. Our results are interpreted in the context of the Dynamic Attending Theory: specifically, we suggest that a cognitive operation structures events in time irrespective of the sensory modality of input. Our results further support and specify recent neurophysiological findings by showing strong temporal selectivity for audiovisual integration in the auditory-driven improvement of visual search efficiency.	\N	\N
22844509	In auditory scene analysis, population separation and temporal coherence have been proposed to explain how auditory features are grouped together and streamed over time. The present study investigated whether these two theories can be applied to tactile streaming and whether temporal coherence theory can be applied to crossmodal streaming. The results show that synchrony detection between two tones/taps at different frequencies/locations became difficult when one of the tones/taps was embedded in a perceptual stream. While the taps applied to the same location were streamed over time, the taps applied to different locations were not. This observation suggests that tactile stream formation can be explained by population-separation theory. On the other hand, temporally coherent auditory stimuli at different frequencies were streamed over time, but temporally coherent tactile stimuli applied to different locations were not. When there was within-modality streaming, temporally coherent auditory stimuli and tactile stimuli were not streamed over time, either. This observation suggests the limitation of temporal coherence theory when it is applied to perceptual grouping over time.	\N	\N
23029426	The ability to detect sudden changes in the environment is critical for survival. Hearing is hypothesized to play a major role in this process by serving as an "early warning device," rapidly directing attention to new events. Here, we investigate listeners' sensitivity to changes in complex acoustic scenes-what makes certain events "pop-out" and grab attention while others remain unnoticed? We use artificial "scenes" populated by multiple pure-tone components, each with a unique frequency and amplitude modulation rate. Importantly, these scenes lack semantic attributes, which may have confounded previous studies, thus allowing us to probe low-level processes involved in auditory change perception. Our results reveal a striking difference between "appear" and "disappear" events. Listeners are remarkably tuned to object appearance: change detection and identification performance are at ceiling; response times are short, with little effect of scene-size, suggesting a pop-out process. In contrast, listeners have difficulty detecting disappearing objects, even in small scenes: performance rapidly deteriorates with growing scene-size; response times are slow, and even when change is detected, the changed component is rarely successfully identified. We also measured change detection performance when a noise or silent gap was inserted at the time of change or when the scene was interrupted by a distractor that occurred at the time of change but did not mask any scene elements. Gaps adversely affected the processing of item appearance but not disappearance. However, distractors reduced both appearance and disappearance detection. Together, our results suggest a role for neural adaptation and sensitivity to transients in the process of auditory change detection, similar to what has been demonstrated for visual change detection. 
Importantly, listeners consistently performed better for item addition (relative to deletion) across all scene interruptions used, suggesting a robust perceptual representation of item appearance.	\N	\N
23145699	Listeners are good at attending to one auditory stream in a crowded environment. However, is there an upper limit of streams present in an auditory scene at which this selective attention breaks down? Here, participants were asked to attend one stream of spoken letters amidst other letter streams. In half of the trials, an initial primer was played, cueing subjects to the sound configuration. Results indicate that performance increases with token repetitions. Priming provided a performance benefit, suggesting that stream selection, not formation, is the bottleneck associated with attention in an overcrowded scene. Results' implications for brain-computer interfaces are discussed.	\N	\N
23423817	A variety of perceptual correspondences between auditory and visual features have been reported, but few studies have investigated how rhythm, an auditory feature defined purely by dynamics relevant to speech and music, interacts with visual features. Here, we demonstrate a novel crossmodal association between auditory rhythm and visual clutter. Participants were shown a variety of visual scenes from diverse categories and asked to report the auditory rhythm that perceptually matched each scene by adjusting the rate of amplitude modulation (AM) of a sound. Participants matched each scene to a specific AM rate with surprising consistency. A spatial-frequency analysis showed that scenes with greater contrast energy in midrange spatial frequencies were matched to faster AM rates. Bandpass-filtering the scenes indicated that greater contrast energy in this spatial-frequency range was associated with an abundance of object boundaries and contours, suggesting that participants matched more cluttered scenes to faster AM rates. Consistent with this hypothesis, AM-rate matches were strongly correlated with perceived clutter. Additional results indicated that both AM-rate matches and perceived clutter depend on object-based (cycles per object) rather than retinal (cycles per degree of visual angle) spatial frequency. Taken together, these results suggest a systematic crossmodal association between auditory rhythm, representing density in the temporal domain, and visual clutter, representing object-based density in the spatial domain. This association may allow for the use of auditory rhythm to influence how visual clutter is perceived and attended.	\N	\N
23516340	Many sound sources can only be recognised from the pattern of sounds they emit, and not from the individual sound events that make up their emission sequences. Auditory scene analysis addresses the difficult task of interpreting the sound world in terms of an unknown number of discrete sound sources (causes) with possibly overlapping signals, and therefore of associating each event with the appropriate source. There are potentially many different ways in which incoming events can be assigned to different causes, which means that the auditory system has to choose between them. This problem has been studied for many years using the auditory streaming paradigm, and recently it has become apparent that instead of making one fixed perceptual decision, given sufficient time, auditory perception switches back and forth between the alternatives-a phenomenon known as perceptual bi- or multi-stability. We propose a new model of auditory scene analysis at the core of which is a process that seeks to discover predictable patterns in the ongoing sound sequence. Representations of predictable fragments are created on the fly, and are maintained, strengthened or weakened on the basis of their predictive success, and conflict with other representations. Auditory perceptual organisation emerges spontaneously from the nature of the competition between these representations. We present detailed comparisons between the model simulations and data from an auditory streaming experiment, and show that the model accounts for many important findings, including: the emergence of, and switching between, alternative organisations; the influence of stimulus parameters on perceptual dominance, switching rate and perceptual phase durations; and the build-up of auditory streaming. 
The principal contribution of the model is to show that a two-stage process of pattern discovery and competition between incompatible patterns can account for both the contents (perceptual organisations) and the dynamics of human perception in auditory streaming.	\N	\N
23527271	The auditory system creates a neuronal representation of the acoustic world based on spectral and temporal cues present at the listener's ears, including cues that potentially signal the locations of sounds. Discrimination of concurrent sounds from multiple sources is especially challenging. The current study is part of an effort to better understand the neuronal mechanisms governing this process, which has been termed "auditory scene analysis". In particular, we are interested in spatial release from masking by which spatial cues can segregate signals from other competing sounds, thereby overcoming the tendency of overlapping spectra and/or common temporal envelopes to fuse signals with maskers. We studied detection of pulsed tones in free-field conditions in the presence of concurrent multi-tone non-speech maskers. In "energetic" masking conditions, in which the frequencies of maskers fell within the ± 1/3-octave band containing the signal, spatial release from masking at low frequencies (~600 Hz) was found to be about 10 dB. In contrast, negligible spatial release from energetic masking was seen at high frequencies (~4000 Hz). We observed robust spatial release from masking in broadband "informational" masking conditions, in which listeners could confuse signal with masker even though there was no spectral overlap. Substantial spatial release was observed in conditions in which the onsets of the signal and all masker components were synchronized, and spatial release was even greater under asynchronous conditions. Spatial cues limited to high frequencies (>1500 Hz), which could have included interaural level differences and the better-ear effect, produced only limited improvement in signal detection. Substantially greater improvement was seen for low-frequency sounds, for which interaural time differences are the dominant spatial cue.	\N	\N
23691185	Listening to and understanding people in a "cocktail-party situation" is a remarkable feature of the human auditory system. Here we investigated the neural correlates of the ability to localize a particular sound among others in an acoustically cluttered environment with healthy subjects. In a sound localization task, five different natural sounds were presented from five virtual spatial locations during functional magnetic resonance imaging (fMRI). Activity related to auditory stream segregation was revealed in posterior superior temporal gyrus bilaterally, anterior insula, supplementary motor area, and frontoparietal network. Moreover, the results indicated critical roles of left planum temporale in extracting the sound of interest among acoustical distracters and the precuneus in orienting spatial attention to the target sound. We hypothesized that the left-sided lateralization of the planum temporale activation is related to the higher specialization of the left hemisphere for analysis of spectrotemporal sound features. Furthermore, the precuneus - a brain area known to be involved in the computation of spatial coordinates across diverse frames of reference for reaching to objects - seems to be also a crucial area for accurately determining locations of auditory targets in an acoustically complex scene of multiple sound sources. The precuneus thus may not only be involved in visuo-motor processes, but may also subserve related functions in the auditory modality.	\N	\N
23825404	In a complex auditory scene, a "cocktail party" for example, listeners can disentangle multiple competing sequences of sounds. A recent psychophysical study in our laboratory demonstrated a robust spatial component of stream segregation showing ∼8° acuity. Here, we recorded single- and multiple-neuron responses from the primary auditory cortex of anesthetized cats while presenting interleaved sound sequences that human listeners would experience as segregated streams. Sequences of broadband sounds alternated between pairs of locations. Neurons synchronized preferentially to sounds from one or the other location, thereby segregating competing sound sequences. Neurons favoring one source location or the other tended to aggregate within the cortex, suggestive of modular organization. The spatial acuity of stream segregation was as narrow as ∼10°, markedly sharper than the broad spatial tuning for single sources that is well known in the literature. Spatial sensitivity was sharpest among neurons having high characteristic frequencies. Neural stream segregation was predicted well by a parameter-free model that incorporated single-source spatial sensitivity and a measured forward-suppression term. We found that the forward suppression was not due to post discharge adaptation in the cortex and, therefore, must have arisen in the subcortical pathway or at the level of thalamocortical synapses. A linear-classifier analysis of single-neuron responses to rhythmic stimuli like those used in our psychophysical study yielded thresholds overlapping those of human listeners. Overall, the results indicate that the ascending auditory system does the work of segregating auditory streams, bringing them to discrete modules in the cortex for selection by top-down processes.	\N	\N
23926291	Previously, Gygi and Shafiro (2011) found that when environmental sounds are semantically incongruent with the background scene (e.g., horse galloping in a restaurant), they can be identified more accurately by young normal-hearing listeners (YNH) than sounds congruent with the scene (e.g., horse galloping at a racetrack). This study investigated how age and high-frequency audibility affect this Incongruency Advantage (IA) effect. In Experiments 1a and 1b, elderly listeners (N = 18 for 1a; N = 10 for 1b) with age-appropriate hearing (EAH) were tested on target sounds and auditory scenes in 5 sound-to-scene ratios (So/Sc) between -3 and -18 dB. Experiment 2 tested 11 YNH on the same sound-scene pairings lowpass-filtered at 4 kHz (YNH-4k). The EAH and YNH-4k groups exhibited an almost identical pattern of significant IA effects, but both were at approximately 3.9 dB higher So/Sc than the previously tested YNH listeners. However, the psychometric functions revealed a shallower slope for EAH listeners compared with YNH listeners for the congruent stimuli only, suggesting a greater difficulty for the EAH listeners in attending to sounds expected to occur in a scene. These findings indicate that semantic relationships between environmental sounds in soundscapes are mediated by both audibility and cognitive factors and suggest a method for dissociating these factors.	\N	\N
24003112	After hearing a tone, the human auditory system becomes more sensitive to similar tones than to other tones. Current auditory models explain this phenomenon by a simple bandpass attention filter. Here, we demonstrate that auditory attention involves multiple pass-bands around octave-related frequencies above and below the cued tone. Intriguingly, this "octave effect" not only occurs for physically presented tones, but even persists for the missing fundamental in complex tones, and for imagined tones. Our results suggest neural interactions combining octave-related frequencies, likely located in nonprimary cortical regions. We speculate that this connectivity scheme evolved from exposure to natural vibrations containing octave-related spectral peaks, e.g., as produced by vocal cords.	\N	\N
24052177	The fundamental perceptual unit in hearing is the 'auditory object'. Similar to visual objects, auditory objects are the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment; the multitude of acoustic stimuli around us together form the auditory scene. However, unlike the visual scene, resolving the component objects within the auditory scene crucially depends on their temporal structure. Neural correlates of auditory objects are found throughout the auditory system. However, neural responses do not become correlated with a listener's perceptual reports until the level of the cortex. The roles of different neural structures and the contribution of different cognitive states to the perception of auditory objects are not yet fully understood.	\N	\N
24239869	Perceptual representations of auditory stimuli (i.e., sounds) are derived from the auditory system's ability to segregate and group the spectral, temporal, and spatial features of auditory stimuli-a process called "auditory scene analysis". Psychophysical studies have identified several of the principles and mechanisms that underlie a listener's ability to segregate and group acoustic stimuli. One important psychophysical task that has illuminated many of these principles and mechanisms is the "streaming" task. Despite the wide use of this task to study psychophysical mechanisms of human audition, no studies have explicitly tested the streaming abilities of non-human animals using the standard methodologies employed in human-audition studies. Here, we trained rhesus macaques to participate in the streaming task using methodologies and controls similar to those presented in previous human studies. Overall, we found that the monkeys' behavioral reports were qualitatively consistent with those of human listeners, thus suggesting that this task may be a valuable tool for future neurophysiological studies.	\N	\N
24475030	In our daily lives, auditory stream segregation allows us to differentiate concurrent sound sources and to make sense of the scene we are experiencing. However, a combination of segregation and the concurrent integration of auditory streams is necessary in order to analyze the relationship between streams and thus perceive a coherent auditory scene. The present functional magnetic resonance imaging study investigates the relative role and neural underpinnings of these listening strategies in multi-part musical stimuli. We compare a real human performance of a piano duet and a synthetic stimulus of the same duet in a prioritized integrative attention paradigm that required the simultaneous segregation and integration of auditory streams. In so doing, we manipulate the degree to which the attended part of the duet led either structurally (attend melody vs. attend accompaniment) or temporally (asynchronies vs. no asynchronies between parts), and thus the relative contributions of integration and segregation used to make an assessment of the leader-follower relationship. We show that perceptually the relationship between parts is biased towards the conventional structural hierarchy in western music in which the melody generally dominates (leads) the accompaniment. Moreover, the assessment varies as a function of both cognitive load, as shown through difficulty ratings and the interaction of the temporal and the structural relationship factors. Neurally, we see that the temporal relationship between parts, as one important cue for stream segregation, revealed distinct neural activity in the planum temporale. By contrast, integration used when listening to both the temporally separated performance stimulus and the temporally fused synthetic stimulus resulted in activation of the intraparietal sulcus. 
These results support the hypothesis that the planum temporale and IPS are key structures underlying the mechanisms of segregation and integration of auditory streams, respectively.	\N	\N
24478375	Adaptation to both common and rare sounds has been independently reported in neurophysiological studies using probabilistic stimulus paradigms in small mammals. However, the apparent sensitivity of the mammalian auditory system to the statistics of incoming sound has not yet been generalized to task-related human auditory perception. Here, we show that human listeners selectively adapt to novel sounds within scenes unfolding over minutes. Listeners' performance in an auditory discrimination task remains steady for the most common elements within the scene but, after the first minute, performance improves for distinct and rare (oddball) sound elements, at the expense of rare sounds that are relatively less distinct. Our data provide the first evidence of enhanced coding of oddball sounds in a human auditory discrimination task and suggest the existence of an adaptive mechanism that tracks the long-term statistics of sounds and deploys coding resources accordingly.	\N	\N
24681354	The auditory system is designed to transform acoustic information from low-level sensory representations into perceptual representations. These perceptual representations are the computational result of the auditory system's ability to group and segregate spectral, spatial and temporal regularities in the acoustic environment into stable perceptual units (i.e., sounds or auditory objects). Current evidence suggests that the cortex-specifically, the ventral auditory pathway-is responsible for the computations most closely related to perceptual representations. Here, we discuss how the transformations along the ventral auditory pathway relate to auditory percepts, with special attention paid to the processing of vocalizations and categorization, and explore recent models of how these areas may carry out these computations.	\N	\N
24711409	Human perception, cognition, and action are laced with seemingly arbitrary mappings. In particular, sound has a strong spatial connotation: Sounds are high and low, melodies rise and fall, and pitch systematically biases perceived sound elevation. The origins of such mappings are unknown. Are they the result of physiological constraints, do they reflect natural environmental statistics, or are they truly arbitrary? We recorded natural sounds from the environment, analyzed the elevation-dependent filtering of the outer ear, and measured frequency-dependent biases in human sound localization. We find that auditory scene statistics reveal a clear mapping between frequency and elevation. Perhaps more interestingly, this natural statistical mapping is tightly mirrored in both ear-filtering properties and in perceived sound location. This suggests that both sound localization behavior and ear anatomy are fine-tuned to the statistics of natural auditory scenes, likely providing the basis for the spatial connotation of human hearing.	\N	\N
24788808	This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectories related to auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained in this work. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.	\N	\N
24821552	Many studies have shown that attention modulates the cortical representation of an auditory scene, emphasizing an attended source while suppressing competing sources. Yet, individual differences in the strength of this attentional modulation and their relationship with selective attention ability are poorly understood. Here, we ask whether differences in how strongly attention modulates cortical responses reflect differences in normal-hearing listeners' selective auditory attention ability. We asked listeners to attend to one of three competing melodies and identify its pitch contour while we measured cortical electroencephalographic responses. The three melodies were either from widely separated pitch ranges ("easy trials"), or from a narrow, overlapping pitch range ("hard trials"). The melodies started at slightly different times; listeners attended either the leading or lagging melody. Because of the timing of the onsets, the leading melody drew attention exogenously. In contrast, attending the lagging melody required listeners to direct top-down attention volitionally. We quantified how attention amplified auditory N1 response to the attended melody and found large individual differences in the N1 amplification, even though only correctly answered trials were used to quantify the ERP gain. Importantly, listeners with the strongest amplification of N1 response to the lagging melody in the easy trials were the best performers across other types of trials. Our results raise the possibility that individual differences in the strength of top-down gain control reflect inherent differences in the ability to control top-down attention.	\N	\N
24841996	Auditory objects, like their visual counterparts, are perceptually defined constructs, but nevertheless must arise from underlying neural circuitry. Using magnetoencephalography (MEG) recordings of the neural responses of human subjects listening to complex auditory scenes, we review studies that demonstrate that auditory objects are indeed neurally represented in auditory cortex. The studies use neural responses obtained from different experiments in which subjects selectively listen to one of two competing auditory streams embedded in a variety of auditory scenes. The auditory streams overlap spatially and often spectrally. In particular, the studies demonstrate that selective attentional gain does not act globally on the entire auditory scene, but rather acts differentially on the separate auditory streams. This stream-based attentional gain is then used as a tool to individually analyze the different neural representations of the competing auditory streams. The neural representation of the attended stream, located in posterior auditory cortex, dominates the neural responses. Critically, when the intensities of the attended and background streams are separately varied over a wide intensity range, the neural representation of the attended speech adapts only to the intensity of that speaker, irrespective of the intensity of the background speaker. This demonstrates object-level intensity gain control in addition to the above object-level selective attentional gain. Overall, these results indicate that concurrently streaming auditory objects, even if spectrally overlapping and not resolvable at the auditory periphery, are individually neurally encoded in auditory cortex, as separate objects.	\N	\N
25433224	Pitch plays a fundamental role in audition, from speech and music perception to auditory scene analysis. Congenital amusia is a neurogenetic disorder that appears to affect primarily pitch and melody perception. Pitch is normally conveyed by the spectro-temporal fine structure of low harmonics, but some pitch information is available in the temporal envelope produced by the interactions of higher harmonics. Using 10 amusic subjects and 10 matched controls, we tested the hypothesis that amusics suffer exclusively from impaired processing of spectro-temporal fine structure. We also tested whether the inability of amusics to process acoustic temporal fine structure extends beyond pitch by measuring sensitivity to interaural time differences, which also rely on temporal fine structure. Further tests were carried out on basic intensity and spectral resolution. As expected, pitch perception based on spectro-temporal fine structure was impaired in amusics; however, no significant deficits were observed in amusics' ability to perceive the pitch conveyed via temporal-envelope cues. Sensitivity to interaural time differences was also not significantly different between the amusic and control groups, ruling out deficits in the peripheral coding of temporal fine structure. Finally, no significant differences in intensity or spectral resolution were found between the amusic and control groups. The results demonstrate a pitch-specific deficit in fine spectro-temporal information processing in amusia that seems unrelated to temporal or spectral coding in the auditory periphery. These results are consistent with the view that there are distinct mechanisms dedicated to processing resolved and unresolved harmonics in the general population, the former being altered in congenital amusia while the latter is spared.	\N	\N
25654748	In noisy settings, listening is aided by correlated dynamic visual cues gleaned from a talker's face, an improvement often attributed to visually reinforced linguistic information. In this study, we aimed to test the effect of audio-visual temporal coherence alone on selective listening, free of linguistic confounds. We presented listeners with competing auditory streams whose amplitude varied independently and a visual stimulus with varying radius, while manipulating the cross-modal temporal relationships. Performance improved when the auditory target's timecourse matched that of the visual stimulus. The fact that the coherence was between task-irrelevant stimulus features suggests that the observed improvement stemmed from the integration of auditory and visual streams into cross-modal objects, enabling listeners to better attend the target. These findings suggest that in everyday conditions, where listeners can often see the source of a sound, temporal cues provided by vision can help listeners to select one sound source from a mixture.	\N	\N
25659464	To probe sensitivity to the time structure of ongoing sound sequences, we measured MEG responses, in human listeners, to the offset of long tone-pip sequences containing various forms of temporal regularity. If listeners learn sequence temporal properties and form expectancies about the arrival time of an upcoming tone, sequence offset should be detectable as soon as an expected tone fails to arrive. Therefore, latencies of offset responses are indicative of the extent to which the temporal pattern has been acquired. In Exp1, sequences were isochronous with tone inter-onset-interval (IOI) set to 75, 125 or 225 ms. Exp2 comprised non-isochronous, temporally regular sequences built from the IOIs above. Exp3 used the same sequences as Exp2 but listeners were required to monitor them for occasional frequency deviants. Analysis of the latency of offset responses revealed that the temporal structure of (even rather simple) regular sequences is not learnt precisely when the sequences are ignored. Pattern coding, supported by a network of temporal, parietal and frontal sources, improved considerably when the signals were made behaviourally pertinent. Thus, contrary to what might be expected in the context of an 'early warning system' framework, learning of temporal structure is not automatic, but affected by the signal's behavioural relevance.	\N	\N
25726262	Auditory development involves changes in the peripheral and central nervous system along the auditory pathways, and these occur naturally, and in response to stimulation. Human development occurs along a trajectory that can last decades, and is studied using behavioral psychophysics, as well as physiologic measurements with neural imaging. The auditory system constructs a perceptual space that takes information from objects and groups, segregates sounds, and provides meaning and access to communication tools such as language. Auditory signals are processed in a series of analysis stages, from peripheral to central. Coding of information has been studied for features of sound, including frequency, intensity, loudness, and location, in quiet and in the presence of maskers. In the latter case, the ability of the auditory system to perform an analysis of the scene becomes highly relevant. While some basic abilities are well developed at birth, there is a clear prolonged maturation of auditory development well into the teenage years. Maturation involves auditory pathways. However, non-auditory changes (attention, memory, cognition) play an important role in auditory development. The ability of the auditory system to adapt in response to novel stimuli is a key feature of development throughout the nervous system, known as neural plasticity.	\N	\N
22240459	Humans and other animals often communicate acoustically in noisy social groups, in which the background noise generated by other individuals can mask signals of interest. When listening to speech in the presence of speech-like noise, humans experience a release from auditory masking when target and masker are spatially separated. We investigated spatial release from masking (SRM) in a free-field call recognition task in Cope's gray treefrog (Hyla chrysoscelis). In this species, reproduction requires that females successfully detect, recognize, and localize a conspecific male in the noisy social environment of a breeding chorus. Using no-choice phonotaxis assays, we measured females' signal recognition thresholds in response to a target signal (an advertisement call) in the presence and absence of chorus-shaped noise. Females experienced about 3 dB of masking release, compared with a co-localized condition, when the masker was displaced 90° in azimuth from the target. The magnitude of masking release was independent of the spectral composition of the target (carriers of 1.3 kHz, 2.6 kHz, or both). Our results indicate that frogs experience a modest degree of spatial unmasking when performing a call recognition task in the free-field, and suggest that variation in signal spectral content has small effects on both source identification and spatial unmasking. We discuss these results in the context of spatial unmasking in vertebrates and call recognition in frogs.	\N	\N
20649227	The lateralization of 250-ms trains of brief noise bursts was measured using an acoustic pointing technique. Stimuli were designed to assess the contribution of the interaural time delay (ITD) of the onset binaural burst relative to that of the ITDs in the ongoing part of the train. Lateralization was measured by listeners' adjustments of the ITD of a pointer stimulus, a 50-ms burst of noise, to match the lateral position of the target train. Results confirmed previous reports of lateralization dominance by the onset burst under conditions in which the train is composed of frozen tokens and the ongoing part contains multiple ambiguous interaural delays. In contrast, lateralization of ongoing trains in which fresh noise tokens were used for each set of two alternating (left-leading/right-leading) binaural pairs followed the ITD of the first pair in each set, regardless of the ITD of the onset burst of the entire stimulus and even when the onset burst was removed by gradual gating. This clear lateralization of a long-duration stimulus with ambiguous interaural delay cues suggests precedence mechanisms that involve not only the interaural cues at the beginning of a sound, but also the pattern of cues within an ongoing sound.	\N	\N
21131368	The thalamic reticular nucleus (TRN) is a shell-shaped gamma amino butyric acid (GABA)ergic nucleus, which is uniquely placed between the thalamus and the cortex, because it receives excitatory afferents from both cortical and thalamic neurons and sends inhibitory projections to all nuclei of the dorsal thalamus. We review the evidence suggesting that the TRN is implicated in the neurobiology of schizophrenia. TRN-thalamus circuits are implicated in bottom-up as well as top-down processing. TRN projections to nonspecific nuclei of the dorsal thalamus mediate top-down processes, including attentional modulation, which are initiated by cortical afferents to the TRN. TRN-thalamus circuits are also involved in bottom-up activities, including sensory gating and the transfer to the cortex of sleep spindles. Intriguingly, deficits in attention and sensory gating have been consistently found in schizophrenics, including first-break and chronic patients. Furthermore, high-density electroencephalographic studies have revealed a marked reduction in sleep spindles in schizophrenics. On the basis of our current knowledge on the molecular and anatomo-functional properties of the TRN, we suggest that this thalamic GABAergic nucleus may be involved in the neurobiology of schizophrenia.	\N	\N
22087275	How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the "identification point" for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing.	\N	\N
22384211	Recent behavioral neuroscience research revealed that elementary reactive behavior can be improved in the case of cross-modal sensory interactions thanks to underlying multisensory integration mechanisms. Can this benefit be generalized to an ongoing coordination of movements under severe physical constraints? We chose a juggling task to examine this question. A central issue well-known in juggling lies in establishing and maintaining a specific temporal coordination among balls, hands, eyes and posture. Here, we tested whether providing additional timing information about the ball and hand motions by using external sound and tactile periodic stimulations, the latter presented at the wrists, improved the behavior of jugglers. One specific combination of auditory and tactile metronome led to a decrease of the spatiotemporal variability of the juggler's performance: a simple sound associated with left and right tactile cues presented antiphase to each other, which corresponded to the temporal pattern of hand movements in the juggling task. In contrast, no improvements were obtained in the case of other auditory and tactile combinations. We even found a degraded performance when tactile events were presented alone. The nervous system thus appears able to integrate efficiently the environmental information brought by different sensory modalities, but only if the information specified matches specific features of the coordination pattern. We discuss the possible implications of these results for the understanding of the neuronal integration process involved in audio-tactile interaction in the context of complex voluntary movement, and considering the well-known gating effect of movement on vibrotactile perception.	\N	\N
22896044	Thresholds of school-aged children are elevated relative to those of adults for intensity discrimination and amplitude modulation (AM) detection. It is unclear how these findings are related or what role stimulus gating and dynamic envelope cues play in these results. Two experiments assessed the development of sensitivity to intensity increments in different stimulus contexts. Thresholds for detecting an increment in level were estimated for normal-hearing children (5- to 10-year-olds) and adults. Experiment 1 compared intensity discrimination for gated and continuous presentation of a 1-kHz tone, with a 65-dB-SPL standard level. Experiment 2 compared increment detection and 16-Hz AM detection introduced into a continuous 1-kHz tone, with either 35- or 75-dB-SPL standard levels. Children had higher thresholds than adults overall. All listeners were more sensitive to increments in the continuous than the gated stimulus and performed better at the 75- than at the 35-dB-SPL standard level. Both effects were comparable for children and adults. There was some evidence that children's AM detection was more adultlike than increment detection. These results imply that memory for loudness across gated intervals is not responsible for children's poor performance but that multiple dynamic envelope cues may benefit children more than adults.	\N	\N
23716244	This study investigated monaural envelope correlation perception (Richards 1987) for noise bandwidths ranging from 25 to 1,600 Hz. The high-frequency side of the low band was fixed at 3,000 Hz and the low-frequency side of the high band was fixed at 3,500 Hz. When comodulated, the magnitude spectra of the pair of noise bands were either identical or reflected around the midpoint. Six listeners with normal hearing participated. Listeners showed similar performance for identical and reflected-spectrum conditions, with best performance usually occurring for bandwidths between 200 and 800 Hz. Results were considered in terms of envelope comparisons of waveforms at the outputs of multiple peripheral filters or envelope comparisons of waveforms at the outputs of central filters set to the bandwidths of the noise stimuli. Some aspects of the results were incompatible with the account based on multiple peripheral filters. However, the results of a supplementary condition involving the gating of band subregions indicated that this incompatibility could be accounted for by nonoptimal weighting of peripheral filter outputs.	\N	\N
24298171	While watching movies, the brain integrates the visual information and the musical soundtrack into a coherent percept. Multisensory integration can lead to emotion elicitation on which soundtrack valences may have a modulatory impact. Here, dynamic kissing scenes from romantic comedies were presented to 22 participants (13 females) during functional magnetic resonance imaging scanning. The kissing scenes were either accompanied by happy music, sad music or no music. Evidence from cross-modal studies motivated a predefined three-region network for multisensory integration of emotion, consisting of fusiform gyrus (FG), amygdala (AMY) and anterior superior temporal gyrus (aSTG). The interactions in this network were investigated using dynamic causal models of effective connectivity. This revealed bilinear modulations by happy and sad music with suppression effects on the connectivity from FG and AMY to aSTG. Non-linear dynamic causal modeling showed a suppressive gating effect of aSTG on fusiform-amygdalar connectivity. In conclusion, fusiform to amygdala coupling strength is modulated via feedback through aSTG as region for multisensory integration of emotional material. This mechanism was emotion-specific and more pronounced for sad music. Therefore, soundtrack valences may modulate emotion elicitation in movies by differentially changing preprocessed visual information to the amygdala.	\N	\N
24736181	Although a number of recent studies have examined functional connectivity at rest, few have assessed differences between connectivity both during rest and across active task paradigms. Therefore, the question of whether cortical connectivity patterns remain stable or change with task engagement continues to be unaddressed. We collected multi-scan fMRI data on healthy controls (N=53) and schizophrenia patients (N=42) during rest and across paradigms arranged hierarchically by sensory load. We measured functional network connectivity among 45 non-artifactual distinct brain networks. Then, we applied a novel analysis to assess cross paradigm connectivity patterns applied to healthy controls and patients with schizophrenia. To detect these patterns, we fit a group by task full factorial ANOVA model to the group average functional network connectivity values. Our approach identified both stable (static effects) and state-based differences (dynamic effects) in brain connectivity providing a better understanding of how individuals' reactions to simple sensory stimuli are conditioned by the context within which they are presented. Our findings suggest that not all group differences observed during rest are detectable in other cognitive states. In addition, the stable differences of heightened connectivity between multiple brain areas with thalamus across tasks underscore the importance of the thalamus as a gateway to sensory input and provide new insight into schizophrenia.	\N	\N
24801767	Despite advances in the treatment of schizophrenia spectrum disorders with atypical antipsychotics (AAPs), there is still need for compounds with improved efficacy/side-effect ratios. Evidence from challenge studies suggests that the assessment of gating functions in humans and rodents with naturally low-gating levels might be a useful model to screen for novel compounds with antipsychotic properties. To further evaluate and extend this translational approach, three AAPs were examined. Compounds without antipsychotic properties served as negative control treatments. In a placebo-controlled, within-subject design, healthy males received either single doses of aripiprazole and risperidone (n=28), amisulpride and lorazepam (n=30), or modafinil and valproate (n=30), and placebo. Prepulse inhibition (PPI) and P50 suppression were assessed. Clinically associated symptoms were evaluated using the SCL-90-R. Aripiprazole, risperidone, and amisulpride increased P50 suppression in low P50 gaters. Lorazepam, modafinil, and valproate did not influence P50 suppression in low gaters. Furthermore, low P50 gaters scored significantly higher on the SCL-90-R than high P50 gaters. Aripiprazole increased PPI in low PPI gaters, whereas modafinil and lorazepam attenuated PPI in both groups. Risperidone, amisulpride, and valproate did not influence PPI. P50 suppression in low gaters appears to be an antipsychotic-sensitive neurophysiologic marker. This conclusion is supported by the association of low P50 suppression and higher clinically associated scores. Furthermore, PPI might be sensitive for atypical mechanisms of antipsychotic medication. The translational model investigating differential effects of AAPs on gating in healthy subjects with naturally low gating can be beneficial for phase II/III development plans by providing additional information for critical decision making.	\N	\N
25024207	Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners' knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca's area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of "motherese" on early language learning, and (iii) the "social-gating" hypothesis and humans' development of social understanding.	\N	\N
25544613	Perception routinely integrates inputs from different senses. Stimulus temporal proximity critically determines whether or not these inputs are bound together. Despite the temporal window of integration being a widely accepted notion, its neurophysiological substrate remains unclear. Many types of common audio-visual interactions occur within a time window of ∼100 ms. For example, in the sound-induced double-flash illusion, when two beeps are presented within ∼100 ms together with one flash, a second illusory flash is often perceived. Due to their intrinsic rhythmic nature, brain oscillations are one candidate mechanism for gating the temporal window of integration. Interestingly, occipital alpha band oscillations cycle on average every ∼100 ms, with peak frequencies ranging between 8 and 14 Hz (i.e., 120-60 ms cycle). Moreover, presenting a brief tone can phase-reset such oscillations in visual cortex. Based on these observations, we hypothesized that the duration of each alpha cycle might provide the temporal unit to bind audio-visual events. Here, we first recorded EEG while participants performed the sound-induced double-flash illusion task and found positive correlation between individual alpha frequency (IAF) peak and the size of the temporal window of the illusion. Participants then performed the same task while receiving occipital transcranial alternating current stimulation (tACS), to modulate oscillatory activity either at their IAF or at off-peak alpha frequencies (IAF±2 Hz). Compared to IAF tACS, IAF-2 Hz and IAF+2 Hz tACS, respectively, enlarged and shrunk the temporal window of illusion, suggesting that alpha oscillations might represent the temporal unit of visual processing that cyclically gates perception and the neurophysiological substrate promoting audio-visual interactions.	\N	\N
