Hemispheric association and dissociation of voice and speech information processing in stroke

As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental for the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performances in phoneme vs. gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings.


Introduction
Speech perception is often seen as special (Liberman & Mattingly, 1989) because localized brain injury can elicit specific language impairments such as aphasia, and because healthy individuals are extremely efficient at categorizing phonemes and syllables despite large variations in the stimulus spectral patterns (Liberman, Delattre, & Cooper, 1952). To achieve high performance levels, it has been hypothesized that voice information (talker specific information) is extracted along with the speech signal, and then stripped away to access (invariant) phonemic content: a process known as 'speaker normalization'. This hypothesis is, however, challenged because general auditory learning mechanisms are capable of explaining category formation in the absence of invariant acoustic information. Birds can learn speech consonant categories with no obvious acoustic invariant cue (Kluender, Diehl, & Killeen, 1987) and human listeners can readily learn non-speech categories that are similarly structured (Wade & Holt, 2005). In addition, several studies showed that talker variability influences speech perception. For instance, the literature describes increased memory for words spoken by familiar voices, compared to non-familiar voices (Nygaard & Pisoni, 1998; Nygaard, Sommers, & Pisoni, 1994; Palmeri, Goldinger, & Pisoni, 1993), and similarly enhanced discrimination of, and memory for, (non-familiar) speakers of our own language compared to speakers of another language (Language Familiarity Effect; Perrachione & Wong, 2007) even in the absence of intelligibility (Fleming, Giordano, Caldara, & Belin, 2014). Most of these studies do not, however, specifically address the issue of phoneme perception, and thus acoustical regularities coming from multiple levels are at play.
In this study, we investigated how phoneme and talker information processing relate to each other, by comparing performances of right fronto-temporal (non-aphasic), left fronto-temporal aphasic and left fronto-temporal non-aphasic stroke patients. Each participant categorized sounds from pitch equalized morphed continua as being male-female or /pa/-/ta/ (Pernet, Belin, & Jones, 2014). Stimuli were the same in both tasks, and participants therefore had to discard talker specific or phoneme specific information depending on the task at hand. Given the importance of the right STS (Bestelemeyer, Belin, & Grosbras, 2011) and right Middle and Inferior Frontal Gyrus (MFG-IFG) (Charest et al., 2012) in talker information processing, we hypothesized that right hemisphere patients would show a dissociation between the two tasks. In contrast, following our hypothesis of co-optation of voice selective neurons in phoneme processing, we hypothesized that left hemisphere aphasic patients would not show such a dissociation, while non-aphasic patients could be impaired for voice but not phoneme.

Materials and Methods
The experiment used (program and stimuli) is freely available from http://dx.doi.org/10.6084/m9.figshare.1287284. It runs under Matlab with the psychophysical toolbox (Brainard, 1997; Kleiner et al., 2007). The behavioral data and scripts used to analyze the data are available from http://dx.doi.org/10.6084/m9.figshare.1287262. The radiological imaging analysis is also available with the behavioral data. CT or MRI scans could not be shared as they belong to the UK National Health Service (NHS) and not to the research team. The study was approved by the NHS Lothian South East Scotland Research Ethics Committee 01 (REC reference number: 11/SS/0055) and NHS Lothian Research and Development (R&D project number: 2011/W/NEU/09).
Participants: Twenty-five stroke patients (14 males, 11 females) with a median age of 69 years (min 39, max 85) were recruited into this study. At the time of testing, all patients were at the chronic stage (median time between stroke and testing 90±17 days). Participants were recruited as inpatients and outpatients from Lothian NHS hospitals via stroke physicians and Speech & Language Therapists between 10 and 60 weeks post-stroke with the sole inclusion criterion of a stroke affecting perisylvian tissues (supplementary table 1). Exclusion criteria were the presence of a previous stroke and/or English not being the participant's first language.
All patients were tested for their mood (Visual Analogue Self Esteem Scale, VASES; Brumfitt & Sheeran, 1999, and The Depression Intensity Scale Circles, DISCS; Turner-Stokes, Kalmus, Hirani, & Clegg, 2005) and language abilities (Western Aphasia Battery, WAB; Shewan & Kertesz, 1980). No patient had language deficits in the group with right hemisphere lesions (N=9, 5 males and 4 females, WAB median score 98.8), and 10 out of 16 patients showed signs of aphasia in the left hemisphere group (N=10, 5 males and 5 females, WAB median score 49.5 for aphasics vs. N=6, 4 males and 2 females, WAB median score 99.4 for non-aphasics; percentile bootstrap difference 49.8 [15 80], p=0). Kruskal-Wallis ANOVA showed that groups did not differ in terms of median age (χ²(2,22)=4.58, p=0.1), in median time delay between stroke and testing (χ²(2,22)=1.68, p=0.43) or depression scores (χ²(2,22)=4, p=0.13 for VASES and χ²(2,22)=2.19, p=0.33 for DISCS).
Table caption (truncated): [...] (Whiteley, Lindsey, Wardlaw, & Sandercock, 2006) along with the aphasia quotient and classification (Shewan & Kertesz, 1980).
Paradigm: The experiment was identical to that of Pernet et al. (2014), except that only pitch equalized stimuli were used. Participants carried out two two-alternative forced choice identification tasks: voice gender (male vs. female) and phoneme (/pa/ vs. /ta/), and responded by button press on a keyboard. For each task, the same two continua of morphed sounds were used: the 1st continuum going from a Male-/pa/ to a Female-/ta/ and the 2nd continuum, with the same speakers, going from a Male-/ta/ to a Female-/pa/. Morphs were generated in steps of 10%, giving for the 1st continuum 100% Male-/pa/, 90% Male-/pa/ with 10% Female-/ta/, 80% Male-/pa/ with 20% Female-/ta/, etc. until 100% Female-/ta/, and for the 2nd continuum 100% Male-/ta/, 90% Male-/ta/ with 10% Female-/pa/, 80% Male-/ta/ with 20% Female-/pa/, etc. until 100% Female-/pa/.
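As an illustration of the stimulus layout just described, the two continua can be enumerated as below. This is a minimal sketch only: the function name, labels and dictionary layout are assumptions made for the example and are not taken from the authors' Matlab program, and it lists morph weights rather than generating any sound.

```python
# Enumerate the morph weights of the two continua (10% steps, 11 stimuli each).
# Names and data layout are illustrative assumptions, not the authors' code.

def morph_continuum(start_label, end_label, step_percent=10):
    """Return the list of morph steps from 100% start_label to 100% end_label."""
    return [
        {"start": start_label, "end": end_label,
         "pct_start": 100 - pct, "pct_end": pct}
        for pct in range(0, 101, step_percent)
    ]

continuum_1 = morph_continuum("Male-/pa/", "Female-/ta/")
continuum_2 = morph_continuum("Male-/ta/", "Female-/pa/")

# Show the first three steps of the 1st continuum.
for step in continuum_1[:3]:
    print(f'{step["pct_start"]}% {step["start"]} + {step["pct_end"]}% {step["end"]}')

# Each continuum has 11 steps (0% to 100% in 10% increments).
n_steps = len(continuum_1)
```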
This design allowed investigation of the effect of the task while controlling for the general acoustic characteristics of the stimuli, since the same stimuli were used in both tasks. Participants heard each stimulus six times in pseudo-random order, for a total of 132 stimuli (2 continua * 11 steps * 6 trials) per task. Eighteen different continua of stimuli were generated from 6 different speakers (3 males and 3 females pronouncing /pa/ and /ta/) and randomly assigned to participants. Task order and key orientation were counterbalanced between participants. Between the two tasks, an interfering tone discrimination task was also performed. Participants heard pure tones of various frequencies corresponding to the male and the female ranges and had to tell if 2 consecutive sounds were the same or different. The task followed a 2 down, 1 up step-wise procedure (Levitt, 1971), equating participants' performances when the staircase has ended (70.71% correct). This task was primarily designed to minimise the influence of one categorization task on the other, but also allowed control for basic auditory impairments. No significant differences were observed between groups on this task (Kruskal-Wallis ANOVA, χ²(2,22)).
Behavioral Classification: To assess the independence of phoneme and gender categorization tasks, behavioral performances were binarized as impaired vs. unimpaired. For each subject, response proportion curves (percentage of female responses or percentage of /ta/ responses) were obtained by averaging repeated trials from the different continua (figure 1). Each participant was then classified based on his or her ability to perform outside chance level, at least one time for the first 3 stimuli and at least one time for the last 3 stimuli along the sound continua. This implies that if a participant answered correctly for at least one of the initial stimuli (100% or 90% or 80% male or /pa/) and one of the final stimuli (100% or 90% or 80% female or /ta/), he or she was considered unimpaired (see supplementary material 1 for repeated analyses using an incremental classification criterion). In normal healthy participants, this is achieved very easily. Taking the data from Pernet et al. (2014), 100% of controls (N=18, 9 males, 9 females) were unimpaired (figure 2). From the resulting classification in patients, the independence between the phoneme and gender categorization performance was tested for each group using a McNemar test with exact central probability (Fay, 2010).
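The impaired/unimpaired rule above can be sketched in code. The text does not state how "outside chance level" was computed, so this example assumes an exact two-sided binomial test against 0.5 at alpha=0.05, with 12 trials per morph step (6 repetitions averaged over 2 continua). These choices, and all function names, are assumptions made for illustration and are not the authors' script.

```python
from math import comb

def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials under p=0.5."""
    tail = sum(comb(n, i) for i in range(0, min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def is_unimpaired(counts, n_trials, n_extreme=3, alpha=0.05):
    """Classify one participant on one task.

    counts: number of 'female' (or '/ta/') responses at each morph step,
            ordered from 100% male (or /pa/) to 100% female (or /ta/).
    Unimpaired means: at least one of the first n_extreme steps is reliably
    below chance AND at least one of the last n_extreme steps is reliably
    above chance (assumed criterion).
    """
    half = n_trials / 2
    low_ok = any(k < half and binom_two_sided_p(k, n_trials) < alpha
                 for k in counts[:n_extreme])
    high_ok = any(k > half and binom_two_sided_p(k, n_trials) < alpha
                  for k in counts[-n_extreme:])
    return low_ok and high_ok

# Hypothetical response counts (out of 12 trials per step), not patient data.
categorical = [0, 1, 2, 3, 5, 6, 8, 10, 11, 12, 12]
flat = [6, 5, 7, 6, 6, 5, 7, 6, 6, 7, 5]

print(is_unimpaired(categorical, 12))  # clear sigmoid-like curve
print(is_unimpaired(flat, 12))         # responses hover around chance
```

Under these assumptions the first curve is classified as unimpaired and the second as impaired; a different chance-level test or trial count would shift the boundary between the two classes.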
Lesion to Symptom Mapping: Each participant received one or more CT or MRI scans from the National Health Service, and the scan closest in time to the subject's participation in this study was considered. One-to-one mapping was computed using a McNemar test with exact central probability (Fay, 2010) between behavioral deficits (impaired/non-impaired) and 6 Regions Of Interest (lesioned/non-lesioned). Significance was considered at alpha=0.0083, i.e. alpha=0.05 Bonferroni corrected for the 6 ROIs. The ROI classification was performed by an expert neuro-radiologist (AF) with 12 years of experience, following a detailed protocol adapted from the International Stroke Trial III (Whiteley, Lindsey, Wardlaw, & Sandercock, 2006). Regions of Interest considered were the middle/inferior frontal gyrus (involved in gender categorization; Fecteau et al., 2005; Charest et al., 2012), Heschl's gyrus (also called transverse temporal gyrus; involved in language learning and spectral/pitch information processing; Warrier et al., 2009), the superior temporal gyrus (anterior/posterior; involved in general auditory processes but also voice perception; Belin et al., 2004), the insula (involved in central auditory functions, in particular temporal resolution and sequencing; Bamiou et al., 2006) and the amygdala-hippocampal complex (involved in memory and emotional voice perception; Johnstone et al., 2006; Rama et al., 2004).
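For readers unfamiliar with the test, an exact two-sided McNemar p-value and the Bonferroni threshold can be sketched as follows. The sketch assumes that the "exact central" p-value is twice the smaller one-sided binomial tail of the discordant pairs, capped at 1; the function name and the example counts are hypothetical and do not come from the authors' scripts or data.

```python
from math import comb

def mcnemar_exact_central(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b, c: numbers of pairs discordant in each direction. Under the null
    hypothesis each discordant pair falls in either direction with p=0.5.
    The central p-value is taken as twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(0, min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Bonferroni correction of alpha=0.05 for 6 regions of interest.
alpha_corrected = 0.05 / 6

# Hypothetical example: 8 discordant pairs, all in the same direction.
p = mcnemar_exact_central(8, 0)
print(round(p, 4), p < alpha_corrected)  # prints: 0.0078 True
```

With eight discordant pairs all in one direction, the p-value is 2 * 0.5^8 = 0.0078, just below the corrected threshold of 0.0083; with only seven such pairs it would be 0.0156 and would not pass the correction.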
Behavioral quantitative analyses: Each participant's ability to distinguish between male-female and /pa/-/ta/ stimuli was investigated using signal detection theory (Macmillan & Creelman, 2005), computing the perceptual distance d' (i) between successive pairs of stimuli along each continuum, and (ii) between the extreme stimuli of each continuum (global d', the distance between 100% male or /pa/ and 100% female or /ta/). We speak of categorical perception when a continuum of stimuli is divided by a boundary, such that stimuli on one side of the boundary are perceived as belonging to one category, whereas stimuli on the other side are perceived as belonging to another category, and perceptual distances within a category are low. Here we investigated categorical perception in each group and task by testing if d' values differed from 0 (bootstrap-t test with Bonferroni correction; Wilcox, 2012) for each pair of stimuli along the continuum, thus delineating a perceptual boundary. Finally, differences between gender and phoneme categorization performances were tested by comparing global d' values within groups (percentile bootstrap of the difference with adjustment for multiple testing) and relative to the healthy participants from Pernet et al. (2014) (mean difference between groups, with adjustment for multiple comparisons based on the maximum statistics; Wilcox, 2012).
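The perceptual distances described above can be illustrated with a short sketch. It assumes d' is the difference between the z-transformed response proportions of two stimuli, and that proportions of 0 or 1 are replaced by 1/(2N) and 1 - 1/(2N) before the transform; this correction, the trial count and the response counts are assumptions made for the example, and the authors' scripts may differ.

```python
from statistics import NormalDist

def z(p):
    """Inverse of the standard normal cumulative distribution."""
    return NormalDist().inv_cdf(p)

def clipped_rate(k, n):
    """Proportion k/n, with 0 and 1 replaced by 1/(2n) and 1 - 1/(2n)."""
    return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))

def d_prime(k_a, k_b, n):
    """Perceptual distance between stimuli a and b.

    k_a, k_b: counts of 'female' (or '/ta/') responses to each stimulus,
    each out of n trials.
    """
    return z(clipped_rate(k_b, n)) - z(clipped_rate(k_a, n))

# Hypothetical counts of 'female' responses along an 11-step continuum
# (12 trials per step); not patient data.
counts = [0, 0, 0, 1, 2, 6, 10, 11, 12, 12, 12]
n = 12

# (i) distances between successive pairs of stimuli
successive = [d_prime(counts[i], counts[i + 1], n) for i in range(len(counts) - 1)]
# (ii) global distance between the two extreme stimuli
global_d = d_prime(counts[0], counts[-1], n)

print([round(d, 2) for d in successive])
print(round(global_d, 2))
```

In this made-up example the successive distances are close to zero at both ends of the continuum and largest around the middle steps, which is the pattern expected under categorical perception.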

Behavioral classification
Of the 25 patients recruited, 19 showed at least one categorization deficit (table 1), which we defined as the inability to perform above chance for at least one of three extreme stimuli (100%, 90% or 80% male/pa, female/pa, male/ta or female/ta). As hypothesized, we observed a significant dissociation in right fronto-temporal patients with impaired voice gender categorization vs. intact phonological categorization (8 out of 9 patients, χ²=7, p=0, Φ=inf). In left fronto-temporal patients, we found no dissociation, for both the aphasic and the non-aphasic groups. Aphasic patients tended to show both phonological and voice gender categorization deficits (χ²=0.2, p=1, Φ=0.6) and non-aphasic patients tended to perform normally in both tasks (χ²=0, p=0.5, Φ=1; figure 1). The same association/dissociation patterns were observed when varying the categorization deficit criteria (supplementary material 1).
In the left aphasic group, out of the 10 patients, the number of subjects showing dissociations across the successive criteria was respectively 4, 4, 5, 5 and 5. In the left non-aphasic group, the same 2 patients always showed a dissociation. Finally, in the right hemisphere group, out of 9 patients, the number of subjects showing dissociations was respectively 7, 7, 7, 6 and 6.
Supplementary figure 1. Percentage of patients impaired in each task for each patient group, computed using multiple criteria (red lines represent phoneme task; blue lines represent gender task, vertical bars represent the 95% confidence interval).

Lesion symptom mapping
One-to-one mapping between the behavioral classifications (impaired / unimpaired) and regions of interest (lesioned / not-lesioned) showed that gender categorization impairments are associated with right frontal lesions (χ²=8, p=0.0078; see figure 2 for details). No other ROI showed significant results.
Figure 2. [...] (Pernet et al., 2014), illustrating that the classification works well, with controls scoring at 100% correct. Below are the same response proportion curves but for each right hemisphere patient, associated with axial slices showing the lesion in relation to the Sylvian fissure (highlighted in yellow). Importantly, the STG is intact in 7 out of 9 cases (patients 3 and 5 having both frontal and temporal lesions), suggesting that this region is not critical for gender/voice categorization.

Behavioral quantitative analyses
Analyses of perceptual distances (d') between successive pairs of items along continua revealed that none of the patient groups had increased perceptual distances for ambiguous items (figure 3), contrary to healthy subjects as shown in (Pernet et al., 2014). This result indicates a generalized reduction in categorical boundaries following stroke.

Discussion
Based on functional MRI (Belin et al., 2000, 2002; Charest et al., 2012) and Transcranial Magnetic Stimulation (Bestelemeyer et al., 2011) results observed in healthy volunteers, we hypothesized that patients with right fronto-temporal stroke would show a deficit in gender categorization but intact phonological performances. Our results using both a qualitative (classification of percentage of responses) and a quantitative (signal detection theory) approach confirmed this hypothesis. To our knowledge this is the first time that such a deficit has been described in the literature. All of the patients presenting right frontal lesions showed a deficit in voice categorization, thus demonstrating a significant brain/behavior association. Previous studies have reported cases of phonagnosia (Van Lancker & Canter, 1982; Van Lancker, Cummings, Kreiman, & Dobkin, 1988; Van Lancker, Kreiman, & Cummings, 1989), in which patients could not recognize familiar voices; but this deficit was associated with right parietal lesions. When discrimination of unfamiliar voices was tested, deficits were associated with temporal (left or right) lesions, although some evidence also exists for voice deficits during fronto-temporal degeneration (Hailstone, Crutch, Vestergaard, Patterson, & Warren, 2010). What remains unclear is (1) whether the deficit is specific to gender categorization or also relates to identity and (2) what the role of the right IFG is. On one hand, studies comparing attention to voice vs. speech found voice specific effects over the right STS (von Kriegstein, Eger, Kleinschmidt, & Giraud, 2003), and this effect has been related to speaker identity (Schall, Kiebel, Maess, & von Kriegstein, 2014).
On the other hand, using a continuous carry-over design allowing acoustic distances to be distinguished from perceived distances during gender categorization, Charest et al. (2012) showed that the STS processes gender (and thus identity as well) related acoustic information whilst the right IFG is involved in perceived gender related distances. In most studies, stimuli are pitch equalized. We previously showed that equalizing pitch does not influence performance or RT (Pernet et al., 2014), thus making timbre, and consequently spectrotemporal analysis, a key element in gender categorization. There is no doubt that the patients in our study found the gender categorization task difficult, even when the STS was intact (figure 2, patients 1, 2, 4, 6, 7, 8, 9). This difficulty was not related to the absence of pitch information since their pitch perception threshold did not differ from other patient groups. It has been proposed that the right IFG plays a general role in voice recognition and social communication at large, since direct connections have been demonstrated between the STG and the ventrolateral prefrontal cortex of the macaque (equivalent of IFG in humans) along with vocalization responsive cells (Romanski, 2012). Voice deficits observed here following right IFG lesions, often in association with insula lesions (although the association was not significant), could then reflect a disconnection syndrome from the STS.
Results observed in left hemisphere patients were heterogeneous. The absence of one-to-one mapping between speech deficits and brain lesions is not completely surprising since many studies have found that phonological categorization depends critically on the left supramarginal gyrus, a region not investigated in our study. Phonological deficits were also not always associated with aphasia (4 out of 16 patients), which concurs with the idea that sublexical speech perception impairments do not necessarily predict auditory comprehension deficits (Turkeltaub & Coslett, 2010). More than half of the left hemisphere patients (9 out of 16) were either impaired or unimpaired in both tasks, and in the aphasic group 4 out of 10 patients showed a double deficit and 5 others showed reduced performances, resulting in lower perceived distances between extreme stimuli (either male-female or /pa/-/ta/). One possible explanation is that aphasic patients simply did not understand the instructions. Because these same patients could, however, perform other tasks with more complex instructions (e.g. the tone discrimination task used to check their pitch perception; see Materials and Methods), we consider this pattern of results as supportive of the co-optation hypothesis. Indeed, if we accept that there is speech-selectivity in the left fronto-temporal cortex, then we have to conceive that this selectivity can be associated with the mechanisms that produce and perceive the sounds of speech (McGettigan & Scott, 2012). The dissociations observed in 3 left hemisphere patients also reveal that gender categorization deficits can be observed following left hemisphere lesions, and therefore that gender categorization is processed bilaterally.
In conclusion, through the analysis of categorization performances of right fronto-temporal stroke patients, we showed that the right frontal cortex (likely the ventral part of the IFG) plays a major role in voice/gender information processing. In contrast, left fronto-temporal patients (aphasic or not) tend to show associated performances for both voice/gender perception and speech perception, although dissociations are also possible. Together, these results lend support to the hypothesis of bilateral processing of voice information with (i) an important role of the right frontal cortex in voice categorization and (ii) both common and dedicated mechanisms, in the left hemisphere, for talker and speech information processing.