A High Density of Human Communication-Associated Genes in Chromosome 7q31-q36: Differential Expression in Human and Non-Human Primate Cortices

The human brain is distinguished by its remarkable size, high energy consumption, and cognitive abilities compared to all other mammals and non-human primates. However, little is known about what has accelerated brain evolution in the human lineage. One possible explanation is that the appearance of advanced communication skills and language has been a driving force of human brain development. The phenotypic adaptations in brain structure and function which occurred on the way to modern humans may be associated with specific molecular signatures in today’s human genome and/or transcriptome. Genes that have been linked to language, reading, and/or autism spectrum disorders are prime candidates when searching for genes for human-specific communication abilities. The database and genome-wide expression analyses we present here revealed a clustering of such communication-associated genes (COAG) on human chromosomes X and 7, in particular chromosome 7q31-q36. Compared to the rest of the genome, we found a high number of COAG to be differentially expressed in the cortices of humans and non-human primates (chimpanzee, baboon, and/or marmoset). The role of X-linked genes for the development of human-specific cognitive abilities is well known. We now propose that chromosome 7q31-q36 also represents a hot spot for the evolution of human-specific communication abilities. Selective pressure on the T cell receptor beta locus on chromosome 7q34, which plays a pivotal role in the immune system, could have led to rapid dissemination of positive gene variants in hitchhiking COAG.

relative terms. In proportion to body size, humans have the largest brain of all mammals. In general, this encephalization is assumed to reflect intelligence and cognitive abilities [Alba, 2010]. Human communication, in particular spoken language, is a major cognitive ability whose structural basis is the enlarged human brain. At the same time, language may have been an essential driver of the human brain evolution that produced an encephalization quotient (EQ) 3 times higher than that of chimpanzees [Jerison, 1976].
Having remained constant in Australopithecines for at least 2 million years, absolute cranial capacity started to increase about 2-3 million years ago. Homo habilis was the first Homo species to show an enhanced EQ in the fossil record [Ruff et al., 1997]. A higher EQ is associated with increased energy consumption. In fact, the adaptive changes that increased human brain metabolism may have reached their limits [Khaitovich et al., 2008]. In modern humans, the adult brain accounts for 20-25% of the total resting metabolic rate (RMR), mostly in the form of about 130 g of glucose per day [Hitze et al., 2010]. In newborns, the brain requires up to 60% of the RMR [Gibbons, 1998]. This is exceptional compared with non-human primates, whose brains account for an average of 8-9% of the RMR. According to the 'expensive tissue hypothesis' [Aiello and Wheeler, 1995], the reduced size of the human gastrointestinal tract, only about 60% of that of a similar-sized primate, together with a dietary shift from low- to high-energy-dense food, compensates for the high energy demand of the human brain. Consumption of animal meat, as evidenced by stone-tool cut marks on fossilized bones, may have improved dietary quality and supplied early humans with fatty acids for optimal brain metabolism [Cordain et al., 2001]. It is generally accepted that the increase in brain size, which took place somewhere between late Australopithecines and early Homo, was largely influenced by bipedalism and by remodeling of the jaw apparatus, which led to changed feeding habits and higher energy turnover [Jones et al., 2000].
Little is known about what has driven the increase in brain size in the human lineage. We propose a 2-step process: an initial spark causing selective pressure on a critical region in the genome, followed by adaptations for fine adjustment. It is reasonable to assume that the dramatic evolutionary changes in human brain structure and function are still reflected in today's human genome. There are numerous studies on (positive) selection of brain-specific genes, which are usually defined as genes expressed only in the brain and/or the nervous system [Enard et al., 2002]. These studies either found no evidence for positive selection in the human lineage [Kosiol et al., 2008] or even found an excess of positively selected genes in the chimpanzee [Shi et al., 2006]. Thus, changes in brain-specific genes cannot satisfactorily explain the mechanism underlying human brain evolution. As outlined by Jobling et al. [2004], one plausible approach towards identifying the genetic changes underlying human evolution is 'defining important human specific phenotypes (e.g., language, disease phenotypes) and researching their genetic bases'.
Any intentional thinking of human beings is structured and influenced by language [Lakatos and Janka, 2008]. If language acquisition is hindered, e.g., in untreated children with congenital deafness, mental development is impaired. On the other hand, once language has been acquired, even deaf, mute, and/or blind people can communicate with artificial aids. Language, reading, and autism spectrum disorders may help to identify a group of communication-associated genes (COAG). Autism spectrum disorders are characterized by impaired social interaction, communication, and language development [Tuchmann, 2003; Scherer and Dawson, 2011]. In this study, we performed in silico analyses on a panel of COAG as well as comparative gene expression analyses in frontal cortices of humans and non-human primates. Our goal was to identify possible evolutionary mechanisms that may have contributed to the development of human-specific communication abilities.
Data mining, including searches for gene information, was performed with BioMart in the Ensembl database (Ensembl genome browser release 60, November 2010). Chromosomal syntenies were delineated with Compare Genome (http://genomevolution.org/CoGe/index.pl). Biological gene functions were assigned with the Panther database (http://www.pantherdb.org). Both the COAG and the entire human Ensembl gene set were analyzed using the Panther batch search tool. Statistical analyses were performed with Fisher's exact test for count data.
Brain Samples, RNA Isolation, and cDNA Synthesis
Human brain samples (excess material from autopsies) were obtained from the Department of Legal Medicine, University Medical Center Mainz, Germany, and primate samples from the Biomedical Primate Research Centre, Rijswijk, The Netherlands, and the German Primate Center, Göttingen. Male brain samples were prepared within 1-2 days post-mortem from 3 humans (Homo sapiens), 1 chimpanzee (Pan troglodytes), 3 baboons (Papio hamadryas; Old World monkey), and 5 marmosets (Callithrix jacchus; New World monkey). The human brains were from two 40-year-old accident victims and a 59-year-old man who died of acute heart failure. The chimpanzee brain was from a 14-year-old (late adolescent) animal that died of hemolytic anemia. The baboon brains came from two 9-year-old (subadult) animals and one 30-year-old (adult) animal, which were terminated for experimental brain surgery. The marmoset brains came from five 2-year-old (adult) animals that were terminated for this study. Age classes of the chimpanzee [Goodall, 1983], baboons [Sigg et al., 1982], and marmosets [Aroujo et al., 2000] were determined according to established standards.
Because humans and great apes have similar brain architecture [Semendeferi et al., 2001; Sherwood et al., 2003], area A10 was excised from the frontal pole of human and chimpanzee brains. The baboon and marmoset samples were taken from the corresponding topological region. Frontal cortex tissue was frozen immediately after dissection and stored at -80 °C. Total RNA was isolated using TRIzol Reagent (Invitrogen, Darmstadt, Germany). The absorbance ratio at 260 and 280 nm was determined with a Nanodrop spectrophotometer (Thermo Scientific, Wilmington, Del., USA). With a single exception, the measured ratios were between 1.92 and 2.04, indicative of pure, high-quality RNA. RNA was amplified with the Illumina TotalPrep RNA Amplification Kit (Ambion, Austin, Tex., USA). T7 promoter-containing cDNA was produced by reverse transcription of 400 ng total RNA per sample. This cDNA was subsequently transcribed into cRNA in the presence of biotinylated nucleotides.

Array Hybridization and Analysis
The labeled cRNA samples of different individuals from the same species (3 humans, 1 chimpanzee, 3 baboons, and 5 marmosets) were pooled. The 4 pools were hybridized under the same stringency conditions to 4 (of the 6) arrays on a Sentrix Human-6 Expression BeadChip (Illumina, San Diego, Calif., USA), which contains probes for >48,000 transcripts. After washing, the arrays were stained with Cy3-streptavidin and scanned with an Illumina BeadStation 500. The Illumina BeadStudio software was used for data analysis, and the data were normalized with the 'rank invariant normalization' algorithm. Only genes passing the quality thresholds (diffScore below -13 or above +13, detection p value < 0.05) were considered. A positive diffScore represents upregulation, and a negative diffScore represents downregulation.
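The probe-level filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the BeadStudio implementation: it assumes, as our reading of Illumina's documentation, that the DiffScore is a signed, log-scaled p value of the form 10 × sgn(Δ) × |log10 p|, under which a cutoff of ±13 corresponds to a differential-expression p value of roughly 0.05. The `ProbeResult` record, its field names, and the gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    gene: str           # hypothetical gene identifier
    diff_score: float   # signed DiffScore; positive = higher than the reference
    detection_p: float  # detection p value of the probe


def p_from_diffscore(diff_score: float) -> float:
    """Convert a DiffScore back to a p value, assuming |DiffScore| = 10 * |log10(p)|."""
    return 10 ** (-abs(diff_score) / 10)


def call_differential(results, threshold=13.0, max_detection_p=0.05):
    """Split probes into up- and downregulated gene lists using the cutoffs from the text."""
    up, down = [], []
    for r in results:
        if r.detection_p >= max_detection_p:
            continue  # probe not reliably detected
        if r.diff_score > threshold:
            up.append(r.gene)
        elif r.diff_score < -threshold:
            down.append(r.gene)
    return up, down


if __name__ == "__main__":
    demo = [
        ProbeResult("GENE_A", 25.0, 0.001),
        ProbeResult("GENE_B", -18.0, 0.01),
        ProbeResult("GENE_C", 40.0, 0.20),   # fails the detection filter
        ProbeResult("GENE_D", 5.0, 0.001),   # below the DiffScore cutoff
    ]
    print(call_differential(demo))
    print(round(p_from_diffscore(13), 4))
```

With these hypothetical inputs, the first line prints `(['GENE_A'], ['GENE_B'])`: GENE_C is excluded by the detection filter and GENE_D by the DiffScore cutoff.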
When non-human primate samples are hybridized on a human array, sequence mismatches between the primate sample and the human oligonucleotide sequences on the array may reduce the signal intensity and, thus, be misinterpreted as reduced gene expression [Dannemann et al., 2009]. In order to avoid this mismatch problem, we only considered genes that appeared to be upregulated in baboon and marmoset compared to humans. It is very unlikely that stronger hybridization of the primate sample to a human array is due to sequence divergence. Because of high conservation of coding sequences between humans and chimpanzees [Chimpanzee Sequencing and Analysis Consortium, 2005;Varki and Altheide, 2005], both upregulated and downregulated genes were analyzed in this species.

Distribution of COAG in the Human Genome
We performed an extensive literature and database search to select a panel of 244 communication-associated genes (table 1). Eight COAG were associated with language disorders, 4 with dyslexia, 2 with schizophrenia and language impairment, and 230 with autism spectrum disorders (ASD). This asymmetry reflects both the enormous research effort on autism and our still limited knowledge of the genetic basis of language development. In a first step, we assigned biological functions to the 244 COAG and to all genes (23,921) with chromosomal assignments. There were no statistically significant between-group differences (fig. 1).
We then compared the chromosomal distribution of the 244 COAG with that of all Ensembl genes with chromosomal assignments (table 2). The expected number of COAG on a particular chromosome was calculated by multiplying the total number of COAG (244) by the proportion of all Ensembl genes (23,921) located on that chromosome. Using Fisher's exact test, COAG were considered enriched or depleted on a particular chromosome when the observed number of COAG was significantly higher or lower than the expected number. Chromosomes 7 (33 COAG; p = 0.0001, 95% CI 1.89-3.86) and X (35 COAG; p = 0.003, 95% CI 1.16-2.32) were enriched with COAG, whereas chromosomes 1 (15 COAG; p = 0.004, 95% CI 0.26-0.77) and 19 (3 COAG; p = 0.003, 95% CI 0.04-0.58) contained fewer COAG than expected. Applying a strict Bonferroni adjustment for 24 parallel tests lowers the 5% significance threshold to p = 0.05/24 ≈ 0.002. Thus, even after correcting for multiple testing, the enrichment of chromosome 7 with COAG remained significant. It is worth emphasizing that 17 of the 33 COAG on chromosome 7, including FOXP2 and CNTNAP2, are clustered in 7q31-q36.

Expression Differences in Human and Non-Human Primate Cortices
COAG expression in 3 human, 1 chimpanzee, 3 baboon, and 5 marmoset cortex samples was analyzed with an Illumina Expression BeadChip. In total, 24,385 transcripts produced detectable hybridization signals on this array. Our first goal was to identify genes with higher expression levels in primate cortices than in the human cortex, because this largely excludes interspecific hybridization artifacts due to sequence divergence. Compared to humans, 1,489 genes appeared to be upregulated in chimpanzee cortex, 3,893 in baboon cortex, and 4,659 in marmoset cortex (fig. 2). Seventy-seven (2.0%) of the 3,893 genes upregulated in baboon and 79 (1.7%) of the 4,659 genes upregulated in marmoset belong to the COAG group, implying that the sets of genes upregulated in Old and New World monkeys are significantly enriched with COAG (p < 0.001; 95% CI 1.55-2.47 in baboon and 1.33-2.09 in marmoset). In contrast, only 11 COAG showed higher expression levels in the chimpanzee than in the human brain.
For the identification of genes that are upregulated in the human cortex, we relied only on the human-chimpanzee comparison. Owing to the high degree of coding sequence conservation between humans and chimpanzees, most detected interspecies hybridization differences can be assumed to reflect true expression differences. Altogether, we found 2,750 genes with higher expression levels in the human than in the chimpanzee cortex. A conceptually related study [Nowick et al., 2009] revealed 2,182 upregulated genes in the human cortex; 508 genes were identified in both studies. Fifty-nine (2.1%) of the 2,750 human upregulated genes in our study are COAG, a significant enrichment (p < 0.001, 95% CI 1.87-3.17).
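As a rough descriptive check of these enrichments, the COAG share of each gene set can be divided by the genome-wide COAG share. This Python sketch assumes that the background is 244 COAG among 23,921 Ensembl genes; the text does not state which background set was used for the reported Fisher tests, and the fold enrichment computed here is a simple ratio of proportions rather than the odds ratio underlying the reported confidence intervals.

```python
GENOME_COAG, GENOME_GENES = 244, 23_921  # assumed background: whole Ensembl gene set

# (COAG in the set, size of the set) for each comparison reported above
COMPARISONS = {
    "higher in baboon than human": (77, 3_893),
    "higher in marmoset than human": (79, 4_659),
    "higher in chimpanzee than human": (11, 1_489),
    "higher in human than chimpanzee": (59, 2_750),
}


def fold_enrichment(coag_in_set: int, set_size: int,
                    bg_coag: int = GENOME_COAG, bg_genes: int = GENOME_GENES) -> float:
    """COAG proportion in a gene set divided by the background COAG proportion."""
    return (coag_in_set / set_size) / (bg_coag / bg_genes)


if __name__ == "__main__":
    for label, (k, n) in COMPARISONS.items():
        print(f"{label}: {k / n:.1%} COAG, {fold_enrichment(k, n):.2f}-fold")
```

Under this assumed background, the three primate-versus-human and human-versus-chimpanzee sets that the text describes as enriched all come out above 1, whereas the chimpanzee-upregulated set comes out below 1.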
Because we were mainly interested in genes with conserved expression in non-human primates and human-specific up- or downregulation, we filtered out genes showing opposite expression changes in different non-human primates. For example, GABRB3 showed higher expression levels in baboon and marmoset, but a lower expression level in chimpanzee, compared to humans. Table 3 presents the remaining 23 COAG with human-specific upregulation and 38 COAG with human-specific downregulation. Of the latter group, 34 genes showed increased expression levels in baboon and marmoset cortex, 2 in chimpanzee and marmoset cortex, and 4 in chimpanzee and baboon cortex. These subsets overlap: one gene, SLC9A9, encoding a sodium hydrogen exchanger, was more highly expressed in all 3 non-human primates than in humans. When only COAG that are differentially regulated in human and non-human primate cortices are considered, chromosome 7 also shows a highly significant (p < 0.001) enrichment. Six (CNTNAP2, DPP6, IMMP2L, LAMB1, LRRN3, and UBE2H) of the 23 genes with human-specific upregulation and 6 (AUTS2, CADPS2, CTTNBP2, FASTK, GRM8, and ST7) of the 38 genes with human-specific downregulation are located on chromosome 7.
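The filtering logic can be expressed as a small classification function. This Python sketch is one possible reading of the rules above, not the authors' code: calls of lower expression in baboon or marmoset are ignored because of the probe-mismatch problem, a gene counts as human-specifically downregulated if it is higher in at least 2 non-human primates (an assumption inferred from the species pairs listed above), and a gene that is higher in one non-human primate but lower in chimpanzee is discarded as conflicting. The direction codes and labels are hypothetical.

```python
# Direction codes relative to human: +1 = higher in the non-human primate,
# -1 = lower in the non-human primate, 0 = no significant difference.
TRUSTED_LOWER_CALLS = {"chimpanzee"}  # lower-expression calls in monkeys are ignored


def classify(calls: dict[str, int]) -> str:
    """Classify one gene from its per-species calls (a sketch, not the authors' code)."""
    higher = {sp for sp, d in calls.items() if d > 0}
    lower = {sp for sp, d in calls.items() if d < 0 and sp in TRUSTED_LOWER_CALLS}
    if higher and lower:
        return "conflicting"          # e.g. the GABRB3 pattern described above
    if lower:
        return "human-specific up"    # higher in human than in chimpanzee
    if len(higher) >= 2:
        return "human-specific down"  # higher in at least 2 non-human primates
    return "unclassified"


if __name__ == "__main__":
    print(classify({"chimpanzee": -1, "baboon": 1, "marmoset": 1}))  # GABRB3-like
    print(classify({"chimpanzee": 1, "baboon": 1, "marmoset": 1}))   # SLC9A9-like
```

The two example calls print `conflicting` and `human-specific down`, respectively.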

Chromosome 7q31-q36
Both in silico and expression analyses identified chromosome region 7q31-q36 as a hot spot for COAG. It contains 17 (7%) of the 244 COAG, including the 2 most prominent language genes, FOXP2 and CNTNAP2 (fig. 3). FOXP2 and CNTNAP2 are not only physically but also functionally linked [Vernes et al., 2008]. Five COAG on 7q31-q36, namely ST7, CTTNBP2, CADPS2, GRM8, and FASTK, were downregulated in the human cortex compared to non-human primates. Three genes, LRRN3, IMMP2L, and CNTNAP2, were upregulated in the human cortex compared to chimpanzee. Thus, 8 (13%) of our 61 top candidate genes for human-specific regulation (table 3) are clustered in this region, which represents only approximately 1.5% of the entire genome. Consistent with the results of Nowick et al. [2009], FOXP2 was not differentially regulated between the human and non-human primate cortices. In addition to the COAG cluster, 7q31-q36 contains the T cell receptor beta locus, which is important for T cell activation in response to specific antigens.
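The degree of clustering can be put into numbers by comparing the share of genes that fall in 7q31-q36 with the share of the genome the region covers. This Python sketch uses only the figures given above; it is a rough, length-based comparison that assumes genes are spread evenly along the genome, which they are not, so a proper expectation would use the number of Ensembl genes mapped to the region.

```python
REGION_FRACTION = 0.015  # approximate share of the genome covered by 7q31-q36 (from the text)


def length_based_enrichment(in_region: int, total: int,
                            region_fraction: float = REGION_FRACTION) -> float:
    """Share of genes in the region divided by the region's share of the genome."""
    return (in_region / total) / region_fraction


top_candidates = length_based_enrichment(8, 61)   # 8 of 61 top candidate genes
all_coag = length_based_enrichment(17, 244)       # 17 of 244 COAG
print(f"{top_candidates:.1f}-fold, {all_coag:.1f}-fold")  # prints 8.7-fold, 4.6-fold
```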

Discussion
There are many hypotheses about the evolution of language and the human brain. One widely accepted theory claims that language was the main driving force for human brain evolution [Jerison, 1976]. On the other hand, even non-human primates appear to exhibit a predisposition to language [Cooper, 2006]. Thus, the genetic basis of language may have arisen long before language appeared [Chater et al., 2009]. Two or 3 million years ago, an Australopithecine species may have acquired communication abilities which were still far away from spoken language but already distinct from ape-like communication. This predisposition may have paved the way for the rapid evolution of a large brain and spoken language. Because it is not possible to analyze the genomes of Australopithecines, inferential methods, i.e., comparisons between human and non-human primate species, must be used to identify molecular signatures of language evolution in today's human genome and transcriptome.
Fig. 3. Enrichment of chromosome 7q31-q36 with 17 COAG. Genes with similar expression levels in human and primate cortices are indicated by black circles, genes that are downregulated in the human cortex (compared to chimpanzee, baboon, and/or marmoset) by blue circles, and genes that are upregulated in the human cortex (compared to chimpanzee) by red circles. Gene names are written below the circles. FOXP2 binding to CNTNAP2 (indicated by an arrow) leads to downregulation of CNTNAP2. TRB (indicated by a green box) represents the T cell receptor beta locus.
First, we delineated a set of COAG and examined their biological functions. Because language is a very complex trait, it is not unexpected that COAG are involved in many different biological processes and that there is no obvious enrichment for genes with a particular function compared to the entire human gene catalogue. Probably because of the high energy requirements of the human brain [Peters and Langemann, 2009], a large proportion of both COAG (16%) and all genes (21%) is associated with metabolic processes. COAG are significantly enriched on chromosomes 7 (33 genes) and X (35 genes). The latter is not unexpected, because the X chromosome has accumulated a disproportionate number of genes for mental functioning and social cognition [Zechner et al., 2001; Skuse, 2005]. With the exception of RPL10, which resides on chimpanzee chromosome 19, 34 of the 35 human X-linked COAG are located on the orthologous primate X chromosomes. The hypothetical last common ancestor of primates carried 2 homologs of human chromosome 7, called 7a (syntenic to 7p21-q11.21, 7q11.23-q21.3, and 7q22.1-qter) and 7b (syntenic to 7pter-p22, 7q11.21-q11.23, and 7q21.3-q22.1). A centric fusion of 7a and 7b in a common ancestor of extant great apes and humans generated the ancestral chromosome 7 of Hominidae [Müller et al., 2004; Froenicke, 2005]. This implies that the 33 COAG on human chromosome 7 have travelled together on the same chromosome for at least 15-20 million years (the estimated split time of humans and great apes) [Goodman et al., 1998; Enard and Pääbo, 2004]. The cluster of 17 (7%) COAG on human chromosome 7q31-q36 was already present on the ancestral primate chromosome 7a more than 80 million years ago, although the gene order on this chromosome may have been extensively reshuffled in the course of evolution. We conclude that highly conserved arrays of genes were selected during primate evolution, long before these genes were recruited for advanced communication abilities. Similar to transcriptional operons in bacteria, functionally cooperating genes also tend to cluster in higher eukaryotic genomes.
This clustering could facilitate their coordinated regulation, e.g., by reducing the cost of chromatin unpacking for transcription [Lee and Sonnhammer, 2003]. In this light, it is tempting to speculate that coexpression and fine regulation of the 7q31-q36 COAG cluster in the human cortex have been important for the acquisition of human-specific communication abilities and language. Indeed, at least 2 genes in this cluster are known to interact functionally: the transcription factor FOXP2 binds to specific sites in the first intron of CNTNAP2 and downregulates it [Vernes et al., 2008].
The extremely high similarity of the human and chimpanzee genomes [Chimpanzee Sequencing and Analysis Consortium, 2005; Varki and Altheide, 2005] argues in favor of the notion that phenotypic differences between humans and chimpanzees are based on differences in gene regulation. Indeed, comparative transcriptome analyses revealed subsets of genes with different expression levels in human and chimpanzee brains [Caceres et al., 2003; Gilad et al., 2006; Khaitovich et al., 2006; Nowick et al., 2009; Somel et al., 2009]. In this study, we used human expression arrays to compare the COAG expression profiles in human, chimpanzee, baboon, and marmoset cortices. About 12 million years of evolution can be assumed for the human-chimpanzee, 45 million years for the human-baboon, and 80 million years for the human-marmoset pairwise comparisons [Goodman et al., 1998; Enard and Pääbo, 2004]. One of our main interests was to find genes with similar expression levels in different non-human primates, including chimpanzee and Old and New World monkeys, but changed expression levels in the human brain. When a human array is used to measure differential gene expression across primate species, probe sequence mismatches due to sequence divergence make it difficult to identify genes that are downregulated in Old and New World monkeys. In contrast to conceptually related studies [e.g., Somel et al., 2009], we did not mask all probes on the array that did not perfectly match the DNA sequences of the species examined. The Illumina bead technology used in our study is based on rather long oligonucleotides compared to other platforms. Consequently, relatively few oligonucleotides are free of sequence mismatches in all 4 analyzed species. On the other hand, long oligonucleotides are more robust against hybridization artifacts. Because sequence mismatches are unlikely to improve hybridization efficiency, we focused on genes with higher expression levels in non-human primates.
Another unfavorable factor for this type of study is the limited availability of primate brain samples suitable for high-quality RNA preparation, which makes it difficult to minimize stochastic effects. Previous studies demonstrated considerable intraspecific variation in epigenetic gene regulation in both human and non-human primate brains [Farcas et al., 2009]. Although we used only male brain samples from subadult and adult individuals, it is impossible to control for all possible confounding factors.
A large proportion (20%) of the 244 analyzed COAG was downregulated in the human cortex compared to both baboon and marmoset. Relatively few (5%) COAG were downregulated in humans compared to chimpanzee, and only 1 gene, SLC9A9, was downregulated in humans compared to all 3 analyzed non-human primates. This may be partially explained by the fact that the transcriptomes of humans and chimpanzees are more similar to each other than to those of Old or New World monkeys. On the other hand, only 1 chimpanzee brain was available for expression analyses, whereas 3 baboons and 5 marmosets could be analyzed. In contrast to Old and New World monkeys, the chimpanzee coding sequence is highly conserved relative to humans, with approximately 1 mismatch per 100 bp [Chimpanzee Sequencing and Analysis Consortium, 2005], which allows genes that are downregulated in chimpanzee to be identified using a human array. In silico analyses confirmed that for the vast majority of chimpanzee downregulated genes, the 50mer oligonucleotides on the array exhibit either no mismatch or a single mismatch, which is unlikely to interfere with hybridization efficiency. Altogether, 59 (24%) of the 244 COAG appeared to be downregulated in the chimpanzee cortex compared to humans. However, as outlined above, results based on 1 cortex sample have to be interpreted with caution. Nevertheless, our study clearly demonstrates an enrichment of COAG among all genes that are differentially regulated in human and non-human primate cortices.
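The argument about probe mismatches can be made concrete with a short Python sketch: a mismatch count between a probe and an aligned orthologous sequence, plus the binomial probability that a 50-mer carries at most one mismatch at a divergence of about 1 per 100 bp. The probe and target sequences are hypothetical, and the binomial model assumes independent, uniformly distributed substitutions with no insertions or deletions, which real orthologous sequences do not strictly satisfy.

```python
from math import comb


def count_mismatches(probe: str, target: str) -> int:
    """Mismatches between a probe and an aligned, equally long target (no gaps)."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(p != t for p, t in zip(probe.upper(), target.upper()))


def prob_at_most_k_mismatches(length: int = 50, divergence: float = 0.01, k: int = 1) -> float:
    """Binomial probability of <= k mismatches, assuming independent substitutions."""
    return sum(comb(length, i) * divergence**i * (1 - divergence) ** (length - i)
               for i in range(k + 1))


probe = "ACGT" * 12 + "AC"               # hypothetical 50-mer, not a real array probe
target = probe[:20] + "T" + probe[21:]    # hypothetical ortholog with one substitution

print(count_mismatches(probe, target))        # 1
print(round(prob_at_most_k_mismatches(), 3))  # 0.911
```

Under these simplifying assumptions, a 50-mer carries 0.5 mismatches on average, and about 91% of probes would show no or one mismatch.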
Because the COAG set relies mostly on autism-related genes, it includes genes that may be involved in many sorts of communication, not only language. In addition to impaired language development, autistic persons may exhibit stereotypic, ritualistic, and compulsive behaviors, limited social interactions, and sensory problems [Tuchmann, 2003; Scherer and Dawson, 2011]. Evidently, non-human primates do not have language, but they cannot afford to indulge in repetitive or self-injurious behaviors. So far there is no general agreement on what is unique to human language. Tomasello [2008] considers a 'theory of mind' to be the basis for the faculty of language. Advanced (intentional rather than reflexive) communication abilities in humans require assessment of the mental states of others. This ability is severely impaired in autistic persons, which makes ASD genes prime candidates for COAG. Although some genes that have been associated with ASD, for example HLA-A and HLA-DRB, are unlikely to be related to communication, we did not eliminate them from our COAG set to avoid any selection bias.
Several observations support the notion that human chromosome 7q31-q36 may represent a hot spot for the evolution of advanced communication abilities. First, this region is enriched with COAG that show human-specific upregulation (LRRN3, IMMP2L, and CNTNAP2) or downregulation (ST7, CTTNBP2, CADPS2, GRM8, and FASTK). The FOXP2-CNTNAP2 pathway has been related to speech and language development as well as to different neurodevelopmental disorders, including Gilles de la Tourette syndrome, schizophrenia, autism, and attention deficit hyperactivity disorder [Fisher and Scharff, 2009; Newbury and Monaco, 2010]. Second, this region contains the T cell receptor beta locus, which plays an essential role in the immune system. In T cells, antigen receptor genes are assembled by V(D)J recombination from germline V, D, and J segments, creating much of the diversity of the immune system [Bonnet et al., 2009]. Genes involved in immune defense are targets of rapid evolution, most likely because they have a direct impact on the organism's fitness in the battle with pathogens [Kosiol et al., 2008]. Selective pressure on the T cell receptor beta locus could have extended to a linked haplotype carrying favorable COAG variants. Australopithecines most likely improved the protein content of their diet by scavenging, which increased the risk of infection through exposure to rotten carcasses [Teaford and Ungar, 2000]. Epidemic diseases may have dramatically reduced the number of breeding individuals (Ne) from time to time [Wright, 1931]. Consistent with a bottleneck effect [Nei et al., 1975], individuals with an immune system suited to new challenges had a higher chance of surviving and of transmitting hitchhiking favorable COAG variants to subsequent generations. Such selective pressure could have rapidly increased the frequency of a COAG haplotype predisposing to language acquisition and/or brain development.
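The hitchhiking scenario can be illustrated with a textbook deterministic model of two linked loci in a haploid population. In this Python sketch, which is a toy model and not an analysis performed in this study, allele A stands for a favorable variant at the selected (immune) locus and allele B for a hypothetical neutral COAG variant that initially occurs only on A-carrying haplotypes; the selection coefficient, starting frequency, recombination fractions, and number of generations are arbitrary illustrative values. Each generation, selection acts on haplotype frequencies, and recombination then reduces the linkage disequilibrium D by the fraction r.

```python
def hitchhike(s: float = 0.1, r: float = 0.0, generations: int = 200,
              p_start: float = 0.01) -> float:
    """Final frequency of a neutral allele B linked to a selected allele A.

    Haploid, deterministic two-locus model: A has relative fitness 1 + s,
    B starts only on A-carrying haplotypes, r is the recombination fraction.
    """
    x = {"AB": p_start, "Ab": 0.0, "aB": 0.0, "ab": 1.0 - p_start}
    for _ in range(generations):
        # Selection on locus 1: haplotypes carrying A are favored
        fitness = {h: (1.0 + s) if h[0] == "A" else 1.0 for h in x}
        mean_w = sum(x[h] * fitness[h] for h in x)
        x = {h: x[h] * fitness[h] / mean_w for h in x}
        # Recombination reduces linkage disequilibrium D by the fraction r
        d = x["AB"] * x["ab"] - x["Ab"] * x["aB"]
        x["AB"] -= r * d
        x["ab"] -= r * d
        x["Ab"] += r * d
        x["aB"] += r * d
    return x["AB"] + x["aB"]


if __name__ == "__main__":
    for r in (0.0, 0.01, 0.5):
        print(f"r = {r}: final frequency of B = {hitchhike(r=r):.3f}")
```

With tight linkage (r = 0), B is carried to high frequency together with A; with free recombination (r = 0.5), B barely changes, because its association with A decays before selection can act on it.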
Future bioinformatic analyses estimating the age of 7q31-q36 haplotypes may help to elucidate the time window when the favorable COAG haplotype arose during human evolution.