This is a short example-driven introduction to the linguistic concept of grammatical gender. So what's the textbook definition? Most generally, we can say that a language has grammatical gender or a gender system if its nouns can be grouped into classes according to the different agreements they trigger on other words. In the following we'll go over some language data to see what this definition means and which noun classes and types of agreements we find. We start with a rather boring language when it comes to gender systems, English.
- (1) A woman walked past a man. She (he) didn't notice him (her).
- (2) Peter owns a computer, but it doesn't work.
- (3) I own a dog. It (he/she) is 10 years old.
From the first two examples we can infer that English has pronominal gender agreement and that we can distinguish three classes. Depending on whether the pronoun refers to woman or man, we get she or he in subject position and her or him in object position. Inanimate entities are referred to with it. This seems in line with our definition; it just so happens that in English, grammatical gender is almost perfectly aligned with natural gender (with some notable exceptions like ships and certain countries). However, example (3) suggests that pronominal agreement does not depend on the class of the noun, but on the entity the noun refers to and on our knowledge about it. If we know the sex of the dog, we may refer to it with a feminine or masculine pronoun; otherwise we use the neuter form (animacy and 'petability' also seem to play a role). We could argue that this kind of agreement does not constitute a formal gender system, because it does not refer to any grammatical property of the noun. Some authors rule out grammatical gender altogether for languages that have only pronominal agreement.
Let's turn to a close relative of the English language that provides some more interesting gender data, German.
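- (4) ein kalter Tag // eine kalte Nacht // ein kaltes Herz ('a cold day // a cold night // a cold heart')
- (5) Der Tag ist kalt. // Die Nacht ist kalt. // Das Herz ist kalt. ('The day is cold. // The night is cold. // The heart is cold.')
- (6) der Hund/Mann // die Katze/Frau // das Haus/Mädchen ('the dog/man // the cat/woman // the house/girl')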
I only give a simple interlinear gloss with literal word-by-word translations. In (4) and (5) we have three parallel phrases and sentences. In the English translation only the nouns differ, but in German other words change as well. In (4) the article and the adjective take different suffixes; in (5) the article has a different form in each case, but the adjective remains the same. The referents of all three nouns have the same 'natural gender', so the differences cannot be attributed to the referents themselves. Let's generalize: we observe gender agreement on articles (or more broadly determiners) and attributive adjectives (a cold night), but not on predicative adjectives (the night is cold). We are dealing with three noun classes, as in English: Tag is masculine, Nacht is feminine and Herz is neuter, but these classes do not necessarily align with natural gender. (6) shows this again: the German noun for girl is grammatically neuter (forced by the diminutive suffix -chen), while dogs as a species trigger masculine and cats as a species feminine agreement.
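The definition from the introduction can be turned into a small sketch: if we record which agreement forms each noun triggers, the noun classes fall out by grouping nouns with identical agreement patterns. The table below is hand-written for illustration. The forms for Tag, Nacht and Herz come from (4) and (5); the adjective endings for the nouns in (6) are filled in from standard German and are an assumption of this sketch, as are the names `AGREEMENT` and `noun_classes`.

```python
from collections import defaultdict

# Agreement pattern per noun in the nominative singular:
# (indefinite article, attributive adjective ending, definite article).
# Hand-written toy data, not derived from a corpus.
AGREEMENT = {
    "Tag": ("ein", "-er", "der"),
    "Hund": ("ein", "-er", "der"),
    "Mann": ("ein", "-er", "der"),
    "Nacht": ("eine", "-e", "die"),
    "Katze": ("eine", "-e", "die"),
    "Frau": ("eine", "-e", "die"),
    "Herz": ("ein", "-es", "das"),
    "Haus": ("ein", "-es", "das"),
    "Mädchen": ("ein", "-es", "das"),
}


def noun_classes(agreement):
    """Group nouns that trigger exactly the same agreement pattern."""
    classes = defaultdict(list)
    for noun, pattern in agreement.items():
        classes[pattern].append(noun)
    return dict(classes)


classes = noun_classes(AGREEMENT)
for pattern, nouns in classes.items():
    print(pattern, sorted(nouns))
```

Note that the grouping never looks at the meaning of the nouns: Mädchen ends up with Haus and Herz purely because it triggers the same forms on other words.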
Here are some similar examples in French, which distinguishes only two noun classes, masculine and feminine.
- (7) Le jour est froid. // La nuit est froide. ('The day is cold. // The night is cold.')
- (8) le chat/chien // la femme/fille ('the cat/dog // the woman/girl')
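In (7) we see that in French predicative adjectives also agree with the gender of the noun. The noun classes in (7) match those in the German example, but this is a mere coincidence, as (8) shows: chat is masculine whereas German Katze is feminine, and fille is feminine whereas German Mädchen is neuter.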
In Russian we again encounter three noun classes like in German, but we observe agreement on yet another part of speech, namely the verb.
- (9) zájac bežál // sobáka bežála // nasekómoje bežálo ('a hare ran // a dog ran // an insect ran')
Verbs in the past tense mark the gender of their subject: in our example, the verb has no additional suffix with a masculine subject, takes the suffix -a with a feminine subject, and takes the suffix -o with a neuter subject.
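As a minimal sketch, the pattern in the Russian example can be written as a lookup from the subject's noun class to a verb suffix. It only covers the three singular subjects from the example; the dictionaries and the function name `past_tense_clause` are made up for illustration and are not a general model of Russian verb morphology.

```python
# Noun class of each subject in the example (toy lexicon).
NOUN_GENDER = {
    "zájac": "masculine",
    "sobáka": "feminine",
    "nasekómoje": "neuter",
}

# Suffix added to the past-tense form, selected by the subject's gender.
GENDER_SUFFIX = {"masculine": "", "feminine": "a", "neuter": "o"}


def past_tense_clause(subject, past_form="bežál"):
    """Return 'subject verb' with the verb agreeing in gender with the subject."""
    gender = NOUN_GENDER[subject]
    return f"{subject} {past_form}{GENDER_SUFFIX[gender]}"


for noun in NOUN_GENDER:
    print(past_tense_clause(noun))
```

The verb form is fully determined by the class stored for the noun, which is exactly what makes this a case of agreement.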
All of these examples came from Indo-European languages, and while we find agreement in different places (determiners, adjectives, verbs), we see similar patterns when it comes to the noun classes. There are two or three classes; nouns referring to humans are mostly masculine or feminine according to the referent's biological sex, and they are rarely neuter. The fact that we call these noun classes feminine, masculine and neuter is not a coincidence after all. The distribution of nouns with inanimate referents over noun classes, on the other hand, seems rather arbitrary: the noun referring to sun is masculine in French, feminine in German, and neuter in Russian.
McCarthy et al. (2020) investigated the similarity of gender systems across Indo-European languages and Hebrew. For each language they partitioned a concept-aligned lexicon of inanimate nouns from the core vocabulary into noun classes (using the most frequent gender for epicene nouns, nominal word forms that may have more than one gender, to avoid overlaps) and used community detection measures to compute pairwise similarities between languages. From these similarities they were able to construct phylogenetic trees that resemble the tree models posited in historical linguistics (e.g., with Romance and Slavic languages forming subgroups whose members are closer to each other than to other languages). These results suggest that grammatical gender is a formal grammatical feature that is not determined by extra-linguistic properties of the referents, as hypothesized by Boroditsky and Schmidt (2000).
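To make the idea of comparing gender systems as partitions more concrete, here is a toy sketch. It is not the method of McCarthy et al. (2020): as a simpler stand-in for their measures it uses the plain Rand index, i.e. the share of concept pairs that both languages treat alike (both put the pair in the same class, or both put it in different classes). The six-concept lexicon is hand-picked for illustration, and the names `GENDER` and `rand_index` are assumptions of this sketch.

```python
from itertools import combinations

# Gender of the usual noun for each concept (m/f/n), hand-picked toy lexicon.
GENDER = {
    "French": {"sun": "m", "moon": "f", "day": "m",
               "night": "f", "heart": "m", "house": "f"},
    "German": {"sun": "f", "moon": "m", "day": "m",
               "night": "f", "heart": "n", "house": "n"},
    "Russian": {"sun": "n", "moon": "f", "day": "m",
                "night": "f", "heart": "n", "house": "m"},
}


def rand_index(a, b):
    """Share of concept pairs on which two partitions agree.

    A pair counts as agreement if both languages put the two concepts
    in the same class, or both put them in different classes.
    """
    concepts = sorted(set(a) & set(b))
    pairs = list(combinations(concepts, 2))
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


for lang1, lang2 in combinations(GENDER, 2):
    score = rand_index(GENDER[lang1], GENDER[lang2])
    print(f"{lang1} vs {lang2}: {score:.2f}")
```

The measure only compares how the concepts are grouped, not what the classes are called, so it also works for languages with different numbers of classes. With six concepts the scores say nothing about language relatedness; the point is only to show what comparing two partitions of an aligned lexicon means.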
Nothing in our definition of grammatical gender limits the number of noun classes or refers to biological sex. If we look outside the Indo-European languages, we find systems with far more classes, which cannot be identified with natural gender. The Bantu languages are particularly famous for their many noun classes. One example is Swahili, which has 9 basic noun classes, each with a singular and a plural form. These classes are often introduced in semantic terms, but a simpler way of looking at them is, again, in terms of agreement. I won't go into any further details about Swahili noun classes, but it's good to keep in mind that the linguistic term gender derives from Latin genus, which translates to kind or sort and does not imply any relation to biological sex; this becomes obvious when we look at languages like Swahili. Whether grammatical gender influences speakers' mental representations of inanimate objects in languages where the biological sex of human referents aligns with grammatical gender is not entirely clear. See Samuel et al. (2019) for a thorough review of observational and experimental studies, and Kann (2019) and Williams et al. (2020) for recent corpus-based approaches.
Another feature of Swahili that is worth mentioning is object agreement on verbs (although the status of the agreement marker as inflection or pronoun seems to be somewhat debated). Here are two examples I copied from Seidl & Dimitriadis (1997).
- (10) Ni-li-mw-uliza Helena [...]
- (11) Tuli a-li-ya-amini maneno hayo.
In each verb, the marker directly before the stem agrees with the noun class of the object: in (10) Helena, class I; in (11) these words, class VI. This is a kind of agreement we cannot find in Indo-European languages.
Boroditsky, L., & Schmidt, L. (2000). Sex, syntax, and semantics. Proceedings of the Cognitive Science Society, 22, 43-47.
Kann, K. (2019). Grammatical Gender, Neo-Whorfianism, and Word Embeddings: A Data-Driven Approach to Linguistic Relativity. arXiv preprint arXiv:1910.09729.
McCarthy, A. D., Williams, A., Liu, S., Yarowsky, D., & Cotterell, R. (2020). Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 5664-5675).
Samuel, S., Cole, G., & Eacott, M. J. (2019). Grammatical gender and linguistic relativity: A systematic review. Psychonomic Bulletin & Review, 26(6), 1767-1786.
Seidl, A., & Dimitriadis, A. (1997). The discourse function of object marking in Swahili. CLS, 33, 17-19.
Williams, A., Cotterell, R., Wolf-Sonkin, L., Blasi, D., & Wallach, H. (2020). On the relationships between the grammatical genders of inanimate nouns and their co-occurring adjectives and verbs. arXiv preprint arXiv:2005.01204.