Creole Language Evolution in an Agent-Based Model

This is an academic paper for the Computational Economics course at Davidson College with Dr. Shyam Gouri Suresh, written on November 12, 2021. It was a group project with Clay Tribus ’22 and Sam Cascio ’22.

  1. Introduction

Creole languages emerge when groups of people who do not share a common language are required to communicate (Siegel 2008). For example, slaves and slave-owners on a plantation need to communicate in order to work, but they come from diverse origins without any mutual vocabulary. Characteristically, a creole forms primarily from the lexicon of one of the languages in the environment, usually the vernacular of the group that controls the region where the contact occurs (Satterfield 2008). These contact languages develop as individuals begin improvising their own ways of communicating, using words and phrases they hear from the dominant group or that they expect others to know. If the groups of disparate origins remain in frequent contact and several members of the subordinate group start to use the resulting proto-language, called a pidgin, on a regular basis, a new language results (Siegel 2008).

We wanted to analyze how different network structures affect the creation of creole languages among agents. Pidgin and creole languages form in different ways and serve different functions. Some pidgins continue to be learned and passed down as secondary languages used only when necessary (Siegel 2008). In other cases, a stable pidgin becomes so widely used that it gains official status: some creole languages are now recognized as the official languages of particular political territories (Muysken 2016). This analysis uses different network models, such as global and small-world structures, to measure language development under the conditions of creole formation. The remainder of this paper is organized as follows. Section II describes the model’s structure and explains the simulation code. Section III explores the results of our analysis. Section IV discusses the implications of these results and presents ideas for further research.

  2. Description

We constructed artificial societies to examine the development of language under different network compositions and power dynamics. Agents interact based on their adjacency in a network, and we track linguistic transmissions and developments. We use a simplified adaptive strategy for language acquisition where convergent language emerges from the simple interactions of many individual elements. Our study aims to model the basic demographic and power dynamics of a generic slave-ownership scenario and compare language formation under different network compositions.  

To model the demographic and socio-cultural features of artificial societies, we use a set of initializing factors that specify the experimental environment: numAgents, numGroup, domGroup, ratioDomToNot, networkType, and numRounds. In all our test cases, the number of agents is 54, the number of groups is 3, and the number of rounds is 30. domGroup is a boolean that indicates whether dominance is present in the network. If it is true, the first group has superiority over the other two groups: the first group represents the slave-owners, and the other two groups represent slaves who come from two different ethnographic backgrounds. We discuss how dominance affects language acquisition below. ratioDomToNot is a value between 0 and 1 that gives the share of the population in the dominant group. In all our trials, the dominant group makes up 5% of the population.
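The initializing factors can be collected in a small configuration object. The Python sketch below is our illustration, not the simulation code itself: the class name `SimulationConfig` and the helper `group_sizes` are hypothetical, and the rule that turns `ratioDomToNot` into group sizes (round to the nearest whole agent, then split the remaining agents as evenly as possible between the subordinate groups) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimulationConfig:
    """Initializing factors named in the paper; defaults match the test cases."""
    numAgents: int = 54
    numGroup: int = 3
    domGroup: bool = True        # if True, group 0 (the first group) is dominant
    ratioDomToNot: float = 0.05  # share of the population in the dominant group
    networkType: int = 1         # 1 to 4, see Appendix A
    numRounds: int = 30


def group_sizes(cfg: SimulationConfig) -> list[int]:
    """Hypothetical helper that splits the agents into groups.

    Assumption: with dominance, the dominant group gets
    round(numAgents * ratioDomToNot) agents (at least one) and the rest are
    split as evenly as possible; without dominance, all groups are split evenly.
    """
    if cfg.domGroup:
        dom = max(1, round(cfg.numAgents * cfg.ratioDomToNot))
        rest, k = cfg.numAgents - dom, cfg.numGroup - 1
        return [dom] + [rest // k + (1 if i < rest % k else 0) for i in range(k)]
    n, k = cfg.numAgents, cfg.numGroup
    return [n // k + (1 if i < n % k else 0) for i in range(k)]
```

Under these assumptions, the default settings give a dominant group of 3 agents and subordinate groups of 26 and 25 agents.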

Our model structure is built on the Trade Networks model (Wilhite 2001). This model uses different network structures, i.e., crossovers and overlaps, to measure the interaction between agents and clustered groups. The variable networkType is an integer from 1 to 4 that specifies which network type the agents are organized into. Appendix A displays the different network compositions. Network type 1 models an independent network with three groups that interact internally but not with other groups, because there are no crossovers or overlaps. We use this dispersed network as a control to observe the effect of connected networks. Network type 2 models a global network in which one group contains all of the agents. Network type 3 is a small-world network with one overlap at the end agent of each group. Network type 4 also models a small-world network, but with no overlaps and ten crossovers among the groups.
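A minimal sketch of how the four layouts could be built as adjacency sets. It assumes that every group is fully connected internally, as in a trade network, that an overlap means the end agent of one group also joins the next group (wrapping around), and that a crossover is a random link between two agents in different groups. The function `build_network` and these details are hypothetical; Appendix A shows the compositions actually used.

```python
import random


def build_network(group_sizes, network_type, num_crossovers=10, seed=None):
    """Return an adjacency dict {agent: set of neighbors} for network types 1-4.

    Hypothetical layouts (assumptions, not the model's code):
      1: independent groups, fully connected inside, no links between groups
      2: global network, one fully connected group containing every agent
      3: small-world overlap, the end agent of each group also joins the next group
      4: small-world crossover, independent groups plus random links between groups
    """
    if network_type not in (1, 2, 3, 4):
        raise ValueError("network_type must be 1, 2, 3, or 4")
    rng = random.Random(seed)
    n = sum(group_sizes)

    # Give each group consecutive agent ids: group 0 is 0..s0-1, and so on.
    groups, start = [], 0
    for size in group_sizes:
        groups.append(list(range(start, start + size)))
        start += size

    if network_type == 2:
        members_by_group = [list(range(n))]
    elif network_type == 3:
        members_by_group = [list(g) for g in groups]
        for i, g in enumerate(groups):
            # The last agent of group i also belongs to group i + 1 (wrapping around).
            members_by_group[(i + 1) % len(groups)].append(g[-1])
    else:
        members_by_group = groups

    # Everyone in a group can talk to everyone else in that group.
    adj = {a: set() for a in range(n)}
    for members in members_by_group:
        for a in members:
            adj[a].update(m for m in members if m != a)

    if network_type == 4:
        # Crossovers: extra links between agents in different groups.
        group_of = {a: gi for gi, g in enumerate(groups) for a in g}
        added = 0
        while added < num_crossovers:
            a, b = rng.sample(range(n), 2)
            if group_of[a] != group_of[b] and b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                added += 1
    return adj
```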

To initialize the languages within the simulation, the function makeDictionary creates a unique list of words for each agent. Each word is four letters long and follows the consonant-vowel pattern CVCV. Because we want each group to start with some commonality of language, we initialize two arrays for each group, freqConsonants and freqVowels, which give a weight for each consonant and vowel in the alphabet. The words for each agent are randomly generated using the weights of the agent’s group.
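A sketch of the dictionary initialization, assuming a dictionary maps each meaning (such as "sun") to one CVCV word and that each group's letter weights are simply random numbers. `makeDictionary`, `freqConsonants`, and `freqVowels` are named in the text, but the signatures here, the helper `make_group_weights`, and the list of meanings are assumptions.

```python
import random
import string

VOWELS = "aeiou"
CONSONANTS = [c for c in string.ascii_lowercase if c not in VOWELS]


def make_group_weights(rng):
    """Hypothetical per-group letter weights (freqConsonants, freqVowels)."""
    freqConsonants = [rng.random() for _ in CONSONANTS]
    freqVowels = [rng.random() for _ in VOWELS]
    return freqConsonants, freqVowels


def makeDictionary(meanings, freqConsonants, freqVowels, rng):
    """Give an agent one CVCV word per meaning, drawn with its group's letter weights."""
    def word():
        c1, c2 = rng.choices(CONSONANTS, weights=freqConsonants, k=2)
        v1, v2 = rng.choices(VOWELS, weights=freqVowels, k=2)
        return c1 + v1 + c2 + v2
    return {m: word() for m in meanings}
```

Agents in the same group share `freqConsonants` and `freqVowels`, so their words are drawn from the same skewed letter distribution, which is what gives each group its initial commonality.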

Agents’ dictionaries are randomly constructed, so when agents come in contact with mutually unintelligible languages, they must work out what each other’s words mean. In a simple model of language exchange without dominance, one possible sequence of interactions is the following:

Agent A’s word for sun: “zeze”
Agent B’s word for sun: “tiko”
Agent C’s word for sun: “zeto”

Agent A and Agent B have a conversation in which they use the word for sun, saying “zeze” and “tiko.” The agents realize they are both trying to say “sun” but have drastically different words for it. Thus, Agent A and Agent B both add each other’s words to their short-term memory.

Agent A’s word for sun: “zeze”. Short term memory = {“tiko”}
Agent B’s word for sun: “tiko”. Short term memory = {“zeze”}
Agent C’s word for sun: “zeto”


Agent A and Agent C have a conversation and say “zeze” and “zeto” to each other. They realize that their words for sun are remarkably similar, differing in only two letters. One agent in the conversation, chosen at random, changes their word for sun slightly to form a morph-word. Suppose Agent C changes their word to “zezo.”

Agent A’s word for sun: “zeze”. Short term memory = {“tiko”}
Agent B’s word for sun: “tiko”. Short term memory = {“zeze”}
Agent C’s word for sun: “zezo”


Agent B and Agent C have a conversation and say “tiko” and “zezo” to each other. These words are not very similar, so Agent B tries a different word for sun from their short-term memory to see if Agent C knows it. Agent B says “zeze,” which differs from Agent C’s word by only one letter. Because the words are so similar, one of the agents simply adopts the other’s word. Suppose Agent B adopts Agent C’s word.

Agent A’s word for sun: “zeze”. Short term memory = {“tiko”}
Agent B’s word for sun: “zezo”. Short term memory = {“zeze”}
Agent C’s word for sun: “zezo”

Notice that in this simple example with just three agents, the agents quickly converge on similar words for sun. This is how language develops in our model. Agents also update their vocabulary in a way not demonstrated above: if an agent hears the same word three times, they change their word to match it. Note that an agent with very poor short-term memory, able to remember only one word, does not acquire language in this way. Agents’ short-term memory is not infinite: each individual has a randomly assigned short-term memory capacity between 1 and 5 words, and this random variation represents learning differences between individuals.
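The exchange rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the model’s code: similarity is measured as the number of differing letters, one differing letter triggers adoption, two trigger a morph-word, short-term memory is kept separately for each meaning, and the three-hearings rule is skipped for agents whose memory holds only one word. The names `Agent`, `settle`, `hear`, and `converse` are hypothetical.

```python
import random
from collections import Counter, deque

ADOPT_AFTER = 3  # hearing the same word this many times triggers a switch


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


class Agent:
    def __init__(self, words, rng):
        self.words = dict(words)           # meaning -> current word
        self.capacity = rng.randint(1, 5)  # short-term memory size, 1 to 5
        self.memory = {m: deque(maxlen=self.capacity) for m in self.words}
        self.heard = {m: Counter() for m in self.words}


def settle(x, y, meaning, wx, wy, rng):
    """Apply the similarity rules to x's word wx and y's word wy; True if handled."""
    d = hamming(wx, wy)
    if d <= 1:
        # Nearly identical: a randomly chosen agent adopts the other's word.
        if rng.random() < 0.5:
            x.words[meaning] = wy
        else:
            y.words[meaning] = wx
        return True
    if d == 2:
        # Similar: a randomly chosen agent forms a morph-word by copying one differing letter.
        changer, own, target = (x, wx, wy) if rng.random() < 0.5 else (y, wy, wx)
        i = rng.choice([k for k in range(len(own)) if own[k] != target[k]])
        changer.words[meaning] = own[:i] + target[i] + own[i + 1:]
        return True
    return False


def hear(agent, meaning, word):
    """Store a heard word; switch to it once it has been heard ADOPT_AFTER times."""
    agent.memory[meaning].append(word)
    agent.heard[meaning][word] += 1
    # Assumption: agents who can hold only one word never use this rule.
    if agent.capacity > 1 and agent.heard[meaning][word] >= ADOPT_AFTER:
        agent.words[meaning] = word


def converse(a, b, meaning, rng):
    """One exchange about `meaning` between agents a and b (no dominance)."""
    wa, wb = a.words[meaning], b.words[meaning]
    if wa == wb or settle(a, b, meaning, wa, wb, rng):
        return
    # Words are far apart: each agent tries the words in its short-term memory.
    for alt in list(a.memory[meaning]):
        if settle(a, b, meaning, alt, wb, rng):
            return
    for alt in list(b.memory[meaning]):
        if settle(a, b, meaning, wa, alt, rng):
            return
    # Nothing matched: each agent remembers the other's word.
    hear(a, meaning, wb)
    hear(b, meaning, wa)
```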

When one group is dominant, the process of language exchange differs slightly. A dominant agent will not put any word from a subordinate agent into short-term memory, create a morph-word with a subordinate, or take a subordinate agent’s word into their dictionary. We assume that dominant agents do not adopt the language of a subordinate group at all, so when a subordinate and a dominant agent interact, the only dictionary that might change is the subordinate agent’s.
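One way to express the dominance rule is to decide, before any adoption or morph-word, which agent is allowed to change, and to block a dominant listener’s short-term memory for subordinate words. The sketch assumes the dominant group is group 0 (the first group); `choose_changer` and `may_remember` are hypothetical names.

```python
import random


def choose_changer(group_a, group_b, dominance, rng, dom_group=0):
    """Pick which of two conversing agents ('a' or 'b') changes its word.

    With dominance on, a dominant agent never changes in response to a
    subordinate, so the subordinate always adapts; otherwise the choice is random.
    """
    if dominance:
        a_dom, b_dom = group_a == dom_group, group_b == dom_group
        if a_dom and not b_dom:
            return "b"
        if b_dom and not a_dom:
            return "a"
    return "a" if rng.random() < 0.5 else "b"


def may_remember(listener_group, speaker_group, dominance, dom_group=0):
    """A dominant listener does not store a subordinate speaker's word in short-term memory."""
    return not (dominance and listener_group == dom_group and speaker_group != dom_group)
```

Exchanges between two subordinate agents, or between two dominant agents, fall through to the random choice used when dominance is absent.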

In every round, one agent is chosen at random and interacts with everyone in their network; this process repeats until every agent has traversed their network. If we run the simulation for a very long time, all dictionaries eventually converge. We find that 30 periods are sufficient to observe convergence in our model. We repeat each set of conditions 10 times.
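The round structure can be sketched as a random ordering of agents in which each agent, in turn, talks to all of its neighbors. This assumes that “chosen at random until every agent has traversed their network” means each agent is picked exactly once per round. `run_round`, `run_simulation`, and the `interact` callback (which would apply the conversation rules) are hypothetical.

```python
import random


def run_round(network, interact, rng):
    """One round: visit agents in random order; each one talks to every neighbor.

    `network` maps each agent id to a set of neighbor ids, and `interact(a, b)`
    is a caller-supplied function that applies the conversation rules.
    """
    for agent in rng.sample(sorted(network), len(network)):
        for neighbor in sorted(network[agent]):
            interact(agent, neighbor)


def run_simulation(network, interact, num_rounds=30, seed=None):
    """Repeat run_round num_rounds times with a seeded random generator."""
    rng = random.Random(seed)
    for _ in range(num_rounds):
        run_round(network, interact, rng)
```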

  3. Results

Our model is simulated repeatedly under different networks, with and without a dominant group, to determine the effect of dominance on language. Because the model is emergent and not predetermined, agents might end up with drastically different vocabularies. Thus, we test whether languages converge by measuring how much each agent’s dictionary differs from other agents’ dictionaries at the end of the simulation; we call this outcome the “score,” and a lower score indicates more convergence. We use four network types when measuring the effect of network structures on our observations.
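The exact score formula is not spelled out here, so the sketch below assumes one plausible definition: the number of differing letters between two agents’ words, averaged over meanings and then over all pairs of agents, so that a lower score means more convergence. `score` and `unique_dictionaries` are hypothetical helpers.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def score(dictionaries):
    """Mean pairwise distance between agents' dictionaries (lower = more convergence).

    Assumed definition: for every pair of agents, average the letter differences
    over all meanings, then average across pairs.
    """
    pairs = list(combinations(dictionaries, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for d1, d2 in pairs:
        total += sum(hamming(d1[m], d2[m]) for m in d1) / len(d1)
    return total / len(pairs)


def unique_dictionaries(dictionaries):
    """Number of distinct vocabularies left at the end of a run."""
    return len({tuple(sorted(d.items())) for d in dictionaries})
```

For the final state of the three-agent example (“zeze”, “zezo”, “zezo”), this definition gives a score of 2/3 and 2 unique dictionaries.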

Table 1: Summary of Results

In Table 1, columns 3 and 4 show the average score between agents for each network type. The separated network has the highest average ending score, which we attribute to the lack of interaction between groups in this network structure, leading to low convergence of agents and their dictionaries. On the other hand, the global network has the lowest average ending score, which we attribute to the abundance of agent interaction within one central group. Additionally, small-world structures with overlaps and crossovers show some convergence within their network structures, with more language convergence visible in the small-world overlap network with dominance.

Table 1 also shows the average score of agents in networks with and without dominance. This comparison excludes network type 2, the global network, because only one group is present. Comparing the scores in columns 3 and 4 for the small-world overlap network shows that there is more language convergence with dominance: the score is smaller and fewer unique languages form. Furthermore, when dominance is present, the average number of words that members of group 2 or group 3 acquire from group 1 is much higher than the number of words taken from other groups and almost twice the average across all groups. Thus, the dominant group has a significant impact on the subordinate populations’ lexicon, which displays characteristics of pidgin formation. The results of the small-world overlap network may therefore give insight into the social dynamics of creole creation. Namely, there may well have been key actors who served as bridges between groups, were required to communicate with the dominant group, and relayed what they learned to their own network. Such key actors take up the lexicon of the dominant group and spread it to the greater subordinate population.

On the other hand, small-world crossover networks with dominance have a higher average score than those without, indicating less convergence between agents and dictionaries. Interestingly, in small-world crossover networks with dominance, the subordinate groups acquire far fewer words from the dominant group, and there are many more interactions and exchanges between the two subordinate groups than in small-world overlap networks. This result is somewhat surprising because the dominant group is more connected to the other groups in this network (see Appendix A). However, by the end of the simulation there are more than 3 unique dictionaries on average. Because the dominant social group, which speaks the creole’s parent language, typically has more influence on the outcome (Muysken 2016), this type of network structure is unlikely to accurately resemble the conditions for creole development.

Columns 4 and 5 of Table 1 describe the number of changes each group makes to its dictionary throughout the four trials. Earlier results make clear that the global network structure has the most interactions, owing to its global connections and the absence of dominance. Unsurprisingly, it also has the most dictionary changes across the trials. However, Table 1 also reveals insights about the dominance structure of the groups within these network types. First, dominant groups make fewer dictionary changes than non-dominant groups. This trend is consistent with the underlying theory and structure of the model. Part of this result is due to the dominant group being smaller, but even so, when comparing the disconnected network with dominance to the small-world connected networks, the subordinate groups change their dictionaries much more often than the dominant group.

Figure 2: Language Convergence Under Different Network and Power Compositions

Figure 2 shows language convergence over the course of the simulation. When dominance is not present, or when groups are dispersed, the vocabularies converge smoothly into one language in a gradual transition over the periods. On the other hand, when dominance is present, even though the dominant group is only 5% of the population, language does converge by the end of the simulation, as indicated by the number of unique dictionaries in Table 1, but the process of convergence is much more volatile. This pattern is particularly pronounced in the small-world network with crossovers, where the subordinate groups’ language seems to flip-flop. In Table 1, the average number of word changes for groups 2 and 3 is ten times greater than for group 1, indicating that these groups change their lexicon much more frequently. In addition, the number of words acquired from group 1 is much higher than from the other two groups, and higher than in the small-world crossover simulation with no dominance.

The results from our simple language evolution simulation give insight into the creation of creole languages. In a very short period of time, two much larger groups with their own languages adopt the words of a dominant group, simply because the dominant group refuses to adopt any words from a subordinate group while the subordinate groups are under pressure, and able, to learn its words in order to communicate basic information. Our results indicate that creole formation is more likely under a network with overlapping agents from the various groups.

  4. Conclusion

Given our model results, we conclude that different network structures affect the creation of creole languages. Convergence is strong in global network structures, minimal in small-world networks with no overlap, and noticeable in small-world network structures with overlaps and crossovers. Networks with a dominant group tend to have higher scores, indicating less convergence. Additionally, a group’s dominance status gives insight into how much its dictionary changes when its members converse with speakers of other languages.

We model language development over a very short time and only model the interactions between agents in a static network. In reality, language develops over multiple generations. In addition, it is well known that children learn new languages more quickly than adults. An extension of this model could include the generational transfer of language, allowing children to learn multiple languages quickly.

  5. References

Eckardt, R., Jäger, G., & Veenstra, T. (Eds.). (2008). Variation, selection, development: Probing the evolutionary model of language change. De Gruyter.

Muysken, P. (2016, June 9). Creole languages. In Oxford Research Encyclopedia of Linguistics. https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-68?mediaType=Article

Nowak, M. A., & Komarova, N. L. (2001, July). Towards an evolutionary theory of language. Trends in Cognitive Sciences. https://www.sciencedirect.com/science/article/pii/S1364661300016831

Satterfield, T. (2008). Back to nature or nurture: Using computer models in creole genesis. In R. Eckardt, G. Jäger, & T. Veenstra (Eds.), Variation, selection, development (pp. 143-178). De Gruyter Mouton.

Siegel, J. (2008). The emergence of pidgin and creole languages. Oxford University Press.

Wilhite, A. (2001). Bilateral trade and ‘small-world’ networks. Computational Economics, 49-64. Kluwer Academic Publishers.

Appendix A: Network Types
