Disney Princess Name Study
A Statistical Analysis of Newborn Baby Naming Trends in Relation to Disney Princess Movie Release Dates
Introduction
This was one of my last papers of college, just now converted into a mediocre blog-style post! No promises about the quality of this page. Few cultural items have permeated so deeply into childhood as the Disney Princess. From Snow White to Moana, Disney Princesses seem ever present in the zeitgeist, occupying a role in our lives seemingly agnostic of the time and place the world is in.
From its start in the early 1920s as an animation studio to its status today as a massive multimedia conglomerate spanning sports, entertainment, theme parks, and more, Disney has impacted popular culture in almost every way. In this paper, I will examine the relationship between Disney Princess names and the names of newborn babies in the United States.
While there’s fair discussion on who is and who isn’t a Disney Princess, for this exploration we’ll consider only those with semi-conventional names, whose modern status comes from the Disney films of which they are a part, and whose movies were released in the last 50 years. That includes Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana. This means we are excluding Snow White, Cinderella, Aurora (Sleeping Beauty), and Tinkerbell.
Methodology
We’ll be examining baby naming tendencies over time, with attention to the key dates and landmarks associated with each princess.
To refine my exploration, I framed the question as follows: does the release of a Disney princess movie change the naming rate of the respective princess’s name between a set number of years before the release year and the same number of years after it?
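To make the "set number of years before and after" idea concrete, here is a minimal Python sketch of how a yearly series could be split around a release year. The function names are hypothetical, the flat series is made up, and excluding the release year from both groups is my assumption rather than something the original analysis states. The 1979 and 2017 bounds are the data range used later in this post, and 1989 is the release year of The Little Mermaid.

```python
def max_symmetric_window(first_year, last_year, release_year):
    """Largest n such that n full years of data exist on both sides of the release."""
    return min(release_year - first_year, last_year - release_year)


def split_before_after(counts_by_year, release_year, window):
    """Return (before, after) lists of yearly counts around the release year.

    Assumption: the release year itself belongs to neither group.
    Years missing from the mapping are treated as a count of 0.
    """
    before = [counts_by_year.get(year, 0)
              for year in range(release_year - window, release_year)]
    after = [counts_by_year.get(year, 0)
             for year in range(release_year + 1, release_year + window + 1)]
    return before, after


# Hypothetical flat series, only to show the shapes involved (not real data).
counts = {year: 100 for year in range(1979, 2018)}
window = max_symmetric_window(1979, 2017, 1989)
before, after = split_before_after(counts, 1989, window)
print(window, len(before), len(after))
```

With data starting in 1979, this gives a 10-year window for a 1989 release, which matches the group size used for Ariel in the results.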
Data
Data exploration began with considering which data sets would provide the necessary information in a form conducive to this kind of analysis. Many naming data sets only contained the 2000 most popular names in any given year, but eventually I was able to find the entire US Social Security Administration data set on baby names, which includes over 32,000 names in many years.
One challenge to note is that when a name has been given between 1 and 4 times in a given year, the count is reported as 0 to maintain anonymity. For most names, the counts were orders of magnitude larger than 4, so this did not present an issue, but in some cases it certainly created uncertainty in the data.
After converting the data into an accessible form, I processed it to give counts of each occurrence of the names Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana in each year of Social Security card data, starting in 1979 and ending in 2017. At first glance, this gave some clear trends over time: some names didn’t even exist in the years preceding a movie release, and then became quite common. To visualize this information, I plotted name counts against years for each princess name over the entire data set.
For most names (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), the charts showed gradual changes in the name over time, with some showing moderate to steep peaks near the movie release date. These plots are shown below.
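The counting step can be sketched roughly as follows. This is not the original processing code. It assumes each year's data is available as text with one `name,sex,count` record per line, and it sums counts across both sexes, which is my assumption. A name absent from a year is recorded as 0, which is how the suppressed counts of 1 to 4 show up in this analysis. The sample records are made up.

```python
import csv
import io


def count_names(yearly_text, names):
    """Build {name: {year: count}} from {year: file_text}.

    Each file_text is assumed to hold lines of the form name,sex,count.
    Counts are summed over both sexes (an assumption), and a name that
    does not appear in a year's text gets a count of 0 for that year.
    """
    wanted = set(names)
    result = {name: {} for name in names}
    for year, text in sorted(yearly_text.items()):
        for name in names:
            result[name][year] = 0
        for row in csv.reader(io.StringIO(text)):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            name, _sex, count = row
            if name in wanted:
                result[name][year] += int(count)
    return result


# Tiny made-up sample, not real data.
sample = {
    1988: "Ariel,F,900\nAriel,M,300\nAnna,F,5000\n",
    1989: "Ariel,F,1200\nAnna,F,4900\nElsa,F,60\n",
}
table = count_names(sample, ["Ariel", "Elsa"])
print(table)
```

The resulting per-name dictionaries map year to count, which is a convenient shape both for plotting counts against years and for slicing out before and after groups.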
However, while the above plots are interesting in that a quick glimpse indicates ANOVA may give interesting results, there were a few other plots that actually stood out far more, and some that stood out so much I’ve decided to exclude their names from analysis.
The first interesting plots come from the names Merida (from Disney’s Brave, released in 2012) and Mulan (from Mulan, released in 1998).
If we look at the sections before and after the asterisk, we see that Mulan and Merida were essentially not in use as names until their movies. The occurrences for Mulan are small in general, and this is a case where reporting 1 to 4 occurrences as 0 may make the change look more drastic than it really is. Merida, however, goes from having virtually no occurrences to often over one hundred.
Some names gave data that couldn’t be processed well and were therefore removed from the study: Pocahontas, Rapunzel, and Moana. For Rapunzel and Pocahontas, this was simply because so few babies had ever been given either name that nearly all years were reported as zero. Oddly, however, the two names saw their first occurrences larger than 4 in 2016 and 2017 respectively, something that would require a good deal of speculation to explain.
The other name removed from the analysis due to data is Moana. While the plot clearly shows a large spike in the name around the movie’s release, the movie is so recent (released in 2016) that only one year of data exists after its release.
Results
To analyze whether the release of a Disney princess movie changes the naming rate of the respective princess’s name between a set number of years before and after the release year, I set up a pair of null and alternative hypotheses for each name (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), performed a one-way Analysis of Variance (ANOVA), and calculated descriptive statistics.
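For readers unfamiliar with one-way ANOVA, here is a minimal pure-Python sketch of the F statistic it is built on: the variation between the group means divided by the variation within the groups. The function name is hypothetical and the two groups of yearly counts are made up for illustration; they are not the study’s data.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Raises ZeroDivisionError if there is no variation within the groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Spread of this group's mean around the overall mean.
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Spread of this group's values around their own mean.
        ss_within += sum((x - mean) ** 2 for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up yearly counts, not the study's data.
before = [10, 12, 11, 13, 14]
after = [20, 22, 21, 23, 24]
f_stat, df_b, df_w = one_way_anova_f(before, after)
print(f_stat, df_b, df_w)
```

The p-value is the upper-tail probability of the F distribution with these degrees of freedom. A library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value directly, and a large F relative to those degrees of freedom corresponds to a small p-value.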
Ariel (The Little Mermaid)
For analysis of Ariel, I selected the following hypotheses:
H0: There is no variation in occurrences of the name between the 10 years before and the 10 years after the release
H1: There is variation in occurrences of the name between the 10 years before and the 10 years after the release
The data is summarized by this table
The duration (10 years) of each data group is due to the amount of data preceding the release of The Little Mermaid (the subset of this data began in 1979). We can see just from the data summary that ANOVA may provide interesting results.
From this we see an extremely low p-value of 0.00002218743622 (alpha = 0.05), so we reject the null hypothesis and conclude there is variation in occurrences of the name Ariel between the 10 years before and 10 years after The Little Mermaid was released.
Belle (Beauty and the Beast)
For analysis of Belle, I selected the following hypotheses:
H0: There is no variation in occurrences of the name between the 12 years before and the 12 years after the release
H1: There is variation in occurrences of the name between the 12 years before and the 12 years after the release
The duration (12 years) of each data group is again due to the amount of data preceding the release of Beauty and the Beast (the subset of this data began in 1979). The ANOVA again shows interesting results.
From this we see an extremely low p-value of 0.0001033025434 (alpha = 0.05), so we reject the null hypothesis and conclude there is variation in occurrences of the name Belle between the 12 years before and 12 years after Beauty and the Beast was released.
Jasmine (Aladdin)
For analysis of Jasmine, I selected the following hypotheses:
H0: There is no variation in occurrences of the name between the 13 years before and the 13 years after the release
H1: There is variation in occurrences of the name between the 13 years before and the 13 years after the release
The duration (13 years) of each data group is again due to the amount of data preceding Aladdin’s release (the subset of this data began in 1979) and the amount of data after (the data ends in 2017). The ANOVA again shows interesting results.
From this we see an extremely low p-value of 0.00008356544002 (alpha = 0.05), so we reject the null hypothesis and conclude there is variation in occurrences of the name Jasmine between the 13 years before and 13 years after Aladdin was released.
Tiana (The Princess and The Frog)