How Women are Represented in Sports Media: Text Analysis in Sports Illustrated

This is a paper I wrote for the ECO 227 Economics of Gender and Family class with Dr. Angela Cools at Davidson College.


There is a substantial gender pay gap between male and female professional athletes. Athletes’ earnings are difficult to measure and compare across sports and leagues, but the disparities are still clear. For example, the combined salaries of female soccer players across 81 international teams, more than one thousand players, totaled $32.8 million a year. In that same year, one male player, Neymar, made $32.9 million. Similar disparities are present in tournament prize money. Among the largest gaps are those in golf, soccer, and basketball. In the US, basketball is the best-paid women’s sport, with players earning on average $74,759 per season. A male basketball player earns on average $7,147,217 per season, about 96 times as much as a female basketball player (Global Sports Salaries Survey 2019).

The most commonly cited explanation for the wide gender pay gap in professional sports is that demand for media coverage of men’s sports is higher than demand for coverage of women’s sports. Television licenses, event-day ticket sales, and sponsorship deals are the primary sources of revenue for teams and athletes. The traditional argument is that men’s teams have more fans and, therefore, more commercial worth. However, women’s participation in athletics has grown dramatically in the past 30 years, and interest in women’s sports has grown along with it. A study using international data found that 66% of the general population are interested in at least one women’s sport, and among sports fans, the figure is 84% (Nielsen 2018). Fans are not all male either; in that same study, 49% of sports fans were female. Despite the demand for coverage of women’s sports, men and men’s sports still dominate sports media.

Women’s sports have massive potential for monetization, and the audience they draw is only going to grow. However, women and women’s sports are underrepresented in sports media. Male athletes not only dominate media coverage, but men also occupy most of the jobs in sports media. Men make up 95% of anchors, co-anchors, and analysts (Musto, Cooky, and Messner 2017). Media coverage is essential for women in sports in both the social and occupational domains and has enormous effects on women’s sports. Positive media attention can heighten a player’s prestige, encourage attendance at games, and help a player secure brand sponsorships. Imbalance in media coverage therefore contributes to the gender pay gap in sports.

Women are underrepresented in sports television, radio, podcasts, news, and online resources. Unlike funding for men’s and women’s sports, equal television coverage is not enforced by Title IX. A study of ESPN’s SportsCenter and Fox Sports found that out of 118 hours of sports news, content featured women’s sports less than 1% of the time (Billings and Young 2015). The lack of women portrayed in the media reinforces the stereotype that sports are a “man’s domain.” This has feedback effects as well. How sports media represent women affects women’s decisions to enter professional sports and may explain the gender representation gap in sports.

How media portrays women also affects the wage gap. Gender differences in sports news coverage may reveal women’s status in the world of sports and perpetuate harmful stereotypes. A recent study found that media coverage of women’s sports is duller and more obligatory than coverage of men’s sports. Women’s content contained fewer jokes and compliments and was often offered only as a subsection of segments on men’s sports (Musto, Cooky, and Messner 2017). This so-called “gender-bland sexism” also has feedback effects. Fans of women’s sports might be disappointed by boring content and lose interest, leading the media outlet to stop producing articles on women’s sports.

Using articles from Sports Illustrated from the past two years, I evaluate which words are the strongest predictors of whether an article is about women in sports or men in sports. I found that women’s sports and topics are covered less often and that articles about them are shorter and contain fewer photos. The words that most strongly predict that a piece is about women’s sports do not demonstrate overt sexism. Still, some words indicate that women are portrayed differently in sports media.

Literature Review 

Coche gathered articles from ESPN during the Australian Open and analyzed them for the prominence and relative coverage of men’s and women’s news. She found that out of over 200 stories, 72% were about men’s tennis, and 20% were about women’s tennis. In terms of production value, Coche found no difference between men’s and women’s tennis articles. Women’s tennis articles were no less likely to contain pictures or videos or to be featured on the homepage (Coche 2013).

Alice Wu used text scraping to collect posts about job candidates from the Economics Job Market Rumors forum, classified each post as female or male using pronouns, and evaluated the words in each post for unwelcome or stereotypical terms. She used a predictive model to see which words best predict the gender of the person discussed. She found that the words that most strongly predicted posts about women were related to physical appearance, such as “hot” or “attractive,” personal or family information, and gender issues. In contrast, the words that most strongly predicted posts about men included academic and professional terms. Her evidence indicates that we talk about men and women differently, which has direct economic consequences (Wu 2018).

Gaucher, Friesen, and Kay examined gender differences in language by coding for specific words. They found that job advertisements for traditionally male-dominated occupations contain gendered wording that may maintain gender inequality. In their experimental study, they found that male candidates were much more likely than female candidates to apply in response to job recruitment materials that used male-biased words such as “competitive,” “dominate,” or “leader.” Their results suggest that reinforcing unacknowledged, institutional-level gender stereotypes exacerbates the gender division of occupations (Gaucher, Friesen, and Kay 2011).


The articles I used to evaluate gender bias in sports media come from Sports Illustrated, a magazine famous for its annual swimsuit issue displaying women in bikinis. Sports Illustrated has been publishing since 1954 and is an authoritative magazine in sports media with 2.75 million subscribers. I do not know the gender breakdown of its readers, but based on statistics about sports fans in the US, I assume the audience is primarily male.

I gathered full-text articles from Sports Illustrated Online. I identified each article as a women’s or men’s article based on whether female or male pronouns appeared more frequently in it. After excluding any article with an equal number of female and male pronouns, my sample included 630 articles. I also labeled two issues’ worth of articles by hand, and the pronoun-based labels were almost identical to my manual labels. Therefore, I believe this method of gender identification is sufficient.
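The pronoun rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the pronoun lists, the function name label_article, and the three toy sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed pronoun lists; the paper does not say exactly which pronouns were counted.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def label_article(text):
    """Return "women", "men", or None when the pronoun counts are tied."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    female = sum(words[p] for p in FEMALE_PRONOUNS)
    male = sum(words[p] for p in MALE_PRONOUNS)
    if female == male:
        return None  # tied articles are excluded from the sample
    return "women" if female > male else "men"


# Toy articles (hypothetical), one per possible outcome.
articles = [
    "She said her team played well and she was proud.",
    "He scored twice, and his coach praised him.",
    "She passed to him.",
]
labels = [label_article(a) for a in articles]
```

Comparing these automatic labels against a hand-labeled subset, as described above, is a simple way to check how often the rule agrees with a human reader.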


I use a lasso-logistic model to find the words that most strongly predict whether an article is about women’s sports or men’s sports. Simple logistic regression fits its coefficients by maximizing the likelihood; the lasso version adds a penalty on the size of the coefficients, which shrinks the coefficients of uninformative words to zero and makes the model better suited to a problem with many candidate words. I assigned articles at random, using 70% of the sample as training data and the remaining 30% as testing data for selecting the optimal probability threshold.
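As a rough sketch of the split and the threshold search, the Python below shuffles 630 article indices into a 70/30 split and then picks a probability cutoff on held-out predictions. The helper names, the 0.05 grid of candidate thresholds, the use of accuracy as the selection criterion, and the toy probabilities are all assumptions; the paper does not state how the threshold was chosen.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle a copy of items and return (train, test) with a 70/30 split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def best_threshold(probs, labels):
    """Return the cutoff (in 0.05 steps) with the highest accuracy.

    Ties are broken in favor of the lowest cutoff.
    """
    candidates = [i / 20 for i in range(1, 20)]

    def accuracy(t):
        hits = sum((p >= t) == bool(y) for p, y in zip(probs, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)


# 630 article indices, matching the sample size reported above.
train, test = split_70_30(range(630))

# Toy predicted probabilities that an article is about women's sports (1 = women's).
probs = [0.10, 0.20, 0.35, 0.40, 0.80]
labels = [0, 0, 1, 1, 1]
threshold = best_threshold(probs, labels)
```

Because only about 18% of the articles are about women’s sports, a criterion other than raw accuracy (for example, one that weights the two classes equally) might be a reasonable alternative.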

I cleaned the text by deleting any special characters, replacing contractions, and removing common filler words such as “and” and “the.” I then tokenized each article into words and assigned a term frequency-inverse document frequency (TF-IDF) weight to each word. Instead of simply counting how often a word occurs in an article, the TF-IDF transformation scales that count down according to the number of articles the word appears in, so that words common to nearly every article carry little weight.
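To make the weighting concrete, here is a small from-scratch sketch in Python. It uses one common textbook form, term frequency times log(N / document frequency); the exact variant used for the paper is not stated, and library implementations often add smoothing. The stop-word list, function names, and toy documents are assumptions.

```python
import math
import re
from collections import Counter

# Assumed, very short stop-word list; the paper's actual list is not given.
STOP_WORDS = {"and", "the", "a", "an", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase, keep runs of letters only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            w: (c / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return weights


# Toy documents (hypothetical).
docs = [
    "The team won the game.",
    "The team lost the game and the coach spoke.",
    "She wanted to vocalize the problem.",
]
weights = tf_idf(docs)
```

In this toy corpus, “vocalize” appears in only one document and so receives a higher weight than “team,” which appears in two.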

Then I ran a lasso-logistic regression on the transformed articles. The lasso differs from simple logistic regression in its penalty: it adds lambda times the sum of the absolute values of the coefficients to the objective. I used an inverse regularization strength of 20 (the inverse of lambda) to limit overfitting. I set the cross-validation generator to 5 folds and used the liblinear optimization algorithm, which is better suited to smaller datasets. The results are the coefficients on the words, which indicate which words most strongly predict whether an article’s topic is female or male.


Of the 630 articles about men’s sports or women’s sports, only 113 (about 18%) were about women’s sports. This finding is consistent with Coche. Men’s articles are 25.6% more likely to contain a photo than women’s articles, which is compatible with Musto, Cooky, and Messner. Similarly, women’s pieces are on average about 400 words shorter than men’s articles. In other words, men’s articles are 28% longer than women’s articles.

The overall mean accuracy on the test data for the lasso-logistic regression model was 88.9%. I excluded proper nouns and words that appear fewer than three times. The results of the lasso-logistic regression indicate that the words that most strongly predict that an article is about women’s sports are “tourneys,” “vocalize,” “update,” and “excel.” The table below contains the complete list of words. The top words for men’s articles are “lauded,” “wash,” “abbreviated,” and “observation.” Overall, the most predictive words do not appear to include any overtly sexist or gender-biased language.

Because the dataset is small, the keywords are not used frequently, but their predictability is significant. For example, “vocalize” has 39.5% predictability and is used only eight times, but seven of those eight uses are in women’s sports articles. Some examples of how “vocalize” is used are “It’s easier to vocalize and be vulnerable with my teammates now”; “I didn’t look at it that way until she vocalized it”; and “we could vocalize the destructive behaviors that permeate girls’ sports.” This suggests that newsworthy content on women is about them speaking up, while men’s articles are about their stories and athletic performance. The association between women’s articles and “vocalize,” and the context in which the word is used, indicates that women’s articles are about women speaking up and taking up space, but not about sports. Similarly, “activism” has 17.24% predictability, and “equality” has 14% predictability.

The word “update” had 31% predictability. “Update” is somewhat indicative of how women’s sports are talked about in sports media. It typically appears as a header for non-articles that are either scorecards or game updates. That it is a strong predictor for women’s articles indicates that many women’s articles are updates on games, and that fewer are stories about specific athletes or teams. This may be an example of “gender-bland sexism” and corroborates Musto, Cooky, and Messner’s findings.

My results should be taken with a grain of salt considering the small dataset and the low frequency of the resulting keywords. Though I believe that the strongest predictors reveal something about the nature of women’s sports coverage, there are several highly predictive words with no gender bias or connotation, such as “attacker,” “excel,” “glimpse,” and “bearer.”

The words associated with men’s articles have less predictive power than the women’s keywords. I believe this is because the dataset is skewed towards men’s articles. One interesting finding, however, is the extremely high predictability of “lauded”: if the word “lauded” appears in an article, there is an almost 80% chance the article is about men’s sports. Some examples of how “lauded” is used include “After being lauded for his work on the court and off in 2012,” “And they should be lauded for doing so,” and “The NCAA made the right choice and should be lauded for it.”


My results are consistent with other research showing that women are not represented equally in sports media. Overall, the results from the lasso-logistic model are somewhat inconclusive, though the high predictability of some words gives insight into how women are portrayed in sports media. Women are represented as equalizers and activists but not as athletes. Women’s sports are not reported on in a journalistic way; instead, coverage takes the form of game updates or scorecards. For women’s sports to reach their full potential, the entire industry must create more opportunities for women’s sports to prove their commercial worth. The sports media industry needs to do its part as well.



Billings, Andrew C., and Brittany D. Young. “Comparing flagship news programs: Women’s sports coverage in ESPN’s SportsCenter and FOX Sports 1’s FOX Sports Live.” Electronic News 9, no. 1 (2015): 3-16.

Coche, Roxane. “Is ESPN really the women’s sports network? A content analysis of ESPN’s Internet coverage of the Australian Open.” Electronic News 7, no. 2 (2013): 72-88.

Gaucher, Danielle, Justin Friesen, and Aaron C. Kay. “Evidence that gendered wording in job advertisements exists and sustains gender inequality.” Journal of personality and social psychology 101, no. 1 (2011): 109.

Musto, Michela, Cheryl Cooky, and Michael A. Messner. “‘From Fizzle to Sizzle!’ Televised sports news and the production of gender-bland sexism.” Gender & Society 31, no. 5 (2017): 573-596.

Nielsen Sports. “The rise of women’s sports – Identifying and maximising the opportunity” (2018).

Shor, Eran, Arnout van de Rijt, and Babak Fotouhi. “A large-scale test of gender bias in the media.” Sociological science 6 (2019): 526-550.

Sporting Intelligence. “Global Sports Salaries Survey” (2017). 

Wu, Alice H. “Gendered Language on the Economics Job Market Rumors Forum.” In AEA Papers and Proceedings, vol. 108, (2018):175-79. 
