Create a Wordcloud Plot in Less Than Ten Lines of R Code

Martinqiu
Jul 20, 2021

I underestimated how popular wordcloud plots are with people outside data science, such as my kids, who are eager to impress their elementary school teachers with wordcloud-embedded assignments. That is the motivation for this super short Medium article. I haven’t written anything new in a month; I have been busy teaching, as my big data and marketing analytics (BDMA) course for Laurier’s MMA students will end in two weeks.

In just ten lines (counted by the RStudio script line index), I show how to create a wordcloud from your chosen “raw” text. The only thing required of you is to import your text into RStudio. For the rest, just copy and paste the code into RStudio and run it.

First, import your text into RStudio. If your text is saved in a text file, you can use the readLines function.

my.text <- readLines("file_location/file_name.txt")

If not, you can copy and paste your text (paragraphs, articles) from a Word file, a webpage, etc., into the line below in RStudio.

my.text <- "copy/paste your text here"
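Whichever route you take, my.text ends up as a character vector. The short base-R check below is only an illustration: a temporary file and two made-up sentences stand in for your own text. It shows that readLines() returns one element per line of the file, whereas pasted text is a single string. Either form should work in the pipeline, because the per-line tokens are pooled with unlist() later on.

```r
# A temporary file stands in for your own text file (illustration only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Early life stress shapes the brain.",
             "Stress is linked to depression."), tmp)

from.file  <- readLines(tmp)   # one element per line of the file
from.paste <- "Early life stress shapes the brain. Stress is linked to depression."

length(from.file)    # 2
length(from.paste)   # 1

# Collapsing the file's lines yields the same single string as pasting
identical(paste(from.file, collapse = " "), from.paste)   # TRUE

unlink(tmp)          # clean up the temporary file
```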

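We also need to load four packages to do the job: quanteda, tidytext, and wordcloud2, plus dplyr, which provides the pipe (%>%) and the anti_join, group_by, summarise, and arrange functions used below (install them first with install.packages() if needed). Because library() attaches only one package per call, we loop over the package names.

invisible(lapply(c("quanteda", "tidytext", "wordcloud2", "dplyr"), library, character.only = TRUE))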

With the text and the packages ready, we first use the tokens function from the quanteda package to break the text into words. The tokens function also gives you fine-grained control over the tokenization; execute ?tokens to learn about those controls (here we remove punctuation, symbols, and numbers from the output).
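To see roughly what those three removal options do, here is a minimal base-R sketch. The helper simple_tokens() is hypothetical and only approximates quanteda: it splits on whitespace, strips punctuation and symbols, and drops tokens made purely of digits, whereas tokens() follows more careful word-boundary rules. Like tokens(), it leaves capitalization untouched.

```r
# Hypothetical helper: a rough approximation of
# tokens(remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE)
simple_tokens <- function(text) {
  words <- unlist(strsplit(text, "\\s+"))     # split on runs of whitespace
  words <- gsub("[[:punct:]]", "", words)      # strip punctuation and symbols
  words <- words[!grepl("^[0-9]+$", words)]    # drop tokens that are only digits
  words[words != ""]                           # drop empty leftovers
}

tok <- simple_tokens("In 2021, 35% of adults reported stress; stress matters!")
tok
# "In" "of" "adults" "reported" "stress" "stress" "matters"
```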

The tokens function returns a tokens object, essentially a list of words for each text. Through pipe operations, we flatten that list with unlist() and convert it into a data frame. The setNames function names the variable that stores individual words “word”. The anti_join function that follows removes stop words (e.g., the, a, in, of, this) from the word column. Note that the stop_words data frame is provided by the tidytext package; it has 1,149 rows drawn from three stop-word lexicons. You can define your own stop words and add them to this data frame.
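The anti_join step keeps only the words that do not appear in the stop-word list. The base-R sketch below mimics that with %in% on a made-up five-word stop list (tidytext's stop_words is far longer), and shows that adding your own stop words, here the hypothetical extra word “study”, is just a matter of extending the list. Matching is case-sensitive, so “The” would not match “the”; that is why text is usually lowercased before this step.

```r
# Made-up words and stop list, for illustration only
words.df <- data.frame(word = c("the", "stress", "of", "depression", "study", "stress"),
                       stringsAsFactors = FALSE)
my.stops <- c("the", "of", "a", "in", "this")

# Adding a custom stop word is just extending the vector (hypothetical choice)
my.stops <- c(my.stops, "study")

# Keep the rows whose word is NOT in the stop list (what an anti-join does)
kept <- words.df[!words.df$word %in% my.stops, , drop = FALSE]
kept$word
# "stress" "depression" "stress"
```

With tidytext, the equivalent of extending the vector is appending rows (a word column plus a lexicon label) to stop_words, for example with dplyr's bind_rows(), before the anti_join.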

We then use the group_by and summarise functions jointly to create a new variable holding each word’s frequency count, and the arrange function to sort the data frame by count in descending order.
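The same counting and sorting can be written without dplyr. This base-R sketch, on a made-up word vector, uses table() to count each distinct word and sort() to put the most frequent first, producing the same word/count layout as the group_by, summarise, and arrange chain.

```r
# Made-up tokens, for illustration only
words <- c("stress", "depression", "stress", "brain", "stress", "depression")

counts <- table(words)                       # frequency of each distinct word
counts <- sort(counts, decreasing = TRUE)    # most frequent first
word.counts <- data.frame(word  = names(counts),
                          count = as.integer(counts),
                          stringsAsFactors = FALSE)
word.counts
# rows: stress 3, depression 2, brain 1
```

Keeping the words in the first column and the counts in the second matches, as I understand its documentation, the input wordcloud2() expects: a data frame of words and their frequencies.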

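# tokens_tolower() is needed because stop_words is all lowercase; otherwise capitalized stop words such as "The" survive the anti_join
word.df <- my.text %>% tokens(remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE) %>% tokens_tolower() %>% unlist() %>% as.data.frame() %>% setNames("word") %>% anti_join(stop_words, by = "word") %>% group_by(word) %>% summarise(count = n()) %>% arrange(-count)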

With the new data frame word.df created, we use the wordcloud2 function from the wordcloud2 package to create a wordcloud. Another R package, called wordcloud, also draws wordcloud plots, but I prefer the wordcloud2 package.

word.df %>% wordcloud2(size = 0.3, color = "random-dark", backgroundColor = "white", minRotation = pi/3, maxRotation = pi/2, shape = "cardioid", rotateRatio = 0.3)

You can customize your plot by changing the values of those options (size, color, minRotation, shape, etc.); execute ?wordcloud2 to see the full list.

To test-drive the code, I copied and pasted the abstract of a recent review article by my doctor friend on early-life stress and depression. That is where the wordcloud at the beginning of this article comes from.


I am a marketing professor and I teach BDMA (big data and marketing analytics) at Lazaridis School of Business, Wilfrid Laurier University, in Waterloo, Canada.