- Hands-On Exploratory Data Analysis with Python
- Suresh Kumar Mukhiya, Usman Ahmed
- 2021-06-24 16:44:58
Most frequently used words
One of the easiest things to analyze in your emails is which words you use most often. A word cloud makes these words easy to spot. Let's first filter out the automated emails from arXiv (no-reply@arXiv.org) and join the subject lines of the sent emails into a single string:
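import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Drop the automated notification emails sent by arXiv
df_no_arxiv = dfs[dfs['from'] != 'no-reply@arXiv.org']
# Join the subjects of `sent` (assumed to be defined earlier); swap in df_no_arxiv['subject'] to use the filtered frame instead
text = ' '.join(map(str, sent['subject'].values))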
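The `map(str, ...)` call deserves a brief look. The following is a minimal sketch, assuming that some subject fields are missing; the sample `subjects` list is made up for illustration. `str.join` raises a `TypeError` on non-string items, so each value is converted first. One side effect is that missing values end up in the text as the literal words `None` or `nan`.

```python
# Hypothetical subject values; real mailbox exports can contain missing subjects.
subjects = ['Re: site launch', None, float('nan'), 'Project update']

try:
    ' '.join(subjects)  # fails: join accepts strings only
    join_failed = False
except TypeError:
    join_failed = True

# Converting each value to str first makes the join safe.
text = ' '.join(map(str, subjects))
print(join_failed)
print(text)
```

If those literal placeholder words show up in a word cloud, they can be dropped before joining or added to the stop words.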
Next, let's plot the word cloud:
# Extra stop words to leave out of the cloud
stopwords = ['Re', 'Fwd', '3A_']
# collocations=False counts single words only, not two-word phrases
wrd = WordCloud(width=700, height=480, margin=0, collocations=False)
for sw in stopwords:
    wrd.stopwords.add(sw)
wordcloud = wrd.generate(text)

plt.figure(figsize=(25, 15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
I added a few extra stop words (Re, Fwd, and 3A_) so that they are filtered out of the plot. The output for me is as follows:
This tells me what I mostly communicate about. From the analysis of emails from 2011 to 2019, the most frequently used words are new, site, project, Data, WordPress, and website. This is really good, right? What is presented in this chapter is just a starting point. You can take this further in several other directions.
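As one possible next step, the ranking that a word cloud shows visually can also be checked as plain counts. The following is a minimal sketch using only the standard library. The `top_words` helper, its regular expression, and the sample subject lines are assumptions for illustration, and the tokenization is not identical to the one WordCloud applies internally.

```python
import re
from collections import Counter

def top_words(text, stopwords=(), n=5):
    """Return the n most common words in text, skipping stopwords (case-insensitive)."""
    skip = {sw.lower() for sw in stopwords}
    # Words are runs of letters, digits, or underscores, at least two characters long.
    words = re.findall(r"\w\w+", text)
    counts = Counter(w for w in words if w.lower() not in skip)
    return counts.most_common(n)

# Hypothetical subject lines standing in for the joined `text` string.
sample = "Re: new site Fwd: new project Re: site data new website"
print(top_words(sample, stopwords=['Re', 'Fwd', '3A_'], n=2))
```

Passing the real `text` string and the same stop words would give a numeric list to compare against the largest words in the plot.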