
Most frequently used words

One of the easiest things to analyze about your emails is which words you use most often, and a word cloud is a quick way to see them. Let's first filter out the automated arXiv emails and join the subject lines into a single string:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

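# Drop the automated arXiv notification emails
df_no_arxiv = dfs[dfs['from'] != 'no-reply@arXiv.org']
# Join all subject lines into one space-separated string
text = ' '.join(map(str, sent['subject'].values))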

Next, let's plot the word cloud:

# Extra tokens to exclude, on top of the library's default stop words
stopwords = ['Re', 'Fwd', '3A_']
wrd = WordCloud(width=700, height=480, margin=0, collocations=False)
for sw in stopwords:
    wrd.stopwords.add(sw)
wordcloud = wrd.generate(text)

# Render the word cloud as an image without axes or margins
plt.figure(figsize=(25, 15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

I added a few extra stop words (Re, Fwd, and 3A_) so that reply and forward prefixes do not dominate the graph. The output for me is as follows:

This tells me what I mostly communicate about. Across my emails from 2011 to 2019, the most frequently used words are new, site, project, Data, WordPress, and website. What is presented in this chapter is just a starting point; you can take the analysis further in several other directions.
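A word cloud shows relative frequency only by font size, so one simple next step is to look at the actual counts. The following is a minimal sketch using only the standard library. The `subjects` list and the `top_words` helper are hypothetical and stand in for your own subject lines; they are not part of the wordcloud library.

```python
import re
from collections import Counter

# Hypothetical subject lines, standing in for your own data
subjects = [
    "Re: New site launch",
    "Fwd: WordPress project update",
    "New project proposal",
    "Re: Data for the new website",
]

# The same extra stop words as before, lowercased for comparison
extra_stopwords = {"re", "fwd", "3a_"}


def top_words(lines, stopwords, n=5):
    """Return the n most common lowercased words, skipping stopwords."""
    counts = Counter()
    for line in lines:
        # \w+ matches runs of letters, digits, and underscores
        for word in re.findall(r"\w+", str(line)):
            word = word.lower()
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


print(top_words(subjects, extra_stopwords, n=2))
```

Unlike the word cloud, this sketch lowercases every word and does not remove common English words such as "the" or "for"; you could add those to the stop word set if they crowd out the results.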
