
Chapter 4. Topic Modeling

In the previous chapter, we clustered texts into groups. Clustering is a very useful tool, but it is not always appropriate, because it assigns each text to exactly one cluster. This book is about machine learning and Python. Should it be grouped with other Python-related works or with machine learning-related works? In the age of paper books, a bookstore had to make this decision when deciding where to shelve it. In the age of online stores, however, the answer is that this book is about both machine learning and Python, and it can be listed in both sections. We will, however, not list it in the food section.

In this chapter, we will learn methods that do not assign each object to a single cluster, but instead associate it with a small number of groups called topics. We will also learn how to distinguish between topics that are central to a text and others that are only vaguely mentioned (this book mentions plotting every so often, but plotting is not a central topic in the way machine learning is). The subfield of machine learning that deals with these problems is called topic modeling.
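
To make the contrast concrete, here is a minimal sketch in plain Python. The topic weights, the function names `hard_cluster` and `central_topics`, and the 0.2 threshold are all made up for illustration. They are not the output of any topic model, only a picture of what "belonging to several topics with different strengths" could look like for this book.

```python
# Hypothetical topic weights for one document (illustrative numbers only,
# not produced by any real model).
doc_topics = {
    "machine learning": 0.55,
    "python": 0.35,
    "plotting": 0.08,
    "food": 0.02,
}


def hard_cluster(weights):
    """Clustering view: keep only the single best-matching group."""
    return max(weights, key=weights.get)


def central_topics(weights, threshold=0.2):
    """Topic view: keep every topic whose weight reaches the threshold.

    The threshold is an arbitrary assumption chosen for this example.
    """
    return sorted(t for t, w in weights.items() if w >= threshold)


cluster = hard_cluster(doc_topics)
central = central_topics(doc_topics)

print("Single cluster:", cluster)
print("Central topics:", central)
```

Under these assumed weights, the clustering view has to pick one shelf for the book, while the topic view lists it under both machine learning and Python and treats plotting as a passing mention.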
