
Chapter 4. Advanced Word2vec

In Chapter 3, Word2vec – Learning Word Embeddings, we introduced you to Word2vec, the basics of learning word embeddings, and the two common Word2vec algorithms: skip-gram and CBOW. In this chapter, we will discuss several topics related to Word2vec, focusing on these two algorithms and their extensions.

First, we will explore how the original skip-gram algorithm was implemented and how it compares to its more modern variant, which we used in Chapter 3, Word2vec – Learning Word Embeddings. We will examine the differences between skip-gram and CBOW and look at how the loss of the two approaches behaves over time. We will also discuss which method works better, drawing on both our own observations and the available literature.

We will then discuss several extensions to the existing Word2vec methods that boost performance. These extensions include using more effective sampling techniques for drawing negative examples in negative sampling and ignoring uninformative words during learning, among others. You will also learn about a novel word embedding learning technique known as Global Vectors (GloVe) and the specific advantages it has over skip-gram and CBOW.
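To give a flavor of these sampling-related extensions before the chapter develops them in detail, here is a minimal NumPy sketch of the two standard tricks from the original Word2vec work: drawing negative samples from the unigram distribution raised to the 3/4 power, and subsampling very frequent words. The toy word_counts dictionary is an assumption for illustration, and the chapter's own implementation may differ in its details.

```python
import numpy as np

# Hypothetical corpus word frequencies (word -> count), for illustration only.
word_counts = {"the": 5000, "cat": 120, "sat": 80, "mat": 60}
words = list(word_counts)
freqs = np.array([word_counts[w] for w in words], dtype=np.float64)

# Negative-sampling distribution: raise unigram frequencies to the 3/4 power,
# so rare words are sampled somewhat more often than their raw frequency suggests.
neg_probs = freqs ** 0.75
neg_probs /= neg_probs.sum()
negatives = np.random.choice(words, size=5, p=neg_probs)

# Subsampling of frequent words: a word w is discarded with probability
# 1 - sqrt(t / f(w)), where f(w) is its relative frequency and t is a small
# threshold (10^-5 in the original Word2vec paper).
t = 1e-5
rel_freqs = freqs / freqs.sum()
keep_probs = np.minimum(1.0, np.sqrt(t / rel_freqs))
```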

Finally, you will learn how to use Word2vec to solve a real-world problem: document classification. We will do this with a simple trick for obtaining document embeddings from word embeddings.
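As a preview, one simple and widely used way to obtain a document embedding from word embeddings is to average the vectors of the words appearing in the document. The sketch below assumes a hypothetical vocab mapping and a stand-in embeddings matrix; the chapter may use a variant of this idea.

```python
import numpy as np

# Hypothetical learned word embeddings: one row per vocabulary word.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embedding_dim = 50
embeddings = np.random.randn(len(vocab), embedding_dim)  # stand-in for trained vectors

def document_embedding(tokens):
    """Average the embeddings of the in-vocabulary tokens of a document."""
    ids = [vocab[t] for t in tokens if t in vocab]
    if not ids:
        return np.zeros(embedding_dim)
    return embeddings[ids].mean(axis=0)

# The resulting fixed-length vector can then be fed to any standard
# classifier or clustering method for document classification.
doc_vec = document_embedding("the cat sat on the mat".split())
```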
