- Natural Language Processing with TensorFlow
- Thushan Ganegedara
- 2021-06-25 21:28:22
Chapter 3. Word2vec – Learning Word Embeddings
In this chapter, we will discuss a topic of paramount importance in NLP: Word2vec, a technique for learning word embeddings, that is, distributed numerical feature representations (vectors) of words. Learning word representations lies at the very foundation of NLP, because many NLP tasks rely on good feature representations of words that preserve their semantics as well as their context in a language. For example, the feature representation of the word forest should be very different from that of oven, as these words are rarely used in similar contexts, whereas the representations of forest and jungle should be very similar.
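To make "very different" and "very similar" concrete, the sketch below scores pairs of word vectors with cosine similarity, a common choice for comparing embeddings (using it here is an assumption; this part of the chapter does not name a metric). The 4-dimensional vectors for forest, jungle, and oven are hypothetical values picked by hand for illustration, not embeddings learned by Word2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional word vectors, hand-picked for illustration only.
word_vectors = {
    "forest": [0.9, 0.8, 0.1, 0.0],
    "jungle": [0.8, 0.9, 0.2, 0.1],
    "oven": [0.0, 0.1, 0.9, 0.8],
}

print(round(cosine_similarity(word_vectors["forest"], word_vectors["jungle"]), 3))
print(round(cosine_similarity(word_vectors["forest"], word_vectors["oven"]), 3))
```

With these toy values, forest and jungle score about 0.99 while forest and oven score about 0.12, which is the kind of gap a good embedding should produce for related versus unrelated words.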
Note
Word2vec learns a distributed representation: the semantics of a word is captured by the activation pattern of the full representation vector, rather than by a single element of the vector (as in a one-hot representation, where one element is set to 1 and the rest to 0 for a given word).
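The contrast with a one-hot representation can be sketched in a few lines. In the example below (the vocabulary is a hypothetical four-word list), each word gets a vector with a single 1 at its own index. Because no two distinct one-hot vectors share a nonzero element, every pairwise dot product is 0, so the encoding carries no notion of which words are related, and its length grows with the size of the vocabulary.

```python
from itertools import combinations


def one_hot(word, vocab):
    """One-hot (localist) vector: 1 at the word's index, 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary; a real one would have tens of thousands of words.
vocabulary = ["forest", "jungle", "oven", "stove"]
vectors = {w: one_hot(w, vocabulary) for w in vocabulary}

# Every pair of distinct words scores 0: forest/jungle looks exactly as
# unrelated as forest/oven under this encoding.
for a, b in combinations(vocabulary, 2):
    print(a, b, dot(vectors[a], vectors[b]))
```

A distributed representation instead uses a fixed, much smaller number of dimensions and lets every element contribute to every word, which is what allows related words to end up with similar vectors.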
We will go step by step from the classical approach to this problem to modern neural network-based methods that deliver state-of-the-art performance in finding good word representations. Figure 3.1 shows such learned word embeddings for a set of words, projected onto a 2D canvas with t-SNE (a visualization technique for high-dimensional data). If you take a closer look, you will see that similar words are placed close to each other (for example, the numbers clustered in the middle).

Figure 3.1: An example visualization of learned word embeddings using t-SNE
Note
t-Distributed Stochastic Neighbor Embedding (t-SNE)
This is a dimensionality reduction technique that projects high-dimensional data onto a two-dimensional space. This lets us see how high-dimensional data is distributed, which is useful because we cannot easily visualize more than three dimensions. You will learn about t-SNE in more detail in the next chapter.
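As a rough sketch of how a plot like Figure 3.1 can be produced, the snippet below projects word vectors to 2D with scikit-learn's `TSNE`. It assumes NumPy and scikit-learn are installed, and the 50-dimensional "word vectors" are hypothetical random stand-ins for learned embeddings (two groups scattered around different centers), not output from Word2vec. The perplexity is set low only because there are just 20 points.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_to_2d(vectors, perplexity=5.0, seed=0):
    """Map an (n_words, dim) array of word vectors to an (n_words, 2) array with t-SNE."""
    tsne = TSNE(
        n_components=2,         # target dimensionality of the plot
        perplexity=perplexity,  # rough number of effective neighbors; keep it below n_words
        init="pca",             # start from a PCA layout instead of random points
        random_state=seed,      # fix the seed so the layout is reproducible
    )
    return tsne.fit_transform(vectors)


# Hypothetical stand-ins for learned embeddings: two groups of 50-dimensional
# vectors (say, "number" words and "animal" words) scattered around different centers.
rng = np.random.default_rng(0)
number_vecs = rng.normal(loc=1.0, scale=0.1, size=(10, 50))
animal_vecs = rng.normal(loc=-1.0, scale=0.1, size=(10, 50))
word_vectors = np.vstack([number_vecs, animal_vecs])

points_2d = project_to_2d(word_vectors)
print(points_2d.shape)  # (20, 2)
```

With real embeddings you would pass rows of the trained embedding matrix instead and label each 2D point with its word (for example, with matplotlib's `annotate`); the two groups here should come out as two separate clusters, just as similar words cluster together in the figure.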