官术网_书友最值得收藏!

Classifying text with language models

Text classification is an application of classification algorithms. However, the text is a combination of words in a specific order. Hence, you can observe that a text document with a class variable is not similar to the dataset that we presented in the table in the Classification algorithms section. 

A text dataset can be represented as shown in the following table:

Table 2: Example of a Twitter dataset

 

For this chapter, we have built a dataset based on tweets from two different accounts. We also have provided code in the following sections so that you can create your own datasets to try this example. Our purpose is to build a smart application that is capable of predicting the source of a tweet just by reading the tweet text. We will collect several tweets by the United States Republican Party (@GOP) and the Democratic Party (@TheDemocrats) to build a model that can predict which party wrote a given tweet. In order to do this, we will randomly select some tweets from each party and submit them through the model to check whether the prediction actually matched reality.

主站蜘蛛池模板: 玛纳斯县| 西林县| 张家港市| 佛坪县| 临湘市| 汉川市| 江阴市| 于田县| 天峨县| 山东省| 芷江| 黔西县| 和龙市| 军事| 华亭县| 中超| 维西| 永仁县| 宁津县| 保康县| 辽宁省| 申扎县| 荣昌县| 宁安市| 时尚| 岢岚县| 米泉市| 榆中县| 海安县| 略阳县| 百色市| 龙州县| 平阴县| 灵寿县| 开远市| 上思县| 英吉沙县| 巩义市| 乌审旗| 宜兰县| 个旧市|