
Natural Language Processing

Processing natural language text is complex: texts are not well structured and require a lot of cleaning and normalizing. Yet the amount of textual information around us is tremendous: a lot of text data is generated every minute, and it is very hard to retrieve useful information from it. Data science and machine learning are very helpful for text problems as well; they allow us to find the right text, process it, and extract the valuable bits of information.
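To make "cleaning and normalizing" a little more concrete, here is a minimal sketch of one possible normalization step. The function name `normalize` and the specific choices (lowercasing, stripping accents, keeping only ASCII letters and digits) are assumptions made for illustration; real pipelines usually need more care, for example with contractions or non-Latin scripts.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase the text, strip accents, and split it into simple tokens."""
    # Decompose accented characters so the accent becomes a separate mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, leaving the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat every run of ASCII letters or digits as one token.
    return re.findall(r"[a-z0-9]+", stripped.lower())


print(normalize("The Café's menu: GREAT value!!"))
```

Note that this crude rule splits "Café's" into two tokens, which is one reason normalization tends to need task-specific tuning.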

There are multiple ways we can use textual information. One example is information retrieval, or simply text search: given a user query and a collection of documents, we want to find the documents in the corpus that are most relevant to the query and present them to the user. Other applications include sentiment analysis, that is, predicting whether a product review is positive, neutral, or negative, and grouping the reviews according to how they talk about the products.
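As a rough illustration of text search, the sketch below ranks a small collection of documents against a query using a simplified TF-IDF score. The names `tokenize` and `search`, the smoothing in the IDF formula, and the sample documents are all assumptions for this example; this is one common scoring scheme among many, not a definitive implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents, top_k=3):
    """Return up to top_k (document, score) pairs, best match first."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scored = []
    for idx, counts in enumerate(doc_counts):
        score = 0.0
        for term in query_terms:
            if term in counts:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += counts[term] * idf
        scored.append((score, idx))

    # Sort by descending score, breaking ties by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(documents[i], s) for s, i in scored[:top_k] if s > 0]


docs = [
    "Java is a popular language for data science",
    "The cat sat on the mat",
    "Machine learning with Java and text data",
]
for doc, score in search("java text", docs):
    print(round(score, 3), doc)
```

The third document ranks first here because it matches both query terms, and the rarer term "text" contributes more to the score than "java", which appears in two documents.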

We will talk more about information retrieval, Natural Language Processing (NLP), and working with texts in Chapter 6, Working with Text - Natural Language Processing and Information Retrieval. Additionally, we will see how to process large amounts of text data in Chapter 9, Scaling Data Science.

The methods we can use for machine learning and data science are very important. Equally important is the way we create them and then put them to use in production systems. Data science process models help us make this work more organized and systematic, which is why we will talk about them next.
