- Natural Language Processing with Python Quick Start Guide
- Nirant Kasliwal
- 2021-06-10 18:36:35
Understanding and preparing the data
Text and language are inherently unstructured. We might want to clean the text in certain ways, such as expanding abbreviations and acronyms, removing punctuation, and so on. We also want to select a few samples that best represent the data we might see in the wild.
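The two cleaning steps named above, expanding abbreviations and removing punctuation, can be sketched with the Python standard library alone. This is a minimal illustration, not the book's own code: the `ABBREVIATIONS` map and the helper names `expand_abbreviations`, `remove_punctuation`, and `clean` are assumptions made for this example, and a real project would build its own mapping from its data.

```python
import re
import string

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {
    "nlp": "natural language processing",
    "e.g.": "for example",
    "approx.": "approximately",
}


def expand_abbreviations(text, mapping=ABBREVIATIONS):
    """Replace whitespace-separated tokens that appear in the mapping."""
    tokens = text.split()
    return " ".join(mapping.get(token.lower(), token) for token in tokens)


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def clean(text):
    """Expand abbreviations first, then strip punctuation and normalize."""
    text = expand_abbreviations(text)
    text = remove_punctuation(text)
    # Collapse repeated whitespace and lowercase the result.
    return re.sub(r"\s+", " ", text).strip().lower()


print(clean("NLP is fun, e.g. approx. 3 tasks!"))
```

The order matters in this sketch: abbreviations such as `e.g.` contain punctuation, so they are expanded before punctuation is removed.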
The other common practice is to prepare a gold dataset. A gold dataset is the best data available under reasonable, real-world conditions, which is not the same as the best data we could have under ideal conditions. Creating the gold dataset often involves manual tagging and cleaning.
The next few sections are dedicated to text cleaning and text representations at this stage of the NLP workflow.