- Natural Language Processing with Python Quick Start Guide
- Nirant Kasliwal
- 2021-06-10 18:36:35
Understanding and preparing the data
Text and language are inherently unstructured. We might want to clean the data in certain ways, such as expanding abbreviations and acronyms or removing punctuation. We also want to select a few samples that best represent the data we might see in the wild.
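As a rough illustration of the cleaning steps mentioned above, here is a minimal sketch using only the Python standard library. The `ABBREVIATIONS` map and the function names `expand_abbreviations`, `remove_punctuation`, and `clean_text` are hypothetical and chosen for this example; a real project would build its own mapping from its corpus and would likely need more careful tokenization.

```python
import re
import string

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {
    "nlp": "natural language processing",
    "e.g.": "for example",
    "asap": "as soon as possible",
}

def expand_abbreviations(text, mapping=ABBREVIATIONS):
    """Expand whitespace-separated tokens that appear in the mapping.

    Matching is case-insensitive and exact, so a token with trailing
    punctuation (such as "ASAP!") is left unchanged.
    """
    return " ".join(mapping.get(tok.lower(), tok) for tok in text.split())

def remove_punctuation(text):
    """Delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))

def clean_text(text):
    """Expand abbreviations, strip punctuation, normalize spaces, lowercase."""
    text = expand_abbreviations(text)  # must run before punctuation removal
    text = remove_punctuation(text)
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_text("NLP is fun, e.g. for chatbots!"))
```

Abbreviations are expanded before punctuation is removed because entries such as "e.g." would no longer match once their periods are stripped.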
Another common practice is to prepare a gold dataset. A gold dataset is the best data available under reasonable conditions, as opposed to the best data available under ideal conditions. Creating it often involves manual tagging and cleaning.
The next few sections are dedicated to text cleaning and text representations at this stage of the NLP workflow.