- Building Machine Learning Systems with Python
- Luis Pedro Coelho Willi Richert Matthieu Brucher
- 2021-07-23 17:11:24
P-greater-than-N scenarios
The title of this section is a bit of jargon, which we will now explain. Starting in the 1990s, first in the biomedical domain and then on the web, problems began to appear where P was greater than N. This means that the number of features, P, is greater than the number of examples, N (these letters are the conventional statistical shorthand for these concepts).
For example, if your input is a set of written documents, a simple approach is to treat each possible word in the dictionary as a feature and regress on those (we will later work on one such problem ourselves). The English language has over 20,000 words (this is if you perform some stemming and only consider common words; it is more than ten times that if you skip this preprocessing step). If you only have a few hundred or a few thousand examples, you will have more features than examples.
In this case, as the number of features is greater than the number of examples, it is possible to fit the training data perfectly. This is a mathematical fact that holds whatever your targets are (barring degenerate cases, such as two identical examples with different targets). You are, in effect, solving a system of linear equations with fewer equations than variables. You can find a set of regression coefficients with zero training error (in fact, there are infinitely many such perfect solutions).
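The perfect-fit claim can be checked numerically. The following is a minimal sketch using NumPy only; the sizes (20 examples, 100 features), the random seed, and all variable names are illustrative assumptions, and the targets are pure noise so there is nothing real to learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_features = 20, 100          # N < P (illustrative sizes)

X = rng.normal(size=(n_examples, n_features))
y = rng.normal(size=n_examples)           # targets are pure noise

# For an underdetermined system, lstsq returns the minimum-norm solution.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
train_error = float(np.mean((X @ coef - y) ** 2))

# A second perfect solution: add any vector from the null space of X.
# With full_matrices=True (the default), the last rows of vt span that
# null space because rank(X) < n_features.
_, _, vt = np.linalg.svd(X)
null_vec = vt[-1]
coef2 = coef + 5.0 * null_vec
train_error2 = float(np.mean((X @ coef2 - y) ** 2))

print("rank of X:", rank)
print("training error, first solution: ", train_error)
print("training error, second solution:", train_error2)
```

Both training errors should be zero up to floating-point rounding, even though the two coefficient vectors differ and the targets carry no signal at all.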
However, and this is a major problem, zero training error does not mean that your solution will generalize well. In fact, it may generalize very poorly. In earlier examples, regularization could give you a little extra boost, but it is now absolutely required for a meaningful result.
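To make the gap between training and test error concrete, here is a hedged sketch on synthetic data. Everything in it is an assumption made for illustration: the sparse `true_coef`, the noise level, the sizes, and the penalty strength `alpha`. The penalized fit is a ridge (L2) regression written in closed form with NumPy, standing in for whichever regularized model you actually use.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_features = 50, 1000, 200   # N < P on the training set

# Hypothetical ground truth: only the first five features matter.
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = X @ true_coef + rng.normal(scale=1.0, size=n)
    return X, y

def mse(X, y, coef):
    return float(np.mean((X @ coef - y) ** 2))

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Unregularized fit: minimum-norm least squares, interpolates the training set.
ols_coef = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

# Ridge fit in closed form: (X^T X + alpha * I)^-1 X^T y, no intercept.
alpha = 10.0                                   # illustrative, not tuned
gram = X_train.T @ X_train + alpha * np.eye(n_features)
ridge_coef = np.linalg.solve(gram, X_train.T @ y_train)

for name, coef in [("unregularized", ols_coef), ("ridge", ridge_coef)]:
    print(name,
          "train MSE:", mse(X_train, y_train, coef),
          "test MSE:", mse(X_test, y_test, coef),
          "coef norm:", float(np.linalg.norm(coef)))
```

The unregularized fit reaches zero training error while its test error stays well above zero. The ridge fit gives up the perfect training fit in exchange for smaller coefficients. How much this helps on the test set depends on the data and on `alpha`, which in practice you would choose by cross-validation rather than fix by hand.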