官术网_书友最值得收藏!

Vapnik-Chervonenkis capacity

 A mathematical formalization of the capacity of a classifier is provided by the Vapnik-Chervonenkis theory. To introduce the definition, it's first necessary to define the concept of shattering. If we have a class of sets C and a set M, we say that C shatters M if:

In other words, given any subset of M, it can be obtained as the intersection of a particular instance of C (cj) and M itself. If we now consider a model as a parameterized function:

We want to determine its capacity in relation to a finite dataset X:

According to the Vapnik-Chervonenkis theory, we can say that the model f shatters X if there are no classification errors for every possible label assignment. Therefore, we can define the Vapnik-Chervonenkis-capacity or VC-capacity (sometimes called VC-dimension) as the maximum cardinality of a subset of X so that f can shatter it.

For example, if we consider a linear classifier in a bi-dimensional space, the VC-capacity is equal to 3, because it's always possible to label three samples so that f shatters them; however, it's impossible to do it in all situations where N > 3. The XOR problem is an example that needs a VC-capacity higher than 3. Let's explore the following plot:

XOR problem with different separating curves

This particular label choice makes the set non-linearly separable. The only way to overcome this problem is to use higher-order functions (or non-linear ones). The curve lines (belonging to a classifier whose VC-capacity is greater than 3) can separate both the upper-left and the lower-right regions from the remaining space, but no straight line can do the same (while it can always separate one point from the other three).

主站蜘蛛池模板: 磴口县| 昌都县| 太白县| 新巴尔虎右旗| 克什克腾旗| 永州市| 红原县| 安岳县| 泗阳县| 彰武县| 禄丰县| 德安县| 赤壁市| 财经| 松桃| 四平市| 阿巴嘎旗| 防城港市| 海阳市| 故城县| 白银市| 武胜县| 治县。| 赤峰市| 榕江县| 宁城县| 修武县| 门源| 嘉定区| 县级市| 融水| 东乌珠穆沁旗| 友谊县| 隆林| 清苑县| 鹤岗市| 泰和县| 平潭县| 柳江县| 阳泉市| 石首市|