官术网_书友最值得收藏!

PAC learning

In many cases machine learning seems to work seamlessly, but is there any way to determine formally the learnability of a concept? In 1984, the computer scientist L. Valiant proposed a mathematical approach to determine whether a problem is learnable by a computer. The name of this technique is PAC, or probably approximately correct.

The original formulation (you can read it in Valiant L., A Theory of the Learnable, Communications of the ACM, Vol. 27, No. 11 , Nov. 1984) is based on a particular hypothesis, however, without a considerable loss of precision, we can think about a classification problem where an algorithm A has to learn a set of concepts. In particular, a concept is a subset of input patterns X which determine the same output element. Therefore, learning a concept (parametrically) means minimizing the corresponding loss function restricted to a specific class, while learning all possible concepts (belonging to the same universe), means finding the minimum of a global loss function.

However, given a problem, we have many possible (sometimes, theoretically infinite) hypotheses and a probabilistic trade-off is often necessary. For this reason, we accept good approximations with high probability based on a limited number of input elements and produced in polynomial time.

Therefore, an algorithm A can learn the class C of all concepts (making them PAC learnable) if it's able to find a hypothesis H with a procedure O(nk) so that A, with a probability p, can classify all patterns correctly with a maximum allowed error me. This must be valid for all statistical distributions on X and for a number of training samples which must be greater than or equal to a minimum value depending only on p and me.

The constraint to computation complexity is not a secondary matter, in fact, we expect our algorithms to learn efficiently in a reasonable time also when the problem is quite complex. An exponential time could lead to computational explosions when the datasets are too large or the optimization starting point is very far from an acceptable minimum. Moreover, it's important to remember the so-called curse of dimensionality, which is an effect that often happens in some models where training or prediction time is proportional (not always linearly) to the dimensions, so when the number of features increases, the performance of the models (that can be reasonable when the input dimensionality is small) gets dramatically reduced. Moreover, in many cases, in order to capture the full expressivity, it's necessary to have a very large dataset and without enough training data, the approximation can become problematic (this is called Hughes phenomenon). For these reasons, looking for polynomial-time algorithms is more than a simple effort, because it can determine the success or the failure of a machine learning problem. For these reasons, in the next chapters, we're going to introduce some techniques that can be used to efficiently reduce the dimensionality of a dataset without a problematic loss of information.

主站蜘蛛池模板: 光山县| 昌邑市| 陆良县| 凤阳县| 通榆县| 南康市| 阿城市| 古交市| 宁夏| 蛟河市| 刚察县| 清水河县| 康保县| 蕲春县| 文水县| 巴青县| 临汾市| 娄底市| 蚌埠市| 阿坝| 阿荣旗| 南充市| 南和县| 光山县| 蕉岭县| 萨嘎县| 三河市| 桃园市| 阳城县| 鄄城县| 宜兰县| 固阳县| 盘山县| 获嘉县| 巨鹿县| 岗巴县| 万盛区| 清涧县| 彭州市| 吴江市| 凉城县|