
Semi-supervised scenario

A typical semi-supervised scenario is not very different from a supervised one. Let's suppose we have a data generating process, pdata:

$$p_{data}(x, y) = p(y \mid x)\, p(x)$$

However, contrary to a supervised approach, we have only a limited number N of samples drawn from pdata and provided with a label, forming the labeled set Xl:

$$X_l = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

In addition, we have a larger number M of unlabeled samples drawn from the marginal distribution p(x), forming the unlabeled set Xu:

$$X_u = \{x_1, x_2, \ldots, x_M\}, \quad x_j \sim p(x)$$
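As a minimal sketch of this setup, the following snippet builds a small labeled set and a larger unlabeled set from a toy generating process. The process itself (a balanced binary label with a one-dimensional Gaussian for each class) and the names `sample_pdata` and `make_semi_supervised_dataset` are assumptions made for illustration only; they are not part of the original text.

```python
import random


def sample_pdata(rng):
    """Draw one (x, y) pair from a toy p_data (an assumption for this sketch):
    y is 0 or 1 with equal probability, and x | y ~ N(-2, 1) or N(+2, 1)."""
    y = rng.randint(0, 1)
    x = rng.gauss(2.0 if y == 1 else -2.0, 1.0)
    return x, y


def make_semi_supervised_dataset(n_labeled, m_unlabeled, seed=0):
    rng = random.Random(seed)
    # Labeled set: N complete samples (x, y) drawn from p_data.
    labeled = [sample_pdata(rng) for _ in range(n_labeled)]
    # Unlabeled set: M samples from the marginal p(x), obtained here by
    # drawing a full pair and discarding the label.
    unlabeled = [sample_pdata(rng)[0] for _ in range(m_unlabeled)]
    return labeled, unlabeled


labeled, unlabeled = make_semi_supervised_dataset(n_labeled=10, m_unlabeled=1000)
print(len(labeled), len(unlabeled))  # 10 1000
```

Discarding the label of a full draw is one simple way to sample from the marginal p(x); in practice, the unlabeled samples are simply the ones for which no label was ever collected.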

In general, there are no restrictions on the values of N and M; however, a semi-supervised problem arises when the number of unlabeled samples is much higher than the number of labeled ones. If we can draw N >> M labeled samples from pdata, there is probably little point in using semi-supervised approaches, and classical supervised methods are likely to be the best choice. The extra complexity is justified when M >> N, which is common whenever unlabeled data is abundant while correctly labeled samples are scarce. For example, we can easily access millions of free images, but detailed labeled datasets are expensive and cover only a limited subset of possibilities.

However, is it always possible to apply semi-supervised learning to improve our models? Unfortunately, the answer is no. As a rule of thumb, if Xu increases our knowledge of the marginal distribution p(x), a semi-supervised algorithm is likely to perform better than a purely supervised counterpart (which is limited to Xl). On the other hand, if the unlabeled samples are drawn from a different distribution, the final result can be considerably worse. In real cases, it is rarely obvious in advance whether a semi-supervised algorithm is the best choice; therefore, cross-validation and comparison with a supervised baseline are the best practices to employ when evaluating a scenario.
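The rule of thumb can be illustrated with a small experiment. The sketch below is a toy comparison on a held-out set, not a method taken from the text: it assumes a one-dimensional, two-class Gaussian problem and uses a simple nearest-centroid classifier with self-training (pseudo-labeling) as a stand-in for a semi-supervised algorithm. All function names are hypothetical. It evaluates a purely supervised baseline, a self-trained model using unlabeled samples from the same p(x), and a self-trained model using unlabeled samples from a shifted distribution.

```python
import random


def draw(rng, n, means=(-2.0, 2.0)):
    """Draw n (x, y) pairs: y uniform in {0, 1}, x | y ~ N(means[y], 1)."""
    out = []
    for _ in range(n):
        y = rng.randint(0, 1)
        out.append((rng.gauss(means[y], 1.0), y))
    return out


def centroids(points):
    """Per-class mean of x for a list of (x, y) pairs."""
    sums, counts = [0.0, 0.0], [0, 0]
    for x, y in points:
        sums[y] += x
        counts[y] += 1
    return [sums[c] / counts[c] for c in (0, 1)]


def predict(c, x):
    """Assign x to the class with the nearest centroid."""
    return 0 if abs(x - c[0]) <= abs(x - c[1]) else 1


def self_train(labeled, unlabeled, n_iter=10):
    """Start from the labeled centroids, then repeatedly pseudo-label the
    unlabeled samples and recompute the centroids on labeled + pseudo-labeled."""
    c = centroids(labeled)
    for _ in range(n_iter):
        pseudo = [(x, predict(c, x)) for x in unlabeled]
        c = centroids(labeled + pseudo)
    return c


def accuracy(c, test):
    return sum(predict(c, x) == y for x, y in test) / len(test)


rng = random.Random(0)
# N = 6 labeled samples (three per class), M = 1000 unlabeled samples.
labeled = ([(rng.gauss(-2.0, 1.0), 0) for _ in range(3)]
           + [(rng.gauss(2.0, 1.0), 1) for _ in range(3)])
test = draw(rng, 2000)
matched = [x for x, _ in draw(rng, 1000)]                     # same p(x)
shifted = [x for x, _ in draw(rng, 1000, means=(5.0, 9.0))]   # different p(x)

acc_sup = accuracy(centroids(labeled), test)
acc_matched = accuracy(self_train(labeled, matched), test)
acc_shifted = accuracy(self_train(labeled, shifted), test)
print("supervised only:        ", acc_sup)
print("self-training, matched: ", acc_matched)
print("self-training, shifted: ", acc_shifted)
```

With matched unlabeled data, the centroids settle close to the true class means, so the decision boundary is typically at least as good as the one estimated from six labeled points alone. With shifted unlabeled data, the pseudo-labeled samples pull the centroids away from the region where the test data lives and accuracy drops. On a real problem, the same three-way comparison would normally be run with cross-validation rather than a single held-out split.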
