
Generative Gaussian mixtures

Generative Gaussian mixtures is an inductive algorithm for semi-supervised clustering. Let's suppose we have a labeled dataset (X_l, Y_l) containing N samples (drawn from p_data) and an unlabeled dataset X_u containing M >> N samples (drawn from the marginal distribution p(x)). It's not necessary that M >> N, but we want to create a real semi-supervised scenario, with only a few labeled samples. Moreover, we are assuming that all unlabeled samples are consistent with p_data. This can seem like a vicious cycle, but without this assumption, the procedure does not have a strong mathematical foundation. Our goal is to determine a complete p(x|y) distribution using a generative model. In general, it's possible to use different priors, but we are now employing multivariate Gaussians to model our data:
p(x|y=i, θ) = N(x; μ_i, Σ_i)
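Before moving on, let's sketch such a scenario in code. The following snippet assumes a toy two-dimensional dataset generated with scikit-learn's make_blobs; the sizes N = 100 and M = 2000, together with all variable names, are illustrative choices rather than values prescribed by the method:

from sklearn.datasets import make_blobs

N = 100     # number of labeled samples
M = 2000    # number of unlabeled samples (M >> N)

# Draw all samples from the same underlying distribution p_data
X, Y = make_blobs(n_samples=N + M, n_features=2, centers=2,
                  cluster_std=2.5, random_state=1000)

# Keep the labels only for the first N samples; the remaining M samples
# are treated as unlabeled
X_labeled, Y_labeled = X[:N], Y[:N]
X_unlabeled = X[N:]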
With this choice, our model parameters are the means and covariance matrices of all the Gaussians. In other contexts, it's possible to use binomial or multinomial distributions. However, the procedure doesn't change; therefore, let's assume that it's possible to approximate p(x|y) with a parametrized distribution p(x|y, θ). We can achieve this goal by minimizing the Kullback-Leibler divergence between the two distributions:
θ_opt = argmin_θ D_KL(p(x|y) || p(x|y, θ)) = argmin_θ Σ_x p(x|y) · log( p(x|y) / p(x|y, θ) )
In Chapter 5, EM Algorithm and Applications, we are going to show that this is equivalent to maximizing the likelihood of the dataset. To obtain the likelihood, it's necessary to define the number of expected Gaussians k (which is known from the labeled samples) and a weight vector w that represents the marginal probability of a specific Gaussian:
p(x_j|θ, w) = Σ_{i=1..k} w_i · N(x_j; μ_i, Σ_i), with w_i = p(y=i) and Σ_{i=1..k} w_i = 1
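As a sketch of this parameterization, the following snippet initializes two hypothetical Gaussians together with a weight vector w and evaluates the resulting mixture density with SciPy; all initial values are arbitrary placeholders to be refined by the training procedure:

import numpy as np
from scipy.stats import multivariate_normal

# Two Gaussians (the number is known from the labeled samples) with
# hypothetical initial parameters and a weight vector w representing p(y=i)
means = np.array([[-2.0, -2.0], [2.0, 2.0]])
covariances = np.array([np.eye(2), np.eye(2)])
weights = np.array([0.5, 0.5])    # must sum to 1

def mixture_density(X, means, covariances, weights):
    # p(x|θ, w) = Σ_i w_i · N(x; μ_i, Σ_i)
    return sum(w * multivariate_normal.pdf(X, mean=m, cov=c)
               for w, m, c in zip(weights, means, covariances))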
Using Bayes' theorem, we get:
p(y=i|x_j, θ, w) = w_i · N(x_j; μ_i, Σ_i) / Σ_{t=1..k} w_t · N(x_j; μ_t, Σ_t)
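A possible sketch of this step, reusing the hypothetical parameters defined in the previous snippet, is shown here:

import numpy as np
from scipy.stats import multivariate_normal

def posterior(X, means, covariances, weights):
    # Numerators of Bayes' theorem: w_i · N(x_j; μ_i, Σ_i) for every sample
    numerators = np.array([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covariances)
    ]).T                                  # shape: (n_samples, n_gaussians)
    # Divide by the marginal p(x_j) so that each row sums to 1
    return numerators / np.sum(numerators, axis=1, keepdims=True)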
As we are working with both labeled and unlabeled samples, this posterior has a double interpretation:

  • For unlabeled samples, it is computed by multiplying the ith Gaussian weight w_i by the probability of x_j under the ith Gaussian distribution.
  • For labeled samples, it can be represented by a vector p = [0, 0, ..., 1, ..., 0, 0], where 1 is the ith element. In this way, we force our model to trust the labeled samples in order to find the best parameter values that maximize the likelihood on the whole dataset (both cases are shown in the sketch after this list).

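The following sketch combines both cases into a single weight matrix; it reuses the hypothetical posterior() helper and the dataset variables introduced in the previous snippets:

import numpy as np

def per_sample_weights(X_labeled, Y_labeled, X_unlabeled,
                       means, covariances, weights):
    n_gaussians = len(weights)
    # Labeled samples: fixed one-hot vectors p = [0, ..., 1, ..., 0]
    P_labeled = np.zeros((len(X_labeled), n_gaussians))
    P_labeled[np.arange(len(X_labeled)), Y_labeled] = 1.0
    # Unlabeled samples: posteriors obtained through Bayes' theorem
    P_unlabeled = posterior(X_unlabeled, means, covariances, weights)
    return np.vstack([P_labeled, P_unlabeled])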
With this distinction, we can consider a single log-likelihood function where the term f_w(y_i|x_j) has been substituted by a per-sample weight p_ij:
log L(θ, w|X) = Σ_j Σ_{i=1..k} p_ij · log( w_i · N(x_j; μ_i, Σ_i) )
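Under the same assumptions as the previous snippets, the weighted log-likelihood can be sketched as follows and used to monitor the progress of the training procedure:

import numpy as np
from scipy.stats import multivariate_normal

def weighted_log_likelihood(X, P, means, covariances, weights):
    # P has one row per sample: one-hot for labeled samples,
    # posterior probabilities for unlabeled ones
    densities = np.array([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covariances)
    ]).T
    # The small constant avoids log(0) (an arbitrary numerical safeguard)
    return np.sum(P * np.log(densities + 1e-12))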
It's possible to maximize the log-likelihood using the EM algorithm (see Chapter 5, EM Algorithm and Applications). In this context, we provide the steps directly:

  • p(y_i|x_j, θ, w) is computed according to the previously explained method
  • The parameters of the Gaussians are updated using these rules:
w_i = (1/N) · Σ_j p(y=i|x_j, θ, w)

μ_i = Σ_j p(y=i|x_j, θ, w) · x_j / Σ_j p(y=i|x_j, θ, w)

Σ_i = Σ_j p(y=i|x_j, θ, w) · (x_j − μ_i)(x_j − μ_i)^T / Σ_j p(y=i|x_j, θ, w)
where N is the total number of samples. The procedure must be iterated until the parameters stop changing or the changes are smaller than a fixed threshold.
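Putting the pieces together, a minimal sketch of the whole procedure, built on the hypothetical helpers defined in the previous snippets, could be the following (the maximum number of iterations and the tolerance are arbitrary choices):

import numpy as np

def fit_generative_gaussian_mixture(X_labeled, Y_labeled, X_unlabeled,
                                    means, covariances, weights,
                                    max_iterations=100, tolerance=1e-4):
    X_all = np.vstack([X_labeled, X_unlabeled])
    N_total = len(X_all)

    for _ in range(max_iterations):
        # E-step: one-hot weights for labeled samples, posteriors for unlabeled ones
        P = per_sample_weights(X_labeled, Y_labeled, X_unlabeled,
                               means, covariances, weights)

        new_means = np.empty_like(means)
        new_covariances = np.empty_like(covariances)
        new_weights = np.empty_like(weights)

        # M-step: apply the update rules shown above to each Gaussian
        for i in range(len(weights)):
            r = P[:, i]
            new_weights[i] = np.sum(r) / N_total
            new_means[i] = np.sum(r[:, None] * X_all, axis=0) / np.sum(r)
            diff = X_all - new_means[i]
            new_covariances[i] = np.sum(r[:, None, None] *
                                        np.einsum('ij,ik->ijk', diff, diff),
                                        axis=0) / np.sum(r)

        # Stop when the largest change in the means falls below the tolerance
        converged = np.max(np.abs(new_means - means)) < tolerance
        means, covariances, weights = new_means, new_covariances, new_weights
        if converged:
            break

    return means, covariances, weights

# Example usage with the variables defined in the previous snippets:
# means, covariances, weights = fit_generative_gaussian_mixture(
#     X_labeled, Y_labeled, X_unlabeled, means, covariances, weights)

After training, the resulting means, covariance matrices, and weights can be used both to label the originally unlabeled samples (through the posterior) and to classify new samples, which is why the approach is inductive.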
