- Mastering Machine Learning Algorithms
- Giuseppe Bonaccorso
Generative Gaussian mixtures
The generative Gaussian mixture is an inductive algorithm for semi-supervised clustering. Let's suppose we have a labeled dataset (Xl, Yl) containing N samples (drawn from pdata) and an unlabeled dataset Xu containing M >> N samples (drawn from the marginal distribution p(x)). It's not strictly necessary that M >> N, but we want to create a realistic semi-supervised scenario with only a few labeled samples. Moreover, we are assuming that all unlabeled samples are consistent with pdata. This might seem like a circular argument, but without this assumption, the procedure does not have a strong mathematical foundation. Our goal is to determine a complete p(x|y) distribution using a generative model. In general, it's possible to use different priors, but we are now employing multivariate Gaussians to model our data:
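As a sketch of the standard parametrization (assuming d-dimensional samples x), each class-conditional density is a multivariate Gaussian with its own mean vector and covariance matrix:

\[
p(x \mid y = i, \theta) = \mathcal{N}(x; \mu_i, \Sigma_i) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma_i}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right)
\]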
Thus, our model parameters are the means and covariance matrices of all the Gaussians. In other contexts, it's possible to use binomial or multinomial distributions; however, the procedure doesn't change. Therefore, let's assume that it's possible to approximate p(x|y) with a parametrized distribution p(x|y, θ). We can achieve this goal by minimizing the Kullback-Leibler divergence between the two distributions:
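A minimal sketch of this objective, written for each class y, is:

\[
D_{KL}\bigl(p(x \mid y) \,\|\, p(x \mid y, \theta)\bigr) = \int p(x \mid y) \log \frac{p(x \mid y)}{p(x \mid y, \theta)} \, dx
\]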
In Chapter 5, EM Algorithm and Applications, we are going to show that this is equivalent to maximizing the likelihood of the dataset. To obtain the likelihood, it's necessary to define the expected number of Gaussians (which is known from the labeled samples) and a weight vector that represents the marginal probability of each specific Gaussian:
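A sketch of these definitions (assuming k Gaussians) is:

\[
w = (w_1, w_2, \ldots, w_k), \qquad w_i = p(y = i), \qquad \sum_{i=1}^{k} w_i = 1
\]

so that the marginal density becomes the mixture:

\[
p(x \mid \theta, w) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i)
\]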
Using Bayes' theorem, we get:
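A sketch of the resulting posterior (the standard responsibility of the ith Gaussian for the sample xj) is:

\[
p(y = i \mid x_j, \theta, w) = \frac{w_i \, \mathcal{N}(x_j; \mu_i, \Sigma_i)}{\sum_{t=1}^{k} w_t \, \mathcal{N}(x_j; \mu_t, \Sigma_t)}
\]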
As we are working with both labeled and unlabeled samples, the previous expression has a double interpretation:
- For unlabeled samples, it is computed by multiplying the ith Gaussian weight by the probability of xj under the ith Gaussian distribution (that is, p(xj|yi, θ)).
- For labeled samples, it can be represented by a one-hot vector p = [0, 0, ..., 1, ..., 0, 0], where the 1 is in the ith position. In this way, we force our model to trust the labeled samples, so that it finds the parameter values that maximize the likelihood over the whole dataset.
With this distinction, we can consider a single log-likelihood function where the term fw(yi|xj) has been substituted by a per-sample weight:
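A sketch of such a function, with fw(yi|xj) equal to the posterior p(yi|xj, θ, w) for unlabeled samples and to the one-hot indicator for labeled ones, is:

\[
\log L(\theta, w) = \sum_{j=1}^{N+M} \sum_{i=1}^{k} f_w(y_i \mid x_j) \, \log \bigl( w_i \, \mathcal{N}(x_j; \mu_i, \Sigma_i) \bigr)
\]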
It's possible to maximize the log-likelihood using the EM algorithm (see Chapter 5, EM Algorithm and Applications). In this context, we provide the steps directly:
- p(yi|xj,θ,w) is computed according to the previously explained method
- The parameters of the Gaussians are updated using these rules:
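A sketch of the standard weighted update rules, where the sums run over all samples and fw(yi|xj) are the per-sample weights defined above, is:

\[
w_i = \frac{1}{N} \sum_{j=1}^{N} f_w(y_i \mid x_j), \qquad
\mu_i = \frac{\sum_{j=1}^{N} f_w(y_i \mid x_j) \, x_j}{\sum_{j=1}^{N} f_w(y_i \mid x_j)}, \qquad
\Sigma_i = \frac{\sum_{j=1}^{N} f_w(y_i \mid x_j) \, (x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{N} f_w(y_i \mid x_j)}
\]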
Here, N is the total number of samples (labeled and unlabeled). The procedure must be iterated until the parameters stop changing, or until the changes are smaller than a fixed threshold.
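As a concrete illustration, the following is a minimal sketch of the whole procedure in Python with NumPy and SciPy. The names (semi_supervised_gmm, X_l, y_l, X_u) are illustrative and not part of any library API; NumPy arrays with integer labels in {0, ..., n_classes - 1} are assumed, and a production implementation would need more careful initialization and covariance regularization.

```python
import numpy as np
from scipy.stats import multivariate_normal


def semi_supervised_gmm(X_l, y_l, X_u, n_classes, n_iter=100, tol=1e-4):
    # Stack labeled and unlabeled samples; N is the total number of samples
    X = np.vstack([X_l, X_u])
    N, d = X.shape

    # Fixed one-hot per-sample weights for the labeled samples
    R_l = np.zeros((len(X_l), n_classes))
    R_l[np.arange(len(X_l)), y_l] = 1.0

    # Initialize weights, means, and covariances from the labeled samples only
    w = np.full(n_classes, 1.0 / n_classes)
    mu = np.array([X_l[y_l == i].mean(axis=0) for i in range(n_classes)])
    sigma = np.array([np.cov(X_l[y_l == i].T) + 1e-6 * np.eye(d)
                      for i in range(n_classes)])

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posteriors p(y=i|xj, theta, w) for the unlabeled samples
        dens = np.column_stack([
            w[i] * multivariate_normal.pdf(X_u, mean=mu[i], cov=sigma[i])
            for i in range(n_classes)])
        R_u = dens / dens.sum(axis=1, keepdims=True)

        # Per-sample weights: trusted one-hot vectors for labeled samples,
        # soft posteriors for unlabeled ones
        R = np.vstack([R_l, R_u])

        # M-step: standard weighted Gaussian mixture updates
        Nk = R.sum(axis=0)
        w = Nk / N
        mu = (R.T @ X) / Nk[:, None]
        for i in range(n_classes):
            diff = X - mu[i]
            sigma[i] = (R[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)

        # Stop when the unlabeled log-likelihood no longer improves significantly
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return w, mu, sigma
```

Calling the function with a labeled array X_l of shape (N, d), its labels y_l, and an unlabeled array X_u of shape (M, d) returns the estimated weights, means, and covariance matrices, which can then be used to assign a label to each unlabeled sample through the posterior.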