Tweaking the parameters

So what about all the other parameters? Can we tweak them all to get better results?

Sure. We could, of course, tweak the number of clusters or play with the vectorizer's max_features parameter (you should try that!). We could also try different cluster center initializations. Beyond that, there are more exciting alternatives to KMeans itself: for example, clustering approaches that let you use different similarity measures, such as cosine similarity, Pearson correlation, or the Jaccard index. An exciting field for you to play in.
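As a rough illustration of the knobs mentioned here, the following sketch varies the number of clusters, the vectorizer's max_features, and the center initialization. The toy corpus, the helper name cluster_posts, and the concrete parameter values are assumptions made for this example only; they are not taken from the chapter's dataset.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; replace it with the real posts.
posts = [
    "how to format a hard disk",
    "hard disk drive fails to boot",
    "imaging databases store image data",
    "imaging databases can get huge",
    "machine learning with python",
    "python machine learning toolkits",
]


def cluster_posts(texts, num_clusters=3, max_features=None, init="k-means++"):
    """Vectorize the texts and cluster them; returns (labels, inertia)."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=max_features)
    X = vectorizer.fit_transform(texts)
    # random_state is fixed only so that repeated runs are comparable.
    km = KMeans(n_clusters=num_clusters, init=init, n_init=10, random_state=3)
    km.fit(X)
    return km.labels_, km.inertia_


# Try a few combinations and look at how the assignments change.
for k in (2, 3):
    for init in ("k-means++", "random"):
        labels, inertia = cluster_posts(
            posts, num_clusters=k, max_features=10, init=init
        )
        print(k, init, labels.tolist(), round(inertia, 3))
```

Note that the inertia printed here always shrinks as the number of clusters grows, so on its own it cannot tell you which setting is "better".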

But before you go there, you will have to define what you actually mean by "better". scikit-learn has a complete package dedicated to exactly this question. It is called sklearn.metrics and contains, among other things, a full range of metrics for measuring clustering quality. Maybe that should be the first place to go now: right into the sources of the metrics package.
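One possible starting point, sketched under the assumption that no ground-truth labels are available, is the silhouette coefficient from sklearn.metrics. It rates how well each sample fits its own cluster compared with the nearest other cluster, on a scale from -1 to 1. The toy corpus and the helper name silhouette_for_k below are made up for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus; replace it with the real posts.
posts = [
    "how to format a hard disk",
    "hard disk drive fails to boot",
    "imaging databases store image data",
    "imaging databases can get huge",
    "machine learning with python",
    "python machine learning toolkits",
]

X = TfidfVectorizer(stop_words="english").fit_transform(posts)


def silhouette_for_k(X, k):
    """Cluster X into k groups and return the mean silhouette coefficient."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X)
    return silhouette_score(X, labels)


# Compare several cluster counts by one explicit definition of "better".
scores = {k: silhouette_for_k(X, k) for k in (2, 3, 4)}
for k, score in scores.items():
    print(k, round(score, 3))
```

If reference labels do exist, sklearn.metrics also offers label-based measures such as homogeneity_score, completeness_score, and adjusted_rand_score. Which one matches your notion of "better" is a decision you have to make for your own data.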
