Tweaking the parameters

So what about all the other parameters? Can we tweak them all to get better results?

Sure. We could, of course, tweak the number of clusters or play with the vectorizer's max_features parameter (you should try that!). We could also experiment with different cluster center initializations. Beyond that, there are more exciting alternatives to KMeans itself. Some clustering approaches, for example, let you use other similarity measures such as cosine similarity, Pearson correlation, or the Jaccard coefficient. An exciting field for you to play in.
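To give a feel for clustering with a different similarity measure, here is a minimal sketch. It swaps in SciPy's average-linkage hierarchical clustering, not KMeans, because that approach accepts any pairwise distance. The six-post corpus is hypothetical and stands in for real posts. Pearson enters as SciPy's "correlation" distance (1 − r), and Jaccard is computed on word presence rather than counts.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: three posts about disks, three about imaging.
posts = [
    "disk drive failure on my laptop",
    "hard disk drive is making noise",
    "replace the laptop disk drive",
    "imaging software for editing photos",
    "edit photos with imaging tools",
    "best software to edit images",
]

# Raw word counts as a dense float matrix (one row per post).
counts = CountVectorizer().fit_transform(posts).toarray().astype(float)
# Word presence/absence, which is what the Jaccard measure compares.
binary = counts > 0

# Condensed pairwise distance vectors, one per similarity measure.
distances = {
    "cosine": pdist(counts, metric="cosine"),        # 1 - cosine similarity
    "pearson": pdist(counts, metric="correlation"),  # 1 - Pearson r
    "jaccard": pdist(binary, metric="jaccard"),      # 1 - Jaccard coefficient
}

labels = {}
for name, d in distances.items():
    # Average-linkage tree, then cut it into at most two flat clusters.
    tree = linkage(d, method="average")
    labels[name] = fcluster(tree, t=2, criterion="maxclust")
    print(name, labels[name])
```

Because only the distance function changes between runs, comparing the printed cluster assignments shows directly how much the choice of similarity measure matters for a given corpus.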

But before you go there, you will have to define what you actually mean by "better". Scikit-learn has a complete package dedicated to this question. The package is called sklearn.metrics, and besides many other kinds of metrics it contains a full range of metrics for measuring clustering quality. Maybe that should be the first place to go now, right into the sources of the metrics package.
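A minimal sketch of turning "better" into numbers with sklearn.metrics follows, assuming a TF-IDF plus KMeans setup similar to the one discussed. The corpus and its topic labels are hypothetical. It sweeps the number of clusters and the initialization strategy, then scores each run three ways. The silhouette coefficient (here with cosine distance) needs no ground truth. The adjusted Rand index and V-measure compare against known labels.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus and its (assumed) true topics: 0 = disks, 1 = imaging.
posts = [
    "disk drive failure on my laptop",
    "hard disk drive is making noise",
    "replace the laptop disk drive",
    "imaging software for editing photos",
    "edit photos with imaging tools",
    "best software to edit images",
]
true_labels = [0, 0, 0, 1, 1, 1]

X = TfidfVectorizer().fit_transform(posts)

scores = {}
for num_clusters in (2, 3):
    for init in ("k-means++", "random"):
        km = KMeans(n_clusters=num_clusters, init=init, n_init=10, random_state=3)
        labels = km.fit(X).labels_
        scores[(num_clusters, init)] = {
            # Internal measure: cohesion vs. separation, in [-1, 1].
            "silhouette": metrics.silhouette_score(X, labels, metric="cosine"),
            # External measures: agreement with the known topics.
            "ari": metrics.adjusted_rand_score(true_labels, labels),
            "v_measure": metrics.v_measure_score(true_labels, labels),
        }

for params, s in scores.items():
    print(params, {name: round(v, 3) for name, v in s.items()})
```

Note that KMeans' own inertia_ always shrinks as clusters are added, so it cannot settle the choice of cluster count by itself. A measure like the silhouette coefficient, or an external score when true labels exist, is a fairer basis for comparison.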