
The dataset

After announcing their findings to the scientific community in 2012, researchers later released the data from the LHC experiments in which they observed, and identified, a signal indicative of the Higgs-Boson particle. However, amid the positive findings is a lot of background noise, which causes an imbalance within the dataset. Our task as data scientists is to build a machine learning model that can accurately distinguish the Higgs-Boson particle from background noise. The way this question is phrased should already suggest binary classification (that is, is this example the Higgs-Boson or background noise?).

You can download the dataset from https://archive.ics.uci.edu/ml/datasets/HIGGS or use the script getdata.sh located in the bin folder of this chapter.

This file is 2.6 GB (compressed) and contains 11 million examples, each labeled either 0 (background noise) or 1 (Higgs-Boson). First, you will need to uncompress this file; then we will begin loading the data into Spark for processing and analysis. The dataset is made up of 29 fields in total:

  • Field 1: Class label (1 = signal for Higgs-Boson, 0 = background noise)
  • Fields 2-22: 21 "low-level" features that come from the collision detectors
  • Fields 23-29: seven "high-level" features that have been hand-derived by particle physicists to help classify the particle into its appropriate class (Higgs or background noise)
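To make the column layout above concrete, here is a minimal plain-Python sketch (not the chapter's Spark code) that splits one comma-separated row into its label, its 21 low-level features, and its 7 high-level features. The function name `parse_higgs_row` is hypothetical, and the code assumes each row is a plain comma-separated line whose label may be written in floating-point notation (for example `1.0`).

```python
from typing import List, Tuple

# Layout described above: 1 label + 21 low-level + 7 high-level = 29 fields
NUM_LOW_LEVEL = 21
NUM_HIGH_LEVEL = 7
NUM_FIELDS = 1 + NUM_LOW_LEVEL + NUM_HIGH_LEVEL


def parse_higgs_row(line: str) -> Tuple[int, List[float], List[float]]:
    """Split one comma-separated row into (label, low_level, high_level).

    Hypothetical helper for illustration; assumes 29 numeric fields per row.
    """
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(values)}")
    # The label may be stored as a float (e.g. "1.0"), so convert via float first
    label = int(values[0])
    low_level = values[1:1 + NUM_LOW_LEVEL]   # fields 2-22: detector measurements
    high_level = values[1 + NUM_LOW_LEVEL:]   # fields 23-29: physicist-derived features
    return label, low_level, high_level
```

For example, `parse_higgs_row("1.0," + ",".join(["0.5"] * 28))` yields label `1`, a list of 21 low-level values, and a list of 7 high-level values. Keeping the two feature groups separate makes it easy to compare a model trained on the low-level features alone against one that also sees the hand-derived features.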

Later in this chapter, we cover a Deep Neural Network (DNN) example that will attempt to learn these hand-derived features through layers of non-linear transformations to the input data.

Note that for the purposes of this chapter, we work with a subset of the data (the first 100,000 rows), but all the code we show also works on the full dataset.
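As a quick sanity check before loading anything into Spark, the following plain-Python sketch reads only the first `n` rows, either straight from the gzip-compressed download or from the uncompressed CSV, and counts how many examples fall into each class. The function names `load_first_rows` and `label_counts` are hypothetical, and the file name `HIGGS.csv.gz` in the usage line is an assumption about how the download is named.

```python
import gzip
from collections import Counter
from itertools import islice
from typing import List, Tuple


def load_first_rows(path: str, n: int = 100_000) -> List[Tuple[int, List[float]]]:
    """Return the first n rows as (label, features) pairs.

    Hypothetical helper: reads a ".gz" file directly, otherwise a plain CSV.
    """
    opener = gzip.open if path.endswith(".gz") else open
    rows = []
    # "rt" opens in text mode for both gzip.open and the built-in open
    with opener(path, "rt") as f:
        for line in islice(f, n):  # stop after n lines without reading the rest
            values = [float(v) for v in line.strip().split(",")]
            rows.append((int(values[0]), values[1:]))
    return rows


def label_counts(rows: List[Tuple[int, List[float]]]) -> Counter:
    """Count examples per class (0 = background noise, 1 = Higgs-Boson signal)."""
    return Counter(label for label, _ in rows)
```

A typical call would be `print(label_counts(load_first_rows("HIGGS.csv.gz")))`, which shows the class balance of the 100,000-row subset. Because `islice` stops after `n` lines, the rest of the multi-gigabyte file is never read or decompressed.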
