Downloading the UK Road Safety Data dataset

In this section, we're going to download and take a bird's-eye view of the dataset we'll be using throughout this book: the UK Road Safety Data. In total, this dataset provides more than 15 million rows across three CSV files.

How to do it…

  1. Visit the following URL: http://data.gov.uk/dataset/road-accidents-safety-data/resource/80b76aec-a0a1-4e14-8235-09cc6b92574a.
  2. Click on the red Download button on the right side of the page. I suggest creating a data directory to hold the data files.
  3. Unpack the provided zip files in the directory you created.
  4. You should see the following four files included in the expanded directory:
    • Accidents7904.csv
    • Casualty7904.csv
    • Road-Accident-Safety-Data-Guide-1979-2004.xls
    • Vehicles7904.csv
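If you would like to confirm the download from Python, the check in step 4 can be sketched roughly as follows. This is a minimal sketch, not part of the original recipe: the directory name data, the helper names missing_files and count_data_rows, and the assumptions that each CSV file starts with a header row and can be read as UTF-8 text are all illustrative.

```python
import csv
from pathlib import Path

# File names exactly as listed in step 4 of the recipe.
EXPECTED_FILES = [
    "Accidents7904.csv",
    "Casualty7904.csv",
    "Road-Accident-Safety-Data-Guide-1979-2004.xls",
    "Vehicles7904.csv",
]


def missing_files(data_dir):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


def count_data_rows(csv_path):
    """Count the rows of a CSV file, excluding one assumed header row.

    The file is streamed row by row, so it is never loaded into memory whole.
    """
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the (assumed) header row
        return sum(1 for _ in reader)


if __name__ == "__main__":
    data_dir = Path("data")  # assumed name of the directory created in step 2
    absent = missing_files(data_dir)
    if absent:
        print("Missing files:", ", ".join(absent))
    else:
        for name in EXPECTED_FILES:
            if name.endswith(".csv"):
                print(name, count_data_rows(data_dir / name))
```

Streaming the rows instead of loading each file is a deliberate choice here, since the files hold millions of rows between them.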

How it works…

The CSV files contain the data that we are going to use in the recipes throughout this book. The Excel file is pure magic, though. It contains a reference for all the data, including a list of the fields in each dataset as well as the coding used.

Coding data is a very important preprocessing step. Most analysis tools that you will use expect to see numbers rather than labels such as city or road type. The reason for this is that computers don't understand context like we humans do. Is Paris a city or a person? It depends. Computers can't make that judgment call. To get around this, we assign numbers to each text value. That's been done with this dataset.
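To make the idea of coding concrete, here is a small sketch of assigning a number to each text value and translating back again. The labels and the resulting numbers are made up for illustration and are not taken from the dataset; the real codes are the ones documented in the Excel guide. The function names build_codebook, encode, and decode are likewise hypothetical.

```python
def build_codebook(labels):
    """Assign an integer code to each distinct label, in first-seen order."""
    codebook = {}
    for label in labels:
        if label not in codebook:
            codebook[label] = len(codebook) + 1
    return codebook


def encode(labels, codebook):
    """Replace each label with its numeric code."""
    return [codebook[label] for label in labels]


def decode(codes, codebook):
    """Translate numeric codes back into their labels."""
    reverse = {code: label for label, code in codebook.items()}
    return [reverse[code] for code in codes]


# Illustrative sample values only; not the codes used in the dataset.
road_types = ["Roundabout", "Slip road", "Roundabout", "One way street"]

codebook = build_codebook(road_types)
codes = encode(road_types, codebook)

print(codebook)  # {'Roundabout': 1, 'Slip road': 2, 'One way street': 3}
print(codes)     # [1, 2, 1, 3]
```

With a coded dataset like this one, you mostly need the decode direction: the CSV files hold the numbers, and the guide supplies the mapping back to readable labels.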

Why we are using this dataset

It is often said that up to 90 percent of the time on most data projects is spent preparing the data for analysis. Anecdotal evidence from this author and those I speak with supports this. While you will learn a number of techniques for cleaning and standardizing data, also known as preprocessing in the data world, the UK Road Safety Data dataset is an analysis-ready dataset. In addition, it provides a large amount of data, millions of rows, for us to work with.

This dataset contains detailed road safety data about the circumstances of personal injury road accidents in Great Britain from 1979, the types (including make and model) of vehicles involved, and the consequential casualties.
