- Python Business Intelligence Cookbook
- Robert Dempsey
- 2021-07-30 09:51:50
Downloading the UK Road Safety Data dataset
In this section, we're going to download and take a bird's-eye view of the dataset we'll be using throughout this book: the UK Road Safety Data. In total, this dataset provides more than 15 million rows across three CSV files.
How to do it…
- Visit the following URL: http://data.gov.uk/dataset/road-accidents-safety-data/resource/80b76aec-a0a1-4e14-8235-09cc6b92574a.
- Click on the red Download button on the right side of the page. I suggest creating a data directory to hold the data files.
- Unpack the provided zip files in the directory you created.
- You should see the following four files included in the expanded directory:
Accidents7904.csv
Casualty7904.csv
Road-Accident-Safety-Data-Guide-1979-2004.xls
Vehicles7904.csv
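If you would rather script the unpacking step, the following is a minimal sketch using only the Python standard library. The function name unpack_dataset, the default data directory name, and the zip file name in the usage note are assumptions made for illustration; substitute the name of the archive you actually downloaded.

```python
import zipfile
from pathlib import Path


def unpack_dataset(zip_path, data_dir="data"):
    """Extract a downloaded zip archive into data_dir.

    Returns the sorted names of the files found at the top level of
    data_dir after extraction. Both the function name and the default
    directory name are illustrative, not part of the dataset.
    """
    target = Path(data_dir)
    # Create the data directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Only top-level files are listed; nested folders are not descended into.
    return sorted(p.name for p in target.iterdir() if p.is_file())
```

Calling unpack_dataset("road-safety-data.zip") (a hypothetical file name) should return a list you can compare against the four file names above.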
How it works…
The CSV files contain the data that we are going to use in the recipes throughout this book. The Excel file is pure magic, though. It contains a reference for all the data, including a list of the fields in each dataset as well as the coding used.
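To get a quick bird's-eye view of one of the CSV files without loading it all into memory, a sketch along these lines may help. The function name summarize_csv is hypothetical, and the UTF-8 encoding is an assumption; if the file fails to open, try a different encoding.

```python
import csv


def summarize_csv(path, encoding="utf-8"):
    """Return (column_names, row_count) for a CSV file.

    The file is streamed one row at a time, so this also works for
    files with millions of rows. The encoding default is an assumption.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        # The first row is treated as the header holding the field names.
        header = next(reader, [])
        # Count the remaining rows without keeping them in memory.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

The returned column names can then be looked up in the Excel guide to see what each field means.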
Coding data is a very important preprocessing step. Most analysis tools that you will use expect to see numbers rather than labels such as city or road type. The reason for this is that computers don't understand context like we humans do. Is Paris a city or a person? It depends. Computers can't make that judgment call. To get around this, we assign numbers to each text value. That's been done with this dataset.
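As a small illustration of what coding means, the sketch below assigns an integer to each distinct text label. The labels and the resulting numbers are made up for this example; the codes actually used in the dataset are the ones listed in the Excel guide.

```python
def build_codes(labels):
    """Assign an integer code to each distinct label, in order of first appearance."""
    codes = {}
    for label in labels:
        if label not in codes:
            codes[label] = len(codes) + 1  # codes start at 1
    return codes


def encode(labels, codes):
    """Replace every label with its integer code."""
    return [codes[label] for label in labels]


# Hypothetical labels, used only to demonstrate the idea.
road_types = ["Roundabout", "Slip road", "Roundabout", "One way street"]
codes = build_codes(road_types)
encoded = encode(road_types, codes)

print(codes)    # {'Roundabout': 1, 'Slip road': 2, 'One way street': 3}
print(encoded)  # [1, 2, 1, 3]
```

Keeping the codes dictionary around matters: without it, the numbers cannot be translated back into readable labels, which is the role the Excel guide plays for this dataset.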
Why we are using this dataset
It is often said that up to 90 percent of the time spent on most data projects goes to preparing the data for analysis. Anecdotal evidence from this author and those I speak with supports this. While you will learn a number of techniques for cleaning and standardizing data, also known as preprocessing in the data world, the UK Road Safety Data dataset is already analysis-ready. In addition, it provides a large amount of data (millions of rows) for us to work with.
This dataset contains detailed road safety data about the circumstances of personal injury road accidents in Great Britain from 1979, the types (including make and model) of vehicles involved, and the consequential casualties.