Consuming Location Data Like a Data Scientist

Location data comes in many forms, but what if it has been sitting in a simple structured format all along and we overlooked it? Most machine learning algorithms, such as random forests, are geared toward extracting insights from structured data in tabular form. In this chapter, we will discuss how to leverage spatial data that is masquerading as tabular data and apply machine learning techniques to it as any data scientist would. We will use New York taxi trip data to predict the duration of any given New York taxi trip. We chose this dataset for the following reasons:

  • Predicting trip duration has the right mix of geospatial analytics and machine learning
  • Finding the time it takes to travel from point A to point B is a routing problem, which is dealt with in Chapter 6, Let's Build a Routing Engine, so this chapter serves as a natural introduction to it

We will be using fastai, a Python library built around popular machine learning libraries such as scikit-learn and PyTorch. In this chapter, we will discuss the following topics:

  • Exploratory data analysis
  • Processing spatial data
  • Understanding and inferring the error metric
  • Building and inferencing a random forest model
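
To make the idea of spatial data masquerading as tabular data concrete, here is a minimal sketch using only the Python standard library. It assumes each trip is a row with pickup and dropoff coordinates in degrees; the column names and the sample row are hypothetical and chosen only for illustration. The sketch derives a straight-line (great-circle) distance from those columns, which is one plausible kind of feature a tabular model could use, not necessarily the processing this chapter applies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance(rows):
    """Return copies of the rows with an extra 'distance_km' column."""
    return [
        dict(
            row,
            distance_km=haversine_km(
                row["pickup_latitude"], row["pickup_longitude"],
                row["dropoff_latitude"], row["dropoff_longitude"],
            ),
        )
        for row in rows
    ]


# Hypothetical sample row (column names are assumptions, not the dataset schema).
trips = [
    {
        "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
        "dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781,
    },
]

trips_with_distance = add_distance(trips)
print(trips_with_distance[0]["distance_km"])
```

The table still looks like ordinary numeric columns, but the derived distance column makes the spatial relationship between pickup and dropoff explicit for a model such as a random forest.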