- Hands-On Big Data Analytics with PySpark
- Rudy Lai, Bartłomiej Potaczek
- 2021-06-24 15:52:34
The UCI machine learning repository
We can access the UCI machine learning repository by navigating to https://archive.ics.uci.edu/ml/. So, what is the UCI machine learning repository? UCI stands for the University of California, Irvine, which hosts the repository, and it is a very useful source of free, openly available datasets for machine learning. Although PySpark is not primarily about machine learning, we can use the repository to get large datasets that help us test out PySpark's functions.
Let's take a look at the KDD Cup 1999 dataset, which we will download, and then we will load the whole dataset into PySpark.
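The download-and-load step described above could look roughly like the following. This is a minimal sketch, not a definitive recipe: the dataset URL, the local filename, the helper names `download_once` and `load_lines`, and the local Spark master setting are all assumptions made for illustration, so check the URL against the repository before relying on it.

```python
import os
import urllib.request

# Assumed location of the full KDD Cup 1999 file; verify it before use.
KDD_URL = "http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz"


def download_once(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def load_lines(path, app_name="kdd-demo"):
    """Start a local Spark session and read the file as an RDD of text lines."""
    # Imported lazily so the download helper works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")  # assumption: run on the local machine
        .appName(app_name)
        .getOrCreate()
    )
    # textFile reads gzip-compressed text files directly.
    return spark, spark.sparkContext.textFile(path)


if __name__ == "__main__":
    local_path = download_once(KDD_URL, "kddcup.data.gz")
    spark, raw_data = load_lines(local_path)
    print(raw_data.count())  # number of records in the dataset
    print(raw_data.take(1))  # first record, as a comma-separated string
    spark.stop()
```

Skipping the download when the file already exists keeps repeated runs from fetching the archive again.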