- Fast Data Processing with Spark 2(Third Edition)
- Krishna Sankar
- 2021-08-20 10:27:11
Data wrangling with iPython
I found iPython to be the best way to learn Spark. It is also a very good choice for data scientists and data engineers to explore, model, and reason with data.
- The exploration step includes understanding the data, experimenting with multiple transformations, and extracting features for aggregation and machine learning, as well as working out ETL strategies.
- The modeling and reasoning steps (reasoning about relationships and distributions among the variables) require fast iteration over the data and the extracted features with different algorithms, experimenting with different parameters, and arriving at a set of ML algorithms to develop an analytics app.
The iPython installation for your system (depending on OS, CPU, and so on) is best described at the iPython site, http://ipython.org/install.html and https://ipython.readthedocs.org/en/stable/install/install.html. The notebook interface requires the Jupyter notebook system in addition to the iPython libraries. Of course, you also need to have Python installed on your system.
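Before moving on, it can help to confirm that the prerequisites are visible to the Python interpreter you plan to use. The following is a minimal sketch, not part of the book's code: the function name `missing_prerequisites` is hypothetical, and the module names `IPython` and `notebook` are assumptions about how the packages are installed on your system (some Jupyter setups expose a different package).

```python
import importlib.util
import sys


def missing_prerequisites(modules=("IPython", "notebook")):
    """Return the top-level module names that this interpreter cannot find.

    The default names are assumptions; adjust them to match your installation.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report the interpreter version, then any modules that could not be located.
    print("Python", sys.version.split()[0])
    missing = missing_prerequisites()
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

An empty list means every listed module can be located; it does not guarantee that the installed versions work together.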
Once iPython is working, starting Spark development with iPython is very easy. The iPython IDE hooks up to pyspark, and the interface is the web browser. The steps are as follows:
- Use cd to change into the directory where your notebooks are; for example, assuming that you have downloaded the fdps-v3 repository from GitHub into your home directory, enter the following:
  cd ~/fdps-v3
  PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" ~/Downloads/spark-2.0.0-preview/bin/pyspark
- I have spark in my Downloads directory. If you have spark in your /opt directory, the command would be as follows:
  PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" /opt/spark/bin/pyspark
- What you are doing is invoking pyspark via the iPython IDE.
- You will see the IDE in the browser as shown in the following screenshot:

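The launch commands in the steps above set two environment variables and then run the bin/pyspark script from the Spark directory. A minimal Python sketch of the same launch might look like the following. The function names `pyspark_notebook_command` and `launch` are hypothetical helpers introduced only for illustration, and the Spark location is whatever path you pass in.

```python
import os
import subprocess


def pyspark_notebook_command(spark_home, base_env=None):
    """Build the command and environment used to start pyspark with the notebook.

    Hypothetical helper: it mirrors the two variables set on the command line.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    command = [os.path.join(os.path.expanduser(spark_home), "bin", "pyspark")]
    return command, env


def launch(spark_home, notebook_dir):
    """Run pyspark from the notebook directory; blocks until the process exits."""
    command, env = pyspark_notebook_command(spark_home)
    return subprocess.call(command, env=env, cwd=os.path.expanduser(notebook_dir))
```

For example, `launch("/opt/spark", "~/fdps-v3")` would correspond to the /opt variant of the command, assuming both directories exist on your machine.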