Summary

In this chapter, we discussed many ways to prepare data for machine learning and other forms of AI. Raw data from source systems has to be transported across the data layers of a modern data lake, including a historical data archive, a set of (virtualized) analytics datasets, and a machine learning environment. Several kinds of tools can be used to build such a data pipeline: simple scripts and traditional software, ETL tools, big data processing frameworks, and streaming data engines.

We also introduced the concept of feature engineering: the work of preparing data so that it can be consumed by a machine learning model, which is an important part of any AI system. Whatever programming language and frameworks are chosen, an AI team has to spend significant time writing the features and ensuring that the resulting code and binaries are well managed and deployed together with the models themselves.

In the exercises and activities, we worked with Bash scripts, Jupyter Notebooks, Spark, and finally stream processing with live Twitter data.

In the next chapter, we will look into a less technical but very important topic for data engineering and machine learning: the ethics of AI.
