- Machine Learning With Go
- Daniel Whitenack
- 2021-07-08 10:37:26
Best practices for gathering and organizing data with Go
As you can see in the preceding section, Go itself provides us with an opportunity to maintain high levels of integrity in our data gathering, parsing, and organization. We want to ensure that we leverage Go's unique properties whenever we are preparing our data for machine learning workflows.
Generally, Go data scientists/analysts should adopt the following best practices when gathering and organizing data. These best practices are meant to help you maintain integrity in your applications and enable you to reproduce any analysis:
- Check for and enforce expected types: This might seem obvious, but it is too often overlooked when using dynamically typed languages. Although it is slightly verbose, explicitly parsing data into expected types and handling related errors can save you big headaches down the road.
- Standardize and simplify your data ingress/egress: There are many third-party packages for handling certain types of data or interactions with certain sources of data (some of which we will cover in this book). However, if you standardize the ways you are interacting with data sources, particularly centered around the use of stdlib, you can develop predictable patterns and maintain consistency within your team. A good example of this is a choice to utilize database/sql for database interactions rather than using various third-party APIs and DSLs.
- Version your data: Machine learning models produce extremely different results depending on the training data you use, your choice of parameters, and input data. Thus, it is impossible to reproduce results without versioning both your code and data. We will discuss the appropriate techniques for data versioning later in this chapter.
If you find yourself straying from these general principles, stop and reconsider. You are likely sacrificing integrity for the sake of convenience, which is a dangerous road. We will let these principles guide us throughout the book and as we consider various data formats and sources in the following section.