- Deep Learning with R for Beginners
- Mark Hodnett, Joshua F. Wiley, Yuxi (Hayden) Liu, Pablo Maldonado
- 2021-06-24 14:30:38
Setting up reproducible results
Software for data science is advancing and changing rapidly. Although this is wonderful for progress, it can make reproducing someone else's results a challenge. Even your own code may not work when you go back to it a few months later. This is one of the biggest issues in scientific research today, across all fields, not just artificial intelligence and machine learning. If you work in research or academia and you want to publish your results in scientific journals, this is something you need to be concerned about. The first edition of this book partially addressed this problem by using the R checkpoint package provided by Revolution Analytics. The package records which versions of software were used and ensures that a snapshot of them remains available.
For the second edition, we will not use this package for a number of reasons:
- Most readers are probably not publishing their work and are more interested in other concerns (maximizing accuracy, interpretability, and so on).
- Deep learning requires large datasets. With a large amount of data, we may not get precisely the same result on each run, but the results should be very close (within fractions of a percentage point).
- In production systems, there is more to reproducibility than software. You also have to consider data pipelines and random seed generation.
- In order to ensure reproducibility, the libraries used must stay frozen. New versions of deep learning APIs are released constantly and may contain enhancements. If we limited ourselves to old versions, we would get poorer results.
If you are interested in learning more about the checkpoint package, you can read the online vignette for the package at https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html.
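One of the points listed above, random seed generation, is easy to illustrate in base R. The following is a minimal sketch rather than a recipe for full reproducibility: it only shows that resetting the seed with `set.seed()` makes R's own random number generator repeat itself. The seed values and variable names are arbitrary choices for illustration, and deep learning libraries typically keep their own random state that this call does not control.

```r
# Fix the seed, draw some random numbers, then repeat with the same seed.
set.seed(42)
first_draw <- runif(5)

set.seed(42)
second_draw <- runif(5)

# The two draws match exactly because the generator started from the same state.
print(identical(first_draw, second_draw))

# A different seed starts the generator from a different state.
set.seed(43)
third_draw <- runif(5)
print(identical(first_draw, third_draw))
```

Setting the seed once at the top of a script is usually enough for base R code; it does not freeze package versions or data, which is why the seed is only one part of the picture.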
This book was written using R version 3.5, the latest version of R at the time of writing, on Windows 10 Professional x64. The code was run on a machine with an Intel i5 processor and 32 GB of RAM; it should also run on an Intel i3 processor with 8 GB of RAM.
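If you want a lightweight record of your own environment to compare against the setup described above, base R can report it. This is only a sketch using standard base R functions; the `run_record` name and the choice of the `stats` package as an example are assumptions made for illustration, not something the book prescribes.

```r
# Record the R version and platform used for this run.
r_version <- R.version.string
platform <- R.version$platform

# Versions of individual installed packages can be looked up by name.
stats_version <- as.character(packageVersion("stats"))

# Collect the details into one named character vector, e.g. to write to a log.
run_record <- c(r = r_version, platform = platform, stats = stats_version)
print(run_record)

# sessionInfo() prints a fuller report, including attached packages.
print(sessionInfo())
```

Saving this output alongside your results makes it easier to explain differences when the same script is rerun months later on newer software.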
You can download the example code files for this book from your account at http://www.packtpub.com/. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register on our website using your email address and password.
- Hover the mouse pointer over the SUPPORT tab at the top.
- Click on Code Downloads & Errata.
- Enter the name of the book in the Search box.
- Select the book for which you're looking to download the code files.
- From the drop-down menu, choose where you purchased this book.
- Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR / 7-Zip for Windows
- Zipeg / iZip / UnRarX for Mac
- 7-Zip / PeaZip for Linux
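As an alternative to the tools listed above, the archive can also be extracted from within R using the `unzip()` function from the built-in `utils` package. This is a hedged sketch: the file name `book-code.zip` and the target folder `book-code` are hypothetical placeholders, so substitute the name of the file you actually downloaded.

```r
# Hypothetical names; replace with the archive you actually downloaded.
zip_path <- "book-code.zip"
target_dir <- "book-code"

if (file.exists(zip_path)) {
  # list = TRUE returns the archive contents without extracting anything.
  contents <- unzip(zip_path, list = TRUE)
  print(head(contents$Name))

  # Extract everything into the target folder.
  unzip(zip_path, exdir = target_dir)
} else {
  message("Archive not found: ", zip_path)
}
```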