Hands-On Data Analysis with Scala
Efficient business decisions with an accurate sense of business data help in delivering better performance across products and services. This book helps you to leverage the popular Scala libraries and tools for performing core data analysis tasks with ease. The book begins with a quick overview of the building blocks of a standard data analysis process. You will learn to perform basic tasks like Extraction, Staging, Validation, Cleaning, and Shaping of datasets. You will later deep dive into the data exploration and visualization areas of the data analysis life cycle. You will make use of popular Scala libraries like Saddle, Breeze, Vegas, and PredictionIO for processing your datasets. You will learn statistical methods for deriving meaningful insights from data. You will also learn to create applications for Apache Spark 2.x on complex data analysis, in real-time. You will discover traditional machine learning techniques for doing data analysis. Furthermore, you will also be introduced to neural networks and deep learning from a data analysis standpoint. By the end of this book, you will be capable of handling large sets of structured and unstructured data, performing exploratory analysis, and building efficient Scala applications for discovering and delivering insights.
Contents (158 chapters)
- coverpage
- Title Page
- Copyright and Credits
- Hands-On Data Analysis with Scala
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Section 1: Scala and Data Analysis Life Cycle
- Scala Overview
- Getting started with Scala
- Running Scala code online
- Scastie
- ScalaFiddle
- Installing Scala on your computer
- Installing command-line tools
- Installing IDE
- Overview of object-oriented and functional programming
- Object-oriented programming using Scala
- Functional programming using Scala
- Scala case classes and the collection API
- Scala case classes
- Scala collection API
- Array
- List
- Map
- Overview of Scala libraries for data analysis
- Apache Spark
- Breeze
- Breeze-viz
- DeepLearning
- Epic
- Saddle
- Scalalab
- Smile
- Vegas
- Summary
- Data Analysis Life Cycle
- Data journey
- Sourcing data
- Data formats
- XML
- JSON
- CSV
- Understanding data
- Using statistical methods for data exploration
- Using Scala
- Other Scala tools
- Using data visualization for data exploration
- Using the vegas-viz library for data visualization
- Other libraries for data visualization
- Using ML to learn from data
- Setting up Smile
- Running Smile
- Creating a data pipeline
- Summary
- Data Ingestion
- Data extraction
- Pull-oriented data extraction
- Push-oriented data delivery
- Data staging
- Why is the staging important?
- Cleaning and normalizing
- Enriching
- Organizing and storing
- Summary
- Data Exploration and Visualization
- Sampling data
- Selecting the sample
- Selecting samples using Saddle
- Performing ad hoc analysis
- Finding a relationship between data elements
- Visualizing data
- Vegas viz for data visualization
- Spark Notebook for data visualization
- Downloading and installing Spark Notebook
- Creating a Spark Notebook with simple visuals
- More charts with Spark Notebook
- Box plot
- Histogram
- Bubble chart
- Summary
- Applying Statistics and Hypothesis Testing
- Basics of statistics
- Summary level statistics
- Correlation statistics
- Vector level statistics
- Random data generation
- Pseudorandom numbers
- Random numbers with normal distribution
- Random numbers with Poisson distribution
- Hypothesis testing
- Summary
- Section 2: Advanced Data Analysis and Machine Learning
- Introduction to Spark for Distributed Data Analysis
- Spark setup and overview
- Spark core concepts
- Spark Datasets and DataFrames
- Sourcing data using Spark
- Parquet file format
- Avro file format
- Spark JDBC integration
- Using Spark to explore data
- Summary
- Traditional Machine Learning for Data Analysis
- ML overview
- Characteristics of ML
- Categories or types of ML
- Decision trees
- Implementing decision trees
- Decision tree algorithms
- Implementing decision tree algorithms in our example
- Evaluating the results
- Using our model with a decision tree
- Random forest
- Random forest algorithms
- Ridge and lasso regression
- Characteristics of ridge regression
- Characteristics of lasso regression
- k-means cluster analysis
- Natural language processing for data analysis
- Algorithm selections
- Summary
- Section 3: Real-Time Data Analysis and Scalability
- Near Real-Time Data Analysis Using Streaming
- Overview of streaming
- Spark Streaming overview
- Word count using pure Scala
- Word count using Scala and Spark
- Word count using Scala and Spark Streaming
- Deep dive into the Spark Streaming solution
- Streaming a k-means clustering algorithm using Spark
- Streaming linear regression using Spark
- Summary
- Working with Data at Scale
- Working with data at scale
- Cost considerations
- Data storage
- Data governance
- Reliability considerations
- Input data errors
- Processing failures
- Summary
- Another Book You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 14:51:32