Hands-On Big Data Modeling
Modeling and managing data is a central focus of all big data projects. In fact, a database is considered effective only if you have a logical and sophisticated data model. This book will help you develop practical skills in modeling your own big data projects and improve the performance of analytical queries for your specific business requirements. To start with, you'll get a quick introduction to big data and learn about the different data modeling and data management platforms for big data. Then you'll work with structured and semi-structured data with the help of real-life examples. Once you've got to grips with the basics, you'll use the SQL Developer Data Modeler to create your own data models containing different file types, such as CSV, XML, and JSON. You'll also learn to create graph data models and explore data modeling with streaming data using real-world datasets. By the end of this book, you'll be able to design and develop efficient data models for varying data sizes.
Contents (281 chapters)
- Cover page
- Title Page
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Big Data and Data Management
- The concept of big data
- Interesting insights regarding big data
- Characteristics of big data
- Sources and types of big data
- Challenges of big data
- Introduction to big data modeling
- Uses of models
- Introduction to managing big data
- Importance and implications of big data modeling and management
- Benefits of big data management
- Challenges in big data management
- Setting up big data modeling platforms
- Getting started on Windows
- Getting started on macOS
- Summary
- Further reading
- Data Modeling and Management Platforms
- Big data management
- Data ingestion
- Data storage
- Data quality
- Data operations
- Data scalability and security
- Big data management services
- Data cleansing
- Data integration
- Big data management vendors
- Big data storage and data models
- Storage models
- Block-based storage
- File-based storage
- Object-based storage
- Data models
- Relational stores (SQLs)
- Scalable relational systems
- Database as a Service (DaaS)
- NoSQL stores
- Document stores
- Key-value stores
- Extensible-record stores
- Big data programming models
- MapReduce
- MapReduce functionality
- Hadoop
- Features of Hadoop frameworks
- Yet Another Resource Negotiator
- Functional programming
- Spark
- Reasons to choose Apache Spark
- Flink
- Advantages of Flink
- SQL data models
- Hive Query Language (HQL)
- Cassandra Query Language (CQL)
- Spark SQL
- Apache Drill
- Getting started with Python and R
- Python on macOS
- Python on Windows
- R on macOS
- R on Windows
- Summary
- Further reading
- Defining Data Models
- Data model structures
- Structured data
- Unstructured data
- Sources of unstructured data
- Comparing structured and unstructured data
- Data operations
- Subsetting
- Union
- Projection
- Join
- Data constraints
- Types of constraints
- Value constraints
- Uniqueness constraints
- Cardinality constraints
- Type constraints
- Domain constraints
- Structural constraints
- A unified approach to big data modeling and data management
- Summary
- Further reading
- Categorizing Data Models
- Levels of data modeling
- Conceptual data modeling
- Logical data modeling
- Benefits of constructing LDMs
- Physical data modeling
- Features of the physical data model
- Types of data model
- Hierarchical database models
- Relational models
- Advantages of the relational data model
- Network models
- Object-oriented database model
- Entity-relationship models
- Object-relational models
- Summary
- Further reading
- Structures of Data Models
- Semi-structured data models
- Exploring the semi-structured data model of JSON data
- Installing Python and the Tweepy library
- Getting authorization credentials to access the Twitter API
- VSM with Lucene
- Lucene
- Graph-data models
- Graph-data models with Gephi
- Summary
- Further reading
- Modeling Structured Data
- Getting started with structured data
- NumPy
- Operations using NumPy
- Pandas
- Matplotlib
- Seaborn
- IPython
- Modeling structured data using Python
- Visualizing the location of houses based on latitude and longitude
- Factors that affect the price of houses
- Visualizing more than one parameter
- Gradient-boosting regression
- Summary
- Further reading
- Modeling with Unstructured Data
- Getting started with unstructured data
- Tools for intelligent analysis
- New methods of data processing
- Tools for analyzing unstructured data
- Weka
- KNIME
- Characteristics of KNIME
- The R language
- Unstructured text analysis using R
- Data ingestion
- Data cleaning and transformations
- Data visualization
- Improving the model
- Summary
- Further reading
- Modeling with Streaming Data
- Data stream and data model versus data format
- Why is streaming data different?
- Use cases of stream processing
- What is a data stream?
- Data streaming systems
- How streaming works
- Data harvesting
- Data processing
- Data analytics
- Importance and implications of streaming data
- Needs for stream processing
- Challenges with streaming data
- Streaming data solutions
- Exploring streaming sensor data from the Twitter API
- Analyzing the streaming data
- Summary
- Further reading
- Streaming Sensor Data
- Sensor data
- Data lakes
- Differences between data lakes and data warehouses
- How a data lake works
- Exploring streaming sensor data from a weather station
- Summary
- Further study
- Concept and Approaches of Big-Data Management
- Non-DBMS-based approach to big data
- Filesystems
- Problems with processing files
- DBMS-based approach to big data
- Advantages of the DBMS
- Declarative Query Language (DQL)
- Data independence
- Controlling data redundancy
- Centralized data management and concurrent access
- Data integrity
- Data availability
- Efficient access through optimization
- Parallel and distributed DBMS
- Parallel DBMS
- Motivations for parallel DBMS
- Architectures for parallel databases
- Distributed DBMS
- Features of a distributed DBMS
- Merits of a distributed DBMS
- DBMS and MapReduce-style systems
- Summary
- Further reading
- DBMS to BDMS
- Characteristics of BDMS
- BASE properties
- Exploring data management with Redis
- Getting started with Redis on macOS
- Advanced key-value stores
- Redis and Hadoop
- Aerospike
- Aerospike technology
- AsterixDB
- Data models
- The Asterix query language
- Getting started with AsterixDB
- Unstructured data in AsterixDB
- Inserting into datasets
- Querying in AsterixDB
- Summary
- Further reading
- Modeling Bitcoin Data Points with Python
- Introduction to Bitcoin data
- Theory
- Importing Bitcoin data into IPython
- Importing required libraries
- Preprocessing and model creation
- Predicting Bitcoin price using a Recurrent Neural Network
- Importing packages
- Importing datasets
- Preprocessing
- Constructing the RNN model
- Prediction
- Summary
- Further reading
- Modeling Twitter Feeds Using Python
- Importing Twitter feed data
- Modeling Twitter feeds
- The frequency of the tweets
- Sentiment analysis
- Installing TextBlob
- Parts of speech
- Noun-phrase extraction
- Tokenization
- Bag of words
- Summary
- Further reading
- Modeling Weather Data Points with Python
- Introduction to weather data
- Importing data
- Forecasting Nepal's temperature change
- Modeling with data
- Persistence model forecast
- Weather statistics by country
- Linear regression to predict the temperature of a city
- Summary
- Further reading
- Modeling IMDb Data Points with Python
- Introduction to IMDb data
- Episode data
- Rating data
- Theory
- Modeling with the IMDb dataset
- Starting the platform
- Importing the required libraries
- Importing a file
- Data cleansing
- Clustering
- Summary
- Further reading
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Last updated: 2021-06-10 18:59:37