- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:09
Chapter 5. Scala and SQL through JDBC
One of data science's raisons d'être is the difficulty of manipulating large datasets. Much of the data of interest to a company or research group cannot fit conveniently in a single computer's RAM. Storing the data in a way that is easy to query is therefore a complex problem.
Relational databases have been successful at solving the data storage problem. Although the relational model was originally proposed in 1970 (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf), the overwhelming majority of databases in active use today are still relational. Since then, the price of RAM per megabyte has decreased by a factor of a hundred million. Similarly, hard drive capacity has increased from tens or hundreds of megabytes to terabytes. It is remarkable that, despite this exponential growth in data storage capacity, the relational model has remained dominant.
Virtually all relational databases are described and queried with variants of SQL (Structured Query Language). With the advent of distributed computing, the position of SQL databases as the de facto data storage standard is being challenged by other types of databases, commonly grouped under the umbrella term NoSQL. Many NoSQL databases are more partition-tolerant than SQL databases: they can be split into several parts residing on different computers. While this author expects that NoSQL databases will become increasingly popular, SQL databases are likely to remain prevalent as a data persistence mechanism; hence, a significant portion of this book is devoted to interacting with SQL from Scala.
While SQL is standardized, most implementations do not follow the full standard. Additionally, most implementations provide extensions to the standard. This means that, while many of the concepts in this book will apply to all SQL backends, the exact syntax will need to be adjusted. We will consider only the MySQL implementation here.
In this chapter, you will learn how to interact with SQL databases from Scala using JDBC, a bare-bones Java API. In the next chapter, we will consider Slick, an Object-Relational Mapper (ORM) that gives a more Scala-esque feel to interacting with SQL.
This chapter is roughly composed of two sections: we will first discuss the basic functionality for connecting and interacting with SQL databases, and then discuss useful functional patterns that can be used to create an elegant, loosely coupled, and coherent data access layer.
This chapter assumes that you have a basic working knowledge of SQL. If you do not, you would be better off first reading one of the reference books mentioned at the end of the chapter.
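To give a flavor of the two parts described above, here is a minimal sketch, not the chapter's own code. It wraps the standard `java.sql` calls for opening a connection and running a query in a small "loan" helper that always closes the resource it is given. The names `JdbcSketch`, `withResource`, `withConnection`, and `selectOne` are hypothetical and introduced only for illustration; the sketch assumes a MySQL JDBC driver is on the classpath and that a server is reachable at whatever URL is passed in.

```scala
import java.sql.{Connection, DriverManager}

object JdbcSketch {

  // Loan pattern: lend `resource` to `f`, then close it even if `f` throws.
  def withResource[R <: AutoCloseable, T](resource: R)(f: R => T): T =
    try f(resource) finally resource.close()

  // Opens a JDBC connection and lends it to `f`.
  // The URL, user, and password are supplied by the caller; nothing here
  // is specific to MySQL beyond the driver assumed to be on the classpath.
  def withConnection[T](url: String, user: String, password: String)(
      f: Connection => T): T =
    withResource(DriverManager.getConnection(url, user, password))(f)

  // Runs a trivial query and reads the first column of the first row.
  def selectOne(connection: Connection): Int =
    withResource(connection.createStatement()) { statement =>
      withResource(statement.executeQuery("SELECT 1")) { resultSet =>
        resultSet.next()    // advance the cursor to the first row
        resultSet.getInt(1) // JDBC column indices start at 1
      }
    }
}
```

With a hypothetical local database, a call might look like `JdbcSketch.withConnection("jdbc:mysql://127.0.0.1:3306/test", "root", "")(JdbcSketch.selectOne)`. Keeping resource management in `withResource` means that code using a connection never has to close it explicitly, which is one way to keep the data access layer loosely coupled.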