ABIS Infor - 2016-04

Reading the DB2 LUW Version 11.1 announcement

Koen De Backer (ABIS) - 19 April 2016

Abstract

IBM announces a new version of DB2 for Linux, UNIX, and Windows: Version 11.1. The announcement letter describes the current capabilities and scope of the DB2 LUW product in typical DB2-speak. For those of you rather new to the DB2 LUW product, some of this DB2 terminology might be somewhat mystifying. In this piece I'd like to concentrate on "BLU Acceleration". "BLU what?" Right ... just read on.

From the announcement

"... IBM DB2 for Linux, UNIX, and Windows is the transformation data platform for transactional and analytical workloads ... DB2 11.1 scales to new heights of performance by enabling its BLU Acceleration(R) capability to be deployed across a massively parallel processing (MPP) network cluster architecture to improve response time. ..."

What is this BLU Acceleration(R) they are talking about?

The term "BLU Acceleration" was first introduced in DB2 LUW V10.5. It's a technology for complex analytic queries based on research for in-memory, hardware-optimized analytics. BLU Acceleration is all about reducing costs and improve speed by making complex analytics faster, easier, and more resource-friendly.

BLU is fast

Performance remains stable even as the data size grows beyond the available RAM. Why? Because BLU Acceleration applies predicates, performs joins, and does grouping directly on the compressed values of column-organized tables. This combination of techniques brings together all resources - I/O bandwidth, buffer pools, memory bandwidth, processor caches, and even machine cycles - through single-instruction, multiple-data (SIMD) operations; in other words, optimal hardware exploitation.
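To make this concrete, the query below is the kind of workload BLU Acceleration targets: a predicate, a join, and a grouping, all of which DB2 can evaluate directly on compressed column values. The SALES and STORES tables are hypothetical, invented for illustration.

    -- a typical analytic query over (hypothetical) column-organized tables
    SELECT s.region, SUM(f.amount) AS revenue
    FROM   sales f
    JOIN   stores s ON s.store_id = f.store_id
    WHERE  f.sale_date BETWEEN '2016-01-01' AND '2016-03-31'
    GROUP BY s.region;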

BLU is easy

Based on a single new registry setting, DB2_WORKLOAD=ANALYTICS, DB2 automatically adapts resources, configuration, workload management, and storage parameters to optimize resource consumption for the platform you are running on. It also enables BLU Acceleration by default, creating all new user tables in column-organized format. From then on, users simply load their data and run their queries, without any need for tuning.
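A minimal sketch of that setup, assuming a new database named SALESDB (the name is invented for illustration). Note that the registry variable must be set before the database is created for the analytics defaults to take effect:

    db2set DB2_WORKLOAD=ANALYTICS      # set before creating the database
    db2 "CREATE DATABASE salesdb"
    db2 "CONNECT TO salesdb"
    # new user tables now default to the column-organized format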

Maintenance is another aspect that requires less attention

There are no indexes or materialized query tables (MQTs) to define or tune. Storage is automatically freed and returned to the system for reuse as data is deleted over time. Compression algorithms, too, automatically adapt to changing data patterns.

Application access to data isn't touched

BLU Acceleration reuses the same SQL compiler and optimizer. Most utilities (including LOAD, INGEST, EXPORT, BACKUP and RESTORE, ROLLFORWARD, and many others) work as usual. In addition, row-organized and column-organized tables can be mixed in the same storage (table space), buffer pool, and schema, and even within the same SQL statement.
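A sketch of that mix, with table names invented for illustration; the ORGANIZE BY clause is the standard DDL for choosing the table format:

    -- a row-organized table for transactional work ...
    CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount DECIMAL(12,2))
      ORGANIZE BY ROW;

    -- ... and a column-organized table for analytics
    CREATE TABLE sales_history (cust_id INTEGER, sales_year SMALLINT, total DECIMAL(15,2))
      ORGANIZE BY COLUMN;

    -- both table formats accessed in a single SQL statement
    SELECT o.cust_id, o.amount, h.total
    FROM   orders o
    JOIN   sales_history h ON h.cust_id = o.cust_id AND h.sales_year = 2015;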

Making complex analytics more resource-friendly

DB2 10.5 with BLU Acceleration introduced automatic workload management, activated when DB2_WORKLOAD=ANALYTICS is set. This feature ensures that, while any number of queries may be submitted by applications, only a controlled number of them are allowed to consume resources simultaneously.
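If you want to see what this admission control looks like on your own system, the thresholds that workload management uses are visible in the standard SYSCAT.THRESHOLDS catalog view (the database name below is invented):

    db2 "CONNECT TO salesdb"
    db2 "SELECT * FROM SYSCAT.THRESHOLDS"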

What does in-memory columnar store processing actually bring us?

  1. It is columnar storage. Obviously, if a query needs data from only one or a few columns, you want to avoid scanning every complete row to get at it. In business applications, a row can easily contain a double-digit number of columns. Storing data in columnar format therefore leads to a smaller storage size and speeds up query processing.
  2. Data skipping is implemented. DB2 automatically detects large sections of data that do not qualify for a query and can be ignored. All necessary objects are created and maintained by the system automatically: a "synopsis" table is created and maintained as data is LOADed or INSERTed, and it persistently stores the lowest and highest values for each range of data (see the sketch after this list).
  3. What about shadow tables? DB2 BLU introduces the shadow table for columnar storage. These are up-to-date, column-organized copies of their row-organized source tables. A shadow table "shadows" an existing row-organized (source) table with an associated column-organized copy, allowing the DB2 optimizer to choose the optimal execution environment for both OLAP and OLTP queries (also illustrated below).
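As an illustration of points 2 and 3, here is a minimal sketch in DB2 SQL. The catalog query lists the automatically maintained synopsis tables, which live in the SYSIBM schema with names starting with SYN; the shadow table DDL follows the documented pattern, but the SALES source table is invented for illustration, and a real deployment also needs SET INTEGRITY and replication (IBM InfoSphere CDC) to keep the shadow copy current.

    -- point 2: list the synopsis tables the system maintains for data skipping
    SELECT tabname FROM syscat.tables
    WHERE  tabschema = 'SYSIBM' AND tabname LIKE 'SYN%';

    -- point 3: a shadow table is a column-organized, replication-maintained
    -- copy of a row-organized source table (SALES is hypothetical)
    CREATE TABLE sales_shadow AS (SELECT * FROM sales)
      DATA INITIALLY DEFERRED REFRESH DEFERRED
      ENABLE QUERY OPTIMIZATION
      MAINTAINED BY REPLICATION
      ORGANIZE BY COLUMN;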

In a nutshell:

  • Integrated into DB2 with consistent SQL.
  • Row-based and column-based table storage coexist in the same database, and SQL can access both at the same time
  • Consistent SQL, language interfaces, and administration; reuses the DB2 process model, storage, and utilities
  • Integrated tooling: DBA support through IBM Data Studio and Data Server Manager
  • Simplification: reduces the cost and time of performance tuning for analytical workloads
  • The DDL changes for creating these tables are kept simple. Moreover, database-level overrides (such as the dft_table_org database configuration parameter) further simplify the adoption process (see the example after this list).
  • The db2convert conversion utility enables customers to convert existing row-organized tables to column organization.
  • No indexes to create or maintain
  • No reorganization (automated)
  • No RUNSTATS (automated)
  • No MDC or MQT to design
  • No partitioning, statistical views, or optimizer hints needed
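A minimal sketch of these adoption paths, assuming a database SALESDB, a schema MYSCHEMA, and table names that are all invented for illustration:

    # create a column-organized table explicitly
    db2 "CREATE TABLE sales_fact (sale_date DATE, store_id INTEGER, amount DECIMAL(12,2)) ORGANIZE BY COLUMN"

    # or make column organization the database-wide default
    db2 "UPDATE DB CFG USING DFT_TABLE_ORG COLUMN"

    # convert an existing row-organized table to column organization
    db2convert -d salesdb -z myschema -t orders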

Conclusion

When do you use BLU Acceleration?

  • If considerable data quantities (say 30 GB a day) keep coming in and analytics reports have to be produced quickly, acceleration is needed to achieve that goal.
  • If you have a workload that exclusively executes deep analytic queries, then the decision is easy: use BLU Acceleration.
  • If your workload is somewhat mixed, then the Workload Table Organization Advisor in IBM Data Studio can analyze your workload and recommend which tables should take advantage of this new technology.

Why is DB2 BLU Acceleration relevant?

  • Extreme compression and the resulting storage (and read I/O) savings address the "Big Data" issue of size.
  • Fast reporting based on much faster query response (remember data skipping) addresses the "Big Data" issue of response times.
  • Simplicity, seamless integration, and the embedding in the DB2 kernel address the "Big Data" issue of application development approach and development time.