Get a quoteGet a quote
Live ChatLive chat
Client ZoneClient Zone
ContactsContacts
Hide
Live Support
Sales department
Technical Support
Data warehousing in MySQL

Some databases are used to store large amount of historical data (such as call detail records, log messages, etc). This data has the following common characteristics:

  • every record has timestamp (date and time when event occurs)
  • very large amount of records
  • such records should be stored for long time
  • records are accessed very infrequently

During the life of such databases, we get data fragmentation (non-contiguous free space blocks made by deletion of old rows) and problems with long insert time - because of the oversized B-Tree indexes. During periodic data maintenance, we should delete rows from such tables - and we get excessive table locking time, if table stored as MyISAM and in all cases got excessive IO and CPU usage, while executing DELETE statement.
These problems prevent data from growing and now, we will describe how to avoid these problems and how to make database eat all your disk almost without performance loss.
Table partitioning can be used to solve these problems (starting from MySQL 5.1 http://dev.mysql.com/doc/refman/5.1/en/partitioning.html). Partitioning feature provides functionality to separate data between tablespaces, where partition is automatically selected in depency of inserted data.
MySQL has support of the following partitioning alghorithms: RANGE, HASH, KEY, LIST (http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html). In this article we will only describe RANGE partitioning based on date, because it is slightly better for historical Data Warehousing. Please note that MySQL is optimized to use to_days() and year() functions as partitioning key.

 

Partitioned tables require additional administrative tasks for partition management (adding/deleting/splitting partitions), but performance improved over all problems we mentioned before:

  • locked only partition against full table in MyISAM
  • no data fragmentation - empty/new partition does not contain old data
  • indexed operations improved - lowering count of iterations to search key inside B-Tree indexes

Consider to use this sample table:
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);

Created table had the following files inside database directory for MyISAM:
---
# ls -1 members*
members.frm - table description
members.par - partition description
members#P#p0.MYD, members#P#p0.MYI - partition 0
...
members#P#p4.MYD, members#P#p4.MYI - partition 4
---
For InnoDB (with innodb-file-per-table), following files are created:
---
members.frm - table description
members.par - partition description
members#P#p0.ibd, members#P#p1.ibd, members#P#p2.ibd, members#P#p3.ibd, members#P#p4.ibd - partition tablespaces
---
From this file structure it is clearly seen that each partition has its own space.

Please note, that the last partition's maximum value can be limited with specified value or be unlimited (MAXVALUE).
Partitioned tables require periodic maintenance - creation of the new partitions and deletion of the obsolete ones. If last partition key has been set as a sample to ‘01 jan 2011’, this means that data with date below, but not including 1 january 2011, but not including this date can be inserted in such partition. If date exceeds key of last partition - insert will fail. ‘MAXVALUE’ can be specified as last partition key - this means that partition has no upper limit. In all cases - periodic partition maintenance with partitions created in advance is the best way.

Partitioned tables can be easily managed:
If we want to append partition for the next year in case of last partition has MAXVALUE, then we should split last partition into new ones:
---
alter table members reorganize partition p4 into (partition y1999 values less than (2000), partition y2000 values less than (maxvalue));
---
In case, when last partition has a fixed date, new partitions can be simply added:
---
alter table members add partition (partition y2001 values less than (2002), partition y2002 values less than (2003));
---
Unwanted data can be easily dropped, without excessive CPU and DISK load, because of partition deletion is a DDL construction. Let's delete partition that holds members before 1960 year:
---
alter table members drop partition p0;
---
In conclusion: usage of the partitioning greatly helps to reduce CPU and disk load during historic data management. Regular operations such as data inserts and select also have their benefits, because of B-Tree indexes have less height among with reduced data fragmentation. Partitioned table can grow almost unlimited.

Request a Quote
Request a quote
Fill out a small form and let us contact you shortly
Client Zone
Access Client Zone
Ticket system, knowledge base and other services for our clients
Contact Us
Contact us
Send us a request or ask any questions via this contact form