SLA Covered Services

Services offered by us are covered by SLA because we are confident in everything we do. Our Customers with standart service subscriptions will obtain an opportunity to request the compensation for SLA breaches.

 

Let your server idle!

Buy NGINX Integration 50% cheaper!
Decrease the load average of your server just for $10.00!
Use "NGINX" coupon during procurement of NGINX Integration.

 

Migration you will never observe.

No Downtime Мigration
REMSYS' perfect solution for no downtime data migration.

 

Other Services & Products

We are ready to offer you all our potentials to provide your company with high quality and time-efficient services and products.

The best     
      you can do
for your     
      SERVER !

 

Solutions for

Hosting
Providers

 

 

 
 

Solutions for

Corporate
Customers

 

 

 

 

Solutions for

SoHo &
Startups

 

 

 
11.10.2011

Data warehousing in MySQL



Some databases are used to store large amount of historical data (such as call detail records, log messages, etc). This data has the following common characteristics:

  • every record has timestamp (date and time when event occurs)
  • very large amount of records
  • such records should be stored for long time
  • records are accessed very infrequently

During life of such databases, we got data fragmentation (non-contiguous free space blocks made by deletion of old rows) and problems with long insert time - because of oversized B-Tree indexes. During periodic data maintenance, we should delete rows from such tables - and we got excessive table locking time if table stored as MyISAM and in all cases got excessive IO and CPU usage while executing DELETE statement.
These problems prevents data to grow and now, we will describe how to avoid these problems and make database to eat all your disk almost without performance loss.
Table partitioning can be used to solve these problems (starting from MySQL 5.1 http://dev.mysql.com/doc/refman/5.1/en/partitioning.html). Partitioning feature provides functionality to separate data between tablespaces, where partition is automatically selected in depency of inserted data.
MySQL has support of following partitioning alghorithms: RANGE, HASH, KEY, LIST (http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html). In this article we will only describe RANGE partitioning based on date, because it is slightly better for historical Data Warehousing. Please note that MySQL is optimized to use to_days() and year() functions as partitioning key.

Partitioned tables require additional administrative tasks for partition management (adding/deleting/splitting partitions), but performance improved over all problems we tell before:

  • locked only partition against full table in MyISAM
  • no data fragmentation - empty/new partition does not contain old data
  • indexed operations improved - lowering count of iterations to search key inside B-Tree indexes

Consider to use this sample table:
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);

Created table had following files inside database directory for MyISAM:
---
# ls -1 members*
members.frm - table description
members.par - partition description
members#P#p0.MYD, members#P#p0.MYI - partition 0
...
members#P#p4.MYD, members#P#p4.MYI - partition 4
---
For InnoDB (with innodb-file-per-table), following files are created:
---
members.frm - table description
members.par - partition description
members#P#p0.ibd, members#P#p1.ibd, members#P#p2.ibd, members#P#p3.ibd, members#P#p4.ibd - partition tablespaces
---
From this file structure it is clearly seen that each partition has it’s own space.

Please note, that last partition maximum value can be limited with specified value or be unlimited (MAXVALUE).
Partitioned tables requires periodic maintenance - creation of new partitions and deletion of obsolete. If last partition key has been set as sample to ‘01 jan 2011’, this means that data with date below but not including 1 january 2011 but not including this date can be inserted in such partition. If date exceeds key of last partition - insert will fail. ‘MAXVALUE’ can be specified as last partition key - this means that partition has no upper limit. In all cases - periodic partition maintenance with partitions created in advance is the best way.

Partitioned tables can be easily managed:
If we want to append partition for next year in case of last partition has MAXVALUE, then we should split last partition into new ones:
---
alter table members reorganize partition p4 into (partition y1999 values less than (2000), partition y2000 values less than (maxvalue));
---
In case, when last partition has fixed date, then new partitions can be simply added:
---
alter table members add partition (partition y2001 values less than (2002), partition y2002 values less than (2003));
---
Unwanted data can be easily dropped without excessive CPU and DISK load, because of partition deletion is a DDL construction. Lets delete partition that holds members before 1960 year:
---
alter table members drop partition p0;
---
In conclusion: using partitioning greatly help to reduce CPU and disk load during historic data management. Regular operations such as data inserts and select is also have benefits, because of B-Tree indexes has less height among with reduced data fragmentation. Partitioned table can grow almost unlimited.