How HBase major compaction works

Compaction is the process in which HBase combines small files (HStoreFiles) into bigger ones.

There are two types of compaction:

Minor: HBase takes a few adjacent StoreFiles and merges them into one.

Major: HBase takes all the StoreFiles in a region's store (one per column family) and merges them into one.

This post covers the major compaction.

If you want to read about minor compaction, please read the other post, "How HBase minor compaction works". I suggest reading that one first.

The following properties affect major compaction:

hbase.hregion.majorcompaction

The time (in milliseconds) between 'major' compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to disable automated major compactions.

    Default: 86400000
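To make the timing concrete, here is a minimal Python sketch of the time-based trigger under a simplifying assumption: a store is due for a major compaction once the configured period has elapsed since its last one. The helper name `is_major_compaction_due` is hypothetical, and HBase's real scheduling considers more than this single check.

```python
# Hypothetical helper: a simplified model of the timed major compaction
# trigger, not HBase's actual scheduling code.
DEFAULT_MAJOR_COMPACTION_MS = 86_400_000  # hbase.hregion.majorcompaction default


def is_major_compaction_due(last_major_ms: int, now_ms: int,
                            period_ms: int = DEFAULT_MAJOR_COMPACTION_MS) -> bool:
    """Return True once `period_ms` has elapsed since the last major compaction."""
    if period_ms == 0:
        # 0 disables automated (time-based) major compactions.
        return False
    return now_ms - last_major_ms >= period_ms


# The default period is exactly one day: 24 h * 60 min * 60 s * 1000 ms.
print(DEFAULT_MAJOR_COMPACTION_MS == 24 * 60 * 60 * 1000)  # True
print(is_major_compaction_due(last_major_ms=0, now_ms=90_000_000))  # True
print(is_major_compaction_due(0, 90_000_000, period_ms=0))  # False
```

Note that setting the period to 0 only switches off the timer; the other triggers discussed later in this post are unaffected.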

hbase.server.compactchecker.interval.multiplier

The number that determines how often (time interval) HBase scans to see whether a compaction is necessary.

The interval between checks is hbase.server.compactchecker.interval.multiplier multiplied by hbase.server.thread.wakefrequency.

hbase.server.thread.wakefrequency

Time to sleep in between searches for work (in milliseconds). Used as the sleep interval by service threads such as the log roller.

    Default: 10000
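As a quick sanity check on the arithmetic, the sketch below multiplies the two properties. It assumes the wakefrequency default of 10000 ms given above and a multiplier of 1000, which is assumed here to be the default in hbase-default.xml; verify it for your HBase version. The function name is hypothetical.

```python
# Assumed defaults: wakefrequency comes from the text above; the multiplier's
# default of 1000 is an assumption and may differ between HBase versions.
WAKE_FREQUENCY_MS = 10_000   # hbase.server.thread.wakefrequency
INTERVAL_MULTIPLIER = 1_000  # hbase.server.compactchecker.interval.multiplier


def compaction_check_interval_ms(multiplier: int = INTERVAL_MULTIPLIER,
                                 wake_frequency_ms: int = WAKE_FREQUENCY_MS) -> int:
    """Interval between periodic compaction checks, in milliseconds."""
    return multiplier * wake_frequency_ms


interval = compaction_check_interval_ms()
print(interval, "ms =", round(interval / 3_600_000, 2), "hours")  # 10000000 ms = 2.78 hours
```

With these values the periodic check would run roughly every 2.8 hours; lowering either property makes it run more often.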

 

Quoting from the following mailing-list discussion (I have removed the discussion-specific parts):

http://apache-hbase.679495.n3.nabble.com/Major-Compaction-Concerns-tp3642142p3645444.html

Major compactions are triggered in 3 ways: user-issued, time-based, and size-based.

Even if your config disables time-based major compactions, you can still hit size-based ones. Minor compactions are issued on a size-based threshold.

The algorithm checks whether sum(file[0:i] * ratio) > file[i+1] and, if so, includes file[0:i+1].

This is a reverse iteration, so the highest 'i' value is used. If all files match, then you can remove delete markers (which is the difference between a major and a minor compaction). Major compactions aren't a bad or time-intensive thing; the difference is just delete-marker removal.
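The quoted rule can be sketched in Python as below. This is one reading of it, not HBase's selection code: file sizes are assumed to be ordered newest first, file[0:i] is treated as inclusive (so file j is compared against ratio times the sum of the files before it), and the ratio of 1.2 is an assumed value. The function names are hypothetical, and the real implementation applies further limits that are not modeled here.

```python
from typing import List


def select_for_compaction(sizes: List[int], ratio: float = 1.2) -> List[int]:
    """Apply the quoted ratio rule to StoreFile sizes ordered newest first.

    Iterates in reverse so the highest qualifying index j wins; file j is
    included when ratio * (sum of the files before it) > sizes[j].
    """
    for j in range(len(sizes) - 1, 0, -1):
        if ratio * sum(sizes[:j]) > sizes[j]:
            return sizes[: j + 1]  # files 0..j
    return []  # no file qualifies: nothing to compact


def drops_deletes(selected: List[int], sizes: List[int]) -> bool:
    """True when every file was selected, i.e. delete markers may be removed."""
    return bool(sizes) and len(selected) == len(sizes)


small = [10, 12, 15, 20]      # similar sizes: every file matches
with_big = [10, 12, 15, 500]  # the large, old file is left out
print(select_for_compaction(small), drops_deletes(select_for_compaction(small), small))
# [10, 12, 15, 20] True
print(select_for_compaction(with_big), drops_deletes(select_for_compaction(with_big), with_big))
# [10, 12, 15] False
```

The first case is exactly the situation described above: a size-based selection that happens to include every file behaves like a major compaction, even with the timer disabled.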

Minor compactions will usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells; only major compactions do this.
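The toy model below illustrates that practical difference: both kinds of compaction merge files into one sorted list, but only the major one drops delete markers, the cells they mask, and expired cells. Everything here (the `Cell` type, the `compact` function, and the rule that a delete marker masks same-key cells with an equal or older timestamp) is a hypothetical simplification, not HBase's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    key: str
    ts: int    # timestamp in ms
    kind: str  # "Put" or "Delete" (toy model, not HBase's KeyValue types)


def compact(files: List[List[Cell]], major: bool, now_ms: int,
            ttl_ms: Optional[int] = None) -> List[Cell]:
    """Merge cells from several files into one list sorted by key, newest first."""
    merged = sorted((c for f in files for c in f), key=lambda c: (c.key, -c.ts))
    if not major:
        # Minor: rewrite everything, delete markers and expired cells included.
        return merged
    # Newest delete-marker timestamp seen for each key.
    deleted_up_to: Dict[str, int] = {}
    for c in merged:
        if c.kind == "Delete":
            deleted_up_to[c.key] = max(deleted_up_to.get(c.key, c.ts), c.ts)
    kept = []
    for c in merged:
        if c.kind == "Delete":
            continue  # major: the marker itself is dropped
        if c.ts <= deleted_up_to.get(c.key, -1):
            continue  # major: cell masked by a newer (or equal) delete marker
        if ttl_ms is not None and now_ms - c.ts > ttl_ms:
            continue  # major: cell has outlived its TTL
        kept.append(c)
    return kept


files = [[Cell("a", 100, "Put"), Cell("b", 100, "Put")],
         [Cell("a", 200, "Delete"), Cell("c", 300, "Put")]]
print(len(compact(files, major=False, now_ms=1000)))                          # 4
print([c.key for c in compact(files, major=True, now_ms=1000)])               # ['b', 'c']
print([c.key for c in compact(files, major=True, now_ms=1000, ttl_ms=800)])   # ['c']
```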

Now that you know what major and minor compactions are, the next step is tuning the above parameters for your cluster's profile, which we will look at in another post.

Happy Hadooping :)
