How HBase minor compaction works

Compaction is the process in which HBase combines small files (HStoreFiles) into bigger ones.

It is of two types:

Minor : takes a FEW adjacent files and merges them into one.

Major : takes all the StoreFiles in a Store (one column family of a region) and merges them into one.

This post covers the minor compaction.

If you want to read about major compaction, please read the other post, "How HBase major compaction works". I suggest you read about minor compaction first.

Let's see what decides the term FEW in minor compaction.

The following properties affect minor compaction:

 

hbase.hstore.compaction.min : Minimum number of StoreFiles per Store to be selected for a compaction to occur (default 2).
hbase.hstore.compaction.max : Maximum number of StoreFiles to compact per minor compaction (default 10).
hbase.hstore.compaction.min.size : Any StoreFile smaller than this setting will automatically be a candidate for compaction.
hbase.hstore.compaction.max.size : Any StoreFile larger than this setting will automatically be excluded from compaction.
hbase.hstore.compaction.ratio : Ratio used in the compaction file selection algorithm.

 

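Which files are used for a minor compaction is decided by the following logic, which is based on file size:

A file is selected for compaction when its size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio. Here smaller_files_size means the sizes of the files that come after it (the newer files, which are normally the smaller ones).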
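The rule above can be sketched in a few lines of Python. This is only an illustration, not HBase's actual Java code: the function name passes_ratio_check and the sample sizes are made up, and it assumes the file sizes are listed oldest to newest so that the "smaller files" are the ones after the candidate.

```python
def passes_ratio_check(sizes, index, ratio):
    """Return True if sizes[index] <= sum(sizes of newer files) * ratio.

    sizes is ordered oldest to newest, so the newer files are sizes[index + 1:].
    This is an illustrative helper, not part of any HBase API.
    """
    newer_total = sum(sizes[index + 1:])
    return sizes[index] <= newer_total * ratio


# Hypothetical file sizes in bytes, oldest to newest.
sizes = [40, 10, 20]

# With ratio 1.0 the oldest file is too big: 40 > (10 + 20) * 1.0
print(passes_ratio_check(sizes, 0, 1.0))  # False

# With a larger ratio the same file qualifies: 40 <= (10 + 20) * 1.5
print(passes_ratio_check(sizes, 0, 1.5))  # True
```

So a larger ratio lets bigger (older) files join a minor compaction, and a smaller ratio keeps them out.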

Quoting an example from the official HBase book:

 

Consider the following configuration settings:

    hbase.hstore.compaction.ratio = 1.0f
    hbase.hstore.compaction.min = 3 (files)
    hbase.hstore.compaction.max = 5 (files)
    hbase.hstore.compaction.min.size = 10 (bytes)
    hbase.hstore.compaction.max.size = 1000 (bytes)

The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.

Why?

Remember the logic:

A file is selected for compaction when its size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio.

    100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
    50 --> No, because sum(23, 12, 12) * 1.0 = 47.
    23 --> Yes, because sum(12, 12) * 1.0 = 24.
    12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
    12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
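To replay this walk-through yourself, here is a small Python sketch. It is a simplified model of the steps above and not the real HBase implementation: the function name select_for_minor_compaction and its parameter names are invented for this post, and it assumes file sizes are given oldest to newest.

```python
def select_for_minor_compaction(sizes, ratio, min_files, max_files,
                                min_size, max_size):
    """Return the sizes of the files picked for a minor compaction.

    Simplified model (illustrative only, not an HBase API):
    - walk from the oldest file and skip every file that fails the test
    - the first file that passes, plus the newer files after it, are taken
    - at most max_files are taken; fewer than min_files means no compaction
    """
    start = 0
    while start < len(sizes):
        size = sizes[start]
        newer_total = sum(sizes[start + 1:])
        # Files below min_size are always candidates; others need the ratio test.
        qualifies = size < min_size or size <= newer_total * ratio
        # Files above max_size are always excluded.
        if size <= max_size and qualifies:
            break
        start += 1

    selected = sizes[start:start + max_files]
    return selected if len(selected) >= min_files else []


# Settings and StoreFile sizes from the example above.
picked = select_for_minor_compaction(
    [100, 50, 23, 12, 12],
    ratio=1.0, min_files=3, max_files=5, min_size=10, max_size=1000,
)
print(picked)  # [23, 12, 12]
```

If fewer than hbase.hstore.compaction.min files qualify, the sketch returns an empty list, meaning no minor compaction would start.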

Hope this helps in understanding HBase minor compaction.

Hadoop Hadooping :)

