How Google Mesa works (short summary)

Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business.

Mesa leverages common Google infrastructure and services, such as Colossus (Google’s next-generation distributed file system), BigTable, and MapReduce.

http://research.google.com/pubs/pub42851.html

Characteristics and Goals


  • To achieve storage scalability and availability, data is horizontally partitioned and replicated.
  • To achieve consistent and repeatable queries during updates, the underlying data is multi-versioned.
  • To achieve update scalability, data updates are batched, assigned a new version number, and periodically (e.g., every few minutes) incorporated into Mesa.
  • To achieve update consistency across multiple data centers, Mesa uses a distributed synchronization protocol based on Paxos.


How it is different from existing Google tools


  • Megastore, Spanner, and F1 are all intended for online transaction processing. They provide strong consistency across geo-replicated data, but they do not support the peak update throughput needed by clients of Mesa.
  • Mesa does, however, leverage BigTable and the Paxos technology underlying Spanner for metadata storage and maintenance.


What to learn

Schema changes for a large number of tables can be performed dynamically and efficiently without affecting the correctness or performance of existing applications.

How it works


  • Tables aggregate values using associative and commutative functions (for example, SUM).
  • While a new version of the data is being computed, the old version continues to serve applications.
  • Once all computation is finished, the version is incremented and users issue queries against the new version.
  • Upstream systems generate updated data in batches.
  • The committer assigns each update batch a new version number and publishes all metadata associated with the update (e.g., the locations of the files containing the update data) to the versions database, a globally replicated and consistent data store built on top of the Paxos consensus algorithm.
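
The points above can be illustrated with a small in-memory sketch. It is not Mesa's implementation: the class `VersionedTable` and its methods `stage`, `commit`, and `query` are hypothetical names chosen for this example, and `commit` only stands in for the committer publishing a version. It shows why an associative and commutative aggregation function lets batches be combined in any grouping, and how queries keep seeing the last committed version while a new batch is being prepared.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, ...]          # e.g. (date, publisher); illustrative only
Batch = Dict[Key, int]         # one update batch: key -> value delta


@dataclass
class VersionedTable:
    """Hypothetical toy model of a multi-versioned, aggregating table."""
    aggregate: Callable[[int, int], int]   # must be associative and commutative
    batches: List[Batch] = field(default_factory=list)  # batches[i] is version i + 1
    committed_version: int = 0             # highest version visible to queries

    def stage(self, batch: Batch) -> int:
        """Store a new batch and return its version; it is not yet queryable."""
        self.batches.append(dict(batch))
        return len(self.batches)

    def commit(self, version: int) -> None:
        """Make the next version visible (stand-in for the committer)."""
        if version != self.committed_version + 1:
            raise ValueError("versions must be committed in order")
        if version > len(self.batches):
            raise ValueError("no staged batch for this version")
        self.committed_version = version

    def query(self, key: Key, version: Optional[int] = None) -> Optional[int]:
        """Aggregate all batches up to a committed version for one key."""
        v = self.committed_version if version is None else version
        if v > self.committed_version:
            raise ValueError("version is not committed yet")
        result: Optional[int] = None
        for batch in self.batches[:v]:
            if key in batch:
                value = batch[key]
                result = value if result is None else self.aggregate(result, value)
        return result


table = VersionedTable(aggregate=operator.add)
key = ("2014-01-01", "publisher-1")
v1 = table.stage({key: 10})
table.commit(v1)
v2 = table.stage({key: 5})      # being incorporated, not yet visible
before = table.query(key)       # still answered from version 1
table.commit(v2)
after = table.query(key)        # now answered from version 2
```

Because queries name a version (explicitly or implicitly), a query against version 1 returns the same answer before and after version 2 is committed, which is the repeatability property described earlier.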

Schema changes handling


To perform an online schema change, Mesa:
(i) makes a separate copy of the table, with the data stored in the new schema version, at a fixed update version;
(ii) replays any updates to the table generated in the meantime, until the new schema version is current; and
(iii) switches the schema version used for new queries to the new schema version, as an atomic metadata operation in the controller's BigTable.

Older queries may continue to run against the old schema version for some time before the old schema version is dropped to reclaim space.
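
The three steps can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the `Table` class, its `change_schema` and `drop_schema` methods, and the `convert` callback are hypothetical, and a plain attribute assignment stands in for the atomic metadata switch.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Batch = List[Row]


class Table:
    """Hypothetical toy model of a table kept in one or more schema versions."""

    def __init__(self, batches: List[Batch]) -> None:
        # schema version -> batches stored in that schema;
        # index i in each list holds update version i + 1
        self.copies: Dict[int, List[Batch]] = {1: list(batches)}
        self.active_schema = 1   # stand-in for the controller's metadata

    def apply_update(self, batch: Batch) -> None:
        """Append an update batch to the copy that new queries currently use."""
        self.copies[self.active_schema].append(batch)

    def change_schema(
        self,
        new_schema: int,
        convert: Callable[[Row], Row],
        updates_during_copy: Iterable[Batch] = (),
    ) -> None:
        source = self.copies[self.active_schema]

        # (i) copy the table into the new schema at a fixed update version
        fixed_version = len(source)
        new_copy = [[convert(row) for row in batch] for batch in source[:fixed_version]]

        # updates that arrive while the copy is running still go to the old schema
        for batch in updates_during_copy:
            self.apply_update(batch)

        # (ii) replay the updates generated in the meantime
        for batch in source[fixed_version:]:
            new_copy.append([convert(row) for row in batch])

        # (iii) switch new queries to the new schema in a single step
        self.copies[new_schema] = new_copy
        self.active_schema = new_schema

    def drop_schema(self, schema: int) -> None:
        """Reclaim space once no older query needs the old copy."""
        if schema == self.active_schema:
            raise ValueError("cannot drop the active schema version")
        del self.copies[schema]


table = Table([[{"clicks": 1}]])
table.change_schema(
    2,
    convert=lambda row: {**row, "cost": 0},     # add a column with a default value
    updates_during_copy=[[{"clicks": 2}]],
)
```

After the switch, the old copy still exists under schema version 1, so queries that started earlier could keep reading it until `drop_schema(1)` is called.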
