Tuesday, November 18, 2014

Slowly changing dimensions and ACID with HIVE(Version 0.14)


Slow changing dimensions ( http://en.wikipedia.org/wiki/Slowly_changing_dimension ). In a typical star schema data warehouse, dimensions tables change slowly over time. For example, a retailer will open new stores, which need to be added to the stores table, or an existing store may change its square footage or some other tracked characteristic. These changes lead to inserts of individual records or updates of records (depending on the strategy chosen). Starting with 0.14, Hive is able to support this.
ACID stands for four traits of database transactions: Atomicity (an operation either succeeds completely or fails, it does not leave partial data), Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation), Isolation (operations by one user do not cause unexpected side effects for other users), and Durability (once an operation is complete it will be preserved even in the face of machine or system failure). These traits have long been expected of database systems as part of their transaction functionality.

Hive Update

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
  1. The referenced column must be a column of the table being updated.
  2. The value assigned must be an expression that Hive supports in the select clause. Thus arithmetic operators, UDFs, casts, literals, etc. are supported. Subqueries are not supported.
  3. Only rows that match the WHERE clause will be updated.
  4. Partitioning columns cannot be updated.
  5. Bucketing columns cannot be updated.
  6. In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.
Hive Delete

DELETE FROM tablename [WHERE expression]




Jira issue status : https://issues.apache.org/jira/browse/HIVE-5317

3 comments:

  1. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. Hi Devan,

      How to do update on previous versions of Hive ?

      Do we need to depend on creating some staging tables to get this thing done?
      Or is there any other methods that we can folllow.

      Thanks

      Delete