Thursday, November 19, 2015

2.5 How to Manually purge data.

How to Manually purge data.

Manually purging deleted data consists of the following series of steps:

  1. Determine the point in time to which you want to purge deleted data.

  1. Set the Ancient History Mark (AHM) to this point in time using one of the following SQL functions (described in the SQL Reference Manual):

  • SET_AHM_TIME() sets the AHM to the epoch that includes the specified TIMESTAMP value on the initiator node.
     
  • SET_AHM_EPOCH() sets the AHM to the specified epoch.
     
  • GET_AHM_TIME() returns a TIMESTAMP value representing the AHM.
     
  • GET_AHM_EPOCH() returns the number of the epoch in which the AHM is located.
     
  • MAKE_AHM_NOW() sets the AHM to the greatest allowable value (now), and lets you drop pre-existing projections. This purges all deleted data.

What is AHM?
AHM stands for Ancient History Mark.
AHM the epoch prior to which historical data can be purged from physical storage. 

When you use SET_AHM_TIME or GET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which has (by default) a three-minute granularity. Thus, if you specify an AHM time of '2008-01-01 00:00:00.00' the resulting purge could permanently remove as much as the first three minutes of 2008, or could fail to remove the last three minutes of 2007.

  1. Manually initiate a purge using one of the following SQL functions:

  1. The Tuple Mover performs a mergeout operation to purge the data.


Following are the important functions.
Note that all are functions and we need to execute them using SELECT.

SET_AHM_TIME()
Sets the Ancient History Mark (AHM) to the epoch corresponding to the specified time on the initiator node. This function allows historical data up to and including the AHM epoch to be purged from physical storage.

SET_AHM_TIME ( time , [ true ] )

  1. "time" Is a TIMESTAMP value that is automatically converted to the appropriate epoch number.
     
  2. True [Optional] Allows the AHM to advance when nodes are down.

SET_AHM_TIME returns a TIMESTAMP WITH TIME ZONE value representing the end point of the AHM epoch.

You cannot change the AHM when any node in the cluster is down, except by using the optional true parameter

Epochs depend on a configured epoch advancement interval. If an epoch includes a three-minute range of time, the purge operation is accurate only to within minus three minutes of the specified timestamp:

=> SELECT SET_AHM_TIME('2008-02-27 18:13');
           set_ahm_time
------------------------------------
 AHM set to '2008-02-27 18:11:50-05'
(1 row)

In the previous example, the actual AHM epoch ends at 18:11:50, roughly one minute before the specified timestamp. This is because SET_AHM_TIME selects the epoch that ends at or before the specified timestamp. It does not select the epoch that ends after the specified timestamp because that would purge data deleted as much as three minutes after the AHM.

For example, using only hours and minutes, suppose that epoch 9000 runs from 08:50 to 11:50 and epoch 9001 runs from 11:50 to 15:50. SET_AHM_TIME('11:51')chooses epoch 9000 because it ends roughly one minute before the specified timestamp.

    Sets the Ancient History Mark (AHM) to the specified epoch. This function allows deleted data up to and including the AHM epoch to be purged from physical storage.

     
    SET_AHM_EPOCH ( epoch, [ true ]  )

    Epoch: The number of the epoch in which to set the AHM
    True : Optionally allows the AHM to advance when nodes are down.

    If you use SET_AHM_EPOCH , the number of the specified epoch must be:
    • Greater than the current AHM epoch
    • Less than the current epoch
    • Less than or equal to the cluster last good epoch (the minimum of the last good epochs of the individual nodes in the cluster)

    Use the SYSTEM table to see current values of various epochs related to the AHM;

    sukul1=> select * from system;
    -[ RECORD 1 ]------------+--------------
    current_epoch            | 4158884
    ahm_epoch                | 4158880
    last_good_epoch          | 4158883
    refresh_epoch            | -1
    designed_fault_tolerance | 1
    node_count               | 54
    node_down_count          | 0
    current_fault_tolerance  | 1
    catalog_revision_number  | 5434550
    wos_used_bytes           | 0
    wos_row_count            | 0
    ros_used_bytes           | 2499756575124
    ros_row_count            | 205470844516
    total_used_bytes         | 2499756575124
    total_row_count          | 205470844516


    The following command sets the AHM to a specified epoch of 12:

    => SELECT SET_AHM_EPOCH(12);

    The following command sets the AHM to a specified epoch of 2 and allows the AHM to advance despite a failed node:

    => SELECT SET_AHM_EPOCH(2, true);

Returns a TIMESTAMP value representing the Ancient History Mark.

SELECT GET_AHM_TIME();
                  GET_AHM_TIME
-------------------------------------------------
 Current AHM Time: 2010-05-13 12:48:10.532332-04
(1 row)

Returns the number of the epoch in which the Ancient History Mark is located.

SELECT GET_AHM_EPOCH();
    GET_AHM_EPOCH
----------------------
 Current AHM epoch: 0
(1 row)

Sets the Ancient History Mark (AHM) to the greatest allowable value, and lets you drop any projections that existed before the issue occurred.

This function is intended for use by Administrators only.

MAKE_AHM_NOW ( [ true ] )

True: [Optional] Allows AHM to advance when nodes are down. 


Permanently removes deleted data from physical storage so that the disk space can be reused. You can purge historical data up to and including the epoch in which the Ancient History Mark is contained.

Note that only data that is deleted and is beyond the AHM will be purged.

Purges the specified projection.

PURGE_PROJECTION ( '[[db-name.]schema.]projection-name' )

=> CREATE TABLE tbl (x INT , y INT);
CREATE TABLE
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl;
CREATE PROJECTION
=> SELECT PURGE_PROJECTION ('tbl_p');
purge_projection
-------------------
Projection purged
(1 row)
(projection Name is in single quotes).

Purges a table partition of deleted rows. Similar to PURGE() and PURGE_PROJECTION(), this function removes deleted data from physical storage so you can reuse the disk space. PURGE_PARTITION() removes data from the AHM epoch and earlier only.

PURGE_PARTITION ( '[[db_name.]schema_name.]table_name', partition-key )

tablename: The partitioned table to purge.
partition
key : The key of the partition to purge.

=> SELECT PURGE_PARTITION('t',5); -- Purge partition with key 5.
                            purge_partition
------------------------------------------------------------------------
 Task: merge partitions
(Table: public.t) (Projection: public.t_super)
(1 row)

Permanently removes deleted data from physical storage so that the disk space can be reused.

Purges all projections of the specified table. You cannot use this function to purge temporary tables.

 
PURGE_TABLE ( '[[db-name.]schema.]table-name' )

Permanently removes deleted data from physical storage so that the disk space can be reused. You can purge historical data up to and including the epoch in which the Ancient History Mark is contained.

Purges all projections in the physical schema. PURGE does not delete temporary tables.

PURGE()

PURGE() was formerly named PURGE_ALL_PROJECTIONS. HP Vertica supports both function calls.



No comments:

Post a Comment