How to Manually purge
data.
Manually purging deleted data consists of the following series of
steps:
- Determine the point in time to which you want to purge deleted data.
- Set the Ancient History Mark (AHM) to this point in time using one of the following SQL functions (described in the SQL Reference Manual):
- SET_AHM_TIME()
sets the AHM to the epoch that
includes the specified TIMESTAMP value on the initiator node.
- SET_AHM_EPOCH()
sets the AHM to the specified epoch.
- GET_AHM_TIME()
returns a TIMESTAMP value representing the AHM.
- GET_AHM_EPOCH()
returns the number of the epoch in which the AHM is located.
- MAKE_AHM_NOW() sets the AHM to the greatest allowable value (now), and lets you drop pre-existing projections. This purges all deleted data.
What is AHM?
|
AHM
stands for Ancient History Mark.
AHM the epoch prior to which historical data can be purged from physical storage. |
When you use SET_AHM_TIME or GET_AHM_TIME, keep in mind that the
timestamp you specify is mapped to an epoch, which has (by default) a
three-minute granularity. Thus, if you specify an AHM time of '2008-01-01
00:00:00.00' the resulting purge could permanently remove as much as the first
three minutes of 2008, or could fail to remove the last three minutes of 2007.
- Manually initiate a purge using one of the following SQL functions:
- PURGE_PROJECTION -
purges a specified projection.
- PURGE_PARTITION -
purges a specified partition.
- PURGE_TABLE -
purges all projections on the specified table.
- PURGE -
purges all projections in the physical schema.
- The Tuple Mover performs a mergeout operation to purge the data.
Following are the important functions.
Note that all are functions and we need to execute them using
SELECT.
SET_AHM_TIME()
|
Sets
the Ancient
History Mark (AHM) to the epoch corresponding to the specified time on the initiator
node. This function allows historical data up to and including the AHM
epoch to be purged from physical storage.
SET_AHM_TIME
( time , [ true ] )
SET_AHM_TIME
returns a TIMESTAMP WITH TIME ZONE value representing the end point of the
AHM epoch.
You
cannot change the AHM when any node in the cluster is down, except by using
the optional true parameter
Epochs depend on a configured
epoch advancement interval. If an epoch includes a three-minute range of
time, the purge operation is accurate only to within minus three minutes of
the specified timestamp:
=> SELECT SET_AHM_TIME('2008-02-27 18:13');
set_ahm_time ------------------------------------ AHM set to '2008-02-27 18:11:50-05' (1 row)
In the
previous example, the actual AHM epoch ends at 18:11:50, roughly one minute
before the specified timestamp. This is because SET_AHM_TIME selects the
epoch that ends at or before the specified timestamp. It does not select the
epoch that ends after the specified timestamp because that would purge data
deleted as much as three minutes after the AHM.
For example, using only hours and minutes, suppose
that epoch 9000 runs from 08:50 to 11:50 and epoch 9001 runs from 11:50 to
15:50. SET_AHM_TIME('11:51')chooses epoch 9000
because it ends roughly one minute before the specified timestamp.
|
Sets
the Ancient History
Mark (AHM) to the specified epoch. This function allows deleted data up to and
including the AHM epoch to be purged from physical storage.
SET_AHM_EPOCH ( epoch, [ true ] )
Epoch:
The number of the epoch in which to set the AHM
True
: Optionally allows the AHM to advance when nodes are down.
If
you use SET_AHM_EPOCH , the number of the specified epoch must be:
Use
the SYSTEM table
to see current values of various epochs related to the AHM;
sukul1=> select * from system;
-[ RECORD 1 ]------------+--------------
current_epoch |
4158884
ahm_epoch |
4158880
last_good_epoch |
4158883
refresh_epoch |
-1
designed_fault_tolerance | 1
node_count |
54
node_down_count |
0
current_fault_tolerance |
1
catalog_revision_number |
5434550
wos_used_bytes |
0
wos_row_count |
0
ros_used_bytes |
2499756575124
ros_row_count |
205470844516
total_used_bytes |
2499756575124
total_row_count |
205470844516
The
following command sets the AHM to a specified epoch of 12:
=> SELECT SET_AHM_EPOCH(12);
The
following command sets the AHM to a specified epoch of 2 and allows the AHM
to advance despite a failed node:
=> SELECT SET_AHM_EPOCH(2, true);
|
|
SELECT GET_AHM_TIME();
GET_AHM_TIME ------------------------------------------------- Current AHM Time: 2010-05-13 12:48:10.532332-04 (1 row) |
|
Returns
the number of the epoch in
which the Ancient
History Mark is located.
SELECT GET_AHM_EPOCH();
GET_AHM_EPOCH ---------------------- Current AHM epoch: 0 (1 row) |
|
Sets
the Ancient
History Mark (AHM) to the greatest allowable value, and lets you
drop any projections that existed before the issue occurred.
This
function is intended for use by Administrators only.
MAKE_AHM_NOW ( [ true ] )
True:
[Optional] Allows AHM to advance when nodes are down.
|
|
Permanently removes deleted
data from physical storage so that the disk space can be reused. You
can purge historical data up to and including the epoch in which the Ancient
History Mark is contained.
Note that only data that is deleted and is
beyond the AHM will be purged.
Purges
the specified projection.
PURGE_PROJECTION ( '[[db-name.]schema.]projection-name'
)
=> CREATE TABLE tbl (x INT , y INT);
CREATE TABLE
=> CREATE PROJECTION tbl_p AS SELECT x FROM tbl;
CREATE PROJECTION
=> SELECT
PURGE_PROJECTION ('tbl_p');
purge_projection ------------------- Projection purged (1 row)
(projection
Name is in single quotes).
|
|
Purges
a table partition of deleted rows. Similar
to PURGE() and PURGE_PROJECTION(), this function removes deleted data from
physical storage so you can reuse the disk space. PURGE_PARTITION() removes
data from the AHM epoch
and earlier only.
PURGE_PARTITION ( '[[db_name.]schema_name.]table_name',
partition-key )
table‑name: The
partitioned table to purge.
partition‑key : The key of the partition to purge.
=> SELECT PURGE_PARTITION('t',5); -- Purge partition with key
5.
purge_partition ------------------------------------------------------------------------ Task: merge partitions (Table: public.t) (Projection: public.t_super) (1 row) |
|
Permanently
removes deleted data from physical storage so that the disk space can be
reused.
Purges all projections of the
specified table. You cannot use this function to purge temporary tables.
PURGE_TABLE ( '[[db-name.]schema.]table-name'
)
|
|
Permanently
removes deleted data from physical storage so that the disk space can be
reused. You can purge historical data up to and including the epoch in which
the Ancient History Mark is contained.
Purges all projections in the
physical schema. PURGE does not delete temporary tables.
PURGE()
PURGE() was formerly named PURGE_ALL_PROJECTIONS. HP Vertica supports both function calls.
|