Sunday, November 15, 2015

2.1 4 main methods of deleting data in bulk from tables.

4 main methods of deleting data in bulk from tables.

Command
Description
Permanently removes a table and its definition. Optionally removes associated views and projections as well.
Marks rows with delete vectors and stores them so data can be rolled back to a previous epoch.

The data must eventually be purged before the database can reclaim disk space.
Removes all storage and history associated with a table. The table structure is preserved for future use.

The results of this command cannot be rolled back.
Removes one partition from a partitioned table. Each partition contains a related subset of data in the table.

Partitioned data can be dropped efficiently, and provides query performance benefits.


Choosing the Right Technique for Deleting Data :

  • To delete both table data and definitions and start from scratch, use the DROP TABLE [CASCADE] command.

  • To drop data, while preserving table definitions so that you can quickly and easily reload data, use TRUNCATE TABLE. Note that unlike DELETE, TRUNCATE does not have to mark each row with delete vectors, so it runs much more quickly. Explains why TRUNCATE is faster.

  • To perform bulk delete operations on a regular basis, HP Vertica recommends using Partitioning.

  • To perform occasional small deletes or updates with the option to roll back or review history, use DELETE FROM TABLE.



A] DROP TABLE

  1. Used to remove tables and optionally related projections.

  1. Syntax:

     
DROP TABLE [ IF EXISTS ] [[db-name.]schema.]table [, ...] [ CASCADE ]

             Notice the IF EXISTS clause. It is optional.
             If specified, DROP TABLE does not report an error if one or more of the tables to be dropped does not exist. This clause is useful in SQL scripts where you want to drop a table if it exists before recreating it.

              Also notice the CASCADE option. This is also optional.
              Drops all projections that include the table. We need to use this option if the table to be dropped has projections.

              Following shows an example of using DELETE on a table with projections on it.

              In 1st statement we create unsegmented projections on the table.                     Note that we have created unsegmented projection on all nodes.

sukul1=> CREATE PROJECTION USER_30_DAY_TABLES.p1_emp AS select FIRSTNAME FROM USER_30_DAY_TABLES.EMPTABLE UNSEGMENTED ALL NODES;
WARNING 4468:  Projection <USER_30_DAY_TABLES.p1_emp_node0001> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
          The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468:  Projection <USER_30_DAY_TABLES.p1_emp_node0002> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
          The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
WARNING 4468:  Projection <USER_30_DAY_TABLES.p1_emp_node0003> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
          The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh

        In next statement we can see that the drop fails because of number of projections dependent on the table.

sukul1=> DROP TABLE USER_30_DAY_TABLES.EMPTABLE;
NOTICE 4927:  The Projection p1_emp_node0001 depends on Table EMPTABLE
NOTICE 4927:  The Projection p1_emp_node0002 depends on Table EMPTABLE
NOTICE 4927:  The Projection p1_emp_node0003 depends on Table EMPTABLE
ROLLBACK 3128:  DROP failed due to dependencies
DETAIL:  Cannot drop Table EMPTABLE because other objects depend on it
HINT:  Use DROP ... CASCADE to drop the dependent objects too

       In next statement we can see how using CASCADE option works.

sukul1=> DROP TABLE USER_30_DAY_TABLES.EMPTABLE CASCADE;
DROP TABLE
sukul1=>


B] DELETE


  1. Marks tuples(records) as no longer valid in the current epoch, marking the records for deletion in the WOS, rather than deleting data from disk storage.

  1. By default, delete uses the WOS and if the WOS fills up, overflows to the ROS.

  1. You cannot delete records from a projection.

  1. Syntax

              DELETE [ /*+ DIRECT */ ] [ /*+ LABEL(label-name)*/ ]
              ... FROM [[
db-name.]schema.]table
              ...
WHERE Clause

/*+ DIRECT */
Writes the data directly to disk (ROS) bypassing memory (WOS).
Note: If you delete using the direct hint, you still need to issue a COMMIT or ROLLBACK command to finish the transaction.

  1. DELETE statements support subqueries and joins, which is useful for deleting values in a table based on values that are stored in other tables.

  1. To remove all rows from a temporary table, use a DELETE statement with no WHERE clause. In this case, the rows are not stored in the system, which improves performance.(Similar to truncate)

  1. If you include a WHERE clause with a DELETE statement on a temporary table, DELETE behaves the same as for anchor tables, marking all delete vectors for storage, and you lose any performance benefits.

  1. To truncate a temporary table, without ending the transaction, use DELETE FROM temp_table.
    Truncate table is a DDL and will commit the current transaction.


C] TRUNCATE TABLE

  1. Removes all storage associated with a table, while preserving the table definitions.

  1. TRUNCATE TABLE auto-commits the current transaction after statement execution and cannot be rolled back. TRUNCATE TABLE is a DDL.

  1. Syntax
    TRUNCATE TABLE [[
    db-name.]schema.]table

  1. To truncate an ON COMMIT DELETE ROWS temporary table without ending the transaction, use DELETE FROM temp_table syntax.
    Note: The effect of DELETE FROM depends on the table type. If the table is specified as ON COMMIT DELETE ROWS, then DELETE FROM works like TRUNCATE TABLE;
    otherwise it behaves like a normal delete in that it does not truncate the table.

  1. After truncate operations complete, the data recovers from that current epoch onward. Because TRUNCATE TABLE removes table history, AT EPOCH queries return nothing.


D] DROP_PARTITION

  1. Forces the partition of projections (if needed) and then drops the specified partition.
    REMEMBER DROP PARTITION is not a statement, its a function. Called using SELECT.

  1. Syntax
    DROP_PARTITION (
    table_name , partition_value [ , ignore_moveout_errors, reorganize_data ])

table-name
Specifies the name of the table
partition_value
The key of the partition to drop. For example: DROP_PARTITION('trade', 2006);
ignore_moveout_errors
    Optional Boolean, defaults to false.
    • true—Ignores any WOS moveout errors and forces the operation to continue. Set this parameter to true only if there is no WOS data for the partition.
    • false (or omit)—Displays any moveout errors and aborts the operation on error.
    Note: If you set this parameter to true and the WOS includes data for the partition in WOS, partition data in WOS is not dropped.
reorganize_data
    Optional Boolean, defaults to false.
    • true—Reorganizes the data if it is not organized, and then drops the partition.
    • false—Does not attempt to reorganize the data before dropping the partition. If this parameter is false and the function encounters a ROS without partition keys, an error occurs.

  1. The results of a DROP_PARTITION call go into effect immediately. If you drop a partition using DROP_PARTITION and then try to add data to a partition with the same name, HP Vertica creates a new partition.

  1. In general, if a ROS container has data that belongs to n+1 partitions and you want to drop a specific partition, the DROP_PARTITION operation:
    1. Forces the partition of data into two containers where
      • One container holds the data that belongs to the partition that is to be dropped.
      • Another container holds the remaining n partitions.
    1. Drops the specified partition.
      • DROP_PARTITION forces a moveout if there is data in the WOS (WOS is not partition aware).
      • DROP_PARTITION acquires an exclusive lock on the table to prevent DELETE | UPDATE | INSERT | COPY statements from affecting the table, as well as any SELECT statements issued at SERIALIZABLE isolation level.

  1. You cannot perform a DROP_PARTITION operation on tables with projections that are not up to date

  1. Example:
    SELECT DROP_PARTITION('trade', 2009);DROP_PARTITION-------------------Partition dropped(1 row)
    Note that the table needs to be partitioned in the 1st place for the DROP_PARTITION to work.

No comments:

Post a Comment