Sunday, November 15, 2015

1.32 COPY: Saving REJECTED Data to a table.


COPY: Saving REJECTED Data to a table.

  1. Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data.

  1. When you use the AS TABLE clause, HP Vertica creates a new table if one does not exist, or appends to an existing table. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, HP Vertica inserts any rejected rows to the existing table.

  1. The load rejection tables are a special type of table with the following capabilities and limitations:
    • Support SELECT statements
    • Can use DROP TABLE
    • Cannot be created outside of a COPY statement
    • Do not support DML and DDL activities

  1. If you use both the NO COMMIT and REJECTED DATA AS TABLE clauses in your COPY statement, and the reject-table does not already exist, Vertica Analytic Database saves the rejected data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.

  1. When you save rejected records to a table, using the REJECTED DATA AS TABLE table_name option, the data for the table is saved in a database data subdirectory, RejectionTableData. For example, for a VMart database, table data files reside here:

/home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData

  1. Rejected data tables include both rejected data and the reason for the rejection (exceptions), along with other data columns


The rejected data table has the following columns: 

Column
Data Type
Description
node_name
VARCHAR
The name of the HP Vertica node on which the input load file was located.
file_name
VARCHAR
The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN).
session_id
VARCHAR
The session ID number in which the COPY statement occurred.
transaction_id
INTEGER
Identifier for the transaction within the session, if any; otherwise NULL.
statement_id 
INTEGER
The unique identification number of the statement within the transaction that included the rejected data.
Tip: You can use the session_id, transaction_id, and statement_id columns to create joins with many system tables. For example, if you join against theQUERY_REQUESTS table using those three columns, theQUERY_REQUESTS.REQUEST column contains the actual COPY statement (as a string) used to load this data.
batch_number
INTEGER
INTERNAL USE. Represents which batch (chunk) the data comes from.
row_number
INTEGER
The rejected row number from the input file.
rejected_data
LONG VARCHAR
The data that was not loaded.
rejected_data_orig_length
INTEGER
The length of the rejected data.
rejected_reason
VARCHAR
The error that caused the rejected row. This column returns the same message that exists in a load exceptions file when you do not save to a table


Following shows an example of querying the rejected data table:

  1. Create the loader table: 

=> CREATE TABLE loader(a INT)
CREATE TABLE

  1. Use COPY to load values, saving rejected data to a table, loader_rejects:

=> copy loader from stdin rejected data as table loader_rejects;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> a
>> \.

  1. Query the loader table after loading data: 

=> select * from loader;
 x
---
 1
 2
 3
(3 rows)

  1. Query the loader_rejects table to see its column rows:

=> select * from loader_rejects;
-[ RECORD 1 ]-------------+--------------------------------------------
node_name                 | v_vmart_node0001
file_name                 | STDIN
session_id                | doch01.verticacorp.-24016:0x3439
transaction_id            | 45035996274080923
statement_id              | 1
batch_number              | 0
row_number                | 4
rejected_data             | a
rejected_data_orig_length | 1
rejected_reason           | Invalid integer format 'a' for column 1 (x)





Following shows an example of exporting the Rejected Records table:

  1. Create a sample table:
=> create table t (i int);
CREATE TABLE

  1. Copy data directly into the table, using a table to store rejected data: 
=> copy t from stdin rejected data as table "t_rejects";
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 4
>> a
>> b
>> c
>> \.

  1. Show only tuples and set the output format:
=> \t
Showing only tuples.
=> \a
Output format is unaligned.

  1. Output to a file (rejected.txt):
=> \o rejected.txt
=> select rejected_data from t_rejects;
=> \o

  1. Use the catcommand on the saved file:
    => \! cat rejected.txt
    a
    b
    c
    =>


No comments:

Post a Comment