COPY: Saving REJECTED
Data to a table.
- Use the REJECTED DATA parameter with the AS TABLE clause to specify a table in which to save rejected data.
- When you use the AS TABLE clause, HP Vertica creates a new table if one does not exist, or appends to an existing table. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, HP Vertica inserts any rejected rows to the existing table.
- The load rejection tables are a special type of table with the following capabilities and limitations:
- Support SELECT statements
- Can use DROP TABLE
- Cannot be created outside of a COPY statement
- Are not K-safe
- If you use both the NO COMMIT and REJECTED DATA AS TABLE clauses in your COPY statement, and the reject-table does not already exist, Vertica Analytic Database saves the rejected data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.
- When you save rejected records to a table, using the REJECTED DATA AS TABLE table_name option, the data for the table is saved in a database data subdirectory, RejectionTableData. For example, for a VMart database, table data files reside here:
/home/dbadmin/VMart/v_vmart_node0001_data/RejectionTableData
- Rejected data tables include both rejected data and the reason for the rejection (exceptions), along with other data columns
The rejected data table has the following columns:
Column
|
Data Type
|
Description
|
node_name
|
VARCHAR
|
The
name of the HP Vertica node on which the input load file was
located.
|
file_name
|
VARCHAR
|
The
name of the file being loaded, which applies if you loaded a file (as opposed
to using STDIN).
|
session_id
|
VARCHAR
|
The
session ID number in which the COPY statement occurred.
|
transaction_id
|
INTEGER
|
Identifier
for the transaction within the session, if any; otherwise NULL.
|
statement_id
|
INTEGER
|
The
unique identification number of the statement within the transaction that
included the rejected data.
Tip: You
can use the session_id, transaction_id,
and statement_id columns to create joins with many system tables.
For example, if you join against theQUERY_REQUESTS table using those
three columns, theQUERY_REQUESTS.REQUEST column contains the
actual COPY statement (as a string) used to load this data.
|
batch_number
|
INTEGER
|
INTERNAL
USE. Represents which batch (chunk) the data comes from.
|
row_number
|
INTEGER
|
The rejected row number from
the input file.
|
rejected_data
|
LONG VARCHAR
|
The data that was not loaded.
|
rejected_data_orig_length
|
INTEGER
|
The
length of the rejected data.
|
rejected_reason
|
VARCHAR
|
The
error that caused the rejected row. This column returns the same message that
exists in a load exceptions file when you do not save to a table
|
Following shows an example
of querying the rejected data table:
- Create the loader table:
=> CREATE TABLE loader(a INT)
CREATE TABLE
CREATE TABLE
- Use COPY to load values, saving rejected data to a table, loader_rejects:
=> copy loader from stdin rejected data as table
loader_rejects;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> a
>> \.
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> a
>> \.
- Query the loader table after loading data:
=> select * from loader;
x
---
1
2
3
(3 rows)
x
---
1
2
3
(3 rows)
- Query the loader_rejects table to see its column rows:
=> select * from loader_rejects;
-[ RECORD 1 ]-------------+--------------------------------------------
node_name | v_vmart_node0001
file_name | STDIN
session_id | doch01.verticacorp.-24016:0x3439
transaction_id | 45035996274080923
statement_id | 1
batch_number | 0
row_number | 4
rejected_data | a
rejected_data_orig_length | 1
rejected_reason | Invalid integer format 'a' for column 1 (x)
-[ RECORD 1 ]-------------+--------------------------------------------
node_name | v_vmart_node0001
file_name | STDIN
session_id | doch01.verticacorp.-24016:0x3439
transaction_id | 45035996274080923
statement_id | 1
batch_number | 0
row_number | 4
rejected_data | a
rejected_data_orig_length | 1
rejected_reason | Invalid integer format 'a' for column 1 (x)
Following shows an example
of exporting the Rejected Records table:
- Create a sample table:
=> create table t (i int);
CREATE TABLE
CREATE TABLE
- Copy data directly into the table, using a table to store rejected data:
=> copy t from stdin rejected data as table
"t_rejects";
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 4
>> a
>> b
>> c
>> \.
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 4
>> a
>> b
>> c
>> \.
- Show only tuples and set the output format:
=> \t
Showing only tuples.
=> \a
Output format is unaligned.
Showing only tuples.
=> \a
Output format is unaligned.
- Output to a file (rejected.txt):
=> \o rejected.txt
=> select rejected_data from t_rejects;
=> \o
=> select rejected_data from t_rejects;
=> \o
- Use the catcommand on the saved file:
=> \! cat rejected.txt
a
b
c
=>
No comments:
Post a Comment