How to transform Data while loading
using COPY statement.
|
Transforming
data during loads is useful for computing values to insert into a target
database column from other columns in the source database.
|
|
To
transform data during load, use the following syntax to specify the target
column for which you want to compute values, as an expression:
COPY [ [database-name.]schema-name.]table [([Column as Expression] /
column[FORMAT 'format']
[ ,...])] FROM ... |
|
When transforming data during loads,
the COPY statement must contain at least one parsed column.
Note that the parsed column can be a FILLER column. FILLER are fields from file that we are not loading to any target column. Such fields must be marked as FILLER. However FILLER data could be used for extracting values for other column.
Transforming
data while loading is one of the advantages of using FILLER.
Note
that when we extract data from another column(non filler), then that another
column is also loaded to the table.
|
4
|
The expression you use in a COPY statement can
be as simple as a single column or as
complex as a case expression for multiple columns.
You can specify multiple columns in a COPY expression, and have multiple COPY expressions refer to the same parsed column. You can specify COPY expressions for columns of all supported data types. |
|
COPY expressions
can use many HP Vertica-supported SQL functions, operators, constants,
NULLs, and comments, including these functions:
|
|
COPY expressions cannot use SQL meta functions (HP Vertica-specific), analytic functions, aggregate functions,
or computed columns.
|
7
|
The
return data type of the expression must be coercible to that of the target
column.
|
|
Errors that occur in COPY expressions are
treated as SQL errors, not parse errors.
When a parse errors occur, COPY rejects the row and adds a copy of the
row to the rejected data file. COPY also adds a message to the
exceptions file describing why the row was rejected.
COPY expression
errors are treated as SQL errors and cause the entire load to rollback.
The HP
Vertica-specific log file will include the SQL error message, but the reason
for the rollback is not obvious without researching the log.
|
Example
1: Computing column values from
other columns.
Below example uses a column JOINING_DATE to extract value for
another column GIFT_ELIGIBLE.
Before running COPY statement we need to set some variables.
<> /home/sukul1 $ export LANG=en_US.UTF-8
<> /home/sukul1 $ export PAGER=less
We need to add the location of the vsql utility to the PATH
variable, before running the below command.
<> /home/sukul1 $ vsql -U sukul1 -w
bored2d# -h gone077.xxxx.lll.com
Welcome to vsql, the Vertica Analytic Database
interactive terminal.
Type: \h
or \? for help with vsql commands
\g
or terminate with semicolon to execute query
\q
to quit
Bad terminal type: "xterm". Will
assume vt100.
Following is the file we would load.
Note the command \! Allows us
to run shell commands from within the vsql.
sukul1=> \! cat /home/sukul1/employeedet.dat
SUKUL|28|2010-12-20|10000
RUTUJA|24|2005-12-24|20000
Following is the command used to create the table
USER_30_DAY_TABLES.EMPLOYEE_1
sukul1=> CREATE TABLE
USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME VARCHAR(30),
sukul1(> AGE INTEGER,
sukul1(> JOINING_DATE DATE,
sukul1(> SALARY DECIMAL(18,0),
sukul1(> GIFT_ELIGIBLE CHAR(1)
sukul1(> );
CREATE TABLE
Following shows how to load the table using COPY statement.
Note that the file contains no value for the column GIFT_ELIGIBLE.
We used the "COLUMN AS EXPRESSION" syntax to extract value
for the column GIFT_ELIGIBLE.
sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE,
sukul1(> SALARY,
sukul1(> GIFT_ELIGIBLE AS CASE WHEN JOINING_DATE >
TO_DATE('2012-02-02','YYYY-MM-DD') THEN 'N' ELSE 'Y' END
sukul1(> )
sukul1-> FROM LOCAL
'/home/sukul1/employeedet.dat'
sukul1-> DELIMITER '|'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
Rows
Loaded
-------------
2
(1 row)
sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;
NAME | AGE | JOINING_DATE |
SALARY | GIFT_ELIGIBLE
--------+-----+--------------+--------+---------------
RUTUJA
| 24 | 2005-12-24 |
20000 | Y
SUKUL
| 28 | 2010-12-20 |
10000 | Y
(2 rows)
Example
2: Computing column values from
FILLER columns.
FILLER columns are used when we wont be loading the data fields as
it is.
We could use filler to ignore fields in the file or use them to
transform data and generate values for other columns.
Following shows how we generate values for the column JOINING_YEAR
from JOINING_DATE.
Since we are not loading the JOINING_DATE column we mark it as
FILLER.
But then use it to extract values for JOINING_YEAR.
sukul1=> \! cat /home/sukul1/employeedet.dat
SUKUL|28|2010-12-20|10000
RUTUJA|24|2005-12-24|20000
sukul1=> CREATE TABLE
USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME VARCHAR(30),
sukul1(> AGE INTEGER,
sukul1(> JOINING_YEAR INTEGER,
sukul1(> SALARY DECIMAL(18,0)
sukul1(> );
CREATE TABLE
sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL
'/home/sukul1/employeedet.dat'
sukul1-> DELIMITER '|'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
Rows
Loaded
-------------
2
(1 row)
sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;
NAME | AGE | JOINING_YEAR |
SALARY
--------+-----+--------------+--------
SUKUL
| 28 | 2010 |
10000
RUTUJA
| 24 | 2005 |
20000
(2 rows)
No comments:
Post a Comment