COPY : How to specify a delimiter
other than |?
The default COPY delimiter is a vertical bar (|).
The
DELIMITER is a single ASCII character used
to separate columns within each record of a file.
You
can define a different delimiter using any ASCII value in the range E'\000' to E'\177' inclusive.
For
instance, if you are loading CSV data
files, and the files use a comma (,) character as a delimiter, you can change
the default delimiter to a comma.
You cannot use the same character for both the DELIMITER and
NULL options.
If
the delimiter character is among a string of data values, use the ESCAPE AS character (\ by default) to indicate that
the delimiter should be treated as a literal.
To
indicate a non-printing delimiter character (such as a tab), specify the
character in extended string
syntax (E'...'). If your database has StandardConformingStrings enabled,
use a Unicode string literal (U&'...'). For example, use
either E'\t' or U&'\0009' to specify tab as the
delimiter.
Example:
We have
following file to load. The file has columns separated by tabs.
sukul1=> \! cat employeedet2_tab.dat
BHIM
29 2010-02-01 20000
MAHESH
30 2015-01-01 30000
sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL
'/home/sukul1/employeedet2_tab.dat'
sukul1-> DELIMITER E'\t'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
Rows
Loaded
-------------
2
(1 row)
sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;
NAME | AGE | JOINING_YEAR |
SALARY
--------+-----+--------------+--------
MAHESH
| 30 | 2015 |
30000
BHIM | 29
| 2010 | 20000
(2 rows)
Note the
way we specified the delimiter. We used the E'\t' notation to indicate the tab.
Tab being
a non printing character we used the extended string syntax.
Following
shows as example of using comma as the delimiter.
<> /home/sm017r $ cat
employeedet2_comma.dat
MUDIT,29,2013-01-01,40000
HONEY,32,2015-01-02,30000
sm017r=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sm017r-> (
sm017r(> NAME,
sm017r(> AGE,
sm017r(> JOINING_DATE FILLER VARCHAR(10),
sm017r(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sm017r(> SALARY
sm017r(> )
sm017r-> FROM LOCAL
'/home/sm017r/employeedet2_comma.dat'
sm017r-> DELIMITER ','
sm017r-> ABORT ON ERROR
sukul1-> DIRECT;
Rows
Loaded
-------------
2
(1 row)
sm017r=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;
NAME | AGE | JOINING_YEAR |
SALARY
--------+-----+--------------+--------
BHIM | 29
| 2010 | 20000
MAHESH
| 30 | 2015 |
30000
HONEY
| 32 | 2015 |
30000
MUDIT
| 29 | 2013 |
40000
(4 rows)
As we
know the DELIMITER option can be used at statement
level and also at COLUMN OPTION level.
Say we
have following input line to be loaded.
1|ee~gg|yy|1999-12-12
Here the
value of 1st column is expected to be 1, 2nd column should be ee and 3rd column
should be gg.
Using the
default | delimiter the 2nc column would get the value ee~gg.
So we
need to specify a second delimiter to override the default | delimiter only for
the 2nd column.
So
following is how we specify the delimiter only for the 2nd column.
=> COPY t COLUMN
OPTION(col1 DELIMITER '~') FROM STDIN NO
COMMIT;
>> 1|ee~gg|yy|1999-12-12
>> \.
=> SELECT * FROM t;
pk | col1 | col2 | col3 | col4
----+------+------+------+---------------------
1 | ee | gg | yy | 1999-12-12 00:00:00
(1 row)
>> 1|ee~gg|yy|1999-12-12
>> \.
=> SELECT * FROM t;
pk | col1 | col2 | col3 | col4
----+------+------+------+---------------------
1 | ee | gg | yy | 1999-12-12 00:00:00
(1 row)
Note that
value ee goes to the 2nd column and value gg goes to 3rd column.
Remember that COLUMN level load metadata has higher precedence than
the statement level (or defaults)
No comments:
Post a Comment