Sunday, November 8, 2015

1.16 COPY : How to specify a delimiter other than |?

COPY : How to specify a delimiter other than |?

The default COPY delimiter is a vertical bar (|).

The DELIMITER is a single ASCII character used to separate columns within each record of a file.

You can define a different delimiter using any ASCII value in the range E'\000' to E'\177' inclusive.
For instance, if you are loading CSV data files, and the files use a comma (,) character as a delimiter, you can change the default delimiter to a comma.

You cannot use the same character for both the DELIMITER and NULL options.

If the delimiter character is among a string of data values, use the ESCAPE AS character (\ by default) to indicate that the delimiter should be treated as a literal.

To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the delimiter.

Example:

We have following file to load. The file has columns separated by tabs.

sukul1=> \! cat employeedet2_tab.dat
BHIM     29      2010-02-01      20000
MAHESH  30      2015-01-01      30000

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL '/home/sukul1/employeedet2_tab.dat'
sukul1-> DELIMITER E'\t'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
 Rows Loaded
-------------
           2
(1 row)

sukul1=> select * from USER_30_DAY_TABLES.EMPLOYEE_1;
  NAME  | AGE | JOINING_YEAR | SALARY
--------+-----+--------------+--------
 MAHESH |  30 |         2015 |  30000
 BHIM   |  29 |         2010 |  20000
(2 rows)


Note the way we specified the delimiter. We used the E'\t' notation to indicate the tab.
Tab being a non printing character we used the extended string syntax.

Following shows as example of using comma as the delimiter.

<> /home/sm017r $ cat employeedet2_comma.dat
MUDIT,29,2013-01-01,40000
HONEY,32,2015-01-02,30000


sm017r=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sm017r-> (
sm017r(> NAME,
sm017r(> AGE,
sm017r(> JOINING_DATE FILLER VARCHAR(10),
sm017r(> JOINING_YEAR AS TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sm017r(> SALARY
sm017r(> )
sm017r-> FROM LOCAL '/home/sm017r/employeedet2_comma.dat'
sm017r-> DELIMITER ','
sm017r-> ABORT ON ERROR
sukul1-> DIRECT;
 Rows Loaded
-------------
           2
(1 row)

sm017r=> select * from USER_30_DAY_TABLES.EMPLOYEE_1;
  NAME  | AGE | JOINING_YEAR | SALARY
--------+-----+--------------+--------
 BHIM   |  29 |         2010 |  20000
 MAHESH |  30 |         2015 |  30000
 HONEY  |  32 |         2015 |  30000
 MUDIT  |  29 |         2013 |  40000
(4 rows)

As we know the DELIMITER option can be used at statement level and also at COLUMN OPTION level.
Say we have following input line to be loaded.

1|ee~gg|yy|1999-12-12

Here the value of 1st column is expected to be 1, 2nd column should be ee and 3rd column should be gg.
Using the default | delimiter the 2nc column would get the value ee~gg.
So we need to specify a second delimiter to override the default | delimiter only for the 2nd column.
So following is how we specify the delimiter only for the 2nd column.

=> COPY t COLUMN OPTION(col1 DELIMITER '~') FROM STDIN NO COMMIT;
>> 1|ee~gg|yy|1999-12-12
>> \.
=> SELECT * FROM t;
 pk | col1 | col2 | col3 |        col4
----+------+------+------+---------------------
  1 | ee   | gg   | yy   | 1999-12-12 00:00:00
(1 row)

Note that value ee goes to the 2nd column and value gg goes to 3rd column.

Remember that COLUMN level load metadata has higher precedence than the statement level (or defaults)

No comments:

Post a Comment