Vertica Learning: 1.16 COPY : How to specify a delimiter other than |?

COPY : How to specify a delimiter other than |?

The default COPY delimiter is a vertical bar (|).

The DELIMITER is a single ASCII character used to separate columns within each record of a file.

You can define a different delimiter using any ASCII value in the range E'\000' to E'\177' inclusive.

For instance, if you are loading CSV data files, and the files use a comma (,) character as a delimiter, you can change the default delimiter to a comma.

You cannot use the same character for both the DELIMITER and NULL options.

If the delimiter character is among a string of data values, use the ESCAPE AS character (\ by default) to indicate that the delimiter should be treated as a literal.

To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the delimiter.

Example:

We have following file to load. The file has columns separated by tabs.

sukul1=> \! cat employeedet2_tab.dat

BHIM    
29      2010-02-01      20000

MAHESH 
30      2015-01-01      30000

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1

sukul1-> (

sukul1(> NAME,

sukul1(> AGE,

sukul1(> JOINING_DATE FILLER VARCHAR(10),

sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),

sukul1(> SALARY

sukul1(> )

sukul1-> FROM LOCAL
'/home/sukul1/employeedet2_tab.dat'

sukul1-> DELIMITER E'\t'

sukul1-> ABORT ON ERROR

sukul1-> DIRECT;

 Rows
Loaded

-------------

2

(1 row)

sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;

 
NAME  | AGE | JOINING_YEAR |
SALARY

--------+-----+--------------+--------

 MAHESH
|  30 |         2015 | 
30000

 BHIM   |  29
|         2010 |  20000

(2 rows)

Note the way we specified the delimiter. We used the E'\t' notation to indicate the tab.

Tab being a non printing character we used the extended string syntax.

Following shows as example of using comma as the delimiter.

<> /home/sm017r $ cat
employeedet2_comma.dat

MUDIT,29,2013-01-01,40000

HONEY,32,2015-01-02,30000

sm017r=> COPY USER_30_DAY_TABLES.EMPLOYEE_1

sm017r-> (

sm017r(> NAME,

sm017r(> AGE,

sm017r(> JOINING_DATE FILLER VARCHAR(10),

sm017r(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),

sm017r(> SALARY

sm017r(> )

sm017r-> FROM LOCAL
'/home/sm017r/employeedet2_comma.dat'

sm017r-> DELIMITER ','

sm017r-> ABORT ON ERROR

sukul1-> DIRECT;

 Rows
Loaded

-------------

2

(1 row)

sm017r=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;

 
NAME  | AGE | JOINING_YEAR |
SALARY

--------+-----+--------------+--------

 BHIM   |  29
|         2010 |  20000

 MAHESH
|  30 |         2015 | 
30000

 HONEY 
|  32 |         2015 | 
30000

 MUDIT 
|  29 |         2013 | 
40000

(4 rows)

As we know the DELIMITER option can be used at statement level and also at COLUMN OPTION level.

Say we have following input line to be loaded.

1|ee~gg|yy|1999-12-12

Here the value of 1st column is expected to be 1, 2nd column should be ee and 3rd column should be gg.

Using the default | delimiter the 2nc column would get the value ee~gg.

So we need to specify a second delimiter to override the default | delimiter only for the 2nd column.

So following is how we specify the delimiter only for the 2nd column.

=> COPY t COLUMN
OPTION(col1 DELIMITER '~') FROM STDIN NO
COMMIT;

>> 1|ee~gg|yy|1999-12-12

>> \. 

=> SELECT * FROM t;

 pk | col1 | col2 | col3 |        col4

----+------+------+------+---------------------

  1 | ee  
| gg   | yy   | 1999-12-12 00:00:00

(1 row)

Note that value ee goes to the 2nd column and value gg goes to 3rd column.

Remember that COLUMN level load metadata has higher precedence than the statement level (or defaults)

Vertica Learning

Pages

Sunday, November 8, 2015

1.16 COPY : How to specify a delimiter other than |?

No comments:

Post a Comment