Vertica Learning: 1.8 COPY : Loading From specific paths and Loading BZIP/GZIP files

COPY: Loading Data from specific Paths.

Use the 'pathToData' option to indicate the location of the load file.

Optionally we can indicate a node name or ON ANY NODE to indicate which node (or nodes) should parse the load file

We can specify multiple formats like GZIP,BZIP or UNCOMPRESSED (default).

Using the ON ANY NODE clause (without indicates that the source file to load is on all of the nodes, so COPY opens the file and parses it from any node in the cluster.

Be sure that the source file you specify is available and accessible on each cluster node.

If pathToData resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:

The storage location must have been created with the USER option
The user must already have been granted READ access to the storage location where the file(s) exist

COPY: Loading BZIP and GZIP Files

We can load files compressed with BZIP or GZIP.

To do so, we must indicate the compression format for each file when loading multiple files.

Following examples show how to load compressd files.

We need to specify the compression technique used after the file name.

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1

sukul1-> (

sukul1(> NAME,

sukul1(> AGE,

sukul1(> JOINING_DATE FILLER VARCHAR(10),

sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),

sukul1(> SALARY

sukul1(> )

sukul1-> FROM LOCAL '/home/sukul1/employeedet1.dat.bz2' BZIP

sukul1-> DELIMITER '|'

sukul1-> ABORT ON ERROR

sukul1-> DIRECT;

 Rows
Loaded

-------------

2

(1 row)

sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;

 
NAME  | AGE | JOINING_YEAR |
SALARY

--------+-----+--------------+--------

 RUTUJA
|  24 |         2005 | 
20000

 SUKUL 
|  28 |         2010 | 
10000

(2 rows)

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1

sukul1-> (

sukul1(> NAME,

sukul1(> AGE,

sukul1(> JOINING_DATE FILLER VARCHAR(10),

sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),

sukul1(> SALARY

sukul1(> )

sukul1-> FROM LOCAL '/home/sukul1/employeedet2.dat.gz' GZIP

sukul1-> DELIMITER '|'

sukul1-> ABORT ON ERROR

sukul1-> DIRECT;

 Rows
Loaded

-------------

2

(1 row)

sukul1=> select * from
USER_30_DAY_TABLES.EMPLOYEE_1;

  
NAME   | AGE | JOINING_YEAR |
SALARY

----------+-----+--------------+--------

 EMMANUEL
|  28 |         2010 | 
10000

 ALVARO  
|  24 |         2005 | 
20000

 SUKUL   
|  28 |         2010 | 
10000

 RUTUJA  
|  24 |         2005 | 
20000

(4 rows)

sukul1=>

In case we have multiple files we can list them all one after another separated by ,.

However note that compression technique for all files should be same.

We cannot load two files- one .gz and another .bz2 in the same load.

Following example show the error we would get.

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1

sukul1-> (

sukul1(> NAME,

sukul1(> AGE,

sukul1(> JOINING_DATE FILLER VARCHAR(10),

sukul1(> JOINING_YEAR AS
TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),

sukul1(> SALARY

sukul1(> )

sukul1-> FROM LOCAL '/home/sukul1/employeedet2.dat.gz' GZIP,
'/home/sukul1/employeedet1.dat.bz2' BZIP

sukul1-> DELIMITER '|'

sukul1-> ABORT ON ERROR

sukul1-> DIRECT;

ERROR 2724:  COPY ... LOCAL can
read files with same compression only

Vertica Learning

Pages

Sunday, November 8, 2015

1.8 COPY : Loading From specific paths and Loading BZIP/GZIP files

No comments:

Post a Comment