Sunday, November 8, 2015

1.8 COPY : Loading From specific paths and Loading BZIP/GZIP files

COPY: Loading Data from specific Paths.

Use the 'pathToData' option to indicate the location of the load file.
Optionally we can indicate a node name or ON ANY NODE to indicate which node (or nodes) should parse the load file
We can specify multiple formats like GZIP,BZIP or UNCOMPRESSED (default).

Using the ON ANY NODE clause (without  indicates that the source file to load is on all of the nodes, so COPY opens the file and parses it from any node in the cluster.
Be sure that the source file you specify is available and accessible on each cluster node.

If pathToData resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:
  • The storage location must have been created with the USER option
  • The user must already have been granted READ access to the storage location where the file(s) exist

COPY: Loading BZIP and GZIP Files

We can load files compressed with BZIP or GZIP.
To do so, we must indicate the compression format for each file when loading multiple files.

Following examples show how to load compressd files.
We need to specify the compression technique used after the file name.

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL '/home/sukul1/employeedet1.dat.bz2' BZIP
sukul1-> DELIMITER '|'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
 Rows Loaded
-------------
           2
(1 row)

sukul1=> select * from USER_30_DAY_TABLES.EMPLOYEE_1;
  NAME  | AGE | JOINING_YEAR | SALARY
--------+-----+--------------+--------
 RUTUJA |  24 |         2005 |  20000
 SUKUL  |  28 |         2010 |  10000
(2 rows)

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL '/home/sukul1/employeedet2.dat.gz' GZIP
sukul1-> DELIMITER '|'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;
 Rows Loaded
-------------
           2
(1 row)

sukul1=> select * from USER_30_DAY_TABLES.EMPLOYEE_1;
   NAME   | AGE | JOINING_YEAR | SALARY
----------+-----+--------------+--------
 EMMANUEL |  28 |         2010 |  10000
 ALVARO   |  24 |         2005 |  20000
 SUKUL    |  28 |         2010 |  10000
 RUTUJA   |  24 |         2005 |  20000
(4 rows)

sukul1=>

In case we have multiple files we can list them all one after another separated by ,.
However note that compression technique for all files should be same.

We cannot load two files- one .gz and another .bz2 in the same load.
Following example show the error we would get.

sukul1=> COPY USER_30_DAY_TABLES.EMPLOYEE_1
sukul1-> (
sukul1(> NAME,
sukul1(> AGE,
sukul1(> JOINING_DATE FILLER VARCHAR(10),
sukul1(> JOINING_YEAR AS TO_NUMBER(TO_CHAR(TO_DATE(JOINING_DATE,'YYYY-MM-DD'),'YYYY')),
sukul1(> SALARY
sukul1(> )
sukul1-> FROM LOCAL '/home/sukul1/employeedet2.dat.gz' GZIP, '/home/sukul1/employeedet1.dat.bz2' BZIP
sukul1-> DELIMITER '|'
sukul1-> ABORT ON ERROR
sukul1-> DIRECT;

ERROR 2724:  COPY ... LOCAL can read files with same compression only

No comments:

Post a Comment