Sunday, November 8, 2015

1.9 COPY : Loading with Wildcards (glob) ON ANY NODE

COPY fully supports the ON ANY NODE clause with a wildcard (glob).
This lets us load a large number of files from a shared directory (one that every node can read) with a single statement such as:

COPY myTable FROM '/mydirectory/ofmanyfiles/*.dat' ON ANY NODE

Advantage of using * with the ON ANY NODE option:

When a wildcard is used with the ON ANY NODE clause, the file list is expanded on the initiator node.
The individual files are then distributed among all nodes, so that the COPY workload is spread evenly across the entire cluster.

We use this technique widely across our system to load very large data sets.
The files are written to shared storage and then loaded with a * wildcard.


We can also use a wildcard when loading files with COPY LOCAL; however, COPY LOCAL does not distribute the files among nodes.
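
For comparison, here is a minimal sketch of the two forms side by side. It reuses the table name and path from the statement earlier in this post; the assumption is that the path sits on shared storage readable by every node for the ON ANY NODE case, and on the client machine for the LOCAL case.

```sql
-- Files on shared storage visible to every node:
-- the glob is expanded on the initiator and the files are spread across the cluster.
COPY myTable FROM '/mydirectory/ofmanyfiles/*.dat' ON ANY NODE;

-- Files on the client machine (assumed here to use the same path):
-- the glob still matches every .dat file, but the data is sent through the
-- client connection instead of being divided among the nodes.
COPY myTable FROM LOCAL '/mydirectory/ofmanyfiles/*.dat';
```

For a large batch of files, the first form is usually the one to reach for, provided the files can be placed on storage that all nodes can see.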
