Sunday, November 15, 2015

1.29 COPY: Purpose of using the STREAM NAME Parameter.

COPY: Purpose of using the STREAM NAME Parameter.

The LOAD_STREAMS system table includes information about load stream metrics from COPY and COPY FROM VERTICA statements
We can query this table to get the COPY metrics.

The LOAD_STREAMS system table includes stream names for every COPY statement that takes more than 1-second to run. The 1-second duration includes the time to plan and execute the statement.
HP Vertica maintains system table metrics until they reach a designated size quota (in kilobytes). The quota is set through internal processes and cannot be set or viewed directly.

Using the STREAM NAME parameter as part of the COPY statement labels COPY streams explicitly so they are easier to identify in the LOAD_STREAMS system table. This is the main purpose of the STREAM NAME option.

To use the STREAM NAME parameter:

=> COPY mytable FROM myfile DELIMITER '|' DIRECT STREAM NAME 'My stream name';

Following are few imp columns in the LOAD_STREAMS Table.

  • ACCEPTED_ROW_COUNT
  • REJECTED_ROW_COUNT
  • PARSE_COMPLETE_PERCENT
  • SORT_COMPLETE_PERCENT

When a COPY statement using the DIRECT option is in progress, the ACCEPTED_ROW_COUNT value can increase to the maximum number of rows in the input file as the rows are being parsed.

In a typical load, the PARSE_COMPLETE_PERCENT value can either increase slowly to 100%, or jump to 100% quickly if you are loading from named pipes or STDIN, while SORT_COMPLETE_PERCENT is at 0. Once PARSE_COMPLETE_PERCENT reaches 100%, SORT_COMPLETE_PERCENT increases to 100%. Depending on the data sizes, a significant lag can occur between the time PARSE_COMPLETE_PERCENT reaches 100% and the timeSORT_COMPLETE_PERCENT begins to increase.

If COPY reads input data from multiple named pipes, the PARSE_COMPLETE_PERCENT value will remain at zero (0) until all named pipes return an EOF. While COPY awaits an EOF from multiple pipes, it can appear to be hung. However, before canceling the COPY statement, check your system CPU and disk accesses to see if any activity is in progress.

Expanded display is on.
=> SELECT stream_name, table_name, load_start, accepted_row_count,
   rejected_row_count, read_bytes, unsorted_row_count, sorted_row_count,
   sort_complete_percent FROM load_streams;
-[ RECORD 1 ]----------+---------------------------
stream_name            | fact-13
table_name             | fact
load_start             | 2010-12-28 15:07:41.132053
accepted_row_count     | 900
rejected_row_count     | 100
read_bytes             | 11975
input_file_size_bytes  | 0
parse_complete_percent | 0
unsorted_row_count     | 3600
sorted_row_count       | 3600
sort_complete_percent  | 100


No comments:

Post a Comment