среда, 24 сентября 2014 г.

DBMS_PARALLEL_EXECUTE

The DBMS_PARALLEL_EXECUTE package allows a workload associated with a base table to be broken down into smaller chunks which can be run in parallel. The examples in this article are based around a simple update statement, which in reality would be more efficiently coded as a single parallel DML statement, but it serves to explain the usage of the package nicely. In reality you should use this package where straight parallel DML is not appropriate. Using theDBMS_PARALLEL_EXECUTE package involves several distinct stages.

At the end of the article there are some complete examples, using some of the techniques discussed below.

The user controlling the process needs the CREATE JOB privilege.

CONN / AS SYSDBA
GRANT CREATE JOB TO test;

The examples used in this article require the following table to be created and populated.

CONN test/test

DROP TABLE test_tab;

CREATE TABLE test_tab (
  id          NUMBER,
  description VARCHAR2(50),
  num_col     NUMBER,
  session_id  NUMBER,
  CONSTRAINT test_tab_pk PRIMARY KEY (id)
);

INSERT /*+ APPEND */ INTO test_tab
SELECT level,
       'Description for ' || level,
       CASE
         WHEN MOD(level, 5) = 0 THEN 10
         WHEN MOD(level, 3) = 0 THEN 20
         ELSE 30
       END,
       NULL
FROM   dual
CONNECT BY level <= 500000;
COMMIT;

EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);

SELECT num_col, COUNT(*)
FROM   test_tab
GROUP BY num_col
ORDER BY num_col;

   NUM_COL   COUNT(*)
---------- ----------
        10     100000
        20     133333
        30     266667

SQL>

Create a task

The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.

BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
END;
/

Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.

COLUMN task_name FORMAT A10
SELECT task_name,
       status
FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CREATED

SQL>

The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.

SELECT DBMS_PARALLEL_EXECUTE.generate_task_name
FROM   dual;

GENERATE_TASK_NAME
--------------------------------------------------------------------------------
TASK$_726

SQL>

Split the workload into chunks

The workload is associated with a base table, which can be split into subsets or chunks of rows. There are three methods of splitting the workload into chunks.

The chunks associated with a task can be dropped using the DROP_CHUNKS procedure.

CREATE_CHUNKS_BY_ROWID

The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.

BEGIN
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => 'test_task',
                                               table_owner => 'TEST',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);
END;
/

Once the operation is complete the task status is changed to 'CHUNKED'.

COLUMN task_name FORMAT A10

SELECT task_name,
       status
FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CHUNKED

SQL>

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SELECT chunk_id, status, start_rowid, end_rowid
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
ORDER BY chunk_id;

  CHUNK_ID STATUS               START_ROWID        END_ROWID
---------- -------------------- ------------------ ------------------
       287 UNASSIGNED           AAASjoAAEAAAAIwAAA AAASjoAAEAAAAI3CcP
       288 UNASSIGNED           AAASjoAAEAAAAI4AAA AAASjoAAEAAAAI/CcP
...
       450 UNASSIGNED           AAASjoAAEAAAAIIAAA AAASjoAAEAAAAIPCcP
       451 UNASSIGNED           AAASjoAAEAAAAIoAAA AAASjoAAEAAAAIvCcP

88 rows selected.

SQL>

CREATE_CHUNKS_BY_NUMBER_COL

The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload up based on a number column. It uses the specified columns min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized the column must contain a continuous sequence of numbers, like that generated by a sequence.

BEGIN
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => 'test_task',
                                                    table_owner  => 'TEST',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SELECT chunk_id, status, start_id, end_id
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       600 UNASSIGNED                    1      10000
       601 UNASSIGNED                10001      20000
 ...
       648 UNASSIGNED               480001     490000
       649 UNASSIGNED               490001     500000

50 rows selected.

SQL>

CREATE_CHUNKS_BY_SQL

The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.

DECLARE
  l_stmt CLOB;
BEGIN
  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SELECT chunk_id, status, start_id, end_id
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       650 UNASSIGNED                   10         10
       651 UNASSIGNED                   30         30
       652 UNASSIGNED                   20         20

3 rows selected.

SQL>

Run the task

Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.

There are two ways to run a task and several procedures to control a running task.

RUN_TASK

The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to the ':start_id' and ':end_id', which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.

DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  l_sql_stmt := 'UPDATE test_tab t 
                 SET    t.num_col = t.num_col + 10,
                        t.session_id = SYS_CONTEXT(''USERENV'',''SESSIONID'')
                 WHERE rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/

The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.

You can see the activity ran in parallel by checking the SESSION_ID column that was set during the update.

SELECT session_id, COUNT(*)
FROM   test_tab
GROUP BY session_id
ORDER BY session_id;

SESSION_ID   COUNT(*)
---------- ----------
    232639      65041
    232640      54029
    232641      53571
    232642      44622
    232643      45264
    232644      54242
    232645      47314
    232646      44746
    232647      46309
    232648      44862

10 rows selected.

SQL>

User-defined framework

The DBMS_PARALLEL_EXECUTE package allows you to manually code the task run. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can than manually process the chunk and set its status. The example below shows the processing of a workload chunked by rowid.

DECLARE
  l_sql_stmt    VARCHAR2(32767);
  l_chunk_id    NUMBER;
  l_start_rowid ROWID;
  l_end_rowid   ROWID;
  l_any_rows    BOOLEAN;
BEGIN
  l_sql_stmt := 'UPDATE test_tab t 
                 SET    t.num_col = t.num_col + 10,
                        t.session_id = SYS_CONTEXT(''USERENV'',''SESSIONID'')
                 WHERE rowid BETWEEN :start_id AND :end_id';
 
  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name   => 'test_task',
                                          chunk_id    => l_chunk_id,
                                          start_rowid => l_start_rowid,
                                          end_rowid   => l_end_rowid,
                                          any_rows    => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;
 
    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
      EXCEPTION
        WHEN OTHERS THEN
          -- Record chunk error.
          DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                                 chunk_id  => l_chunk_id,
                                                 status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                                 err_num   => SQLCODE,
                                                 err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;
END;
/

Task control

A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.

The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.

The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.

Check the task status

The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the 'FINISHED' or 'FINISHED_WITH_ERROR' constants. If the status is not 'FINISHED', then the task can be resumed using the RESUME_TASK procedure.

DECLARE
  l_try NUMBER;
  l_status NUMBER;
BEGIN
  -- If there is error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED) 
  Loop
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task('test_task');
    l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  END LOOP;
END;
/

The status of the task and the chunks can also be queried.

COLUMN task_name FORMAT A10
SELECT task_name,
       status
FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  FINISHED

SQL>

If there were errors, the chunks can be queried to identify the problems.

SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
GROUP BY status
ORDER BY status;

STATUS                 COUNT(*)
-------------------- ----------
PROCESSED                    88

SQL>

The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.

SELECT job_prefix
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task';

JOB_PREFIX
------------------------------
TASK$_368

SQL>

This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.

COLUMN job_name FORMAT A20

SELECT job_name, status
FROM   user_scheduler_job_run_details
WHERE  job_name LIKE (SELECT job_prefix || '%'
                      FROM   user_parallel_execute_tasks
                      WHERE  task_name = 'test_task');

JOB_NAME             STATUS
-------------------- ------------------------------
TASK$_368_1          SUCCEEDED
TASK$_368_6          SUCCEEDED
TASK$_368_2          SUCCEEDED
TASK$_368_9          SUCCEEDED
TASK$_368_10         SUCCEEDED
TASK$_368_8          SUCCEEDED
TASK$_368_7          SUCCEEDED
TASK$_368_4          SUCCEEDED
TASK$_368_5          SUCCEEDED
TASK$_368_3          SUCCEEDED

10 rows selected.

SQL>

Drop the task

Once the job is complete you can drop the task, which will drop the associated chunk information also.

BEGIN
  DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/

Complete examples

The following example shows the processing of a workload chunked by rowid.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'TEST',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);

  l_sql_stmt := 'UPDATE test_tab t 
                 SET    t.num_col = t.num_col + 10,
                        t.session_id = SYS_CONTEXT(''USERENV'',''SESSIONID'')
                 WHERE rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED) 
  Loop
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.

CREATE OR REPLACE PROCEDURE process_update (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  UPDATE test_tab t 
  SET    t.num_col = t.num_col + 10,
         t.session_id = SYS_CONTEXT('USERENV','SESSIONID')
  WHERE id BETWEEN p_start_id AND p_end_id;
END;
/

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => l_task,
                                                    table_owner  => 'TEST',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);

  l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED) 
  Loop
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_stmt     CLOB;
  l_sql_stmt VARCHAR2(32767);
  l_chunk_id NUMBER;
  l_start_id NUMBER;
  l_end_id   NUMBER;
  l_any_rows BOOLEAN;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);

  l_sql_stmt := 'UPDATE test_tab t 
                 SET    t.num_col = t.num_col,
                        t.session_id = SYS_CONTEXT(''USERENV'',''SESSIONID'')
                 WHERE num_col BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => 'test_task',
                                               chunk_id    => l_chunk_id,
                                               start_id    => l_start_id,
                                               end_id      => l_end_id,
                                               any_rows    => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;
 
    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
      EXCEPTION
        WHEN OTHERS THEN
          -- Record chunk error.
          DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                                 chunk_id  => l_chunk_id,
                                                 status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                                 err_num   => SQLCODE,
                                                 err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

For more information see:

DBMS_PARALLEL_EXECUTE

Hope this helps. Regards Tim...

Back to the Top.

OWB 11gR2 - Creating Interval Partitions

By David Allan on Aug 02, 2011

Designing partitioned tables in OWB is done in the table editor in the partitioned tab, the partitions tab let’s you design and deploy complex partitioning strategies. Here we will see how to define an interval partition (see an example in the Oracle Database VLDB and Partitioning documentation here), we will partition the SALES fact table using a date column (TIMES) in the table below.

On the partitioning tab there is a table with a tree control inside, essentially there are 4 steps for this example; defining the partition type, define the key columns, define the interval expression and the initial partition details. The buttons Add/Add Subpartition/Add Hash Count/Delete get enabled when you select rows, so you can modify the definition.

Generating the code we can see the DDL for the Oracle partitioning clause has been included.

To create the table definition in OMB you can do something like the following – note there is some double quoting in the expressions.

OMBCREATE TABLE 'SALES_TAB' ADD COLUMN 'TIMES' SET PROPERTIES (DATATYPE) VALUES ('DATE')
# Plus the rest of your table definition....
OMBALTER TABLE 'SALES_TAB' ADD PARTITION_KEY 'TIMES' SET PROPERTIES (TYPE,INTERVAL) VALUES ('RANGE','NUMTOYMINTERVAL(1,''MONTH'')')
OMBALTER TABLE 'INTERVAL_TAB' ADD PARTITION 'PART_01' SET PROPERTIES (VALUES_LESS_THAN) VALUES ('TO_DATE(''01-NOV-2007'',''DD-MON-YYYY'')')

That’s it!

OWB to ODI 12c Migration in action

By David Allan on Aug 06, 2014

The OWB to ODI 12c migration utility provides an easy to use on-ramp to Oracle's strategic data integration tool. The utility was designed and built by the same development group that produced OWB and ODI.

Here's a screenshot from the recording below showing a project in OWB and what it looks like in ODI 12c;

There is a useful webcast that you can play and watch the migration utility in action. It takes an OWB implementation and uses the migration utility to move into ODI 12c.

http://oracleconferencing.webex.com/oracleconferencing/ldr.php?RCID=df8729e0c7628dde638847d9511f6b46
It's worth having a read of the following OTN article from Stewart Bryson which gives an overview of the capabilities and options OWB customers have moving forward.
http://www.oracle.com/technetwork/articles/datawarehouse/bryson-owb-to-odi-2130001.html

How to Execute Process Flow from SQL*Plus

By Jinggen He on Dec 02, 2009

As we all know, Process Flow is a component of Oracle Warehouse Builder, which allows activities to be linked together for execution. Using Process flow, we can execute mappings, transformations or external commands such as email and FTP in a well-constructed process.

OWB provides several ways to launch a Process Flow (see links in References):

Directly start a Process Flow from Design Center/Control Center.
In Design Center, schedule a Process Flow to run at a specified time or to run at regular intervals.
Use OMBSTART to start a Process Flow in OMB*Plus command line.

Besides, we also have another way: execute a Process Flow from SQL*Plus. We can do so by utilizing the utility SQL script: <ORACLE_HOME>/owb/rtp/sql/sqlplus_exec_template.sql.

The syntax of sqlplus_exec_template.sql is as follows:

SYNOPSYS

@sqlplus_exec_template.sql workspace location_name {PLSQLMAP | SQLLOADERCONTROLFILE |
PROCESSFLOW | ABAPFILE | DATAAUDITOR | SCHEDULEDJOB | CTMAPPING} [parent]/task_name
system_params custom_params

USAGE

workspace := e.g. MY_WORKSPACE - Workspace in which the job is to run

location_name :- e.g. MY_WAREHOUSE    - Physical Name of the Location to which this
                                        task was deployed (i.e. a DB Location or a
                                        Process Location or the Platform Schema)
                                        Note: Always use "PlaformSchema" for
                                        SQL_LOADER and SAP types.

task_type     :- PLSQLMAP             - OWB PL/SQL Mapping
              | SQLLOADERCONTROLFILE - OWB SQL*Loader Mapping
              | PROCESSFLOW          - OWB ProcessFlow
              | ABAPFILE             - OWB SAP Mapping
              | DATAAUDITOR          - OWB DataAuditor Mapping
              | SCHEDULEDJOB         - OWB Scheduled Job
              | CTMAPPING            - OWB Template Mapping

task_name     :- e.g. MY_MAPPING      - Physical Name of the Deployed Object. This
                                        can be optionally qualified by the name of a
                                        deployed parent, such as the Processflow
                                        Package name of a Processflow. A module name
                                        cannot be used here because it is not
                                        a deployable object.

system_params :- { , | (name = value [, name = value]...)}
e.g. ","
or MY_PARAM=1,YOUR_PARAM=true

custom_params :- { , | (name = value [, name = value]...)}
e.g. ","
or MY_PARAM=1,YOUR_PARAM=true

The more detailed syntax can be found at the header of sqlplus_exec_template.sql. Actually, as you can see, sqlplus_exec_template.sql is designed for executing several kinds of objects such as PL/SQL Mapping, Code Template Mapping, Process Flow, etc.. Here in this article, we focus only on Process Flow execution. Below is an example of how to execute a Process Flow using the script.

Example

sqlplus user/password@tns_name @sqlplus_exec_template.sql MY_WORKSPACE OWF_LOCATION PROCESS PFPKG/PF1 "," ","

Explanation

"user/password@tns_name" gives the credentials to connect to OWB workspace. It can be a workspace owner or a workspace user.
The workspace to run this Process Flow is "MY_WORKSPACE"
"OWF_LOCATION" is the Oracle Workflow Location, into which the Process Flow to be executed has been deployed.
"PROCESS" indicates the object to execute is of type Process Flow.
"PFPKG/PF1" means we are executing a Process Flow named "PF1", which is under the Process Flow Package named "PFPKG".

Let's go through a full example.

First, let's prepare a simple Process Flow containing only one activity: a user-defined activity, which executes a local shell script that takes a parameter from the caller.

Prepare the shell script

In our case, we let the user-defined activity pass a custom Process Flow scope parameter to this shell script. And here is the content of the Shell script to execute.

#!/bin/bash

WHO="$1"

echo "Process Flow Executed by $WHO" >> /tmp/pf_demo.txt

We put the Shell script in /tmp/pf_demo.sh and add execute privilege for all users. For each execution, pf_demo.sh will add a line to /tmp/pf_demo.txt, reading "Process Flow Executed by Somebody".

Prepare the Process Flow

We use the OMB script in Appendix A to create and deploy the Process Flow: 'PF_DEMO_1/PFM_1/PFPKG_1/PF_1'. You can see that the Process Flow has a custom parameter named " PARAM_WHO" and the user-defined activity passes this parameter to the shell script to execute.

Execute the Process Flow from workspace owner

Then we can execute the command as below:

sqlplus workspace_owner/passwd @sqlplus_exec_template.sql workspace1 PFM_LOCATION PROCESSFLOW "PFPKG_1/PF_1" "," "PARAM_WHO=Owner"

Note that, in Appendix A we have deployed the Process Flow Package "PFPKG_1", which contains a Process Flow "PF_1", to a Oracle Workflow Location "PFM_LOCATION". And in the above example, we pass value "Owner" to parameter "PARAM_WHO".

Here goes the snapshot.

And let's see the output of the shell script.

Execute the Process Flow from workspace user

Also we can use workspace user credentials to start the Process Flow. But before that we should make sure that in "Security Parameters" configuration page (navigated from menu in OWB client "Tools->Preferences"), the item "Share Location Password During Run Time" is checked.

Then we can use workspace user credentials to start the Process Flow as below:

sqlplus workspace_user/passwd @sqlplus_exec_template.sql workspace_owner.workspace1 PFM_LOCATION PROCESSFLOW "PFPKG_1/PF_1" "," "PARAM_WHO=User"

And bellows is the output.

There's a newly added line "Process Flow Executed by User".

Some Important notes

1. When executing sqlplus_exec_template.sql, we need to specify WORKSPACE, which should be declared as workspaceOwner.workspaceName (if only workspaceName is given, workspaceOwner will be defaulted to user). So if we login as workspace user, we need to specify the workspace owner name.

2. If you want to pass some parameters to a Process Flow, you can utilize custom_params. It's comma separated key-value pairs string like "PARAM1=1,PARAM1=true" and "," stands for no parameter. In our demo, we passed value "Owner" or "User" to the custom Parameter "PARAM_WHO".

3. If "Share Location Password During Run Time" is not checked, we have another method to execute the Process Flow from workspace user. That is to login as workspace user and register the OWF location. See OMB commands below.

OMBCONN workspace_user/passwd@localhost:1521:ora111.us.oracle.com

OMBCC 'PF_DEMO_1'

OMBALTER LOCATION 'PFM_LOCATION' SET PROPERTIES (PASSWORD) VALUES ('owf_mgr')

OMBCOMMIT

OMBCONN CONTROL_CENTER

OMBREGISTER LOCATION 'PFM_LOCATION'

4. A Process Flow may contain lots of activities such as mappings, transformations. To guarantee that the Process Flow can be executed successfully, you should confirm that all the depended objects have also been deployed correctly.

References

Charpters in Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide 11g Release 2 (11.2)

Command syntax in Oracle® Warehouse Builder OMB*Plus Command Reference 11g Release 2 (11.2)

OMBSTART

Appendix A

The OMB script to create and deploy the demo Process Flow:

OMBCONN workspace_owner/passwd@localhost:1521:ora111.us.oracle.com

OMBCREATE PROJECT 'PF_DEMO_1'

OMBCC 'PF_DEMO_1'

OMBCREATE LOCATION 'PFM_LOCATION' SET PROPERTIES (TYPE, VERSION) VALUES ('Oracle Workflow','2.6.4')

OMBALTER LOCATION 'PFM_LOCATION' SET PROPERTIES (HOST, PORT, SERVICE_NAME, SCHEMA, PASSWORD) VALUES ('localhost', '1521', 'ora111.us.oracle.com', 'owf_mgr', 'owf_mgr')

OMBCREATE PROCESS_FLOW_MODULE 'PFM_1' SET REF LOCATION 'PFM_LOCATION'

OMBCC 'PFM_1'

OMBCREATE PROCESS_FLOW_PACKAGE 'PFPKG_1'

OMBCC 'PFPKG_1'

OMBCREATE PROCESS_FLOW 'PF_1'

OMBALTER PROCESS_FLOW 'PF_1' ADD USER_DEFINED ACTIVITY 'USER_DEFINED'

OMBALTER PROCESS_FLOW 'PF_1' MODIFY PARAMETER 'COMMAND' OF ACTIVITY 'USER_DEFINED' SET PROPERTIES (VALUE) VALUES ('/tmp/pf_demo.sh')

OMBALTER PROCESS_FLOW 'PF_1' ADD PARAMETER 'PARAM_WHO' SET PROPERTIES (DIRECTION,VALUE) VALUES ('IN','NOBODY')

OMBALTER PROCESS_FLOW 'PF_1' ADD PARAMETER 'WHO' OF ACTIVITY 'USER_DEFINED' SET PROPERTIES (BINDING) VALUES ('PF_1.PARAM_WHO')

OMBALTER PROCESS_FLOW 'PF_1' MODIFY PARAMETER 'PARAMETER_LIST' OF ACTIVITY 'USER_DEFINED' SET PROPERTIES (VALUE) VALUES ('?\${WHO}?')

OMBALTER PROCESS_FLOW 'PF_1' ADD TRANSITION 'TRANSITION_1' FROM ACTIVITY 'START' TO 'USER_DEFINED'

OMBALTER PROCESS_FLOW 'PF_1' ADD TRANSITION 'TRANSITION_2' FROM ACTIVITY 'USER_DEFINED' TO 'END'

OMBCONN CONTROL_CENTER

OMBCOMMIT

OMBREGISTER USER 'owf_mgr'

OMBCOMMIT

OMBREGISTER LOCATION 'PFM_LOCATION'

OMBCC '../'

OMBCREATE TRANSIENT DEPLOYMENT_ACTION_PLAN 'PLAN_1'

OMBALTER DEPLOYMENT_ACTION_PLAN 'PLAN_1' ADD ACTION 'ACTION_1' SET PROPERTIES (OPERATION) VALUES ('CREATE') SET REFERENCE PROCESS_FLOW_PACKAGE 'PFPKG_1'

OMBDEPLOY DEPLOYMENT_ACTION_PLAN 'PLAN_1'

Parallel Processing with DBMS_PARALLEL_EXECUTE

By David Allan on Nov 02, 2011

Here is another illustration of some of the powerful capabilities of the DBMS_PARALLEL_EXECUTE package in the Oracle database, carrying on from the earlier post here. One of the comments from that post was on how to insert into a different named table within each chunk and that insert can perform parallel DML also. This kind of scenario could be interesting for very high end processing, it could be end point target tables or tables that are prepared and then you perform partition exchanges with them or something.

The image below shows a variation on the original post where rather than inserting into a specific partition, you write into a specific table.

Driving the whole process can be your own chunking criteria, the question was how to drive this process from a table using SQL such as ‘select distinct level_key, level_key from chunk_table’ where chunk_table has the level_key and the target table name. For example it could contain the following data;

level_key	table_name
1	sales_level1
2	sales_level2
3	sales_level3
4	sales_level4
…

So the first chunk with level_key 1 will write the results to table sales_level1 etc.

You can use the DBMS_PARALLEL_PACKAGE as follows to create this functionality. The start/end values have to be of data type NUMBER, so you will have to lookup the (target) table name inside your PLSQL block within the statement provided in the run_task call.

This block has the query to determine the chunks .....

begin
   begin
     DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'TASK_NAME');
   exception when others then null;
   end;
   DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'TASK_NAME');
   DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(task_name => 'TASK_NAME',
   sql_stmt =>'select distinct level_key, level_key from chunk_table', by_rowid => false);
end;

Then the next block will construct the and process the tasks......

begin
   DBMS_PARALLEL_EXECUTE.RUN_TASK (task_name => 'TASK_NAME',
     sql_stmt =>'declare
       s varchar2(16000); vstart_id number := :start_id; vend_id number:= :end_id;
       table_name varchar2(30);
       begin
         select table_name into table_name from chunk_table where level_key=vstart_id;
         s:=''insert into ''||table_name||'' select /*+ PARALLEL(STG, 8) */ colx from STAGING_TABLE STG
           where level_key =:vstart_id'';
         execute immediate s using vstart_id;
         commit;
     end;',
     language_flag => DBMS_SQL.NATIVE, parallel_level => 2 );
end;

The anonymous PLSQL block can be any arbitrary code, you can see the table name of the target is retrieved, the example does a parallel INSERT using the hint PARALLEL(STG,8). Anyway, good to share.

As well as all of the parallel query and DML capabilities exposed in OWB, the 11gR2 release of OWBincludes out of the box support for the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle.. ==================================================================

OWB 11gR2 – Parallelization using DBMS_PARALLEL_EXECUTE

By David Allan on Jul 18, 2012

As well as all of the parallel query and DML capabilities exposed in OWB, the 11gR2 release of OWB includes out of the box support for the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle database, there are various articles on how to do the grunt work manually coding it on the web (see here for example). The use cases for this range from Oracle techie stuff like managing roll back segments for large table updates to parallelizing your PLSQL mappings…whatever they may be. Here we will see how with a couple of configuration settings in OWB you can self parallelize mappings – divide and conquer the olde fashioned way.

The use case that is generally mentioned for using this is for doing large table updates – basically its dividing a BIG problem into smaller pieces, the package also exposes a resume capability for reprocessing failed chunks which is also exposed from OWB (when you execute a chunked mapping you can resume or start a mapping). The approach can also be utilized for parallelizing arbitrary PLSQL (support for set based, row based also the table being chunked can be the source or the target or an arbitrary SQL statement). The 11.2.0.3 patch is best for this.

Chunk by SQL Statement

The example below updates employees salaries, driving the parallel worker threads by the departments table. As you see below we the start and end department id are the same, we are doing a distinct, so all employees in a department will be updated in each chunk, so we can increase the salary within the EMPLOYEES table for each department in a chunk. We are using the department id to drive the chunking – each department if processed in its own thread. We will see how we can control the thread pool of workers.

Note above there are other options such as chunk by ROWID and an arbitrary SQL statement).

The mapping now using olde fashioned divide and conquer has new runtime properties which relate back to the DBMS_PARALLEL_EXECUTE package and include the resume functionality for processing failed chunks and the parallel level, so you can change at runtime whether 1,2 or n threads are used to process the chunks.

We are saying that we want 2 child processes processing the chunks, this is a runtime parameter. Its wise to think about the number of chunks and the number of child processes to ensure optimal execution plan. The image below depicts the code that gets generated with chunk parallel level 2, and parallel level 4 – essentially the package ‘main’ procedure uses DBMS_PARALLEL_EXECUTE to process the heart of the mapping in the child processes.

There is much more to the package than meets the eye the resume capability for example provides a way of reprocessing failed chunks, rather than reprocessing everything again. This is also exposed as a runtime parameter for the chunked OWB mapping so you can resume and reprocess only the failed chunks.

Chunk by ROWID

This is the classic Large Table Update example that typical divide and conquer is used for. This example updates employees salaries, driving the parallel worker threads by rows in the target employees table itself. With this example when we configure the mapping we pick chunk method as ROWID, the chunk table EMPLOYEES, chunk type BY_ROWS and size is 1000 for example. The EMPLOYEES table is set to perform and update, I define the target filter for update for the ON clause in the merge/update – so its the ‘how do I identify the rows to update within this chunk’ clause. The other thing I had to do was define a chunking expression – now in this case its kind of redundant since the chunk is being done in the ON clause of the update – so we can trick OWB by just saying ‘start=start and end=end’ using the variables. If you don’t specify this, OWB will complain that the chunking expression is invalid.

So the MERGE statement within the chunking code will increase the SALARY for the chunk, you will see the expression to increase salary by, the dummy chunking expression used in selecting from the source (DUAL) and which rows to match – the rows in this chunk.

This let’s us perform large table updates in chunks and drive the parallelized mapping using mapping input parameters.

The parallelization for PLSQL (or row based mappings) is an interesting case - for example for the likes of match merge which has an inherent divide and conquer strategy (in binning), with match merge out of the box the processing of the bins is serialized by default. Combining the chunk parallelization with the match merge binning lets you boost the performance of such mappings. So if you pair the chunking criteria with the binning you can substantially increase the performance of such mappings.

OWB to ODI migration

Veröffentlicht am 11. Februar 2014 von

Oracle has finaliy presented an oficial OWB to ODI migration utility
and article describing typical migration path!

Alternative tools and ways for OWB to ODI migration:

OWB2ODI Converter
Export Metadata from OWB by OMBScript and generation of ODI XML files.

сравнение етээлеров

ETL tool comparison

Veröffentlicht am 6. Februar 2014 von

Loading Type

OWB: ELT
ODI: ELT
Informatica: ETL, ELT with “Pushdown Optimization Option”
Pentaho: ETL
DataStage: ETL
Microsoft SSIS: ELT

System Plfatform

OWB: Windows, Unix
ODI: Windows, Unix
Informatica: Windows, Unix
Pentaho: Windows, Unix, Other Platforms (JDK)
DataStage: Windows, Unix, Other Platforms
Microsoft SSIS: Windows

Parallelism

OWB: Parallel steps in OWB flows (Mapping or subflows). Parallel SQL.
ODI: Parallel processes in scenarios.
Informatica: Partition Option (parallel threads). Asynchronous task steps in workflows.
Pentaho: Parallel job execution (asynchronous). Parallel mapping execution (e.g. parallel DML).
DataStage: Parallel jobs and parallel stages.
Microsoft SSIS: asynchronous data flows (tasks) in a package.

Scalability

OWB: RAC.
ODI: Load balancing for distributed agents executing etl scenarios.
Informatica: Informatica cluster. Public cloud version.
Pentaho: Cluster (carte server).
DataStage: MPP, cluster. AWS cloud.
Microsoft SSIS: SQL Server Cluster. Azure cloud.

ETL Proccess

OWB: jobs with processes(PL/SQL) and mappings (generated PL/SQL)
ODI: Scenarios containing processes (knowledge modules)
Pentaho: jobs with transformations
Informatica: sessions with workflows/worklets and tasks/tasklets
DataStage: Server job, DataStage job, Sequence Job (workflow)
Microsoft SSIS: package

Deployment facility

OWB: Yes (Configurations, Export/Import between repositories)
ODI: Yes
Informatica: Sessions with environment specific parameter files.
Pentaho: Export/Import XML files between environments. Environment properties (as file or db content).
DataStage: Information Server Manager – build, deploy. Transfer binaries (same platform). ODF and XML.
Microsoft SSIS: Yes

Engine based / code generator

OWB: PL/SQL packages executed by OWB repository.
ODI: Agents executing generated scenarios (Interpreter).
Informatica: Engine.
Pentaho: Engine interpreting XML jobs and transformations.
DataStage: Binaries (linked libraries).
Microsoft SSIS: SSIS engine executing packages.

Environment configuration

OWB: In repository of target environment.
ODI: ?
Informatica: Environment specific parameter files on session level. Multiple datasources/targets.
Pentaho: Environment properties (as file or db content). Any kind of JDBC connection.
DataStage: Configuration files (APT_CONFIG_FILE)
Microsoft SSIS: “Configurations” in packages.

Numbers of available transformation functions

OWB: about 30
ODI:
Informatica: 58
Pentaho: 60
DataStage: 70
Microsoft SSIS: 30

Slowly Changing Dimension

OWB: SCD Mappings (Enterprise ETL Option)
ODI: Yes
Informatica: Supports Full history, recent values, Current & Prev values.
Pentaho: Yes
DataStage: SCD “stage” for Type 1 and 2
Microsoft SSIS: Yes, with a Wizard but difficult to modify

Rejected Records

OWB: Error tables
ODI: Some (oracle) “integration knowledge modules” has rejection logic. “check knowledge modules” for business check rejection suggested.
Informatica:. Reject (bad) files. Rejected records in logs.
Pentaho: Error handling data connections and validation nodes
DataStage: Yes
Microsoft SSIS: Can use conditional split transformation and error redirection to either CSV files or tables.

Debugging Facility

OWB: basic debugging, breakpoints
ODI: simulation mode
Informatica: session debugger, breakpoints, data movement, data analysis
Pentaho: mapping debugger, breakpoints, data movement, data analysis
DataStage: Supports basic debugging facilities for testing.
Microsoft SSIS: breakpoints, Data Viewer and error handling

Languages (extensibility)

OWB: SQL, PL/SQL
ODI: Java, Jython, Java Bean Shell
Informatica: Java transformations.
Pentaho: Java, JavaScript
DataStage: Script, C++
Microsoft SSIS: C#, VB.NET (2005)

Application Integration Functionality

OWB: Difficult
ODI: JAX-WS, JMS connectors
Informatica: WebServices (XML, JSON), JMS, IBM MQ, Tibco Rendevous
Pentaho: JMS node, possible (e.g. java transformation nodes)
DataStage: WISD ingegration (web services), InfoSphere Streams, IBM MQ stages, Java/C++ API.
Microsoft SSIS: WebService Task, MSMQ integration. Any .NET functionality possible.

CDC and real time ETL

OWB: Not available
ODI: Oracle Golden Gate knowledge modules
Informatica: PowerExchange CDC
Pentaho: possible (e.g. java transformation nodes)
DataStage: IBM InfoSphere CDC. Always-on stages (e.g. ISD, MQClient or custom Java).
Microsoft SSIS: StreamInsight

Metadata: Ability to view & navigate metadata on the web

OWB: OWB Repository Browser
ODI: ODI Metadata Navigator (weblogic)
Informatica: Job sessions can be monitored using Informatica Classes
Pentaho: Pentaho Repository
DataStage: InfoSphere Metadata workbench. DataStage Operations Console for operational view (monitoring, statistics, performance)
Microsoft SSIS: Does not support

Ability to Customize views of metadata for different users

OWB: No
ODI: User roles (repository)
Informatica: Supports.
Pentaho: User rights (read,write,access) on jobs and transformations
DataStage: User roles
Microsoft SSIS: Does not support

Metadata repository can be stored in RDBMS

OWB: Yes (only)
ODI: Yes (only)
Informatica: Yes
Pentaho: Yes (XML, RDBMS, Enterprise Repository)
DataStage: No. But the proprietary meta data can be moved to a RDBMS using the DOC Tool
Microsoft SSIS: Does not support

Metadata API

OWB: OBMScript. Oracle repository tables.
ODI: Java API. DB repository tables.
Informatica:
Pentaho: XML files. DB repository tables (DB repository). JSR-283 (Enterprise repository).
DataStage:
Microsoft SSIS:

Command line operation

OWB: No
ODI: Yes (start, stop, sleep and many others)
Informatica: Yes (Pmcmd – start, stop, abort or schedule workflow )
Pentaho: Start/Stop of job and transformations (pan and kitchen)
DataStage: Start/Stop, compile, export/import metadata (many different commands)
Microsoft SSIS: Yes (Dtexec.exe)

Ability to maintain versions of mappings

OWB: Very Difficult (through export/import with OMBScript on object level possible).
ODI: Yes. Internal version management on object level.
Informatica: Yes (Additional licence needed). SVN throught XML export/import possible.
Pentaho: SVN (for XML repositories) or in basic locking/versioning on object level in “Enterprise repository”
DataStage: CSV and ClearCase support.
Microsoft SSIS: Difficult (binary dtsx files).

Job Controlling & Scheduling

OWB: No (external scheduler or Oracle scheduler needed)
ODI: Yes
Informatica: Very good
Pentaho: only for “Enterprise repository”
DataStage: Does not support directly (no option). But possible to call custom programs after the job get executed.
Microsoft SSIS: Job Agent, Email Notification

Usability

OWB: Average. Slow learning curve. Bugs (out of support). Slow UI.
ODI: Average. Slow learing curve. Own ETL tooling vision.
Pentaho: Very good. Steep learning curve. Less mature than competitors (e.g. weak logging and monitoring). Good UI. No installation needed.
Informatica: Very good. Steep learning curve. Clean and intuitive UI.
DataStage:
Microsoft SSIS: Good. Steap learning curve. VisualStudio as UI. C# knowledges needed.

License

OWB: Included in Oracle DB. Additional options could be necessary (Partitioning, Enterprise ETL, etc.). About 15.000 core/year.
ODI: Per Core or Named User. About 30.000 per core/year
Pentaho: Free. Support costs about 30.000 per CPU/year.
Informatica: Per Server. Additional options could be necessary (e.g. “Pushdown Optimization”). Expensive and intransparent.
DataStage: Per Processor (Processor Value Unit)
Microsoft SSIS: Part of SQL Server license (CAL or Core).

среда, 24 сентября 2014 г.

DBMS_PARALLEL_EXECUTE

Create a task

Split the workload into chunks

CREATE_CHUNKS_BY_ROWID

CREATE_CHUNKS_BY_NUMBER_COL

CREATE_CHUNKS_BY_SQL

Run the task

RUN_TASK

User-defined framework

Task control

Check the task status

Drop the task

Complete examples

OWB 11gR2 - Creating Interval Partitions

By David Allan on Aug 02, 2011

OWB to ODI 12c Migration in action

By David Allan on Aug 06, 2014

How to Execute Process Flow from SQL*Plus

By Jinggen He on Dec 02, 2009

Prepare the shell script

Prepare the Process Flow

Execute the Process Flow from workspace owner

Execute the Process Flow from workspace user

Some Important notes

References

Appendix A

Parallel Processing with DBMS_PARALLEL_EXECUTE

Parallel Processing with DBMS_PARALLEL_EXECUTE

By David Allan on Nov 02, 2011

OWB 11gR2 – Parallelization using DBMS_PARALLEL_EXECUTE

OWB 11gR2 – Parallelization using DBMS_PARALLEL_EXECUTE

By David Allan on Jul 18, 2012

OWB to ODI migration

OWB to ODI migration

сравнение етээлеров

ETL tool comparison

Loading Type

System Plfatform

Parallelism

Scalability

ETL Proccess

Deployment facility

Engine based / code generator

Environment configuration

Numbers of available transformation functions

Slowly Changing Dimension

Rejected Records

Debugging Facility

Languages (extensibility)

Application Integration Functionality

CDC and real time ETL

Metadata: Ability to view & navigate metadata on the web

Ability to Customize views of metadata for different users

Metadata repository can be stored in RDBMS

Metadata API

Command line operation

Ability to maintain versions of mappings

Job Controlling & Scheduling

Usability

License

среда, 24 сентября 2014 г.