Building and Running Micro ETL
Execution Plans
1)
Micro ETL execution plans are ETL processes that you schedule
at very frequent intervals, such as hourly or half-hourly.
2)
They usually handle small subject areas or subsets of larger
subject areas.
3)
DAC tracks refresh dates for tables in micro ETL execution
plans separately from other execution plans and uses these refresh dates in the
change capture process.
4)
After a micro ETL execution plan runs, DAC populates refresh
date values in the Refresh Dates child tab of the Execution Plans tab.
5)
If a subject area is used in a regular execution plan (an
execution plan with the Keep Separate Refresh Dates option not selected) as
well as a micro ETL execution plan, DAC maintains refresh dates for the tables
in the regular execution plan in the Refresh Dates child tab of the Physical
Data Sources tab (Setup view).
6)
In cases of a subject area being used in both a regular and
micro ETL execution plan and the micro ETL execution plan is suspended for a
few days but the regular execution plan runs nightly, DAC automatically detects
the last refresh date for the tables common to both execution plans and
intelligently extracts only the most recent records for the micro ETL execution
plan.
Caution: Micro ETL processes can cause
issues with data
inconsistencies, data availability, and additional load on the
transactional database. Therefore, you should consider the
following
factors before implementing
a micro ETL process:
7)
For related star schemas, if one schema is omitted from a micro
ETL execution plan, the cross-star reports may be inaccurate.
8)
For example, if the Person fact table is refreshed more
frequently than the Revenue fact table, a report that spans the Person and
Revenue dimensional schemas may produce inconsistent results.
9)
If you omit dimension tables from a micro ETL execution plan,
the foreign keys for the fact tables will point to Unspecified rows for the new
dimension records.
10)
The foreign key references will be resolved when the Complete
ETL execution plan is run, but users of the reports should be aware of such
inconsistencies.
11)
If you do not include aggregate tables in micro ETL execution
plans, the reports that use data from these tables will be inconsistent with
the reports that use data from the detailed fact tables.
12)
However, if aggregate tables are included in the micro ETL
execution plan, the aggregate calculations are performed for each ETL process,
which will take a constant amount of time and may be inefficient to perform at
such frequent intervals.
13)
Hierarchy tables are rebuilt during every ETL execution plan by
querying the base dimension tables.
14)
This operation takes a constant amount of time.
15)
If the base tables are big, this operation may take a long time
and may be inefficient if the micro ETL execution plan runs several times a
day.
16)
However, if you avoid populating the hierarchy tables during
micro ETL processes, data inconsistencies will occur.
17)
With micro ETL execution
plans, caching will occur more frequently, which may have performance implications.
Micro
ETL execution plans will put more load on the transactional database
because
of the frequent extracts.
To
a build and run a micro ETL execution plan
1.
In the DAC toolbar, select the appropriate source system
container from the
drop-down
list in the toolbar.
2.
From the Menu bar, select Views, then select Design, then
select Subject Areas.
3.
In the Subject Areas tab, assemble a small subject area.
4.
In the Tasks child tab, inactivate all tasks that are not
required for the execution
plan.
5.
Create a new execution plan.
a.
Navigate to the Execute view, then select the Execution Plans
tab.
b.
Enter a name for the execution plan
c.
Select the Keep Separate Refresh Dates check box.
d.
Click Save.
6.
Associate one or more subject areas with the execution plan.
The subject areas can
belong to
one or more source systems.
a.
Click the Subject Areas child tab.
b. Click Add/Remove in the bottom pane toolbar.
c.
In the Choose Subject Areas dialog, select the appropriate
source system
container.
d.
Query for the subject area you want to associate with the
execution plan.
e.
Select the subject area and click Add.
You can
associate multiple subject areas with an execution plan, but all the
subject
areas must be from the same source system container.
f.
Click OK to close the window.
7.
Generate the runtime execution plan parameters.
a.
Click the Parameters subtab, and then click Generate in the
bottom pane
toolbar.
b.
In the Generating Parameters dialog box, enter the number of
copies for each
source
system container, and then click OK.
c.
Click OK in the informational message.
DAC
automatically generates the parameters required for each copy. Not all
copies
require all of the possible parameters.
d.
On the Parameters subtab, edit the parameters for each copy of
the source
system
container as follows:
Note: For the data source type of FlatFileConnection,
make sure you
have copied all files into the directory specified in the DAC
system
property InformaticaParameterFileLocation.
For each
data source type, select the appropriate value from the Value
drop-down
list.
For each
Informatica SIL and SDE folder, select the appropriate value in the
Value
drop-down list.
e.
For each data source type, enter the appropriate name in the
Value field.
f.
(Optional) If you are extracting data from more than one source
system
container
and want to stagger the data extracts, in the Delay field for the
appropriate
data source, enter a value for the number of minutes you want to
delay the
extract.
8.
In the top pane of the Execution Plans tab, make sure the new
execution plan is
highlighted,
and click Build.
DAC
builds the execution plan.
9.
Click the Ordered Tasks child tab and verify the following:
a.
Click Details in the toolbar, and review each task’s predecessor
and successor
tasks to
confirm tasks common to multiple sources are ordered in a manner
consistent
with the priority of the source connection.
b.
Confirm that load tasks appear only once even if there are
multiple extracts for
tables
common to multiple sources.
c.
For tasks common to multiple sources, click Preview Run Details
in the
toolbar,
and confirm the following:
The first
common task truncates the common target table and the following
tasks do not.
The first common task truncates the common target table and the
following
tasks do not.
The execution plan is now ready to run as a micro ETL execution
plan.
10. Create a schedule for the micro ETL execution plan.
For instructions, see
"Scheduling an
Execution Plan".
No comments:
Post a Comment