Notable new Key Features in DataStage 8.5
DataStage 8.5 is out, and IBM has made some significant improvements this time around. Let's look at some of the important enhancements in the new version.
- It's Fast!
DataStage 8.5 is considerably faster than the previous version (8.1). Tasks like saving, renaming and compiling are nearly 40% faster. The run-time performance of jobs has also improved.
- The parallel engine in DataStage has been tuned to improve performance, and resource usage is down by 5% compared to DataStage 8.1.
- XML data
DataStage has historically been inefficient at handling XML files, but in 8.5 IBM has given us a great XML processing package. DataStage 8.5 can now process large XML files (over 30 GB) with ease. We can also process XML data in parallel now.
The new XML transform stage can combine data from multiple sources into a single XML output stream. If you think that is cool, it can also do it the other way around, i.e., multiple XML inputs to a single output stream.
It can also convert data from one XML format to another.
- Transformer Stage
It is one of the most used and most important stages in DataStage, and it just got better in 8.5.
a. Transformer Looping:
Over the years DataStage programmers have used workarounds to implement this concept. Now IBM has included it directly in the Transformer stage.
Two types of looping are available:
Output looping: We can produce multiple output rows for a single input row.
Ex:
Input Record:
Salesman_name | City_1   | City_2 | City_3
Jason Bourne  | New York | Madrid | New Delhi
Output Record:
Salesman_name | City
Jason Bourne  | New York
Jason Bourne  | Madrid
Jason Bourne  | New Delhi
This is achieved using a new system variable, @ITERATION.
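As a rough analogy for the Jason Bourne example above, here is how output looping might look in Python. This is not DataStage syntax: the function name output_loop and the dictionary-based record are made up for illustration, and the loop counter simply stands in for @ITERATION, assumed here to count from 1.

```python
def output_loop(record, city_columns=("City_1", "City_2", "City_3")):
    """Emit one output row per city column of a single input record."""
    rows = []
    # 'iteration' plays the role of @ITERATION (assumed to start at 1).
    for iteration in range(1, len(city_columns) + 1):
        city = record[city_columns[iteration - 1]]
        rows.append({"Salesman_name": record["Salesman_name"], "City": city})
    return rows


record = {
    "Salesman_name": "Jason Bourne",
    "City_1": "New York",
    "City_2": "Madrid",
    "City_3": "New Delhi",
}

for row in output_loop(record):
    print(row["Salesman_name"], "|", row["City"])
```

One input record goes in and three output rows come out, one per loop iteration.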
Input looping: We can now aggregate input records within the Transformer and attach the aggregated data to the original input records as they are sent to the output.
b. Transformer change detection:
SaveInputRecord() – Saves a record for use in later transformations within the job
GetInputRecord() – Retrieves the saved record as and when it is required for comparisons
c. System Variables:
i. @ITERATION: Used in the looping mechanism
ii. LastRow(): Indicates whether the current row is the last row of the input
iii. LastRowInGroup(): Indicates whether the current row is the last row in its group, based on the key column
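To see how the record cache and the last-row-in-group check can work together for input looping, here is a minimal Python sketch. It is only an analogy, not Transformer code: add_group_totals, the region and sales columns and the group_total column are hypothetical, the list `saved` stands in for the SaveInputRecord() cache, and the look-ahead comparison stands in for LastRowInGroup(). The input is assumed to be sorted on the key.

```python
def add_group_totals(rows, key, value):
    """Attach each group's total to every row of that group.

    'rows' is assumed to be sorted by 'key'.
    """
    output = []
    saved = []   # stands in for the SaveInputRecord() cache
    total = 0
    for index, row in enumerate(rows):
        saved.append(row)
        total += row[value]
        # Stands in for LastRowInGroup(key): true when the next row
        # belongs to another group or there is no next row.
        last_in_group = (
            index == len(rows) - 1 or rows[index + 1][key] != row[key]
        )
        if last_in_group:
            # Replay the cached rows, as a loop over GetInputRecord() would.
            for cached in saved:
                output.append({**cached, "group_total": total})
            saved = []
            total = 0
    return output


sales = [
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 5},
    {"region": "West", "sales": 7},
]

for row in add_group_totals(sales, "region", "sales"):
    print(row)
```

Each original row is kept, and the aggregated value is added to it once the end of its group has been seen.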
d. New NULL Handling features:
In DataStage 8.5 we need not handle NULL values explicitly. Records are no longer dropped if the target column is nullable, and functions can be applied to columns that contain NULL values without explicit handling. Stage variables are also nullable by default now.
The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is provided for backward compatibility.
e. New Date functions:
There are a host of new date functions in DataStage 8.5. I personally found the functions below the most useful:
DateFromComponents(years, months, daysofmonth)
Ex: DateFromComponents(2012,07,20) will output 2012-07-20
DateOffsetByComponents(basedate, years offset, month offset, daysofmonth offset)
Ex: DateOffsetByComponents(2012-07-20, 2,1,1) will output 2014-08-21
DateOffsetByComponents(2012-07-20, -4,0,0) will output 2008-07-20
I will write another detailed post on the new date functions shortly.
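For readers who want to check the arithmetic in the examples above, here is a small Python emulation of the two component-based date functions. The Python names date_from_components and date_offset_by_components are my own, and the order of operations (years and months first, then days) is an assumption; how DataStage itself treats edge cases such as month-end overflow is not covered here.

```python
from datetime import date, timedelta


def date_from_components(years, months, dayofmonth):
    """Build a date from its year, month and day parts."""
    return date(years, months, dayofmonth)


def date_offset_by_components(basedate, year_offset, month_offset, day_offset):
    """Shift a date by years, months and days (assumed order: y/m, then d)."""
    # Work in whole months so that month offsets carry into the year.
    month_index = (
        (basedate.year + year_offset) * 12 + (basedate.month - 1) + month_offset
    )
    year, month = divmod(month_index, 12)
    # Raises ValueError if the day does not exist in the target month.
    shifted = date(year, month + 1, basedate.day)
    return shifted + timedelta(days=day_offset)


base = date_from_components(2012, 7, 20)
print(base.isoformat())                                         # 2012-07-20
print(date_offset_by_components(base, 2, 1, 1).isoformat())     # 2014-08-21
print(date_offset_by_components(base, -4, 0, 0).isoformat())    # 2008-07-20
```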
- Functionality Enhancements:
  - Mask encryption for before and after job subroutines
  - Ability to copy permissions from one project to a new project
  - Improvements in the multi-client manager
  - New audit tracing and enhanced exception dialog
  - Enhanced project creation failure details
- Vertical Pivoting:
At long last, vertical pivoting has been added.
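Vertical pivoting turns a set of input rows into columns of a single output row, which is the reverse of the output looping example earlier. A minimal Python sketch of the idea follows; the function name vertical_pivot and the City_ column prefix are hypothetical, and this only illustrates the concept, not how the stage is configured.

```python
def vertical_pivot(rows, key, value, prefix="City_"):
    """Collapse rows that share a key into one row with numbered columns."""
    pivoted = {}
    for row in rows:
        record = pivoted.setdefault(row[key], {key: row[key]})
        # The key column counts as one entry, so the first value
        # lands in <prefix>1, the second in <prefix>2, and so on.
        position = len(record)
        record[f"{prefix}{position}"] = row[value]
    return list(pivoted.values())


rows = [
    {"Salesman_name": "Jason Bourne", "City": "New York"},
    {"Salesman_name": "Jason Bourne", "City": "Madrid"},
    {"Salesman_name": "Jason Bourne", "City": "New Delhi"},
]

print(vertical_pivot(rows, "Salesman_name", "City"))
```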
- Integration with CVS
DataStage 8.5 now integrates directly with version control systems like CVS. We can check in and check out directly from DataStage.
- Information Architecture Diagramming Tool:
Solution architects can now draw detailed integration solution plans for data warehouses from within DataStage.
- Balanced Optimizer:
As you all know, DataStage is an ETL tool. Now, with Balanced Optimizer integrated directly, we also have ELT (Extract, Load and Transform). With this we can extract the data, load it and perform the transformations inside the database engine.
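The ELT idea can be illustrated outside DataStage with a few lines of Python and SQLite. This sketch does not use Balanced Optimizer at all: the function run_elt and the tables staging_sales and sales_summary are invented for the example. It only shows the pattern of loading raw rows first and letting the database engine do the transformation in SQL.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untouched, then transform them inside the database."""
    conn = sqlite3.connect(":memory:")
    # Load: raw data goes into a staging table as-is.
    conn.execute("CREATE TABLE staging_sales (salesman TEXT, amount REAL)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)
    # Transform: the aggregation runs in the database engine, not in Python.
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT salesman, SUM(amount) AS total "
        "FROM staging_sales GROUP BY salesman"
    )
    summary = conn.execute(
        "SELECT salesman, total FROM sales_summary ORDER BY salesman"
    ).fetchall()
    conn.close()
    return summary


raw = [("Jason Bourne", 100.0), ("Jason Bourne", 50.0), ("Marie Kreutz", 75.0)]
print(run_elt(raw))
```

In a classic ETL flow the aggregation would happen in the job before the load; here the only work done outside the database is moving the rows in.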