Monday, August 12, 2019

Notable new Key Features in DataStage 8.5


DataStage 8.5 is out, and IBM has made some significant improvements this time around. Let’s look at some of the most important enhancements in the new version:
  • It’s Fast!
DataStage 8.5 is considerably faster than its previous version (8.1). Tasks like saving, renaming and compiling are nearly 40% faster, and the run-time performance of jobs has also improved.
  
  • The parallel engine
The parallel engine in DataStage has been tuned to improve performance, and resource usage has been reduced by 5% compared to DataStage 8.1.

  • XML data
DataStage has historically been inefficient at handling XML files, but in 8.5 IBM has given us a great XML processing package. DataStage 8.5 can now process large XML files (over 30 GB) with ease. Also, we can now process XML data in parallel.
The new XML transform stage can combine data from multiple sources into a single XML output stream. If you think that is cool, it can also do it the other way around, i.e., take a single XML input stream and split it into multiple output streams.
It can also convert data from one XML format to another.

  • Transformer Stage
The Transformer is one of the most used and most important stages in DataStage, and it just got better in 8.5.
a.     Transformer Looping:
Over the years DataStage programmers have been using workarounds to implement this concept. Now IBM has included it directly in the transformer stage.
There are two types of looping available:
Output looping: we can produce multiple output rows from a single input row.
Ex:
Input Record:
Salesman_name | City_1   | City_2 | City_3
Jason Bourne  | New York | Madrid | New Delhi

Output Records:
Salesman_name | City
Jason Bourne  | New York
Jason Bourne  | Madrid
Jason Bourne  | New Delhi

This is achieved using a new system variable, @ITERATION.
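To make the output looping idea concrete, here is a minimal Python sketch (not DataStage code) that turns the input record above into the three output records. The counter `iteration` stands in for @ITERATION, which I am assuming starts at 1 for every input row; the function name `output_loop` is hypothetical.

```python
from typing import Iterator


def output_loop(record: dict, city_columns: list[str]) -> Iterator[dict]:
    """Emit one output row per city column, mimicking a transformer loop."""
    # 'iteration' stands in for @ITERATION (assumed to start at 1 per input row).
    iteration = 1
    # Loop condition, analogous to something like "@ITERATION <= 3".
    while iteration <= len(city_columns):
        yield {
            "Salesman_name": record["Salesman_name"],
            # Pick the city column that matches the current iteration.
            "City": record[city_columns[iteration - 1]],
        }
        iteration += 1


row = {"Salesman_name": "Jason Bourne", "City_1": "New York",
       "City_2": "Madrid", "City_3": "New Delhi"}
for out in output_loop(row, ["City_1", "City_2", "City_3"]):
    print(out["Salesman_name"], "|", out["City"])
```

Inside the transformer itself you would express the same thing with a loop condition and a derivation keyed on @ITERATION instead of a Python while loop.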
Input looping: we can now aggregate input records within the transformer and attach the aggregated value to each original input record as it is sent to the output.
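Here is a rough Python model of input looping, assuming the input is sorted on the key column. Records of a group are cached (the job the saved input records do in the transformer), and when the last record of the group arrives the cached records are replayed, each carrying the group total. The function name, the `Sales` column and the `Group_total` column are hypothetical.

```python
def input_loop_aggregate(rows: list[dict], key: str, value: str) -> list[dict]:
    """Attach each group's total to every record of that group.

    Assumes `rows` is already sorted on `key`, as grouping in a transformer requires.
    """
    output: list[dict] = []
    cache: list[dict] = []  # stands in for the records saved within the transformer
    total = 0
    for i, row in enumerate(rows):
        cache.append(row)
        total += row[value]
        # True on the last record of a key group (the role LastRowInGroup() plays).
        last_in_group = i == len(rows) - 1 or rows[i + 1][key] != row[key]
        if last_in_group:
            # Replay the cached records, each enriched with the group total.
            for saved in cache:
                output.append({**saved, "Group_total": total})
            cache, total = [], 0
    return output


sales = [  # hypothetical sample data
    {"Salesman_name": "Jason Bourne", "Sales": 100},
    {"Salesman_name": "Jason Bourne", "Sales": 250},
    {"Salesman_name": "Jane Doe", "Sales": 75},
]
for out in input_loop_aggregate(sales, "Salesman_name", "Sales"):
    print(out)
```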

b.    Transformer change detection:
SaveInputRecord() – Saves the current input record so it can be used later within the transformer
GetSavedInputRecord() – Retrieves a saved record when it is needed, for example for comparisons
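The pattern these functions enable (keep the previous record, compare the current one against it) can be sketched in Python as follows. This is only an illustration of the comparison logic under the assumption that input is sorted on the key; `detect_changes` and the sample data are hypothetical, and the sketch does not reproduce how the transformer stores saved records.

```python
def detect_changes(rows: list[dict], key: str, tracked: list[str]) -> list[tuple[dict, bool]]:
    """Flag each record as changed/unchanged versus the previous record with the same key.

    Assumes `rows` is sorted on `key`; only consecutive records are compared.
    """
    saved = None  # previous record, loosely playing the role of a saved input record
    result = []
    for row in rows:
        if saved is not None and saved[key] == row[key]:
            # Same key as the saved record: changed if any tracked column differs.
            changed = any(saved[col] != row[col] for col in tracked)
        else:
            # First record for this key is always reported as a change (a new key).
            changed = True
        result.append((row, changed))
        saved = row  # save the current record for the next comparison
    return result


history = [  # hypothetical sample data
    {"Salesman_name": "Jason Bourne", "City": "Madrid"},
    {"Salesman_name": "Jason Bourne", "City": "Madrid"},
    {"Salesman_name": "Jason Bourne", "City": "New Delhi"},
]
for row, changed in detect_changes(history, "Salesman_name", ["City"]):
    print(row["City"], changed)
```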

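c.     System variables and functions:
      i.     @ITERATION: the loop counter used by the looping mechanism
      ii.    LastRow(): returns true when the current row is the last row of input to the transformer
      iii.   LastRowInGroup(): returns true when the current row is the last row of its group, based on the key column (the input must be sorted on that key)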
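Both flags can be computed with one row of look-ahead, which is a handy way to think about what LastRow() and LastRowInGroup() tell you. A minimal Python sketch, assuming sorted input; `with_last_flags` is a hypothetical name and this is not how DataStage implements the functions.

```python
from typing import Iterable, Iterator


def with_last_flags(rows: Iterable[dict], key: str) -> Iterator[tuple[dict, bool, bool]]:
    """Yield (row, last_row, last_row_in_group) using one row of look-ahead.

    Assumes the input is sorted on `key`, as LastRowInGroup() requires.
    """
    it = iter(rows)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to flag
    for nxt in it:
        # A group ends when the next row carries a different key value.
        yield current, False, nxt[key] != current[key]
        current = nxt
    # The final row ends both the input and its group.
    yield current, True, True


rows = [{"City": "Madrid"}, {"City": "Madrid"}, {"City": "New Delhi"}]
for row, last_row, last_in_group in with_last_flags(rows, "City"):
    print(row["City"], last_row, last_in_group)
```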

d.    New NULL handling features:
In DataStage 8.5 we no longer need to handle NULL values explicitly. Records are no longer dropped when the target column is nullable, and functions applied to columns that contain NULL values do not need explicit NULL checks. Stage variables are also nullable by default.
The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is provided for backward compatibility with the old NULL handling behavior.
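The difference between the two behaviors can be pictured with a very simplified Python model of a single derivation. This is my own sketch of the rules described above, not DataStage's exact semantics: the `old_null_handling` flag only loosely stands in for APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING, and treating a non-nullable target as still rejecting the row is an assumption.

```python
from typing import Callable, Optional


def apply_derivation(value: Optional[str], fn: Callable[[str], str],
                     target_nullable: bool, old_null_handling: bool) -> tuple[bool, Optional[str]]:
    """Return (row_kept, output_value) for one derivation (simplified model)."""
    if value is None:
        if old_null_handling:
            # Old style: an unhandled NULL passed to a function drops the row.
            return False, None
        if target_nullable:
            # 8.5 style: the NULL simply flows into the nullable target column.
            return True, None
        # Assumption: a non-nullable target still cannot receive NULL.
        return False, None
    return True, fn(value)


print(apply_derivation(None, str.upper, target_nullable=True, old_null_handling=True))
print(apply_derivation(None, str.upper, target_nullable=True, old_null_handling=False))
print(apply_derivation("madrid", str.upper, target_nullable=False, old_null_handling=False))
```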

e.     New Date functions:
There are a host of new date functions in DataStage 8.5. I personally found the functions below the most useful:
DateFromComponents(years, months, dayofmonth)
Ex: DateFromComponents(2012, 07, 20) will output 2012-07-20

DateOffsetByComponents(basedate, year_offset, month_offset, dayofmonth_offset)
Ex: DateOffsetByComponents(2012-07-20, 2, 1, 1) will output 2014-08-21
DateOffsetByComponents(2012-07-20, -4, 0, 0) will output 2008-07-20
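If you want to check the arithmetic in the examples above outside DataStage, here is a small Python equivalent. The function names are hypothetical. I have assumed that year and month offsets are applied first (rolling over years), that the day is clamped to the end of the target month when needed (e.g. Jan 31 + 1 month becomes Feb 29 in 2012), and that the day offset is applied last. DataStage's own edge-case rules may differ.

```python
import calendar
from datetime import date, timedelta


def date_from_components(years: int, months: int, dayofmonth: int) -> date:
    """Build a date from its parts, like DateFromComponents."""
    return date(years, months, dayofmonth)


def date_offset_by_components(basedate: date, year_offset: int,
                              month_offset: int, dayofmonth_offset: int) -> date:
    # Work in a zero-based month count so month offsets roll over years correctly.
    month_index = basedate.year * 12 + (basedate.month - 1) + year_offset * 12 + month_offset
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    # Assumption: clamp the day to the last valid day of the target month.
    last_day = calendar.monthrange(year, month)[1]
    shifted = date(year, month, min(basedate.day, last_day))
    # Apply the day offset last, as a plain calendar-day shift.
    return shifted + timedelta(days=dayofmonth_offset)


print(date_from_components(2012, 7, 20))                            # 2012-07-20
print(date_offset_by_components(date(2012, 7, 20), 2, 1, 1))        # 2014-08-21
print(date_offset_by_components(date(2012, 7, 20), -4, 0, 0))       # 2008-07-20
```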
I will write another detailed blog on the new date functions shortly.
 
  • Functionality Enhancements:
-       Mask encryption for before and after job subroutines
-       Ability to copy permissions from one project to a new project
-       Improvements in the multi-client manager
-       New audit tracing and enhanced exception dialog
-       Enhanced project creation failure details

  • Vertical Pivoting:
At long last, vertical pivoting (turning several rows into columns of a single row) has been added.
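Vertical pivoting is essentially the reverse of the output looping example earlier: the three Jason Bourne rows collapse back into one row with City_1, City_2 and City_3. A minimal Python sketch of that idea (the function name and the `prefix` parameter are hypothetical, and this says nothing about how the stage itself is configured):

```python
def vertical_pivot(rows: list[dict], key: str, value: str, prefix: str) -> list[dict]:
    """Collapse several rows per key into one row with numbered value columns."""
    pivoted: dict = {}  # key value -> output row (dicts preserve first-seen order)
    for row in rows:
        out = pivoted.setdefault(row[key], {key: row[key]})
        # `out` holds the key column plus k value columns, so len(out) == k + 1,
        # which is exactly the number of the next value column.
        out[f"{prefix}{len(out)}"] = row[value]
    return list(pivoted.values())


rows = [
    {"Salesman_name": "Jason Bourne", "City": "New York"},
    {"Salesman_name": "Jason Bourne", "City": "Madrid"},
    {"Salesman_name": "Jason Bourne", "City": "New Delhi"},
]
print(vertical_pivot(rows, "Salesman_name", "City", "City_"))
```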
 
  • Integration with CVS
DataStage 8.5 now integrates directly with version control systems such as CVS, so we can check in and check out directly from DataStage.
 
  • Information Architecture Diagramming Tool:
Now solution architects can draw detailed integration solution plans for data warehouses from within DataStage
 
  • Balanced Optimizer:
As you all know, DataStage is an ETL tool. But now, with Balanced Optimizer integrated directly, we also get ELT (Extract, Load and Transform) capability.
With this we can extract the data, load it, and perform the transformations inside the database engine.
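To illustrate the general ELT idea (not Balanced Optimizer itself, whose generated SQL I am not reproducing), here is a small Python sketch using the standard `sqlite3` module: raw rows are loaded into a staging table first, and the transformation then runs as SQL inside the database. The table names and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows go into a staging table unchanged.
conn.execute("CREATE TABLE stg_sales (salesman TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", [
    ("Jason Bourne", "New York", 100.0),  # hypothetical sample data
    ("Jason Bourne", "Madrid", 250.0),
    ("Jason Bourne", "New Delhi", 50.0),
])

# Transform: the aggregation runs as SQL inside the database engine,
# rather than row by row in the ETL tool.
conn.execute("""
    CREATE TABLE sales_by_salesman AS
    SELECT salesman, SUM(amount) AS total_amount, COUNT(*) AS city_count
    FROM stg_sales
    GROUP BY salesman
""")

print(conn.execute("SELECT * FROM sales_by_salesman").fetchall())
# [('Jason Bourne', 400.0, 3)]
```

The point of the pattern is that the heavy lifting happens where the data already lives, so only the SQL, not the rows, has to travel through the ETL engine.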
