Datastage4u: Notable new Key Features in DataStage 8.5

DataStage 8.5 is out, and IBM has made some significant improvements this time around. Let’s look at some of the important enhancements in the new version.

It's Fast!

DataStage 8.5 is considerably faster than the previous version (8.1). Tasks like saving, renaming, and compiling are nearly 40% faster. The run time performance of jobs has also improved.

The parallel engine on DataStage has been tuned to improve performance, and resource usage has reduced by 5% when compared to DataStage 8.1.

XML data

DataStage has historically been inefficient at handling XML files, but in 8.5 IBM has given us a great XML processing package. DataStage 8.5 can now process large XML files (over 30 GB) with ease. Also, we can now process XML data in parallel.

The new XML transform stage can combine data from multiple sources into a single XML output stream. If you think that is cool, it can also do it the other way around, i.e., split a single XML input into multiple output streams.

It can also convert data from one XML format to another.

Transformer Stage

It is one of the most used and most important stages in DataStage, and it just got better in 8.5.

a. Transformer Looping:

Over the years DataStage programmers have been using workarounds to implement this concept. Now IBM has included it directly in the transformer stage.

There are two types of looping available:

Output looping: We can produce multiple output records from a single input record

Ex:

Input Record:

Salesman_name	City_1	City_2	City_3
Jason Bourne	New York	Madrid	New Delhi

Output Record:

Salesman_name	City
Jason Bourne	New York
Jason Bourne	Madrid
Jason Bourne	New Delhi

This is achieved using a new system variable, @ITERATION, which holds the number of the current loop iteration.
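
To make the idea concrete, here is a minimal Python sketch of what the loop does with the salesman record above. It is only an illustration, not DataStage code: the function name `loop_cities` is made up, and the `iteration` counter merely plays the role of @ITERATION, which I am assuming counts up from 1 (so the loop condition in the transformer would be along the lines of @ITERATION <= 3).

```python
def loop_cities(record):
    """Emit one output row per city column of a single input row."""
    cities = [record["City_1"], record["City_2"], record["City_3"]]
    rows = []
    # 'iteration' stands in for @ITERATION (assumed to start at 1).
    for iteration in range(1, len(cities) + 1):
        rows.append({
            "Salesman_name": record["Salesman_name"],
            "City": cities[iteration - 1],
        })
    return rows


input_record = {
    "Salesman_name": "Jason Bourne",
    "City_1": "New York",
    "City_2": "Madrid",
    "City_3": "New Delhi",
}
for row in loop_cities(input_record):
    print(row["Salesman_name"], row["City"], sep="\t")
```

Each pass through the loop picks a different city column, which is exactly what the workarounds of earlier versions had to fake with extra stages.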

Input looping: We can now aggregate input records within the transformer and attach the aggregated value to each of the original input records as they are sent to the output.

b. Transformer change detection:

SaveInputRecord() – Save a record to be used for later transformations within the job

GetSavedInputRecord() – Retrieve the saved record as and when it is required for comparisons
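
The save-and-retrieve pattern is what makes input looping possible: rows of a group are held back, an aggregate is computed, and the held rows are then released with the aggregate attached. Below is a rough Python sketch of that flow. The function `add_group_totals` and the column names are my own inventions for illustration, the `saved` list only imitates the transformer's record cache, and the input is assumed to be sorted on the key.

```python
from itertools import groupby


def add_group_totals(rows, key, value):
    """Attach the per-group total of `value` to every row of the group.

    `rows` must already be sorted on `key`.
    """
    output = []
    for _, group in groupby(rows, key=lambda r: r[key]):
        saved = list(group)  # imitates saving each row of the group
        total = sum(r[value] for r in saved)
        for cached in saved:  # imitates reading the saved rows back
            output.append({**cached, "group_total": total})
    return output


sales = [
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 30},
    {"region": "West", "sales": 5},
]
for row in add_group_totals(sales, "region", "sales"):
    print(row)
```

The point to notice is that no output is written for a group until all of its rows have been seen.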

c. System Variables:

i. @ITERATION: Used in the looping mechanism

ii. LastRow(): Returns true when the row being processed is the last row of the input

iii. LastRowInGroup(): Returns true when the row being processed is the last row of its group, based on the key column
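
A small Python sketch may help to show what the two last-row checks above report for each row. This is an assumption-laden illustration rather than the engine's implementation: `flag_rows` is a hypothetical helper, and the data is assumed to be sorted on the key column.

```python
def flag_rows(rows, key):
    """Return (row, last_in_group, last_row) for each row of sorted input."""
    flagged = []
    for i, row in enumerate(rows):
        last_row = i == len(rows) - 1
        # A row ends its group when the next row carries a different key.
        last_in_group = last_row or rows[i + 1][key] != row[key]
        flagged.append((row, last_in_group, last_row))
    return flagged


orders = [
    {"customer": "A", "amount": 5},
    {"customer": "A", "amount": 7},
    {"customer": "B", "amount": 2},
]
for row, last_in_group, last_row in flag_rows(orders, "customer"):
    print(row, last_in_group, last_row)
```

Both checks need to peek at the following row, which is why the input has to arrive sorted on the key for the group check to be meaningful.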

d. New NULL Handling features:

In DataStage 8.5 we no longer need to handle NULL values explicitly. Records are no longer dropped when the target column is nullable, and functions applied to columns that contain NULL values do not need explicit NULL handling. Stage variables are also nullable by default now.

The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is provided for backward compatibility.

e. New Date functions:

There are a host of new date functions incorporated into DataStage 8.5. I personally found the functions below the most useful.

DateFromComponents(years, months, dayofmonth)

Ex: DateFromComponents(2012, 07, 20) will output 2012-07-20

DateOffsetByComponents(basedate, year offset, month offset, dayofmonth offset)

Ex: DateOffsetByComponents("2012-07-20", 2, 1, 1) will output 2014-08-21

DateOffsetByComponents("2012-07-20", -4, 0, 0) will output 2008-07-20

I will write another detailed blog on the new date functions shortly.
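
Until then, the arithmetic behind the two functions above can be sketched in Python. The helper names are mine, and one thing is an open assumption: I do not know how DataStage treats an offset that lands on a non-existent day (say, January 31 plus one month), so this sketch simply lets Python raise a ValueError in that case.

```python
from datetime import date, timedelta


def date_from_components(year, month, day):
    """Build a date from its year, month and day-of-month parts."""
    return date(year, month, day)


def date_offset_by_components(base, years, months, days):
    """Shift `base` by a number of years, months and days."""
    # Work in whole months so that month overflow carries into the year.
    total_months = base.year * 12 + (base.month - 1) + years * 12 + months
    year, month_index = divmod(total_months, 12)
    # Raises ValueError if the day does not exist in the target month.
    shifted = date(year, month_index + 1, base.day)
    return shifted + timedelta(days=days)


base = date_from_components(2012, 7, 20)
print(date_offset_by_components(base, 2, 1, 1))
print(date_offset_by_components(base, -4, 0, 0))
```

Run against the examples above, the two calls give the same dates as listed there.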

Functionality Enhancements:

- Mask encryption for before and after job subroutines

- Ability to copy permissions from one project to a new project

- Improvements in the multi-client manager

- New audit tracing and enhanced exception dialog

- Enhanced project creation failure details

Vertical Pivoting:

At long last, vertical pivoting has been added.
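
For anyone unsure what that means: as I understand it, a vertical pivot turns rows into columns, the reverse of the salesman example in the looping section. Here is a minimal Python sketch of the idea, with a made-up helper `vertical_pivot` and the generated column names (City_1, City_2, ...) chosen purely for illustration; it says nothing about how the stage itself is configured.

```python
def vertical_pivot(rows, key, value, prefix):
    """Collapse the rows of each key into one row with numbered columns."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[value])
    return [
        {key: k, **{f"{prefix}_{i}": v for i, v in enumerate(values, start=1)}}
        for k, values in grouped.items()
    ]


rows = [
    {"Salesman_name": "Jason Bourne", "City": "New York"},
    {"Salesman_name": "Jason Bourne", "City": "Madrid"},
    {"Salesman_name": "Jason Bourne", "City": "New Delhi"},
]
print(vertical_pivot(rows, "Salesman_name", "City", "City"))
```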

Integration with CVS

DataStage 8.5 now integrates directly with version control systems such as CVS, so we can check in and check out directly from DataStage.

Information Architecture Diagramming Tool:

Solution architects can now draw detailed integration solution plans for data warehouses from within DataStage.

Balanced Optimizer:

As you all know, DataStage is an ETL tool. With Balanced Optimizer now integrated directly, we also get the ELT (Extract, Load and Transform) feature.

With this we can extract the data, load it, and perform the transformations inside the database engine.
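
To show the difference in where the work happens, here is a toy ELT flow in Python using the built-in sqlite3 module. It is only a sketch of the load-first, transform-in-the-database pattern: the table names, columns and the `run_elt` function are invented for the example, and it does not represent the SQL that Balanced Optimizer actually generates.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untouched, then let the database do the transformation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_sales (salesman TEXT, amount TEXT)")
    conn.execute("CREATE TABLE sales_summary (salesman TEXT, total REAL)")

    # Extract + Load: the rows go into the database as they are.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

    # Transform: cleanup and aggregation run inside the database engine.
    conn.execute(
        """
        INSERT INTO sales_summary
        SELECT UPPER(TRIM(salesman)), SUM(CAST(amount AS REAL))
        FROM staging_sales
        GROUP BY UPPER(TRIM(salesman))
        """
    )
    result = conn.execute(
        "SELECT salesman, total FROM sales_summary ORDER BY salesman"
    ).fetchall()
    conn.close()
    return result


print(run_elt([(" jason ", "10.5"), ("Jason", "4.5"), ("maria", "3")]))
```

In a classic ETL job the trimming, casting and summing would happen in the DataStage engine before the load; here the client only ships the raw rows and one SQL statement.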

Datastage4u

Monday, August 12, 2019
