1) How can we remove duplicates in a file using UNIX?
Answer: You can use uniq to remove duplicates from a file, but it only considers identical consecutive records.
Ex: if you have the records
1 vishwa
1 Shruti
2 Ravi
3 Naveen
1 vishwa
and you use the uniq command, both "1 vishwa" records are kept, because they are not adjacent. So perform a sort before using uniq to avoid the duplicates.
(Or)
$ sort -u filename
sorts the data and removes the duplicate lines from the output.
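A minimal shell sketch (assuming the records above sit in a file named names.txt):
$ sort names.txt | uniq      (sort first, so duplicate lines become adjacent)
$ sort -u names.txt          (the equivalent one-step form)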
2) What is your configuration file structure? I have two Oracle databases; while loading data from source to target the job takes 30 minutes, but I want to load it in less time. How?
3) I have 2 jobs. I want to run job B only after job A has run 3 times. How can I achieve this through DataStage? IBM
Answer
Hi friend, we can build this scenario using a sequencer. Put the first job (A) in a loop that runs 3 times. After the successful completion of the 3 runs, run the second job (B).
Please correct me if I am wrong.
4) Hi, I am getting data from a sequential file with a column like this:
column1: A&B, C&D, A&E, X&Y&Z, v&x
But I need the target column like this:
column1: A and B, C and D, A and E, X and Y and Z, v and x
Answer 1
Reading through a Sequential File stage, the UNIX filter command sed 's/&/ and /g' will get the answer.
OR
use a DataStage transformer string function such as Ereplace(expression, '&', ' and ') (note that Convert maps single characters only, so Convert('&','and',expression) would not insert the word 'and');
OR
using an Expression transformation, create an output port with
REPLACESTR(1, columnname, '&', ' and ')
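A quick shell sketch (assuming the values sit one per line in a file named input.txt; the file name is illustrative):
$ cat input.txt
A&B
X&Y&Z
$ sed 's/&/ and /g' input.txt
A and B
X and Y and Z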
5) In a sequential file I have one record, and I want 100 records in the target. How can we do that? Please explain: which stages are needed and what logic? TCS
6) A flat file contains 200 records. I want to load the first 50 records the first time the job runs, the second 50 records the second time it runs, and so on. How would you develop the job? Please give the steps.
Answer 1
First use a sequence generator on the flat file, then take a rank transformation and set it to keep the top 50 records, and run the session 4 times; that gives all 200 records.
7) 1. If a lookup transformation is giving poor performance, what do you do? 2. I have EMP and DEPT tables; I want the output ename, empno, dname, max(sal). 3. Can anyone tell me the project flow in a real-time scenario? Please send any queries and answers to my mail id, yaparala@gmail.com. iGate
Answer 1
Hi,
I don't know whether this is right or wrong, but in my view: a lookup has 2 modes, cache and non-cache. Non-cache mode gives lower performance than cache mode, because the lookup maintains the connection to the lookup table until the comparison is completed; in cache mode the lookup table data is loaded into memory and the connection to the lookup table is closed, which gives high performance.
For the second question, we can get the results by using joins, as far as I know.
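For the max(sal) question, a hedged SQL sketch against the classic Oracle EMP/DEPT sample schema (the scott/tiger@orcl login is the usual demo account, used here only for illustration):
$ sqlplus -s scott/tiger@orcl <<'EOF'
SELECT e.ename, e.empno, d.dname, e.sal
FROM   emp e JOIN dept d ON e.deptno = d.deptno
WHERE  e.sal = (SELECT MAX(sal) FROM emp);
EOF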
8) I have a file with empid and empname, and I want to load these two fields along with sal into my target. 1) The salary must be the same for all the records. 2) I want to pass the salary at run time. TCS
9) The source file has 5 records; while moving into the target it should become 10 records. IBM
Answer 2
It's not a big issue: create one more (extra) node, select the Entire partition type, and run the job.
OR
you can create an instance of the target.
10) Can you explain banking-domain projects? In banking-domain projects, how many dimension tables and how many fact tables are there, and how do you load sources and targets at the mapping level? Please explain and give one example.
Answer 1
Read the Ralph Kimball DW Toolkit; in it a number of domains with their business processes, dimensions and facts are given.
11)How cookies can be tested in QTP?
Answer2
Cookies can be tested using WebUtil
object.
Eg:
WebUtil.AddCookie
WebUtil.DeleteCookie
WebUtil.GetCookies
OR
WebUtil.DeleteCookies
1. What are the production issues in DataStage?
2. 1. Identify and discuss the problems that occur during data quality verification for single-source and multi-source systems. 2. Testing has a very important role in DWH; in the ETL testing phase, how can we perform integration testing and regression testing? 3. What are the prerequisites of system testing and regression testing in the ETL phase?
3. Can anyone please send me the latest Informatica dumps (PowerCenter 8.1.1) for all three exams? My mail id is yokilamai@gmail.com
4. How do you find out the Informatica console password?
5. Can anyone send me the code for the below? 1) Remove '-' from the telephone number "323-237-2323"; the output should be "3232372323". 2) Extract only "COMPANY.COM" from the email field "TEST@COMPANY.COM".
Sgr Ans: 1) use the Field function in a Transformer stage: Field(link.RECORD,'-',1), Field(link.RECORD,'-',2), Field(link.RECORD,'-',3), and concatenate the three fields. 2) use a substring in the transformer: link.RECORD[6,16].
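A hedged shell equivalent of both transformations (the literal values are just for illustration):
$ echo "323-237-2323" | tr -d '-'
3232372323
$ echo "TEST@COMPANY.COM" | cut -d'@' -f2
COMPANY.COM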
6. We have 1 source table containing 100 records. Now we have to transfer the first set of 10 records (1-10) to one target table and the next set of 10 (11-20) to another target table, and continue like that up to the 100th record.
7. 5. How many repositories can we create in Informatica?
6. What is a surrogate key?
7. What is the difference between a mapplet and a reusable transformation?
8. What is aggregate awareness?
9. Explain reference cursor?
10. What are parallel queries and query hints?
11. DWH architecture?
12. What are cursors?
13. Advantages of denormalized data?
14. What is an operational data store (ODS)?
15. What is metadata and the system catalog?
16. What is a factless fact schema?
17. What is a conformed dimension?
18. What is the capacity of a PowerCube?
19. Difference between PowerPlay Transformer and PowerPlay reports?
20. What is an IQD file?
21. What is the Cognos Script Editor?
22. What is the difference between macros and prompts?
23. What is the PowerPlay plug-in?
24. Which kind of index is preferred in a DWH?
25. What is hash partitioning?
26. What is a DTM session?
27. How do you define a transformation? What are the different types of transformations in Informatica?
28. What is a mapplet?
29. What is the Query Panel?
30. What is a lookup function? What is the default transformation for the lookup function?
31. What is the difference between a connected lookup and an unconnected lookup?
32. What is a staging area?
33. What is data merging, data cleansing and sampling?
34. What is the update strategy and what are the options for the update strategy?
35. OLAP architecture?
36. What is a subject area?
37. Why do we use a DSS database for OLAP tools?
1. In DataStage 8.1 the surrogate key stage generates sequence numbers, but there is a limit value. I want to reset the sequence after it reaches some highest value, i.e. output like 1 2 3 4 5 ... 10000, then 1 2 3 ... 10000 again.
2. Difference between a DWH and a data mart; difference between star schema and snowflake schema; difference between a fact table and a dimension table; what are the contents of a requirement specification; what documents are needed for testing; draw your project architecture.
3. Knowledge of the important characteristics that we need to consider while developing ETL code.
4. Hi, please try to reply with an answer; my mail id is raghu.1028@gmail.com. If the source has millions of records but there are three target tables, and as per the user the first record needs to be inserted into the first target, the second record into the second target and the third into the third target, with the process repeating like that, how do you do it?
5. What is the main difference between the Peek stage and the Copy stage?
6. Give detail on a sales project: briefly, how many dimension and fact tables are used in the project, and how the four fact tables are taken in the project.
7. How to load session statistics (workflow name, target rows, rejected rows, session start time, session end time, ...) into a database table? Another question: I want to pass values from an Oracle table or flat file into a mapping parameter/variable. What is the method for that? Please provide the solution step by step.
15. My source sequential file has col1 with values 1 2 3 4 5 6 7 8 9, and I have 4 targets t1 t2 t3 t4; I want the records distributed across them. How can we get this? POLARIS
Ans: It's a very simple task:
select the partition type as round robin
and run the job;
you will get the answer.
16. How do you retrieve max(sal), deptno, empno in DataStage? POLARIS
17. Hi, my source is:
empno,deptno,salary
1, 10, 3.5
2, 20, 8
2, 10, 4.5
1, 30, 5
3, 10, 6
3, 20, 4
1, 20, 9
then the target should be in the form below:
empno, max(salary), min(salary), deptno
1, 9, 3.5, 20
2, 8, 4.5, 20
3, 6, 4, 10
Can anyone give the data flow in DataStage for the above scenario?
ANS: Use an Aggregator stage grouped on empno to select max(sal) and min(sal); a Column Generator can first add a grouping column if one is needed.
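A hedged awk sketch of the same grouping (assuming the rows sit in a file named emps.txt as comma-separated empno, deptno, salary; the deptno kept is the one from the max-salary row):
$ awk -F', *' '{
    if (!($1 in max) || $3+0 > max[$1]+0) { max[$1] = $3; dept[$1] = $2 }
    if (!($1 in min) || $3+0 < min[$1]+0) min[$1] = $3
  } END { for (e in max) print e ", " max[e] ", " min[e] ", " dept[e] }' emps.txt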
18) Where do we use the Column Generator stage in a real-time scenario?
Answer
#1
The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed. The new data set is then output.
Say we have two tables, X and Y, and you are doing a funnel. The X file has only 2 columns and the Y file has 3 columns. While doing a funnel the metadata should be the same; to achieve this, include one extra column in the X file using the Column Generator stage, which populates some mock data in the third column.
OR
2. ANS: If you have two tables, A and B, where table A has two columns and table B has 3 columns, and the metadata of the two tables has to be the same, we can use the Column Generator stage to get the full set of columns in the target.
OR
3. ANS: Suppose you need to join two tables A and B but you don't have a common key column in both tables. You can use a Column Generator to generate a DUMMY column (key column) and perform the join.
OR
if you want to add one more column in the target while populating data from source to target.
19) When do we use a connected lookup and when an unconnected lookup?
Answer
#3
In DataStage, the same can be explained as a Normal (connected) lookup and a Sparse (unconnected) lookup.
For a sparse lookup: when the reference data volume is huge and the primary (input) volume is very small, you can pass the input records' key columns to be matched against the reference table in the WHERE clause of the reference-table extraction query.
20) A+B+C+D = D+E+F+G = G+H+I = 17, where each letter represents a distinct number from 1 to 9. Find out what the letters D and G represent if A = 4.
Answer
#1
Adding the three sums counts D and G twice, so
(A+B+...+I) + D + G = 51, i.e. 45 + D + G = 51, hence D + G = 6.
Case 1) D=1, G=5 (with A=4):
G+H+I=17 gives H+I=12, which forces (H,I)=(3,9) from the remaining digits; but D+E+F+G=17 gives E+F=11, which would need (2,9) or (3,8), both of which reuse a digit. So this case fails.
Case 2) D=5, G=1 (with A=4):
G+H+I=17 gives H+I=16, so (H,I)=(7,9);
D+E+F+G=17 gives E+F=11, so (E,F)=(3,8);
A+B+C+D=17 gives B+C=8, so (B,C)=(2,6).
Solution:
4+2+6+5 = 5+3+8+1 = 1+7+9 = 17, so D=5 and G=1.
21) Which partitioning is followed in Join, Merge and Lookup?
Answer
Round robin partitioning.
OR
The Join stage follows the Modulus (key-based) partitioning method; Merge follows the Same partitioning method as well as the Auto method; Lookup follows the Entire partitioning method.
22) How can you call shell scripts / UNIX commands in a job sequence? IBM
Answer 1
Hi,
by using a Routine activity you can access UNIX/shell scripts.
OR
There are two scenarios where you might want to call a script.
Scenario 1 (a dependency exists between the script and a job): a job has to be executed first, then the script has to run, and only upon completion of the script execution should the second job be invoked. In this case develop a sequencer job where the first Job activity invokes the first job; then, using an Execute Command activity, call the script you would like to invoke by typing "sh <script name>" in the command property of the activity; then, with another Job activity, call the second job.
Scenario 2 (the script and the job are independent): in your parallel job, say job1, under job properties you can find "After-job subroutine", where you select "ExecSH" and pass the name of the script you would like to execute. By doing this, once job1 completes, the script gets invoked; the job succeeding job1, say job2, does not wait for the execution of the script.
23) If I make any changes in the parallel job, do I need to implement the changes in the sequencer job, or will the changes be reflected automatically? IBM
Answer
#1
Hi, this is Kiran.
If you make any changes in your parallel job, they will automatically be reflected in the sequencer jobs.
OR
As for the "why":
ANS 2: The sequence does not contain the underlying code of the job; it picks up the code from the job whose reference is given in the sequence.
OR
once the modifications are done, simply compile the job and run the sequencer; there is no need to make any changes in the sequencer.
24) How can you execute an SQL query through UNIX? What is the primary key for a dimension table? What is the primary key for a fact table? TCS
Answer
#1 The primary key for the dimension table is its own primary key;
the primary key for the fact table is a foreign key (or a combination of them) derived from the dimension tables.
OR
Connect to SQL*Plus from UNIX by typing sqlplus, then:
user name: user_name@database
password:
then type @<path of the file containing the SQL query>/<SQL query file name>.sql
e.g. @path/file.sql
and press Enter.
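A hedged non-interactive variant of the same recipe (the user name, password, database alias and path are placeholders):
$ sqlplus -s user_name/password@database <<'EOF'
@/path/file.sql
EOF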
25) What is a time dimension, and how do you populate a time dimension? TCS
Answer
#1
Hi,
every DWH has a time dimension; you can load the time dimension through a PL/SQL script.
26) How can I abort the job if I get more than 100 errors in the job log? ACCENTURE
Answer
You can use the Transformer stage: on the output link there is an "Abort After Rows" option; just mention the value and it will abort once it reaches that count.
OR
Compile the job and then run it; in the job run options select "Abort job after ... warnings" and give 100, so the job aborts once the warning count reaches 100.
27) What is the difference between a junk dimension and a conformed dimension? Polaris
Answer
#1
A junk dimension is a dimension that is not used in the main mapping; it typically holds flags and Boolean values. A conformed dimension is a dimension that is shared by two or more fact tables, and conformed dimensions reduce the number of tables.
OR
JUNK DIMENSION
A dimension which cannot by itself be used to describe the facts is known as a junk dimension (a junk dimension provides additional information to the main dimension).
Ex: customer address.
CONFORMED DIMENSION
A dimension table which can be shared by multiple fact tables is known as a conformed dimension.
Ex: time dimension.
28) How many nodes are supported by one CPU in parallel jobs?
Answer
#2
Here a node is a logical processing unit rather than a physical processor. Parallel jobs support uniprocessor, SMP, MPP and clustered systems, so the number of nodes supported depends on the hardware architecture.
29) I am running a job with 1000 records. If the job gets aborted after loading 400 records into the target, I want the next run to continue loading from record 401. How will we do it? This scenario is for a plain job, not a sequence job, e.g. Seq file --> Transformer --> Dataset.
Answer
#5
By using a Lookup stage we can get the answer.
There are two tables: the 1000-record (source) table and the 400-record (target) table.
Take the source table as the primary (stream) input and the 400-record target table as the reference input of the Lookup, and pass on only the records that are not found in the reference:
source --> Lookup (reference: target) --> target
30) How do you remove duplicates in a Transformer stage using stage variables? Give one example.
Answer
#2
With the input sorted on the key column, define the stage variables (evaluated top to bottom):
stage_variable1 <= stage_variable3 (the previous row's key)
stage_variable2 <= If column = stage_variable1 Then 0 Else 1
stage_variable3 <= column
Put stage_variable2 as the constraint on the target link, so only the first row of each key value passes through.
OR
if you want to remove duplicates before the Transformer stage, use a partitioning technique such as hash partitioning (with sorting on the key) and you can easily remove the duplicates.
OR
duplication in the Transformer stage can be removed by using call by reference and call by value: we create one object and no other duplication is done.
31) How do you convert rows into columns?
Answer
#1
By using the Pivot stage we convert rows into columns in DataStage. If we want to do the same in Informatica, we use the Normalizer transformation.
32) What is a factless fact table?
Answer
A fact table without any facts is known as a factless fact table. A fact should always be a numeric value, but not every numeric value is a fact.
33) How do you cleanse data?
Answer
#1 Data cleansing means converting data in non-uniform formats into a single, unique format; this is performed in the Transformer stage.
OR
It removes the unwanted data (bad records or NULL values), finds the inconsistent data and makes it consistent.
Example:
Loc
---
Hyd
Hyderabad
hyde
After cleansing:
Loc
---
Hyderabad
Hyderabad
Hyderabad
OR
Data cleansing also means removing unwanted spaces; by using the LTrim and RTrim functions we can remove unwanted spaces.
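A tiny shell sketch of the standardization example above (the pattern list is illustrative and assumes the values sit one per line in loc.txt):
$ sed -E 's/^(Hyd|hyde)$/Hyderabad/' loc.txt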
34) How do you calculate sum(sal) without using an Aggregator stage?
Answer
#2
We have to add 2 new ports in an Expression transformation:
1) a variable port that accumulates the salary
2) an output port that returns this variable.
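A minimal sketch of those two ports (the port names v_sum and o_sum are hypothetical):
variable port v_sum = v_sum + SAL    (running total, carried across rows)
output port   o_sum = v_sum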
35) I have one table with one column; in this column I have three rows: 1, 1, 2. I want to populate the target so that the first two rows become one row and the remaining row becomes another row. How is it possible? E.g. with COLUMN_NAME values SHIVA, RAMU, MADHU, I want SHIVA and RAMU in one row and MADHU in one row. If anyone knows, please tell me. Capgemini
Answer
#1
seqfile -> sort -> transformer -> remove duplicates -> target
Input file:
col1
1
1
2
In the Sort stage:
key = col1
Allow Duplicates = true
Create Key Change Column = true
In the Transformer, create a stage variable:
If KeyChange = 1 Then col1 Else stagevariable : col1
drag col1 into the Transformer and put only the stage variable in its derivation.
In Remove Duplicates:
key = col1
select the last record per key
and we will get the answer:
col1
11
2
36) Explain how to create SCD Type 2 in DataStage 7.5x2. Please explain with 4 or 5 example records and the job design. Urgent. IBM
Answer
#1
Dim_table ---\
              Change Capture ---> Transformer (xfm) ---> target
Src_table ---/
37) Can we do half a project in parallel jobs and half in server jobs? Infosys
Answer 1
No
38) 1. What is a staging area? What is a stage variable?
2. This is my source:
source: id, name        target: id, name
100, murty              100, madan
100, madan
100, saran
We have three duplicate records for the id column; how can we get the required source record?
Ans: A staging area is a temporary storage location. In a real-time environment we get the data in the form of files from the client, so we need to store it in a temporary location and perform all the validations (data validation, field validation and file validation) before we load it into the warehouse.
A stage variable holds an expression assigned in the Transformer stage and can be reused only within that Transformer stage; that is why it is called a stage variable.
For the second question, I take the name column as the key column, so we get the required source record.
39) What UNIX questions are asked for DataStage?
Answer
How to kill a process -- kill
How to check process status -- ps
Regular expressions -- grep
Shell script execution -- ./scriptname
Running jobs -- dsjob -run
Listing files -- ls -lrt
These are the basic questions.
40) I have source data like
empno, ename
11, aa
12, bb
and I want the output like
empno, ename
11, aa
12, bb
11, aa
12, bb
TCS
Ans: Read the source with a Sequential File stage and take two Copy stages; map the columns to the two Copy stages and use a Funnel stage to funnel the data from the two Copy stages into the output file. The output is the required result.
41) Tell me about DataStage triggers. HP
Answer 1
DataStage triggers are used to control job activities in job sequences; in job sequences the stages are called activities and the links are called triggers.
Triggers are of 3 types: 1) conditional, 2) unconditional, 3) otherwise.
So, by using triggers we can control the job activities: usually large DWH jobs depend on 1 or 2 other jobs to get data, so by using these triggers we can give a condition to run the downstream job only once the previous one has run successfully.
42) I have a scenario like this: seq file --> transformer --> sort stage --> dataset. Which partitioning technique are you using in this job, and why? CTS
Answer 1
Hash partitioning or modulus partitioning: when using sort operations it is always better to go for key-based partitioning techniques.
43) What is a fast changing dimension? TCS
Answer
#1
A fast (rapidly) changing dimension is one whose attributes change frequently. In almost all projects the customer dimension is an example of a fast growing dimension.
44) How can we do null handling in sequential files? Reliance
Answer
#1
While importing the data we can specify Nullable = Yes on the columns (define) tab,
or
in the Format tab we can specify the null field value property.
45) What is the best stage in DataStage parallel jobs for a full outer join, and why? Virtusa
Answer
#2
The Join stage is the best, because it gets its data from the buffer area.
OR
The Join stage supports all the joins (inner join, left outer join, right outer join, full outer join);
Lookup supports only inner and left outer.
46) What is the difference between SQL*Loader and OCI in DataStage? CSC
Answer
#1 SQL*Loader (sqlldr) is a bulk loader utility from Oracle
for loading data into Oracle tables, whereas the DataStage OCI (Oracle Call Interface) stage is a passive stage that connects to any Oracle database, collects the necessary data and passes it on to the DataStage ETL flow for processing and for applying transformations where necessary, then loads into the target table. The DataStage OCI stage offers input and output stages for conveniently connecting to Oracle source databases and target Oracle databases.
OR
2. An addition to the above: through sqlldr we load bulk data; for this we use the Orabulk stage. Here we load data through a script, so the data is loaded in the form of a package. There are some restrictions: when we load data in bulk, the target table should not have indexes. We generally load data into table partitions (weekly, monthly). It is faster than OCI. Through OCI we load data in a sequential manner; we can set the buffer size as required to read or write data, and here there is no need to remove indexes from the table.
47) I have a sequential file that contains 10 million records to load into a database target; in that case loading takes a lot of time. How do you do performance tuning in that situation? TCS
Answer
#1
1) In the Sequential File stage properties, under Options, increase the number of readers per node.
2) In the DataStage Administrator there is a related environment variable for the file reader that can be tuned as well.
48) How do you convert a string to a date using only the Sequential File stage, without using other stages? TCS
Answer 1
I think this is not possible, but we can read the field as a date in the Sequential File stage (by defining the column as a date with the right format string).
49) How can we create a surrogate key in a Transformer stage? I want it in parallel mode.
Answer
In the Transformer stage properties there is a tab called "Surrogate Key" where we can define it. We can select the source of the key values from:
a. Flat file:
a state file which holds the SK values and generates new SK values.
b. DB sequence:
a database sequence object used to generate the SK values.
50) How do you retrieve max(sal), deptno, empno in DataStage?
Answer
We can retrieve max(sal) per deptno with the Aggregator stage. The way is:
set group = deptno
aggregation type = calculation
column for calculation = sal
maximum value output column = max_sal
51) How do you find the number of records within the sequential file itself?
Answer
In the Sequential File stage, go to Options and select the row count option;
it is then updated automatically.
52) How can you join a flat file and Oracle as sources?
Answer
First populate the flat file data to the Oracle DB; then there are two methods:
1. write a join query in the source Oracle stage and link it to the target;
2. take two Oracle sources, say s1 and s2, use a Join stage and link to the target;
or the direct method:
3. take the flat file and the Oracle source, use a Join stage and link to the target.
53) What types of errors occur in DataStage?
Answer 1
source file not found
null value populated into a non-nullable column
died with signal 11 (e.g. due to server down)
fatal errors
NLS warnings
metadata mismatch
data type mismatch
parallel loading not enabled
field size differences
data type size differences between source and target
column mismatch
file opening error
mutex error
process time out ...
54) What is pre-load in a hashed file?
Answer
You can speed up the read operation of a reference hashed file link by pre-loading the hashed file into memory.
55) What is an audit table? Have you used an audit table in your project?
Answer
An audit table is like a log file; every job should have an audit table.
56) How can you remove the duplicates in a sequential file?
Answer
4
In the source Sequential File stage go to Output -> Properties and under Options set reject mode = Output. This sends the duplicate records to the output. This works 100%, because we are using the same approach in my present project.
OR
Sort the incoming data using a Sort stage and use a Remove Duplicates stage on the key columns. This will remove the duplicates from your source file.
57) What stages will you use in implementing SCD2 with date fields?
Answer
#1
Hi,
this is Kiran.
For the SCD types we use the Change Capture and Transformer stages.
The date fields are used at the target level as Eff_start_date and Eff_end_date:
Eff_start_date = current date (for the new row)
Eff_end_date = current date - 1 (for the expired row).
58) Where do you use UNIX commands as an ETL developer? Wipro
Answer
#2
You can call them in the following places in ETL (DataStage):
1. Sequential File stage (the stage uses filter commands);
2. before- and after-job subroutines in the job properties tab;
3. before- and after-stage subroutines in the Transformer stage;
4. job sequences, using the Execute Command activity and the Routine activity;
5. DataStage routines;
6. the built-in routines ExecSH and ExecSHSilent.
59) Source:
eno  ename   esal   acct1  acct2  amount1  amount2
100  suresh  10000  sbi1   sbi2   1000     2000
I would like to display it like this:
eno  ename   esal   acct  amount
100  suresh  10000  sbi1  1000
100  suresh  10000  sbi2  2000
Ans:
Use the Pivot stage.
60) In my source I have 10 records in a single column, but I want to split those records into 5 sequential files, with each file containing 2 records. Can anybody help me?
Answer
#4 In a Transformer we have the system variable @INROWNUM, which holds the row number of each incoming row. Using this variable in the constraints of the output links we can split the records, as sketched after this answer.
OR
take a sequential file as source and a sequential file as target; in the target, set the file property five times (one file per output) and run the job.
If you have any doubts please make a call.
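A sketch of the Transformer constraints (assuming @INROWNUM starts at 1): link 1 gets @INROWNUM <= 2, link 2 gets @INROWNUM > 2 And @INROWNUM <= 4, and so on up to link 5. Outside DataStage, the standard UNIX split command does the same job in one line (file names are illustrative):
$ split -l 2 source.txt part_
This creates part_aa .. part_ae, each holding 2 records.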
61) In my source I have 10 records but I want 11 records in the target. How can I achieve this in server jobs? IBM
Answer
Using a Pivot stage you can get the target records; otherwise, in a Transformer stage you can add one record by implementing a derivation on the output columns. I think this is very useful for your question.
1. What is the flow of loading data into fact & dimensional tables?
A)
Fact table: a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values (measures).
Dimension table:
a table with a unique primary key.
Load:
data should first be loaded into the dimension tables. Based on the primary key values in the dimension tables, the data is then loaded into the fact table.
2.
What is the default cache size? How do you change the cache size if needed?
A. The default cache size is 256 MB. We can increase it by going into the DataStage Administrator, selecting the Tunables tab and specifying the cache size there.
3.
What are the types of hashed files?
A) Hashed files are classified broadly into 2 types:
a) Static: subdivided into 17 types based on the primary key pattern.
b) Dynamic: subdivided into 2 types,
i) Generic ii) Specific.
Dynamic files do not perform as well as a well-designed static file, but they do perform better than a badly designed one. When creating a dynamic file you can specify several settings (although all of these have default values).
By default a hashed file is "Dynamic - Type Random 30 D".
4. What does a config file in the parallel extender consist of?
A)
The config file consists of the following:
a) the number of processes or nodes;
b) the actual disk storage locations.
A sketch of such a file follows.
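A minimal single-node sketch of an APT configuration file (the node name, host name and paths are placeholders; resource disk is where persistent data sets are stored and resource scratchdisk is temporary sort space):
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/ds/data1" {pools ""}
        resource scratchdisk "/ds/scratch1" {pools ""}
    }
}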
5. What is modulus and splitting in a dynamic hashed file?
A. In a dynamic hashed file the size of the file keeps changing as records are added and removed. The modulus is the number of groups in the file; when the file grows, groups are split and the modulus increases ("splitting"), and when the file shrinks, groups are merged back.
6. What are stage variables, derivations and constraints?
A.
Stage variable: an intermediate processing variable that retains its value during the read and doesn't pass the value into a target column.
Derivation: an expression that specifies the value to be passed on to the target column.
Constraint: a condition that is either true or false and that specifies the flow of data along a link.
7. What are the types of views in DataStage Director?
There are 3 types of views in DataStage Director:
a)
Job view: dates when the jobs were compiled.
b)
Log view: the status of the job's last run.
c)
Status view: warning messages, event messages, program-generated messages.
8. What are the types of parallel processing?
A)
Parallel processing is broadly classified into 2 types:
a)
SMP: Symmetric Multiprocessing.
b)
MPP: Massively Parallel Processing.
9. Orchestrate vs DataStage Parallel Extender?
A) Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. DataStage used Orchestrate with DataStage XE (the beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate and integrated it with DataStage XE, releasing a new version, DataStage 6.0, i.e. Parallel Extender.
10. What is the importance of the surrogate key in data warehousing?
A) A surrogate key is the primary key for a dimension table. Its main importance is that it is independent of the underlying database, i.e. the surrogate key is not affected by the changes going on in the source database.
11.
How do you run a shell script within the scope of a DataStage job?
A) By using the "ExecSH" subroutine in the before/after-job properties.
12.
How do you handle date conversions in DataStage? Convert an mm/dd/yyyy format to yyyy-dd-mm?
A) We use
a) the "Iconv" function: internal conversion;
b) the "Oconv" function: external conversion.
The function to convert the mm/dd/yyyy format to yyyy-dd-mm is
Oconv(Iconv(FieldName,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")
13. How do you execute a DataStage job from the command line prompt?
A) Using the "dsjob" command as follows: dsjob -run -jobstatus projectname jobname
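A slightly fuller sketch (the project name, job name and parameter are placeholders; -param passes a job parameter and -jobstatus makes dsjob wait for the job and return its finishing status):
$ dsjob -run -jobstatus -param SRC_FILE=/data/in/cust.txt MyProject LoadCustomers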
14. What is the functionality of the Link Partitioner and Link Collector?
Link Partitioner: it splits data into various partitions or data flows using various partitioning methods.
Link Collector: it collects the data coming from the partitions and merges it into a single data flow.
15. Types of dimensional modeling?
A)
Dimensional modeling is again subdivided into 2 types:
a)
Star schema: simple and much faster; denormalized form.
b)
Snowflake schema: complex, with more granularity; more normalized form.
16.
Differentiate primary key and partition key?
A primary key is a combination of unique and not null; it can be a collection of key values, called a composite primary key. A partition key is just a part of the primary key. There are several partitioning methods, such as hash, DB2 and random; when using hash partitioning we specify the partition key.
17.
Differentiate database data and data warehouse data?
A) Data in a database is
a)
detailed or transactional,
b)
both readable and writable,
c)
current;
whereas data in a data warehouse is summarized, historical and primarily read-only.
18.
Containers Usage and Types?
Container
is a collection of stages used for the purpose of Reusability. There are 2
types of Containers.
a) Local Container:
Job Specific
b) Shared Container:
Used in any job within a project.
19. Compare and
Contrast ODBC and Plug-In stages?
ODBC:
a) Poor Performance.
b)
Can be used for Variety of Databases.
c)
Can handle Stored Procedures.
Plug-In: a) Good
Performance.
b) Database specific.
(Only one database)
c) Cannot handle
Stored Procedures.
20. Dimension
Modelling types along with their significance
Data modelling is broadly classified into 2 types:
a)
E-R diagrams (entity-relationship).
b)
Dimensional modelling.
Q 21 What are the Ascential DataStage products and connectivity options?
Ans:
Ascential Products
Ascential
DataStage
Ascential
DataStage EE (3)
Ascential
DataStage EE MVS
Ascential
DataStage TX
Ascential
QualityStage
Ascential
MetaStage
Ascential RTI (2)
Ascential
ProfileStage
Ascential
AuditStage
Ascential Commerce
Manager
Industry Solutions
Connectivity
Files
RDBMS
Real-time
PACKs
EDI
Other
Q 22 Explain the DataStage architecture in 7.5x2?
DataStage contains two components:
Client component.
Server component.
Client components:
- DataStage Administrator
- DataStage Manager
- DataStage Designer
- DataStage Director
Server components:
- DataStage Engine
- Metadata Repository
- Package Installer
DataStage Administrator:
Used to create the project.
Contains a set of properties.
We can increase the buffer size.
We can set the environment variables.
In Tunables we have in-process and inter-process row buffering:
In-process: data is read sequentially within one process.
Inter-process: it reads the data as it comes, passing rows between processes.
It just interfaces to the metadata.
Q 23 What is the Metadata Repository?
Metadata is data about the data. The repository also contains:
- query statistics
- ETL statistics
- business subject areas
- source information
- target information
- source-to-target mapping information.
Q 24 What is the DataStage Engine?
It is the server engine (derived from the UniVerse database engine) running in the background that executes the jobs.
Q 25 What is
Dimensional Modeling?
Dimensional modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access.
Q 26 What is Star
Schema?
A star schema is a de-normalized multi-dimensional model. It contains a centralized fact table surrounded by dimension tables.
Dimension table:
it contains a primary key and descriptive attributes.
Fact Table: It
contains foreign keys to the dimension tables, measures and aggregates.
Q 27 What is
surrogate Key?
It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. With a 4-byte integer key we can store up to 2 billion records.
Q 28 Why do we need a surrogate key?
It is used for integrating data from multiple sources and serves better than a natural primary key for index maintenance, joins, table size, key updates, disconnected inserts and partitioning.
Q 29 What is Snowflake schema?
It is a partially normalized dimensional model in which at least one dimension is represented by two or more hierarchy-related tables.
Q 30 Explain Types
of Fact Tables?
Factless fact: contains only foreign keys to the dimension tables, no measures.
Additive fact: measures can be added across any dimension.
Semi-additive: measures can be added across some dimensions, e.g. percentages, discounts.
Non-additive: measures cannot be added across any dimension, e.g. averages.
Conformed fact: the measures of the two fact tables are defined the same way, so the facts are measured across the dimensions with the same set of measures.
Q 31 Explain the
Types of Dimension Tables?
Conformed dimension: if a dimension table is connected to more than one fact table, the granularity defined in the dimension table is common across those fact tables.
Junk dimension:
a dimension table which contains only flags.
Monster dimension: a dimension that changes rapidly is known as a monster dimension.
Degenerate dimension: a dimension stored in the fact table itself, typical of line-item-oriented fact table designs.
Q 32 What are
stage variables?
Stage variables are declared in the Transformer stage and used to store values. Stage variables are active at run time (because memory is allocated at run time).
Q 33 What is
sequencer?
It
sets the sequence of execution of server jobs.
Q 34 What are
Active and Passive stages?
Active stage: active stages model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another. Eg, Transformer, Aggregator, Sort, Row Merger etc.
Passive stage: a passive stage handles access to databases for the extraction or writing of data. Eg, IPC stage, file types, Universe, Unidata, DRS stage etc.
Q 35 What is ODS?
Operational
Data Store is a staging area where data can be rolled back.
Q 36 What are
Macros?
They
are built from Data Stage functions and do not require arguments.
A number of macros are
provided in the JOBCONTROL.H file to facilitate getting information about the
current job, and links and stages belonging to the current job. These can be
used in expressions (for example for use in Transformer stages), job control
routines, filenames and table names, and before/after subroutines.
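For instance, a hedged sketch of a Transformer derivation built from two of these macros (the concatenation is illustrative):
DSJobName : "_" : DSJobStartDate
This could be used, say, to stamp an output file name with the job name and its start date.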
Q 37 What is
KeyMgtGetNextValue?
It is a built-in transform that generates sequential numbers. Its input type is a literal string and its output type is string.
Q 38 What are
stages?
The
stages are either passive or active stages.
Passive stages
handle access to databases for extracting or writing data.
Active stages model the flow of
data and provide mechanisms for combining data streams, aggregating data, and
converting data from one data type to another.
Q 39 What index is
created on Data Warehouse?
Bitmap
index is created in Data Warehouse.
Q 40 What is
container?
A
container is a group of stages and links. Containers enable you to simplify and
modularize your server job designs by replacing complex areas of the diagram
with a single container stage. You can also use shared containers as a way of
incorporating server job functionality into parallel jobs.
DataStage provides two types of container: local containers and shared containers.
Question: What are Routines and where/how
are they written and have you written any routines before?
Answer:
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them.
The following are
different types of Routines:
1. Transform
Functions
2. Before-After Job
subroutines
3. Job Control
Routines
Question:
How did you handle an 'Aborted' sequencer? Answer:
In almost all cases we have to delete the data inserted by the failed run from the DB manually, fix the job, and then run the job again.
Question:
What are Sequencers? Answer:
Sequencers are job control programs
that execute other jobs with preset Job parameters.
Question:
Read the String functions in DS Answer:
Functions
like [] -> sub-string function and ':' -> concatenation operator Syntax:
string [ [ start,
] length ]
string [
delimiter, instance, repeats ]
Question: What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?
Answer:
• Under Windows: use the 'WaitForFileActivity' in the sequencer and then run the job. You could also schedule the sequencer around the time the file is expected to arrive.
• Under UNIX: poll for the file. Once the file has arrived, start the job or sequencer depending on the file.
Question: What is the utility you use to
schedule the jobs on a UNIX server other than using Ascential Director?
Answer:
Use the crontab utility along with the dsjob command (as in Q13), passing the proper parameters.
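A hedged crontab sketch (the DSEngine path, project and job names are placeholders):
# run the load every day at 2:00 AM
0 2 * * * /opt/Ascential/DataStage/DSEngine/bin/dsjob -run -jobstatus MyProject LoadCustomers >> /tmp/LoadCustomers.log 2>&1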
3. DIMENSION TABLE VS FACT TABLE
Ans:
DWH Architecture
The main difference between the
database architecture in a standard, on-line transaction processing oriented
system (usually ERP or CRM system) and a DataWarehouse is that the system’s
relational model is usually de-normalized into dimension and fact tables
which are typical to a data warehouse database design.
The differences in the database architectures are caused by different purposes of their existence.
In a typical OLTP system the
database performance is crucial, as end-user interface responsiveness is one
of the most important factors determining usefulness of the application. That
kind of a database needs to handle inserting thousands of new records every
hour. To achieve this usually the database is optimized for speed of Inserts,
Updates and Deletes and for holding as few records as possible. So from a
technical point of view most of the SQL queries issued will be INSERT, UPDATE
and DELETE.
Opposite
to OLTP systems, a DataWarehouse is a system that should give response to
almost any question regarding company performance measure. Usually the
information delivered from a data warehouse is used by people who are in
charge of making decisions. So the information should be accessible quickly
but the thing is that the data doesn’t have to be the freshest possible and
in a lowest detail level – usually a data warehouse is refreshed on a daily
basis and very often overnight.
The main challenge of DataWarehouse architecture is to store historical, summarized data with read-only access for the end users. Again, from a technical standpoint, most SQL queries would start with a SELECT statement.
6. SNOWFLAKE SCHEMA
Snowflake schema architecture is a
more complex variation of a star schema design. The main difference is that
dimensional tables in a snowflake schema are normalized, so they have a
typical relational database design.
Snowflake schemas are generally
used when a dimensional table becomes very big and when a star schema can’t
represent the complexity of a data structure. For example if a PRODUCT
dimension table contains millions of rows, the use of snowflake schemas
should significantly improve performance by moving out some data to other
table (with BRANDS for instance).
The problem is that the more
normalized the dimension table is, the more complicated SQL joins must be
issued to query them. This is because in order for a query to be answered,
many tables need to be joined and aggregates generated.
An example of a snowflake schema architecture was depicted here as a diagram (not preserved).
7. Mapping: Incremental data load using SQL Override logic (the mapping code and its explanation were given as images, not preserved).
9. Design a mapping as shown below (the mapping diagram and its solution were given as images, not preserved).
10. Design a mapping as shown below (the mapping diagram and its solution were given as images, not preserved).
11. Source definition vs target definition (the definitions were given as images, not preserved).