Wednesday, 18 July 2012

Informatica Questions - Part 1


What are the differences between Connected and Unconnected Lookup?

The differences are illustrated below:
• Connected lookup participates in the dataflow and receives input directly from the pipeline, whereas unconnected lookup receives input values from the result of a :LKP expression in another transformation.
• Connected lookup can use both dynamic and static cache, whereas an unconnected lookup cache can NOT be dynamic.
• Connected lookup can return more than one column value (output ports), whereas unconnected lookup can return only one column value, through its return port.
• Connected lookup caches all lookup columns, whereas unconnected lookup caches only the lookup output ports used in the lookup conditions and the return port.
• Connected lookup supports user-defined default values (i.e. the value to return when the lookup condition is not satisfied), whereas unconnected lookup does not support user-defined default values.

What is the difference between Router and Filter?

The following differences can be noted:
• Router transformation divides the incoming records into multiple groups based on conditions, and these groups can be mutually inclusive (different groups may contain the same record). Filter transformation restricts or blocks the incoming record set based on one given condition.
• Router transformation itself does not block any record: if a record does not match any of the routing conditions, it is routed to the default group. Filter transformation does not have a default group; if a record does not match the filter condition, the record is blocked.
• Router acts like a CASE .. WHEN statement in SQL (or a switch .. case statement in C), whereas Filter acts like a WHERE condition in SQL.
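To illustrate the contrast, here is a hypothetical Python sketch (not Informatica syntax; the group names and conditions are made up): a router sends each record to every group whose condition it matches, with unmatched records going to a default group, while a filter simply drops non-matching records.

```python
# Sample records; the "amount" field and thresholds are illustrative only.
records = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}, {"id": 3, "amount": 150}]

def route(rows):
    """Router-like behaviour: groups may overlap; unmatched rows go to default."""
    groups = {"low": [], "high": [], "default": []}
    for row in rows:
        matched = False
        if row["amount"] < 100:       # group condition 1
            groups["low"].append(row)
            matched = True
        if row["amount"] >= 100:      # group condition 2
            groups["high"].append(row)
            matched = True
        if not matched:               # no condition matched: default group
            groups["default"].append(row)
    return groups

def filter_rows(rows):
    """Filter-like behaviour: one condition; non-matching rows are blocked."""
    return [row for row in rows if row["amount"] < 100]
```

Note how the router never loses a record, while the filter discards two of the three rows outright.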

What can we do to improve the performance of Informatica Aggregator Transformation?

Aggregator performance improves dramatically if records are sorted before being passed to the aggregator and the "Sorted Input" option under the aggregator properties is checked. The record set should be sorted on the columns that are used in the Group By operation.
It is often a good idea to sort the record set at the database level, e.g. inside a Source Qualifier transformation, unless there is a chance that the already sorted records from the source qualifier can become unsorted again before reaching the aggregator.
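The idea behind Sorted Input can be illustrated with Python's itertools.groupby, which makes the same assumption as an aggregator in sorted-input mode: rows with the same key must be adjacent. This is an analogy only, not Informatica code; the sample data is made up.

```python
from itertools import groupby

rows = [("B", 10), ("A", 5), ("B", 20), ("A", 15)]

# On unsorted input, groupby emits a new group every time the key changes,
# so "A" and "B" each appear twice - analogous to mis-aggregated output.
unsorted_groups = [(k, sum(v for _, v in g)) for k, g in groupby(rows, key=lambda r: r[0])]

# Sorting on the group-by column first yields exactly one total per key.
rows.sort(key=lambda r: r[0])
sorted_groups = [(k, sum(v for _, v in g)) for k, g in groupby(rows, key=lambda r: r[0])]
```

With unsorted input the sketch produces four partial groups instead of two totals, which is precisely why the record set must be sorted on the Group By columns first.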

What are the different types of lookup caches?

Informatica lookups can be cached or un-cached (no cache), and a cached lookup can be either static or dynamic. A static cache is not modified once it is built and remains the same throughout the session run. A dynamic cache, on the other hand, is refreshed during the session run by inserting or updating records in the cache based on the incoming source data. By default, the Informatica cache is a static cache.
A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains the cache even after the session run completes or deletes it.

How can we update a record in the target table without using an Update Strategy?

A target table can be updated without using an 'Update Strategy'. For this, we need to define the key of the target table at the Informatica level and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to "Update as Update" and check the "Update" check-box.
Let's assume we have a target table "Customer" with the fields "Customer ID", "Customer Name" and "Customer Address", and suppose we want to update "Customer Address" without an Update Strategy. We have to define "Customer ID" as the primary key at the Informatica level and connect the Customer ID and Customer Address fields in the mapping. If the session properties are set as described above, the mapping will update the Customer Address field for all matching Customer IDs.
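Effectively, "Update as Update" makes the Integration Service issue an UPDATE keyed on the primary key defined at the Informatica level. A minimal sketch of the equivalent SQL, run here against an in-memory SQLite table with hypothetical column names:

```python
import sqlite3

# Hypothetical Customer table mirroring the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, "
    "Customer_Name TEXT, Customer_Address TEXT)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Alice', 'Old Street')")

# Roughly what "Update as Update" generates: SET the connected non-key
# port(s), matched on the key defined at the Informatica level.
incoming = {"Customer_ID": 1, "Customer_Address": "New Street"}
conn.execute(
    "UPDATE Customer SET Customer_Address = :Customer_Address "
    "WHERE Customer_ID = :Customer_ID",
    incoming,
)

address = conn.execute(
    "SELECT Customer_Address FROM Customer WHERE Customer_ID = 1"
).fetchone()[0]
```

Only the address changes; rows whose Customer ID does not match the incoming key are untouched.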

Under what condition selecting Sorted Input in aggregator may fail the session?

• If the input data is not sorted correctly, the session will fail.
• Even if the input data is properly sorted, the session may fail if the sort-order ports and the group-by ports of the aggregator are not in the same order.

Why is Sorter an Active Transformation?

This is because we can select the "Distinct" option in the sorter properties.
When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key, and the Integration Service discards duplicate rows during the sort operation. The number of input rows can therefore differ from the number of output rows, and hence it is an active transformation.
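The effect can be sketched in Python (an analogy with made-up rows, not Informatica internals): treating the entire row as the sort key and keeping only distinct values can shrink the row count, which is the defining property of an active transformation.

```python
# Input contains a duplicate row.
rows = [("A", 1), ("B", 2), ("A", 1)]

# Sort on all columns, then keep one row per distinct value, mimicking
# a Sorter with the Distinct option checked: 3 rows in, 2 rows out.
distinct_rows = sorted(set(rows))
```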

Is lookup an active or passive transformation?

From Informatica 9.x onwards, a Lookup transformation can be configured as an "Active" transformation.
However, in older versions of Informatica, lookup is a passive transformation.

What is the difference between Static and Dynamic Lookup Cache?

We can configure a Lookup transformation to cache the underlying lookup table. In the case of a static or read-only lookup cache, the Integration Service caches the lookup table at the beginning of the session and does not update the cache while it processes the Lookup transformation.
In the case of a dynamic lookup cache, the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target, so the dynamic cache stays synchronized with the target.
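A rough Python analogy (hypothetical, not how the Integration Service is implemented): a static cache is built once and only read, while a dynamic cache is updated as source rows flow through, so it stays in sync with the target.

```python
lookup_table = {101: "Alice"}              # the cached lookup/target table
source_rows = [(101, "Alice"), (102, "Bob")]

# Static cache: built at session start, read-only afterwards.
# Key 102 is not in the cache, so its lookup simply misses.
static_cache = dict(lookup_table)
static_hits = [static_cache.get(key) for key, _ in source_rows]

# Dynamic cache: keys missing from the cache are inserted as rows are
# processed, keeping the cache synchronized with the target.
dynamic_cache = dict(lookup_table)
for key, name in source_rows:
    if key not in dynamic_cache:
        dynamic_cache[key] = name          # insert new row into the cache
```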

What is the difference between STOP and ABORT options in Workflow Monitor?

When we issue the STOP command on an executing session task, the Integration Service stops reading data from the source but continues processing, writing and committing the data to the targets. If the Integration Service cannot finish processing and committing the data, we can issue the ABORT command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing the data within the timeout period, it kills the DTM process and terminates the session.

How can we delete duplicate rows using Informatica?

Scenario 1: Duplicate rows are present in relational database

Suppose we have duplicate records in the source system and we want to load only the unique records into the target system, eliminating the duplicate rows. What would be the approach?
Assuming that the source system is a relational database, we can eliminate duplicate records by checking the Distinct option of the Source Qualifier of the source table and loading the target accordingly.
(Figure: Source Qualifier transformation with the Distinct option)
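Checking Distinct on the Source Qualifier corresponds to a SELECT DISTINCT in the generated query. A Python sketch of the same de-duplication (illustrative data only):

```python
source_rows = [(1, "Alice"), (2, "Bob"), (1, "Alice")]

# Equivalent of SELECT DISTINCT: keep only the first occurrence of each
# full row, so the duplicate (1, "Alice") is not loaded twice.
seen = set()
unique_rows = []
for row in source_rows:
    if row not in seen:
        seen.add(row)
        unique_rows.append(row)
```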
2	Challenges of Data Warehouse Testing
•	Data selection from multiple source systems, and the analysis that follows, pose a great challenge.
•	Volume and complexity of the data.
•	Inconsistent and redundant data in a data warehouse.
•	Inconsistent and inaccurate reports.
•	Non-availability of history data.
3	Testing Methodology
•	Use of traceability to enable full test coverage of business requirements.
•	In-depth review of test cases.
•	Manipulation of test data to ensure full test coverage.
•	Provision of appropriate tools to speed the process of test execution and evaluation.
•	Regression testing.
Fig 1: Testing Methodology (V-Model)
4	Testing Types
The following types of testing are performed for data warehousing projects:
1.	Unit Testing
2.	Integration Testing
3.	Technical Shakedown Testing
4.	System Testing
5.	Operational Readiness Testing
6.	User Acceptance Testing
4.1	Unit Testing
The objective of unit testing is to test business transformation rules, error conditions, and the mapping of fields at the staging and core levels.
Unit testing involves the following:
1.	Check the mapping of fields present at the staging and core levels.
2.	Check for duplication of values generated using the sequence generator.
3.	Check the correctness of surrogate keys, which uniquely identify rows in the database.
4.	Check the data type constraints of the fields present at the staging and core levels.
5.	Check the population of status and error messages into the target table.
6.	Check that string columns are left- and right-trimmed.
7.	Check that every mapping implements the process-abort mapplet, which is invoked if the number of records read from the source is not equal to the trailer count.
8.	Check that every object, transformation, source and target has proper metadata. Check visually in the data warehouse designer tool that every transformation has a meaningful description.
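Two of the checks above (duplicate surrogate keys, trimmed string columns) can be sketched as simple assertions over extracted rows. The column names and data are hypothetical:

```python
rows = [
    {"surrogate_key": 1, "name": "Alice"},
    {"surrogate_key": 2, "name": "Bob"},
]

# Checks 2 and 3: surrogate keys generated by the sequence must be unique.
keys = [r["surrogate_key"] for r in rows]
keys_unique = len(keys) == len(set(keys))

# Check 6: string columns must be left- and right-trimmed.
names_trimmed = all(r["name"] == r["name"].strip() for r in rows)
```

In practice such checks would run against the staging or core tables rather than an in-memory list.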
4.2	Integration Testing
The objective of integration testing is to ensure that workflows are executed as scheduled, with correct dependencies.
Integration testing involves the following:
1.	Check the execution of workflows at the following stages:
	Source to Staging A.
	Staging A to Staging B.
	Staging B to Core.
2.	Check that target tables are populated with the correct number of records.
3.	Record the performance of the schedule and perform analysis on the performance results.
4.	Verify that the dependencies among workflows between source to staging, staging to staging, and staging to core have been properly defined.
5.	Check for error log messages in the appropriate file.
6.	Verify that the start jobs start at the pre-defined starting time. For example, if the start time for the first job has been configured to be 10:00 AM and the Control-M group has been ordered at 7:00 AM, the first job will not start in Control-M until 10:00 AM.
7.	Check the restarting of jobs in case of failures.
4.3	Technical Shakedown Test
Due to the complexity of integrating the various source systems and tools, several teething problems are expected with the environments. A Technical Shakedown Test will be conducted prior to commencing System Testing, Stress & Performance Testing, User Acceptance Testing and Operational Readiness Testing, to ensure the following points are proven:
•	Hardware is in place and has been configured correctly (including the Informatica architecture, source system connectivity and Business Objects).
•	All software has been migrated to the testing environments correctly.
•	All required connectivity between systems is in place.
•	End-to-end transactions (both online and batch) have been executed and do not fall over.
4.4	System Testing
The objective of system testing is to ensure that the required business functions are implemented correctly. This phase includes data verification, which tests the quality of the data populated into the target tables.
System testing involves the following:
1.	Check that the functionality of the system meets the business specifications.
2.	Check the count of records in the source table, compare it with the number of records in the target table, and analyse the rejected records.
3.	Check the end-to-end integration of systems and the connectivity of the infrastructure (e.g. hardware and network configurations are correct).
4.	Check that all transactions, database updates and data flows function accurately.
5.	Validate the business reports functionality.
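The count reconciliation in step 2 can be sketched as a single balance check: every source row should be either loaded or rejected. The figures below are hypothetical:

```python
source_count = 1000    # rows read from the source table
target_count = 990     # rows loaded into the target table
rejected_count = 10    # rows written to the reject/bad file

# Reconciliation: source rows are fully accounted for by loads + rejects.
reconciled = source_count == target_count + rejected_count
```

If the check fails, the difference points at rows that were silently dropped or double-loaded and need analysis.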

Reporting functionality: Ability to report data as required by the business using Business Objects.
Report Structure: Since the universe and reports have been migrated from a previous version of Business Objects, it is necessary to ensure that the upgraded reports replicate the structure/format and data requirements (unless a change/enhancement has been documented in the Requirement Traceability Matrix / Functional Design).
Enhancements: Enhancements such as report structure and prompt ordering that were in scope of the upgrade project will be tested.
Data Accuracy: The data displayed in the reports/prompts matches the actual data in the data mart.
Performance: Ability of the system to perform certain functions within a prescribed time; the system meets the stated performance criteria according to agreed SLAs or specific non-functional requirements.
Security: The required level of security access is controlled and works properly, including domain security, profile security, data security, user ID and password control, and access procedures; the security system cannot be bypassed.
Usability: The system is usable as specified.
User Accessibility: The specified type of access to data is provided to users.
Connection Parameters: Test the connection.
Data provider: Check for the right universe and for duplicate data providers.
Conditions/Selection criteria: Test the selection criteria for the correct logic.
Object testing: Test the object definitions.
Context testing: Ensure the formula is used with the correct input or output context.
Variable testing: Test each variable for its syntax and data type compatibility.
Formulas or calculations: Test each formula for its syntax and validate the data it returns.
Filters: Test that the data has been filtered correctly.
Alerts: Check report alerts for extreme limits.
Sorting: Test the sorting order of section header fields.
Totals and subtotals: Validate the data results.
Universe Structure: The integrity of the universe is maintained and there are no divergences in terms of joins/objects.
4.5	User Acceptance Testing
The objective of this testing is to ensure that the system meets the expectations of the business users. It aims to prove that the entire system operates effectively in a production environment and that the system successfully supports the business processes from a user's perspective. Essentially, these tests run through "a day in the life of" the business users. The tests also include functions that involve source system connectivity, job scheduling and business reports functionality.
4.6	Operational Readiness Testing (ORT)
This is the final phase of testing, which focuses on verifying the deployment of the software and the operational readiness of the application. The main areas of testing in this phase include:
Deployment Test
1.	Tests the deployment of the solution.
2.	Tests the overall technical deployment checklist.
3.	Tests the security aspects of the system, including user authentication and authorization, and user access.
Operational and Business Acceptance Testing
1.	Tests the operability of the system, including job control and scheduling.
2.	Tests include normal, abnormal and fatal scenarios.
5	Test Data
Given the complexity of data warehouse projects, preparation of test data is a daunting task. The volume of data required for each level of testing is given below.
Unit Testing: This phase of testing will be performed with a small subset (20%) of production data for each source system.
Integration Testing: This phase of testing will be performed with a small subset of production data for each source system.
System Testing: In this phase a subset of live data will be used which is sufficient in volume to contain all required test conditions, including normal, abnormal and fatal scenarios, but small enough that workflow execution time does not impact the test schedule.
6	Conclusion
Data warehouse solutions are becoming almost ubiquitous as 
a supporting technology for the operational and strategic 
functions at most companies. Data warehouses play an 
integral role in business functions as diverse as 
enterprise process management and monitoring, and 
production of financial statements. The approach described 
here combines an understanding of the business rules 
applied to the data with the ability to develop and use 
testing procedures that check the accuracy of entire data 
sets. This level of testing rigor requires additional 
effort and more skilled resources. However, by employing 
this methodology, the team can be more confident, from day 
one of the implementation of the DW, in the quality of the 
data. This will build the confidence of the end-user community, and it will ultimately lead to a more effective solution.
