Thursday 8 September 2011

DWH Banking Project Explanation


                             DWH Bank Project Explanation

First, start with:
•  The objective of the project and the client's expectations.
•  Your involvement, the responsibilities and limitations of your role, the project architecture and the team structure.

Client Introduction:
            The main objective of this project is to provide a system with all the information about the transactions of the bank's branches all over the country. We receive the daily transaction data from all branches at the end of each day. We have to validate the transactions and implement the business logic based on the transaction type or transaction code. First we load all historical data into the DWH; once the historical load is finished, we load the delta (incremental) loads.

Also explain about:
•  The target staging area.
•  Mappings:
   o  Source-to-staging mappings.
   o  Staging-to-warehouse mappings.


Each transaction contains a transaction code. Based on the transaction code we can identify whether the transaction is a withdrawal, deposit, loan, payment, etc., and the business logic changes based on that code. We validate the transaction, calculate the measures and load them into the database.
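As a purely illustrative sketch (the real transaction codes and measures are project-specific and not shown here), the type-dependent logic in an Expression transformation could look something like this:

    -- V_TXN_TYPE (variable port); codes 'WDL', 'DEP', 'LON', 'PMT' are made-up examples
    DECODE(TXN_CODE, 'WDL', 'WITHDRAWAL',
                     'DEP', 'DEPOSIT',
                     'LON', 'LOAN',
                     'PMT', 'PAYMENT',
                            'OTHER')
    -- O_AMT_SIGNED (output port): sign the measure according to the transaction type
    IIF(V_TXN_TYPE = 'WITHDRAWAL' OR V_TXN_TYPE = 'PAYMENT', -1 * TXN_AMOUNT, TXN_AMOUNT)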

Mapping explanation:
In the Informatica mapping, we first look up each transaction code against the code master table to identify the transaction type, so that the correct logic is applied and unnecessary transactions are filtered out. A bank generates a huge number of transactions, but only the required ones are in scope for this project: transactions whose codes exist in the code master table are processed, other transactions are loaded into a table called the Wrap table, and invalid records (transaction code missing, null or spaces) go to the Error table. For each dimension table we generate a surrogate key and load the data into the DWH tables.
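The routing itself can be pictured as a Router transformation placed after the lookup. The group conditions below are only a sketch with invented port names (TXN_CODE from the source, LKP_TXN_CODE returned by the code master lookup), not the actual project mapping:

    -- ERROR group: transaction code missing, null or spaces
    ISNULL(TXN_CODE) OR LTRIM(RTRIM(TXN_CODE)) = ''
    -- WRAP group: code present but not found in the code master table
    NOT ISNULL(TXN_CODE) AND LTRIM(RTRIM(TXN_CODE)) != '' AND ISNULL(LKP_TXN_CODE)
    -- VALID group: code found in the code master table, continue to the DWH load
    NOT ISNULL(LKP_TXN_CODE)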

SCD2 Mapping:
We implement an SCD Type 2 mapping for the customer dimension to keep customer history, using the SCD2 date (effective/end date) method.
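As a rough illustration of the date-based SCD2 idea (table and column names here are invented, not the project's actual schema), the customer dimension carries an effective date and an end date; the current row is closed out when an attribute changes and a new version is inserted with a fresh surrogate key:

    -- expire the current version of the changed customer
    UPDATE CUSTOMER_DIM
       SET EFF_END_DATE = SYSDATE
     WHERE CUSTOMER_ID  = :cust_id
       AND EFF_END_DATE = TO_DATE('31-12-9999', 'DD-MM-YYYY');

    -- insert the new version with a new surrogate key and an open end date
    INSERT INTO CUSTOMER_DIM
           (CUSTOMER_SK, CUSTOMER_ID, CUSTOMER_NAME, ADDRESS, EFF_START_DATE, EFF_END_DATE)
    VALUES (:new_sk, :cust_id, :new_name, :new_address,
            SYSDATE, TO_DATE('31-12-9999', 'DD-MM-YYYY'));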

Architecture of Project:

Analysis & Requirement Gathering → Design → Development → Testing → Production

Analysis and Requirement Gathering:

Output:            Analysis document, subject areas
100% onsite.
Business Analyst and Project Manager.
Gather the information useful for the DSS, identify the subject areas, and identify the schema objects.

Design:

Output:            Technical design documents, HLD, UTP
ETL Lead, BA and Data Architect.
80% onsite (schema design in Erwin, implementation in the database, and preparation of the technical design document for the ETL).
20% offshore: HLD & UTP.
HLD:
            High Level Design document, based on the technical specs. The developers create the HLD; it contains the Informatica flow chart and the transformations required for each mapping. The HLD typically covers only about 75% of the requirement.
UTP:
            Unit Test Plan. Test cases are written based on the requirement.

Development:

Output:            Bug-free code, UTR, Integration Test Plan
ETL Team and offshore BA
100% offshore
Based on the HLD, you create the mappings. After that, a code review and a coding-standards review are done by another team member, and the mapping is updated based on the review comments. Unit testing is done based on the UTP: we fill in the UTP with the actual and expected values and name it the UTR (Unit Test Results). Two rounds of code review and two rounds of unit testing are conducted in this phase, after which the code is migrated to the testing repository. The integration test plan is prepared by the senior team members.



Testing:
Output:            ITR, UAT, Deployment Doc and User Guide
Testing Team, Business Analyst and Client.
80% offshore
Based on the integration test plan, the testing team tests the application and gives the list of bugs to the developers. The developers fix the bugs in the development repository and migrate the code to the testing repository again. Testing repeats until the code is bug free.
20% Onsite
UAT:               User Acceptance Testing. The client performs the UAT. This is the last phase of the ETL project.

Production:
50% offshore 50% onsite
Work is distributed between offshore and onsite based on the run time of the loads. Mapping bugs are fixed by the development team, which also provides support for the warranty period agreed with the client.

ETL projects use three repositories. Each repository has its own access permissions and location.
Development:              E1
Testing:                       E2
Production:                 E3

Note:   E1, E2, E3 are Repository names. Nothing else.

Dimensions:
             Employee
            Customer
            Agent
            Transaction
            Date

The fact table captures the relationships between all these dimension tables.





 Thanks & Regards,
SkyEss DWH Team.





Wednesday 1 June 2011

Best Practices


The following lists some performance considerations when building mappings.    

1.  Filter Expressions - try to evaluate them in a port expression. Try to create the filter (true/false) answer inside a port expression upstream. Complex filter expressions slow down the mapping.  Again, expressions/conditions operate fastest in an Expression Object with an output port for the result.  Turns out - the longer the expression, or the more complex - the more severe the speed degradation. Place the actual expression (complex or not) in an EXPRESSION OBJECT upstream from the filter. Compute a single numerical flag: 1 for true, 0 for false as an output port. Pump this in to the filter - you should see the maximum performance ability with this configuration.
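A minimal sketch of this pattern, with invented port names (not from any particular mapping):

    -- in an Expression transformation upstream of the Filter
    -- O_FILTER_FLAG (output port):
    IIF(NOT ISNULL(CUST_ID) AND ORDER_AMT > 0, 1, 0)

    -- the Filter transformation condition then reduces to just:
    O_FILTER_FLAG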

2.  Remove all "DEFAULT" value expressions where possible. Having a default value - even the "ERROR(xxx)" command - slows down the session. It causes an unnecessary evaluation of values for every data element in the map. The only time you want to use a DEFAULT value is when you have to provide a default value for a specific port. There is another method: placing a variable with an IIF(xxxx, DEFAULT VALUE, xxxx) condition within an expression. This will always be faster (if assigned to an output port) than a default value.
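For example (hypothetical port names), instead of setting a DEFAULT value on the port, an output port expression can supply the fallback:

    -- O_REGION (output port):
    IIF(ISNULL(REGION_CODE), 'UNKNOWN', REGION_CODE)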

3.  Variable Ports are "slower" than Output Expressions. Whenever possible, use output expressions instead of variable ports. The variables are good for "static - and state driven" but do slow down the processing time - as they are allocated/reallocated each pass of a row through the expression transform.
4.  Datatype conversion - perform it in a port expression. Simply mapping a string to an integer, or an integer to a string will perform the conversion, however it will be slower than creating an output port with an expression like: to_integer(xxxx) and mapping an integer to an integer. It's because PMServer is left to decide if the conversion can be done mid-stream which seems to slow things down.

5.  Unused Ports. Surprisingly, unused output ports have no effect on performance. This is a good thing. However, in general it is good practice to remove any unused ports in the mapping, including variables. Unfortunately - there is no "quick" method for identifying unused ports.

6.  String Functions. String functions definitely have an impact on performance. Particularly those that change the length of a string (substring, ltrim, rtrim, etc.). These functions slow the map down considerably, the operations behind each string function are expensive (de-allocate, and re-allocate memory within a READER block in the session). String functions are a necessary and important part of ETL, we do not recommend removing their use completely, only try to limit them to necessary operations. One of the ways to tune these is to use "varchar/varchar2" data types in your database sources, or to use delimited strings in source flat files (as much as possible). This will help reduce the need for "trimming" input. If your sources are in a database, perform the LTRIM/RTRIM functions on the data coming in from a database SQL statement, this will be much faster than operationally performing it mid-stream.
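A sketch of what this looks like in a Source Qualifier SQL override, with an invented staging table and columns:

    SELECT CUST_ID,
           LTRIM(RTRIM(CUST_NAME)) AS CUST_NAME,
           LTRIM(RTRIM(CUST_CITY)) AS CUST_CITY
      FROM CUSTOMER_STG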

7.  IIF Conditionals are costly. When possible - arrange the logic to minimize the use of IIF conditionals. This is not particular to Informatica.  It is costly in ANY programming language. It introduces "decisions" within the tool; it also introduces multiple code paths across the logic (thus increasing complexity). Therefore - when possible, avoid utilizing an IIF conditional - again, the only possibility here might be (for example) an ORACLE DECODE function applied to a SQL source.
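For instance, a hedged example of pushing the decision into the source SQL with Oracle's DECODE (table and columns are invented):

    SELECT ORDER_ID,
           DECODE(STATUS_CD, 'O', 'OPEN',
                             'C', 'CLOSED',
                                  'UNKNOWN') AS STATUS_DESC
      FROM ORDERS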

8.  Sequence Generators slow down mappings. Unfortunately there is no "fast" and easy way to create sequence generators. The cost is not that high for using a sequence generator inside of Informatica, particularly if you are caching values (a cache of around 2000 seems to be the sweet spot). However - if at all avoidable, this is one "card" up a sleeve that can be played. If you don't absolutely need the sequence number in the map for calculation reasons, and you are utilizing Oracle, then let SQL*Loader create the sequence generator for all Insert Rows. If you're using Sybase, don't specify the Identity column as a target - let the Sybase Server generate the column. Also - try to avoid "reusable" sequence generators - they tend to slow the session down further, even with cached values.

9.  Test Expressions slow down sessions. Expressions such as IS_SPACES tend to slow down mappings; this is a data validation expression which has to run through the entire string to determine if it is spaces, much the same as IS_NUMBER has to validate an entire string. These expressions (if at all avoidable) should be removed in cases where it is not necessary to "test" prior to conversion. Be aware, however, that direct conversion without testing (conversion of an invalid value) will kill the transformation. If you absolutely need a test expression for a numerical value, try this: IIF(<field> * 1 >= 0, <field>, NULL) - provided zero is an acceptable value. An alpha value in this expression will return a NULL from the computation. Yes - the IIF condition is slightly faster than IS_NUMBER, because IS_NUMBER parses the entire string, whereas the multiplication operator is the actual source of the speed gain.

10.       Reduce the number of OBJECTS in a map. Frequently, the idea of these tools is to make the "data translation map" as easy as possible. All too often, that means creating "an" expression for each throughput/translation (taking it to an extreme, of course). Each object adds computational overhead to the session and timings may suffer. Sometimes, if performance is an issue or a goal, you can integrate several expressions into one expression object, thus reducing the "object" overhead. In doing so - you could speed up the map.

11.       Update Expressions - Session set to Update Else Insert. If you have this switch turned on - it will definitely slow the session down - Informatica performs 2 operations for each row: update (w/PK), then if it returns a ZERO rows updated, performs an insert. The way to speed this up is to "know" ahead of time if you need to issue a DD_UPDATE or DD_INSERT inside the mapping, then tell the update strategy what to do. After which you can change the session setting to: INSERT and UPDATE AS UPDATE or UPDATE AS INSERT.
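A minimal sketch of the "decide inside the mapping" approach, assuming a lookup has already returned the target's surrogate key into a hypothetical port LKP_CUSTOMER_SK:

    -- Update Strategy expression: insert when no existing row was found, otherwise update
    IIF(ISNULL(LKP_CUSTOMER_SK), DD_INSERT, DD_UPDATE)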

12.       Multiple Targets are too slow. Frequently mappings are generated with multiple targets, and sometimes multiple sources. This (despite first appearances) can really burn up time. If the architecture permits change, and the users support re-work, then try to change the architecture -> 1 map per target is the general rule of thumb. Once reaching one map per target, the tuning gets easier. Sometimes it helps to reduce it to 1 source and 1 target per map. But - if the architecture allows more modularization 1 map per target usually does the trick. Going further, you could break it up: 1 map per target per operation (such as insert vs update). In doing this, it will provide a few more cards to the deck with which you can "tune" the session, as well as the target table itself. Going this route also introduces parallel operations. For further info on this topic, see my architecture presentations on Staging Tables, and 3rd normal form architecture (Corporate Data Warehouse Slides).

13.       Slow Sources - Flat Files. If you've got slow sources, and these sources are flat files, you can look at some of the following possibilities. If the sources reside on a different machine, and you've opened a named pipe to get them across the network - then you've opened (potentially) a can of worms. You've introduced the network speed as a variable on the speed of the flat file source. Try to compress the source file, FTP PUT it on the local machine (local to PMServer), decompress it, and then utilize it as a source.  If you're reaching across the network to a relational table - and the session is pulling many rows (over 10,000) then the source system itself may be slow.  You may be better off using a source system extract program to dump it to file first, then follow the above instructions. However, there is something your SA's and Network Ops folks could do (if necessary) - this is covered in detail in the advanced section.  They could backbone the two servers together with a dedicated network line (no hubs, routers, or other items in between the two machines).  At the very least, they could put the two machines on the same sub-net. Now, if your file is local to PMServer but is still slow, examine the location of the file (which device is it on). If it's not on an INTERNAL DISK then it will be slower than if it were on an internal disk (C drive for you folks on NT).  This doesn't mean a unix file LINK exists locally, and the file is remote - it means the actual file is local.

14.       Too Many Aggregators. If your map has more than 1 aggregator, chances are the session will run very slowly - unless the CACHE directory is extremely fast and your drive seek/access times are very low.  Even still, placing aggregators end-to-end in mappings will slow the session down by factors of at least 2.  This is because of all the I/O activity being a bottleneck in Informatica.  What needs to be known here is that Informatica's products PM / PC, up through 4.7x, are NOT built for parallel processing.  In other words, the internal core doesn't put the aggregators on threads, nor does it put the I/O on threads - therefore, being a single-threaded process, it becomes easy for a part of the session/map to become a "blocked" process by I/O factors.

15.       Maplets containing Aggregators. Maplets are a good way to replicate data logic. But just because an aggregator is in a maplet doesn't mean it won't affect the mapping. The reason maplets don't, by themselves, affect the speed of mappings is that they are treated as part of the mapping once the session starts.  In other words, if you have an aggregator in a maplet, followed by another aggregator in the mapping, you still have the problem mentioned above in #14. Reduce the number of aggregators in the entire mapping (including maplets) to 1 if possible. If necessary, split the mapping up into several different mappings, and use intermediate tables in the database if required to achieve processing goals.

16.       Eliminate "too many lookups". What happens and why? Well - with too many lookups, your cache is eaten up in memory - particularly on the 1.6 / 4.6 products. The end result is that there is no memory left for the sessions to run in. The DTM reader/writer/transformer threads are not left with enough memory to be able to run efficiently. PC 1.7 / PM 4.7 solve some of these problems by caching some of these lookups out to disk when the cache is full. But you still end up with contention - in this case, with too many lookups, you're trading Memory Contention for Disk Contention.  The memory contention may be worse than the disk contention because the OS ends up thrashing (swapping in and out of TEMP/SWAP disk space) with small block sizes to try and locate your lookup row.   As the row goes from lookup to lookup, the swapping / thrashing gets worse.

17.       Lookups & Aggregators Fight. The lookups and the aggregators fight for memory space as discussed above. Each requires Index Cache, and Data Cache and they "share" the same HEAP segments inside the core.   Particularly in the 4.6 / 1.6 products and prior - these memory areas become critical, and when dealing with many rows - the session is almost certain to cause the server to "thrash" memory in and out of the OS Swap space. If possible, separate the maps - perform the lookups in the first section of the maps, position the data in an intermediate target table - then a second map reads the target table and performs the aggregation (also provides the option for a group by to be done within the database)... Another speed improvement...


Tips & Tricks And Other Observations

Copying Sessions

When copying sessions from one repository to another, do not use the right-click copy/paste commands - that only copies the session within the current folder. Instead, use CTRL-C / CTRL-V or the menu copy/paste commands.


IIF and Datatypes


If the precision and scale of value1 in an IIF statement are different from those of value2, value2 is rounded to the precision and scale of value1.

For example, if you have a value for sales of 7.75, and you have the following statement, a value of 7.7 will be returned:

IIF (ISNULL (Sales), 0.0, Sales)

If you had wanted two decimal places returned for sales, you would need to make your IIF statement look like the following:

IIF (ISNULL (Sales), 0.00, Sales)


Lookup SQL Performance Considerations


A lookup SQL query always places an order by on the SQL for every lookup port.   For example, if the Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE, the lookup query is:

SELECT  ITEM_NAME, PRICE, ITEM_ID FROM ITEMS_DIM ORDER BY ITEM_ID, ITEM_NAME, PRICE.

Therefore, if your lookups are slow in loading, you may need to have an index that your ORDER BY statement can utilize. 
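Using the example above, an index that matches the generated ORDER BY might be created like this (the index name is just an example):

    CREATE INDEX ITEMS_DIM_LKP_IDX
        ON ITEMS_DIM (ITEM_ID, ITEM_NAME, PRICE);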

Unconnected Lookup Port Default


If you have an unconnected lookup, Informatica ignores what is in the default port.  It always returns a null if the lookup is unsuccessful.

Unexpected Results When A Null Value Is In A Calculation


If you have an expression transformation that adds two values together (e.g.  A+B), if one of the values happens to contain a null value, the resulting sum will be set to null.  

One solution to this problem is to make sure you always set your default value for your port to 0 for numeric values.
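Another option is to guard the calculation itself, as in this sketch (generic port names A and B, as in the example above):

    -- O_TOTAL (output port):
    IIF(ISNULL(A), 0, A) + IIF(ISNULL(B), 0, B)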


Sessions and Using Debugger
When using the debugger, always create a new session specifically for the debugger instead of using the original session created for the mapping.  The debugger can do things such as change your session to a test load instead of a regular load.

Invalid Batch Ids
Let your administrator know when you are moving mappings/sessions from one repository to another – especially when you are moving to production.   If you are going to be running the sessions utilizing the pmcmd command, the server will need to be restarted.   If you don’t restart the server, you will receive ‘invalid batch id’ when you try to run something using pmcmd.   I have talked to Informatica about this and they state that their future architecture will eliminate this problem.

Resetting Designer or Server Manager Windows
If you ever have your windows (e.g. navigator window, toolbars) undocked in Designer or Server Manager, you can reset them back to the standard settings by going to the \winnt directory and deleting either your pmdesign.ini file (to reset Designer) or your pmsrvmgr.ini file (to reset Server Manager).

Using Sequence Generators Across Multiple Environments
 If you ever struggle with managing your sequence generated ids between dev, fit, and prod, you can modify your mappings to make it seamless to move from one environment to another.   The following lists the steps you will need to include in your mappings:

1) Add an unconnected lookup that retrieves the max key from your target table.   Note: this process will only work if you have only one session inserting into the target table at a time.
2) Create a mapping variable (e.g. $$max_cust_key) and assign the max key returned by the unconnected lookup to the mapping variable.
3) Add a sequence generator to your mapping and specify that the value should be reset to 1 every time.
4) Pass the NEXTVAL from the sequence generator to an expression transformation and call this port something like ‘next_id’. Add a new port called ‘next_key’ that uses the following formula: $$max_cust_key + next_id (see the sketch after this list).
5) Use ‘next_key’ as the key when you insert your records.
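Putting steps 3 and 4 together, the Expression transformation ports might look like the following sketch (port and variable names are the illustrative ones used above):

    -- next_id: input port connected to the Sequence Generator's NEXTVAL (reset to 1 each run)
    -- next_key (output port): current max key from the lookup plus the run-local sequence
    $$max_cust_key + next_id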


Saturday 28 May 2011

complex mapping..........!

Complex Mapping - I'm sure everybody has their own definition for this.

I hope most of you will agree with me on the following:
1. A mapping with a lot of pipelines, with lots of Joiners and Normalizers and a large number of target instances to load.
2. A mapping that is trying to do inserts, updates and deletes all together.

My advice in these cases is:
keep the mapping readable and simple by splitting it into smaller mappings.

Wednesday 9 March 2011

the difference between union and joiner?

Early on I had a lot of confusion about this...
finally I concluded:


A Joiner combines columns from tables, whereas a Union combines rows.
In a Joiner, matching rows are joined side by side to make the
result table, whereas in a Union, rows are stacked one below
the other to make the result table.

In Informatica, the Union transformation is implemented as UNION ALL, which means that if we
union two tables it returns all the rows, i.e. it does not return distinct rows.
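In SQL terms (tables invented for illustration), the contrast looks like this:

    -- Joiner: matching rows combined side by side
    SELECT c.CUST_ID, c.CUST_NAME, o.ORDER_ID
      FROM CUSTOMERS c
      JOIN ORDERS    o ON o.CUST_ID = c.CUST_ID;

    -- Union transformation: behaves like UNION ALL, rows stacked one below the other, duplicates kept
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_EAST
    UNION ALL
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_WEST;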

Tuesday 8 March 2011

for informatica developers

Informatica Transformations


A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.
Transformations can be of two types:
Active Transformation
An active transformation can change the number of rows that pass through the transformation, change the transaction boundary, can change the row type. For example, Filter, Transaction Control and Update Strategy are active transformations.
The key point to note is that the Designer does not allow you to connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the Integration Service may not be able to concatenate the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is an exception to this rule. An SGT does not receive data; it generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by an SGT and an active transformation.
Passive Transformation.
A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary, and maintains the row type.
The key point to note is that the Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive.
Transformations can be Connected or UnConnected to the data flow.
Connected Transformation
A connected transformation is connected to other transformations or directly to the target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Informatica Transformations

Following are the list of Transformations available in Informatica:
  • Aggregator Transformation
  • Application Source Qualifier Transformation
  • Custom Transformation
  • Data Masking Transformation
  • Expression Transformation
  • External Procedure Transformation
  • Filter Transformation
  • HTTP Transformation
  • Input Transformation
  • Java Transformation
  • Joiner Transformation
  • Lookup Transformation
  • Normalizer Transformation
  • Output Transformation
  • Rank Transformation
  • Reusable Transformation
  • Router Transformation
  • Sequence Generator Transformation
  • Sorter Transformation
  • Source Qualifier Transformation
  • SQL Transformation
  • Stored Procedure Transformation
  • Transaction Control Transformation
  • Union Transformation
  • Unstructured Data Transformation
  • Update Strategy Transformation
  • XML Generator Transformation
  • XML Parser Transformation
  • XML Source Qualifier Transformation
  • Advanced External Procedure Transformation
  • External Transformation

Informatica Transformations

Aggregator Transformation
Aggregator transformation performs aggregate functions like average, sum, count, etc. on multiple rows or groups. The Integration Service performs these calculations as it reads, storing group and row data in an aggregate cache. It is an Active & Connected transformation.
Difference b/w Aggregator and Expression Transformation? An Expression transformation permits you to perform calculations on a row-by-row basis only, whereas in an Aggregator you can perform calculations on groups.
An Aggregator transformation might, for example, have ports such as State, State_Count, Previous_State and State_Counter.
Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.
Aggregate Expressions: these are allowed only in Aggregator transformations. They can include conditional clauses and non-aggregate functions, and can also include one aggregate function nested inside another aggregate function.
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
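As an illustration only (the ports and group-by column are invented), an aggregate expression with a conditional clause and a non-aggregate function could look like:

    -- Aggregator grouped by DEPT_ID
    -- TOTAL_POS_SALARY (output port): conditional clause inside the aggregate
    SUM(SALARY, SALARY > 0)
    -- TOTAL_BONUS (output port): non-aggregate function nested inside the aggregate
    SUM(IIF(BONUS_FLAG = 'Y', BONUS, 0))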
Application Source Qualifier Transformation
Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session. It is an Active & Connected transformation.
Custom Transformation
It works with procedures you create outside the Designer interface to extend PowerCenter functionality, and calls a procedure from a shared library or DLL. It can be active or passive, and is a connected transformation.
You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.
Custom transformation allows you to develop the transformation logic in a procedure. Some of the PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom transformations, such as blocking rules, also apply to transformations built using Custom transformations. PowerCenter provides two sets of functions called generated and API functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API functions in the procedure code to develop the transformation logic.
Difference between Custom and External Procedure Transformation? In Custom T, input and output functions occur separately.The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation.
Data Masking Transformation
Passive & Connected. It is used to change sensitive production data to realistic test data for non production environments. It creates masked data for development, testing, training and data mining. Data relationship and referential integrity are maintained in the masked data.
For example: it returns a masked value that has a realistic format for an SSN, credit card number, birthdate, phone number, etc., but is not a valid value. Masking types: Key Masking, Random Masking, Expression Masking, Special Mask Format. The default is no masking.
Expression Transformation
Passive & Connected. It is used to perform non-aggregate calculations, i.e. to calculate values in a single row. Example: to calculate the discount on each product, to concatenate first and last names, or to convert a date to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping Designer. Components: Transformation, Ports, Properties, Metadata Extensions.
External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of windows and bind it to external procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an experienced programmer to use TX and use multi-threaded code in external procedures.
Filter Transformation
Active & Connected. It passes through rows that meet the specified filter condition and removes the rows that do not meet the condition. For example, to find all the employees who are working in New York, or to find all the faculty members teaching Chemistry in a state. The input ports for the filter must come from a single transformation. You cannot concatenate ports from more than one transformation into the Filter transformation. Components: Transformation, Ports, Properties, Metadata Extensions.
HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from, or post data to, the target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM. Methods: GET, POST and SIMPLE POST.
Java Transformation
Active or Passive & Connected. It provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.
Joiner Transformation
Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations, or to join data from the same source. In order to join two sources, there must be at least one pair of matching columns between the sources, and you must specify one source as the master and the other as the detail. For example: to join a flat file and a relational source, to join two flat files, or to join a relational source and an XML source.
The Joiner transformation supports the following types of joins:
  • Normal
Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
  • Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.
  • Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
  • Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
Lookup Transformation
Passive & Connected or UnConnected. It is used to look up data in a flat file, relational table, view, or synonym. It compares lookup transformation ports (input ports) to the source column values based on the lookup condition. Later returned values can be passed to other transformations. You can create a lookup definition from a source qualifier and can also use multiple Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.

Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
3) Cached or uncached lookup.
4) connected or unconnected lookup.


Normalizer Transformation
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources where most of the time data is stored in de-normalized format.
You can create following Normalizer transformation:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframe.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is default when you create a normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.
Rank Transformation
Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric value in a port or group, or to return the strings at the top or the bottom of a session sort order. For example, to select the top 10 regions where the sales volume was highest, or to select the 10 lowest priced products. As an active transformation, it might change the number of rows passed through it: if you pass 100 rows to the Rank transformation but select to rank only the top 10 rows, only 10 rows pass from the Rank transformation to the next transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.
Router Transformation
Active & Connected. It is similar to filter transformation because both allow you to apply a condition to test data. The only difference is, filter transformation drops the data that do not meet the condition whereas router has an option to capture the data that do not meet the condition and route it to a default output group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient.
Sequence Generator Transformation
Passive & Connected transformation. It is used to create unique primary key values or cycle through a sequential range of numbers or to replace missing primary keys.
It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot add ports to the transformation. NEXTVAL port generates a sequence of numbers by connecting it to a transformation or target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By value.
You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.
For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session.For non-reusable Sequence Generator transformations, setting Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also causes sections of skipped values since unused cached values are discarded at the end of each session.
For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values, however it must be greater than one. When you reduce the Number of Cached Values, you might increase the number of times the Integration Service accesses the repository to cache values during the session.
Sorter Transformation
Active & Connected transformation. It is used to sort data either in ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order.
Source Qualifier Transformation
Active & Connected transformation. When adding a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, to filter rows when the Integration Service reads source data, to specify an outer join rather than the default inner join, and to specify sorted ports.
It is also used to select only distinct values from the source and to create a custom query that issues a special SELECT statement for the Integration Service to read source data.
SQL Transformation
Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.
Stored Procedure Transformation
Passive & Connected or UnConnected transformation. It is useful to automate time-consuming tasks and it is also used in error handling, to drop and recreate indexes and to determine the space in database, a specialized calculation etc. The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. Stored Procedure is an executable script with SQL statements and control statements, user-defined variables and conditional statements.
Transaction Control Transformation
Active & Connected. You can control commit and roll back of transactions based on a set of rows that pass through a Transaction Control transformation. Transaction control can be defined within a mapping or within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.
Union Transformation
Active & Connected. The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the UNION ALL SQL statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows.
Rules
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.
Unstructured Data Transformation
Active/Passive and connected. The Unstructured Data transformation is a transformation that processes unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.
Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.
Update Strategy Transformation
Active & Connected transformation. It is used to update data in the target table, either to maintain a history of the data or only the recent changes. It flags rows for insert, update, delete or reject within a mapping.
XML Generator Transformation
Active & Connected transformation. It lets you create XML inside a pipeline. The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.
XML Parser Transformation
Active & Connected transformation. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.
XML Source Qualifier Transformation
Active & Connected transformation. The XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources, and has one input or output port for every column in the XML source.
External Procedure Transformation
Passive & Connected or UnConnected transformation. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases an External Procedure transformation is useful to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.
Advanced External Procedure Transformation
Active & Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.


Using Java Transformation




A Java transformation provides a native programming interface to define transformation functionality with the Java programming language. You can use a Java transformation to define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or without using an external Java development environment.

Types of Java transformation:
·         Active:  Can generate zero, one, or more output rows for each input row in the transformation.
·         Passive:  Generates exactly one output row for each input row in the transformation.

Example :
Source Table:

Col1    Col2    Col3
11      12      13
21      22      23
31      32      33

Our requirement is to transpose the values using Java Transformation as shown below:
Target Table:
Col1    Col2    Col3
11      21      31
12      22      32
13      23      33



Design a mapping as shown below



Create the Source and Target definitions and connect all the input and output ports as shown above.
Please note that you must select the type as “Active” for the Java Transformation.
The Ports tab will look as shown below:


Go to the Java Code tab and select the Helper Code tab. We will define all the required variables here.

Under the On Input Row tab, whenever an input row comes in, we store all the values in the buffer sequentially.


Under the On End of Data tab, we define our logic and generate the output values.
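Since the screenshots of the Java Code tabs are not reproduced here, the following is only a rough sketch of what the three tabs could contain for this 3x3 transpose, assuming input ports COL1, COL2, COL3 that are also passed through as output ports (the names and buffer layout are illustrative):

    // Helper Code tab: buffer for the incoming rows
    private int[][] data = new int[3][3];
    private int rowCount = 0;

    // On Input Row tab: store each incoming row sequentially
    data[rowCount][0] = COL1;
    data[rowCount][1] = COL2;
    data[rowCount][2] = COL3;
    rowCount++;

    // On End of Data tab: emit the transposed rows (column i of the source becomes output row i)
    for (int i = 0; i < 3; i++) {
        COL1 = data[0][i];
        COL2 = data[1][i];
        COL3 = data[2][i];
        generateRow();
    }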

ETL Process



1.0 Analyze Business Requirements Documentation – In this process you should understand the business needs by gathering information from the user.  You should understand the data needed and whether it is available.  Resources should be identified for information or help with the process.
   Deliverables
      - A logical description of how you will extract, transform, and load the data.
      - Sign-off of the customer(s).
   Standards
      - Document the ETL business requirements specification using the ETL Business Requirements Specification Template, your own team-specific business requirements template or system, or Oracle Designer.
   Templates
      - ETL Business Requirements Specification Template
2.0 Create Physical Design – In this process you should define your inputs and outputs by documenting record layouts.  You should also identify and define the location of source and target, file/table sizing information, volume information, and how the data will be transformed.
   Deliverables
      - Input and output record layouts
      - Location of source and target
      - File/table sizing information
      - File/table volume information
      - Documentation on how the data will be transformed, if at all
   Standards
      - Complete the ETL Business Requirements Specification using one of the methods documented in the previous steps.
      - Start the ETL Mapping Specification
   Templates
      - ETL Business Requirements Specification Template
      - ETL Mapping Specification Template
3.0 Design Test Plan – Understand what the data combinations are and define what results are expected.  Remember to include error checks.  Decide how many test cases need to be built.  Look at technical risk and include security.  Test business requirements.
   Deliverables
      - ETL Test Plan
      - ETL Performance Test Plan
   Standards
      - Document the ETL test plan and performance test plan using either the standard templates listed below or your own team-specific template(s).
   Templates
      - ETL Test Plan Template
      - ETL Performance Test Plan Template
4.0 Create ETL Process – Start creating the actual Informatica ETL process.  The developer is actually doing some testing in this process.
   Deliverables
      - Mapping Specification
      - Mapping
      - Workflow
      - Session
   Standards
      - Start the ETL Object Migration Form
      - Start the Database Object Migration Form (if applicable)
      - Complete the ETL Mapping Specification
      - Complete the cleanup process for log and bad files – refer to Standard_ETL_File_Cleanup.doc
      - Follow Informatica Naming Standards
   Templates
      - ETL Object Migration Form
      - ETL Mapping Specification Template
      - Database Object Migration Form (if applicable)
5.0 Test Process – The developer does the following types of tests: unit, volume, and performance.
   Deliverables
      - ETL Test Plan
      - ETL Performance Test Plan
   Standards
      - Complete the ETL Test Plan
      - Complete the ETL Performance Test Plan
   Templates
      - ETL Test Plan Template
      - ETL Performance Test Plan Template
6.0 Walkthrough ETL Process – Within the walkthrough the following factors should be addressed: identify common modules (reusable objects), efficiency of the ETL code, the business logic, accuracy, and standardization.
   Deliverables
      - ETL process that has been reviewed
   Standards
      - Conduct the ETL Process Walkthrough
   Templates
      - ETL Mapping Walkthrough Checklist Template
7.0 Coordinate Move to QA – The developer works with the ETL Administrator to organize the ETL process move to QA.
   Deliverables
      - ETL process moved to QA
   Standards
      - Complete the ETL Object Migration Form
      - Complete the Unix Job Setup Request Form
      - Complete the Database Object Migration Form (if applicable)
   Templates
      - ETL Object Migration Form
      - Unix Job Setup Request Form
      - Database Object Migration Form
8.0 Test Process – At this point, the developer once again tests the process after it has been moved to QA.
   Deliverables
      - Tested ETL process
   Standards
      - Developer validates the ETL Test Plan and ETL Performance Test Plan
   Templates
      - ETL Test Plan Template
      - ETL Performance Test Plan Template
9.0 User Validates Data – The user validates the data and makes sure it satisfies the business requirements.
   Deliverables
      - Validated ETL process
   Standards
      - Validate the Business Requirements Specification against the data
   Templates
      - ETL Business Requirements Specification Template
10.0 Coordinate Move to Production – The developer works with the ETL Administrator to organize the ETL process move to Production.
   Deliverables
      - Accurate and efficient ETL process moved to production
   Standards
      - Complete the ETL Object Migration Form
      - Complete the Unix Job Setup Request Form
      - Complete the Database Object Migration Form (if applicable)
   Templates
      - ETL Object Migration Form
      - Unix Job Setup Request Form
      - Database Object Migration Form (if applicable)
11.0 Maintain ETL Process – There are a couple of situations to consider when maintaining an ETL process: maintenance when an ETL process breaks, and maintenance when an ETL process needs to be updated.
   Deliverables
      - Accurate and efficient ETL process in production
   Standards
      - Updated Business Requirements Specification (if needed)
      - Updated Mapping Specification (if needed)
      - Revised mapping in the appropriate folder
      - Updated ETL Object Migration Form
      - Developer checks final results in production
      - All monitoring (finding problems) of the ETL process is the responsibility of the project team
   Templates
      - Business Requirements Specification Template
      - Mapping Specification Template
      - ETL Object Migration Form
      - Unix Job Setup Request Form
      - Database Object Migration Form (if applicable)