Friday, 13 April 2012

SQL Transformation

Ø  We can pass the database connection information to the SQL transformation as input data at run time.
Ø  The transformation processes external SQL scripts or SQL queries that you create in an SQL editor.
Ø  The SQL transformation processes the query and returns rows and database errors.

When you create an SQL transformation, you configure the following options:
Mode:-  The SQL transformation runs in one of the following modes:
*      Script mode.
o   The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row.
*      Query mode.
o   The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement.
*      Passive or active transformation.
o   The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation.
*      Database type.
o   The type of database the SQL transformation connects to.
*      Connection type
o   Pass database connection information to the SQL transformation or use a connection object.
Script Mode
An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation Script Name port. The script file name contains the complete path to the script file.
When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error.
Rules and Guidelines for Script Mode
Use the following rules and guidelines for an SQL transformation that runs in script mode:
  • You can use a static or dynamic database connection with script mode.
  • To include multiple query statements in a script, separate them with semicolons (see the example script after this list).
  • You can use mapping variables or parameters in the script file name.
  • The script code page defaults to the locale of the operating system. You can change the locale of the script.
  • The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script.
  • The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row.
  • You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
  • You cannot use nested scripts where the SQL script calls another SQL script.
  • A script cannot accept run-time arguments.
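For illustration, a script file referenced through the Script Name port simply contains ANSI SQL statements separated by semicolons. The table and column names below are assumptions, not part of the original example:

UPDATE customer_stage SET load_flag = 'N' WHERE load_flag IS NULL;
DELETE FROM customer_stage WHERE load_date < '2012-01-01';
INSERT INTO load_audit (script_name, status) VALUES ('daily_cleanup.sql', 'COMPLETE');

Remember that any SELECT output in such a script is ignored by the Integration Service.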
Query Mode
  • When you configure the SQL transformation to run in query mode, you create an active transformation.
  • When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation.
  • You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data.
You can create the following types of SQL queries in the SQL transformation:
  • Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows.
  • Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row. (Examples of both query types are sketched below.)
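As a hedged illustration of the two query types (the port, table, and column names here are assumptions, and the binding syntax should be verified against your PowerCenter version), a static query binds input ports as parameters with question marks, while a dynamic query can also substitute whole strings from string input ports:

Static query with parameter binding (?EMP_ID? is an input port):
SELECT EMP_NAME, SALARY FROM EMPLOYEES WHERE EMP_ID = ?EMP_ID?

Dynamic query using string substitution (~TABLE_NAME~ is a string input port):
SELECT EMP_NAME, SALARY FROM ~TABLE_NAME~ WHERE EMP_ID = ?EMP_ID?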
Rules and Guidelines for Query Mode
Use the following rules and guidelines when you configure the SQL transformation to run in query mode:
  • The number and the order of the output ports must match the number and order of the fields in the query SELECT clause.
  • The native data type of an output port in the transformation must match the data type of the corresponding column in the database. The Integration Service generates a row error when the data types do not match.
  • When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQL Error port, the pass-through ports, and the Num Rows Affected port when it is enabled. If you add output ports, the ports receive NULL data values.
  • When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports.
  • You cannot add the "_output" suffix to output port names that you create.
  • You cannot use the pass-through port to return data from a SELECT query.
  • When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value.
  • When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
  • You can use string substitution instead of parameter binding in a query. However, the input ports must be string data types.
SQL Transformation Properties
After you create the SQL transformation, you can define ports and set attributes in the following transformation tabs:
  •  Ports. Displays the transformation ports and attributes that you create on the SQL Ports tab.
  •  Properties. SQL transformation general properties.
  •  SQL Settings. Attributes unique to the SQL transformation.
  •  SQL Ports. SQL transformation ports and attributes.
Note: You cannot update the columns on the Ports tab. When you define ports on the SQL Ports tab, they display on the Ports tab.
Properties Tab
Configure the SQL transformation general properties on the Properties tab. Some transformation properties do not apply to the SQL transformation or are not configurable.
The following table describes the SQL transformation properties:

Property
Description
Run Time Location
Enter a path relative to the Integration Service node that runs the SQL transformation session.
If this property is blank, the Integration Service uses the environment variable defined on the Integration Service node to locate the DLL or shared library.
You must copy all DLLs or shared libraries to the run-time location or to the environment variable defined on the Integration Service node. The Integration Service fails to load the procedure when it cannot locate the DLL, shared library, or a referenced file.
Tracing Level
Sets the amount of detail included in the session log when you run a session containing this transformation. When you configure the SQL transformation tracing level to Verbose Data, the Integration Service writes each SQL query it prepares to the session log.
Is Partitionable
Multiple partitions in a pipeline can use this transformation. Use the following options:
- No. The transformation cannot be partitioned. The transformation and other transformations in the same pipeline are limited to one partition. You might choose No if the transformation processes all the input data together, such as data cleansing.
- Locally. The transformation can be partitioned, but the Integration Service must run all partitions in the pipeline on the same node. Choose Locally when different partitions of the transformation must share objects in memory.
- Across Grid. The transformation can be partitioned, and the Integration Service can distribute each partition to different nodes.
Default is No.
Update Strategy
The transformation defines the update strategy for output rows. You can enable this property for query mode SQL transformations.
Default is disabled.
Transformation Scope
The method in which the Integration Service applies the transformation logic to incoming data. Use the following options:
- Row
- Transaction
- All Input
Set transaction scope to transaction when you use transaction control in static query mode.
Default is Row for script mode transformations. Default is All Input for query mode transformations.
Output is Repeatable
Indicates if the order of the output data is consistent between session runs.
- Never. The order of the output data is inconsistent between session runs.
- Based On Input Order. The output order is consistent between session runs when the input data order is consistent between session runs.
- Always. The order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs.
Default is Never.
Generate Transaction
The transformation generates transaction rows. Enable this property for query mode SQL transformations that commit data in an SQL query.
Default is disabled.
Requires Single Thread Per Partition
Indicates if the Integration Service processes each partition of a procedure with one thread.
Output is Deterministic
The transformation generates consistent output data between session runs. Enable this property to perform recovery on sessions that use this transformation.
Default is enabled.




Create Mapping :
Step 1: Creating a flat file and importing the source from the flat file.
  • Create a notepad file (bikes.txt) and in it create a table named bikes with three columns and three records (a sample script is sketched after this list).
  • Create one more notepad to hold the path for the bikes script. Inside this notepad just type the path (C:\bikes.txt) and save it.
  • Import the source (the second notepad) using Sources -> Import from File. This opens a wizard with three subsequent windows; follow the on-screen instructions to complete the process of importing the source.
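If the bikes notepad is meant to be the SQL script that the transformation runs in script mode, its contents might look like the following sketch (the column names and values are assumptions):

CREATE TABLE bikes (bike_id INT, bike_name VARCHAR(30), price INT);
INSERT INTO bikes VALUES (1, 'Pulsar', 75000);
INSERT INTO bikes VALUES (2, 'Apache', 72000);
INSERT INTO bikes VALUES (3, 'Karizma', 90000);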
Step 2: Importing the target and applying the transformation.
In the same way as above, go to Targets -> Import from File and select an empty notepad named targetforbikes (one more blank notepad that we create and save under that name in C:\).
  • Create two columns in the target table under the name report and error.
  • We are all set here. Now apply the SQL transformation.
  • In the first window that appears when you apply the SQL transformation, select script mode.
  • Connect the Source Qualifier to the ScriptName input port and connect the other two fields to the corresponding output ports.
A snapshot of the steps discussed above is given below.

Step 3: Design the workflow and run it.
  • Create the task and the workflow using the naming conventions.
  • Go to the mappings tab and click on the Source on the left hand pane to specify the path for the output file.

Step 4: Preview the output data on the target table.



Audit Table Preparation

How to load session statistics into a Database Table

 
 
The solution below helps you load session statistics into a database table that can be used for audit purposes. In real life, developers generally do not have access to the metadata tables, so this approach provides the session statistics for auditing.
 
Solution:
Create a Database table to store the session statistics.
Note: The following syntax is for Sybase DB. Please change according to your DB.
create table Infa_Audit
(
workflow_name varchar(50),
start_time datetime,
end_time datetime,
success_rows numeric,
failed_rows numeric
)
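For example, a hedged Oracle equivalent of the same table might look like this (adjust the datatypes for your own database):

create table Infa_Audit
(
workflow_name varchar2(50),
start_time date,
end_time date,
success_rows number,
failed_rows number
)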
Create two sessions e.g. Session1 and Session2.  
Session1:
The Session1 will be your actual session for which you want to load the statistics.
Session2:
Session2 will be used to load the statistics of Session1 into the database table. For this, create a mapping and define the mapping variables shown below.



The flow of Mapping should look like this




Here, the source is a dummy source; inside the Expression transformation, assign all the mapping variables to output ports. The audit table will be the target.
 
 
The workflow will look like as shown below.




Create workflow variables as shown below





Assign the values to the workflow variables in the Assignment Task as shown below (typically from the pre-defined session variables such as $Session1.StartTime, $Session1.EndTime, $Session1.TgtSuccessRows and $Session1.TgtFailedRows).





In the Pre-session variable assignment tab of Session2, assign the workflow variable values to the mapping variables as shown below:




Execute the workflow.
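Once the workflow completes, the loaded statistics can be checked with a simple query such as the following (illustrative only):

SELECT workflow_name, start_time, end_time, success_rows, failed_rows
FROM Infa_Audit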

Tuesday, 10 January 2012

Concatenation and active transformations in PowerCenter mappings

The goal of this post is to explain how passive and active transformations interact with path concatenation. To understand this topic, we will first discuss how PowerCenter handles data and performs passive-passive path concatenation.
Having developed an understanding of the mechanics of this process, we will then examine how active transformations interfere with concatenation. The transformations that we examine will be filters, sorters, and aggregators. Even though active transformations prevent path concatenation, it is still possible to bring the two paths together.
The Joiner Transformation can be used to connect two paths when one is active and the other is passive.

PowerCenter Mapping Rules

To understand how active transformations affect mappings it is first important to understand how Informatica PowerCenter manages passive concatenation of data.
Before we begin with any examples, we'll state several rules that need to be followed to ensure that passive-passive concatenation can take place.

Basic Rules of Concatenation

Records within a mapping can be split and reconnected as long as several rules are adhered to.
  1. If the two parts of a record are to be concatenated they must have the same internal ID number.
  2. The relative order of the records must be maintained in both paths so that records are in the same position as when they left the transformation.
  3. Transformations on either path cannot drop a row from the flow.

Passive-Passive Concatenation

Within a mapping it is possible to separate the components of a record in a pipeline. When the records are split, they carry the same internal ID and, at the time of the split, they are in the same relative order in both pipelines.
In figure 1 we can see that the data flow has been split. We note that records in both pipelines have maintained the same ID (rule 1), the same relative order (rule 2), and all records are present (rule 3).
Since the two paths have met the requirements, passive-passive concatenation can take place. The parts of the records can be brought back together and concatenated.

Passive and Passive Concatenation 




Scenarios

The next set of descriptions and figures shows scenarios where the Designer will not allow a mapping to be constructed because it violates one of the three rules listed.

Drop a Record on 1 Path - Filter

The first scenario shows a mapping in which one of the pipeline paths drops a record. In this case the mapping was built with a Filter transformation on one of the pipelines. Since the requirement for concatenation is that both parts of each record are present, there is an issue with the mapping.
In figure 2, we can see that in one of the pipelines, record 103 has disappeared. The filter has removed the row which means that the filter is an active transformation.
In this case, the Designer will not allow this mapping to be built.

Passive-Active Concatenation - Filter 

The Designer is unable to determine definitively if the Filter Transformation will ever drop any records. The Designer only knows that the Filter Transformation may drop rows if certain data is passed to it, and so the Designer considers this possibility and prevents the concatenation of the paths.

Re-order the Rows - Sort

If a Sorter transformation is employed, the potential exists that the order of the records in the pipeline will change. Since ID numbers are assigned as records leave the transformation, the records that leave the sorter will have new ID numbers.
In figure 3 we can see that the records in the pipelines have different internal IDs from their original values. In this case, the two pipelines cannot be connected.
Once again, the Designer will not allow the mapping to be built.

 Passive-Active Concatenation - Sorter Transformation 


Entirely new records - Aggregator

Some transformations within PowerCenter have the ability to create new records that are wholly distinct from the input records.
The Aggregator Transformation is one such example. All records that enter the transformation are consumed and processed into new records.
Since these records are new and distinct, they cannot be concatenated to the other pipeline.
In figure 4. we can see that the aggregator has produced new records that have no relation to the other pipeline.
Passive-Active Concatenation - Aggregator Transformation

The Solution to Passive to Active Concatenation - Joiner Transformation

The Joiner Transformation can be used to reconnect the paths even though an active transformation was used on one of the pipelines. The Joiner Transformation allows the two paths to be pulled together where one or more of the paths are active.
The Joiner Transformation works in a way similar to a database join. It first reads one set of data (from one of the pipelines) and then processes records in a stream from the other pipeline. The transformation is configured with a join condition that tells PowerCenter the fields to compare between the paths (the join criteria).
As records stream to the transformation, new records can be produced by the transformation if the condition is met.
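Conceptually, the result is similar to the following SQL join between the passive (detail) path and the active (aggregated) path; the table and column names here are assumptions used only for illustration:

SELECT d.order_id, d.order_date, a.order_total
FROM orders_detail d
JOIN orders_aggregated a ON d.order_id = a.order_id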
Passive-Active Concatenation Solution - Joiner Transformation 

I hope this post helps you.

Thursday, 8 September 2011

DWH Banking Project Explanation


                             DWH Bank Project Explanation

First you have to start with
Ø  The objective of the project and the client's expectations.
Ø  Your involvement, the responsibilities and limitations of your role, the project architecture, and the team structure.

Client Introduction:
            The main objective of this project is to provide a system with all the information regarding the transactions of all bank branches across the country. We get the daily transaction data from all branches at the end of the day. We have to validate the transactions and implement the business logic based on the transaction type or transaction code. We load all historical data into the DWH first; once the historical load is finished, we load the delta loads.

Also explain about:
Ø  Target Staging Area.
Ø  Mappings:
o   Source to staging mappings.
o   Staging to warehousing.


Each transaction contains a transaction code. Based on the transaction code we can identify whether the transaction is a withdrawal, deposit, loan, payment, and so on; the business logic changes based on that code. We validate the transactions, calculate the measures, and load them to the database.

Mapping explanation:
In the Informatica mapping, we first look up all the transaction codes against the code master table to identify the transaction type, implement the correct logic, and filter out unnecessary transactions. In a bank there are many kinds of transactions, but we have to consider only the transactions required for the project. Only transaction codes that exist in the code master table are considered; transactions with other codes are loaded into a table called the Wrap table, and invalid records (transaction code missing, null, or spaces) go to the Error table. For each dimension table we create a surrogate key and load it into the DWH tables. The routing logic is sketched below.
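The routing is conceptually similar to the following SQL (the table and column names are assumptions used only to illustrate the idea):

-- Valid transactions: the code exists in the code master
SELECT t.* FROM txn_stage t
JOIN code_master c ON t.txn_code = c.txn_code

-- Wrap table: codes that are not present in the code master
SELECT t.* FROM txn_stage t
WHERE t.txn_code IS NOT NULL
  AND t.txn_code NOT IN (SELECT txn_code FROM code_master)

-- Error table: missing, null, or blank codes
SELECT t.* FROM txn_stage t
WHERE t.txn_code IS NULL OR LTRIM(RTRIM(t.txn_code)) = ''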

SCD2 Mapping:
We implement an SCD2 mapping for the customer dimension to keep the history of customers, using the SCD2 date (effective date) method.
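As a hedged, Oracle-style sketch of what the SCD2 date method does (the column names are assumptions): when a tracked attribute changes, the current row is end-dated and a new current row is inserted with a fresh surrogate key.

-- Close the existing current version of the customer
UPDATE customer_dim
SET eff_end_date = SYSDATE
WHERE customer_id = :incoming_customer_id
  AND eff_end_date IS NULL

-- Insert the new version as the current record
INSERT INTO customer_dim (customer_key, customer_id, customer_name, eff_start_date, eff_end_date)
VALUES (:new_surrogate_key, :incoming_customer_id, :incoming_customer_name, SYSDATE, NULL)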

Architecture of Project:

Analysis -> Requirement Gathering -> Design -> Development -> Testing -> Production

Analysis and Requirement Gathering:

Output:            Analysis Doc, Subject Area
100% onsite.
Business Analyst and Project Manager.
Gather the information useful for the DSS, identify the subject areas, and identify the schema objects.

Design:

Output:            Technical Design Doc’s, HLD,UTP
ETL Lead, BA and Data Architect
80% onsite (schema design in Erwin, implementation in the database, and preparation of the technical design document for ETL).
20% offshore: HLD & UTP
HLD:
            High Level Design document. Based on the technical specs, developers have to create the HLD; it contains the Informatica flow chart and the transformations required for each mapping. The HLD will cover only about 75% of the requirement.
UTP:
            Unit Test Plan. Write the test cases based on the requirement.

Development:

Output:            Bug-free code, UTR, Integration Test Plan
ETL Team and offshore BA
100% offshore
Based on the HLD, you have to create the mappings. After that, a code review and a coding-standards review are done by another team member, and the mapping is updated based on the review comments. Unit testing is carried out based on the UTP: we fill in the UTP, enter the expected values, and name it the UTR (Unit Test Results). Two rounds of code review and two rounds of unit testing are conducted in this phase, followed by migration to the testing repository. The integration test plan has to be prepared by the senior team members.



Testing:
Output:            ITR, UAT, Deployment Doc and User Guide
Testing Team, Business Analyst and Client.
80% offshore
Based on the integration test plan, the testing team tests the application and gives the bug list to the developers. Developers fix the bugs in the development repository and the code is migrated again to the testing repository. Testing is repeated until the code is bug free.
20% Onsite
UAT                User Acceptance Testing. The client will do the UAT. This is the last phase of the ETL project.

Production:
50% offshore 50% onsite
Work is distributed between offshore and onsite based on the run time of the loading. Mapping bugs need to be fixed by the development team, which provides support for a warranty period based on the agreed number of days.

In ETL projects there are three repositories. Access permissions and location differ for each repository.
Development:              E1
Testing:                       E2
Production:                 E3

Note:   E1, E2, E3 are Repository names. Nothing else.

Dimensions:
             Employee
            Customer
            Agent
            Transaction
            Date

The fact tables capture the relationships between all of these dimension tables.





 Thanks & Regards,
SkyEss DWH Team..





Wednesday, 1 June 2011

Best Practices


The following lists some performance considerations when building mappings.    

1.  Filter Expressions - try to evaluate them in a port expression. Try to create the filter (true/false) answer inside a port expression upstream. Complex filter expressions slow down the mapping. Again, expressions/conditions operate fastest in an Expression object with an output port for the result. It turns out that the longer or more complex the expression, the more severe the speed degradation. Place the actual expression (complex or not) in an Expression object upstream from the filter and compute a single numerical flag as an output port: 1 for true, 0 for false (for example, an output port defined as IIF(<condition>, 1, 0)). Pump this into the filter - you should see the maximum performance with this configuration.

2.  Remove all "DEFAULT" value expressions where possible. Having a default value - even the "ERROR(xxx)" command - slows down the session. It causes an unnecessary evaluation of values for every data element in the map. The only time you want to use a DEFAULT value is when you have to provide a default value for a specific port. There is another method: placing a variable with an IIF(xxxx, DEFAULT VALUE, xxxx) condition within an expression. This will always be faster (if assigned to an output port) than a default value.

3.  Variable Ports are "slower" than Output Expressions. Whenever possible, use output expressions instead of variable ports. The variables are good for "static - and state driven" but do slow down the processing time - as they are allocated/reallocated each pass of a row through the expression transform.
4.  Datatype conversion - perform it in a port expression. Simply mapping a string to an integer, or an integer to a string will perform the conversion, however it will be slower than creating an output port with an expression like: to_integer(xxxx) and mapping an integer to an integer. It's because PMServer is left to decide if the conversion can be done mid-stream which seems to slow things down.

5.  Unused Ports. Surprisingly, unused output ports have no effect on performance. This is a good thing. However, in general it is good practice to remove any unused ports in the mapping, including variables. Unfortunately - there is no "quick" method for identifying unused ports.

6.  String Functions. String functions definitely have an impact on performance, particularly those that change the length of a string (substring, ltrim, rtrim, etc.). These functions slow the map down considerably; the operations behind each string function are expensive (de-allocating and re-allocating memory within a READER block in the session). String functions are a necessary and important part of ETL, so we do not recommend removing their use completely - only try to limit them to necessary operations. One of the ways to tune these is to use "varchar/varchar2" data types in your database sources, or to use delimited strings in source flat files (as much as possible). This will help reduce the need for "trimming" input. If your sources are in a database, perform the LTRIM/RTRIM functions on the data coming in from the database SQL statement (see the sketch below); this will be much faster than performing the trimming operationally mid-stream.
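For example, the trimming can be pushed into the source qualifier SQL override instead of being done in the mapping (a sketch only; the table and column names are assumptions):

SELECT LTRIM(RTRIM(CUSTOMER_NAME)) AS CUSTOMER_NAME,
       LTRIM(RTRIM(CITY)) AS CITY,
       CUSTOMER_ID
FROM CUSTOMERS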

7.  IIF Conditionals are costly. When possible - arrange the logic to minimize the use of IIF conditionals. This is not particular to Informatica.  It is costly in ANY programming language. It introduces "decisions" within the tool; it also introduces multiple code paths across the logic (thus increasing complexity). Therefore - when possible, avoid utilizing an IIF conditional - again, the only possibility here might be (for example) an ORACLE DECODE function applied to a SQL source.

8.  Sequence Generators slow down mappings. Unfortunately there is no "fast" and easy way to create sequence generators. The cost is not that high for using a sequence generator inside of Informatica, particularly if you are caching values (a cache of around 2000 seems to be the sweet spot). However - if at all avoidable, this is one "card" up a sleeve that can be played. If you don't absolutely need the sequence number in the map for calculation reasons, and you are utilizing Oracle, then let SQL*Loader create the sequence generator for all Insert Rows. If you're using Sybase, don't specify the Identity column as a target - let the Sybase Server generate the column. Also - try to avoid "reusable" sequence generators - they tend to slow the session down further, even with cached values.

9.  Test Expressions slow down sessions. Expressions such as IS_SPACES tend to slow down the mappings; this is a data validation expression which has to run through the entire string to determine if it is spaces, much the same as IS_NUMBER has to validate an entire string. These expressions (if at all avoidable) should be removed in cases where it is not necessary to "test" prior to conversion. Be aware, however, that direct conversion without testing (conversion of an invalid value) will kill the transformation. If you absolutely need a test expression for a numerical value, try this: IIF(<field> * 1 >= 0, <field>, NULL), provided you don't care about distinguishing zero. An alpha in this expression should return a NULL to the computation. Yes - the IIF condition is slightly faster than IS_NUMBER - because IS_NUMBER parses the entire string, whereas the multiplication operator is where the actual speed gain comes from.

10.       Reduce Number of OBJECTS in a map. Frequently, the idea of these tools is to make the "data translation map" as easy as possible. All too often, that means creating "an" expression for each throughput/translation (taking it to an extreme, of course). Each object adds computational overhead to the session, and timings may suffer. Sometimes, if performance is an issue / goal, you can integrate several expressions into one expression object, thus reducing the "object" overhead. In doing so - you could speed up the map.

11.       Update Expressions - Session set to Update Else Insert. If you have this switch turned on, it will definitely slow the session down - Informatica performs two operations for each row: an update (with the primary key), and then, if zero rows were updated, an insert. The way to speed this up is to "know" ahead of time whether you need to issue a DD_UPDATE or DD_INSERT inside the mapping, then tell the Update Strategy what to do. After which you can change the session setting to: INSERT and UPDATE AS UPDATE, or UPDATE AS INSERT.

12.       Multiple Targets are too slow. Frequently mappings are generated with multiple targets, and sometimes multiple sources. This (despite first appearances) can really burn up time. If the architecture permits change, and the users support re-work, then try to change the architecture -> 1 map per target is the general rule of thumb. Once reaching one map per target, the tuning gets easier. Sometimes it helps to reduce it to 1 source and 1 target per map. But - if the architecture allows more modularization 1 map per target usually does the trick. Going further, you could break it up: 1 map per target per operation (such as insert vs update). In doing this, it will provide a few more cards to the deck with which you can "tune" the session, as well as the target table itself. Going this route also introduces parallel operations. For further info on this topic, see my architecture presentations on Staging Tables, and 3rd normal form architecture (Corporate Data Warehouse Slides).

13.       Slow Sources - Flat Files. If you've got slow sources, and these sources are flat files, you can look at some of the following possibilities. If the sources reside on a different machine, and you've opened a named pipe to get them across the network - then you've opened (potentially) a can of worms. You've introduced the network speed as a variable on the speed of the flat file source. Try to compress the source file, FTP PUT it on the local machine (local to PMServer), decompress it, and then utilize it as a source.  If you're reaching across the network to a relational table - and the session is pulling many rows (over 10,000) then the source system itself may be slow.  You may be better off using a source system extract program to dump it to file first, then follow the above instructions. However, there is something your SA's and Network Ops folks could do (if necessary) - this is covered in detail in the advanced section.  They could backbone the two servers together with a dedicated network line (no hubs, routers, or other items in between the two machines).  At the very least, they could put the two machines on the same sub-net. Now, if your file is local to PMServer but is still slow, examine the location of the file (which device is it on). If it's not on an INTERNAL DISK then it will be slower than if it were on an internal disk (C drive for you folks on NT).  This doesn't mean a unix file LINK exists locally, and the file is remote - it means the actual file is local.

14.       Too Many Aggregators. If your map has more than one aggregator, chances are the session will run very slowly - unless the CACHE directory is extremely fast and your drive seek/access times are very low. Even then, placing aggregators end-to-end in mappings will slow the session down by a factor of at least 2. This is because all of the I/O activity becomes a bottleneck in Informatica. What needs to be known here is that Informatica's products PM / PC up through 4.7x are NOT built for parallel processing. In other words, the internal core doesn't put the aggregators on threads, nor does it put the I/O on threads - therefore, being a single-threaded process, it becomes easy for a part of the session/map to become "blocked" by I/O factors.

15.       Maplets containing Aggregators. Maplets are a good way to replicate data logic. But just because an aggregator is in a maplet doesn't mean it won't affect the mapping. The reason a maplet does not by itself affect the speed of a mapping is that it is treated as part of the mapping once the session starts. In other words, if you have an aggregator in a maplet followed by another aggregator in the mapping, you still have the problem mentioned above in #14. Reduce the number of aggregators in the entire mapping (including maplets) to 1 if possible. If necessary, split the mapping up into several different mappings, and use intermediate tables in the database if required to achieve the processing goals.

16.       Eliminate "too many lookups". What happens and why? Well - with too many lookups, your cache is eaten in memory - particularly on the 1.6 / 4.6 products. The end result is that there is no memory left for the sessions to run in. The DTM reader/writer/transformer threads are not left with enough memory to be able to run efficiently. PC 1.7 and PM 4.7 solve some of these problems by caching some of these lookups out to disk when the cache is full. But you still end up with contention - in this case, with too many lookups, you're trading Memory Contention for Disk Contention. The memory contention may be worse than the disk contention because the OS ends up thrashing (swapping in and out of TEMP/SWAP disk space) with small block sizes to try to locate your lookup row. As the row goes from lookup to lookup, the swapping / thrashing gets worse.

17.       Lookups & Aggregators Fight. The lookups and the aggregators fight for memory space as discussed above. Each requires Index Cache, and Data Cache and they "share" the same HEAP segments inside the core.   Particularly in the 4.6 / 1.6 products and prior - these memory areas become critical, and when dealing with many rows - the session is almost certain to cause the server to "thrash" memory in and out of the OS Swap space. If possible, separate the maps - perform the lookups in the first section of the maps, position the data in an intermediate target table - then a second map reads the target table and performs the aggregation (also provides the option for a group by to be done within the database)... Another speed improvement...


Tips & Tricks And Other Observations

Copying Sessions

When copying sessions from one repository to another, do not use the right-click copy/paste commands; these copy the session within the current folder. Instead, use CTRL-C, CTRL-V or the menu copy/paste commands.


IIF and Datatypes


If the precision and scale of value1 in an IIF statement are different from those of value2, value2 is rounded to the precision and scale of value1.

For example, if you have a value for sales of 7.75, and you have the following statement, a value of 7.7 will be returned:

IIF (ISNULL (Sales), 0.0, Sales)

If you had wanted two decimal places returned for sales, you would need to make your IIF statement look like the following:

IIF (ISNULL (Sales), 0.00, Sales)


Lookup SQL Performance Considerations


A lookup SQL query always places an ORDER BY on the SQL for every lookup port. For example, if the Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE, the lookup query is:

SELECT  ITEM_NAME, PRICE, ITEM_ID FROM ITEMS_DIM ORDER BY ITEM_ID, ITEM_NAME, PRICE.

Therefore, if your lookups are slow in loading, you may need to have an index that your ORDER BY statement can utilize. 
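Using the lookup above as an illustration, an index that covers the ORDER BY columns might look like this (the index name is an assumption):

CREATE INDEX IDX_ITEMS_DIM_LKP ON ITEMS_DIM (ITEM_ID, ITEM_NAME, PRICE)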

Unconnected Lookup Port Default


If you have an unconnected lookup, Informatica ignores what is in the default port.  It always returns a null if the lookup is unsuccessful.

Unexpected Results When A Null Value Is In A Calculation


If you have an Expression transformation that adds two values together (e.g. A+B) and one of the values happens to be null, the resulting sum will be null.

One solution to this problem is to make sure you always set your default value for your port to 0 for numeric values.


Sessions and Using Debugger
When using the debugger, always create a new session specifically for the debugger instead of using the original session created for the mapping. The debugger can do things such as changing your session to a test load instead of a regular load.

Invalid Batch Ids
Let your administrator know when you are moving mappings/sessions from one repository to another – especially when you are moving to production.   If you are going to be running the sessions utilizing the pmcmd command, the server will need to be restarted.   If you don’t restart the server, you will receive ‘invalid batch id’ when you try to run something using pmcmd.   I have talked to Informatica about this and they state that their future architecture will eliminate this problem.

Resetting Designer or Server Manager Windows
If you ever have your windows (e.g. navigator window, toolbars) undocked in Designer or Server Manager, you can reset them back to the standard settings by going to the \winnt directory and deleting either your pmdesign.ini file (to reset Designer) or your pmsrvmgr.ini file (to reset Server Manager).

Using Sequence Generators Across Multiple Environments
 If you ever struggle with managing your sequence generated ids between dev, fit, and prod, you can modify your mappings to make it seamless to move from one environment to another.   The following lists the steps you will need to include in your mappings:

1) Add an unconnected lookup that retrieves the max key from your target table (a sample lookup override is sketched after this list).   Note: this process will only work if you have only one session inserting into the target table at a time.
2) Create a mapping variable (e.g. $$max_cust_key) and assign the max key from the unconnected lookup to the mapping variable.
3) Add a sequence generator to your mapping and specify that the value should be reset to 1 every time.
4) Pass the NEXTVAL from the Sequence Generator to an Expression transformation. Call this port something like 'next_id'. Add a new port called 'next_key' that uses the following formula: $$max_cust_key + next_id.
5) Use ‘next_key’ as the key when you insert your records.


Saturday, 28 May 2011

complex mapping..........!

Complex mapping - I'm sure everybody has their own definition for this.

I hope most of you will agree with me on the following:
1. If you have a lot of pipelines with lots of Joiners, Normalizers, and a large number of target instances to load.
2. If a mapping is trying to do inserts, updates, and deletes together.

My advice in these cases is:
Keep the mapping readable and simple by splitting it into smaller mappings.