Top Informatica Interview Questions and Answers

Last updated on Feb 18 2022
Shankar Trivedi

What is Lookup transformation?

  • Lookup transformation is used to look up data in a source, source qualifier, or target in order to retrieve relevant values.
  • It is used to look up a ‘flat file’, ‘relational table’, ‘view’ or ‘synonym’.
  • Lookup can be configured as Active or Passive as well as Connected or Unconnected transformation.
  • When the mapping contains the lookup transformation, the integration service queries the lookup data and compares it with lookup input port values. One can use multiple lookup transformations in a mapping.
  • The lookup transformation is created with the following types of ports:
    1. Input port (I)
    2. Output port (O)
    3. Lookup port (L)
    4. Return port (R)

What are the various types of transformation?

  • Aggregator transformation
  • Expression transformation
  • Filter transformation
  • Joiner transformation
  • Lookup transformation
  • Normalizer transformation
  • Rank transformation
  • Router transformation
  • Sequence generator transformation
  • Stored procedure transformation
  • Sorter transformation
  • Update strategy transformation
  • XML source qualifier transformation

How many input parameters can exist in an unconnected lookup?

Any number of input parameters can exist. For instance, you can provide input parameters like column 1, column 2, column 3, and so on. But the return value would only be one.

What are the differences between connected lookup and unconnected lookup?

Connected vs Unconnected Lookups

  • A connected lookup receives input from the pipeline and participates in the data flow; an unconnected lookup receives input from the result of a :LKP expression in another transformation.
  • A connected lookup can use both dynamic and static caches; an unconnected lookup cannot use a dynamic cache.
  • A connected lookup can return more than one column value (output port); an unconnected lookup returns only one column value.
  • A connected lookup caches all lookup columns; an unconnected lookup caches only the lookup condition ports and the return port.
  • A connected lookup supports user-defined default values; an unconnected lookup does not.

Name the different lookup cache(s)?

Informatica lookups can be cached or un-cached (no cache). Cached lookups can be either static or dynamic. A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains the cache after the session run completes or deletes it.

  • Static cache
  • Dynamic cache
  • Persistent cache
  • Shared cache
  • Recache

Is ‘sorter’ an active or passive transformation?

It is an active transformation because, when configured for distinct output, it removes duplicate rows and consequently changes the number of rows.

What is the difference between active and passive transformation?

Active Transformation: An active transformation can perform any of the following actions:

  • Change the number of rows that pass through the transformation: For instance, the Filter transformation is active because it removes rows that do not meet the filter condition.
  • Change the transaction boundary: For example, the Transaction Control transformation is active because it defines a commit or rollback transaction based on an expression evaluated for each row.
  • Change the row type: For example, the Update Strategy transformation is active because it flags rows for insert, delete, update, or reject.

Passive Transformation: A passive transformation is one which will satisfy all these conditions:

  • Does not change the number of rows that pass through the transformation
  • Maintains the transaction boundary
  • Maintains the row type

Name the output files created by Informatica server during session running.

  • Informatica server log: Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files will be created in the Informatica home directory.
  • Session log file: Informatica server creates session log files for each session. It writes information about sessions into log files such as initialization process, creation of SQL commands for reader and writer threads, errors encountered and load summary. The amount of detail in the session log file depends on the tracing level that you set.
  • Session detail file: This file contains load statistics for each target in mapping. Session detail includes information such as table name, number of rows written or rejected. You can view this file by double clicking on the session in the monitor window.
  • Performance detail file: This file contains session performance details which tells you where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
  • Reject file: This file contains the rows of data that the writer does not write to targets.
  • Control file: Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains the information about the target flat file such as data format and loading instructions for the external loader.
  • Post session email: Post session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages. One if the session completed successfully and another if the session fails.
  • Indicator file: If you use the flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
  • Output file: If a session writes to a target file, the Informatica server creates the target file based on file properties entered in the session property sheet.
  • Cache files: When the Informatica server creates a memory cache, it also creates cache files. The server creates index and data cache files for transformations such as Aggregator, Rank, Joiner, and Lookup.

What are the types of groups in router transformation?

  • Input group
  • Output group
  • Default group

What is the difference between STOP and ABORT options in Workflow Monitor?

On issuing the STOP command on the session task, the integration service stops reading data from the source, although it continues processing and writing the data it has already read to the targets. If the integration service cannot finish processing and committing data, we can issue the ABORT command.

The ABORT command has a timeout period of 60 seconds. If the integration service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.

How can we store previous session logs?

If you run the session in timestamp mode, the session log will not automatically overwrite the current session log; a new log file is created for each run.

Go to Session Properties –> Config Object –> Log Options

Select the properties as follows:

Save session log by –> Session runs

Save session log for these runs –> Enter the number of historical session log files you want to keep

If you want to save all of the log files created by every run, select Save session log by –> Session timestamp instead

You can find these properties in the session/workflow Properties.

What are the similarities and differences between ROUTER and FILTER?

The differences are:

Advantages of Router transformation over Filter transformation:

  • Better performance; in the mapping, the Informatica server processes the input data only once with a Router transformation, instead of as many times as there are conditions when multiple Filter transformations are used.
  • Less complexity; we use only one Router transformation instead of multiple Filter transformations.
  • Router transformation is more efficient than Filter transformation.

For example:

Imagine the source contains records for several departments and we want to load each department's records into a separate target table. To achieve this, we require only one Router transformation with one group per department. To get the same result with Filter transformations, we would require at least one Filter transformation per target table.

Similarity:

A Router and a Filter transformation are similar because both transformations allow you to use a condition to test data.

Why is sorter an active transformation?

When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The integration service discards duplicate rows that were compared during the sort operation. The number of input rows will vary as compared to the output rows and hence it is an active transformation.

When do you use SQL override in a lookup transformation?

You should override the lookup query in the following circumstances (an illustrative override sketch follows the list):

  1. Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation.
    Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
  2. A lookup table name or column names contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that they are enclosed in quotes.
  3. Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file. The designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The integration service expands the parameters and variables when you run the session.
  4. A lookup column name contains a slash (/) character. When generating the default lookup query, the designer and integration service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
  5. Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
    Note: The session fails if you include large object ports in a WHERE clause.
  6. Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.
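
For illustration, here is a minimal sketch of a lookup SQL override, assuming a hypothetical CUSTOMERS lookup table with CUSTOMER_ID, CUSTOMER_NAME, and STATUS columns; it adds a WHERE clause to shrink the cache and overrides the ORDER BY with a single column, using the trailing comment notation (--) to suppress the ORDER BY clause that the Integration Service would otherwise generate:

SELECT CUSTOMERS.CUSTOMER_ID AS CUSTOMER_ID,
       CUSTOMERS.CUSTOMER_NAME AS CUSTOMER_NAME
FROM CUSTOMERS
WHERE CUSTOMERS.STATUS = 'ACTIVE'
ORDER BY CUSTOMERS.CUSTOMER_ID --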

What are data driven sessions?

When you configure a session using update strategy, the session property data driven instructs Informatica server to use the instructions coded in mapping to flag the rows for insert, update, delete or reject. This is done by mentioning DD_UPDATE or DD_INSERT or DD_DELETE in the update strategy transformation.

The “Treat source rows as” session property is set to “Data Driven” by default when an Update Strategy transformation is used in a mapping.
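
As a hedged illustration, an Update Strategy expression that flags rows with these constants might look like the following, assuming a hypothetical input port TGT_CUSTOMER_KEY that is NULL when the row does not yet exist in the target:

IIF(ISNULL(TGT_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)

Rows for which TGT_CUSTOMER_KEY is NULL are flagged for insert; all other rows are flagged for update.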

What are mapplets?

  • A Mapplet is a reusable object that we create in the Mapplet Designer.
  • It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.

What is the difference between Mapping and Mapplet?

A mapping is the complete data flow from source definitions to target definitions, linked together by transformations, that the Integration Service executes in a session. A mapplet, created in the Mapplet Designer, is a reusable object containing only a set of transformations; it is not executed on its own but is embedded in one or more mappings to reuse that transformation logic.

How can we delete duplicate rows from flat files?

We can make use of the Sorter transformation and select the Distinct option to delete the duplicate rows.

What is the use of source qualifier?

The source qualifier transformation is an active, connected transformation used to represent the rows that the Integration Service reads when it runs a session. You need to connect the source qualifier transformation to the relational or flat file definition in a mapping. The source qualifier transformation converts the source data types to the Informatica native data types, so you should not alter the data types of the ports in the source qualifier transformation.

The source qualifier transformation can be used to perform the following tasks:

  • Joins: You can join two or more tables from the same source database. By default, the sources are joined based on the primary key-foreign key relationships. This can be changed by explicitly specifying the join condition in the “user-defined join” property.
  • Filter rows: You can filter the rows from the source database. The integration service adds a WHERE clause to the default query.
  • Sorting input: You can sort the source data by specifying the number of sorted ports. The integration service adds an ORDER BY clause to the default SQL query.
  • Distinct rows: You can get distinct rows from the source by choosing the “Select Distinct” property. The integration service adds a SELECT DISTINCT statement to the default SQL query.
  • Custom SQL Query: You can write your own SQL query to do calculations.

What are the different ways to filter rows using Informatica transformations?

  • Source Qualifier
  • Joiner
  • Filter
  • Router

What are the different transformations where you can use a SQL override?

  • Source Qualifier
  • Lookup
  • Target

Why is it that in some cases, SQL override is used?

The Source Qualifier provides the SQL Query option to override the default query. You can enter any SQL statement supported by your source database. You might enter your own SELECT statement, or have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.
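
As a rough sketch, a Source Qualifier SQL override that pushes an aggregate calculation into the database might look like this, assuming hypothetical CUSTOMERS and ORDERS tables; note that the number, order, and datatypes of the selected columns must match the Source Qualifier ports:

SELECT C.CUSTOMER_ID,
       C.CUSTOMER_NAME,
       SUM(O.ORDER_AMOUNT) AS TOTAL_ORDER_AMOUNT
FROM CUSTOMERS C
INNER JOIN ORDERS O ON O.CUSTOMER_ID = C.CUSTOMER_ID
GROUP BY C.CUSTOMER_ID, C.CUSTOMER_NAME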

State the differences between SQL Override and Lookup Override?

  • The role of SQL Override is to limit the number of incoming rows entering the mapping pipeline, whereas Lookup Override is used to limit the number of rows read into the lookup cache, avoiding a full table scan and saving lookup time and cache space.
  • Lookup Override generates an ORDER BY clause by default; SQL Override does not, and the clause must be entered manually in the query if required.
  • SQL Override can provide any kind of join by writing the query; Lookup Override provides only non-equi joins.
  • Lookup Override returns only one record even if it finds multiple records for a single condition; SQL Override does not have this restriction.

What is parallel processing in Informatica?

After optimizing the session to its fullest, we can further improve performance by exploiting underutilized hardware power. This refers to parallel processing and we can achieve this in Informatica PowerCenter using Partitioning Sessions.

The Informatica PowerCenter Partitioning Option increases the performance of the PowerCenter through parallel data processing. The Partitioning option will let you split the large data set into smaller subsets which can be processed in parallel to get a better session performance.

What are the different ways to implement parallel processing in Informatica?

We can implement parallel processing using various types of partition algorithms:

Database partitioning: The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.

Round-Robin Partitioning: Using this partitioning algorithm, the Integration service distributes data evenly among all partitions. It makes sense to use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.

Hash Auto-Keys Partitioning: The PowerCenter Server uses a hash function to group rows of data among partitions. When the hash auto-key partition is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

Hash User-Keys Partitioning: Here, the Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You can individually choose the ports that define the partition key.

Key Range Partitioning: With this type of partitioning, you can specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.

Pass-through Partitioning: In this type of partitioning, the Integration Service passes all rows from one partition point to the next partition point without redistributing them.

What are the different levels at which performance improvement can be performed in Informatica?

Performance tuning can be carried out at several levels: the target level, the source level, the mapping level, the transformation level, the session level, and the system level (database, network, and hardware).

Mention a few design and development best practices for Informatica.

Mapping design tips:

  • Standards – sticking to consistent standards is beneficial in the long run. This includes naming conventions, descriptions, environment settings, parameter files, and documentation, among others.
  • Reusability – in order to react quickly to potential changes, use Informatica components like mapplets, worklets, and reusable transformations.
  • Scalability – when designing and developing mappings, it is good practice to keep data volumes in mind. This covers caching, queries, partitioning, and initial vs. incremental loads.
  • Simplicity – it is recommended to create several simple mappings instead of a few complex ones. Use a staging area and try to keep the processing logic as clear and simple as possible.
  • Modularity – use modular design techniques (common error handling, reprocessing).

Mapping development best practices

  • Source Qualifier – use shortcuts, extract only the necessary data, and limit the columns and rows read from the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL Query override, which may strain database resources and prevent the use of partitioning and pushdown optimization.
  • Expressions – use local variables to limit redundant calculations, avoid datatype conversions, reduce calls to external scripts (coding outside of Informatica), provide comments, and use operators (||, +, /) instead of functions. Keep in mind that numeric operations are generally faster than string operations.
  • Filter – use the Filter transformation as close to the source as possible. If multiple filters need to be applied, it is usually more efficient to replace them with a single Router.
  • Aggregator – use sorted input, place the aggregator as early (as close to the source) as possible, and filter the data before aggregating.
  • Joiner – try to join the data in the Source Qualifier wherever possible, and avoid outer joins. It is good practice to use the source with fewer rows as the master source.
  • Lookup – relational lookup should only return ports that meet the condition. Call Unconnected Lookup in expression (IIF). Replace large lookup tables with joins whenever possible. Review the database objects and add indexes to database columns when possible. Use Cache Calculator in session to eliminate paging in lookup cache.

What are the different types of profiles in Informatica?

Explain shared cache and re cache.

To answer this question, it is essential to understand the persistent cache. If we perform a lookup on a table, the lookup reads the data and brings it into the data cache. At the end of each session, however, the Informatica server normally deletes all the cache files. If you configure the lookup to use a persistent cache, the server saves the lookup cache files, by default under an anonymous (generated) name, instead of deleting them. A shared cache allows you to reuse this cache in other mappings by pointing them to the existing cache.

Over time, the data in a lookup table becomes old or redundant. In a scenario where new data enters the table, recache (recache from source) ensures that the cache is rebuilt so that the data is refreshed and updated in the existing and new cache.

What do you mean by Enterprise Data Warehousing?

When an organization's data is consolidated at a single point of access, it is called enterprise data warehousing. The data can be presented to the server with a global view via a single source store, and one can perform periodic analysis on that same source. It gives better results, but the time required is high.

What is the difference between a database, a data warehouse, and a data mart?

A database contains a set of sensibly related data that is normally small in size compared to a data warehouse. A data warehouse contains assortments of all sorts of data, and data is extracted from it according to the customer's needs. A data mart, on the other hand, is a set of data designed to cater to the needs of a specific domain; for instance, an organization may keep a separate chunk of data for each of its departments, i.e., sales, finance, marketing, etc.

How many sessions can we have in one group?

We can have any number of sessions, but it is advisable to have a smaller number of sessions in a batch because it makes migration easier.

Differentiate between a mapping parameter and a mapping variable.

The values that change during the session's execution are known as mapping variables, whereas the values that do not change during the session's execution are known as mapping parameters.

Mention the advantages of partitioning a session.

The main advantage of partitioning a session is to improve the server's performance and efficiency. Another advantage is that it runs separate processing sequences in parallel within the session.

What are the features of complex mapping?

The features of complex mapping are as follows:

  • It contains a larger number of transformations
  • It uses complex business logic

How can we identify whether a mapping is correct or not without connecting a session?

With the help of the debugging option, we can identify whether a mapping is correct or not without connecting a session.

Can we use mapping parameters or variables, developed in one mapping, into any other reusable transformation?

Yes, we can use mapping parameters or variables in any other reusable transformation because a reusable transformation is not tied to any particular mapplet or mapping.

What is meant by a domain?

When all the related relationships and nodes are covered by a sole organizational point, it is called a domain. Through this, data management can be improved.

What is the difference between a repository server and a powerhouse?

The repository server controls the complete repository, which includes tables, charts, and various procedures. Its main function is to assure repository integrity and consistency. A powerhouse server, on the other hand, governs the execution of various processes among the factors of the server's database repository.

How many repositories can be created in informatica?

There can be any number of repositories in Informatica, but eventually it depends on the number of ports.

What is the benefit of partitioning a session?

Partitioning a session means running separate processing sequences in parallel within the session. Its main purpose is to improve the server's operation and efficiency. Transformations, including extractions and other outputs of single partitions, are carried out in parallel.

How are indexes created after completing the load process?

To create indexes after the load process, command tasks at the session level can be used. Index-creation scripts can be aligned with the session's workflow or with the post-session implementation sequence. Moreover, this type of index creation cannot be controlled at the transformation level after the load process.

Explain sessions. Explain how batches are used to combine executions?

A set of instructions that needs to be executed to convert data from a source to a target is called a session. Sessions can be carried out using the session manager or the pmcmd command. Batch execution can be used to combine session executions, either serially or in parallel. Batches can contain different sessions running in parallel or in serial order.

How many sessions can one group in batches?

One can group any number of sessions, but migration is easier if the number of sessions in a batch is smaller.

Explain the difference between mapping parameter and mapping variable?

Values that change during the session's execution are called mapping variables. Upon completion, the Informatica server stores the end value of a variable, and it is reused when the session restarts. Values that do not change during the session's execution are called mapping parameters. The mapping specification declares mapping parameters and their usage, and values are allocated to these parameters before the session starts.

What is complex mapping?

Following are the features of complex mapping.

  • Difficult requirements
  • A large number of transformations
  • Complex business logic

How can one identify whether a mapping is correct or not without connecting a session?

One can find out whether a mapping is correct or not, without connecting a session, with the help of the debugging option.

Can one use mapping parameter or variables created in one mapping into any other reusable transformation?

Yes, one can, because a reusable transformation does not contain any mapplet or mapping.

Explain the use of aggregator cache file?

The Aggregator transformation is processed in chunks of instructions during each run. It stores transitional (intermediate) values in local buffer memory. If extra memory is required, the aggregator provides extra cache files for storing the transformation values.

Briefly describe lookup transformation?

Lookup transformations are transformations that have access to an RDBMS-based data set. The server makes the access faster by using lookup tables to look at explicit table data or the database. The final data is obtained by matching the lookup condition for all lookup ports delivered during the transformation.

What does role play dimension mean?

The dimensions that are utilized to play diversified roles while remaining in the same database domain are called role-playing dimensions; a common example is a single date dimension used as both the order date and the ship date.
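
A hedged SQL sketch of the idea, assuming a hypothetical FACT_ORDERS table and a single DIM_DATE dimension that plays both the "order date" and "ship date" roles via aliases:

SELECT f.ORDER_ID,
       d_order.CALENDAR_DATE AS ORDER_DATE,
       d_ship.CALENDAR_DATE  AS SHIP_DATE
FROM FACT_ORDERS f
JOIN DIM_DATE d_order ON f.ORDER_DATE_KEY = d_order.DATE_KEY
JOIN DIM_DATE d_ship  ON f.SHIP_DATE_KEY  = d_ship.DATE_KEY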

How can repository reports be accessed without SQL or other transformations?

Repository reports are generated by the Metadata Reporter. There is no need for SQL or other transformations, since it is a web application.

What are the types of metadata that are stored in the repository?

The types of metadata stored in the repository include source definitions, target definitions, mappings, mapplets, and transformations.

Explain the code page compatibility?

When data moves from one code page to another, data loss cannot occur provided that both code pages have the same character sets. All the characters of the source page must be available in the target page. If all the characters of the source page are not present in the target page, the target is only a subset and data loss will definitely occur during transformation, because the two code pages are not compatible.

How can you validate all mappings in the repository simultaneously?

All the mappings cannot be validated simultaneously, because only one mapping can be validated at a time.

Briefly explain the Aggregator transformation?

It allows one to do aggregate calculations such as sums, averages, etc. Unlike the Expression transformation, the Aggregator lets you perform calculations on groups of rows.

Describe Expression transformation?

In this transformation, values can be calculated in a single row before writing to the target. It can be used to perform non-aggregate calculations. Conditional statements can also be tested before the output results go to the target tables.

What do you mean by filter transformation?

It is a means of filtering rows in a mapping. Data is passed through the Filter transformation, and then the filter condition is applied. The Filter transformation contains only input/output ports, and only the rows that meet the condition can pass through the filter.

What is Joiner transformation?

The Joiner transformation combines two related heterogeneous sources residing in different locations, whereas a source qualifier transformation can only combine data coming from a common source.

What is Lookup transformation?

It is used for looking up data in a relational table through a mapping. The lookup definition can be imported from any relational database to which both the client and the server can connect. One can use multiple lookup transformations in a mapping.

How is the Union transformation used?

It is a multiple input group transformation that can be used to combine data from different sources. It works like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
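
For reference, the SQL equivalent, sketched with hypothetical CUSTOMERS_US and CUSTOMERS_EU tables; as with the Union transformation, the inputs must have matching numbers and datatypes of columns:

SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS_US
UNION ALL
SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS_EU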

What do you mean by incremental aggregation?

The option for incremental aggregation is enabled when a session is created for a mapping that aggregates data. PowerCenter performs incremental aggregation by passing new data through the mapping and using historical cache data to perform the new aggregation calculations incrementally.

What is the difference between a connected look up and unconnected look up?

When the inputs are taken directly from other transformations in the pipeline, it is called a connected lookup. An unconnected lookup doesn't take inputs directly from other transformations; instead, it can be used in any transformation and invoked as a function using a :LKP expression. So an unconnected lookup can be called multiple times in a mapping.
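
As a hedged example, an unconnected lookup is typically invoked from an Expression transformation with the :LKP syntax; the lookup name LKP_GET_CUSTOMER_NAME and the ports below are hypothetical:

IIF(ISNULL(CUSTOMER_NAME), :LKP.LKP_GET_CUSTOMER_NAME(CUSTOMER_ID), CUSTOMER_NAME)

Here the lookup is called only for rows where CUSTOMER_NAME is missing, which a connected lookup cannot do.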

What is a mapplet?

A reusable object that is created using the Mapplet Designer is called a mapplet. It permits one to reuse the transformation logic in multiple mappings, and it contains a set of transformations.

Briefly define reusable transformation?

A reusable transformation can be used numerous times in mappings. Unlike other transformations, it is stored as metadata independent of any mapping, and whenever any change is made to the reusable transformation, the change is reflected in (and can invalidate) the mappings that use it.

What is surrogate key?

A surrogate key is a replacement for the natural primary key. It is a unique identifier for each row in the table. It is very beneficial because the natural primary key can change, which eventually makes updates more difficult. Surrogate keys are always in the form of a digit or integer.
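
A minimal SQL sketch of the idea, using a hypothetical customer dimension in which CUSTOMER_KEY is the surrogate key (typically populated from a Sequence Generator) and CUSTOMER_ID is the natural key coming from the source system:

CREATE TABLE DIM_CUSTOMER (
    CUSTOMER_KEY   INTEGER       NOT NULL,  -- surrogate key, e.g. generated by a Sequence Generator
    CUSTOMER_ID    VARCHAR(20)   NOT NULL,  -- natural (business) key from the source
    CUSTOMER_NAME  VARCHAR(100),
    PRIMARY KEY (CUSTOMER_KEY)
);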

What are the prerequisite tasks to achieve the session partition?

In order to perform session partitioning, one needs to configure the session to partition the source data and then install the Informatica server on a machine with multiple CPUs.

Which files are created during session runs by the Informatica server?

During session runs, the files created are the error log, bad (reject) file, workflow log, and session log.

Briefly define a session task?

It is a set of instructions that guides the PowerCenter server on how and when to move data from sources to targets.

What does command task mean?

This specific task permits one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow.

What is standalone command task?

This task can be used anywhere in the workflow to run the shell commands.

What is meant by pre and post session shell command?

A command task can be called as the pre- or post-session shell command for a session task. One can run it as a pre-session command, a post-session success command, or a post-session failure command.

What is predefined event?

It is a file-watch event. It waits for a specific file to arrive at a specific location.

How can you define a user-defined event?

User defined event can be described as a flow of tasks in the workflow. Events can be created and then raised as need arises.

What is a workflow?

A workflow is a set of instructions that tells the server how to execute tasks.

What are the different tools in workflow manager?

The following are the different tools in the workflow manager:

  • Task Designer
  • Worklet Designer
  • Workflow Designer

Name any other tool used for scheduling purposes other than the workflow manager and pmcmd.

A tool used for scheduling purposes other than the workflow manager can be a third-party tool like 'CONTROL-M'.

What is OLAP (On-Line Analytical Processing)?

It is a method by which multi-dimensional analysis of data is performed.

What are the different types of OLAP? Give an example?

ROLAP (e.g., BO), MOLAP (e.g., Cognos), HOLAP, and DOLAP.

What do you mean by worklet?

When workflow tasks are grouped in a set, it is called a worklet. Workflow tasks include timer, decision, command, event wait, mail, session, link, assignment, control, etc.

What is the use of target designer?

Target Definition is created with the help of target designer.

Where can we find the throughput option in informatica?

The throughput option can be found in the Workflow Monitor. In the Workflow Monitor, right-click on the session, then click on Get Run Properties; under Source/Target Statistics we can find the throughput option.

What is target load order?

Target load order is specified on the basis of the source qualifiers in a mapping. If there are multiple source qualifiers linked to different targets, one can specify the order in which the Informatica server loads data into the targets.

Differentiate between Informatica and DataStage.

  • GUI for development and monitoring: Informatica provides PowerDesigner, Repository Manager, Workflow Designer, and Workflow Manager, whereas DataStage provides DataStage Designer, Job Sequence Designer, and Director.
  • Data integration solution: Informatica offers a step-by-step solution, whereas DataStage offers a project-based integration solution.
  • Data transformation: Good in Informatica, excellent in DataStage.

What is Informatica PowerCenter?

Informatica PowerCenter is an ETL/data integration tool that has a wide range of applications. This tool allows users to connect to and fetch data from different heterogeneous sources and subsequently process that data.

For example, users can connect to a SQL Server Database or an Oracle Database, or both, and also integrate the data from both these databases to a third system.

Mention some typical use cases of Informatica.

There are many typical use cases of Informatica, but this tool is predominantly leveraged in the following scenarios:

  • When organizations migrate from the existing legacy systems to new database systems
  • When enterprises set up their data warehouse
  • While integrating data from various heterogeneous systems, including multiple databases and file-based systems
  • For data cleansing

How can we filter rows in Informatica?

There are two ways to filter rows in Informatica, they are as follows:

  • Source Qualifier Transformation: It filters rows while reading data from a relational data source, minimizing the number of rows passed into the mapping to enhance performance. The filter condition uses standard SQL and is executed in the database.
  • Filter Transformation: It filters rows within a mapped data from any source. It is added close to the source to filter out the unwanted data and maximize performance. It generates true or false values based on conditions.

Differentiate between Joiner and Lookup transformations.

  • In a Joiner, it is not possible to override the query; in a Lookup, the query can be overridden.
  • A Joiner supports only the '=' operator; a Lookup supports all comparison operators.
  • With a Joiner, users cannot restrict the number of rows while reading relational tables; with a Lookup, they can.
  • A Joiner can perform different types of joins between tables; a Lookup behaves as a left outer join while connecting with the database.

In Informatica Workflow Manager, how many repositories can be created?

Depending upon the number of ports that are required, repositories can be created. In general, however, there can be any number of repositories.

What are the types of lookup transformation?

There are four different types of lookup transformation:

  • Relational or flat file lookup: It performs lookup on relational tables.
  • Pipeline lookup: It performs lookup on application sources.
  • Connected or unconnected lookup: While the connected lookup transformation receives data from source, performs lookup, and returns the result to the pipeline, the unconnected lookup happens when the source is not connected. It returns one column to the calling transformation.
  • Cached or uncached lookup: Lookup transformation can be configured to cache lookup data, or we can directly query the lookup source every time a lookup is invoked.

How do pre- and post-session shell commands function?

A command task can be called as a pre- or post-session shell command for a session task. Users can run it as a pre-session command, a post-session success command, or a post-session failure command. Based on use cases, the application of shell commands can be changed or altered.

What can we do to improve the performance of Informatica Aggregator Transformation?

Aggregator performance improves dramatically if records are sorted before passing to the aggregator and if the ‘sorted input’ option under Aggregator Properties is checked. The record set should be sorted on those columns that are used in the Group By operation. It is often a good idea to sort the record set in the database level, e.g., inside a source qualifier transformation, unless there is a chance that the already sorted records from the source qualifier can again become unsorted before reaching the aggregator.
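
As a hedged illustration, if the Aggregator groups by CUSTOMER_ID and ORDER_DATE, the Source Qualifier query (via sorted ports or an override) could pre-sort on exactly those columns, in the same order, against a hypothetical ORDERS table:

SELECT CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT
FROM ORDERS
ORDER BY CUSTOMER_ID, ORDER_DATE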

How can we update a record in the target table without using Update Strategy?

A target table can be updated without using 'Update Strategy.' For this, we need to define the key of the target table at the Informatica level, and then we need to connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to 'Update as Update' and check the 'Update' check box.

Let us assume we have a target table 'Customer' with the fields 'Customer ID,' 'Customer Name,' and 'Customer Address.' If we want to update 'Customer Address' without an Update Strategy, we have to define 'Customer ID' as the primary key at the Informatica level, and we have to connect the 'Customer ID' and 'Customer Address' fields in the mapping. If the session properties are set correctly as described above, the mapping will update only the 'Customer Address' field for all matching customer IDs.

Why do we use mapping parameters and mapping variables?

Basically, mapping parameters and mapping variables represent values in mappings and mapplets.

Mapping Parameters

  • Mapping parameters represent constant values that are defined before running a session.
  • After creation, parameters appear in Expression Editor.
  • These parameters can be used in source qualifier filter, in user-defined joins, or for overriding.

Mapping Variables

  • As opposed to mapping parameters, mapping variables can change values during sessions.
  • The last value of a mapping variable is saved to the repository at the end of each successful session by the Integration Service. However, it is possible to override saved values with parameter files.
  • Basically, mapping variables are used to perform incremental reads of data sources.

Define the surrogate key.

A surrogate key is basically an identifier that uniquely identifies modeled entities or objects in a database. Not being derived from any other data in the database, surrogate keys may or may not be used as primary keys.

It is basically a unique sequential number. If an entity exists in the outside world and modeled within the database, or represents an object within the database, it is denoted by a surrogate key. In these cases, surrogate keys for specific objects or modeled entities are generated internally.

Explain sessions and also shed light on how batches are used to combine executions.

A session is nothing but a set of instructions that has to be executed to convert data from a source to a target. To carry out sessions, users need to leverage the session manager or use the pmcmd command. For combining sessions, in either a serial or a parallel manner, batch execution is used. Any number of sessions can be grouped into batches for migration.

What is incremental aggregation?

Basically, incremental aggregation is the process of capturing changes in the source and calculating aggregations in a session. This lets the Integration Service update targets incrementally and avoids recalculating aggregations over the entire source.

On the first load, the aggregation is calculated over the entire source and stored in the cache; on each subsequent load, only the new session's data is aggregated and combined with the values saved in the historical cache.

How can we delete duplicate rows from flat files?

We can delete duplicate rows from flat files by leveraging the sorter transformation and selecting the distinct option. Selecting this option will delete the duplicate rows.

What are the features of Informatica Developer?

From an Informatica Developer's perspective, some of the new features are as follows:

  • In the new version, lookup can be configured as an active transformation—it can return multiple rows on a successful match.
  • Now, we can write SQL override on uncached lookup also. Previously, we could do it only on cached lookup.
  • Control over the size of our session log: In a real-time environment, we can control the session log file size or log file time.
  • Database deadlock resilience feature: This will ensure that our session does not immediately fail if it encounters any database deadlock. It will retry the operation. We can configure the number of retry attempts.

What are the advantages of using Informatica as an ETL tool over Teradata?

First up, Informatica is a data integration tool, while Teradata is an MPP database with some scripting and fast data movement capabilities.

Advantages of Informatica over Teradata:

  • It functions as a metadata repository for the organization’s ETL ecosystem. Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders. It leads to an ecosystem which is easier to maintain and quicker for architects and analysts to analyze and enhance.
  • Job monitoring and recovery: It is easy to monitor jobs using the Informatica Workflow Monitor. It is also easier to identify and recover failed jobs or slow-running jobs, and it offers the ability to restart from the failed row/step.
  • Informatica Market Place: It is a one-stop shop for lots of tools and accelerators to make SDLC faster and improve application support.
  • There are plenty of developers in the market with varying skill levels and expertise to work with it.
  • Lots of connectors to various databases are available, including support for Teradata MLoad, TPump, FastLoad, and Parallel Transporter in addition to the regular (and slow) ODBC drivers.
  • Surrogate key generation through shared sequence generators inside Informatica could be faster than generating them inside the database.
  • If a company decides to move away from Teradata to another solution, then vendors like Infosys can execute migration projects to move the data and change the ETL code to work with the new database quickly, accurately, and efficiently using automated solutions.
  • Pushdown optimization can be used to process the data in the database.
  • It has an ability to code ETL such that processing load is balanced between the ETL server and the database box—this is useful if the database box is ageing and/or in case the ETL server has a fast disk/large enough memory and CPU to outperform the database in certain tasks.
  • It has an ability to publish processes as web services.

Advantages of Teradata over Informatica:

  • Cheaper (initially): There are no initial ETL tool license costs, and OPEX costs are lower, as one doesn't need to pay for yearly support from Informatica Corp.
  • Great choice if all the data to be loaded are available as structured files—which can then be processed inside the database after an initial stage load.
  • Good choice for a lower complexity ecosystem.
  • Only Teradata developers or resources with good ANSI/Teradata SQL/BTEQ knowledge are required to build and enhance the system.

Differentiate between various types of schemas in data warehousing.

Star Schema

Star schema is the simplest style of data mart schema in computing. It is the approach most widely used to develop data warehouses and dimensional data marts. It features one or more fact tables referencing numerous dimension tables.

Snowflake Schema

A logical arrangement of tables in a multidimensional database, the snowflake schema is represented by centralized fact tables connected to multiple dimension tables. Dimension tables in a star schema are normalized through snowflaking; once normalized, the resultant structure resembles a snowflake with the fact table at the middle. Low-cardinality attributes are removed and placed in separate tables.

Fact Constellation Schema

A fact constellation schema consists of multiple fact tables that share dimension tables and can be viewed as a collection of star schemas; it is commonly used in online analytical processing (OLAP). It can be seen as an extension of the star schema.
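
To make the star schema shape concrete, here is a hedged sketch of a typical query against it, assuming a hypothetical FACT_SALES table joined to DIM_PRODUCT and DIM_DATE dimension tables on their surrogate keys:

SELECT p.PRODUCT_CATEGORY,
       d.CALENDAR_YEAR,
       SUM(f.SALES_AMOUNT) AS TOTAL_SALES
FROM FACT_SALES f
JOIN DIM_PRODUCT p ON f.PRODUCT_KEY = p.PRODUCT_KEY
JOIN DIM_DATE d    ON f.DATE_KEY    = d.DATE_KEY
GROUP BY p.PRODUCT_CATEGORY, d.CALENDAR_YEAR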

Define OLAP. What are the different types of OLAP?

OLAP or Online Analytical Processing is a specific category of software that allows users to analyze information from multiple database systems simultaneously. Using OLAP, analysts can extract and have a look at business data from different sources or points of view.

Types of OLAP:

  • ROLAP: ROLAP or Relational OLAP is an OLAP server that maps multidimensional operations to standard relational operations.
  • MOLAP: MOLAP or Multidimensional OLAP uses array-based multidimensional storage engines for multidimensional views on data. Numerous MOLAP servers use two levels of data storage representation to handle dense and sparse datasets.
  • HOLAP: HOLAP or Hybrid OLAP combines both ROLAP and MOLAP for faster computation and higher scalability of data.

What is target load order? How to set it?

When a mapplet is used in a mapping, Designer allows users to set target load order for all sources that pertain to the mapplet. In Designer, users can set the target load order in which Integration Service sends rows to targets within the mapping. A target load order group is basically a collection of source qualifiers, transformations, and targets linked together in a mapping. The target load order can be set to maintain referential integrity while operating on tables that have primary and secondary keys.

Steps to Set the Target Load Order

Step 1: Create a mapping that contains multiple target load order groups
Step 2: Click on Mappings and then select Target Load Plan
Step 3: The Target Load Plan dialog box lists all Source Qualifier transformations along with the targets that receive data from them
Step 4: Select a Source Qualifier and click on the Up and Down buttons to change its position
Step 5: Repeat Step 4 for any other Source Qualifiers you want to reorder
Step 6: Click on OK after you are done

Define Target Designer.

If we are required to perform ETL operations, we need source data, target tables, and the required transformations. Target Designer in Informatica allows us to create target tables and modify the pre-existing target definitions.

Target definitions can be imported from various sources, including flat files, relational databases, XML definitions, Excel worksheets, etc.

For opening Target Designer, click on the Tools menu and select the Target Designer option.

What are the advantages of Informatica?

The advantages of Informatica as a data integration tool are many.

  • It facilitates the effective and efficient communication and transformation of data between different sources.
  • Informatica is faster, cheaper, and easy to learn.
  • Monitoring jobs becomes easy with it, and so do recovering failed jobs and pointing out slow jobs.
  • It has many robust features including database information, data validation, migration of projects from one database to another, etc.

List some of the PowerCenter client applications with their basic purpose.

  • Repository Manager: An administrative tool which is used to manage repository folders, objects, groups, etc.
  • Administration Console: Used to perform service tasks
  • PowerCenter Designer: Contains several designing tools including the Source Analyzer, Target Designer, Mapplet Designer, Mapping Designer, etc.
  • Workflow Manager: Defines a set of instructions that are required to execute mappings
  • Workflow Monitor: Monitors workflows and tasks

What are sessions? List down their properties.

Available in the Workflow Manager, sessions are configured by creating a session task. Within a mapping program, there can be multiple sessions which can be either reusable or non-reusable.

Properties of Sessions

  • Session tasks can run concurrently or sequentially, as per the requirement.
  • They can be configured to analyze performance.
  • Sessions include log files, test load, error handling, commit interval, target properties, etc.

What are the various types of transformations possible in Informatica?

The various types of transformations are:

  • Aggregator Transformation
  • Expression Transformation
  • Normalizer Transformation
  • Rank Transformation
  • Filter Transformation
  • Joiner Transformation
  • Lookup Transformation
  • Stored procedure Transformation
  • Sorter Transformation
  • Update Strategy Transformation
  • XML Source Qualifier Transformation
  • Router Transformation
  • Sequence Generator Transformation

What are the features of connected lookup?

The features of connected lookup are as follows:

  • It takes in the input directly from the pipeline.
  • It actively participates in data flow, and both dynamic and static cache are used.
  • It caches all lookup columns and returns default values as the output when the lookup condition does not match.
  • It is possible to return more than one column value to the output port.
  • It supports user-defined default values.

Define junk dimensions.

Junk dimensions are structures that consist of a group of a few junk attributes such as random codes or flags. They form a framework to store related codes with respect to a specific dimension at a single place, instead of creating multiple tables for the same.

What is the use of Rank Transformation?

The Rank transformation is an active and connected transformation used to sort and rank a set of records from either the top or the bottom. It is also used to select data with the largest or smallest numeric value based on a specific port.

Define Sequence Generator transformation.

A passive and connected transformation, the Sequence Generator is responsible for generating primary keys or a sequence of numbers for calculations or processing. It has two output ports that can be connected to numerous transformations within a mapping. These ports are:

  • NEXTVAL: This port can be connected to multiple transformations to generate a unique value for each row or transformation.
  • CURRVAL: This port is connected when NEXTVAL is already connected to some other transformation within the mapping.

What is the purpose of the INITCAP function?

When invoked, the INITCAP function capitalizes the first character of each word in a string and converts all other characters to lowercase.

Syntax:

INITCAP(string_name)
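
For example:

INITCAP('INFORMATICA interview QUESTIONS')

returns 'Informatica Interview Questions'.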

Define enterprise data warehousing?

When the data of an organization is developed at a single point of access, it is known as enterprise data warehousing.

Differentiate between a database and a data warehouse?

A database contains a group of useful information that is brief in size compared to a data warehouse. In a data warehouse, there are sets of every kind of data, whether useful or not, and the data is extracted as per the requirements of the customer.

What do you understand by the term ‘domain’?

The term ‘domain’ refers to all the interlinked relationships and nodes that are undertaken by a sole organizational point.

Differentiate between a repository server and a powerhouse.

A repository server mainly guarantees repository reliability and uniformity, while a powerhouse server handles the execution of many procedures between the factors of the server's database repository.

How can we create indexes after completing the load process?

With the help of command task at the session level, we can create indexes after the load procedure.

Define sessions in Informatica ETL.

A session is a set of instructions that governs the transformation of data from a source to a target.

 

What is the use of the aggregator cache file?

The aggregator keeps the transitional values that arise during processing in local buffer memory. If extra memory is needed, it provides extra cache files for keeping these transformation values.

What is lookup transformation?

The transformation that has access to RDBMS-based data sets is known as the Lookup transformation.

What do you understand by the term ‘role-playing dimension’?

The dimensions that are used for playing diversified roles while remaining in the same database domain are known as role-playing dimensions.

Define parallel processing?

Parallel processing improves performance by making fuller use of the available hardware. In Informatica it is implemented through session partitioning: the partitioning option of PowerCenter processes data in parallel by dividing a large data set into smaller subsets that are processed concurrently, giving better session performance.

What are the different types of methods for the implementation of parallel processing in Informatica?

There are different types of algorithms that can be used to implement the parallel processing. These are as follows –

Database Partitioning – The Integration Service queries the database system for table partition information and reads the partitioned data from the database nodes.
Round-Robin Partitioning – The Integration Service distributes rows evenly across all partitions, so each partition processes roughly the same number of rows.
Hash Auto-keys Partitioning – The Integration Service groups rows across partitions using a hash of the grouped or sorted ports, which it uses as a compound partition key.
Hash User-Keys Partitioning – The same as hash auto-keys partitioning, except that rows are grouped on the basis of a user-defined partition key; you individually choose the ports that define the key.
Key Range Partitioning – One or more ports are combined to form a compound partition key for a source. Each partition is assigned a range of values, and the Integration Service passes each row to a partition based on the specified ranges.
Pass-through Partitioning – Data is passed from one partition point to the next without being redistributed.

What are the best mapping development practices?

Best mapping development practices are as follows –

Source Qualifier – Extract only the necessary data, limiting both columns and rows, and use shortcuts for source definitions. Prefer the default query options (User Defined Join, Filter, etc.) to a full source qualifier query override, because an override can prevent the use of partitioning.
Expressions – Use local variables to limit repeated, expensive calculations, avoid unnecessary data type conversions, and reduce calls to external procedures. Operators are faster than functions, and numeric operations are faster than string operations.
Aggregator – Filter the data before aggregation, and use sorted input.
Filter – Place filter transformations as close to the source as possible. Where multiple filters are needed, they can often be replaced by a single Router.
Joiner – Join data in the Source Qualifier wherever possible, avoid outer joins, and use the source with fewer rows as the master source.
Lookup – Replace large lookup tables with joins where possible and review the database; add database indexes to the lookup columns; have lookups return only the ports that are needed.

What are the different mapping design tips for Informatica?

The different mapping design tips are as follows –

Standards – Follow a good design standard; applying a standard consistently pays off over long-running projects. Standards cover naming conventions, descriptions, environmental settings, documentation, parameter files, and so on.
Reusability – Reusable transformations are the best way to respond quickly to potential changes; Informatica components such as mapplets and worklets are best suited for this.
Scalability – Design with scale in mind; size mappings for the actual production data volumes.
Simplicity – It is better to create several simple mappings than one complex mapping; aim for a simple, logical design process.
Modularity – Use modular design techniques and allow for reprocessing.

What is the meaning of the word ‘session’? Give an explanation of how to combine execution with the assistance of batches?

Converting data from a source to a target is implemented by a set of instructions known as a session; the session is usually executed by the session or workflow manager. To combine session executions, batches are used in two ways: serially or in parallel.

How many numbers of sessions is grouped in one batch? 

Any number of sessions can be grouped into one batch; however, for an easier migration process, it is better to keep the number of sessions in a batch small.

Differentiate between mapping parameter and mapping variable?

A mapping variable is a value that can change during the session's execution, whereas a value that does not change during the session is called a mapping parameter. Mapping parameter values are assigned before the session begins, typically in a parameter file, and are then used throughout the mapping.

 What are the features of complex mapping?  

. Difficult requirements
. Numerous transformations
. Complex logic regarding business

These are the three most important features of complex mapping.

Which option helps in finding whether the mapping is correct or not?  

The debugging option (Debugger) helps in judging whether the mapping is correct or not without having to run a full session load.

What do you mean by OLAP? 

OLAP (On-Line Analytical Processing) is the method by which multi-dimensional analysis is performed.

Mention the different types of OLAP?

The different types of OLAP are ROLAP, MOLAP, and HOLAP.

What is the meaning of surrogate key? 

A surrogate key is a substitute for the natural primary key. It provides a distinct identity for each row, independent of the business data it describes.

What is a session task?

When the Power Centre Server transfers data from the source to the target, it is guided by a set of instructions known as the session task.

What is the meaning of command task?  

A command task allows one or more shell commands in UNIX, or DOS commands in Windows, to run while the workflow is running.

What is the meaning of standalone command task?

The type of command task that allows the shell commands to run anywhere during the workflow is known as the standalone task.

Define workflow?

The workflow includes a set of instructions which allows the server to communicate for the implementation of tasks.

How many tools are there in workflow manager?  

There are three tools –

. Task Developer
. Worklet Designer
. Workflow Designer

Define target load order?

Target load order is based on the source qualifiers in a mapping. When multiple source qualifiers are linked to multiple targets, the target load order defines the order in which the targets are loaded.

Define Power Centre repository of Informatica?

The Informatica Power Centre repository holds metadata such as –

. Source Definition
. Session and session logs
. Workflow
. Target Definition
. Mapping
. ODBC Connection

Two repositories are as follows –

. Global Repositories
. Local Repositories

Extraction, Transformation, and Loading (ETL) of the above-mentioned metadata are performed through the Power Centre Repository.

Name the scenario in which Informatica server reject files?

When rows are rejected by the Update Strategy transformation, the server writes them to reject files. The database containing the information and data can also get disrupted. This is a rare scenario.

How to use Normalizer Transformation in Informatica?

  1. It is an active T/R that reads data from COBOL files and VSAM sources (Virtual Storage Access Method).
  2. The Normalizer T/R acts like a Source Qualifier T/R while reading data from COBOL files.
  3. The Normalizer T/R converts each input record into multiple output records. This is known as data pivoting.

What are the Limitations of Pushdown Optimization?

  1. Rank T/R cannot be pushed
  2. Transaction control T/R
  3. Sorted aggregation.

Procedure:

. Design a mapping with filter, rank and expression T/R.

. Create a session –> Double click the session select properties tab.

Attribute: Pushdown Optimization
Value: Full

. Select the mapping tab –> set reader, writer connection with target load type normal.

. Click apply –> click ok  –> save the session.

. Create & start workflow.

Pushdown Optimization Viewer:-

Double click the session –> Select the mapping tab from left window –> select pushdown optimization.

Differences between Copy and Shortcut?

Copy Vs ShortCut
Copy Shortcut
. Copy an object to another folder . Dynamic link to an object in the folder
. Changes to the original object are not reflected . Changes to the original object are reflected dynamically
. Duplicates the space . Preserves the space
. Created from unshared folders . Created from shared folders

How to use the PMCMD Utility Command?

  1. It is a command based client program that communicates with integration service to perform some of the tasks which can also be performed using workflow manager client.
  2. Using PMCMD we can perform the following tasks:
    • Starting workflow.
    • Scheduling workflow.
  3. The PMCMD can be operated in two different modes:
    • Interactive Mode.
    • Command line Mode.
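
A typical command-line invocation might look like the following (the service, domain, user, folder, and workflow names are placeholders):

pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u admin -p admin_pwd -f SALES_FOLDER wf_load_sales

In interactive mode, you first start pmcmd, connect to the Integration Service with the connect command, and then issue commands such as startworkflow.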

 Scheduling a Workflow?

A schedule is an automation of running the workflow at a given date and time.

. There are two types of schedulers:

(i) Reusable scheduler

(ii) Non Reusable scheduler

(i) Reusable scheduler:-

A reusable scheduler can be assigned to multiple workflows.

(ii) Non Reusable scheduler:-

  • A non reusable scheduler is created specific to the workflow.
  • A non reusable scheduler can be converted into a reusable scheduler.

 

The following are third-party schedulers:

. Cron (Unix based scheduling process)

. Tivoli

. Control M

. Autosys

. Tidal

. WLM (Workload Manager)

– Scheduling is usually handled by the production support team.

– Without a scheduler, workflows are run manually; running workflows automatically through a scheduler is called auto running.

What is Dynamic Lookup Cache?

The cache updates or changes dynamically during the session when the lookup is performed on the target table.

  1. The dynamic lookup T/R allows for the synchronization of the target lookup table image in the memory with its physical table in the database.
  2. The dynamic lookup T/R or dynamic lookup cache is operated in only connected mode (connected lookup )
  3. Dynamic lookup cache support only equality conditions (=conditions)

 

NewLookupRow value and its description:

0 – The integration service does not update or insert the row in the cache.
1 – The integration service inserts the row into the cache.
2 – The integration service updates the row in the cache.

How to use PowerCenter Command Line in Informatica?

The transformation language provides two comment specifiers to let you insert comments in expression:

– Two dashes (--)

– Two slashes (//)

The PowerCenter Integration Service ignores all text on a line that follows one of these comment specifiers.
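
For example, in an Expression transformation port you might write (the port name and threshold are illustrative):

IIF(SALARY > 50000, 'HIGH', 'LOW')  -- flag high earners; text after the dashes is ignored
// this entire line is treated as a comment and skipped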

Differences between variable port and Mapping variable?

 

Variable Port Vs Mapping Variable
Variable Port Mapping Variable
. Local to the T/R . Local to the Mapping
. Values are non-persistent . Values are persistent
. Can’t be used with SQL override . Can be used with SQL override

 

  • Mapping variables are used for incremental extraction.
  • A mapping variable's value does not need to be changed manually; the Integration Service updates it automatically.
  • A mapping parameter's value (for example, a date and time) has to be changed manually.

Which is the T/R that builds only a single cache memory?

Rank builds two caches (index and data), but Sorter always builds only one cache.

Cache is also called a buffer.

What is XML Source Qualifier Transformation in Informatica?

  1. Reads the data from XML files.
  2. XML source definition associates with XML source Qualifier.
  3. XML files are case sensitive markup language.
  4. Files are saved with an extension .XML.
  5. XML files are hierarchical (or) parent child relationship file formats.
  6. Files can be normalized or denormalized.

What is Load Order?

Design mapping applications so that data is loaded into the dimension tables first and then into the fact table.

  • Load Rule:- If all dimension table loadings are success then load the data into fact table.
  • Load Frequency:- The database gets refreshed with daily, weekly, and monthly loads.

What is Snowflake Schema?

A large denormalized dimension table is split into multiple normalized dimension tables.

Advantage:
Select Query performance increases.

Disadvantage:
Maintenance cost increases due to the larger number of tables.

Stand alone Email task?

  1. It can be used anywhere in the workflow; combined with link conditions, it notifies the success or failure of prior tasks.
  2. Visible in Flow Diagram.
  3. Email Variables can be defined with stand alone email tasks.

What is Mapping Debugger?

  • The Debugger is a tool used to verify whether records are loaded and whether correct data is passed from one T/R to another.
  • If a session succeeds but records are not loaded, the Debugger tool is used to investigate.

What is the functionality of the F shortcut key in Informatica?

–> Next Instance (a Debugger command that steps to the next transformation instance).

Which T/R can be configured with no cache?

The Lookup T/R.

Note:- The wait option is not available in every task; it is available only in the Event Wait task.
– F  –>  Start Debugger.
– The Debugger is used to test whether the records are loaded or not and whether the correct data is loaded or not.
– The Debugger can be used only to test a valid mapping, not an invalid mapping.

What is Worklet and types of worklets?

  1. A worklet is defined as group of related tasks.
  2. There are two types of worklets:
    • Reusable worklet
    • Non-Reusable worklet
  3. Worklet expands and executes the tasks inside the workflow.
  4. A workflow which contains the worklet is known as Parent Workflow.

(a) Reusable Worklet:-

Created using worklet designer tool.
Can be assigned to Multiple workflows.

(b) Non-Reusable Worklet:-

Created using workflow designer tool.
Created Specific to workflow.

What is Relative Mode?

Relative mode is what we commonly use in real-time projects.

Relative Time: The Timer task can start the timer from the start time of the Timer task itself, the start time of the workflow or worklet, or the start time of the parent workflow.

– The Timer task is mainly used for scheduling within a workflow.
– Absolute mode: the Timer task starts the next task at a fixed clock time.
– Relative mode: the next task starts a set number of minutes after the workflow (or Timer task) starts.

Difference between Filter and Router T/R?

Filter T/R Vs Router T/R
Filter T/R Router T/R
. Single condition . Multiple conditions
. Single Target . Multiple Targets
. Rejected rows cannot be captured . Default group captures rejected rows.

What is a Repository Manager?

It is a GUI-based administrative client which allows you to perform the following administrative tasks:

  1. Create, edit and delete folders.
  2. Assign users to access the folders with read, write and execute permissions.
  3. Backup and Restore repository objects.

What is Rank Transformation in Informatica?

  1. This is an active T/R which allows you to find out either the top performers or the bottom performers.
  2. Rank T/R is created with the following types of the port:
  • i. Input Port (I)
  • ii. Output Port (O)
  • iii. Rank Port (R)
  • iv. Variable Port (V)

What is meant by Informatica PowerCenter Architecture?

The following components get installed:

  • Power Center Clients
  • Power Center Repository.
  • Power Center Domain.
  • Power Center Repository Service  (PCRS)
  • Power Center Integration Service (PCIS)
  • Informatica administrator.

Mapping is nothing but ETL Application.

What is Workflow Monitor?

  1. It is a GUI-based client application which allows users to monitor ETL objects running on the ETL server.
  2. Collect runtime statistics such as:
  1. No. of records extracted.
  2. No. of records loaded.
  3. No. of records rejected.
  4. Fetch session log
  5. Throughput
  • Complete information can be accessed from workflow monitor.
  • For every session one log file is created.

If informatica have own scheduler why using third party scheduler?

The client uses various applications (for example, mainframes and Oracle Apps use the Tivoli scheduling tool); integrating those different applications and scheduling them is much easier with a third-party scheduler.

What is Workflow Manager?

It is a GUI based client which allows you to create following ETL objects.

. Session

. Workflow

. Scheduler.

Session:

  • A session is a task that executes mapping.
  • A session is created for each Mapping.
  • A session is created to provide runtime properties.
  • A session is a set of instructions that tells ETL server to move the data from source to destination.

Workflow:

Workflow is a set of instructions that tells how to run the session tasks and when to run the session tasks.

What is Informatica PowerCenter?

A data integration tool which combines data from multiple OLTP source systems, transforms it into a homogeneous format, and delivers the data throughout the enterprise at any speed.

It is a GUI-based ETL product from Informatica Corporation, which was founded in Redwood City, California.

Informatica Corporation has many products, including:

. Informatica Analyzer

. Lifecycle Management

. Master Data Management

Informatica PowerCenter is one of these products; it is used to perform extraction, transformation, and loading.

What is a Dimensional Model?

  1. Data Modeling:- It is the process of designing the database to fulfil the business requirement specifications.
  2. A data modeler (or database architect) designs the warehouse database using a GUI-based data modeling tool called “ERwin”.
  3. ERwin is a data modeling tool from Computer Associates (CA).
  4. Dimensional modeling consists of the following types of schemas designed for a data warehouse:
    • Star Schema.
    • Snowflake Schema.
    • Galaxy Schema.
  5. A schema is a data model which consists of one or more tables.

How does Rank transformation handle string values?

Rank transformation can return the strings at the top or the bottom of a session sort order. When the Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort order associated with the Code Page of IS which may be French, German, etc. When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.

What are the advantages of Informatica over other ETL tools?

Informatica is the world’s most popular data integration tool. It interoperates with the widest range of different standards, systems, and applications; it’s fast; and it is designed to adapt to the constant change in the field/market, the organization, and the system. Also, you can easily monitor jobs, and it’s easy to identify the causes of failed jobs.

What are the main components of Informatica?

The main components of Informatica are the client tools/applications, the server, the repository server, and the repository.

What can Informatica be used for in an organization?

Informatica can be used for data migration—for example, a company is transitioning from an older mainframe system to a new database system; data warehousing—an ETL tool would be needed for moving data from the production system to the warehouse; data integration—incorporating data from multiple databases or file-based systems, for example; and cleaning up data.

What is an enterprise data warehouse?

An enterprise data warehouse is a single unified database that holds an organization’s business information and distributes it throughout the company. There are variations, but it likely includes a unified approach to sorting and presenting data, and data can be classified and accessed according to the subject.

Describe an Informatica workflow.

In the workflow manager, you build a workflow by logically connecting tasks to execute code (for example, scripts). The final workflow will automatically run all the tasks within it in the specified order.

What is the domain?

A domain is composed of relationships and nodes covered by one organizational point.

What are some of the types of transformation?

Some transformation types are aggregator, expression, filter, joiner, lookup, rank, router, and normalizer.

What’s the difference between active and passive transformation?

An active transformation can change the number of rows that pass through it, can change the transaction boundary and can change the actual row type. A passive transformation doesn’t change either the number of rows that pass through it or the row type and doesn’t change the transaction boundary.

Why might router transformation be better than filter transformation?

With router transformation, you can have better performance, and it’s less complex and more efficient than filter transformation.

Why would you want to partition a session?

It improves the server's efficiency; the partitions are processed in parallel.

What’s the difference between a mapping parameter and a mapping variable?

Mapping variables, as the name implies, are values that change during a session’s execution. Values that don’t change are called parameters.

How would you self-join in an Informatica mapping?

To self-join, route the same source through two branches and place at least one transformation between the source qualifier and the Joiner in at least one branch. You must pre-sort the data and then configure the Joiner to accept sorted input.

What are the different join types within a joiner transformation?

There are four join types: normal join, master outer join, detail outer join and full outer join.

What are the different dimensions in Informatica?

Three dimensions are available in Informatica: junk, degenerate, and conformed.

What is the difference between a session and a batch?

A session is a set of commands by which the server moves data to the target. A batch is a set of individual tasks.

How many sessions can be grouped in a batch?

There is no limit to the number of sessions that can comprise a batch. But the fewer the sessions, the easier the migration.

Describe the modes of data movement.

In the normal mode of data movement, a separate DML statement is prepared and executed for each record. In bulk mode, a DML statement is prepared and executed for multiple records at a time, improving efficiency.

What is the aggregator cache used for?

It stores the transitional values held in the local buffer memory and writes transformation values to cache files if extra memory is required.

What is the persistent lookup cache?

This data is stored on the server, saving time because a database query doesn’t need to happen when a lookup occurs.

What are Mapplets?

In the Mapplet Designer, you create mapplets, which are reusable objects that contain a set of transformations.

Describe the differences between a mapplet and a mapping.

Mapplets contain only transformations, can be reused and are developed for complex calculations. Mappings include source, target, and transformations; they are not reusable; and are used for less complex calculations than mapplets, such as for what data to move to a target.

How does a pipeline partition improve performance?

A pipeline partition lets you divide a pipeline into different reader/transformation/writer threads. The integration service can run the different partitions within the mapping at the same time, increasing efficiency.

What are some other types of partitioning aside from pipeline partitioning?

Other types of partitioning include database partitioning, round-robin partitioning, key-range partitioning, pass-through partitioning, hash user-keys partitioning and hash auto-keys partitioning.

Describe the differences between an SQL override and a lookup override.

When you want to limit the number of rows entering a mapping pipeline, you’d use an SQL override. When you want to limit the number of lookup rows to avoid scanning an entire table, you’d use the lookup override. Lookup override provides only one record even if multiple records for a condition exist. Also, SQL override doesn’t use the “order by” clause—you have to manually enter it in the query.
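
As a minimal sketch (the table and column names are hypothetical), a Source Qualifier SQL override that limits the rows entering the pipeline could look like:

SELECT CUSTOMER_ID, CUSTOMER_NAME, STATUS
FROM CUSTOMERS
WHERE STATUS = 'ACTIVE'
ORDER BY CUSTOMER_ID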

What are the configurable commit types?

There are three configurable commit types: target-based, source-based and user-defined.

What is INFORMATICA? Why do we need it?

INFORMATICA is a software development firm which offers data integration solutions for ETL, data virtualization, master data management, data quality, data replication, ultra messaging, etc.

Some of the popular INFORMATICA products are:

  • INFORMATICA PowerCenter
  • INFORMATICA PowerConnect
  • INFORMATICA Power Mart
  • INFORMATICA Power Exchange
  • INFORMATICA Power Analysis
  • INFORMATICA Power Quality

We need INFORMATICA while working with data systems that contain data on which operations are to be performed along with a set of rules. It facilitates operations like cleaning and modifying data from structured and unstructured data systems.

What is the format of INFORMATICA objects in a repository? What are the databases that it can connect to Windows?

INFORMATICA objects can be written in XML format.

Following is the list of databases that it can connect to:

  • SQL Server
  • Oracle
  • MS Access
  • MS Excel
  • DB2
  • Sybase
  • Teradata

What is INFORMATICA PowerCenter?

It is an ETL/Data Integration tool that is used to connect and retrieve data from different sources and data processing. PowerCenter processes a high volume of data and supports data retrieval from ERP sources such as SAP, PeopleSoft, etc.

You can connect PowerCenter to database management systems like SQL and Oracle to integrate data into the third system.

Which are the different editions of INFORMATICA PowerCenter that are available?

Different editions of INFORMATICA PowerCenter are:

  • Standard Edition
  • Advanced Edition
  • Premium Edition

The current version of PowerCenter comes with a significant performance increase over earlier releases.

How can you differentiate between PowerCenter and PowerMart?

Given below are the differences between PowerCenter and PowerMart.

INFORMATICA PowerCenter vs INFORMATICA PowerMart
. PowerCenter processes high volumes of data; PowerMart processes low volumes of data.
. PowerCenter supports global and local repositories; PowerMart supports only local repositories.
. PowerCenter supports data retrieval from ERP sources such as SAP and PeopleSoft; PowerMart does not.
. PowerCenter can convert local repositories to global; PowerMart cannot.

What are the different components of PowerCenter?

Given below are the important components of PowerCenter:

  • PowerCenter Service
  • PowerCenter Clients
  • PowerCenter Repository
  • PowerCenter Domain
  • Repository Service
  • Integration Service
  • PowerCenter Administration Console
  • Web Service Hub

What are the different clients of PowerCenter?

Here is the list of PowerCenter clients:

  • PowerCenter designer
  • PowerCenter workflow monitor
  • PowerCenter workflow manager
  • PowerCenter repository manager

What is INFORMATICA PowerCenter Repository?

PowerCenter Repository is a relational database or a system database that contains metadata such as,

  • Source definition
  • Target definition
  • Session and Session logs
  • Workflow
  • ODBC connection
  • Mapping

There are two types of Repositories:

  • Global Repositories
  • Local Repositories

PowerCenter Repository is required to perform Extraction, Transformation, and Loading(ETL) based on metadata.

How to elaborate Tracing Level?

Tracing level can be defined as the amount of information that the server writes in the log file. It can be configured at the transformation level, at the session level, or at both.

Given below are the types of tracing level:

  • None
  • Terse
  • Verbose Initialization
  • Verbose Data

How to elaborate PowerCenter integration service?

Integration services control the workflow and execution of PowerCenter processes.

There are three components of the INFORMATICA Integration Service.

Integration Service Process: It is called pmserver; the Integration Service can start more than one process to run and monitor workflows.

Load Balancing: Load Balancing refers to distributing the entire workload across several nodes in the grid. A load balancer conducts different tasks that include commands, sessions, etc.

Data Transformation Manager (DTM): The DTM allows the following types of data transformations to be performed:

  • Active: To change the number of rows in the output.
  • Passive: Cannot change the number of rows in the output.
  • Connected: Link to the other transformation.
  • Unconnected: No link to other transformations.

 

What is PowerCenter on Grid?

INFORMATICA has a Grid computing feature which can be utilized for large-scale data scalability and better performance. The grid feature is used for load balancing and parallel processing.

PowerCenter domains contain a set of multiple nodes to configure the workload and then run it on the Grid.

A domain is a foundation for efficient service administration served by the PowerCenter.

Node is an independent physical machine that is logically represented for running the PowerCenter environment.

What is Enterprise Data Warehousing?

When a large amount of data is assembled at a single access point then it is called Enterprise Data Warehousing. This data can be reused and analyzed at regular intervals or as per the need of the time requirement.

Considered as the central database or say a single point of access, enterprise data warehousing provides a complete global view and thus helps in decision support.

It can be more understood from the following points which define its features:

  • All important business information stored in this unified database can be accessed from anywhere across the organization.
  • Although the time required is more, periodic analysis on this single source always produces better results.
  • Security and integrity of data are never compromised while making it accessible across the organization.

What is the benefit of Session Partitioning?

While the Integration Service is running in the environment, the workflow/session is partitioned for better performance. These partitions are then used to perform Extraction, Transformation, and Loading in parallel.

How can we create an Index after completion of the Load Process?

Command tasks are used to create an Index. Command task scripts can be used in a session of the workflow to create an index.

What are Sessions?

A session is a set of instructions used while moving data from the source to the destination. We can partition a session to process the data in parallel and improve server performance.

After creating a session we can use the server manager or command-line program pmcmd to stop or start the session.

How can we use Batches?

Batches are collections of sessions used to migrate data from the source to the target on a server. A batch can hold a large number of sessions, but that causes more network traffic; a batch with fewer sessions can be moved more rapidly.

What is Mapping?

Mapping is a collection of source and targets which are linked with each other through certain sets of transformations such as Expression Transformation, Sorter Transformation, Aggregator Transformation, Router Transformation, etc.

What is Router Transformation?

Router transformation is used to filter the source data. You can use router transformation to split a single data source into multiple groups.

It is much like Filter transformation, but the difference is that a filter transformation uses only one condition and drops the rows that do not fulfil it, whereas a router transformation uses multiple conditions and routes each row to every group whose condition it matches; rows that match no condition are passed to the default group.

What is Rank Transformation?

Rank transformation is Active as well as Connected. It is used to sort and rank a set of records either top or bottom. It is also used to select data with the largest or smallest numeric value based on a specific port.

What is Rank Index in Rank transformation?

The Rank Index port is created by the Designer for each Rank transformation and stores the ranking position for each row. The Rank transformation identifies each row from the top to the bottom and then assigns the rank index value.

What is Status Code in INFORMATICA?

A status code provides an error-handling mechanism during each session. The status code is issued by the stored procedure to indicate whether it committed successfully, and it gives the INFORMATICA server the information needed to decide whether the session has to be stopped or continued.

What are Junk Dimensions?

Junk dimension is a structure that consists of a group of some junk attributes such as random codes or flags. It forms a framework to store related codes with respect to a specific dimension at a single place instead of creating multiple tables for the same.

What is Mapplet in Informatica?

Mapplet is a reusable object that contains a certain set of rules for transformation and transformation logic that can be used in multiple mappings. Mapplet is created in the Mapplet Designer in the designer tool.

What is Decode in Informatica?

To understand Decode, let’s consider it as similar to the CASE statement in SQL. It is basically the function that is used by an expression transformation in order to search a specific value in a record.

There can be unlimited searches within the Decode function where a port is specified for returning result values. This function is usually used in cases where it is required to replace nested IF statements or to replace lookup values by searching in small tables with constant values.

Decode is a function that is used within Expression transformation. It is used just like the CASE statement in SQL to search a specific record.

Below is a simple example of a CASE in SQL (the age thresholds shown are purely illustrative):

Syntax:

SELECT EMPLOYEE_ID,
CASE
    WHEN EMPLOYEE_AGE <= 30 THEN 'Young'
    WHEN EMPLOYEE_AGE > 30 AND EMPLOYEE_AGE <= 50 THEN 'Knowledgeable'
    WHEN EMPLOYEE_AGE > 50 AND EMPLOYEE_AGE <= 60 THEN 'Wise'
    ELSE 'Very Wise'
END AS EMPLOYEE_WISDOM
FROM EMPLOYEE
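
Inside an Expression transformation, roughly the same logic could be written with DECODE as follows (the port name and age thresholds are illustrative):

DECODE( TRUE,
        EMPLOYEE_AGE <= 30, 'Young',
        EMPLOYEE_AGE <= 50, 'Knowledgeable',
        EMPLOYEE_AGE <= 60, 'Wise',
        'Very Wise' )

Passing TRUE as the first argument makes DECODE evaluate each condition in order and return the result of the first condition that is true.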

What is Joiner Transformation in INFORMATICA?

With the help of Joiner transformation, you can make use of Joins in INFORMATICA.

It is based on two sources namely:

  • Master source
  • Detail source

Following joins can be created using Joiner transformation as in SQL.

  • Normal Join
  • Full Outer Join
  • Master Outer Join(Right Outer Join)
  • Detail Outer Join(Left Outer Join)

What is Aggregator Transformation in INFORMATICA?

Aggregator transformation is an active and connected transformation. It works like the GROUP BY clause in SQL and is used to perform aggregate calculations on groups in INFORMATICA PowerCenter, using aggregate functions such as SUM, AVG, MAX, and MIN.
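
As a rough SQL analogy (the table and column names are made up), an Aggregator that groups by DEPT_ID and sums SALARY behaves like:

SELECT DEPT_ID, SUM(SALARY) AS TOTAL_SALARY
FROM EMPLOYEE
GROUP BY DEPT_ID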

What is Sequence Generator Transformation in INFORMATICA?

Sequence Generator transformation is a passive and connected transformation. Its basic use is to generate integer values through the NEXTVAL and CURRVAL ports.

What is Union Transformation in INFORMATICA?

Union transformation is used to combine data from different sources or pipelines that have the same ports and data types and merge it into a single pipeline. It works much like the UNION ALL clause in SQL.

What is Source Qualifier Transformation in INFORMATICA?

Source Qualifier transformation is used in mapping and is created automatically whenever we add a relational source or flat file. It is an active and connected transformation that represents the rows read by the integration service.

What is INFORMATICA Worklet?

A Worklet works like a Mapplet with the same reusability benefit; the difference is that a Worklet groups tasks at the workflow level and can be applied to any number of workflows in INFORMATICA. A Worklet saves the logic and tasks in a single place for reuse.

Worklet is much similar to the Mapplet and is defined as the group of tasks that can be either reusable or non-reusable at the workflow level. It can be added to as many workflows as required. With its reusability feature, much time is saved as reusable logic can be developed once and can be placed from where it can be reused.

In the INFORMATICA PowerCenter environment, Mapplets are considered as the most advantageous feature. They are created in Mapplet designers and are a part of the Designer tool.

It basically contains a set of transformations that are designed to be reused in multiple mapping.

Mapplets are said to be reusable objects which simplify mapping by:

  • Including multiple transformations and source definitions.
  • Not required to connect to all input and output ports.
  • Accept data from sources and pass to multiple transformations

Well, overall, when it is required to reuse the mapping logic then the logic should be placed in Mapplet.

What is SUBSTR in INFORMATICA?

SUBSTR is a function that extracts or removes a set of characters from a larger character set.

Syntax: SUBSTR( string, start [,length] )

Where,
string defines the character that we want to search.
start is an integer that is used to set the position where the counting should get started.
Length is an optional parameter that is used to count the length of a string to return from its starting position.

For example, SUBSTR(CONTACT, 5, 8) (the values are shown purely for illustration) starts at the 5th character of CONTACT and returns the next 8 characters.

What is Code Page Compatibility?

When data is transferred from the source code page to the target code page then all the characteristics of the source page must be present in the target page to prevent data loss, this feature is called Code Page Compatibility.     

Code page compatibility comes into picture when the INFORMATICA server is running in Unicode data movement mode. In this case, the two code pages are said to be identical when their encoded characters are virtually identical and thus results in no loss of data.

For complete accuracy, it is said that the source code page is the subset of the target code page.

How you can differentiate between Connected LookUp and Unconnected LookUp?

Connected Lookup is part of the data flow which is connected to another transformation, it takes data input directly from another transformation that performs a lookup. It uses both static and dynamic cache.

Unconnected Lookup does not take its data input from another transformation, but it can be called as a function from any transformation using an :LKP (LookUp) expression. It uses only the static cache.
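
A minimal sketch of calling an unconnected lookup from an Expression transformation (the lookup and port names are hypothetical):

:LKP.LKP_GET_CUSTOMER_NAME(CUSTOMER_ID)

The expression passes CUSTOMER_ID as the lookup input and receives the single value defined on the lookup's return port.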

What is Incremental Aggregation?

Incremental aggregation is enabled at the session level. Instead of re-aggregating the entire source on every run, the Integration Service applies only the new or changed source data to the previously saved aggregate values, which keeps target loads small when the source changes are small.

For example, suppose the source contains CUSTOMER_NO, BILL_NO, AMOUNT, and DATE columns.

On the first load, the Integration Service aggregates all the source rows and writes the results (CUSTOMER_NO, BILL_NO, LOAD_KEY, AMOUNT) to the target and to the aggregation cache.

On the second load, it aggregates only the data for the next session date: rows belonging to groups that received new data cause the cache file to be updated after aggregation, while groups with no new rows remain unchanged.

What is a Surrogate Key?

A surrogate key is a sequentially generated integer value which is used as another substitute or replacement for the primary key which is required as a unique identification of each row in a table.

A natural primary key may change over time as business needs change, which makes future updates more difficult; the surrogate key is the solution to this problem.

What is the Session task and Command task?

Session task is a set of instructions that are to be applied while transferring data from source to target using session command. Session command can be either pre-session command or post-session command.

Command task is a specific task that allows one or more UNIX shell commands (or DOS commands in Windows) to run during the workflow.

What is the Standalone command task?

The standalone command task can be used to run Shell command anywhere and anytime in the workflow.

 What is Workflow? What are the components of the Workflow Manager?

A workflow defines the manner in which tasks should be implemented. It is a collection of instructions that informs the server about how to implement the tasks.

Given below are the three major components of the Workflow Manager:

  • Task Developer
  • Worklet Designer
  • Workflow Designer

What is the Event and what are the tasks related to it?

The event can be any action or function that occurs in the workflow.

There are two tasks related to it, which includes:

  • Event Wait Task: This task waits until an event occurs, once the event is triggered this task gets accomplished and assigns the next task.
  • Events Raise Task: Event Raise task triggers the specific event in the workflow.

 What is a pre-defined event and User-defined event?

Predefined events are system-defined events that wait until the arrival of a specific file at a specific location. It is also called as File-Watcher event.

User-Defined events are created by the user to raise anytime in the workflow once created.

What is the Target Designer and Target Load Order?

Target Designer is used for defining the Target of data.

When there are multiple sources or a single source with multiple partitions linked to different targets through the INFORMATICA server then the server uses Target Load Order to define the order in which the data is to be loaded at a target.

What is the Staging Area?

The staging area is a database where temporary tables connected to the work area, or fact tables, are stored to provide inputs for data processing.

What is the difference between STOP and ABORT?

Differences are as follows:

  • STOP command runs on Session task, once it is raised, the integration service stops only reading the data in the data source but continues processing and writing it to the target.
  • ABORT command is used to completely stop the integration service from reading, processing and writing data to the target. It has a timeout period of 60 seconds in which the integration service can finish processing and writing data; if it does not finish in that time, it simply kills the session.

What are the different LookUp caches?

Lookup can be either cached or uncached. Basically, it is divided into five parts.

They are:

  • Static Cache
  • Dynamic Cache
  • Recache
  • Persistent Cache
  • Shared Cache

Static cache remains as it is without change while a session is running.

Dynamic Cache keeps updating frequently while a session is running.

How to update Source Definition?

There are two ways to update the source definition.

They are:

  • You can edit the existing source definition.
  • You can import a new source from the database.

How to implement Security measures using a Repository manager?

There are three ways to implement security measures.

They are:

  • Folder permission within owners, groups, and users.
  • Locking (Read, Write, Retrieve, Save and Execute).
  • Repository Privileges viz.
    • Browse Repository.
    • Use the Workflow Manager (to create sessions and batches and set its properties).
    • Workflow Operator (to execute Session and batches).
    • Use Designer, Admin Repository (allows any user to create and manage repository).
    • Admin User (allows the user to create a repository server and set its properties).
    • SuperUser (all the privileges are granted to the user).

Enlist the advantages of INFORMATICA.

Being considered as the most favored Data Integration tool, there are multiple advantages that need to be enlisted.

They are:

  • It can effectively and very efficiently communicate and transform the data between different data sources like Mainframe, RDBMS, etc.
  • It is usually faster, more robust, and easier to learn than other available platforms.
  • With the help of INFORMATICA Workflow Monitor, jobs can be easily monitored, failed jobs can be recovered as well as slow running jobs can be pointed out.
  • It has features like easy processing of database information, data validation, migration of projects from one database to another, project development, iteration, etc.

Enlist a few areas or real-time situations where INFORMATICA is required.

Data Warehousing, Data Integration, Data migration & Application Migration from one platform to other platforms are a few examples of real-time usage areas.

Explain the ETL program with few examples.

Known for its uniqueness, an ETL tool stands for Extract, Transform and Load: it extracts data from sources and, after altering it as defined, sends it to the target.

To be very precise:

  • The extraction task is to collect the data from sources like the database, files, etc.
  • Transformation is considered as altering the data that has been received from the source.
  • Loading defines the process of feeding the altered data to the defined target.

To understand in a technical way, the ETL tool collects data from heterogeneous sources and alters to make it homogeneous so that it can be used further for analysis of the defined task.

Some basic program examples include:

  • Mappings define the ETL process of reading data from the original sources; the mapping process is done in the Designer.
  • Workflows consist of multiple tasks that are decided and designed in the Workflow Manager Window.
  • The task consists of a set of multiple steps that determine the sequence of actions to be performed during run-time.

Enlist the differences between Database and Data Warehouse.

Refer to the below table to understand the differences between the two:

Database Data Warehouse
It stores/records current and up-to-date data which is used in daily operations. It stores/analyzes historical data which is used for information support on a long-term basis.
Its orientation is on Online Transactional Processing, which includes simple and short transactions. Its orientation is on Online Analytical Processing, which includes complex queries.
It consists of detailed and primitive data, where its view is flat relational. It consists of summarized and consolidated data, where its view is multidimensional.
Low performance is observed for Analytical queries. Analytical queries are judged here as high performance.
Efficiency is determined by measuring transaction throughput. Efficiency is determined by measuring query throughput and response time.

Explain the features of the Connected and Unconnected lookup.

The features of Connected Lookup can be explained as follows:

  • There is a direct source of input from the pipeline for connected lookup.
  • It has active participation in data flow and both dynamic, as well as static cache, is used as the case is.
  • It caches all lookup columns and returns the default values as the output when the lookup condition does not match.
  • More than one column values can be returned to the output port.
  • Multiple output values are passed as well as output ports are linked to another transformation.
  • Connected lookup supports user-defined default values.

The features of unconnected lookup can be explained as follows:

  • Unconnected lookup uses static cache and its source of input is the result received from the output of LKP expression.
  • It caches only the lookup output ports and returns the value as NULL when the lookup condition does not match.
  • Only one column is returned from each port.
  • Only one output value is passed to another transformation.
  • User-defined default values are not supported by unconnected lookup.

During the running session, output files are created by the Informatica server. Enlist a few of them.

Mentioned below are the few output files:

  • Cache files: These files are created at the time of memory cache creation. For circumstances like Lookup transformation, Aggregator transformation, etc index and data cache files are created by the Informatica server.
  • Session detail file: As the name defines, this file contains load statistics like table name, rows rejected or written for each target in mapping and can be viewed in the monitor window.
  • Performance detail file: This file is a part of the session property sheet and contains session performance information in order to determine improvement areas.
  • INFORMATICA server log: The server creates a log for all status and error messages and can be seen in the home directory.
  • Session log file: For each session, the server creates a session log file depending on the set tracing level. The information that can be seen in log files about sessions can be:
    • Session initialization process,
    • SQL commands creation for reader and writer threads,
    • List of errors encountered and
    • Load summary
  • Post-session email: This helps in communicating the information about the session (session completed/session failed) to the desired recipients automatically.
  • Reject file: This file contains information about the data that has not been used/written to targets.
  • Control file: In case, when the session uses the external loader, the control file consists of loading instructions and data format about the target file.
  • Indicator file: This file basically contains a number that highlights the rows marked for INSERT/UPDATE/DELETE or REJECT.
  • Output file: The output file is created based on the file properties.

How to differentiate between the Active and Passive transformations?

To understand the difference between Active and Passive transformations, let us see its features which will explain the differences in a better way.

The action performed by Active transformations includes:

  • Changing the number of rows as per the requirement, that passes through the transformations. For Example, Filter transformation that deletes the row that does not meet the condition.
  • Changing the transaction boundary by setting the rollback and commit points. For Example, Transaction control transformation.
  • Changing the row type for INSERT/ UPDATE/DELETE or REJECT.

The action performed by Passive transformations includes:

  • The number of rows passing through the transformation is never changed.
  • The transaction boundary is set.
  • Row type is set.

Enlist the tasks for which Source qualifier transformation is used.

Source qualifier is considered as an active transformation that reads the rows that are involved in integration service within the running session. It determines the way in which the data is fetched from the source and is automatically added while adding a source to mapping.

The list of different tasks where source qualifier is used is as follows:

  • Rows filtering
  • Data sorting
  • Custom query creation
  • Joining tables from the same source
  • Selecting distinct values

Mention a few Power Centre client applications with their basic purpose?

Tasks like session and workflow creation, monitoring workflow progress, designing Mapplets, etc are performed by PowerCenter client applications.

Enlisted below is the list of PowerCenter client applications with their purpose:

  • Repository Manager: It is an administrative tool and its basic purpose is to manage repository folders, objects, groups, etc.
  • Administration Console: Here the service tasks like start/stop, backup/restore, upgrade/delete, etc are performed.
  • Power center designer: The designer consists of various designing tools that serve various purposes. These designing tools are:
    • Source Analyzer
    • Target Designer
    • Transformation Developer
    • Mapplet Designer
    • Mapping Designer
  • Workflow Manager: Its basic purpose is to define a set of instructions/workflow that is required to execute mappings designed in the designer. To help develop a workflow, there are tools available, namely Task developer, Workflow designer, Worklet designer.
  • Workflow Monitor: As the name suggests, the Workflow monitor, monitors the workflow or tasks. The list of windows available are:
    • Navigator window
    • Output window
    • Time window
    • Properties window
    • Task view
    • Gantt chart view

Enlist the various types of Transformations.

The various types of transformations are as follows:

  • Aggregator transformation
  • Expression transformation
  • Normalizer transformation
  • Rank transformation
  • Filter transformation
  • Joiner transformation
  • Lookup transformation
  • Stored procedure transformation
  • Sorter transformation
  • Update strategy transformation
  • XML source qualifier transformation
  • Router transformation
  • Sequence Generator transformation

What is Dynamic Cache?

INFORMATICA lookups can be categorized either as cached or uncached. In the case of Dynamic cache, rows can be inserted or deleted at the time of passing the rows and is synchronized with the target.  The cache memory is refreshed every time after delete/update operations within the session.

What is Static Cache?

Static cache is the one that is neither updated nor refreshed in the session run. It is the default cache and returns the value only when the return condition is true. In other cases, it returns Null value. Insert or Update cache cannot be performed in this case.

Mention a few advantages of Router transformation over Filter transformation.

Router transformation and Filter transformation are the same because both of them use a condition to test and filter the data.

However, the advantages of Router over filter transformation can be understood by the below-mentioned points.

Router Transformation:

  • It allows more than one test condition.
  • Provide the ability to test the same input data on multiple numbers of conditions.
  • In the case of mapping, input data is processed only once by the server and hence performance is improved.
  • Less complex and more efficient.
  • The records that fail the test condition are never blocked instead are passed on to the default group.

 Enlist some properties of sessions.

A session is available in the Workflow Manager and is configured by creating a session task. Within a workflow, there can be multiple sessions, and a session can be either reusable or non-reusable.

Some of the properties of the session are as follows:

  • As per the requirement, session tasks can be run either concurrently or sequentially.
  • A session can be configured to analyze the performance.
  • To create or run a session task, it is required to have general information about Session name, schedule and integration service.
  • Other important property of session includes Session log file, the test load, error handling, commit interval, target properties, etc.

 

So, this brings us to the end of the Informatica Interview Questions blog. This Tecklearn ‘Top Informatica Interview Questions and Answers’ helps you with commonly asked questions if you are looking out for a job in Informatica or Data Warehousing Domain. If you wish to learn Informatica and build a career in Data Warehousing domain, then check out our interactive, Informatica Training, that comes with 24*7 support to guide you throughout your learning period.

https://www.tecklearn.com/course/informatica-training-and-certification/

Informatica Training

About the Course

Tecklearn’s Informatica Training will help you master Data Integration concepts such as ETL and Data Mining using Informatica PowerCenter. It will also make you proficient in Advanced Transformations, Informatica Architecture, Data Migration, Performance Tuning, Installation & Configuration of Informatica PowerCenter. You will get trained in Workflow Informatica, data warehousing, Repository Management and other processes.

Why Should you take Informatica Training?

  • Informatica professionals earn up to $130,000 per year – Indeed.com
  • GE, eBay, PayPal, FedEx, EMC, Siemens, BNY Mellon & other top Fortune 500 companies use Informatica.
  • Key advantages of Informatica PowerCenter: Excellent GUI interfaces for Administration, ETL Design, Job Scheduling, Session monitoring, Debugging, etc.

What you will Learn in this Course?

Informatica PowerCenter 10 – An Overview

  • Informatica & Informatica Product Suite
  • Informatica PowerCenter as ETL Tool
  • Informatica PowerCenter Architecture
  • Component-based development techniques

Data Integration and Data Warehousing Fundamentals

  • Data Integration Concepts
  • Data Profile and Data Quality Management
  • ETL and ETL architecture
  • Brief on Data Warehousing

Informatica Installation and Configuration

  • Configuring the Informatica tool
  • How to install the Informatica operational administration activities and integration services

Informatica PowerCenter Transformations

  • Visualize PowerCenter Client Tools
  • Data Flow
  • Create and Execute Mapping
  • Transformations and their usage
  • Hands On

Informatica PowerCenter Tasks & Workflows

  • Informatica PowerCenter Workflow Manager
  • Reusability and Scheduling in Workflow Manager
  • Workflow Task and job handling
  • Flow within a Workflow
  • Components of Workflow Monitor

Advanced Transformations

  • Look Up Transformation
  • Active and Passive Transformation
  • Joiner Transformation
  • Types of Caches
  • Hands On

More Advanced Transformations – SQL (Pre-SQL and Post-SQL)

  • Load Types – Bulk, Normal
  • Reusable and Non-Reusable Sessions
  • Categories for Transformation
  • Various Types of Transformation – Filter, Expression, Update Strategy, Sorter, Router, XML, HTTP, Transaction Control

Various Types of Transformation – Rank, Union, Stored Procedure

  • Error Handling and Recovery in Informatica
  • High Availability and Failover in Informatica
  • Best Practices in Informatica
  • Debugger
  • Performance Tuning

Performance Tuning, Design Principles & Caches

  • Performance Tuning Methodology
  • Mapping design tips & tricks
  • Caching & Memory Optimization
  • Partition & Pushdown Optimization
  • Design Principles & Best Practices

Informatica PowerCenter Repository Management

  • Repository Manager tool (functionalities, create and delete, migrate components)
  • PowerCenter Repository Maintenance

Informatica Administration & Security

  • Features of PowerCenter 10
  • Overview of the PowerCenter Administration Console
  • Integration and repository service properties
  • Services in the Administration Console (services, handle locks)
  • Users and groups

Command Line Utilities

  • Infacmd, infasetup, pmcmd, pmrep
  • Automate tasks via command-line programs

More Advanced Transformations – XML

  • Java Transformation
  • HTTP Transformation

 

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
