ETL Testing Interview Questions & Answers

1) What is ETL?

ETL stands for extract, transform, and load. It can consolidate the scattered data for any organization while working with different departments. It can very well handle the data coming from different departments.

2) What are the ETL Testing Operations?

• Verify the data is transforming correctly according to business requirements and rules.

• Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.

• Make sure that ETL application appropriately rejects, replaces with default values and reports invalid data.

• Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability

3) What are the key benefits of ETL Testing?

• Minimize the risk of the data loss

• Data security

• Data Accuracy

• Reporting efficiency

4) What is Data warehouse?

A Data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process.

5) What is the difference between Data Warehousing and Data Mining?

Data mining, the operational data is analyzed using statistical techniques and clustering techniques to find the hidden patterns and trends. So, the data mines do some kind of summarization of the data and can be used by data warehouses for faster analytical processing for business intelligence.

Data warehouse may make use of a data mine for analytical processing of the data in a faster way.

6) What are the types of Data Warehouse applications?

• Info processing

• Analytical processing

• Data mining

7) What is real time data-wearhousing?

In real time data-warehousing, the warehouse is updated every time the system performs a transaction.

It reflects the real time business data.

This means that when the query is fired in the warehouse, the state of the business at that time will be returned.

8) Explain the use lookup tables and Aggregate tables?

An aggregate table contains summarized view of data.

Lookup tables, using the primary key of the target, allow updating of records based on the lookup condition.

9) Define slowly changing dimensions (SCD)?

SCD are dimensions whose data changes very slowly.

eg: city or an employee.

This dimension will change very slowly.

The row of this data in the dimension can be either replaced completely without any track of old record OR a new row can be inserted, OR the change can be tracked.

10) What is cube grouping?

A transformer built set of similar cubes is known as cube grouping. They are generally used in creating smaller cubes that are based on the data in the level of dimension.

11) What is data wearhousing?

A data warehouse can be considered as a storage area where relevant data is stored irrespective of the source.

Data warehousing merges data from multiple sources into an easy and complete form.

12) What is virtual Data Wearhousing?

A virtual data warehouse provides a collective view of the completed data. It can be considered as a logical data model of the containing metadata

13) What is data modeling and data mining?

Data Modeling is a technique used to define and analyze the requirements of data that supports organization’s business process. In simple terms, it is used for the analysis of data objects in order to identify the relationships among these data objects in any business.

Data Mining is a technique used to analyze datasets to derive useful insights/information. It is mainly used in retail, consumer goods, telecommunication and financial organizations that have a strong consumer orientation in order to determine the impact on sales, customer satisfaction and profitability.

14) Compare ETL & Manual development?

ETL - The process of extracting data from multiple sources.(ex. flat files, XML, COBOL, SAP etc) is more simpler with the help of tools.

Manual - Loading the data other than flat files and oracle table need more effort.

ETL - High and clear visibility of logic.

Manual - complex and not so user friendly visibility of logic.

ETL - Contains Meta data and changes can be done easily.

Manual - No Meta data concept and changes needs more effort.

ETL- Error hadling,log summary and load progress makes life easier for developer and maintainer.

Manual - need maximum effort from maintenance point of view.

ETL - Can handle Historic data very well.

Manual - as data grows the processing time degrades.

These are some differences b/w manual and ETL development.

15) What are the modules in Power Mart?

1. PowerMart Designer

2. Server

3. Server Manager

4. Repository

5. Repository Manager

16) What are the different Lookup methods used in Informatica?

1. Connected lookup

2. Unconnected lookup

Connected lookup will receive input from the pipeline and sends output to the pipeline and can return any number of values. It does not contain return port.

Unconnected lookup can return only one column. it contain return port.

17) Do we need an ETL tool? When do we go for the tools in the market?

ETL Tool:

It is used to Extract(E) data from multiple source systems(like RDBMS, Flat files, Mainframes, SAP, XML etc) transform(T) them based on Business requirements and Load(L) in target locations.(like tables, files etc).

Need of ETL Tool:

An ETL tool is typically required when data scattered across different systems.(like RDBMS, Flat files, Mainframes, SAP, XML etc).

18) What is the difference between etl tool and olap tools?

ETL tool is ment for extraction data from the legecy systems and load into specified data base with some process of cleansing data.

ex: Informatica,data stage ....etc

OLAP is ment for Reporting purpose in OLAP data available in Multidirectional model. so that you can write simple query to extract data from the data base.

ex: Business objects,Cognos....etc

19) What is data cleaning?

Data cleaning is also known as data scrubbing.

Data cleaning is a process which ensures the set of data is correct and accurate. Data accuracy and consistency, data integration is checked during data cleaning. Data cleaning can be applied for a set of records or multiple sets of data which need to be merged.