1. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process; once the mining is complete, the results can be tested against the isolated data to confirm the model's _______.

A. Validity

B. Security

C. Integrity

D. None of the above

Ans:  A
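
A minimal sketch of this holdout idea in Python; the split ratio, seed, and function names are illustrative assumptions, not part of the question:

```python
import random

def holdout_split(records, holdout_fraction=0.2, seed=42):
    """Set aside a fraction of the data in a 'vault' that the
    mining process never sees; the rest is used for mining."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    vault, mining_set = shuffled[:cut], shuffled[cut:]
    return mining_set, vault

# After building a model on mining_set, measure its accuracy on the
# untouched vault to confirm the model's validity.
```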

 

2. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by _______ tools typical of decision support systems.

A. Introspective

B. Intuitive

C. Reminiscent

D. Retrospective

Ans:   D

 

3. The technique that is used to perform these feats in data mining is called modeling, and this act of model building is something that people have been doing for a long time, certainly before the _______ of computers or data mining technology.

A. Access

B. Advent

C. Ascent

D. Avowal

Ans:   B

 

4. Classification consists of examining the properties of a newly presented observation and assigning it to a predefined ________.

A. Object

B. Container

C. Subject

D. Class

Ans:   D
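
As a toy illustration of classification, the sketch below assigns a newly presented observation to a predefined class using a nearest-neighbour rule; the training data, labels, and the choice of 1-NN are all illustrative assumptions:

```python
import math

# Labeled examples: (features, predefined class)
training = [((1.0, 1.0), "low"), ((8.0, 9.0), "high"), ((9.0, 8.0), "high")]

def classify(new_obs):
    """Examine the properties (features) of a newly presented
    observation and assign it to the predefined class of its
    nearest labeled neighbour."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], new_obs))
    return label

print(classify((7.5, 8.5)))  # -> "high"
```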

 

5. During business hours, most ______ systems should probably not use parallel execution.

A. OLAP

B. DSS

C. Data Mining

D. OLTP

Ans:   D

 

6. In contrast to statistics, data mining is ______ driven.

A. Assumption

B. Knowledge

C. Human

D. Database

Ans:   B

 

7. Data mining derives its name from the similarities between searching for valuable business information in a large database, for example, finding linked products in gigabytes of store scanner data, and mining a mountain for a _________ of valuable ore.

A. Furrow

B. Streak

C. Trough

D. Vein

Ans:   D

 

8. As opposed to the outcome of classification, estimation deals with a __________ valued outcome.

A. Discrete

B. Isolated

C. Continuous

D. Distinct

Ans:  C

 

9. The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The smaller the portion of the program that must be executed __________, the greater the scalability of the computation.

A. In Parallel

B. Distributed

C. Sequentially

D. None of the above

Ans:   C
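
Question 9 (and its restatements below) is Amdahl's law in words. A worked sketch, where the sequential fractions and the 64-processor count are illustrative:

```python
def amdahl_speedup(sequential_fraction, processors):
    """Amdahl's law: the sequential part limits scalability."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)

# The smaller the sequential portion, the greater the scalability:
for s in (0.5, 0.1, 0.01):
    print(f"s={s:>4}: speedup on 64 CPUs = {amdahl_speedup(s, 64):.1f}x")
# s= 0.5: speedup on 64 CPUs = 2.0x
# s= 0.1: speedup on 64 CPUs = 8.8x
# s=0.01: speedup on 64 CPUs = 39.3x
```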

 

10. Data mining evolved as a mechanism to cater for the limitations of ________ systems in dealing with massive data sets with high dimensionality, new data types, multiple heterogeneous data sources, etc.

A. OLTP

B. OLAP

C. DSS

D. DWH

Ans:  A

 

11. The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The ______ the portion of the program that must be executed sequentially, the greater the scalability of the computation.

A. Larger

B. Smaller

C. Unambiguous

D. Superior

Ans:  B

 

12. The goal of ________ is to look at as few blocks as possible to find the matching record(s).

A. Indexing

B. Partitioning

C. Joining

D. None of the above

Ans:  A

 

13. In the nested-loop join case, if there are M rows in the outer table and N rows in the inner table, the time complexity is

A. O(M log N)

B. O(log MN)

C. O(MN)

D. O(M + N)

Ans:  C
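
A minimal Python sketch of why the naive nested-loop join costs O(MN); the table contents are made up:

```python
def nested_loop_join(outer, inner, key):
    """Naive nested-loop join: every outer row is compared with
    every inner row, giving O(M * N) time for M outer and N inner
    rows. An index on the inner table's join column would cut the
    inner scan down to a lookup, roughly O(M log N)."""
    result = []
    for o in outer:          # M iterations
        for i in inner:      # N iterations each
            if o[key] == i[key]:
                result.append({**o, **i})
    return result

orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
custs  = [{"cust_id": 1, "name": "Ali"}, {"cust_id": 2, "name": "Sara"}]
print(nested_loop_join(orders, custs, "cust_id"))
```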

 

14. Many data warehouse project teams waste enormous amounts of time searching in vain for a _________.

A. Silver Bullet

B. Golden Bullet

C. Suitable Hardware

D. Compatible Product

Ans:  A

 

15. A dense index, if it fits into memory, costs only ______ disk I/O access to locate a record by a given key.

A. One

B. Two

C. lg (n)

D. n

Ans:   A
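
A sketch tying questions 12 and 15 together: a dense index held in memory maps every key straight to its disk block, so locating a record costs a single block access. The block layout and keys below are illustrative:

```python
# Records stored in fixed "disk blocks" (simulated here as lists).
blocks = [
    [("k01", "rec A"), ("k02", "rec B")],   # block 0
    [("k03", "rec C"), ("k04", "rec D")],   # block 1
]

# Dense index: one in-memory entry per key -> block number.
dense_index = {key: bno for bno, blk in enumerate(blocks) for key, _ in blk}

def lookup(key):
    """With the index in memory, finding a record costs exactly
    one block access instead of scanning every block."""
    block = blocks[dense_index[key]]        # the single disk I/O
    return next(rec for k, rec in block if k == key)

print(lookup("k03"))  # -> "rec C"
```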

 

16. All data is ________ of something real.

I An Abstraction

II A Representation

Which of the following options is true?

A. I Only

B. II Only

C. Both I & II

D. None of I & II

Ans:  A

 

18. The key idea behind ___________ is to take a big task and break it into subtasks that can be processed concurrently on a stream of data inputs in multiple, overlapping stages of execution.

A. Pipeline Parallelism

B. Overlapped Parallelism

C. Massive Parallelism

D. Distributed Parallelism

Ans:  A
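
A minimal sketch of pipeline parallelism using Python threads and queues: two stages process a stream of inputs in overlapping fashion. The stage functions are illustrative:

```python
import threading, queue

DONE = object()  # sentinel marking the end of the stream

def stage(in_q, out_q, fn):
    """One pipeline stage: consume, process, pass downstream."""
    while (item := in_q.get()) is not DONE:
        out_q.put(fn(item))
    out_q.put(DONE)

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(q1, q2, str.strip)).start()
threading.Thread(target=stage, args=(q2, q3, str.upper)).start()

for line in ("  alpha ", " beta  "):   # stream of data inputs
    q1.put(line)                        # stages overlap in time
q1.put(DONE)

while (out := q3.get()) is not DONE:
    print(out)   # ALPHA, BETA
```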

 

19. Non-uniform distribution of the data across the processors is called ______.

A. Skew in Partition

B. Pipeline Distribution

C. Distributed Distribution

D. Uncontrolled Distribution

Ans:  A
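
A small sketch of partition skew: hash-partitioning a column dominated by one value leaves one processor with most of the rows. The keys and partition count are made up:

```python
from collections import Counter

keys = ["KHI"] * 80 + ["LHE"] * 15 + ["ISB"] * 5   # skewed column
N_PARTITIONS = 4

# Assign each row to a processor by hashing its key.
load = Counter(hash(k) % N_PARTITIONS for k in keys)
print(dict(load))
# All 80 "KHI" rows hash to the same partition, so one processor
# carries ~80% of the load: skew in partition.
```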

 

21. Data mining is an __________ approach, where browsing through data using data mining techniques may reveal something of interest to the user as previously unknown information.

A. Exploratory

B. Non-Exploratory

C. Computer Science

Ans:  A

 

23. ________ is the technique in which existing heterogeneous segments are reshuffled and relocated into homogeneous segments.

A. Clustering

B. Aggregation

C. Segmentation

D. Partitioning

Ans:   A
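
A compact sketch of clustering, reshuffling a heterogeneous set of values into homogeneous groups with 1-D k-means; the values and k = 2 are illustrative:

```python
def kmeans_1d(values, k=2, iters=10):
    """Reshuffle a heterogeneous set of values into k homogeneous
    clusters: assign each value to its nearest centre, then
    re-compute the centres, and repeat."""
    centres = sorted(values)[:k]          # naive initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters

print(kmeans_1d([1, 2, 3, 50, 52, 55]))   # -> [[1, 2, 3], [50, 52, 55]]
```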

 

24. To measure or quantify the similarity or dissimilarity, different techniques are available. Which of the following options represents the available techniques?

A. Pearson correlation is the only technique

B. Euclidean distance is the only technique

C. Both Pearson correlation and Euclidean distance

D. None of these

Ans:  C
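
Both techniques from question 24 in one short sketch; the two sample records are made up:

```python
import math

def euclidean(a, b):
    """Dissimilarity: 0 means identical, larger means farther apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Similarity: +1 perfectly correlated, -1 perfectly opposed."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

r1, r2 = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
print(euclidean(r1, r2))  # 5.477...
print(pearson(r1, r2))    # 1.0 (perfectly correlated)
```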

 

25. For a DWH project, the key requirements are ________ and product experience.

A. Tools

B. Industry

C. Software

D. None of these

Ans:  B

 

26. Pipeline parallelism focuses on increasing throughput of task execution, NOT on _______ sub-task execution time.

A. Increasing

B. Decreasing

C. Maintaining

D. None of these

Ans:   B

 

27. Focusing only on data warehouse delivery often ends up in _________.

A. Rebuilding

B. Success

C. Good Stable Product

D. None of these

Ans:  A

 

28. Pakistan is one of the five major ________ countries in the world.

A. Cotton-growing

B. Rice-growing

C. Weapon Producing

Ans:  A

 

29. ______ is a process that involves gathering information about columns through the execution of certain queries, with the intention of identifying erroneous records.

A. Data profiling

B. Data Anomaly Detection

C. Record Duplicate Detection

D. None of these

Ans:  A
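
A sketch of data profiling: in SQL this would be COUNT, DISTINCT, MIN, and MAX queries against the column; here the same checks run in Python over made-up rows:

```python
ages = [34, 29, None, 41, 29, 999, 38, None]   # column being profiled

profile = {
    "rows": len(ages),
    "nulls": sum(v is None for v in ages),
    "distinct": len({v for v in ages if v is not None}),
    "min": min(v for v in ages if v is not None),
    "max": max(v for v in ages if v is not None),
}
print(profile)
# max = 999 and two NULLs flag likely erroneous records to inspect.
```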

 

30. Relational databases allow you to navigate the data in ________ that is appropriate, using the primary/foreign key structure within the data model.

A. Only One Direction

B. Any Direction

C. Two Directions

D. None of these

Ans:  B

 

31. DSS queries do not involve a primary key.

A. True

B. False

Ans:  A

 

32. _______ contributes to an under-utilization of valuable and expensive historical data, and inevitably results in a limited capability to provide decision support and analysis.

A. The lack of data integration and standardization

B. Missing Data

C. Data Stored in Heterogeneous Sources

Ans:  A

 

33. DTS allows us to connect to any data source or destination that is supported by ________.

A. OLE DB

B. OLAP

C. OLTP

D. Data Warehouse

Ans:  A

 

34. If some error occurs, execution will be terminated abnormally and all transactions will be rolled back. In this case, when we access the database, we will find it in the state it was in before the ________.

A. Execution of package

B. Creation of package

C. Connection of package

Ans:  A

 

35. The need to synchronize data upon update is called

A. Data Manipulation

B. Data Replication

C. Data Coherency

D. Data Imitation

Ans:  C

 

36. Taken jointly, the extract programs of naturally evolving systems formed a spider web, also known as

A. Distributed Systems Architecture

B. Legacy Systems Architecture

C. Online Systems Architecture

D. Intranet Systems Architecture

Ans:  B

 

37. It is observed that every year the amount of data recorded in an organization

A. Doubles

B. Triples

C. Quadruples

D. Remains the same as the previous year

Ans:  A

 

38. Pre-computed _______ can solve performance problems.

A. Aggregates

B. Facts

C. Dimensions

Ans:  A
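
A sketch of why pre-computed aggregates help: the totals are built once at load time, so a query becomes a lookup instead of a scan of the fact rows. The sales data is made up:

```python
from collections import defaultdict

sales = [("LHE", "Jan", 100), ("LHE", "Feb", 80), ("KHI", "Jan", 120)]

# Pre-compute at load time: total sales per (city, month).
agg = defaultdict(int)
for city, month, amount in sales:
    agg[(city, month)] += amount

# Query time: O(1) lookup instead of scanning the fact rows.
print(agg[("LHE", "Jan")])   # -> 100
```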

 

39. The degree of similarity between two records, often measured by a numerical value between _______, usually depends on application characteristics.

A. 0 and 1

B. 0 and 10

C. 0 and 100

D. 0 and 99

Ans:  A

 

40. The purpose of the House of Quality technique is to reduce ______ types of risk.

A. Two

B. Three

C. Four

D. All

Ans:  A

 

41. NUMA stands for __________

A. Non-uniform Memory Access

B. Non-updateable Memory Architecture

C. New Universal Memory Architecture

Ans:  A

 

42. There are many variants of the traditional nested-loop join. If the index is built as part of the query plan and subsequently dropped, it is called

A. Naive nested-loop join

B. Index nested-loop join

C. Temporary index nested-loop join

D. None of these

Ans:  C

 

43. Kimball's iterative data warehouse development approach drew on decades of experience to develop the ______.

A. Business Dimensional Lifecycle

B. Data Warehouse Dimension

C. Business Definition Lifecycle

D. OLAP Dimension

Ans:  A

 

44. During the application specification activity, we must also give consideration to the organization of the applications.

A. True

B. False

Ans:  A

 

45. The most recent attack was the ________ attack on the cotton crop during 2003-04, resulting in a loss of nearly 0.5 million bales.

A. Boll Worm

B. Purple Worm

C. Blue Worm

D. Cotton Worm

Ans:  A

 

46. The users of a data warehouse are knowledge workers; in other words, they are _________ in the organization.

A. Decision maker

B. Manager

C. Database Administrator

D. DWH Analyst

Ans:  A

 

47. _________ breaks a table into multiple tables based upon common column values.

A. Horizontal splitting

B. Vertical splitting

Ans:  A
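
A sketch of horizontal splitting: rows of one table are routed into separate tables according to a common column's value; the region column and rows are illustrative:

```python
from collections import defaultdict

customers = [
    {"id": 1, "region": "North", "name": "Ali"},
    {"id": 2, "region": "South", "name": "Sara"},
    {"id": 3, "region": "North", "name": "Omar"},
]

# Horizontal split: same columns, rows divided by a column's value.
split_tables = defaultdict(list)
for row in customers:
    split_tables[row["region"]].append(row)

print(split_tables["North"])   # rows 1 and 3 only
```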

 

48. The _____ modeling technique is more appropriate for data warehouses.

A. entity-relationship

B. dimensional

C. physical

D. None of the given

Ans:  B

 

49. Multi-dimensional databases (MDDs) typically use _______ formats to store pre-summarized cube structures.

A. SQL

B. proprietary file

C. Object oriented

D. Non-proprietary file

Ans:  B

 

50. Data warehousing and on-line analytical processing (OLAP) are _______ elements of decision support system.

A. Unusual

B. Essential

C. Optional

D. None of the given

Ans:  B

 

51. Analytical processing uses ______, instead of record-level access.

A. multi-level aggregates

B. Single-level aggregates

C. Single-level hierarchy

D. None of the Given

Ans:  A

 

52. The divide & conquer cube partitioning approach helps alleviate the ______ limitations of MOLAP implementation.

A. Flexibility

B. Maintainability

C. Security

D. Scalability

Ans:  D

 

53. Data Warehouse provides the best support for analysis while OLAP carries out the _________ task.

A. Mandatory

B. Whole

C. Analysis

D. Prediction

Ans:  C

 

55. A virtual cube is used to query two similar cubes by creating a third “virtual” cube via a join between the two cubes.

A. True

B. False

Ans:  A

 
