1. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process; once the mining is complete, the results can be tested against the isolated data to confirm the model's _______.
A. Validity
B. Security
C. Integrity
D. None of the above
Ans: A
2. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by _______ tools typical of decision support systems.
A. Introspective
B. Intuitive
C. Reminiscent
D. Retrospective
Ans: D
3. The technique that is used to perform these feats in data mining is called modeling, and this act of model building is something that people have been doing for a long time, certainly before the _______ of computers or data mining technology.
A. Access
B. Advent
C. Ascent
D. Avowal
Ans: B
4. Classification consists of examining the properties of a newly presented observation and assigning it to a predefined ________.
A. Object
B. Container
C. Subject
D. Class
Ans: D
5. During business hours, most ______ systems should probably not use parallel execution.
A. OLAP
B. DSS
C. Data Mining
D. OLTP
Ans: D
6. In contrast to statistics, data mining is ______ driven.
A. Assumption
B. Knowledge
C. Human
D. Database
Ans: B
7. Data mining derives its name from the similarities between searching for valuable business information in a large database, for example, finding linked products in gigabytes of store scanner data, and mining a mountain for a _________ of valuable ore.
A. Furrow
B. Streak
C. Trough
D. Vein
Ans: D
8. As opposed to the outcome of classification, estimation deals with __________-valued outcomes.
A. Discrete
B. Isolated
C. Continuous
D. Distinct
Ans: C
9. The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The smaller the portion of the program that must be executed __________, the greater the scalability of the computation.
A. In Parallel
B. Distributed
C. Sequentially
D. None of the above
Ans: C
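Note: the scalability claim behind questions 9 and 11 is usually formalized as Amdahl's law. A minimal worked sketch in Python, with purely illustrative sequential fractions:

    # Amdahl's law: speedup = 1 / (s + (1 - s) / p), where s is the fraction
    # of the program that must run sequentially and p is the processor count.
    # The smaller s is, the closer the speedup gets to p (better scalability).
    def speedup(s: float, p: int) -> float:
        return 1.0 / (s + (1.0 - s) / p)

    for s in (0.5, 0.1, 0.01):  # illustrative sequential fractions
        print(s, [round(speedup(s, p), 1) for p in (2, 8, 32)])
    # s=0.50 caps the speedup below 2x regardless of processor count;
    # s=0.01 reaches roughly 24x on 32 processors.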
10. Data mining evolved as a mechanism to cater for the limitations of ________ systems in dealing with massive data sets with high dimensionality, new data types, multiple heterogeneous data sources, etc.
A. OLTP
B. OLAP
C. DSS
D. DWH
Ans: B
11. The goal of ideal parallel execution is to completely parallelize those parts of a computation that are not constrained by data dependencies. The ______ the portion of the program that must be executed sequentially, the greater the scalability of the computation.
A. Larger
B. Smaller
C. Unambiguous
D. Superior
Ans: B
12. The goal of ________ is to look at as few blocks as possible to find the matching record(s).
A. Indexing
B. Partitioning
C. Joining
D. None of the above
Ans: A
13. In the nested-loop join case, if there are M rows in the outer table and N rows in the inner table, the time complexity is:
A. O(M log N)
B. O(log MN)
C. O(MN)
D. O(M + N)
Ans: C
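Note: a minimal Python sketch of why the naive nested-loop join costs O(MN): the join predicate is evaluated once for every outer/inner row pair. The table contents are made up for illustration:

    outer = [(1, "ali"), (2, "sara"), (3, "omar")]           # M = 3 rows
    inner = [(2, "lahore"), (3, "karachi"), (4, "quetta")]   # N = 3 rows

    result = []
    for o_key, name in outer:         # M iterations
        for i_key, city in inner:     # N iterations per outer row -> M * N total
            if o_key == i_key:        # join predicate
                result.append((o_key, name, city))

    print(result)  # [(2, 'sara', 'lahore'), (3, 'omar', 'karachi')]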
14. Many data warehouse project teams waste enormous amounts of time searching in vain for a _________.
A. Silver Bullet
B. Golden Bullet
C. Suitable Hardware
D. Compatible Product
Ans: A
15. A dense index, if it fits into memory, costs only ______ disk I/O access to locate a record by a given key.
A. One
B. Two
C. lg(n)
D. n
Ans: A
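Note: a sketch tying questions 12 and 15 together. The dict stands in for an in-memory dense index (one entry per record) and the list of blocks stands in for disk; locating any record then costs exactly one block access:

    # Simulated disk: each element is one "block" holding a few records.
    blocks = [
        [("k1", "rec1"), ("k2", "rec2")],   # block 0
        [("k3", "rec3"), ("k4", "rec4")],   # block 1
    ]

    # Dense index: one entry per record, mapping key -> block number.
    # Probing it costs nothing in I/O terms because it sits in memory.
    dense_index = {key: blk_no
                   for blk_no, blk in enumerate(blocks)
                   for key, _ in blk}

    def lookup(key):
        blk_no = dense_index[key]   # in-memory probe: zero disk I/Os
        block = blocks[blk_no]      # exactly ONE disk-block read
        return next(rec for k, rec in block if k == key)

    print(lookup("k3"))  # 'rec3', after a single block access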
16. All data is ________ of something real.
I. An Abstraction
II. A Representation
Which of the following options is true?
A. I Only
B. II Only
C. Both I & II
D. None of I & II
Ans: A
18. The key idea behind ___________ is to take a big task and break it into subtasks that can be processed concurrently on a stream of data inputs in multiple, overlapping stages of execution.
A. Pipeline Parallelism
B. Overlapped Parallelism
C. Massive Parallelism
D. Distributed Parallelism
Ans: A
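Note: a minimal pipeline-parallelism sketch using threads and queues; the two stages (an arbitrary doubling step and an increment step, both invented here) run concurrently on a stream of inputs, overlapping their execution:

    import queue
    import threading

    SENTINEL = None  # end-of-stream marker

    def stage(in_q, out_q, fn):
        # Each stage consumes from its input queue and feeds the next one,
        # so different items are being processed in different stages at once.
        while (item := in_q.get()) is not SENTINEL:
            out_q.put(fn(item))
        out_q.put(SENTINEL)

    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=stage, args=(q1, q2, lambda x: x * 2)).start()
    threading.Thread(target=stage, args=(q2, q3, lambda x: x + 1)).start()

    for item in range(5):   # the stream of data inputs
        q1.put(item)
    q1.put(SENTINEL)

    while (out := q3.get()) is not SENTINEL:
        print(out)          # 1, 3, 5, 7, 9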
19. Non-uniform distribution of the data across the processors is called ______.
A. Skew in Partition
B. Pipeline Distribution
C. Distributed Distribution
D. Uncontrolled Distribution
Ans: A
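Note: a quick illustration of partition skew, using a synthetic key column that is skewed on purpose (one hot key dominates). All identical keys hash to the same processor, so one partition ends up far larger than the rest:

    from collections import Counter

    # Deliberately skewed partitioning key: one hot value dominates.
    keys = ["hot_customer"] * 80 + [f"cust_{i}" for i in range(20)]

    N_PROCESSORS = 4

    def partition_of(key):
        return hash(key) % N_PROCESSORS   # hash partitioning

    print(Counter(partition_of(k) for k in keys))
    # Exact partition numbers vary run to run (Python salts str hashes),
    # but the 80 hot-key rows always land on ONE processor, which then
    # dictates the overall parallel runtime. That imbalance is skew.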
21. Data mining is a/an __________ approach, where browsing through the data using data mining techniques may reveal something of interest to the user: information that was previously unknown.
A. Exploratory
B. Non-Exploratory
C. Computer Science
Ans: A
23. ________ is the technique in which existing heterogeneous segments are reshuffled and relocated into homogeneous segments.
A. Clustering
B. Aggregation
C. Segmentation
D. Partitioning
Ans: A
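Note: a minimal sketch of that reshuffling idea, in the style of 1-D k-means; values start in arbitrary (heterogeneous) segments and are repeatedly relocated to the segment whose mean they are closest to, until the segments are homogeneous. The data and k = 2 are illustrative:

    data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # two obvious natural groups
    means = [data[0], data[1]]               # naive initial centers, k = 2

    for _ in range(10):                      # a few relocation passes
        segments = [[], []]
        for x in data:
            nearest = min(range(2), key=lambda c: abs(x - means[c]))
            segments[nearest].append(x)      # relocate x to the closest segment
        means = [sum(s) / len(s) if s else m for s, m in zip(segments, means)]

    print(segments)  # [[1.0, 1.2, 0.8], [9.0, 9.5, 10.1]]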
24. To measure or quantify the similarity or dissimilarity, different techniques are available. Which of the following options represents the available techniques?
A. Pearson correlation is the only technique
B. Euclidean distance is the only technique
C. Both Pearson correlation and Euclidean distance
D. None of these
Ans: C
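Note: both measures are easy to compute directly; the two vectors below are made-up attribute values for a pair of records. Euclidean distance quantifies dissimilarity (0 = identical), Pearson correlation quantifies similarity in [-1, 1]:

    import math

    a = [2.0, 4.0, 6.0, 8.0]   # made-up attribute vectors for two records
    b = [1.0, 3.0, 5.0, 9.0]

    # Euclidean distance: a DISSIMILARITY measure.
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Pearson correlation: a SIMILARITY measure in [-1, 1].
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    pearson = cov / math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                              sum((y - mean_b) ** 2 for y in b))

    print(round(euclid, 3), round(pearson, 3))  # 2.0 0.983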
25. For a DWH project, the key requirements are ________ and product experience.
A. Tools
B. Industry
C. Software
D. None of these
Ans: B
26. Pipeline parallelism focuses on increasing the throughput of task execution, NOT on _______ sub-task execution time.
A. Increasing
B. Decreasing
C. Maintaining
D. None of these
Ans: B
27. Focusing only on data warehouse delivery often ends up in _________.
A. Rebuilding
B. Success
C. Good Stable Product
D. None of these
Ans: A
28. Pakistan is one of the five major ________ countries in the world.
A. Cotton-growing
B. Rice-growing
C. Weapon Producing
Ans: A
29. ______ is a process that involves gathering information about a column through the execution of certain queries, with the intention of identifying erroneous records.
A. Data profiling
B. Data Anomaly Detection
C. Record Duplicate Detection
D. None of these
Ans: A
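Note: a sketch of the kind of column-profiling queries the question describes, run with sqlite3 against a throwaway in-memory table. The table name, columns, and deliberately bad rows are all invented:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (id INTEGER, age INTEGER, city TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        (1, 34, "Lahore"), (2, 29, "Karachi"),
        (3, None, "Multan"),    # missing age
        (4, 472, "Peshawar"),   # out-of-range age
    ])

    # Typical profiling queries: null counts, value ranges, cardinality.
    checks = [
        ("NULL ages",         "SELECT COUNT(*) FROM customers WHERE age IS NULL"),
        ("ages out of range", "SELECT COUNT(*) FROM customers WHERE age NOT BETWEEN 0 AND 120"),
        ("distinct cities",   "SELECT COUNT(DISTINCT city) FROM customers"),
    ]
    for label, sql in checks:
        print(label, "->", con.execute(sql).fetchone()[0])
    # Non-zero counts on the first two checks point at erroneous records.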
30. Relational databases allow you to navigate the data in ________ that is appropriate, using the primary/foreign key structure within the data model.
A. Only One Direction
B. Any Direction
C. Two Directions
D. None of these
Ans: B
31. DSS queries do not involve a primary key.
A. True
B. False
Ans: A
32. _______ contributes to an under-utilization of valuable and expensive historical data, and inevitably results in a limited capability to provide decision support and analysis.
A. The lack of data integration and standardization
B. Missing Data
C. Data Stored in Heterogeneous Sources
Ans: A
33. DTS allows us to connect to any data source or destination that is supported by ________.
A. OLE DB
B. OLAP
C. OLTP
D. Data Warehouse
Ans: A
34. If some error occurs, execution will be terminated abnormally and all transactions will be rolled back. In this case, when we access the database, we will find it in the state it was in before the ________.
A. Execution of package
B. Creation of package
C. Connection of package
Ans: A
35. The need to synchronize data upon update is called
A. Data Manipulation
B. Data Replication
C. Data Coherency
D. Data Imitation
Ans: C
36. Taken jointly, the extract programs of naturally evolving systems formed a spider web, also known as
A. Distributed Systems Architecture
B. Legacy Systems Architecture
C. Online Systems Architecture
D. Intranet Systems Architecture
Ans: B
37. It is observed that every year the amount of data recorded in an organization ________.
A. Doubles
B. Triples
C. Quadruples
D. Remains the same as the previous year
Ans: A
38. Pre-computed _______ can solve performance problems.
A. Aggregates
B. Facts
C. Dimensions
Ans: A
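Note: a small sketch of why pre-computed aggregates help; the group-by work is paid once (say, at ETL load time), and later queries read the tiny aggregate table instead of scanning the detail rows. Sales figures are invented:

    from collections import defaultdict

    # Detail-level fact rows: (product, amount).
    sales = [("tea", 10), ("tea", 15), ("soap", 7), ("soap", 3), ("tea", 5)]

    # Pre-compute the aggregate ONCE, e.g. during the nightly load.
    total_by_product = defaultdict(int)
    for product, amount in sales:
        total_by_product[product] += amount

    # Each later query is a cheap lookup rather than a full scan.
    print(total_by_product["tea"])   # 30
    print(total_by_product["soap"])  # 10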
39. The degree of similarity between two records, often measured by a numerical value between _______, usually depends on application characteristics.
A. 0 and 1
B. 0 and 10
C. 0 and 100
D. 0 and 99
Ans: A
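Note: one common way to obtain a similarity score in the 0-to-1 range is to rescale a distance; the mapping 1 / (1 + d) below is just one illustrative choice among several:

    import math

    def similarity(rec_a, rec_b):
        # Map Euclidean distance d in [0, inf) to a score in (0, 1]:
        # identical records score 1.0, far-apart records approach 0.0.
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(rec_a, rec_b)))
        return 1.0 / (1.0 + d)

    print(similarity([1, 2], [1, 2]))  # 1.0 (identical records)
    print(similarity([1, 2], [4, 6]))  # ~0.167 (quite dissimilar)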
40. The purpose of the House of Quality technique is to reduce ______ types of risk.
A. Two
B. Three
C. Four
D. All
Ans: A
41. NUMA stands for __________
A. Non-uniform Memory Access
B. Non-updateable Memory Architecture
C. New Universal Memory Architecture
Ans: A
42. There are many variants of the traditional nested-loop join. If the index is built as part of the query plan and subsequently dropped, it is called
A. Naive nested-loop join
B. Index nested-loop join
C. Temporary index nested-loop join
D. None of these
Ans: C
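Note: a sketch of the temporary-index variant; a hash index on the inner table's join key is built as part of executing this one (hypothetical) query, probed once per outer row, and dropped afterwards, cutting the join from O(MN) comparisons to O(M) probes plus the build cost:

    outer = [(2, "sara"), (3, "omar"), (5, "zara")]
    inner = [(2, "lahore"), (3, "karachi"), (4, "quetta")]

    # Step 1: build a temporary index on the inner table's join key.
    temp_index = {key: rest for key, *rest in inner}

    # Step 2: probe the index once per outer row.
    result = [(k, name, *temp_index[k]) for k, name in outer if k in temp_index]
    print(result)  # [(2, 'sara', 'lahore'), (3, 'omar', 'karachi')]

    # Step 3: drop the index once the query finishes.
    del temp_index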
43. Kimball's iterative data warehouse development approach drew on decades of experience to develop the ______.
A. Business Dimensional Lifecycle
B. Data Warehouse Dimension
C. Business Definition Lifecycle
D. OLAP Dimension
Ans: A
44. During the application specification activity, we must also give consideration to the organization of the applications.
A. True
B. False
Ans: A
45. The most recent attack is the ________ attack on the cotton crop during 2003-04, resulting in a loss of nearly 0.5 million bales.
A. Boll Worm
B. Purple Worm
C. Blue Worm
D. Cotton Worm
Ans: A
46. The users of a data warehouse are knowledge workers; in other words, they are _________ in the organization.
A. Decision makers
B. Manager
C. Database Administrator
D. DWH Analyst
Ans: A
47. _________ breaks a table into multiple tables based upon common column values.
A. Horizontal splitting
B. Vertical splitting
Ans: A
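Note: a minimal sketch of horizontal splitting; whole rows are routed into separate tables according to a common column value (region here), and every split table keeps the full set of columns. The data is invented:

    from collections import defaultdict

    # One logical table of rows: (customer, region, sales).
    table = [
        ("ali", "north", 120), ("sara", "south", 90),
        ("omar", "north", 75), ("zara", "south", 200),
    ]

    # Route whole ROWS into per-region tables, keyed on a common column.
    splits = defaultdict(list)
    for row in table:
        splits[row[1]].append(row)   # row[1] is the region column

    print(splits["north"])  # [('ali', 'north', 120), ('omar', 'north', 75)]
    print(splits["south"])  # [('sara', 'south', 90), ('zara', 'south', 200)]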
48. The _____ modeling technique is more appropriate for data warehouses.
A. entity-relationship
B. dimensional
C. physical
D. None of the given
Ans: B
49. Multi-dimensional databases (MDDs) typically use _______ formats to store pre-summarized cube structures.
A. SQL
B. proprietary file
C. Object oriented
D. Non-proprietary file
Ans: B
50. Data warehousing and on-line analytical processing (OLAP) are _______ elements of a decision support system.
A. Unusual
B. Essential
C. Optional
D. None of the given
Ans: B
51. Analytical processing uses ______ instead of record-level access.
A. multi-level aggregates
B. Single-level aggregates
C. Single-level hierarchy
D. None of the Given
Ans: A
52. The divide & conquer cube partitioning approach helps alleviate the ______ limitations of MOLAP implementation.
A. Flexibility
B. Maintainability
C. Security
D. Scalability
Ans: D
53. Data Warehouse provides the best support for analysis while OLAP carries out the _________ task.
A. Mandatory
B. Whole
C. Analysis
D. Prediction
Ans: C
55. A virtual cube is used to query two similar cubes by creating a third "virtual" cube through a join between the two cubes.
A. True
B. False
Ans: A