Introduction to Machine Learning Quiz Answers

Week 1 Comprehensive

1.
Question 1
Which of the following are necessary for supervised machine learning? (Choose all that are correct)

1 point

  • A model
  • Learning from data
  • Labeled training data
  • A human to teach the machine

2.
Question 2
What decision boundary can logistic regression provide?

1 point

  • Arbitrarily complex functions
  • Jagged edges
  • Smooth curves
  • Linear

3.
Question 3
What is the primary advantage of using multiple filters?

1 point

  • More complexity is always better.
  • This requires less compute power.
  • This allows the model to look for subtypes of the classification.
  • This is simpler to implement.

4.
Question 4
Which one of the following best describes transfer learning in the context of document analysis?

1 point

  • All parameters of the model are different between individuals.
  • Parameters at the bottom of the model are transferable across all people and documents, while the parameters at the top are different between individuals.
  • All parameters of the model are transferable across all people and documents.
  • Parameters at the top of the model are transferable across all people and documents, while the parameters at the bottom are different between individuals.

5.
Question 5
Given the following image of data classifications, which of the following models would you choose?

1 point

  • Logistic regression
  • Multilayer perceptron

6.
Question 6
What new feature did neural networks acquire in 2010?

1 point

  • A new computational platform: the GPU
  • A new application: image search
  • A new operation: convolution
  • A new name: Deep Learning

7.
Question 7
Which of the following is convolved with layer 2 features, or sub-motifs?

1 point

  • Layer 2 feature map
  • Layer 1 feature map
  • Layer 3 feature map

8.
Question 8
Which of the following gives the best conceptual meaning of convolution?

1 point

  • Surveying a feature map for a high-level motif.
  • Selecting an atomic element from an image.
  • Stacking a collection of feature maps.
  • Shifting a filter to every location in an image.

9.
Question 9
What does transfer learning mean in the context of medical imaging?

1 point

  • Just as assigning categories to images in ImageNet required millions of images, so too does analyzing medical images require millions of labeled medical images.
  • Sufficient labeled radiological images can be used to learn all of the model parameters, so they can be used for ophthalmological or dermatological images.
  • Once the convolutional layers are learned from labeled medical images, the top layers can be inferred from the parameters found with data from ImageNet.
  • Weights of convolutional layers learned from ImageNet transfer to medical images, so we only need to learn new parameters at the top of the network.

10.
Question 10
What is the primary advantage of having a deep architecture?

1 point

  • There is a higher probability that each motif is used in the classifier.
  • The model shares knowledge between motifs through their shared substructures.
  • A model can learn each top-level motif in isolation.
  • The parameters of a deep architecture are less expensive to compute.

Week 2 Comprehensive

1.
Question 1
What does the equation for the loss function do conceptually?

1 point

  • Mathematically define network outputs
  • Penalize overconfidence
  • Ignore historical statistical developments
  • Reward indecision

2.
Question 2
What is overfitting?

1 point

  • Overfitting refers to the fact that more complexity is always better, which is why deep learning works.
  • Model complexity fits too well to training data and will not generalize in the real world.
  • Model complexity is perfectly matched to the data.
  • Model complexity is not enough to capture the nuance of the data and will under-perform in the real world.

3.
Question 3
Why should the test set only be used once?

1 point

  • More than one use can lead to bias.
  • More than one use can lead to overfitting.
  • The model cannot learn anything new from subsequent uses.
  • It is expensive to use more than once.

4.
Question 4
Which two of the following describe the purpose of a validation set?

1 point

  • To estimate the performance of a model.
  • To pick the best performing model.
  • To test the performance in lieu of real-world data.
  • To learn the model parameters.

5.
Question 5
How do we learn the parameters of our network?

1 point

  • Gradient descent
  • Downhill skiing
  • Monte Carlo simulation
  • Analytically determine global minimum

6.
Question 6
What technique is used to minimize loss for a large data set?

1 point

  • Newton’s method
  • Taylor series expansion
  • Stochastic gradient descent
  • Gradient descent

7.
Question 7
Which of the following are benefits of stochastic gradient descent?

1 point

  • With stochastic gradient descent, the update time does not scale with data size.
  • Stochastic gradient descent finds the solution more accurately.
  • Stochastic gradient descent can update many more times than gradient descent.
  • Stochastic gradient descent gets near the solution quickly.
  • Stochastic gradient descent finds a more exact gradient than gradient descent.

8.
Question 8
Why is gradient descent computationally expensive for large data sets?

1 point

  • Large data sets do not permit computing the loss function, so a more expensive measure is used.
  • Calculating the gradient requires looking at every single data point.
  • Large data sets require deeper models, which have more parameters.
  • There are too many local minima for an algorithm to find.
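
The contrast these questions draw between gradient descent and stochastic gradient descent can be sketched in a few lines. This is only an illustration under stated assumptions: the one-parameter model y ≈ w·x, the squared-error loss, the toy data, and all function names are hypothetical and not taken from the course.

```python
import random

def full_gradient(w, data):
    """Mean squared-error gradient over ALL points: cost grows with len(data)."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def sample_gradient(w, point):
    """Squared-error gradient on ONE point: cost does not depend on data set size."""
    x, y = point
    return 2 * (w * x - y) * x

def gradient_descent(data, lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        w -= lr * full_gradient(w, data)  # every update touches every data point
    return w

def stochastic_gradient_descent(data, lr=0.01, steps=200, seed=0):
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    w = 0.0
    for _ in range(steps):
        w -= lr * sample_gradient(w, rng.choice(data))  # one random point per update
    return w

# Hypothetical noiseless toy data with true slope 3.
data = [(x, 3.0 * x) for x in range(1, 6)]
w_gd = gradient_descent(data)
w_sgd = stochastic_gradient_descent(data)
```

On this toy data both loops approach the true slope. With noisy data the single-point gradient is only an estimate of the full gradient, so each stochastic update is cheaper but less exact.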

9.
Question 9
What are the two main benefits of early stopping?

1 point

  • It helps save computation cost.
  • It performs better in the real world.
  • It improves the training loss.
  • There is rigorous statistical theory on it.

10.
Question 10
Why are optimization and validation at odds?

1 point

  • Optimization seeks to do as well as possible on a training set, while validation seeks to generalize to the real world.
  • Optimization seeks to generalize to the real world, while validation seeks to do as well as possible on a validation set.
  • Optimization seeks to do as well as possible on a training set, while validation seeks to do as well as possible on a validation set.
  • They are not at odds—they have the same goal.

Week 3 Comprehensive

1.
Question 1
Which of the following indicates whether a doctor or machine is doing well at finding positive examples in a data set?

1 point

  • Positive Predictive Value
  • Likelihood Ratio
  • Sensitivity
  • Specificity

2.
Question 2
Which of the following is used to distinguish the false positive rate from the false negative rate?

1 point

  • Sensitivity
  • False Negative
  • Negative Predictive Value
  • Specificity
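
For reference, the metrics named in the options of the two questions above have standard definitions in terms of confusion-matrix counts. The sketch below assumes those standard definitions; the function name and the example counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        # Fraction of truly positive cases that were flagged positive.
        "sensitivity": tp / (tp + fn),
        # Fraction of truly negative cases that were flagged negative.
        "specificity": tn / (tn + fp),
        # Fraction of positive calls that were actually positive.
        "positive_predictive_value": tp / (tp + fp),
        # Fraction of negative calls that were actually negative.
        "negative_predictive_value": tn / (tn + fn),
    }

# Hypothetical counts: 100 positive cases and 100 negative cases.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
```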

3.
Question 3
Which of the following is the best conceptual definition of one-dimensional convolution?

1 point

  • “Inverting” of a shape, where the inversion matches a feature.
  • “Sliding” of two signals, where a matched feature gives a high value of convolution.
  • “Intertwining” of two signals, where one wraps around the other to form a feature.
  • “Distortion” of one signal according to the feature shape.
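
A small sketch may help make one-dimensional convolution concrete. It assumes the convention common in deep learning libraries, where the filter is slid across the signal without being flipped (strictly speaking, cross-correlation). The function name, signal, and filter are hypothetical.

```python
def slide_filter(signal, filt):
    """Slide filt across signal and return the dot product at each valid offset."""
    width = len(filt)
    n_positions = len(signal) - width + 1
    return [
        sum(s * f for s, f in zip(signal[i:i + width], filt))
        for i in range(n_positions)
    ]

signal = [0, 0, 1, 2, 1, 0, 0]   # contains a small bump in the middle
filt = [1, 2, 1]                 # filter shaped like that bump
response = slide_filter(signal, filt)
best_offset = response.index(max(response))  # offset where the filter matches best
```

The response is largest at the offset where the filter lines up with the matching shape in the signal.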

4.
Question 4
Which of the following can a user choose when designing a convolutional layer? (Choose all that are correct.)

1 point

  • Filter depth
  • Filter size
  • Filter number
  • Filter stride
  • Filter weights

5.
Question 5
What is a fully connected readout?

1 point

  • A layer with ten classifications.
  • A layer with connections to all feature maps.
  • The vectorization of a pooling layer.
  • A layer with a single neuron for each output class.

6.
Question 6
Why are nonlinear activation functions preferable?

1 point

  • Nonlinear activation functions are preferable because they are used in generalized linear models in statistics.
  • Nonlinear activation functions increase the functional capacity of the neural network by allowing the representation of nonlinear relationships between features in the input.
  • Nonlinear activation functions are preferable because they have been used historically.
  • Nonlinear activation functions are NOT preferable to linear ones, as they lose information in systems with high variance.

7.
Question 7
Which of the following are benefits of pooling? (Choose all that are correct.)

1 point

  • Decreases bias.
  • Combats overfitting.
  • Vectorizes the data.
  • Encourages translational invariance.
  • Reduces computational complexity.

8.
Question 8
How are parameters that minimize the loss function found in practice?

1 point

  • Fractal geometry
  • Gradient descent
  • Simplex algorithm
  • Stochastic gradient descent

9.
Question 9
Which of the following is an advantage of hierarchical representation of image features?

1 point

  • Eliminating bias.
  • Decreasing the computational complexity.
  • Better leveraging all training data.
  • Decreasing variance in the model.

10.
Question 10
Why does transfer learning work?

1 point

  • Top-level features are specialized for a particular task, while low-level features are universal to all images.
  • All layers of filters can be learned by studying the mammalian receptive fields.
  • Low-level features are specialized for a particular task, while top-level features are universal to all images.
  • All images are composed of pixels with three color channels.

Week 4 Comprehensive

1.
Question 1
What is meant by “word vector”?

1 point

  • The latitude and longitude of the place a word originated.
  • A vector of numbers associated with a word.
  • Assigning a corresponding number to each word.
  • A vector consisting of all words in a vocabulary.

2.
Question 2
Which word is a synonym for “word vector”?

1 point

  • Norm
  • Array
  • Embedding
  • Stack

3.
Question 3
What is the term for a set of vectors, with one vector for each word in the vocabulary?

1 point

  • Space
  • Array
  • Codebook
  • Embedding

4.
Question 4
What is natural language processing?

1 point

  • Making natural text conform to formal language standards.
  • Translating natural text characters to unicode representations.
  • Translating human-readable code to machine-readable instructions.
  • Taking natural text and making inferences and predictions.

5.
Question 5
What is the goal of learning word vectors?

1 point

  • Find the hidden or latent features in a text.
  • Labelling a text corpus, so a human doesn’t have to do it.
  • Determine the vocabulary in the codebook.
  • Given a word, predict which words are in its vicinity.

6.
Question 6
What function is the generalization of the logistic function to multiple dimensions?

1 point

  • Hyperbolic tangent function
  • Exponential log likelihood
  • Squash function
  • Softmax function
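
As a point of reference for this question, the sketch below implements the softmax and logistic functions with their standard definitions and checks numerically that a two-class softmax over the scores [z, 0] gives the same probability as the logistic function at z. The function names and test values are illustrative assumptions.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function for a single score."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = 2.0
two_class = softmax([z, 0.0])       # probabilities for two classes
three_class = softmax([1.0, 2.0, 3.0])
```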

7.
Question 7
What is the continuous bag of words (CBOW) approach?

1 point

  • Vectors for the neighborhood of words are averaged and used to predict word n.
  • Word n is used to predict the words in the neighborhood of word n.
  • Word n is learned from a large corpus of words, which a human has labeled.
  • The code for word n is fed through a CNN and categorized with a softmax.

8.
Question 8
What is the Skip-Gram approach?

1 point

  • Word n is used to predict the words in the neighborhood of word n.
  • The code for word n is fed through a CNN and categorized with a softmax.
  • Word n is learned from a large corpus of words, which a human has labeled.
  • Vectors for the neighborhood of words are averaged and used to predict word n.
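
The two word-vector training setups asked about in Questions 7 and 8 differ in which side is the input and which is the prediction target. The sketch below only builds the training examples for each setup, following the usual word2vec formulation; it does not train any vectors. The function names, window size, and sentence are hypothetical.

```python
def cbow_examples(tokens, window=2):
    """Build (context words, target word) pairs: the neighborhood predicts word n."""
    examples = []
    for n, target in enumerate(tokens):
        left = tokens[max(0, n - window):n]
        right = tokens[n + 1:n + 1 + window]
        examples.append((left + right, target))
    return examples

def skipgram_examples(tokens, window=2):
    """Build (input word, neighbor) pairs: word n predicts each nearby word."""
    return [
        (word, neighbor)
        for context, word in cbow_examples(tokens, window)
        for neighbor in context
    ]

tokens = "the cat sat on the mat".split()
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)
```

In a full continuous bag of words model the vectors of the context words would then be averaged before predicting the target; that step is omitted here.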

9.
Question 9
What is the goal of the recurrent neural network?

1 point

  • Learn a series of images that form a video.
  • Predict words more efficiently than Skip-Gram.
  • Synthesize a sequence of words.
  • Classify an unlabeled image.

10.
Question 10
Which model is the state of the art for text synthesis?

1 point

  • Long short-term memory
  • CNN
  • Multilayer perceptron
  • CBOW