Introduction to Machine Learning Coursera Week 2 Quiz
In this blog you will find the correct answers to the Coursera quiz for Introduction to Machine Learning, Week 2. Mixsaver always tries to bring you the best blogs and the best coupon codes.
Week 2 Comprehensive
What does the equation for the loss function do conceptually?
- Mathematically define network outputs
- Penalize overconfidence
- Ignore historical statistical developments
- Reward indecision
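The idea that a loss function "penalizes overconfidence" can be seen numerically with cross-entropy, a common classification loss. This is an illustrative sketch, not code from the course:

```python
import math

def cross_entropy(p_true_class):
    """Cross-entropy loss given the probability the model assigns to the true class."""
    return -math.log(p_true_class)

# A confident, correct prediction costs almost nothing...
print(round(cross_entropy(0.95), 3))  # → 0.051
# ...but assigning only 5% probability to the true class (confidently wrong) costs far more.
print(round(cross_entropy(0.05), 3))  # → 2.996
```

The loss grows without bound as the probability on the true class shrinks, so a model that is confidently wrong is penalized much more than one that hedges.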
What is overfitting?
- Overfitting refers to the fact that more complexity is always better, which is why deep learning works.
- Model complexity fits too well to the training data and will not generalize in the real world.
- Model complexity is perfectly matched to the data.
- Model complexity is not enough to capture the nuance of the data and will under-perform in the real world.
Why should the test set only be used once?
- More than one use can lead to bias.
- More than one use can lead to overfitting.
- The model cannot learn anything new from subsequent uses.
- It is expensive to use more than once.
Which two of the following describe the purpose of a validation set?
- To estimate the performance of a model.
- To pick the best performing model.
- To test the performance in lieu of real-world data.
- To learn the model parameters.
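The three roles above are usually realized as three disjoint splits of the data. A minimal NumPy sketch; the 60/20/20 proportions are an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(100, 3))  # 100 examples, 3 features

# Shuffle, then carve out three disjoint splits:
#   train      -> learn the model parameters
#   validation -> compare and pick the best-performing model
#   test       -> used exactly once, for the final performance estimate
indices = rng.permutation(len(data))
train_idx, val_idx, test_idx = np.split(indices, [60, 80])

train, val, test = data[train_idx], data[val_idx], data[test_idx]
print(len(train), len(val), len(test))  # → 60 20 20
```

Because the splits are disjoint, choosing a model by its validation score does not contaminate the one-time test estimate.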
How do we learn our network?
- Gradient descent
- Downhill skiing
- Monte Carlo simulation
- Analytically determine global minimum
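Gradient descent repeatedly nudges the parameters downhill along the negative gradient; for a deep network there is no analytic solve for the global minimum. A toy sketch on a single-parameter quadratic (the learning rate and step count are arbitrary illustrative choices):

```python
def gradient_descent():
    """Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2(w - 3)."""
    w = 0.0      # initial guess
    lr = 0.1     # learning rate (step size)
    for _ in range(100):
        grad = 2 * (w - 3)   # gradient of the loss at the current w
        w -= lr * grad       # step in the direction that decreases the loss
    return w

print(gradient_descent())  # converges toward the minimum at w = 3
```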
What technique is used to minimize loss for a large data set?
- Newton’s method
- Taylor series expansion
- Stochastic gradient descent
- Gradient descent
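Stochastic (mini-batch) gradient descent estimates the gradient from a small random subset, so each update stays cheap no matter how large the full data set is. A minimal sketch fitting the slope of a line; the learning rate, batch size, and synthetic data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear data: y = 2x + noise.
X = rng.uniform(-1, 1, 500)
y = 2 * X + rng.normal(0, 0.1, 500)

w = 0.0
lr = 0.1
batch_size = 16

# Each update looks at only a small random mini-batch, so the per-step
# cost is independent of the total data set size.
for step in range(200):
    idx = rng.integers(0, len(X), batch_size)
    xb, yb = X[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)  # d/dw of the mini-batch squared error
    w -= lr * grad

print(w)  # close to the true slope of 2
```

The noisy mini-batch gradient is less exact than the full gradient, but it gets near the solution quickly and allows many more updates for the same compute.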
Which of the following are benefits of stochastic gradient descent?
- With stochastic gradient descent, the update time does not scale with data size.
- Stochastic gradient descent finds the solution more accurately.
- Stochastic gradient descent can update many more times than gradient descent.
- Stochastic gradient descent gets near the solution quickly.
- Stochastic gradient descent finds a more exact gradient than gradient descent.
Why is gradient descent computationally expensive for large data sets?
- Large data sets do not permit computing the loss function, so a more expensive measure is used.
- Calculating the gradient requires looking at every single data point.
- Large data sets require deeper models, which have more parameters.
- There are too many local minima for an algorithm to find.
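The cost difference behind the correct answer is simple arithmetic: a full-batch gradient sums one term per data point per update, while a mini-batch gradient touches only the batch. The values of N and B below are made-up illustrative numbers:

```python
# Points touched per parameter update, for a data set of N points
# and an SGD mini-batch of size B (both values are illustrative).
N = 1_000_000
B = 32

points_per_update_gd = N    # full gradient sums a term for every data point
points_per_update_sgd = B   # mini-batch gradient uses only B points

# For the same budget of touched points (one pass over the data),
# SGD makes many more parameter updates than gradient descent.
updates_per_epoch_sgd = N // B
print(points_per_update_gd, points_per_update_sgd, updates_per_epoch_sgd)
# → 1000000 32 31250
```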
What are the two main benefits of early stopping?
- It helps save computation cost.
- It performs better in the real world.
- It improves the training loss.
- There is rigorous statistical theory on it.
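A common early-stopping rule is to halt once the validation loss has not improved for a fixed number of epochs ("patience"), which both cuts computation and tends to stop near the best-generalizing model. A simulated sketch; the loss curve and patience value are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated validation loss: it bottoms out around epoch 20 and then
# rises again as the model starts to overfit the training data.
epochs = np.arange(50)
val_loss = 0.5 + (epochs - 20) ** 2 / 800 + rng.normal(0, 0.005, 50)

patience = 5
best_loss = float("inf")
best_epoch = 0
for epoch, loss in enumerate(val_loss):
    if loss < best_loss:
        best_loss = loss
        best_epoch = epoch
    elif epoch - best_epoch >= patience:
        break  # stop early: no improvement for `patience` epochs

print(best_epoch)  # near the validation-loss minimum around epoch 20
```

Training halts well before all 50 epochs run, saving computation while keeping the checkpoint from near the validation minimum.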
Why are optimization and validation at odds?
- Optimization seeks to do as well as possible on a training set, while validation seeks to generalize to the real world.
- Optimization seeks to generalize to the real world, while validation seeks to do as well as possible on a validation set.
- Optimization seeks to do as well as possible on a training set, while validation seeks to do as well as possible on a validation set.
- They are not at odds—they have the same goal.