In this blog you will find the answers to the Coursera quiz for Mastering Data Analysis in Excel, Week 6. Mixsaver always tries to bring you the best blogs and the best coupon codes.
 

 

Week 6 Quiz:

 

1 First Binary Classification Model 

You work for a bank as a business data analyst in the credit card risk modeling department. Your bank conducted a bold experiment three years ago: for a single day it quietly issued credit cards to everyone who applied, regardless of their credit risk, until the bank had issued 600 cards without screening applicants.

After three years, 150, or 25%, of those card recipients had defaulted: they failed to pay back at least some of the money they owed. However, the bank collected very valuable proprietary data that it can now use to optimize its future card-issuing process.

The bank initially collected six pieces of data about each person: 

· Age 

· Years at current employer 

· Years at current address 

· Income over the past year 

· Current credit card debt, and 

· Current automobile debt 

In addition, the bank now has a binary outcome: default = 1 and no default = 0.

Your first assignment is to analyze the data and create a binary classification model to forecast future defaults.

You will combine data from the six inputs above to output a single score. Use the Soldier Performance spreadsheet for a simple example of combining multiple inputs.

Forecasting Soldier Performance  

The relative rank ordering of scores will determine the model's effectiveness. For convenience (in particular, so that you can use the AUC Calculator Spreadsheet), you are asked to use a scale for your score that has a maximum less than 3.5 and a minimum greater than -3.5.

At first, you are not told your bank's own best estimate of its cost per False Negative (an accepted applicant who becomes a defaulting customer) and per False Positive (a rejected applicant who would not have defaulted) classification.

Therefore, the best you can do is to design your model to maximize the Area Under the ROC Curve, or AUC.

You are told that if your model is effective (high enough AUC, not defined further) and robust (again not defined, but in general this means relatively little decrease in AUC across multiple sets of new data), then it may be adopted by the bank as its predictive model for default, to determine which future applicants will be issued credit cards.
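The course computes AUC with the AUC Calculator Spreadsheet. As a cross-check, here is a minimal Python sketch of the standard rank-based definition: the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter, with ties counting one half. It assumes higher scores mean higher default risk; the function name and the toy data are hypothetical, and results may differ slightly from the spreadsheet if it evaluates only a fixed set of thresholds.

```python
from itertools import product


def auc(scores, outcomes):
    """Rank-based AUC: share of (defaulter, non-defaulter) pairs in which the
    defaulter has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]  # defaults
    neg = [s for s, y in zip(scores, outcomes) if y == 0]  # non-defaults
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the course workbook)
scores = [2.1, 0.4, -1.3, 1.0, -0.2, -2.5]
outcomes = [1, 0, 0, 1, 1, 0]
print(round(auc(scores, outcomes), 3))  # 0.889
```

An AUC of 0.5 means the score ranks defaulters no better than chance; 1.0 means every defaulter outranks every non-defaulter.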

You are first given a Training Set of 200 of the 600 people in the experiment. The Data For Final Project workbook (below) contains both the training set and the test set you will need.

Design your model using the Training Set. Standardized versions of the input data are also provided for your convenience. You may combine the six inputs by adding them to, or subtracting them from, each other, taking simple ratios, etc. Exclude inputs that are not helpful, and then experiment with how to combine the most informative inputs.
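As a purely illustrative sketch (not a recommended model), here is one way to combine two standardized inputs into a single score and keep it inside the range the AUC Calculator expects (maximum below 3.5, minimum above -3.5). The choice of inputs, the equal weights, the 3.49 clipping limit, the use of the population standard deviation, and the function names are all assumptions; the workbook's own standardized columns may have been computed differently.

```python
from statistics import mean, pstdev

LIMIT = 3.49  # assumed margin so scores stay strictly inside (-3.5, 3.5)


def standardize(values):
    """Convert raw values to z-scores (population standard deviation assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def debt_minus_income_score(cc_debt, income):
    """Hypothetical score: standardized credit card debt minus standardized
    income, halved, then clipped to the calculator's range.
    Higher scores are meant to indicate higher default risk."""
    z_debt, z_income = standardize(cc_debt), standardize(income)
    raw = [(d - i) / 2 for d, i in zip(z_debt, z_income)]
    return [max(-LIMIT, min(LIMIT, r)) for r in raw]


# Hypothetical applicants (not from the course workbook)
scores = debt_minus_income_score([9000, 5000, 1000], [30000, 50000, 70000])
print([round(s, 4) for s in scores])  # [1.2247, 0.0, -1.2247]
```

Only the rank order of the scores affects the AUC, so the halving and the clipping matter only for fitting the calculator's range.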

Note that you will need some of your quiz answers again later, so please write them down and keep track of them as you go along.

Question: What is your model? Give it as a function of two or more of the six inputs. For example: (Age + Years at Current Address) / Income (not a great model).

Your model should have at least two inputs.


no


 

2 What is your model's AUC on the Training Set? Use two digits to the right of the decimal point.


0.70


 

3 Initial Assessment for Overfitting (testing your model on new data)

Next, test your model, without changing any parameters, on the Test Set of 200 additional applicants. See the Test Set spreadsheet; it is part of the Data For Final Project workbook (below), which has both the training and test sets.

Data Final Project  

Hint: Make and use a second copy of the AUC Calculator Spreadsheet so that you can compare Test Set and Training Set results easily.

AUC Calculator and Review of AUC Curve  

What is your model's new AUC on the Test Set? Give two digits to the right of the decimal point.


0.80


 

4 Finding the Cost Minimizing Threshold for your Model 

Now that you have, hopefully, developed your model to the point where it is relatively robust across the training set and test set, your boss at the bank finally gives you the bank's current rough estimate of its average cost for each type of classification error.

Note that all bank models here include only profits and losses within three years of when a card is issued, so the impact of out years (years beyond 3) can be ignored.

Cost Per False Negative: $5,000

Cost Per False Positive: $2,500

For the 600 individuals who were automatically given cards without being classified, the total cost of the experiment turned out to be 25% × $5,000 × 600, or $750,000. This is $1,250 per event.

Only models with a lower cost per event than $1,250 have any value.
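The arithmetic above can be written as a small cost function. This minimal Python sketch uses only the figures stated in the quiz and treats cost per event as total misclassification cost divided by the number of applicants, as the text describes; the function and constant names are illustrative.

```python
COST_FN = 5000  # accepted applicant who later defaults
COST_FP = 2500  # rejected applicant who would not have defaulted


def cost_per_event(fn, fp, n, cost_fn=COST_FN, cost_fp=COST_FP):
    """Average misclassification cost per applicant (per event)."""
    return (fn * cost_fn + fp * cost_fp) / n


# No model: all 600 applicants got cards, so the 150 defaulters are all
# False Negatives and there are no False Positives.
total = 150 * COST_FN
print(total, cost_per_event(fn=150, fp=0, n=600))  # 750000 1250.0
```

Rejecting everyone instead would turn the 450 non-defaulters into False Positives, at 450 × $2,500 / 600 = $1,875 per event, so blanket rejection is even worse than blanket approval here.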

Question: What is the threshold score on the Training Set data for your model that minimizes cost per event? You will need this number to answer later questions.

Hint: Using the AUC Calculator Spreadsheet, identify which column displays the same cost per event (row 17) as the overall minimum cost per event shown in Cell J2. The threshold is shown in row 10 of that column. The threshold means that everything at or above this number is classified as a default.
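The column search in the hint amounts to evaluating the cost per event at every candidate threshold and keeping the cheapest one. A minimal Python sketch, assuming scores at or above the threshold are classified as default (as the hint states) and a hypothetical grid of thresholds in 0.1 steps from -3.5 to 3.5; the spreadsheet's actual grid may differ, and the toy data are invented.

```python
COST_FN, COST_FP = 5000, 2500


def cost_at_threshold(scores, outcomes, t):
    """Cost per event when every score >= t is classified as a default."""
    fn = sum(1 for s, y in zip(scores, outcomes) if s < t and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 0)
    return (fn * COST_FN + fp * COST_FP) / len(scores)


def best_threshold(scores, outcomes, grid):
    """(threshold, cost per event) pair with the lowest cost; first one wins ties."""
    return min(((t, cost_at_threshold(scores, outcomes, t)) for t in grid),
               key=lambda pair: pair[1])


# Hypothetical grid and toy training data (not from the course workbook)
grid = [round(-3.5 + 0.1 * k, 1) for k in range(71)]
train_scores = [2.1, 0.4, -1.3, 1.0, -0.2, -2.5]
train_outcomes = [1, 0, 0, 1, 1, 0]
t_best, c_best = best_threshold(train_scores, train_outcomes, grid)
print(t_best, round(c_best, 2))  # -1.2 416.67
```

Because a False Negative costs twice as much as a False Positive, the cheapest threshold tends to sit low enough to reject some non-defaulters rather than let defaulters through.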

AUC Calculator and Review of AUC Curve  


3 5


 

5 Finding the Minimum Cost Per Event

Question: Again referring only to the Training Set data, what is the overall minimum cost  per event? 

Hint: You will need this number to answer later questions. If you used the AUC Calculator, the overall minimum cost per event will be displayed in Cell J2.

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer (no decimals or dollar sign).

For example, enter $800.00 as 800.


600


 

6 Comparing the New Minimum Cost Per Event on Test Set Data 

When you compared AUC for the Training and Test Sets, all that was necessary was to look up the two different values in Cell G8. But to get an accurate measure of the cost savings from using the original model on new data, you cannot simply use the new threshold that results in the lowest overall cost per event on the Test Set.

Remember that your model is being tested for its ability to forecast, but the new optimal threshold will be known only after the outcomes for the entire Test Set are known.

All you can use is the model you developed on the Training Set data and the threshold from the Training Set, which you should have recorded when answering Question 4.

Question: At that same threshold score (NOT the threshold score that would minimize  costs for the new Test Set, but the old threshold score that minimized costs on the  Training Set) what is the cost per event on the test set? 

Hint: Using the AUC Calculator Spreadsheet provided previously, locate the column on the Training Set data that has the lowest cost per event. The same column and threshold in the Test Set copy of the AUC Calculator will have a new cost per event, displayed in row 17. This is almost always higher than the minimum cost per event on the Training Set, and also higher than the minimum cost per event on the Test Set would be if one could know the new optimal threshold in advance. This number is the actual cost per event when applying the model and threshold developed with the Training Set to the new Test Set data.
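The key point of the hint is that the Training Set threshold must be reused unchanged. This minimal Python sketch (a hypothetical threshold and invented Test Set data, with the same rule as the hint: at or above the threshold means default) contrasts that honest estimate with the hindsight optimum, which is only knowable after the outcomes are in.

```python
COST_FN, COST_FP = 5000, 2500


def cost_at_threshold(scores, outcomes, t):
    """Cost per event when every score >= t is classified as a default."""
    fn = sum(1 for s, y in zip(scores, outcomes) if s < t and y == 1)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y == 0)
    return (fn * COST_FN + fp * COST_FP) / len(scores)


train_threshold = -1.2  # hypothetical threshold recorded on the Training Set
test_scores = [1.5, -0.5, -1.0, 0.2, -1.5, 0.8]  # hypothetical Test Set
test_outcomes = [1, 1, 0, 0, 0, 1]

# Honest estimate: reuse the Training Set threshold unchanged.
honest = cost_at_threshold(test_scores, test_outcomes, train_threshold)

# Hindsight estimate: best threshold chosen after seeing Test Set outcomes
# (not available when forecasting). Each observed score, plus +inf, is a candidate.
candidates = sorted(set(test_scores)) + [float("inf")]
hindsight = min(cost_at_threshold(test_scores, test_outcomes, t) for t in candidates)

print(round(honest, 2), round(hindsight, 2))  # 833.33 416.67
```

Only the first number is a fair estimate of what the bank would actually experience.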

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer (no decimals or dollar sign).

For example, enter $800.00 as 800.


700.00


 

7 Putting a Dollar Value on Your Model Plus the Data 

Assume your Test Set cost per event results from Question 6 are sustainable long-term.

Question: How much money does the bank save, per event, using your model and its  data inputs, instead of issuing credit cards to everyone who asks? 

Hint: The cost of issuing credit cards to everyone (no model, no forecast) has been determined to be 25% × $5,000 = $1,250 per event. The dollar value of the model plus data is the difference between $1,250 and your number.
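The hint is a single subtraction from the no-model baseline. A minimal Python sketch; the $900 Test Set cost below is a hypothetical input, not an answer.

```python
DEFAULT_RATE = 0.25
COST_FN = 5000
BASELINE_COST = DEFAULT_RATE * COST_FN  # $1,250 per event with no model


def savings_per_event(model_cost_per_event, baseline=BASELINE_COST):
    """Dollar value per event of the model plus its data, relative to
    issuing cards to everyone."""
    return baseline - model_cost_per_event


print(savings_per_event(900))  # 350.0 (for a hypothetical $900 Test Set cost)
```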

Note: For Coursera to interpret your answer correctly, you must give your answer as an integer (no decimals or dollar sign).

For example, enter $800.00 as 800.


100


 

8 Payback Period for Your Model 

Question: Given that it apparently cost the bank $750,000 to conduct the three-year experiment, if the bank processes 1,000 credit card applicants per day on average, how many days will it take for future savings to pay back the bank's initial investment?

Give the number rounded to the nearest day (integer value).

Hint: Multiply your answer to Question 7 (the cost savings per applicant) by 1,000 to get the savings per day.
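The payback calculation divides the experiment's cost by the daily savings. A minimal Python sketch; the $350 savings per event is hypothetical, and the function name is illustrative.

```python
def payback_days(investment, savings_per_event, applicants_per_day=1000):
    """Days of future savings needed to recover the experiment's cost."""
    if savings_per_event <= 0:
        raise ValueError("a model that saves nothing never pays back")
    return investment / (savings_per_event * applicants_per_day)


days = payback_days(750_000, 350)  # 350 is a hypothetical savings per event
print(round(days, 2), round(days))  # 2.14 2
```

The quiz asks for the nearest day; if "ensure" is read strictly, rounding up (math.ceil) would be the conservative alternative.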


3


 

9 Any model that is reducing uncertainty will have a True Positive Rate… 

? …Less than the Test Incidence (percentage of outcomes classified as default)

? …Equal to the Test Incidence (percentage of outcomes classified as default)

? …Greater than the Test Incidence (percentage of outcomes classified as default)

10 Given that the base rate of default in the population is 25%, any test that is reducing uncertainty will have a Positive Predictive Value (PPV)…

? …Less than 0.25

? …Greater than 0.25

? …Equal to 0.25

11 Given that the base rate of default in the population is 25%, any test that is reducing uncertainty will have a Negative Predictive Value (NPV)…

? …Less than 0.75

? …Greater than 0.75

? …Equal to 0.75

12 Confusion Matrix Metrics

To determine all performance metrics for a binary classification, it is sufficient to have three values:

· The Condition Incidence (here the default rate of 25%)

· The probability of True Positives (the True Positive rate multiplied by the Condition Incidence)

· The Test Incidence (also called classification incidence: the sum of the probabilities of True Positives and False Positives)

These three values can all be obtained from the AUC Calculator Spreadsheet and then used as inputs to the Information Gain Calculator Spreadsheet to determine all other performance metrics.
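To see why three values are enough, note that the four confusion-matrix cells must sum to 1 and each row and column total is fixed by them. This Python sketch derives the cells and the common rates; the function name and the example inputs (a True Positive Rate of 0.60 and a test incidence of 0.30) are hypothetical.

```python
def confusion_metrics(incidence, p_tp, test_incidence):
    """All confusion-matrix proportions from the three inputs named above.
    incidence:      P(default), the condition incidence
    p_tp:           P(true positive) = TPR * incidence
    test_incidence: P(classified as default) = P(TP) + P(FP)
    """
    p_fp = test_incidence - p_tp
    p_fn = incidence - p_tp
    p_tn = 1 - incidence - p_fp
    return {
        "TP": p_tp, "FP": p_fp, "FN": p_fn, "TN": p_tn,
        "TPR": p_tp / incidence,               # sensitivity
        "FPR": p_fp / (1 - incidence),
        "PPV": p_tp / test_incidence,          # precision
        "NPV": p_tn / (1 - test_incidence),
    }


# Hypothetical model: TPR 0.60 with a 0.30 test incidence, 25% default rate
m = confusion_metrics(0.25, 0.25 * 0.60, 0.30)
print({k: round(v, 3) for k, v in m.items()})
```

For a test carrying no information about default, PPV equals the 0.25 incidence, NPV equals 0.75, and TPR equals the test incidence. For these hypothetical inputs all three exceed those values (PPV 0.5, NPV about 0.857, TPR 0.6 versus 0.30).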

AUC Calculator and Review of AUC Curve  

Information Gain Calculator  

Question: What is your model’s True Positive Rate? 

Save this answer as it will be needed again for Part 3 (Quiz 3)


0.30


 

13 Question: What is your model's test incidence?

Save this answer as it will be needed again for Part 3 (Quiz 3) 


0.20


 

Mastering Data Analysis in Excel Quiz Answer 

Part 2: Should the Bank Buy Third-Party Credit Information?

1 Introduction 

Part 2 is intended to illustrate how binary classification performance metrics make it possible for you to put an exact value, in dollars per event, on new information that relates to a predictive model.

Note that new information will be worth far more if it is compared to no forecasting model, rather than to the state of partial knowledge available from the current model. Sellers of information (and data science consultants!) love to take credit for any information gain they achieve over the base rate.

Very often, some intermediate state of knowledge is already available that requires no additional spending. Evaluating the realistic incremental financial gain from new information, whether from licensing a third-party commercial database or from collecting new data internally, is therefore of great practical value, as it sets an upper bound on what your company should be willing to pay to license or create the new information.

In this case study, your boss has been in discussions with an advanced machine-learning credit risk analytics company that claims to score individual probability of default with very high information gain. Let's call the company Eggertopia. Eggertopia sales representatives claim their pre-processed risk scores can achieve AUC values as high as 0.85 or even higher. However, Eggertopia scores are sold per event, and they are expensive!

Your boss asks you to determine the incremental financial value to the bank of purchasing Eggertopia risk scores on future credit card applicants.

Eggertopia agrees to apply its algorithms to generate credit scores for the 400 individuals in the Training and Test Sets. Eggertopia scores do not need to be combined with anything else to make a model. However, since the scores range from approximately 600 (best credit risk) to 4900 (most likely to default), they will need to be standardized and adjusted to fit the -3.5 to 3.5 range of the AUC Calculator Spreadsheet (below).
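Standardizing is a positive linear rescaling, so it preserves the rank order of the scores and therefore the AUC. Only values that would fall outside the calculator's range need extra handling. A minimal Python sketch, assuming population-standard-deviation z-scores and clipping at ±3.49 (clipping can create ties at the extremes, which could change the AUC slightly). The function name and raw scores are hypothetical, and the mean and standard deviation come from whatever list is passed in.

```python
from statistics import mean, pstdev

LIMIT = 3.49  # assumed margin inside the calculator's (-3.5, 3.5) range


def to_calculator_range(raw_scores):
    """Standardize raw scores (higher = riskier), then clip extreme values."""
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [max(-LIMIT, min(LIMIT, (x - m) / s)) for x in raw_scores]


raw = [600, 1500, 2750, 4900]  # hypothetical Eggertopia scores
print([round(z, 3) for z in to_calculator_range(raw)])
```

Because higher Eggertopia scores already mean higher default risk, no sign flip is needed before feeding them to the calculator.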

AUC Calculator and Review of AUC Curve  

You will determine the sustainable AUC of the Eggertopia scores, the sustainable cost per event, and the savings per event when comparing Eggertopia data to the base rate forecast.

You will then calculate the incremental savings per event if you compare the use of Eggertopia data to the use of your current model developed in Part 1.

Question: What is the AUC of the Eggertopia Scores on the Training Set? Give your  answer to two digits to the right of the decimal point  

? 0.83

? 0.85

? 0.88

? 0.95

2 What is the optimum threshold on the training set to minimize the average cost per  test? 

? 15 

? 1 

? 25 

? 2 

3 What is the average cost per event at the Training Set optimum threshold?

? $640

? $600

? $500

? $540

4 What is the AUC of the Eggertopia scores on the Test Set? 

? 0.85

? 0.88

? 0.80

? 0.75

5 Using the same threshold as used on the training set, what is the cost per event of the Eggertopia scores on the Test Set? Round to the nearest dollar.

? $838

? $803

? $833

? $823

6 If the bank did not have your model, or any other way of forecasting default, what is the maximum (break-even) price per event that the bank could theoretically pay for Eggertopia scores? In other words, what are the Eggertopia scores' absolute savings per event?

Hint: Calculate the difference between the cost per event at a 25% default rate and the cost per event using Eggertopia scores.

? $423

? $425

? $412

? $418

7 What is the True Positive rate of the forecasting model using Eggertopia Scores? 

? 0.70

? 0.72

? 0.76

? 0.74

8 What is the Positive Predictive Value (PPV) of the forecasting model using Eggertopia scores?

Hint: To calculate the PPV, divide the portion of True Positives by the total portion of Positive Classifications. Review the confusion matrix definitions and letter designations on the Information Gain Spreadsheet (PPV is defined at Cell G41), obtain the True Positive and False Positive Rates from the AUC Calculator Spreadsheet, and use algebra to solve.

Information Gain Calculator  

? 0.52

? 0.50

? 0.48

? 0.54

9 Incremental Financial Value of Eggertopia Scores 

You calculated a cost per event for your own predictive model on Test Set data to answer Quiz 1 Part 1, Question 6.


Question: Assuming that the performance of the Eggertopia model and your model both remain stable on any future data (a big assumption), what is the maximum, or break-even, price that the bank could pay per score for Eggertopia, given that it already has your model and data?

no


 


Part 3: Comparing the Information Gain of Alternative Data and Models

1 Comparing the Information Gain of Eggertopia Scores and Your Model

Both the Eggertopia Scores and your binary classification model can be thought of as tools to reduce uncertainty about future default outcomes of credit card applicants.

Your own model, developed in Part 1, identifies dependencies between, on the one hand, the six types of input data collected by the bank and, on the other hand, the binary outcome default/no default.

If we assume that the dependencies identified by the Eggertopia Scores and by your model on the Test Set are stable and representative of all future data (a big assumption), we can draw some further conclusions about how much information gain, or reduction in uncertainty, each provides.

Definitions are given in the Information Gain Calculator Spreadsheet, provided below.

Information Gain Calculator  

Question: On your model’s Test Set results, what is the conditional entropy of default,  given your test classifications? 

Hint: You need your model's True Positive rate from Part 1, Question 12, and its test incidence (the proportion of events your model classifies as default) from Part 1, Question 13. Use the condition incidence of 25% and your model's True Positive rate to calculate the portion of TPs. Then you have the inputs needed to use the Information Gain Calculator Spreadsheet.
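The Information Gain Calculator does this for you, but the underlying formula is short: H(X|Y) = P(Y=1) · H(PPV) + P(Y=0) · H(1 - NPV), where H is binary entropy in bits, X is default, and Y is the test classification. This Python sketch uses hypothetical inputs (a True Positive Rate of 0.60 and a test incidence of 0.30), not any particular learner's model, and the function names are illustrative.

```python
from math import log2


def h2(p):
    """Binary entropy in bits of a (p, 1 - p) distribution."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))


def conditional_entropy(incidence, p_tp, test_incidence):
    """H(default | classification) from the three confusion-matrix inputs."""
    ppv = p_tp / test_incidence  # P(default | classified default)
    # P(default | classified no default) = P(FN) / P(Y=0) = 1 - NPV
    p_default_given_neg = (incidence - p_tp) / (1 - test_incidence)
    return test_incidence * h2(ppv) + (1 - test_incidence) * h2(p_default_given_neg)


# Hypothetical model: TPR 0.60, test incidence 0.30, 25% default rate
print(round(conditional_entropy(0.25, 0.25 * 0.60, 0.30), 4))  # 0.7142
```

A test that carries no information leaves H(X|Y) equal to the base-rate entropy; any useful test pushes it lower.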


no


 

2 Recall that the entropy of the original base rate, minus the conditional entropy of default given your test classification, equals the Mutual Information between default and the test:

I(X;Y) = H(X) - H(X|Y)

The population of potential credit card customers consists of 25% future defaulters. The base rate incidence of default (0.25, 0.75) has an uncertainty, or entropy, of H(0.25, 0.75) = 0.25 × log2(4) + 0.75 × log2(1.333) = 0.8113 bits.
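The base-rate figure and the Mutual Information identity can be checked in a few lines of Python. The 0.7142-bit conditional entropy below is a hypothetical value, not an answer for any particular model, and the function names are illustrative.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


h_base = entropy([0.25, 0.75])
print(round(h_base, 4))  # 0.8113


def mutual_information(h_x, h_x_given_y):
    """I(X;Y) = H(X) - H(X|Y), in bits per event."""
    return h_x - h_x_given_y


# With a hypothetical conditional entropy of 0.7142 bits:
print(round(mutual_information(h_base, 0.7142), 4))  # 0.0971
```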

Question: On your test set results, what is the Mutual Information, or Information Gain, in average bits per event?


no


 

3 Recall that Percentage Information Gain (PIG) is the ratio I(X;Y)/H(X).

Question: On your Test Set results, what is the Percentage Information Gain (PIG) of your model?


no


 

4 Since you have, for your model on the Test Set, a savings per event and a number of bits per event (Mutual Information), you can calculate a savings per bit. This is a powerful concept, because it places a financial value directly on the information content of a model (or of an additional data source, like the Eggertopia scores).
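Percentage Information Gain (Question 3) and savings per bit are both simple ratios once the Mutual Information is known. A minimal Python sketch with hypothetical inputs (0.0971 bits and $350 saved per event); the default base-rate entropy is the 0.8113 bits given in Question 2, and the function names are illustrative.

```python
def pig(mutual_info_bits, base_entropy_bits=0.8113):
    """Percentage Information Gain: I(X;Y) / H(X), returned as a percentage."""
    return 100 * mutual_info_bits / base_entropy_bits


def savings_per_bit(savings_per_event, mutual_info_bits):
    """Dollars saved per bit of information gained: ($/event) / (bits/event)."""
    return savings_per_event / mutual_info_bits


# Hypothetical model: 0.0971 bits per event and $350 saved per event
print(round(pig(0.0971), 1))                 # 12.0
print(round(savings_per_bit(350, 0.0971)))  # 3605
```

The bits-per-event units cancel in the savings-per-bit ratio, which is what lets the bank compare information sources of different quality on one dollar scale.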

Question: How many dollars does the bank save, for every bit of information gain  achieved by your model? 


no


 

5 Information Gain of Eggertopia Scores over the Base Rate 

For questions in this section, assume your model and the data it uses are not available: the bank's choice is between Eggertopia scores and the base rate.

Question: What is the Mutual Information of the Eggertopia Scores? 

In other words, on the Test Set, what is the information gain, in average bits per event, over the base rate of (0.25, 0.75) offered by the Eggertopia Scores?

? 0.1305 bits per event

? 0.1255 bits per event

? 0.1243 bits per event

? 0.1205 bits per event

6 On the test set, what is the Eggertopia scores’ Percentage Information Gain (PIG)? 

? 14.85%

? 15.35%

? 15.25%

? 13.95%

7 If Eggertopia data were free, and your model were unavailable, what would the dollar savings per bit of information extracted be?

Dollar savings are $412 (rounded to the nearest dollar), from Quiz 2, Question 6.

? Value would be $427 per bit

? Value would be $3,427 per bit

? Value would be $3,627 per bit

8 Incremental Information Gain of Eggertopia Scores Compared to Your Model and  Available Data (any answer scores) 

(For this section, assume your Model and the Data it uses are available)  

Question: What is the incremental information gain of the Eggertopia scores, over your  model from Part 1, in average bits per event, if any? 


no


 

9 What is the maximum (break-even) price the bank should pay for Eggertopia scores, per score, if your model from Part 1 and data are already available?


no


 

10 At the above maximum (break-even) price per score, what would be the value per bit of incremental information gained from the Eggertopia scores? Give your answer in dollars per bit.


no


 

Part 4: Modeling Profitability Instead of Default

1 Modeling Profitability Instead of Default 

Modeling Profitability Level as a Continuous Output (Instead of Binary Classification  Default/No Default) 

Introduction 

Both your own model and the forecast based on Eggertopia scores are binary classifications: they forecast one of just two outcomes, Default or No Default. Your boss is interested in the idea that it might be preferable instead to model and forecast profits and losses as continuous values, using a multivariate linear regression model on the same six input variables. This idea has arisen because the bank has been reviewing individual profit and loss numbers for each customer over the three-year period and has made an interesting discovery: some defaulting customers carried so much debt for so long, and paid so much interest on it, that they were profitable for the bank even though they defaulted! Many customers who seem to have risky spending behaviors are also among the most profitable for a lending business. And, at the opposite extreme, customers who always paid off their cards in full each month never defaulted but were not very profitable: the bank barely broke even, or even lost money, on its safest borrowers.

Your boss asks you to forecast each applicant's expected profitability, in dollars, before deciding whether or not to issue them a credit card. He wants to know how reliable this type of forecast would be: what is the range above and below the point estimate that will be correct 90% of the time?

Although it might be possible to combine the six inputs in other ways, in the interests of time and of focusing on the key learning objectives, we will use only a simple linear combination of the six input variables for Part 4 of this project. (You should not include the Eggertopia Scores as an input variable.)

Question 1 is about the coefficients, or betas, used to combine the standardized inputs to get the best-fit line on standardized outputs on the Training Set. We then use those fixed betas to measure the observed residual error of the model on the Test Set.

Questions 2 through 6 concern the forecasts on the Test Set.

Questions 7 through 11 look at the Training Set results so that they can be compared against the Test Set results (for possible overfitting).

Questions 12 through 14 are about the uncertainty that remains in a new individual forecast of profitability.

Use the Excel LINEST function on the six inputs and the profitability output for the 200 Training Set applicants to calculate the coefficients (the betas) that result in the best-fit line.
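For readers who want to check LINEST outside Excel, this pure-Python sketch solves the ordinary least-squares normal equations. It assumes every input and the output have been standardized on the Training Set, so the intercept is zero and is omitted. Excel's LINEST returns its coefficients in the reverse order of the input columns, whereas this function returns them in column order. The function name and toy data are hypothetical.

```python
def fit_betas(X, y):
    """Least-squares coefficients without an intercept, from X'X b = X'y.

    X is a list of rows (one per applicant), y the matching outputs.
    """
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data generated exactly as y = 2*x1 + 3*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
print([round(b, 6) for b in fit_betas(X, y)])  # [2.0, 3.0]
```

For the real data, X would hold the six standardized Training Set inputs for the 200 applicants and y their standardized profitability.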

Question: Do you feel prepared to take this quiz? 

? Yes 

? No 

2 Question: What are your values for each beta on the Training Set? 

Age 

Years at current employer 

Years at current address 

Income over the past year 

Current credit card debt 

Current automobile debt 

? 01, 19, 07, 64, 06, 0 

? 01, 19, 07, 64, 06, 0 

? 01, 19, 07, 64, 06, 0 

3 For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet.

Question: What is the root mean square residual (the standard deviation of model  error) on Standardized output for the Test Set? 
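Both Test Set statistics (this question and the next) compare actual standardized outputs with forecasts made by applying the fixed Training Set betas. A minimal Python sketch with hypothetical numbers and illustrative function names. On the Training Set itself, a least-squares fit on standardized data gives an RMS residual of approximately sqrt(1 - R²); on new data that relationship need not hold exactly.

```python
from math import sqrt


def predict(X, betas):
    """Apply fixed (Training Set) betas to standardized input rows."""
    return [sum(b * x for b, x in zip(betas, row)) for row in X]


def rms_residual(y, y_hat):
    """Root mean square of (actual - forecast): the standard deviation of
    model error when the errors average to zero."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def correlation(u, v):
    """Pearson correlation coefficient R."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Hypothetical standardized Test Set outputs and forecasts;
# in practice y_hat = predict(X_test, training_betas)
y_test = [1.0, -1.0, 0.5, -0.5]
y_hat = [0.8, -0.9, 0.3, -0.2]
print(round(rms_residual(y_test, y_hat), 4), round(correlation(y_test, y_hat), 4))
```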

? 0.5835

? 0.8109

? 0.6750

? 0.6875

? 0.3250

4 For this question, use the Linear Regression Forecasting explanation and spreadsheet.

Question: What is the observed correlation R on the Test Set? 

? 0.7378

? 0.8095

? 0.7590

? 0.7332

5 For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet.

Question: What is the standard deviation of model error, in dollars, for the Test Set?

? $3,996.81

? $3,411.80

? $3,885.14

? $3,379.36

6 For this question, use the Linear Regression Forecasting explanation and Excel  spreadsheet: 

Question: What is the 90% confidence interval, in dollars, for the Test Set?
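Converting the standardized error back to dollars and turning it into an interval takes two multiplications. This Python sketch uses the 0.6750 and $5,755.91 figures quoted later in the Question 13 setup, assumes Gaussian errors as Question 9 states, and takes the z-value from Python's standard-library NormalDist. Because those inputs are rounded, results may differ by a few cents from values computed in the spreadsheet. The function name is illustrative.

```python
from statistics import NormalDist

SD_ERROR_STD = 0.6750  # Test Set std. dev. of error on standardized output (from the quiz)
SD_PROFIT = 5755.91    # Training Set profitability std. dev. in dollars (from the quiz)

sd_error_dollars = SD_ERROR_STD * SD_PROFIT  # undo the output standardization


def half_width(confidence, sd=sd_error_dollars):
    """Distance above and below the point estimate for a two-sided Gaussian interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd


print(round(sd_error_dollars, 2))
for level in (0.50, 0.90, 0.99):
    print(f"{level:.0%}: +/- ${half_width(level):,.2f}")
```

The same function covers the 50% and 99% intervals asked about in Questions 14 and 15.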

? $6,390.49 above the point estimate, and $6,390.49 below the point estimate

? $5,611.91 above the point estimate, and $5,611.91 below the point estimate

? $6,574.17 above the point estimate, and $6,574.17 below the point estimate

? $5,558.55 above the point estimate, and $5,558.55 below the point estimate

7 What is the Percentage Information Gain (PIG) on the Test Set?

? 27.7%

? 18.9%

? 26.4%

? 37.2%

8 For this question, use the Linear Regression Forecasting explanation and Excel  spreadsheet: 

Question: What is the Correlation, R, of your model on the Training Set? 

? 0.7505

? 0.7805

? 0.8095

9 For this question, use the Linear Regression Forecasting explanation and Excel  spreadsheet: 

You need to quantify the uncertainty in a regression model forecast of applicants' future profitability. Assume that both the forecast profits and the errors have a Gaussian distribution. You will calculate the standard deviation of model error on standardized data, the standard deviation in dollars of the model error, and the 90% confidence interval for profitability estimates.

Question: What is the standard deviation of your model error on the standardized  Training Set output? 

? 587 

? 487 

? 487 

? 587 

10 For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet.

Question: What is the standard deviation of model error in dollars on the Training Set? (This may seem similar to Question 5, but Q5 refers to the Test Set.)

? $3,379.36

? $4,379.36

? $5,500.87

? $4,312.91

11 For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet.

Question: What is the 90% confidence interval, in dollars, on the Training Set? (This may seem similar to Question 6, but Q6 refers to the Test Set.)

? $5,558.55

? $6,211.18

? $5,328.93

? $7,128.55

12 For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet.

Question: What is the Percentage Information Gain (PIG) on the Training Set? (This may seem similar to Question 7, but Q7 refers to the Test Set.)

? 36.5%

? 37.5%

? 41.4%

? 32.4%

13 Questions 13 through 15 use the same example applicant.

The following data are known about the sample applicant: 

Age: 42.00

Years at Employer: 12.44

Years at Address: 0.9

Income: $121,400

CC debt: 34,228 

Auto debt: 23,411 

To convert the above inputs to standardized form, locate the Training Set spreadsheet (first bottom tab of the workbook) in the Data for Final Project workbook.

Data for Final Project  

Use the input means (Cells C207:H207) and standard deviations (Cells C209:H209).

Use the Training Set profitability mean of $1,905.51 and standard deviation of $5,755.91 from the Profit and Loss spreadsheet (last bottom tab).

Use the Test Set standard deviation of error on standardized outputs of 0.6750.

Question: What is the point estimate of profitability, in dollars?
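The point estimate chains three steps. First, standardize each applicant input with the Training Set means and standard deviations (Cells C207:H207 and C209:H209). Next, combine the standardized inputs with the betas from LINEST. Finally, convert the standardized forecast back to dollars with the profitability mean and standard deviation given above. In this Python sketch, the means, standard deviations, and betas are hypothetical placeholders (not the workbook's values), and the function name is illustrative. Only the $1,905.51 and $5,755.91 figures come from the quiz.

```python
PROFIT_MEAN = 1905.51  # Training Set profitability mean (from the quiz)
PROFIT_SD = 5755.91    # Training Set profitability std. dev. (from the quiz)


def point_estimate(raw, means, sds, betas, y_mean=PROFIT_MEAN, y_sd=PROFIT_SD):
    """Standardize each raw input, apply the betas, then convert the
    standardized forecast back to dollars."""
    z_inputs = [(x - m) / s for x, m, s in zip(raw, means, sds)]
    z_forecast = sum(b * z for b, z in zip(betas, z_inputs))
    return y_mean + y_sd * z_forecast


# Hypothetical one-input example: age 42 with an assumed mean of 40,
# standard deviation of 10 and beta of 0.5 (NOT the workbook's values)
print(round(point_estimate([42.0], [40.0], [10.0], [0.5]), 2))  # 2481.1
```

With all six real inputs, the same function takes six-element lists; the interval for Questions 14 and 15 is then the point estimate plus or minus the Gaussian half-width of the dollar error.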

? $10,683.61

? $11,109.61

? $8,451.61

? $10,683.61

14 Use the same sample applicant, the same Training Set means and standard deviations, and the same Test Set standard deviation of error on standardized outputs (0.6750) as in Question 13.

Question: With 50% confidence, what is the range of profitability?

? Range from $13,304.16 to $8,063.06

? Range from $12,962.61 to $10,683.61

? Range from $11,823.28 to $9,543.94

? Range from $10,683.61 to $2,278.99

15 Use the same sample applicant, the same Training Set means and standard deviations, and the same Test Set standard deviation of error on standardized outputs (0.6750) as in Question 13.

Question: With 99% confidence, what is the range of profitability?

? Range from $10,683.61 to $8,704.31

? Range from $19,388.27 to $10,683.61

? Range from $16,388.27 to $7,704.31

? Range from $20,691.32 to $675.90

16 Comparing Test Set and Training Set Performance

Question 15: Between the Training Set and the Test Set, the dollar value of the  standard deviation of model error… 

? Increased by more than 50%, which leads to the conclusion of model overfitting

? Increased by more than 25%, which suggests possible model overfitting

? Decreased by about 15%, which suggests a very strong model on Test Set data

? Increased by less than 20%, which suggests minimal model overfitting


Peer-graded Assignment: Part 5: Modeling Credit Card Default Risk and Customer Profitability

Project Title

Give your project a descriptive title.

Modeling Credit Card Default Risk and Customer Profitability


 

What is your predictive model? 

a. Describe the arithmetic clearly so that another learner could implement your model on new standardized input data if they wished.

b. Give an example of the score you would assign the following applicant, whether they would be approved or rejected for a credit card, and why.

a) The first thing we should do is examine the relationships among the variables and identify which are the most important in the model. Then we should determine the parameters, or coefficients, that go with the variables of the model, using the linear regression procedure in Excel covered in the course. The most relevant parameter values are: Years at current employer: 0.19; Income over the past year: 0.08; Current credit card debt: 0.19; Current automobile debt: 0.07. Then, with these coefficients and considering the relationships, we build our model: SCORE is the weighted combination of 0.19 × Years at current employer, 0.08 × Income over the past year, 0.19 × Current credit card debt, and 0.07 × Current automobile debt.

b) Having optimized the AUC, we obtained a threshold of 0.25 for the minimum cost per event. A score below 0.04, for example, will be classified as a negative test, which translates to a financially profitable individual who could be approved for a credit card.


 



 

What would the bank’s average profit per applicant be (net profits divided by 200) when  using your predictive model on the Training Set? 

The average profit per applicant will be $794 on the training set.


 

What is the incremental financial value per applicant of your model over no model on  the Training Set?

The incremental financial value per applicant of your model over no model on the Training Set is $654.41.


 

Evaluate your model on the Test Set data. How confident are you that your model does not overfit the Training Set data? The only basis for evaluating overfitting is to give the same metrics on the Test Set and Training Set and compare them.

The model performs very well on both data sets, since the relationships it relies on hold and the coefficients found are correct. This implies that the AUC is high and does not change significantly, while the estimated costs per event are also maintained.


 

Evaluate your model on the Test Set data. How confident are you that your model does not overfit the Training Set data?

A. Choose between three broad degrees of confidence: very, somewhat, or not at all. (Note that "not at all" is still an acceptable answer if you give persuasive reasons for why you chose it.)

B. Explain the evidence your degree of confidence is based upon. Your explanation should include the test set profits and training set profits per applicant.

How much confidence to have in the model must relate to the relationship between the profits per applicant on the Training Set and the Test Set.

a) Very 

b) Because the AUC on both data sets is high and stable, it is evidence of the model's effectiveness. Also, the model maintains a good estimated profit margin, because the costs per event do not change significantly.


 
