Data Science Methodology Course Answers

Week 1 – From Problem to Approach and From Requirements to Collection

1. From Problem to Approach

Q1

Q2, Q3

Q4

Q5

LAB-1 Assessment

2. From Requirements to Collection

Q1

Q2, Q3

Q4, Q5

Lab Assessment

Week 2 – From Understanding to Preparation and From Modeling to Evaluation

1. From Understanding to Preparation

Q1

Q2

Q3

Q4

Q5

LAB-From-Understanding-to-Preparation

2. From Modeling to Evaluation

Q1

Q2, Q3

Q4, Q5

LAB-From-Modeling-to-Evaluation Assessment

Week 3 – From Deployment to Feedback and Final Assignment

1. From Deployment to Feedback

Q1

Q2

Q3

Q4

Q5

Q6

Q7, Q8

Q9

Q10

2. Final Assignment

Data Science Methodology Final Assignment: Emails

Q1. Which topic did you choose to apply the data science methodology to? (2 marks)

Ans: The topic I have chosen to apply the data science methodology to is Emails. I believe that automatically classifying emails can increase productivity substantially.

Next, you will play the role of the client and the data scientist.

Q2. Using the topic that you selected, complete the Business Understanding stage by coming up with a problem that you would like to solve and phrasing it in the form of a question that you will use data to answer. (3 marks)

You are required to:

Describe the problem, related to the topic you selected. Phrase the problem as a question to be answered using data. For example, using the food recipes use case discussed in the labs, the question that we defined was, “Can we automatically determine the cuisine of a given dish based on its ingredients?”.

Ans: We receive hundreds of emails every day, and it may not be possible to look at all of them. We can determine which emails are worth a second look by organizing them into categories such as Promotions, Updates, Social, Order Receipts, Important/Not Important, Spam, etc.

Our question would be: “Is it possible to automatically determine the type/category of an email based on its content?”

Q3. Briefly explain how you would complete each of the following stages for the problem that you described in the Business Understanding stage, so that you are ultimately able to answer the question that you came up with. (5 marks):

  1. Analytic Approach
  2. Data Requirements
  3. Data Collection
  4. Data Understanding and Preparation
  5. Modeling and Evaluation

You can always refer to the labs as a reference when describing how you would complete each stage for your problem.

Ans:

  1. Analytic Approach:

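Each email either belongs to a given category or it does not (for example, spam or not spam), so the question reduces to Yes/No answers and a classification model is a suitable approach.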

  2. Data Requirements:

To create the model, we will require information about each email: the email address and domain of the sender, the subject, the language, whether the email has an attachment, and the body of the email, including whether it contains a list (the presence of a list could help classify the email as an order).
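As a rough illustration of how these fields could be pulled from a raw message, here is a minimal sketch using Python's standard `email` package. The function name `extract_features`, the sample message, the use of the `Content-Language` header as the language source, and the regular expression used to detect a list are all assumptions made for this example, not part of the course material.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def extract_features(raw_email: str) -> dict:
    """Turn one raw email message into the fields named in the requirements."""
    msg = message_from_string(raw_email)
    _, address = parseaddr(msg.get("From", ""))
    domain = address.rsplit("@", 1)[-1].lower() if "@" in address else ""

    body_parts = []
    has_attachment = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_filename():
            # Assumption: a part carrying a filename is an attachment.
            has_attachment = True
        elif part.get_content_type() == "text/plain":
            body_parts.append(part.get_payload())
    body = "\n".join(body_parts)

    # Assumption: lines starting with "-", "*", "1." or "1)" are list items.
    has_list = bool(re.search(r"^\s*(?:[-*]|\d+[.)])\s+\S", body, re.MULTILINE))

    return {
        "sender": address.lower(),
        "domain": domain,
        "subject": msg.get("Subject", ""),
        # Assumption: language is read from an optional header, else unknown.
        "language": msg.get("Content-Language", "unknown"),
        "has_attachment": has_attachment,
        "has_list": has_list,
        "body": body,
    }


# Made-up sample message, for illustration only.
sample = """From: Shop <orders@example.com>
Subject: Your order
Content-Language: en

Thanks for your purchase:
- 1 x Notebook
- 2 x Pen
"""
features = extract_features(sample)
```

In practice many emails carry no language header, so the language would more likely have to be detected from the body text.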

  3. Data Collection:

We can gather this data from email accounts with various providers (Gmail, Hotmail, Yahoo, Outlook, etc.) and merge the emails from the different inboxes into a single dataset. Descriptive statistics and visualizations can then be applied to the dataset to assess the content quality and to check whether we have the required information.
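The merge and the first quality check could look something like the sketch below. The inbox contents, the record layout, and the helper names `merge_inboxes` and `describe` are hypothetical, and only a few simple statistics are shown.

```python
from collections import Counter
from statistics import mean

# Hypothetical exports: one list of email records per inbox.
gmail = [
    {"sender": "news@shop.example", "subject": "Sale today", "body": "50% off"},
    {"sender": "amy@work.example", "subject": "", "body": "Meeting at 10"},
]
outlook = [
    {"sender": "bank@bank.example", "subject": "Statement",
     "body": "Your statement is ready"},
]


def merge_inboxes(inboxes: dict) -> list:
    """Combine the records of all inboxes, tagging each with its source."""
    merged = []
    for source, emails in inboxes.items():
        for record in emails:
            merged.append({**record, "source": source})
    return merged


def describe(dataset: list) -> dict:
    """Compute a few descriptive statistics for a first quality check."""
    return {
        "total": len(dataset),
        "per_source": dict(Counter(r["source"] for r in dataset)),
        "missing_subject": sum(1 for r in dataset if not r["subject"].strip()),
        "mean_body_words": mean(len(r["body"].split()) for r in dataset),
    }


dataset = merge_inboxes({"gmail": gmail, "outlook": outlook})
summary = describe(dataset)
print(summary)
```

Counts per source and the number of missing fields give a quick sense of whether one inbox dominates the dataset or whether required information is absent.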

  4. Data Understanding and Preparation:

We should remove redundant data from our dataset, such as two copies of the same email sent to different inboxes. Since we are working with text, we need to perform text analysis. We should also group the emails based on keywords present in the subject or content, so that they can be classified properly.
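The two preparation steps described above, removing duplicate emails and grouping by keywords, can be sketched as follows. The keyword lists, the duplicate rule (same sender, subject, and body), and the function names are assumptions chosen for illustration; real keyword groups would be derived from the data.

```python
import re

# Hypothetical keyword lists, one per category.
CATEGORY_KEYWORDS = {
    "Promotions": {"sale", "discount", "offer"},
    "Order Receipts": {"order", "receipt", "invoice"},
    "Social": {"friend", "follower", "mentioned"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def deduplicate(emails: list) -> list:
    """Keep one copy of emails that share sender, subject, and body."""
    seen = set()
    unique = []
    for e in emails:
        key = (e["sender"].lower(), e["subject"].strip(), e["body"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


def keyword_category(e: dict) -> str:
    """Pick the category whose keywords overlap most with subject and body."""
    words = tokenize(e["subject"] + " " + e["body"])
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Uncategorized"


# Made-up emails; the first two are the same message in different inboxes.
emails = [
    {"sender": "shop@example.com", "subject": "Your order", "body": "Receipt attached"},
    {"sender": "Shop@example.com", "subject": "Your order", "body": "Receipt attached"},
    {"sender": "deals@example.com", "subject": "Big sale", "body": "Special offer inside"},
    {"sender": "amy@example.com", "subject": "Lunch?", "body": "Are you free at noon"},
]
prepared = [{**e, "category": keyword_category(e)} for e in deduplicate(emails)]
```

Keyword groupings like these can serve as initial labels or as features for the classification model built in the next stage.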

  5. Modeling and Evaluation:

We create the classification model, then evaluate its results to see how many emails are classified correctly and how many incorrectly. Using this feedback, we can tweak the model, adding parameters and making other necessary changes, to ensure that we’re getting the intended results.
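The answer does not name a specific algorithm, so the sketch below uses a small multinomial Naive Bayes classifier purely as one possible choice, followed by an evaluation that counts correct and incorrect predictions. The class and function names and the toy training and test emails are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(model, texts, labels):
    """Return accuracy and a Counter of (actual, predicted) pairs."""
    confusion = Counter((y, model.predict(x)) for x, y in zip(texts, labels))
    correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
    return correct / len(labels), confusion


# Toy, made-up data purely for illustration.
train_texts = [
    "huge sale discount offer", "limited offer sale ends today",
    "your order receipt invoice", "order shipped receipt attached",
]
train_labels = ["Promotions", "Promotions", "Order Receipts", "Order Receipts"]
test_texts = ["discount sale this weekend", "invoice for your order"]
test_labels = ["Promotions", "Order Receipts"]

model = NaiveBayes().fit(train_texts, train_labels)
accuracy, confusion = evaluate(model, test_texts, test_labels)
```

The `confusion` counter shows which categories get mistaken for which, and that is the feedback used to adjust the model. A real evaluation would need a much larger held-out test set than this toy example.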