Troubleshooting and Debugging Techniques Coursera Quiz

Practice Quiz: Binary Searching a Problem

Total points: 5
Score: 100%

Question 1

You have a list of computers that a script connects to in order to gather SNMP traffic and calculate an average for a set of metrics. The script is now failing, and you do not know which remote computer is the problem. How would you troubleshoot this issue using the bisecting methodology?

Run the script with the first half of the computers.
Run the script with the last computer on the list.
Run the script with the first computer on the list.
Run the script with two-thirds of the computers.

Bisecting when troubleshooting starts with splitting the list of computers and choosing to run the script with one half.
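A minimal Python sketch of the bisecting approach. It assumes exactly one computer breaks the script, and `script_works(hosts)` is a hypothetical callback that runs the script against a subset of hosts and reports whether it succeeded. Each run halves the list of suspects, so 20 hosts need only about five runs.

```python
def find_failing_host(hosts, script_works):
    """Return the host that makes the script fail, or None if it never fails.

    Assumes exactly one bad host, and that script_works(subset) returns
    False whenever that host is included in the subset.
    """
    candidates = list(hosts)
    if script_works(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if script_works(half):
            # The first half is fine, so the problem is in the second half.
            candidates = candidates[len(candidates) // 2 :]
        else:
            # The first half fails, so keep narrowing it down.
            candidates = half
    return candidates[0]


# Hypothetical example: pretend "host-13" is the machine that breaks the script.
hosts = [f"host-{n:02d}" for n in range(1, 21)]
bad = "host-13"
print(find_failing_host(hosts, lambda subset: bad not in subset))  # host-13
```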

Question 2

The find_item function uses binary search to recursively locate an item in the list, returning True if found, False otherwise. Something is missing from this function. Can you spot what it is and fix it? Add debug lines where appropriate, to help narrow down the problem.

def find_item(list, item):
  #Returns True if the item is in the list, False if not.
  if len(list) == 0:
    return False

  #Binary search only works on a sorted list. Sorting was the missing step:
  #without it, "item < list[middle]" can send the search into the wrong half.
  list = sorted(list)

  #Is the item in the center of the list?
  middle = len(list)//2
  #print(f"Looking for {item} in {list}, middle is {list[middle]}")  ## Debug line
  if list[middle] == item:
    return True

  #Is the item in the first half of the list?
  if item < list[middle]:
    #Call the function with the first half of the list
    return find_item(list[:middle], item)
  else:
    #Call the function with the second half of the list
    return find_item(list[middle+1:], item)

#Do not edit below this line - This code helps check your work!
list_of_names = ["Parker", "Drew", "Cameron", "Logan", "Alex", "Chris", "Terry", "Jamie", "Jordan", "Taylor"]

print(find_item(list_of_names, "Alex")) ## True
print(find_item(list_of_names, "Andrew")) ## False
print(find_item(list_of_names, "Drew")) ## True
print(find_item(list_of_names, "Jared")) ## False

Output:

True
False
True
False

Question 3

The binary_search function returns the position of key in the list if found, or -1 if not found. We want to make sure that it’s working correctly, so we need to place debugging lines to let us know each time that the list is cut in half, whether we’re on the left or the right. Nothing needs to be printed when the key has been located.

For example, binary_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3) first determines that the key, 3, is in the left half of the list, and prints “Checking the left side”, then determines that it’s in the right half of the new list and prints “Checking the right side”, before returning the value of 2, which is the position of the key in the list.

Add commands to the code, to print out “Checking the left side” or “Checking the right side”, in the appropriate places.

def binary_search(list, key):
    #Returns the position of key in the list if found, -1 otherwise.

    #List must be sorted:
    list.sort()
    left = 0
    right = len(list) - 1

    while left <= right:
        middle = (left + right) // 2

        if list[middle] == key:
            return middle
        if list[middle] > key:
            print("Checking the left side")
            right = middle - 1
        if list[middle] < key:
            print("Checking the right side")
            left = middle + 1
    return -1 

print(binary_search([10, 2, 9, 6, 7, 1, 5, 3, 4, 8], 1))
"""Should print 2 debug lines and the return value:
Checking the left side
Checking the left side
0
"""

print(binary_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
"""Should print no debug lines, as it's located immediately:
4
"""

print(binary_search([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 7))
"""Should print 3 debug lines and the return value:
Checking the right side
Checking the left side
Checking the right side
6
"""

print(binary_search([1, 3, 5, 7, 9, 10, 2, 4, 6, 8], 10))
"""Should print 3 debug lines and the return value:
Checking the right side
Checking the right side
Checking the right side
9
"""

print(binary_search([5, 1, 8, 2, 4, 10, 7, 6, 3, 9], 11))
"""Should print 4 debug lines and the "not found" value of -1:
Checking the right side
Checking the right side
Checking the right side
Checking the right side
-1
"""

Output:

Checking the left side
Checking the left side
0
4
Checking the right side
Checking the left side
Checking the right side
6
Checking the right side
Checking the right side
Checking the right side
9
Checking the right side
Checking the right side
Checking the right side
Checking the right side
-1

Question 4

When trying to find an error in a log file or output to the screen, what command can we use to review, say, the first 10 lines?

wc
tail
head
bisect

The head command will print the first lines of a file, 10 lines by default.
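When a script needs the same thing, a rough Python equivalent of `head` (and of `tail`, for comparison) can be sketched with `itertools.islice` and `collections.deque`. The helper names `head` and `tail` here are our own, not standard library functions.

```python
import collections
import itertools
import os
import tempfile


def head(path, n=10):
    """Return the first n lines of a text file, similar to `head -n N path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


def tail(path, n=10):
    """Return the last n lines, similar to `tail -n N path` (reads the whole file)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in collections.deque(f, maxlen=n)]


# Usage: write a throwaway 25-line log file, then look at both ends of it.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    for i in range(1, 26):
        tmp.write(f"line {i}\n")
first_lines = head(tmp.name)
last_lines = tail(tmp.name, 3)
os.remove(tmp.name)
print(first_lines[0], "...", first_lines[-1])  # line 1 ... line 10
print(last_lines)                              # ['line 23', 'line 24', 'line 25']
```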

Question 5

The best_search function compares linear_search and binary_search functions, to locate a key in the list, and returns how many steps each method took, and which one is the best for that situation. The list does not need to be sorted, as the binary_search function sorts it before proceeding (and uses one step to do so). Here, linear_search and binary_search functions both return the number of steps that it took to either locate the key, or determine that it’s not in the list. If the number of steps is the same for both methods (including the extra step for sorting in binary_search), then the result is a tie. Fill in the blanks to make this work.

def linear_search(list, key):
    #Returns the number of steps to determine if key is in the list 

    #Initialize the counter of steps
    steps=0
    for i, item in enumerate(list):
        steps += 1
        if item == key:
            break
    return steps 

def binary_search(list, key):
    #Returns the number of steps to determine if key is in the list 

    #List must be sorted:
    list.sort()

    #The Sort was 1 step, so initialize the counter of steps to 1
    steps=1

    left = 0
    right = len(list) - 1
    while left <= right:
        steps += 1
        middle = (left + right) // 2

        if list[middle] == key:
            break
        if list[middle] > key:
            right = middle - 1
        if list[middle] < key:
            left = middle + 1
    return steps 

def best_search(list, key):
    steps_linear = linear_search(list, key) 
    steps_binary = binary_search(list, key) 
    results = "Linear: " + str(steps_linear) + " steps, "
    results += "Binary: " + str(steps_binary) + " steps. "
    if (steps_linear < steps_binary):
        results += "Best Search is Linear."
    elif (steps_linear > steps_binary):
        results += "Best Search is Binary."
    else:
        results += "Result is a Tie."

    return results

print(best_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1))
#Should be: Linear: 1 steps, Binary: 4 steps. Best Search is Linear.

print(best_search([10, 2, 9, 1, 7, 5, 3, 4, 6, 8], 1))
#Should be: Linear: 4 steps, Binary: 4 steps. Result is a Tie.

print(best_search([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 7))
#Should be: Linear: 4 steps, Binary: 5 steps. Best Search is Linear.

print(best_search([1, 3, 5, 7, 9, 10, 2, 4, 6, 8], 10))
#Should be: Linear: 6 steps, Binary: 5 steps. Best Search is Binary.

print(best_search([5, 1, 8, 2, 4, 10, 7, 6, 3, 9], 11))
#Should be: Linear: 10 steps, Binary: 5 steps. Best Search is Binary.

Output:

Linear: 1 steps, Binary: 4 steps. Best Search is Linear.
Linear: 4 steps, Binary: 4 steps. Result is a Tie.
Linear: 4 steps, Binary: 5 steps. Best Search is Linear.
Linear: 6 steps, Binary: 5 steps. Best Search is Binary.
Linear: 10 steps, Binary: 5 steps. Best Search is Binary.

Practice Quiz: Introduction to Debugging

Total points: 5
Score: 100%

Question 1

What is part of the final step when problem solving?

Documentation
Long-term remediation
Finding the root cause
Gathering information

Long-term remediation is part of the final step when problem solving.

Question 2

Which tool can you use when debugging to look at library calls made by the software?

top
strace
tcpdump
ltrace

The ltrace tool is used to look at library calls made by the software.

Question 3

What is the first step of problem solving?

Prevention
Gathering information
Long-term remediation
Finding the root cause

Gathering information is the first step taken when problem solving.

Question 4

What software tools are used to analyze network traffic to isolate problems? (Check all that apply)

tcpdump
wireshark
strace
top

The tcpdump tool is a powerful command-line analyzer that captures or “sniffs” TCP/IP packets.

Wireshark is an open source tool for profiling network traffic and analyzing TCP/IP packets.

Question 5

The strace (in Linux) tool allows us to see all of the _ our program has made.

Network traffic
Disk writes
System calls
Connection requests

The strace command shows us all the system calls our program made. System calls are the calls that the programs running in our computer make to the running kernel.

Practice Quiz: Understanding the Problem

Total points: 5
Score: 100%

Question 1

When a user reports that an “application doesn’t work,” what is an appropriate follow-up question to gather more information about the problem?

Is the server plugged in?
Why do you need the application?
Do you have a support ticket number?
What should happen when you open the app?

Asking the user what an expected result should be will help you gather more information to understand and isolate the problem.

Question 2

What is a heisenbug?

The observer effect.
A test environment.
The root cause.
An event viewer.

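A heisenbug is a bug that seems to disappear or change its behavior when we try to observe or debug it. It is named after the observer effect, where just observing a phenomenon alters the phenomenon.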

Question 3

The compare_strings function is supposed to compare just the alphanumeric content of two strings, ignoring upper vs lower case and punctuation. But something is not working. Fill in the code to try to find the problems, then fix the problems.

import re
def compare_strings(string1, string2):
  #Convert both strings to lowercase 
  #and remove leading and trailing blanks
  string1 = string1.lower().strip()
  string2 = string2.lower().strip()

  #Ignore punctuation
  ## punctuation = r"[.?!,;:-']"
  punctuation = r"[.?!,;:'-]"
  string1 = re.sub(punctuation, r"", string1)
  string2 = re.sub(punctuation, r"", string2)

  #DEBUG CODE GOES HERE
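  """
  Changed r"[.?!,;:-']" to r"[.?!,;:'-]" in the punctuation variable.
  In the original pattern, ":-'" is parsed as a character range from ':' to "'",
  which is out of order, so re raises a "bad character range" error. Moving '-'
  to the end of the character class makes it a literal hyphen.
  """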

  return string1 == string2

print(compare_strings("Have a Great Day!", "Have a great day?")) ## True
print(compare_strings("It's raining again.", "its raining, again")) ## True
print(compare_strings("Learn to count: 1, 2, 3.", "Learn to count: one, two, three.")) ## False
print(compare_strings("They found some body.", "They found somebody.")) ## False

Output:

True
True
False
False
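A quick way to confirm the regular expression problem in isolation is to compile both patterns directly. This is a small sketch using only the standard `re` module; the sample sentence is made up.

```python
import re

bad_pattern = r"[.?!,;:-']"   # ":-'" is read as a range from ':' (0x3A) down to "'" (0x27)
good_pattern = r"[.?!,;:'-]"  # '-' as the last character of the class is a literal hyphen

try:
    re.compile(bad_pattern)
except re.error as err:
    print("Bad pattern:", err)

print(re.sub(good_pattern, "", "It's raining, again!"))  # Its raining again
```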

Question 4

How do we verify if a problem is still persisting or not?

Restart the device or server hardware
Attempt to trigger the problem again by following the steps of our reproduction case
Repeatedly ask the user
Check again later

If we can recreate the circumstances of the issue, we can verify whether the problem continues to occur.

Question 5

The datetime module supplies classes for manipulating dates and times, and contains many types, objects, and methods. You’ve seen some of them used in the dow function, which returns the day of the week for a specific date. We’ll use them again in the next_date function, which takes the date_string parameter in the format of “year-month-day”, and uses the add_year function to calculate the next year that this date will occur (it’s 4 years later for the 29th of February during Leap Year, and 1 year later for all other dates). Then it returns the value in the same format as it receives the date: “year-month-day”.

Can you find the error in the code? Is it in the next_date function or the add_year function? How can you determine if the add_year function returns what it’s supposed to? Add debug lines as necessary to find the problems, then fix the code to work as indicated above.

import datetime
from datetime import date

def add_year(date_obj):
  try:
    new_date_obj = date_obj.replace(year = date_obj.year + 1)
  except ValueError:
    ## This gets executed when the above method fails, 
    ## which means that we're making a Leap Year calculation
    new_date_obj = date_obj.replace(year = date_obj.year + 4)
  return new_date_obj ## OK

def next_date(date_string):
  ## Convert the argument from string to date object
  date_obj = datetime.datetime.strptime(date_string, r"%Y-%m-%d")
  next_date_obj = add_year(date_obj)
  print(f'{date_obj} | {next_date_obj}') ## Debug line: compare the input with what add_year returned

  ## Convert the datetime object to string, 
  ## in the format of "yyyy-mm-dd"
  ## next_date_string = next_date_obj.strftime("yyyy-mm-dd")
  next_date_string = next_date_obj.strftime("%Y-%m-%d")
  return next_date_string

today = date.today()  ## Get today's date
print(next_date(str(today))) 
## Should return a year from today, unless today is Leap Day

print(next_date("2021-01-01")) ## Should return 2022-01-01
print(next_date("2020-02-29")) ## Should return 2024-02-29

Output (the first two lines depend on the date the script is run; these were produced on 2020-08-03):

2020-08-03 00:00:00 | 2021-08-03 00:00:00
2021-08-03
2021-01-01 00:00:00 | 2022-01-01 00:00:00
2022-01-01
2020-02-29 00:00:00 | 2024-02-29 00:00:00
2024-02-29
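One caveat: the `+ 4` fallback assumes the next February 29 is always four years away, which is not true across century years that are not leap years, such as 2100. With the `add_year` above, `next_date("2096-02-29")` would raise a `ValueError`, because 2100-02-29 does not exist. A sketch of a more general alternative that tries each following year in turn (the longest gap between two February 29ths in the Gregorian calendar is eight years):

```python
from datetime import date


def add_year(date_obj):
    """Return the next date after date_obj with the same month and day.

    Works for date and datetime objects. Tries up to 8 years ahead, which
    covers the longest Gregorian gap between two February 29ths.
    """
    for years in range(1, 9):
        try:
            return date_obj.replace(year=date_obj.year + years)
        except ValueError:
            # Only February 29 lands on a nonexistent day (or the year leaves
            # the supported range); keep trying later years.
            continue
    raise ValueError(f"no later occurrence of {date_obj} within 8 years")


print(add_year(date(2021, 1, 1)))   # 2022-01-01
print(add_year(date(2020, 2, 29)))  # 2024-02-29
print(add_year(date(2096, 2, 29)))  # 2104-02-29
```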

Introduction to Debugging

Video: What is debugging?

What is the general description of debugging?

Fixing bugs in the code of the application
Fixing problems in the system running the application
Fixing issues related to hardware
Fixing configuration issues in the software

Generally, debugging means fixing bugs in the code of the application.

Video: Problem Solving Steps

What is the second step of problem solving?

Short-term remediation
Long-term remediation
Finding the root cause
Gathering information

Finding the root cause is the second step taken when problem solving.

Video: Silently Crashing Application

Which command can you use to scroll through a lot of text output after tracing system calls of a script?

strace -o fail.strace ./script.py
strace ./script.py | less
strace ./script.py
strace ./script.py -o fail.strace

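Piping the output into less lets you scroll through a lot of text output. Note that strace writes its trace to standard error, so in practice you also need to redirect it (for example, strace ./script.py 2>&1 | less), or write it to a file with -o and then open that file with less.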

Understanding the Problem

Video: “It Doesn’t Work”

When a user reports that a “website doesn’t work,” what is an appropriate follow-up question you can use to gather more information about the problem?

What steps did you perform?
Is the server receiving power?
What server is the website hosted on?
Do you have a support ticket number?

Asking the user what steps they performed will help you gather more information in order to better understand and isolate the problem.

Video: Creating a Reproduction Case

A program fails with an error, “No such file or directory.” You create a directory at the expected file path and the program successfully runs. Describe the reproduction case you’ll submit to the program developer to verify and fix this error.

A report explaining to open the program without the specific directory on the computer
A report with application logs exported from Windows Event Viewer
A report listing the contents of the new directory
A report listing the differences between strace and ltrace logs

This is a specific way to reproduce the error and verify that it exists, so the developer can start working on a fix right away.

Video: Finding the Root Cause

Generally, understanding the root cause is essential for _?

Purchasing new devices
Producing test data
Avoiding interfering with users
Providing the long-term resolution

Understanding the root cause is essential for providing the long-term resolution.

Video: Dealing with Intermittent Issues

What sort of software bug might we be dealing with if power cycling resolves a problem?

Poorly managed resources
A heisenbug
Logs filling up
A file remains open

Power cycling releases resources stored in cache or memory, which gets rid of the problem.

Binary Searching a Problem

Video: What is binary search?

When searching for more than one element in a list, which of the following actions should you perform first in order to search the list as quickly as possible?

Sort the list
Do a binary search
Do a linear search
Use a base three logarithm

A list must be sorted first before it can take advantage of the binary search algorithm.
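In Python, the standard-library `bisect` module implements the binary search step, so the pattern is: sort once, then look up as many keys as needed. The `contains` helper below is our own wrapper, not part of `bisect`.

```python
import bisect


def contains(sorted_items, key):
    """Binary search via the standard-library bisect module (O(log n) per lookup)."""
    i = bisect.bisect_left(sorted_items, key)
    return i < len(sorted_items) and sorted_items[i] == key


names = ["Parker", "Drew", "Cameron", "Logan", "Alex", "Chris", "Terry", "Jamie"]
names.sort()   # pay the O(n log n) sorting cost once...

# ...then every lookup can use binary search on the same sorted list.
print([contains(names, n) for n in ["Alex", "Andrew", "Logan", "Zoe"]])
# [True, False, True, False]
```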

Video: Applying Binary Search in Troubleshooting

When troubleshooting an XML configuration file that’s failed after being updated for an application, what would you bisect in the code?

File format
File quantity
Folder location
Variables

The list of variables in the file can be bisected or tested in halves continuously until a single root cause is found.

2. Slowness

Practice Quiz: Slow Code

Total points: 5
Score: 100%

Question 1

Which of the following is NOT considered an expensive operation?

Parsing a file
Downloading data over the network
Going through a list
Using a dictionary

Looking up an element in a dictionary is much faster than going through a list.
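A small sketch of the difference, using made-up `(username, uid)` records: the list version scans every record on each lookup, while the dictionary is built once and then answers each lookup with a hash lookup. Exact timings depend on the machine, so none are shown.

```python
import timeit

# Hypothetical data: 100,000 (username, uid) records.
records = [(f"user{n}", n) for n in range(100_000)]


def uid_from_list(name):
    for username, uid in records:    # linear scan: O(n) per lookup
        if username == name:
            return uid
    return None


uid_by_name = dict(records)          # built once: O(n)


def uid_from_dict(name):
    return uid_by_name.get(name)     # hash lookup: O(1) on average


print(uid_from_list("user99999"), uid_from_dict("user99999"))  # 99999 99999
print("list:", timeit.timeit(lambda: uid_from_list("user99999"), number=20))
print("dict:", timeit.timeit(lambda: uid_from_dict("user99999"), number=20))
```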

Question 2

Which of the following may be the most expensive to carry out in most automation tasks in a script?

Loops
Lists
Vector
Hash

Loops that run many times and include subtasks that must complete before moving on can be very expensive in most automation tasks.

Question 3

Which of the following statements represents the most sound advice when writing scripts?

Aim for every speed advantage you can get in your code
Use expensive operations often
Start by writing clear code, then speed it up only if necessary
Use loops as often as possible

If we don’t notice any slowdown, then there’s little point trying to speed it up.

Question 4

In Python, what is a data structure that stores multiple pieces of data, in order, which can be changed later?

A hash
Dictionaries
Lists
Tuples

Lists are efficient, and if we are either iterating through the entire list or are accessing elements by their position, lists are the way to go.

Question 5

What command, keyword, module, or tool can be used to measure the amount of time it takes for an operation or program to execute? (Check all that apply)

time
kcachegrind
cProfile
break

We can precede the name of our commands and scripts with the “time” shell builtin and the shell will output execution time statistics when they complete.

kcachegrind is a tool for visualizing profile data. If we can insert some profiling code into the program, it can show us how long the execution of each function takes.

cProfile provides deterministic profiling of Python programs, including how often and for how long various parts of the program executed.
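A minimal sketch of profiling a piece of code with `cProfile` and summarizing the results with `pstats`. The functions `slow_sum` and `main` are made-up examples of code worth profiling.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow: adds numbers one at a time in a Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def main():
    return [slow_sum(100_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries,
# including how many times each function was called (ncalls).
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

For a whole script, the same report is available from the command line with `python -m cProfile -s cumulative script.py`.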

Practice Quiz: Understanding Slowness

Total points: 5
Score: 100%

Question 1

Which of the following will an application spend the longest time retrieving data from?

CPU L2 cache
RAM
Disk
The network

An application will take the longest time trying to retrieve data from the network.

Question 2

Which tool can you use to verify reports of ‘slowness’ for web pages served by a web server you manage?

The top tool
The ab tool
The nice tool
The pidof tool

The ab tool (Apache Benchmark) measures how fast a web server responds by sending many requests and reporting their average timing.

Question 3

If our computer running Microsoft Windows is running slow, what performance monitoring tools can we use to analyze our system resource usage to identify the bottleneck? (Check all that apply)

Performance Monitor
Resource Monitor
Activity Monitor
top

Performance Monitor is a system monitoring program that provides basic CPU and memory resource measurements in Windows.

Resource Monitor is an advanced resource monitoring utility that provides data on hardware and software resources in real time.

Question 4

Which of the following programs is likely to run faster and more efficiently, with the least slowdown?

A program with a cache stored on a hard drive
A program small enough to fit in RAM
A program that reads files from an optical disc
A program that retrieves most of its data from the Internet

Since RAM access is faster than accessing a disk or network, a program that can fit in RAM will run faster.

Question 5

What might cause a single application to slow down an entire system? (Check all that apply)

A memory leak
The application relies on a slow network connection
Handling files that have grown too large
Hardware faults

Memory leaks happen when an application doesn’t release memory when it is supposed to.

If files generated by the application have grown overly large, slowdown will occur if the application needs to store a copy of the file in RAM in order to use it.

Practice Quiz: When Slowness Problems Get Complex

Total points: 5
Score: 100%

Question 1

Which of the following can cache database queries in memory for faster processing of automated tasks?

Threading
Varnish
Memcached
SQLite

Memcached is a caching service that keeps the most commonly accessed database queries in RAM.

Question 2

What module specifies parts of a code to run in separate asynchronous events?

Threading
Futures
Asyncio
Concurrent

Asyncio is a module that lets you specify parts of the code to run as separate asynchronous events.
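A small sketch of `asyncio` running I/O-bound work concurrently. `fetch` is a hypothetical stand-in for a network or disk operation; `asyncio.sleep` yields control to the event loop just as real asynchronous I/O would.

```python
import asyncio
import time


async def fetch(name, delay):
    # Stand-in for an I/O-bound task; while it waits, other tasks can run.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Run the three "requests" concurrently instead of one after another.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # typically about 0.2 s rather than 0.6 s
```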

Question 3

Which of the following allows our program to run multiple instructions in parallel?

Threading
Swap space
Memory addressing
Dual SSD

Threading allows a process to split itself into parallel tasks.

Question 4

What is the name of the field of study in computer science that concerns itself with writing programs and operations that run in parallel efficiently?

Memory management
Concurrency
Threading
Performance analysis

Concurrency in computer science is the ability of different sections or units of a program, algorithm, or problem to be executed out of order or in partial order, without impacting the final result.

Question 5

What would we call a program that often leaves our CPU with little to do as it waits on data from a local disk and the Internet?

Memory-bound
CPU-bound
User-bound
I/O bound

If our program mainly finds itself waiting on local disks or the network, it is I/O bound.

Understanding Slowness

Video: Why is my computer slow?

When addressing slowness, what do you need to identify?

The bottleneck
The device
The script
The system

The bottleneck could be the CPU time, or time spent reading data from disk.

Video: How Computers Use Resources

After retrieving data from the network, how can an application access that same data quicker next time?

Use the swap
Create a cache
Use memory leak
Store in RAM

A cache stores data in a form that’s faster to access than its original form.
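In Python, `functools.lru_cache` is one simple way to keep results in memory so that repeated requests for the same data skip the slow step. `fetch_report` is a hypothetical network call simulated with `time.sleep`.

```python
import functools
import time


@functools.lru_cache(maxsize=128)
def fetch_report(host):
    """Hypothetical slow network call; time.sleep stands in for network latency."""
    time.sleep(0.2)
    return f"report for {host}"


start = time.perf_counter()
fetch_report("web01")               # first call: slow, result gets cached
first = time.perf_counter() - start

start = time.perf_counter()
fetch_report("web01")               # second call: answered from memory
second = time.perf_counter() - start

print(fetch_report.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
print(f"first: {first:.3f}s, second: {second:.6f}s")
```

This cache lives only as long as the process; a cache written to disk would survive restarts at the cost of slower access than RAM.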

Video: Possible Causes of Slowness

A computer becomes sluggish after a few days, and the problem goes away after a reboot. Which of the following is the possible cause?

Files are growing too large.
A program is keeping some state while running.
Files are being read from the network.
Hard drive failure.

A program that keeps accumulating state while it runs (for example, data it never releases) can slow down the computer until it is rebooted.

Slow Code

Video: Writing Efficient Code

What is the cProfile module used for?

For parsing files.
To analyze a C program.
To count function calls
To remove unnecessary functions.

The cProfile module is used to count how many times functions are called, and how long they run.

Video: Using the Right Data Structures

Which of the following has values associated with keys in Python?

A hash
A dictionary
A HashMap
An Unordered Map

Python uses a dictionary to store values, each associated with a specific key.

Video: Expensive Loops

Your Python script searches a directory, and runs other tasks in a single loop function for 100s of computers on the network. Which action will make the script the least expensive?

Read the directory once
Loop the total number of computers
Service only half of the computers
Use more memory

Reading the directory once before the loop will make the script less expensive to run.
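A sketch of the difference, assuming a hypothetical layout where each host has a `<host>.cfg` file in one directory. The slow version lists the directory on every iteration; the fast version reads it once before the loop (and stores it in a set for quick membership checks).

```python
import os
import tempfile


def process_hosts_slow(hosts, directory):
    results = {}
    for host in hosts:
        files = os.listdir(directory)       # re-reads the directory on every iteration
        results[host] = f"{host}.cfg" in files
    return results


def process_hosts_fast(hosts, directory):
    files = set(os.listdir(directory))      # read once, before the loop
    return {host: f"{host}.cfg" in files for host in hosts}


# Usage with a throwaway directory and made-up host names.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("web01.cfg", "db01.cfg"):
        open(os.path.join(tmpdir, name), "w").close()
    hosts = ["web01", "web02", "db01"]
    slow = process_hosts_slow(hosts, tmpdir)
    fast = process_hosts_fast(hosts, tmpdir)

print(fast)  # {'web01': True, 'web02': False, 'db01': True}
```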

Video: Keeping Local Results

Your script calculates the average number of active user sessions during business hours in a seven-day period. How often should a local cache be created to give a good enough average without updating too often?

Once a week
Once a day
Once a month
Once every 8 hours

A local cache for every day can be accessed quickly, and processed for a seven-day average calculation.

Video: Slow Script with Expensive Loop

You use the time command to determine how long a script runs to complete its various tasks. Which output value will show the time spent doing operations in the user space?

Real
Wall-clock
Sys
User

The user value is the time spent doing operations in the user space.
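Inside a Python script, `os.times()` reports the user and system CPU time used by the current process, which roughly mirrors the user and sys values of the shell's time command, while `time.perf_counter()` measures elapsed wall-clock (real) time. A small sketch showing that waiting adds real time but not user time:

```python
import os
import time

start_real = time.perf_counter()
start_cpu = os.times()

total = sum(i * i for i in range(2_000_000))   # pure computation: mostly user time
time.sleep(0.2)                                 # idle waiting: adds real time only

end_cpu = os.times()
real = time.perf_counter() - start_real
user = end_cpu.user - start_cpu.user
sys_time = end_cpu.system - start_cpu.system

print(f"real {real:.2f}s  user {user:.2f}s  sys {sys_time:.2f}s")
```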

When Slowness Problems Get Complex

Video: Parallelizing Operations

A script is _ if you are running operations in parallel using all available CPU time.

I/O bound
Threading
CPU bound
Asyncio

A script is CPU bound if you’re running operations in parallel using all available CPU time.

Video: Slowly Growing in Complexity

You’re creating a simple script that runs a query on a list of product names of a very small business, and initiates automated tasks based on those queries. Which of the following would you use to store product names?

SQLite
Microsoft SQL Server
Memcached
CSV file

A simple CSV file is enough to store a list of product names.
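Reading product names from such a file needs only the standard `csv` module. A sketch with made-up contents; in practice you would `open()` the real file instead of using `io.StringIO`.

```python
import csv
import io

# Hypothetical products.csv contents.
PRODUCTS_CSV = """name,sku
Coffee mug,MUG-01
Tea towel,TWL-02
"""


def load_product_names(fileobj):
    """Return the 'name' column of a CSV file that has a header row."""
    return [row["name"] for row in csv.DictReader(fileobj)]


names = load_product_names(io.StringIO(PRODUCTS_CSV))
print(names)  # ['Coffee mug', 'Tea towel']
```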

Video: Dealing with Complex Slow Systems

A company has a single web server hosting a website that also interacts with an external database server. The web server is processing requests very slowly. Checking the web server, you found the disk I/O has high latency. Where is the cause of the slow website requests most likely originating from?

Local disk
Remote database
Slow Internet
Database index

The local disk I/O latency is causing the application to wait too long for data from disk.

Video: Using Threads to Make Things Go Faster

Which module makes it possible to run operations in a script in parallel that makes better use of CPU processing time?

Executor
Futures
Varnish
Concurrency

The futures module (concurrent.futures in Python) makes it possible to run operations in parallel using different executors.
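A minimal sketch with `concurrent.futures.ThreadPoolExecutor`, where `check_host` is a hypothetical I/O-bound task simulated with `time.sleep`. For CPU-bound work, swapping in `ProcessPoolExecutor` runs tasks in separate processes, which can use several CPU cores at once.

```python
import concurrent.futures
import time


def check_host(host):
    """Hypothetical I/O-bound check; time.sleep stands in for a network round trip."""
    time.sleep(0.2)
    return f"{host}: ok"


hosts = [f"host{n}" for n in range(5)]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() runs check_host on each host in the pool and keeps the input order.
    results = list(executor.map(check_host, hosts))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")  # usually close to 0.2 s instead of 1.0 s
```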

3. Crashing Program

Practice Quiz: Code that Crashes

Total points: 5
Score: 100%

Question 1

Which of the following will let code run until a certain line of code is executed?

Breakpoints
Watchpoints
Backtrace
Pointers

Breakpoints let code run until a certain line of code is executed.

Question 2

Which of the following is NOT likely to cause a segmentation fault?

Wild pointers
Reading past the end of an array
Stack overflow
RAM replacement

A segmentation fault is not commonly caused by installing new RAM in the system.

Question 3

A common error worth keeping in mind happens often when iterating through arrays or other collections, and is often fixed by changing the less than or equal sign in our for loop to be a strictly less than sign. What is this common error known as?

Segmentation fault
backtrace
The No such file or directory error
Off-by-one error

The off-by-one bug, often abbreviated as OB1, frequently happens in computer programming when an iterative process iterates one time too many or one time too few.
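A small illustration of the classic case described in the question: a loop that uses "less than or equal" visits one index past the end, and changing it to "strictly less than" fixes it.

```python
def sum_items_buggy(items):
    total = 0
    i = 0
    while i <= len(items):   # off-by-one: "<=" also visits index len(items)
        total += items[i]
        i += 1
    return total


def sum_items_fixed(items):
    total = 0
    i = 0
    while i < len(items):    # strictly less than: stops at the last valid index
        total += items[i]
        i += 1
    return total


try:
    sum_items_buggy([1, 2, 3])
except IndexError as err:
    print("Buggy version:", err)                     # list index out of range
print("Fixed version:", sum_items_fixed([1, 2, 3]))  # 6
```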

Question 4

A very common method of debugging is to add print statements to our code that display information, such as contents of variables, custom error statements, or return values of functions. What is this type of debugging called?

Backtracking
Log review
Printf debugging
Assertion debugging

Printf debugging takes its name from the printf() function in C (also available in C++), which was commonly used to display debug information, and the name stuck. This type of debugging is useful in all languages.
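In Python, the equivalent is adding `print()` calls that show inputs and return values. The sketch below uses a hypothetical `DEBUG` switch so the extra output can be turned off later, and the `{name=}` f-string form (Python 3.8+), which prints both the variable name and its value. The standard `logging` module is a more structured alternative for the same purpose.

```python
DEBUG = True  # hypothetical switch so the debug output can be turned off later


def average(values):
    if DEBUG:
        print(f"average() called with {values=}")   # average() called with values=[3, 4, 5]
    result = sum(values) / len(values)
    if DEBUG:
        print(f"average() returning {result=}")     # average() returning result=4.0
    return result


average([3, 4, 5])
```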

Question 5

When a process crashes, the operating system may generate a file containing information about the state of the process in memory to help the developer debug the program later. What are these files called?

Log files
Core files
Metadata file
Cache file

Core files (or core dump files) record an image and status of a running process, and can be used to determine the cause of a crash.

Practice Quiz: Handling Bigger Incidents

Total points: 5
Score: 100%

Question 1

Which of the following would be effective in resolving a large issue if it happens again in the future?

Incident controller
Postmortem
Rollbacks
Load balancers

A postmortem is a detailed document of an issue which includes the root cause and remediation. It is effective on large, complex issues.

Question 2

During peak hours, users have reported issues connecting to a website. The website is hosted on two load-balanced servers in the cloud that are connected to an external SQL database. Logs on both servers show an increase in CPU and RAM usage. What may be the most effective way to resolve this issue with a complex set of servers?

Use threading in the program
Cache data in memory
Automate deployment of additional servers
Optimize the database

Automatically deploying additional servers to handle the load of requests during peak hours can resolve issues with a complex set of servers.

Question 3

It has become increasingly common to use cloud services and virtualization. Which kind of fix, in particular, does virtual cloud deployment speed up and simplify?

Deployment of new servers
Application code fixes
Log reviewing
Postmortems

Virtualization makes deployment of VM servers in the cloud a fast and relatively simple process.

Question 4

What should we include in our postmortem? (Check all that apply)

Root cause of the issue
How we diagnosed the problem
How we fixed the problem
Who caused the problem

In order to learn about the problem and how it happens in general, we should include what caused it this time.

By explaining how we diagnosed the problem, we make it easier to identify in the future.

In order to share with reviewers how the issue was resolved, it’s important to include what we did to solve it this time.

Question 5

In general, what is the goal of a postmortem? (Check all that apply)

To identify who is at fault
To allow prevention in the future
To allow speedy remediation of similar issues in the future
To analyze all system bugs

By describing the cause of the problem, we can learn to avoid the same circumstances in the future.

By describing in detail how we fixed the problem, we can help others or ourselves fix the same problem more quickly in the future.

Practice Quiz: Why Programs Crash

Total points: 5
Score: 100%

Question 1

When using Event Viewer on a Windows system, what is the best way to quickly access specific types of logs?

Export logs
Create a custom view
Click on System Reports
Run the head command

The Create Custom View action is used to filter through logs based on certain criteria.

Question 2

An employee runs an application on a shared office computer, and it crashes. This does not happen to other users on the same computer. After reviewing the application logs, you find that the employee didn’t have access to the application. What log error helped you reach this conclusion?

“No such file or directory”
“Connection refused”
“Permission denied”
“Application terminated”

In this case, the “Permission denied” error means that the user didn’t have access to the application executable in order to run it.

Question 3

What tool can we use to check the health of our RAM?

Event Viewer
S.M.A.R.T. tools
memtest86
Process Monitor

memtest86 and memtest86+ are memory-testing programs that stress-test the RAM of an x86 system for errors: they write test patterns to most memory addresses, then read the data back and check that it matches.

Question 4

You’ve just finished helping a user work around an issue in an application. What important but easy-to-forget step should we remember to do next?

Fix the code
Report the bug to the developers
Reinstall the program
Change the user’s password

If there is a repeatable error present in a program, it is proper etiquette to report the bug in detail to the developer.

Question 5

A user is experiencing strange behavior from their computer. It is running slow, lagging, and having momentary freeze-ups that it does not usually have. The problem seems to be system-wide and not restricted to a particular application. What is the first thing you should ask whether the user has tried?

Adding more RAM
Reinstalling Windows
Identifying the bottleneck with a resource monitor
Upgrading their HDD to an SSD

The first step is identifying the root cause of the problem. Resource monitors such as Activity Monitor (macOS), top (Linux and macOS), or Resource Monitor (Windows) can help us identify whether the bottleneck is CPU-based or memory-based.

Why Programs Crash

Video: Systems That Crash

A user reported that an application crashes on their computer. You log in, try to run the program, and it crashes again. Which of the following steps would you perform next to reduce the scope of the problem?

Check the health of the RAM
Switch the hard drive into another computer
Check the health of the hard drive
Review application logs

Reviewing the application logs is the best next step, since they may reveal the reason for the crash.

Video: Understanding Crashing Applications

Where should you look for application logs on a Windows system?

The /var/log directory
The .xsession-errors file
The Console app
The Event Viewer app

The Event Viewer app contains logs on a Windows system.

Video: What to Do When You Can’t Fix the Program?

An application fails at random intervals after it was installed on a different operating system version. What can you do to work around the issue?

Use a wrapper
Use a container
Use a watchdog
Use an XML format

A container allows the application to run in its own environment without interfering with the rest of the system.

Video: Internal Server Error

What is a common location for the configuration files of a web application running on a Linux server?

/etc/
/var/log/
/srv/
/

The /etc directory typically contains a folder for the application where its configuration files are stored.

Code that Crashes

Video: Accessing Invalid Memory

Which of the following can assist in finding out if invalid operations are occurring in a program running on a Windows system?

Valgrind
Dr. Memory
PDB files
Segfaults

Dr. Memory can assist in finding out if invalid operations are occurring in a program running on Windows or Linux.

Video: Unhandled Errors and Exceptions

What can you use to notify users when an error occurs, the reason why it occurred, and how to resolve it?

The pdb module
The logging module
Use printf debugging
The echo command

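The logging module lets a program record messages at different severity levels (such as debug, warning, and error), so it can report when an error occurs, why it occurred, and how to resolve it.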
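A minimal sketch of using the logging module to tell the user what failed, why, and how to fix it. The file name settings.json and the function load_settings are hypothetical, chosen only for illustration.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("myapp")

def load_settings(path):
    """Load a JSON settings file, logging a helpful message on failure."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # What failed, why it failed, and how to resolve it.
        logger.error("Could not read %s: the file does not exist. "
                     "Create it or pass the correct path.", path)
    except json.JSONDecodeError as err:
        logger.error("Could not parse %s: %s. Check the file for typos.",
                     path, err)
    return None

settings = load_settings("settings.json")
```

Using logger.error rather than print means the same messages can later be sent to a file or a central log server just by changing the logging configuration.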

Video: Fixing Someone Else’s Code

After getting acquainted with the program’s code, where might you start to fix a problem?

Run through tests
Read the comments
Locate the affected function
Create new tests

Start working on the function that produced the error, and the function(s) that called it.

Video: Debugging a Segmentation Fault

When debugging code, what command can you use to figure out how your program reached the failed state?

gdb
backtrace
ulimit
list

The backtrace command shows a summary of the function calls that led to the point where the failure occurred.
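gdb's backtrace applies to compiled programs. As a rough Python counterpart, the standard traceback module prints the chain of calls that led to an exception. A minimal sketch; the functions parse_record, process, and run are hypothetical.

```python
import traceback

def parse_record(record):
    # Raises KeyError when the "id" field is missing.
    return record["id"]

def process(records):
    return [parse_record(r) for r in records]

def run():
    try:
        process([{"id": 1}, {"name": "no id here"}])
    except KeyError:
        # Like a backtrace, this lists each call from run() down to the
        # line in parse_record() that raised the error.
        return traceback.format_exc()
    return ""

print(run())
```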

Video: Debugging a Python Crash

When debugging in Python, what command can you use to run the program until it crashes with an error?

pdb3
next
continue
KeyError

Running the continue command after starting the pdb3 debugger will execute the program until it finishes or crashes.

Handling Bigger Incidents

Video: Crashes in Complex Systems

A website is producing service errors when loading certain pages. Looking at the logs, one of three web servers isn’t responding correctly to requests. What can you do to restore services, while troubleshooting further?

Deploy a new web server
Roll back application changes
Remove the server from the pool
Create standby servers

Removing the server from the pool lets the remaining web servers provide full service to users while you troubleshoot the faulty one.
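Taking a failing server out of the pool can be sketched as a round-robin balancer that only hands requests to servers still in rotation. A minimal sketch, assuming hypothetical server names and a manual remove call; real load balancers usually drive this with automated health checks.

```python
class ServerPool:
    """Round-robin pool that skips servers removed from rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def remove(self, server):
        # Take a misbehaving server out of rotation while it's investigated.
        self.healthy.discard(server)

    def add(self, server):
        # Put the server back once it has been fixed.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next server in round-robin order that is still healthy."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in the pool")

pool = ServerPool(["web1", "web2", "web3"])
pool.remove("web2")
print([pool.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```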

Video: Communication and Documenting During Incidents

Which of the following roles is responsible for communicating with customers who are affected by an access issue with a website?

Communications lead
Manager
Incident controller
Software engineer

The communications lead provides timely updates on the incident and answers questions from users.

Video: Writing Effective Postmortems

When writing an effective postmortem of an incident, what should you NOT include?

What caused the issue
Who caused the issue
What the impact was
The short-term remediation

A postmortem of an incident should not include the person(s) who caused the issue.


4. Managing Resources

Practice Quiz: Making Our Future Lives Easier

Total points: 5
Score: 100%

Question 1

Which proactive practice can you implement to make troubleshooting easier when an issue in a program happens again, or when you face similar issues?

Create and update documentation
Use a test environment
Automate rollbacks
Set up unit tests

Documentation that includes good instructions on how to resolve an issue can help resolve the same or a similar issue in the future.

Question 2

Which of the following is a good example of mixing and matching resources on a single server so that the running services make the best possible use of all resources?

Run two applications that are CPU intensive between two servers.
Run a CPU intensive application on one server, and an I/O intensive application on another server.
Run a RAM intensive application and a CPU intensive application on a server.
Run two applications that are RAM and I/O intensive on a server.

An application that uses a lot of RAM can still run while CPU is mostly used by another application on the same server.

Question 3

One strategy for debugging involves explaining the problem to yourself out loud. What is this technique known as?

Monitoring
Rubber Ducking
Testing
Ticketing

Rubber ducking is the process of explaining a problem to a “rubber duck”, or rather yourself, to better understand the problem.

Question 4

When deploying software, what is a canary?

A test for how components of a program interact with each other
A test of a program’s components
A test deployment to a subset of production hosts
A small section of code

Reminiscent of the old term “canary in a coal mine”, a canary is a test deployment of our software to a small subset of production hosts, to see what happens before rolling it out everywhere.
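Choosing which production hosts receive a canary deployment can be sketched as picking a small, fixed fraction of them. A minimal sketch, assuming hypothetical host names and a 10% canary fraction; hashing each host name keeps the choice stable between runs, so the same hosts stay in the canary group.

```python
import hashlib

def is_canary(host: str, fraction: float = 0.10) -> bool:
    """Deterministically decide whether a host is in the canary group."""
    digest = hashlib.sha256(host.encode()).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < fraction

def canary_hosts(hosts, fraction=0.10):
    """Return the subset of hosts that should get the new version first."""
    return [h for h in hosts if is_canary(h, fraction)]

hosts = [f"web{i:03d}" for i in range(200)]
canaries = canary_hosts(hosts)
print(f"{len(canaries)} of {len(hosts)} hosts get the canary release")
```

If the canary hosts stay healthy, the fraction can be raised step by step until the release reaches every host.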

Question 5

It is advisable to collect monitoring information into a central location. Given the importance of the server handling the centralized collecting, when assessing risks from outages, this server could be described as what?

A failure domain
A problem domain
CPU intensive
I/O intensive

A failure domain is a logical or physical component of a system that might fail.

Practice Quiz: Managing Computer Resources

Total points: 5
Score: 100%

Question 1

How can you profile an entire Python application?

Use an @profile label
Use the guppy module
Use Memory Profiler
Use a decorator

Guppy is a Python library with tools to profile an entire Python application.
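Guppy is a third-party package. As a comparable sketch using only the standard library, tracemalloc can snapshot memory allocations across the whole running program and list the source lines that allocated the most. This swaps in tracemalloc in place of guppy, and build_workload is a made-up stand-in for a real application.

```python
import tracemalloc

def build_workload():
    # Stand-in for real application work: allocate some strings.
    return [str(i) * 10 for i in range(10_000)]

tracemalloc.start()
data = build_workload()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and show the three largest.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```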

Question 2

Your application is having difficulty sending and receiving large packets of data, which are also delaying other processes when connected to remote computers. Which of the following will be most effective in improving network traffic for the application?

Running the iftop program
Increase storage capacity
Increase memory capacity
Use traffic shaping

Traffic shaping can mark data packets and assign them higher or lower priorities when they are sent over the network, so that large transfers don't delay other traffic.
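The prioritizing half of traffic shaping can be illustrated with a toy scheduler in which each queued packet carries a priority mark and higher-priority packets are sent first. This is a simplified model with hypothetical priority values; real traffic shaping is configured in the operating system or network equipment (for example with Linux tc), not in application code.

```python
import heapq
import itertools

class PacketScheduler:
    """Toy model: send marked packets in priority order (0 = highest)."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def enqueue(self, payload, priority):
        heapq.heappush(self._queue, (priority, next(self._counter), payload))

    def send_all(self):
        """Drain the queue, returning payloads in the order they'd be sent."""
        sent = []
        while self._queue:
            _, _, payload = heapq.heappop(self._queue)
            sent.append(payload)
        return sent

scheduler = PacketScheduler()
scheduler.enqueue("bulk-upload-1", priority=2)
scheduler.enqueue("ssh-keystroke", priority=0)
scheduler.enqueue("bulk-upload-2", priority=2)
scheduler.enqueue("web-request", priority=1)
print(scheduler.send_all())
# ['ssh-keystroke', 'web-request', 'bulk-upload-1', 'bulk-upload-2']
```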

Question 3

What is the term referring to the amount of time it takes for a request to reach its destination, usually measured in milliseconds (ms)?

Bandwidth
Latency
Number of connections
Traffic shaping

Latency is a measure of the time it takes for a request to reach its destination.
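Latency can be estimated by timing how long it takes to open a TCP connection, which needs one round trip to the destination. A minimal sketch; measure_connect_latency_ms is a hypothetical helper, and a throwaway listener on localhost stands in for a remote server, so the numbers only reflect local overhead.

```python
import socket
import time

def measure_connect_latency_ms(host, port, attempts=5):
    """Return the average time in ms to open a TCP connection to host:port."""
    total = 0.0
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass
        total += time.perf_counter() - start
    return total / attempts * 1000

# Throwaway listener on localhost standing in for a remote server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen()
port = server.getsockname()[1]

print(f"average latency: {measure_connect_latency_ms('127.0.0.1', port):.3f} ms")
server.close()
```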

Question 4

If your computer is slowing down, what Linux program might we use to determine if we have a memory leak and what process might be causing it?

top
gparted
iftop
cron

The top command will show us all running processes and their memory usage in Linux.

Question 5

Some programs open a temporary file and immediately _ it. The file then continues to grow until the process finishes, which can cause a slowdown.

open
close
delete
write to

Sometimes a file is marked as deleted right after it is opened, so the program doesn’t “forget” to delete it later. The program keeps writing to the file, but we can’t see it in the directory because it is already marked as deleted; it is not actually deleted until the process finishes.
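This behavior can be reproduced on Linux or macOS: after the file is unlinked, its name disappears from the directory, yet writes through the still-open handle keep using disk space until the handle is closed. A minimal sketch using a temporary file; it relies on POSIX semantics, and Windows generally refuses to delete a file that is still open.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    os.unlink(path)  # the name is removed from the directory right away...
    print("still listed:", os.path.exists(path))  # False
    for _ in range(100):
        f.write(b"x" * 1024)  # ...but writes keep growing the file
    f.flush()
    size = os.fstat(f.fileno()).st_size
    print("size of deleted file:", size)  # 102400 bytes still on disk
# Closing the handle (end of the with block) finally frees the space.
```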

Practice Quiz: Managing Our Time

Total points: 5
Score: 100%

Question 1

Using the Eisenhower Decision Matrix, which of the following is an example of an event or task that is both Important and Urgent?

Office gossip
Replying to emails
Internet connection is down
Follow-up to a recently resolved issue

Users need an Internet connection to work, which makes it important, and it must be resolved right away, which makes it urgent.

Question 2

You’re working on a web server issue that’s preventing all users from accessing the site. You then receive a call from a user asking you to reset their account password. What is the appropriate action to take when prioritizing your tasks?

Reset the user’s password
Create a script to automate password resets
Ask the user to open a support ticket.
Ignore the user, and troubleshoot web server.

Ask the user to open a support ticket so that the request can be placed into the queue while you work on the most urgent issue at hand.

Question 3

What is it called when we make more work for ourselves later by taking shortcuts now?

Technical debt
Ticket tracking
Eisenhower Decision Matrix
Automation

Technical debt is defined as the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better, but more difficult, solution.

Question 4

What is the first step of prioritizing our time properly?

Work on urgent tasks first
Assess the importance of each issue
Make a list of all tasks
Estimate the time each task will take

Before we can even decide which task to do first, we need to make a list of our tasks.

Question 5

If an issue isn’t solved within the time estimate that you provided, what should you do? (Select all that apply)

Explain why
Drop everything and perform that task immediately
Give an updated time estimate
Put the task at the end of the list

Communication is key, and it’s best to keep everyone informed.

If your original estimate turned out to be overly optimistic, it’s appropriate to re-estimate.

Managing Computer Resources

Video: Memory Leaks and How to Prevent Them

Which of the following descriptions most likely points to a possible memory leak?

Application process uses more memory even after a restart.
Garbage collector carries out its task.
The function returns after it completes.
Valgrind figures out memory usage.

An app that still needs a lot of memory, even after a restart, most likely points to a memory leak.

Video: Managing Disk Space

Which of the following is an example of unnecessary files on a server storage device that can prevent applications from running if not cleaned up properly?

A SQL database
A mailbox database
A set of application files
A set of large temporary files

Large temporary files may be left behind if an application crashes, because the application never gets the chance to clean them up.
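Leftover temporary files can be found by scanning a directory for large files that haven't been modified recently. A minimal sketch; find_stale_large_files is a hypothetical helper, and the 100 MB and 7-day thresholds are arbitrary examples. It only reports candidates, since deleting a file safely depends on knowing that no running process still uses it.

```python
import os
import tempfile
import time

def find_stale_large_files(directory, min_bytes=100 * 1024**2, min_age_days=7):
    """Yield (path, size) for big files under directory not modified recently."""
    cutoff = time.time() - min_age_days * 86400
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_size >= min_bytes and info.st_mtime < cutoff:
                yield path, info.st_size

if __name__ == "__main__":
    for path, size in find_stale_large_files(tempfile.gettempdir()):
        print(f"{size / 1024**2:.1f} MB  {path}")
```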

Video: Network Saturation

The custom application running on a server can’t receive new connections. Existing connections are sending and receiving data in a reasonable time. Which of the following explains why new sessions can’t be established with the server?

Too many connections
High network latency
Low network bandwidth
No traffic shaping

There are limits to how many connections a single server can have, which will prevent new connections.
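A connection limit can be sketched with a non-blocking semaphore: once every slot is taken, new connections are refused while existing ones keep working. A minimal sketch; the ConnectionLimiter class and the limit of 2 are hypothetical, and real servers hit such limits through things like a configured maximum number of clients or the per-process file-descriptor limit.

```python
import threading

class ConnectionLimiter:
    """Admit at most max_connections concurrent sessions; refuse the rest."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_accept(self):
        # Non-blocking: returns False immediately when all slots are taken.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Called when a session ends, freeing a slot for a new client.
        self._slots.release()

limiter = ConnectionLimiter(max_connections=2)
print(limiter.try_accept())  # True  (first session)
print(limiter.try_accept())  # True  (second session)
print(limiter.try_accept())  # False (new session refused)
limiter.release()            # one existing session closes
print(limiter.try_accept())  # True  (a slot is free again)
```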
