Practice Quiz: Binary Searching a Problem
- Total points: 5
- Score: 100%
Question 1
You have a list of computers that a script connects to in order to gather SNMP traffic and calculate an average for a set of metrics. The script is now failing, and you do not know which remote computer is the problem. How would you troubleshoot this issue using the bisecting methodology?
- Run the script with the first half of the computers.
- Run the script with the last computer on the list.
- Run the script with the first computer on the list.
- Run the script with two-thirds of the computers.
Bisecting when troubleshooting starts with splitting the list of computers and choosing to run the script with one half.
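The bisecting approach can be sketched in Python. This is only an illustration, not the script from the quiz: script_fails is a hypothetical callback standing in for "run the script against this subset of computers and report whether it fails", and the sketch assumes exactly one computer is causing the failure.

```python
def find_bad_computer(computers, script_fails):
    """Return the single computer that makes the script fail.

    Assumes exactly one bad computer. script_fails(subset) is a
    hypothetical callback returning True when the script fails
    for that subset.
    """
    candidates = list(computers)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        # If the first half fails, the culprit is in it; otherwise it
        # must be in the second half.
        if script_fails(first_half):
            candidates = first_half
        else:
            candidates = candidates[middle:]
    return candidates[0]


# Simulated check: pretend "host7" is the broken machine.
hosts = ["host" + str(n) for n in range(1, 11)]
print(find_bad_computer(hosts, lambda subset: "host7" in subset))
```

Each pass halves the number of candidates, so ten computers need at most four runs of the script instead of up to ten.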
Question 2
The find_item function uses binary search to recursively locate an item in the list, returning True if found, False otherwise. Something is missing from this function. Can you spot what it is and fix it? Add debug lines where appropriate, to help narrow down the problem.
def find_item(list, item):
    #Returns True if the item is in the list, False if not.
    if len(list) == 0:
        return False ## OK
    #Binary search only works on a sorted list; this was the missing step
    list.sort()
    #Is the item in the center of the list?
    middle = len(list)//2 ## OK
    if list[middle] == item:
        return True ## OK
    #Is the item in the first half of the list?
    #(using "item in list[:middle]" here would pass the checks, but it scans linearly)
    if item < list[middle]:
        #Call the function with the first half of the list
        return find_item(list[:middle], item) ## OK
    else:
        #Call the function with the second half of the list
        return find_item(list[middle+1:], item) ## OK
    return False
#Do not edit below this line - This code helps check your work!
list_of_names = ["Parker", "Drew", "Cameron", "Logan", "Alex", "Chris", "Terry", "Jamie", "Jordan", "Taylor"]
print(find_item(list_of_names, "Alex")) ## True
print(find_item(list_of_names, "Andrew")) ## False
print(find_item(list_of_names, "Drew")) ## True
print(find_item(list_of_names, "Jared")) ## False
Output:
True
False
True
False
Question 3
The binary_search function returns the position of key in the list if found, or -1 if not found. We want to make sure that it’s working correctly, so we need to place debugging lines to let us know each time that the list is cut in half, whether we’re on the left or the right. Nothing needs to be printed when the key has been located.
For example, binary_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3) first determines that the key, 3, is in the left half of the list, and prints “Checking the left side”, then determines that it’s in the right half of the new list and prints “Checking the right side”, before returning the value of 2, which is the position of the key in the list.
Add commands to the code, to print out “Checking the left side” or “Checking the right side”, in the appropriate places.
def binary_search(list, key):
    #Returns the position of key in the list if found, -1 otherwise.
    #List must be sorted:
    list.sort()
    left = 0
    right = len(list) - 1
    while left <= right:
        middle = (left + right) // 2
        if list[middle] == key:
            return middle
        if list[middle] > key:
            print("Checking the left side")
            right = middle - 1
        if list[middle] < key:
            print("Checking the right side")
            left = middle + 1
    return -1
print(binary_search([10, 2, 9, 6, 7, 1, 5, 3, 4, 8], 1))
"""Should print 2 debug lines and the return value:
Checking the left side
Checking the left side
0
"""
print(binary_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))
"""Should print no debug lines, as it's located immediately:
4
"""
print(binary_search([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 7))
"""Should print 3 debug lines and the return value:
Checking the right side
Checking the left side
Checking the right side
6
"""
print(binary_search([1, 3, 5, 7, 9, 10, 2, 4, 6, 8], 10))
"""Should print 3 debug lines and the return value:
Checking the right side
Checking the right side
Checking the right side
9
"""
print(binary_search([5, 1, 8, 2, 4, 10, 7, 6, 3, 9], 11))
"""Should print 4 debug lines and the "not found" value of -1:
Checking the right side
Checking the right side
Checking the right side
Checking the right side
-1
"""
Output:
Checking the left side
Checking the left side
0
4
Checking the right side
Checking the left side
Checking the right side
6
Checking the right side
Checking the right side
Checking the right side
9
Checking the right side
Checking the right side
Checking the right side
Checking the right side
-1
Question 4
When trying to find an error in a log file or output to the screen, what command can we use to review, say, the first 10 lines?
- wc
- tail
- head
- bisect
The head command will print the first lines of a file, 10 lines by default.
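For comparison, a rough Python equivalent of head is sketched below. The helper name head and the sample log lines are made up for this illustration.

```python
from itertools import islice


def head(lines, count=10):
    """Return the first `count` lines from any iterable of lines."""
    return list(islice(lines, count))


# An open file object is also an iterable of lines; a list of made-up
# log lines keeps this example self-contained.
sample_log = ["line " + str(n) for n in range(1, 26)]
for line in head(sample_log):
    print(line)
```

Because islice stops after count items, only the start of a large file is read rather than the whole thing.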
Question 5
The best_search function compares linear_search and binary_search functions, to locate a key in the list, and returns how many steps each method took, and which one is the best for that situation. The list does not need to be sorted, as the binary_search function sorts it before proceeding (and uses one step to do so). Here, linear_search and binary_search functions both return the number of steps that it took to either locate the key, or determine that it’s not in the list. If the number of steps is the same for both methods (including the extra step for sorting in binary_search), then the result is a tie. Fill in the blanks to make this work.
def linear_search(list, key):
    #Returns the number of steps to determine if key is in the list
    #Initialize the counter of steps
    steps=0
    for i, item in enumerate(list):
        steps += 1
        if item == key:
            break
    return steps

def binary_search(list, key):
    #Returns the number of steps to determine if key is in the list
    #List must be sorted:
    list.sort()
    #The Sort was 1 step, so initialize the counter of steps to 1
    steps=1
    left = 0
    right = len(list) - 1
    while left <= right:
        steps += 1
        middle = (left + right) // 2
        if list[middle] == key:
            break
        if list[middle] > key:
            right = middle - 1
        if list[middle] < key:
            left = middle + 1
    return steps

def best_search(list, key):
    steps_linear = linear_search(list, key)
    steps_binary = binary_search(list, key)
    results = "Linear: " + str(steps_linear) + " steps, "
    results += "Binary: " + str(steps_binary) + " steps. "
    if (steps_linear < steps_binary):
        results += "Best Search is Linear."
    elif (steps_linear > steps_binary):
        results += "Best Search is Binary."
    else:
        results += "Result is a Tie."
    return results
print(best_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1))
#Should be: Linear: 1 steps, Binary: 4 steps. Best Search is Linear.
print(best_search([10, 2, 9, 1, 7, 5, 3, 4, 6, 8], 1))
#Should be: Linear: 4 steps, Binary: 4 steps. Result is a Tie.
print(best_search([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 7))
#Should be: Linear: 4 steps, Binary: 5 steps. Best Search is Linear.
print(best_search([1, 3, 5, 7, 9, 10, 2, 4, 6, 8], 10))
#Should be: Linear: 6 steps, Binary: 5 steps. Best Search is Binary.
print(best_search([5, 1, 8, 2, 4, 10, 7, 6, 3, 9], 11))
#Should be: Linear: 10 steps, Binary: 5 steps. Best Search is Binary.
Output:
Linear: 1 steps, Binary: 4 steps. Best Search is Linear.
Linear: 4 steps, Binary: 4 steps. Result is a Tie.
Linear: 4 steps, Binary: 5 steps. Best Search is Linear.
Linear: 6 steps, Binary: 5 steps. Best Search is Binary.
Linear: 10 steps, Binary: 5 steps. Best Search is Binary.
Practice Quiz: Introduction to Debugging
- Total points: 5
- Score: 100%
Question 1
What is part of the final step when problem solving?
- Documentation
- Long-term remediation
- Finding the root cause
- Gathering information
Long-term remediation is part of the final step when problem solving.
Question 2
Which tool can you use when debugging to look at library calls made by the software?
- top
- strace
- tcpdump
- ltrace
The ltrace tool is used to look at library calls made by the software.
Question 3
What is the first step of problem solving?
- Prevention
- Gathering information
- Long-term remediation
- Finding the root cause
Gathering information is the first step taken when problem solving.
Question 4
What software tools are used to analyze network traffic to isolate problems? (Check all that apply)
- tcpdump
- wireshark
- strace
- top
The tcpdump tool is a powerful command-line analyzer that captures or “sniffs” TCP/IP packets.
Wireshark is an open source tool for profiling network traffic and analyzing TCP/IP packets.
Question 5
The strace (in Linux) tool allows us to see all of the _ our program has made.
- Network traffic
- Disk writes
- System calls
- Connection requests
The strace command shows us all the system calls our program made. System calls are the calls that the programs running in our computer make to the running kernel.
Practice Quiz: Understanding the Problem
- Total points: 5
- Score: 100%
Question 1
When a user reports that an “application doesn’t work,” what is an appropriate follow-up question to gather more information about the problem?
- Is the server plugged in?
- Why do you need the application?
- Do you have a support ticket number?
- What should happen when you open the app?
Asking the user what an expected result should be will help you gather more information to understand and isolate the problem.
Question 2
What is a heisenbug?
- The observer effect.
- A test environment.
- The root cause.
- An event viewer.
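A heisenbug is a bug that changes or disappears when we try to observe it. This is an instance of the observer effect, in which just observing a phenomenon alters the phenomenon.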
Question 3
The compare_strings function is supposed to compare just the alphanumeric content of two strings, ignoring upper vs lower case and punctuation. But something is not working. Fill in the code to try to find the problems, then fix the problems.
import re
def compare_strings(string1, string2):
    #Convert both strings to lowercase
    #and remove leading and trailing blanks
    string1 = string1.lower().strip()
    string2 = string2.lower().strip()
    #Ignore punctuation
    #Fix: the original pattern r"[.?!,;:-']" raised a "bad character range"
    #error, because :-' is read as a range from ':' to "'" and that range is
    #out of order. Moving the hyphen to the end makes it a literal character.
    punctuation = r"[.?!,;:'-]"
    string1 = re.sub(punctuation, r"", string1)
    string2 = re.sub(punctuation, r"", string2)
    #DEBUG CODE GOES HERE
    return string1 == string2
print(compare_strings("Have a Great Day!", "Have a great day?")) ## True
print(compare_strings("It's raining again.", "its raining, again")) ## True
print(compare_strings("Learn to count: 1, 2, 3.", "Learn to count: one, two, three.")) ## False
print(compare_strings("They found some body.", "They found somebody.")) ## False
Output:
True
True
False
False
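The pattern problem in Question 3 can be reproduced on its own. The sketch below compiles both character classes and shows that only the one with the hyphen at the end is accepted. The variable names are chosen for this illustration.

```python
import re

bad_pattern = r"[.?!,;:-']"   # ":-'" is parsed as a range from ':' to "'"
good_pattern = r"[.?!,;:'-]"  # a trailing '-' is a literal hyphen

try:
    re.compile(bad_pattern)
    bad_pattern_error = None
except re.error as error:
    # ':' sorts after "'" so the range is out of order and rejected.
    bad_pattern_error = error
    print("DEBUG: pattern rejected:", error)

print(re.sub(good_pattern, "", "it's raining, again."))
```

Compiling the pattern by itself is a quick way to tell a regular expression problem apart from a logic problem in the surrounding function.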
Question 4
How do we verify if a problem is still persisting or not?
- Restart the device or server hardware
- Attempt to trigger the problem again by following the steps of our reproduction case
- Repeatedly ask the user
- Check again later
If we can recreate the circumstances of the issue, we can verify whether the problem continues to occur.
Question 5
The datetime module supplies classes for manipulating dates and times, and contains many types, objects, and methods. You’ve seen some of them used in the dow function, which returns the day of the week for a specific date. We’ll use them again in the next_date function, which takes the date_string parameter in the format of “year-month-day”, and uses the add_year function to calculate the next year that this date will occur (it’s 4 years later for the 29th of February during Leap Year, and 1 year later for all other dates). Then it returns the value in the same format as it receives the date: “year-month-day”.
Can you find the error in the code? Is it in the next_date function or the add_year function? How can you determine if the add_year function returns what it’s supposed to? Add debug lines as necessary to find the problems, then fix the code to work as indicated above.
import datetime
from datetime import date
def add_year(date_obj):
    try:
        new_date_obj = date_obj.replace(year = date_obj.year + 1)
    except ValueError:
        ## This gets executed when the above method fails,
        ## which means that we're making a Leap Year calculation
        new_date_obj = date_obj.replace(year = date_obj.year + 4)
    return new_date_obj ## OK

def next_date(date_string):
    ## Convert the argument from string to date object
    date_obj = datetime.datetime.strptime(date_string, r"%Y-%m-%d")
    next_date_obj = add_year(date_obj)
    ## print(f'{date_obj} | {next_date_obj}') ## OK
    ## Convert the datetime object to string,
    ## in the format of "yyyy-mm-dd"
    ## next_date_string = next_date_obj.strftime("yyyy-mm-dd")
    next_date_string = next_date_obj.strftime("%Y-%m-%d")
    return next_date_string
today = date.today() ## Get today's date
print(next_date(str(today)))
## Should return a year from today, unless today is Leap Day
print(next_date("2021-01-01")) ## Should return 2022-01-01
print(next_date("2020-02-29")) ## Should return 2024-02-29
Output (from a run on 2020-08-03, with the debug print in next_date uncommented):
2020-08-03 00:00:00 | 2021-08-03 00:00:00
2021-08-03
2021-01-01 00:00:00 | 2022-01-01 00:00:00
2022-01-01
2020-02-29 00:00:00 | 2024-02-29 00:00:00
2024-02-29
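To check what add_year relies on, the ValueError can be triggered directly. This is a small sketch using only the standard datetime module. The variable names are chosen for this illustration.

```python
from datetime import date

leap_day = date(2020, 2, 29)

try:
    leap_day.replace(year=2021)
    replace_failed = False
except ValueError as error:
    # 2021 is not a leap year, so 2021-02-29 does not exist.
    replace_failed = True
    print("DEBUG: replace failed:", error)

print(leap_day.replace(year=2024))  # 2024 is a leap year, so this works
```

Note that adding 4 years is itself a simplification: 2100 is not a leap year, so the same fallback would raise ValueError for 2096-02-29.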
Introduction to Debugging
Video: What is debugging?
What is the general description of debugging?
- Fixing bugs in the code of the application
- Fixing problems in the system running the application
- Fixing issues related to hardware
- Fixing configuration issues in the software
Generally, debugging means fixing bugs in the code of the application.
Video: Problem Solving Steps
What is the second step of problem solving?
- Short-term remediation
- Long-term remediation
- Finding the root cause
- Gathering information
Finding the root cause is the second step taken when problem solving.
Video: Silently Crashing Application
Which command can you use to scroll through a lot of text output after tracing system calls of a script?
- strace -o fail.strace ./script.py
- strace ./script.py | less
- strace ./script.py
- strace ./script.py -o fail.strace
Piping the less command allows you to scroll through a lot of text output.
Understanding the Problem
Video: “It Doesn’t Work”
When a user reports that a “website doesn’t work,” what is an appropriate follow-up question you can use to gather more information about the problem?
- What steps did you perform?
- Is the server receiving power?
- What server is the website hosted on?
- Do you have a support ticket number?
Asking the user what steps they performed will help you gather more information in order to better understand and isolate the problem.
Video: Creating a Reproduction Case
A program fails with an error, “No such file or directory.” You create a directory at the expected file path and the program successfully runs. Describe the reproduction case you’ll submit to the program developer to verify and fix this error.
- A report explaining to open the program without the specific directory on the computer
- A report with application logs exported from Windows Event Viewer
- A report listing the contents of the new directory
- A report listing the differences between strace and ltrace logs.
This is a specific way to reproduce the error and verify it exists. The developer can work on fixing it right away.
Video: Finding the Root Cause
Generally, understanding the root cause is essential for _?
- Purchasing new devices
- Producing test data
- Avoiding interfering with users
- Providing the long-term resolution
Understanding the root cause is essential for providing the long-term resolution.
Video: Dealing with Intermittent Issues
What sort of software bug might we be dealing with if power cycling resolves a problem?
- Poorly managed resources
- A heisenbug
- Logs filling up
- A file remains open
Power cycling releases resources stored in cache or memory, which gets rid of the problem.
Binary Searching a Problem
What is binary search?
When searching for more than one element in a list, which of the following actions should you perform first in order to search the list as quickly as possible?
- Sort the list
- Do a binary search
- Do a linear search
- Use a base three logarithm
A list must be sorted first before it can take advantage of the binary search algorithm.
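As a sketch of "sort once, then search many times", the standard bisect module can do the binary search. The helper name contains and the sample names are chosen for this illustration.

```python
from bisect import bisect_left


def contains(sorted_items, key):
    """Binary search a sorted list using the standard bisect module."""
    position = bisect_left(sorted_items, key)
    return position < len(sorted_items) and sorted_items[position] == key


names = ["Parker", "Drew", "Cameron", "Logan", "Alex"]
names.sort()  # sort once, up front

# Every later lookup reuses the same sorted list.
for wanted in ["Alex", "Andrew", "Drew"]:
    print(wanted, contains(names, wanted))
```

The cost of sorting is paid a single time, so it pays off when several elements are looked up in the same list.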
Video: Applying Binary Search in Troubleshooting
When troubleshooting an XML configuration file that’s failed after being updated for an application, what would you bisect in the code?
- File format
- File quantity
- Folder location
- Variables
The list of variables in the file can be bisected or tested in halves continuously until a single root cause is found.
Peer Graded Assessment
2. Slowness
Practice Quiz: Slow Code
- Total points: 5
- Score: 100%
Question 1
Which of the following is NOT considered an expensive operation?
- Parsing a file
- Downloading data over the network
- Going through a list
- Using a dictionary
Using a dictionary is faster to look up elements than going through a list.
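The difference can be sketched with a small, made-up data set: the list version scans entries one by one, while the dictionary version jumps straight to the key.

```python
# Hypothetical data: (username, full name) pairs.
users = [("jdoe", "Jane Doe"), ("asmith", "Alex Smith"), ("tlee", "Terry Lee")]


def find_in_list(pairs, username):
    """Linear scan: checks entries one by one until a match is found."""
    for name, full_name in pairs:
        if name == username:
            return full_name
    return None


# Build the dictionary once; each lookup afterwards is a hash lookup
# instead of a scan through the whole list.
users_by_name = dict(users)

print(find_in_list(users, "tlee"))
print(users_by_name.get("tlee"))
```

Both calls return the same value. The gap in speed only becomes noticeable once the collection is large or the lookup is repeated many times.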
Question 2
Which of the following may be the most expensive to carry out in most automation tasks in a script?
- Loops
- Lists
- Vector
- Hash
Loops that run indefinitely and include subtasks that must complete before moving on can be very expensive in most automation tasks.
Question 3
Which of the following statements represents the most sound advice when writing scripts?
- Aim for every speed advantage you can get in your code
- Use expensive operations often
- Start by writing clear code, then speed it up only if necessary
- Use loops as often as possible
If we don’t notice any slowdown, then there’s little point trying to speed it up.
Question 4
In Python, what is a data structure that stores multiple pieces of data, in order, which can be changed later?
- A hash
- Dictionaries
- Lists
- Tuples
Lists are efficient, and if we are either iterating through the entire list or are accessing elements by their position, lists are the way to go.
Question 5
What command, keyword, module, or tool can be used to measure the amount of time it takes for an operation or program to execute? (Check all that apply)
- time
- kcachegrind
- cProfile
- break
We can precede the name of our commands and scripts with the “time” shell builtin and the shell will output execution time statistics when they complete.
The kcachegrind tool visualizes profile data; if we can insert some code into the program, it can tell us how long the execution of each function takes.
cProfile provides deterministic profiling of Python programs, including how often and for how long various parts of the program executed.
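A minimal sketch of profiling with cProfile from inside a script follows. The function slow_sum is a made-up workload for this illustration. The report lists how many times each function was called and how long it ran.

```python
import cProfile
import io
import pstats


def slow_sum(limit):
    """Deliberately simple function to give the profiler something to measure."""
    total = 0
    for number in range(limit):
        total += number
    return total


profiler = cProfile.Profile()
# runcall profiles a single function call and returns its result.
result = profiler.runcall(slow_sum, 100000)

# Format the collected statistics, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The exact timings vary from run to run, so the report is best read for relative costs rather than absolute numbers.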
Practice Quiz: Understanding Slowness
- Total points: 5
- Score: 100%
Question 1
Which of the following will an application spend the longest time retrieving data from?
- CPU L2 cache
- RAM
- Disk
- The network
An application will take the longest time trying to retrieve data from the network.
Question 2
Which tool can you use to verify reports of ‘slowness’ for web pages served by a web server you manage?
- The top tool
- The ab tool
- The nice tool
- The pidof tool
The ab tool is an Apache Benchmark tool used to figure out how slow a web server is based on average timing of requests.
Question 3
If our computer running Microsoft Windows is running slow, what performance monitoring tools can we use to analyze our system resource usage to identify the bottleneck? (Check all that apply)
- Performance Monitor
- Resource Monitor
- Activity Monitor
- top
Performance Monitor is a system monitoring program that provides basic CPU and memory resource measurements in Windows.
Resource Monitor is an advanced resource monitoring utility that provides data on hardware and software resources in real time.
Question 4
Which of the following programs is likely to run faster and more efficiently, with the least slowdown?
- A program with a cache stored on a hard drive
- A program small enough to fit in RAM
- A program that reads files from an optical disc
- A program that retrieves most of its data from the Internet
Since RAM access is faster than accessing a disk or network, a program that can fit in RAM will run faster.
Question 5
What might cause a single application to slow down an entire system? (Check all that apply)
- A memory leak
- The application relies on a slow network connection
- Handling files that have grown too large
- Hardware faults
Memory leaks happen when an application doesn’t release memory when it is supposed to.
If files generated by the application have grown overly large, slowdown will occur if the application needs to store a copy of the file in RAM in order to use it.
Practice Quiz: When Slowness Problems Get Complex
- Total points: 5
- Score: 100%
Question 1
Which of the following can cache database queries in memory for faster processing of automated tasks?
- Threading
- Varnish
- Memcached
- SQLite
Memcached is a caching service that keeps most commonly accessed database queries in RAM.
Question 2
What module specifies parts of a code to run in separate asynchronous events?
- Threading
- Futures
- Asyncio
- Concurrent
Asyncio is a module that lets you specify parts of the code to run as separate asynchronous events.
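A small sketch of asyncio in use: the coroutine fetch is a made-up stand-in for an I/O-bound task, with asyncio.sleep simulating the wait.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for an I/O-bound task; asyncio.sleep simulates the wait."""
    await asyncio.sleep(delay)
    return name + " done"


async def main():
    # Both coroutines wait concurrently, so the total time is roughly the
    # longest delay rather than the sum of both.
    return await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01))


results = asyncio.run(main())
print(results)
```

asyncio.gather returns the results in the order the coroutines were passed in, regardless of which one finished first.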
Question 3
Which of the following allows our program to run multiple instructions in parallel?
- Threading
- Swap space
- Memory addressing
- Dual SSD
Threading allows a process to split itself into parallel tasks.
Question 4
What is the name of the field of study in computer science that concerns itself with writing programs and operations that run in parallel efficiently?
- Memory management
- Concurrency
- Threading
- Performance analysis
Concurrency in computer science is the ability of different sections or units of a program, algorithm, or problem to be executed out of order or in partial order, without impacting the final result.
Question 5
What would we call a program that often leaves our CPU with little to do as it waits on data from a local disk and the Internet?
- Memory-bound
- CPU-bound
- User-bound
- I/O bound
If our program mainly finds itself waiting on local disks or the network, it is I/O bound.
Understanding Slowness
Video: Why is my computer slow?
When addressing slowness, what do you need to identify?
- The bottleneck
- The device
- The script
- The system
The bottleneck could be the CPU time, or time spent reading data from disk.
Video: How Computers Use Resources
After retrieving data from the network, how can an application access that same data quicker next time?
- Use the swap
- Create a cache
- Use memory leak
- Store in RAM
A cache stores data in a form that’s faster to access than its original form.
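One way to sketch an in-memory cache in Python is functools.lru_cache. The function get_page and the example URLs are hypothetical. The fetch_log list only exists to show how often the "slow" work really runs.

```python
from functools import lru_cache

fetch_log = []


@lru_cache(maxsize=None)
def get_page(url):
    """Hypothetical slow fetch; fetch_log records every real execution."""
    fetch_log.append(url)
    return "contents of " + url


get_page("http://example.com/a")
get_page("http://example.com/a")  # served from the cache, not re-fetched
get_page("http://example.com/b")
print(len(fetch_log))
```

Three calls result in only two real fetches, because the repeated URL is answered from the cache.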
Video: Possible Causes of Slowness
A computer becomes sluggish after a few days, and the problem goes away after a reboot. Which of the following is the possible cause?
- Files are growing too large.
- A program is keeping some state while running.
- Files are being read from the network.
- Hard drive failure.
A program that keeps accumulating state while it runs can slow a computer down until it is rebooted.
Slow Code
Video: Writing Efficient Code
What is the cProfile module used for?
- For parsing files.
- To analyze a C program.
- To count function calls
- To remove unnecessary functions.
The cProfile module is used to count how many times functions are called, and how long they run.
Video: Using the Right Data Structures
Which of the following has values associated with keys in Python?
- A hash
- A dictionary
- A HashMap
- An Unordered Map
Python uses a dictionary to store values, each with a specific key.
Video: Expensive Loops
Your Python script searches a directory, and runs other tasks in a single loop function for 100s of computers on the network. Which action will make the script the least expensive?
- Read the directory once
- Loop the total number of computers
- Service only half of the computers
- Use more memory
Reading the directory once before the loop will make the script less expensive to run.
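Hoisting the directory read out of the loop might look like the sketch below. The function name, host names and file names are made up, and a temporary directory keeps the demo self-contained.

```python
import os
import tempfile


def files_per_computer(computers, directory):
    """Pair every computer with the directory listing, reading it only once."""
    # Expensive operation hoisted out of the loop: one listdir call in
    # total, instead of one call per computer.
    file_names = sorted(os.listdir(directory))
    return {computer: file_names for computer in computers}


# Self-contained demo using a temporary directory with two made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    for file_name in ("a.conf", "b.conf"):
        open(os.path.join(demo_dir, file_name), "w").close()
    demo_result = files_per_computer(["host1", "host2"], demo_dir)

print(demo_result)
```

With hundreds of computers, the directory is still listed a single time.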
Video: Keeping Local Results
Your script calculates the average number of active user sessions during business hours in a seven-day period. How often should a local cache be created to give a good enough average without updating too often?
- Once a week
- Once a day
- Once a month
- Once every 8 hours
A local cache for every day can be accessed quickly, and processed for a seven-day average calculation.
Video: Slow Script with Expensive Loop
You use the time command to determine how long a script runs to complete its various tasks. Which output value will show the time spent doing operations in the user space?
- Real
- Wall-clock
- Sys
- User
The user value is the time spent doing operations in the user space.
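The same three values can be approximated from inside Python with os.times() and time.perf_counter(). This is a sketch for illustration, not a replacement for the time command. The function busy_work is a made-up workload.

```python
import os
import time


def busy_work(limit):
    """CPU-bound loop that accumulates user-space time."""
    return sum(number * number for number in range(limit))


start_times = os.times()
start_wall = time.perf_counter()

busy_work(200000)
time.sleep(0.05)  # waiting adds wall-clock time but almost no CPU time

end_times = os.times()
wall = time.perf_counter() - start_wall
user = end_times.user - start_times.user
system = end_times.system - start_times.system

print("real %.3fs  user %.3fs  sys %.3fs" % (wall, user, system))
```

Real (wall-clock) time includes the sleep, while user and sys only count the time the CPU spent working for the process.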
When Slowness Problems Get Complex
Video: Parallelizing Operations
A script is _ if you are running operations in parallel using all available CPU time.
- I/O bound
- Threading
- CPU bound
- Asyncio
A script is CPU bound if you’re running operations in parallel using all available CPU time.
Video: Slowly Growing in Complexity
You’re creating a simple script that runs a query on a list of product names of a very small business, and initiates automated tasks based on those queries. Which of the following would you use to store product names?
- SQLite
- Microsoft SQL Server
- Memcached
- CSV file
A simple CSV file is enough to store a list of product names.
Video: Dealing with Complex Slow Systems
A company has a single web server hosting a website that also interacts with an external database server. The web server is processing requests very slowly. Checking the web server, you found the disk I/O has high latency. Where is the cause of the slow website requests most likely originating from?
- Local disk
- Remote database
- Slow Internet
- Database index
The local disk I/O latency is causing the application to wait too long for data from disk.
Video: Using Threads to Make Things Go Faster
Which module makes it possible to run operations in a script in parallel that makes better use of CPU processing time?
- Executor
- Futures
- Varnish
- Concurrency
The futures module makes it possible to run operations in parallel using different executors.
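A minimal sketch using the futures module from the standard concurrent package follows. The function square is a made-up stand-in for a task worth running in parallel.

```python
from concurrent import futures


def square(number):
    """Small stand-in for a task worth running in parallel."""
    return number * number


# ThreadPoolExecutor runs tasks in threads; ProcessPoolExecutor offers the
# same interface but uses separate processes.
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    squares = list(executor.map(square, range(5)))

print(squares)
```

executor.map returns results in input order, and leaving the with block waits for all submitted tasks to finish.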
Graded Assessment
3. Crashing Program
Practice Quiz: Code that Crashes
- Total points: 5
- Score: 100%
Question 1
Which of the following will let code run until a certain line of code is executed?
- Breakpoints
- Watchpoints
- Backtrace
- Pointers
Breakpoints let code run until a certain line of code is executed.
Question 2
Which of the following is NOT likely to cause a segmentation fault?
- Wild pointers
- Reading past the end of an array
- Stack overflow
- RAM replacement
A segmentation fault is not commonly caused by replacing the RAM in the system.
Question 3
A common error worth keeping in mind happens often when iterating through arrays or other collections, and is often fixed by changing the less than or equal sign in our for loop to be a strictly less than sign. What is this common error known as?
- Segmentation fault
- backtrace
- The No such file or directory error
- Off-by-one error
The off-by-one bug, often abbreviated as OB1, frequently happens in computer programming when an iterative process iterates one time too many or too few.
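The error and its fix can be sketched with an index-based loop. Both function names are made up for this illustration, and the only difference between them is <= versus <.

```python
def sum_items_buggy(items):
    """Off-by-one: `<=` lets index reach len(items), one past the end."""
    total = 0
    index = 0
    while index <= len(items):
        total += items[index]  # raises IndexError on the extra iteration
        index += 1
    return total


def sum_items_fixed(items):
    """Strictly less than: the loop stops at the last valid index."""
    total = 0
    index = 0
    while index < len(items):
        total += items[index]
        index += 1
    return total


values = [10, 20, 30]
print(sum_items_fixed(values))

try:
    sum_items_buggy(values)
    buggy_raised = False
except IndexError as error:
    buggy_raised = True
    print("DEBUG: off-by-one:", error)
```

In Python, iterating directly over the list (for item in items) avoids the index arithmetic altogether.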
Question 4
A very common method of debugging is to add print statements to our code that display information, such as contents of variables, custom error statements, or return values of functions. What is this type of debugging called?
- Backtracking
- Log review
- Printf debugging
- Assertion debugging
Printf debugging takes its name from the printf() function in C, which is used to display debug information, and the name stuck. This type of debugging is useful in all languages.
Question 5
When a process crashes, the operating system may generate a file containing information about the state of the process in memory to help the developer debug the program later. What are these files called?
- Log files
- Core files
- Metadata file
- Cache file
Core files (or core dump files) record an image and status of a running process, and can be used to determine the cause of a crash.
Practice Quiz: Handling Bigger Incidents
- Total points: 5
- Score: 100%
Question 1
Which of the following would be effective in resolving a large issue if it happens again in the future?
- Incident controller
- Postmortem
- Rollbacks
- Load balancers
A postmortem is a detailed document of an issue which includes the root cause and remediation. It is effective on large, complex issues.
Question 2
During peak hours, users have reported issues connecting to a website. The website is hosted by two load balancing servers in the cloud, which are connected to an external SQL database. Logs on both servers show an increase in CPU and RAM usage. What may be the most effective way to resolve this issue with a complex set of servers?
- Use threading in the program
- Cache data in memory
- Automate deployment of additional servers
- Optimize the database
Automatically deploying additional servers to handle the loads of requests during peak hours can resolve issues with a complex set of servers.
Question 3
It has become increasingly common to use cloud services and virtualization. Which kind of fix, in particular, does virtual cloud deployment speed up and simplify?
- Deployment of new servers
- Application code fixes
- Log reviewing
- Postmortems
Virtualization makes deployment of VM servers in the cloud a fast and relatively simple process.
Question 4
What should we include in our postmortem? (Check all that apply)
- Root cause of the issue
- How we diagnosed the problem
- How we fixed the problem
- Who caused the problem
In order to learn about the problem and how it happens in general, we should include what caused it this time.
By clarifying how we identified the problem, it can be more easily identified in the future.
In order to share with reviewers how the issue was resolved, it’s important to include what we did to solve it this time.
Question 5
In general, what is the goal of a postmortem? (Check all that apply)
- To identify who is at fault
- To allow prevention in the future
- To allow speedy remediation of similar issues in the future
- To analyze all system bugs
By describing the cause of the problem, we can learn to avoid the same circumstances in the future.
By describing in detail how we fixed the problem, we can help others or ourselves fix the same problem more quickly in the future.
Practice Quiz: Why Programs Crash
- Total points: 5
- Score: 100%
Question 1
When using Event Viewer on a Windows system, what is the best way to quickly access specific types of logs?
- Export logs
- Create a custom view
- Click on System Reports
- Run the head command
The Create Custom View action is used to filter through logs based on certain criteria.
Question 2
An employee runs an application on a shared office computer, and it crashes. This does not happen to other users on the same computer. After reviewing the application logs, you find that the employee didn’t have access to the application. What log error helped you reach this conclusion?
- “No such file or directory”
- “Connection refused”
- “Permission denied”
- “Application terminated”
In this case, the “Permission denied” error means that the user didn’t have access to the application executable in order to run it.
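Three of the error strings in the options above correspond to standard operating-system error codes. As a rough sketch, assuming a Linux or macOS system (where the message text comes from the platform's C library), Python's errno and os modules can show the mapping. The describe_failure helper is hypothetical and only illustrates how a script might tell these cases apart:

```python
import errno
import os

# Print the OS error code names next to their human-readable messages.
# The exact wording is platform-dependent; Linux and macOS typically
# report the strings quoted in the quiz options.
for code in (errno.ENOENT, errno.ECONNREFUSED, errno.EACCES):
    print(errno.errorcode[code], "->", os.strerror(code))

def describe_failure(exc):
    """Hypothetical helper: turn an OSError into a short diagnosis."""
    if isinstance(exc, PermissionError):
        return "user lacks permission"
    if isinstance(exc, FileNotFoundError):
        return "file is missing"
    if isinstance(exc, ConnectionRefusedError):
        return "remote service refused the connection"
    return "other OS error"
```

"Application terminated" has no such error code, which is one reason it says little about the underlying cause.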
Question 3
What tool can we use to check the health of our RAM?
- Event Viewer
- S.M.A.R.T. tools
- memtest86
- Process Monitor
memtest86 and memtest86+ are memory analysis software programs designed to test and stress test the random access memory of an x86 architecture system for errors, by writing test patterns to most memory addresses, then reading data back and checking for errors.
Question 4
You’ve just finished helping a user work around an issue in an application. What important but easy-to-forget step should we remember to do next?
- Fix the code
- Report the bug to the developers
- Reinstall the program
- Change the user’s password
If there is a repeatable error present in a program, it is proper etiquette to report the bug in detail to the developer.
Question 5
A user is experiencing strange behavior from their computer. It is running slowly and lagging, with momentary freeze-ups that it does not usually have. The problem seems to be system-wide and not restricted to a particular application. Which of the following should you first ask the user whether they have tried?
- Adding more RAM
- Reinstalling Windows
- Identifying the bottleneck with a resource monitor
- Upgrading their HDD to an SSD
The first step is identifying the root cause of the problem. Resource monitors such as Activity Monitor (macOS), top (Linux and macOS) or Resource Monitor (Windows) can help us identify whether our bottleneck is CPU-based or memory-based.
Why Programs Crash
Video: Systems That Crash
A user reported an application crashes on their computer. You log in and try to run the program and it crashes again. Which of the following steps would you perform next to reduce the scope of the problem?
- Check the health of the RAM
- Switch the hard drive into another computer
- Check the health of the hard drive
- Review application logs
Reviewing logs is the next best step to determine if logs reveal any reason for the crash.
Video: Understanding Crashing Applications
Where should you look for application logs on a Windows system?
- The /var/log directory
- The .xsession-errors file
- The Console app
- The Event Viewer app
The Event Viewer app contains logs on a Windows system.
What to do when you can’t fix the program?
An application fails in random intervals after it was installed on a different operating system version. What can you do to work around the issue?
- Use a wrapper
- Use a container
- Use a watchdog
- Use an XML format
A container allows the application to run in its own environment without interfering with the rest of the system.
Video: Internal Server Error
Where is a common location to view configuration files for a web application running on a Linux server?
- /etc/
- /var/log/
- /srv/
- /
The /etc directory will contain the application folder that stores configuration files.
Code that Crashes
Video: Accessing Invalid Memory
Which of the following can assist in finding out if invalid operations are occurring in a program running on a Windows system?
- Valgrind
- Dr. Memory
- PDB files
- Segfaults
Dr. Memory can assist in finding out if invalid operations are occurring in a program running on Windows or Linux.
Video: Unhandled Errors and Exceptions
What can you use to notify users when an error occurs, the reason why it occurred, and how to resolve it?
- The pdb module
- The logging module
- Use printf debugging
- The echo command
The logging module lets a program emit messages that show up when the code fails, explaining what went wrong and why.
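A minimal sketch of this idea with Python's standard logging module follows. The parse_port function and the wording of its message are assumptions made up for illustration; the point is that the error message states what failed, why, and how to fix it:

```python
import logging

# Show messages at DEBUG level and above, with the severity in front.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")

def parse_port(value):
    """Hypothetical function: convert a config value to a port number."""
    try:
        port = int(value)
    except ValueError:
        # Say what failed, why, and how the user can resolve it.
        logging.error("Invalid port %r: not a number. Set it to an integer "
                      "between 1 and 65535.", value)
        return None
    logging.debug("Parsed port %d", port)
    return port
```

Calling parse_port("8080") returns 8080 and logs a debug line, while parse_port("http") logs the error and returns None.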
Video: Fixing Someone Else’s Code
After getting acquainted with the program’s code, where might you start to fix a problem?
- Run through tests
- Read the comments
- Locate the affected function
- Create new tests
Start working on the function that produced the error, and the function(s) that called it.
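In Python, the traceback of an exception already lists the function that produced the error and the functions that called it. The sketch below uses the standard traceback module to pull out that chain; load_record and build_report are hypothetical functions invented for the example:

```python
import traceback

def load_record(records, key):
    # The function where the error is actually raised.
    return records[key]

def build_report(records):
    # A caller of the failing function.
    return load_record(records, "missing")

def call_chain(exc):
    """Return function names from the outermost frame to the failure point."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]

try:
    build_report({})
except KeyError as err:
    chain = call_chain(err)
    print(" -> ".join(chain))
```

The last name in the chain is the function that raised the error, and the one before it is its caller, which gives a reasonable order in which to start reading the code.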
Video: Debugging a Segmentation Fault
When debugging code, what command can you use to figure out how your program reached the failed state?
- gdb
- backtrace
- ulimit
- list
The backtrace command shows a summary of the function calls that were made up to the point where the failure occurred.
Video: Debugging a Python Crash
When debugging in Python, what command can you use to run the program until it crashes with an error?
- pdb3
- next
- continue
- KeyError
Running the continue command after starting the pdb3 debugger will execute the program until it finishes or crashes.
Handling Bigger Incidents
Video: Crashes in Complex Systems
A website is producing service errors when loading certain pages. Looking at the logs, one of three web servers isn’t responding correctly to requests. What can you do to restore services, while troubleshooting further?
- Deploy a new web server
- Roll back application changes
- Remove the server from the pool
- Create standby servers
Removing the server from the pool will provide full service to users from the remaining web servers.
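The idea can be sketched with a toy round-robin pool. The ServerPool class below is hypothetical and does not represent any real load balancer's API; it only shows that once the faulty server is removed, requests keep rotating through the remaining ones:

```python
class ServerPool:
    """Hypothetical round-robin pool; not any real load balancer's API."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0

    def remove(self, server):
        # Take the misbehaving server out of rotation so it can be
        # investigated without affecting users.
        self.servers.remove(server)

    def pick(self):
        # Rotate through whichever servers are still in the pool.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


pool = ServerPool(["web1", "web2", "web3"])
pool.remove("web2")  # assume web2 is the server that is not responding
picks = [pool.pick() for _ in range(4)]
print(picks)
```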
Video: Communication and Documenting During Incidents
Which of the following persons is responsible for communicating with customers that are affected by an access issue with a website?
- Communications lead
- Manager
- Incident controller
- Software engineer
The communications lead provides timely updates on the incident and answers questions from users.
Video: Writing Effective Postmortems
When writing an effective postmortem of an incident, what should you NOT include?
- What caused the issue
- Who caused the issue
- What the impact was
- The short-term remediation
A postmortem of an incident should not include the person(s) who caused the issue.
Graded Assessment
4. Managing Resources
Practice Quiz: Making Our Future Lives Easier
- Total points: 5
- Score: 100%
Question 1
Which proactive practice can you implement to make troubleshooting issues in a program easier when they happen again, or when you face other similar issues?
- Create and update documentation
- Use a test environment.
- Automate rollbacks.
- Set up Unit tests.
Documentation that includes good instructions on how to resolve an issue can assist in resolving the same, or similar issue in the future.
Question 2
Which of the following is a good example of mixing and matching resources on a single server so that the running services make the best possible use of all resources?
- Run two applications that are CPU intensive between two servers.
- Run a CPU intensive application on one server, and an I/O intensive application on another server.
- Run a RAM intensive application and a CPU intensive application on a server.
- Run two applications that are RAM and I/O intensive on a server.
An application that uses a lot of RAM can still run while CPU is mostly used by another application on the same server.
Question 3
One strategy for debugging involves explaining the problem to yourself out loud. What is this technique known as?
- Monitoring
- Rubber Ducking
- Testing
- Ticketing
Rubber ducking is the process of explaining a problem to a “rubber duck”, or rather yourself, to better understand the problem.
Question 4
When deploying software, what is a canary?
- A test for how components of a program interact with each other
- A test of a program’s components
- A test deployment to a subset of production hosts
- A small section of code
Reminiscent of the old term “canary in a coal mine”, a canary is a test deployment of our software, just to see what happens.
Question 5
It is advisable to collect monitoring information into a central location. Given the importance of the server handling the centralized collecting, when assessing risks from outages, this server could be described as what?
- A failure domain
- A problem domain
- CPU intensive
- I/O intensive
A failure domain is a logical or physical component of a system that might fail.
Practice Quiz: Managing Computer Resources
- Total points: 5
- Score: 100%
Question 1
How can you profile an entire Python application?
- Use an @profile label
- Use the guppy module
- Use Memory Profiler
- Use a decorator
Guppy is a Python library with tools to profile an entire Python application.
Question 2
Your application is having difficulty sending and receiving large packets of data, which are also delaying other processes when connected to remote computers. Which of the following will be most effective on improving network traffic for the application?
- Running the iftop program
- Increase storage capacity
- Increase memory capacity
- Use traffic shaping
Traffic shaping can mark data packets and assign higher priorities when being sent over the network.
Question 3
What is the term referring to the amount of time it takes for a request to reach its destination, usually measured in milliseconds (ms)?
- Bandwidth
- Latency
- Number of connections
- Traffic shaping
Latency is a measure of the time it takes for a request to reach its destination.
Question 4
If your computer is slowing down, what Linux program might we use to determine if we have a memory leak and what process might be causing it?
- top
- gparted
- iftop
- cron
The top command will show us all running processes and their memory usage in Linux.
Question 5
Some programs open a temporary file, and immediately _ the file before the process finishes, then the file continues to grow, which can cause slowdown.
- open
- close
- delete
- write to
Sometimes a program deletes a temporary file right after opening it, so that it doesn’t “forget” to clean up later. The process can keep writing to the file, but we can’t see it in the directory because its name is already gone. The disk space is not actually released until the process closes the file or finishes.
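This behavior can be reproduced in a few lines. The sketch below assumes a POSIX system such as Linux or macOS (on Windows, deleting a file that is still open generally fails) and uses only the standard os and tempfile modules:

```python
import os
import tempfile

# Create a temporary file, keep it open, and delete its name right away.
fd, path = tempfile.mkstemp()
os.unlink(path)

# The name is gone, so a directory listing no longer shows the file...
name_exists = os.path.exists(path)

# ...but the open descriptor still works, and writes still use disk space.
os.write(fd, b"x" * (1024 * 1024))
size_after_write = os.fstat(fd).st_size

print(name_exists, size_after_write)

# Closing the descriptor (or ending the process) finally frees the space.
os.close(fd)
```

Because the file has no name, tools that walk the directory tree will not account for its size, which is why such files are usually tracked down by listing the open files of running processes instead.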
Practice Quiz: Managing Our Time
- Total points: 5
- Score: 100%
Question 1
Using the Eisenhower Decision Matrix, which of the following is an example of an event or task that is both Important, and Urgent?
- Office gossip
- Replying to emails
- Internet connection is down
- Follow-up to a recently resolved issue
It’s important for users to have Internet to work, and it must be resolved right away.
Question 2
You’re working on a web server issue that’s preventing all users from accessing the site. You then receive a call from a user asking to reset their account password. Which action should you take when prioritizing your tasks?
- Reset the user’s password
- Create a script to automate password resets
- Ask the user to open a support ticket.
- Ignore the user, and troubleshoot web server.
Ask the user to open a support ticket so that the request can be placed into the queue while you work on the most urgent issue at hand.
Question 3
What is it called when we make more work for ourselves later by taking shortcuts now?
- Technical debt
- Ticket tracking
- Eisenhower Decision Matrix
- Automation
Technical debt is defined as the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better, but more difficult, solution.
Question 4
What is the first step of prioritizing our time properly?
- Work on urgent tasks first
- Assess the importance of each issue
- Make a list of all tasks
- Estimate the time each task will take
Before we can even decide which task to do first, we need to make a list of our tasks.
Question 5
If an issue isn’t solved within the time estimate that you provided, what should you do? (Select all that apply)
- Explain why
- Drop everything and perform that task immediately
- Give an updated time estimate
- Put the task at the end of the list
Communication is key, and it’s best to keep everyone informed.
If your original estimate turned out to be overly optimistic, it’s appropriate to re-estimate.
Managing Computer Resources
Video: Memory Leaks and How to Prevent Them
Which of the following descriptions most likely points to a possible memory leak?
- Application process uses more memory even after a restart.
- Garbage collector carries out its task.
- The function returns after it completes.
- Valgrind figures out memory usage.
An app that still needs a lot of memory, even after a restart, most likely points to a memory leak.
Video: Managing Disk Space
Which of the following is an example of unnecessary files on a server storage device that can keep applications from running if not cleaned up properly?
- A SQL database
- A mailbox database
- A set of application files
- A set of large temporary files
Large temporary files may remain if an application crashes, because they are not cleaned up automatically.
Video: Network Saturation
The custom application running on a server can’t receive new connections. Existing connections are sending and receiving data in a reasonable time. Which of the following explains the reason why new sessions can’t be established with the server?
- Too many connections
- High network latency
- Low network bandwidth
- No traffic shaping
There are limits to how many connections a single server can have, which will prevent new connections.
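One such limit, among several possible ones, is the number of file descriptors a process may have open, since each network connection uses one. As a small sketch, assuming a Unix-like system where Python's resource module is available, the current limits can be inspected like this:

```python
import resource

# Each open connection uses one file descriptor, so the per-process
# descriptor limit is one ceiling on simultaneous connections.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

The values vary from system to system. Other limits, such as those set by the application itself or by the network stack, can also stop new connections before this one is reached.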