Course 4: Process Data from Dirty to Clean. This article provides the weekly challenge quiz answers for this course, from week 1 to week 5, to help students prepare for the exams.

Process Data from Dirty to Clean Weekly Challenge 1 Answers
Q1. Which of the following conditions are necessary to
ensure data integrity? Select all that apply.
- Statistical power
- Completeness
- Accuracy
- Privacy
Q2. What is one potential problem associated with data
manipulation that analysts must be aware of?
- Data manipulation can help organize a dataset.
- Data manipulation can separate a dataset among different locations.
- Data manipulation can make a dataset easier to read.
- Data manipulation can introduce errors.
Q3. A data analyst is given a dataset for analysis. It
includes data about the total population of every country in the previous 20
years. Based on the available data, an analyst will be able to determine which
country was the most populous from 2016 to 2017.
- True
- False
Q4. A data analyst is given a dataset for analysis.
June 2014 Invoices – Sheet1.csv
Which of the following has duplicate data?
- Data for Valando on 2/18/2014
- Data for Valando on 1/1/2014
- Data for Symteco on 5/20/2014
- Data for Symteco on 2/21/2014
Q5. A data analyst is working on a project about the global
supply chain. They have a dataset with lots of relevant data from Europe and
Asia. However, they decide to generate new data that represents all continents.
What type of insufficient data does this scenario describe?
- Data that keeps updating
- Data that’s outdated
- Data that’s geographically limited
- Data from only one source
Q6. A car manufacturer wants to learn more about the brand
preferences of electric car owners. There are millions of electric car owners
in the world. Who should the company survey?
- A sample of car owners who most recently bought an electric car
- A sample of all electric car owners
- A sample of car owners who have owned more than one electric car
- The entire population of electric car owners
Q7. Fill in the blank: Sampling bias in data collection
happens when a sample isn’t representative of _____.
- a dataset about the population
- the population most affected by the data
- a subset of the population
- the population as a whole
Q8. Which of the following processes helps ensure a close
alignment of data and business objectives?
- Completing data replication
- Transferring data multiple times
- Having data update automatically during analysis
- Maintaining data integrity
Process Data from Dirty to Clean Weekly Challenge 2 Answers
Q1. Which of the following terms describe dirty data? Select
all that apply.
- Irrelevant
- Incomplete
- Infallible
- Incorrect
Q2. Field length is a spreadsheet tool for determining if a
field has been duplicated.
- True
- False
Q3. A data analyst notices that the customer in row 2 shares
the same Customer ID as the customer in row 6. What does this scenario
describe?
   | A | B | C | D
1  | Last name | First name | Middle initial | Customer ID
2  | Smith | Leonardo | R. | 64078
3  | Lee | Natasha | E. | 92862
4  | Wallace | Luciana | M. | 55107
5  | Xiao | Hua | A. | 88492
6  | Smith | Leo | R. | 64078
7  | Chaudhuri | Toby | T. | 34694
8  | Lee | Tasha | P. | 18295
9  | Walton | Mason | Q. | 58239
10 | Richards | Felix | S. | 12765
11 | Guillermo | Beth | I. | 27593
12 | Walton | Nadine | J. | 67292
- Duplicate data
- Mislabeled data
- Inconsistent data
- Obsolete data
Q4. Fill in the blank: Conditional formatting is a
spreadsheet tool that changes how _____ appear when values meet a specific
condition.
- filters
- cells
- queries
- charts
Q5. A data analyst uses the SPLIT function to divide a text
string around a specified character and put each fragment into a new, separate
cell. What is the specified character separating each item called?
- Delimiter
- Unit
- Partition
- Substring
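For readers who think in code, the splitting behavior described in Q5 can be sketched in Python. This is only an illustration: the sample string and the choice of ", " as the separating character sequence are assumptions, and Python's `str.split` stands in for the spreadsheet SPLIT function.

```python
# Illustrative stand-in for a spreadsheet SPLIT: break a text string
# around a chosen separator so each fragment can go in its own cell.
text = "blue, green, teal"   # hypothetical cell value
separator = ", "             # the character sequence the string is split around

fragments = text.split(separator)
print(fragments)  # ['blue', 'green', 'teal']
```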
Q6. For a function to work properly, data analysts must
follow each function’s predetermined structure. What is this structure called?
- Syntax
- Validation
- Summary
- Algorithm
Q7. You are working with the following selection of a
spreadsheet:
  | A | B
1 | Customer | Address
2 | Sally Stewart | 9912 School St. North Wales, PA 19454
3 | Lorenzo Price | 8621 Glendale Dr. Burlington, MA 01803
4 | Stella Moss | 372 W. Addison Street Brandon, FL 33510
5 | Paul Casey | 9069 E. Brickyard Road Chattanooga, TN 37421
In order to extract the five-digit postal code from
Burlington, MA, what is the correct function?
- =LEFT(5,B3)
- =RIGHT(B3,5)
- =RIGHT(5,B3)
- =LEFT(B3,5)
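The difference between taking characters from the left and from the right of a cell can be sketched in Python. The helper names `left` and `right` are hypothetical stand-ins for the spreadsheet functions LEFT and RIGHT, and `b3` holds the address shown for Lorenzo Price in the question.

```python
# Hypothetical Python stand-ins for the spreadsheet functions LEFT and RIGHT.
def left(text: str, n: int) -> str:
    """Return the first n characters of text."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Return the last n characters of text."""
    return text[-n:] if n > 0 else ""


b3 = "8621 Glendale Dr. Burlington, MA 01803"  # value of cell B3

print(repr(right(b3, 5)))  # '01803'
print(repr(left(b3, 5)))   # '8621 '
```

Note that both helpers take the text first and the character count second, which mirrors the argument order of the two well-formed options.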
Q8. A data analyst in a human resources department is
working with the following selection of a spreadsheet:
  | A | B | C | D
1 | Year Hired | Last 4 of SS# | Department | Employee ID
2 | 2019 | 1192 | Marketing |
3 | 2014 | 2683 | Operations |
4 | 2020 | 1939 | Strategy |
5 | 2009 | 3208 | Graphics |
They want to create employee identification numbers (IDs) in
column D. The IDs should include the year hired plus the last four digits of
the employee’s Social Security Number (SS#). What function will create the ID
20093208 for the employee in row 5?
- =CONCATENATE(A5,B5)
- =CONCATENATE(A5+B5)
- =CONCATENATE(A5:B5)
- =CONCATENATE(A5*B5)
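The difference between joining two values as text and combining them arithmetically can be sketched in Python. The variable names are assumptions; the values are those of cells A5 and B5 in the question.

```python
year_hired = 2009   # cell A5
last4_ss = 3208     # cell B5

# Joining the two values as text places them side by side.
employee_id = str(year_hired) + str(last4_ss)
print(employee_id)  # 20093208

# Adding them as numbers gives a different result entirely.
print(year_hired + last4_ss)  # 5217
```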
Q9. An analyst is cleaning a new dataset containing 500
rows. They want to make sure the data contained from cell B2 through cell B300
does not contain a number greater than 50. Which of the following COUNTIF
function syntaxes could be used to answer this question? Select all that apply.
- =COUNTIF(B2:B300,>50)
- =COUNTIF(B2:B300,"<=50")
- =COUNTIF(B2:B300,<=50)
- =COUNTIF(B2:B300,">50")
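The two conditions in Q9 (greater than 50, and less than or equal to 50) can be sketched as counts in Python. The list `values` is a hypothetical stand-in for the range B2:B300; the actual data is not part of this article.

```python
# Hypothetical stand-in for the values in B2:B300.
values = [12, 50, 51, 7, 88, 33]

# Count of values greater than 50; a result of 0 would mean none exceed 50.
over_50 = sum(1 for v in values if v > 50)

# Count of values at or below 50; if this equals the number of values,
# none exceed 50.
at_most_50 = sum(1 for v in values if v <= 50)

print(over_50)     # 2
print(at_most_50)  # 4
```

Either count answers the question, since the two always add up to the number of values in the range.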
Q10. The V in VLOOKUP stands for what?
- Virtual
- Vertical
- Visual
- Variable
Q11. Fill in the blank: Data mapping is the process of _____
fields from one data source to another.
- matching
- linking
- merging
- extracting
Q12. Describe the relationship between a primary key and a
foreign key.
- A primary key references a row in which each value is unique. A foreign key is a column within a table that is a primary key in another table.
- A primary key is a field within a table that is a foreign key in another table. A foreign key references a column in which each value is unique.
- A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
- A primary key references a field within a table that is a foreign key in another table. A foreign key references a row in which each value is unique.
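How a primary key and a foreign key relate can be sketched with Python's built-in `sqlite3` module. The table and column names (`customers`, `orders`, `customer_id`) are assumptions chosen for illustration, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key of customers: each value must be unique.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT)"
)
# orders.customer_id is a foreign key: it points at the primary key of customers.
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers(customer_id)
       )"""
)

conn.execute("INSERT INTO customers VALUES (64078, 'Smith')")
conn.execute("INSERT INTO orders VALUES (1, 64078)")  # matches an existing customer

try:
    conn.execute("INSERT INTO orders VALUES (2, 99999)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```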
Process Data from Dirty to Clean Weekly Challenge 3 Answers
Q1. Data analysts choose SQL for which of the following
reasons? Select all that apply.
- SQL is a programming language that can also create web apps
- SQL is a powerful software program
- SQL is a well-known standard in the professional community
- SQL can handle huge amounts of data
Q2. In which of the following situations would a data
analyst use spreadsheets instead of SQL? Select all that apply.
- When visually inspecting data
- When working with a dataset with more than 1,000,000 rows
- When working with a small dataset
- When using a language to interact with multiple database programs
Q3. A data analyst creates many new tables in their
company’s database. When the project is complete, the analyst wants to remove
the tables so they don’t clutter the database. What SQL commands can they use
to delete the tables?
- INSERT INTO
- CREATE TABLE IF NOT EXISTS
- UPDATE
- DROP TABLE IF EXISTS
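The create-then-clean-up workflow in Q3 can be sketched with Python's built-in `sqlite3` module. The table name `temp_results` is hypothetical, and an in-memory SQLite database stands in for the company database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a working table only if it is not already there.
conn.execute("CREATE TABLE IF NOT EXISTS temp_results (id INTEGER)")

# Remove it when the project is done. IF EXISTS means running the
# statement a second time does not raise an error.
conn.execute("DROP TABLE IF EXISTS temp_results")
conn.execute("DROP TABLE IF EXISTS temp_results")

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # []
```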
Q4. A data analyst is cleaning customer data for an online
retail company. They are working with the following section of a database:
The analyst wants to find out if the state data is
consistent and if any text strings contain more than two characters. What is
the correct SQL clause to use to find any text strings containing more than two
characters?
- WHERE(state) > 2
- DISTINCT(state) > 2
- SUBSTR(state) > 2
- LENGTH(state) > 2
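Checking a text column for entries longer than two characters can be sketched with Python's built-in `sqlite3` module. The database section referenced in Q4 is not reproduced in this article, so the `customers` table and its rows below are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("A", "OH"), ("B", "Ohio"), ("C", "PA")],  # hypothetical rows
)

# LENGTH counts the characters in each state value; the filter keeps
# only rows where that count is above two.
rows = conn.execute(
    "SELECT name, state FROM customers WHERE LENGTH(state) > 2"
).fetchall()
print(rows)  # [('B', 'Ohio')]
```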
Q5. Fill in the blank: The _____ function counts the number
of characters a string contains.
- SUBSTR
- CAST
- LENGTH
- TRIM
Q6. In SQL databases, what data type refers to a number that
contains a decimal?
- Integer
- String
- Boolean
- Float
Q7. Fill in the blank: In SQL databases, the _____ function
can be used to convert data from one datatype to another.
- TRIM
- LENGTH
- SUBSTR
- CAST
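Converting a value from one datatype to another can be sketched with Python's built-in `sqlite3` module. This uses SQLite's type names, where the floating-point type is called REAL; other databases use different names, so treat the type names as an assumption tied to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Convert a text value to a decimal number, and a decimal number to an integer.
row = conn.execute(
    "SELECT CAST('19.99' AS REAL), typeof(CAST('19.99' AS REAL)), "
    "CAST(42.9 AS INTEGER)"
).fetchone()
print(row)  # (19.99, 'real', 42)
```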
Q8. Fill in the blank: The _____ function can be used to
return non-null values in a list.
- CONCAT
- TRIM
- COALESCE
- CAST
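Returning the first non-null value from a list of candidates can be sketched with Python's built-in `sqlite3` module. The `products` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, product_code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Indoor paint", "P-001"), (None, "P-002")],  # second row has no product name
)

# COALESCE returns the first argument that is not NULL, so the product
# code is used wherever the product name is missing.
rows = conn.execute(
    "SELECT COALESCE(product, product_code) FROM products ORDER BY product_code"
).fetchall()
print(rows)  # [('Indoor paint',), ('P-002',)]
```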
Process Data from Dirty to Clean Weekly Challenge 4 Answers
Q1. The data collected for an analysis project has just been
cleaned. What are the next steps for a data analyst? Select all that apply.
- Verification
- Reporting
- Certification
- Validation
Q2. A data analyst is in the verification step. They
consider the business problem, the goal, and the data involved in their
analytics project. What scenario does this describe?
- Reporting on the data
- Seeing the big picture
- Considering the stakeholders
- Visualizing the data
Q3. Which function removes leading, trailing, and repeated
spaces in data?
- CUT
- CROP
- TRIM
- TIDY
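Removing leading, trailing, and repeated spaces, as described in Q3, can be sketched in Python. The helper name `remove_extra_spaces` is hypothetical, and it handles only the space character, which is an assumption about the intended behavior.

```python
def remove_extra_spaces(text: str) -> str:
    """Drop leading and trailing spaces and collapse repeated spaces to one."""
    # Splitting on a single space yields empty strings wherever spaces
    # repeat; filtering those out and rejoining leaves one space per gap.
    return " ".join(part for part in text.split(" ") if part)


messy = "  Meer-Kitty   Interior  Design "
print(repr(remove_extra_spaces(messy)))  # 'Meer-Kitty Interior Design'
```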
Q4. A data analyst uses the COUNTA function to count which
of the following?
- The total number of headers in a specific range.
- The total number of values within a specified range.
- The total number of entries in a changelog.
- The specific numbers in a dataset.
Q5. A WHEN statement considers one or more conditions and
returns a value as soon as that condition is met.
- True
- False
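In SQL, conditions of this kind are written as WHEN clauses inside a CASE expression, which returns the value for the first condition that is met. A minimal sketch with Python's built-in `sqlite3` module follows; the `scores` table and the band labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(95,), (72,), (40,)])

# Each WHEN is checked in order; the first one that matches decides
# the result, and ELSE covers everything that matched nothing.
query = """
SELECT score,
       CASE
           WHEN score >= 90 THEN 'high'
           WHEN score >= 60 THEN 'medium'
           ELSE 'low'
       END AS band
FROM scores
ORDER BY score DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(95, 'high'), (72, 'medium'), (40, 'low')]
```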
Q6. What is the process of tracking changes, additions,
deletions, and errors during data cleaning?
- Recording
- Documentation
- Observation
- Cataloging
Q7. Fill in the blank: A changelog contains a _____ list of
modifications made to a project.
- approximate
- random
- chronological
- synchronized
Q8. Reviewing version history is an effective way to view a
changelog in SQL.
- True
- False
Process Data from Dirty to Clean Weekly Challenge 5 Answers
Scenario 1, questions 1-5
Q1. You are a data analyst at a small analytics company.
Your company is hosting a project kick-off meeting with a new client,
Meer-Kitty Interior Design. The agenda includes reviewing their goals for the
year, answering any questions, and discussing their available data.
Meer-Kitty Interior Design About Us Page.pdf
Meer-Kitty Interior Design Business Plan.pdf
Meer-Kitty Interior Design has two goals. They want to
expand their online audience, which means getting their company and brand known
by as many people as possible. They also want to launch a line of high-quality
indoor paint to be sold in-store and online. You decide to consider the data
about indoor paint first.
Kitty Survey Feedback – Meer-Kitty survey feedback.csv
You are pleased to find that the available data is aligned
to the business objective. However, you do some research about confidence level
for this type of survey and learn that you need at least 120 unique responses
for the survey results to be useful. Therefore, the dataset has two
limitations: First, there are only 40 responses; second, a Meer-Kitty superfan,
User 588, completed the survey 11 times.
As the survey has too few responses and numerous
duplicates that are skewing results, what are your options? Select all that
apply.
- Repeat the survey in order to create a new, improved dataset.
- Locate another dataset about indoor paint.
- Remove the duplicates from the data and proceed with analysis.
- Talk with stakeholders and ask for more time.
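The arithmetic behind the two limitations can be sketched in Python. The survey file itself is not reproduced here, so the respondent IDs are invented: the sketch assumes User 588 appears 11 times and that each of the other 29 responses comes from a different person, which the scenario does not state explicitly.

```python
REQUIRED_UNIQUE = 120  # minimum unique responses, from the scenario

# Hypothetical respondent IDs: 11 rows from User 588 plus 29 other people.
responses = [588] * 11 + list(range(1, 30))

unique_respondents = set(responses)
shortfall = REQUIRED_UNIQUE - len(unique_respondents)

print(len(responses))           # 40
print(len(unique_respondents))  # 30
print(shortfall)                # 90
```

Under these assumptions, removing the duplicates leaves the dataset even further below the 120-response threshold than the raw count of 40 suggests.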
Q2. During the meeting, you also learn that Meer-Kitty
videos are hosted on their website. For each product offered, there is an
accompanying video for customers to learn more. So, more views for a video
suggests greater consumer interest.
Your goal is to identify which videos are most popular, so
Meer-Kitty knows what topics to explore in the future. Unfortunately,
Meer-Kitty has just three months of data available because they only recently
launched the videos on their site.
Without enough data to identify long-term trends about the
video subjects that people prefer, what should you do?
- Find an alternate data source that will still enable you to meet your objective.
- Watch the videos and use your gut instinct to identify which are most successful.
- Tell the client you’re sorry, but there is no way to meet their objective.
- Move ahead with the data you have to determine the top video subjects.
Q3. Now that you’ve identified some limitations with
Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In
addition to insufficient video trend data, your main concern with the indoor
paint survey is that the data isn’t representative of the population as a
whole.
Clearly, one particular respondent, the superfan, is
overrepresented. This means the data doesn’t represent the population as a
whole.
When surveying people for Meer-Kitty in the future, what
are some best practices you can use to address some of the issues associated
with sampling bias? Select all that apply.
- Increase sample size
- Use data that keeps updating
- Use data from only one source
- Use random sampling
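Drawing a random sample, where every member of the population has an equal chance of being chosen, can be sketched with Python's standard `random` module. The population of 1,000 customer IDs, the sample size of 150, and the seed are all assumptions made for illustration.

```python
import random

# Hypothetical population: customer IDs 1 through 1000.
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 150)  # draws without replacement

# Sampling without replacement means no respondent can appear twice.
print(len(sample), len(set(sample)))  # 150 150
```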
Q4. The stakeholders understand your concerns and agree to
repeat the indoor paint survey. In a few weeks, you have a much better dataset
with more than 150 responses and no duplicates.
Kitty Survey Feedback – New Meer-Kitty survey feedback.csv
You notice that questions 4 and 5 are dependent on the
respondent’s answer to question 3. So, you need to determine how many people
answered Yes to question 3, then compare that to responses to questions 4 and
5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how
cells appear when they contain the word Yes. Which tool do you use?
- Data validation
- Conditional formatting
- Filtering
- CONCATENATE
Q5. You continue cleaning the data. You use tools such as
remove duplicates and COUNTIF to ensure the dataset is complete, correct, and
relevant to the problem you’re trying to solve. Then, you complete the
verification and reporting processes to share the details of your data-cleaning
effort with your team.
While reviewing, your team notes one aspect of data cleaning
that would improve the dataset even more. They point out that the new survey
also has a new question in Column G: “What are your favorite indoor paint
colors?” This was a free-response question, so respondents typed in their
answers. Some people included multiple different colors of paint. In order to
determine which colors are most popular, it will be necessary to put each color
in its own cell.
What spreadsheet function enables you to put each of the
colors in Column G into a new, separate cell?
- Delimit
- MID
- Divide
- SPLIT
Scenario 2, questions 6-10
Q6. You’ve completed this program and are interviewing for a
junior data scientist position. The job is at B.Spoke Market Research, a
company that analyzes market conditions using customer surveys and other
research methods. The detailed job description can be found below:
C4 B.Spoke Market Research Job Description.pdf
So far, you’ve had a phone interview with a recruiter and
you’ve secured a second interview with the B.Spoke team. The recruiter’s email
can be found below:
C4 S2 Email from Recruiter.pdf
You arrive 15 minutes early for your interview. Soon, you
are escorted into a conference room, where you meet Jodie Choi, the data
science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn
about your experience with spreadsheets. She says: Sometimes the team needs
data that is stored in different spreadsheets. So, we use a spreadsheet
function to find the information we need.
There is a spreadsheet function that searches for a value
in the first column of a given range and returns the value of a specified cell
in the row in which it is found. It is called SEARCH.
- True
- False
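The lookup behavior the question describes (search the first column of a range, then return a cell from the matching row) can be sketched in Python. The function name `lookup_first_column` and the sample rows are hypothetical; this illustrates the behavior only and is not a reimplementation of any particular spreadsheet function.

```python
from typing import Optional, Sequence


def lookup_first_column(
    search_key: str, rows: Sequence[Sequence[str]], index: int
) -> Optional[str]:
    """Find search_key in the first column and return the cell at the
    1-based column position index from the same row, or None if absent."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None


# Hypothetical range: customer ID in the first column, last name in the second.
customers = [
    ["64078", "Smith"],
    ["92862", "Lee"],
    ["55107", "Wallace"],
]

print(lookup_first_column("92862", customers, 2))  # Lee
print(lookup_first_column("00000", customers, 2))  # None
```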
Q7. Next, your interviewer wants to know more about your
understanding of tools that work in both spreadsheets and SQL. She explains
that the data her team receives from customer surveys sometimes has many
duplicate entries.
She says: Spreadsheets have a great tool for that called
remove duplicates. In SQL, you can include DISTINCT to do the same thing. In
which part of the SQL statement do you include DISTINCT?
- The FROM statement
- The WHERE statement
- The UPDATE statement
- The SELECT statement
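A query that uses DISTINCT to return each duplicated survey entry only once can be sketched with Python's built-in `sqlite3` module. The `survey_responses` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_responses (respondent_id INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO survey_responses VALUES (?, ?)",
    [(588, "Yes"), (588, "Yes"), (101, "No"), (588, "Yes")],  # repeated entries
)

# DISTINCT collapses rows that are identical across the selected columns.
rows = conn.execute(
    "SELECT DISTINCT respondent_id, answer "
    "FROM survey_responses ORDER BY respondent_id"
).fetchall()
print(rows)  # [(101, 'No'), (588, 'Yes')]
```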
Q8. Now, your interviewer explains that the data team
usually works with very large amounts of customer survey data. After receiving
the data, they import it into a SQL table. But sometimes, the new dataset
imports incorrectly and they need to change the format.
She asks: What function would you use to convert data in
a SQL table from one datatype to another?
- CONVERT
- CHANGE
- CAST
- COALESCE
Q9. Next, your interviewer explains that one of their
clients is an online retailer that needs to create product numbers for a vast
inventory. Her team does this by combining the text strings for product number,
manufacturing date, and color.
She asks: Which SQL function would you use to add strings
together to create new text strings?
- COMBINE
- CREATE
- COALESCE
- CONCAT
Q10. For your final question, your interviewer explains that
her team often comes across data with extra spaces.
She asks: Which function would enable you to eliminate
those extra spaces? You respond: To eliminate extra spaces for consistency, use
the TRIM function.
- True
- False