Q: 14
Identify how the count_if function and the count where x is null can be used
Consider a table random_values with below data.
What would be the output of below query?
select count_if(col > 1) as count_
a. count(*) as count_b.count(col1) as count_c from random_values col1
0
1
2
NULL -
2
3
Options
Discussion
A. official exam guide covers count_if and their quirks pretty well. Practice sets can help with NULL logic too.
A for sure. count_if only tallies up when the condition is true, not just non-null, and count(*) always gives the total rows even if there are NULLs. Pretty sure that fits the expected values. If someone sees it differently let me know.
C/D? I feel like it's one of those, mainly because count_if can be tricky if the example data includes NULLs. My guess is D since usually count(*) covers all and count(col1) skips NULLs, but not totally sure if there are three or four hits for col > 1. If I'm missing a detail in the sample table let me know.
D imo
Its A. The trap here is thinking count_if counts NULLs, but it doesn't-it only counts where the condition is true. Also, count(*) includes nulls, while count(col1) skips those. So if there are three values >1, six total rows, and five non-null col1 entries, A matches up. Pretty sure this lines up based on Databricks docs but happy to hear other takes.
A
A makes sense, count_if(col > 1) will only count rows meeting that condition and skips NULL by default. count(*) always counts everything, but count(col1) drops NULLs. Seen this type of logic in other Databricks sets. If I’m off, open to correction.
B or A. With count_if(col > 1), you have to watch for NULLs because if a value is NULL, that row doesn't get counted even if col1 exists. In similar Databricks questions, three is right when there's only three values strictly greater than 1 (not just non-NULLs). Really depends on the sample data. If there are hidden NULLs or zeros it's easy to trip up here.
D . I figured count_if(col > 1) would grab four values if there isn’t an extra NULL tripping things up, and count(*) always grabs all six. Not 100% but the logic felt right to me, anyone disagree?
B or D. I thought count_if(col > 1) would give 4 because I assumed there are four values greater than 1 in the example data, not three. The count(*) and count(col1) totals make sense for 6 and 5, but that first number keeps tripping me up. Maybe I'm missing a NULL trap in the sample? Let me know if anyone reads it differently.
Be respectful. No spam.