SQL – The Other Coding Language
Learning how to handle SQL interview questions is a fairly easy way to stand out for analyst roles. Speaking as a current data science manager, most academic candidates don’t spend enough time learning SQL before they start working in industry. While many of them were talented programmers – and most likely could pick it up very easily – that adds one more thing to the hiring manager’s to-do list. If a candidate is able to provide solid answers to a SQL technical interview, I’m comfortable they will be able to immediately contribute to the team.
Audience For SQL Interview Questions
This interview question guide is intended for candidates pursuing the following roles:
- Statistical Analyst / Data Scientist
- Reporting Analysts in a business function (marketing, finance, strategy)
- IT Business Analysts supporting BI / Reporting functions
The trick to standing out in interviews for these jobs? Understanding the difference between being an analyst and a database administrator; 80% of the typical SQL curriculum isn’t terribly important to a business analyst. Master topics such as queries and data validation. Acquire a basic understanding of how to read table specs, create/copy tables, and automate these activities. You are unlikely to be quizzed very deeply about the details of the balance of the material, by virtue of the fact IT generally takes ownership of these areas and locks most business analysts out of them. So focus your SQL interview preparation on mastering the basics of queries.
This guide is oriented around this strategy; we also drop in a couple of answers to the other topics if you see a need to demonstrate a little extra prowess in advanced SQL.
Basic SQL Interview Questions
- Name some common objects in a SQL database?
- Tables – Stores data in fields; each field is usually predefined by the schema.
- Queries – Code written to manipulate data or the database
- Indexes – Mapping of a table intended to make joining along a certain field faster
- Views – Procedure to combine data from one or more tables into a standard data set; generally takes the form of a query but is saved in the database and can be joined with other tables or views.
- Stored Procedures – “scripts” which are saved in the database and will execute a series of steps; capabilities vary by SQL version but most advanced SQL dialects (PL/SQL, MSSSQL) offer a advanced options for automating common activities, including access to email and logging utilities.
- Triggers – code set up to monitor the status of a table and execute a stored procedure if certain conditions are met
- What are the basic operations used to manipulate data in SQL?
- Select – Used to retrieve and aggregate data from a table
- Update – Changes values of an existing record in a table
- Insert – Adds a new record into a data table
- Delete – Removes a record from a table
- What commands do we use to manipulate a table?
- Create – creates the table; generally with a list of fields we want to create and their specifications
- Drop – removes a table from the database
- Why are commits important?
- For reasons of efficiency (and to support transactions, see below), queries that involved editing a record may keep their changes in local (user specific) memory rather than formally writing them to a database. The commit command is a signal that you’re done and the database needs to preserve the state.
- What is a transaction?
- A transaction is a form of stored procedure where all parts must execute for the results to be committed (saved) in the system. A good example would be transferring money between accounts at a bank, especially if there was a small chance of the process running into an error. Using a transaction for this would allow you to stop and roll-back the change if you realized you couldn’t complete the transaction accurately. For example, imagine moving $10,000 between two accounts and learning there was an error in the second account number (getting the money). Doing this as a transaction allows you to stop and roll-back the transaction before it takes effect with the final commit.
- What Can You Do In A Select Statement?
- Many possible answers this this question, but a good answer should include most of the following.
- Select records from a specific table (specified in the FROM clause)
- Calculate aggregate statistics such as count, sum, min, max, average
- Filter records to match criteria at the record level in the WHERE clause
- Filter records based on aggregate statistics using the HAVING clause
- Group data using the GROUP BY clause
- Sort records using the ORDER BY clause
- Join Two Tables together using the FROM clause and WHERE clause
- How Would I see only the top X results from a SQL Query?
- Specific verbiage can vary slightly by platform, but you usually want to add a LIMIT clause to your select statement. LIMIT will return the first X records identified by the query
- Make sure you using ORDER BY to ensure the records are in the right order
- What happens if I don’t specify an ORDER BY clause?
- Most implementations will return the results sorted in alpha-numeric order, using the list of fields specified in the group by clause
- This could, theoretically, vary by implementation.
- What happens if my query doesn’t have a GROUP BY clause?
- SQL queries that don’t use aggregate functions (SUM, COUNT, etc) will return data, generally unsorted.
- If your query has an aggregate function, the query will return an error since the SQL engine doesn’t know how to calculate your aggregate function.
- How Do SQL Select Statements Handle Duplicate Rows?
- The default setting of a SQL select statement is to return everything, even duplicates.
- If you want to only look at unique values, use the DISTINCT parameter
- If you want to count only the unique values – for example, tell me how many days we had sales for a specific month, most implementation have a COUNT DISTINCT option. So the example would look like COUNT(DISTINCT transaction_date) as selling days.
Writing SQL Where Clauses
- SQL Where Clauses: How Would I Selecting A Specific Value
- This is generally implemented as: Where <field name> = value.
- Don’t forget to watch out for number vs. character values; if a numeric value is stored in a character field, you will need to reference it as: Where selected_field = ‘9’ rather than Where Selected Field = 9.
- SQL Where Clauses: Selecting A Range of Values
- Use the BETWEEN operator in a WHERE statement
- This is generally implemented as: Where <field name> BETWEEN value and value.
- SQL Where Clauses: Selecting From a List of Values
- Use the IN operator in a WHERE statement
- This is generally implemented as: Where <field name> IN (value1, value2, value3, value4)
- Pragmatic comment: Many programmers look down their nose at this approach, but it can be an incredibly simple and transparent (to your successor) way to slam together a basic report in a hurry. Don’t waste time on setting up complicated selects if you already know the exact accounts / locations you want to see. Just dump those in a list of values and cut and paste it into your where clause.
- Can I select a query based on the results of another query?
- Yes, this is commonly done using a feature called a sub query. You set up the subquery to return a list of values for an identifier field, such as an account number, that represent the audience you want to select. Then you use that list as the list for an IN operator.
- For example: Suppose we want to look at the sales trend of everyone who bought design services last year. We would query the invoice database to select the (distinct) customers who were invoiced for designer hours last year. This would give us a list of accounts. We would then use that list as the filter for our regular analytical sales report, with the following WHERE clause:
Where acct_id IN (Select Distinct acct_id from invoices where product=’designer’)
- How Can I Search a Text Field for a Pattern?
- Write a query statement which includes the LIKE operator.
- The Like operator functions as a wildcard search – you search the text field for parts that match the condition.
- If you are working with an advanced database system (Oracle, SQL Server), more advanced tools such as regular expressions or high speed text search functions may also be available.
Insert and Update Operations
Parlor Tricks and Gotcha’s
Data Quality – Checking The Results
This is another very easy way for a new developer to quickly stand out from their peers in the interview. Learn how to check the results of your work and discuss this topic in the interview. In reality, checking your data will consume at least 30% – 50% of the time involved in a typical business intelligence project. And most new data analysts seem oblivious to the idea that the database is wrong or their query isn’t rolling up data correctly.
Talk data quality in the interview. Trust me. Nobody else talks quality.
So how do you validate the results of your work? A couple of pointers:
- Identify a trusted source, ideally from your Finance team, which you can reconcile aggregate results to. For example, if I’m doing a sales analysis, I feel more comfortable if I can take an aggregate result from my sales database and tie it out to Finance’s divisional or company totals.
- Once you have a value total result for an unfiltered query, slice your way down to the part you need using filters where you can identify what was not selected. So if the company total is $500 MM in sales, and I’m delivering a report on the east region, I should be able to split out sales by region code and validate what I’m keeping / dropping makes sense.
- If we’ve published this report before, pull up a copy of the last report and compare your product with it. Go Line by Line and ask yourself if the differences make sense. Incidentally, this is a good way to write the cover letter for your report.
- If the query represents data in an operational system (like customer service), find a friend with access to that system to pull up a few examples and take a close look at them.
- When reporting on a multi-step process, sketch out a map of how the big picture process should work and identify how much things should change between each stage in a perfect world. Compare your report with these results and ask questions about any big gaps….
- In addition to all of the above, make a quality check list (a few notes) and tick off each item before you ship.
Trust me. Everyone talks queries, but nobody talks quality. As a hiring manager, knowing you know how to check your work and validate the data you’re getting isn’t garbage gives you a huge leg up on other candidates.