SQL Server SME Interview Experience
This post “SQL Server SME Interview Experience” can help you in understanding and answering the most common question for a SQL Server DBA or Developer. A few days back I got an email from one of the follower; she has given an interview for one of the top product based company for the position SQL Server SME and product owner role in Toronto and shared the conversation which was happened with the interviewer. I just wanted to make it as a post which might help few people around.
She is 10+ years of experience in delivering enterprise database solutions using Microsoft SQL Server and applied for SQL Server SME with one of the world’s top product company. Initially she had two telephonic interviews and got invited for face to face. Below is the conversation with the interviewer:
SQL Server SME interview experience shared by a follower
Interviewer: Hi Good Morning
Me: Very Good morning, how are you today?
Interviewer: I am doing great, any questions?
Me: (Surprised) Hmm I have gone through the job role description but would like to know more about the responsibilities.
Interviewer: Well to simply describe the job role, SME will be taking care of the entire XXXXX product suite database systems. He or She should be the point of contact for all database development and administration activities. Also the role would be responsible for all product related technical escalations.
Me: Got it, but how a SQL Server SME can be a product owner? Apart from the technical point what other skill set you are looking for?
Interviewer: Well the SME will not responsible from the business or domain prospect. But the person should be able to handle all product related escalations, if any website is not working or the end customers facing a technical issue they would come to you and you need to take the ownership and should handle the problem with the help of concern technical team.
Me: I understand, in fact I am also working for a product development company. Other than the database activities I need to participate in product enhancements related technical discussions and brainstorming sessions.
Interviewer: Yes I knew that. In our environment you many need to attend the performance issues on frequent basis. Can we talk something about performance?
Me: Sure! It’s one of my interested areas. I had a very good experience in dealing with the performance issues.
Interviewer: That’s’ great! Ok, let’s say you are newly joined in an environment and you are assigned to handle all database operations for a mission critical application also from the technical point you are the owner of this application. The very first day you got a call from a business analyst saying that application is running dead slow and you are asked to check it immediately. What is your action plan? Tell me how do you handle, please include each and every step in detail.
Me: As a first step I would check if the entire application is slow or just a specific part is not running well. Also would check for the details “If Any Errors”, “Time Out”, “Access from My location” etc.
Interviewer: Ok, “entire application is running slow, I am not even able to open the home page. It’s taking a long time and ended with a timeout error” then what is your next step?
Me: Since it’s a mission critical application I’ll immediately open an incident with a bridge call and will add all the required teams into that Ex: Application, DBA, Server, Network, Proxy, Storage, Middleware etc. Meanwhile I will send a high alert notification to all the required teams and business stake holders.
Interviewer: Great, ok all teams are joined the call and you sent the notification too. What next?
Me: First thing I would check if any maintenance activity in place without a proper communication. I would check the ADM (Application Dependency Mapping) which means all components / layers/ tiers involved in the application. I would ask the respective teams to quickly perform a high level health check for the components which shouldn’t take more than 5 min.
Interviewer: As I told you, I need the maximum information, can you please elaborate?
Me: Sure, I would ask
- Application team to check the exact error message coming in while trying to access the application and ask them to verify the problem is not location or the resource specific.
- Server team to perform Health Check for all servers involved
- DBA team to check the Database Servers Health
- Web-service / Middleware team to perform HC on all Middleware components
- Network / Proxy / DNS Team to perform HC (Health Check) on DNS, Firewall, Load Balancers and Network Connectivity
Also I would check if any recent changes happened on the infrastructure from any of these teams.
Interviewer: Perfect, let’s say you are representing the database systems and you need to take care of Database Part. Can you tell me what you need to check while doing HC?
Me: I would quickly check:
- SQL Server Services are up and running
- All Databases are online
- Disk Space
- CPU, Memory Usage
- Suspected Errors in Log
Interviewer: By the way HC means do you people check it manually?
Me: We do use pre-defined scripts with scheduled jobs. We’ll just run the job and it captures all required information and email the HC report.
Interviewer: See the problem is Application running slow; does it make sense checking the database servers? If I say that I am not able to login to the app then you might check the DB side. But here the case is different and they are not able to open the application home page.
Me: Yes, it clearly makes sense because we have already insisted the respected teams to investigate from the different prospects. I have seen applications where the complete metadata is dynamic which means in an application page everything is dynamic including the text box size, location, label etc. In that case if it fails to establish a connection with the database may delay the load time.
Interviewer: Agreed! Ok now tell me what is your approach if incase one of the parameter is failing in database health check. Let’s say CPU / Memory / Disk / Service / Database / page errors in log file. How do you handle these scenarios?
Me: Well, on a given server if there is a problem with one of the resources, first thing we need to make sure that the problem is from the SQL Server. There are chances that a server can have services other than SQL Server or there might be more than one SQL Instance on server. Thereof first and foremost thing is we need to identify the problematic point.
If it is SQL Server causing high CPU, identify the database and then the query / batch / proc causing the high CPU. Usually high I/O operations cause the high CPU. The most common issues “Row by Row processing ex: Cursors”, “User Defined Functions”, “Updating Statistics with Full Scan”, “Parallelism Issues”, etc.
If it is SQL Server causing high Memory, identify the database and then the actual database / query / batch which is taking the majority portion in memory. There are different caches available, identify the problematic area that’s might be a data cache or plan cache or wrong memory limit configurations.
Disk is full for some reason, quickly check the options, first try to copy files to other drive and make some free space, reasons might be a huge log file backup, dump files, ldf file size increased etc.
Just like these I’ll try possible options / solutions when I see the actual problem.
Interviewer: Great! Ok after the initial 10 min investigation, everyone reported back all are working fine, network, Database Servers, Windows / Linux servers, Application Servers, Services, Firewall, and Middleware etc. Everyone reported everything looks good, and of course you also didn’t see any issue from database side. But still the application is not opening and ending with a timeout message. What’s your next step?
Me: From my experience in technology anything and everything can be traceable, mostly the problem can be identified from the returning error message and from log files. If still we can’t identify the problem then we would ask the Change Management team to provide all recent changes implemented on all Application supporting servers /components / services. For example Application / Database code changes, Patching related, network / Firewall changes etc. But I am sure we can find the bottleneck from somewhere in these checks.
Interviewer: By the way from the first step, you are involving the network and firewall teams any specific reason for that?
Me: See as you said my role is not just restricted for database ownership but extended as an app owner. In this situation application home page is not opening and failing with a timeout error. From my experience there are more chances to have a bottleneck at application lever or at network / firewall level and less chances to have a problem from DB side. And that’s the reason I am involving network and firewall teams along with the app team.
Interviewer: Appreciated! Coming back to our actual question, unfortunately no luck, you got a response from all teams saying that there are no recent changes implemented on any of these servers / components which are related to the application. Time is running out but still application is getting timeout. What’s your next step?
Me: Are there any third party tools / services involved in this entire architecture? As I told you it should be caught in the initial check itself. I have seen few scenarios where application was down only for the few customers and it was working for the rest. In that time we had difficulty in identifying the problem. But in the current scenario what we have been discussing the application is down / not responding for all and still not able to identify the problem.
Interviewer: Application not working only for specific customers? Is that an internal application?
Me: No! It’s a public / internet facing application and the customers from few countries were not able to access it from internet.
Interviewer: That’s interesting, can you explain more about the situation and how did you resolve it?
Me: It was a problem with third party component and we came to know that after few hours of investigation.
Interviewer: Can you explain the exact situation and how did you resolve it? I need all possible details, I have the patience can you explain if you have time?
Me: Sure I do.
I was being a database architect for a critical application. We have got an incident as Application was not being accessible from a specific list of countries. We were surprised to see that the application was not accessible only from specific locations; unfortunately app architect was not available at that time. We scheduled a meeting with the end customer who was facing the access issue; network team collected the trace route and other logs information, even after analyzing that they couldn’t identify the problem. Meantime we have got the application architecture diagram and we could see there is a third party “XXXXXXXXX” “Content Delivery Network” which was using to protect our application.
Any request comes from the internet will reach that third party CDN firewall, the request is verified and from there it is redirected it to our company firewall. Every time when it redirects the request to our company firewall it uses a predefined list of IP addresses. The third party team added new IP’s to enhance their capability and these IP’s are not shared with us. When our end customers are trying to access the website their requests were redirecting from the changed IP addresses to our company firewall. Since our company firewall not able to identify the IP address (These are the new IP’s) it was dropping the connection. The moment our proxy team added these IP addresses to our firewall whitelist the application started working in those areas and customers sent the confirmation.
In this scenario I didn’t do anything special from technical point but I analyzed the architecture diagram and identified the third party network direction provided details to the network and proxy team.
Interviewer: That’s a fantastic experience. Ok, you found there is a problem and you quickly provided the solution and application started working fine. In this entire incident procedure do you see any process level mistakes either from my questions or from your answers?
Me: I can’t say it’s a problem but from my point of view, if it is a mission critical application we should already have all required monitoring in place and from the business prospect any incident should be identified by the automated alerts or monitoring system. If an incident is identified by a customer or from a business people indicates the failure from the infrastructure team.
Interviewer: True! Ok, what will be the next step?
Me: I’ll insist to open a Root Cause Analysis with a problem request to provide a long term solution to prevent these kinds of failures.
Interviewer: Hmm, ok let’s come back to my first question again. Now it’s from a different prospective, they are able to login to the application without any issue but they are facing an issue with a specific page. The page is getting timeout. How do you deal with that?
Me: As we discussed earlier in most of the cases when a specific page or part of the application is running slow which means that problem is with the application / SQL coding or with the underlying database.
Interviewer: Ok, go on how you identify the problem?
Me: I would quickly perform Health Check on database servers just to make sure no issues from the resource utilization. Based on a SQL version I’ll quickly run a server side trace or take the help of extended events to identify the actual query / SQL code / procedure / function that’s causing the problem. Meantime I would ask application team to investigate from application side.
Interviewer: Ok you narrow down the filter and identified the bottleneck. One of the procedures is running slow and it’s taking 7 min. How do you troubleshoot it?
Me: I would check the stored procedure execution plan to identify the primary bottleneck. I handled lot of situations like these.
Interviewer: Great to know that. So can you tell me the possible failure points which might causes the performance slow down?
- Bad Parameter Sniffing
- Scalar User Defined functions using in a query or with a loop
- TEMPDB PAGELATCH contention
- The CXPACKET WAIT TYPE
- Blocking or long running transactions
- Cross Database Queries on huge datasets
- Poorly written queries
- Wrongly configuring Server/Database options
- Poorly designed Indexes
Interviewer: Great! In a project you are representing the technology lead role. You have a scheduled meeting with the customers and it’s a critical deal. You are going to give them a presentation on how cloud based model works for Database Management systems. This meeting determines a million dollar deal. You prepared well and all set for meeting, the moment you are about to enter into meeting room you got a call and unfortunately it’s a medical emergency at home and no one is there to help at home except you. How do you handle that?
Me: I would inform the situation to my manager and I attend the home emergency. I’ll try to look for an alternative person who can manage the meeting with my assistance.
Interviewer: Ok tell me what’s your priority family or Career?
Me: Not even my family, it’s always “ME” is a priority for me, my dreams, my wishes, my career, it’s always me. I work hard, work smart because it’s my career and I have my own goals. To answer your question my priority is “My Health”, “Family” then my “Career”. In fact these are equally important for anyone but the one who manage or balance between these parameters can enjoy the success.
Interviewer: So you had an experience with our company isn’t it?
Me: Sorry I didn’t get it
Interviewer: I mean you had discussion with our techies, I think this is the 3rd time you are interacting with us correct? What’s your opinion on our technical panel?
Me: To be frank I never ever seen a 2 and half hours of technical discussion on the same question. Overall I had a strange experience with your technical team and unknowingly it created enthusiasm on your work environment.
Interviewer: (A big smile) Any more questions for me?
Me: You still didn’t tell me why the application was down?
Interviewer: I think you covered the maximum scenarios! So what do you think are you going to win this interview?
Interviewer: How do you say that?
Me: I believe you are a good decision maker.
Interviewer: Okhay, would you join us if I offer you less than your current salary?
Me: Of course I would, if I can get a share in product sales profit
Interviewer: Aww, smart enough ….you may expect a call from our team sometime in the next week.
Me: Thank you, I am grateful for your time and patience.
She got a call on the same week and she was informed that they were going make an offer. On behalf of all of our followers we wish you a very big success and all the very best. Thank you so much for sharing your experience.
Guys, if someone wants to share your interview experiences please do share on firstname.lastname@example.org. Just send us the questions and the interview details where and when you attend etc. Your personal details are not mentioned in anywhere without your notice.