Real Interview Questions: Prepare for DevOps, SRE, Cloud & Data Engineering Related Roles #5

Below is a curated list of real candidate experiences, shared directly via LinkedIn. A big thank you to everyone who contributed their DevOps interview experiences and questions and provided valuable insights. LinkedIn post links are included for reference. This page is intended to support the community, especially those preparing for DevOps, SRE, Cloud, and Data Engineering interviews or considering a job change.


Real DevOps interview experience and questions

Experience #1: Posted on July 19, 2025

LinkedIn Post: https://tinyurl.com/2a2msrn3

Active Directory

1️⃣ What is Active Directory?
A: A directory service for centralized domain management.

2️⃣ Difference between Domain & Workgroup?
A: Domain is centralized; Workgroup is peer-to-peer.

3️⃣ What is Group Policy?
A: Tool to manage user, computer, and security settings.

4️⃣ What is a Domain Controller?
A: Server that authenticates users in a domain.

5️⃣ FSMO Roles in AD?
A: Schema Master, Domain Naming Master, RID Master, PDC Emulator, Infrastructure Master.

DNS & DHCP

6️⃣ What is DNS?
A: Resolves domain names to IP addresses.

7️⃣ What is DHCP?
A: Automatically assigns IP addresses to devices.

8️⃣ Forward vs Reverse Lookup?
A: Forward: Name to IP. Reverse: IP to Name.

9️⃣ What is a DNS Zone?
A: Part of DNS namespace containing resource records.

10️⃣ What is DHCP Lease?
A: Temporary IP assignment duration to a client.

Windows Server & Infrastructure

11️⃣ What is RAID?
A: Data storage technique for redundancy & performance.

12️⃣ Common RAID levels?
A: RAID 0, 1, 5, 10.

13️⃣ What is RDP?
A: Remote Desktop Protocol for remote server access.

14️⃣ What is Windows Event Viewer?
A: Tool to view system logs & events.

15️⃣ How to check server IP?
A: Use ipconfig command.
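
To make the RAID levels from question 12 concrete, the sketch below computes usable capacity for each level. It assumes equal-sized disks and the usual textbook formulas, and ignores controller and filesystem overhead; the function name `usable_capacity` is made up for illustration.

```python
def usable_capacity(level: str, disks: int, disk_size_tb: float) -> float:
    """Usable capacity in TB for equal-sized disks (textbook formulas only)."""
    if level == "RAID 0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_size_tb          # striping, no redundancy
    if level == "RAID 1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_tb                  # every disk holds the same copy
    if level == "RAID 5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_size_tb    # one disk's worth of parity
    if level == "RAID 10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return (disks // 2) * disk_size_tb   # striped mirrors, half is usable
    raise ValueError(f"unsupported level: {level}")
```

With four 2 TB disks this gives 8 TB for RAID 0, 6 TB for RAID 5, and 4 TB for RAID 10, which is the redundancy versus capacity trade-off an interviewer usually wants to hear.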

VMware & Virtualization

16️⃣ What is a Hypervisor?
A: Software to run virtual machines.

17️⃣ Type-1 vs Type-2 Hypervisor?
A: Type-1 on bare metal; Type-2 on host OS.

18️⃣ What is vMotion?
A: Live VM migration between ESXi hosts.

19️⃣ What is VM Snapshot?
A: Saved VM state at a specific point.

20️⃣ What is ESXi?
A: VMware’s Type-1 hypervisor.

Cloud & Azure Basics

21️⃣ What is Cloud Computing?
A: On-demand access to IT resources over internet.

22️⃣ IaaS vs PaaS vs SaaS?
A: Infra, Platform, and Software as a Service.

23️⃣ What is Azure AD?
A: Microsoft’s cloud-based identity management service.

24️⃣ What is a Resource Group in Azure?
A: Container for managing Azure resources.

25️⃣ What is VM Scale Set?
A: Auto-scaling group of VMs.

Networking Basics

26️⃣ What is a VLAN?
A: Logical separation of networks within a switch.

27️⃣ What is a Default Gateway?
A: IP address for traffic to exit the network.

28️⃣ What is a Subnet Mask?
A: Defines network & host portions of an IP.

29️⃣ What is Ping?
A: Network tool to test connectivity.

30️⃣ What is Traceroute?
A: Shows path taken by a packet to destination.
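
For the subnet mask question (28), a short sketch with Python's standard `ipaddress` module shows how the mask splits an address into network and host portions. The address `192.168.10.37/26` is an arbitrary example, and `describe_interface` is a hypothetical helper name.

```python
import ipaddress


def describe_interface(cidr: str) -> dict:
    """Split an IPv4 address in CIDR form into its network and host parts."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    return {
        "netmask": str(net.netmask),
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        # Network and broadcast addresses are not assignable in ordinary
        # IPv4 subnets (/31 and /32 are special cases not handled here).
        "usable_hosts": net.num_addresses - 2,
        # Host portion: the bits of the address not covered by the mask.
        "host_part": int(iface.ip) & int(net.hostmask),
    }
```

For `192.168.10.37/26` the mask is 255.255.255.192, the network is 192.168.10.0, the broadcast address is 192.168.10.63, and 62 hosts are usable.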


Experience #2: Posted on July 18, 2025

LinkedIn Post: https://tinyurl.com/j8cm9wny

✅ Linux & System Administration
1️⃣ How do you perform an in-place upgrade from RHEL 7 to RHEL 8 using leapp? What precautions do you take?
2️⃣ Faced a Linux boot failure due to fstab misconfiguration? How did you recover?
3️⃣ Steps to configure network bonding in RHEL for redundancy & throughput?
4️⃣ How do you tune kernel parameters for performance in RHEL? Tools & files involved?
5️⃣ How do you securely configure user access and sudo privileges in Linux?
✅ AWS & EC2
6️⃣ How do you assign a secondary private IP to an EC2 instance and make it persist after reboot?
7️⃣ How do you optimize S3 cost using storage classes & lifecycle policies?
8️⃣ What is S3 Object Lock and when should you enable it?
9️⃣ How do you deploy a 3-tier architecture using EC2, RDS & Load Balancer?
🔟 How do you manage IAM roles & security groups in a multi-instance EC2 setup?
✅ CI/CD & Jenkins
1️⃣1️⃣ How do you implement multi-branch pipelines in Jenkins? Why are they important?
1️⃣2️⃣ What are your best practices for securing Jenkins and credentials?
1️⃣3️⃣ How do you troubleshoot failed deployments in GitLab CI/CD?
✅ Kubernetes & OpenShift
1️⃣4️⃣ Difference between a container and a pod? When to use StatefulSets over Deployments?
1️⃣5️⃣ Faced pod scheduling failures due to resource limits? How did you resolve it?
1️⃣6️⃣ How do you expose services using NodePort, Ingress & LoadBalancer?
1️⃣7️⃣ How do you handle RBAC permission issues during app deployment?
1️⃣8️⃣ Role of kubelet, kube-proxy & etcd in a K8s cluster?
1️⃣9️⃣ Challenges with persistent volumes? How did you solve them?
2️⃣0️⃣ What tools/commands do you use to debug networking issues inside a pod?
✅ Terraform & IaC
2️⃣1️⃣ How do you use the lifecycle block to ignore changes to specific fields like hostname?
2️⃣2️⃣ Safely creating/modifying EC2 instances using AWS provider in Terraform?
2️⃣3️⃣ How do you handle rollback/change management when updating critical infra?
✅ Real-World Experience
2️⃣4️⃣ Describe a production issue involving node pressure in Kubernetes. How did you mitigate it?
2️⃣5️⃣ In a cloud migration, how did you choose between in-place OS upgrades vs provisioning new VMs?
2️⃣6️⃣ What monitoring/alerting tools did you use with Azure Monitor or CloudWatch to scale apps?


Experience #3: Posted on July 18, 2025

LinkedIn Post: https://tinyurl.com/4a8e8hz3

✅ CI/CD & Jenkins
• Can you elaborate on the process you followed to design and implement an end-to-end CI/CD pipeline using Jenkins?
• What were the major challenges you faced during the implementation?
• How did you ensure the security of the CI/CD pipeline?
• How have you configured Jenkins to automate deployment after each commit?
• Which three Jenkins plugins would you enable to track status and what info do they provide?
• How do you set up Jenkins to alert when a deployment fails? What details would you include in the message?
• How would you design your Jenkins pipeline to ensure that failure in one microservice doesn’t block others?
• How would you detect a failed deployment in one microservice and what steps would you automate to verify rollback?
• What deployment metrics do you track in Jenkins across microservices and how do you present them to stakeholders?

✅ Terraform & Infrastructure as Code
• How have you leveraged Terraform in your role to manage AWS infrastructure?
• Can you provide an example of a complex provisioning task automated using Terraform?
• What best practices do you follow when writing Terraform configurations?

✅ Kubernetes
• Walk me through the process of manually scaling a Kubernetes deployment from 3 to 10 replicas.
• What exact command would you use with kubectl scale deployment?
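
One way to answer the scaling question is `kubectl scale deployment <name> --replicas=10`. The sketch below wraps that command in Python; the deployment name `web` and the `default` namespace are placeholders, and `scale` assumes `kubectl` is installed and pointed at the right cluster.

```python
import subprocess


def scale_command(deployment: str, replicas: int, namespace: str = "default") -> list:
    """Build the kubectl argv for scaling a deployment."""
    if replicas < 0:
        raise ValueError("replicas must be zero or more")
    return [
        "kubectl", "scale", "deployment", deployment,
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


def scale(deployment: str, replicas: int, namespace: str = "default") -> None:
    """Run the command; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(scale_command(deployment, replicas, namespace), check=True)


# Example (not executed here): scale("web", 10) goes from 3 to 10 replicas.
```

In an interview it is worth adding that a manual scale is overwritten on the next apply if the manifest still says 3 replicas, or by a HorizontalPodAutoscaler if one targets the deployment.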


Experience #4: Posted on July 18, 2025

LinkedIn Post: https://tinyurl.com/yv9dv3hr

Scenario-Based Linux Interview Questions

🔧 1. High CPU Usage
Q: “A production server is running slow. How would you identify and troubleshoot high CPU usage in Linux?”
👉 Expected Answer:
Use top, htop, vmstat, or pidstat
Identify high-CPU processes (ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu)
Check cron jobs, application logs
Kill or restart misbehaving process if needed
Investigate memory swap usage
🐘 2. Disk Space Full
Q: “Your application failed to write logs. You found the disk is full. What steps would you take?”
👉 Checklist:
df -h and du -sh /* to locate space usage
Find large files: find / -type f -size +500M
Clean /var/log, rotate logs using logrotate
Clear package cache (yum clean all or apt clean)
Check hidden deleted open files: lsof | grep deleted
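
The "find large files" step in the checklist above can also be scripted. This is a rough Python counterpart to `find / -type f -size +500M`, not a replacement for it; `find_large_files` is a hypothetical helper and the threshold is given in bytes.

```python
import os


def find_large_files(root: str, min_bytes: int) -> list:
    """Return (size, path) pairs for regular files larger than min_bytes,
    biggest first. Unreadable files are skipped."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # do not follow symlinks
                size = os.path.getsize(path)
            except OSError:
                continue      # file vanished or permission denied
            if size > min_bytes:
                results.append((size, path))
    return sorted(results, reverse=True)
```

For example, `find_large_files("/var/log", 500 * 1024 * 1024)` lists log files above 500 MiB.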
🔄 3. Service Not Starting
Q: “You try to start a service with systemctl start nginx, but it fails. How would you debug?”
👉 Diagnostic Steps:
systemctl status nginx
journalctl -xe or journalctl -u nginx
Check config files: /etc/nginx/nginx.conf
Verify port conflict with netstat -tulnp or ss -ltnp
Look for SELinux or permission issues
🔐 4. SSH Login Failure
Q: “You can’t SSH into a server. What do you check first?”
👉 Good Answer:
Network reachability: ping, telnet, nc
SSH port (default 22): is it open? (ss -tuln)
Check sshd service: systemctl status sshd
Review /var/log/auth.log or /var/log/secure
Permission on .ssh/authorized_keys, ownership
🧪 5. File Permission Issue
Q: “A script runs fine manually, but fails in cron. What might be wrong?”
👉 Key Points:
Cron runs with limited $PATH, env vars
Use absolute paths in scripts
Check user permissions (chmod, chown)
Redirect output to log for debugging: script.sh >> /tmp/cron.log 2>&1
📜 6. Crontab Not Working
Q: “Your scheduled cron job isn’t executing. How do you debug it?”
👉 Troubleshooting:
crontab -l, crontab -e
Check /var/log/cron or /var/log/syslog
Validate syntax in crontab (crontab.guru)
Ensure script is executable and has correct shebang
🌐 7. Port Already in Use
Q: “You try to run a web server, but it fails to bind to port 80. How would you resolve this?”
👉 Tools & Actions:
netstat -tulnp or ss -ltnp to find what’s using port 80
Kill or reconfigure conflicting service
Use lsof -i :80 to trace PID
Check whether the process is allowed to bind the port (ports below 1024 need root or the CAP_NET_BIND_SERVICE capability)
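
The same check can be sketched in Python by attempting the bind yourself. This assumes an IPv4 TCP port; `port_in_use` is a made-up helper, and it only tells you that something holds the port, not which PID (use `ss -ltnp` or `lsof -i :80` for that).

```python
import errno
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails with 'address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Anything else (for example EACCES on a privileged port when
            # not running as root) is a different problem, so re-raise it.
            raise
        return False
```

Re-raising on other errors keeps a permission problem from being misreported as a port conflict.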
🧩 8. Kernel Panic or Server Crash
Q: “A Linux server rebooted unexpectedly. How do you investigate the cause?”
👉 Analysis:
Check /var/log/messages, /var/crash, journalctl
Look at dmesg output for kernel logs
Check memory/disk errors
Enable kdump for future analysis
🧠 9. Symbolic vs Hard Links
Q: “A symbolic link is broken after moving the target file. Why?”
👉 Explanation:
Symlinks point to file path, not inode. Moving file breaks path.
Hard links share inode, so moving doesn’t break them
Use ln -s (symbolic) vs ln (hard).
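
The difference is easy to demonstrate. The sketch below creates a file, a symlink, and a hard link in a scratch directory, moves the original, and checks which link still works. File names are arbitrary, and it assumes a POSIX filesystem where both link types are allowed.

```python
import os


def link_demo(workdir: str) -> dict:
    """Show that moving the target breaks a symlink but not a hard link."""
    target = os.path.join(workdir, "data.txt")
    soft = os.path.join(workdir, "soft.txt")
    hard = os.path.join(workdir, "hard.txt")

    with open(target, "w") as fh:
        fh.write("hello")

    os.symlink(target, soft)   # stores the path to data.txt
    os.link(target, hard)      # second directory entry for the same inode

    os.rename(target, os.path.join(workdir, "moved.txt"))

    with open(hard) as fh:
        hard_content = fh.read()

    return {
        # os.path.exists follows the symlink, so it is False when dangling.
        "symlink_resolves": os.path.exists(soft),
        "symlink_still_present": os.path.islink(soft),
        "hard_link_content": hard_content,
    }
```

After the move the symlink entry still exists but points at a path that is gone, while the hard link reads the original content.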


Experience #5: Posted on July 17, 2025

LinkedIn Post: https://tinyurl.com/mr3nmj8t

EC2 Interview Question & Answer

Q: Your EC2 has a public IP and the port is open in the security group, but it’s unreachable. Why?

A: Check the subnet’s network ACL. If inbound or outbound rules are blocking traffic, the security group won’t help. NACLs silently drop traffic with no message.

Q: You shared an AMI with another AWS account, but they still can’t launch an instance from it. What’s usually missed?

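A: Sharing the AMI isn’t always enough. If the AMI’s EBS snapshots are encrypted, you also need to share the customer managed KMS key used to encrypt them (AMIs encrypted with the default AWS managed key can’t be shared at all). Without access to the key, the AMI looks valid but fails at launch. Sharing the snapshots themselves is only required if the other account wants to copy the AMI.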

Q: You restored an RDS snapshot for staging, but some queries behave differently than production.

A: When you restore from a snapshot, RDS assigns the default parameter group by default. Custom parameter groups from production are not restored automatically. If not manually reassigned, staging may run with different settings, leading to changes in query behavior or performance.

Q: You enabled IAM roles for service accounts in EKS, but your pod can’t access S3. The role looks fine. What’s the catch?

A: The pod must be using a service account with the right annotation linking to the IAM role. If the pod defaults to the default service account or the annotation is missing, the role doesn’t apply.

Q: ALB is marking your targets as unhealthy, but hitting the app directly works fine.

A: ALB health checks are strict. If your app returns a 301 or a login page without a clean 200 OK, it’ll fail the check even if the app seems fine in the browser.
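
A common fix is a dedicated health endpoint that always answers with a plain 200. The sketch below uses Python's standard `http.server` to illustrate the contrast: `/health` returns 200 while every other path redirects to a login page with a 301. The `/health` and `/login` paths and the helper names are assumptions, and the success codes an ALB accepts are configurable on the target group.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Looks fine in a browser, but a health check expecting 200 fails.
            self.send_response(301)
            self.send_header("Location", "/login")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_for(path: str) -> int:
    """Start a one-request server on a free port and return the status code."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request("GET", path)
    status = conn.getresponse().status
    conn.close()
    worker.join()
    server.server_close()
    return status
```

Pointing the target group's health check path at the dedicated endpoint avoids the redirect entirely.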

Q: You pushed a new image to ECR and updated your ECS task definition, but it still runs the old version.

A: If you’re using mutable tags like latest, ECS often pulls from cache. Unless you force a new digest or use a unique tag per version, you’ll keep running stale containers.

Q: In EKS, your stateful pod using an EBS volume is stuck in Pending. Why doesn’t it reschedule?

A: EBS volumes are limited to a single Availability Zone. If EKS places the pod on a node in a different AZ, the volume cannot attach. Make sure your node group includes nodes in the same AZ as the volume.


Experience #6: Posted on July 17, 2025

LinkedIn Post: https://tinyurl.com/yxrcszve

✅ Round 1: Screening Round (30 minutes)
– Walk me through your current project architecture and your role in it.
– Which DevOps tools have you worked with in the last 2 years?
– What AWS services have you used in production?
– How do you expose a Kubernetes application to external traffic?
– What is the purpose of a NAT Gateway?
– How do you check running processes in Linux?
– What command would you use to find files larger than 100MB?
– What is the difference between Deployment and StatefulSet in Kubernetes?
– What is a ConfigMap, and how is it different from a Secret?
– How do you check network connectivity between two servers?
– Describe your experience with CI/CD pipelines.

✅ Round 2: Technical Round (60 minutes)
– You have an application in Account A that needs to access an S3 bucket in Account B. How would you configure this?
– Write a Dockerfile for a Node.js application with multi-stage builds.
– How do you handle Terraform state file corruption?
– Your EC2 instance in a private subnet needs to download packages without NAT Gateway. What alternatives exist?
– How do you debug a container that has exited?
– You need to import an existing AWS VPC into Terraform. What are the steps?
– How would you implement blue-green deployment in Kubernetes?
– How do you manage secrets in Terraform without hardcoding them?
– What’s the difference between COPY and ADD commands in Dockerfile?
– How would you implement cross-account resource provisioning using Terraform?
– How would you handle secrets in a Docker container for a PHP application connecting to MySQL?
– An S3 bucket was created via Terraform, but someone manually added a policy. How do you handle this drift?
– How do you implement network policies to restrict pod-to-pod communication in Kubernetes?
– Write a Python script to backup all files older than 30 days from a directory.
– Your company’s cloud costs are increasing rapidly. How would you approach cost optimization without impacting performance?
– How would you set up geolocation-based routing using AWS services?
– A critical production Kubernetes cluster is experiencing multiple issues. Pods are stuck in ImagePullBackOff, some pods are being evicted, and users are reporting 503 errors from the application. What troubleshooting process will you follow, and how can you avoid this in the future?
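
For the Python scripting question in this round (back up all files older than 30 days from a directory), one possible answer is sketched below. It assumes "older" means last modified more than 30 days ago, copies rather than moves, and handles only the top level of the directory; `backup_old_files` is a made-up name.

```python
import os
import shutil
import time


def backup_old_files(src_dir: str, dest_dir: str, days: int = 30, now=None) -> list:
    """Copy regular files last modified more than `days` ago into dest_dir.

    Returns the sorted names of the files that were copied.
    """
    current = time.time() if now is None else now
    cutoff = current - days * 86400
    os.makedirs(dest_dir, exist_ok=True)

    copied = []
    for entry in os.scandir(src_dir):
        # Skip directories and symlinks; only plain files are backed up.
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            # copy2 preserves timestamps, which is useful for backups.
            shutil.copy2(entry.path, os.path.join(dest_dir, entry.name))
            copied.append(entry.name)
    return sorted(copied)
```

Reasonable follow-ups to mention are recursing into subdirectories, compressing the backup, and deleting the originals only after the copy is verified.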

✅ Round 3: Behavioral Round
– How do you handle a situation where you’re asked to work on a technology you have no experience with?
– Describe a time when you had to work with tight deadlines and limited resources.
– Tell me about a mistake you made in production and how you handled it.
– Describe the most challenging technical problem you’ve solved in your career.
– How would you convince stakeholders to adopt a new technology or process?
– Tell me about a time when you had to learn a new tool quickly to solve a business problem.


Experience #7: Posted on July 17, 2025

LinkedIn Post: https://tinyurl.com/k3uvw33a

L1 & L2 level questions related to AWS, Terraform, Kubernetes, Docker, Git.

Level 1 –

1, cicd workflow, what kind of pipeline.
2, use of webhook
3, purpose of webhook
4, stages of pipeline…
5, shared libraries in jenkins?
6, how do we define shared libraries?
7, how are shared libraries written?
8, how do you define a pipeline and call it?
9, what kind of app you deploy on the pipeline?
10, basic structure, folder structure of helm?
11, what command are you using deployment in helm
12, in the Jenkins pipeline, the pipeline is running successfully but the build is not happening, what are the issues?
13, in kubernetes, what are the errors you are getting, why they come and how you resolve?
14, explain the crash loop back off,
15, image pull error?
16, command to go inside a pod?
17, how can you create the kubernetes class?
18, what are the steps to create the cluster?
19, what is the master node and other node?
20, code to create a cluster using terraform?
21, stages in docker images?
22, difference between ENTRYPOINT and CMD
23, why do we use entrypoint, CMD
24, difference between EC2, EKS, ECS
25, command to connect ecs
26, which tool are you using for deployment?
27, which registry for storing the docker images?

Level 2 –

1, Branching strategy?
2, your release branch will break, then how u will avoid this kind of issues, then how do you merge?
3, in production having some bugs, how will you resolve?
4, typical deployment flow?
5, cicd workflow?
6, how do we do a full quality check?
7, jenkins file, different stages…
8, shared libraries in jenkins file?
9, typical structure of shared libraries…
10, are you aware of security scanning tools?
11, how do you pass the environment variables on docker build command.
12, what services do you use for storing the images?
13, DB, how do you establish the connection?
14, how do you scan the images at the registry level?
15, any extension you are using for image scanning?
16, authentication of eks cluster?
17, storing the secrets?
18, how to create lambda function, how it’s taking the artifacts.
19, options on lambda to push the artifacts?
20, what is email signing and helm chart signing?
21, which tool for signing the helm chart?


Experience #8: Posted on July 17, 2025

LinkedIn Post: https://tinyurl.com/3p6pf66y

These kinds of questions test your ability to think beyond theory and solve actual engineering challenges. If you’re preparing for DevOps/SRE roles, make sure you can confidently walk through questions like these:
🔹 Python & Scripting Challenges
1️⃣ You are given an array — how would you count the number of occurrences of each element and store the result in a dictionary format?
🔹 Application Scaling & Architecture
2️⃣ If you need to scale an application to handle more traffic, how would you approach it?
🔹 AWS Deep Dive
3️⃣ What are the different types of S3 storage classes available, and when would you use each?
🔹 Your Role in the Project
4️⃣ What are your current roles and responsibilities in your project?
🔹 Docker Knowledge
5️⃣ How can you improve a Dockerfile for performance, security, and efficiency?
6️⃣ Write a Dockerfile and walk through how it works.
🔹 Scenario-Based & System Design Questions
7️⃣ How do you manage secrets securely in CI/CD or Kubernetes environments?
8️⃣ What are the differences between blue-green and rolling deployments — and when would you use each?
9️⃣ How do you monitor and troubleshoot containers running in production?
🔟 Can you describe the CI/CD pipeline you’ve implemented in your current or past project, including the tools and steps involved?
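
For the Python question at the top of this list (count occurrences of each element and store them in a dictionary), a minimal answer might look like the sketch below. It shows the manual loop an interviewer often wants to see, plus the standard-library shortcut.

```python
from collections import Counter


def count_occurrences(items) -> dict:
    """Count how often each element appears, using a plain dict."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def count_occurrences_counter(items) -> dict:
    """Same result using collections.Counter from the standard library."""
    return dict(Counter(items))
```

Both run in linear time; the elements only need to be hashable.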


Experience #9: Posted on July 17, 2025

LinkedIn Post: https://tinyurl.com/3jxunhz4

The interviewers asked the following questions for SRE and Production Support Engineer roles:

1. Tell me about yourself.
2. Describe your day-to-day activities.
3. What is an error budget?
4. What is meant by SLI and SLA?
5. Are an SLA and an error budget the same thing?
6. Which monitoring tools are you using?
7. Have you worked with Splunk?
8. What is a Splunk forwarder?
9. What is an index in Splunk?
10. Are you able to write Splunk queries?
11. Have you worked with AppDynamics (APPD)?
12. What is a health rule violation?
13. What is the difference between egrep and grep in Linux?
14. What is a zombie process?
15. How do you find out how many zombie processes are running on a Linux server?
16. Why do we use the grep command?
17. How do you check CPU utilization?
18. After running the top command, what details do you see in the output?
19. Have you killed any process in Linux?
20. Once you kill a parent process, will its child processes keep running?
21. How do you list only directories in Linux?
22. Why do we use the tstree command?
23. How do you check whether a particular process is running in Linux?
24. What is a schema in a database?
25. A table holds salary data for 10 employees. How do you get the top 10 highest salaries?
26. Describe a situation where you worked with multiple resources to implement a new process.
27. Why do we use the vi editor?
28. How do you edit the 10th line using the vi editor in Linux?
29. How do you find files modified in the last 30 days on a Linux server?
30. How do you find a file in Linux?
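
The error budget questions (3 to 5) are easier to answer with numbers. The sketch below assumes the common SRE definition, where the error budget is the unreliability a target allows (100% minus the availability target) over a window. The 30-day window and the function names are illustrative choices, not something from the original post.

```python
def error_budget_minutes(target_percent: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target over a window."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def budget_remaining(target_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after the downtime already spent in the window."""
    return error_budget_minutes(target_percent, window_days) - downtime_minutes
```

A 99.9% target over 30 days allows about 43.2 minutes of downtime. This also shows why an SLA and an error budget are not the same thing: the SLA is the commitment made to customers, while the budget is the amount of failure a team can spend before it breaches its target.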


Experience #10: Posted on July 15, 2025

LinkedIn Post: https://tinyurl.com/mukme7a8

I’m sharing a few real questions I’ve encountered (or seen asked) during interviews. Hope this helps others in the same boat! 🚀
🧩 Jenkins / CI-CD Scenarios
🔹 Your Jenkins pipeline works in staging but fails in production. How would you go about debugging it?
🔹 Jenkins job is stuck in “Waiting for executor” – what steps would you take to resolve it?
🔹 How do you roll back a failed deployment using Jenkins?
🔹 How would you prevent accidental deployments from feature branches to production?
🐳 Docker Scenarios
🔹 One of your Docker containers keeps restarting — how do you troubleshoot?
🔹 Your image size is way too large. What steps will you take to optimize it?
🔹 How do you handle secrets (passwords, API keys) securely inside a Docker container?
☸️ Kubernetes Scenarios
🔹 A pod is stuck in CrashLoopBackOff — where do you start investigating?
🔹 A deployment fails mid-way in production — how do you roll it back?
🔹 You want a service accessible only within the cluster — how would you configure that?
🔹 How do you rotate Kubernetes secrets without downtime?
☁️ AWS / Cloud Infra
🔹 One of your EC2 instances is unresponsive — what would you check first?
🔹 How do you perform a zero-downtime deployment in AWS?
🔹 How do you configure autoscaling based on CPU utilization?
🔁 Git / Collaboration Scenarios
🔹 You and a teammate both pushed changes to the same branch and now there’s a conflict — what next?
🔹 You accidentally deleted a branch that wasn’t merged — how can you recover it?
🔹 How do you handle GitOps when managing multiple environments (dev/stage/prod)?


Experience #11: Posted on July 15, 2025

LinkedIn Post: https://tinyurl.com/2thhmvee

🔁 Azure Data Factory (ADF)

1️⃣ What are the key components of ADF (pipeline, dataset, linked service, trigger)?
2️⃣ How do you copy data from SQL Server to ADLS using ADF?
3️⃣ What are the different types of triggers in ADF?
4️⃣ How do you perform parameterization in ADF pipelines?
5️⃣ What’s the difference between Lookup and Get Metadata activity?
6️⃣ How do you handle failures and retries in ADF?

⚙️ Azure Databricks & PySpark

7️⃣ What is PySpark and where have you used it in your project?
8️⃣ Difference between repartition() and coalesce() in PySpark.
9️⃣ What is a broadcast join? When should we use it?
🔟 How do you write a DataFrame to Delta format in Databricks?
🔹 How do you handle null values in PySpark?

🧪 SQL & Data Validation

🔹 Write a SQL query to get the 2nd highest salary from a table.
🔹 How do you perform data validation after ingestion?
🔹 What are the types of joins in SQL with examples?
🔹 Difference between WHERE and HAVING clause.
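
For the second-highest-salary question, one widely used query is sketched below and run against an in-memory SQLite database so it can be tried locally. The `employees` table and its columns are assumptions for the example.

```python
import sqlite3

SECOND_HIGHEST = """
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
"""


def second_highest_salary(rows):
    """rows: iterable of (name, salary). Returns None if there is no
    second distinct salary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return conn.execute(SECOND_HIGHEST).fetchone()[0]
    finally:
        conn.close()
```

The subquery form handles ties at the top correctly: if two people share the highest salary, it still returns the next distinct value. An alternative worth mentioning is `DENSE_RANK()` with a filter on rank 2.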

🗃️ Azure Data Lake & Delta Lake

🔹 What is Delta Lake? How is it different from traditional file formats?
🔹 Explain Bronze, Silver, Gold architecture in your project.
🔹 How do you partition data in ADLS/Delta?

🔐 Security & Integration

🔹 How do you connect ADF to an on-prem SQL server?
🔹 What is a Self-hosted Integration Runtime?
🔹 How do you securely access data from ADLS Gen2 in Databricks?

🧠 Project & Real-Time Scenarios

🔹 Describe the architecture of your current project.
🔹 How do you handle late arriving or corrupted files in your pipeline?
🔹 How do you maintain SCD Type 2 in your pipeline using PySpark or ADF?

💬 Behavioral / Soft Skills (Infosys Focus)

🔹 How do you troubleshoot if a pipeline fails at midnight?
🔹 Tell me about a challenge you faced in your project and how you solved it.
🔹 Are you familiar with Agile methodology? How did you work in your sprint?

💡 Tips for Infosys Interviews:
✅ Be clear on the tools you’ve worked with — and how you used them.
✅ Have your project architecture ready to explain in 2–3 minutes.
✅ Expect questions around performance optimization and CI/CD basics.


Experience #12: Posted on July 14, 2025

LinkedIn Post: https://tinyurl.com/4kwff7en

I recently went through an interview and came across some interesting L1 & L2 level questions related to AWS, Terraform, Kubernetes, Docker, Git.

Level 1

1, write the python script for reading the output.
2, write the python script to output of dockerfile.
3, bash script – how to change the version of dockerfile.
4, explain the CI CD workflow?
5, components used in the pipeline.
6, what is sonarqube?
7, python script for triggering the Jenkins pipeline.
8, how to manage terraform?
9, version control, how to store the terraform files?
10, S3 uses?
11, troubleshoot failed Jenkins pipeline?
12, handle security in devops?
13, how do u ensure zero downtime?
14, load balancer?
15, troubleshooting steps followed in production failure?
16, how to configure multiple environments in the Jenkins pipeline?
17, challenges faced in a recent project?
18, possible reasons a build that should be triggered is not triggered?

Level 2 – Client round.

1, end to end workflow of devops.
2, branching strategy.
3, typical branches in dev?
4, stages in pipeline, type of pipeline.
5, unit testing, which tool u r using?
6, docker deployment? Write the dockerfile?
7, structure of dockerfile?
8, how to deploy k8s in AWS?
9, components of K8s?
10, helm chart?
11, commands to deploy on k8s, advantage?
12, challenges faced?
13, troubleshoot methods?


Experience #13: Posted on July 14, 2025

LinkedIn Post: https://tinyurl.com/5fz4sfvd

Here’s the expanded and freshly curated set of questions from the round:
🚀 CI/CD & Azure DevOps Pipelines
✅ How do you implement pipeline validation and prevent accidental deployments to production?
✅ What’s the difference between runtime expressions and compile-time expressions in YAML pipelines?
✅ How do you conditionally trigger jobs based on branch name or file path?
✅ Explain how pipeline artifacts differ from build artifacts.

🚀 Kubernetes (AKS) & Containerization
✅ How do you handle pod eviction during node maintenance in AKS?
✅ What are taints and tolerations, and when would you use them?
✅ How do you configure pod affinity rules to ensure certain workloads run on the same node?
✅ What’s the impact of setting aggressive liveness probes — how can it lead to cascading failures?

🚀 Azure Cloud & Infrastructure Management
✅ How do you configure service-to-service authentication in Azure using managed identities?
✅ Explain the use of Azure Bastion and when it’s preferred over jumpboxes.
✅ How do you set up auto-scaling rules based on custom metrics in Azure Monitor?

🚀 Terraform & Infrastructure as Code (IaC)
✅ How do you safely refactor Terraform code in production environments?
✅ What are data blocks in Terraform and when are they required?
✅ How would you store and manage Terraform state files in a team environment?

🚀 Monitoring, Logging & Incident Response
✅ A service goes down and logs are missing — how do you troubleshoot it?
✅ What is your alert escalation policy design during off-hours?
✅ How do you visualize metrics and set alerts using Prometheus + Grafana for AKS?

🚀 Git, GitOps & Branching Strategy
✅ How do you implement GitOps with ArgoCD in a secure enterprise setup?
✅ How do you manage Git branching strategies across multiple teams working on the same repo?

🚀 Automation & Scripting (Bash / Python)
✅ Write a Bash script to restart only the failed pods in a specific Kubernetes namespace.
✅ How would you write a Python script to identify untagged AWS resources using Boto3?
✅ How do you automate certificate renewal across microservices in AKS?


Experience #14: Posted on July 11, 2025

LinkedIn Post: https://tinyurl.com/4c6w33a7

🔹 L1 – Foundation & Hands-On Check

1️⃣ What are the different types of Integration Runtimes in ADF?
2️⃣ How do you implement parameterized linked services and datasets in ADF?
3️⃣ Explain the difference between Lookup, Filter, and Stored Procedure activities.
4️⃣ How do you mount ADLS Gen2 in Azure Databricks using service principal credentials?
5️⃣ What is the difference between internal and external tables in Synapse SQL?

🔹 L2 – Scenarios, Optimization, Architecture

6️⃣ How would you design an end-to-end pipeline for ingesting 50+ tables from on-prem SQL to ADLS using ADF?
7️⃣ A Databricks notebook that joins 2 large tables is running slow. Walk through your performance tuning approach.
8️⃣ How do you handle schema drift when using ADF data flows or mapping data flows?
9️⃣ What are best practices for cost optimization when using Synapse Analytics in a production workload?
🔟 How would you implement data quality checks, logging, and error handling across a pipeline that spans ADF, Databricks, and Synapse?


Experience #15: Posted on July 11, 2025

LinkedIn Post: https://tinyurl.com/5fz4sfvd

1/ How would you optimize a slow-performing query with multiple JOINs?
2/ Write a query to calculate 7-day moving average while excluding current day sales.
3/ How do you detect session drops or breaks in user activity using timestamps?
4/ Write a query to calculate year-over-year growth of monthly active users.
5/ How can CTEs (Common Table Expressions) help in writing modular queries?
6/ Explain the difference between RANK, DENSE_RANK, and ROW_NUMBER with use cases.
7/ How do you flatten nested JSON data stored in SQL columns?
8/ Write a query to perform cohort analysis based on user signup month.
9/ How do you find returning users who were inactive in the previous month?
10/ Write a query to compare product sales before and after a campaign.
11/ How do you track user churn using transactional data?
12/ Explain how indexes impact JOIN and WHERE clause performance.
13/ Write a query to identify N consecutive days of user inactivity.
14/ How do you perform fuzzy matching or approximate string matching in SQL?
15/ How do you build a funnel conversion query (e.g. sign-up → purchase)?
16/ Write a query to calculate retention rates across weekly intervals.
17/ How can you normalize user engagement data across time zones?
18/ Write a query to find customers who only purchased from a specific category.
19/ What’s the difference between EXISTS vs. IN vs. JOIN for subqueries?
20/ Write a query to calculate LTV (Lifetime Value) of customers.
21/ How do you rank events within partitions of sessions?
22/ Write a query to detect duplicate events with different timestamps.
23/ Explain how window frames (ROWS BETWEEN) impact aggregation logic.
24/ Write a query to generate user-level summaries using GROUPING SETS.
25/ How do you handle slowly changing dimensions (SCD) in SQL?
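
Question 6 (RANK vs DENSE_RANK vs ROW_NUMBER) is easiest to explain with a tie in the data. The sketch below runs all three over a small made-up `employees` table in an in-memory SQLite database; it assumes SQLite 3.25 or newer, which is when window functions were added.

```python
import sqlite3

RANKING_QUERY = """
SELECT name,
       salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
       RANK()       OVER (ORDER BY salary DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
FROM employees
ORDER BY row_num
"""


def ranking_demo():
    """Return (name, salary, row_num, rnk, dense_rnk) for sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
        conn.executemany(
            "INSERT INTO employees VALUES (?, ?)",
            [("Asha", 100), ("Bo", 90), ("Cy", 90), ("Dee", 80)],
        )
        return conn.execute(RANKING_QUERY).fetchall()
    finally:
        conn.close()
```

With the two 90s tied, ROW_NUMBER gives 1, 2, 3, 4; RANK gives 1, 2, 2, 4 (it skips 3); DENSE_RANK gives 1, 2, 2, 3 (no gap). The extra `name` in the ROW_NUMBER ordering is only there to make the tie-break deterministic.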
