Behavioral interview questions that predict engineering performance

The behavioral interview questions worth asking aren't the clever ones. They're the ones tied to a specific competency, asked the same way to every candidate, and scored against a written rubric. That last part is what separates a useful behavioral interview from a vibes check. Structured interviews predict job performance with a validity of about 0.51, more than double the 0.20 of an unstructured chat, per Schmidt and Hunter's meta-analysis of 85 years of research (Plum).

So this page isn't a list of 50 questions to "get to know" a candidate. It's 18 behavioral interview questions grouped by the competency each one measures, with what a strong answer actually shows and the red flags to watch for. Use it to build a structured round you can score, not just react to.

A quick honest note: behavioral signal only counts if you score it consistently. Google standardised on structured, rubric-scored behavioral and situational questions for exactly this reason (Google re:Work). The questions below come with the rubric language to make that possible.

Key Takeaways

Structured behavioral interviews predict performance at ~0.51 validity vs ~0.20 for unstructured ones (Schmidt and Hunter). The structure, not the question, is the signal.
Ask every candidate the same questions, mapped to specific competencies: ownership, collaboration, conflict, ambiguity, technical judgment, and growth.
Score each answer against a written rubric ("a strong answer shows X") instead of a gut reaction. That's what makes the round defensible and comparable.
The strongest answers are specific and first-person ("I decided", "I shipped"), with a real trade-off and a measurable outcome. Vague "we" answers are the most common red flag.
Run behavioral rounds the same way for every candidate, or the comparison between them means nothing.

How to use these questions (structure beats charisma)

Pick four to six questions across the competencies below, not 18. Ask the same set, in the same order, to every candidate for a given role. Score each answer right after the interview, against the rubric, before you discuss the candidate with anyone else. That sequence is the whole method.

The reason structure wins is boring and well-documented: it removes the interviewer's mood, memory, and halo effect from the score. An unstructured interview mostly measures how much the interviewer liked the candidate. A structured one measures the competency you actually defined. If you want the academic version, the scoring methodology we use walks through how a rubric converts an answer into a defensible score.

Each question below follows the same shape: what it tests, what a strong answer shows, and the red flags. The "what a strong answer shows" line is your rubric. Write the score against it, not against how confident the candidate sounded.
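
One way to keep that shape honest is to store each question as a record whose rubric line is a required field. The sketch below is a minimal illustration in Python, not a prescribed tool. The names `Question` and `build_round`, and the choice to reject any round outside four to six questions, are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    competency: str      # e.g. "ownership"
    prompt: str          # asked verbatim to every candidate
    tests: str           # the "What it tests" line
    strong_answer: str   # the "A strong answer shows" line: this is the rubric
    red_flags: str       # the "Red flags" line


def build_round(questions: list[Question]) -> tuple[Question, ...]:
    """Freeze an ordered question set for one role (hypothetical helper)."""
    if not 4 <= len(questions) <= 6:
        raise ValueError("pick four to six questions per round")
    for q in questions:
        if not q.strong_answer.strip():
            raise ValueError(f"no rubric line for: {q.prompt!r}")
    # A tuple keeps the order fixed, so every candidate gets the same sequence.
    return tuple(questions)
```

Making the rubric line mandatory means a question with nothing to score against can't be added to a round in the first place.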

Ownership and accountability

These questions separate engineers who drive outcomes from engineers who complete tickets.

Tell me about a production incident you were responsible for. What happened and what did you do?

What it tests: accountability under pressure, and whether they run toward problems or away from them.
A strong answer shows: they name their own contribution to the cause without being asked, describe the immediate mitigation, and explain the durable fix they shipped afterward.
Red flags: the story has no "I", only "the system" or "the other team"; no mention of what they changed so it wouldn't recur.

Describe a project where you missed a deadline. What did you do about it?

What it tests: honesty and how they manage commitments they can't keep.
A strong answer shows: early escalation, a re-scoped plan, and a specific lesson applied to the next project.
Red flags: the miss is always someone else's fault, or they hid the slip until it was unavoidable.

Tell me about a time you pushed back on a requirement you disagreed with.

What it tests: whether they own the quality of what they build, not just the spec.
A strong answer shows: they raised a concrete risk, proposed an alternative, and either changed the plan or committed fully once overruled.
Red flags: they either never push back, or they sulk and quietly build it wrong.

Collaboration and communication

Most engineering work fails at the seams between people, not inside the code.

Walk me through a time you explained a technical decision to a non-technical stakeholder.

What it tests: whether they can translate, which predicts how much friction they create with product and leadership.
A strong answer shows: they led with the business impact, not the implementation, and checked for understanding.
Red flags: they describe talking at the stakeholder in jargon, or they couldn't simplify it at all.

Tell me about a code review that changed your mind.

What it tests: intellectual humility and how they treat peers' input.
A strong answer shows: a specific technical point they were wrong about, and genuine appreciation for the catch.
Red flags: they can't recall ever being wrong in review, or they frame review as an attack to survive.

Describe a time you had to get a team aligned without any authority.

What it tests: influence, which matters more as engineers get senior.
A strong answer shows: they built the case with evidence, addressed the loudest objection directly, and got a real decision.
Red flags: "I just told them what to do", or the alignment never actually happened.

Handling conflict and disagreement

How an engineer disagrees tells you how they'll behave on your team every week.

Tell me about a serious technical disagreement with a colleague. How did it resolve?

What it tests: whether they can fight about ideas without making it personal.
A strong answer shows: they steelmanned the other position, found the actual point of disagreement, and accepted the outcome based on evidence.
Red flags: the disagreement is described as a battle they won, with no respect for the other engineer.

Describe working with someone whose style clashed with yours.

What it tests: adaptability and self-awareness.
A strong answer shows: they name their own role in the friction and a concrete adjustment they made.
Red flags: the other person is simply "difficult", with zero reflection.

Tell me about feedback that was hard to hear.

What it tests: coachability, which predicts how fast they'll grow on your team.
A strong answer shows: specific feedback, an honest reaction, and a behaviour they changed.
Red flags: they can't think of any, or they disagreed with all of it.

Dealing with ambiguity and failure

Senior engineering is mostly working without a clear spec. These questions find the people who can.

Tell me about a time you had to start a project with unclear requirements.

What it tests: whether they freeze, guess, or reduce ambiguity systematically.
A strong answer shows: they identified the riskiest unknown, built the smallest thing to test it, and tightened the spec from there.
Red flags: they waited for someone to hand them a spec, or they built the whole thing on an unchecked assumption.

Describe something you built that failed.

What it tests: whether they take real risks and learn from them.
A strong answer shows: a genuine failure, a clear-eyed cause, and a changed approach, not a humble-brag.
Red flags: "my biggest failure is I work too hard", or a failure with no lesson attached.

How do you make a decision when you don't have enough data?

What it tests: judgment under uncertainty.
A strong answer shows: they name what they'd need, set a deadline to decide anyway, and make the decision reversible where possible.
Red flags: analysis paralysis, or reckless certainty with no plan to course-correct.

Technical judgment and trade-offs

Behavioral questions can surface engineering judgment without a whiteboard, which is why they belong next to your system design interview questions.

Tell me about a time you chose the "worse" technical solution on purpose.

What it tests: pragmatism, and whether they optimise for the business or for their own resume.
A strong answer shows: a deliberate trade-off (speed, cost, maintainability) tied to the actual constraint at the time.
Red flags: they always reach for the most complex or trendiest option, or they can't articulate a trade-off at all.

Describe a piece of tech debt you decided to live with.

What it tests: maturity about the difference between perfect and shipped.
A strong answer shows: a conscious decision, a documented risk, and a trigger for when they'd pay it down.
Red flags: "I never ship tech debt", or they rewrote a working system for purity with no business reason.

Tell me about a time you cut scope to hit a deadline.

What it tests: prioritisation under real constraints.
A strong answer shows: they protected the core value, cut the right edges, and communicated the cut.
Red flags: they cut quality silently, or refused to cut anything and missed the deadline entirely.

Growth and learning

The half-life of a specific framework is short. How someone learns is the durable signal.

What's something you changed your mind about in the last year, technically?

What it tests: whether they update their beliefs with evidence.
A strong answer shows: a specific reversal and what caused it.
Red flags: nothing has changed, or the change is shallow ("I switched editors").

How do you get up to speed in an unfamiliar codebase?

What it tests: learning method, which predicts ramp time on your team.
A strong answer shows: a repeatable approach (trace a real request, read the tests, ship a tiny change early).
Red flags: "I just read all the code", or no method at all.

Tell me about a skill you deliberately built because the role demanded it.

What it tests: self-direction.
A strong answer shows: a gap they spotted, a plan, and evidence they closed it.
Red flags: growth only happens when an employer forces it.

Situational questions to pair with these

Behavioral questions ask what someone did. Situational questions ask what they'd do. Pair a few in when a candidate is early-career and short on history, or when you want to probe judgment directly. They're still structured interview questions, so score them against the same rubric.

A teammate's pull request is blocking your release and they're offline for the day. What do you do?

What it tests: judgment and collaboration under time pressure.
A strong answer shows: they weigh reverting, waiting, or fixing forward against the actual risk, and they communicate the call rather than make it silently.
Red flags: they freeze and wait, or they merge changes they don't understand to unblock themselves.

You inherit a service with no tests and a deadline in two weeks. Where do you start?

What it tests: prioritisation in a legacy codebase under ambiguity and time pressure.
A strong answer shows: characterisation tests around the riskiest path first, the smallest safe change, and a refusal to boil the ocean.
Red flags: "rewrite it from scratch", or "add full test coverage before touching anything."

Production is down. The obvious fix is risky and the safe fix is slow. How do you decide?

What it tests: decision-making under uncertainty.
A strong answer shows: they mitigate impact first, prefer the reversible option, communicate to stakeholders, and schedule the durable fix afterward.
Red flags: panic with no framework, or analysis paralysis while the outage continues.

Scoring behavioral answers without guessing

The questions are the easy part. The hard part, the part most teams skip, is scoring the same way across candidates and interviewers.

Use a simple four-level rubric per question: no evidence, partial evidence, solid evidence, strong evidence of the competency. Write the score immediately, cite the specific thing the candidate said, and don't adjust it after you hear how others scored. If two interviewers diverge by more than one level, that's a calibration conversation, not an average.
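
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the level names come from the rubric in this section, but the numeric values 0 to 3, the `Score` record, and the `needs_calibration` helper are hypothetical names introduced for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    NO_EVIDENCE = 0
    PARTIAL = 1
    SOLID = 2
    STRONG = 3


@dataclass(frozen=True)
class Score:
    interviewer: str
    level: Level
    evidence: str  # the specific thing the candidate said

    def __post_init__(self) -> None:
        # A score with no cited evidence is a gut reaction, so reject it.
        if not self.evidence.strip():
            raise ValueError("cite what the candidate said")


def needs_calibration(scores: list[Score]) -> bool:
    """True when any two interviewers differ by more than one level."""
    levels = [s.level for s in scores]
    return bool(levels) and max(levels) - min(levels) > 1


a = Score("interviewer_a", Level.STRONG, "Named their own config change as the cause")
b = Score("interviewer_b", Level.PARTIAL, "Described the fix, no recurrence guard")
print(needs_calibration([a, b]))  # True: STRONG vs PARTIAL is a two-level gap
```

The helper deliberately returns a flag rather than a mean: a two-level gap is a prompt to compare the cited evidence, not a number to smooth over.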

This is exactly the gap an AI interviewer for backend developers and other roles closes. It asks the same behavioral set every time, scores each answer against the rubric, and shows the transcript excerpt behind every score, so the comparison between candidates is real instead of a memory contest. A multiple-choice skills test can't capture behavioral signal; a structured, scored conversation can.

Want to feel the difference before you run it on a candidate? Practice an AI interview and read the scorecard it produces, or hand candidates a free AI mock interview so they show up ready.

Frequently asked questions

What are good behavioral interview questions? Good ones map to a specific competency (ownership, collaboration, conflict, ambiguity, judgment, growth), ask for a real past example, and can be scored against a rubric. "Tell me about a production incident you owned" beats "are you a team player." Vague questions get vague answers.

How many behavioral interview questions should I ask? Four to six per interview, asked identically to every candidate. More than that and you trade depth for coverage. The consistency across candidates matters more than the number.

How do you score behavioral interview questions? Use a written rubric per question with a fixed set of levels (four works well: no, partial, solid, or strong evidence), score immediately after the interview, cite the specific answer, and don't change the score after hearing other interviewers. That's what turns behavioral signal into a comparable number.

Are behavioral or technical interviews more important for engineers? Both, and they measure different things. Technical rounds test whether they can do the work; behavioral rounds test whether they'll be effective doing it on your team. The strongest process scores both against rubrics rather than treating either as a gut call.

How should candidates prepare for behavioral interviews? Prepare three to four real stories with specifics (what you did, the trade-off, the outcome) that you can adapt to different competencies. Practising out loud helps. A scored mock run that grades you the way a hiring team would shows you where your answers are vague before it counts.

Build a behavioral round you can defend

Behavioral interview questions are only as good as the structure around them. Pick four to six tied to the competencies that matter for the role, ask them the same way every time, and score each answer against a written rubric. That single discipline moves you from a 0.20 coin flip to a 0.51 signal.

If you'd rather not run that structure by hand for every candidate, that's the job Expert Hire does. See a sample candidate scorecard, behavioral criteria included, with the rubric, the transcript, and the reasoning per score, and decide whether it matches the bar you'd set yourself. Browse the full question bank for the technical rounds that pair with these.

By the Expert Hire team. Last updated May 19, 2026.