Top MongoDB Interview Questions For Beginners in 2026

A practical guide to MongoDB interview questions for junior and mid-level engineers, covering BSON, schema design, indexing, aggregation, querying, and performance basics. Learn the concepts, patterns, and real-world knowledge needed to confidently handle MongoDB technical interviews.

A practical guide for junior and mid-level engineers preparing for MongoDB interviews. The questions here cover the concepts you're expected to know cold: BSON, documents, indexing basics, the aggregation operators that come up most, and the query patterns that separate "wrote a tutorial app" from "shipped a Mongo-backed feature."

If you can answer these without rehearsing — and explain the why — you'll handle the early rounds of any MongoDB interview. The harder material (storage internals, sharding, replication trade-offs, production tactics) lives in the companion advanced article.


1. Fundamentals

Q1. What is BSON and how is it different from JSON?

BSON ("Binary JSON") is the binary-encoded format MongoDB uses on the wire and on disk. The differences that actually matter:

  • More types. JSON has string, number, boolean, null, array, object. BSON adds ObjectId, Date, Decimal128, Binary, Int32 vs Int64 vs Double, Timestamp, Regex, MinKey/MaxKey, etc. — so a Decimal128 survives a round-trip; a JSON number doesn't.
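  • Self-describing and length-prefixed. Every element carries a type tag, and every document, embedded document, array, string, and binary value starts with its byte length (fixed-width types such as Int32 or Date have a size implied by the tag), so a parser can skip over values without decoding them.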
  • Ordered fields. BSON preserves insertion order of keys. JSON objects technically don't.

The cost: BSON is usually slightly larger than the equivalent JSON because of the type tags and length prefixes.
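
To make the type tag and length prefix concrete, here is a minimal sketch that hand-encodes the one-field document { a: 1 } as BSON and compares it with the JSON text. The helper encodeInt32Doc is made up for illustration and handles only a single Int32 field; a real driver's BSON library covers every type.

```javascript
// Hand-encode a one-field document with an Int32 value, following the BSON layout:
// int32 total length, then [type tag, field name as C string, value], then a 0x00 terminator.
function encodeInt32Doc(key, value) {
  const name = Buffer.from(key + "\0", "utf8");
  const total = 4 + 1 + name.length + 4 + 1; // length + tag + name + int32 + terminator
  const buf = Buffer.alloc(total);
  let off = buf.writeInt32LE(total, 0);      // document length prefix
  off = buf.writeUInt8(0x10, off);           // type tag 0x10 = Int32
  off += name.copy(buf, off);                // field name
  off = buf.writeInt32LE(value, off);        // fixed-width value, no length needed
  buf.writeUInt8(0x00, off);                 // document terminator
  return buf;
}

const bson = encodeInt32Doc("a", 1);
const json = Buffer.from(JSON.stringify({ a: 1 }), "utf8");

console.log(bson.toString("hex"));     // 0c0000001061000100000000
console.log(bson.length, json.length); // 12 7
```

The first four bytes (0c 00 00 00) are the document length, which is what lets a reader jump past a whole document or embedded document without decoding it.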

Q2. Document, collection, database — define them.

  • Document: a BSON object, up to 16 MB. Roughly equivalent to a row.
  • Collection: a set of documents. Schemaless by default, but you can attach a $jsonSchema validator.
  • Database: a namespace of collections. A single MongoDB deployment can have many databases.

A frequent follow-up: why 16 MB? The limit exists to discourage "row-as-blob" patterns and to keep wire-protocol messages bounded. If you legitimately need bigger objects, use GridFS (which chunks across documents).

Q3. What's an _id field? Can I change it?

Every document has a unique _id within its collection; it's the implicit primary key and is automatically indexed. If you don't supply one, the driver generates an ObjectId (12 bytes: 4 timestamp, 5 random, 3 counter). You can use any BSON type as _id — UUIDs, strings, even subdocuments — but _id is immutable. To "change" it you have to insert a new document and delete the old one (ideally in a transaction).
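
The 4/5/3 byte layout can be checked by splitting an ObjectId's hex string by hand. This is a small sketch only; parseObjectId is a hypothetical helper, not a driver API, and the sample id is just an example value.

```javascript
// Split a 24-character ObjectId hex string into its three parts.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex characters");
  const seconds = parseInt(hex.slice(0, 8), 16); // bytes 0-3: seconds since the Unix epoch
  return {
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18),                    // bytes 4-8: random value
    counter: parseInt(hex.slice(18), 16),        // bytes 9-11: incrementing counter
  };
}

const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
```

In mongosh, ObjectId("507f1f77bcf86cd799439011").getTimestamp() returns the same creation time, which is why a default _id doubles as a rough "created at" value.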

Q4. What does single-document atomicity mean in practice?

Every write to a single document is atomic, even across embedded subdocuments and arrays. $set, $inc, $push, etc., on the same document in one update apply as a unit — no partial state visible to other readers.

That's the most underused feature in MongoDB schema design. If you embed related data, you get transactional updates for free, without reaching for multi-document transactions.


2. Schema Design

Q5. Embed or reference?

The most common interview question, and the one that gets the worst answers. The honest rule of thumb:

  • Embed when the related data is accessed together, has bounded growth, and doesn't need to be queried independently. Example: an order with its line items.
  • Reference when the related data is large, unbounded, shared across multiple parents, or queried on its own. Example: products referenced by many orders; users referenced by audit events.

The 16 MB document limit forces a decision when arrays grow — unbounded one-to-many is the classic embedding mistake. If "how many of these per parent?" has no ceiling, reference.

A useful middle ground is the extended reference: store just the fields you need from the referenced doc ({userId, userName} instead of the whole user), trading consistency for read speed.
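
As a rough illustration, the three shapes discussed above might look like this. Every field name and value here is invented for the example.

```javascript
// Embedded: line items live inside the order and are read and written with it.
const embeddedOrder = {
  _id: "order-1001",
  items: [
    { sku: "SHOE-42", qty: 1, price: 79.9 },
    { sku: "SOCK-07", qty: 3, price: 4.5 },
  ],
};

// Referenced: the audit event stores only the user's _id; user data lives elsewhere.
const referencedEvent = { _id: "evt-1", userId: "user-7", action: "login" };

// Extended reference: copy the few user fields the read path needs.
// userName can go stale if the user is renamed, which is the consistency cost.
const extendedEvent = { _id: "evt-2", userId: "user-7", userName: "Ada", action: "login" };

// Reading the embedded order needs no second query to compute a total.
const orderTotal = embeddedOrder.items.reduce((sum, item) => sum + item.qty * item.price, 0);
console.log(orderTotal.toFixed(2)); // 93.40
```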

Q6. What is $jsonSchema validation?

Per-collection schema rules that MongoDB enforces on write. Set them with db.runCommand({collMod: 'users', validator: {$jsonSchema: {...}}}), or pass the same validator option to db.createCollection(). The validation level is strict (the default: every insert and update is validated) or moderate (inserts are validated, but updates are validated only for documents that already satisfy the schema). The validation action is error (reject the write) or warn (allow the write and log the violation).

It's optional, but extremely useful for enforcing invariants without giving up flexibility — the "schemaless" reputation MongoDB has does not mean you can't enforce a schema when you want to.
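
A sketch of what a complete collMod command document could look like. The users collection and its fields (email, status, createdAt) are assumptions made for this example.

```javascript
// Command document for db.runCommand(); collection and field names are illustrative.
const addUserValidator = {
  collMod: "users",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "createdAt"],
      properties: {
        email: { bsonType: "string", pattern: "^.+@.+$" },
        status: { enum: ["active", "disabled"] },
        createdAt: { bsonType: "date" },
      },
    },
  },
  validationLevel: "moderate", // updates to already non-conforming documents are not checked
  validationAction: "error",   // reject writes that fail validation
};

// In mongosh: db.runCommand(addUserValidator)
console.log(addUserValidator.validator.$jsonSchema.required.join(", ")); // email, createdAt
```

Fields not listed under properties are still allowed, so the collection keeps its flexibility while the listed invariants are enforced.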


3. Indexing Basics

Q7. What index types does MongoDB support, and when do you use each?

  • Single field — the default. Indexes one field, ascending or descending.
  • Compound — multiple fields. Order matters; follow the ESR rule (next question). A compound index on {a, b, c} supports queries on {a}, {a, b}, {a, b, c} — but not {b} or {c} alone (this is the prefix rule).
  • Multikey — automatic when you index an array field; one index entry per array element. Restriction: you can't create a compound index across two array fields.
  • Text — full-text search. One per collection.
  • 2dsphere — geospatial queries ($near, $geoWithin).
  • Hashed — supports equality only, but is the basis of hashed sharding for even distribution.
  • TTL — special single-field index with expireAfterSeconds that auto-deletes old docs.
  • Partial — index only documents matching partialFilterExpression. Smaller, cheaper, supports unique-on-subset.
  • Unique — enforces uniqueness; pair with partialFilterExpression if you need "unique when present."
  • Sparse — index only docs that have the field. Largely superseded by partial indexes.
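
The prefix rule from the compound-index bullet can be modeled in a few lines. This is a deliberately simplified sketch: usablePrefix is a hypothetical helper, and the real query planner also weighs sorts, ranges, and competing indexes.

```javascript
// Which leading fields of a compound index can serve a query that filters
// on the given fields? The first missing field ends the usable prefix.
function usablePrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!wanted.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const index = ["a", "b", "c"]; // stands for createIndex({ a: 1, b: 1, c: 1 })
console.log(usablePrefix(index, ["a", "b"])); // [ 'a', 'b' ]
console.log(usablePrefix(index, ["b"]));      // []
console.log(usablePrefix(index, ["a", "c"])); // [ 'a' ]
```

In this model a query on { a, c } still benefits from the index, but only through the leading field a.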

Q8. Explain the ESR rule.

For compound indexes, put fields in this order to maximize use:

  1. Equality predicates ({ status: 'active' })
  2. Sort fields (so MongoDB can walk the index in order — no in-memory sort)
  3. Range predicates ({ createdAt: { $gt: ... } })

Why: after a range scan the index is no longer ordered for subsequent fields, so anything after a range field in the index is wasted for sort purposes.
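
As a sketch of the rule in action, the helper below assembles an index key pattern in Equality, Sort, Range order. esrIndexKeys and the tasks-style field names are hypothetical, chosen only for this example.

```javascript
// Build a compound index key pattern in Equality, Sort, Range order.
function esrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction; // keep the sort direction
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// Query shape: find({ status: "active", createdAt: { $gt: since } }).sort({ priority: -1 })
const keys = esrIndexKeys({
  equality: ["status"],
  sort: { priority: -1 },
  range: ["createdAt"],
});
console.log(JSON.stringify(keys)); // {"status":1,"priority":-1,"createdAt":1}
```

Passing the result to createIndex in mongosh would give an index that can answer the query shape in the comment without an in-memory sort.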

Q9. How do you tell if a query uses an index?

db.coll.find({...}).explain('executionStats')

Look at queryPlanner.winningPlan.stage and the inputStage values nested under it:

  • IXSCAN — index scan. Good.
  • COLLSCAN — collection scan. Almost always bad on a non-trivial collection.
  • FETCH over IXSCAN — the query needed to fetch the full document after the index lookup. Acceptable, but a covered query (next question) is faster.
  • SORT after the IXSCAN — means MongoDB did an in-memory sort. Often a sign your index doesn't cover the sort.

Key counters, all under executionStats: totalKeysExamined vs totalDocsExamined vs nReturned. The ideal is totalKeysExamined ≈ totalDocsExamined ≈ nReturned. If totalDocsExamined >> nReturned, you're scanning too much.
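
Reading explain output gets easier with a tiny summarizer. The sketch below walks the winning plan and computes the ratios discussed above; the sample object is hand-written and trimmed to a few fields, and plans with several branches (which use an inputStages array) are ignored here.

```javascript
// Summarize an explain('executionStats') result.
function summarizeExplain(explain) {
  const stages = [];
  // Walk from the top stage down through the inputStage links.
  for (let node = explain.queryPlanner.winningPlan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  const { nReturned, totalKeysExamined, totalDocsExamined } = explain.executionStats;
  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    inMemorySort: stages.includes("SORT"),
    docsPerResult: nReturned === 0 ? null : totalDocsExamined / nReturned,
    keysPerResult: nReturned === 0 ? null : totalKeysExamined / nReturned,
  };
}

// Hand-written stand-in for real explain output.
const sample = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } },
  },
  executionStats: { nReturned: 20, totalKeysExamined: 20, totalDocsExamined: 20 },
};

const summary = summarizeExplain(sample);
console.log(summary.stages.join(" <- ")); // FETCH <- IXSCAN
console.log(summary.docsPerResult);       // 1
```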

Q10. What is a covered query?

A query is covered when:

  1. All fields in the filter are in the index.
  2. All fields returned by the projection are in that same index.
  3. You explicitly exclude _id ({_id: 0, name: 1}) unless _id is in the index.

A covered query never fetches the document — it returns straight from the index. Major win for read-heavy paths.
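
The conditions can be expressed as a small check. This is a simplified model under stated assumptions: isCovered is a hypothetical helper, it takes plain lists of field names, and it ignores further restrictions such as array (multikey) fields.

```javascript
// Return true when the filter and the projected fields all live in one index.
function isCovered(indexFields, filterFields, projection) {
  const indexed = new Set(indexFields);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id comes back by default unless the projection sets it to 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return [...filterFields, ...returned].every((f) => indexed.has(f));
}

const idx = ["status", "name"]; // stands for createIndex({ status: 1, name: 1 })
console.log(isCovered(idx, ["status"], { _id: 0, name: 1 }));  // true
console.log(isCovered(idx, ["status"], { name: 1 }));          // false: _id is returned but not indexed
console.log(isCovered(idx, ["status"], { _id: 0, email: 1 })); // false: email is not in the index
```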


4. Querying

Q11. $lookup vs application-side joins?

$lookup performs a left outer join inside the aggregation pipeline. It works, but it's expensive: the server runs a lookup against the joined collection for every input document, and without an index on the foreign field each of those lookups is a collection scan. Practical rules:

  • The foreign field must be indexed. Always.
  • Place $lookup after $match and $limit, never before, so the join runs on the smallest possible input.
  • Prefer the pipeline form ($lookup with let + pipeline) over the simple form when you want to push filters down into the join.
  • For 1:1 reference lookups in a tight loop, an app-side find({_id: {$in: [...]}}) is sometimes faster.
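
Put together, the rules above suggest a pipeline shaped roughly like the following. The orders and customers collections and all field names are assumptions for the sake of the example.

```javascript
// Pipeline sketch: shrink the input, then join with filters pushed into the lookup.
const pipeline = [
  { $match: { status: "shipped" } },
  { $sort: { createdAt: -1 } },
  { $limit: 50 },
  {
    $lookup: {
      from: "customers",
      let: { customerId: "$customerId" },
      pipeline: [
        // Join condition; customers._id is always indexed.
        { $match: { $expr: { $eq: ["$_id", "$$customerId"] } } },
        // Filter and trim inside the join instead of after it.
        { $match: { active: true } },
        { $project: { name: 1, tier: 1 } },
      ],
      as: "customer",
    },
  },
];

// In mongosh: db.orders.aggregate(pipeline)
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> ")); // $match -> $sort -> $limit -> $lookup
```

At most 50 documents reach the $lookup stage here, so the join cost is bounded no matter how large orders grows.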

Q12. What's the difference between $match and $project placement in a pipeline?

Pipeline order matters because each stage's output is the next stage's input. Two principles:

  • Push $match as early as possible so subsequent stages process fewer docs and can use an index (the query optimizer can move some $match stages before a $sort automatically, but not always).
  • Push $project early too — but only to remove large unused fields, not to compute things. Computing then matching is wasted work.

Q13. Walk me through $group, $bucket, $facet.

  • $group: classic SQL GROUP BY. { $group: { _id: "$category", total: { $sum: "$price" } } }.
  • $bucket / $bucketAuto: groups documents into a defined set of ranges (histograms).
  • $facet: runs multiple sub-pipelines over the same input documents within a single stage and returns one document with one field per sub-pipeline. The killer use case: returning both a paged result and a total count from one query.
db.products.aggregate([
  { $match: { category: 'shoes' } },
  { $facet: {
      page:  [{ $sort: { price: 1 } }, { $skip: 0 }, { $limit: 20 }],
      total: [{ $count: 'count' }]
  }}
])
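
One detail that trips people up is the shape of the result: $facet emits a single document, and the total array is empty when nothing matched. A minimal sketch of unpacking it follows; unpackPage is a hypothetical helper and the sample results are hand-written stand-ins for what the pipeline above would return.

```javascript
// Turn the single $facet result document into { items, total }.
function unpackPage(facetResult) {
  const [doc] = facetResult;
  // $count emits nothing for empty input, so `total` can be an empty array.
  const total = doc.total.length > 0 ? doc.total[0].count : 0;
  return { items: doc.page, total };
}

const withMatches = [{ page: [{ name: "Runner", price: 59 }], total: [{ count: 137 }] }];
const noMatches = [{ page: [], total: [] }];

console.log(unpackPage(withMatches).total); // 137
console.log(unpackPage(noMatches).total);   // 0
```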

Q14. $elemMatch — when do you need it?

When a single array element must satisfy several criteria at once. The difference is subtle:

// MATCHES if ANY element has score>80 AND ANY element (possibly a different one) has type='math'
db.students.find({ 'scores.score': { $gt: 80 }, 'scores.type': 'math' })

// MATCHES only if SOME element has score>80 AND type='math' together
db.students.find({ scores: { $elemMatch: { score: { $gt: 80 }, type: 'math' } } })

The first one returns a student with a math score of 70 and an art score of 90. The second wouldn't.
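
The same distinction can be reproduced without a database. This plain-JavaScript model mimics the two filters against one in-memory document; the student data is invented for the example.

```javascript
const student = {
  name: "Sam",
  scores: [
    { type: "math", score: 70 },
    { type: "art", score: 90 },
  ],
};

// Dot notation: each condition may be satisfied by a different array element.
const dotNotationMatch =
  student.scores.some((s) => s.score > 80) &&
  student.scores.some((s) => s.type === "math");

// $elemMatch: a single element must satisfy every condition.
const elemMatchMatch = student.scores.some((s) => s.score > 80 && s.type === "math");

console.log(dotNotationMatch, elemMatchMatch); // true false
```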

Q15. Pagination at scale — what's wrong with skip + limit?

skip(N) walks N documents to discard them. At N=100,000 that's a real cost on every page, and it grows linearly. The standard fix is range-based ("seek") pagination using the previous page's last sort key:

db.orders.find({ createdAt: { $lt: lastSeenCreatedAt } })
         .sort({ createdAt: -1 })
         .limit(20)

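When the sort key can contain duplicates, add _id as a tie-breaker to both the sort and the range filter. Otherwise documents that share the boundary value with the last one seen get skipped.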
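
A minimal sketch of seek pagination with an _id tie-breaker. nextPageFilter is a hypothetical helper, and the sample cursor values are invented; it assumes the query sorts by { createdAt: -1, _id: -1 }.

```javascript
// Build the filter for the next page; `last` is the final document of the previous page.
function nextPageFilter(last) {
  if (!last) return {}; // first page: no cursor yet
  return {
    $or: [
      { createdAt: { $lt: last.createdAt } },
      // Same timestamp as the last document seen: break the tie on _id.
      { createdAt: last.createdAt, _id: { $lt: last._id } },
    ],
  };
}

const last = { _id: 4120, createdAt: new Date("2026-01-15T10:00:00Z") };
const filter = nextPageFilter(last);

// In mongosh:
//   db.orders.find(filter).sort({ createdAt: -1, _id: -1 }).limit(20)
console.log(JSON.stringify(filter.$or[1]));
// {"createdAt":"2026-01-15T10:00:00.000Z","_id":{"$lt":4120}}
```

A compound index on { createdAt: -1, _id: -1 } lets both the filter and the sort be served from the index.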

Q16. findAndModify vs updateOne + read: what's the difference?

findOneAndUpdate (and the older findAndModify) atomically reads and updates a document, returning either the pre- or post-update version. The point is to avoid a race between read and update — for things like counters, work queues, or any "find first pending and mark it taken" pattern.
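
For the work-queue case, the call might be assembled like this. It is a sketch under assumptions: the jobs collection, its status/takenBy/takenAt fields, and the claimArgs helper are all invented for illustration.

```javascript
// Arguments for an atomic "claim the oldest pending job" call.
function claimArgs(workerId, now = new Date()) {
  const filter = { status: "pending" };
  const update = { $set: { status: "taken", takenBy: workerId, takenAt: now } };
  const options = { sort: { createdAt: 1 }, returnDocument: "after" };
  return [filter, update, options];
}

// In mongosh, each worker would run:
//   const job = db.jobs.findOneAndUpdate(...claimArgs("worker-1"))
// and get null back when nothing is pending.
const [filter, update, options] = claimArgs("worker-1");
console.log(filter.status, "->", update.$set.status, options.returnDocument); // pending -> taken after
```

Because the match and the update happen as one operation on one document, two workers cannot both claim the same job, which a separate find followed by updateOne cannot guarantee.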


How to Practice These Questions

Reading the answers helps, but MongoDB starts to make more sense when you test things yourself.

Try running a few real queries, then check the explain() plan. See if MongoDB uses an IXSCAN or a COLLSCAN, compare how many documents were examined, and then create a different index to see what changes.

You can do the same with aggregation stages like $match, $group, $lookup, and $facet.

For more examples, you can also check the VisuaLeaf documentation, where MongoDB queries, indexes, schema design, and explain plans are explained with practical workflows.


Where to Go Next

The questions above cover what entry-to-mid-level engineers are expected to know. The next layer — WiredTiger internals, replication trade-offs, sharding strategy, multi-document transactions, production profiling, schema migration patterns — is where senior interviews live. That material is in the companion article: MongoDB Interview Questions for Advanced Engineers.

If you can answer 80% of the questions in this list without rehearsing, the highest-leverage way to keep going is to run your own explain() against a real dataset and read the output until every counter and stage name makes sense. That's the single biggest gap between "knows MongoDB" and "ships MongoDB."