Skip to content

[Bug]: NvmExpressDxe alternate queue size mu_change may have issue when queue size is 1 #1757

@spbrogan

Description

@spbrogan

Is there an existing issue for this?

  • I have searched existing issues

Current Behavior

Two cases to review.

  1. if blockio2 isn't installed because queue size is 1 then in MdeModulePkg\Bus\Pci\NvmExpressDxe\NvmExpressPassthru.c:570 where the queue id is set based on parameter Event being null is invalid. Regardless of callers Event parameter queueid 2 can never be used when queueid 2 is not created. Need to evaluate the full path to get here to see if it is possible.

  2. AI identified this path but needs to be human reviewed.
    [FINDING 4] NvmExpressPassThru: timeout reset path crashes/skipped with NULL TimerEvent
    Confidence: High-confidence
    Category: Missing NULL check / broken error recovery
    Lines: NvmExpressPassthru.c

Why this is a real bug:
When a blocking NVMe command times out, the timeout handler attempts gBS->SetTimer(Private->TimerEvent, TimerCancel, 0). However, Private->TimerEvent is only created when NVME_SUPPORT_BLOCKIO2(Private) is true (see NvmExpress.c). When the controller allocates only 1 IO queue pair, TimerEvent is NULL.

Trigger path:

Controller allocates 1 IO queue pair → TimerEvent is NULL.
A blocking PassThru command is issued (admin or sync IO).
The command times out (Cq phase bit never flips).
Code reaches line ~862: gBS->SetTimer(Private->TimerEvent, TimerCancel, 0).
SetTimer(NULL, ...) returns EFI_INVALID_PARAMETER.
goto EXIT is taken, skipping NvmeControllerReset().
The function returns EFI_INVALID_PARAMETER instead of EFI_TIMEOUT.
Why this is NOT a false positive:
The TimerEvent == NULL case is a normal operational scenario (controller grants only 1 IO queue pair). The timer cancel/restart is only needed for async processing, not for the controller reset itself. The reset is needed regardless.

Consequence:
After a command timeout on a single-queue-pair controller, the NVMe controller is NOT reset. The timed-out command remains pending on hardware. All subsequent NVMe commands will likely fail or return stale data, rendering the storage device non-functional. The caller receives EFI_INVALID_PARAMETER (misleading) instead of EFI_TIMEOUT.

Minimal fix direction:
Guard the SetTimer calls with if (Private->TimerEvent != NULL) and proceed to controller reset unconditionally.

Expected Behavior

using the nvme driver with only blockio and a single data queue should work as expected.

Steps To Reproduce

none. code inspection

Build Environment

n/a

Version Information

mu basecore release 202511

Urgency

Low

Are you going to fix this?

I will fix it

Do you need maintainer feedback?

No maintainer feedback needed

Anything else?

No response

Metadata

Metadata

Assignees

Labels

state:needs-triageNeeds to triaged to determine next stepstype:bugSomething isn't workingurgency:lowLittle to no impact

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions