Monitor Managed DevOps Pools

Managed DevOps Pools provides several options for monitoring your pool instances. The Overview page provides predefined metrics charts, and you can configure custom charts on the Metrics page. Use these tools to monitor the health of your Managed DevOps Pools instances.

Available metrics

Managed DevOps Pools provides the following metrics:

Category: Latency

Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export
AllocationDurationMs: Average time to allocate requests (ms) | AllocationDurationMs | Milliseconds | Average | PoolId, Type, ResourceRequestType, Image | PT1M | Yes
TimeSpentInPreviousStateMs: Time spent in previous state before transitioning to current state (ms) | TimeSpentInPreviousStateMs | Milliseconds | Average, Maximum, Minimum | PoolId, Image, ImageVersion, PremountConfigurations, DataDiskType, VmPriority, PreviousState, NewState | PT1M | Yes

Category: Saturation

Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export
Allocated: Resources that are allocated | Allocated | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
Demand: Total active demand on the pool | Demand | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
NotReady: Resources that are not ready to be used | NotReady | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
PendingReimage: Resources that are pending reimage | PendingReimage | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
PendingReturn: Resources that are pending return | PendingReturn | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
Provisioned: Resources that are provisioned | Provisioned | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
Ready: Resources that are ready to be used | Ready | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
Starting: Resources that are starting | Starting | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes
Total: Total number of resources | Total | Count | Average, Maximum, Minimum | PoolId, SKU, Images, ProviderName | PT1M | Yes

Category: SaturationByCapability

Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export
Allocated: Resources that are allocated by capability | AllocatedByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
Demand: Total active demand on the pool by capability | DemandByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
NotReady: Resources that are not ready to be used by capability | NotReadyByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
PendingReimage: Resources that are pending reimage by capability | PendingReimageByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
PendingReturn: Resources that are pending return by capability | PendingReturnByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
Provisioned: Resources that are provisioned by capability | ProvisionedByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
Ready: Resources that are ready to be used by capability | ReadyByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes
Starting: Resources that are starting by capability | StartingByCapability | Count | Average, Maximum, Minimum | PoolId, Image, ImageVersion, DataDiskType, VMPriority, PremountConfigurations | PT1M | Yes

Category: Traffic

Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export
Count: Number of requests in last dump | Count | Count | Count | RequestType, Status, PoolId, Type, ErrorCode, FailureStage, Image | PT1M | Yes
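The REST API names, aggregations, and PT1M time grain listed above are the values you would pass when querying metrics programmatically. The following is a minimal sketch that only builds a query URL and sends no request. It assumes the general Azure Monitor metrics endpoint (`/providers/Microsoft.Insights/metrics` with `api-version=2018-01-01`) and a pool resource ID under `Microsoft.DevOpsInfrastructure/pools`; neither is stated in this article, and the subscription, resource group, and pool names are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode

# Assumptions (not from this article): ARM endpoint and metrics api-version.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-01-01"

# Hypothetical pool resource ID; replace every segment with your own values.
POOL_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg"
    "/providers/Microsoft.DevOpsInfrastructure/pools/my-pool"
)


def build_metrics_url(resource_id, metric_names, aggregations, timespan, interval="PT1M"):
    """Return a metrics query URL for one resource. No request is sent."""
    query = {
        "api-version": API_VERSION,
        "metricnames": ",".join(metric_names),   # "Name in REST API" column
        "aggregation": ",".join(aggregations),   # "Aggregation" column
        "interval": interval,                    # "Time Grains" column
        "timespan": timespan,                    # ISO 8601 start/end
    }
    # Keep commas, slashes, and colons readable instead of percent-encoding them.
    encoded = urlencode(query, quote_via=quote, safe=",/:")
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{encoded}"


example_url = build_metrics_url(
    POOL_ID,
    metric_names=["Ready", "Allocated"],
    aggregations=["Average", "Maximum"],
    timespan="2024-02-15T21:00:00Z/2024-02-15T22:00:00Z",
)
print(example_url)
```

An authenticated GET against the resulting URL would be needed to retrieve data; authentication is outside the scope of this sketch.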

Dimension filters and splitting

Azure Monitor supports filtering and splitting for metrics that have dimensions. Managed DevOps Pools provides the following dimensions. See the preceding tables for the dimensions that apply to a particular metric.

Dimension Description
DataDiskType Data disk type attached to the resource
ErrorCode One of the error codes listed in Error codes
FailureStage Stage of provisioning at which a request failed (used to group provisioning failures)
Image Image name
Images List of images
ImageVersion Version of the image
NewState State that the resource transitioned to
PoolId Name of Managed DevOps Pool
PremountConfigurations Premount configurations applied to the resource
PreviousState State that the resource transitioned from
ProviderName CI/CD provider (AzureProvider is currently the only provider)
RequestType Type of request made against the pool (for example, allocate, return, or reimage)
ResourceRequestType Type of resource allocation request being timed
SKU VM size
Status Agent status
Type
VMPriority VM priority (for example, Regular or Spot)

Filtering lets you choose which dimension values are included in the chart. For example, you might want to show only successful requests when you chart the Count metric (the total number of agents provisioned). To do so, apply a filter on the Status dimension.

Splitting controls whether the chart displays separate lines for each value of a dimension or aggregates the values into a single line. Splitting allows you to visualize how different segments of the metric compare with each other. For example, you can see one line for the average AllocationDurationMs across all pools, or you can see separate lines for each pool.
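Outside the portal, filtering and splitting are commonly expressed as a `$filter` string in an Azure Monitor metrics query. The sketch below assumes that general `$filter` syntax (a concrete value filters, and `'*'` requests one series per dimension value); it is not taken from this article. The helper name `dimension_filter` is hypothetical, and the Status values come from the Completed/Failed grouping described for the Pool Provisioning Health chart.

```python
def dimension_filter(**dimensions):
    """Combine dimension conditions into one $filter expression.

    A concrete value keeps only that value (filtering).
    The value "*" asks for a separate series per value (splitting).
    """
    clauses = [f"{name} eq '{value}'" for name, value in dimensions.items()]
    return " and ".join(clauses)


# Filtering: chart only successful requests for the Count metric.
only_completed = dimension_filter(Status="Completed")

# Splitting: one AllocationDurationMs line per pool.
per_pool = dimension_filter(PoolId="*")

# Both at once: failed requests, split by error code.
failed_by_error = dimension_filter(Status="Failed", ErrorCode="*")

print(only_completed)
print(per_pool)
print(failed_by_error)
```

Only dimensions listed for a given metric in the metric tables can be used in its filter.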

For more information, see Analyze Metrics, Use dimension filters and splitting.

View metrics on the Managed DevOps Pool Overview

The Overview page for your Managed DevOps Pool contains the following predefined metrics charts, which can be set to display metrics for the past hour, day, 7 days, or 30 days.

You can customize the charts or create your own. For more information, see Analyze metrics, Create a metric chart.

Pool Usage chart

The Pool Usage chart displays the following metrics.

  • Starting: Count of agents starting up and preparing to accept jobs.
  • Ready: Count of agents online and ready to accept jobs.
  • Allocated: Count of agents currently running jobs.
  • NotReady: Count of stateful agents that have completed a job but are not yet ready to accept a new job.
  • PendingReimage: Count of agents that have completed a job and are preparing to be reimaged. This status is typical if you have your pool configured for stateless agents with standby agent mode enabled.
  • PendingReturn: Count of agents that are post-cleanup and waiting to be deleted (deletion occurs in batches).
  • Provisioned: Count of online agents.
  • Total: Total number of agents.

Pool Provisioning Health chart

The Pool Provisioning Health chart displays the following metrics.

  • Count - Total number of agents provisioned, grouped by status (Completed/Failed)

Request Durations chart

The Request Durations chart displays the following metrics.

  • AllocationDurationMs - Average pool request duration

Failure Stages chart

The Failure Stages chart displays the following metrics.

  • Count - Total number of agents that failed to provision, grouped by FailureStage

Error Codes chart

The Error Codes chart displays the following metrics.

  • Count - Total number of agents that failed to provision, grouped by ErrorCode

For a list of error codes, see the following Error codes section.
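The grouping that the Failure Stages and Error Codes charts apply to the Count metric can be sketched locally as follows. The sample records are hypothetical and only illustrate the shape of the calculation: each pairs a set of dimension values with a request count for one time grain. The function name `failures_by` is also an assumption for illustration, and the error codes are taken from the Error codes table.

```python
from collections import defaultdict

# Hypothetical per-minute samples of the Count metric: (dimension values, count).
SAMPLES = [
    ({"Status": "Completed"}, 12),
    ({"Status": "Failed", "ErrorCode": "ExceedingQuota"}, 3),
    ({"Status": "Failed", "ErrorCode": "InsufficientCapacity"}, 1),
    ({"Status": "Failed", "ErrorCode": "ExceedingQuota"}, 2),
]


def failures_by(samples, dimension):
    """Sum the counts of failed requests for each value of `dimension`."""
    totals = defaultdict(int)
    for dims, count in samples:
        if dims.get("Status") == "Failed":
            # Samples without the dimension are grouped under "unknown".
            totals[dims.get(dimension, "unknown")] += count
    return dict(totals)


print(failures_by(SAMPLES, "ErrorCode"))
```

Passing "FailureStage" instead of "ErrorCode" would produce the grouping used by the Failure Stages chart, provided the samples carry that dimension.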

Error codes

Error code Error message
AzureInternalServerError The VM allocation failed due to an internal error. Retry later or try deploying to a different location.
ClusterOutOfCapacity Allocation failed. Note that allocation for this subscription is constrained to a set of clusters, which may be out of capacity. To remove the cluster constraint, contact the subscription administrator or Microsoft Support. Read more about improving likelihood of allocation success at https://aka.ms/allocation-guidance.
CustomScriptError VM reported a failure when processing extension 'customScript' (publisher 'Microsoft.Compute' and type 'CustomScriptExtension'). Error message: 'Finished executing command'. More information on troubleshooting is available at https://aka.ms/VMExtensionCSEWindowsTroubleshoot.
DiskProcessingTimeout The processing of VM '...' is halted because of one or more disk processing errors encountered by VM '...' in the same Availability Set. Resolve the error with VM '...' before retrying the operation. For more information, refer to https://aka.ms/activitylog.
EndpointNotFound 404 - There are no listeners connected for the endpoint. TrackingId:00000000-0000-0000-0000-0000000000, SystemTracker:tipresourceprovider.servicebus.windows.net:tipresourceproviderconnection/pools/es_tap_prime_cus_d4ds, Timestamp:2024-02-15T21:15:57
ExceedingQuota Quota exceeded.
FailedToRetrieveUserPassword Failed to retrieve user password ... from Key Vault
ForbiddenByFirewall Forbidden
HTTPResponseBodyNotAvailable HTTP response body isn't available
ImageNotFound The image could not be found. Check the image and the version exists
ImageRemovedFromPool The given key was not present in the dictionary
ImageThrottling Too many simultaneous copy requests from a snapshot or image resource. Retry later.
InstallationOfWindowsUndeployable OS provisioning for VM failed. Error details: This installation of Windows is undeployable. Make sure the image is properly prepared (generalized). Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image/
InsufficientCapacity Allocation failed. We do not have sufficient capacity for the requested VM size in this region. Read more about improving likelihood of allocation success at https://aka.ms/allocation-guidance
InvalidSubnetDelegation Subnet /subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.Network/virtualNetworks/{vnetName}/subnets/{subnetName} referenced by /subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.Compute/virtualMachineScaleSets/{}/updateGroups/version1/networkInterfaceConfigurations/nic/ipConfigurations/ipconfig can't be used because it contains external resources.
NetworkProfileProcessingTimeout An unexpected error occurred while processing the network profile of the VM. Retry later.
ProvisioningTimeOut Resource subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.Network/networkInterfaces/providers/Microsoft.Compute/virtualMachineScaleSets/{}/virtualMachines/networkInterfaces/nic not found. OS Provisioning for VM did not finish in the allotted time. The VM may still finish provisioning successfully. Check provisioning state later. Also, make sure the image has been properly prepared (generalized). Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image/ Instructions for Linux: https://azure.microsoft.com/documentation/articles/virtual-machines-linux-capture-image/ If you are deploying more than 20 Virtual Machines concurrently, consider moving your custom image to shared image gallery. Refer to https://aka.ms/movetosig for the same.
RemoteNameCantBeResolved
ResourceGroupBeingDeleted The resource group ... is in deprovisioning state and can't perform this operation.
SecretDisabled Operation get isn't allowed on a disabled secret. Status: 403 (Forbidden) ErrorCode: Forbidden
ServiceUnavailable The service is unavailable now. Retry the request later.
SkuNotAvailable The requested VM size for resource 'Following SKUs failed for Capacity Restrictions:' is currently not available in location. Try another size or deploy to a different location or different zone. See https://aka.ms/azureskunotavailable for details.
TaskCanceled The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
VirtualNetworkIsNotFound The Virtual Network might be deleted.
WorkerSetupFailed, UnableToDownloadWorkerCheckNetwork, UnableToDownloadWorkerCheckNetwork[<endpoint>] The Network infrastructure is blocking access to one of the prerequisite endpoints.
UnableToDownloadWorkerCheckNetwork_TLSIssue TLS Handshake failed when contacting prerequisite endpoints.
