Idle VM Auto-Shutdown
This feature is designed to enhance cost efficiency and resource utilization within your Google Cloud Platform (GCP) environment.
Introduction
The Idle VM Auto-Shutdown is an automated system developed to identify and manage underutilized Virtual Machines (VMs) within your GCP projects. Its primary goal is to optimize resource allocation, significantly reduce operational costs associated with idle compute resources, and improve overall cloud efficiency.
By proactively detecting VMs that are consuming resources without active workloads, this system helps prevent unnecessary expenditure on infrastructure that is not actively contributing value.
Importance and Benefits
Implementing the Idle VM Auto-Shutdown offers several key advantages:
- Cost Optimization: Directly reduces GCP billing by automatically stopping or notifying about VMs that are not in active use. This is particularly impactful for development, testing, and staging environments where VMs are often left running outside of working hours.
- Enhanced Resource Utilization: Ensures that compute resources are efficiently allocated and consumed, freeing up capacity and reducing the overall footprint of idle infrastructure.
- Operational Efficiency: Streamlines cloud resource management by automating the identification and handling of idle VMs, reducing manual oversight.
- Proactive Management: Provides a mechanism for continuous monitoring and management of VM lifecycles based on actual usage patterns.
Supported Services
The Idle VM Auto-Shutdown feature currently supports the following Google Cloud Platform services:
- Compute Engine: All standalone Compute Engine Virtual Machine instances.
- JupyterHub Services: Instances provisioned specifically for JupyterHub environments.
Future expansions to include additional GCP compute services are planned.
Idleness Detection Criteria
The system evaluates more than one OS-reported metric to determine VM idleness, which reduces false positives. Idleness is assessed over a defined observation window so that only sustained inactivity counts.
The primary metrics utilized are:
Linux
- CPU Utilization (OS Reported): A VM is considered CPU-idle if, over a 2-hour rolling window, its CPU time spent in the idle state is consistently greater than 80% for a continuous period of 7200 seconds (2 hours). This metric leverages OS-level cpu_state data.
- Memory Utilization (OS Reported): A VM is considered memory-idle if, over a 2-hour rolling window, the percentage of actively used memory (excluding cached/buffered memory) is consistently less than 20% for a continuous period of 7200 seconds (2 hours).
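The two criteria above can be illustrated with a minimal sketch. It is not the system's actual implementation: the Sample type, its field names, and the helper functions are hypothetical, and the sketch assumes samples arrive sorted by timestamp. How the CPU and memory results are combined is not stated here, so each check is kept separate.

```python
from dataclasses import dataclass
from typing import Sequence

WINDOW_SECONDS = 7200            # 2-hour observation window
CPU_IDLE_MIN_PERCENT = 80.0      # CPU idle time must stay above this value
MEMORY_USED_MAX_PERCENT = 20.0   # used memory must stay below this value


@dataclass(frozen=True)
class Sample:
    """One hypothetical metric reading; field names are illustrative only."""
    timestamp: int               # seconds since epoch
    cpu_idle_percent: float      # share of CPU time spent in the idle state
    memory_used_percent: float   # used memory, excluding cached/buffered memory


def _covers_window(samples: Sequence[Sample]) -> bool:
    """Return True if the sorted samples span at least the full window."""
    if len(samples) < 2:
        return False
    return samples[-1].timestamp - samples[0].timestamp >= WINDOW_SECONDS


def is_cpu_idle(samples: Sequence[Sample]) -> bool:
    """CPU-idle only if every sample in a full window exceeds the threshold."""
    return _covers_window(samples) and all(
        s.cpu_idle_percent > CPU_IDLE_MIN_PERCENT for s in samples
    )


def is_memory_idle(samples: Sequence[Sample]) -> bool:
    """Memory-idle only if every sample in a full window is below the threshold."""
    return _covers_window(samples) and all(
        s.memory_used_percent < MEMORY_USED_MAX_PERCENT for s in samples
    )
```

A single sample that breaks a threshold, or a window shorter than 7200 seconds, makes the corresponding check return False, which matches the requirement for sustained inactivity.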
Policy Differentiation:
- [Preview] Idle Shutdown: This policy is applied to VMs with the user label idle-shutdown = "preview". It allows users to observe the behavior and potential impact of the idle shutdown mechanism without immediate enforcement.
- [Enforced] Idle Shutdown: This policy is applied to VMs with the user label idle-shutdown = "enforced". VMs meeting the idle criteria under this policy will be subject to the configured shutdown actions.
Under both policies, the alert triggers only when the threshold is met for 100% of the evaluation period.
Configuration and Opt-In Options
The Idle VM Auto-Shutdown feature is configured via resource labels applied directly to your VM instances. There are two primary operational modes: Preview and Enforced.
Default Behavior
- Service Catalog Resources: VMs provisioned through the [Your Company Name] service catalog for Compute Engine, JupyterHub, and 3D Slicer are by default opted into idle-shutdown: preview mode.
- Manually Created Resources: Manually created Compute Engine instances are not opted into this feature by default. Manual labeling is required to enable monitoring.
Opting into Enforced Mode
In Enforced mode, the system will automatically shut down an idle VM after detecting sustained idleness based on the defined criteria. Users will typically receive a notification prior to the shutdown action.
To enable Enforced mode for a VM:
- Navigate to the specific VM instance in the Google Cloud Console.
- Locate the Labels section.
- Add or modify the label with the following key-value pair:
  - Key: idle-shutdown
  - Value: enforced
Example: idle-shutdown: enforced
Opting into Preview Mode
Preview mode enables idleness detection and user notification without performing an automatic shutdown. This mode is ideal for evaluating the feature's impact and identifying potential cost savings before enabling automatic actions.
To enable Preview mode for a VM:
- Navigate to the specific VM instance in the Google Cloud Console.
- Locate the Labels section.
- Add or modify the label with the following key-value pair:
  - Key: idle-shutdown
  - Value: preview
Example: idle-shutdown: preview
Opting Out of the Feature
To disable the Idle VM Auto-Shutdown feature for a specific VM, preventing it from being monitored or managed by the system:
- Navigate to the specific VM instance in the Google Cloud Console.
- Locate the Labels section.
- Remove the idle-shutdown label entirely from the VM.
Once the idle-shutdown label is removed, the VM will no longer be subject to idleness detection or auto-shutdown actions.
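The same label changes can also be scripted instead of made in the Console. The sketch below builds the corresponding gcloud commands (gcloud compute instances add-labels and remove-labels). It assumes the gcloud CLI is installed and authenticated; the function names, instance name, and zone are hypothetical and used only for illustration.

```python
import subprocess
from typing import List, Optional

LABEL_KEY = "idle-shutdown"
VALID_MODES = ("preview", "enforced")


def build_label_command(instance: str, zone: str, mode: Optional[str]) -> List[str]:
    """Return the gcloud argv that sets the mode, or removes the label if mode is None."""
    if mode is None:
        # Opt out: remove the idle-shutdown label entirely.
        return [
            "gcloud", "compute", "instances", "remove-labels", instance,
            f"--zone={zone}", f"--labels={LABEL_KEY}",
        ]
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES} or None, got {mode!r}")
    # Opt in: set idle-shutdown to preview or enforced.
    return [
        "gcloud", "compute", "instances", "add-labels", instance,
        f"--zone={zone}", f"--labels={LABEL_KEY}={mode}",
    ]


def apply_label(instance: str, zone: str, mode: Optional[str]) -> None:
    """Run the command; raises CalledProcessError if gcloud exits with an error."""
    subprocess.run(build_label_command(instance, zone, mode), check=True)
```

For example, apply_label("my-vm", "us-central1-a", "enforced") opts a hypothetical instance into Enforced mode, and passing None as the mode opts it out.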
Notifications
Users will receive notifications regarding idle VM detections and impending shutdowns via in-app notifications.
- Preview Mode: Notifications will alert you that a VM has been detected as idle and would have been shut down if in Enforced mode.
- Enforced Mode: Notifications will alert you that a VM has been detected as idle and will be shut down shortly.
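The mode-dependent behavior described in this document can be summarized in a short sketch. The function names and the action strings are hypothetical, and treating an unrecognized label value the same as a missing label is an assumption, not documented behavior.

```python
from typing import List, Mapping, Optional

LABEL_KEY = "idle-shutdown"


def resolve_mode(labels: Mapping[str, str]) -> Optional[str]:
    """Return "preview" or "enforced", or None if the VM is not opted in."""
    value = labels.get(LABEL_KEY)
    # Assumption: any other value is treated like a missing label.
    return value if value in ("preview", "enforced") else None


def plan_actions(labels: Mapping[str, str], is_idle: bool) -> List[str]:
    """List the actions taken for a VM, given its labels and idleness result."""
    mode = resolve_mode(labels)
    if mode is None or not is_idle:
        return []                      # not monitored, or not idle
    if mode == "preview":
        return ["notify"]              # notification only, no shutdown
    return ["notify", "shutdown"]      # enforced: notify, then shut down
```

A VM labeled idle-shutdown: preview that is detected as idle yields only a notification, while the same VM labeled enforced yields a notification followed by a shutdown.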
Support and Further Information
For any questions, assistance with configuration, or to provide feedback on the Idle VM Auto-Shutdown feature, please contact us via the support tool.
Other Services
- Vertex AI Workbench: Workbenches include built-in idle shutdown functionality. It is enabled by default at creation time with a default of 30 minutes, and it can be updated via the Google Cloud Platform console.