Troubleshooting VMware Networks: Traffic Congestion
This is the first in a series of blog posts looking at specific challenges IT teams face when troubleshooting VMware environments.
Network congestion is a common problem. For example, you may have enough memory and CPU cores to deploy 25 virtual machines on a single server, but find that the server's only network interface card (NIC) port is constantly saturated. This can cause VMs to report network errors or prevent them from communicating at all.
Monitoring the intra-VM traffic within your virtual environment can make congestion worse, because the monitoring itself can add significant traffic to those NICs.
Congestion Causes Network Latency
Before virtualization, a single application on a single server would typically use only a fraction of the server's network bandwidth. As multiple VMs spin up on a virtualized server, each VM demands some of the available network bandwidth. Many servers are fitted with only a single NIC port, and it doesn't take long for the combined traffic of those VMs to overwhelm it. Workloads sensitive to network latency may report errors, drop packets or even crash.
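To make the arithmetic concrete, here is a minimal Python sketch that compares the combined bandwidth demand of the VMs on a host with the capacity of its NIC port. The figures (25 VMs averaging 60 Mbps each on one 1 Gbps port) are hypothetical and chosen only for illustration, and the function name is an assumption rather than part of any VMware tool.

```python
def nic_utilization(vm_demands_mbps, nic_capacity_mbps):
    """Return aggregate VM demand as a fraction of NIC capacity.

    A result above 1.0 means the VMs are asking for more bandwidth
    than the port can carry, so traffic will queue or be dropped.
    """
    return sum(vm_demands_mbps) / nic_capacity_mbps


# Hypothetical host: 25 VMs, each averaging 60 Mbps, one 1 Gbps port.
demands = [60] * 25
utilization = nic_utilization(demands, nic_capacity_mbps=1000)
print(f"Aggregate demand: {sum(demands)} Mbps")
print(f"NIC utilization: {utilization:.0%}")
```

With these assumed numbers the VMs ask for 1,500 Mbps from a 1,000 Mbps port, so the NIC is oversubscribed by half even before any monitoring traffic is added.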
Resolving the Issue
Here are two ways to relieve the congestion:
- Add more NICs to your VM server
- Use a VMware DRS cluster of ESXi hosts to balance VMs across multiple servers
Each of these solutions results in more NICs across your virtual servers. Those NICs consume more ports on existing network switches, if any are available. It may be necessary to upgrade or add network switches to support the additional traffic and connectivity. This requires the attention of a network architect, who should be involved in the virtualization and consolidation effort from the earliest planning phase.
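As a rough planning aid, the sketch below estimates how many NIC ports a host would need to keep utilization under a chosen ceiling, which in turn shows how many switch ports the change would consume. The 70% ceiling, the traffic figures and the function name are assumptions made for illustration; real sizing should be based on measured traffic.

```python
import math


def ports_needed(total_demand_mbps, port_capacity_mbps, max_utilization=0.7):
    """Return the number of NIC ports needed to stay under a utilization ceiling.

    max_utilization is an assumed planning target (70% here), leaving
    headroom for traffic bursts and monitoring overhead.
    """
    usable_per_port = port_capacity_mbps * max_utilization
    return max(1, math.ceil(total_demand_mbps / usable_per_port))


# Hypothetical host carrying 1,500 Mbps of VM traffic on 1 Gbps ports.
ports = ports_needed(total_demand_mbps=1500, port_capacity_mbps=1000)
print(f"NIC ports needed on the host: {ports}")
print(f"Switch ports consumed: {ports}")
```

Under these assumptions a single-port host would need three ports, which means two additional switch ports for that host alone. Repeating the estimate per host gives the network architect an early view of whether the existing switches have enough free ports.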
Reducing the Impact
As a business grows, it will continue to need more hardware and switches. It may also spend more on network performance and security tools because of the growing volume of traffic they need to analyze. That means more intra-VM traffic to monitor, which raises the risk of network congestion and, in turn, of increased latency, errors, dropped packets and crashes.
How can you reduce the negative impact of monitoring mission-critical intra-VM network traffic? What if you could capture, filter, slice and tunnel intra-VM traffic?