Oct 11
Application Performance Troubleshooting
Posted by Chris Marshall

I was recently called in to assist a customer with identifying the cause of slow performance for an application within their environment. I don’t know about you, but I love a good networking whodunit, so I could not resist accepting the case. The following blog post documents my experience and findings while working this investigation, which will from here on out be known as “The Case of the Slow TCP Retransmissions”.

This engagement gave me the chance to dust off and bust out one of my favorite network troubleshooting tools: Wireshark. Wireshark is a free and open-source packet analyzer. In an IT world where we are continually hearing about the benefits of abstracting away complexity, Wireshark allows us to really peek under the hood and see the low-level packet headers in all their gory detail.

But before we roll up our sleeves and dive into the technical stuff, I first needed to find out some background information about this application and the issue at hand. Just like in a real-life crime thriller, I did this by interviewing our witnesses, namely the application support team and the end users.

My interrogation of our witnesses yielded the following information:

  1. The application in question is used for a barcode inventory system.
  2. Application infrastructure is deployed on premises and is made up of the following components:
    1. One (1) SQL back end Database
    2. Three (3) Front End Servers
    3. Two (2) Application Delivery Controllers (active/passive load balancers)
  3. Application infrastructure is all deployed at the customer’s main HQ.
  4. Slow performance is experienced when connecting to the application from remote branch locations. One branch, connected to the HQ via a point-to-point wireless bridge, is particularly bad.
  5. No degradation of performance for users located at the HQ (although there are not many active users there).
  6. Client-server communication is performed over TCP ports 4003 (fat client) and 23 (terminal).
  7. Slow performance was only experienced over the terminal application path (Telnet protocol).
  8. Issue started around one month ago.
  9. Changes in the environment:
    1. Application recently upgraded and migrated to new beefier hardware.
    2. Application placed behind ADC as part of this upgrade.
    3. Video traffic has recently been introduced onto the network.
  10. Slowness can be mitigated by configuring clients to communicate directly with the application front end servers.
  11. The issue can be easily replicated. Login times have gone from seconds to minutes – This is great news and will greatly simplify our troubleshooting efforts.

At this stage in the investigation I try to remain as impartial as possible and hold off on judgment until we have collected our evidence and looked at all the facts. With that being said, it’s hard to ignore a couple of answers (9 and 10) that really jump out and give us our first two prime suspects.

  • An Application Delivery Controller (AKA load balancer) has been added into the environment – Prime Suspect, since clients configured to bypass these devices do not experience the slowness.
  • Video traffic has been introduced onto the network – Prime Suspect, since performance issues are only reported from remote sites, leading me to believe that the WAN plays some role in this issue. The WAN is usually our point of bandwidth contention, and real-time video traffic is big and bursty in nature, so it could well contribute to WAN bandwidth and buffer exhaustion.

With our interviews complete, my next step was to perform a discovery of the environment. The aim of the discovery is to document the current environment and map out our application traffic flows. I’m also looking at the hardware that our traffic passes through, as this will influence where and how we are going to capture our packets. Tools such as SPAN, RSPAN, ERSPAN, Embedded Packet Capture (EPC), Embedded Wireshark and ASA Packet Capture are available on various Cisco switching, routing and firewall platforms. I will not go into detail regarding these tools as each of them could warrant their own individual blog post, but I will say that when I find equipment in the path that is capable of on-board packet captures, such as ASA firewalls, ISR4K routers and Catalyst 9K switches, then I am a very happy detective.
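When none of those on-box tools is available in the path, a host-side capture can always be scripted on a machine near the traffic. Below is a minimal sketch of that idea using Python and Scapy; the interface name and output file are illustrative assumptions, and the capture filter simply matches the two application ports identified during the interviews (TCP 23 and 4003).

```python
# Minimal host-side capture sketch (assumes Scapy is installed and the script
# runs with privileges to sniff). Interface and file names are illustrative;
# the BPF filter matches the application's terminal and fat-client ports.
from scapy.all import sniff, wrpcap

CAPTURE_FILTER = "tcp port 23 or tcp port 4003"

# Stop after 10,000 packets or 5 minutes, whichever comes first.
packets = sniff(iface="eth0", filter=CAPTURE_FILTER, count=10000, timeout=300)
wrpcap("app_capture.pcap", packets)
print(f"Wrote {len(packets)} packets to app_capture.pcap")
```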

Discovery Results

Discovery was performed on the customer’s environment with a focus on recording the application flow and connectivity between the customer’s HQ data center (the application-hosted site) and the slowest branch site identified during our interview process (point 4 above).

The below Layer 1 and 2 diagram (Image A) was created outlining the application infrastructure connectivity within the HQ data center.

Application Performance Troubleshooting Discovery.png

The above diagram (Image A) shows the physical cabling for the active ADC load balancer (a single 1Gb connection) and the three front end application servers, each housed on a separate ESXi host. The ADC and the app servers are all Layer 2 adjacent on VLAN 4 (subnet = 172.16.4.X/24).

The below Layer 3 Diagram (Image B) was created outlining the application connectivity between the slow branch site (WAN connectivity to HQ via P2P wireless bridge) and the application servers back at the HQ data center.

Application Performance Troubleshooting Layer 3.png

The application servers’ default gateway is set to the ADC self IP address 172.16.4.15, which in turn forwards packets on to the firewall’s .1 interface for routing. Client connections to the application servers are directed to the ADC virtual server IP address 172.16.4.106.

The primary WAN connectivity between the branch office and the HQ data center is provided via a 54Mbps wireless bridge.

Latency and loss over this circuit were measured during testing, with the results presented in the table below:

Circuit | Latency | Loss
--- | --- | ---
Wireless Bridge | 2ms | 2%
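The post does not detail how this measurement was taken; for anyone wanting to sample a similar latency/loss figure, the hedged Python/Scapy sketch below sends a train of ICMP echo requests and tallies round-trip time and loss. The target address, probe count and spacing are illustrative assumptions.

```python
# Rough latency/loss probe sketch using Scapy ICMP echo requests.
# Target address, probe count and spacing are illustrative assumptions.
from scapy.all import IP, ICMP, sr

TARGET = "172.16.4.1"   # assumed address on the far side of the bridge
COUNT = 100

# A tuple field value makes Scapy generate one echo request per seq number.
answered, unanswered = sr(IP(dst=TARGET) / ICMP(seq=(1, COUNT)),
                          inter=0.2, timeout=2, verbose=False)

rtts_ms = [(rx.time - tx.sent_time) * 1000 for tx, rx in answered]
loss_pct = 100.0 * len(unanswered) / COUNT

if rtts_ms:
    print(f"average RTT: {sum(rtts_ms) / len(rtts_ms):.1f} ms")
print(f"loss: {loss_pct:.1f}%")
```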

Testing Methodology

With discovery complete my next step is to devise a test plan and implement it. We know from our interview with the IT team that the slow performance is easily reproduced, and we also know that the issue can be mitigated by circumnavigating the ADC virtual IP address. With these points in mind I decided on the following testing methodology:

Set up packet captures at four locations (Image C) along the traffic path between clients located at the branch office and the HQ data center.

  • Capture Point A (HQ Data Center):
    • Client-Side capture before crossing WAN
    • Equipment: Cisco 2960X (Local SPAN)
  • Capture Point B (HQ Data Center):
    • Server-Side capture before firewall inspection
    • Equipment: ASA5585 (Embedded Packet Capture)
  • Capture Point C (HQ Data Center):
    • Server-Side capture post firewall inspection, before ADC
    • Equipment: ASA5585 (Embedded Packet Capture)
  • Capture Point D (Branch Office):
    • Client-Side capture before crossing WAN
    • Equipment: Cisco Nexus (Local SPAN)

Application Performance Troubleshooting Setup Packet.png

Configure client A to utilize the ADC virtual server address (.106) and complete two logins to the application. Record timings and collect captures from all four locations for analysis.

Configure client B to utilize the application server direct address (.107) and complete two logins to the application. Record timings and collect captures from all four locations for analysis.

Test Results

Table A below defines the connection settings recorded for each of our tests 1 through 4. The time recorded was taken manually via a stopwatch. The results highlight the large delay experienced when logging into the application through the ADC virtual server. This confirms what our users and IT teams have been reporting.

Test # | Source | Source Port | Target | Target Port | Time (sec)
--- | --- | --- | --- | --- | ---
1 | Client A (.222) | 51054 | ADC (.106) | 23 | 191
2 | Client A (.222) | 51090 | ADC (.106) | 23 | 26
3 | Client B (.223) | 51120 | App Front End Server 1 (.107) | 23 | 3
4 | Client B (.223) | 51120 | App Front End Server 1 (.107) | 23 | 2

The below tables B through E present the statistics taken from each of our four captures per test. I have grouped these tables by capture location. Tables F through I display the same statistics, this time grouped by test number.

Data points of interest from the below tables are as follows:

Average Packets Per Second: Note the disparity between the ADC tests (low number) and the direct server tests (high number).

TCP Lost Segments: These numbers represent the number of lost/missing packets in the capture, along with the percentage of the capture as a whole that they represent. We only see lost segments at capture point A (Lodi Distribution Center). This confirms that all other capture points (B through D) are upstream of the packet loss. The percentages reported (1.3% through 2.9%) align with the 2% packet loss measured over the wireless bridge and are what I would expect to see.

TCP Retransmissions: The number of packets having to be resent. Notice the high packet count and percentage of retransmissions during the ADC tests versus the direct-to-server tests.

It should also be noted that we did not see any retransmissions at capture point D for the ADC-connected tests. This confirms that the ADC is proxying the client-server connection by setting up two TCP sessions: one between the server and the ADC, and a second between the ADC and the client. This is represented in “Image C” above by the two separate red arrows.

Packet Size: Average packet sizes across all captures and tests are small (all under 100 bytes), which is expected given the interactive nature of Telnet. What should be noted is that the ADC tests do not show packet sizes above 100 bytes, while the direct-to-server tests show packets utilizing the maximum segment size of 1380 bytes.

Time Span: Time in seconds from the first packet in the capture to the last.
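The statistics in the tables were pulled from Wireshark; for reference, the sketch below shows one way similar per-capture numbers could be tallied with Python and Scapy. The retransmission check here (a repeated data-carrying sequence number) is a simplification of Wireshark's own analysis flags, and the file name is an illustrative assumption.

```python
# Sketch: rough per-capture statistics (packets, pps, packet sizes, a naive
# retransmission count, and time span) from a pcap file using Scapy.
from scapy.all import rdpcap, IP, TCP

packets = rdpcap("capture_point_A_test1.pcap")   # illustrative file name

sizes = [len(p) for p in packets]
span = packets[-1].time - packets[0].time        # seconds, first to last packet

seen = set()
retransmissions = 0
for p in packets:
    if p.haslayer(IP) and p.haslayer(TCP) and len(p[TCP].payload) > 0:
        key = (p[IP].src, p[IP].dst, p[TCP].sport, p[TCP].dport, p[TCP].seq)
        if key in seen:
            retransmissions += 1    # same data-carrying segment seen again
        seen.add(key)

print(f"packets:             {len(packets)}")
print(f"average pps:         {len(packets) / span:.1f}")
print(f"smallest/largest/average packet (B): "
      f"{min(sizes)}/{max(sizes)}/{sum(sizes) // len(sizes)}")
print(f"TCP retransmissions: {retransmissions} "
      f"({100.0 * retransmissions / len(packets):.1f}%)")
print(f"time span (sec):     {float(span):.0f}")
```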

Packet Capture Statistics Grouped by Capture Location

Capture Point A

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | A | 3353 | 15 | 60 | 93 | 68 | 45 (1.3%) | 740 (22.1%) | 227
2 | A | 1355 | 24 | 60 | 93 | 68 | 18 (1.3%) | 198 (14.6%) | 56
3 | A | 1576 | 56 | 60 | 1434 | 79 | 45 (2.9%) | 9 (0.6%) | 28
4 | A | 617 | 40 | 60 | 1434 | 75 | 12 (1.9%) | 5 (0.8%) | 15

Capture Point B

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | B | 3563 | 16 | 58 | 97 | 70 | 0 | 824 (23.1%) | 227
2 | B | 1490 | 4 | 58 | 97 | 70 | 0 | 245 (16.4%) | 358
3 | B | 1647 | 59 | 58 | 1438 | 81 | 0 | 9 (0.5%) | 28
4 | B | 766 | 49 | 58 | 1438 | 81 | 0 | 10 (1.3%) | 16

Capture Point C

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | C | 3563 | 16 | 58 | 97 | 70 | 0 | 824 (23.1%) | 227
2 | C | 1490 | 4 | 58 | 97 | 70 | 0 | 245 (16.4%) | 358
3 | C | 1647 | 59 | 58 | 1438 | 81 | 0 | 9 (0.5%) | 28
4 | C | 766 | 49 | 58 | 1438 | 81 | 0 | 10 (1.3%) | 16

Capture Point D

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | D | 2301 | 10 | 60 | 93 | 62 | 0 | 0 | 227
2 | D | 1322 | 4 | 60 | 93 | 64 | 0 | 0 | 358
3 | D | 1647 | 59 | 56 | 1434 | 78 | 0 | 9 (0.5%) | 28
4 | D | 767 | 49 | 56 | 1434 | 78 | 0 | 10 (1.3%) | 16

Packet Capture Statistics Grouped by Test Number

Test 1

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | A | 3353 | 15 | 60 | 93 | 68 | 45 (1.3%) | 740 (22.1%) | 227
1 | B | 3563 | 16 | 58 | 97 | 70 | 0 | 824 (23.1%) | 227
1 | C | 3563 | 16 | 58 | 97 | 70 | 0 | 824 (23.1%) | 227
1 | D | 2301 | 10 | 60 | 93 | 62 | 0 | 0 | 227

Test 2

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
2 | A | 1355 | 24 | 60 | 93 | 68 | 18 (1.3%) | 198 (14.6%) | 56
2 | B | 1490 | 4 | 58 | 97 | 70 | 0 | 245 (16.4%) | 358
2 | C | 1490 | 4 | 58 | 97 | 70 | 0 | 245 (16.4%) | 358
2 | D | 1322 | 4 | 60 | 93 | 64 | 0 | 0 | 358

Test 3

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
3 | A | 1576 | 56 | 60 | 1434 | 79 | 45 (2.9%) | 9 (0.6%) | 28
3 | B | 1647 | 59 | 58 | 1438 | 81 | 0 | 9 (0.5%) | 28
3 | C | 1647 | 59 | 58 | 1438 | 81 | 0 | 9 (0.5%) | 28
3 | D | 1647 | 59 | 56 | 1434 | 78 | 0 | 9 (0.5%) | 28

Test 4

Test # | Capture | Packets | Average pps | Smallest packet (B) | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
4 | A | 617 | 40 | 60 | 1434 | 75 | 12 (1.9%) | 5 (0.8%) | 15
4 | B | 766 | 49 | 58 | 1438 | 81 | 0 | 10 (1.3%) | 16
4 | C | 766 | 49 | 58 | 1438 | 81 | 0 | 10 (1.3%) | 16
4 | D | 767 | 49 | 56 | 1434 | 78 | 0 | 10 (1.3%) | 16

Note that there is no difference between the results for capture points B and C. This confirms that our firewall has no impact on these flows.

To achieve a baseline across all captures, I filtered each capture between the packet carrying the “Mobile Client” text (pre-login) and the packet carrying the “F4-Exit” text (post-login).

Application Performance Troubleshooting Mobile Client.png
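For anyone scripting rather than clicking, a rough equivalent of that trim between the two markers could look like the sketch below; the file names are illustrative assumptions, and the marker strings are the ones described above.

```python
# Sketch: trim a capture to the packets between the "Mobile Client" banner
# (pre-login) and the "F4-Exit" prompt (post-login), mirroring the baseline
# filter described above. File names are illustrative.
from scapy.all import rdpcap, wrpcap, Raw

packets = rdpcap("capture_point_D_test1.pcap")

def first_index(marker: bytes) -> int:
    """Return the index of the first packet whose payload contains marker."""
    for i, pkt in enumerate(packets):
        if pkt.haslayer(Raw) and marker in bytes(pkt[Raw].load):
            return i
    raise ValueError(f"marker {marker!r} not found in capture")

start = first_index(b"Mobile Client")
end = first_index(b"F4-Exit")

wrpcap("capture_point_D_test1_filtered.pcap", packets[start:end + 1])
print(f"kept packets {start} through {end}")
```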

The below table (J) displays the statistics from the now-filtered captures taken at capture point D.

 

Test # | Capture | Packets | Average pps | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Fast Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | D | 2926 | 16 | 93 | 69 | 39 (1.3%) | 712 (24.3%) | 0 | 178
2 | D | 814 | 25 | 93 | 70 | 10 (1.2%) | 186 (22.9%) | 0 | 33
3 | D | 1312 | 149 | 1434 | 81 | 41 (3.1%) | 7 (0.5%) | 2 | 9
4 | D | 383 | 60 | 1434 | 81 | 10 (2.6%) | 4 (1%) | 2 | 6

From the now-filtered captures I created the below four TCP stream graphs. These graphs clearly show the slow retransmission issue from the ADC to the client. The red arrow highlights the retransmissions along with the length of time it takes to recover. The green circles on the graphs from tests 3 and 4 show the maximum-segment-size retransmissions which facilitate the quick recovery.

Test 1

Application Performance Troubleshooting TCP Stream Graph 1.png

Test 2

Application Performance Troubleshooting TCP Stream Graph 2.png

Test 3

Application Performance Troubleshooting TCP Stream Graph 3.png

Test 4

Application Performance Troubleshooting TCP Stream Graph 4.png
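The graphs above were produced with Wireshark's TCP stream graph tools. As a rough idea of what they plot, the sketch below draws server-to-client sequence numbers against time from one of the filtered captures; the addresses and file name are illustrative assumptions.

```python
# Sketch: a tcptrace-style view of the server-to-client byte stream
# (sequence number vs. time). Addresses and file name are illustrative.
import matplotlib.pyplot as plt
from scapy.all import rdpcap, IP, TCP

SERVER, CLIENT = "172.16.4.106", "172.16.102.222"

packets = rdpcap("capture_point_D_test1_filtered.pcap")
t0 = float(packets[0].time)

times, seqs = [], []
for p in packets:
    if (p.haslayer(IP) and p.haslayer(TCP)
            and p[IP].src == SERVER and p[IP].dst == CLIENT):
        times.append(float(p.time) - t0)
        seqs.append(p[TCP].seq)

plt.step(times, seqs, where="post")
plt.xlabel("Time (s)")
plt.ylabel("TCP sequence number")
plt.title("Server-to-client TCP stream")
plt.savefig("tcp_stream_graph.png")
```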

The cause of the TCP retransmissions can be attributed to four lost segments in the stream, which leads to the receiver hitting the maximum TCP header size of 60 bytes (the selective acknowledgment (SACK) option accounts for the additional 40 bytes). At this point the client is no longer able to acknowledge the receipt of additional data, meaning all data received is wasted, as it eventually must be retransmitted by the server. In the case of the ADC, these retransmissions come in their original small per-segment form, one by one, while the client utilizes delayed acknowledgment, causing a 200 ms gap between each packet. In the case of the direct server connection, these retransmissions are bundled and sent as large frames, facilitating the quick recovery.
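As a quick sanity check on that 40-byte figure: the TCP data offset field caps the header at 60 bytes, the base header is 20 bytes, and each SACK block costs 8 bytes on top of the option's 2-byte header. The arithmetic below is only a back-of-the-envelope sketch; exactly how many blocks fit depends on which other options (for example timestamps) the endpoints negotiated.

```python
# Back-of-the-envelope check on the TCP option space discussed above.
TCP_MAX_HEADER = 60      # data offset field maxes out at 15 * 4 bytes
TCP_BASE_HEADER = 20
OPTION_SPACE = TCP_MAX_HEADER - TCP_BASE_HEADER   # 40 bytes for options

SACK_OVERHEAD = 2        # option kind + length
SACK_BLOCK = 8           # 4-byte left edge + 4-byte right edge per hole

# Without other options at most 4 SACK blocks fit; with a 12-byte timestamp
# option (10 bytes + padding) only 3 fit. Either way, a handful of holes in
# the stream exhausts the receiver's ability to SACK any further data.
print((OPTION_SPACE - SACK_OVERHEAD) // SACK_BLOCK)        # -> 4
print((OPTION_SPACE - 12 - SACK_OVERHEAD) // SACK_BLOCK)   # -> 3
```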

Recommendations

Engage with the ADC vendor to optimize the client-side TCP profile. It needs to support bundling retransmitted segments together into a single larger frame. I would suggest enabling Nagle's algorithm support on the client-side TCP profile.
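For context on that recommendation: on a standard sockets API, Nagle's algorithm is enabled by default and is toggled via the TCP_NODELAY option, and an ADC's TCP profile typically exposes an equivalent knob. The snippet below only illustrates the socket-level setting; it is not the ADC vendor's configuration.

```python
# Illustration only: Nagle's algorithm at the socket level. TCP_NODELAY = 1
# disables Nagle; 0 leaves it enabled so small writes are coalesced into
# fewer, larger segments.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)   # keep Nagle on
print("TCP_NODELAY =", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
sock.close()
```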

Implement a QoS policy for traffic traversing the wireless bridge between the corporate office and the distribution center. Shape traffic to 54 Mbps and allocate guaranteed bandwidth to Telnet traffic.

Follow Up

The customer engaged with their ADC vendor’s support team as suggested and was provided with recommended settings to correct the above-mentioned slow recovery. These settings have been applied to a new test virtual server (172.16.4.105).

Initial eyeball test has been positive, but before migrating these settings to the production virtual server (172.16.4.106) the customer asked me to review the new settings and once again perform our same tests to confirm that the issue had been truly resolved.

To do this we took a single capture from location B (reference Image C) and asked the customer’s IT staff to complete three (3) application logins from our same clients A (172.16.102.222) and B (172.16.102.223). As in the previous tests above, clients A and B are located at the same branch office, with client A’s application configured to point to the new test ADC virtual server (172.16.4.105) and client B pointing directly to one of the application front end servers (172.16.4.107).

Table K below lists the six TCP sessions seen in our single capture. The time column represents the seconds from the first TCP SYN packet (application opened) to the last TCP FIN/RST packet (application closed); a short sketch after the table shows how this timing can be pulled from the capture itself.

Test # | Source | Source Port | Target | Target Port | Time (sec)
--- | --- | --- | --- | --- | ---
1 | Client A (.222) | 56673 | ADC (.105) | 23 | 238
2 | Client A (.222) | 56722 | ADC (.105) | 23 | 45
3 | Client A (.222) | 56740 | ADC (.105) | 23 | 37
4 | Client B (.223) | 56762 | App Front End Server 1 (.107) | 23 | 54
5 | Client B (.223) | 56783 | App Front End Server 1 (.107) | 23 | 50
6 | Client B (.223) | 56789 | App Front End Server 1 (.107) | 23 | 159
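As promised above, here is a hedged sketch of how the per-session times in Table K can be read straight from the capture: packets are grouped by the client-side port, and the gap between the first SYN and the last FIN/RST-flagged packet is reported. The file name is an illustrative assumption.

```python
# Sketch: per-session duration (first SYN to last FIN/RST) from a single
# capture, keyed on the client side of each Telnet conversation.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

packets = rdpcap("capture_point_B_followup.pcap")   # illustrative file name
sessions = defaultdict(list)   # (client IP, client port) -> flagged packet times

for p in packets:
    if not (p.haslayer(IP) and p.haslayer(TCP)):
        continue
    flags = p[TCP].flags
    if not ("S" in flags or "F" in flags or "R" in flags):
        continue
    # Key on the client end of the conversation (the server listens on 23).
    if p[TCP].dport == 23:
        key = (p[IP].src, p[TCP].sport)
    else:
        key = (p[IP].dst, p[TCP].dport)
    sessions[key].append(float(p.time))

for (client, port), times in sorted(sessions.items()):
    print(f"{client}:{port}  duration {max(times) - min(times):.0f} sec")
```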

As with the tests performed previously, I filtered each session to achieve a true baseline. Table L below shows the statistics of the filtered captures. The filter applied was again between the packet carrying the “Mobile Client” text (pre-login) and the packet carrying the “F4-Exit” text (post-login). Reference Image D above.

Test # | Capture | Packets | Average pps | Largest packet (B) | Average packet (B) | TCP Lost Segments | TCP Retransmissions | Fast Retransmissions | Time Span (sec)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | B | 418 | 64 | 1241 | 81 | 0 | 4 (1%) | 3 | 6
2 | B | 437 | 238 | 1438 | 84 | 0 | 3 (<1%) | 1 | 2
3 | B | 443 | 277 | 1438 | 85 | 0 | 3 (<1%) | 1 | 2
4 | B | 431 | 224 | 1438 | 85 | 0 | 4 (1%) | 2 | 2
5 | B | 412 | 33 | 1241 | 79 | 0 | 3 (<1%) | 3 | 13
6 | B | 402 | 141 | 1241 | 79 | 0 | 3 (<1%) | 3 | 3

As we can see from the table above, both the ADC-configured and the direct-to-front-end-server-configured clients perform well, with the ADC tests (1 through 3) slightly outperforming the direct-to-server tests.

Note the low TCP retransmission counts for tests 1 through 3 and the largest packet size readings being at or near the network’s maximum transmission unit (MTU)/maximum segment size (MSS).

The results are graphed below to give a visual representation. Note the fast recovery from packet loss. As before, green circles highlight retransmissions, with the red arrow showing the length of time for recovery.

Test 1

Application Performance Troubleshooting TCP Transmission Graph 1.png

Test 2

Application Performance Troubleshooting TCP Transmission Graph 2.png

Test 3

Application Performance Troubleshooting TCP Transmission Graph 3.png

Test 4

Application Performance Troubleshooting TCP Transmission Graph 4.png

Test 5

Application Performance Troubleshooting TCP Transmission Graph 5.png

Test 6

Application Performance Troubleshooting TCP Transmission Graph 6.png

Conclusion

In conclusion, the packet captures demonstrate that the suggested ADC changes have been effective in resolving the slow retransmission issue. The customer went ahead and moved the new settings to the production virtual IP, and all in the world was good once again. Well, at least as far as this customer and application were concerned, that is.

Although the ADC was one of our prime suspects right from the start, it took packet analysis to prove to the vendor that their device was indeed the cause of the issue. Sometimes in IT you are guilty until proven innocent, and in those cases it’s good to have a tool like Wireshark at hand. Always remember: the packet never lies.

Written By: Chris Marshall, LookingPoint Senior Solutions Architect - CCIE #29940
