Post on 07-Jul-2015
CONFIDENTIAL 1
Remote Analysis Report Enabling Continual Service Improvement in Critical Systems
Overall Health
Web Application
Database
Middleware
Citrix
Storage
Supporting Application Infrastructure
Application Communication
Network
PREPARATION
Month: October 2014
Report: Sample
Prepared for: Customer
Analyst: Analyst, ExtraHop Networks
Configuration: EH8000
Firmware: 4.0
ID: XXXXX
[Figure: monthly trend columns: Aug, Sep, Oct]
Atlas Services | Remote Analysis Report Day 1 – Day 7
WEB APPLICATION
A review of the web application protocols including HTTP and HTTPS.
FINDINGS:
File Not Found errors (HTTP status code 404) on device1 have significantly decreased. (Trend: Resolved) ↑
Investigate Internal Server errors (HTTP status code 500) that occurred on the AAAAA server and were associated with a single URI. Internal Server errors were not previously noted on this server. (New finding) ☀
Investigate improvements that can be made to the ZZZZZ server that is experiencing a lengthy processing time on average. Processing time on this server has become less severe since the previous analysis period. (Trend: Improvement) ↗
CRITICAL CONCERNS: 86.9% of HTTP responses on the AAAAA server were Internal Server errors (HTTP status code 500). Internal Server errors indicate that the HTTP server encountered an unexpected condition that prevented it from fulfilling the request.
Internal Server errors on AAAAA (indicated by the vertical red bars) appeared to correlate with the HTTP transaction rate (indicated by the green line). At peak, 3,859 Internal Server errors occurred on this device in a single hour.
100% of Internal Server errors on AAAAA occurred while attempting to access a single URI resource, xxxx.xxxxxxx/PrePayService.
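The two percentages above (the share of a server's responses that were 500s, and the share of those errors tied to one URI) could be reproduced from per-transaction records. The following is a minimal sketch, assuming a hypothetical list of `(server, uri, status)` tuples; the record layout, the `error_share` helper, and the sample values are illustrative only and are not taken from the ExtraHop system.

```python
from collections import Counter

def error_share(records, server, status=500):
    """Return (fraction of the server's responses with `status`,
    Counter of URIs on which those error responses occurred)."""
    responses = [r for r in records if r[0] == server]
    errors = [r for r in responses if r[2] == status]
    share = len(errors) / len(responses) if responses else 0.0
    return share, Counter(uri for _, uri, _ in errors)

# Hypothetical sample records: (server, uri, http_status)
sample = [
    ("AAAAA", "/PrePayService", 500),
    ("AAAAA", "/PrePayService", 500),
    ("AAAAA", "/PrePayService", 500),
    ("AAAAA", "/status", 200),
    ("ZZZZZ", "/EAI/OA", 200),
]

share, by_uri = error_share(sample, "AAAAA")
print(f"{share:.1%} of responses were errors")
print(by_uri.most_common(1))
```

When a single URI accounts for all of the errors, as reported for AAAAA, the counter contains exactly one key.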
IMPROVEMENT OPPORTUNITIES: Several HTTP servers are experiencing lengthy processing time on average. Notice that the ZZZZZ server accounted for 55,742 responses and experienced an average processing time of over 2 seconds.
Utilizing the ExtraHop Heatmaps feature, we see that a high concentration of transactions on ZZZZZ experienced approximately 5 seconds of processing time. A darker area on the graph below indicates a high concentration of transactions.
Note the large standard deviation tied to processing time for the xxx.xxx.xxx.xx:xxxx/EAI/OA URI. This indicates that processing times for this URI were widely dispersed, meaning that much larger processing times were also observed. Using these standard deviation and mean measurements, we can conclude that approximately 1,277 transactions experienced processing times of approximately 12.7 seconds.
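To illustrate why a large standard deviation matters even when the mean looks moderate, the sketch below compares two hypothetical sets of processing times that share the same mean. The sample values and the `dispersion` helper are assumptions made for illustration; the sketch does not reproduce the 1,277-transaction estimate above.

```python
import statistics

def dispersion(times_ms):
    """Return (mean, population standard deviation, coefficient of variation)."""
    mean = statistics.mean(times_ms)
    stdev = statistics.pstdev(times_ms)
    return mean, stdev, (stdev / mean if mean else 0.0)

# Hypothetical processing times in milliseconds; both sets average 2,000 ms.
steady = [2000, 2100, 1900, 2000]
dispersed = [100, 200, 150, 7550]

for name, sample in (("steady", steady), ("dispersed", dispersed)):
    mean, stdev, cv = dispersion(sample)
    print(f"{name}: mean={mean:.0f} ms, stdev={stdev:.0f} ms, cv={cv:.2f}")
```

A coefficient of variation above 1 means the standard deviation exceeds the mean, which is a sign that a few very slow transactions sit far from the typical case.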
DATABASE
A review of all parsed database protocol traffic, regardless of the type of database. Protocols include (if licensed): TNS (Oracle), TDS (MS SQL), DB2, Informix, Sybase, PostgreSQL, and MySQL.
FINDINGS:
Investigate database errors on the BBBBB server that occurred constantly; these errors were related to failed logins for the ZZZ_ZZZZZ database. (New finding) ☀
CRITICAL CONCERNS: None noted.
IMPROVEMENT OPPORTUNITIES: 1.0% of all database responses were errors.
93.3% of all database errors were concentrated on the BBBBB server. Also note that the error count on this server was approximately 200% of its response count, indicating that each response sent from this server was associated with two errors.
The error rate on this server (indicated below by the red vertical bars) stayed in excess of 700 errors per hour for a majority of the observation period.
100% of database errors from BBBBB were returned to the YYYYYY client.
Additionally, 100% of database errors on BBBBB had one of two messages. The messages of these errors suggest that 100% of errors on BBBBB result from the YYYYYY client attempting to log on to BBBBB and open the ZZZ_ZZZZZ database. 100% of these login and open attempts are failing. Investigate scheduled tasks that may be causing these errors.
Also worth noting are the processing times observed on this database server. While a majority of transactions were non-concerning (75% of all database transactions took, at most, 3 milliseconds of processing time), note that database transactions on BBBBB experienced as much as a minute of processing time.
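The statement that 75% of transactions took at most 3 milliseconds is a 75th-percentile measurement. As a minimal sketch, assuming a hypothetical list of processing times in milliseconds, a nearest-rank percentile shows how a low 75th percentile can coexist with a one-minute maximum. The `percentile_nearest_rank` helper and the sample data are illustrative assumptions, not the method used by the ExtraHop system.

```python
import math

def percentile_nearest_rank(values, pct):
    """Smallest value v such that at least pct% of the values are <= v."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical processing times in milliseconds.
times_ms = [1, 1, 2, 2, 3, 3, 40, 60000]

print(percentile_nearest_rank(times_ms, 75))   # 75th percentile
print(percentile_nearest_rank(times_ms, 100))  # maximum
```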
The ExtraHop Heatmaps feature reveals that a “concentration” of transactions experienced around 3 seconds (3,000 milliseconds) of processing time. A darker area on the graph below indicates a higher concentration of transactions. While a large volume of transactions experienced less than 400 milliseconds of processing time, it may be worth researching what is causing some of the previously discussed failed logins to experience such lengthy processing times.
MIDDLEWARE
A review of all parsed middleware protocol traffic (if licensed): FTP, MQSeries, and Memcache.
FINDINGS:
Investigate FTP errors that occurred on the CCCCC server and appear to correlate with SITE method calls. The overall volume of FTP errors has decreased since the previous analysis period. (Trend: Improvement) ↗
CRITICAL CONCERNS: 16.8% of FTP responses resulted in an error. This is a decrease from the 25.4% FTP error rate noted in the previous report.
38.4% of FTP errors originated on the CCCCC server.
Spikes in both FTP error rate (indicated by the vertical red bars) and transaction rate (indicated by the green line) on CCCCC occurred at the same time each day. The nightly spike is highly suggestive of an automated FTP process that is broken or otherwise misconfigured.
100% of FTP errors outbound from CCCCC were returned to a single client IP (xxx.xxx.xxx.xxx).
100% of FTP errors on CCCCC affected the XXX_XXX user.
FTP errors on CCCCC had two error messages. The messages are available below.
Further analysis of FTP errors suggests that there is a relationship between FTP 500 errors and the use of the FTP SITE method. FTP 500 errors are indicative of erroneous syntax resulting in an unrecognized action that, as a result, could not take place. Looking at the busiest FTP server (CCCCC), we see an almost 1:1 relationship between the use of the SITE method and FTP error code 500.
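The near 1:1 relationship described above can be checked by pairing each FTP command with the reply code it received. The following is a minimal sketch, assuming a hypothetical list of `(command, reply_code)` pairs; the `site_error_ratio` helper and the sample transcript are illustrative and do not come from the report's data. Reply code 500 is the FTP "syntax error, command unrecognized" reply.

```python
def site_error_ratio(transactions):
    """transactions: list of (command, reply_code) pairs.

    Returns (number of SITE commands, number of 500 replies,
    fraction of SITE commands that were answered with 500).
    """
    site_codes = [code for cmd, code in transactions if cmd.upper() == "SITE"]
    total_500 = sum(1 for _, code in transactions if code == 500)
    site_500 = sum(1 for code in site_codes if code == 500)
    ratio = site_500 / len(site_codes) if site_codes else 0.0
    return len(site_codes), total_500, ratio

# Hypothetical command/reply pairs for one FTP server.
transcript = [
    ("USER", 331),
    ("PASS", 230),
    ("SITE", 500),
    ("SITE", 500),
    ("RETR", 150),
    ("SITE", 200),
]

site_count, error_count, ratio = site_error_ratio(transcript)
print(site_count, error_count, round(ratio, 2))
```

A ratio close to 1.0, together with similar SITE and 500 counts, would be consistent with the relationship noted for CCCCC.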
IMPROVEMENT OPPORTUNITIES: Not evaluated.
CITRIX
A review of Citrix performance.
FINDINGS:
Investigate lengthy session load times on the DDDDD device that primarily affected two clients and were related to a single application. Citrix load times have slightly decreased since the previous observation period. (Trend: Improvement)
CRITICAL CONCERNS: Several ICA servers are experiencing lengthy load times in excess of 40 seconds per session launch. When launching an ICA session, lengthy load times will delay the start of the ICA session and cause latency in overall application processing. ICA session launches transiting the DDDDD device experienced a high number of launches with long load times.
Drilling into DDDDD, we can see that session launches transiting two Cisco devices are primarily affecting two clients: FFFFF and GGGGGG.
The #MMMMMM application was most impacted by lengthy load times. Investigate transactions that may be impacted by lengthy load times for this application.
IMPROVEMENT OPPORTUNITIES: Not evaluated.
STORAGE
A review of all parsed storage protocol traffic. Protocols include (if licensed): CIFS, NFS, and iSCSI.
FINDINGS:
Investigate STATUS_ACCESS_DENIED CIFS errors that transited the NNNNN device and appeared to have originated at yy.yy.yy.yy. The volume of CIFS errors significantly increased since the previous observation period. (Trend: Worse) ↓
CRITICAL CONCERNS: 49.6% of CIFS responses were errors. Severity of CIFS errors ranges widely from informational to severe. High volumes of errors should be investigated to determine if action is required to fix or if changes can be made to reduce unnecessary processing time.
70.7% of CIFS errors transited the NNNNN device.
CIFS errors on NNNNN were returned to 118 client IPs.
Looking client-side at some of the top contributors of CIFS errors on the NNNNN device, it appears that a large portion of CIFS errors that transited NNNNN originated on SSSSS at yy.yy.yy.yy.
The majority of CIFS errors on NNNNN have variations of STATUS_ACCESS_DENIED error messages.
CIFS error rate (indicated by the vertical red bars) on NNNNN directly correlates with transaction rate (indicated by the green line). Investigate transactions that may be impacted by these CIFS errors. At peak, this device experienced 1,049,331 errors over the course of a single hour, or more than 291 errors every second. Note that this server was only active for four days during the observation period.
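The per-second figure above follows from dividing the peak hourly count by the 3,600 seconds in an hour. A minimal sketch of that arithmetic, using a hypothetical `per_second` helper:

```python
SECONDS_PER_HOUR = 3600

def per_second(count_per_hour):
    """Convert an hourly count into an average per-second rate."""
    return count_per_hour / SECONDS_PER_HOUR

# Peak hourly CIFS error count reported for NNNNN.
print(round(per_second(1_049_331), 1))  # 291.5
```

Note that this is an average over the peak hour; the instantaneous rate within that hour may have been higher.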
IMPROVEMENT OPPORTUNITIES: Not evaluated.
SUPPORTING APPLICATION INFRASTRUCTURE
A review of protocol traffic related to supporting application infrastructure, including DNS, SSL, SMTP, and LDAP.
FINDINGS:
Investigate the high volume of DNS response errors concentrated on the HHHHH device that were related to reverse IP lookups. (New finding) ☀
Investigate excessive use of the ANY method by the PPPPP server; a significant volume of ANY method calls originated in Australia. The volume of ANY method calls has slightly decreased since the previous analysis period. (Trend: Improvement) ↗
CRITICAL CONCERNS: 91.4% of all DNS responses were errors. A DNS response error occurs when a client makes a DNS lookup and the DNS server responds with some sort of error. These errors may not break an application, but they add latency to application transactions and cause unnecessary processing on the DNS server.
48.6% of DNS response errors originated on the HHHHH device. Note that 99.5% of requests made to this device result in a DNS response error.
The DNS response error rate (indicated by the vertical red bars) on HHHHH directly correlates with transaction rate (indicated by the green line). Investigate transactions that may be impacted by DNS response errors.
Nearly 100% of DNS response errors outbound from HHHHH were returned to LLLLL via a Cisco device.
DNS response errors outbound from HHHHH are related to a number of reverse IP lookups. Note that these queries are erring nearly 100% of the time they are called.
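For readers less familiar with reverse IP lookups, the sketch below shows the query name that a reverse (PTR) lookup asks for. It uses the `reverse_pointer` attribute from Python's standard `ipaddress` module; the address is a documentation-range example and the `reverse_lookup_name` helper is a hypothetical name, neither taken from the report.

```python
import ipaddress

def reverse_lookup_name(ip):
    """Return the PTR query name used for a reverse lookup of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer

# 192.0.2.10 is a reserved documentation address, used here as a placeholder.
print(reverse_lookup_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

If no PTR record exists for such a name, the DNS server answers with an error, which is one common way for reverse lookups to produce a steady stream of response errors.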
Over 15,500,000 instances of the DNS “ANY” method occurred during the observation period. This is a decrease from the volume of ANY method requests noted in the previous report; however, it is still a concerning volume. Use of the ANY method returns all known information about a DNS zone in a single request and is usually indicative of a DNS Amplification Attack. More information is available here: http://www.us-cert.gov/ncas/alerts/TA13-088A.
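On the wire, an ANY request is an ordinary DNS query whose question type (QTYPE) is 255. The following is a minimal sketch, assuming raw DNS query messages are available as bytes, of how such requests could be recognized. The helper names and the `example.com` query are hypothetical, and the parser assumes the question name is not compressed, which is normal for queries.

```python
import struct

QTYPE_ANY = 255  # QTYPE "*" in RFC 1035, commonly called ANY
QTYPE_A = 1

def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query message with a single question."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN

def question_type(message):
    """Return the QTYPE of the first question in a DNS message."""
    offset = 12  # the DNS header is always 12 bytes
    while message[offset] != 0:  # walk the length-prefixed labels
        offset += 1 + message[offset]
    (qtype,) = struct.unpack_from("!H", message, offset + 1)
    return qtype

def is_any_query(message):
    return question_type(message) == QTYPE_ANY

print(is_any_query(build_query("example.com", QTYPE_ANY)))
print(is_any_query(build_query("example.com", QTYPE_A)))
```

Counting messages for which `is_any_query` is true, grouped by source IP, would give a per-client view of ANY usage.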
86.3% of ANY method calls occurred on the PPPPP DNS server at xx.yy.zz.aa.
The following Geomap identifies the physical location of IPs that sent ANY requests to the server at xx.yy.zz.aa. A denser dot indicates a higher volume of transactions. Note that the AAA.BB.XXX.ZZ IP located in Canberra, Australia accounts for a large portion of these ANY method requests; this may be related to malicious activity.
IMPROVEMENT OPPORTUNITIES: Not evaluated.
APPLICATION COMMUNICATION
FINDINGS:
Investigate Zero Windows that occurred on the RRRR device. Zero Windows occurred in spikes; these spikes have become much more severe since the previous observation period. (Trend: Worse) ↓
CRITICAL CONCERNS: More than 77,000,000 Zero Windows were observed on the XXXXXXX network over the course of the seven-day observation period. A Zero Window indicates that the connection between two devices has stalled and that the device sending the Zero Window is unable to keep up with the rate of data that a peer is sending. In effect, the device sending the Zero Window is saying, “send no data until further notice.” 52.4% of Zero Windows were outbound from the RRRR device.
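A Zero Window is visible in the TCP header itself: the 16-bit window field, at byte offset 14, carries the value 0. The sketch below checks a raw TCP header for that condition. Ignoring segments with SYN, FIN, or RST set is an assumption of this sketch; analyzers differ in what they count, and this is not presented as the ExtraHop method. The helper names and port numbers are hypothetical.

```python
import struct

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

def is_zero_window(tcp_header):
    """True if the segment advertises a receive window of 0.

    Segments with SYN, FIN, or RST set are ignored (an assumption
    of this sketch).
    """
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    flags = tcp_header[13]
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window == 0 and not flags & (SYN | FIN | RST)

def make_header(window, flags=ACK):
    """Build a 20-byte TCP header with placeholder ports, for testing."""
    # Fields: src port, dst port, seq, ack, data offset, flags,
    # window, checksum, urgent pointer.
    return struct.pack("!HHIIBBHHH", 445, 50000, 0, 0, 5 << 4, flags, window, 0, 0)

print(is_zero_window(make_header(0)))
print(is_zero_window(make_header(8192)))
```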
At peak, 4,620,000 Zero Windows were sent from RRRR over the course of a single hour, or more than 1,283 Zero Windows sent each second.
60.5% of Zero Windows outbound from RRRR were sent to the TTTTT device.
100% of Zero Windows sent from RRRR were related to the CIFS protocol.
IMPROVEMENT OPPORTUNITIES: Not evaluated.
NETWORK
FINDINGS:
Investigate high volume of IP fragments outbound from the UUUUU device. Outbound IP fragments were not previously noted on this device. (New finding) ☀
CRITICAL CONCERNS: More than 29,300,000 IP fragments were sent onto the XXXXXXX network over the course of the seven-day observation period. IP fragmentation may be caused by an MTU mismatch between devices on the network. This results in high volumes of fragments being sent across the network, which can overwhelm both the network and its devices.
44.4% of IP fragments were outbound from the UUUUU device at aa.bbb.ccc.dd.
100% of IP fragments from UUUUU were sent to uu.xx.yy.zz via broadcast traffic on UDP port 8156.
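As a rough illustration of how quickly fragmentation multiplies packet counts, the sketch below estimates how many IPv4 packets are needed to carry one UDP datagram over a link with a given MTU. It assumes a 20-byte IPv4 header without options; the `fragment_count` helper and the payload sizes are hypothetical and are not measurements from this network.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes

def fragment_count(udp_payload, mtu=1500):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = udp_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return max(1, math.ceil(ip_payload / per_fragment))

for size in (1472, 1473, 4000):
    print(size, fragment_count(size))
```

With a 1,500-byte MTU, a 1,472-byte UDP payload fits in a single packet, while one additional byte forces a second fragment.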
METRICS CHECKLIST
Web Application
  5xx Errors: Review of server-side errors ✓
  5xx server error rate: Review of HTTP servers experiencing high 5xx error rate ✓
  4xx Errors: Review of client-side errors ✓
  URIs: Review of processing time by URI ✓
  Server Processing Time: A general health check of all HTTP server devices seen by ExtraHop. A review of group level processing time. ✓
Database
  Errors: Review of Database errors ✓
  Server error rate: Review of Database servers experiencing high error rate ✓
  Method Performance: Review of Database method performance ✓
  Server Processing Time: A general health check of all DB server devices seen by ExtraHop. A review of group level processing time. ✓
Middleware
  Errors: Review of MQSeries Errors ✓
  Errors: Review of FTP errors ✓
  Error Rate: Review of FTP error rate ✓
  Server Processing Time: Review of FTP server processing time ✓
  Errors: Memcache errors ✓
  Misses: Review of Memcache servers experiencing high volume of misses ✓
  Hits: Review of Memcache servers experiencing high volume of hits ✓
Citrix
  Latency: Review of network latency time for clients attached to a Citrix server ✓
  Load Time: Review of client load time for clients attached to a Citrix server ✓
  Client Types: Review of Citrix client types used to access Citrix servers ✓
Storage
  Errors: Review of CIFS errors ✓
  Error Rate: Review of CIFS error rate ✓
  Processing time: Review of CIFS processing time ✓
  File access time: Review of file access times on high volume CIFS servers ✓
  FSInfo: Review of FSInfo queries on high volume CIFS servers ✓
  Errors: Review of NFS errors ✓
  Error Rate: Review of NFS error rate ✓
  Processing time: Review of NFS processing time ✓
  File access time: Review of file access times on high volume NFS servers ✓
  Errors: Review of iSCSI errors ✓
  Error Rate: Review of iSCSI error rate ✓
  File access time: Review of file access times on high volume iSCSI servers ✓
METRICS CHECKLIST (CONTINUED)
Supporting Application Infrastructure
  Errors: Review of SMTP errors ✓
  Error Rate: Review of SMTP error rate ✓
  Request Timeouts: Review of DNS request timeouts ✓
  Requests vs. Responses: Review DNS requests vs. DNS responses ✓
  Response Errors: Review DNS response errors ✓
  Server Error Rate: Review of DNS servers experiencing high error rate ✓
  Error Rate: Review of DNS error rate ✓
  A vs. AAAA: Review of IPv6 DNS lookups and responses ✓
  Processing Time: Review of DNS processing time ✓
  Errors: Review of LDAP errors ✓
  Processing Time: Review of LDAP processing time ✓
  SSL Certificate Size: Review of 512-bit SSL certificates. ✓
  Expiring Certificates: Review of SSL certificate expiration dates. ✓
Application Communication
  Zero Windows: Number of zero window advertisements received. Zero windows are an indication of one side of a TCP conversation overwhelming the other. ✓
  Receive Window Throttles: Number of times the advertised receive window of the peer device limits the throughput of the connection. Throttling occurs when a device is trying to slow down the dataflow coming from a peer. ✓
  Out of Order: Number of packets sent out of order. ✓
  Tinygrams: Inefficient segmentation of TCP payload resulting in more packets on the network. ✓
  Aborts: TCP conversation forcibly ended due to error within TCP data framework ✓
  Slow Starts: Connection throughput reduced due to TCP slow start congestion avoidance. ✓
  Dropped Segments: Packets lost en route between two devices and required retransmission ✓
  Round Trip Time: High network latency ✓
  RTO: A 1- to 8-second gap in TCP conversations ✓
Network Health
  VLANs: A review of relative traffic occurring on different tagged VLANs ✓
  Multicast Top Groups: A review of Top Multicast talkers. ✓
  Traffic: A measure of all the traffic being passed to the ExtraHop system. ✓
  IP Fragmentation: A review of observed IP fragmentation. ✓
  Traffic: A review of the L3 traffic profile. ✓
  Traffic: A review of proportions of L7 traffic. ✓