Posted on 24-Aug-2020
How to proceed when 1000 call agents tell you: 'My computer is slow'
Tobias Oetiker <tobi@oetiker.ch>
OETIKER+PARTNER AG
22nd Large Installation System Administration Conference
boot up
- users blame IT performance
- stopwatch and heisenbugs
- Sysinternals tools
- AutoIt and WinSpy
- sorry, no quick fix
- but we can monitor it
design goals
- passive monitoring from the user's perspective
- let users give their input
- minimal impact
- simple setup and update
- central data store
three tools
- CPV monitor: observe the system
- CPV reporter: easy problem reporting
- CPV explorer: view the results
cpv monitor and perl/CPAN
Look, it's perl, honey!
- AutoIt
- use Win32::GuiTest;
- use Win32::API;
- use Win32::OLE;
- use Win32::GUI;
- use FSA::Rules;
- use threads;
cpv system overview
cpv monitor structure
lesson #1: fsms are cool
lesson #1: seemingly simple
lesson #1: complexity trap
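The monitor is built around finite state machines (the slides list FSA::Rules). The sketch below shows the table-driven idea in Python. The states, events, and the `Monitor` class are hypothetical illustrations and are not the CPV monitor's actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current_state, event) -> next_state.
# Illustrative only; not the rule set used by the CPV monitor.
TRANSITIONS = {
    ("idle", "window_opened"): "running",
    ("running", "ping_timeout"): "hung",
    ("hung", "ping_ok"): "running",
    ("running", "window_closed"): "idle",
    ("running", "crash_exit_code"): "crashed",
    ("hung", "crash_exit_code"): "crashed",
    ("crashed", "window_opened"): "running",
}


@dataclass
class Monitor:
    state: str = "idle"
    history: list = field(default_factory=list)

    def fire(self, event: str) -> str:
        """Apply an event; unknown (state, event) pairs leave the state unchanged."""
        nxt = TRANSITIONS.get((self.state, event), self.state)
        if nxt != self.state:
            # Record every real transition so it could be reported as a sample.
            self.history.append((self.state, event, nxt))
        self.state = nxt
        return nxt
```

Silently ignoring unlisted pairs keeps the table short. Every new probe multiplies the number of (state, event) pairs to think about, though, which is one way the "seemingly simple" machine drifts into the complexity trap.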
cpv monitor
cpv monitor monitor
cpv reporter
cpv explorer
thinking BIG
wants
- ~1500 clients in the call center
- dynamic configuration
- individual profiles
infrastructure
- data store: PostgreSQL
- configuration: Apache, CPVservice.cgi
- analysis: Apache, Qooxdoo, CPVjson.cgi, Gnuplot
observation tools
- GetWindowText and friends
- reading log files
- Windows WMI (load, processes)
- active probing (ping, HTTP)
- HTTPAnalyzer ($$$) for HTTP(S)
- full custom probes
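Active HTTP probing simply times a request from the client's point of view. The following is a minimal standard-library sketch. The function name `http_probe` and its result dictionary are hypothetical, and the real CPV monitor was written in Perl.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> dict:
    """Fetch url once and report the HTTP status and wall-clock latency.

    status is None when no HTTP answer arrived at all
    (DNS failure, refused connection, timeout).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        # URLError, socket errors and timeouts are all OSError subclasses
        status = None
    return {"url": url, "status": status, "seconds": time.monotonic() - start}
```

`HTTPError` must be caught before the generic `OSError` branch because it is itself a subclass of `URLError` and therefore of `OSError`.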
lesson #2: finding outlook errors
- Outlook modal popup: the send button does not work
- GetAsyncKeyState: "Although the least significant bit of the return value indicates whether the key has been pressed since the last query, due to the pre-emptive multitasking nature of Windows, another application can call GetAsyncKeyState and receive the “recently pressed” bit instead of your application. The behavior of the least significant bit of the return value is retained strictly for compatibility with 16-bit Windows applications (which are non-preemptive) and should not be relied upon."
- GetClassName(WindowFromPoint(GetCursorPos())) eq 'MsoCommandBar';
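One way to combine the two observations is to poll only the most significant bit of GetAsyncKeyState (the "key is down now" bit), detect up-to-down edges, and count an edge only when the window under the cursor has the class `MsoCommandBar`. This is an assumption about how the workaround could look, not the talk's code. The Win32 calls are left out so the logic runs anywhere, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

KEY_DOWN_BIT = 0x8000  # most significant bit of GetAsyncKeyState's 16-bit result


def count_commandbar_clicks(samples: Iterable[Tuple[int, str]]) -> int:
    """Count mouse presses that begin while the cursor is over a command bar.

    Each sample is (GetAsyncKeyState(VK_LBUTTON) result, class name of the
    window under the cursor), taken at a fixed polling interval. Only the
    "key is down right now" bit is used; the least significant
    "pressed since last query" bit is ignored because other processes can
    consume it.
    """
    clicks = 0
    was_down = False
    for key_state, class_name in samples:
        is_down = bool(key_state & KEY_DOWN_BIT)
        # An up -> down transition is one press; holding the button is not.
        if is_down and not was_down and class_name == "MsoCommandBar":
            clicks += 1
        was_down = is_down
    return clicks
```

The trade-off is that a click shorter than the polling interval can be missed entirely. That is the price of not depending on the shared "recently pressed" bit.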
lesson #3: WMGetText
- GetWindowText or WMGetText?
- the application becomes really busy with WMGetText
- stay with GetWindowText
lesson #4: server issues
- 2008-10-27: 1,459 devices sent 2,417,807 samples
- 4 cores / 32-bit / 4 GB RAM
- 40 days of data: about 100,000,000 samples
- the index does not fit in RAM
- too much data for processing
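A quick check of the slide's figures: extrapolating one day's load over the 40-day window gives the quoted hundred million samples. Dividing the RAM by that row count then bounds the index size per entry. The per-entry size $b$ is a symbol introduced here for illustration, not a figure from the talk, and 4 GB is taken as $4 \times 10^9$ bytes. The server also needs memory for everything else, so the practical limit is lower still.

```latex
\[
\frac{2\,417\,807~\text{samples/day}}{1\,459~\text{devices}} \approx 1\,657~\text{samples per device and day}
\]
\[
2\,417\,807~\text{samples/day} \times 40~\text{days} = 96\,712\,280 \approx 10^{8}~\text{samples}
\]
\[
10^{8}~\text{entries} \times b~\text{bytes/entry} > 4 \times 10^{9}~\text{bytes}
\quad\Longleftrightarrow\quad b > 40~\text{bytes}
\]
```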
lesson #5: index compaction
- function-based index
- hours since 2007 in 2 bytes: good for about 7 years
- 2 bytes for the metric id
- 2 bytes for the workstation id
- two WHERE conditions
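Two bytes of hours cover $2^{16} = 65\,536$ hours, and $65\,536 / (24 \times 365.25) \approx 7.5$ years, so slots counted from 2007-01-01 run out in mid-2014. The sketch below assumes the three 2-byte fields are packed hour-first into one 48-bit integer, which fits a PostgreSQL bigint and could serve as the indexed expression. It also reads "two WHERE conditions" as one coarse range condition on the packed key, so the small index is used, plus the exact conditions on the original columns. The field order, epoch, and function names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(2007, 1, 1, tzinfo=timezone.utc)  # assumed start of "hours since 2007"
MAX16 = 0xFFFF


def hour_slot(ts: datetime) -> int:
    """Whole hours since EPOCH; must fit in 2 bytes."""
    hours = int((ts - EPOCH).total_seconds() // 3600)
    if not 0 <= hours <= MAX16:
        raise ValueError("timestamp outside the 2-byte hour range")
    return hours


def pack_key(ts: datetime, metric_id: int, workstation_id: int) -> int:
    """Pack hour, metric and workstation into one 48-bit integer (hour first)."""
    if not (0 <= metric_id <= MAX16 and 0 <= workstation_id <= MAX16):
        raise ValueError("ids must fit in 2 bytes")
    return (hour_slot(ts) << 32) | (metric_id << 16) | workstation_id


def key_range(start: datetime, end: datetime, metric_id: int) -> tuple:
    """Inclusive packed-key bounds for one metric on all workstations.

    Over several hours these bounds also admit other metric ids, and they
    cover whole hours, so the exact metric and timestamp conditions are
    still needed as the second WHERE condition.
    """
    return pack_key(start, metric_id, 0), pack_key(end, metric_id, MAX16)
```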
lesson #6: random data reduction
- too much data for statistics
- how to get 12% of the samples?
- add a 2-byte random value to each sample
- select all samples with rand < maxrand × 12/100
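The reduction is a stored-random-tag sample: each row gets its random value once, at insert time, and a query keeps the rows whose tag falls below a fraction of the maximum. The following is a minimal in-memory sketch of that filter. The names `MAXRAND`, `tag_sample` and `sample_subset` are hypothetical.

```python
import random

MAXRAND = 0xFFFF  # largest 2-byte random value stored with each sample


def tag_sample(rng: random.Random) -> int:
    """Draw the random tag once, when the sample is written."""
    return rng.randint(0, MAXRAND)


def sample_subset(samples, percent: float):
    """Python equivalent of: SELECT ... WHERE rand < maxrand * percent / 100."""
    limit = MAXRAND * percent / 100
    return [s for s in samples if s["rand"] < limit]
```

The tag is stored rather than drawn at query time, so repeated queries return the same subset. A 5% subset is also always contained in the 12% one, which keeps statistics at different sampling rates comparable.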
lesson #7: threaded perl
- works very well on win32
- every new thread gets a full copy of the interpreter: lots of memory
- to save memory, require modules only after creating the thread
- only use threads where really necessary
lesson #8: measuring boot and logon time
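timeline (t), with the source of each timestamp:
- GWP boot: WMI SystemUpTime
- Services.exe started: WMI ProcessCreateDate
- logon: WMI LogonSessionStartTime
- Explorer.exe or CPV.exe started: WMI ProcessCreateDate
measured intervals:
- Load.Gwp.StartUp.Boot2Service: GWP boot to Services.exe start
- Load.Gwp.StartUp.Logon2Explorer: logon to Explorer.exe start
- Load.Gwp.StartUp.Logon2Cpv: logon to CPV.exe start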
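WMI reports process and session start times as CIM_DATETIME strings: `yyyymmddHHMMSS.mmmmmm` followed by a signed UTC offset in minutes. SystemUpTime is a number of seconds, so the boot moment has to be derived as "now minus uptime". The sketch below assumes each metric is a plain difference between two timeline points. The function names are hypothetical, and the WMI queries themselves are omitted.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a WMI CIM_DATETIME string such as '20081027073502.500000+060'."""
    local = datetime.strptime(value[:21], "%Y%m%d%H%M%S.%f")
    offset = int(value[21:])  # '+060' -> 60 minutes east of UTC
    return local.replace(tzinfo=timezone(timedelta(minutes=offset)))


def boot_time(now: datetime, system_uptime_s: int) -> datetime:
    """Derive the boot moment from the current time and SystemUpTime seconds."""
    return now - timedelta(seconds=system_uptime_s)


def startup_metrics(boot, services_start, logon, explorer_start, cpv_start) -> dict:
    """Durations in seconds for the three startup intervals on the slide."""
    return {
        "Load.Gwp.StartUp.Boot2Service": (services_start - boot).total_seconds(),
        "Load.Gwp.StartUp.Logon2Explorer": (explorer_start - logon).total_seconds(),
        "Load.Gwp.StartUp.Logon2Cpv": (cpv_start - logon).total_seconds(),
    }
```

All timestamps are kept timezone-aware, so subtracting values that carry different UTC offsets still yields correct intervals.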
lesson #9: detecting crashes
- no wait(), but a process handle
- no signals, only exit codes
- 0xC0000005: segfault (access violation)
- 0x00000103: still running
- TerminateProcess can define the exit code
Implementation
- find the active window
- attach to its process handle
- poll for the exit code
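Polling reduces to classifying whatever exit code the handle reports. On Windows the code would come from GetExitCodeProcess on a handle obtained through GetWindowThreadProcessId and OpenProcess. In the sketch below a plain callable stands in for that call, and the status labels and function names are hypothetical. Treating every other NTSTATUS error value (top two bits set) as a crash is a heuristic added here.

```python
import time
from typing import Callable

STILL_ACTIVE = 0x00000103             # reported while the process is still running
STATUS_ACCESS_VIOLATION = 0xC0000005  # the Windows counterpart of a segfault


def classify_exit_code(code: int) -> str:
    """Map a Windows process exit code to a coarse status (heuristic)."""
    code &= 0xFFFFFFFF  # accept signed DWORD values as well
    if code == STILL_ACTIVE:
        return "running"
    if code == 0:
        return "exited"
    if code == STATUS_ACCESS_VIOLATION:
        return "crash:access_violation"
    if (code & 0xC0000000) == 0xC0000000:
        return "crash:ntstatus_error"  # other NTSTATUS error codes
    return "exited_with_code"


def wait_for_exit(get_exit_code: Callable[[], int], interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the process leaves STILL_ACTIVE, then classify its exit code."""
    while True:
        status = classify_exit_code(get_exit_code())
        if status != "running":
            return status
        sleep(interval)
```

Because TerminateProcess lets the killer choose the exit code, a process killed with 0xC0000005 looks like a crash. A process that exits with 259 (0x103) looks as if it were still running.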
lesson #10: application hangs - symptoms
lesson #10: application hangs - detection
- dead apps don't process messages
- explorer fakes responsiveness
Implementation
- find the active window
- window ping: SendMessage WM_NULL
- wait until the window is back
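A plain SendMessage with WM_NULL blocks until the target window's thread has processed the message. The time the call blocks is therefore the hang duration. The sketch below runs such a blocking ping while a watchdog thread flags the hang once a threshold is exceeded, so the hang is reported while it is still in progress. The callable `ping` stands in for the SendMessage call. The function name, the threshold, and the `on_hang` callback are hypothetical. SendMessageTimeout would be the alternative if the monitor must never block indefinitely.

```python
import threading
import time
from typing import Callable


def timed_ping(ping: Callable[[], None], threshold: float = 1.0,
               on_hang: Callable[[], None] = lambda: None,
               clock: Callable[[], float] = time.monotonic) -> float:
    """Run a blocking window ping and return how long it blocked, in seconds.

    If ping() is still blocked after `threshold` seconds, on_hang() is
    called once from a watchdog thread.
    """
    done = threading.Event()

    def watchdog() -> None:
        # wait() returns False on timeout, i.e. the window has not answered yet
        if not done.wait(threshold):
            on_hang()

    watcher = threading.Thread(target=watchdog, daemon=True)
    watcher.start()
    start = clock()
    try:
        ping()
    finally:
        done.set()
        watcher.join()
    return clock() - start
```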
positive
- CPV reporter: being part of the solution
- CPV explorer: data accessibility
- case: CRM crash detection
- ongoing: webapp monitoring
- structured problem solving
- closed feedback loop
- SLA benchmarks
challenge
- CPV drama triangle: victim / rescuer
- who is being observed?
- mapping the human ways
- side effects
- high observability assumptions
future work
- DLL injection
- webapps, webapps, webapps
- dealing with the data
Questions
Tobi Oetiker <tobi@oetiker.ch>
OETIKER+PARTNER AG
Commercial Contact: Claus Henning Simon <ClausHenning.Simon@swisscom.com>
Swisscom IT Services AG