Maintaining Large Vista Installations Amy Edwards, Ezra Freelove, & George Hernandez July 12, 2007.

Agenda

• Comparisons

• Who is USG

• Automation

• Monitoring

• Maintenance

• More Tricks

• Questions?

Informal Poll – Number of Nodes

• All production clusters, now:
  – 1-10
  – 11-20
  – 21-50
  – 51-70
  – 71+

• Ours in bold

• All production clusters, by December:
  – 1-10
  – 11-20
  – 21-50
  – 51-70
  – 71+

Informal Poll – Number of DB Instances

Including secondary and non-production instances

• 1-2
• 3-6
• 7-10
• 11+

• Ours in bold

Vista Architecture

GeorgiaVIEW Project

• University System of Georgia (USG)

• Vista 3.0.7

• Hosts 32 institutions & multiple consortial programs

• >150,000 active students
  – “Active” means 100+ actions

• >11,000 active sections / term

Issues

• Handling performance issues

• Capacity planning

• Upgrades

• Replication

• JMS sensitivity

• Integration

Automation

• Rolling restarts
  – Managed nodes restarted weekly, except the JMS node

• Log cleanup to preserve disk space

• Error reporting
  – Application, tracking, vulnerabilities

• Thread dumps

• Sync admin node with backup

• LDIS batch integration
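
The rolling-restart rule above (restart managed nodes one at a time each week, but leave the JMS node alone) can be sketched in Python as follows. This is a minimal sketch, not the USG scripts. The node names are hypothetical, and `restart` and `is_healthy` are assumed callables that a site would wire to its own WebLogic stop/start scripts and health check.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    jms_node: str,
    restart: Callable[[str], None],      # hypothetical hook: stop/start one node
    is_healthy: Callable[[str], bool],   # hypothetical hook: node back in service?
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Restart managed nodes one at a time, skipping the JMS node.

    Waits for each node to report healthy before moving on, so normally
    only one node is out of the cluster at a time. Returns the restarted
    nodes in order; raises RuntimeError if a node does not recover in time.
    """
    restarted: List[str] = []
    for node in nodes:
        if node == jms_node:
            continue  # JMS services do not migrate well, so it is not restarted
        restart(node)
        deadline = time.monotonic() + timeout_s
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise RuntimeError(f"{node} did not recover; stopping the rollout")
            time.sleep(poll_s)
        restarted.append(node)
    return restarted
```

With stub callables, `rolling_restart(["mn1", "jms1", "mn2"], "jms1", print, lambda n: True)` prints `mn1` and `mn2` and returns `["mn1", "mn2"]`. The JMS node is never touched. Stopping at the first node that fails to recover keeps a bad build from taking down the whole cluster.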

Monitoring

• Nagios
  – http://www.nagios.org/
  – Sends alerts

• Stats
  – Custom AJAX web app
  – Watch changes over time

• AWStats
  – http://www.awstats.org/

Nagios Example

Nagios Monitors

• OS / Hardware
  – Load
  – Temperature
  – Free space

• Database
  – Tablespace free space
  – Listener
  – Oracle processes

• Application
  – Direct-login
  – WebLogic processes
  – Java MBeans
    • Default/Primary Pending Requests Current Count
    • Java Heap Current
    • JDBC Waiting for Connection Current Count
    • Multicast Messages Lost
    • Primary Count
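
A custom Nagios check such as the free-space monitor can be written as a small plugin that follows the standard Nagios plugin conventions. It returns exit status 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN, and prints one line of output with optional performance data after a `|`. The sketch below illustrates those conventions. The 20% and 10% thresholds are hypothetical defaults, not the values GeorgiaVIEW uses.

```python
import shutil
from typing import List, Tuple

# Standard Nagios plugin exit statuses
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(pct_free: float, warn: float, crit: float) -> int:
    """Map percent free space to a status; less free space is worse."""
    if pct_free <= crit:
        return CRITICAL
    if pct_free <= warn:
        return WARNING
    return OK


def check_free_space(path: str, warn: float = 20.0, crit: float = 10.0) -> Tuple[int, str]:
    """Return (status, output line) for the filesystem that holds `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    pct_free = 100.0 * usage.free / usage.total
    status = evaluate(pct_free, warn, crit)
    # Perfdata after '|': label=value[UOM];warn;crit;min;max
    perf = f"free={pct_free:.1f}%;{warn};{crit};0;100"
    return status, f"DISK {LABELS[status]} - {pct_free:.1f}% free on {path} | {perf}"


def main(argv: List[str]) -> int:
    """Print the plugin output and return the exit status for sys.exit()."""
    status, line = check_free_space(argv[1] if len(argv) > 1 else "/")
    print(line)
    return status
```

A wrapper script would call `sys.exit(main(sys.argv))` so that Nagios receives the status code. The perfdata after the `|` is the kind of value a long-term grapher can later plot.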

Stats

• Short- and long-term analysis
  – 21 months of data

• Graphs all collected Nagios data

• Flexible creation of reports

• Built with AJAX
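
Graphing Nagios data over months largely comes down to parsing the performance data each check emits and bucketing it by time. The following is a hedged sketch of that step. It assumes samples arrive as `(timestamp, perfdata)` pairs. The talk does not describe how the Stats app stores its data or draws its AJAX graphs, so neither is shown, and the daily-mean aggregation is an assumed choice.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Matches label=value at the start of each perfdata item
# (label=value[UOM];warn;crit;min;max); only label and value are kept.
# Quoted labels containing spaces are not handled by this simple pattern.
PERF_RE = re.compile(r"'?([^'=\s]+)'?=(-?\d+(?:\.\d+)?)")


def parse_perfdata(perf: str) -> Dict[str, float]:
    """Return {label: value} for each metric in a Nagios perfdata string."""
    return {label: float(value) for label, value in PERF_RE.findall(perf)}


def daily_means(samples: Iterable[Tuple[datetime, str]]) -> Dict[Tuple[str, str], float]:
    """Average each metric per calendar day, keyed by ("YYYY-MM-DD", label)."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for ts, perf in samples:
        for label, value in parse_perfdata(perf).items():
            buckets[(ts.date().isoformat(), label)].append(value)
    return {key: mean(values) for key, values in buckets.items()}
```

Keeping raw samples for short-term views and daily means for the multi-month view is one way to keep 21 months of data cheap to graph.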

Stats Examples I of III

Stats Examples II of III

Stats Examples III of III

AWStats

• Records data from web server logs

• Custom script grabs data from webserver.log files

• Runs daily
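
When several nodes each write their own webserver.log, the files must be combined into one chronologically ordered stream before AWStats reads them. AWStats expects records in time order and ships a `logresolvemerge.pl` tool for this purpose. Below is a minimal Python sketch of such a merge. It assumes each node's file is already in time order and uses Common Log Format timestamps. The actual format of Vista's webserver.log and the talk's custom script are not shown here.

```python
import heapq
import re
from datetime import datetime
from typing import Iterable, Iterator, List, Optional, Tuple

# Common Log Format timestamp, e.g. [12/Jul/2007:10:00:00 -0400]
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def timestamp(line: str) -> Optional[datetime]:
    """Return a timezone-aware datetime for a CLF line, or None if absent."""
    m = TS_RE.search(line)
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z") if m else None


def keyed(lines: Iterable[str]) -> Iterator[Tuple[datetime, str]]:
    """Pair each line with its timestamp, dropping lines that have none."""
    for line in lines:
        ts = timestamp(line)
        if ts is not None:
            yield ts, line


def merge_logs(sources: List[Iterable[str]]) -> Iterator[str]:
    """Yield lines from all per-node sources in overall timestamp order.

    heapq.merge is lazy, so large logs are streamed rather than loaded.
    Each source must already be sorted by time.
    """
    merged = heapq.merge(*(keyed(src) for src in sources), key=lambda pair: pair[0])
    for _, line in merged:
        yield line
```

For example, writing `merge_logs([open(p) for p in paths])` to a single file (with `paths` listing each node's copy of the log) produces the file that the AWStats `LogFile` setting can point at.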

AWStats Examples I of II

AWStats Examples II of II

Specialized Nodes

• Admin

• JMS

• Institutional Admin
  – Integration

• Chat

JMS Node

• Provides special services
  – Mail, LC creation, chat

• Failure or migration of the JMS node hinders usage

• Services do not migrate well
  – Allow targeted migration
  – Others: pin JMS to a specific node

Integration

• Batched LDIS data files

• Cron runs nightly

• Files broken up by:
  – Type
  – “Reasonable” number of records

• Done on the Institutional Admin (Inst) node
  – Issues with an import can kill the node
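
Breaking the nightly feed into files by record type and a bounded record count keeps each import small. This matters when a bad import can take down the Institutional Admin node. The sketch below is a hedged illustration of the splitting step. It assumes records are already parsed into dicts with a `type` key. The type names, their order, and the batch size of 500 are hypothetical and do not reflect the LDIS file format or the team's settings.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical dependency order: create people and sections before enrollments.
TYPE_ORDER = ["person", "group", "membership"]


def batches(records: Iterable[Dict], batch_size: int = 500) -> Iterator[Tuple[str, List[Dict]]]:
    """Yield (type, batch) pairs, each batch holding at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    by_type: Dict[str, List[Dict]] = {}
    for rec in records:
        by_type.setdefault(rec["type"], []).append(rec)
    # Known types first in dependency order, then any unknown types alphabetically.
    order = [t for t in TYPE_ORDER if t in by_type] + sorted(set(by_type) - set(TYPE_ORDER))
    for rtype in order:
        it = iter(by_type[rtype])
        while chunk := list(islice(it, batch_size)):
            yield rtype, chunk
```

With `batch_size=2`, three person records and five membership records come out as person batches of 2 and 1, followed by membership batches of 2, 2 and 1. Each batch would then be written to its own data file for the cron job to import.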

Touching Nodes

• ssh & dsh
  – Touch groups of nodes at once
  – Useful for:
    • Installs
    • Gathering logs
    • Locating a session
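
dsh runs one command across a named group of hosts over ssh. The Python sketch below shows the same idea using only the standard library. It assumes key-based ssh access to every node. The group and host names are hypothetical, and this illustrates the pattern rather than replacing dsh.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical group definition; dsh keeps groups in its own group files.
GROUPS = {"managed": ["node01", "node02", "node03"]}


def ssh_runner(host: str, command: str) -> Tuple[int, str]:
    """Run `command` on `host` via ssh; return (exit status, stdout + stderr)."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],  # BatchMode: never prompt
            capture_output=True, text=True, timeout=120,
        )
    except subprocess.TimeoutExpired:
        return 124, f"{host}: timed out"  # 124 mirrors the exit status of timeout(1)
    return proc.returncode, proc.stdout + proc.stderr


def run_on_group(hosts: List[str], command: str,
                 runner: Callable[[str, str], Tuple[int, str]] = ssh_runner,
                 max_workers: int = 16) -> Dict[str, Tuple[int, str]]:
    """Run `command` on every host concurrently; results keyed by host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))
```

For example, locating a session could be `run_on_group(GROUPS["managed"], "grep -l <session-id> <log-dir>/*.log")`, with the session ID and log directory filled in for the site. The hosts whose exit status is 0 are the ones holding that session. The injectable `runner` also lets the fan-out logic be tested without real ssh access.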

Maintenance Page

• Hosted on the opposite F5

• Two versions
  – Scheduled maintenance
  – Unscheduled outage

• In an F5 outage, move DNS to the other F5 so the message still appears

Installs and Upgrades

• Silent install scripts

• Test in both development environments
  – Create against a small database
  – Measure time to complete against a full-size copy of production

• Install to production

PowerLinks and Custom Development

• Test in development

• Try to break it

• Pilot in production

• Release to all

Questions?

Want More?

• To view my resources and references for this presentation, visit www.scholar.com
  – Simply click “Advanced Search” and search by ezrafreelove and tag: ‘bbworld07’