Ten More Database Mysteries

27
Ten More Database Mysteries Chris Lawson Database Specialists, Inc. www.dbspecialists.com [email protected]

description

Ten More Database Mysteries. Chris Lawson Database Specialists, Inc. www.dbspecialists.com [email protected]. 1. The Case of the Unwanted Services. DBA detects listener starting extra services NOT defined! > lsnrctl status Services Summary … - PowerPoint PPT Presentation

Transcript of Ten More Database Mysteries

Page 1: Ten  More  Database Mysteries

Ten More Database Mysteries

Chris Lawson

Database Specialists, Inc.

www.dbspecialists.com

[email protected]

Page 2: Ten  More  Database Mysteries

The Case of the Unwanted Services

• DBA detects listener starting extra services NOT defined!

> lsnrctl status

Services Summary… database1 has 1 service handler(s) database2 has 1 service handler(s) databasez has 5 service handler(s)

• They appear to be harmless, but what are these extra services?

• Restarted Listener. At first all OK, then 5 extra services activated.

• Confirmed that listener.ora file does NOT list the unwanted database.

• DBA confirmed using correct listener.ora file.

• Puzzle: There is no entry whatsoever for databasez.

1

Page 3: Ten  More  Database Mysteries

Unwanted Services: Solution• Multiple listener services is indicative of MTS (Multi-threaded

Server) Dispatchers; however, MTS was not being used!

• DBA recalled that databasez was actually on a different server.

• Solution: Database on different server started MTS services.• Key init.ora parameter: MTS_LISTENER_ADDRESS = "(ADDRESS=(PROTOCOL=tcp)

(HOST=myhost)(PORT=1526))"

• Services match init.ora parameter MTS_DISPATCHERS.

• How it happened: The remote init.ora file had originated on ‘our’ server, and the MTS parameter entries were not removed.

• Resolution: On other server, DBA corrected the init.ora entry, bounced database. Extra services automatically stopped.

• But - that’s not all...

Page 4: Ten  More  Database Mysteries

Unwanted Services: The Rest of the Story

• These extra services turned out to be not so “harmless” after all.

• The ‘unwanted’ services are fully functional; they will intercept connection requests and redirect them to a different server!

• Production users were redirected to a “clone” testing database on another server, and used it for several hours!

• This occurs because each MTS service points to a particular dispatcher on a particular server.

> lsnrctl services Services Summary... demo has 3 service handler(s)

D001 <machine: hohp2, pid: 3385> (ADDRESS=(PROTOCOL=tcp)(DEV=17)(HOST=[1.2.3.4)(PORT=1179))

• Perhaps could be used for rapid failover to standby database?

Page 5: Ten  More  Database Mysteries

Incognito DBA Privileges

• Application testers need to “refresh” the test database

often.

• They will need to start/shut database as the oracle user.

• DBA, being suspicious (justly) by nature, is reluctant to

provide oracle account.

• As punishment, users frequently ask DBA to run the

refresh.

• Question: How can developers perform tasks as oracle,

without actually having the oracle account?

2

Page 6: Ten  More  Database Mysteries

Incognito DBA Privileges: Solution

• Create script to refresh database, including startup, shutdown, etc.

• UNIX setuid feature allows programs to run with another identity. But setuid feature does not propagate to commands within the script.

• Trick: Use a ‘wrapper’ C program that runs as oracle and calls the refresh script:

# include <stdio.h>main (){system (“./refresh.ksh”);}

• Activate setuid for the wrapper program: chmod 6711 refresh

• Users simply enter refresh to run the refresh program as oracle.

• Thanks to Brian Keating of Database Specialists for this solution.

Page 7: Ten  More  Database Mysteries

NT Remote Control

• NT box is running an important database. It appears that the Listener is down.

• NT server is remote.

• Question: How can you check the listener remotely?

3

Page 8: Ten  More  Database Mysteries

NT Remote Control: Solution

• Use ‘SC’ command. It is part of the NT Resource Kit.

• The SC utility is very similar to the NET commands.

• First enable a security "context" to the remote box; e.g. map a network drive to the remote server, and provide the administrator account/password.

• The network drive may be disconnected after SC use is done.

• Note: A mapped network drive is not required if you are connected locally as administrator, and the remote server uses the same administrator password.

• Run SC using the format: SC \\[SERVER] [CMD] SERVICE

Page 9: Ten  More  Database Mysteries

NT Remote Control: Solution (continued)

Some common SC commands are:

query--------Queries the status for a service

start--------Starts a service

pause--------Sends a PAUSE control request

continue-----Sends a CONTINUE request

stop---------Sends a STOP request to a service

config-------Changes the configuration of a service

qc-----------Queries the configuration information

delete-------Deletes a service (from the registry)

create-------Creates a service (add to registry)

Page 10: Ten  More  Database Mysteries

NT Remote Control: Solution (continued)

EXAMPLE:

Check status of the Oracle 8 LISTENER service.NT server named ‘TSUNAMI’

SC \\TSUNAMI QUERY ORACLETNSLISTENER80

SERVICE_NAME: ORACLETNSLISTENER80 TYPE : 10 WIN32_OWN_PROCESS STATE : 4 RUNNING

(STOPPABLE,NOT_PAUSABLE,IGNORES_SHUTDOWN) WIN32_EXIT_CODE : 0 (0x0) SERVICE_EXIT_CODE : 0 (0x0) CHECKPOINT : 0x0 WAIT_HINT : 0x0j

Page 11: Ten  More  Database Mysteries

The Sad Case of the Failing Failover

• Very critical Customer Support application for large HMO.

• 24 x 7 crucial. Millions of monthly customers interactions.

• IBM RS-6000 High-availability cluster (HACMP)

• Scenario: Sudden hardware glitch causes failover to backup node.

• Secondary node fails to start database! Database complains of missing datafile.

• Critical application is down!

• Objective: Early-rising DBA (from different project) not familiar with the setup needs to get database up!

4

Page 12: Ten  More  Database Mysteries

The Failing Failover: Solution

• Investigation shows all file systems mounted. Bad News: file in question is simply not there.

• Good News: File is not supposed to be there--only a symbolic link.

• Each node has symbolic links pointing from “nice-looking” paths to the actual location of raw device:

/u04/oradata/data01.dbf > /dev/r123

• The primary node had all the links; the backup node was missing 2 symbolic links. Simply creating the links allowed database startup.

Page 13: Ten  More  Database Mysteries

The Failing Failover: Solution(continued)

• Moral of the story: High reliability hardware only as good as the weakest link--including maintenance requirements.

• This strategy was very susceptible to human error. Every new .dbf file required new symbolic link on both nodes. One omission completely destroyed the high-availability plan.

Page 14: Ten  More  Database Mysteries

The Case of the Impatient Insurance Agent

• Query to find details for expired insurance policies only returns 50 rows, but takes 2 minutes to finish.

• Query is simple join of 2 tables: Response and Policy

5 where-clause ‘filters’ 1 ‘filter’

result set: 20,000 rows result set: 15,000 rows

RESP POL

result:only 50 rows

policy_id

• Puzzle: How can query ever be quick, since the ‘super filter’ to reduce set to 50 rows is spread across two tables ?

5

Page 15: Ten  More  Database Mysteries

Impatient Agent: Solution

The Trick:

1) Perform pre-processing that uses filters, but retrieves only the key (policy_id) that is used as the join column.

2) This field is retrieved via index read only--thus avoiding the table reads that account for the delays.

3) Use this result set as the starting point for the original join.

RESP POL RESP POL

index reads only

original joinpre-processing driving “table”

table reads on small set only

Page 16: Ten  More  Database Mysteries

Daffy Database Links

• Version 7.3.4 database. Database links working normally. Database has been running straight for 3 weeks.

• Suddenly, the links fail: ORA-12154: TNS:could not resolve service name

• Investigation shows tnsnames.ora file unchanged. Link definition unchanged.

• Listener is running Oracle 8.0.4. Adding entry to tnsnames.ora file in Oracle 8 directory causes the links to work again!

• Tests with new links show that the tnsnames.ora file for 7.3.4 is being ignored!

• Further, if Oracle 8 tnsnames file is removed, the one in Oracle 7 is used instead!

6

Page 17: Ten  More  Database Mysteries

Database Links: Solution• Clue: Listener had been restarted recently. But why should the Listener

process (a server side function) affect finding the tns alias?

• Metalink analyst suggested checking TNS_ADMIN; but how is that relevant? We are not creating a UNIX session.

• Oracle Note 37808.1 clears-up the confusion:

For database links, TNS_ADMIN takes its value from

the value defined when the listener was started

• That is, a client process (link) is influenced by a server-side process!

• The listener had been started most recently with TNS_ADMIN set to 8.0.4, causing links to look in 8.0.4 tns file.

• Unsetting TNS_ADMIN, then restarting listener caused all operations to run as expected.

Page 18: Ten  More  Database Mysteries

All Primary Keys are Equal(But some are more equal than others)

• Background: Application tuning often requires DBA to ‘lure’ optimizer into using certain indexes. Hints not always successful or possible.

• Optimizer ‘likes’ certain indexes, because they typically are faster than others. The favorite: an index on Primary Key.

• Scenario: 20 gigabyte insurance billing system. Oracle 8.0.4.

• Particular query joins to a table called POLICY. Join uses the PK index (policy_ID).

• Query speed-up requires that join use new index on (PK, other col).

• Created new double index--but optimizer would not cooperate. It foolishly insists on using the PK index, even if hint used.

• Question: How can we get the optimizer to obey the DBA?

7

Page 19: Ten  More  Database Mysteries

All Primary Keys are Equal: Solution

• We need a way to ‘trick’ the optimizer into not using the PK index, but instead, use our ‘extra column’ index.

• This is difficult, because the PK index is the #1 choice.

• Trick: ‘Disguise’ the PK index as a unique index.

• Oddity: If unique index already exists, addition of a matching PK will use the existing index.

• In determining execution plan, optimizer will treat the new index as if it were a unique index, not a PK index.

• So, drop the PK, build a matching unique index, then rebuild the PK.

• Optimizer no longer stubbornly insists on using this ‘PK’ index, because it is not a true, pedigreed PK index--merely a unique index.

Page 20: Ten  More  Database Mysteries

The Sad Case of the Homeless Archive Logs

• Scenario: Large medical application; hundreds of connections into database. This is a critical, 24x7 server.

• Users suddenly complain of database ‘hanging.’

• Alert log shows database unable to write archive log.

ORA-00255: error archiving log 1 of thread 1, sequence # 200 ORA-00270: error creating archive log /demo/arch/1_200.dbf ORA-19504: failed to create file "/demo/arch/1_200.dbf"

• Investigation reveals disk crash on disk housing archive logs.

• No .dbf files on that disk--only archive logs.

• Problem: How can database operation be resumed, with minimal disruption to hundreds of users?

8

Page 21: Ten  More  Database Mysteries

Homeless Archive Logs: Solution

• DBA decided to dynamically redirect the archive logs, thereby avoiding need to shutdown database.

• To change destination of archive logs:

alter system archive log start to ‘[new path]’

• For example:

SVRMGRL> alter system archive log start to '/demo/arch2/'

• Once the ‘backlog’ of archive logs is corrected, database automatically resumes normal operation.

• Alert log now shows:

Fri Feb 11 09:00:47 2000

ARCH: Archival started

Archiver process freed from errors. No longer stopped.

Page 22: Ten  More  Database Mysteries

The Puzzling Affair of the Old Archive Logs

• With database in Archive Mode, old archive logs must be purged.

• On Unix, this is simple, using the find command (in cron).

• For example, to purge archive logs greater than 6 days old:

find /logdir/arch -name “arch*” -mtime +6 -exec rm {} \;

• Problem: How can we similarly purge the old archive logs on NT?

9

Page 23: Ten  More  Database Mysteries

Archive Logs: Solution• On NT, there is apparently no ‘find’ command, as in Unix.

• The archive logs can be found indirectly, because the database ‘knows’ when/where they were written.

• Logs listed in V$ARCHIVED_LOG

• Example, to remove logs older than 2 days:

spool del_logs.sql

select 'del ‘ || name from v$archived_log where completion_time < (sysdate - 2);

spool off

host del_logs

• Similar query could be used to copy archive logs to backup.

Page 24: Ten  More  Database Mysteries

SQL*Plus Madness

• Scenario: Connection ‘hangs’ upon trying to connect in

SQL*Plus.

• Also get error message re DBMS_APPLICATION_INFO

• DBA investigates. He finds:

1) Connect via Svrmgrl ok;

2) System user connects ok

• Problem: How can DBA ‘fix’ SQL*Plus?

10

Page 25: Ten  More  Database Mysteries

SQL*Plus Madness: Solution

• Package DBMS_APPLICATION_INFO is used to ‘register’

a running application, for tracking/analysis purposes.

• Once an application is ‘registered’ it will show up in

v$session in ‘module’ field.

• SQL*Plus is one of few applications that try to ‘register.’

This explains why svrmgrl still works.

• This all points to problem with the ‘register’ of SQL*Plus.

• The set-up script to build necessary tables is called

‘pupbld.sql’

Page 26: Ten  More  Database Mysteries

SQL*Plus Madness: Solution(continued)

• Action: Try to run pupbld.sql as SYSTEM but it hangs!

• Investigation shows: product_profile table missing, but synonym still there!

• Solution: drop public synonym for product_profile, then rebuild pupbld. All OK now.

Page 27: Ten  More  Database Mysteries

Contact Information

Chris Lawson

[email protected]

http://www.dbspecialists.com

Database Specialists, Inc.

388 Market Street, Suite 400

San Francisco, CA 94111