Operasi sistem

Chapter 13 Troubleshooting the Operating System
13.1 - Identifying and Locating Symptoms and Problems
13.2 - LILO Boot Errors
13.3 - Various Reasons for Package Dependency Problems
13.4 - Troubleshooting Network Problems
13.5 - Disaster Recovery


Page 1: Operasi sistem

8/14/2019 Operasi sistem

http://slidepdf.com/reader/full/operasi-sistem 1/49


Identifying and Locating Symptoms and Problems


Hardware Problems 

• Although a few problems are due to a combination of factors, most can be traced to one of these:
 – Hardware, Kernel, Application Software, Configuration, and User Error.
• Some hardware faults leave traces that the kernel detects and records.
• If an error does not crash the system, evidence might be left in the log file /var/log/messages, with the message prefixed by the word Oops.
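As a rough illustration of the log check described above, the Python sketch below scans log lines for the word "Oops". The function name, the sample lines, and the case-insensitive match are assumptions made for illustration; real kernel messages vary between kernel versions and distributions.

```python
import re

# Match the word "Oops" case-insensitively (an assumption; kernels usually print "Oops").
OOPS = re.compile(r"\boops\b", re.IGNORECASE)

def find_oops_lines(lines):
    """Return (line_number, text) pairs for lines that mention a kernel oops."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOPS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits

# Made-up sample lines in the style of /var/log/messages.
SAMPLE = [
    "Aug 14 10:00:01 host kernel: eth0: link up\n",
    "Aug 14 10:02:17 host kernel: Oops: 0002\n",
    "Aug 14 10:05:00 host crond[812]: job started\n",
]

print(find_oops_lines(SAMPLE))
```

On a real system the same function could be given an open file object for /var/log/messages, which usually requires root privileges to read.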


Kernel Problems 

• Released Linux kernels are remarkably stable, unless experimental versions are used or individual modifications are made.
• Loadable kernel modules are considered part of the kernel as well, at least while they are loaded.
• Sometimes these can cause difficulties, too.
• The good news with modules is that they can be unloaded and replaced with fixed versions while the system is still running.


Application Software 

• Errors in application packages are most identifiable in that they occur only when running the application.
• This is in contrast to hardware and kernel conditions that affect an entire system.
• Some common signs of application bugs are failure to execute and program crash.
• An application may consume too much system memory and ultimately begin to swap so badly that the whole system is affected.
• Some errors are caused by things that have to do with the running program itself.


Configuration 

• Configuration problems tend to affect whole subsystems, such as the graphics, printing, or networking subsystems.
• If the system is rebooted and a remote file system that was once present is no longer mounted, the first place to look is the configuration file /etc/fstab, to see if the file system is supposed to be mounted at boot time.
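The /etc/fstab check can be sketched in Python as follows. This is a minimal illustration, not a full fstab parser: the function names, the sample entries, and the host name fileserver are hypothetical, and the sketch assumes that an entry with the noauto option is not mounted at boot.

```python
def parse_fstab(text):
    """Yield (device, mount_point, fs_type, options) for each fstab entry."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line; ignored in this sketch
        device, mount_point, fs_type, options = fields[:4]
        yield device, mount_point, fs_type, options.split(",")

def mounted_at_boot(text, mount_point):
    """True or False if the mount point is listed (False with 'noauto'); None if absent."""
    for _device, point, _fs_type, options in parse_fstab(text):
        if point == mount_point:
            return "noauto" not in options
    return None

# Hypothetical fstab contents used only for illustration.
SAMPLE_FSTAB = """\
# device              mount point  type     options    dump pass
/dev/hda1             /            ext3     defaults   1 1
fileserver:/export    /mnt/export  nfs      defaults   0 0
/dev/cdrom            /mnt/cdrom   iso9660  noauto,ro  0 0
"""

print(mounted_at_boot(SAMPLE_FSTAB, "/mnt/export"))
```

A result of None for the missing remote file system would suggest that its entry was never added to, or was removed from, /etc/fstab.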


User Error 

• It is not unforgivable to make a mistake in using a computer program, nor is it to be ignorant of the right way to do something. It is only unforgivable to insist on remaining stubbornly so.
• There is more to know about the ins and outs of operating almost any software package than everyday users will ever care or attempt to learn.


Using System Utilities and System Status Tools 

• Linux operating systems provide various system utilities and system status tools.
• The setserial utility provides information about, and sets options for, the serial ports on the system.
• The lpq command helps resolve printing problems.
• The command will display all the jobs that are waiting to be printed.


Using System Utilities and System Status Tools

• The ifconfig command can be entered at the shell to return the current network interface configuration of the system.
• The route command displays or sets the information on the system’s routing, which it uses to send information to particular IP addresses.


Unresponsive Programs and Processes 

• Sometimes programs and processes become unresponsive or “lock up” for various reasons.
• Sometimes just the program or process itself will lock up, and other times it can cause the entire system to become unresponsive.
• One way to troubleshoot the problem is to identify the unresponsive program or process and then kill or restart it.


When to Start, Stop, or Restart a Process

• It is easiest to terminate a program by using the kill command.
• Other processes need to be terminated through their SysV startup scripts.
• When restarting a program, service, or daemon, it is best to first consult the documentation, because different programs have to be restarted in different ways.
• Some support a restart command, some need to be stopped completely and then started again, and others can simply reread their configuration files without needing to be stopped and started again, or restarted.
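The "ask politely, then force" pattern behind the kill command can be sketched in Python. This is an illustrative sketch only: stop_process and the sleeping child are hypothetical stand-ins, and the signal names in the comments assume a POSIX system such as Linux.

```python
import subprocess
import sys

def stop_process(proc, grace=5.0):
    """Ask a child process to exit, then force it if it does not comply."""
    proc.terminate()              # sends SIGTERM on POSIX, like a plain `kill`
    try:
        proc.wait(timeout=grace)  # give the process time to shut down cleanly
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()               # sends SIGKILL on POSIX, like `kill -9`
        proc.wait()
        return "killed"

# Stand-in for an unresponsive program: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_process(child))
```

Trying the gentler signal first gives the program a chance to flush files and release resources before it is forced to stop.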


Troubleshooting Persistent Problems 

• The best way to fix programs that crash repeatedly is to replace them with new software or with a different kind of software that performs the same task.
• If possible, try using the software in a different way. If a particular keystroke or command causes the program to fail, stop using it.
• Most times there will be replacement software available.
• If a daemon is crashing regularly, try other methods of starting and running it.


Examining Log Files 

• Some of the more important log files on a Linux system are /var/log/messages, /var/log/secure, and /var/log/syslog.
• The system’s log files can be used to monitor system loads, such as how many pages a web server has served.
• They can also be used to check for security breaches such as intrusion attempts, verify that the system is functioning properly, and note any errors generated by software.


The dmesg Command 

• The dmesg command can be used to display the recent kernel messages, also known as the kernel ring buffer.
• These messages contain information about the hardware installed in the system and the drivers.
• The information in these messages relates to whether the drivers are being loaded successfully and what devices the drivers are controlling.


Troubleshooting Problems Based on User Feedback

• There are several different types of problems that users report.
• Some of the most common ones are:
 – Login Problems
 – File Permission Problems
 – Removable Media Problems
 – E-mail Problems
 – Program Errors
 – Shutdown Problems


Error Codes 

• The LILO boot loader is the first piece of code that takes control of the boot process from the BIOS. It loads the Linux kernel, and then passes control entirely to the Linux kernel.
• When there is a problem with LILO, an error code will be displayed:
 – None, L error-code, LI, LI101010…, LIL, LIL?, LIL-, LILO
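The partial prompts listed above show how far LILO got before it stopped. The Python lookup below is a sketch of how they might be interpreted; the descriptions are paraphrased from the general LILO documentation rather than taken from these slides, the function name is hypothetical, and the LI101010… case is left out because the slides do not explain it.

```python
import re

# Meaning of each partial prompt, paraphrased (an assumption, not from the slides).
LILO_STAGES = {
    "": "No part of LILO was loaded; it may not be installed or its partition is not active.",
    "LI": "The first stage loaded the second stage but failed to execute it.",
    "LIL": "The second stage started but could not load the descriptor table from the map file.",
    "LIL?": "The second stage was loaded at an incorrect address.",
    "LIL-": "The descriptor table is corrupt.",
    "LILO": "All parts of LILO loaded successfully.",
}

def explain_lilo_prompt(prompt):
    """Return a short explanation for what LILO printed before stopping."""
    text = prompt.strip()
    # "L" followed by one or more two-digit hex codes, for example "L 04".
    if re.fullmatch(r"L(\s+[0-9A-Fa-f]{2})+", text):
        return "The first stage started but could not load the second stage; see the error code."
    return LILO_STAGES.get(text, "Unrecognized prompt; consult the LILO documentation.")

print(explain_lilo_prompt("LIL-"))
```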


Booting a Linux System without LILO

• Using the LILO on a Floppy method is the least useful but it can help in some instances.
• From this screen a LILO boot floppy disk can be created which can be used to boot Linux from LILO using the floppy disk.


Emergency Boot System 

• Linux provides an emergency system’s copy of LILO, which can be used to boot Linux in the event that the original LILO boot loader has errors or is not working.
• This is known as the Emergency Boot System.
• To use this copy of LILO, configuration changes must be made in lilo.conf.


Using an Emergency Boot Disk in Linux

• Besides LILO problems, several other errors can prevent a Linux system from booting.
• The emergency boot disk should have the necessary disk utilities such as fdisk, mkfs, and fsck, which can be used to partition, format, and check a hard drive so that Linux can be installed on it.


Using an Emergency Boot Disk in Linux

• It is always important to include some sort of backup software utility.
• If a change or repair to some configuration files needs to be made, first back them up.
• Most distributions come with some sort of backup utility like tar, restore, cpio, and possibly others.


Recognizing Common Errors 


Various Reasons for Package Dependency Problems

• When a package is installed in a Linux system there might be other packages that need to be installed for that particular package to work properly.
• The dependency package may have certain files which need to be in place or it may run certain services which need to be started before the package that is to be installed can work.
• Linux will often notify the user if they are installing a package that has dependencies so that they can be installed as well.
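Dependencies determine the order in which packages have to be installed. The Python sketch below works out such an order with the standard library's graphlib module (Python 3.9 or later); the package names are hypothetical, and real package managers track far more detail, such as versions and conflicts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical packages: each key depends on the packages in its set.
DEPENDS_ON = {
    "webapp": {"libssl", "libdb"},
    "libssl": {"libc"},
    "libdb": {"libc"},
    "libc": set(),
}

def install_order(depends_on):
    """Return an order in which every package comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

order = install_order(DEPENDS_ON)
print(order)
```

If two packages depend on each other, static_order raises graphlib.CycleError, which mirrors a circular dependency that cannot be resolved by ordering alone.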


Solutions to Package Dependency Problems

• One option for package dependency problems is to simply ignore the error message and forcibly install the package anyway.
• The correct and recommended method is to modify the system so that it has the dependencies the package needs to run properly.
• It may be necessary to rebuild the package from source code if there are dependency error messages showing up.
• The easiest way is to locate a different version of the package that is causing the problems.
• Another option is to look for a newer version of the package.


Backup and Restore Errors 

• Backup and restore errors can occur at different points.
• Some errors will occur when the system is actually performing the backup.
• Other errors will occur during the restore process when the system is attempting to recover data.
• Some of the most common types of problems:
 – Driver problems
 – Tape drive access errors
 – File access errors
 – Media errors
 – Files not found errors


Application Failure on Linux Servers

• There are several things that can provide some indication of an application failure or software problem on a Linux server:
 – Failure to Start
 – Failure to Respond
 – Slow Responses
 – Unexpected Responses
 – Crashing Application or Server
• A good general rule is to check the system’s logs.
• The system’s log files are usually the place to find most error messages that are generated, because they are not always displayed on the screen.


Troubleshooting Network Problems 


Loss of Connectivity 

• Loss of connectivity can be hardware and/or software related. The first rule of troubleshooting is to check for physical connectivity.
• Ensure that the cables are properly plugged in at both ends, that the network adapter is functioning by checking the link light on the NIC, that the hub's status lights are on, and that the communication problem is not a simple hardware malfunction.


Operator Error 

• Be sure that users are using the correct username and password and that their accounts are not restricted in a way that prevents them from being able to connect to the network.
• Software settings might have been changed by the installation routine of a recently installed program, or the user might have been experimenting with settings.
• Users accidentally, or purposely, delete files, and power surges or shutting down the computer abruptly can damage file data.
• Viruses can also damage system files or user data.


Using TCP/IP Utilities 

• The first step in checking for a suspected connectivity problem is to ping the host.
• If a reply is received, the physical connection between the two computers is intact and working.
• If the pinged host is on the Internet, a successful reply also signifies that the calling system can reach the Internet.
• The term ping time refers to the amount of time that elapses between the sending of the Echo Request and receipt of the Echo Reply.
• A low ping time indicates a fast connection.
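Ping times can be pulled out of the command's output for comparison. The Python sketch below assumes output in the style of the Linux ping command, where each reply line carries a time=... ms field; the sample output, the address 192.0.2.10, and the function names are made up for illustration.

```python
import re

# Reply lines from Linux ping typically end with "time=<value> ms" (assumed format).
TIME_FIELD = re.compile(r"time[=<]([0-9.]+)\s*ms")

def ping_times_ms(output):
    """Return the round-trip times, in milliseconds, found in ping output."""
    return [float(value) for value in TIME_FIELD.findall(output)]

def average_ms(times):
    """Return the mean of the given times, or None if there were no replies."""
    return sum(times) / len(times) if times else None

# Made-up sample output for illustration.
SAMPLE_OUTPUT = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=12.0 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=14.0 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=16.0 ms
"""

times = ping_times_ms(SAMPLE_OUTPUT)
print(times, average_ms(times))
```

An empty list means no Echo Reply was seen, which points back to the connectivity checks described earlier.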


Using TCP/IP Utilities

• Tracing utilities are used to discover the route taken by a packet to reach its destination.
• The way to determine packet routing in UNIX systems is the traceroute command.
• Traceroute shows all the routers through which the packet passes as it travels through the network from sending computer to destination computer.
• This is useful for determining at what point connectivity is lost or slowed.


Using TCP/IP Utilities

• The ipconfig command is used in Windows NT and Windows 2000 to display the IP address, subnet mask, and default gateway for which a network adapter is configured.
• For more detailed information, the /all switch is used.


Windows 2000 Diagnostic Tools 

• The network diagnostic tools for Microsoft Windows 2000 Server include Ipconfig, Nbtstat, Netstat, Nslookup, Ping, and Tracert.
• Windows 2000 Server also includes the Netdiag and Pathping commands.


Wake-on-LAN 

• Some network interface cards support a technology known as Wake-On-LAN (WOL).
• The purpose of WOL technology is to enable a network administrator to power up a computer by sending a signal to its WOL-capable NIC.
• The signal is called a magic packet.
• When the magic packet is received by the NIC, it will power up the computer in which it is installed.
• When fully powered up, the remote computer can be accessed through normal remote diagnostic software.
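A magic packet has a simple, widely documented layout: six bytes of 0xFF followed by the target NIC's MAC address repeated sixteen times. The Python sketch below builds and sends one under stated assumptions: the function names are hypothetical, the MAC address is a placeholder, and UDP broadcast to port 9 is a common convention rather than a requirement.

```python
import socket

def build_magic_packet(mac):
    """Build a magic packet: 6 bytes of 0xFF, then the MAC address 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("a MAC address has 12 hexadecimal digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16

def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (port 9 is an assumed convention)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# Placeholder MAC address used only to show the packet size.
print(len(build_magic_packet("00:11:22:33:44:55")))
```

The packet is broadcast because a powered-down computer has no IP address in use; the NIC only watches for its own MAC address in the payload.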


Disaster Recovery 


Risk Analysis 

A good risk analysis can be broken into the following four parts:

• Identify business processes and their associated infrastructure.
• Identify the threats associated with each of the business processes and associated infrastructure.
• Define the level of risk associated with each threat.
• Rank the risks based on severity and likelihood.
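The ranking step can be sketched in Python. The scoring rule used here (severity multiplied by likelihood, each on a 1 to 5 scale) is one common convention and an assumption of this sketch, as are the example threats and their ratings.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    threat: str
    severity: int    # 1 (minor) to 5 (catastrophic); assumed scale
    likelihood: int  # 1 (rare) to 5 (frequent); assumed scale

    @property
    def score(self):
        # Assumed scoring rule: severity multiplied by likelihood.
        return self.severity * self.likelihood

def rank_risks(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)

# Hypothetical threats and ratings for illustration.
RISKS = [
    Risk("site fire", severity=5, likelihood=1),
    Risk("disk failure", severity=4, likelihood=4),
    Risk("power outage", severity=3, likelihood=3),
]

for risk in rank_risks(RISKS):
    print(risk.score, risk.threat)
```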


Understanding Redundancy

• Redundancy is the ability to continue providing service when something fails.
• RAID 0 - also known as disk striping, it writes data across two or more physical drives and has no redundancy.


Understanding Redundancy

• RAID 1 - also known as disk mirroring, requires the use of two disk drives and one disk controller to provide redundancy.
• To increase performance, add a second controller so that each disk drive has its own.
• RAID 5 - also known as disk striping with parity. Parity is redundant information calculated from the data, which allows the contents of a failed drive to be rebuilt from the remaining drives.
• RAID 5 is similar to RAID 0 in that it writes data across disks, but it adds parity information for redundancy.
• At least three drives are required to implement this type of RAID.
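The trade-off between these RAID levels shows up in usable capacity, and parity itself can be demonstrated with the XOR operation. The Python sketch below is illustrative only: the function names are hypothetical, equal-sized drives are assumed, and RAID 1 is limited to the two-drive mirror described above.

```python
from functools import reduce

def usable_capacity(level, drives, drive_size):
    """Usable capacity for RAID 0, 1, or 5, assuming equal-sized drives."""
    if level == 0 and drives >= 2:
        return drives * drive_size        # striping: all space holds data
    if level == 1 and drives == 2:
        return drive_size                 # mirroring: one full copy of the data
    if level == 5 and drives >= 3:
        return (drives - 1) * drive_size  # one drive's worth of space holds parity
    raise ValueError("unsupported RAID level or drive count")

def xor_parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two data blocks and the parity block calculated from them.
d1, d2 = b"\x0f\xaa", b"\xf0\x55"
parity = xor_parity([d1, d2])

# If the drive holding d1 fails, XOR of the surviving blocks rebuilds it.
rebuilt = xor_parity([parity, d2])
print(rebuilt == d1)
```

With three 500 GB drives, for example, usable_capacity(5, 3, 500) gives 1000 GB, compared with 1500 GB for RAID 0 on the same drives with no protection.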


Understanding Redundancy

• RAID 0+1 offers the best of both worlds, the performance of RAID 0 and the redundancy of RAID 1.
• This is an expensive solution because of the number of drives it requires.
• A number of other components in the server can be configured in a redundant manner:
 – Power supplies, Cooling fans, Network interface adapters, Processors, Uninterruptible power supply (UPS)


Clustering 

• A cluster is a group of independent computers working together as a single system.
• This system is used to ensure that mission-critical applications and resources are as highly available as possible.
• The advantages of running a clustered configuration:
 – Fault tolerance, High availability, Scalability, Easier manageability


Scalability 

• Scalability refers to how well a hardware or software system can adapt to increased demands.
• The question is how much extra capacity should be built in, and how much additional capacity can be added once the server is installed.
• It is a good idea to add an additional 25% of capacity to any new server configuration to ensure scalability.
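The 25% guideline is simple arithmetic, sketched below in Python. The function name and the example figure of 16 GB of required memory are hypothetical; the same calculation applies to disk space or any other sized resource.

```python
def capacity_with_headroom(required, headroom=0.25):
    """Return the capacity to provision: the requirement plus a growth margin."""
    return required * (1 + headroom)

# Hypothetical example: a server that needs 16 GB of memory today.
print(capacity_with_headroom(16))
```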


High Availability 

• High availability is the designing and configuring of a server to provide continuous use of critical data and applications.
• Highly available systems are required to provide access to the enterprise applications that keep businesses up and running, regardless of planned or unplanned interruption.
• It is not uncommon for mission critical applications to have an availability requirement of 99.999%.
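To put an availability percentage in perspective, it can be converted into the downtime it permits per year. The Python sketch below does that conversion; the function name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored

def allowed_downtime_minutes(availability_percent):
    """Return the downtime per year, in minutes, that an availability level permits."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes per year")
```

At 99.999%, the permitted downtime works out to roughly five and a quarter minutes per year, covering both planned and unplanned interruptions.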


Hot Swapping, Warm Swapping, and Hot Spares

The types of components that might be kept on hand in case of a problem are broken into these three basic categories:

1. A hot-swap component has the capability to be added and removed from a computer while the computer is running and have the operating system automatically recognize the change.
2. Warm swaps are generally done in conjunction with the failure of a hard drive. In this case, it is necessary to shut down the disk array before the drive can be replaced.
3. A hot-spare component is a component that can be kept on hand in case of an equipment failure.


Creating a Disaster Recovery Plan Based on Fault Tolerance/Recovery

The first piece of the plan is the fault-tolerance portion of the disaster-recovery plan. To create it, follow these steps:

1. From the risk analysis, identify the hardware failure-related threats.
2. From the list of components, identify the components that place the data at the most risk if they were to fail.
3. Take each component and make a list of the methods that could be used to implement it in a fault-tolerant configuration. List approximate costs for each solution and the estimated outage time in the event of a failure for each component.


Creating a Disaster Recovery Plan Based on Fault Tolerance/Recovery

4. Take any components that can be implemented in a cost-effective manner and start documenting the configuration.
5. Take any components that either cannot be implemented in a fault-tolerant configuration or for which a fault-tolerant configuration would be cost-prohibitive, and determine whether a spare part should be kept on hand in the event of an outage.
6. The disaster-recovery plan should include documented contingencies for any of the threats identified as part of the risk analysis.
7. After all this information has been documented, place the orders and get ready to start configuring the server.


Testing the Plan 

Some of the things that should be tested include the following:

• Check the documentation to ensure that it is understandable and complete.
• Do a “dry run” of each of the components of the plan. Make sure spare drives can be located, if applicable, and that replacement parts can be ordered from the vendor.
• Test the notification processes. It should be documented who is to be notified in case of an outage.
• Check the locations of any hot spare equipment or servers.
• Verify that any support contracts that are on equipment are still in effect, and that all the contact numbers are available.
• Test the tape backups at least once a week.
• Test the RAID configuration at least twice a year.


Hot and Cold Sites 

Two types of disaster-recovery sites are commonly used:

1. A hot site is a commercial facility available for systems backup. For a fee, these vendors provide a facility with server hardware, telecommunications hardware, and personnel to assist a company with restoring critical business applications and services in the event of an interruption in normal operations.
2. A cold site, also known as a shell site, is a facility that has been prepared to receive computer equipment in the event that the primary facility is destroyed or becomes unusable.