Squid Guide

1 Squid

1.1 A User’s Guide

1.1.1 Oskar Pearson

Qualica Technologies (Pty) Ltd, South Africa.

Copyright © 2000 by Oskar Pearson ([email protected]). All rights reserved. This version of the document (Version 0.1) is not to be mirrored.

All trademarks used in this document are owned by their respective companies. This document makes no ownership claim of any trademark(s). If you wish to have your trademark removed from this document, please contact the copyright holder. No disrespect is meant by any use of other companies’ trademarks in this document.

Note: This document is not (yet) to be mirrored; copying for personal or company-wide use, or printing, is perfectly acceptable. Once the document is in a stable state, it will be released under the GNU Free Documentation License (http://www.gnu.org/copyleft/fdl.html), and mirroring will be allowed.

There are many mirrors of the old Squid User’s Guide out there, which will all now be effectively useless; don’t mirror this documentation at your site unless you are willing to keep it up to date!

This document will shortly be released under the GNU Free Documentation License.

Table of Contents

1. Overall Layout (for writers)
2. Terminology and Technology
   What Squid is
   Why Cache?
   What Squid is not
   Supported Protocols
   Supported Client Protocols
   Inter Cache and Management Protocols
   Inter-Cache Communication Protocols
   Firewall Terminology
   The Two Types of Firewall
   Firewalled Segments
   Hand Offs
3. Installing Squid
   Hardware Requirements
   Gathering statistics
   Hard Disks
   RAM requirements
   CPU Power
   Choosing an Operating System
   Experience
   Features
   Compilers
   Basic System Setup
   Default Squid directory structure
   User and Group IDs
   Getting Squid
   Getting the Squid source code
   Getting Binary Versions of Squid
   Compiling Squid
   Compilation Tools
   Unpacking the Source Archive
   Compilation options
   Running configure
   Compiling the Squid Source
   Installing the Squid binary
4. Squid Configuration Basics
   Version Control Systems
   The Configuration File
   Setting Squid’s HTTP Port
   Using Port 80
   Email for the Cache Administrator
   Effective User and Group ID
   FTP login information
   Access Control Lists and Access Control Operators
   Simple Access Control
   Ensuring Direct Access to Internal Machines
   Communicating with other proxy servers
   Your ISP’s cache
   Firewall Interactions
5. Starting Squid
   Before Running Squid
   Subdirectory Permissions
   Running Squid
   Testing Squid
   Testing a Cache or Proxy Server with Client
6. Browser Configuration
   Browsers
   Basic Configuration
   Advanced Configuration
   Basic Configuration
   Host name
   Browser-cache Interaction
   Testing the Cache
   Cache Auto-config
   Web server config changes for autoconfig files
   Autoconfig Script Coding
   Cache Array Routing Protocol
   cgi generated autoconfig files
   Future directions
   Roaming Browsers
   Transparency
   Ready to Go
7. Access Control and Access Control Operators
   Uses of ACLs
   Access Classes and Operators
   Acl lines
   A unique name
   Type
   Decision String
   Types of acl
   Acl-operator lines
   The other Acl-operators
   SNMP Configuration
   Querying the Squid SNMP server on port 3401
   Running multiple SNMP servers on a cache machine
   Delay Classes
   Slowing down access to specific URLs
   The Second Pool Class
   The Third Pool Class
   Using Delay Pools in Real Life
   Conclusion
8. Cache Hierarchies
   Introduction
   Why Peer
   Peer Configuration
   The cache_peer Option
   Peer Selection
   Selecting by Destination Domain
   Selecting with Acls
   Other Peering Options
   Multicast Cache Communication
   Getting your machine ready for Multicast
   Querying a Multicast Cache
   Accepting Multicast Queries: The mcast_groups option
   Other Multicast Cache Options
   Cache Digests
   Cache Hierarchy Structures
   Two Peering Caches
   Trees
   Meshes
   Load Balancing Servers
   The Cache Array Routing Protocol (CARP)
9. Accelerator Mode
   When to use Accelerator Mode
   Acceleration of a slow server
   Replacing a combination cache/web server with Squid
   Transparent Caching
   Security
   Accelerator Configuration Options
   The httpd_accel_host option
   The httpd_accel_port option
   The httpd_accel_with_proxy option
   The httpd_accel_uses_host_header option
   Related Configuration Options
   The redirect_rewrites_host_header option
   Refresh patterns
   Access Control
   Example Configurations
   Replacing a Combination Web/Cache server
   Accelerating Requests to a Slow Server
10. Transparent Caching
   The Problem with Transparency
   The Transparent Caching Process
   Some Routing Basics
   Packet Flow with Transparent Caches
   Network Layout
   Filtering Traffic
   Unix machines
   Routers (not done)
   Layer-Four Switches (not done)
   Kernel Redirection (not done)
   Squid Settings (not done)
11. Not Yet Done: Squid Config files and options


2 Chapter 1. Overall Layout (for writers)

I) Installation

When installing Squid, the first step is to get it up and running on a test machine. This allows the user to get familiar with Squid’s basic setup and feel that they are progressing towards something tangible (rather than slogging through the whole book before actually getting Squid running). Only the very basics of the config file are going to be covered.

Chapter 1) Introduction to Squid terminology and technology
1.1) What Squid is
1.1.1) Why cache
1.2) What Squid is not
1.3) Supported Internet Server protocols
1.4) Inter-Cache communication
1.4.1) Hierarchy terminology
1.4.2) Inter-Cache protocols
1.5) Operating Systems

Chapter 2)
2.1) Advanced Planning
2.1.1) Hardware requirements
2.2) Operating System
2.2.1) Use the OS that you have experience in
2.2.2) All examples will be generic
2.2.3) You need a compiler
2.3) System setup
2.3.1) The default Squid directory structure
2.3.2) Creation of the squid user and group (includes permissions etc)
2.4) Working with precompiled binaries
2.4.1) precompiled binaries
2.4.2) Trusted sources of binaries
2.5) Source compilation
2.5.1) Recommended compilation tools
2.5.2) Compilation configuration options
2.5.3) compilation: make all; make install

Chapter 3) Introduction to the configuration file: Only the very basics of the config file are covered. This allows people to get Squid running as soon as they can.


3.1) Note on RCS
3.2) The configuration file
3.2.1) HTTP port
3.2.2) Communicating with other proxy servers
3.2.2.1) Basic cache hierarchy terminology
3.2.2.2) Proxy-level firewall
3.2.2.3) Packet-filter firewall
3.2.2.4) Source/Destination IP and Port pairs
3.2.3) Cache Store location
3.2.3.1) Disk space allocation (? move to chapter 1 ?)
3.2.4) FTP login information
3.2.5) acl, http_access
3.2.5.1) Create a basic acl that denies everything but one address range
3.2.5.2) Intranet access with parents
3.2.6) cache_mgr
3.2.7) cache_effective_group

Chapter 4) Starting and Running Squid (15 pages)
4.1) Running Squid for the first time
4.1.1) Permissions - on each ~squid/* directory
4.1.2) Creating cache directories
4.1.2.1) Problems creating Swap Directories - problems: not root; squid user id doesn’t exist; squid user doesn’t have write to cache dir; squid user doesn’t have read/exec to a directory up the tree

4.2) Running Squid
4.2.1) What is expected in cache.log
4.3) Testing the cache with the included client
4.3.1) Checking if Internet works
4.3.2) Checking if intranet works (if configured with a parent)
4.3.3) Checking access.log for hits vs misses - include basic fields
4.4) Addition to startup files (? check NT ?)

Chapter 5) Client configuration (24 pages) - include some screen shots of the configuration menus
5.1) Basic client configuration
5.1.1) Netscape
5.1.2) Internet Explorer
5.1.3) Unix environment variables (important for both lynx and wget - for prefetching pages)


5.2) Client cache-specific modifications
5.3) Testing client access
5.4) Setting clients to use LOCAL caches
5.4.1) CARP
5.4.2) Autoconfigs
5.4.3) Future directions
5.4.3.1) DNS destination selection based on
5.4.3.2) Roaming ability will help
5.4.3.3) Transparency (see 11.1)

II) Integration

By this point Squid should be installed with a minimum working environment.

This section covers changing cache setup to suit the local network configuration.

This section covers Access Control, Refresh patterns and Cache-peer relationships. These are the painful sections of the setup.

This section also goes through the options in the config file that haven’t been covered. This is essentially a ’reference guide’ to the config options.

Chapter 6) ACLs: (38 pages) Each of these includes a short example that shows how they work. At the end of the Chapter there is a nice long complex ACL list that should suit almost everyone.

6.1) Introduction to ACLs
6.1.1) ACL lines vs Operator lines
6.1.2) How decisions work
6.2) Data specification
6.2.1) regular expressions
6.2.2) IP address range specifications
6.2.3) AS numbers
6.2.4) putting the data in files
6.3) Types of acl lines - works through all the acl types (src, srcdomain, dst, dstdomain etc); must include info on "no_cache", specifically for 3.2.5.2
6.4) Delay classes
6.5) SNMP configuration
6.6) The default acl set - include info on why the SSL setup is the way that it is, and information on the echo/chargen ports

Chapter 7) Hierarchies (42 pages)
7.1) Inter-cache communication protocols - how each one is suited to specific circumstances; compatibility notes (with other programs) are included
7.1.1) ICP
7.1.2) Digests
7.1.3) HTCP
7.1.4) CARP
7.2) Various types of hierarchy structures are covered:
7.2.1) The Tree structure
7.2.2) Load balancing peer system
7.2.3) True distributed system
7.3) Configuring Squid to talk to other caches
7.3.1) The cache_peer config option - all options are covered with examples
7.3.2) cache_peer_domain config option
7.3.3) miss_access acl line

Chapter 8) Accelerator mode (11 pages) (? I haven’t used accelerator mode - I am using Miguel A.L. Paraz’s page in the Squid Documentation as a guide ?)
8.1) Intro - why use this mode
8.1.1) performance
8.1.2) security
8.2) Types of accelerator mode
8.2.1) Virtual mode (note on security problems)
8.2.2) Host header
8.3) Options
8.3.1) http_port
8.3.2) httpd_accel_host
8.3.3) httpd_accel_port
8.3.4) httpd_accel_with_proxy
8.3.5) httpd_accel_uses_host_header
8.4) Reverse caching using accelerator mode on the return path of an International link - see Transparency

Chapter 9) Transparency (24 pages)
9.1) TCP basics
9.2) Operating System function
9.3) Squid ’accept’ destination sensing
9.4) Special ACLs to stop loops
9.5) FTP transparency problems
9.6) Routing the actual TCP packets to Squid
9.7) Changing hierarchies to work with transparency

Chapter 10) The config file and Squid options (48 pages)

The options list doesn’t really belong in section (I). I am, instead, going to cover it here. Also covered are the options to ’client’.


This covers ALL the tags in the config file. Where the tag has been covered already it refers people to that section of the book.

Arranged in alphabetical order.

III) Maintenance and Site-Specific Optimization

Covers the further development of your cache setup. This covers both maintenance and specialized setups (like transparent caches).

Chapter 11) Refresh Patterns (24 pages)
11.1) distribution of file types (gifs vs jpg vs html)
11.2) distribution of protocols
11.3) Server-Sent Header fields
11.3.1) Work through the types of headers
11.3.2) meta-tags
11.4) Client-Sent Header fields
11.4.1) If-Modified-Since Requests
11.4.2) Refresh button
11.5) refresh_pattern tag - first match selection; describes order of checking each of the fields

Chapter 12) Cache analysis (24 pages) - this section covers the advantages and disadvantages of the various types of cache performance/savings analysis systems
12.1) access.log fields
12.2) Simple Network Management Protocol (SNMP) - configuring, access control, multiple servers, multiple agent configurations, understanding results. Shew!
12.3) Cache-specific analysis using a squid analysis program
12.4) The cachemgr.cgi script - using the output (eg LRU values) for deciding when to buy more disk space etc
12.5) Using a cache-query-tool
12.6) Using your results - graphing response times over the months, for example

Chapter 13) Standby procedures (15 pages)
13.1) Hardware failure
13.1.1) Standby machines
13.1.2) DNS modification
13.1.3) Automatic configuration
13.2) Software failure
13.2.1) We need lots of info on ’vmstat’, ’iostat’, ’strace -T’ (and other stuff like that) here. cachemgr. Slowness: queued DNS queries; DNS response times; queued username/password authentication requests; page faults: vmstat
13.2.2) Consistent crashing - filehandles; memory; all dnsservers busy; slow! - latency of local request, comparing with "client" through cache and without it

Chapter 14) Future directions (18 pages)
14.1) Wide ranging use of Skycache
14.2) Wide ranging use of transparency
14.3) Very heavily used parents - for example at Exchange Points
14.4) Compression between server and client - like the Berkeley thing...


3 Chapter 2. Terminology and Technology

Table of Contents: What Squid is, What Squid is not, Supported Protocols, Inter-Cache Communication Protocols, Firewall Terminology

4 What Squid is

Squid is a free, high-speed, Internet proxy-caching program. So, what is a "proxy cache"?

According to Project Gutenberg’s online version of Webster’s Unabridged Dictionary:

Proxy. An agent that has authority to act for another.

Cache. A hiding place for concealing and preserving provisions which it is inconvenient to carry.

Squid acts as an agent, accepting requests from clients (such as browsers) and passing them on to the appropriate Internet server. It stores a copy of the returned data in an on-disk cache. The real benefit of Squid emerges when the same data is requested multiple times, since a copy of the on-disk data is returned to the client, speeding up Internet access and saving bandwidth. Small amounts of disk space can have a significant impact on bandwidth usage and browsing speed. (?costs?)

Internet Firewalls (which are used to protect company networks) often have a proxy component. What makes the Squid proxy different from a firewall proxy? Most firewall proxies do not store copies of the returned data; instead, they re-fetch requested data from the remote Internet server each time.

Squid differs from firewall proxies in other ways too:

Many protocols are supported (firewalls often have specific proxies for specific protocols: it’s difficult to ensure code security of a large program)

Hierarchies of proxies, arranged in complex relationships, are possible

When, in this book, we refer to a ’cache’, we are referring to a ’caching proxy’ - something that keeps copies of returned data. A ’proxy’, on the other hand, is a program which does not cache replies.

The web consists of HTML pages, graphics and sound files (to name but a few!). Since only a very small portion of the web is made up of text, referring to all cached data as pages is misleading. To avoid ambiguity, caches store objects, not pages.

(? trash Many Internet servers support more than one protocol. A given server can support more than one type of query protocol. A web server uses the Hyper Text Transfer Protocol (HTTP) to serve data. An older protocol, the File Transfer Protocol (FTP), often runs on web servers too. Muddling them up would be bad. Caching an FTP response and returning the same data to the client on a subsequent HTTP request would be incorrect. ?) Squid uses the complete URL to uniquely identify everything stored in the cache.

So as to avoid returning out-of-date data to clients, objects must be expired. Squid therefore allows you to set refresh times for objects, ensuring old data is not returned to clients.
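Refresh times are covered properly later in the book; as an illustration only, they are set with refresh_pattern lines in the Squid config file. The regex and values below are invented examples, not recommendations:

```
# Illustrative sketch. Format: refresh_pattern regex min percent max
# (min and max are in minutes). Objects fetched from FTP URLs are
# treated as fresh for at least a day (1440 min) and at most a week
# (10080 min); in between, an object stays fresh until 20% of its
# age has passed since it was fetched.
refresh_pattern ^ftp:  1440  20%  10080
```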

Squid is based on software developed for the Harvest project, which developed their ’cached’ (pronounced ’Cache-Dee’) as a side project. Squid development is funded by the National Laboratory for Applied Network Research (NLANR), who are in turn funded by the National Science Foundation (NSF). Squid is ’open source’ software, and although development is done mainly with NSF funding (??), features are added and bugs fixed by a team of online collaborators.

4.1 Why Cache?

4.1.1 In the USA

Small Internet Service Providers (ISPs) cache to reduce their line costs, since a large portion of their operating costs are infrastructural, rather than staff related.

Companies and content providers (such as AOL) have recently started caching. These organizations are not short of bandwidth (indeed, they often have as much bandwidth as a small country), but their customers occasionally see slow response. There are numerous reasons for this:

4.1.1.1 Origin Server Load

Raw bandwidth is increasing faster than overall computer performance. These days many servers act as a back-end for one site, load balancing incoming requests. Where this is not done, the result is slow response. If you have ever received a call complaining about slow response, you will know the benefit of caching - in many cases the user’s mind is already made up: it’s your fault.

4.1.1.2 Quick Abort

Squid can be configured to continue fetching objects (within certain size limits) even when somebody who starts a download aborts it. Since there is a chance of more than one person wanting the same file, it is useful to have a copy of the object in your cache, even if the first user aborts. Where you have plenty of bandwidth, this continued fetching ensures that you will have a local copy of the object available, just in case someone else wants it. This can dramatically reduce latency, at the cost of higher bandwidth usage.
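In the Squid config file this behaviour is controlled by the quick_abort tunables. The values below are a hedged sketch only; tune them to your own bandwidth situation:

```
# Example values only, not recommendations.
# If less than 16 KB of the object remains, always finish the fetch:
quick_abort_min 16
# If more than 16 MB (16384 KB) remains, always abort with the client:
quick_abort_max 16384
# Otherwise, finish the fetch only if 90% or more has already arrived:
quick_abort_pct 90
```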

4.1.1.3 Peer Congestion

As bandwidth increases, router speed needs to increase at the same rate. Many peering points (where huge volumes of Internet traffic are exchanged) often do not have the router horsepower to support their ever-increasing load. You may invest vast sums of money to maintain a network that stays ahead of the growth curve, only to have all your effort wasted the moment packets move off your network onto a large peering point, or onto another service provider’s network.


4.1.1.4 Traffic spikes

Large sporting, television and political events can cause spikes in Internet traffic. Events like The Olympics, the Soccer World Cup, and the Starr report on the Clinton-Lewinsky issue create large traffic spikes.

You can plan ahead for sports events, but it’s difficult to estimate the load that they will eventually cause. If you are a local ISP, and a local team reaches the finals, you are likely to get a huge peak in traffic. Companies can also be affected by traffic spikes, with bulk transfers of large databases or presentations flooding lines at random intervals. Though caching cannot completely solve this problem, it can reduce the impact.

4.1.1.5 Unreachable sites

If Squid attempts to connect to an origin server, only to find that it is down, it will log an error and return the object from disk (even though there is a chance of sending out-of-date data to the client). This reduces the impact of a large-scale Internet outage, and can help when a backhoe digs up a major segment of your network backbone.

4.1.2 Outside of the USA

Outside of the USA, bandwidth is expensive and latency due to very long haul links is high.

4.1.2.1 Costs

Outside of the USA and Canada, bandwidth is expensive. Saving bandwidth reduces Internet infrastructural costs significantly. Since Internet connectivity is so expensive, ISPs and their customers reduce their bandwidth requirements with caches.

4.1.2.2 Latency

Although reduction of latency is not normally the major reason for introducing caching in these countries, the problems experienced in the USA are exacerbated by the high latency and lower speed of the lines to the USA.


5 What Squid is not

Squid acts only as an HTTP proxy. It is not a general purpose proxy (so it cannot normally take the place of a firewall proxy).

Squid is based on the HTTP/1.1 specification. Squid can only proxy for programs that use this protocol for Internet access. Browsers, for example, use this specification; their primary function is the display of retrieved web data, using the HTTP protocol.

FTP clients, on the other hand, often support proxy servers, but do not communicate with them using the HTTP protocol. These clients will not be able to understand the replies that Squid sends.

Squid is also not a generic proxy, and only handles a small subset of all possible Internet protocols: Quake, News, RealAudio and Video Conferencing will not work through Squid. Squid only supports UDP for inter-cache communication - it does NOT support any client program that uses UDP for its communication, which excludes many multimedia programs.


6 Supported Protocols

6.1 Supported Client Protocols

Squid supports the following incoming protocol request types (when the proxy requests are sent in HTTP format):

HyperText Transfer Protocol (HTTP), which is the specification that the WWW is based on.

File Transfer Protocol (FTP)

Gopher

Wide Area Information (?Server?) (WAIS) (With the appropriate relay server.)

Secure Socket Layer - which is used for secure online transactions.

6.2 Inter Cache and Management Protocols (? oh bugger ?)

HTTP, which is used for retrieving copies of objects from other caches.

Internet Cache Protocol (ICP). ICP is used to find out if a specific object is in another cache’s store.

Cache Digests. This protocol is used to retrieve an index of objects in another cache’s store. When a cache receives a request for an object it does not have, it checks this index to determine which cache does have the object.

Simple Network Management Protocol (SNMP). Common SNMP tools can be used to retrieve information about your cache.

Hyper Text Caching Protocol (HTCP). Though HTCP is not widely implemented, Squid is in the process of incorporating the protocol. (?check...1.2.3?)


7 Inter-Cache Communication Protocols

Squid gives you the ability to share data between caches, but why should you?

Just as there is a benefit to connecting individual PCs to a network, and this network to the Internet, so there is an advantage to linking your cache to other people’s networks of caches.

User Base. The following is not a complete discussion of how the size of your user base will influence your hit rate; Chapter 2 discusses this topic in more depth. In short: the larger your user base, the more objects requested, and the higher the chance of an object being requested twice. To increase your hit rate, add more clients. However, in many cases the size of your user base is finite - it’s limited by the number of staff members or customers. Co-operative peering with other caches increases the size of your user base, and effectively increases your hit rate. If you peer with a large cache, you will find that a percentage of the objects your users are requesting are already available there. Many people can increase their hit rate by about 5% by peering with other caches.

Reduced Load. If you have a large network, one cache may not handle all incoming requests. Rather than having to continuously upgrade one machine, it makes sense to split the load between multiple servers. This reduces individual server load, while increasing the overall number of queries your cache system can handle. Squid implements Inter-Cache protocols in a very efficient manner, through ICP Multicast queries and Cache Digests, which allow for large networks of caches (hierarchies). With these features, large networks of caches add very little latency, allowing you to scale your cache infrastructure as you grow.

Disk Space. If you load balance between multiple caches, it is best to avoid duplication of data. Duplicated objects reduce the number of objects in the overall store, which reduces your chances of a hit. Using the Cache Array Routing Protocol (CARP) or other Inter-Cache communication protocols reduces duplication. For your cache system to be efficient and fast, not only is raw bandwidth an issue - choosing the right hardware and software is a difficult task.


8 Firewall Terminology

Firewalls are used by many companies to protect their networks. Squid is going to have to interact with your firewall to be useful. So that we are on the same wavelength, I cover some of the terminology here: it makes later descriptions easier if we get all the terms sorted out first.

8.1 The Two Types of Firewall

A proxy-based firewall does not route packets through to the end-user. All incoming data is handled by the firewall’s IP stack, with programs (called proxies) that handle all the passing of data.

Proxies accept incoming data from the outside, process it (to make sure that it is in a form that is unlikely to break an internal server) and pass it on to machines on the inside. The software running on the proxy-level firewall is (hopefully!) written in a secure manner, so to crack through a proxy-level firewall you would probably have to find a hole in the firewall software itself, rather than in the software on an inside machine.

A packet-filtering firewall works on a per-packet basis, deciding whether to pass every packet to and from your inside machines on the basis of the packet’s IP protocol, and source and destination IP/port pairs. If a publicly available service on an internal server is running buggy software, a cracker can probably penetrate your network.

It’s not up to me to say what sort of firewall is the best: it depends too much on your situation anyway. Packet filtering firewalls are normally the easiest to get Squid working through. The ’transparent redirection’ of proxy firewalls, on the other hand, can be useful: it can make redirection of client machines to the cache server easy.

8.2 Firewalled Segments

If you have a firewall, your network is generally segmented in two: hosts are either trusted or untrusted. Trusted hosts are on the inside of the firewall. Untrusted hosts (basically the rest of the Internet) sit on the outside of the firewall.

Occasionally, you may have a third class of hosts: semi-trusted. These hosts normally sit on a separate segment of the network, separate from your internal systems. They are not trusted by your secure hosts, but are still protected by the firewall. Public servers (such as web servers and ftp servers) are generally put here. This zone (or segment) is generally called a Demilitarized Zone (or DMZ).

8.3 Hand Offs

With a proxy-level firewall, client machines are normally configured to use the firewall’s internal interface as a proxy server. Some firewalls can do automatic redirection of outgoing web requests, but may not be able to do authentication in this mode (?or is this heresy?). If you already have a large number of client machines set up to talk to the firewall as a proxy, the prospect of having to change all their setups can influence your decision on where to position the cache server. In many cases it’s easier to re-configure the firewall to communicate with a parent than to change the proxy server settings on all client machines.

The vast majority of proxy-level firewalls are able to talk to another proxy server using HTTP. This feature is sometimes called a hand-off, and it is this which allows your firewall to talk to a higher-level firewall or cache server via HTTP. Hand-offs allow you to have a stack of firewalls, with higher-up firewalls protecting your entire company from outside attacks, and with lower-down firewalls protecting your different divisions from one another. When a firewall hands off a request to another firewall or proxy server, it simply acts as a pipe between the client and the remote firewall.

The term hand-off is a little misleading, since it implies that the lower-down firewall is somehow less involved in the data transfer. In reality the proxy process handling such a request is just as involved as when conversing directly with a destination server, since it is channeling data between the client and the firewall the connection was handed to. The lower-down firewall is, in fact, treating the higher-up cache as a parent.


9 Chapter 3. Installing Squid

Table of Contents: Hardware Requirements, Choosing an Operating System, Basic System Setup, Getting Squid, Compiling Squid

10 Hardware Requirements

Caching stresses certain hardware subsystems more than others. Although the key to good cache performance is good overall system performance, the following list is arranged in order of decreasing importance:

Disk random seek time

Amount of system memory

Sustained disk throughput

CPU power

Do not drastically underpower any one subsystem, or performance will suffer. In the case of catastrophic hardware failure you must have a ready supply of alternate parts. When your cache is critical, you should have a (working!) standby machine with operating system and Squid installed. This can be kept ready for nearly instantaneous swap-out. This will, of course, increase your costs, something that you may want to take into account. Chapter 13 covers standby procedures in detail.

10.1 Gathering statistics

When deciding on your cache’s horsepower, many factors must be taken into account. To decide on your machine, you need an idea of the load that it will need to sustain: the peak number of requests per minute. This number indicates the number of ’objects’ downloaded in a minute by clients, and can be used to get an idea of your cache load.

Computing the peak number of requests is difficult, since it depends on the browsing habits of users. This, in turn, makes deciding on the required hardware difficult. If you don’t have many statistics as to your Internet usage, it is probably worth your while installing a test cache server (on any machine that you have handy) and pointing some of your staff at it. Using ratios you can estimate the number of requests with a larger user base.
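The ratio calculation can be sketched as below. Every figure is hypothetical, and a linear scale-up is a rough assumption: browsing habits rarely scale perfectly, so treat the result as a ballpark, not a measurement.

```python
# Scale the peak request rate seen on a small test cache up to the
# full user base, assuming load grows linearly with user count.

def estimate_peak(test_peak_per_min: float, test_users: int,
                  total_users: int) -> float:
    """Linearly scale the test group's peak requests/minute."""
    return test_peak_per_min * (total_users / test_users)

# Hypothetical figures: 20 staff generated a peak of 150 requests
# per minute, and the cache must eventually serve 400 users.
peak = estimate_peak(150, test_users=20, total_users=400)
print(f"Estimated peak load: {peak:.0f} requests/minute")
```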


When gathering statistics, make sure that you judge the ’peak’ number of requests, rather than an average value. You shouldn’t take the number of requests per day and divide, since your peak (during, for example, lunch hour) can be many times your average number of requests.

It’s a very good idea to over-estimate hardware requirements. Stay ahead of the growth curve too, since an overloaded cache can spiral out of control due to a transient network problem. If a cache cannot deal with incoming requests for some reason (say a DNS outage), it still continues to accept incoming requests, in the hope that it can deal with them. If no requests can be handled, the number of concurrent connections will increase at the rate that new requests arrive.

If your cache runs close to capacity, a temporary glitch can increase the number of concurrent, waiting requests tremendously. If your cache can’t cope with this number of established connections, it may never be able to recover, with current connections never being cleared while it tries to deal with a huge backlog.

Squid 2.0 may be configured to use threads to perform asynchronous Input/Output on operating systems that support Posix threads. Including async-IO can dramatically reduce your cache latency, allowing you to use a less powerful machine. Unfortunately not all systems support Posix threads correctly, so your choice of hardware can depend on the abilities of your operating system. Your choice of operating system is discussed in the next section - see if your system will support threads there.

10.2 Hard Disks

There are numerous things to consider when buying disks. Earlier on we mentioned the importance of disks with a fast random-seek time, and with high sustained throughput. Having the world’s fastest drive is not useful, though, if it holds a tiny amount of data. To cache effectively you need disks that can hold a significant amount of downloaded data, but that are fast enough not to slow your cache to a crawl.

Seek time is one of the most important considerations if your cache is going to be loaded. If you have a look at a disk’s documentation there is normally a random seek time figure. The smaller this value the better: it is the average time that the disk’s heads take to move from one random track to another (in milliseconds). Operating systems do all sorts of interesting things (which are not covered here) to attempt to speed up disk access times: waiting for disks can slow a machine down dramatically. These operating system features make it difficult to estimate how many requests per second your cache can handle before being slowed by disk access times (rather than by network speed). In the next few paragraphs we ignore operating system readahead, inode update seeks and more: it’s a back-of-the-envelope approximation for your use.

If your cache does not use asynchronous Input-Output (described in the Operating system section shortly) then your cache loses a lot of the advantage gained by multiple disks. If your cache is going to be loaded (or is running anywhere approaching capacity according to the formulae below) you must ensure that your operating system supports Posix threads!

A cache with one disk has to seek at least once per request (ignoring RAM caching of the disk and inode update times). If you have only one disk, the formula for working out seeks per second (and hence requests per second) is quite simple:


requests per second = 1000/seek time

Squid load-balances writes between multiple cache disks, so if you have more than one data disk your seeks-per-second per disk will be lower. Almost all operating systems will increase random-seek capacity in a semi-linear fashion as you add more disks, though some may have a small performance penalty. If you add more disks to the equation, the requests per second value becomes even more approximate! To simplify things in the meantime, we are going to assume that you use only disks with the same seek time. Our formula thus becomes:

                                             1000
theoretical requests per second = ------------------------------
                                  (seek time)/(number of disks)

Let’s consider a less theoretical example: I have three disks - all have 12ms seek times. I can thus (theoretically, as always) handle:

requests per second = 1000/(12/3) = 1000/4 = 250 requests per second
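The same back-of-the-envelope sum can be done in shell arithmetic (a sketch; the 12ms seek time and three disks are the example values above):

```shell
# Theoretical peak requests/second across identical cache disks.
seek_ms=12    # average random seek time per disk, in milliseconds
disks=3       # number of cache data disks
# 1000 / (seek_ms / disks), rearranged for integer arithmetic:
echo "$(( 1000 * disks / seek_ms )) requests per second"   # prints: 250 requests per second
```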

While we are on this topic: many people query the use of IDE disks in caches. IDE disks these days generally have very similar seek times to SCSI disks, and (with DMA-capable IDE controllers) can transfer data at comparable speeds without slowing the whole machine down.

Deciding how much disk space to allocate to Squid is difficult. For the pilot project you can simply allocate a few megabytes, but this is unlikely to be useful on a production cache.

The amount of disk space required depends on quite a few factors.

Assume that you were to run a cache just for yourself. If you were to allocate 1 gig of disk, and you browse pages at a rate of 10 megabytes per day, it will take at least 100 days for you to fill the cache.

You can thus see that the rate of incoming cache queries influences the amount of disk to allocate.

If you examine the other end of the scale, where you have 10 megabytes of disk, and 10 incoming queries per second, you will realize that at this rate your disk space will not last very long. Objects are likely to be pushed out of the cache as they arrive, so getting a hit would require two people to be downloading the object at almost exactly the same time. Note that the latter is definitely not impossible, but it happens only occasionally on loaded caches.

The above certainly appears simple, but many people do not extrapolate. The same relationships govern the expulsion of objects from your cache at larger cache store sizes. When deciding on the amount of disk space to allocate, you should determine approximately how much data will pass through the cache each day. If you are unable to determine this, you could simply use the theoretical maximum transfer rate of your line as a basis. A 1mb/s line can transfer about 125000 bytes per second. If all clients were setup to access the cache, disk would be used at about 125k per second, which translates to about 450 megabytes per hour. If the bulk of your traffic is transferred during the day, you are probably transferring 3.6 gigabytes per day. If your line was 100% used, however, you would probably have upgraded it a while ago, so let’s assume you transfer 2 gigabytes per day. If you wanted to keep ALL data for a day, you would have to have 2 gigabytes of disk for Squid.
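The line-rate arithmetic above can be checked with a quick bit of shell (a sketch using the same example figures):

```shell
# Disk usage implied by a fully-used 1 megabit/s line (example above).
bytes_per_sec=125000                    # 1mb/s is about 125000 bytes/s
per_hour=$(( bytes_per_sec * 3600 ))    # bytes cached per hour
echo "$(( per_hour / 1000000 )) megabytes per hour"               # prints: 450 megabytes per hour
echo "$(( per_hour * 8 / 1000000000 )) gigabytes per 8-hour day"  # integer maths rounds 3.6 down to 3
```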

The feasibility of caching depends on two or more users visiting the same page while the object is still on disk. This is quite likely to happen with the large sites (search engines, and the default home pages in respective browsers), but the chances of a user visiting the same obscure page are slim, simply due to the volume of pages. In many cases the obscure pages are on the slowest links, frustrating users. Depending on the number of users requesting pages you should keep pages for longer, so that the chances of different users accessing the same page twice are higher. Determining this value, however, is difficult, since it also depends on the average object size, which, in turn, depends on user habits.

Some people use RAID systems on their caches. This can dramatically increase availability, but a RAID-5 system can reduce disk throughput significantly. If you are really concerned with uptime, you may find a RAID system useful. Since the actual data in the cache store is not vital, though, you may prefer to manually fail-over the cache, simply re-formatting or replacing drives. Sure, your cache may have a lower hit-ratio for a short while, but you can easily balance this minute cost against what automatic-failover hardware would have cost you.

You should probably base your purchase on the bandwidth description above, and use the data discussed in chapter 11 to decide when to add more disk.

10.3 RAM requirements

Squid keeps an in-memory table of the objects in its cache store. Because of the way that Squid checks if objects are in the file store, fast access to the table is very important. Squid slows down dramatically when parts of the table are in swap.

Since Squid is one large process, swapping is particularly bad. If the operating system has to swap data, Squid is placed on the ’sleeping tasks’ queue, and cannot service other established connections. (? hmm. it will actually get woken up straight away. I wonder if this is relevant ?)

Each object stored on disk uses about 75 bytes (? get exact value ?) of RAM in the index. The average size of an object on the Internet is about 13kb, so if you have a gigabyte of disk space you will probably store about 80 000 objects.

At 75 bytes of RAM per object, 80 000 objects require about six megabytes of RAM. If you have 8 gigs of disk you will need 48Mb of RAM just for the object index. It is important to note that this excludes memory for your operating system, the Squid binary, memory for in-transit objects and spare RAM for disk cache.
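That arithmetic is easy to reproduce; here is a sketch in shell, using the approximate 75-bytes-per-object and 13kb figures from the text:

```shell
# Rough object-index RAM for a given cache store size.
disk_kb=$(( 1000 * 1000 ))       # 1 gigabyte of cache disk, in kilobytes
objects=$(( disk_kb / 13 ))      # ~13kb average object size
index_bytes=$(( objects * 75 ))  # ~75 bytes of index RAM per object
echo "$objects objects, roughly $(( index_bytes / 1000000 )) megabytes of index RAM"
```

Rounding aside, this matches the text’s estimate of roughly 80 000 objects and about six megabytes of index RAM per gigabyte of disk.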

So, what should the sustained throughput of your disks be? Squid tends to read in small blocks, so throughput is of lesser importance than random seek times. Generally disks with fast seeks have high throughput, and most disks (even IDE disks these days) can transfer data faster than clients can download it from you. Don’t blow a year’s budget on really high-speed disks; go for lower seek times instead - or add more disks.

10.4 CPU Power

Squid is not generally CPU intensive. On startup Squid can use a lot of CPU while it works out what is in the cache, and a slow CPU can slow down access to the cache for the first few minutes upon startup. A Pentium 133 machine generally runs pretty idle while receiving 7 TCP requests a second. A multiprocessor machine generally doesn’t increase speed dramatically: only certain portions of the Squid code are threaded. These sections of code are not processor intensive either: they are the code paths where Squid is waiting for the operating system to complete something. A multiprocessor machine generally does not reduce these wait times: more memory (for caching of data) and more disks may help more.


11 Choosing an Operating System

Where I work, we run many varieties of Unix. When I first installed Squid it was on my desktop Linux machine - if I break it by mistake it’s not going to cause users hassles, so I am free to do on it what I wish.

Once I had tested Squid, we decided to allow general access to the cache. I installed Squid on the fastest unused machine we had available at the time: a (then, at least) top of the range Pentium 133 with 128Mb of RAM running FreeBSD.

I was much more familiar with Linux at that stage, and eventually installed Linux on the public cache machine. Though running Linux caused some inconveniences (specifically with low per-process filehandle limits), it was the right choice, simply because I could maintain the machine better. Many times my experience with Linux has gotten me out of potentially sticky situations.

If your choice of operating system saves you time, and runs Squid, use it! Just as I didn’t use Digital Unix (Squid is developed on funded Digital Unix machines at NLANR), you don’t need to use Linux just because I do.

Most modern operating systems sport both similar performance and similar feature sets. If your system is commonly used and roughly Posix compliant at the source level, it will almost certainly be supported by Squid.

When was the last time you had an outage due to hardware failure? Unless you are particularly unlucky, hardware failures are few and far between. While the quality of hardware has increased dramatically, software often does not keep pace. Many outages are caused by faulty application or operating system software. You must thus be able to pick up the pieces if your operating system crashes for some reason.

11.1 Experience

If you normally work on a specific operating system, you should probably not use your cache as a system to experiment with a new ’flavor’ of Unix. If you have more experience in one operating system, you should use that system as the basis for your cache server. Customers rapidly turn off caching when a cache stops accepting requests (while you learn your way around some ’feature’).

Your cache system will almost certainly form a core part of your network as soon as it is stable. You must be able to return the system to working order in minimal time in the event of a system failure, and this is where your existing experience becomes crucial. If the failure happens out of business hours you may not be able to get technical support from your vendor. A dialup ISP’s hours of business differ dramatically from those of operating system vendors.


11.2 Features

Though most operating systems support similar features, there are often no standards for the functions required by some of the less commonly used operating system features. One example is transparency: many operating systems can now support transparent redirection to a local program, but almost all of them function in a different way, since there is no real standard for the way an operating system is supposed to function in this scenario.

If you are unable to find information about Squid on your operating system, you may want to organize a trial hardware installation (assuming that you are using a commercial operating system) as a test. Only when you have the system running can you be sure that your operating system supports the required features.

Squid works on the following systems: (? List ?)

If you are using Squid without extensions like transparency and ARP access control lists, you should not have problems. For your convenience a table of operating system support for specific features is included. Since Squid is constantly being developed, it’s likely that this list will change.

11.3 Compilers

Squid is written on Digital Unix (? version ?) machines running the GNU C compiler (GCC). GCC is included with free operating systems such as Linux and FreeBSD, and is easily available for many other operating systems and hardware platforms. The GNU compiler adheres as closely to the ANSI C standard as possible, so if a different compiler is included with your operating system, it may (or may not) have trouble interpreting Squid’s source code, depending on its level of ANSI compliance. In practice, most compilers work fine.

Some commercial compilers choose backward compatibility with older versions over ANSI compliance. These compilers generally support an option that turns on ’ANSI compliant mode’. If you have trouble compiling Squid you may have to turn this mode on. (? is this still valid? I remember things like this back in the Borland C days - though I seem to remember this on a Unix system too... ?) In the worst possible scenario you may have to compile GCC with your existing compiler and use GCC to compile Squid.

If you do not have a compiler, you may be able to find a precompiled version of GCC for your system on the Internet. Be very careful when installing software from untrusted sources. This is discussed shortly in the "precompiled binary" section.

If you cannot find versions of GCC for your platform, you may have to factor in the cost of the compiler when deciding on your operating system and hardware.


12 Basic System Setup

Before you even install the operating system, it’s best to get an idea as to how the system will look once Squid is up and running. This will allow you to partition the disks on the machine so that their mount path will match Squid’s default configuration.

12.1 Default Squid directory structure

Normally Squid’s directory tree looks like this:

/usr/local/squid/
        bin/
        cache/
        etc/
        src/
                squid-2.0/

Working through each directory below /usr/local/squid in the order presented above:

bin. The Squid binary and associated tools are stored in this directory. Some tools are included with the Squid source to help you manage and tune your cache server.

cache. Squid has to store cached data on disk somewhere. The path /usr/local/squid/cache is the default location. You can change the location of this directory by editing the Squid config file.

etc. Squid configuration files are stored in this directory. The most commonly changed file in here is squid.conf. We discuss the basic tags in that file in the next chapter.

src. Since you are likely to download the source code for Squid from the net, it is useful to compile the code where you can find it easily. I generally create a src directory and extract the code in there. This way I can revert to a previous version (without downloading it all over again). If you wish, you can easily keep Squid in your /usr/local/src directory, or delete it completely once you have installed the binaries.

Back to the cache directory: if you have more than one partition for the cached data, you can make subdirectories for each of the filesystems in the cache directory. Normally people name these directories cache1, cache2, cache3 and so forth. Your cache directories should be mounted somewhere like /usr/local/squid/cache/1/ and /usr/local/squid/cache/2/. If you have only one cache disk, you can simply name the directory /usr/local/squid/cache/.
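As a sketch, mounting two dedicated cache disks under the default cache directory might look like this in /etc/fstab (the device names and the ext2 filesystem type are hypothetical - substitute your own):

```
# /etc/fstab fragment (hypothetical devices)
/dev/sdb1   /usr/local/squid/cache/1   ext2   defaults   0  2
/dev/sdc1   /usr/local/squid/cache/2   ext2   defaults   0  2
```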

In Squid-1.1 cache directories had to be identical in size. This is no longer the case, so if you are upgrading to Squid 2.0 you may be able to resize your cache partitions. To do this, however, you may have to repartition disks and reformat.


When you upgrade to the latest version of Squid, it’s a good idea to keep the old working compiled source tree somewhere. If you upgrade to the latest Squid and encounter problems, simply kill Squid, change to the previous source directory and reinstall the old binaries. This is a lot faster than trying to remember which source tree you were running, downloading it, compiling it, applying local patches and then reinstalling.

12.2 User and Group IDs

Squid, like most daemon processes on Unix machines, normally runs as the user nobody and with the group nogroup.

For the maximum flexibility in allowing root and non-root users to manipulate the Squid configuration, you should make both a new user and two new groups, specifically for the Squid system, rather than using the nobody and nogroup IDs. Throughout this book we assume that you have done so, and that a group and a user have been created (both called squid), along with a second admin group, called squidadm. The squid user’s primary group should be squid, and the user’s home directory should be /usr/local/squid (the default squid software install destination).

When you have multiple administrators of a cache machine, it is useful to have a dedicated squidadm group, with sub-administrators added to this group. This way, you don’t have to change to the root user whenever you want to make changes to the Squid config. It’s possible for users in the squidadm group to gain root access, so you shouldn’t place people without root access in the squidadm group.

When the config file has been changed, a signal has to be sent to the Squid process to inform it that config files are to be re-read. Sending signals to running processes isn’t possible when the signal sender isn’t the same user-id as the receiver (or root). Other config file maintainers need permission to change their user-id (either by using the ’su’ command, or by logging in with another session) to either the root user or to the user Squid is running as.

In some environments cache software maintainers aren’t trusted with root access, and the user nobody isn’t allowed to log in. The best solution is to allow users that need to make changes to the config file access to a reload script using sudo. Sudo is available for many systems, and source code is available.
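As a sketch of that arrangement (the script name and path are our own invention, not part of Squid; squid -k reconfigure is the standard way to tell a running Squid to re-read its config):

```
#!/bin/sh
# /usr/local/squid/bin/reload-squid (hypothetical name): tell the running
# Squid to re-read its configuration files without a restart.
# Members of squidadm run this via sudo, given an /etc/sudoers entry
# (edit with visudo) along the lines of:
#   %squidadm ALL = (root) NOPASSWD: /usr/local/squid/bin/reload-squid
exec /usr/local/squid/bin/squid -k reconfigure
```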

In Chapter 4 we go through the process of changing the user-id that Squid runs as, so that files Squid creates are owned by the squid user-id, and by the group squid. Binaries are owned by root, and config files are changeable by the squidadm group.


13 Getting Squid

Now that your machine is ready for your Squid install, you need to download and install the Squid program. This can be done in two ways: you can download a source version and compile it, or you can download a precompiled binary version and install that, relying on someone else to do the compilation for you.

Binary versions of Squid are generally easier to install than source code versions, especially if your operating system vendor distributes a package which you can simply install.

Installing Squid from source code is recommended. This method allows you to turn on compile-time options that may not be included in distributed binary versions (one of many examples: SNMP support is not included at compile time unless it is specifically enabled, and most binary versions available do not include SNMP support). If your operating system has been optimized so that Squid can run better (let’s say you have increased the number of open filehandles per process), a precompiled binary will not take advantage of this tuning, since your compiler header files are probably different from the ones where the binaries were compiled.

It’s also a little worrying running binaries that other people distribute (unless, of course, they are officially supplied by your operating system vendor): what if they have placed a trojan into the binary version? To ensure the security of your system it is recommended that you compile from the official source tree.

Since we suggest installing from source code, we cover that first: if you have to download a Squid binary from somewhere, simply skip to the next sub-section: Getting a binary version of Squid.

13.1 Getting the Squid source code

Squid source is mirrored by numerous sites. For a list of mirrors, have a look at

Deciding which of the available files to download can become an issue, especially if you are not familiar with the Squid version naming convention. Squid is (as of this writing) in version 2. As features are added, the minor version number is incremented (Squid 2.0 becomes Squid 2.1, then Squid 2.2, and so on). Since new features may introduce new bugs, the first version including new features is distributed as a pre-release (or beta) version. The first pre-release of Squid 2.1 is called squid-2.1.PRE1-src.tar.gz. The second is squid-2.1.PRE2-src.tar.gz. Once Squid is considered stable, a general release version is distributed: the first release version is called squid-2.0.RELEASE-src.tar.gz, the second (which would include minor bugfixes) squid-2.0.RELEASE2-src.tar.gz.

In short, files are named as follows: squid-2.minor-version-number.stability-info.release-number.tar.gz. Unless you are a Squid developer, you should download the last available RELEASE version: you are less likely to encounter bugs this way.
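The stability field can even be picked out mechanically; as a sketch (the classify helper is our own, not a Squid tool):

```shell
# Classify a Squid tarball name by its stability field, following the
# naming convention described above. 'classify' is a hypothetical helper.
classify() {
    case "$1" in
        *.PRE*)     echo "pre-release (beta)" ;;
        *.RELEASE*) echo "general release" ;;
        *)          echo "unknown" ;;
    esac
}

classify squid-2.1.PRE2-src.tar.gz       # prints: pre-release (beta)
classify squid-2.0.RELEASE2-src.tar.gz   # prints: general release
```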


Squid source is normally available via FTP (the File Transfer Protocol), so you should be able to download Squid source by using the ftp program, available on almost every Unix system. If you are not familiar with ftp, you can simply select the mirror closest to you with your browser and save the Squid source to your disk by right-clicking on the filename and selecting save as (do not simply click on the filename - many browsers attempt to extract compressed files, printing the tar file to your browser window: this is definitely not what you want!). Once the download is complete, transfer the file to the cache machine.

13.2 Getting Binary Versions of Squid

Finding binary versions of Squid to install is easy: deciding which binary to trust is more difficult. If you do not choose carefully, someone could undermine your system security. If you cannot compile Squid, but know (and trust) someone that can do it for you, get them to help. It’s better than downloading a version contributed by someone that you don’t know.

The worst places to download precompiled packages from are sites that accept contributions from the public at large: avoid files in paths like incoming or uploads, since the source of the file is unknown.

Mailing lists are often good places to find compiled software (though people become irritated if you do not actually make a concerted effort to find a trusted version before bothering the list). Regular contributors to mailing lists have a reputation at stake, and are likely to provide binary versions of software that actually match the official source.

Binaries compiled by people the core Squid developers (www.ircache.net) know and trust are available at ftp://squid.nlanr.net/pub/contrib/binaries/. You may be able to find a Squid binary for your operating system here.

Files can be distributed in many different ways. Generally Squid is transformed into a package that can be installed with some package tool. There are many competing package managers, so there is no way of covering them all here.


14 Compiling Squid

Compiling Squid is quite easy: you need the right tools to do the job, though. First, let’s go through getting the tools; then you can extract the source code package, include optional Squid components (using the configure command) and then actually compile the distributed code into a binary format.

A word of warning, though: this is the stage where most people run into problems. If you haven’t compiled source before, try and follow the next section in order - it shouldn’t be too bad. If you don’t manage to get Squid running, at least you have gained experience.

14.1 Compilation Tools

All GNU utilities mentioned below are available via FTP from the official GNU ftp site or one of its mirrors. A list of mirrors is available at http://www.gnu.org/, or download them directly from ftp://ftp.gnu.org/.

The GNU compiler is only distributed as source (creating a chicken-and-egg problem if you do not have a compiler), so you may have to do an Internet search (using one of the standard search engines) to try and find a binary copy of the GNU compiler for your system. The Squid source is distributed in compressed form: first a standard tar file is created, and this file is then compressed with the GNU gzip program. To decompress this file you need a copy of gzip. GCC (the GNU C Compiler) is the recommended compiler: the developers wrote Squid with it, and it is available for almost all systems.

You will also need the make program, of which there is also a GNU version easily available.

If possible, install a C debugger: the GNU debugger (GDB) is available for most platforms. A debugger is not necessary for installation, but is very useful in the case of software bugs (as discussed in chapter 13).

14.2 Unpacking the Source Archive

Earlier we looked at the tree structure of the /usr/local/squid directory. I suggest extracting the Squid source to the /usr/local/squid/src directory. So, create the directory and copy the downloaded Squid tar.gz file into it.

First let’s decompress the file. Some versions of tar can decompress the file in one step, but for compatibility’s sake we are going to do it in two steps. Decompress the tar file by running gzip -dv squid-version.tar.gz. If all has gone well you should have a file called squid-version.tar in the current directory. To get the files out of the "tarball", run tar xvf squid-version.tar.

Tar automatically puts the files into a subdirectory: something like squid-2.1.PRE2. Change into the extracted directory, and we can start configuring the Squid source.


14.3 Compilation options

Squid features are enabled (or disabled) with the configure shell script. Some Squid features have to be specifically enabled when Squid is compiled, which can mean that you have to recompile at a later stage. There are two reasons that a feature can be disabled by default:

Operating system compatibility. Although Squid is written in as generic a way as possible, certain functions (such as async-io, transparency and ARP-based access control lists) are not available on all operating systems. When many operating systems cannot use a feature, it is included as a compile-time option.

Efficiency. On a very lightly loaded cache, async-io can actually slow down requests minutely. Some system administrators may wish to disable certain features to speed up their caches.

You may be wondering why there simply aren’t config file options for these less used features. For most of the features there really isn’t a reason other than (? minimalism ?). Why have code sitting in the executable that isn’t actually used? You can include the features that you might use at some time in the future without detrimental effects (other than a slightly larger binary), so as to avoid having to recompile the Squid source later on.

The configure program also has a second function: with some source code you have to edit a header file which tells the compiler which function calls to use on the system. This very often makes source compilation difficult. With Squid, however, the GNU configure script checks what programs, libraries and function calls are available on your system. This simplifies setup dramatically.

To make configure as generic as possible, it’s actually a Bourne Shell /bin/sh script. If you have replaced your /bin/sh shell with a less Posix-capable shell (like ash) you may not be able to run configure. If this is the case you will have to change the first line of the configure script to run the full shell.

All source inclusion options are set with the command ’./configure option’. On most systems root doesn’t have a ’.’ in their search path for security reasons, so you have to fully specify the path to the binary (hence the ’./’).

To turn more than one configuration option on at once you simply append each option to the end of the command line. You can, for example, change the prefix install directory and turn Async-IO on with a command like the following (more on what each of these options is for shortly):

./configure --prefix=/usr/people/staff/oskar/squid --enable-async-io

Note that only the commonly used configuration options are included here. To get a complete list of options you can run ’./configure --help’. Many of the resulting options are standard to the GNU configure script that Squid uses, and are used for things like cross compilation.

If you wish to find out about some of the more obscure options you may have to ask someone on one of the relevant mailing lists, or even read the source code!

14.3.1 Reducing output of configure

When you run configure you normally get a fairly verbose output as to what is being checked for. Most people don’t need all this information, so there is an option to stop configure printing the messages that aren’t important. To reduce the amount of printed output, use the --quiet option. This way you will only see error messages, not debug information.


The first time you run configure you should run it in verbose mode. The configure process can take a while on slower machines, so you should get an idea as to how long it will take to run. Should you need to submit a bug report, you should always include as much information as possible, and should include the full configure output.

14.3.2 Destination directory

Some system administrators would prefer to dispense with the /usr/local/squid directory described earlier. On some systems you may even be installing Squid on a machine where you do not have root access (and can thus not create the /usr/local/squid directory). In either of these cases you will need to change your destination path.

Throughout this book I assume that you have installed Squid in the default directory. Using the default destination will make it easier for you to follow the examples in this book.

Changing the destination directory is done with the --prefix configure option. Here are some examples where we use this option.

Installing Squid in your home directory:

./configure --prefix=/usr/people/staff/oskar/squid

If you are installing Squid on a dedicated cache machine you may wish to place all Squid-related files in the /usr/local directory. Config files (for example) will thus live in /usr/local/etc.

./configure --prefix=/usr/local/

14.3.3 Using the DL-Malloc Library

The memory allocation routines included with many operating systems aren't very good for the way that Squid allocates and uses memory. Squid uses the memory subsystem more intensively than most programs, since it's a single process which runs for an extended period of time and continuously allocates and frees small sections of memory. On some systems the Squid process size increases at a rapid rate. When it eventually consumes all the memory on the system, it crashes.

This option enables a different system memory allocator: DL-Malloc, by Doug Lea, which is known to be efficient for Squid's allocation patterns.

Squid will increase in size as objects are added to the disk cache, as discussed in the Hardware Requirements section. The index of objects in the disk cache takes up RAM, so make sure that you have sufficient memory in your system before deciding that the memory allocation system is at fault.

If a recently started copy of Squid uses substantially less memory than one that has been running for a few days (with the same size cache store), you might want to configure Squid to use DL-Malloc.

The included DL-Malloc memory allocation routines are not thread-safe, so you may not be able to use this option in conjunction with Async-IO. (? need to check details ?)

To use DL-Malloc, simply use the --enable-dl-malloc option:


./configure --enable-dl-malloc

14.3.4 Regular expression routines

Regular expressions allow you to do complex string matching, and are used for various things in the Squid config files (most notably in the rules that control how long objects stay in the cache).

On some systems you may wish to replace the default regular-expression routines with the GNU routines. This may be because the default operating system ones are incompatible with Squid or do not function correctly. If your system doesn't have regular expression libraries, Squid will automatically use the GNU library; the GNU regular expression routines are included in the default Squid source code tree, and don't have to be downloaded separately.

To enable use of the GNU libraries, simply use the --enable-gnuregex configure option.

14.3.5 Asynchronous IO

Squid 2.0 includes a major performance increase in the form of Async-IO.

It's important to remember that Squid is a single process. In many Internet daemons, more than one copy runs at a time, so if one process is blocked by a system call, it does not affect the other running copies.

Squid is only one process. If the main loop stops running for some reason, all connections are slowed. In all versions of Squid, the main loop uses the select and poll system calls to decide which connections to service. As Squid receives data from the server, it writes the data to disk and to the client.

To write data to disk, a file has to be opened on the cache drive. When lots of clients are opening and closing connections to a busy cache, the main loop has to make lots of calls to open and close network and disk filehandles (note that the word filehandle can refer to both a network connection and an on-disk file). These two functions block the flow of all data through the cache. While waiting for open to return, Squid cannot perform any other functions.

When you enable Async-IO, Squid 2.0 uses threads to open and close file descriptors. A thread is part of the main Squid program in most ways, except that if it makes use of a blocking system call (such as open), only the thread stops, not the main loop or other threads. Note that there is not one thread per connection.

Using threads to make blocking function calls reduces the latency that a cache adds to each request. (People sometimes worry about the latency that caches add, but if you have a fast enough cache the latency is not an issue - the client sees no noticeable overhead. Network overhead normally outweighs Squid overhead). Async-IO drastically reduces cache overhead when you have a loaded cache.

Unfortunately POSIX threads aren't available on all operating systems. This ties your hardware choice into your choice of operating system, since if your operating system does not support threads there may be no choice but to use a faster system, or even to split the load between multiple machines. (? need a table of machines that work ?)

You should probably try and run Squid with Async-IO enabled if you have a few thousand requests per hour. Some systems only support threads properly with a fair amount of initial setup. If your load is low and Async-IO doesn't work straight away you can leave Squid in the default configuration.


Use the --enable-async-io configure option to include the async-io code into Squid.

14.3.6 User Agent logging

Most modern browsers include a header with each outgoing request that includes some basic information about the user's browser and operating system. This header is called a 'user-agent' header, since it describes the agent program (the browser) used. An automated agent includes a different user-agent header, so logging user-agent headers allows you to see if someone is using an automated web fetcher program (commonly referred to as a spider) to fetch pages on their behalf. It can also be used to find statistics as to the most commonly used browsers. The captured information is written to a log file specified in the configuration file. To include the code responsible for logging this information into the Squid binary, use the --enable-useragent-log option to configure.

14.3.7 Simple Network Management Protocol (SNMP)

Enabling the Simple Network Management Protocol (SNMP) allows you to query your cache machine with one of the many SNMP tools available. If you have an existing SNMP monitoring system, you should be able to use your existing software to monitor Squid performance and retrieve usage information. This is discussed in detail in Chapter 6.

Some tools will read the Squid MIB (Management Information Base) included with Squid (as /usr/local/squid/etc/mib.txt, once Squid is installed). Some tools, on the other hand, will have to be patched to understand the MIB that Squid uses. Since most SNMP products are written with a router in mind, they may not talk to an application like Squid, since the Squid MIB is quite different from a router MIB. (For more information on Squid and SNMP, see chapter 11)

Use the --enable-snmp configure option to enable the Squid SNMP code.

14.3.8 Killing the parent process on exit

Since Squid will be a very important part of your network when it is installed, you will probably have a program which simply restarts Squid if the running process exits. The RunCache program included with Squid does just this.

If you are doing maintenance on the cache system and actually want to kill the Squid process, having it automatically restarted as you work can be irritating, or even cause real problems.

This option puts code into Squid that kills the parent process if Squid is shut down cleanly. If Squid crashes it leaves the parent process alone, and Squid will thus be automatically restarted.

Use the --enable-kill-parent-hack option to enable killing of the parent process on exit.

If you don't use this option, the correct procedure is to kill the parent with the kill command, and to then use the shutdown command described in the Running Squid section to shut down Squid. Do not use the 'kill' command on the Squid process itself if you can avoid it: Squid needs time to shut down cleanly, since it writes a complete list of objects to disk.


14.3.9 Reducing time system-calls

When writing logs of cache events and client accesses, Squid calls the gettimeofday() operating system call to determine the accurate time.

This system call can take a short while to return, leaving Squid doing nothing while it could be reading and writing data for something that doesn't require logging. The amount of time that Squid takes to make the system call is negligible on most machines, but under very high load the huge number of calls can impact overall performance. Enabling the 'time-hack' option makes Squid update the clock only once per second, reducing the overhead dramatically on such caches. This does mean that your log messages are less accurate. The log accuracy is important to some people, though. When you have accurate time stamps of how long transfers take, you can create graphs of response time, and use them to decide when you need to upgrade your machine. (More on this in chapter 11: Cache analysis).

Most people do not need to use the --enable-time-hack option. It's useful mainly on very slow machines, or on operating systems where the gettimeofday call is very slow.

14.3.10 ARP-based Access Control Lists

All ethernet cards have a (supposedly) unique identifier which is used as an address for all network traffic destined for that card. This number is referred to as a MAC address. If the card didn't have this address the operating system would have to check every packet on the network and decide if the packet was destined for its IP address. With ethernet, however, the card's internal optimized hardware can check all the packets and decide if the packet needs to be passed up to the operating system. The network protocol that associates MAC addresses with IP numbers is known as ARP (Address Resolution Protocol).

If you want to control cache access by MAC address, you can enable ARP access control lists.

This option is only available on certain operating systems, since there is no standard method of finding the ARP address of a host when you are connected at the TCP level. As of this writing, ARP acl lists only work on Linux. If you are using an operating system that can return this information to a user-level process, use the --enable-arp-acl option to use MAC acls.

14.3.11 Inter-cache Communication

Squid includes multiple Inter-Cache communication protocols. By default, the original Inter-Cache protocol (ICP) is included in the source code. If you wish to include some of the less used protocols, you will need to include them at compile time. Inter-cache communication is covered in depth in chapter 8. For the initial install you should probably not enable these protocols, since they may not be used.

If you are planning on joining an existing hierarchy you should ask the hierarchy administrators as to what protocols are supported or needed. If you are setting up a new hierarchy then you should only enable these after reading the above referenced chapter.

You can enable cache digests with the --enable-cache-digests option, and the Hyper Text Caching Protocol (HTCP) with --enable-htcp.


14.3.12 Keeping track of origin request hosts

(? I have never used this function. I think that it may be used mainly by the NLANR caches. I need to find out exactly what this is used for. This is my 'best guess' in the meantime. ?)

When a Squid cache forwards requests on to a destination server (or, in fact, to a parent cache) it adds headers to the request indicating both the origin IP of the requester and the IP address of the cache that is doing the forwarding (its own IP). Squid can be configured to keep track of both of these headers for access logs of incoming requests. If you have caches beneath yours, this logs the headers the client caches add.

This feature is only really useful if you are at the top of a hierarchy and want to see who the biggest users of lower caches are. Currently, you can only access the data stored in this way with the cachemgr.cgi cgi program. (? not sure ?)

You probably don’t want to enable this option, but if you do, use the --enable-forw-via-db option.

14.3.13 Language selection

When Squid is unable to fulfill a request, an error page is returned to the user with information on what went wrong. This page can be in the language of your choice. Squid already includes error pages in quite a number of languages: for a list of included languages, check the contents of the directory errors/ in the extracted source directory.

cache:~/src/squid-2.0.RELEASE> ls errors/
Bulgarian  Estonian   Italian      Russian-1251    list
Czech      French     Makefile.in  Russian-koi8-r
Dutch      German     Polish       Spanish
English    Hungarian  Portuguese   Turkish

The file 'list' contains a list of files to edit when creating your own language error files.

Unfortunately there are no versions of the config file in different languages - only the error messages returned to users have been translated. The language defaults to English if you do not specify a language.

To use a specific language, replace language-name in the option below with something like Bulgarian:

--enable-err-language=language-name

14.4 Running configure

Now that you have decided which options to use, it's time to run configure. Here's an example:

./configure --enable-err-language=Bulgarian --prefix=/usr/local

Running ./configure with the options that you have chosen should go smoothly. In the unlikely event that configure returns with an error message, here are some suggestions that may help.


14.4.1 Broken compilers

The most common problem for new installers is that there is a problem with the installed compiler (or the headers) for the system.

To test this theory simply try and run configure with no options at all. If you still get an error message it is almost certainly a compiler or header file problem.

To make sure, try and compile a program that uses some of the less used system calls and see if this compiles.
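To sketch what such a test might look like (the file name, location and choice of header here are arbitrary, not anything configure itself uses), you could generate a small program that uses a slightly less common call such as gettimeofday(), and then compile it by hand:

```shell
# Write a minimal test program to /tmp; the quoted 'EOF' stops the
# shell from expanding anything inside the C source.
cat > /tmp/conftest.c <<'EOF'
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    printf("compiler and headers look usable\n");
    return 0;
}
EOF
echo "Now run: cc -o /tmp/conftest /tmp/conftest.c && /tmp/conftest"
```

If that compile fails, the problem is with your compiler or headers, not with Squid's configure script.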

If your compiler doesn't compile files correctly, you might want to check if the header files exist, and if they do, check the permissions on the directory and the include files themselves.

If you have installed GCC in a non-standard directory, or if you are cross compiling, you may need configure to append options to the GCC command it uses during its tests. You can get configure to append options to the GCC command line by setting the 'CFLAGS' shell variable prior to running configure. If, for example, your compiler only works when you modify the default include directory, you can get configure to append that option to the default command line with a (Bourne Shell) command like:

CFLAGS=-I/usr/people/staff/oskar/gcc/include
export CFLAGS

14.4.2 Incompatible Options

Some configuration options exclude the use of others. This is another common cause of problems. To test this you should just try and run configure without any options at all, and see if the problem disappears. If so, you can try and work out which option is causing the conflict by adding each option to the configure command line one-by-one. You may find that you have to choose between two options (for example Async-IO and the DL-Malloc routines). In this case you may have to decide which of the options is the most important in your setup.

14.5 Compiling the Squid Source

Now that you have configured Squid, you need to make the Squid binaries. You should simply have to run make in the extracted source directory, and a binary will be created as src/squid.

cache:/ # cd /usr/local/squid/src/squid-2.2.RELEASE
cache:/usr/local/squid/src/squid-2.2.RELEASE # make

If the compilation fails, it may be because of conflicting configure options as described in the configure section. Follow the same instructions described there to find the offending option. (You should run make clean between configure runs, to ensure that old binaries are removed.) As a start, try running configure without any options at all and then see if make completes. If this works, try additional configure options one at a time to see which one causes the problem.


14.6 Installing the Squid binary

The make command creates the binary, but doesn't install it.

Running make install creates the /usr/local/squid/bin and /usr/local/squid/etc subdirectories, and copies the binaries and default config files into the appropriate directories. Permissions may not be set correctly, but we will work through all created directories and set them up correctly shortly.

This command also copies the relevant config files into the default directories. The standard config file included with the source is placed in the etc subdirectory, as are the mime.types file and the default Squid MIB file (squid.mib).

If you are upgrading (or reinstalling), make install will overwrite binary files in the bin directory, but will not overwrite your painfully manipulated configuration files. If the destination configuration file exists, make install will instead create a file called filename.default. This allows you to check if useful options have been added by comparing config files.

If all has gone well you should have a fully installed (but unconfigured) Squid system setup.

Congratulations!


15 Chapter 4. Squid Configuration Basics

Table of Contents
 Version Control Systems
 The Configuration File
 Setting Squid's HTTP Port
 Email for the Cache Administrator
 Effective User and Group ID
 Access Control Lists and Access Control Operators
 Communicating with other proxy servers

The first high-performance proxy-cache program was developed as part of the Harvest project. The Harvest project was an NSF (?check this info for accuracy?) funded project to create a web indexing system. Part of this project included writing a high-performance cache daemon, or cached (pronounced "Cache-Dee"), to speed the re-indexing of pages. Once the project was completed the cached source code was used as the basis for many commercial cache servers, as the source was freely available. Many of the cached developers moved on to or formed companies that developed commercial cache software.

I remember first installing cached: I was boggled at the number of options in the configuration file. I tried working through the options from top to bottom, deciding which to change and which to leave. I had no idea what they all meant. As I worked through the file, I figured more and more options out, though others remained mysteries.

After a lot of changes I tried to start cached, and had no luck. It spat out loads of errors, and I couldn't connect to the machine with my web browser at all. I had no idea what the real problem was - and I changed more and more options with time. This simply buried the real problem beneath hundreds of other possible problems.

Though Squid is now easier to install, the lessons I learned then are still relevant. The default configuration file is probably right for 90% of installations - once you have Squid running, you should change the configuration file one option at a time. Don't get over-ambitious in your changes quite yet! Leave things like refresh rules until you have experimented with the basic options - what port you want your cache to accept requests on, what user to run as, and where to keep cached pages on your drives.

So that you can get Squid running, this chapter works through the basic Squid options, giving you background information and introducing you to some of the basic concepts. In later chapters you'll move on to more advanced topics.

The Squid config file is not arranged in the same order as this book. The config file also does not progress from basic to advanced config options in any specific order, but instead consists of related sections, with all hierarchy settings in a specific section of the file, all access controls in another and so forth.


To make changes detailed in this chapter you are going to have to skip around in the config file a bit. It's probably easiest to simply search for the options discussed in each subsection of this chapter, but if you have some time it will be best if you read through the config file, so that you have an idea of how sections fit together.

The chapter also points out options that may have to be changed on the other 10% of machines. If you have a firewall, for example, you will almost certainly have to configure Squid differently to someone that doesn't.

16 Version Control Systems

I recommend that you put all Squid configuration files and startup scripts under revision control. If you are like me, you love to play with new software. You change an option, get the program to re-read the configuration file, and see what difference it makes. By repeating this process, I learn what each option does, and at the same time I gain experience, and discover why the program is written the way it is. Quite often configuration files make no sense until you discover the overall structure of the underlying program.

The best way for you to understand each of the options in the Squid config file (and to understand Squid itself) is to experiment with the multitude of options. At some stage in your experimentation, you will find that you break something. It's useful to be able to revert to a previous version (or simply to be reminded what changes you have made).

Many readers will already have used a Revision Control System. The RCS system is included with many Unix systems, and source is freely available. For the few that haven't used RCS, however, it's worth including some pointers to some manual pages:

ci(1) co(1) rcs(1) rcsdiff(1) rlog(1)

One of the wonders of Unix is the ability to create scripts which reduce the number of commands that you have to type to get something done. I have a short script on all the machines I maintain called rvi. Using rvi instead of vi allows me to use one command to edit files under RCS (as opposed to the customary four). Put this file somewhere in your path and make it executable (chmod +x rvi). You can then simply use a command like rvi squid.conf to edit files that are under revision control. This is a lot quicker than running each of the co, rcsdiff and ci commands.

#!/bin/sh
co -l $1
$VISUAL $1
rcsdiff -u $1
ci -u $1


17 The Configuration File

All Squid configuration files are kept in the directory /usr/local/squid/etc. Though there is more than one file in this directory, only one file is important to most administrators, the squid.conf file. Though there are (as of this writing) one hundred and twenty five option tags in this file, you should only need to change eight options to get Squid up and running. The other one hundred and seventeen options give you amazing flexibility, but you can learn about them once you have Squid running, by playing with the options or by reading the descriptions in chapter 10.

Squid assumes that you wish to use the default value if there is no occurrence of a tag in the squid.conf file. Theoretically, you could even run Squid with a zero length configuration file.

The remainder of this chapter works through the options that you may need to change to get Squid to run. Most people will not need to change all of these settings. You will need to change at least one part of the configuration file though: the default squid.conf denies access to all browsers. If you don't change this, Squid will not be very useful!


18 Setting Squid's HTTP Port

The first option in the squid.conf file sets the HTTP port(s) that Squid will listen to for incoming requests.

Network services listen on particular ports. Ports below 1024 can only be used by the system administrator, and are used by programs that provide basic Internet services: SMTP, POP, DNS and HTTP (web). Ports above 1024 are used for untrusted services (where a service does not run as administrator), and for transient connections, such as outgoing data requests.

Typically, web servers listen for incoming web requests (using the HyperText Transfer Protocol - HTTP) on port 80.

Squid's default HTTP port is 3128. Many people run their cache servers on a port which is easier to remember: something like 80 or 8080. If you choose a low-numbered port, you will have to start Squid as root (otherwise you are considered untrusted, and you will not be able to start Squid). Many ISPs use port 8080, making it an accepted pseudo-standard.

If you wish, you can use multiple ports by appending a second port number to the http_port variable. Here is an example:

http_port 3128 8080

It is very important to refer to your cache server with a generic DNS name. Simply because you only have one server now does not mean that you should not plan for the future. It is a good idea to set up a DNS hostname for your proxy server. Do this right away! A simple DNS entry can save many hours further down the line. Configuring client machines to access the cache server by IP address is asking for a long, painful transition down the road. Generally people add a hostname like cache.mydomain.com to the DNS. Other people prefer the name proxy, and create a name like proxy.mydomain.com.
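Such a generic name is usually just an alias for the machine's real hostname. As a sketch (all hostnames here are hypothetical), the zone-file entry could be a simple CNAME record:

```
; cache.mydomain.com is an alias for the machine currently running Squid;
; when the cache moves to new hardware, only this record needs to change.
cache   IN  CNAME   squid-box.mydomain.com.
```

Clients configured to use cache.mydomain.com then keep working unchanged when Squid moves to another machine.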

18.1 Using Port 80

HTTP defines the format of both the request for information and the format of the server response. The basic aspects of the protocol are quite straight forward: a client (such as your browser) connects to port 80 and asks for the file by supplying the full path and filename that it wishes to download. The client also specifies the version of the HTTP protocol it wishes to use for the retrieval.

With a proxy request the format is only a little different. The client specifies the whole URL instead of just the path to the file. The proxy server then connects to the web server specified in the URL, and sends a normal HTTP request for the page. (? The format of HTTP requests is described in more detail in chapter 4, where you type in an HTTP request, just as a browser would send it, to test that the cache is responding to requests - may use the 'client' program instead. ?)
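As a rough sketch of the difference (the host and path are made up for illustration), here are the two raw request forms printed side by side:

```shell
# Request sent straight to a web server: only the path is given.
printf 'GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n'

# Request sent to a proxy such as Squid: the whole URL is given.
printf 'GET http://www.example.com/index.html HTTP/1.0\r\n\r\n'
```

The proxy extracts the hostname from the URL in the second form, connects to that server, and issues a request in the first form on the client's behalf.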


Since the format of proxy requests is so similar to a normal HTTP request, it is not especially surprising that many web servers can function as proxy servers too. Changing a web server program to function as a proxy normally involves comparatively small changes to the code, especially if the code is written in a modular manner - as is the Apache web server. In many cases the resulting server is not as fast, or as configurable, as a dedicated cache server can be.

The CERN web server httpd was the first widely available web proxy server. The whole WWW system was initially created to give people easy access to CERN data, and CERN HTTPD was thus the de-facto test-bed for new additions to the initial informal HTTP specification. Most (and certainly at one stage all) of the early web sites ran the CERN server. Many system administrators who wanted a proxy server simply used their standard CERN web server (listening on port 80) as their proxy server, since it could function as one. It is easy for the web server to distinguish a proxy request from a normal web page request, since it simply has to check if the full URL is given instead of simply a path name. Given the choice (even today) many system administrators would choose port 80 as their proxy server port simply because 'port 80 is the standard port for web requests'.

There are, however, good reasons for you to choose a port other than 80.

Running both services on the same port meant that if the system administrator wanted to install a different web server package (for extra features available in the new software) they would be limited to software that could perform both as a web server and as a proxy. Similarly, if the same sysadmin found that their web server's low-end proxy module could not handle the load of their ever-expanding local client base, they would be restricted to a proxy server that could function as a web server. The only other alternative is to re-configure all the clients, which normally involves spending a few days apologizing to users and helping them through the steps involved in changing over.

Microsoft use the Microsoft web server (IIS) as a basis for their proxy server component, and Microsoft proxy thus only (? tried once - let's see if it's changed since ?) accepts incoming proxy requests on port 80. If you are installing a Squid system to replace either CERN, Apache or IIS running in both web-server and cache-server modes on the same port, you will have to set http_port to 80. Squid is written only as a high-performance proxy server, so there is no way for it to function as a web server, since Squid has no support for reading files from a local disk, running CGI scripts and so forth. There is, however, a workaround.

If you have both services running on the same port, and you cannot change your client PCs, do not despair. Squid can accept requests in web-server format and forward them to another server. If you have only one machine, and you can get your web server software to accept incoming requests on a non-default port (for example 81), Squid can be configured to forward incoming web requests to that port. This is called accelerator mode (since its initial purpose was to speed up very slow web servers). Squid effectively does some translation on the original request, and then simply acts as if the request were a proxy request and connects to the host: the fact that it's not a remote host is irrelevant. Accelerator mode is discussed in more detail in chapter 9. Until then, get Squid installed and running on another port, and work your way through the first couple of chapters of this book, until you have a working pilot-phase system. Once Squid is stable and tested you can move on to changing web server settings. If you feel adventurous, however, you can skip there shortly!


18.1.1 Where to Store Cached Data

Cached data has to be kept somewhere. In the section on hardware sizing, we discussed the size and number of drives to use for caching. Squid cannot autodetect where to store this data, though, so you need to let Squid know which directories it can use for data storage.

The cache_dir operator in the squid.conf file is used to configure specific storage areas. If you use more than one disk for cached data, you may need more than one mount point (for example /usr/local/squid/cache1 for the first disk, /usr/local/squid/cache2 for the second). Squid allows you to have more than one cache_dir option in your config file.
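For example (the sizes here are hypothetical, not recommendations), a two-disk machine along the lines above might use two entries:

```
cache_dir /usr/local/squid/cache1 4000 16 256
cache_dir /usr/local/squid/cache2 4000 16 256
```

Squid will then spread cached objects across both directories.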

Let's consider only one cache_dir entry in the meantime. Here I am using the default values from the standard squid.conf.

cache_dir /usr/local/squid/cache/ 100 16 256

The first option to the cache_dir tag sets the directory where data will be stored. The prefix value simply has /cache/ tagged onto the end and is used as the default directory. This directory is also made by the make install command that we used earlier.

The next option to cache_dir is straightforward: it's a size value. Squid will store up to that amount of data in that directory. The value is in megabytes of cache store; the default is 100 megabytes.

The other two options are more complex: they set the number of subdirectories (first and second tier) to create in this directory. Squid makes lots of directories and stores a few files in each of them in an attempt to speed up disk access (finding the correct entry in a directory with one million files in it is not efficient: it's better to split the files up into lots of smaller sets of files... don't worry too much about this for the moment). I suggest that you use the default values for these options in the meantime; if you have a very large cache store you may want to increase these values, but this is covered in more detail later.
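To make the two tier numbers concrete: with the default line shown above, Squid creates 16 first-tier directories, each containing 256 second-tier directories, and spreads cached objects across them. Assuming the default hexadecimal naming Squid uses, the tree looks roughly like this:

```
# 16 first-tier directories (00 .. 0F), each holding
# 256 second-tier directories (00 .. FF)
/usr/local/squid/cache/00/00/
/usr/local/squid/cache/00/01/
...
/usr/local/squid/cache/0F/FF/
```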


19 Email for the Cache Administrator

If Squid dies, email is sent to the address specified with the cache_mgr tag. This address is also appended to the end of error pages returned to users if, for example, the remote machine is unreachable.
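A minimal sketch of the tag just described (the address is a hypothetical example; substitute your own):

```
cache_mgr cache-admin@mydomain.example
```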


20 Effective User and Group ID

Squid can only bind to low-numbered ports (such as port 80) if it is started as root. Squid is normally started by your system's rc scripts when the machine boots. Since these scripts run as root, Squid is started as root at bootup time.

Once Squid has been started, however, there is no need to run it as root. Good security practice is to run programs as root only when it's absolutely necessary, and for this reason Squid changes user and group IDs once it has bound to the incoming network port.

The cache_effective_user and cache_effective_group tags tell Squid what IDs to change to. The Unix security system would be useless if it allowed all users to change their IDs at will, so Squid only attempts to change IDs if the main program is started as root.

If you do not have root access to the machine, and are thus not starting Squid as root, you can simply leave this option commented out. Squid will then run with whatever user ID starts the actual Squid binary.

As discussed in chapter 2, this book assumes that you have created both a squid user and a squid group on your cache machine. The above tags should thus both be set to "squid".

Example 4-1. Effective User and Group IDs

cache_effective_user squid
cache_effective_group squid

20.1 FTP login information

Squid can act as a proxy server for various Internet protocols. The most commonly used protocol is HTTP, but the File Transfer Protocol (FTP) is still alive and well.

FTP was written for authenticated file transfer (it requires a username and password). To provide public access, a special account is created: the anonymous user. When you log into an FTP server you use this as your username. As a password you generally use your email address. Most browsers these days automatically enter a useless email address.

It's polite to give an address that works, though. If one of your users abuses a site, it allows the site admin to get hold of you easily.

Squid allows you to set the email address that is used with the ftp_user tag. You should probably create a squid@yourdomain.example email address specifically for people to contact you on.

There is another reason to enter a proper address here: some servers require a real email address. For your proxy to log into these ftp servers you will have to enter a real email address here.
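Following the advice above, a sketch of the tag (using the hypothetical address already mentioned):

```
ftp_user squid@yourdomain.example
```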


21 Access Control Lists and Access Control Operators

Squid could not be used in an ISP environment without a sophisticated access control system. Indeed, Squid should not be used in ANY environment without some kind of basic authentication system. It is amazing how fast other Internet users will find out that they can relay requests through your cache, and then proceed to do so.

Why? Sometimes to obfuscate their real identity, and other times because they have a fast line to you, but a slow line to the remainder of the Internet.

21.1 Simple Access Control

In many cases only the most basic level of access control is needed. If you have a small network, and do not wish to use things like user/password authentication or blocking by destination domain, you may find that this small section is sufficient for all your access control setup. If not, you should read chapter 6, where access control is discussed in detail.

The simplest way of restricting access is to only allow IPs that are on your network. If you wish to implement different access control, it's suggested that you put this in place later, after Squid is running. In the meantime, set it up, but only allow access from your PC's IP address.

Example access control entries are included in the default squid.conf. The included entries should help you avoid some of the more obscure problems, such as bandwidth-chewing loops, cache tunneling with SSL CONNECTs and other strange access problems. In chapter 6 we work through the config file's default config options, since some of them are pretty complex.

Access control is done on a per-protocol basis: when Squid accepts an HTTP request, the list of HTTP controls is checked. Similarly, when an ICP request is accepted, the ICP list is checked before a reply is sent.

Assume that you have a list of IP addresses that are to have access to your cache. If you want them to be able to access your cache with both HTTP and ICP, you would have to enter the list of IP addresses twice: you would have lines something like this:

Example 4-2. Theoretical Access List

http_access deny 10.0.1.0/255.255.255.0
http_access allow 10.0.0.0/255.0.0.0
icp_access allow 10.0.0.0/255.0.0.0

Rule sets like the above are great for small organisations: they are straightforward.

For large organizations, though, things are more convenient if you can create classes of users. You can then allow or deny classes of users in more complex relationships. Let's look at an example where we duplicate the above rules using classes of users:


Example 4-3. Access Lists using Classes

# classes
acl mynetwork src 10.0.0.0/255.0.0.0
acl servernet src 10.0.1.0/255.255.255.0
# what HTTP access to allow classes
http_access deny servernet
http_access allow mynetwork
# what ICP access to allow classes
icp_access deny servernet
icp_access allow mynetwork

Sure, it's more complex for this example. The benefits only become apparent if you have large access lists, or when you want to integrate refresh-times (which control how long objects are kept) and the sources of incoming requests. I am getting quite far ahead of myself, though, so let's skip back.

We need some terminology to discuss access control lists, otherwise this could become a rather long chapter. So: lines beginning with acl are (appropriately, I believe) acl lines. The lines that use these acls (such as http_access and icp_access in the above example) are called acl-operators. An acl-operator can either allow or deny a request.

So, to recap: acls are used to define classes. When Squid accepts a request it checks the list of acl-operators specific to the type of request: an HTTP request causes the http_access lines to be checked; an ICP request checks the icp_access lists.

Acl-operators are checked in the order that they occur in the file (ie from top to bottom). The first acl-operator line that matches causes Squid to drop out of the acl list. Squid will not check through all acl-operators if the first denies the request.

In the previous example, we used a src acl: this checks that the source of the request is within the given IP range. The src acl-type accepts IP address lists in many formats, though we used the subnet/netmask form in the earlier example. CIDR (Classless Inter-Domain Routing) notation can also be used here. Here is an example of the same address range in either notation:

Example 4-4. CIDR vs Netmask Source-IP Notation

acl mynet1 src 10.1.0.0/255.255.0.0
acl mynet2 src 10.1.0.0/16

Access control lists inherit permissions when there is no matching acl. If all acl-operators in the file are checked, and no match is found, the last acl-operator checked determines whether the request is allowed or denied. This can be confusing, so it's normally a good idea to place a final "catch-all" acl-operator at the end of the list. The simplest way to create such an operator is to create an acl that matches any IP address. This is done with a src acl with a netmask of all 0's. When the netmask arithmetic is done, Squid will find that any IP matches this acl.
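The catch-all just described can be sketched like this (the acl name all is the conventional choice used in the default squid.conf):

```
# the all-zeros netmask matches every source IP
acl all src 0.0.0.0/0.0.0.0
# final catch-all: deny anything the earlier rules did not match
http_access deny all
```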

Your cache server may well be on the network placed in the relevant allow lists on your cache, and if you were thus to run the client on the cache machine (as opposed to another machine somewhere on your network) the above acl and http_access rules would allow you to test the cache. In many cases, however, a program running on the cache server will end up connecting to (and from) the address 127.0.0.1 (also known as localhost). Your cache should thus allow requests to come from the address 127.0.0.1/255.255.255.255. In the below example we don't allow icp requests from the localhost address, since there is no reason to run two caches on the same machine.


The squid.conf file that comes with Squid includes acls that deny all HTTP requests. To use your cache, you need to explicitly allow incoming requests from the appropriate range. The squid.conf file includes text that reads:

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

To allow your client machines access, you need to add rules similar to the below in this space. The default access-control rules stop people exploiting your cache, so it's best to leave them in.

Example 4-5. Example Complete ACL list

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0
# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2
# allow requests from the local machine (for testing and the like)
http_access allow localhost
# End of locally-inserted rules
http_access deny all

21.2 Ensuring Direct Access to Internal Machines

Acl-operator lines are not only used for authentication. In an earlier section we discussed communication with other cache servers. Acl lines are used to ensure that requests for specific URLs are handled by your cache, not passed on to another (further away) cache.

If you don't have a parent cache (such as a firewall or a parent ISP cache) you can probably skip this section.

Let's assume that you connect to your ISP's cache server as a parent. A client machine (on your local network) connects to your cache and requests http://www.yourdomain.example/. Your cache server will look in the local cache store. If the page is not there, Squid will connect to its configured parent (your ISP's cache: across your serial link), and request the page from there. The problem, though, is that there is no need to connect across your internet line: the web server is sitting a few feet from your cache in the machine room.

Squid cannot know that it's being very inefficient unless you give it a list of sites that are "near by". This is not the only way around this problem, though: your browser could be configured to ignore the cache for certain IPs and domains, so the request never reaches the cache in the first place. Browser config is covered in Chapter 5, but in the meantime here is some info on how to configure Squid to communicate directly with internal machines.

The acl-operators always_direct and never_direct determine whether to pass the connection to a parent or to proceed directly.


The following set of operators is based on the final configuration created in the previous section, but using never_direct and always_direct operators. It is assumed that all servers that you wish to connect to directly are in the address ranges specified with the my-iplist directives. In some cases you may run a web server on the same machine as the cache server, and the localhost acl is thus also considered local.

The always_direct and never_direct tags are covered in more detail in Chapter 7, where we cover hierarchies in detail.

Example 4-6. Using always and never_direct

# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0
# Various programs running on the cache box connect to Squid, so it's
# useful to allow connections from the localhost address.
acl localhost src 127.0.0.1/255.255.255.255
# used to deny all requests: since the netmask is all 0's, any request
# matches this acl
acl all src 0.0.0.0/0.0.0.0
# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2
# check the localhost acl as a special case
http_access allow localhost
# If the request comes from any other IP, deny all access.
http_access deny all
# always go direct to local machines
always_direct allow my-iplist-1
always_direct allow my-iplist-2
# never go direct to other hosts
never_direct allow all

Squid always attempts to cache pages. If you have a large Intranet system, it's a waste of cache store disk space to cache your Intranet. Controlling which URLs and IP ranges not to cache is covered in detail in chapter 6, using the no_cache acl operator.


22 Communicating with other proxy servers

Squid supports the concept of a hierarchy of proxies. If your proxy does not have an object on disk, its default action is to connect to the origin web server and retrieve the page. In a hierarchy, your proxy can communicate with other proxies (in the hope that one of these servers will have the relevant page). You will, obviously, only peer with servers that are 'close' to you, otherwise you would end up slowing down access. If access to the origin server is faster than access to neighboring cache servers it is not a good idea to get the page from the slower link!

Having the ability to treat other caches as siblings is very useful in some interactions. For example: if you often do business with another company, and have a permanent link to their premises, you can configure your cache to communicate with their cache. This will reduce overall latency: it's almost certainly faster to get the page from them than from the other side of the country.

When querying more than one cache, Squid does not query each in turn, waiting for a reply from the first before querying the second (since this would create a linear slowdown as you add more siblings, and if the first server stops responding, you would slow down all incoming requests). Squid thus sends all ICP queries together, without waiting for replies. Squid then puts the client's request on hold until the first positive reply from a sibling cache is received, and will retrieve the object from the fastest-replying cache server. Since the earliest returning reply packet is usually on the fastest link (and from the least loaded sibling server), your server gets the page fast.

Squid will always get the page from the fastest-responding cache - be it a parent or a sibling.

The cache_peer option allows you to specify proxy servers that your server is to communicate with. The first line of the following example configures Squid to query the cache machine cache.myparent.example as a parent. Squid will communicate with the parent on HTTP port 3128, and will use ICP to query the server using port 3130. Configuring Squid to query more than one server is easy: simply add another cache_peer line. The second line configures cache.sibling.example as a sibling, listening for HTTP requests on port 8080 and ICP queries on port 3130.

cache_peer cache.myparent.example parent 3128 3130
cache_peer cache.sibling.example sibling 8080 3130

If you do not wish to query any other caches, simply leave all cache_peer lines commented out: the default is to talk directly to origin servers.

Cache peering and hierarchy interactions are discussed in quite some detail in this book. In some cases hierarchy setups are the most difficult part of your cache setup process (especially in a distributed environment like a nationwide ISP). In-depth discussion of hierarchies is beyond the scope of this chapter, so much more information is given in chapter 8. There are cases, however, where you need at least one hierarchy line to get Squid to work at all. This section covers the basics, just for those setups.


You only need to read this material if one of the following scenarios applies to you:

You have to use your Internet Service Provider’s cache.

You have a firewall.

22.1 Your ISP's cache

If you have to use your Internet Service Provider's cache, you will have to configure Squid to query that machine as a parent. Configuring their cache as a sibling would probably return error pages for every URL that they do not already have in their cache.

Squid will attempt to contact parent caches with ICP for each request. This is essentially a ping. If there is no response to this request, Squid will attempt to go direct to the origin server. Since (in this case, at least) you cannot bypass your ISP's cache, you may want to reduce the latency added by this extra query. To do this, place the default and no-query keywords at the end of your cache_peer line:

cache_peer cache.myisp.example parent 3128 3130 default no-query

The default option essentially tells Squid "Go through this cache for all requests. If it's down, return an error message to the client: you cannot go direct".

The no-query option gets Squid to ignore the given ICP port (leaving the port number out will return an error), and never to attempt to query the cache with ICP.

22.2 Firewall Interactions

Firewalls can make cache configuration hairy. Inter-cache protocols generally use packets which firewalls inherently distrust. Most caches (Squid included) use ICP, which is a layer on top of UDP. UDP is difficult to make secure, and firewall administrators generally disable it if at all possible.

It’s suggested that you place your cache server on your DMZ (if you have one). There are a few advantages to this:

Your cache server is kept secure.

The firewall can be configured to hand off requests to the cache server, assuming it is capable.

You will be able to peer with other, outside, caches (like your ISP's), since DMZ networks generally have less rigid rule sets.

The remainder of this section should help you get Squid and your firewall to co-operate. A few cases are covered for each type of firewall: the cache inside the firewall; the cache outside the firewall; and, finally, on the DMZ.

22.2.1 Proxying Firewalls

The vast majority of firewalls know nothing about ICP. If, on the other hand, your firewall does not support HTTP, it's a good time to have a serious talk to the buyer that had an all-expenses-paid weekend on the firewall supplier. Configuring the firewall to understand ICP is likely to be painful, but HTTP should be easy.


If you are using a proxy-level firewall, your client machines are probably configured to use the firewall's internal IP address as their proxy server. Your firewall could also be running in transparent mode, where it automatically picks up outgoing web requests. If you have a fair number of client machines, you may not relish the idea of reconfiguring all of them. If you fall into this category, you may wish to put your cache on the outside (or on the DMZ) and configure the firewall to pass requests to the cache, rather than reconfiguring all client machines.

22.2.1.1 Inside

The cache is considered a trusted host, and is protected by the firewall. You will configure client machines to use the cache server in their browser proxy settings, and when a request is made, the cache server will pass the outgoing request to the firewall, treating the firewall as a parent proxy server. The firewall will then connect to the destination server. If you have a large number of clients configured to use the firewall as their proxy server, you could get the firewall to hand off incoming HTTP requests back into the network, to the cache server. This is less efficient, though, since the cache will then have to re-pass these requests through the firewall to get to the outside, using the parent option to cache_peer. Since the latter involves traffic passing through the firewall twice, your load is very likely to increase. You should also beware of loops, with the cache server parenting to the firewall and the firewall handing off the cache's request back to the cache!

As described in chapter 1, Squid will also send ICP queries to parents. Firewalls don't care for UDP packets, and normally log (and then discard) such packets.

When Squid does not receive a response from a configured parent, it will mark the parent as down, and proceed to go direct.

Whenever Squid is set up to use a parent that does not support ICP, the cache_peer line should include the "default" and "no-query" options. These options stop Squid from attempting to go direct when all caches are considered down, and specify that Squid is not to send ICP requests to that parent.

Here is an example config entry:

cache_peer inside.fw.address.domain parent 3128 3130 default no-query

22.2.1.2 Outside

There are only two major reasons for you to put your cache outside the firewall:

One: Although Squid can be configured to do authentication, this can lead to duplication of effort (you will encounter the "add new staff to 500 servers" syndrome). If you want to continue to authenticate users on the firewall, you will have to put your cache on the outside or on the DMZ. The firewall will thus accept requests from clients, authenticate them, and then pass them on to the cache server.

Two: Communicating with cache hierarchies is easy. The cache server can communicate with other systems using any protocol. Sibling caches, for example, are difficult to contact through a proxying firewall.

You can only place your cache outside if your firewall supports hand-offs. Browsers inside will connect to the firewall and request a URL, and the firewall will connect to the outside cache and request the page.


If you place your cache outside your firewall, you may find that your client PCs have problems connecting to internal web servers (your intranet, for example, may be unreachable). The problem is that the cache is unable to connect back through to your internal network (which is actually a good thing: don't change that). The best thing to do here is to add exclusions to your browser settings: this is described in Chapter 5 - you should specifically have a look at the section on browser autoconfig. In the meantime, let's just get Squid going, and we will configure browsers once you have a cache to talk to.

Since the cache is not protected by the firewall, it must be very carefully configured: it must only accept requests from the firewall, and must not run any strange services. If possible, you should disable telnet, and use something like SSH (Secure SHell) instead. The access control lists (which you will set up shortly) must only allow the firewall, otherwise people will be able to relay their requests through your cache, using your bandwidth.

If you place the cache outside the firewall, your client PCs will be configured to use the firewall as their proxy server (this is probably the case already). The firewall must be configured to hand off client HTTP requests to the cache server. The cache must be configured to only allow HTTP requests from the firewall's outside IP address. If not configured this way, other Internet users could use your cache server as a relay, using your bandwidth and hardware resources for illegitimate (and possibly illegal) purposes.
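The restriction just described can be sketched in squid.conf like this (the firewall's outside address is a hypothetical example; the acl name is arbitrary):

```
# only the firewall's outside interface may use this cache
# (192.0.2.1 is a hypothetical address: substitute your own)
acl firewall src 192.0.2.1/255.255.255.255
http_access allow firewall
# everyone else is refused
http_access deny all
```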

With your cache server on the outside network, you should treat the machine as a completely untrusted host, lest a cracker find a hole somewhere on the system. It is recommended that you place the cache server on a dedicated firewall network card, or on a switched ethernet port. This way, if your cache server were to be cracked, the cracker would only be able to read passing HTTP data. Since the majority of sensitive information is sent via email, this would reduce the potential for sensitive data loss.

Since your cache server only accepts requests from the firewall, there is no cache_peer line needed in the squid.conf. If you have to talk to your ISP's cache you will, of course, need one: see the section on this a bit further back.

22.2.1.3 DMZ

The best place for a cache is your DMZ.

If you are concerned with the security of your cache server, and want to be able to communicate with outside cache servers (using ICP), you may want to put your cache on the DMZ.

With Squid on your DMZ, internal client PCs are set up to proxy to the firewall. The firewall is then responsible for handing off these HTTP requests to the cache server (so the firewall in fact treats the cache server as a parent).

Since your cache server is (essentially) on the outside of the firewall, the cache doesn't need to treat the firewall as a parent or sibling: it only accepts requests from the firewall, and never passes requests back to it.

If your cache is outside your firewall, you will need to configure your client PCs not to use the firewall as a proxy server for internal hosts. This is quite easy, and is discussed in the chapter on browser configuration.


Since the firewall is acting as a filter between your cache and the outside world, you are going to have to open up some ports on the firewall. The cache will need to be able to connect to port 80 on any machine on the outside world. Since some valid web servers will run on ports other than 80, you should consider allowing connections to any port from the cache server. In short, allow connections to:

Port 80 (for normal HTTP requests)

Port 443 (for HTTPS requests)

Ports higher than 1024 (site search engines often use high-numbered ports)

If you are going to communicate with a cache server outside the firewall, you will need even more ports opened. If you are going to communicate with ICP, you will need to allow UDP traffic from and to your cache machine on port 3130. You may find that the cache server that you are peering with uses different ports for reply packets. It's probably a bad idea to open all UDP traffic, though.

22.2.2 Packet Filtering firewalls

Squid will normally live on the inside of your packet-filtering firewall. If you have a DMZ, it may be best to put your cache on this network, as you may want to allow UDP traffic to and from the cache server (to communicate with other caches).

To configure your firewall correctly, you should make the minimum number of holes in your filter set. In the remainder of this section we assume that your internal machines can connect to the cache server unimpeded. If your cache is on the DMZ (or outside the firewall altogether) you will need to allow TCP connections from your internal network (on a random source port) to the HTTP port that Squid will be accepting requests on (this is the port that you set a bit earlier, in the "Setting Squid's HTTP Port" section of this chapter).

First, let's consider the firewall setup when you do not query any outside caches. On accepting a request, Squid will attempt to connect to a machine on the Internet at large. Almost always, the destination port will be the default HTTP port, port 80. A few percent of the time, however, the request will be destined for a high-numbered port (any port number higher than 1023 is a high-numbered port). Squid always sources TCP requests from a high-numbered port, so you will thus need to allow TCP requests (all HTTP is TCP-based) from a random high-numbered port to both port 80 and any high-numbered port.

There is another low-numbered port that you will probably need to open. The HTTPS port (used for secure Internet transactions) is normally listening on TCP port 443, so this should also be opened.

In the second situation, let's look at cache-peering. If you are planning to interact with other caches, you will need to open a few more ports. First, let's look at ICP. As mentioned previously, ICP is UDP-based. Almost all ICP-compliant caches listen for ICP requests on UDP port 3130. Squid will always source requests from port 3130 too, though other ICP-compliant caches may source their requests from a different port.

It's probably not a good idea to allow these UDP packets no matter what source address they come from. Your filter should probably specify the IP addresses for each of the caches that you wish to peer with, rather than allowing UDP packets from any source address. That should be it: you should now be able to save the config file, and get ready to start the Squid program.
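The filter rules described in this section might be sketched as follows. This assumes a Linux iptables-style filter; the cache address (192.0.2.10) and peer address (198.51.100.5) are hypothetical, and your filter's exact syntax will differ:

```
# TCP from the cache's high ports to HTTP and HTTPS anywhere
iptables -A FORWARD -p tcp -s 192.0.2.10 --sport 1024:65535 --dport 80 -j ACCEPT
iptables -A FORWARD -p tcp -s 192.0.2.10 --sport 1024:65535 --dport 443 -j ACCEPT
# high-numbered destination ports, for web servers on non-standard ports
iptables -A FORWARD -p tcp -s 192.0.2.10 --sport 1024:65535 --dport 1024:65535 -j ACCEPT
# ICP (UDP port 3130) to and from one specific peer only, not from anywhere
iptables -A FORWARD -p udp -s 192.0.2.10 --sport 3130 -d 198.51.100.5 --dport 3130 -j ACCEPT
iptables -A FORWARD -p udp -s 198.51.100.5 --sport 3130 -d 192.0.2.10 --dport 3130 -j ACCEPT
```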


23 Chapter 5. Starting Squid

Table of Contents: Before Running Squid, Running Squid, Testing Squid

24 Before Running Squid

Before we can start Squid, we have to create a few directories on the system. It's important that these directories have the correct permissions, otherwise someone with a login on the cache may be able to gain root access. Let's work through the default directory tree, and set the permissions on each directory correctly. Since you may have special requirements, I won't simply give you a sequence of commands to run: if you need to use different permissions, it's important to understand the possible consequences.

24.1 Subdirectory Permissions

In Chapter 2 we created a squid user and group, and created another group, squidadm, for the people that will maintain the cache. When Squid starts up, it changes its user and group ids to squid (thanks to the cache_effective_user and cache_effective_group tags in squid.conf). Changing userids reduces the chance of a complete exploit because of a bug in Squid. It's important, however, to remember that users in the squidadm group can probably get root on your machine, so you should not put people that do not already have root on the machine in that group: it's just so that you don't have to su to root continuously.

24.1.1 System-Dependent Information

In the below examples, we assume that a root group exists on your system. This may not be the case: some of you may have to replace the root group (not the user id!) in the examples below with either wheel or bin.

There are many versions of chown that don’t use the "userid:groupid" notation we use in the following examples. On some systems the ":" may have to be replaced by a ".", on others you may have to run a separate "chgrp groupid" command to achieve the same end. See your system’s chown manpage for more information.
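If you are unsure which form your chown accepts, a quick throwaway test (purely illustrative — it uses your own user and group, not squid or squidadm) is:

```shell
# Try the combined "user:group" form on a scratch file; if it fails,
# fall back to chown followed by a separate chgrp.
f=$(mktemp)
if chown "$(id -un):$(id -gn)" "$f" 2>/dev/null; then
  syntax="combined"
else
  chown "$(id -un)" "$f"
  chgrp "$(id -gn)" "$f"
  syntax="separate"
fi
echo "chown syntax: $syntax"
rm -f "$f"
```

Whichever branch runs tells you which notation to use in the commands that follow.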

24.1.2 Walking the Directory Tree

Let’s start with the /usr/local/squid/ directory. If your system starts Squid at bootup, a startup script running as root starts the program /usr/local/squid/bin/squid. If someone were to replace this binary with a trojan, they could gain root access. The /usr/local/squid/ directory should be owned by root, group root, and should not be writable by anyone else. This stops someone from moving the entire bin directory to (say) bin.off, and creating a new bin directory which contains their own squid binary. Use the following commands to set the permissions on this directory correctly:

chown root:root /usr/local/squid/
chmod 755 /usr/local/squid/

Since we have already introduced the /usr/local/squid/bin directory, let’s set its permissions correctly next. If the directory itself was writeable by malicious users, we would have the same problem that we described above. Let’s change it to be owned by root, group root, and make sure that only root can write to the directory. We also need the files in this directory to be readable (and executable) by everyone, so that normal users can run programs like client. There are no setuid binaries in this directory, and if the rest of the files have the correct permissions, there is no reason not to let users into this directory.

cd /usr/local/squid/bin
chown root:root .
chown root:root *
chmod 755 . *

Config files all live in the /usr/local/squid/etc/ directory. If a user can write to these files, they can almost certainly do malicious things. Because of this, you should not let normal users edit these files: only users which already have root access should be allowed to edit squid.conf. Earlier in the book, we created a squidadm group for these users.

The /usr/local/squid/etc/ directory should be owned by root, group squidadm, so that squid-administrators would be able to create and update config files.

Many of you will not have encountered chmod commands which use more than three numbers before. The following command sets the setgid bit on the directory. Let’s assume that my primary group-id is staff (not squidadm.) On some systems, any file that I create will be owned by group staff, even if the directory is owned by the squidadm group. On these systems this would be a security problem: if I create the squid.conf file, people in the staff group may be able to make changes to the file.

With the setgid bit set on the directory, any files I create will be owned by the squidadm group. As I have said: this isn’t necessary on some operating systems, but these permissions shouldn’t have any adverse effect.
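If you want to convince yourself of this behaviour before touching the real tree, a throwaway demonstration (scratch directory, with your own current group standing in for squidadm) looks like this:

```shell
# Demonstrate mode 2775 on a scratch directory: the leading "2" is the
# setgid bit, which shows up as an "s" in the group-execute position,
# and on most systems files created inside inherit the directory's group.
demo=$(mktemp -d)
chmod 2775 "$demo"
touch "$demo/squid.conf"
ls -ld "$demo"            # permissions read drwxrwsr-x
ls -l "$demo/squid.conf"
```

Once you are happy with the effect, remove the scratch directory and apply the real commands below to /usr/local/squid/etc.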

cd /usr/local/squid/etc
chmod 2775 .
chown root:squidadm . *

When you use RCS (introduced in Chapter 2), the revision history of a file is stored in an RCS logfile. These files will normally be created in the current directory (the ci command appends a comma to the filename to decide the name of the logfile, leading to filenames like squid.conf,v.) If you don’t want your directory cluttered with these files, you can create an RCS directory, and move RCS files into it. The Revision Control System only stores logfiles in the current directory if an RCS directory doesn’t exist; if one does, all new log files are created in it.

If someone can gain access to the log files, they essentially have write access to the original file, since when you check a file out (to make changes to it) the log file is considered to be the authoritative source. Don’t forget to change the permissions on the RCS log files. Squid doesn’t create an RCS directory automatically; we create it in the example below.


# first, make the RCS directory
cd /usr/local/squid/etc
mkdir RCS
# move any RCS logfiles into the RCS directory, so that they don’t
# clutter the config-file directory
mv *,v RCS
# make sure that the RCS directory is owned by the right people, and
# can be writeable by them
chown root:squidadm RCS
chmod 2770 RCS
# change the permissions of the files in the RCS directory to match
# newly created files
chown root:squidadm RCS/*
chmod 770 RCS/*

Cache log files should be confidential. You (and other Squid administrators) may have to look at them occasionally, but other users should have no access to the files. Squid runs as the squid user, though, and needs to create the logs, so any directory we make needs to be writeable by the squid user too.

chown squid:squidadm /usr/local/squid/logs
chmod 770 /usr/local/squid/logs

24.1.3 Object Store Directory Permissions

As you may recall from Chapter 3, downloaded objects are placed in a hierarchy of swap directories. Squid.conf contains a cache_dir line for each directory that files are to be stored in, and specifies the number of subdirectories that are to be created in each cache store (most people leave this at the default, 4096 directories per cache store.)

Squid’s -z command-line option will create the appropriate cache-swap directories (since creating them by hand would be painful!) If the top-level cache directory specified in squid.conf does not exist, Squid will attempt to create it too, as squid, group squid (or whatever cache_effective_user/cache_effective_group is set to in squid.conf.) Since we changed the permissions on /usr/local/squid above so that only root can write to this directory, Squid’s directory creation will fail. Instead, let’s create these directories and set their permissions manually.

If someone has read access to the cache logs, they can invade people’s privacy. It may seem harmless to let people access the cache store indiscriminately, but I contend that it isn’t.

Many web accesses reveal something about the person examining the page, be it their sexuality, their financial status, or their job satisfaction. This is why we stop people accessing log files and finding out who went to what pages, right? Well, it also means that we must stop people accessing the cache store directly. In many systems using cryptography, you can discover much about the nature of the contents of the traffic by traffic analysis, relating traffic flows to other events. If people only have access to the cache using a browser, it is difficult for them to associate any hits they see with a given person. If they can examine other information about the object (say, the time that the file was created) they may be able to discover information about the person that requested the object. A simple example: let’s say that someone connected to a job-search site in the middle of the night. You can immediately narrow down the list of possible requestors to night-staff. If you can find out who was on duty on that day, you narrow the number of possible requestors even more.


Let’s change the permissions on the cache store so that only squid-administrators can access files in it. Note that you are going to have to repeat this process for every cache_dir in the squid.conf file.

mkdir /usr/local/squid/cache/
chown squid:squidadm /usr/local/squid/cache/
chmod 770 /usr/local/squid/cache/

Once the permissions on the cache directories are set correctly, you can run squid -z.

Your output should look something like this:

cache1:~ # /usr/local/squid/bin/squid -z
1999/06/12 19:15:34| Creating Swap Directories
cache1:~ #

24.1.4 Problems Creating Swap Directories

If you get permission denied errors running squid -z, double-check that you are root, then check that squid.conf’s cache_effective_user and cache_effective_group tags are both set to squid.

What occasionally happens is that a directory further towards / (the directories /usr/local/ or /usr/local/squid/, for example) is not set as readable/executable by the squid user. Because the appropriate user cannot access these directories, it cannot create files in the cache store directory. If you are still having problems running squid -z, ensure that each of the directories /usr/local and /usr/local/squid have reasonable permissions.
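One way to check all of these parent directories in a single sweep (a sketch; the path below is the default install location from this chapter, so substitute your own cache_dir) is:

```shell
# Walk from the cache directory up to /, printing each directory's
# permissions; every one of them must be executable (searchable)
# by the squid user for squid -z to succeed.
dir=/usr/local/squid/cache
while [ "$dir" != "/" ]; do
  ls -ld "$dir" 2>/dev/null || echo "cannot read: $dir"
  dir=$(dirname "$dir")
done
```

Any line missing the o+rx (or an appropriate group) bit is a candidate for the chmod fix described below.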

Right:

cache1:~ # ls -ld /usr/local/
drwxr-xr-x  10 root  root  1024 May  1 10:31 /usr/local/

Wrong:

cache1:~ # ls -ld /usr/local/
drwxr-x---  10 root  root  1024 May  1 10:31 /usr/local/

Use the chmod o+rx /usr/local/ command to make the directory readable and executable by everyone.


25 Running Squid

Squid should now be configured, and the directories should have the correct permissions. We should now be able to start Squid, and you can try and access the cache with a web browser. Squid is normally run by starting the RunCache script. RunCache (as mentioned earlier) restarts Squid if it dies for some reason, but at this stage we are merely testing that it will run properly: we can add it to startup scripts at a later stage.

Programs which handle network requests (such as inetd and sendmail) normally run in the background. They are run at startup, and log any messages to a file (instead of printing it to a screen or terminal, as most user-level programs do.) These programs are often referred to as daemon programs. Squid is such a program: when you run the squid binary, you should be immediately returned to the command line. While it looks as if the program ran and did nothing, it’s actually sitting in the background waiting for incoming requests. We want to be able to see that Squid’s actually doing something useful, so we increase the debug level (using -d 1) and tell it not to disappear into the background (using -N.) If your machine is not connected to the Internet (you are doing a trial squid-install on your home machine, for example) you should use the -D flag too, since Squid tries to do DNS lookups for a few common domains, and dies with an error if it is not able to resolve them.

The following output is that printed by a default install of Squid:

cache1:~ # /usr/local/squid/bin/squid -N -d 1 -D

Squid reads the config file, and changes user-id’s here:

1999/06/12 19:16:20| Starting Squid Cache version 2.2.DEVEL3 for i586-pc-linux-gnu...
1999/06/12 19:16:20| Process ID 4121

Each concurrent incoming request uses at least one filedescriptor. 256 filedescriptors is only enough for a small, lightly loaded cache server; see Chapter 12 for more details. Most of the following is diagnostic:

1999/06/12 19:16:20| With 256 file descriptors available
1999/06/12 19:16:20| helperOpenServers: Starting 5 ’dnsserver’ processes
1999/06/12 19:16:20| Unlinkd pipe opened on FD 13
1999/06/12 19:16:20| Swap maxSize 10240 KB, estimated 787 objects
1999/06/12 19:16:20| Target number of buckets: 15
1999/06/12 19:16:20| Using 8192 Store buckets, replacement runs every 10 seconds
1999/06/12 19:16:20| Max Mem size: 8192 KB
1999/06/12 19:16:20| Max Swap size: 10240 KB
1999/06/12 19:16:20| Rebuilding storage in Cache Dir #0 (DIRTY)

When you connect to an ftp server without a cache, your browser chooses icons to match the files based on their filenames. When you connect through a cache server, it assumes that the page returned will be in html form, and will include tags to load any images so that the directory listing looks normal. Squid adds these tags, and has a collection of icons that it refers clients to. These icons are stored in /usr/local/squid/etc/icons/. If Squid has permission problems here, you need to make sure that these files are owned by the appropriate users (in the previous section we set permissions on the files in this directory.)

1999/06/12 19:16:20| Loaded Icons.

The next few lines are the most important. Once you see the Ready to serve requests line, you should be able to start using the cache server. The HTTP port is where Squid is waiting for browser connections, and should be the same as whatever we set it to in the previous chapter. The ICP port should be 3130, the default, and if you have included other protocols (such as HTCP) you should see them here. If you see permission denied errors here, it’s possible that you are trying to bind to a low-numbered port (like 80) as a normal user. Try running the startup command as root, or (if you don’t have root access on the machine) choose a high-numbered port. Another common error message at this stage is Address already in use. This occurs when another process is already listening to the given port. This could be because Squid is already started (perhaps you are upgrading from an older version which is being restarted by the RunCache script) or you have some other process listening on the same port (such as a web server.)

1999/06/12 19:16:20| Accepting HTTP connections on port 3128, FD 35.
1999/06/12 19:16:20| Accepting ICP messages on port 3130, FD 36.
1999/06/12 19:16:20| Accepting HTCP messages on port 4827, FD 37.
1999/06/12 19:16:20| Ready to serve requests.

Once Squid is up-and-running, it reads the cache-store. Since we are starting Squid for the first time, you should see only zeros for all the numbers below:

1999/06/12 19:16:20| storeRebuildFromDirectory: DIR #0 done!
1999/06/12 19:16:25| Finished rebuilding storage disk.
1999/06/12 19:16:25| 0 Entries read from previous logfile.
1999/06/12 19:16:25| 0 Entries scanned from swap files.
1999/06/12 19:16:25| 0 Invalid entries.
1999/06/12 19:16:25| 0 With invalid flags.
1999/06/12 19:16:25| 0 Objects loaded.
1999/06/12 19:16:25| 0 Objects expired.
1999/06/12 19:16:25| 0 Objects cancelled.
1999/06/12 19:16:25| 0 Duplicate URLs purged.
1999/06/12 19:16:25| 0 Swapfile clashes avoided.
1999/06/12 19:16:25| Took 5 seconds ( 0.0 objects/sec).
1999/06/12 19:16:25| Beginning Validation Procedure
1999/06/12 19:16:26| storeLateRelease: released 0 objects
1999/06/12 19:16:27| Completed Validation Procedure
1999/06/12 19:16:27| Validated 0 Entries
1999/06/12 19:16:27| store_swap_size = 21k


26 Testing Squid

If all has gone well, we can begin to test the cache. True browser access is only covered in the next chapter, and there is a whole chapter devoted to configuring your browser. Until then, testing is done with the client program, which is included with the Squid source, and is in the /usr/local/squid/bin directory.

The client program connects to a cache, requests a page, and prints out useful timing information. Since client is available on all systems that Squid runs on, and has the same interface on all of them, we use it for the initial testing.

At this stage Squid should be in the foreground, logging everything to your terminal. Since client is a unix program, you need access to a command prompt to run it. At this stage it’s probably easiest to simply start another session (this way you can see if errors are printed in the main window).

The client program is compiled to connect to localhost on port 3128 (you can override these defaults from the command line; see the output of client -h for more details.)

If you are running client on the cache server, and are using port 3128 for incoming requests, you should be able to type a command like this, and the client program will retrieve the page through the cache server:

client http://squid.nlanr.net/

If your cache is running on a different machine you will have to use the -h and -p options. The following command will connect to the machine cache.qualica.com on port 8080 and retrieve the above web page.

Example 5-1. Using the -h and -p client Options

cache1:~ $ /usr/local/squid/bin/client -h cache.qualica.com -p 8080 http://www.ora.com/

The client program can also be used to access web sites directly. As you may remember from reading Chapter 2, the protocol that clients use to access pages through a cache is part of the HTTP specification. The client program can be used to send both "normal" and "cache" HTTP requests. To check that your cache machine can actually connect to the outside world, it’s a good idea to test access to an outside web server.

The next example will retrieve the page at http://www.qualica.com/, and send the html contents of the page to your terminal.

If you have a firewall between you and the internet, the request may not work, since the firewall may require authentication (or, if it’s a proxy-level firewall and is not doing transparent proxying of the data, you may explicitly have to tell client to connect to the machine.) To test requests through the firewall, look at the next section.


A note about the syntax of the next request: you are telling client to connect directly to the remote site, and request the page /. With a request through a cache server, you connect to the cache (as you would expect) and request a whole url instead of just the path to a file. In essence, both normal-HTTP and cache-HTTP requests are identical; one just happens to refer to a whole URL, the other to a file.

Example 5-2. Retrieving Pages directly from a remote site with client

cache1:~ $ /usr/local/squid/bin/client -h www.ora.com -p 80 /

Client can also print out timing information for the download of a page. In this mode, the contents of the page aren’t printed: only the timing information is. The zero in the below example indicates that client is to retrieve the page until interrupted (with Control-C or Break.) If you want to retrieve the page a limited number of times, simply replace the zero with a number.

Example 5-3. Printing timing information for a page download

cache1:~ $ /usr/local/squid/bin/client -g 0 -h www.ora.com -p 80 /

26.1 Testing a Cache or Proxy Server with Client

Now that you have client working, you can use it to fetch a page through the cache.

Example 5-4. Accessing a site through the cache

cache1:~ $ /usr/local/squid/bin/client -h cache1.domain.example -p 3128 http://www.ora.com/

If the request through the cache returned the same page as you retrieved with direct access (you didn’t receive an error message from Squid), Squid should be up and running. Congratulations! If things aren’t going so well for you, you will have received an error message here. Normally, this is because of the acls described in the previous chapter. First, you should have a look at the terminal where you are running Squid (or, if you are skipping ahead and have put Squid in the background, in the /usr/local/squid/logs/cache.log file.) If Squid encountered some sort of problem, there should be an error or warning in this file. If there are no messages here, you should look at the /usr/local/squid/logs/access.log file next. We haven’t covered the details of this file yet, but they are covered in the next section of this chapter. First, though, let’s see if your cache can process requests to internal servers. There are many cases where a request will work to internal servers but not to external machines.

26.1.1 Testing Intranet Access

If you have a proxy-based firewall, Squid should be configured to pass outgoing requests to the proxy running on the firewall. This quite often presents a problem when an internal client is attempting to connect to an internal (Intranet) server, as discussed in section 2.2.5.2. To ensure that the acl-operator lists created in section 2.2.5.2 are working, you should use client to attempt to connect to a machine on the local network through the cache.

cache1:~ $ client -h cache1.domain.example -p 3128 http://www.localdomain.example

If you didn’t get an error message from a command like the above, access to local servers should be working. It is possible, however, that the connection could be being passed from the local cache to the parent (across a serial line), and the parent could be connecting back into the local network, slowing the connection enormously. The only way to ensure that the connection is not passing through your parent is to check the access logs, and see which server the connection is being passed to.

Access.log basics

The access.log file logs all incoming requests. Chapter 11 covers the fields in the access.log in detail. The most important fields are the URL (field 7), and hierarchy access type (field 9) fields. Note that a "-" indicates that there is no data for that field. The following example access.log entries indicate the changes in log output when connecting to another server without a cache, with a single parent, and with multiple parents. Though fields are separated by spaces, fields can contain sub-fields, where a "/" indicates the split.

When connecting directly to a destination server, field 9 contains two subfields: the keyword "DIRECT", followed by the name of the server that it is connecting to. Access to local servers (on your network) should always be DIRECT, even if you have a firewall, as discussed in section 3.1.2. The acl operator always_direct controls this behaviour.

905144366.259   1010 127.0.0.1 TCP_MISS/200 20868 GET http://www.ora.com/ - DIRECT/www.ora.com text/html

When you have configured only one parent cache, the hierarchy access type indicates this, and includes the name of that cache.

905144426.435    289 127.0.0.1 TCP_MISS/200 20868 GET http://www.ora.com/ - SINGLE_PARENT/cache1.ora.com text/html

There are many more types that can appear in the hierarchy access information field, but these are covered in chapter 11. Another useful field is the ’Log Tag’ field, field four. In the following example this is the field "TCP_MISS/200".

905225025.225    609 127.0.0.1 TCP_MISS/200 10089 GET http://www.is.co.za/ - DIRECT/www.is.co.za text/html

A MISS indicates that the requested object was not already stored in the cache (or that the page contained headers indicating that the page was not to be cached). A HIT would indicate that the page was already stored in the cache. In the latter case the request time for a remote page should be substantially less than the first occurrence in the logs. The time that Squid took to service the request is the second field. This value is in milliseconds. This value should approach that returned by examining a client request, but given operating system buffering there is likely to be a discrepancy. The fifth field is the size of the page returned to the client. Note that an aborted request can end up downloading more than this from the origin server if the quick_abort feature set is turned on in the Squid config file.
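Given the field positions described above (log-tag is field 4, URL field 7, hierarchy field 9), a quick awk one-liner can pull just those columns out of an access.log; here it is run against one of the sample entries from this section:

```shell
# Extract the log-tag (field 4), URL (field 7) and hierarchy (field 9)
# columns from an access.log entry; the entry is piped in via echo here,
# but against a real log you would give awk the filename instead.
echo '905144366.259 1010 127.0.0.1 TCP_MISS/200 20868 GET http://www.ora.com/ - DIRECT/www.ora.com text/html' |
  awk '{print $4, $7, $9}'
```

On a live cache this becomes awk '{print $4, $7, $9}' /usr/local/squid/logs/access.log, which makes it easy to spot whether requests are going DIRECT or through a parent.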
Here is an example request direct from the origin server:

905230201.136   6642 127.0.0.1 TCP_MISS/200 20847 GET http://www.ora.com/ - DIRECT/www.ora.com text/html

If we use client to fetch the page a short time later, a HIT is returned, and the time is reduced hugely.

905230209.899    151 127.0.0.1 TCP_HIT/200 20869 GET http://www.ora.com/ - NONE/- text/html

Some of you will have noticed that the size of the hit has increased slightly. If you have checked the size of a request from the origin server and compared it to that of the same page through the cache, you will also note that the size of the returned data has increased very slightly. Extra headers are added to pages passing through the cache, indicating which peer the page was returned from (if applicable), age information and other information. Clients never see this information, but it can be useful for debugging. Since Squid 1.2 has support for HTTP/1.1, extra features can be used by clients accessing a copy of a page that Squid already has. Certain extra headers are included in the HTTP headers returned in HITs, indicating support for features which are not available to clients when returning MISSes. In the above example Squid has included a header in the page indicating that range-requests are supported.

If Squid is performing correctly, you should shut Squid down and add it to your startup files. Since Squid maintains an in-memory index of all objects in the cache, a kill -9 could cause corruption, and should never be used. The correct way to shut down Squid is to use the command:

cache1:~ # ~squid/bin/squid -k shutdown

Squid command-line options are covered in chapter 10.

Addition to Startup Files

The location of startup files varies from system to system. The location and naming scheme of these files is beyond the scope of this book. If you already have a local startup file, it’s a pretty good idea to simply add the RunCache program to that file.
Note that you should place RunCache in the background on startup, which is normally done by placing an ’&’ after the command:

/usr/local/bin/RunCache &

The RunCache program attempts to restart Squid if it dies for some reason, and logs basic Squid debug output both to the file "/usr/local/squid/squid.out" and to syslog.


27 Chapter 6. Browser Configuration

Table of Contents
Browsers
Browser-cache Interaction
Testing the Cache
Cache Auto-config
cgi generated autoconfig files
Future directions
Ready to Go

28 Browsers

Squid is the server half of a client-server relationship. Though you have configured Squid, your client (the browser) is still configured to talk to the menagerie of servers that make up the Internet.

You have already used the client program included with Squid to test that the cache is working. Browsers are more complicated to configure than client, especially since there are so many different types of browser.

This chapter covers the three most common browsers. It also includes information on the proxy configuration of Unix tools, since you may wish to use these for automatic download of pages. Once your browser is configured, some of the proxy-oriented features of browsers are covered. Many browsers allow you to force your cache server to reload the page, and have other proxy-specific features.

So that you can skip sections in this chapter that you don’t need to read, browsers are configured in the following order: Netscape Communicator, Microsoft Internet Explorer, Opera and finally Unix Clients.

You can configure most browsers in more than one way. The first method is the simplest for a sysadmin, the second is simplest for the user. Since this book is written for system administrators, we term the first basic configuration, the second advanced configuration.

28.1 Basic Configuration

In this mode, each browser is configured independently of the others. If you need to change something about the server (the port that it accepts requests on, for example), each browser will have to be reconfigured manually: you will have to physically walk to it and change the setup. To avoid caching intranet sites, you will have to add exclusions for each intranet site.


28.2 Advanced Configuration

In this mode, you will configure a so-called rule server. Clients connect to this server on startup, and download information on which proxy server to talk to, and which URLs to retrieve from which proxy server. Exclusion of intranet sites is handled centrally, so one change will update all clients. If your organization is large, or is growing, you should use the auto-config option.

Though this method is called auto-config, it’s not completely automatic, since the user still has to enter a URL indicating the location of the list of rules. Advanced configuration has some advantages:

Changes to the proxy server are easy, since you only change the rule server.

A proxy server can be chosen based on destination machine name, destination port and more. Since this list is maintained centrally, changes also only have to be made once.

Browser configuration is easy: instead of adding complicated lists of IPs, a user simply has to type in a URL.

Since it’s easy to configure, users are more likely to use the cache.

When you write your list of rules (also called a proxy auto-config script), you will still need to supply the client with the same information as with the basic configuration; it’s just that the list of this information is maintained centrally. Even if you decide to use only autoconfig on your network, you should probably work through the basic configuration first.

28.3 Basic Configuration

To configure any browser, you need at least two pieces of information:

The proxy server’s host name

The port that the proxy server is accepting requests on

28.4 Hostname

It’s very important to use a proxy-specific host name. If you decide to move the cache to another machine at a later stage you will find that it’s much easier to change DNS settings than to change the configuration of every browser on your network.

If your operating system supports IP aliases you should organize a dedicated IP address for the cache server, and use the tcp_incoming_address and tcp_outgoing_address squid.conf options to make Squid only accept incoming HTTP requests on that IP address.

There isn’t really a naming convention for caches, but people generally use one of the following: cache, proxy, www-proxy, www-cache, or even the name of the product they are using: squid, netapp, netscape. Some people also include the location of the cache, and configure people in a region to talk to their local cache. More and more people are simply using cache, and it’s the suggested name. If you wish to use regional names, you can use something along the lines of region.cache.domain.example.

Your choice of port has already been discussed. Have a look at http_port in the index for more information.


28.4.1 Netscape Communicator 4.5

(? Screenshots here ?)

Select the Edit menu
Select Preferences
Maximize Advanced
Select Proxies
Choose Manual proxy configuration
Click the View... button

For each of FTP Proxy, Gopher Proxy, HTTP Proxy, Security Proxy, enter the hostname of your cache on the left, and the chosen http_port on the right. Squid can function as a WAIS proxy when it has a WAIS relay (see the tags wais_relay_host and wais_relay_port in chapter 10 for more information).

If you have an intranet server, you can enter the host name in the box titled "No Proxy for". If you wish to add more than one server, simply use a comma to separate the entries.

Since you are going to be accessing a large cache server, the disk space allocated for the browser’s cache is disk space that could be used for something else. It’s worth having some disk space allocated to the browser’s cache, especially if the cache is across a serial line. Modem users, for example, should keep their cache settings as is.

Select the Edit menu
Select Preferences
Maximize Advanced
Select Cache
Change the text in the Disk Cache box to 1000

28.4.2 Internet Explorer 4.0

Select the View menu option
Select Internet Options
Click on the Connection tab
Select Access the Internet using a proxy server
Type in your hostname in the Address: field, and your chosen port in the Port: field.

Internet Explorer can attempt to connect directly to the destination server if the URL you are going to is in the local domain (? I presume ?). You should turn this option on, so that local accesses are not cached, and do not pass through the cache server. If you have more than one domain, you will have to specifically change options so that all your domains are ignored, using the Advanced button.

In the advanced menu, you can configure per-protocol cache server/port pairs, or you can type in onlythe first proxy/port pair, and select Use the same proxy for all protocols. Although Squid doesn’tnormally work with SOCKS, it’s rarely used, so you can probably use the same proxy for all proto-cols.

The main advantage of using the Advanced menu is the ability to specify which domains are to beconnected to directly, rather than through the proxy server. If all your local sites’ hostnames beginwith intranet, you can simply put that into the box titled Do not use proxy for addresses beginning with. You can add more than one exception by using a semicolon (;) between entries.

- 69 -

28.4.1 Netscape Communicator 4.5Browser Configuration

Page 72: Squid Guide

You will probably wish to exclude all local sites too. Since the exception list allows you to use a * character for what is known as a wildcard match, you can add *.localdomain.example, and all hosts in your domain will be accessed directly. Many people access local sites by IP address, rather than by name. Since the exception list matches against the URL (??), these will still pass through the cache, and you will need to add an IP address range to the list of hosts to exclude: 192.168.0.* should do nicely.

To reduce the local browser cache space (as discussed in the Netscape section above):

Select the View menu
Select Internet Options
Select the General tab
In the Temporary Internet files section, click the Settings button
Move the slider all the way to the left

Since Squid-2.0 and above handle HTTP/1.1 correctly, you should also configure Internet Explorer to use HTTP/1.1 when communicating with the proxy server:

Select the View menu
Select Internet Options
Select the Advanced tab
Scroll down until you see HTTP 1.1 Settings
Tick Use HTTP 1.1 through proxy server

(? I believe that opera is the third most common browser ?) (? I don't have a machine with it on... since I run Linux ?)

28.4.3 Unix clients

Most Unix client programs use a single environment variable to decide how they are to access the Internet. If you run lynx (a terminal-based browser) on any of your machines, or use the recursive web-spider wget, you can simply set a shell variable and these programs will use the correct proxy server.

Each protocol has a different environment variable, so that you can configure your client to use a different proxy for each protocol. Each variable is simply the protocol name with the text _proxy tagged onto the end, so some of the most common protocols end up as follows:

http_proxy

ftp_proxy

gopher_proxy

Since many people prefer a shell other than bash, we make an exception here to our rule that "all examples are based on sh".

sh: The Bourne Shell (or Bash, the freeware alternative)

http_proxy=http://cache.domain.example:3128/
export http_proxy

or

ftp_proxy=http://cache.domain.example:3128/


export ftp_proxy

tcsh: The C Shell

setenv http_proxy http://cache.domain.example:3128/

or

setenv ftp_proxy http://cache.domain.example:3128/

(? ksh, others ?)



Squid: A User’s Guide, Chapter 6: Browser Configuration

29 Browser-cache Interaction

The Internet is a transient place, and at some stage a server that does not correctly handle caching will be found. You can easily add the server to the appropriate do not cache lists, but most browsers give users a way of forcing a web page reload.

Netscape Communicator. Pressing the Reload button forces the cache server to reload the HTML page that is being viewed. Holding down the Shift key and pressing Reload forces a reload of all the objects on the current page. Right-clicking on a page or graphic brings up a menu where you can select reload, which forces a re-get of the page. If you right-click in a frame, you can reload only the frame.

Microsoft Internet Explorer 4. With Internet Explorer there is no difference between a reload and a forced reload. A reload also does a different type of request, which essentially checks if the cache considers the page to be fresh. If the refresh rules on the cache are set to refresh in a long time, the page will come from the cache, and will not be re-fetched from the origin server.

Lynx. Pressing Control-R will force a reload of the page.


30 Testing the Cache

As you can see, pressing reload in Netscape (and some other browsers) doesn’t simply re-fetch the page: it forces the cache not to serve the cached page. Many people doing tests of how the cache increases performance simply press reload, and believe that there has been no change in speed. The cache is, in fact, re-downloading the page from the origin server, so a speed increase is impossible.

To test the cache properly you need two machines set up to access the cache, and a page that does not contain do not cache me headers. Pages that use ASP often include headers that force Squid not to cache the page, even if the authors are not aware of its implications.

So, to test the cache, choose a site that is off your local network (for a marked change, choose one in a different country) and access it from the first machine. Once it has downloaded, change to the second machine and re-download the page. Once the page has downloaded there, check that the page is marked as a ’HIT’ (in the file called access.log, the basics of which are covered earlier in this book). If the second access was marked as a miss, it is probably because the origin server is asking Squid not to cache the page. Try a different page and see the difference the cache makes to browsing speed.

Many people are looking for an increase in performance on problem pages, since this is when people believe that they are getting the short end of the stick. If you choose a site that is too close, you may only be able to see a difference in the speed in the transaction-time field of the access.log.

Since you have a completely unloaded cache, you should access a local, unloaded web server a few times, and see what kind of latency you experience with the cache. If you have time, print out some of the access log entries. If, some time in the future, you are unsure as to the cache load, you can compare the latency added now to the latency added by the same cache later on; if there is a difference you know it’s time to upgrade the cache.


31 Cache Auto-config

Client browsers can have all options configured manually, or they can be configured to download an autoconfig file (every time they start up), which provides all of the information about your cache setup. Each URL referenced (be it the URL that you typed, or the URL for a graphic on the page yet to be retrieved) is checked against the list of rules. You should keep the list of rules as short as possible, otherwise you could end up slowing down page loads: not at the cache level, but at the browser.

31.1 Web server config changes for autoconfig files

The original Netscape documentation for the proxy autoconfig file (available at http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html) suggested the filename proxy.pac for Proxy AutoConfig files. Since it’s possible to have a file ending in .pac that is not used for autoconfiguration, browsers require a server returning an autoconfig file to indicate so in the mime type. Most web servers do not automatically recognize the .pac extension as a proxy-autoconfig file, and have to be reconfigured to return the correct mime type (application/x-ns-proxy-autoconfig).

31.1.1 Apache

On some systems Apache already defines the autoconfig mime type. The Apache config file mime.types is used to associate filename extensions with mime types. This file is normally stored in the Apache conf directory. This directory also contains the access.conf and httpd.conf files, which you may be more familiar with editing. As you can probably see, the mime.types file consists of two fields: a mime type on the left, the associated filename extension on the right. Since this file is only read at startup or reconfigure, you will need to send a HUP signal to the parent Apache process for your changes to take effect. The following line should be added to the file, assuming that it is not already included:

application/x-ns-proxy-autoconfig pac

Example 6-1. Restarting Apache

cd /usr/local/lib/httpd/logs
kill -HUP `cat httpd.pid`

31.1.2 Internet Information Server

(? nothing here yet ?)

31.1.3 Netscape

(? or here ?)


31.2 Autoconfig Script Coding

The autoconfig file is actually a JavaScript function, put in a file and served by your standard web server program. Don’t panic if you don’t know JavaScript, since this section acts as a cookbook. Besides: the basic structure of the JavaScript language is quite easy to get the hang of, especially if you have previous programming experience, whether it be in C, Pascal or Perl.

31.2.1 The Hello World! of auto-configuration scripts

If you have learned a programming language, you probably remember one of the most basic programs simply printing the phrase Hello World!. We don’t want to print anything when someone tries to go to a page, but the following example is similar to the original Hello World program in that it’s the shortest piece of code that does something useful.

The following simply connects direct to the origin server for every URL, just as it would if you had no proxy-cache configured at all.

Example 6-2. A very basic autoconfig file

function FindProxyForURL(url, host) {
    return "DIRECT";
}

The next example gets the browser to connect to the cache server named cache.domain.example on port 3128. If the machine is down for some reason, an error message will be returned to the user.

Example 6-3. Connecting to a cache server

function FindProxyForURL(url, host) {
    return "PROXY cache.domain.example:3128";
}

Example 6-4. Connecting to a cache server, with failover

function FindProxyForURL(url, host) {
    return "PROXY cache.domain.example:3128; DIRECT";
}

As you may be able to guess from the above, a semicolon (;) in the returned text splits the answer into sub-strings. If the first cache server is unavailable, the second will be tried. This provides you with a failover mechanism: you can attempt a local proxy server first and, if it is down, try another proxy. If all are down, a direct attempt will be made. After a short period of time, the proxy will be retried.
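The retry order can be pictured in code. The sketch below is not browser source, just a toy model of the behaviour described above; tryConnect is a hypothetical callback standing in for a real connection attempt.

```javascript
// Toy model of how a browser walks a FindProxyForURL result:
// split on ";" and try each entry in order until one works.
function pickProxy(result, tryConnect) {
  var entries = result.split(";");
  for (var i = 0; i < entries.length; i++) {
    var entry = entries[i].replace(/^\s+|\s+$/g, "");  // trim whitespace
    if (entry === "DIRECT") return "DIRECT";   // direct needs no proxy
    if (tryConnect(entry)) return entry;       // e.g. "PROXY host:port"
  }
  return "DIRECT";  // nothing reachable: fall back to a direct attempt
}
```

For example, if cache1 is unreachable but cache2 answers, pickProxy skips the first entry and settles on "PROXY cache2:3128".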

A third return type is included, for SOCKS proxies, and it is in the same format as the PROXY type:

return "SOCKS socks.domain.example:3128";

If you have no intranet, and require no exclusions, you should use the above autoconfig file. Configuring machines with the above autoconfig file allows you to add future required exclusions very easily.


31.2.2 Auto-config functions

Web browsers include various built-in functions to make your autoconfig coding as simple as possible. You don’t have to write the code that does a string match of the hostname, since you can use a standard function call to do a match. Not all functions are covered here, since some of them are very rarely used. You can find a complete list of autoconfig functions (with examples) at http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html.

31.2.2.1 dnsDomainIs

Checks if a host is in a domain: returns true if the first argument (normally specified as the variable host, which is defined in the autoconfig function by default) is in the domain specified in the second argument.

Example 6-5. dnsDomainIs

if (dnsDomainIs(host, ".mydomain.example")) {
    return "DIRECT";
}

You can check more than one domain by using the || JavaScript operator. Since this is a JavaScript operator you can use the layout described in this example in any combination.

Example 6-6. Using multiple dnsDomainIs calls

if (dnsDomainIs(host, ".mydomain.example")
        || dnsDomainIs(host, ".anotherdomain.example")) {
    return "DIRECT";
}

31.2.2.2 isInNet

Sometimes you will wish to check if a host is in your local IP address range. To do this, the browser resolves the name to find the IP address. Do not use more than one isInNet call if you can help it: each call causes the browser to resolve the hostname all over again, which takes time. A string of these calls can reduce browser performance noticeably.

The isInNet function takes three arguments: the hostname, and a subnet/netmask pair.

Example 6-7. Using the isInNet call

if (isInNet(host, "192.168.0.0", "255.255.0.0")) {
    return "DIRECT";
}
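One way to avoid the repeated lookups is to resolve the hostname once with dnsResolve() (a standard autoconfig call) and test the resulting address against several networks. The sketch below stubs out dnsResolve and isInNet with stand-ins for the browser’s built-ins so it can be read (and run) outside a browser; a real .pac file would omit the stubs, and the addresses shown are made-up examples.

```javascript
// Stub DNS answer, standing in for the browser's dnsResolve() built-in.
function dnsResolve(host) {
  return host === "intranet.mydomain.example" ? "10.0.1.5" : "196.4.160.9";
}
// Stub of the isInNet() built-in: mask the address and compare.
function isInNet(addr, net, mask) {
  var a = addr.split("."), n = net.split("."), m = mask.split(".");
  for (var i = 0; i < 4; i++)
    if ((a[i] & m[i]) !== (n[i] & m[i])) return false;
  return true;
}

function FindProxyForURL(url, host) {
  var addr = dnsResolve(host);          // one lookup, reused below
  if (isInNet(addr, "10.0.0.0", "255.0.0.0")
      || isInNet(addr, "192.168.0.0", "255.255.0.0"))
    return "DIRECT";
  return "PROXY cache.mydomain.example:3128; DIRECT";
}
```

Each extra network check here costs only an in-memory comparison, rather than another round of name resolution.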

31.2.2.3 isPlainHostName

Simply checks that there is no full-stop in the hostname (the only argument for this call). Many people refer to local machines simply by hostname, since the resolver library will automatically attempt to look up host.domain.example if you simply attempt to connect to host. For example: typing www in your browser should bring up your web site.

Many people connect to internal web servers (such as one sitting on their co-worker’s desk) by typing in the hostname of the machine. These connections should not pass through the cache server, so many people use a function like the following:

Example 6-8. Using isPlainHostName to decide if the connection should be direct


if (isPlainHostName(host)) {
    return "DIRECT";
} else {
    return "PROXY cache.mydomain.example:3128";
}

31.2.2.4 myIpAddress

Returns the IP address of the machine that the browser is running on; it requires no arguments.

On a network with more than one cache, your script can use this information to decide which cache to communicate with. In the next subsection we look at different ways of communicating with a local proxy (with minimal manual user intervention), so the example here is comparatively basic. The below example assumes that you have two or more networks: one with a private address range (10.0.0.*), the others with real IP addresses.

If the client machine is in the private address range, it cannot connect directly to the destination server, so if the cache is down for some reason they cannot access the Internet. A machine with a real IP address, on the other hand, should attempt to connect directly to the origin server if the cache is down. (? need to check it will work too! ?)

Since myIpAddress requires no arguments, we can simply call it where we would have used the host variable in the isInNet function call.

Example 6-9. myIpAddress

if (isInNet(myIpAddress(), "10.0.0.0", "255.255.255.0")) {
    return "PROXY cache.mydomain.example:3128";
} else {
    return "DIRECT";
}

31.2.2.5 shExpMatch

The shExpMatch function accepts two arguments: a string and a shell expression. Shell expressions are similar to regular expressions, though they are more limited. This function is often used to check if the url or host variables have a specific word in them.

If you are configuring an ISP-wide script, this function can be quite useful. Since you do not know if a customer will call their machine "intranet" or "intra" or "admin", you can chain many shExpMatch checks together. Note that the below example uses a single "intra*" shell expression to match both "intranet" and "intra.mydomain.example".

Example 6-10. shExpMatch

if (shExpMatch(host, "intra*")
        || shExpMatch(host, "admin*")) {
    return "DIRECT";
} else {
    return "PROXY cache.mydomain.example:3128";
}

31.2.2.6 url.substring

This function doesn’t take the same form as those described above. Since Squid does not support all possible protocols, you need a way of comparing the first few characters of the destination URL with the list of possible protocols. The function has two arguments. The first is a starting position, the second the position just past the last character you want; since the starting position here is 0, the second argument is simply the number of characters to retrieve. Note that (like C) strings start at position 0, rather than at 1.

All of this is best demonstrated with an example. The following attempts to connect to the cache for the most common URL types (http, ftp and gopher), but attempts to go directly for protocols that Squid doesn’t recognize.

Example 6-11. url.substring

if (url.substring(0, 5) == "http:"
        || url.substring(0, 4) == "ftp:"
        || url.substring(0, 7) == "gopher:")
    return "PROXY cache.is.co.za:8080; DIRECT";
else
    return "DIRECT";

31.2.3 Example autoconfig files

The main reason that autoconfig files were invented was the sheer number of possible cache setups. It’s difficult (or even impossible) to represent all of the possible combinations that an autoconfig file can provide you with.

There is no config file that will work for everyone, so a couple of config files are included here, one of which should suit your setup.

31.2.3.1 A Small Organization

A small organization is the easiest to create an autoconfig file for. Since you will have a moderately small number of IP addresses you can use the isInNet function to discover if the destination host is local or not (a large organization, such as an ISP, would need a very long autoconfig file simply because they have many IP address ranges).

Example 6-12. A small organization’s proxy config file

function FindProxyForURL(url, host) {
    // We only have one network range, and one DNS request doesn't
    // mean a large slowdown
    if (isInNet(host, "196.4.160.0", "255.255.255.0"))
        return "DIRECT";
    // If it's not local, use the cache server, with automatic
    // connection to the outside in case of problems
    return "PROXY cache.domain.example:3128; DIRECT";
}

31.2.3.2 A Dialup ISP

Since dialup customers don’t have intranet systems, a dialup ISP would have a very straightforward config file. If you wish your customers to connect directly to your web server (why waste the disk space of a cache when you have the origin server rack-mounted above it?), you should use the dnsDomainIs function:

Example 6-13. Dialup ISP autoconfig file

function FindProxyForURL(url, host) {
    // For servers in the local domain, go direct
    if (dnsDomainIs(host, "mydomain.example"))
        return "DIRECT";
    // Otherwise go through the cache server, with fail-over
    return "PROXY cache.mydomain.example:3128; DIRECT";
}


31.2.3.3 Leased Line ISP

When you are providing a public service, you have no control over what your customers call their machines. You have to handle the generic names (like intranet) and hope that people name their machines according to the de-facto standards.

Example 6-14.

function FindProxyForURL(url, host) {
    // so that people can type just "intranet" or just "mypc"
    if (isPlainHostName(host))
        return "DIRECT";
    // For servers in our domain, go direct: for announcements etc
    if (dnsDomainIs(host, "mydomain.example"))
        return "DIRECT";
    // since there are many domains, we cannot do them all. We assume
    // that people are going to type "intranet" instead of
    // "intranet.customerdomain.example"
    return "PROXY cache.mydomain.example:3128; DIRECT";
}

(? I need some info on ieak - waiting for people here?)

31.3 Cache Array Routing Protocol

Many large ISPs will have more than one cache server. To avoid duplicating objects, these cache servers have to communicate with one another. Consider the following:

cache1 gets a request for an object. It caches the page, and stores it on disk. An hour or so later, cache2 gets a request for the same page. To find a local copy of the object, cache2 has to query the other caches. Add more and more caches, and your number of queries goes up.

If an incoming request for a specific URL only ever went to one cache, your caches would not need to communicate with one another. A client requesting the page http://www.oreilly.com/ would always connect to cache1.

Let’s assume that you have 5 caches. Splitting the Internet into five pieces would split the load across the caches almost evenly. How do you split, though? By destination IP address? No, since IPs like 19?.*.*.* are much more common than "5.*.*.*". By domain? No again, since one domain like microsoft.com would mean that you were distributing load incorrectly.

Some of you will know what a hash function is. If not, don’t panic: you can still use CARP without knowing the theoretical basis of the algorithms involved.

CARP allows you to split up the Internet by URL (the combination of hostname, path and filename). If you have 5 cache servers, you split up the domain of possible answers into 5 parts. (A hash function returns a number, so we are using the appropriate terms: a domain is not an Internet domain in this context.) With a good hashing function, the numbers returned are going to be spread across the 5 parts evenly, which spreads your load perfectly.

If you have a cache which is twice as powerful as the others, you can allocate it more of the domain, and put more load on it.
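The idea of splitting a hash domain across weighted caches can be sketched as follows. This is NOT the real CARP algorithm, just a toy illustration of the principle; the hash function and cache names are made up for the example.

```javascript
// Simple string hash: turn a URL into a number in a fixed range.
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++)
    h = (h * 31 + url.charCodeAt(i)) % 1000003;
  return h;
}

// caches: array of { name, weight }. A cache with weight 2 owns
// twice the share of the hash domain, so it receives more load.
function pickCache(url, caches) {
  var total = 0;
  for (var i = 0; i < caches.length; i++) total += caches[i].weight;
  var slot = hashUrl(url) % total;   // a point in the domain of answers
  for (var i = 0; i < caches.length; i++) {
    slot -= caches[i].weight;        // heavier caches own a larger share
    if (slot < 0) return caches[i].name;
  }
}
```

Because the same URL always hashes to the same slot, every request for http://www.oreilly.com/ lands on the same cache, and no inter-cache queries are needed. (One caveat of this modulo toy: adding or removing a cache reshuffles most URLs, which the real CARP hashing scheme is designed to avoid.)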


CARP is used by some cache servers (most notably Microsoft Proxy and Squid) to decide which parent cache to send a request to. Browsers can also use CARP to decide which cache to talk to, using a JavaScript auto-config script. For more information (and an example script), you should look at the web page http://naragw.sharp.co.jp/sps/


32 cgi generated autoconfig files

It is possible to associate the .pac extension with a cgi program on most web servers. A program could then generate an autoconfig script depending on the source address of the request. Since the autoconfig file is only loaded on startup (or when the autoconfig refresh button is pressed), the slight delay due to a cgi program would not be noticeable to the user. Most large ISPs allocate subnets to regions, so a browser could be configured to access the nearest cache by looking at the source address of the request to the cgi program.
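The core of such a cgi program can be sketched as a function that builds a different PAC body depending on the client’s source address. The subnets and cache hostnames below are made-up examples; a real cgi program would also print a "Content-Type: application/x-ns-proxy-autoconfig" header before the body.

```javascript
// Sketch of cgi-style autoconfig generation: pick the nearest cache
// from the client's source address, then emit a PAC file naming it.
// The region subnets and hostnames are hypothetical.
function pacForClient(clientIp) {
  var cache = "cache.mydomain.example";                        // default
  if (clientIp.indexOf("10.1.") === 0)
    cache = "cache-east.mydomain.example";
  else if (clientIp.indexOf("10.2.") === 0)
    cache = "cache-west.mydomain.example";
  return "function FindProxyForURL(url, host) {\n"
       + "  return \"PROXY " + cache + ":3128; DIRECT\";\n"
       + "}\n";
}
```

In a real cgi environment the client address would come from the REMOTE_ADDR variable the web server sets for each request.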


33 Future directions

There has recently been a move towards a standard for the automatic configuration of proxy-caches. New versions of Netscape and Internet Explorer are expected to use the new (as yet unknown) standard to automatically change their proxy settings. This allows you to manipulate your cache server settings without inconveniencing clients.

33.1 Roaming

Roaming customers have to remove their configured caches, since your access control lists should stop them accessing your cache from another network.

Although both problems can be reduced by the cgi-generated configs (discussed above), a firewall between the browser and your cgi server would still mean that roaming users cannot access the Internet.

There are changes on the horizon that would help. As more and more protocols take roaming users into account, standards will evolve that make Internet usage plug-and-play. If you are in Tanzania today, plug in your modem and use the Internet. If you are in France in a week’s time, plug in again and (without config changes) you will be ready to go.

Progress on standards for autoconfiguration of Internet applications is underway, which will allow administrators to specify config files depending on where a user connects from, without something like the cgi kludge above.

33.2 Browsers

Browser support for CARP is not at the stage where it is tremendously useful: once there is a proper standard for setup, it’s likely to be included into the main browsers.

At some stage, expect support for ICP and cache-digests in browsers. The browser will then be able to make intelligent decisions as to which cache to talk to. Since ICP requests are efficient, a browser could send requests for each of the links on a page once it has retrieved the HTML source.

33.3 Transparency

Currently there is a major trend towards transparent caching, not only in the "Outer Internet" (where bandwidth is very expensive), but in the USA. (Transparency is covered in detail in chapter 12.)

Transparency has one major advantage: users do not have to configure their browsers to access the cache.


To backbone providers this means that they can cache all passing traffic. A local ISP would configure their clients to talk to their cache; a backbone provider could then ask their ISP clients to use theirs as parents. But transparent caching has another advantage.

A backbone provider is acting as transit for requests that originate on other backbone providers’ networks. With transparency, a backbone provider reduces this traffic as well as requests from their network to other backbone providers.

Assume you place a cache the hop before a major peering point. Here the cache intercepts both incoming requests (from other providers to web servers on your network) and outgoing requests (from your network to web servers on other providers’ networks). This will reduce your peering-point usage (by caching outgoing requests for pages), and will also reduce the money you spend serving other people’s customers, since you reduce the cost of data flowing out of your network. The latter cost may be minimal, but in times of network trouble it can reduce your latency noticeably.

As more and more backbone providers cache pages, more local ISPs will cache ("since it’s cached further along the path, we may as well implement caching here - it’s not going to change anything"). Though this will probably cause a drop in the hit rate of the backbone providers, their ever-increasing user base may make up for it. Backbone providers are caching centrally: with large numbers of edge caches (local ISP caches), they are likely to see fewer hits. Certain Inter-University networks have already noticed such a hit-rate decline. As more and more universities add local caches, their hit rate falls.

Since the Universities are large, it’s likely that their users will surf the same web page twice. Previously the Inter-University network’s cache would have returned the hit for that page; now the University’s local cache does. This reduces the number of queries reaching the central cache, and hence its hit rate.


34 Ready to Go

If all has gone well, you should be ready to use your cache, at least on a trial basis. People around your office or division can now be configured to use the cache, and once you are happy with its performance and stability, you can make it a proper service.


35 Chapter 7. Access Control and Access Control Operators

Table of Contents

Uses of ACLs
Access Classes and Operators
Acl lines
Acl-operator lines
SNMP Configuration
Delay Classes
Conclusion

Access control lists (acls) are often the most difficult part of the configuration of a Squid cache: the layout and concept is not immediately obvious to most people. Hang on to your hat!

Unless chapter 3 is still fresh in your mind, you may wish to skip back and review the access control section of that chapter before you continue. This chapter assumes that you understood the difference between an acl and an acl-operator.

36 Uses of ACLs

The primary use of the acl system is to implement simple access control: to stop other people using your cache infrastructure. (There are other uses of acls, described later in this chapter; in the meantime we are going to discuss only the access control function of acls.) Most people implement only very basic access control, denying access to people that are not on their network. Squid’s access system is incredibly flexible, but 99% of administrators only use the most basic elements. In this chapter some examples of the less common uses of acls are covered: hopefully you will discover some Squid feature which suits your organization, and which you didn’t think was part of Squid before.


37 Access Classes and Operators

There are two elements to access control: classes and operators. Classes are defined with the acl squid.conf tag, while the names of the operators vary: the most common operator used is http_access.

Classes. A class normally refers to a set of users. (A class can also refer to a list of destination domains, filename extensions and more, but for now let’s start with the basics!) If you have 50 people that are allowed Internet access, you could put all of their IP addresses in a list, and use that list as a "class of IP addresses that have Internet access".

Operators. It’s often useful to use one set of ACLs for ICP and another for HTTP. This way you can apply different sets of rules for different protocols; this comes in very useful when you have a number of peering arrangements. Most ISPs do not want their caches to be SNMP-queried by all of their customers: they do, however, want all their customers to have browser access. In short, you want one set of acls to apply to HTTP traffic, another to apply to SNMP, and that’s exactly what you get. For each protocol there is a different acl-operator; examples include the http_access, icp_access and snmp_access tags. It’s very important to note that there is not an ftp_access type. FTP requests are passed to the cache using the HTTP format (it’s just a different format URL that gets sent to the cache server). The proto acl type (discussed shortly, with examples!) allows you to deny access to the cache if it’s FTP, HTTP, SSL etc.

Let’s work through the below example line-by-line. Here, a system administrator is in the process of installing a cache, and doesn’t want other staff to access it while it’s being installed, since it’s likely to ping-pong up and down during the installation. Once the administrator is happy with the config, the whole network will be allowed access. The admin’s PC is at the IP 10.0.0.3.

Example 7-1. Explicit allow, explicit deny (do not use this! see later text for reasons)

acl myIP src 10.0.0.3/255.255.255.255
acl myNet src 10.0.0.0/255.255.0.0
http_access allow myIP
http_access deny myNet

If the admin connects to the cache from the PC, Squid does the following:

Accepts the (HTTP) connection and reads the request

Checks the line that reads http_access allow myIP.

Since your IP address matches the IP defined in the myIP acl, access is allowed. Remember that Squid drops out of the operator list on the first match.

If you connect from a different PC (on the 10.0.*.* network) things are very similar:

Accepts the connection and reads the request

The source of the connection doesn’t match the myIP acl, so the next http_access line is checked.


The myNet acl matches the source of the connection, so access is denied. An error page is returned to the user instead of the requested page.

If someone reaches your cache from another netblock (from, say, 192.168.*.*), the above access list will not block access. The reason for this is quite complicated. If Squid works through a set of acl-operators and finds no match, it defaults to using the opposite of the last match (if the previous operator is an allow, the default is to deny; if it’s a deny, the default is to allow). This seems a bit strange at first, but let’s look at an example where this behaviour is used: it’s more sensible than it seems.
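The matching behaviour just described can be sketched as a toy model. This is not Squid’s code: the prefix test below is a crude stand-in for a real src netmask match, and the rules mirror the admin example above.

```javascript
// Toy model of http_access evaluation (not Squid's code): rules are
// checked top-down, the first matching acl decides, and if nothing
// matches, the default is the opposite of the last rule's action.
function httpAccess(rules, clientIp) {
  for (var i = 0; i < rules.length; i++)
    if (clientIp.indexOf(rules[i].prefix) === 0)  // crude "src" match
      return rules[i].action;                     // first match wins
  var last = rules[rules.length - 1].action;
  return last === "allow" ? "deny" : "allow";     // inverted default
}

// The admin example: allow the admin's PC, deny the local network.
var rules = [
  { prefix: "10.0.0.3", action: "allow" },   // myIP
  { prefix: "10.0.",    action: "deny"  }    // myNet
];
```

Here httpAccess(rules, "192.168.1.1") returns "allow": a request from outside the listed networks matches nothing, so the trailing deny inverts into a default allow, which is exactly the hole described above.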

The following acl example is nice and simple: it's something a first-time cache admin could create.

Example 7-2. Only an allow acl-operator

acl myNet src 10.0.0.0/255.255.0.0
http_access allow myNet

A config file with no access lists will allow cache access without any restrictions. An administrator using the above access list obviously wishes to allow only his network access to the cache. Given the Squid behavior of inverting the last decision, we have an invisible line reading

http_access deny all

Inverting the last decision is a simple (if not immediately obvious) solution to one of the most common acl mistakes: not adding a final deny all to the end of your acl list.

With this new knowledge, have a look at the first example in this chapter: you will see why I said not to use it in your configs. Given that the last operator denies the local network, local people will not be able to access the cache. The remainder of the Internet, however, will! As discussed in chapter 1, the simplest way of creating a catch-all acl is to match requests when they come from any IP address: when programs do netmask arithmetic, a netmask of all zeros will match any IP address. A corrected version of the first example dispenses with the myNet acl.

Example 7-3. Corrected Example 7-1, explicit deny all

acl myIP src 10.0.0.3/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
http_access allow myIP
http_access deny all

Once the cache is considered stable and is moved into production, the config would change. http_access lines do add a very small amount of overhead, but that's not the only reason to have simple access rulesets: the fewer rulesets, the easier your setup is to understand. The below example includes a deny all rule even though it doesn't strictly need one: you may know of the automatic inversion of the last rule, but someone else working on the cache may not.

Example 7-4. Example 7-1 once the cache is considered stable

acl myNet src 10.0.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access deny all

You should always end your access lists with an explicit deny. In Squid-2.1 the default config file does this for you when you insert your HTTP acl-operators in the appropriate place.



Squid: A User’s Guide, Chapter 7. Access Control and Access Control Operators

38 Acl lines

The examples so far have given you an idea of an acl line's layout. The layout can be symbolized as follows (? Check! ?):

acl name type (string|"filename") [string2] [string3] ["file name2"]

The acl tag consists of a minimum of three fields: a unique name, an acl type, and a decision string. An acl line can have more than one decision string, hence the [string2] and [string3] in the line above.

38.1 A unique name

This is supposed to be descriptive. Use a name such as customers or mynet. You have seen this lots of times before: the word myNet in the above example is one such case.

There must only be one acl with a given name; if you find that you have two or more classes with similar names, you can append a number to the name: customer1, customer2 etc. I generally avoid this, instead putting all similar data on these classes into a file, and including the whole file as one acl. Check the Decision String section for some more info on this.

38.2 Type

So far we have discussed only acls that check the source IP address of the connection. This isn't sufficient for many people: it may be useful for you to allow connections at only certain times, or to only specific domains, or by only some users (using usernames and passwords). If you really want to, you can even combine all of the above: only allow connections from users that have the right password, have the right destination and are going to the right domain. There are quite a few different acl types: the next section of this chapter discusses all of the different types in detail. In the meantime, let's finish the description of the structure of the acl line.

38.3 Decision String

The acl code uses this string to check if the acl matches a given connection. When using this field, Squid checks the type field of the acl line to decide how to use the decision string. The decision string could be an IP address range, a regular expression, a list of domains or more. In the next section (where we discuss the types of acls available) we discuss the different forms of the Decision String.

If you have another look at the formal definition of the acl line above, you will note that you can have more than one decision string per acl line. Strings in this format are OR'd together; if you were to specify two IP address ranges on the same line, the acl would return true if either of the IP addresses match. (If source strings were AND'd together, then an incoming request would have to come from two IP address ranges at the same time. This is not impossible, but would almost certainly be pointless.)

Example 7-5. Using multiple acl Decision Strings per line


# This line will match requests from either address range:
# 10.0.0.0/255.255.255.0 OR 10.1.0.0/255.255.255.0
acl myNets src 10.0.0.0/255.255.255.0 10.1.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNets
http_access deny all

Large decision lists can be stored in files, so that your squid.conf doesn't get cluttered. Some of the caches I have worked on have had in the region of 2000 lines of acl rules, which could lead to a very cluttered squid.conf file. You can include a file into the decision section of an acl list by placing the file name (with path) in double-quotes. The file simply contains the data set, one datum per line. In the next example the file /usr/local/squid/conf/data/myNets can contain any number of IP ranges, one range per line.

Example 7-6.

acl myNets src "/usr/local/squid/conf/data/myNets"
acl all src 0.0.0.0/0.0.0.0
http_access allow myNets
http_access deny all

While on the topic of long lists of acls: it's important to note that you can end up slowing your cache's response with very long lists of acls. Checking acls requires CPU time, and long lists can decrease cache performance, since instead of moving data to clients Squid is busy checking access lists. What constitutes a long list? Don't worry about lists with a few hundred entries unless you have a really slow or busy CPU. Lists thousands of lines long can, however, cause problems.

38.4 Types of acl

So far we have only spoken about acls that filter by source IP address. There are numerous other acl types:

Source/Destination IP address

Source/Destination Domain

Regular Expression match of requested domain

Words in the requested URL

Words in the source or destination domain

Current day/time

Destination port

Protocol (FTP, HTTP, SSL)

Method (HTTP GET or HTTP POST)

Browser type

Name (according to the Ident protocol)


Autonomous System (AS) number

Username/Password pair

SNMP Community

38.4.1 Source/Destination IP address

In the examples earlier in this chapter you saw lines in the following format:

acl myNet src 10.0.0.0/255.255.0.0
http_access allow myNet

The above acl will match when the connection comes from any IP address between 10.0.0.0 and 10.0.255.255. In recent years more and more people are using Classless Inter-Domain Routing (CIDR) format netmasks, like 10.0.0.0/16. Squid handles both the traditional IP/Netmask and the more recent IP/Bits notation in the src acl type. IP ranges can also be specified in a further format: one that is Squid specific. (? I need to spend some time hacking around with these: I am not sure of the layout ?)

acl myNet src addr1-addr2/netmask
http_access allow myNet

Squid can also match connections by destination IP. The layout is very similar: simply replace src with dst. Here are a couple of examples:

Example 7-7. Denying access to a small section of a larger block

acl BadDest dst 10.0.0.0/255.255.0.0
acl NiceDest dst 10.1.0.0/16
http_access deny BadDest
http_access allow NiceDest

38.4.2 Source/Destination Domain

Squid can also limit requests by their source domain. Though it doesn't always happen in the real world, network administrators can add reverse DNS entries for each of the hosts on their network. (These records are normally referred to as PTR records.) Squid can make decisions about the validity of incoming requests by checking their reverse DNS entries. In the below example, the acl is true if the request comes from a host with a reverse entry that is in either the oreilly.com or ora.com domains.

acl myDomain srcdomain oreilly.com ora.com
http_access allow myDomain

Reverse DNS matches should not be used where security is important. A determined attacker (who controlled the reverse DNS entries for the attacking host) would be able to manipulate these entries so that the request appears to come from your domain. Squid doesn't attempt to check that reverse and forward DNS entries match, so this option is not recommended.

Squid can also be configured to deny requests to specific domains. Many people implement these filter lists for pornographic sites. The legal implications of this filtering are not covered here: there are many, and the relevant law is in a constant state of flux, so advice here would likely be obsolete in a very short period of time. I suggest that you consult a good lawyer if you want to do something like this.


The dstdomain acl type allows one to match accesses by destination domain. This could be used to match URLs for popular adult sites, and refuse access (perhaps during specific times).

If you want to deny access to a set of sites, you will need to find out those sites' IP addresses, and deny access to the IP addresses too. If you just put the domain names in, someone determined to access a specific site could find out the IP address associated with that hostname and access it by entering the IP address in their browser.

The above is best described with an example. Here, I assume that you want to restrict access to the site www.adomain.example. If you use either the host or nslookup commands, you would find that this server has the IP address 10.255.1.2. It's easiest to just have two acls: one for IPs and one for domains. If the lists get too large, you can simply place them in a file.

Example 7-8. Filtering out unwanted destination sites

acl badDomains dstdomain adomain.example
acl badIPs dst 10.255.1.2
http_access deny badDomains
http_access deny badIPs
http_access allow myNet
http_access deny all

38.4.3 Words in the requested URL

Most caches can filter out URLs that contain a set of banned words. Regular expressions allow you to simply check if a word is in a given URL, but they also allow for more powerful searches of the URL. With a simple word check you would find it nearly impossible to create a rule that allows access to sites with the word sex in the URL, but at the same time denies access to all avi files on those sites. With regular expressions this sort of checking becomes easy, once you understand the regex syntax.

38.4.3.1 A quick introduction to regular expressions

We haven't encountered regular expressions in this book yet. A regular expression (regex) is an extremely useful way of matching strings. Because they are so powerful, they can get a little complicated. Regexes are often used in string-oriented languages like Perl, where they make processing of large text files (such as logs) easy. Squid uses regular expressions for numerous things: refresh patterns and access control among them.

If you have not used regular expressions before, you might want to have a look at the O'Reilly book on regular expressions or the appropriate section in the O'Reilly Perl book. Instead of going into detail here, I am just going to give some (hopefully) useful examples. If you have Perl installed on your machine, you could have a look at the perlre manual page to get an idea as to how the various regex operators (such as .) function.

Regular expressions in Squid are case-sensitive by default. If you want to match both upper- and lower-case text, you can prefix the regular expression with -i. Have a look at the next example, where we use this to match sex, SEX (or even SeX).

38.4.3.2 Using regular expressions to match words in the requested URL

Using regular expressions allows you to create more flexible access lists. So far you have only been able to filter sites by destination domain, where you have to match the entire domain to deny access to the site. Since regular expressions are used to match text strings, you can use them to match words, partial words or patterns in URLs or domains.

The most common use of regex filters in acl lists is the creation of far-reaching site filters: if the URL or domain contains a set of banned words, access to the site is denied. If you wish to deny access to sites that contain the word sex in the URL, you can add one acl rule, rather than trying to find every site that has adult material on it.

The big problem with regex filters is that not all sites that contain the word sex in the URL are pornographic. By denying these sites you are likely to be infringing people's rights, and you should refer to a lawyer for advice on the legality of this.

Creating a list of sites that you don't want accessed can be tedious. There are companies that sell adult/unwanted material lists which plug into Squid, but these can be expensive. If you cannot justify the cost, you can maintain such a list yourself.

The url_regex acl type is used to match any word in the URL. Here is an example:

Example 7-9. Denying access to sites with the word sex in the URL

acl badURL url_regex -i sex
http_access deny badURL
http_access allow myNet
http_access deny all

In places where bandwidth is very expensive, system administrators may have no problem with people visiting pornographic sites. They may, however, want to stop people downloading huge avi files from these sites. The following example would deny downloads of avi files from sites that contain the word sex in the URL. The regular expression below matches any URL that contains the word sex AND ends with .avi.

Example 7-10.

acl badURL url_regex -i sex.*\.avi$
http_access deny badURL
http_access allow myNet
http_access deny all

The urlpath_regex acl type strips off the URL type and hostname, checking instead only the path and filename.

38.4.3.3 Words in the source or destination domain

Regular expressions can also be used for checking the source and destination domains of a request. The srcdom_regex tag is used to check that a request comes from a specific domain (via its reverse DNS entry), while the dstdom_regex tag checks the domain part of the requested URL. (You could check the requested domain with a url_regex tag, but you could run into interesting problems with sites that refer to pages with URLs like http://www.company.example/www.anothersite.example.)

Here is an example acl set that uses regular expressions (rather than the srcdomain and dstdomain tags). This example allows you to deny access to .com or .net sites if the request is from the .za domain. This could be useful if you are providing a "public peering" infrastructure to other caches in your geographical region. Note that this example is only a fragment of a complete acl set: you would presumably want your customers to be able to access any site, and there is no final deny acl.


acl bad_dst_TLD dstdom_regex \.com$ \.net$
acl good_src_TLD srcdom_regex \.za$
# allow requests FROM the za domain UNLESS they want to go to .com or .net
http_access deny bad_dst_TLD
http_access allow good_src_TLD

38.4.4 Current day/time

Squid allows one to allow access to specific sites by time. Often businesses wish to filter out irrelevant sites during work hours. The Squid time acl type allows you to filter by the current day and time. By combining the dstdomain and time acls you can allow access to specific sites (such as the sites of suppliers or other associates) during work hours, but allow access to other sites after work hours.

The layout is quite compact:

acl name time [day-list] [start_hour:minute-end_hour:minute]

Day list is a list of single characters indicating the days that the acl applies to. Using the first letter of the day would be ambiguous (since, for example, both Tuesday and Thursday start with the same letter). When the first letter is ambiguous, the second letter is used: T stands for Tuesday, H for Thursday. Here is a list of the days with their single-letter abbreviations:

S - Sunday
M - Monday
T - Tuesday
W - Wednesday
H - Thursday
F - Friday
A - Saturday

Start_hour and end_hour are values in military time (17:00 instead of 5:00 pm). End_hour must always be larger than start_hour; this means (unfortunately) that you cannot do the following:

# since start_time must be smaller than end_time, this won't work:
acl darkness time 17:00-6:00

The only alternative to the darkness example above is something like this:

acl night time 17:00-24:00
acl early_morning time 00:00-6:00

As you can see from the original definition of the time acl, you can specify the day of the week (with no time), the time (with no day), or both the time and day (? check! ?). You can, for example, create a rule that specifies weekends without specifying that the day starts at midnight and ends at the following midnight. The following acl will match on either Saturday or Sunday.

acl weekends time SA

The following example is too basic for real-world use. Unfortunately creating a good example requires some of the more advanced features of the http_access line; these are covered in the next section of this chapter, and examples are included there.

Example 7-11. Allowing Web access during the weekend only

acl myNet src 10.0.0.0/16
acl workdays time MTWHF
# allow web access only on the weekends!
http_access deny workdays
http_access allow myNet


38.4.5 Destination Port

Web servers almost always listen for incoming requests on port 80. Some servers (notably site-specific search engines and unofficial sites) listen on other ports, such as 8080. Other services (such as IRC) also use high-numbered ports. Because of the way HTTP is designed, people can connect to things like IRC servers through your cache servers (even though the IRC protocol is very different to the HTTP protocol). The same loophole can be used to tunnel telnet connections through your cache server. The major part of the HTTP specification that allows for this is the CONNECT method, which is used by clients to connect to web servers using SSL.

Since you generally don't want to proxy anything other than the standard supported protocols, you can restrict the ports that your cache is willing to connect to. The default Squid config file limits standard HTTP requests to the port ranges defined in the Safe_ports squid.conf acl. SSL CONNECT requests are even more limited, allowing connections to only ports 443 and 563.

Port ranges are limited with the port acl type. If you look in the default squid.conf, you will see lines like the following:

acl SSL_ports port 443 563
acl Safe_ports port 80 21 443 563 70 210 1025-65535

The format is pretty straightforward: destination ports 443 OR 563 are matched by the first acl; 80, 21, 443, 563 and so forth by the second line. The most complicated section of the examples above is the end of the second line: the text that reads "1025-65535".

The "-" character is used in Squid to specify a range. The example thus matches any port from 1025 all the way up to 65535. These ranges are inclusive, so the line matches ports 1025 and 65535 too.

The only low-numbered ports which Squid should need to connect to are 80 (the HTTP port), 21 (the FTP port), 70 (the Gopher port), 210 (wais) and the appropriate SSL ports. All other low-numbered ports (where common services like telnet run) do not fall into the 1025-65535 range, and are thus denied.

The following http_access line denies access to URLs that are not in the correct port ranges. You have not seen the ! http_access operator before: it inverts the decision. The line below would read "deny access if the request does not fall in the range specified by acl Safe_ports" if it were written in English. If the port matches one of those specified in the Safe_ports acl line, the next http_access line is checked. More information on the format of http_access lines is given in the next section, Acl-operator lines.

http_access deny !Safe_ports

38.4.6 Protocol (FTP, HTTP, SSL)

Some people may wish to restrict their users to specific protocols. The proto acl type allows you to restrict access by the URL prefix: the http:// or ftp:// bit at the front. The following example will deny any request that uses the FTP protocol.

Example 7-12. Denying access to FTP sites


acl ftp proto FTP
acl myNet src 10.0.0.0/16
acl all src 0.0.0.0/0.0.0.0
http_access deny ftp
http_access allow myNet
http_access deny all

The default squid.conf file denies access to a special type of URL: URLs which use the cache_object protocol. When Squid sees a request for one of these URLs it serves up information about itself: usage statistics, performance information and the like. The world at large has no need for this information, and it could be a security risk.
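In the default squid.conf this protection looks something like the following (exact lines vary between Squid versions): the manager acl matches cache_object URLs, and only requests from the machine itself are allowed to use them.

```
# match requests for Squid's own management information
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
# only the cache machine itself may query the cache_object protocol
http_access allow manager localhost
http_access deny manager
```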

38.4.7 Method (HTTP GET, POST or CONNECT)

HTTP can be used for downloading (GETting data) or uploads (POSTing data to a site). The CONNECT method is used for SSL data transfers. When a connection is made to the proxy the client specifies what kind of request (called a method) it is sending. A GET request looks like this:

GET http://www.ora.com/ HTTP/1.1
blank-line

If you were connecting using SSL, the GET word would be replaced with the word CONNECT.

You can control which methods are allowed through the cache using the method acl type. The most common use is to stop CONNECT-type requests to non-SSL ports. The CONNECT method allows data transfer in any direction at any time: if you were to telnet to a badly configured proxy and enter something like the following, you could end up connected to the remote machine as if you had telnetted there from the cache server itself. This could get around packet filters, firewall access lists and passwords, which is generally considered a bad thing!

CONNECT www.domain.example:23 HTTP/1.1
blank-line

Since CONNECT requests can be quite easily exploited, the default squid.conf denies access to SSL requests to non-standard ports, as we spoke about in the previous section (on the port acl type).

Let's assume that you want to stop your clients from POSTing to any sites (note that doing this is not a good idea, since people using some search engines, for example, would run into problems; at this stage this is just an example).

Example 7-13. Breaking search site access

acl Post_class method POST
acl myNet src 10.0.0.0/16
acl all src 0.0.0.0/0.0.0.0
# stop POST requests before they are allowed by the myNet rule
http_access deny Post_class
# allow my clients access to sites that aren't data posts
http_access allow myNet
# deny everyone else
http_access deny all


38.4.8 Browser type

Companies sometimes have policies as to what browsers people can use. The browser acl type allows you to specify a regular expression, matched against the browser identification the client sends, that can be used to allow or deny access.

Example 7-14.
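A minimal sketch of such a rule (the browser name pattern below is purely illustrative, not a real product name):

```
# deny any browser identifying itself as "forbiddenbrowser",
# matched case-insensitively; everyone else falls through to later rules
acl badBrowser browser -i forbiddenbrowser
http_access deny badBrowser
```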

38.4.9 Username

Logs generally show the source IP address of a connection. When this address is on a multiuser machine (let's use a Unix machine at a university as an example) you cannot pin down a request as being from a specific user. There could be hundreds of people logged into the Unix machine, and they could all be using the cache server. Trying to track down a misbehaver is very difficult in this case, since you can never be sure which user is actually doing what. To solve this problem, the ident protocol was created. When the cache server accepts a connection, it can connect back to the machine the connection came from (on a low-numbered port, so the reply cannot be faked) and find out who just connected. This doesn't make any sense on Windows systems: people can just load their own ident servers (and become daffy duck for a day). If you run multi-user systems then you may want only certain people on those machines to be able to use the cache. In this case you can use the ident username to allow or deny access.

Example 7-15. Using ident usernames to deny cache access

acl goodusers ident oskar tom
http_access allow goodusers

One of the best things about Unix is the flexibility you get. If you wanted (for example) only students from their second year onward to have access to the cache servers via your Unix machines, you could create a replacement ident server. This server could find out which user has connected to the cache, but instead of returning the username it could return a string like "third_year" or "postgrad". Rather than maintaining a list of which students are which on both the cache server and the central Unix system, you could create simple Squid rules, and the ident server could do all the work of checking which user is which.

Example 7-16. Using Ident to classify users, and using Squid to deny classes

acl responsible ident third_year fourth_year postgrad staff
http_access allow responsible

38.4.10 Autonomous System (AS) Number

Squid is often used by large ISPs. These ISPs want all of their customers to have access to their caches without having incredibly long manually-maintained acl lists (don't forget that such long lists of IPs generally increase the CPU usage of Squid too). Large ISPs all have AS (Autonomous System) numbers, which are used by other Internet routers which run the BGP (Border Gateway Protocol) routing protocol.

The whois server whois.ra.net keeps a (supposedly authoritative) list of all the IP ranges that are in each AS. Squid can query this server and get a list of all IP addresses that the ISP controls, reducing the number of rules required. The data returned is also stored in a radix tree, for more CPU-friendly retrieval.

Sometimes the whois server is updated only sporadically. This could lead to problems with new networks being denied access incorrectly. It's probably best to automate the process of adding new IP ranges to the whois server if you are going to use this function.


If your region has some sort of local whois server that handles queries in the same way, you can use the as_whois_server Squid config file option to query a different server.
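A sketch of what an AS-based ruleset might look like (the AS number below is a placeholder, not a real assignment); the src_as acl type matches requests whose source address falls within the IP ranges registered to the given AS:

```
# hypothetical AS number 1234: allow every IP range registered to it
acl ourCustomers src_as 1234
acl all src 0.0.0.0/0.0.0.0
http_access allow ourCustomers
http_access deny all
```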

38.4.11 Username/Password pair

If you want to track Internet usage it's best to get users to log into the cache server when they want to use the net. You can then use a stats program to generate per-user reports, no matter which machine on your network a person is using. Universities and colleges often have labs with many machines, where it is difficult to tell which user is sitting in front of a machine at any specific time. By using usernames and passwords you can solve this problem.

Squid uses modules to do user authentication, rather than including code to do it directly. The default Squid source does, however, include two standard modules: the first authenticates users from a file, the other uses SMB (Windows NT) authentication. These modules are in the auth_modules directory in the source directory. These modules are not compiled when you compile Squid itself; you will need to choose an authentication module and run make in the appropriate directory. If the compile goes well, a make install will place the program file in the /usr/local/squid/bin/ directory and any config files in the /usr/local/squid/etc/ directory.

NCSA authentication is the easiest to use, since it's self-contained. The SMB authentication program requires that SAMBA be installed, since it effectively talks to the NT server through SAMBA.

The squid.conf file uses the authenticate_program tag to decide which external program to use to authenticate users. If Squid were to only start one authentication program, a slow username/password lookup could slow the whole cache down (while all other connections waited to be authenticated). Squid thus opens more than one authentication program at a time, sending pending requests to the second when the first is busy, to the third when the second is, and so forth. The actual number started is specified by the authenticate_children squid.conf value. The default number started is five, but if you have a heavily loaded cache then you will need to increase this value.
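Tying these tags together, a fragment might look like the following (the child count here is arbitrary; the proxy_auth acl type is what actually forces clients to supply a username and password):

```
authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
# start ten helper processes instead of the default five
authenticate_children 10
# REQUIRED matches any successfully authenticated user
acl password proxy_auth REQUIRED
http_access allow password
http_access deny all
```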

38.4.12 Using the NCSA authentication module

To use the NCSA authentication module, you will need to add the following line to your squid.conf:

authenticate_program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd

You will also need to create the appropriate password file (/usr/local/squid/etc/passwd in the example above). This file consists of username and password pairs, one per line, where the username and password are separated by a colon (:), just as they are in your /etc/passwd file (assuming you are running Unix). The password is encrypted with the same function as the passwords in /etc/passwd (or /etc/shadow on newer systems). Here is an example password line:

oskar:lKdpxbNzhlo.w

Since the encrypted passwords are the same, you could simply copy the system password file periodically, since the ncsa_auth module understands the /etc/passwd or /etc/shadow file format. If your users do not already have passwords in Unix crypt format somewhere, you will have to use the htpasswd program to generate the appropriate user and password pairs. This program is included in the /usr/local/squid/bin/ directory.


38.4.13 Using the SMB authentication module
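As a rough sketch only (the binary name, path and -W domain flag are assumptions based on the SMB module in the auth_modules directory mentioned earlier; check the module's own documentation), the SMB module is selected with the same authenticate_program tag, passing the NT domain to authenticate against as an argument:

```
# MYDOMAIN is a placeholder NT domain name
authenticate_program /usr/local/squid/bin/smb_auth -W MYDOMAIN
```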

38.4.14 SNMP Community

If you have configured Squid to support SNMP, you can also create acls that filter by the requested SNMP community. By combining source address (with the src acl type) and community filters (using the snmp_community acl type) you can restrict sensitive SNMP queries to administrative machines while allowing safer queries from the public. SNMP setup is covered in more detail later in the chapter, where we discuss the snmp_access acl-operator.
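A sketch of such a combination (the community string and address range below are illustrative):

```
# allow the "public" community, but only from the admin network
acl snmppublic snmp_community public
acl adminNet src 10.0.0.0/255.255.255.0
snmp_access allow snmppublic adminNet
snmp_access deny all
```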


39 Acl-operator lines

Acl-operators are the other half of the acl system. For each connection the appropriate acl-operators are checked (in the order that they appear in the file). You have met the http_access and icp_access operators before, but they aren't the only Squid acl-operators. All acl-operator lines have the same format; although the below format mentions http_access specifically, the layout applies to all the other acl-operators as well.

http_access allow|deny [!]aclname [[!]aclname2 ...]

Let’s work through the fields from left to right. The first word is http_access, the actual acl-operator.

The allow and deny words come next. If you want to deny access to a specific class of users, you can change the customary allow to deny in the acl-operator line. We have seen where a deny line is useful before, with the final deny of all IP ranges in previous examples.

Let's say that you wanted to deny Internet access to a specific list of IP addresses during the day. Since an acl can only have one type, you could not create a single acl line that matches an IP address during specific times. By combining more than one acl per acl-operator line, though, you get the same effect. Consider the following acls:

acl dialup src 10.0.0.0/255.255.255.0
acl work time 08:00-17:00

If you could create an acl-operator that was matched when both the dialup and work acls were true, clients in the range could only connect during the right times. This is where the aclname2 in the above acl-operator definition comes in. When you specify more than one acl per acl-operator line, both acls have to be matched for the acl-operator to be true. The acl-operator function ANDs the results from each acl check together to see if it is to return true or false.

You could thus deny the dialup range cache access during working hours with the following acl rules:

Example 7-17. Using more than one acl operator on an http_access line

acl myNet src 168.209.2.0/255.255.255.0
acl dialup src 10.0.0.0/255.255.255.0
acl work_hours time 08:00-17:00
# If a connection arrives during work hours, dialup is 1, and
# work_hours is 1. When ANDed together the http_access line matches
# and denies the client access.
# During work hours:
#   1 AND 1 = TRUE, so the http_access line matches them and
#   they are denied.
# After work hours:
#   1 AND 0 = FALSE, so the line does not match: the next
#   http_access line is checked.
http_access deny dialup work_hours
# If it's not during work hours, the above line will fail, and the


# next http_access line will be checked. You want to allow dialup
# users explicit access here, otherwise they are not caught by the
# myNet acl, and are denied by the final deny line.
http_access allow dialup
http_access allow myNet
http_access deny all

You can also invert an acl's result value by using an exclamation mark (the traditional NOT value from many programming languages) before the appropriate acl. In the following example I have reduced Example 6-4 into one http_access line, taking advantage of the implicit inversion of the last rule to deny access to all clients.

Example 7-18. Specifying more than one acl per http_access line

acl myNet src 10.0.0.0/255.255.0.0
acl all src 0.0.0.0/0.0.0.0
# A request from an outside network:
#   1 AND (NOT 0) = True, so the request is denied
# A request from an internal network:
#   1 AND (NOT 1) = False. Because the last definition
#   is inverted (see earlier discussions in this chapter
#   for more detail), the local network is allowed: the
#   'deny' is inverted.
http_access deny all !myNet
# There is an invisible "http_access allow all" here because of the
# way Squid inverts the last http_access rule.

Since the above example is quite complicated, let's cover it in more detail.

In the above example an IP from the outside world will match the 'all' acl, but not the 'myNet' acl; the IP will thus match the http_access line. Consider the binary logic for a request coming in from the outside world, where the IP is not defined in the myNet acl:

Deny http access if ((true) & (!false))

For an IP in the 10.0.0.0 range, on the other hand, the myNet value is true, and the binary representation is as follows:

Deny http access if ((true) & (!true))

A 10.0.0.0 range IP will thus not match the only http_access line in the squid config file. Remembering that Squid will default to using the inverse of the last match in the file, accesses will be allowed from the myNet IP range.
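The evaluation rules described above (the acls on one line are ANDed together, lines are checked top to bottom, and the last line's action is inverted when nothing matches) can be modelled in a few lines of Python. This is an illustrative sketch, not Squid source code; the rule and acl names are taken from the example above.

```python
def http_access(rules, matches):
    """Toy model of Squid's http_access evaluation.

    rules:   list of (action, [aclname, ...]) tuples, in file order;
             a leading "!" on an aclname inverts that acl's result.
    matches: dict mapping aclname -> True/False for this request.
    Returns "allow" or "deny".
    """
    for action, aclnames in rules:
        # All acls on one line are ANDed together.
        line_matches = all(
            not matches[name[1:]] if name.startswith("!") else matches[name]
            for name in aclnames
        )
        if line_matches:
            return action
    # No line matched: Squid applies the opposite of the last rule's action.
    last_action = rules[-1][0]
    return "deny" if last_action == "allow" else "allow"

# The single-line example above: http_access deny all !myNet
rules = [("deny", ["all", "!myNet"])]

# Outside request: 1 AND (NOT 0) = True, so the line matches and denies.
print(http_access(rules, {"all": True, "myNet": False}))  # prints: deny
# Internal request: 1 AND (NOT 1) = False, no line matches, so the
# last (deny) rule is inverted and the request is allowed.
print(http_access(rules, {"all": True, "myNet": True}))   # prints: allow
```

The same function can be used to trace any of the http_access examples in this chapter by listing their rules in file order.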

39.1 The other Acl-operators

You have encountered only the http_access and icp_access acl-operators so far. Other acl-operators are:

no_cache

ident_lookup_access

miss_access


always_direct, never_direct

snmp_access (covered in the next section of this chapter)

delay_classes (covered in the next section of this chapter)

broken_posts

39.1.1 The no_cache acl-operator

The no_cache acl-operator is used to ensure freshness of objects in the cache. The default Squid config file includes an example no_cache line that ejects the results of cgi programs from the cache. If you want to ensure that cgi pages are not cached, you must un-comment the following lines in squid.conf:

acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY

The first line uses a regular expression match to find urls that have cgi-bin or ? in the path (since we are using the urlpath_regex acl type, a site with a name like cgi-bin.oreilly.com will not be matched). The no_cache acl-operator is then used to eject matching objects from the cache.
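To see what the patterns do and do not match, here is a quick sketch (Python is used purely for illustration; these simple patterns behave the same way in most regex dialects):

```python
import re

# The two alternatives from the QUERY acl: "cgi-bin" or a literal "?"
patterns = [re.compile(r"cgi-bin"), re.compile(r"\?")]

def query_matches(urlpath):
    """True if either pattern is found anywhere in the URL path."""
    return any(p.search(urlpath) for p in patterns)

print(query_matches("/cgi-bin/search"))      # True: cgi-bin in the path
print(query_matches("/index.html?page=2"))   # True: contains a query string
print(query_matches("/docs/manual.html"))    # False: cached as normal
```

Because urlpath_regex only examines the path portion of the URL, a hostname like cgi-bin.oreilly.com never reaches these patterns at all.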

39.1.2 The ident_lookup_access acl-operator

Earlier we discussed using the ident protocol to control cache access. To reduce network overhead, Squid does an ident lookup only when it needs to. If you are using ident to do access control, Squid will do an ident lookup for every request, and you don't have to worry about this acl-operator.

Many administrators would like to log the ident value for connections without actually using it for access control. Squid used to have a simple on/off switch for ident lookups, but this incurred extra overhead in the cases where the ident lookup wasn't useful (where, for example, the connection is from a desktop PC).

Let's consider some examples. Assume that you have one Unix server (at IP address 10.0.0.3), and all remaining IPs in the 10.0.0.0/255.255.255.0 range are desktop PCs. You don't want to log the ident value from the PCs, but you do want to record it when the connection is from the Unix machine. Here is an example acl set that does this:

Example 7-19. Logging ident values from specific machines

acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
# not used for access control, just to differentiate ident lookups:
acl Unixmachine src 10.0.0.3/255.255.255.255
http_access allow myNet
http_access deny all
# do an ident lookup when the request is from Unixmachine
ident_lookup_access allow Unixmachine
# but don't log ident values for anything else
ident_lookup_access deny all

If a system cracker is attempting to attack your cache, it can be useful to have their ident value logged. The following example gets Squid not to do ident lookups for machines that are allowed access, but if a request comes from a disallowed IP range, an ident lookup is done and inserted into the log.

Example 7-20. Doing ident lookups for unknown machines


acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow myNet
http_access deny all
# If the request is from a local machine, don't do an ident query
ident_lookup_access deny myNet
# If the request is from another network, do an ident query
ident_lookup_access allow all

39.1.3 The miss_access acl-operator

The ICP protocol is used by many caches to find out if objects are in another cache's on-disk store. If you are peering with other organisations' caches, you may wish them to treat you as a sibling, where they only get data that you already have stored on disk. If an unscrupulous cache admin were to change their cache_peer line to read parent instead of sibling, they could get you to retrieve objects on their behalf.

To stop this from happening, you can create an acl that contains the peering caches, and use the miss_access acl-operator to ensure that only hits are served to these caches. In response to all other requests, an access-denied message is sent (so if a sibling complains that they almost always get error messages, it's likely that they think you should be their parent, while you think they should be treating you as a sibling).

When looking at the following example it is important to realise that http_access lines are checked before any miss_access lines. If the request is denied by the http_access lines, an error page is returned and the connection closed, so the miss_access lines are never checked. This means that the last miss_access line in the example doesn't allow random IP ranges to access your cache; it only allows ranges that have passed the http_access test. This is simpler than having one miss_access line for each http_access line in the file, and it reduces CPU usage too, since only two acls are checked instead of six.

Example 7-21. Allowing a subnet range to only get data we already have (hits)

acl myFirstNet src 10.0.0.0/255.255.255.0
acl mySecondNet src 10.1.0.0/255.255.255.0
acl myThirdNet src 10.2.0.0/255.255.255.0
acl othercompany src 10.11.12.13/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
http_access allow myFirstNet
http_access allow mySecondNet
http_access allow myThirdNet
http_access allow othercompany
http_access deny all
# If the request is for a miss, and it's from othercompany, deny it
miss_access deny othercompany
miss_access allow all

39.1.4 The always_direct and never_direct acl-operators

These operators help you make controlled decisions about which servers to connect to directly, and which to connect to through a parent cache/proxy. I previously discussed this set of options briefly in Chapter Three, during the Basic Installation phase.


These tags are covered in detail in the following chapter, in the Peer Selection section.

39.1.5 The broken_posts acl-operator

Some servers incorrectly handle POST data, requiring an extra Carriage-Return (CR) and Line-Feed (LF) after a POST request. Since obeying the HTTP specification would make Squid incompatible with these servers, there is an option to be non-compliant when talking to a specific set of servers. This option should be very rarely used. The url_regex acl type should be used for specifying the broken server.

Example 7-22. Using the broken_posts acl-operator

acl broken_server url_regex http://broken-server.domain.example/
broken_posts allow broken_server


40 SNMP Configuration

Before we continue: if you wish to use Squid's SNMP functions, you will need to have configured Squid with the --enable-snmp option, as discussed way back in Chapter 2. The Squid source only includes SNMP code if it is compiled with the correct options.

Normally a Unix SNMP server (also called an agent) collects data from the various services running on a machine, returning information about the number of users logged in, the number of sendmail processes running and so forth. As of this writing, there is no SNMP server which gathers Squid statistics and makes them available to SNMP management stations for interpretation. Code has thus been added to Squid to handle SNMP queries directly.

Squid normally listens for incoming SNMP requests on port 3401. The standard SNMP port is 161.

For the moment I am going to assume that your management station can collect SNMP data from a port other than 161. Squid will thus listen on port 3401, where it will not interfere with any other SNMP agents running on the machine.

No specific SNMP agent or management station software is covered by this text. A Squid-specific mib.txt file is included in the /usr/local/squid/etc/ directory. Most management station software should be able to use this file to construct Squid-specific queries.

40.1 Querying the Squid SNMP server on port 3401

All snmp_access acl-operators are checked when Squid is queried by an SNMP management station. The default squid.conf file allows SNMP queries from any machine, which is probably not what you want. Generally you will want only one machine to be able to do SNMP queries of your cache. Some SNMP information is confidential, and you don't want random people to poke around your cache settings. To restrict access, simply create a src acl for the appropriate IP address, and use snmp_access to deny access for every other IP.

Not all Squid SNMP information is confidential. If you want to split SNMP information into public and private sets, you can use an SNMP-specific acl type to allow or deny requests based on the community the client has requested.

Example 7-23. Using the snmp_community acl type

acl myNet src 10.0.0.0/255.255.255.0
acl snmpServer src 10.0.0.3/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
acl public_snmp snmp_community cacheSysPerf
# let the SNMP server get any SNMP information
snmp_access allow snmpServer
# Stop people outside getting any further
snmp_access deny !myNet
# Let anyone inside my network get useful (but not sensitive) data
snmp_access allow public_snmp
snmp_access deny all


40.2 Running multiple SNMP servers on a cache machine

If you are running multiple SNMP servers on your cache machine, you probably want to see all the SNMP data returned on one set of graphs or summaries. You don't want to have to query two SNMP servers on the same machine, since many SNMP analysis tools will not allow you to relate (for example) load average to number of requests per second when the SNMP data comes from more than one source.

Let's work through the steps Squid goes through when it receives an SNMP query. The request is accepted, and access-control lists are checked. If the request is allowed, Squid checks to see if it's a request for Squid information or a request for something it doesn't understand. Squid handles all Squid-specific queries internally, but all other SNMP requests are simply passed to the other SNMP server; Squid essentially acts as an SNMP proxy for SNMP queries it doesn't understand.

This SNMP proxy-mode allows you to run two servers on a machine, but query them both on the same port. In this mode Squid will normally listen on port 161, and the other SNMP server is configured to listen on another port (let's use port 3456 for argument's sake). This way the client software doesn't have to be configured to query a different port, which especially helps when the client is not under your control.

40.2.1 Binding the SNMP server to a non-standard port

Getting your SNMP server to listen on a different port may be as easy as changing one line in a config file. In the worst case, though, you may have to trick it into listening somewhere else. This section is a bit of a guide to IP server trickery!

Server software can either listen for connections on a hard-coded port (where the port to listen to is coded into the source and placed directly into the binary at compile time), or it can use standard system calls to find the port that it should be listening to. Changing programs that use the latter approach to use a different port is easy: you edit the /etc/services file, changing the value for the appropriate port there. If this doesn't work, it probably means that your program uses hard-coded values, and your only recourse is to recompile from source (if you have it) or speak to your vendor.
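The "standard system calls" in question are the getservbyname() family, which consult /etc/services. The sketch below (Python for brevity) shows the lookup such programs perform; the http entry is used as the example because virtually every /etc/services file defines it.

```python
import socket

# A program without a hard-coded port asks the system instead;
# this reads the entry for the named service from /etc/services.
port = socket.getservbyname("http", "tcp")
print(port)  # 80 on any stock /etc/services
```

An SNMP agent written this way would do the equivalent of getservbyname("snmp", "udp"), so changing the snmp line in /etc/services is enough to move it.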

You can check that your server is listening on the new port by checking the output of the netstat command. The following command should show you if some process is listening for UDP data on port 3456:

cache1:~ $ netstat -na | grep udp | grep 3456
udp        0      0 0.0.0.0:3456            0.0.0.0:*
cache1:~ $

Changing the services port does have implications: client programs (like any SNMP management station software running on the machine) will also use the services file to find out which port they should connect to when forming outgoing requests. If you are running anything other than a simple SNMP agent on the cache machine, you must not change the /etc/services file: if you do, you will encounter all sorts of strange problems!

Squid doesn't use the /etc/services file, but the port to listen to is stored in the standard Squid config file. Once the other server is listening on port 3456, we need to get Squid to listen on the standard SNMP port and proxy requests to port 3456.


First, change the snmp_port value in squid.conf to 161. Since we are forwarding requests to another SNMP server, we also need to set forward_snmpd_port to our other-server port, port 3456.
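Putting those two settings together, the relevant squid.conf fragment looks like this (3456 being the example port our other SNMP agent now listens on):

```
# Squid answers on the standard SNMP port...
snmp_port 161
# ...and forwards queries it doesn't understand to the other agent
forward_snmpd_port 3456
```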

40.2.2 Access Control with more than one Agent

Since Squid is actually creating all the queries that reach the second SNMP server, using an IP-based access control system in the second server's config is useless: all requests will come from localhost. Since the second server cannot find out where the requests came from originally, Squid will have to take over the access control functions that were handled by the other server.

For the first example, let's assume that you have a single SNMP management station, and you want this machine to have access to all SNMP functions. Here we assume that the management station is at IP 10.0.0.2.

Example 7-24. Allowing SNMP access from only one machine

acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
acl snmpManager src 10.0.0.2/255.255.255.255
http_access allow myNet
http_access deny all
snmp_access allow snmpManager
snmp_access deny all

You may have classes of SNMP stations too: you may wish some machines to be able to inspect public data, while others are considered completely trusted. The special snmp_community acl type is used to filter requests by destination community. In the following example all local machines are able to get data in the public SNMP community, but only the snmpManager machine is able to get other information. In this example we are using the ANDing of the publicCommunity and myNet acls to ensure that only people on the local network can get even public information.

Example 7-25. Using the snmp_community acl type

acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
acl snmpManager src 10.0.0.2/255.255.255.255
acl publicCommunity snmp_community public
http_access allow myNet
http_access deny all
snmp_access allow snmpManager
snmp_access allow publicCommunity myNet
# deny people outside the local network access to ALL data, even public
snmp_access deny all


41 Delay Classes

Delay Classes are generally used in places where bandwidth is expensive. They let you slow down access to specific sites (so that other downloads can happen at a reasonable rate), and they allow you to stop a small number of users from using all your bandwidth (at the expense of those just trying to use the Internet for work).

Many non-US Universities have very small pipes to the Internet. Unfortunately these Universities often end up with huge amounts of their bandwidth being used for surfing that is not study-related. In the US this is fine, since the cost is negligible, but in other countries the cost of this casual surfing is astronomical.

To ensure that some bandwidth is available for work-related downloads, you can use delay-pools. By classifying downloads into segments, and then allocating these segments a certain amount of bandwidth (in kilobytes per second), your link can remain uncongested for useful traffic.

To use delay-pools you need to have compiled Squid with the appropriate source code: you will have to have used the --enable-delay-pools option when running the configure program back in Chapter 2.

41.1 Slowing down access to specific URLs

An acl-operator (delay_access) is used to split requests into pools. Since we are using acls, you can split up requests by source address, destination url or more. There is more than one type (or class) of pool. Each type of pool allows you to limit bandwidth in different ways.

41.2 The First Pool Class

Rather than cover all of the available classes immediately, let's deal with a basic example first. In this example we have only one pool, and the pool catches all URLs containing the word abracadabra.

Example 7-26. Limiting download speed by a word in the URL

acl magic_words url_regex -i abracadabra
delay_pool_count 1
delay_class 1 1
delay_parameters 1 16000/16000
delay_access 1 allow magic_words

The first line is a standard ACL: it returns true if the requested URL has the word abracadabra in it. The -i flag is used to make the search case-insensitive.

The delay_pool_count variable tells Squid how many delay pools there will be. Here we have only one pool, so this option is set to 1.

The third line creates a delay pool (delay pool number 1, the first option) of class 1 (the second option to delay_class).


The first delay class is the simplest: the download rates of all connections in the class are added together, and Squid keeps this aggregate value below a given maximum.

The fourth line is the most complex, as you can see. The delay_parameters option allows you to set speed limits on each pool. The first option is the pool to be manipulated: since we have only one pool in this example, this is set to 1. The second option consists of two values: the restore and max values, separated by a forward-slash (/).

If you download a short file at high speed, you create a so-called burst of traffic. Generally these short bursts of traffic are not a problem: these are normally html or text files, which are not the real bandwidth consumers. Since we don't want to slow everyone's access down (just the people downloading comparatively large files), Squid allows you to configure a size at which the download is to start slowing down. If you download a short file, it arrives at full speed, but once you hit a certain threshold the file arrives more slowly.

The restore value is used to set the download speed, and the max value lets you set the size at which the files are to be slowed down. Restore is in bytes per second, max is in bytes.

In the above example, downloads proceed at full speed until they have downloaded 16000 bytes. This limit ensures that small files arrive reasonably fast. Once this much data has been transferred, however, the transfer rate is slowed to 16000 bytes per second. At 8 bits per byte this means that connections are limited to 128 kilobits per second (16000 * 8).
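As a rough model of the arithmetic (an idealised sketch that treats the initial burst as instantaneous and ignores Squid's actual bucket-refill mechanics), the time a download spends rate-limited is:

```python
def throttled_seconds(size_bytes, restore_bps=16000, max_bytes=16000):
    """Idealised time (in seconds) a download spends rate-limited.

    The first max_bytes arrive at full line speed; every byte after
    that trickles in at restore_bps bytes per second.
    """
    over = max(0, size_bytes - max_bytes)
    return over / restore_bps

print(throttled_seconds(12000))    # 0.0: small file, never slowed
print(throttled_seconds(816000))   # 50.0: 800000 bytes at 16000 B/s
```

This is why raising max rather than restore is the gentler knob: it moves the point where throttling begins without changing the sustained rate.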

41.3 The Second Pool Class

As I discussed in this section's introduction, delay pools can help you stop one user from flooding your links with downloads. You could place each user in their own pool, and then set limits on a per-user basis, but administering these lists would become painful almost immediately. By using a different pool type, you can set rate limits by IP address easily.

Let's consider another example: you have a 128kbit per second line. Since you want some bandwidth available for things like SMTP, you want to limit web access to 100kbit per second. At the same time, you don't want a single user to use more than their fair share of sustained bandwidth. Given that you have 20 staff members, and 100kbit per second of remaining bandwidth, each person should not use more than 5kbit per second of bandwidth. Since it's unlikely that every user will be surfing at once, we can probably limit people to about four times that limit (that's 20kbit per second, or 2.5 kbytes per second).

In the following example, we change the delay class for pool 1 to 2. Delay class 2 allows us to specify both an aggregate (overall) bandwidth usage and a per-user usage. In the previous example the delay_parameters tag only took one set of options, the aggregate peak and burst rates. Given that we are now using a class-two pool, we have to supply two sets of options to delay_parameters: the overall speed and the per-IP speed. The 100kbit per second value is converted to bytes per second by dividing by 8 (giving us the 12500 values), and the per-IP value of 20kbit per second we arrived at is converted to bytes per second (giving us the 2500 values).

Example 7-27. Limiting both overall and per-user bandwidth usage

acl all src 0.0.0.0/0.0.0.0
delay_pool_count 1
delay_class 1 2
delay_parameters 1 12500/12500 2500/2500
delay_access 1 allow all
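The conversions used to derive those numbers can be checked directly (the figures come from the text: a 100kbit/s aggregate cap and a 20kbit/s per-user cap, with 1 kbit taken as 1000 bits):

```python
def kbit_to_bytes_per_second(kbit):
    # 1 kilobit = 1000 bits; 8 bits per byte
    return kbit * 1000 // 8

print(kbit_to_bytes_per_second(100))  # 12500: the aggregate value
print(kbit_to_bytes_per_second(20))   # 2500: the per-user value
```

The same conversion applies to every delay_parameters value in this chapter, since they are all specified in bytes per second.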


41.4 The Third Pool Class

This class is useful to very large organizations like Universities. The second pool class lets you stop individual users from flooding your links. A lab full of students all operating at their maximum download rate can, however, still flood the link. Since such a lab (or department, if you are not at a University) will all have IP addresses in the same range, it is useful to be able to put a cap on the download rate of an entire network range. The third pool class lets you do this. Currently this option only works on class-C network ranges, so if you are using variable length subnet masks then this will not help.

In the next example we assume that you have three IP ranges. Each range must not use more than 1/3 of your available bandwidth. For this example I am assuming that you have a 512kbit/s line, and you want 64kbit/s available for SMTP and other protocols. This will leave you with an overall download rate cap of 448kbit/s. Each Class-C IP range will have about 150kbit/s available. With 3 ranges of 256 IP addresses each, you should have in the region of 500 PCs, which (if calculated exactly) gives you 0.669kbit per second per machine. Since it is unlikely that all machines will be using the net at the same time, you can probably allocate each machine (say) 4kbit per second (a mere 500 bytes per second).

Example 7-28. Using Class 3 Delay Pools

acl all src 0.0.0.0/0.0.0.0
delay_pool_count 1
delay_class 1 3
# 56000*8 sets your overall limit at 448kbit/s
# 18750*8 sets your per-network limit at 150kbit/s
# 500*8 sets your per-user limit at 4kbit/s
delay_parameters 1 56000/56000 18750/18750 500/500
delay_access 1 allow all

In this example, we changed the delay class of the pool to 3. The delay_parameters option now takes four arguments: the pool number; the overall bandwidth rate; the per-network bandwidth rate; and the per-user bandwidth rate.

The 4kbit per second limit for users seems a little low. You can increase the per-user limit, but you may find that it's a better idea to change the max value instead, so that the limit kicks in after only (say) 16 kilobytes or so. This will allow small pages to be downloaded as fast as possible, but large pages will be brought down without influencing other users.

If you want, you can set the per-user limit to something quite high, or even set it to -1, which effectively means that there is no limit. Limits work from right to left, so if a user is sitting alone in a lab they will be limited by their per-user speed. If this value is undefined, they are limited by their per-network speed, and if that is undefined then they are limited by their overall speed. This means that you can set the per-user limit higher than you would expect: if the lab is not busy then users will get good download rates (since they are only limited by the per-network limit).
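That right-to-left fallback can be modelled as follows (a sketch of the behaviour just described, not Squid source; None or -1 means "no limit at this level"):

```python
def effective_limit(per_user, per_network, overall):
    """Return the first defined limit, checked right to left:
    per-user first, then per-network, then the overall rate.
    None or -1 means the level places no limit."""
    for limit in (per_user, per_network, overall):
        if limit is not None and limit != -1:
            return limit
    return None  # no limits defined at all

# A lone user in a quiet lab is capped by the per-user rate:
print(effective_limit(500, 18750, 56000))   # 500
# With the per-user limit disabled (-1), the per-network cap applies:
print(effective_limit(-1, 18750, 56000))    # 18750
```

In a busy lab the shared per-network and overall buckets drain too, of course, so the values here describe the best case for a single user, not a guarantee.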

41.5 Using Delay Pools in Real Life

By combining multiple ACLs, you can do interesting things with delay pools. Here are some examples:

By using time-based acls, you can limit people's speed during working hours, but allow them full-speed access outside those hours.


Again (with time-based acl lists), you can allocate a very small amount of bandwidth to http during working hours, discouraging people from browsing the Web during office hours.

By using acls that match specific source IP addresses, you can ensure that sibling caches have full-speed access to your cache.

You can prioritize access to a limited set of destination sites by using the dst or dstdomain acl types, inverting the rules we used to slow access to some sites down.

You can combine username/password access-lists and speed-limits. You can, for example, allow users that have not logged into the cache access to the Internet, but at a much slower speed than users who have logged in. Users that are logged in get access to dedicated bandwidth, but are charged for their downloads.


42 Conclusion

Once your acl system is correctly set up, your cache should essentially be ready to become a functional part of your infrastructure. If you are going to use some of the advanced Squid features (like transparent operation mode, for example),


43 Chapter 8. Cache Hierarchies

Table of Contents
Introduction
Why Peer
Peer Configuration
Peer Selection
Multicast Cache Communication
Cache Digests
Cache Hierarchy Structures
The Cache Array Routing Protocol (CARP)

44 Introduction

Squid is particularly good at communicating with other caches and proxies. Numerous inter-cache communication protocols are supported, including ICP (Inter-Cache Protocol), Cache Digests, HTCP (Hyper-Text Cache Protocol) and CARP (Cache Array Routing Protocol). Each of these protocols has specific strengths and weaknesses; they are more suited to some circumstances than others.

In this chapter we look at each of the protocols in detail. We also look at the different ways that you can structure your cache hierarchy, and work through the config options that affect cache hierarchies.


45 Why Peer

The primary function of an inter-cache protocol is to stop object duplication, increasing hit rates. If you have a large network with widely separated caches, you may wish to store objects in each cache even if one of your other caches has it: by keeping objects close to your users, you reduce their network latency (even if you end up "wasting" disk space in the process).

Inter-branch traffic can be reduced by placing a cache at each branch. Since caches can avoid duplicating objects between them, each disk you add to a cache adds space to the overall hierarchy, increasing your hierarchy hit-rate. This is a lot better than simply having caches at branches which do not communicate with one another, since with that setup you end up with multiple copies of each cached object, one per server. Clients can also be configured to query another branch's cache if their local one goes down, adding redundancy.

If overloaded, a central cache machine can become a network bottleneck. Unlike a single cache machine, caches in a hierarchy can be close to all parts of the network; they can also handle a much larger load (with a near-linear increase in performance with each added machine). Loaded caches can thus be replaced with clusters of low-load caches, without wasting disk space.

Integrating your caches into a public cache hierarchy can increase your hit rate (since you increase your effective disk space by accessing other machines' object stores). By choosing peers carefully, you can reduce latency, or reduce costs by saving Internet bandwidth (if communicating with your peers is cheaper than going direct to the source). On the other hand, communicating with peers over loaded (or high-latency) lines can slow down your cache. It's best to check your peer response times periodically to see if the peering arrangement is beneficial. You can use the client program to check cache response times, and the cache manager (discussed in Chapter 12) to look at Squid's view of the cache.


46 Peer Configuration

First, let's look at the squid.conf options available for hierarchy configuration. We will then work through the most common hierarchy structures, so that you can see the way that the options are used.

You use the cache_peer option to configure the peers that Squid will communicate with. Other options are then used to select which peer to pass a request to.

46.1 The cache_peer Option

When communicating with a peer, Squid needs some basic information about how to talk to the machine: the hostname, what ports to send queries to, and so forth. The cache_peer config line does this. Let's look at an example line:

Example 8-1. The cache_peer tag

cache_peer cache.domain.example parent 3128 3130 default

The cache_peer option is split into five fields. The first field (cache.domain.example) is the hostname or IP address of the cache that is to be queried. The second field indicates the type of relationship, and must be set to either parent, sibling or multicast. The third field sets the HTTP port of the destination server, while the fourth sets the ICP (UDP) query port. The fifth field can contain zero or more keywords, although we only use one in the example above; the keyword default specifies that the cache will be used as the default path to the outside world. If you compiled Squid to support HTCP, your cache will automatically attempt to connect to TCP port 4827 (there is currently no option to change this port value). Cache digests are transferred via the HTTP port specified on the cache_peer line.

Here is a summary of the available cache_peer options:

proxy-only. Data retrieved from this remote cache will not be stored locally, but retrieved again on any subsequent request. By default Squid will store objects it retrieves from other caches: by having the object available locally it can return the object fast if it's ever requested again. While this is good for latency, it can be a waste of bandwidth, especially if the other cache is on the same piece of ethernet. In the examples section of this chapter, we use this option when load-balancing between two cache servers.

weight. If more than one cache server has an object (based on the result of an ICP query), Squid gets the data from the cache that responded fastest. If you want to prefer one cache over another, you can add a weight value to the preferred cache's config line. Larger values are preferred. Squid times how long each ICP request takes (in milliseconds), and divides the time by the weight value, using the cache with the smallest result. Your weight value should thus not be an unreasonably large value.

ttl. This tag is covered in the multicast section, later in this chapter.

no-query. Squid will send ICP requests to all configured caches. The response time is measured, and used to decide which parent to send the HTTP request to. There is another function of these requests: if there is no response to a request, the cache is marked down. If you are communicating with a cache that does not support ICP, you must use the no-query option: if you don't, Squid will consider that cache down, and attempt to go directly to the destination server. (If you want, you can set the ICP port on the config line to point to the echo port, port 7. Squid will then use this port to check if the machine is available. Note that you will have to configure inetd.conf to support the UDP echo port.) This option is normally used in conjunction with the default option.

default. This sets the host to be the proxy of last resort. If no other cache matches a rule (due to acl or domain filtering), this cache is used. If you have only one way of reaching the outside world, and it doesn't support ICP, you can use the default and no-query options to ensure that all queries are passed through it. If this cache is then down, the client will see an error message (without these options, Squid would attempt to route around the problem.)

round-robin. This option must be used on more than one cache_peer line to be useful. Connections to caches configured with this option are spread evenly (round-robined) among the caches. This can be used by client caches to communicate with a group of loaded parents, so that load is spread evenly. If you have multiple Internet connections, with a parent cache on each side, you can use this option to do some basic load-balancing of the connections.

multicast-responder. This option is covered in the multicast section later in this chapter.

closest-only.

no-netdb-exchange. If your cache was configured to keep ICMP (ping) timing information with the --enable-icmp configure option, your cache will attempt to retrieve the remote machine's ICMP timing information from any peers. If you don't want this to happen (or the remote cache doesn't support it), you can use the no-netdb-exchange option to stop Squid from requesting this information from the cache.

no-delay. Hits from other caches will normally be included in a client's delay-pool information. If you have two caches load-balancing, you don't want the hits from the other cache to be limited. You may also want hits from caches in a nearby hierarchy to come down at full speed, not to be limited as if they were misses. Use the no-delay option to ensure that requests come down at their full speed.

login. Caches can be configured to require usernames and passwords on accesses. To authenticate with a parent cache, you can enter a username and password using this tag. Note that the HTTP protocol makes authenticating to multiple cache servers impossible: you cannot chain together a string of proxies, each one requiring authentication. You should only use this option if this is a personal proxy.
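As a rough sketch of how a few of these keywords combine in practice (the hostnames below are invented for illustration, not taken from the text):

```
# hypothetical ICP-less parent: always usable, never queried
cache_peer parent.isp.example parent 3128 3130 default no-query

# hypothetical pair of loaded parents, with connections spread between them
cache_peer cache1.mydomain.example parent 3128 3130 round-robin
cache_peer cache2.mydomain.example parent 3128 3130 round-robin
```

The first line shows the default/no-query pairing described above; the last two show round-robin applied across more than one cache_peer line.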


47 Peer Selection

Let's say that you have only one parent cache server: the server at your ISP. In Chapter 3, we configured Squid so that the parent cache server would not be queried for internal hosts; queries to the internal machines went direct, instead of adding needless load to your parent cache (and the line between you). Squid can use access-control lists to decide which cache to talk to, rather than just the destination domain. With access lists, you can use different caches depending on the source IP address, domain, text in the URL and more. The advantages of this flexibility are not immediately obvious (even to me), but some examples are given in the remainder of this chapter. First, however, let's cover filtering by destination domain.

47.1 Selecting by Destination Domain

The cache_peer_domain tag is used to communicate with different caches depending on the domain that the request is destined for. To ensure that you don't query another cache server for your local domain, you can use the following config line:

Example 8-2. The cache_peer_domain tag

cache_peer_domain peer-cache.otherdomain.example !.mydomain.example

47.2 Selecting with Acls

Squid can also make peer selections based on the results of acl rules. The cache_peer_access line is discussed in the previous chapter. The following example could be used if you want all requests from a specific IP address range to go to a specific cache server (for accounting purposes, for example). In the following example, all requests from the 10.0.1.* range are passed to cache.domain.example, but all other requests are handled directly.

Example 8-3. Using acls to select peers

acl myNet src 10.0.0.0/255.255.255.0
acl custNet src 10.0.1.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
cache_peer cache.domain.example parent 3128 3130
cache_peer_access cache.domain.example allow custNet
cache_peer_access cache.domain.example deny all

47.2.1 Querying an Adult-Site Filtering Cache for Specific URLs

Let's say that you have a separate Adult-Site cache, which filters out URLs. The company that maintains the filter list charges by number of queries, so it's in your interest to bypass them for URLs that you know are fine. Their documentation says that you should set their machine up as your default parent; instead, you create a list of suspect words, and set the cache up to forward requests for any URL that contains one of these words to the filtering cache server. By avoiding the filtering server for the rest, you will end up missing (failing to filter) a fairly large number of sites. At the same time, however, you don't end up filtering out valid sites that do contain suspect words in the URL.

Example 8-4. Passing suspect URLs to a filtering cache

acl suspect_url url_regex "/usr/local/squid/etc/suspect-url-list"
acl all src 0.0.0.0/0.0.0.0
cache_peer filtercache.domain.example parent 3128 3130
cache_peer_access filtercache.domain.example allow suspect_url
# all other requests go direct
cache_peer_access filtercache.domain.example deny all

47.2.2 Filtering with Cache Hierarchies

ISPs in outlying regions quite often peer with large hierarchies in the USA, so as to reduce the latency of requests for far-away sites. Since it's almost certainly faster to get any local data directly from the source, they configure their caches to retrieve data for their local top-level domain directly, rather than via a USA cache.

Example 8-5. Ignoring Hierarchy Caches for a Local Top-Level Domain

acl local-tld dstdomain .za
cache_peer cache1.domain.example.us parent 3128 3130
cache_peer_access cache1.domain.example.us deny local-tld

47.2.3 The always_direct and never_direct tags

Squid checks all always_direct tags before it checks any never_direct tags. If a matching always_direct tag is found, Squid will not check the never_direct tags, but decides which cache to talk to immediately. This behavior is demonstrated by the following example; here, Squid will attempt to go directly to the machine intranet, even though the same host is also matched by the all acl.

Example 8-6. Bypassing a parent for a local machine

cache_peer cache.otherdomain.example parent 3128 3130
acl all src 0.0.0.0/0.0.0.0
acl localmachines dstdomain intranet.mydomain.example
never_direct allow all
always_direct allow localmachines

Let’s work through the logic that Squid uses in the above example, so that you can work out whichcache Squid is going to talk to when you construct your own rules.

First, let's consider a request destined for the web server intranet.mydomain.example. Squid first works through all the always_direct lines; the request is matched by the first (and only) line. The never_direct and always_direct tags are acl-operators, which means that the first match is considered. In this illustration, the matching line instructs Squid to go directly when the acl matches, so all neighboring peers are ignored for this request. If the line used the deny keyword instead of allow, Squid would simply have skipped on to checking the never_direct lines.

Now, the second case: a request arrives for an external host. Squid works through the always_direct lines, and finds that none of them match. The never_direct lines are then checked. The all acl matches the connection, so Squid marks the connection as never to be forwarded directly to the origin server. Squid then works through its list of peers, trying to find the cache that the request is best forwarded to (servers that have the object are more likely to get the request, as are servers that respond fast). The algorithm that Squid uses to decide which of its peers to use is discussed shortly.


47.2.4 prefer_direct

47.2.5 hierarchy_stoplist

Squid can be configured to avoid querying cache siblings when the requested URL contains specific words. The hierarchy_stoplist tag normally contains words that occur when the remote page is dynamically generated, such as cgi-bin or asp.
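A minimal sketch of the tag as it might appear in squid.conf (the word list below follows the common stock default; treat the exact entries as an assumption for your own setup):

```
# never query siblings for URLs that look dynamically generated
hierarchy_stoplist cgi-bin ?
```

The ? entry catches URLs containing query strings, which are almost always generated on the fly.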

47.2.6 neighbor_type_domain

You can blur the distinction between parents and siblings with this tag. Let's say that you work for a very large organization, with many regions, some in different countries.

These organizations generally have their own network infrastructure: you will install a link to a local regional office, and they will run links to a core backbone. Let's assume that you work for the regional office, and you have an Internet line that your various divisions share. You also have a link to your head-office, where they have a large cache, and their own Internet link. You peer with their cache (with them set up as a sibling), and you also peer with your local ISP's server.

When you request pages from the outside world, you treat your ISP's cache server as a parent, but when you query web servers in your own domain you want the requests to go to your head-office's cache, so that any web sites within your organization are cached. By using the neighbor_type_domain option, you can specify that requests for your local domain are to be passed to your head-office's cache as a parent, but other requests are to be passed directly.

Example 8-7. Changing the Cache Type by Destination Domain

cache_peer core-cache.mydomain.example sibling 3128 3130
cache_peer cache.isp.example parent 3128 3130
neighbor_type_domain core-cache.mydomain.example parent mydomain.example

47.3 Other Peering Options

Various other options allow you to tune values that affect your cache's interaction with hierarchies. These options all affect all peering caches (rather than individual machines).

47.3.1 miss_access

The miss_access tag is an acl-operator. This tag has already been covered in the acls chapter (Chapter 6), but is covered here again for completeness. The miss_access tag allows you to create a list of caches which are only allowed to retrieve hits from your cache. If they request an object that is a miss, Squid will return an error page denying them access. If the example below is not immediately clear, please refer to Chapter 6 for more information.

Example 8-8.

acl all src 0.0.0.0/0.0.0.0
acl friendly_company src 10.2.0.3/255.255.255.255
http_access allow friendly_company
icp_access allow friendly_company
# This line stops the machine 10.2.0.3 from fetching misses
# through our cache
miss_access deny friendly_company
miss_access allow all


47.3.2 dead_peer_timeout

If a peer cache has not responded to an ICP request for dead_peer_timeout seconds, the cache will be marked as down, and the object will be retrieved from somewhere else (probably directly from the source.)
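A sketch of the tag as it might appear in squid.conf (10 seconds is, as far as I recall, the stock default; treat the exact value as an assumption):

```
# consider a peer dead after 10 seconds of ICP silence
dead_peer_timeout 10 seconds
```

Setting this too low makes Squid give up on slow-but-alive peers; too high, and clients wait needlessly when a peer really is down.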

47.3.3 icp_hit_stale

This option makes Squid return ICP hits for objects that are cached but stale. Turning this option on can cause problems if you peer with anyone, since a sibling may then request an object that your cache first has to revalidate.


48 Multicast Cache Communication

Cache digests are in some ways a replacement for multicast cache peering. There are some advantages to cache digests: they are handled at the Squid level (so you don't have to fiddle with kernel multicast settings and so forth), and they add significantly less latency (finding out if a cache has an object simply involves checking an in-memory bit-array, which is significantly faster than checking across the network).

First, though, let's cover some terminology. Most people are familiar with the term broadcast, where data is sent from one host to all hosts on the local network. Broadcasts are normally used to discover things, not for general inter-machine transfer: a machine will send out a broadcast ARP request to try and find the hardware address that a specific IP address belongs to. You can also send ping packets to the broadcast address, and find machines on the local network when they respond. Broadcasts only work across physical segments (or bridged/switched networks), so an ARP request doesn't go to every machine on the Internet.

A unicast packet is the complete opposite: one machine is talking to only one other machine. All TCP connections are unicast, since they can only have one destination host for each source host. UDP packets are almost always unicast too, though they can be sent to the broadcast address so that they reach every single machine in some cases.

A multicast packet goes from one machine to one or more others. The difference between a multicast packet and a broadcast packet is that hosts receiving multicast packets can be on different LANs, and that each multicast data-stream is only transmitted between networks once, not once per machine on the remote network. Rather than each machine connecting to a video server, for example, the multicast data is streamed per-network, and multiple machines just listen in on the multicast data once it's on the network.

This efficient use of bandwidth is perfect for large groups of caches. If you have more than one server (for load-distribution, say), and someone wants to peer with you, they will have to configure their server to send one ICP packet to each of your caches. If Squid gets an ICP request from somewhere, it doesn't check with all of its peers to see if they have the object; this "check with my peers" behavior only happens when an HTTP request arrives. If you have 5 caches, anyone who wants to find out if your hierarchy has an object will have to send 5 ICP requests (or treat you as a parent, so that your caches check with one another). This is a real waste of bandwidth. With a multicast network, though, the remote cache would only send one ICP request, destined for your multicast address. Routers between you would only transfer one packet (instead of 5), saving the duplication of requests. Once on your network, each machine would pick up one packet, and reply with its answer.

Multicast packets are also useful on local networks, if you have the right network cards. If you have a large group of caches on the same network, you can end up with a lot of local traffic. Each request that a cache receives prompts one ICP request to all the other local caches, swamping the local network with small packets (and their replies). A multicast packet, on the other hand, is a kind of broadcast to the machines on the local network. They will each receive a copy of the packet, although only one went out onto the wire. If you have a good ethernet card, the card will handle a fair amount of the filtering (some cards may have to be put into promiscuous mode to pick up all the packets, which can cause load on the machine: make sure that the card you buy supports hardware multicast filters). This solution is still not linearly scalable, however, since the reply packets can easily become the bottleneck by themselves.

48.1 Getting your machine ready for Multicast

The kernel's IP stack (the piece of kernel code that handles IP networking) needs to look out for multicast packets, otherwise they will be discarded (either by the network card or the lower levels of the IP stack.) Your kernel may already have multicast support, or you will have to turn it on. Doing this is, unfortunately, beyond the scope of this book, and you may have to root around for a howto guide somewhere.

Once your machine is set up to receive multicast packets, you need your machines to talk to one another. You can either join the mbone (a virtual multicast backbone), or set up an internal multicast network. Joining the mbone could be a good thing anyway, since you get access to other services. You must be sure not to use a random set of multicast IP addresses, since they may belong to someone else. You can get your own IP range from the people at the mbone.

An outgoing multicast packet has a ttl (Time To Live) value, which is used to ensure that loops are not created. Each time a packet passes through a router, the router decrements this ttl value, and the value is then checked. Once the value reaches zero, the packet is dropped. If you want multicast packets to stay on your local network, you would set the ttl value to 1. The first router to see the packet would decrement the ttl, discover it had reached zero, and discard the packet. This value gives you a level of control over how many multicast routers will see the packet. You should set this value carefully, so that you limit packets to your local network or immediate multicast peers (larger multicast groups are seldom of any use: they generate too many responses, and when geographically dispersed, may simply add latency. You also don't want crackers picking up all your ICP requests by joining the appropriate multicast group.)

Various multicast debugging tools are available. One of the most useful is mtrace, which is effectivelya traceroute program for multicast connections. This program should help you choose the right ttl value.

48.2 Querying a Multicast Cache

The cache_peer option traditionally can have two types of cache: a parent and a sibling. If you are querying a set of multicast caches, you need to use a different tag, the multicast cache type. When you send a multicast request to a cache, each of the servers in the group will send you a response packet (from their real IP address.) Squid discards unexpected ICP responses by default, and since it can't automatically determine which ICP replies are valid, you will have to add lines to the Squid config file that stop it rejecting packets from hosts in the multicast group.

In the following example, the multicast group 224.0.1.20 consists of three hosts, at IP addresses 10.11.12.1, 10.11.13.1 and 10.11.14.1. These hosts are quite close to your cache, so the ttl value is set to 5.

Example 8-9. Sending Queries to a Multicast Server

cache_peer 224.0.1.20 multicast 3128 3130 ttl=5
# these servers belong to the 224.0.1.20 multicast group
cache_peer 10.11.12.1 sibling 3128 3130 multicast-responder
cache_peer 10.11.13.1 sibling 3128 3130 multicast-responder
cache_peer 10.11.14.1 sibling 3128 3130 multicast-responder

- 122 -

48.1 Getting your machine ready for Multicast

Page 125: Squid Guide

48.3 Accepting Multicast Queries: The mcast_groups option

As a multicast server, Squid needs to listen out for the right packets. Since you can have more than one multicast group on a network, you need to configure Squid to listen to the right multicast group (the IP that you have allocated to Squid.) The following (very simple) example is from the config of the server machine 10.11.12.1 in the example above.

Example 8-10. Listening for Multicast Queries

mcast_groups 224.0.1.20

48.4 Other Multicast Cache Options

48.4.1 The mcast_icp_query_timeout Option

As you may recall, Squid will wait for up to dead_peer_timeout seconds after sending out an ICP request before deciding to ignore a peer. With a multicast group, peers can leave and join at will, and it should make no difference to a client. This presents a problem for Squid: it can't wait for a number of seconds each time (what if the caches are on the same network, and responses come back in milliseconds: the waiting just adds latency.) Squid gets around this problem by occasionally sending ICP probes to the multicast address. Each host in the group responds to the probe, and Squid will know how many machines are currently in the group. When sending a real request, Squid will wait until it gets at least as many responses as were returned in the last probe: if more arrive, great. If fewer arrive, though, Squid will wait until the dead_peer_timeout value is reached. If there is still no reply, Squid marks that peer as down, so that all connections are not held up by one peer.

When Squid sends out a multicast query, it will wait at most mcast_icp_query_timeout for replies (it's perfectly possible that one day a peer will be on the moon: and it would probably be a bad idea to peer with that cache seriously, unless it was a parent for the Mars top-level domain.) It's unlikely that you will want to increase this value, but you may wish to drop it, so that only reasonably speedy replies are considered.
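A hedged sketch of the tag (in the squid.conf syntax I am familiar with, the value is given in milliseconds, and 2000 is, I believe, the stock default; treat both as assumptions):

```
# wait at most two seconds for multicast ICP replies
mcast_icp_query_timeout 2000
```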


49 Cache Digests

Cache digests are one of the latest peering developments. Currently they are only supported by Squid, and they have to be turned on at compile time.

Squid keeps its "list" of objects in an in-memory hash. The hash table (which is based on MD5) helps Squid find out if an object is in the cache without using huge amounts of memory or reading files on disk. Periodically Squid takes this table of objects and summarizes it into a small bitmap (suitable for transfer across a modem). If a bit in the map is on, it means that the object is in the store; if it's off, the object is not. This bitmap/summary is available to other caches, which connect on the HTTP port and request a special URL. If the client cache (the one that just collected the bitmap) wants to know if the server has an object, it simply performs the same mathematical function that generated the values in the bitmap. If the server has the object, the appropriate bit in the bitmap will be set.

There are various advantages to this idea: if you have a set of loaded caches, you will find that inter-cache communication can use significant amounts of bandwidth. Each request to one cache sparks off a series of requests to all neighboring caches. Each of these queries also causes some server load: the networking stack has to deal with these extra packets, for one thing. With cache digests, however, load is reduced. The cache digest is generated only once every 10 minutes (the exact value is tunable). The transfer of the digest thus happens fairly seldom, even if the bitmap is rather large (a few hundred kilobytes is common.) If you were to run 10 caches on the same physical network, however, with each ICP request being a few hundred bytes, the numbers add up. This network load reduction can give your cache time to breathe too, since the kernel will not have to deal with as many small packets.

ICP packets are incredibly simple: they essentially contain only the requested URL. Today, however, a lot of data is transferred in the headers of a request. The contents of a static URL may differ depending on the browser that a user uses, cookie values and more. Since the ICP packet only contains the URL, Squid can only check the URL to see if it has the object, not both the headers and the URL. This can (very occasionally) cause strange problems, with the wrong pages being served. With cache digests, however, the bitmap value depends on both the headers AND the URL, which stops these strange hits of objects that are actually generated on-the-fly (normally these pages contain cgi-bin in their path, but some don't, and cause problems.)

Cache digests can generate a small percentage of false hits: since the list of objects is updated only every 10 minutes, your cache could expire an object a second after you download the summarized index. For the next ten minutes, the client cache would believe your server has data that it doesn't. Some five percent of hits may be false, but they are simply retrieved directly from the origin server if this turns out to be the case.


50 Cache Hierarchy Structures

Deciding what hierarchy structure to use is difficult. Not only that, but it's quite often very difficult to change, since you can have thousands of clients accessing you directly, and even more through your peers.

Here, I cover the most common cache-peer architectures. We start off with the simplest setup that's considered peering: two caches talking to one another as siblings. I am going to try and cover all the issues with this simple setup, and then move on to larger cache meshes.

50.1 Two Peering Caches

We have two caches: your cache and their cache. You have a nice fast link between them, with low latency, both caches have quite a lot of disk space, and they aren't going to be running into problems with load anytime soon. Let's look at the peering options you have:

ICP peering. ICP is the obvious choice here; the low-latency line means that checking with the other cache doesn't cause significant added latency, and the server on the other side isn't going to become a limiting factor soon.

Multicast ICP peering. Multicast is not really useful here. Multicast is useful only when your cache is talking to lots of other caches. With one cache on the other side, the setup time for multicast doesn't seem worth it. If you got some other benefit from the multicast configuration (video conferencing, for example), then things would become more worthwhile. In the meantime, however, there is no significant bandwidth or load saving advantage to using multicast.

Cache Digests. Cache digests are normally quite large. By the sounds of this, the caches are not very busy, and transferring digest summaries between the caches may use more bandwidth than you think. ICP queries are only sent when an object is requested, but cache digests are retrieved automatically throughout the day and night. If the cache is transferring less than the digest size in a ten-minute period, you should probably use ICP. If the line were high-latency, you should consider digests more carefully: high-latency lines are normally better at bulk-data transfer than at sending lots of small packets. With a high-latency link, Squid can spend so much time waiting for returning ICP packets that browsing feels slow. With cache digests, the cache would know if the remote side has the object or not, at the cost of more bandwidth.

50.1.1 Things to Watch Out For

The most common problem with this kind of setup is the "continuous object exchange". Let's say that your cache has an object. A user of their cache wants the object, so they retrieve it from you. A few hours later you expire the object. The next day, one of your users requests the object. Your cache checks with the other cache and finds that it does, indeed, have the object (it doesn't realize that it was the one that retrieved the object in the first place). It retrieves the object. Later on the whole process repeats itself, with their cache getting the object from you again. To stop this from happening, you may have to use the proxy-only option on the cache_peer line on both caches. This way, the caches simply retrieve their data from the fast sibling cache each time: if that cache expires the object, the object cannot return from the other cache, since it was never saved there.
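To make this concrete, a sketch of the relevant line on one side of the pair (the hostname is invented; the other cache carries the mirror-image line):

```
# never store what the sibling serves us, so its expiry is authoritative
cache_peer their-cache.otherdomain.example sibling 3128 3130 proxy-only
```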

- 125 -

50 Cache Hierarchy StructuresCache Hierarchy Structures

Page 128: Squid Guide

With ICP, there is a chance that a hit refers to an object that is dynamically generated (even if the path does not say so). Cache digests fix this problem, which may make their extra bandwidth usage worthwhile.

50.2 Trees

The traditional cache hierarchy structure involves lots of small servers (with their own disk space, each holding the most common objects), which query another set of large parent servers (there can even be only one large server.) These large servers then query the outside on the client cache's behalf. The large servers keep a copy of the object so that other internal caches requesting the page get it from them. Generally, the little servers have a small amount of disk space, and are connected to the large servers by quite small lines.

This structure generally works well, as long as you can stop the top-level servers from becoming overloaded. If these machines have problems, all performance will suffer.

Client caches generally do not talk to one another at all. The parent cache server should have any object that the lower-down cache may have (since it fetched the object on behalf of the lower-down cache). It's invariably faster to communicate with the head-office (where the core servers would be situated) than another region (where another sibling cache is kept).

In this case, the smaller servers may as well treat the core servers as default parents, even using the no-query option, to reduce cache latency. If the head-office is unreachable, it's quite likely that things may be unusable altogether. (If, on the other hand, your regional offices have their own Internet lines, you can configure the cache as a normal parent: this way Squid will detect that the core servers are down, and try to go direct. If you each have your own Internet link, though, there may not be a reason to use a tree structure. You might want to look at the mesh section instead, which follows shortly.)

To avoid overloading one server, you can use the round-robin option on the cache_peer lines for each core server. This way, the load on each machine should be spread evenly.
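For example (hostnames hypothetical), two equally powerful core parents sharing the load in turn:

```
# round-robin: successive requests alternate between these parents.
cache_peer core1.head-office.example parent 3128 3130 round-robin
cache_peer core2.head-office.example parent 3128 3130 round-robin
```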

50.3 Meshes

Large hierarchies generally use either a tree structure, or they are true meshes. A true mesh considers all machines equal: there is no set of large root machines, mainly since they are almost all large machines. Multicast ICP and Cache digests allow large meshes to scale well, but some meshes have been around for a long time, and are only using vanilla ICP.

Cache digests seem to be the best option for large mesh setups these days. They involve bulk data transfer, but as the average mesh size increases, machines would have to be more and more powerful to deal with the number of ICP queries coming in. Instead of trying to deal with so many small packets, it is almost certainly better to do a larger transfer every 10 minutes. This way, machines only have to check their local RAM to see which machines have the object.

Pure multicast cache meshes are another alternative: unfortunately there are still many reply packets generated, but it still effectively halves the number of packets flung around the network.

50.4 Load Balancing Servers

Sometimes, a single server cannot handle the load required. DNS or CARP load balancing will allow you to split the load between two (or more) machines.


DNS load balancing is the simplest option: in your DNS file, you simply add two A records for the cache's hostname (you did use a hostname for the cache when you configured all those thousands of browsers like I told you, right?) The order that the DNS server returns the names in is continuously and randomly switched, and the client requesting the lookup will connect to a random server. These server machines can be set up to communicate with one another as peers. By using the proxy-only option, you reduce duplication of objects between the machines, saving disk space (and, hopefully, increasing your hit rate.)
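A zone-file sketch of the idea (the name and addresses are hypothetical): two A records for the same hostname, so successive lookups connect clients to different cache machines.

```
; both addresses are returned for cache.mydomain.example;
; most DNS servers rotate the order on each response
cache    IN    A    10.0.0.10
cache    IN    A    10.0.0.11
```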

There are other load-balancing options. If you have client caches accessing the overloaded server (rather than client PCs), you can configure Squid on these machines with the round-robin option on the cache_peer lines. You could also use CARP (the Cache Array Routing Protocol) to split the load unevenly (if you have one very powerful machine and two less powerful machines, you can use CARP to load the fast cache twice as much as the other machines).


Squid: A User's Guide, Chapter 8. Cache Hierarchies

51 The Cache Array Routing Protocol (CARP)

The CARP protocol uses a hash function to decide which cache a request is to be forwarded to. When a request is to be sent out, the code takes the URL requested and feeds it through a formula that essentially generates a large number from the text of the URL. A different URL (even if it differs by only one character) is likely to end up as a very different number (it won't, for example, differ by one). If you take 50 URLs and put them through the function, the numbers generated are going to be spread far apart from one another, and would be spread evenly across a graph. The numbers generated, however, all fit in a certain range. Because the numbers are spread across the range evenly, we can split the range into two, and the same number of URLs will have ended up in the first half as the second.

Let's say that we create a rule that effectively says: "I have two caches. Whenever I receive a request, I want to pass it to one of these caches. I know that any number generated by the hash function will be less than X, and that numbers are as likely to fall above one-half X as below. By sending all requests that hash to less than one-half X to cache one, and the remaining requests to cache two, the load should be even."

Now for some terminology: the total range of numbers is split into equally large number ranges (called buckets).

Let's say that we have two caches, again. This time, though, cache one is able to handle twice the load of cache two. If we split the hash space into three ranges, and allocate buckets one and three to cache one (and bucket two to cache two), a URL will have twice the chance of going to cache one as it does of going to cache two.
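The bucket arithmetic can be sketched in a few lines of Python. This is an illustrative toy, not Squid's actual CARP hash function: each URL is hashed to a large number, and two of the three equal buckets are allocated to the more powerful cache.

```python
import hashlib

# Three equal buckets: buckets one and three map to cache one,
# bucket two maps to cache two.
BUCKETS = ["cache-one", "cache-two", "cache-one"]

def pick_cache(url):
    # Generate a large, evenly spread number from the text of the URL.
    # URLs differing by one character still produce very different numbers.
    n = int(hashlib.md5(url.encode()).hexdigest(), 16)
    return BUCKETS[n % len(BUCKETS)]  # which bucket the number falls into
```

Feeding a few thousand URLs through pick_cache sends roughly twice as many to cache-one as to cache-two, and a given URL always maps to the same cache.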

Squid caches can talk to parent caches using CARP balancing if CARP was enabled when the source was configured (using the ./configure --enable-carp command.)

The load-factor values on all cache_peer lines must add up to 1.0. The below example splits 70% of the load onto the machine bigcache.mydomain.example, leaving the other 30% up to the other cache.

Example 8-11. Using CARP Load Factor variables

cache_peer smallcache.mydomain.example parent 3128 3130 carp-load-factor=0.3
cache_peer bigcache.mydomain.example parent 3128 3130 carp-load-factor=0.70

Now that your cache is integrated into a hierarchy (or is a hierarchy!), we can move to the next section. Accelerator mode allows your cache to function as a front-end for a real web server, speeding up web page access on those old servers.

Transparent caches effectively accelerate web servers from a distance (the code, at least, to perform both functions is effectively the same.) If you are going to do transparent proxying, I suggest that you read the next two chapters. If you aren't interested in either of these Squid features, your Squid installation should be up and running. The remainder of the book (Section III) covers cache maintenance and debugging.


52 Chapter 9. Accelerator Mode

Table of Contents
When to use Accelerator Mode
Accelerator Configuration Options
Related Configuration Options
Example Configurations

Some cache servers can act as web servers (or vice versa). These servers accept requests in both the standard web-request format (where only the path and filename are given), and in the proxy-specific format (where the entire URL is given).

The Squid designers have decided not to let Squid be configured in this way. This avoids various complicated issues, and reduces code complexity, making Squid more reliable. All in all, Squid is a web cache, not a web server.

By adding a translation layer into Squid, we can accept (and understand) web requests, since the format is essentially the same. The additional layer can re-write incoming web requests, changing the destination server and port. This re-written request is then treated as a normal request: the remote server is contacted, the data requested and the results cached. This lets you get Squid to pretend to be a web server, re-writing requests so that they are passed on to some other web server.

53 When to use Accelerator Mode

Accelerator mode should not be enabled unless you need it. There are a limited set of circumstances in which it is needed, so if one of the following setups applies to you, you should have a look at the remainder of this chapter.

53.1 Acceleration of a slow server

Squid can sit in front of a slow server, caching the server's results and passing the data on to clients. This is very useful when the origin server (the server that is actually serving the original data) is very slow, or is across a slow line. If the origin server is across a slow line, you could just move the origin server closer to the clients, but this may not be possible for administrative reasons.

53.2 Replacing a combination cache/web server with Squid

If you are in the process of replacing a combination cache/web server, your client machines may be configured to talk to the cache on port 80. Rather than reconfiguring all of your clients, you can get Squid to listen for incoming connections on port 80 (moving the real server to another port or server.) When Squid finds that it's received a web request, it will forward the request to the origin server, so that the machine continues to function as both a web and cache server.


53.3 Transparent Caching

Squid can be configured to magically intercept outgoing web requests and cache them. Since the outgoing requests are in web-server format, it needs to translate them to cache-format requests. Transparent caching is covered in detail in the following chapter.

53.4 Security

Squid can be placed in front of an insecure web server to protect it from the outside world: not merely to stop unwanted clients from accessing the machine, but also to stop people from exploiting bugs in the server code.


54 Accelerator Configuration Options

The list of accelerator options is short, and setup is fairly simple. Once we have a working accelerator cache, you will have to create the appropriate access-list rules. (Since you probably want people outside your local network to be able to access your server, you cannot simply use source-IP address rulesets anymore.)

54.1 The httpd_accel_host option

You will need to set the hostname of the accelerated server here. It's only possible to have one destination server, so you can only have one occurrence of this line. If you are going to accelerate more than one server, or transparently cache traffic (as described in the next chapter), you will have to use the word virtual instead of a hostname here.

54.2 The httpd_accel_port option

Accelerated requests can only be forwarded to one port: there is no table that associates accelerated hosts and a destination port. Squid will connect to the port that you set the httpd_accel_port value to. When acting as a front-end for a web server on the local machine, you will set up the web server to listen for connections on a different port (8000, for example), and set this squid.conf option to match the same value. If, on the other hand, you are forwarding requests to a set of slow backend servers, they will almost certainly be listening on port 80 (the default web-server port), and this option will need to be set to 80.

54.3 The httpd_accel_with_proxy option

If you use the httpd_accel_host option, Squid will stop recognizing cache requests. So that your cache can function both as an accelerator and as a web cache, you will need to set the httpd_accel_with_proxy option to on.

54.4 The httpd_accel_uses_host_header option

A normal HTTP request consists of three values: the type of transfer (normally a GET, which is used for downloads); the path and filename to be retrieved (or executed, in the case of a cgi program); and the HTTP version.

This layout is fine if you only have one web site on a machine. On systems where you have more than one site, though, it makes life difficult: the request does not contain enough information, since it doesn't include information about the destination domain. Most operating systems allow you to have IP aliases, where you have more than one IP address per network card. By allocating one IP per hosted site, you could run one web server per IP address. Once the programs were made more efficient, one running program could act as a server for many sites: the only requirement was that you had one IP address per domain. Server programs would find out which of the IP addresses clients were connected to, and would serve data from different directories for each IP.


There are a limited number of IP addresses, and they are fast running out. Some systems also have a limited number of IP aliases, which means that you cannot host more than a (fairly arbitrary) number of web sites on a machine. If the client were to pass the destination host name along with the path and filename, the web server could listen to only one IP address, and would find the right destination directories by looking in a simple hostname table.

From version 1.1 on, the HTTP standard supports a special Host header, which is passed along with every outgoing request. This header also makes transparent caching and acceleration easier: by pulling the host value out of the headers, Squid can translate a standard HTTP request to a cache-specific HTTP request, which can then be handled by the standard Squid code. Turning on the httpd_accel_uses_host_header option enables this translation. You will need to use this option when doing transparent caching.
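For illustration (the hostname is hypothetical), a web-server-format request with a Host header carries enough information for this translation:

```
GET /index.html HTTP/1.1
Host: www.mydomain.example
```

Squid can combine the Host value and the path into the full URL http://www.mydomain.example/index.html, and then handle the request exactly like a proxy-format request.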

It's important to note that acls are checked before this translation. You must combine this option with strict source-address checks, so you cannot use this option to accelerate multiple backend servers (this is certain to change in a later version of Squid).


55 Related Configuration Options

So far, we have only covered the config options that directly relate to accelerator mode.

55.1 The redirect_rewrites_host_header option

55.2 Refresh patterns

Accelerating a slow web server is only useful if the cache can keep copies of returned pages (so that it can avoid contacting the back-end server.) Since you know about the backend server, you can specify refresh patterns that suit the machine exactly. Refresh patterns aren't covered here (they are covered in-depth in Chapter 11), but it's worth looking at how your site changes, and tuning your refresh patterns to match.

If, on the other hand, you are simply using accelerator mode to replace a combination cache (or to act as a secure front-end for another server), you can disable caching of that site altogether: otherwise you simply end up duplicating data (once on the origin site, once for the cached copy) with no benefit.

55.3 Access Control

Presumably you will want people from outside your network to be able to access the web server that Squid is accelerating. If you have based your access lists on the examples in this book, you will find that machines on the outside cannot access the site being accelerated. The accelerated request is treated exactly like a normal http request, so people accessing the site from the outside world will be rejected, since your acl rules deny access from IPs that are not on your network. By using the dst acl type, you can add specific exclusions to your access lists to allow requests to the accelerated host.

Example 9-1. Before Accelerator Configuration

acl all src 0.0.0.0/0.0.0.0
acl myNet src 10.0.0.0/255.255.255.0
http_access allow myNet
http_access deny all

In the following example, we have changed the config so that the first rule matches (and allows) any request to the machine at IP 10.0.0.5, the accelerated machine. If we did not have the port acl in the below rules, someone could explicitly specify a non-standard port in a request. Leaving out this rule could let a system cracker poke around the system with requests for things like http://server.mydomain.example:25.

Example 9-2. After Accelerator Configuration

# the remote server is at 10.0.0.5, port 80
httpd_accel_host 10.0.0.5
httpd_accel_port 80
acl all src 0.0.0.0/0.0.0.0
acl myNet src 10.0.0.0/255.255.255.0


acl acceleratedHost dst 10.0.0.5
acl acceleratedPort port 80
# requests must be to the right host AND the right port to be allowed:
http_access allow acceleratedHost acceleratedPort
# if they aren't accelerated requests, are they at least from my
# network?
http_access allow myNet
http_access deny all


56 Example Configurations

Let's cover two example setups: one, where you are simply using Squid's accelerator function so that the machine has both a web server and a cache server on port 80; two, where you are using Squid as an accelerator to speed up a slow machine.

56.1 Replacing a Combination Web/Cache server

First, let's cover the most common use of accelerator mode: replacing a combination web/cache server with Squid. When Squid is acting as an accelerator (speeding up a slow web server), Squid will accept requests on port 80 (on any IP address) and pass them to a web server on a different machine (also on port 80). Since it's unlikely that you want to use two machines where you can use one (unless you are changing to Squid due to server overload), we will need to configure Squid to pass requests to the local machine.

Squid will need to accept incoming requests on port 80 (using the http_port option), and pass the requests on to the web server on another port (since only one process can listen for requests on port 80 at a time). I normally get web servers to listen for requests on port 8000.

Since you want Squid to function both as an accelerator and as a cache server, you will need to use the httpd_accel_with_proxy option.

The accelerated host in this example is the local machine: there is almost certainly no reason to cache results from this server. I could have used an extremely conservative refresh_pattern in the below example, but instead I decided to use the no_cache tag: this way I can make use of my predefined acl. The always_direct tag in the below example will be very useful if you have a peer cache: you don't want the request passed on to a peer machine.

Example 9-3. Forwarding Web Requests to a Server on the Same Machine

http_port 80
# forward incoming requests to localhost, port 8000
httpd_accel_host 127.0.0.1
acl acceleratedHost dst 127.0.0.1/255.255.255.255
httpd_accel_port 8000
acl acceleratedPort port 8000
httpd_accel_with_proxy on
acl all src 0.0.0.0/0.0.0.0
acl myNet src 10.0.0.0/255.255.255.0
# we don't want to cache localhost: it's a waste of disk space
no_cache deny acceleratedHost
# we also don't want requests for localhost passed on to a peer
always_direct allow acceleratedHost
# Allow requests when they are to the accelerated machine AND to the
# right port
http_access allow acceleratedHost acceleratedPort
http_access allow myNet
http_access deny all


56.2 Accelerating Requests to a Slow Server

When accelerating a slow server, you may find that communicating with peer caches is faster than communicating with the accelerated host. In the following example, we remove all the options that stop Squid from caching the server's results. We also assume that the accelerated host is listening on port 80: since it is on a different machine, there is no conflict with Squid listening on the same port.

Once you have tested that connecting to Squid brings up the correct pages, you will have to change the DNS entry to point to your cache server.

Example 9-4. Accelerating a Slow Server

http_port 80
# forward incoming requests to 10.0.0.5, port 80
httpd_accel_host 10.0.0.5
acl acceleratedHost dst 10.0.0.5/255.255.255.255
httpd_accel_port 80
acl acceleratedPort port 80
httpd_accel_with_proxy on
acl all src 0.0.0.0/0.0.0.0
acl myNet src 10.0.0.0/255.255.255.0
# since we want to try get accelerated pages through peers, and we
# want to cache the results, we remove the no_cache and always_direct
# options
# Allow requests when they are to the accelerated machine AND to the
# right port
http_access allow acceleratedHost acceleratedPort
http_access allow myNet
http_access deny all


57 Chapter 10. Transparent Caching

Table of Contents
The Problem with Transparency
The Transparent Caching Process
Network Layout
Filtering Traffic
Kernel Redirection (not done)
Squid Settings (not done)

When you implement disk caching in an Operating System kernel, all applications automatically see the benefit: the data caching happens without their knowledge. Since the Operating System ensures that on-disk copies of data are always the same as the cached copies, the data that an application reads is never out of date.

With web caching, however, there is a chance that the original data can change without the cache knowing. Squid uses refresh patterns (described in Chapter 11) to decide when cached objects are to be removed. If these rules are too aggressive, you could end up serving stale objects to clients. Even if these rules are perfect, an incorrectly configured source-server could get Squid to return old objects. Because users could retrieve an out of date page, you should not implement caching without their knowledge.

Squid can be configured to act transparently. In this mode, clients will not configure their browsers to access the cache, but Squid will transparently pick up the appropriate packets and cache requests. This solves the biggest problem with caching: getting users to use the cache server. Users hardly ever know how to configure their browsers to use a cache, which means that support staff have to spend time with every user getting them to change their settings. Some users are worried about their privacy, or they think (since it's a host between them and the Internet) that the cache is slower (certainly not the case, as a few tests with the client program will show).

However: transparent caching isn't really transparent. The cache setup is transparent, but using the cache isn't. Users will notice a difference in error messages, and even the progress bars that browsers show can act differently.

58 The Problem with Transparency

When Squid transparently caches a site, the source IP address of the connection changes: the request comes from the cache server rather than the client machine. This can play havoc with web sites that use IP-address authentication (such sites only allow requests from a small set of IP addresses, rather than authenticating requests with a name and password.)

Since the cache changes the source IP address of the connection, some servers may deny legitimate users access. In many cases, this will cost users money (they may pay for the service, or use the information on that site to make money.)


If you know your network inside out, and know exactly who would be accessing a site like this, there is probably no problem with using transparent caching. If this is the case, though, it might be easier to simply change all of your users' settings.

Dialup ISPs generally have little problem implementing transparent caching, since dialup customers almost always get a different IP address whenever they connect. They thus cannot access sites which require a static IP address, so when requests start coming from the cache server there is no problem.

ISPs which transparently cache leased-line customers are the most likely to have problems with IP-authenticating servers. If you are phasing transparency in for such an ISP, you must make sure that your customers know all the implications. They must know how to refresh pages (and who to tell if they find such out-of-date pages, so that the Squid refresh rules can be changed), and how the source IP address is going to change. You must not simply install the transparent cache and hope for the best!


59 The Transparent Caching Process

Let's look at what happens when you use transparency. First, though, you need to know something of what happens to IP packets at the ethernet level.

59.1 Some Routing Basics

An ethernet IP packet contains four addresses:

The destination mac address. When a packet is transmitted down the ethernet wire, all ethernet cards on the network will check the destination mac address value. Each ethernet card has a (supposedly) unique mac address. If the ethernet card's mac address matches the destination mac address of the packet, the ethernet card will pass the packet to the operating system, which will then deal with the contents of the packet.

The source mac address: set by the sending ethernet card

The destination IP address: set by the application sending the packet.

The source IP address: set by the operating system of the source host (or, in some circumstances, the application on the source machine.) This value is not changed by routers along the way: routers re-forward the contents of the packet intact, and change only the destination mac addresses. If the source address were changed by each router, the routers would have to keep state on all the connections passing through them. This way, a router can simply forward packets and forget about them.

When a host wants to communicate with a machine that isn't on the local network, it uses a smart router to find the path to that network. When the client wants to send a packet through a router, the client sets the destination mac address of the packet to the router's interface, and sets the IP destination address to the required end host. It's important to know that the destination IP address of the packet isn't set to the router's IP address: only the mac address is changed. When a router accepts a packet, it decides which host to forward it to, based on its routing tables. The router then sets the destination mac address of the packet to the next-hop router's ethernet address, and sends the packet to that machine. The remote host then repeats this process: if it's the destination machine, it uses the packet, but if it's another router, it will try and move the packet closer to its final destination.

59.2 Packet Flow with Transparent Caches

Transparent caches essentially look out for TCP connections destined for port 80. The cache server will intercept these packets, convert them to a standard TCP stream and pass them to Squid. When Squid sends reply data to the client, the Operating System fakes the source address of the packets, so that the client believes it is connected to the server that it originally sent the request to.


You can't simply plug a transparent cache into the network and get it to transparently cache pages. The cache server needs to be in a position where it can fake the reply packets (without the real server interrupting the conversation and confusing things.) The server needs to be the gateway to the outside world.

Let's look at the simplest transparent cache setup. The client machine (10.0.0.50) treats the cache server's internal (10.0.0.1) interface as its default gateway. This way, all packets arrive on the cache server before they reach the rest of the Internet. The filter looks for port 80 packets, and passes them to Squid, but allows all other packets to be passed to the routing layer, which passes the packets to the router's IP (172.31.0.2).
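On Linux, for example, the filtering step might look something like the following netfilter rule (the interface name and Squid's listening port are assumptions here, and this guide's own kernel-redirection chapter is not yet written):

```
# Intercept TCP connections to port 80 arriving on the internal
# interface, and redirect them to Squid, assumed to listen on port 3128.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
    -j REDIRECT --to-port 3128
```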

Once the connection is established, Squid needs to communicate with the client. Squid doesn't do any strange packet assembly: that's left to the transparency layer. When Squid sends reply data to the client, the kernel automatically changes the packet's from-address, so it appears to the client that the server is just routing the requests from the outside world. When Squid connects to the remote server, however, the connection comes from the external interface of the cache server (172.31.0.1, in the example.) This is where IP-authentication breaks: the request comes from the cache, rather than from the client's real address (10.0.0.50).

Effectively, we need to get four things right for transparency to work:

Correct network layout

Filtering out the appropriate packets

Kernel Transparency: redirecting port 80 connections to Squid

Squid settings. Squid needs to know that it’s supposed to act in transparent mode.


60 Network Layout

For traffic to be filtered, all network traffic needs to pass through a filter device. On smaller networks, the cache server can do the filtering (as it does in the above example network), but many people are now opting for secondary filter machines. These filter machines can be routers, Unix machines or even so-called layer four switches. These filtering machines allow for automatic failover (in case of cache failure) and load balancing. At the same time, the CPU load on the cache machine is vastly reduced: the CPU doesn't have to examine every passing packet and do caching.

Sometimes, data is load-balanced across multiple Internet lines. You must ensure that all outgoing data is routed through the cache machine: the outgoing packets have to pass through the filter server, so if you are load-balancing outgoing traffic across more than one line, you may have to restructure your network so that packets pass through the filter server before they reach the outside world.


61 Filtering Traffic

Traffic filtering can now be done by numerous devices. A short time ago, only Unix servers (with special modifications) could sort traffic streams by destination port. These days, however, routers, switches and (of course) Unix machines can filter IP traffic.

Which device you use to do your filtering depends on your load. For light loads, your cache server can do everything: the filtering, the redirection and the transparent caching. For heavier loads, you may want to use a separate Unix machine, or you may want to get your router to filter the streams for you (only certain routers can do filtering fast at the hardware level: doing filtering on other routers will add additional load to the CPU). You could even get a so-called layer-four switch, which can do filtering at gigabit ethernet speeds.

61.1 Unix machines

Some Unix systems have built-in support for filtering by destination TCP port. Since very few people do filtering like this, many of the free Unix-like systems will need their kernel recompiled to include this functionality. Commercial systems may not support transparency, but if you are running a BSD-based system, you may be able to install the

61.2 Routers (not done)

Not Done

61.3 Layer-Four Switches (not done)

Not Done


62 Kernel Redirection (not done)

Not Done


63 Squid Settings (not done)

Not Done


64 Chapter 11. Not Yet Done: Squid Config files and options

3.2: Squid Command Line Options

3.2.1: Help

To get a complete list of Squid's command-line options, with a short description of each option, use the '-h' option.

3.2.2: HTTP Port

Option: -a
Format: -a port number
Example: squid -a 3128

Squid will normally accept incoming HTTP requests on the port specified in the squid.conf file with the http_port tag. If you wish to override the tag for some reason, you can use the '-a' option.

3.2.3: Debug Information

Option: -d
Format: -d debug level value
Example: squid -d 3

By default Squid only logs fatal errors to the screen, logging all other errors to the cache.log file. If you wish to log more information (for example debugging information, rather than only errors), the '-d' option allows you to increase the amount of debug information logged to the screen. If Squid is started from your startup scripts, this output will appear on the console of the machine. If started from a remote login, this output will be written to the screen of your remote session.

3.2.4: Config file

Option: -f
Format: -f path
Example: squid -f /usr/local/etc/squid.conf

This option allows you to specify a different path to the Squid config file. When installing a binary version of Squid, the default path to the squid.conf file may be inappropriate for your system. If you wish to test a different version of the config file, but want to be able to revert to the previous config file in a hurry, you can use this option to refer to a different config file. To change back to the other config file you just have to restart Squid without this option.

3.2.5: Signaling a running Squid

Option: -k
Format: -k action
Example: squid -k rotate


You can communicate with a running copy of Squid by sending it signals. These signals cause Squid to perform maintenance functions, such as reloading the config file, rotating the logs (for analysis), and so forth. On some operating systems certain signals are reserved: the threads library on Linux, for example, uses the SIGUSR1 and SIGUSR2 signals for thread communication. Sending the wrong signal to a running Squid is easy, and can have unfortunate consequences. This option allows you to use descriptive names to send a running Squid signals, creating a standardized cross-platform user interface.

Tag: reconfigure
Action: Reloads the squid.conf file.
Description: It's important to note that when Squid re-reads this file it closes all current connections, which means that clients that were downloading files will be cut off mid-download. You should only schedule reloads for after-hours, when their impact is minimal.

Tag: rotate
Action: Rotates the cache.log and access.log files
Description: Cache log files get very large. To stop the log files using up all your disk space you should rotate the logs daily. The squid.conf logfile_rotate option sets the maximum number of rotated logs that you wish to keep. The most common use of this action is to rotate the logs just before logfile analysis (see Chapter 10). A crontab signals the rotation, sleeps for a short time, and then calls the logfile analysis program.

Tags: shutdown, interrupt
Action: Closes current connections, writes index and exits
Description: Squid keeps an index of cache objects in memory. When you wish to shut down Squid you should use this option, rather than simply killing Squid. Shutting down Squid can take a short while, while it writes the object index to disk. Squid writes to the cache.log file while it shuts down, indicating how many objects it has written to the index. Both the shutdown and interrupt tags have the same effect.
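The rotate-then-analyze cycle described above is normally scheduled from cron. The sketch below assumes the default binary path; the analysis script name (analyse-logs.sh) is purely an illustrative placeholder.

```shell
# Hypothetical root crontab entry: ask Squid to rotate its logs at
# 23:55 each night, give it a minute to finish writing the rotated
# files, then run an (assumed) log analysis script against them.
55 23 * * * /usr/local/squid/bin/squid -k rotate; sleep 60; /usr/local/bin/analyse-logs.sh
```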
(?why: I think it's because there isn't a "kill" command for NT?)

Tag: kill
Action: Kills the Squid process
Description: The kill tag should only be used if shutdown or interrupt have no effect. Using this tag will kill Squid without giving it a chance to write the cache index file, causing a slow rebuild on the next start.

Tag: debug
Action: Turns on maximum debugging
Description:


At times it is useful to see exactly what the running copy of Squid is doing. Using the debug tag will turn maximum logging on for the main Squid process. The output is very verbose, and with a heavily loaded cache can consume megabytes of disk space, so use this only on a lightly loaded cache, and for short periods of time.

Tag: check
Action: Prints an error message if Squid isn't running
Description: Using this tag sends a 'kill -0' signal to the running copy of Squid. This doesn't do anything to the running process, other than check that it exists (and that the user running the command has permission to send signals to the process).

3.2.6: Logging to syslog

Option: -s
Format: -s
Example: squid -s

Squid normally logs events and debug information to a special file, normally stored in "/usr/local/squid/logs/cache.log". In some environments you may wish for events to be logged to a central "log server", using syslog. Turning on this flag will log these events to syslog as well. Client accesses are not logged to syslog; logs of client accesses are stored in the file "/usr/local/squid/logs/access.log".

------------------

cache_dir:

Squid is designed with the ability to store millions of objects. Given that many operating systems have a limit on file size, it's not feasible for a cross-platform program like Squid to store all objects in one file, though there are patches to allow users to create Squid stores on large files or on raw devices. If you run a news server you will probably have an idea of how slow it is to do a directory listing of a directory with hundreds of thousands of files in it. On almost all filesystems there is a linear slowdown as more files are added to a directory. This rules out the other option: creating unique filenames and storing them all in one directory. Squid instead uses a hierarchy of directories for file storage. The default setup creates 16 first-tier directories. Each one of these directories then contains 256 second-tier directories. Files are only stored in the second-tier directories. This
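The two-tier layout just described (16 first-tier directories, each holding 256 second-tier directories) can be sketched from the shell. The hex-style names below mirror the shape of Squid's default scheme, but treat the exact names as illustrative:

```shell
# Enumerate a 16 x 256 two-tier store layout, printing each
# second-tier directory as L1/L2 in hex (00/00 through 0F/FF),
# then count them: 16 * 256 = 4096 second-tier directories.
for l1 in $(seq 0 15); do
  for l2 in $(seq 0 255); do
    printf '%02X/%02X\n' "$l1" "$l2"
  done
done | wc -l
```

On a real cache these directories are created by `squid -z` according to the cache_dir line in squid.conf; the loop above only illustrates the shape and scale of the hierarchy.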


Table of Contents

1 Squid .......... 1
1.1 A User's Guide .......... 1
1.1.1 Oskar Pearson .......... 1
2 Chapter 1. Overall Layout (for writers) .......... 5
3 Chapter 2. Terminology and Technology .......... 11
4 What Squid is .......... 11
4.1 Why Cache? .......... 12
4.1.1 In the USA .......... 12
4.1.1.1 Origin Server Load .......... 12
4.1.1.2 Quick Abort .......... 12
4.1.1.3 Peer Congestion .......... 12
4.1.1.4 Traffic spikes .......... 13
4.1.1.5 Unreachable sites .......... 13
4.1.2 Outside of the USA .......... 13
4.1.2.1 Costs .......... 13
4.1.2.2 Latency .......... 13
5 What Squid is not .......... 14
6 Supported Protocols .......... 15
6.1 Supported Client Protocols .......... 15
6.2 Inter Cache and Management Protocols .......... 15
7 Inter-Cache Communication Protocols .......... 16
8 Firewall Terminology .......... 17
8.1 The Two Types of Firewall .......... 17
8.2 Firewalled Segments .......... 17
8.3 Hand Offs .......... 17
9 Chapter 3. Installing Squid .......... 19
10 Hardware Requirements .......... 19
10.1 Gathering statistics .......... 19
10.2 Hard Disks .......... 20
10.3 RAM requirements .......... 22
10.4 CPU Power .......... 22
11 Choosing an Operating System .......... 24
11.1 Experience .......... 24
11.2 Features .......... 25
11.3 Compilers .......... 25
12 Basic System Setup .......... 26
12.1 Default Squid directory structure .......... 26
12.2 User and Group IDs .......... 27
13 Getting Squid .......... 28
13.1 Getting the Squid source code .......... 28
13.2 Getting Binary Versions of Squid .......... 29
14 Compiling Squid .......... 30
14.1 Compilation Tools .......... 30
14.2 Unpacking the Source Archive .......... 30
14.3 Compilation options .......... 31
14.3.1 Reducing output of configure .......... 31
14.3.2 Destination directory .......... 32
14.3.3 Using the DL-Malloc Library .......... 32
14.3.4 Regular expression routines .......... 33
14.3.5 Asynchronous IO .......... 33
14.3.6 User Agent logging .......... 34
14.3.7 Simple Network Monitoring Protocol (SNMP) .......... 34
14.3.8 Killing the parent process on exit .......... 34
14.3.9 Reducing time system-calls .......... 35
14.3.10 ARP-based Access Control Lists .......... 35
14.3.11 Inter-cache Communication .......... 35
14.3.12 Keeping track of origin request hosts .......... 36
14.3.13 Language selection .......... 36
14.4 Running configure .......... 36
14.4.1 Broken compilers .......... 37
14.4.2 Incompatible Options .......... 37
14.5 Compiling the Squid Source .......... 37
14.6 Installing the Squid binary .......... 38
15 Chapter 4. Squid Configuration Basics .......... 39
16 Version Control Systems .......... 40
17 The Configuration File .......... 41
18 Setting Squid's HTTP Port .......... 42
18.1 Using Port 80 .......... 42
18.1.1 Where to Store Cached Data .......... 44
19 Email for the Cache Administrator .......... 45
20 Effective User and Group ID .......... 46
20.1 FTP login information .......... 46
21 Access Control Lists and Access Control Operators .......... 48
21.1 Simple Access Control .......... 48
21.2 Ensuring Direct Access to Internal Machines .......... 50
22 Communicating with other proxy servers .......... 52
22.1 Your ISP's cache .......... 53
22.2 Firewall Interactions .......... 53
22.2.1 Proxying Firewalls .......... 53
22.2.1.1 Inside .......... 54
22.2.1.2 Outside .......... 54
22.2.1.3 DMZ .......... 55
22.2.2 Packet Filtering firewalls .......... 56
23 Chapter 5. Starting Squid .......... 58
24 Before Running Squid .......... 58
24.1 Subdirectory Permissions .......... 58
24.1.1 System Dependant Information .......... 58
24.1.2 Walking the Directory Tree .......... 58
24.1.3 Object Store Directory Permissions .......... 60
24.1.4 Problems Creating Swap Directories .......... 61
25 Running Squid .......... 62
26 Testing Squid .......... 64
26.1 Testing a Cache or Proxy Server with Client .......... 65
26.1.1 Testing Intranet Access .......... 65
27 Chapter 6. Browser Configuration .......... 67
28 Browsers .......... 67
28.1 Basic Configuration .......... 67
28.2 Advanced Configuration .......... 68
28.3 Basic Configuration .......... 68
28.4 Host name .......... 68
28.4.1 Netscape Communicator 4.5 .......... 69
28.4.2 Internet Explorer 4.0 .......... 69
28.4.3 Unix clients .......... 70
29 Browser-cache Interaction .......... 72
30 Testing the Cache .......... 73
31 Cache Auto-config .......... 74
31.1 Web server config changes for autoconfig files .......... 74
31.1.1 Apache .......... 74
31.1.2 Internet Information Server .......... 74
31.1.3 Netscape .......... 74
31.2 Autoconfig Script Coding .......... 75
31.2.1 The Hello World! of auto-configuration scripts .......... 75
31.2.2 Auto-config functions .......... 76
31.2.2.1 dnsDomainIs .......... 76
31.2.2.2 isInNet .......... 76
31.2.2.3 isPlainHostname .......... 76
31.2.2.4 myIpAddress .......... 77
31.2.2.5 shExpMatch .......... 77
31.2.2.6 url.substring .......... 77
31.2.3 Example autoconfig files .......... 78
31.2.3.1 A Small Organization .......... 78
31.2.3.2 A Dialup ISP .......... 78
31.2.3.3 Leased Line ISP .......... 79
31.3 Cache Array Routing Protocol .......... 79
32 cgi generated autoconfig files .......... 81
33 Future directions .......... 82
33.1 Roaming .......... 82
33.2 Browsers .......... 82
33.3 Transparency .......... 82
34 Ready to Go .......... 84
35 Chapter 7. Access Control and Access Control Operators .......... 85
36 Uses of ACLs .......... 85
37 Access Classes and Operators .......... 86
38 Acl lines .......... 89
38.1 A unique name .......... 89
38.2 Type .......... 89
38.3 Decision String .......... 89
38.4 Types of acl .......... 90
38.4.1 Source/Destination IP address .......... 91
38.4.2 Source/Destination Domain .......... 91
38.4.3 Words in the requested URL .......... 92
38.4.3.1 A Quick introduction to regular expressions .......... 92
38.4.3.2 Using Regular expressions to match words in the requested URL .......... 92
38.4.3.3 Words in the source or destination domain .......... 93
38.4.4 Current day/time .......... 94
38.4.5 Destination Port .......... 95
38.4.6 Protocol (FTP, HTTP, SSL) .......... 95
38.4.7 Method (HTTP GET, POST or CONNECT) .......... 96
38.4.8 Browser type .......... 97
38.4.9 User name .......... 97
38.4.10 Autonomous System (AS) Number .......... 97
38.4.11 Username/Password pair .......... 98
38.4.12 Using the NCSA authentication module .......... 98
38.4.13 Using the SMB authentication module .......... 99
38.4.14 SNMP Community .......... 99
39 Acl-operator lines .......... 100
39.1 The other Acl-operators .......... 101
39.1.1 The no_cache acl-operator .......... 102
39.1.2 The ident_lookup_access acl-operator .......... 102
39.1.3 The miss_access acl-operator .......... 103
39.1.4 The always_direct and never_direct acl-operators .......... 103
39.1.5 The broken_posts acl-operator .......... 104
40 SNMP Configuration .......... 105
40.1 Querying the Squid SNMP server on port 3401 .......... 105
40.2 Running multiple SNMP servers on a cache machine .......... 106
40.2.1 Binding the SNMP server to a non-standard port .......... 106
40.2.2 Access Control with more than one Agent .......... 107
41 Delay Classes .......... 108
41.1 Slowing down access to specific URLs .......... 108
41.2 The Second Pool Class .......... 108
41.3 The Second Pool Class .......... 109
41.4 The Third Pool Class .......... 110
41.5 Using Delay Pools in Real Life .......... 110
42 Conclusion .......... 112
43 Chapter 8. Cache Hierarchies .......... 113
44 Introduction .......... 113
45 Why Peer .......... 114
46 Peer Configuration .......... 115
46.1 The cache_peer Option .......... 115
47 Peer Selection .......... 117
47.1 Selecting by Destination Domain .......... 117
47.2 Selecting with Acls .......... 117
47.2.1 Querying an Adult-Site Filtering-cache for Specific URLs .......... 117
47.2.2 Filtering with Cache Hierarchies .......... 118
47.2.3 The always_direct and never_direct tags .......... 118
47.2.4 prefer_direct .......... 119
47.2.5 hierarchy_stoplist .......... 119
47.2.6 neighbor_type_domain .......... 119
47.3 Other Peering Options .......... 119
47.3.1 miss_access .......... 119
47.3.2 dead_peer_timeout .......... 120
47.3.3 icp_hit_stale .......... 120
48 Multicast Cache Communication .......... 121
48.1 Getting your machine ready for Multicast .......... 122
48.2 Querying a Multicast Cache .......... 122
48.3 Accepting Multicast Queries: The mcast_groups option .......... 123
48.4 Other Multicast Cache Options .......... 123
48.4.1 The mcast_icp_query_timeout Option .......... 123
49 Cache Digests .......... 124
50 Cache Hierarchy Structures .......... 125
50.1 Two Peering Caches .......... 125
50.1.1 Things to Watch Out For .......... 125
50.2 Trees .......... 126
50.3 Meshes .......... 126
50.4 Load Balancing Servers .......... 126
51 The Cache Array Routing Protocol (CARP) .......... 128
52 Chapter 9. Accelerator Mode .......... 130
53 When to use Accelerator Mode .......... 130
53.1 Acceleration of a slow server .......... 130
53.2 Replacing a combination cache/web server with Squid .......... 130
53.3 Transparent Caching .......... 131
53.4 Security .......... 131
54 Accelerator Configuration Options .......... 132
54.1 The httpd_accel_host option .......... 132
54.2 The httpd_accel_port option .......... 132
54.3 The httpd_accel_with_proxy option .......... 132
54.4 The httpd_accel_uses_host_header option .......... 132
55 Related Configuration Options .......... 134
55.1 The redirect_rewrites_host_header option .......... 134
55.2 Refresh patterns .......... 134
55.3 Access Control .......... 134
56 Example Configurations .......... 136
56.1 Replacing a Combination Web/Cache server .......... 136
56.2 Accelerating Requests to a Slow Server .......... 137
57 Chapter 10. Transparent Caching .......... 138
58 The Problem with Transparency .......... 138
59 The Transparent Caching Process .......... 140
59.1 Some Routing Basics .......... 140
59.2 Packet Flow with Transparent Caches .......... 140
60 Network Layout .......... 142
61 Filtering Traffic .......... 143
61.1 Unix machines .......... 143
61.2 Routers (not done) .......... 143
61.3 Layer-Four Switches (not done) .......... 143
62 Kernel Redirection (not done) .......... 144
63 Squid Settings (not done) .......... 145
64 Chapter 11. Not Yet Done: Squid Config files and options .......... 146