6/24/99 1
Les Cottrell
for the SLAC network group
SLAC
Presented by Charley Granieri at the SLAC Computing External Review, June 1999
SLAC Networking

Outline of talk
• LAN - architecture, assets, monitoring
• Residential access to SLAC
• WAN - connectivity and monitoring
• Email - servers, spam, majordomo etc.
• Other network services such as News ...
• Advanced technology pilots
• Summary - challenges etc.

Mission etc.
• Provide leadership and support in data communications to the Laboratory as a whole and to physics research in particular.
  – Network engineering & management: 4.3 FTEs, 1 open slot
  – Network monitoring:
    • LAN 1.5 FTEs
    • WAN 2.7 FTEs
  – Network services (email, news, VMS etc.): 2.5 FTEs
  – NetOps: 3 + 1 open slot
• Telecommunications also under same hat (helps coordination and convergence):
  – 2.5 FTEs + contractor

Network Drivers
• Deployment of computers to new areas/farms/people
• Faster interfaces, more capable, easier to use computers
• New applications (BaBar, BSD, multimedia, VoIP …)
• Increased reliance
• Increased security
• World wide collaborations - distance independent
• New technologies (media, interfaces, protocols, applications)

Growth of SLAC LAN

Principles for LAN design
• Simplicity
  – Ethernet 10/100/1000 Mbps, phase out FDDI, LocalTalk etc.
  – Reduce number of protocols in core to IP only, limit bridging, keep smart stuff at edges
• Stay away from edges of performance envelope
  – over-provision, double aggregate every 18 months
    • shared to switched 10Mbps => 100Mbps for desktop
    • 100Mbps switched => 1Gbps for core & high vol. servers
• Provide high availability
  – redundancy of core components so can schedule outages
  – UPS
• Invest in network management tools
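The over-provisioning rule on this slide (double aggregate capacity every 18 months) amounts to a simple exponential projection. The following is a minimal sketch, assuming smooth compound growth; the function name and the 1000 Mbps starting value are illustrative and do not come from the slides.

```python
def projected_capacity(current_mbps: float, months_ahead: float,
                       doubling_months: float = 18.0) -> float:
    """Capacity needed after months_ahead if demand doubles every doubling_months."""
    return current_mbps * 2.0 ** (months_ahead / doubling_months)


# Equivalent growth factor per year for an 18 month doubling time.
annual_factor = projected_capacity(1.0, 12.0)

if __name__ == "__main__":
    # Hypothetical starting point: a 1000 Mbps core today.
    for months in (0, 18, 36):
        print(months, "months:", projected_capacity(1000.0, months), "Mbps")
    print("growth per year: x%.2f" % annual_factor)
```

Under this assumption an 18 month doubling time corresponds to roughly 59% growth per year.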

Network Architecture
• Structured wiring started 1995, complete outside radiation fence this fiscal year, i.e. 90% completed
• Increasingly switched network (from shared media)
  – Based on mass market Ethernet
  – improved error isolation, ability to know where assets are, and security
  – scalable

SLAC Switched LAN Summer 1999
[Figure: network diagram. Labelled elements: core routers and switches; 16 building switches; FDDI ring; DMZ with Internet, modems, ISDN and xDSL access; ESA; Legacy; SSRL; old servers; MCC1, MCC2, MCC3; BaBar; IR2; BSD; farms; servers. Legend: 10BaseT, FDDI/CDDI, 100BaseT, 100BaseFL, 1Gbit FL, 4Gbit FL; Gigaswitch, router, switch, hub, concentrator.]

Current state - availability
• Switched segmentation reduces impact of many problems, simplifies identification
• UPS for core components
• Redundant core devices
• Redundant power supplies on core switches & routers
• Redundant trunks
• Cisco Hot Standby Router Protocol (HSRP)

Current state - performance
• Just been through major upgrade, switch fabric occupancy ~ 60%, 1000Mbps in core + high performance servers
  – 46% of available bandwidth in 1000Mbps links, 47% in 100Mbps links
  – 2.6 hosts / collision domain (down from 3.6 at last review)
• Close collaboration with BaBar & systems to improve/optimize performance for trigger farms and data collection

BaBar
• Make sure network is not the bottleneck
• Measured > 400 Mbps (UDP or TCP - with extended windows) Gbps to Gbps
• Measured ~ 400Mbps aggregate from Gbps to 4 * 100 Mbps between CC & IR2
• Provide real-time web accessible monitoring page showing thruputs for various components & drill down
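The remark about extended windows reflects a standard relation: a single TCP connection cannot exceed window / RTT, and without window scaling (RFC 1323) the window is capped at 65535 bytes. A rough sketch of that arithmetic follows; the round trip times are hypothetical, since the slides do not give the RTT between the test hosts.

```python
MAX_UNSCALED_WINDOW_BYTES = 65535  # largest TCP window without window scaling


def required_window_bytes(target_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain target_bps."""
    return target_bps * rtt_s / 8.0


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on single-connection TCP throughput for a given window."""
    return window_bytes * 8.0 / rtt_s


if __name__ == "__main__":
    for rtt_ms in (0.5, 1.0, 2.0):  # hypothetical LAN round trip times
        rtt_s = rtt_ms / 1000.0
        need = required_window_bytes(400e6, rtt_s)
        print("RTT %.1f ms: need %.0f bytes, extended window required: %s"
              % (rtt_ms, need, need > MAX_UNSCALED_WINDOW_BYTES))
```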

Real time BaBar thruput Monitoring

LAN assets inventory
• Oracle database of network equipment, linked to property control, phone etc.
• Much of network info gleaned automatically and entered into DB:
  – connectivity from router ARP tables, from bridge/switch CAM tables, from CDP
    • gives MAC level addresses etc.
    • create “model” of router/switch/hub & host connections
  – MIBs in nodes provide make & model, S/N, software/hardware rev level, port type, speed
• Other info is entered manually:
  – when host registered it gets property control number, IP address, owner, admin
  – DNS entered into DB then automatically updates DNS tables
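The automatic part of the inventory can be pictured as a join: ARP tables map IP addresses to MAC addresses, and CAM tables map MAC addresses to switch ports. The sketch below is illustrative only. The data layout, the `Attachment` record and the rule that a port seeing more than one MAC is an uplink are all assumptions, not a description of the SLAC tools.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attachment:
    ip: str
    mac: str
    switch: str
    port: str


def locate_hosts(arp: Dict[str, str],
                 cam: Dict[str, Dict[str, List[str]]],
                 max_macs_on_edge_port: int = 1) -> List[Attachment]:
    """Join router ARP data (IP -> MAC) with switch CAM data
    (switch -> port -> MACs seen) to place each host on a switch port."""
    ip_by_mac = {mac.lower(): ip for ip, mac in arp.items()}
    found = []
    for switch, ports in cam.items():
        for port, macs in ports.items():
            # Assumption: a port that sees many MACs is an uplink, not a host port.
            if len(macs) > max_macs_on_edge_port:
                continue
            for mac in macs:
                ip = ip_by_mac.get(mac.lower())
                if ip is not None:
                    found.append(Attachment(ip, mac.lower(), switch, port))
    return sorted(found, key=lambda a: a.ip)


if __name__ == "__main__":
    # Hypothetical sample data using documentation address ranges.
    arp = {"192.0.2.10": "00:00:5E:00:53:01", "192.0.2.11": "00:00:5e:00:53:02"}
    cam = {"sw-a": {"1": ["00:00:5e:00:53:01"],
                    "24": ["00:00:5e:00:53:01", "00:00:5e:00:53:02"]},
           "sw-b": {"3": ["00:00:5e:00:53:02"]}}
    for row in locate_hosts(arp, cam):
        print(row)
```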

LAN performance monitoring
• Read MIBs from routers & switches & plot:
  – octets, errors
  – generate alerts (outside thresholds, e.g. heavy multi/broadcast activity, heavy utilization, high error rates)
  – graphical Web reports using Java & other (MRTG) tools with history for baselines
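Turning octet and error counters into utilization figures and alerts can be sketched as follows. This assumes 32-bit MIB-II interface counters that may wrap between polls; the threshold values and function names are hypothetical and are not taken from the SLAC monitoring tools.

```python
COUNTER32_MODULUS = 2 ** 32  # MIB-II ifInOctets/ifOutOctets are 32-bit counters


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: float) -> float:
    """Average link utilization over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


def alerts(utilization_pct: float, errors: int, packets: int,
           max_utilization_pct: float = 50.0,
           max_error_ratio: float = 0.001) -> list:
    """Return the names of any thresholds exceeded (thresholds are illustrative)."""
    found = []
    if utilization_pct > max_utilization_pct:
        found.append("heavy utilization")
    if packets and errors / packets > max_error_ratio:
        found.append("high error rate")
    return found
```

Keeping the history of such samples is what allows a baseline to be drawn, so that an alert means "unusual for this link" rather than only "above a fixed number".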

DMZ monitoring
• FDDI probe monitors traffic coming in via ESnet, data is read out at intervals (typically once/hour) and logged to database.
• Reports are generated daily.
  – Report on common protocol utilization and suspicious use
  – top 20 nodes, conversation pairs, reports by domain, complete list of conversations
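A daily "top 20 nodes" and "conversation pairs" report of this kind reduces to summing bytes per host and per host pair. The sketch below assumes the probe data has already been reduced to (source, destination, bytes) records; that record shape and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Flow = Tuple[str, str, int]  # (source, destination, bytes): an assumed record shape


def top_talkers(flows: Iterable[Flow], n: int = 20):
    """Return the n busiest nodes and the n busiest conversation pairs by bytes."""
    per_node: Counter = Counter()
    per_pair: Counter = Counter()
    for src, dst, nbytes in flows:
        per_node[src] += nbytes
        per_node[dst] += nbytes
        # A conversation is direction-independent, so order the pair.
        per_pair[tuple(sorted((src, dst)))] += nbytes
    return per_node.most_common(n), per_pair.most_common(n)


if __name__ == "__main__":
    sample = [("a", "b", 100), ("b", "a", 50), ("a", "c", 10)]  # hypothetical data
    nodes, pairs = top_talkers(sample, n=2)
    print(nodes)
    print(pairs)
```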

Residential & dialup services
• Use PPTP VPN for security, have NT, Win98, Mac clients, also useful for travelers

                       Dialup/ARA   Dialup/PPP   ISDN         DSL-Covad                 DSL-PBI
Max speed              33 kbps      56 kbps      128 kbps     144k, 384k, 1.5M / 384k   384k / 128k, 1.5M / 384k
Inside SLAC firewall   Yes          Yes          Yes          No                        No
Clients                Mac          Any          Any          Any                       Any
Location               Opt. local   Opt. local   Opt. local   ~80% Bay Area             ~60% Bay Area
Users                  70           150          80           14                        2
Ports                  12           Campus ISP   46=>69

Utilization
Tracking use, keeping logs for more detailed auditing

WAN Challenges
• No single management responsible for Internet
• Exponential growth
• HEP critically dependent on WAN for collaborations
• HEP/Research & Education competing with commodity usage in many cases
• Internet extremely complex, changing rapidly, internal behavior hard to predict
• HEP use is very diverse, collaborators, vendors, services

Connectivity
• Use of ESnet link (43Mbps) up by factor of 2 in last 4 months
  – 5 minute averages up to 10-15 Mbps / direction fairly typical / day
  – In process of upgrading to 155 Mbps
• 40Gbytes/day IP traffic, roughly 50% TCP, 50% UDP
  – FTP, AFS, ssh, http, xwin are top protocols
• Campus link just upgraded from 10 Mbps to 155 Mbps
• Working to get NTON reconnected
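To relate the daily volume to the link speeds quoted on this slide, the 40 Gbytes/day figure can be converted to an average bit rate. This is back-of-envelope arithmetic only: it assumes decimal gigabytes and that the whole volume crosses the 43 Mbps ESnet link, neither of which the slides state.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(bytes_per_day: float) -> float:
    """Average rate in Mbps for a given daily byte volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


avg = average_mbps(40e9)      # assumes 1 Gbyte = 1e9 bytes
share_of_link = avg / 43.0    # fraction of a 43 Mbps link, if all traffic used it

if __name__ == "__main__":
    print("average: %.2f Mbps (%.1f%% of 43 Mbps)" % (avg, 100 * share_of_link))
```

The daily average comes out well below the 10-15 Mbps five-minute peaks, which is the usual gap between mean and busy-period load.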

Internet End-to-end Monitoring
www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• Within ESnet connectivity excellent, Internet 2 good, after that only acceptable to poor
• Monitor to set “user” expectations, help with problem detection, get planning information & trends, identify problem areas, optimize routes
• Collaborative effort to provide HEP-wide & ESnet wide monitoring requested by ICFA, ESnet
  – Partially funded by DOE/MICS FWP
  – Involves many HEP sites, led by SLAC & HEPNRC

Main tool (PingER) currently uses Ping
• Treats Internet as black box
• Provides useful real world measures of network round trip response time, loss, reachability, jitter
• Low cost/lightweight tool
  – ping “universally available”, easy to understand
    • no software for clients to install
    • no special privileges needed for monitor sites
  – resources: 100bps/link, ~600kBytes/month/link
• Agrees well with more complex measurements
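The four measures named on this slide (response time, loss, reachability, jitter) can all be derived from one batch of ping results. The sketch below is not PingER's code. In particular the jitter definition used here (mean absolute change between successive replies) is an assumption chosen for simplicity, and the input format (a list of RTTs with `None` for a lost probe) is hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional


def summarize_pings(rtts_ms: List[Optional[float]]) -> Dict[str, object]:
    """Summarize one batch of pings; None marks a probe with no reply."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    # Assumed jitter definition: mean absolute change between successive replies.
    steps = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "reachable": bool(replies),  # at least one reply came back
        "loss_pct": loss_pct,
        "median_rtt_ms": median(replies) if replies else None,
        "jitter_ms": sum(steps) / len(steps) if steps else None,
    }


if __name__ == "__main__":
    print(summarize_pings([10.0, None, 14.0, 12.0]))  # hypothetical sample
```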

Extent of measurements
• 18 Monitoring sites - 7 in US (5 ESnet, 2 vBNS), 2 in Canada, 7 in Europe (ch, de, dk, hu, it, uk(2)), 2 in Asia (jp, tw)
• 1261 monitoring-remote-site pairs
• 379 unique hosts, 272 sites
• 50 beacon sites, 27 countries
• Metrics include response, jitter, loss, reachability
• Data goes back > 4 years
• 1 Million probes of Internet / day
[Pie chart: PingER pair distribution by global area. Europe 38%, Edu 33%, Gov 7%, Canada 5%, Russian Fed 4%, Japan 3%, Asia 2%, China 2%, Com 2%, Australasia 1%, Org 1%, South America 1%, Mil 0%.]

Results 1/2
[Figure: Comparison of median packet loss for Mar-99 for various communities. x axis: Community (ESnet - ESnet (31), vBNS - vBNS (18), XIWT - XIWT (140), ELab - ELab (14)). y axis: % median monthly packet loss, ticks at 0.01, 0.1, 1, 10. Series: 75%, median, 25%.]

Results 2/2
TCP bandwidth < (1470/RTT) * (1/sqrt(loss))
[Figure: Bandwidth in kbytes/sec (ticks at 10, 100, 1000, 10000) versus date (Jun-94 to Dec-99). Series: Canada (18 pairs), Edu/US (138 pairs), ESnet (31 pairs), Japan (12 pairs), Europe (95 pairs), each with an exponential trend line, plus a reference line for 100% improvement / year.]
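The bound quoted on this slide can be evaluated directly from measured RTT and loss. The sketch below makes assumptions about units that the slide does not spell out: 1470 is taken as a segment size in bytes, RTT is in seconds, loss is a fraction (not a percentage), and the result is converted to kbytes/sec to match the plot axis.

```python
import math


def tcp_bandwidth_bound_kbytes(rtt_s: float, loss_fraction: float,
                               mss_bytes: float = 1470.0) -> float:
    """Upper bound on TCP bandwidth in kbytes/sec:
    (mss / RTT) * (1 / sqrt(loss))."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0.0 < loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be in (0, 1]")
    bytes_per_s = (mss_bytes / rtt_s) * (1.0 / math.sqrt(loss_fraction))
    return bytes_per_s / 1000.0


if __name__ == "__main__":
    # Hypothetical path: 100 ms round trip, 1% packet loss.
    print("%.0f kbytes/sec" % tcp_bandwidth_bound_kbytes(0.1, 0.01))
```

Because loss enters under a square root, cutting loss by a factor of four only doubles the bound, while halving the RTT doubles it directly.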

Email
• Gateway processes about 40K msg/day (growing 25% / year, doubled since last review)
  – monitor & alerts (email, pager) on exceptions
  – 95% trivial email delivered in < 1 min
• ~ 2700 email users
  – Support generic addresses fname.lname or [email protected]
  – 700 POP users, 30 IMAP, Quickmail gone, VM gone
  – separated IMAP & POP servers
  – dedicated internal SMTP server
  – IMAP pilot - Netscape & pine most popular clients
• See www.slac.stanford.edu/comp/net/email/futures.html

Current Email system
[Figure: diagram of the current email system. Labelled elements: screening router; offsite mail gateways; VAX clusters (SLACAX, SSRL, SLD, SLC); redundant mail servers SMTPServ01..2; Listserv; SMTPserv (non-authenticated relay); POPserv and IMAPserv (POP & IMAP in cleartext); Backup; PC/Mac email users and Unix IMAP users (Pine, Eudora, Netscape, Outlook); Unix email users in NFS mode with NFS server. Links are labelled SMTP, POP & IMAP cleartext, and NFS.]
Problem areas in red: cleartext passwords, NFS mounted spool, non-authenticated SMTP

Proposed Email System
[Figure: diagram of the proposed email system. Labelled elements: screening router; offsite mail gateways; VAX clusters (SLACAX, SSRL, SLD, SLC); redundant mail servers; Listserv; SMTPserv (authenticated SMTP); POPserv and IMAPserv (POP & IMAP SSL); PC/Mac email users and Unix IMAP users (Pine, Eudora, Netscape, Outlook); Unix email users in NFS mode with NFS server. Links are labelled SMTP, POP & IMAP SSL, and NFS.]

Mail list server (majordomo)
• 215 lists (up from 155 last review)
• On a separate server
• Have web forms for requesting lists, maintaining subscriptions and querying the lists

Spam / Viruses
• Actively provide anti-spam support:
  – at last review spam was growing (factor of 16 in 9 months) up to 40 spam actions/week
  – now stable ~ 10 actions/week
  – ~ 2100 sites blocked (was 90 two years ago)
  – prepared to restore domain upon user request
• Since Melissa remove any Excel or Word attachment with a macro on SLAC incoming email
• Also strip out well known viruses / worms (e.g. happy99, explore.zip.exe)

Dynamic Host Configuration Protocol
• Provide DHCP for fixed hosts & roamers
• Tension between easy walk-up use & security
  – require registration for accountability
    • this is for connection inside the site firewall
    • an issue is whether to provide anonymous DHCP outside firewall (i.e. what you are using today)
  – seek guidance on how to strike the balance

DHCP
• Is in production but barely
• Web forms for adding to DHCP database
  – needs to allow editing, deleting, more restrictive availability, better integration with Enterprise DB
• Work in progress or queued
  – automate log file pruning, restrict who can register hosts
  – convert to use Enterprise DB as master
  – convert DHCP server from SunOS to Solaris
  – increase information logged about user, location etc.
• Needs resources (aka part of new hire) to focus on it and fix current problems

Lightweight Directory Access Protocol
• Microsoft is embracing LDAP in Windows 2000
• Email vendors are migrating towards password DBs in LDAP
• Have an LDAP-v3 server
  – loaded with the SLAC user directory information
  – read only at the moment
• Starting to coordinate with other HEP labs (e.g. CERN), there is a HEP LDAP email list

News, NTP, DNS
• News down to 20 groups, out-sourced to campus
• NTP: driven from GPS on-site
• DNS: driven from Oracle network database of hosts
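Driving DNS from a host database means the zone data is generated rather than edited by hand. A minimal sketch of that idea follows; the (hostname, address) rows stand in for a database query result, and the function name, TTL and sample hosts are all hypothetical. The output follows the standard zone file record layout of name, TTL, class, type and data.

```python
from typing import Iterable, List, Tuple


def zone_lines(hosts: Iterable[Tuple[str, str]], ttl: int = 86400) -> List[str]:
    """Render (hostname, IPv4 address) rows as zone-file A records."""
    return ["%s\t%d\tIN\tA\t%s" % (name, ttl, ip) for name, ip in sorted(hosts)]


if __name__ == "__main__":
    # Hypothetical rows, using a documentation address range.
    rows = [("hostb", "192.0.2.2"), ("hosta", "192.0.2.1")]
    print("\n".join(zone_lines(rows)))
```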

VMS central support
• Driven by SLD, has SLACVX for SLD offline
  – AlphaServer 8400 + 10 smaller Alphas + 6 VAXes (for X support hosts & legacy code)
  – ~ 6000 SpecInt92
  – 500 Gbytes disk, RAID controller, STK connection
  – HSM, Oracle etc.
  – Software & hardware basically stable
  – Supported by SCS staff (~0.5 FTE)
  – Support folks autopaged

Advanced technology exploration
• ESnet IPv6 collaborator
• NGI proposal (Particle Physics Data Grid) and high performance WAN networking (China Clipper)
• NTON project (480 Mbps disk to application SLAC<>LBNL)
• Internet monitoring (IEPM)
• VoIP pilot with CERN, FNAL, DESY, ESnet/LBNL

Major challenges
• Tracking topology & configuration
• Monitoring a switched network
• Staying at right point in technology curve
• Constraining complexity
  – phase out of legacies, AppleTalk, Macs, DECnet IV, FDDI (user resistance)
  – embracing new needs: e.g. VPNs, xDSL, IPv6, video, VoIP, IMAP, DHCP, QoS, new routing protocols

Major challenges
• Balancing security vs. usability & simplicity
• Increasing purposes for and dependence on the net
  – video, VoIP, multicast
  – outages hard to schedule, upgrades hard to do
• Finding & keeping staff

Summary
• LAN: well positioned, architecture scales, follows industry practices, will need continued growth
• WAN: little control, yet must understand, track, monitor and collaborate with others inside & outside HEP, nationally & internationally
• VMS: central support reducing, stable, goes away with SLD
• Network services, technologies & protocols keep emerging
• People / skills resources are major gating factor