L6: Network Hardware and Software ECE 544: Computer ...
Transcript of L6: Network Hardware and Software ECE 544: Computer ...
L6: Network Hardware L6: Network Hardware and Softwareand Software
ECE 544: Computer Networks IIECE 544: Computer Networks IISpring 2009Spring 2009
Seskar (at) Seskar (at) winlabwinlab (dot) (dot) rutgersrutgers (dot) edu(dot) edu
Network Hardware BasicsNetwork Hardware Basics
StandardsStandards
Availability of interoperable equipment from multiple vendors
Prevents a “Tower of Babel” situation Equipment from different vendors will interoperate if it
complies with the standard Alliances and certification bodies assure interoperability
Wi-Fi for 802.11 WiMax for 802.16
Lowers costs to consumers Both through competition and economies of scale
Promotes advancement of technology Vendors constantly strive for competitive advantage through
improved technology and features
IEEE 802 StandardsIEEE 802 StandardsMaintained by IEEE 802 LAN/MAN Standards Committee Maintained by IEEE 802 LAN/MAN Standards Committee
(LMSC):(LMSC): 802.1 Overview, Architecture, Internetworking and Management802.1 Overview, Architecture, Internetworking and Management 802.2 Logical Link Control802.2 Logical Link Control 802.3 Ethernet (CSMA/CD PHY and MAC)802.3 Ethernet (CSMA/CD PHY and MAC) 802.5 Token Ring PHY and MAC802.5 Token Ring PHY and MAC
802.11 Wireless LAN802.11 Wireless LAN 802.12 Demand Priority Access802.12 Demand Priority Access
802.15 Wireless PAN802.15 Wireless PAN 802.16 Broadband Wireless Access802.16 Broadband Wireless Access 802.17 Resilient Packet Ring802.17 Resilient Packet Ring
802.18 Radio Regulatory802.18 Radio Regulatory 802.19 Coexistence802.19 Coexistence 802.20 Mobile Broadband Wireless Access802.20 Mobile Broadband Wireless Access 802.21 Media Independent Handoff802.21 Media Independent Handoff 802.22 Wireless Regional Area Network802.22 Wireless Regional Area Network
IEEE 802 Naming ConventionsIEEE 802 Naming Conventions
Standalone documents either get no letter (IEEE Standalone documents either get no letter (IEEE 802.3) or gets a capital letter (IEEE 802.1D)802.3) or gets a capital letter (IEEE 802.1D)
Document that supplements a standalone document Document that supplements a standalone document gets a lowergets a lower--case letter (IEEE 802.11b)case letter (IEEE 802.11b)
Letters are assigned in sequential order (Letters are assigned in sequential order (a,B,C,d,ea,B,C,d,e ……) ) and uniquely identify both the Working Group Task and uniquely identify both the Working Group Task Force and the actual documentForce and the actual document
Approved standards have IEEE in front while drafts Approved standards have IEEE in front while drafts have P only designation followed by the draft have P only designation followed by the draft number (P802.1p/D9)number (P802.1p/D9)
802.1802.1
802.1B Management802.1B Management 802.1D MAC Bridges802.1D MAC Bridges 802.1E System Load Protocol802.1E System Load Protocol 802.1F Common Definitions for Management802.1F Common Definitions for Management 802.1G Remote MAC Bridging802.1G Remote MAC Bridging 802.1H Bridging of Ethernet802.1H Bridging of Ethernet 802.1Q Virtual Bridged LANs802.1Q Virtual Bridged LANs
TerminologyTerminology
PHY LayerHeader
EthernetHeader
IPHeader
TCPHeader
DataEthernetTrailer
PHY LayerTrailer
TCP Segment
Ethernet Frame
IP Packet
Fast Ethernet Symbol Stream
Source: Seifert “The switch Book”
Bridge (Switch)Bridge (Switch)FrameFrameData LinkData Link
RepeaterRepeaterSymbol StreamSymbol StreamPhysicalPhysical
Router (Switch)Router (Switch)PacketPacketNetworkNetwork
Firewall (Switch)Firewall (Switch)Segment or Segment or MessageMessage
TransportTransport
SessionSession
PresentationPresentation
GatewayGatewayApplicationApplication
InterconnInterconn. Device. DevicePDUPDU
Router Hardware HistoryRouter Hardware History
Mid 1980s (early days):Mid 1980s (early days): Shared LANs interconnected by bridgesShared LANs interconnected by bridges Two port software based routersTwo port software based routers
Late 1980s Late 1980s –– early 1990s (rapid expansion for router market)early 1990s (rapid expansion for router market) Slower than bridges but have much more functionsSlower than bridges but have much more functions ““Route when you can, bridge when you mustRoute when you can, bridge when you must””
Early Early –– mid 1990s (routers as necessary evils)mid 1990s (routers as necessary evils) Hardware based bridges (switches) with wireHardware based bridges (switches) with wire--speed performancespeed performance ““Switch when you can, route when you mustSwitch when you can, route when you must””
Late 1990sLate 1990s Hardware based routers become practicalHardware based routers become practical WireWire--speed routingspeed routing Perception that all traffic can be switchedPerception that all traffic can be switched
DevicesDevices
Repeaters/HubsRepeaters/Hubs Bridges (Layer 2 Switches?)Bridges (Layer 2 Switches?) Routers (Layer 3 Switches?)Routers (Layer 3 Switches?)
CoreCore EdgeEdge
Firewalls, Network Address Translators (Layer 4 Firewalls, Network Address Translators (Layer 4 Switches?)Switches?)
Gateways, Load Balancers (Layer 7 Switches?)Gateways, Load Balancers (Layer 7 Switches?)
Switch/Router HardwareSwitch/Router Hardware
MediaInterface
Link ProtocolController
HeaderProcessing
Port 1 (Line Card)
Port N (Line Card)
…Port 2
Switch Fabric
Processor{s}
RequirementsRequirements
Distributed Data PlaneDistributed Data Plane Packet Processing: Examine L2Packet Processing: Examine L2--L7 protocol information L7 protocol information
(Determine (Determine QoSQoS, VPN ID, policy, etc.), VPN ID, policy, etc.) Packet Forwarding: Make appropriate routing, switching, and Packet Forwarding: Make appropriate routing, switching, and
queuing decisionsqueuing decisions Performance: At least sum of external BWPerformance: At least sum of external BW
Distributed Control PlaneDistributed Control Plane Up to 10Up to 106 6 entries in various tables (forwarding addresses, entries in various tables (forwarding addresses,
routing information etc.)routing information etc.) Performance: on the order of 100 MIPSPerformance: on the order of 100 MIPS
Switch FabricSwitch Fabric
Connects inputs and outputsConnects inputs and outputs Fabric types:Fabric types:
Shared BusShared Bus –– shared backplane with or without DMA shared backplane with or without DMA (first and second generation routers) (first and second generation routers) –– arbitration problemarbitration problem
Shared MemoryShared Memory –– single common memory is shared single common memory is shared between multiple input outputs (typically used in lowbetween multiple input outputs (typically used in low--cost cost devices) devices) –– memory bandwidth problemmemory bandwidth problem
Shared InterconnectShared Interconnect (Crossbar) (Crossbar) –– switching fabric switching fabric provides parallel paths. Scheduler is centralized; routing tableprovides parallel paths. Scheduler is centralized; routing tables s are kept in the line cards (third generation routers) are kept in the line cards (third generation routers) –– multicast multicast problemproblem
QueuingQueuing
Output queuePros:
Simple algorithms Single congestion pointCons:
N inputs may send to the same output; requires N times speedup
Input queuePros:
Simple algorithms Single congestion pointCons:
Must implement flow control
Low utilization due to HoL Blocking
Modern routersModern routers
Combine input buffering with virtual output queues Combine input buffering with virtual output queues (separate input queue per output) and use output (separate input queue per output) and use output bufferingbuffering Solves blocking problemSolves blocking problem Resolves contention and simplifies schedulingResolves contention and simplifies scheduling Can achieve utilization of 1Can achieve utilization of 1 Scales to > 1 Scales to > 1 TbpsTbps
Increases complexity (as well as introduces multiple Increases complexity (as well as introduces multiple congestion points)congestion points)
Table SRAMFwd/Class
TCAMs
RTT Buffer Mem (1GB)+ pointer
SRAM
Distributed Memory Router Line Distributed Memory Router Line CardCard
InputQueuing
ReceiveFwd
Engine
ControlCPU MemControl
LinecardControl
CPU
FabricRe-Assem.
TransmitFwd
Engine
OutputQueuing
L2 Buffering Optics
ToFabric
FromFabric
Framer
RTT Buffer Mem (1GB)+ pointer
SRAM
Table SRAMFwd/Class
TCAMs
512+MB DRAM
From CISCO
Switched LANSwitched LAN
Modern switches Modern switches -- LAN Segmentation taken to the LAN Segmentation taken to the extreme (extreme (microsegmentationmicrosegmentation):): No access contentionNo access contention No collisions (fullNo collisions (full--duplex)duplex) Dedicated bandwidth for each stationDedicated bandwidth for each station No distance reduction per segmentNo distance reduction per segment
Best case capacity (nonBest case capacity (non--blocking switch)blocking switch)
n
portportDataRateCapacity
1
StoreStore--andand--forwardforwardThe frame (The frame (packet,messagepacket,message) is ) is
received completely before received completely before decision is madedecision is made
CutCut--troughtroughTable lookup and forwarding Table lookup and forwarding
decision is made as soon as decision is made as soon as ““DestinationDestination”” has been has been receivedreceived
CutCut--Trough vs. StoreTrough vs. Store--andand--ForwardForward
Absolute latency Absolute latency (not that much affected by the lookup (not that much affected by the lookup process)process)
Problem with output port availabilityProblem with output port availability CutCut--trough is generally not possible for multicast or trough is generally not possible for multicast or
unknown destination unknown destination (all output ports have to be available (all output ports have to be available simultaneously)simultaneously)
Performance RequirementsPerformance Requirements
•In general header inspection and packet forwarding require complex look-ups on a per packet basis resulting in up to 500 instructions per packet
•At 40Gbps processing requirements are > 100 MPPS
largelargesmallsmall
Time per packet [Time per packet [µµs]s]Peak packet rate Peak packet rate [[KppsKpps]]
OverheadOverheadRateRate[Mbps][Mbps]
77760.077760.019440.019440.04860.04860.01953.11953.11214.81214.8303.8303.8195.3195.319.519.5
0.010.010.050.050.210.210.510.510.820.823.293.295.125.1251.2051.20
1327.104 1327.104 [Mbps][Mbps]
331.776 331.776 [Mbps][Mbps]
82.944 82.944 [Mbps][Mbps]
38 38 [Bytes][Bytes]
20.736 20.736 [Mbps][Mbps]
6.912 6.912 [Mbps][Mbps]
38 38 [Bytes][Bytes]
38 38 [Bytes][Bytes]
0.310.311.221.224.884.8812.1412.1419.5219.5278.0978.09121.44121.441214.401214.40
39,813.12 39,813.12 9953.28 9953.28 2488.32 2488.32 1000.001000.00622.08 622.08 155.52 155.52 100.00100.0010.0010.00
OCOC--768 (40G)768 (40G)
OCOC--192 (10G)192 (10G)
OCOC--48 (2.5G)48 (2.5G)
1000Base1000Base--TT
OCOC--1212
OCOC--33
100Base100Base--TT
10Base10Base--TT
(Optical Carrier (OC-1) is SONET line with payload of 50.112 Mbps and overhead of 1,728 Mbps)
Software Based Software Based ““SwitchingSwitching””1.1. NIC receives the frame, stores it in a buffer, and signals the dNIC receives the frame, stores it in a buffer, and signals the device driver to pick it up.evice driver to pick it up.2.2. Device driver will transfer the packet to the IP stack.Device driver will transfer the packet to the IP stack.3.3. IP stack will check the header and options for correctness and tIP stack will check the header and options for correctness and then check the address; hen check the address;
local packets are passed up to higher layers (or socket abstractlocal packets are passed up to higher layers (or socket abstraction); nonion); non--local packets local packets are passed to IP forwarder.are passed to IP forwarder.
4.4. IP forwarder will decrement the TTL, and consult the routing tabIP forwarder will decrement the TTL, and consult the routing table as to where to le as to where to send the packet (lookup). Tables are constructed by a separate rsend the packet (lookup). Tables are constructed by a separate routing daemon (OSPF, outing daemon (OSPF, BGP, or RIP).BGP, or RIP).
5.5. Finally packet will be sent to appropriate device driver for delFinally packet will be sent to appropriate device driver for delivery. Device driver will ivery. Device driver will perform fragmentation if needed before actually delivering the pperform fragmentation if needed before actually delivering the packet to NIC.acket to NIC.
Things that define performance:Things that define performance: Interrupt latencyInterrupt latency Bus bandwidthBus bandwidth Speed of the CPUSpeed of the CPU Memory bandwidthMemory bandwidth
Click RouterClick Router
Linux kernel implementation based on peer-to-peer packet transfer between processing blocks.
The router is assembled from packet processing modules called elements. Configuration consist of selecting elements and connecting them into a directed graph. (e.g. IP router has sixteen elements on its forwarding path)
Supports both push and pull (queue element as a connection between opposite ports)
Maximum loss-free forwarding rate for IP routing is 357,000 64-byte packets per second (4 year old number)
Fast PathFast Path
Software based Software based ““switchingswitching”” is amendable to changes but slowis amendable to changes but slow Hardware based Hardware based ““switchingswitching”” has very high processing rate but is has very high processing rate but is
inflexibleinflexible The switching architecture is optimized for functions needed forThe switching architecture is optimized for functions needed for
majority of packets majority of packets –– FAST PATH. FAST PATH. For For unicastunicast IPIP:: Packet parsing and validationPacket parsing and validation Routing table lookupRouting table lookup ARP mappingARP mapping Update TTL and header checksumUpdate TTL and header checksum
The rest of the features are implemented in softwareThe rest of the features are implemented in software Rarely used protocol optionsRarely used protocol options Fragmentation (if ports are not the same)Fragmentation (if ports are not the same) HousekeepingHousekeeping SNMP, ICMP, Exception conditionsSNMP, ICMP, Exception conditions
Cisco 6513 SwitchCisco 6513 Switch
The 13The 13--slot chassis based switchslot chassis based switch Variety of modules:Variety of modules:
Supervisor engines Supervisor engines Ethernet modules (Fast Ethernet, Gigabit Ethernet, 10 Gigabit EtEthernet modules (Fast Ethernet, Gigabit Ethernet, 10 Gigabit Ethernet)hernet) Flex WAN modules Flex WAN modules Shared Port Adaptors/SPA Interface Processors Shared Port Adaptors/SPA Interface Processors MultiMulti--Gigabit services modules (content services ,firewall, intrusion Gigabit services modules (content services ,firewall, intrusion detection, IP detection, IP
Security [IPSec], VPN, network analysis, and Secure Sockets LayeSecurity [IPSec], VPN, network analysis, and Secure Sockets Layer [SSL] r [SSL] acceleration) acceleration)
11521152Fast Fast
202010 Gig10 Gig577577GigGig
Max Density (ports)Max Density (ports)EthernetEthernet
1,000,0001,000,000Routes supportedRoutes supported
407Packets per second [Mpps]
64KMAC addresses supported
40Bandwidth per slot [Gbps]
720Total bandwidth [Gbps]
Performance (SUP720)
Virtual LANsVirtual LANs
Allows separation of Allows separation of logicallogical and and physicalphysical connectivityconnectivity multiple logical connections are sharing single physical multiple logical connections are sharing single physical
connectionconnection
VLAN 1
VLAN 2
VLAN 3
VLAN ConceptsVLAN Concepts Tagging Tagging
ImplicitImplicit explicitexplicit
AwarenessAwareness Association rulesAssociation rules Frame distributionFrame distribution
Virtual LANs (contd.)Virtual LANs (contd.)
Groupware environmentGroupware environmentFineFine--grained BW preservationgrained BW preservation
ApplicationApplicationProtocol based access controlProtocol based access controlProtocolProtocolIP mobilityIP mobilityIP subnetIP subnetStation mobilityStation mobilityMAC addressMAC address
Software patch panelSoftware patch panelSwitch segmentationSwitch segmentationSecurity and BW preservationSecurity and BW preservation
PortPortApplicationApplicationAssociation RuleAssociation Rule
802.1Q Tagging802.1Q Tagging
Priority Priority –– 802.1p guys needed 802.1p guys needed ““homehome”” Canonical Format Indicator (CFI) Canonical Format Indicator (CFI) –– indicates indicates endianesendianes of the of the
frameframe 0 0 --> addresses are in canonical format (Little > addresses are in canonical format (Little EndianEndian) and no source ) and no source
routing information in the VLAN tagrouting information in the VLAN tag 1 1 --> addresses are in non> addresses are in non--canonical format (Big canonical format (Big EnianEnian) and tag is ) and tag is
extended with source routing informationextended with source routing information VLAN Identifier VLAN Identifier –– up to 4096 up to 4096 VLANsVLANs
0xFFF 0xFFF -- RFURFU 0x000 0x000 –– used to indicate priority without VLANused to indicate priority without VLAN
DestinationAddress
SourceAddress
VLANProtocol
ID
TagControl
InfoData FCS
6 6 2 2 0-1500 4
Length/Type
2
0x8100 Priority
CF
I=0
Bytes
Bits 16 3 1
VLAN Identifier
12
802.1Q Switch Flow802.1Q Switch Flow
Port 1(input)
Port 2(input)
Port n(input)
.
.
.
AcceptableFrame Filter
IngressRules
IngressFilter
Forward.Decision
AcceptableFrame Filter
IngressRules
IngressFilter
AcceptableFrame Filter
IngressRules
IngressFilter
Forward.Decision
Forward.Decision
.
.
.
.
.
.
Port 1(outp.)
Port 2(outp.)
Port n(outp.)
.
.
.
EgressRules
EgressFilter
EgressRules
EgressFilter
EgressRules
EgressFilter
.
.
.
SwitchingFabric
Filtering DBIngress Progress Egress
Link Aggregation (802.3ad)Link Aggregation (802.3ad)
Frame Distributor
112
2
1213
Interface 1
Interface 2
Interface 3
Interface 4
Frame Collector
1
1
2
2
1
2
13
Interface 4
Interface 3
Interface 2
Interface 1
12 1
121
32
Switch 1 Switch 2
Timeline
AggregatorTransmitQueue
AggregatorReceiverQueue
Aggregated Links
REQUIREMENTS:•Mulitple interfaces with same MAC and data rate•Mapping (distribution function to physical links)•Configuration control (Link Aggregation Sublayer)
Unify multiple connections in a single link:Increased capacityIncremental capacityHigher availability
802.3ad802.3ad
Applies only to EthernetApplies only to Ethernet Burden is on Burden is on DistributorDistributor rather than rather than CollectorCollector Link Aggregation Control Protocol (LACP)Link Aggregation Control Protocol (LACP)
Concept of Concept of ActorActor and and PartnerPartner Port modes:Port modes:
Active mode Active mode –– port emits messages on a periodic basis port emits messages on a periodic basis (fast rate = 1 sec, slow rate (fast rate = 1 sec, slow rate –– 30 sec)30 sec)
Passive mode Passive mode –– port will not speak unless spoken toport will not speak unless spoken to
Network Management and Network Management and MonitoringMonitoring
SSimple imple NNetwork etwork MManagement anagement PProtocolrotocol
SNMP SNMP –– introduced in 1988 as a protocol for introduced in 1988 as a protocol for management of IP devicesmanagement of IP devices RFC 1157 RFC 1157 -- Simple Network Management Protocol Simple Network Management Protocol
(SNMP) (SNMP) RFC 1213 RFC 1213 -- The Management Information Base II The Management Information Base II
(MIB II)(MIB II) RFC 1902 RFC 1902 -- The Structure for Management The Structure for Management
Information (SMI)Information (SMI) A simple set of operations and information that A simple set of operations and information that
enables operator to set/get state of a deviceenables operator to set/get state of a device
SNMP (contd.)SNMP (contd.)
Managers and agentsManagers and agents UDP based protocolUDP based protocol Ports 161 and 162Ports 161 and 162 CommunityCommunity SMI SMI -- Managed Object Managed Object
with attributes:with attributes: NameName Type and syntaxType and syntax EncodingEncoding
NMS Application
UDPIP
Network Access
Agent Application
UDPIP
Network Access
Request
ResponseTrap
<name> OBJECT<name> OBJECT--TYPETYPESYNTAX <SYNTAX <datatypedatatype>>ACCESS <either readACCESS <either read--only, readonly, read--write, writewrite, write--only, or notonly, or not--accessible>accessible>STATUS <either mandatory, optional or obsolete>STATUS <either mandatory, optional or obsolete>DESCRIPTIONDESCRIPTION
““Textual description of the managed objectTextual description of the managed object””::=::={ <Unique OID that defines this object> }{ <Unique OID that defines this object> }
Management Information BaseManagement Information BaseLogical grouping of managed objects as they pertain to a Logical grouping of managed objects as they pertain to a
management taskmanagement task
Example Example MIBsMIBs MIBMIB--II (RFC 1213)II (RFC 1213) ATM (RFC 2515)ATM (RFC 2515) DNS Server (RFC 1611)DNS Server (RFC 1611) 802.11MIB802.11MIB CHRMCHRM--SYS.mibSYS.mib
(private Cisco (private Cisco mibmib for for forfor general system general system variables )variables )
root
ccitt0
Iso1
joint2
org3
dod6
internet1
directory1
management2
experimental3
private4
mib-22
enterprise1
Example:MIB managed object: 1.3.6.1.2.1.1.6 (iso.org.dod.internet.management.mib-2.system.sysLocation)
SNMP OperationsSNMP Operations
Simple request/response protocol. Simple request/response protocol. SNMP Operations:SNMP Operations:
Retrieving Data:Retrieving Data: get, get, getnextgetnext, , getbulkgetbulk, , getresponsegetresponse Altering VariablesAltering Variables : set: set Receiving Unsolicited Messages:Receiving Unsolicited Messages: trap, trap, notification, notification,
inform, reportinform, report Each operation has a standard Protocol Data Unit Each operation has a standard Protocol Data Unit
(PDU) format(PDU) format Most implementations have command line operation Most implementations have command line operation
equivalentsequivalents
SNMPv2 and SNMPv3 only
Management/Monitoring ToolsManagement/Monitoring ToolsCommercial SNMP basedCommercial SNMP based
NMSNMS HPHP’’ss OpenViewOpenView
Network Node Network Node ManagerManager
Castle RockCastle Rock’’s s SNMPcSNMPcEnterprise EditionEnterprise Edition
AgentAgent HP HP OpenViewOpenView AgentAgent OS SNMP AgentOS SNMP Agent Cisco IOSCisco IOS SystemEDGESystemEDGE
Open Source SNMP basedOpen Source SNMP based NMSNMS
NagiosNagios Multi Router Traffic Multi Router Traffic
GrapherGrapher (MRTG)(MRTG) OpenNMSOpenNMS NetdiscoNetdisco
AgentAgent NetNet--SNMPSNMP
Other toolsOther tools RRDToolRRDTool GangliaGanglia
Case Study:Case Study:ORBIT NetworkORBIT Network
ROUTING
Subnet IP Range Netmask Default Gateway
VLAN Note
Internal 10.0.0.0 255.255.0.0 10.0.0.1 2 CM 10.1.0.0 255.255.0.0 10.1.0.1 3 Aruba 10.2.0.0 255.255.0.0 10.2.0.1 4 Instrumentation 10.3.0.0 255.255.0.0 10.3.0.1 5 Control 10.10.0.0 255.255.0.0 10.10.0.1 6 Data 10.15.0.0 255.255.0.0 10.15.0.1 7 Winlab 10.20.0.0 255.255.0.0 10.20.0.1 8 Winlab network mapping
GRID VLANS VLAN
ID Usage
10 Frisbee 11 EDB1 to IDB1 copy 12 13
““BigBig”” PicturePicture
Network ProgrammingNetwork Programming
Linux KernelLinux Kernel
Hardware
a11
a22
3a3
4a4
b1
b2
b3
b4
5
6
7
8
Vcc1
0 0
Vcc2
a11
a22
3a3
4a4
b1
b2
b3
b4
5
6
7
8GND
0
Process Management
ArchitectureDependant
Code
a11
a223
a34a4
b1b2b3b4
5678Vcc1
0 0
Vcc2a1
1a2
23a34a4
b1b2b3b4
5678GND
0
a11
a223
a34a4
b1b2b3b4
5678Vcc1
0 0
Vcc2a1
1a2
23a34a4
b1b2b3b4
5678GND
0
a11
a223
a34a4
b1b2b3b4
5678Vcc1
0 0
Vcc2a1
1a2
23a34a4
b1b2b3b4
5678GND
0
a11
a223
a34a4
b1b2b3b4
5678Vcc1
0 0
Vcc2a1
1a2
23a34a4
b1b2b3b4
5678GND
0
MemoryManager
File System Types
Character Devices
NetworkSubsystem
Block devices NetworkDrivers
CPU Memory Disk Console NIC
MemoryManagement Filesystems Device
Control Networking
ConcurrencyMultitasking
VirtualMemory Files and drs TTYS Connectivity
KernelSubsystem
FeatureImplemented
SoftwareSupport
The System Call Interface
Linux Protocol Stack LayeringLinux Protocol Stack Layering
BSD sockets layer – provides standard UNIX sockets API for inter-process communication (not necessarily only across network).
INET layer – manages communication end-points for the IP based protocols (TCP and UDP).
Device driver – manages physical device.
Application routed
(BSD) Socket Interface
INET Socket Interface
TCP UDP
IP
Device Driver
Socket Address StructuresSocket Address Structures
Generic socket address structureGeneric socket address structure
Always passed by referenceAlways passed by reference Socket functions take pointer to Socket functions take pointer to
the generic socket address the generic socket address structurestructure
Protocol dependant address Protocol dependant address conversion functionsconversion functions
Linux Linux sockaddrssockaddrs don't have a don't have a length field, only a familylength field, only a family
Length AF_INET
16 bit port#
32-bitIPv4 address
Unused
sockaddr_in()IPv4
Length AF_INET6
16 bit port#
32-bitFlow label
sockaddr_in6()IPv6
16 bytes128-bit
IPv6 address
Length AF_LOCAL
sockaddr_un()Unix
Pathname(up to 1024 bytes)
24 bytes
variable
Length AF_LINK
Interface index
sockaddr_dl()Datalink
Interface nameand
link-layer address
variable
Type Name len.Addr. Len. Sel. len
struct sockaddr{uint8_t sa_len;sa_family_t sa_family;char sa_data[14];
}
struct sockaddr_in server; // IPv4…bind(sockfd, (struct sockaddr *) server, sizeof(server));
#define AF_UNSPEC 0#define AF_UNIX 1 /* Unix domain sockets */#define AF_LOCAL 1 /* POSIX name for AF_UNIX */#define AF_INET 2 /* Internet IP Protocol */#define AF_AX25 3 /* Amateur Radio AX.25 */#define AF_IPX 4 /* Novell IPX */#define AF_APPLETALK 5 /* AppleTalk DDP */#define AF_NETROM 6 /* Amateur Radio NET/ROM */#define AF_BRIDGE 7 /* Multiprotocol bridge */#define AF_ATMPVC 8 /* ATM PVCs */#define AF_X25 9 /* Reserved for X.25 project */#define AF_INET6 10 /* IP version 6 */
Support FunctionsSupport Functions
Byte Ordering FunctionsByte Ordering Functions EndiannessEndianness -- order in which integer values are stored order in which integer values are stored
as bytes in computer memory (byte order)as bytes in computer memory (byte order)
Internet protocol use bigInternet protocol use big--endian byte ordering for endian byte ordering for multibytemultibyte integers (integers (network byte ordernetwork byte order))
““ll”” –– long (32 bit), long (32 bit), ““ss”” –– short (16short (16--bit)bit) On systems with same host and network byte order On systems with same host and network byte order
these are null macrosthese are null macros
Byte Manipulating FunctionsByte Manipulating Functions
Address Conversion FunctionsAddress Conversion Functions Convert between ASCII string and network Convert between ASCII string and network
byte ordered binary valuesbyte ordered binary values
#include <strings.h>
void bzero(void *s, size_t n);void bcopy(const void *src, void *dest, size_t n);int bcmp(const void *s1, const void *s2, size_t n);
#include <netinet/in.h>#include <arpa/inet.h>
int inet_aton(const char *cp, struct in_addr *inp);char *inet_ntoa(struct in_addr in);in_addr_t inet_addr(const char *cp);
int inet_pton(int af, const char *src, void *dst);const char *inet_ntop(int af, const void *src, char *dst,
socklen_t cnt);
#include <netinet/in.h>#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong);uint16_t htons(uint16_t hostshort);uint32_t ntohl(uint32_t netlong);uint16_t ntohs(uint16_t netshort);
Big Big endianendianLittle Little endianendian
LowLow--bytebyteHighHigh--bytebyteaddraddr A+1A+1
HighHigh--bytebyteLowLow--bytebyteaddraddr AA
Socket FunctionsSocket Functions
Creates an endpoint for communication and returns a (socket) Creates an endpoint for communication and returns a (socket) descriptor.descriptor.
Functions Functions setsockoptsetsockopt and and getsockoptgetsockopt can be used to control the socket can be used to control the socket operation.operation.
Raw network protocol accessRaw network protocol accessSOCK_RAWSOCK_RAW
Reliable datagram layer that does not guarantee orderingReliable datagram layer that does not guarantee orderingSOCK_RDMSOCK_RDM
Sequenced, reliable, twoSequenced, reliable, two--way connectionway connection--based datagram based datagram layerlayer
SOCK_SEQPACKETSOCK_SEQPACKET
OboleteOboleteSOCK_PACKETSOCK_PACKET
Connectionless, unreliable fixed maximum Connectionless, unreliable fixed maximum length length datagramsdatagrams (UDP)(UDP)
SOCK_DGRAMSOCK_DGRAM
Sequenced, reliable, twoSequenced, reliable, two--way, connectionway, connection--based based byte streams (TCP)byte streams (TCP)
SOCK_STREAMSOCK_STREAM
DescriptionDescriptionTypeType
#include <sys/types.h>#include <sys/socket.h>
int socket(int domain, int type, int protocol);
IPv6 Internet protocolsIPv6 Internet protocolsPF_INET6PF_INET6
IPX IPX -- Novell protocolsNovell protocolsPF_IPXPF_IPX
Bluetooth socketsBluetooth socketsPF_BLUETOOTHPF_BLUETOOTH
Low level packet interfaceLow level packet interfacePF_PACKETPF_PACKET
AppletalkAppletalkPF_APPLETALKPF_APPLETALK
Access to raw ATM Access to raw ATM PVCsPVCsPF_ATMPVCPF_ATMPVC
Amateur radio AX.25 protocolAmateur radio AX.25 protocolPF_AX25PF_AX25
ITUITU--T X.25 / ISOT X.25 / ISO--8208 protocol8208 protocolPF_X25PF_X25
IPv4 Internet protocolsIPv4 Internet protocolsPF_INETPF_INET
Local communicationLocal communicationPF_UNIX,PF_LOCALPF_UNIX,PF_LOCAL
ProtocolProtocolFamily (domain)Family (domain)
Socket Functions (contd.)Socket Functions (contd.)
Connects the Connects the sockfdsockfd socket to the address specified by socket to the address specified by serv_addrserv_addr Clients: kernel will chose source address and ephemeral port Clients: kernel will chose source address and ephemeral port For connectionFor connection--based protocol it will attempt to connect; for based protocol it will attempt to connect; for
connectionless sets the default destination address.connectionless sets the default destination address. Return 0 on success; Return 0 on success; --1 and 1 and errnoerrno on failure.on failure.
#include <sys/types.h>#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
#include <sys/types.h>#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
Assigns local address to the Assigns local address to the sockfdsockfd socket (also socket (also refferedreffered to as to as ““passigningpassigning a name to a socketa name to a socket””)) Used by servers to Used by servers to ““bind their well known portbind their well known port””; on clients, can be used to a ; on clients, can be used to a
specific IP address for multispecific IP address for multi--homed hosts.homed hosts. Return 0 on success; Return 0 on success; --1 and 1 and errnoerrno on failure.on failure.
Socket Functions (contd.)Socket Functions (contd.)
Connects the Connects the sockfdsockfd socket to the address specified by socket to the address specified by serv_addrserv_addr Clients: kernel will chose source address and ephemeral port Clients: kernel will chose source address and ephemeral port For connectionFor connection--based protocol it will attempt to connect; for based protocol it will attempt to connect; for
connectionless sets the default destination address.connectionless sets the default destination address. Return 0 on success; Return 0 on success; --1 and 1 and errnoerrno on failure.on failure.
#include <sys/types.h>#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
#include <sys/types.h>#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
Assigns local address to the Assigns local address to the sockfdsockfd socket (also socket (also refferedreffered to as to as ““passigningpassigning a name to a socketa name to a socket””)) Used by servers to Used by servers to ““bind their well known portbind their well known port””; on clients, can be used to a ; on clients, can be used to a
specific IP address for multispecific IP address for multi--homed hosts.homed hosts. Return 0 on success; Return 0 on success; --1 and 1 and errnoerrno on failure.on failure.
Socket Functions (contd.)Socket Functions (contd.)
Extracts the first from the queue of pending connections, createExtracts the first from the queue of pending connections, creates a s a new connected socket, and returns a new file descriptor referrinnew connected socket, and returns a new file descriptor referring to g to that socket.that socket. Only for connectionOnly for connection--based protocols; blocks until connection is availablebased protocols; blocks until connection is available addraddr is filled in with the address of the peeris filled in with the address of the peer Returns Returns socketfdsocketfd on success; on success; --1 and 1 and errnoerrno on failure.on failure.
#include <sys/types.h>#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
#include <sys/types.h>#include <sys/socket.h>
int listen(int sockfd, int backlog);
Indicated willingness to accept incoming connections Indicated willingness to accept incoming connections backlogbacklog –– maximum queue length for established sockets waiting to be maximum queue length for established sockets waiting to be
accepted (the ones for which threeaccepted (the ones for which three--way way handhsakehandhsake has completed)has completed) Return 0 on success; Return 0 on success; --1 and 1 and errnoerrno on failure.on failure.
Socket Functions (contd.)Socket Functions (contd.)
Closes the socket descriptor.Closes the socket descriptor. Returns Returns socketfdsocketfd on success; on success; --1 and 1 and errnoerrno on failure.on failure.
#include <sys/types.h>#include <sys/socket.h>
int close(int fd);
#include <sys/types.h>#include <sys/socket.h>
ssize_t recv(int s, void *buf, size_t len, int flags);ssize_t recvfrom(int s, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
ssize_t send(int s, const void *buf, size_t len, int flags);ssize_t sendto(int s, const void *buf, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
Used to receive and send data Used to receive and send data If no data, the receive call waits (unless If no data, the receive call waits (unless nonblockingnonblocking); otherwise it returns any data ); otherwise it returns any data
available, up to available, up to lenlen.. For send, socket has to be in connected state (endpoint has to bFor send, socket has to be in connected state (endpoint has to be set); error is also e set); error is also
returned for message too long to pass atomically through the undreturned for message too long to pass atomically through the underlying protocol.erlying protocol. All functions return the length of the data on success; All functions return the length of the data on success; --1 and 1 and errnoerrno on failure.on failure.
Simple TCP ClientSimple TCP Client--Server ExampleServer Example
socket()
bind()
listen()
accept()
socket()
TCP Server
TCP Client
connect()
write()
read()
read()write()
close()close()
Data (request)
Data (reply)
End-of-file notification
Conection establishment
(TCP three-way handshake
ECHO Server: ECHO Server: Client sends a line of textClient sends a line of text Server reads that line end Server reads that line end
echoes it backechoes it back Client reads the line and Client reads the line and
displays it on its displays it on its stdoutstdout
ClientClient
/*/** Simple echo client* Simple echo client*/*/
#include <#include <stdio.hstdio.h>>#include <#include <stdlib.hstdlib.h>>#include <#include <strings.hstrings.h>>#include <#include <errno.herrno.h>>#include <#include <resolv.hresolv.h>>#include <sys/#include <sys/socket.hsocket.h>>#include <#include <arpa/inet.harpa/inet.h>>
#define SERVER "127.0.0.1"#define SERVER "127.0.0.1"#define ECHO_PORT 7+1000#define ECHO_PORT 7+1000#define MAXBUF 1024#define MAXBUF 1024
intint main()main(){{
intint sockfdsockfd;;structstruct sockaddr_insockaddr_in servaddrservaddr;;char char buffer[MAXBUFbuffer[MAXBUF];];
if ( (if ( (sockfdsockfd = = socket(socket(AF_INETAF_INET, SOCK_STREAM, 0)) < 0 ), SOCK_STREAM, 0)) < 0 ){{perror("socketperror("socket"); "); exit(errnoexit(errno););
}}
/* Server address structure *//* Server address structure */bzero(&servaddrbzero(&servaddr, , sizeof(servaddrsizeof(servaddr));));servaddr.sin_familyservaddr.sin_family = AF_INET;= AF_INET;servaddr.sin_portservaddr.sin_port = = htons(ECHO_PORThtons(ECHO_PORT); // Port); // Portif ( if ( inet_atoninet_aton(SERVER(SERVER, &, &servaddr.sin_addrservaddr.sin_addr) == 0 ) // Address) == 0 ) // Address{{perror(SERVERperror(SERVER); ); exit(errnoexit(errno););
}}/* Connect to server *//* Connect to server */if ( if ( connectconnect(sockfd(sockfd, (, (structstruct sockaddrsockaddr*)&*)&servaddrservaddr, ,
sizeof(servaddrsizeof(servaddr)) != 0 ))) != 0 ){{perror("connectperror("connect"); "); exit(errnoexit(errno););
}}sprintf(buffer,"Testsprintf(buffer,"Test message");message");sendsend(sockfd(sockfd, buffer, , buffer, sizeof(buffersizeof(buffer), 0);), 0);bzero(bufferbzero(buffer, MAXBUF);, MAXBUF);recvrecv(sockfd(sockfd, buffer, , buffer, sizeof(buffersizeof(buffer), 0);), 0);printf("%sprintf("%s\\nn", buffer);", buffer);closeclose(sockfd(sockfd););return 0;return 0;
}}
ServerServerservaddr.sin_addr.s_addrservaddr.sin_addr.s_addr = INADDR_ANY; // On any interface= INADDR_ANY; // On any interfaceservaddr.sin_portservaddr.sin_port = = htons(ECHO_PORThtons(ECHO_PORT); // Our port); // Our port/* Let's bind to the port/address *//* Let's bind to the port/address */if ( if ( bind(sockfdbind(sockfd, (, (structstruct sockaddrsockaddr*)&*)&servaddrservaddr, , sizeof(servaddrsizeof(servaddr)) != 0 ))) != 0 ){{perror("bindperror("bind");");exit(errnoexit(errno););
}}
/* We are the server *//* We are the server */if ( if ( listen(sockfdlisten(sockfd, LISTENQ) != 0 ), LISTENQ) != 0 ){{perror("listenperror("listen");");exit(errnoexit(errno););
}}
for ( ; ; )for ( ; ; ){{intint clientfdclientfd;;structstruct sockaddr_insockaddr_in clientaddrclientaddr;;intint addrlenaddrlen==sizeof(clientaddrsizeof(clientaddr););
/* Accept a client connection *//* Accept a client connection */clientfdclientfd = = accept(sockfdaccept(sockfd, (, (structstruct sockaddrsockaddr*)&*)&clientaddrclientaddr, &, &addrlenaddrlen););printf("%s:%dprintf("%s:%d connectedconnected\\n", n", inet_ntoa(clientaddr.sin_addrinet_ntoa(clientaddr.sin_addr),),
ntohs(clientaddr.sin_portntohs(clientaddr.sin_port));));send(clientfdsend(clientfd, buffer, , buffer, recv(clientfdrecv(clientfd, buffer, MAXBUF, 0), 0);, buffer, MAXBUF, 0), 0);close(clientfdclose(clientfd););
}}close(sockfdclose(sockfd););return 0;return 0;
}}
/* Simple echo server. *//* Simple echo server. */#include <#include <stdio.hstdio.h>>#include <#include <stdlib.hstdlib.h>>#include <#include <strings.hstrings.h>>#include <#include <errno.herrno.h>>#include <#include <resolv.hresolv.h>>#include <sys/#include <sys/socket.hsocket.h>>#include <#include <arpa/inet.harpa/inet.h>>
#define ECHO_PORT 7 + 1000#define ECHO_PORT 7 + 1000#define MAXBUF 1024#define MAXBUF 1024#define LISTENQ 20#define LISTENQ 20
intint main(intmain(int argcargc, char *, char *argvargv[])[]){{intint sockfdsockfd;;structstruct sockaddr_insockaddr_in servaddrservaddr;;char char buffer[MAXBUFbuffer[MAXBUF];];
/*/** Create a TCP socket* Create a TCP socket*/*/if ( (if ( (sockfdsockfd = = socketsocket(AF_INET(AF_INET, SOCK_STREAM, 0)) < 0 ), SOCK_STREAM, 0)) < 0 ){{/* Show why we failed *//* Show why we failed */perror("socketperror("socket");");exit(errnoexit(errno););
}}/* Set our address and port *//* Set our address and port */bzero(&servaddrbzero(&servaddr, , sizeof(servaddrsizeof(servaddr));));servaddr.sin_familyservaddr.sin_family = AF_INET;= AF_INET;