Inspur OAI Product Introduction
Transcript of Inspur OAI Product Introduction
Jan 2021
AutoML Suite
AIStation(Training Platform)
Server
AI Resource
Platform
AI Computing
Platform
Accelerator Card
F07V F10A F37X N10X F10S
Training Inference Edge
Model Development
Model Deployment
Application Development
T-Eye
AI application and framework feature
analyzer
N20X
AIStation(Inference Platform)
Compatible with multiple DL frameworks
Support AI model online testing and evaluation
Multi-model deployment and weighted calculation
Caffe-MPI TensorFlow-opt LMS TF2
AI Algorithm Toolkit Platform
Cloud & On-premise Deployment
No loss in model accuracy
Speed up FPGA development
Automatic modeling
Automatic tuning
Automatic cropping
One of the first parallel versions of the Caffe framework
Optimized TensorFlow framework with the fastest AI training speed on the public cloud; 90% scaling efficiency on 512 GPUs
Self-developed AI model computing framework, supporting large-scale GPU training
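The 90% figure quoted for 512 GPUs can be read as a scaling efficiency, that is, measured speedup divided by the number of GPUs. Both that reading and the single-GPU baseline are assumptions, since the slide does not define the term. Under those assumptions, a minimal sketch of the implied speedup:

```python
def effective_speedup(num_gpus: int, scaling_efficiency: float) -> float:
    """Speedup over a single GPU implied by a scaling efficiency in (0, 1].

    Assumes efficiency = speedup / num_gpus (the slide does not define it).
    """
    if num_gpus < 1 or not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("need num_gpus >= 1 and 0 < scaling_efficiency <= 1")
    return num_gpus * scaling_efficiency


# Figures from the slide: 512 GPUs at 90% efficiency.
speedup_512 = effective_speedup(512, 0.90)
print(f"Implied speedup on 512 GPUs: about {speedup_512:.1f}x")
```

If the baseline was a multi-GPU node rather than one GPU, the implied speedup would be relative to that node instead.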
Smart City, Medicine, Education, Manufacture, Telecom, Finance, Media, Internet
E2E AI Solution
General Server and Open Compute
Telecom, FinTech, Healthcare, Broadcast, Government, Manufacturing, Transportation, Internet
“Solution Partner”: SIs and ISVs able to deliver total solutions for industries
Efficient Innovation
AI Computing Platform
• Industry’s Most Comprehensive AI Server Portfolio
• General Server 2U/4U/6U
• Open Hardware Compute and OAI
• M5 AI Servers, FPGAs, ASIC Cards…
Agile Collaboration
AI Resource Platform
• AIStation: One-stop AI development platform; efficient and flexible computing resource scheduling; easy-to-deploy AI dev environment
• T-Eye: AI performance profiling and tuning tool that supports AI application optimization
Time to Delivery
Algorithm Toolkit
• AutoML Suite: On-Premise & Cloud deployment; Parallel Acceleration; Effortless Model Generation
• Caffe-MPI: 1st Parallel Version of Caffe
• TensorFlow-Opt: Scale-out TensorFlow on public cloud, optimized for cloud RoCE
“Algorithm Partner”: AI companies able to develop core AI capabilities
ODCC Rack
OCP Rack
Project Olympus Rack
Open19 Rack
InCloud Rack with Intel® RSD
Industry’s 1st 21” OAM Platform
ODCC Solution Provider
Intel® Rack Scale Design
One of the Key Members
Inspur is a Key Member in Open Platform Communities.
OCP Platinum Member, Solution Provider
Microsoft Project Olympus
Data
Computing Resources Utilization: from 40% to 80%
Training Time: from 2 days to 4 hours
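Reading the figures above as utilization rising from 40% to 80% and training time falling from 2 days to 4 hours (an interpretation of the slide layout, which does not label them as before and after), the improvement factors work out as in this small sketch:

```python
HOURS_PER_DAY = 24


def improvement_factor(larger: float, smaller: float) -> float:
    """Ratio between two positive quantities expressed in the same unit."""
    if larger <= 0 or smaller <= 0:
        raise ValueError("both values must be positive")
    return larger / smaller


# Utilization: 40% before, 80% after (assumed ordering).
utilization_gain = improvement_factor(80, 40)

# Training time: 2 days before, 4 hours after (assumed ordering).
training_speedup = improvement_factor(2 * HOURS_PER_DAY, 4)

print(f"Utilization: {utilization_gain:.0f}x, training time: {training_speedup:.0f}x shorter")
```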
Telecom
Finance
Medical Science
Government
Manufacture
Transport
Internet
Low Model Development Efficiency
On-premise Deployment
Public Cloud
Private Cloud
Model Development and Training
Model Deployment and Inference
AI App
Efficient and flexible platform: obtain AI computing resources on demand to speed up model training
Easily deploy the AI development environment and development process, significantly improving development efficiency
Low Utilization of Computing Resource
Complicated deployment of trained models into production
Seamless connection between model development and deployment, shortening the time to scale to production
Unified management of multiple models; centralized scheduling of computing resources
Dynamic allocation, elastic expansion
One-stop Model Deployment
Multi-application load balancing and resource elastic scaling
Data Model AI Service
2 days 5 min
PC
Mobile
Manufacture
Robot
IOT
• SAS switch for pooling HDDs, improving storage flexibility
• PCIe switch for pooling GPUs; GPU acceleration ratio increases linearly
• GPU/FPGA over Fabric: remote expansion of heterogeneous accelerators
Server 1 Server 2 Server 3 Server 4
PCIe Switch
NVMe over Fabric / PCIe / Ethernet
GPU Pool
FPGA Pool 1 GPU Pool 2
2018/11
World’s First 21” OAI Reference System
2020/2
54V OAM Power on
2020/5 2020/8 2019/11 2019/8 2019/9 2019/5 2019/3
OAI Reference System MX1
OCP-certified 2S compute node ON5263M5 (San Jose)
High Density Whisper Cable
Whisper Connector
Front I/O connector
QSFP-DD Connector for OAM Expansion
4 x HHHL PCIe Expansion
1570W without OAMs
141mm
35 ℃ Ambient Temperature Supported
Product Model: MX1
Chassis: 21” 3OU rack mount
Dimensions: 537 (W) x 141 (H) x 803 (D) mm
Connection with compute node: Up to PCIe Gen4 x32
OAM: Supports up to 8 x 48~54V OAMs (up to 450W each) or up to 8 x 12V OAMs (up to 350W each)
Power without OAM: 1570W
PCIe switch: Supports PCIe Gen4 (100 lanes/chip)
PCIe re-timer: Supports PCIe Gen4 x16
Phy re-timer: 56Gbps PAM-4 or 10/28Gbps NRZ x16
Expansion slots: Up to 4 x PCIe Gen4 x16 low-profile standard cards
BMC: AST2520
I/O: Dongle connector for dedicated NIC and USB, UID/PWR button with LED, QSFP-DD x8 for OAM scale-out, micro USB x2 for OAM debug
Ambient working temperature: 5-35 ℃
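The power figures in the specification above can be combined into a rough upper bound for a fully populated MX1. This is a sketch under stated assumptions: it simply adds the listed "power without OAM" to the maximum per-OAM power, and ignores PSU efficiency, derating, and whether real modules ever reach their listed maximum. The names `BASE_POWER_W`, `OAM_OPTIONS`, and `peak_power_w` are illustrative, not from the source.

```python
from typing import Optional

# "Power without OAM" from the specification table.
BASE_POWER_W = 1570

# Maximum module count and per-module power for each OAM supply option.
OAM_OPTIONS = {
    "48-54V": {"max_count": 8, "max_watts_each": 450},
    "12V": {"max_count": 8, "max_watts_each": 350},
}


def peak_power_w(oam_type: str, count: Optional[int] = None) -> int:
    """Upper-bound system power: base power plus `count` OAMs at maximum power."""
    option = OAM_OPTIONS[oam_type]
    n = option["max_count"] if count is None else count
    if not 0 <= n <= option["max_count"]:
        raise ValueError(f"count must be between 0 and {option['max_count']}")
    return BASE_POWER_W + n * option["max_watts_each"]


for oam_type in OAM_OPTIONS:
    print(f"{oam_type} OAMs, fully populated: {peak_power_w(oam_type)} W")
```

With eight modules installed, this gives 5170 W for the 48~54V option and 4370 W for the 12V option.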
INSPUR CONFIDENTIAL
[Block diagram, component labels only]
Compute node
54V HSC x9; 54V-to-12V VR x6
PCIe re-timer PT4161L x8
OAM0 to OAM7
QSFP-DD x8
IB module x2
PCIe switch PM42100 x4
Phy re-timer CRT50216P x4
BMC AST2520 (management)
CPLD x3
Power monitor
Boards: UBB, HIB, PDB
Buses: I2C, I2C/JTAG
Signal types in legend: PCIe x16, management, SerDes
AIOps and Mgmt
Physical Infrastructure Manager
Open standard mgmt. interface
Open RMC, for:
• Implementing rack Mgmt based on the node level
• Southbound management of system resources; northbound presentation of info
• Meeting the needs of Mgmt encryption and resource pooling
Open BMC, for:
• Reliance on vendor maintenance of the traditional BMC code base
• Complexity of modifying traditional BMC code for new HW
• Poor readability of IPMI tool binary code
To build a smarter DC
Automated Asset Mgmt
Intelligent Alarm and Fault Mgmt
One-click Upgrade Mgmt
Visual 3D Mgmt
Mgmt capability at 100K-unit scale
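• RMC Web Server: Rack Manager controller service based on OpenBMC
• RMC Web UI: resource collection and service configuration files
• Southbound interface: supports the Redfish RESTful API
• Northbound interface: supports the Redfish RESTful API, with enriched service configuration files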
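Both RMC interfaces listed above use the Redfish RESTful API. As a rough illustration of what a client of such an interface does, the sketch below builds a collection URL under the standard Redfish service root `/redfish/v1/` and extracts member links from a collection response. The host `rmc.example` and the sample payload are hypothetical, and nothing here is taken from Inspur's RMC implementation; `fetch_json` also omits the authentication and TLS settings a real service would normally require.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Standard Redfish service root path (defined by the DMTF Redfish specification).
SERVICE_ROOT = "/redfish/v1/"


def collection_url(base_url: str, collection: str) -> str:
    """Build the URL of a top-level Redfish collection such as 'Chassis'."""
    return urljoin(base_url, SERVICE_ROOT + collection)


def member_paths(collection_payload: dict) -> list:
    """Return the '@odata.id' link of every member in a Redfish collection."""
    return [m["@odata.id"] for m in collection_payload.get("Members", [])]


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a Redfish resource and decode it as JSON (no auth; illustration only)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical collection payload, shaped like a Redfish collection response.
sample_chassis_collection = {
    "Members@odata.count": 2,
    "Members": [
        {"@odata.id": "/redfish/v1/Chassis/1"},
        {"@odata.id": "/redfish/v1/Chassis/2"},
    ],
}

print(collection_url("https://rmc.example", "Chassis"))
print(member_paths(sample_chassis_collection))
```

Against a live service, `fetch_json(collection_url(base, "Chassis"))` would supply the payload passed to `member_paths`.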
RMC WEB SERVER
BMC BMC BMC
Redfish Redfish Redfish
Redfish Redfish
USER TOOLS
RMC WEB UI
100G Switch
1G Mgmt Switch
OAI system
Compute node x2
OAI system
OAI system
Compute node x2
OAI system
Power Shelf
48VDC Open Rack
1 pair of 48V bus bars
1 shelf per rack
Power Shelf
33 kW (12 x PSU)
40V-58V
93 mm (H, 2OU) x 537 mm (W, 21”) x 586 mm (D)
System Devices
Inspur 3OU OAI systems x4
Inspur 2OU compute node x4
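The rack configuration above can be checked against the 33 kW power shelf with a back-of-the-envelope budget. This sketch assumes the four 3OU OAI systems are MX1 units drawing 1570 W plus eight 450 W OAMs each, and that the full 33 kW is usable. The source states neither the compute node power nor the PSU redundancy scheme, so the result is only the headroom left for compute nodes and switches under those assumptions.

```python
SHELF_CAPACITY_W = 33_000      # power shelf rating (12 x PSU)
OAI_SYSTEM_COUNT = 4           # Inspur 3OU OAI systems per rack
OAI_BASE_POWER_W = 1570        # MX1 power without OAMs (assumed to apply here)
OAMS_PER_SYSTEM = 8            # maximum OAM count per system
OAM_MAX_POWER_W = 450          # maximum power of one 48~54V OAM


def oai_system_peak_w() -> int:
    """Upper-bound power of one fully populated OAI system."""
    return OAI_BASE_POWER_W + OAMS_PER_SYSTEM * OAM_MAX_POWER_W


def remaining_budget_w() -> int:
    """Shelf capacity left after all OAI systems draw their upper-bound power."""
    return SHELF_CAPACITY_W - OAI_SYSTEM_COUNT * oai_system_peak_w()


print(f"Per OAI system: {oai_system_peak_w()} W")
print(f"All OAI systems: {OAI_SYSTEM_COUNT * oai_system_peak_w()} W")
print(f"Left for compute nodes and switches: {remaining_budget_w()} W")
```

If some PSUs are reserved for redundancy, the usable capacity and therefore the headroom would be lower.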