Building an Enterprise Analytics Platform with Jupyter Enterprise Gateway · 2020. 7. 13. ·...

1
Building an Enterprise Analytics Platform with Jupyter Enterprise Gateway github.com/jupyter-incubator/enterprise_gateway Jupyter Notebook Platform Architecture Enterprise Analytics Platform Characteristics Shared Compute Resources Enterprise Cloud, Public Cloud or Hybrid Distributed Consumers Notebooks running local Notebooks as services Different Utilization Patterns High number of idle resources Jupyter Notebook Platform Limitations Single-user sharing the same distributed filesystem and privileges Resources are limited to single node No security, users can see and control each other’s process Jupyter Enterprise Gateway addresses enterprise requirements and use cases Current Feature Set Optimized Resource Allocation Run Spark in YARN Cluster Mode to better utilize cluster resources Supports YARN, IBM Spectrum Conductor, Kubernetes resource managers Pluggable architecture for additional resource managers Enhanced Security Enable TLS for all socket communications Any HTTP communication should be encrypted (SSL) Multiuser support with user impersonation Enhance security and sandboxing by enabling user impersonation when running kernels. Individual HDFS home folder for each notebook user. Use the same user ID for notebook and batch jobs. Distributed Kernels == Optimized Resources 8 8 8 8 -20 30 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes Cluster Size (32GB Nodes) MAXIMUM NUMBER OF SIMULTANEOUS KERNELS 16 32 48 64 0 20 40 60 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes Cluster Size (32GB Nodes) MAXIMUM NUMBER OF SIMULTANEOUS KERNELS Notebook UI runs in the browser The Notebook Server serves the ’Notebooks’ Kernels interpret/execute cell contents Are responsible for code execution Abstracts different languages J U P Y T E R E N T E R P R I S E G A T E W A Y Remote Kernel Manager Distributed Process Proxy YARN Cluster Process Proxy Kubernetes Process Proxy Conductor Cluster Process Proxy J U P Y T E R N O T E B O O K NB2KG Lab Jupyter Kernel Gateway Jupyter Notebook Spectrum Conductor Supported Platforms Jupyter Client Integrated with the Jupyter stack Remote Kernel Creation & Shutdown Remote Kernelspec { "language": "python", "display_name": "Python on Kubernetes", "process_proxy": { "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy", "config": { "image_name": "elyra/kubernetes-kernel-py:dev" } }, "env": { }, "argv": [ "python", "/usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py", "{connection_file}", "--RemoteProcessProxy.response-address", "{response_address}", "--RemoteProcessProxy.spark-context-initialization-mode", "none" ] } User Notebook w/ NB2KG Enterprise Gateway Resource Manager Kernel Request Kernel POST /api/kernels Submit via RM RM selects node Connection Info Request Kernel Info Kernel Info Reply Locate via RM POST Response Kernel shut down Shutdown Kernel DELETE /api/kernels/<id> Shutdown message sent to kernel Terminate via RM Confirm via RM DELETE Response Kernel Creation Kernel Shutdown Kernel Ready Websocket Est.

Transcript of Building an Enterprise Analytics Platform with Jupyter Enterprise Gateway · 2020. 7. 13. ·...

  • Building an Enterprise Analytics Platform with Jupyter Enterprise Gateway

    github.com/jupyter-incubator/enterprise_gateway

    Jupyter Notebook Platform Architecture

    Enterprise Analytics Platform Characteristics

    Shared Compute Resources• Enterprise Cloud, Public

    Cloud or Hybrid

    Distributed Consumers• Notebooks running local• Notebooks as services

    Different Utilization Patterns• High number of idle

    resources

    Jupyter Notebook Platform Limitations

    • Single-user sharing the same distributed filesystem and privileges

    • Resources are limited to single node• No security, users can see and control each other’s

    process

    Jupyter Enterprise Gateway addresses enterprise requirements and use cases

    Current Feature Set

    Optimized Resource Allocation• Run Spark in YARN Cluster Mode to better utilize cluster resources• Supports YARN, IBM Spectrum Conductor, Kubernetes resource

    managers• Pluggable architecture for additional resource managers

    Enhanced Security• Enable TLS for all socket communications• Any HTTP communication should be encrypted (SSL)

    Multiuser support with user impersonation • Enhance security and sandboxing by enabling user impersonation

    when running kernels.• Individual HDFS home folder for each notebook user.• Use the same user ID for notebook and batch jobs.

    Distributed Kernels == Optimized Resources

    8 8 8 8

    -20

    30

    80

    4 Nodes 8 Nodes 12 Nodes 16 Nodes

    Cluster Size (32GB Nodes)

    MAXIMUM NUMBER OF SIMULTANEOUS KERNELS

    16

    32

    48

    64

    0

    20

    40

    60

    80

    4 Nodes 8 Nodes 12 Nodes 16 NodesCluster Size (32GB Nodes)

    MAXIMUM NUMBER OF SIMULTANEOUS KERNELS

    • Notebook UI runs in the browser

    • The Notebook Server serves the ’Notebooks’

    • Kernels interpret/execute cell contents

    • Are responsible for code execution

    • Abstracts different languages

    J U P Y T E R E N T E R P R I S E G A T E W A Y

    RemoteKernel

    Manager

    DistributedProcess Proxy

    YARN ClusterProcess Proxy

    KubernetesProcess Proxy

    Conductor Cluster

    Process Proxy

    J U P Y T E R N O T E B O O K

    NB2KG Lab

    Jupyter Kernel Gateway

    Jupyter Notebook

    Spectrum Conductor

    SupportedPlatforms

    Jupyter Client

    Integrated with the Jupyter stack

    Remote Kernel Creation & ShutdownRemote Kernelspec{"language": "python","display_name": "Python on Kubernetes","process_proxy": {"class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy","config": {"image_name": "elyra/kubernetes-kernel-py:dev"

    }},"env": {},"argv": ["python","/usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py","{connection_file}","--RemoteProcessProxy.response-address","{response_address}","--RemoteProcessProxy.spark-context-initialization-mode","none"

    ]}

    User Notebook w/ NB2KG

    EnterpriseGateway

    Resource Manager Kernel

    Request Kernel POST /api/kernels Submit via RM RM selects node

    Connection Info

    Request Kernel InfoKernel Info Reply

    Locate via RM

    POST Response

    Kernel shut down

    Shutdown KernelDELETE /api/kernels/

    Shutdown message sent to kernel

    Terminate via RM

    Confirm via RMDELETE Response

    Kern

    el

    Crea

    tion

    Kern

    el

    Shut

    dow

    n

    Kernel ReadyWebsocket Est.