Connecting BinderHub to existing JupyterHub (installed with DaskGateway)

@manics As suggested, I added the configuration from https://github.com/jupyterhub/binderhub/blob/c85cac4b56ae14389bfbd066016aaca6dfb6a41a/helm-chart/binderhub/values.yaml#L78-L211 to my values.yaml and upgraded the existing JupyterHub chart.

On browsing to the JupyterHub URL we now see a Service Unavailable message, and the hub pod restarts continuously with the logs below:

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 0-binderspawnermixin
Loading extra config: 00-add-dask-gateway-values
Setting DASK_GATEWAY__ADDRESS http://proxy-public/services/dask-gateway
Adding dask-gateway service URL
Loading extra config: 00-binder
[I 2024-02-06 17:54:43.743 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Authenticator: oauthenticator.github.GitHubOAuthenticator-15.1.0
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Spawner: builtins.BinderSpawner
[I 2024-02-06 17:54:43.743 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[I 2024-02-06 17:54:43.798 JupyterHub app:1934] Not using allowed_users. Any authenticated user will be allowed.
[I 2024-02-06 17:54:43.825 JupyterHub provider:653] Updating oauth client service-dask-gateway
[I 2024-02-06 17:54:43.854 JupyterHub app:2145] Found unexisting services binder in role definition binder
[E 2024-02-06 17:54:43.854 JupyterHub app:3297]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3294, in launch_instance_async
        await self.initialize(argv)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 2826, in initialize
        await self.init_role_assignment()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 2157, in init_role_assignment
        raise ValueError(
    ValueError: services binder defined in config role definition binder but not present in database
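
Reading the last line of the traceback, my guess is that the binder role under loadRoles names a service called binder that is not registered with the hub. The sketch below only imitates that kind of check in plain Python; it is not JupyterHub's code. missing_role_services is a hypothetical helper, and the dask-gateway entry in services is an assumption based on the log lines above.

```python
# Hypothetical helper (not part of JupyterHub): list the services that each
# role refers to but that are not among the defined services.
def missing_role_services(load_roles, services):
    defined = set(services)
    missing = {}
    for role_name, role in load_roles.items():
        absent = [s for s in role.get("services", []) if s not in defined]
        if absent:
            missing[role_name] = absent
    return missing


# Mirrors the loadRoles section of the values.yaml in this post.
load_roles = {
    "binder": {
        "services": ["binder"],
        "scopes": ["servers", "admin:users"],
    }
}

# Assumption: only the dask-gateway service is registered, as the log suggests.
services = {"dask-gateway": {}}

print(missing_role_services(load_roles, services))
```

If this reading is right, the values I copied include loadRoles but no matching service entry. The Z2JH chart registers services under hub.services, so a binder entry there may be what is missing; I have not confirmed this.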

Below is the values.yaml for further reference:

dask-gateway:
  enabled: true
  gateway:
    auth:
      type: jupyterhub
    backend:
      scheduler:
        cores:
          limit: 1
          request: 0.01
        extraPodConfig: null
        memory:
          limit: 1G
          request: 128M
      worker:
        extraContainerConfig:
          securityContext:
            runAsGroup: 1000
            runAsUser: 1000
        extraPodConfig:
          securityContext:
            fsGroup: 1000
    extraConfig:
      idle: |
        # timeout after 30 minutes of inactivity
        c.KubeClusterConfig.idle_timeout = 1800
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String, Mapping
        import string

        # Escape a string to be dns-safe in the same way that KubeSpawner does it.
        # Reference https://github.com/jupyterhub/kubespawner/blob/616f72c4aee26c3d2127c6af6086ec50d6cda383/kubespawner/spawner.py#L1828-L1835
        # Adapted from https://github.com/minrk/escapism to avoid installing the package
        # in the dask-gateway api pod which would have been problematic.
        def escape_string_label_safe(to_escape):
            safe_chars = set(string.ascii_lowercase + string.digits)
            escape_char = "-"
            chars = []
            for c in to_escape:
                if c in safe_chars:
                    chars.append(c)
                else:
                    # escape one character
                    buf = []
                    # UTF-8 uses 1 to 4 bytes per character, depending on the Unicode symbol
                    # so we need to transform each byte to its hex value
                    for byte in c.encode("utf8"):
                        buf.append(escape_char)
                        # %X is the hex value of the byte
                        buf.append('%X' % byte)
                    escaped_hex_char = "".join(buf)
                    chars.append(escaped_hex_char)
            return u''.join(chars)

        def cluster_options(user):
            safe_username = escape_string_label_safe(user.name)
            def option_handler(options):
                if ":" not in options.image:
                    raise ValueError("When specifying an image you must also provide a tag")
                scheduler_extra_pod_annotations = {
                    "hub.jupyter.org/username": safe_username,
                    "prometheus.io/scrape": "true",
                    "prometheus.io/port": "8787",
                }
                extra_labels = {
                    "hub.jupyter.org/username": safe_username,
                }
                return {
                    "worker_cores_limit": options.worker_cores,
                    "worker_cores": options.worker_cores,
                    "worker_memory": "%fG" % options.worker_memory,
                    "image": options.image,
                    "scheduler_extra_pod_annotations": scheduler_extra_pod_annotations,
                    "scheduler_extra_pod_labels": extra_labels,
                    "worker_extra_pod_labels": extra_labels,
                    "environment": options.environment,
                }
            return Options(
                Integer("worker_cores", 4, min=1, label="Worker Cores"),
                Float("worker_memory", 8, min=1, label="Worker Memory (GiB)"),
                # The default image is set via DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE env variable
                String("image", label="Image"),
                Mapping("environment", {}, label="Environment Variables"),
                handler=option_handler,
            )
        c.Backend.cluster_options = cluster_options
    prefix: /services/dask-gateway
  traefik:
    service:
      type: ClusterIP
jupyterhub:
  hub:
    config:
      Authenticator:
        admin_users:
        <Redacted>
      GitHubOAuthenticator:
        allowed_organizations:
        - <Redacted>
        client_id: <Redacted>
        client_secret: <Redacted>
        oauth_callback_url: https://<Redacted>/hub/oauth_callback
        scope:
        - read:org
      JupyterHub:
        authenticator_class: github
    loadRoles:
      binder:
        services:
          - binder
        scopes:
          - servers
          # we don't need admin:users if auth is not enabled!
          - "admin:users"
    extraConfig:
      0-binderspawnermixin: |
        """
        Helpers for creating BinderSpawners

        FIXME:
        This file is defined in binderhub/binderspawner_mixin.py
        and is copied to helm-chart/binderhub/values.yaml
        by ci/check_embedded_chart_code.py

        The BinderHub repo is just used as the distribution mechanism for this spawner,
        BinderHub itself doesn't require this code.

        Longer term options include:
        - Move BinderSpawnerMixin to a separate Python package and include it in the Z2JH Hub
          image
        - Override the Z2JH hub with a custom image built in this repository
        - Duplicate the code here and in binderhub/binderspawner_mixin.py
        """
        from tornado import web
        from traitlets import Bool, Unicode
        from traitlets.config import Configurable


        class BinderSpawnerMixin(Configurable):
            """
            Mixin to convert a JupyterHub container spawner to a BinderHub spawner

            Container spawner must support the following properties that will be set
            via spawn options:
            - image: Container image to launch
            - token: JupyterHub API token
            """

            def __init__(self, *args, **kwargs):
                # Is this right? Is it possible to having multiple inheritance with both
                # classes using traitlets?
                # https://stackoverflow.com/questions/9575409/calling-parent-class-init-with-multiple-inheritance-whats-the-right-way
                # https://github.com/ipython/traitlets/pull/175
                super().__init__(*args, **kwargs)

            auth_enabled = Bool(
                False,
                help="""
                Enable authenticated binderhub setup.

                Requires `jupyterhub-singleuser` to be available inside the repositories
                being built.
                """,
                config=True,
            )

            cors_allow_origin = Unicode(
                "",
                help="""
                Origins that can access the spawned notebooks.

                Sets the Access-Control-Allow-Origin header in the spawned
                notebooks. Set to '*' to allow any origin to access spawned
                notebook servers.

                See also BinderHub.cors_allow_origin in binderhub config
                for controlling CORS policy for the BinderHub API endpoint.
                """,
                config=True,
            )

            def get_args(self):
                if self.auth_enabled:
                    args = super().get_args()
                else:
                    args = [
                        "--ip=0.0.0.0",
                        f"--port={self.port}",
                        f"--NotebookApp.base_url={self.server.base_url}",
                        f"--NotebookApp.token={self.user_options['token']}",
                        "--NotebookApp.trust_xheaders=True",
                    ]
                    if self.default_url:
                        args.append(f"--NotebookApp.default_url={self.default_url}")

                    if self.cors_allow_origin:
                        args.append("--NotebookApp.allow_origin=" + self.cors_allow_origin)
                    # allow_origin=* doesn't properly allow cross-origin requests to single files
                    # see https://github.com/jupyter/notebook/pull/5898
                    if self.cors_allow_origin == "*":
                        args.append("--NotebookApp.allow_origin_pat=.*")
                    args += self.args
                    # ServerApp compatibility: duplicate NotebookApp args
                    for arg in list(args):
                        if arg.startswith("--NotebookApp."):
                            args.append(arg.replace("--NotebookApp.", "--ServerApp."))
                return args

            def start(self):
                if not self.auth_enabled:
                    if "token" not in self.user_options:
                        raise web.HTTPError(400, "token required")
                    if "image" not in self.user_options:
                        raise web.HTTPError(400, "image required")
                if "image" in self.user_options:
                    self.image = self.user_options["image"]
                return super().start()

            def get_env(self):
                env = super().get_env()
                if "repo_url" in self.user_options:
                    env["BINDER_REPO_URL"] = self.user_options["repo_url"]
                for key in (
                    "binder_ref_url",
                    "binder_launch_host",
                    "binder_persistent_request",
                    "binder_request",
                ):
                    if key in self.user_options:
                        env[key.upper()] = self.user_options[key]
                return env

      00-binder: |
        # image & token are set via spawn options
        from kubespawner import KubeSpawner

        class BinderSpawner(BinderSpawnerMixin, KubeSpawner):
            pass

        c.JupyterHub.spawner_class = BinderSpawner
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: incommon
      nginx.ingress.kubernetes.io/proxy-body-size: 600m
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
      nginx.ingress.kubernetes.io/rewrite-target: /
      nginx.ingress.kubernetes.io/secure-backends: "true"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/websocket-services: proxy-public
      nginx.org/client-max-body-size: 10m
      nginx.org/websocket-services: proxy-public
    enabled: true
    hosts:
    - <Redacted>
    ingressClassName: nginx
    tls:
    - hosts:
      - <Redacted>
      secretName: https-auto-incommon
  proxy:
    secretToken: <Redacted>
    service:
      type: ClusterIP
  singleuser:
    extraEnv:
      DASK_DISTRIBUTED__DASHBOARD__LINK: $(JUPYTERHUB_SERVICE_PREFIX)proxy/{port}/status
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: $(JUPYTER_IMAGE_SPEC)
    image:
      name: pangeo/pangeo-notebook
      tag: 2023.05.18
    profileList:
    - default: true
      description: Start a container with the chosen specifications on a node of this
        type
      display_name: Pangeo Notebook
      kubespawner_override:
        cpu_limit: null
        mem_limit: null
      profile_options:
        requests:
          choices:
            mem_1:
              default: true
              display_name: ~1 GB, ~0.125 CPU
              kubespawner_override:
                cpu_guarantee: 0.013
                mem_guarantee: 0.904G
          display_name: Container Selection
      slug: pangeo
    - <Redacted>
    storage:
      extraVolumeMounts:
      - mountPath: /test/campaign
        name: campaign
        readOnly: true
      extraVolumes:
      - name: campaign
        nfs:
          path: /gpfs/<Redacted>
          server: <Redacted>

Please suggest how to resolve this.