ClusterManager

The clustermanager (object, REQUIRED) section describes multiple configurable resource pools for emulation sessions. This is currently the most complex configuration section.

  • name (string, REQUIRED): Cluster manager’s symbolic name.
  • admin_api_access_token (string, REQUIRED): User-defined access token for the admin RESTful API. Currently, any string of arbitrary characters is accepted.
  • providers (array of objects, REQUIRED): List of resource provider configurations.

Example:

clustermanager:
    name: "example"
    admin_api_access_token: "super-secret-token"
    providers:
      - name: "provider-1"
        type: blades
        # ... blades specific configuration ...

      - name: "provider-2"
        type: gce
        # ... gce specific configuration ...

      - name: "provider-3"
        type: jclouds
        # ... jclouds specific configuration ...

ResourceProviders

Note

Durations are represented as integers followed by unit suffixes:

  • For Milliseconds: ms, msec, msecs (e.g. 10ms, 250 msecs)
  • For Seconds: s, sec, secs (e.g. 1s, 15 secs)
  • For Minutes: m, min, mins (e.g. 1m, 3 mins)
  • For Hours: h, hour, hours (e.g. 1h, 2 hours)
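
As an illustration, the fragment below uses several of these duration forms in fields that are documented later in this section. The values are arbitrary and were chosen only to show the syntax; they are not recommendations.

```yaml
# Illustrative values only
request_history:
    max_request_age: 1h           # hours, short suffix

node_allocator:
    healthcheck:
        connect_timeout: 500ms    # milliseconds
        read_timeout: 5 secs      # seconds, with a space before the suffix
        interval: 1m              # minutes, short suffix
        failure_timeout: 2 mins   # minutes, long suffix
```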

Note

Resource specs are currently represented as objects with cpu and memory fields:

# Resource spec representing 2 CPU cores and 8GB RAM
node_capacity:
    cpu: 2
    memory: 8GB

CPU values can be specified as full or fractional number of CPU cores:

  • cpu: 0.5 will be parsed as half of a CPU core
  • cpu: 1 will be parsed as one full CPU core
  • cpu: 2.5 will be parsed as two and a half CPU cores

Memory values can be specified as integers followed by unit suffixes:

  • For Kilobytes: k, K, kb, KB (e.g. 512k, 750KB)
  • For Megabytes: m, M, mb, MB (e.g. 128m, 512MB)
  • For Gigabytes: g, G, gb, GB (e.g. 1g, 2GB)
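
A short sketch of how fractional CPU values and memory suffixes combine in resource specs. It uses the min_bound and max_bound fields of the preallocation object described below; the numbers are arbitrary examples.

```yaml
preallocation:
    # At least half a CPU core and 512 megabytes
    min_bound:
        cpu: 0.5
        memory: 512MB

    # At most two and a half CPU cores and 4 gigabytes
    max_bound:
        cpu: 2.5
        memory: 4g
```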

All supported resource provider types share the following common configuration fields:

  • name (string, REQUIRED): Symbolic name of the resource provider.

  • type (string, REQUIRED): Provider’s type name. Valid values are blades, gce and jclouds.

  • labels (object, OPTIONAL): Set of key/value pairs, interpreted as labels.

  • deferred_allocations_gc_interval (duration, OPTIONAL): Interval between checks for deferred allocations with expired deadlines.

  • request_history (object, OPTIONAL): Parameters for history management of resource allocation requests.

    • update_interval (duration, OPTIONAL): Interval between statistics updates.
    • max_request_age (duration, OPTIONAL): Max. age of requests to keep in the history.
    • max_number_requests (integer, OPTIONAL): Max. number of requests to keep in the history.
  • preallocation (object, OPTIONAL): Parameters for preallocation decisions.

    • request_history_multiplier (float, OPTIONAL): Scaling factor used for calculation of resources to preallocate. Values smaller than 1.0 will result in smaller amounts of preallocated resources. Values greater than 1.0 will result in more aggressive preallocation.
    • min_bound (resource spec, OPTIONAL): Min. amount of resources to preallocate.
    • max_bound (resource spec, OPTIONAL): Max. amount of resources to preallocate.
  • poolscaler (object, REQUIRED): Parameters for resource provider’s poolscaler.

    • min_poolsize (integer, REQUIRED): Min. number of nodes to keep in the pool.

    • max_poolsize (integer, REQUIRED): Max. number of nodes allowed in the pool.

    • pool_scaling_interval (duration, OPTIONAL): Interval between poolscaler runs.

    • scaleup (object, OPTIONAL): Parameters for node pool’s scale-up actions.

      • max_poolsize_adjustment (integer, OPTIONAL): Max. number of nodes the pool can grow per iteration.
    • scaledown (object, OPTIONAL): Parameters for node pool’s scale-down actions.

      • max_poolsize_adjustment (integer, OPTIONAL): Max. number of nodes the pool can shrink per iteration.
      • node_warmup_period (duration, OPTIONAL): Time after start during which a node is excluded from scale-down actions.
      • node_cooldown_period (duration, OPTIONAL): Time in unused state during which a node is still excluded from scale-down actions.
  • node_allocator (object, REQUIRED): Parameters for resource provider’s node allocator.

    • healthcheck (object, OPTIONAL): Parameters for node healthchecks.

      • url_template (string, OPTIONAL): HTTP URL to use for healthchecking.
      • connect_timeout (duration, OPTIONAL): Max. time for connecting to node.
      • read_timeout (duration, OPTIONAL): Max. time for reading from node.
      • failure_timeout (duration, OPTIONAL): Time in unreachable state to mark a node as failed.
      • interval (duration, OPTIONAL): Time between healthcheck runs.
      • num_parallel_requests (integer, OPTIONAL): Max. number of healthchecks to execute in parallel.
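
The common fields above could be combined as in the following sketch. Most values mirror the defaults listed at the end of this section; the label, the pool sizes, the max_bound and the max_poolsize_adjustment values are hypothetical and serve only as illustration.

```yaml
clustermanager:
    # ... other fields ...
    providers:
      - name: "example"
        type: blades
        labels:
            # Hypothetical label, for illustration only
            environment: "testing"

        deferred_allocations_gc_interval: 30 secs

        request_history:
            update_interval: 30 secs
            max_request_age: 5 mins
            max_number_requests: 128

        preallocation:
            request_history_multiplier: 0.5
            min_bound: { cpu: 0, memory: 0 }
            max_bound: { cpu: 8, memory: 16GB }

        poolscaler:
            # Keep between 1 and 3 nodes in the pool
            min_poolsize: 1
            max_poolsize: 3
            pool_scaling_interval: 1 min
            scaleup:
                max_poolsize_adjustment: 2
            scaledown:
                max_poolsize_adjustment: 1
                node_warmup_period: 10 mins
                node_cooldown_period: 5 mins

        node_allocator:
            healthcheck:
                url_template: "http://{{address}}/emucomp/health"
                connect_timeout: 10 secs
                read_timeout: 5 secs
                failure_timeout: 2 mins
                interval: 30 secs
                num_parallel_requests: 4
            # ... type-specific fields ...
```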

Additional type-specific configuration fields for each resource provider are described in the corresponding subsections (see blades, gce and jclouds).

Blades

Resource providers of type blades support the following configuration parameters in addition to the common parameters described in the previous section:

  • node_allocator (object, REQUIRED):

    • node_capacity (resource spec, REQUIRED): Capacity of a single node in the node pool.
    • node_addresses (array of strings, REQUIRED): List of addresses of all nodes that should be available in the resource provider’s node pool (IP address and port, as in the example below).

Example:

clustermanager:
    # ... other fields ...
    providers:
      - name: "example"
        type: blades
        # ... other fields ...

        node_allocator:
            # Each node in the node pool
            # has 16 cores and 16GB RAM
            node_capacity:
                cpu: 16
                memory: 16384MB

            # Node pool of 3 nodes
            node_addresses:
              - "192.168.178.1:8080"
              - "192.168.178.2:8080"
              - "192.168.178.3:8080"

Google Compute Engine

Resource providers of type gce support the following configuration parameters in addition to the common parameters described in the previous section:

  • node_allocator (object, REQUIRED):

    • application_name (string, OPTIONAL): Symbolic name of the GCE application.

    • project_id (string, REQUIRED): ID of the GCE project.

    • zone_name (string, REQUIRED): Name of the zone where the node pool should be located. Currently available zones are listed in the Google Compute Engine documentation.

    • network_name (string, REQUIRED): Name of the network to use for the node pool. This network is expected to exist and to be configured already (see the networking overview in the Google Compute Engine documentation).

    • node_name_prefix (string, OPTIONAL): Prefix to use for nodes added to the node pool.

    • credentials_file (string, REQUIRED): Absolute path to a JSON file containing credentials for the project specified with project_id. To get this file follow steps a-f from this guide.

    • vm (object, REQUIRED): Parameters for allocation of new VMs.

      • machine_type (string, REQUIRED): Name of the machine type, as specified in the Google Compute Engine documentation.

      • persistent_disk (object, REQUIRED): Parameters for the main persistent disk.

        • type (string, REQUIRED): Type of the boot disk. Use either pd-standard for hard-disk drives or pd-ssd for solid-state drives.
        • size (integer, REQUIRED): Size of the disk in GB.
        • image_url (string, REQUIRED): URL to a boot image.
      • accelerators (array of objects, OPTIONAL): List of GPUs to attach to each node. Each entry must provide the following fields:

        • type (string, REQUIRED): Type or name of the accelerator. Currently only nvidia-tesla-k80 is supported.
        • count (integer, REQUIRED): Number of accelerators to add to each node.
      • boot_poll_interval (duration, OPTIONAL): Time between checks for VM readiness during boot.

      • boot_poll_interval_delta (duration, OPTIONAL): Randomization factor for checks during VM boot.

      • max_num_boot_polls (integer, OPTIONAL): Max. number of boot checks to perform. If the VM is still not ready after that, it will be marked as failed.

    • api (object, OPTIONAL): Parameters for accessing rate-limited APIs.

      • poll_interval (duration, OPTIONAL): Time between polling for results of async-operations.
      • poll_interval_delta (duration, OPTIONAL): Randomization factor for polling.
      • retry_interval (duration, OPTIONAL): Time between rate-limited requests.
      • retry_interval_delta (duration, OPTIONAL): Randomization factor for rate-limited requests.
      • max_num_retries (integer, OPTIONAL): Max. number of retries before a rate-limited request is considered failed.

Example:

clustermanager:
    # ... other fields ...
    providers:
      - name: "example"
        type: gce
        # ... other fields ...

        node_allocator:
            # General settings
            project_id: "eaas-123"
            zone_name: "europe-west1-d"
            network_name: "eaas-network"
            node_name_prefix: "eaas-prod-"
            credentials_file: "/path/to/eaas-credentials.json"

            # VM configuration
            vm:
                machine_type: n1-standard-1
                persistent_disk:
                    type: pd-standard
                    size: 10  # in GB
                    image_url: "projects/eaas-123/global/images/emucomp-prod-v2"

                # Request 1 GPU for each VM
                accelerators:
                  - type: nvidia-tesla-k80
                    count: 1
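
The example above omits the optional boot-polling and api fields. A sketch of how they might be set is shown below; the fragment is the node_allocator object of a gce provider, and the values simply repeat the defaults listed at the end of this section, so they are a starting point rather than tuned settings.

```yaml
node_allocator:
    # ... required fields as in the example above ...
    application_name: "eaas"

    vm:
        # ... machine_type, persistent_disk ...
        # Check VM readiness every 3 secs (randomized by 2 secs),
        # and give up after 50 checks
        boot_poll_interval: 3 secs
        boot_poll_interval_delta: 2 secs
        max_num_boot_polls: 50

    # Settings for rate-limited API access
    api:
        poll_interval: 2 secs
        poll_interval_delta: 1 secs
        retry_interval: 1 secs
        retry_interval_delta: 2 secs
        max_num_retries: 3
```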

JClouds

Resource providers of type jclouds currently support only OpenStack-based backends. The following configuration parameters can be specified in addition to the common parameters described in the previous section:

  • node_allocator (object, REQUIRED):

    • security_group_name (string, REQUIRED): Name of the security group to use for the node pool.

    • node_group_name (string, OPTIONAL): Symbolic name of the node pool.

    • node_name_prefix (string, OPTIONAL): Prefix to use for nodes created in the node pool.

    • provider (object, REQUIRED): Backend provider specific settings.

      • name (string, REQUIRED): Name of the backend. Currently only openstack-nova is supported.
      • identity (string, REQUIRED): Tenant-ID, followed by : and full username as provided by the backend (see the example below).
      • credential (string, REQUIRED): User’s password for authentication.
      • endpoint (string, REQUIRED): Backend’s service endpoint.
    • vm (object, REQUIRED): Parameters for allocation of new VMs.

      • network_id (string, REQUIRED): ID of the network to add VMs to.
      • hardware_id (string, REQUIRED): ID of the hardware flavor.
      • image_id (string, REQUIRED): ID of the boot image.
      • boot_poll_interval (duration, OPTIONAL): Time between checks for VM readiness during boot.
      • boot_poll_interval_delta (duration, OPTIONAL): Randomization factor for checks during VM boot.
      • max_num_boot_polls (integer, OPTIONAL): Max. number of boot checks to perform. If the VM is still not ready after that, it will be marked as failed.

Example:

clustermanager:
    # ... other fields ...
    providers:
      - name: "example"
        type: jclouds
        # ... other fields ...
        
        node_allocator:
            security_group_name: "default"
            node_group_name: "eaas-nodes"
            node_name_prefix: "eaas-prod-"

            # Backend provider authentication parameters
            provider:
                name: openstack-nova
                endpoint: "https://cloud-provider.com:5000/v2.0/"
                identity: "eaas:user@cloud-provider.com"
                credential: "password"

            # Parameters for VM setup
            vm:
                network_id: "d017ffbd-bffb-4c66-a218-ff6dbd76d11d"
                hardware_id: "europe/4fd0073e-f743-4b2a-aca4-a9f0a8a72e52"
                image_id: "europe/b19b3966-c183-4c99-8ece-2bc452187b42"

Defaults

clustermanager:
    name: "default"
    providers.defaults:
        all:
            # Parameters for all provider types
            deferred_allocations_gc_interval: 30 secs
            request_history:
                update_interval: 30 secs
                max_request_age: 5 mins
                max_number_requests: 128

            node_allocator:
                healthcheck:
                    num_parallel_requests: 4
                    connect_timeout: 10 secs
                    read_timeout: 5 secs
                    failure_timeout: 2 min
                    interval: 30 secs

            poolscaler:
                pool_scaling_interval: 1 min
                scaledown:
                    node_warmup_period: 10 mins
                    node_cooldown_period: 5 mins

            preallocation:
                min_bound: { cpu: 0, memory: 0 }
                max_bound: { cpu: +inf, memory: +inf }
                request_history_multiplier: 0.5

        blades:
            # Parameters for the BladeCluster provider
            node_allocator:
                healthcheck:
                    url_template: "http://{{address}}/emucomp/health"

            poolscaler:
                scaleup:
                    max_poolsize_adjustment: 100

                scaledown:
                    max_poolsize_adjustment: 100


        gce:
            # Parameters for the Google Compute Engine provider
            node_allocator:
                application_name: eaas
                node_name_prefix: eaas-node-

                vm:
                    boot_poll_interval: 3 secs
                    boot_poll_interval_delta: 2 secs
                    max_num_boot_polls: 50
                    machine_type: n1-standard-4
                    persistent_disk:
                        type: pd-standard
                        size: 10  # in GB

                api:
                    poll_interval: 2 secs
                    poll_interval_delta: 1 secs
                    retry_interval: 1 secs
                    retry_interval_delta: 2 secs
                    max_num_retries: 3

                healthcheck:
                    url_template: "http://{{address}}/emucomp/health"

            poolscaler:
                scaleup:
                    max_poolsize_adjustment: 10

                scaledown:
                    max_poolsize_adjustment: 20


        jclouds:
            # Parameters for the JClouds provider
            node_allocator:
                node_group_name: eaas-nodes
                node_name_prefix: eaas-node-

                vm:
                    boot_poll_interval: 3 secs
                    boot_poll_interval_delta: 2 secs
                    max_num_boot_polls: 50

                healthcheck:
                    url_template: "http://{{address}}/emucomp/health"

            poolscaler:
                scaleup:
                    max_poolsize_adjustment: 10

                scaledown:
                    max_poolsize_adjustment: 20
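
As a sketch of how these defaults might be used in practice, the configuration below specifies only the fields marked as REQUIRED for a blades provider. It assumes that every omitted optional field falls back to the corresponding value listed above; the provider name, pool sizes, node capacity and addresses are hypothetical.

```yaml
clustermanager:
    name: "example"
    admin_api_access_token: "super-secret-token"
    providers:
      - name: "blades-minimal"   # hypothetical name
        type: blades

        poolscaler:
            min_poolsize: 1
            max_poolsize: 2

        node_allocator:
            node_capacity:
                cpu: 4
                memory: 8GB
            node_addresses:
              - "192.168.178.1:8080"
              - "192.168.178.2:8080"
```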