OpenShift 4.3 – Configuring Metering to use AWS Billing information

My task was to figure out how to configure Metering to correlate with AWS billing. The OpenShift documentation in the reference section is where I started. I decided to record the end-to-end steps of how I set this up, since there were some lessons learned along the way. I hope this helps you set up Metering with AWS billing much more smoothly.

Prerequisites:

Setting up AWS Report

  1. Before creating anything, you need to already have data in the Billing & Cost Management Dashboard.
  2. If you have a brand-new account, you may have to wait until some data shows up before you proceed. You will also need access to Cost & Usage Reports under AWS Billing to set up the report.
  3. Log in to AWS, go to My Billing Dashboard
  4. Click Cost & Usage Reports
  5. Click Create reports
  6. Provide a name and check Include resource IDs
  7. Click Next
  8. Click Configure → add the S3 bucket name and Region → click Next
  9. Provide a `prefix` and select the options for your report → click Next
  10. Once created, you should see a report summary similar to the following: Screen Shot 2020-04-28 at 7.29.23 PM.png
  11. Click into the S3 bucket and validate that reports are being created under the folder (a CLI check is also shown after this list).
  12. Click Permissions tab
  13. Click Bucket Policy
  14. Copy and save the bucket policy somewhere you can get back to
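
If you prefer the command line, you can also confirm from a terminal that report files are landing in the bucket. A minimal sketch with the AWS CLI, using the same bucket and prefix that appear in the MeteringConfig later in this post (substitute your own values):

# List the most recently written report objects under the configured prefix
aws s3 ls s3://logs4reports/bubble/ocpreports/ --recursive | tail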

Setting up AWS user permission policy

  1. Go to My Security Credentials
  2. Click Users → click the user name that will be used for accessing the reports and for OpenShift Metering.
  3. Click Add permissions → Attach existing policies directly → Create policy → click JSON
  4. Paste the bucket policy you saved from the S3 bucket in step 14 of the previous section.
  5. Use the same steps to add the following policy:
    { 
      "Version": "2012-10-17", 
      "Statement": [ 
      { 
        "Sid": "1", 
        "Effect": "Allow", 
        "Action": [ 
          "s3:AbortMultipartUpload", 
          "s3:DeleteObject", 
          "s3:GetObject", 
          "s3:HeadBucket", 
          "s3:ListBucket", 
          "s3:ListMultipartUploadParts", 
          "s3:PutObject" 
         ], 
         "Resource": [ 
            "arn:aws:s3:::<YOUR S3 BUCKET NAME FOR BILLING REPORT>/*",  
            "arn:aws:s3:::<YOUR S3 BUCKET NAME FOR BILLING REPORT>"  
          ] 
        } 
        ] 
    }
  6. Since I am using an S3 bucket for metering storage, I also added the following policy to the user (a CLI sketch for creating and attaching these policies follows this list):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "1",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:HeadBucket",
                    "s3:ListBucket",
                    "s3:CreateBucket",
                    "s3:DeleteBucket",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<YOUR S3 BUCKET NAME FOR METERING STORAGE>/*",
                    "arn:aws:s3:::<YOUR S3 BUCKET NAME FOR METERING STORAGE>"
                ]
            }
        ]
    }
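
If you prefer to script this instead of clicking through the console, the same policies can be created and attached with the AWS CLI. This is only a sketch: the file name metering-s3-policy.json, the policy name, and the user name metering-user are placeholders I made up; substitute your own.

# Create a managed policy from the JSON document above (saved locally)
aws iam create-policy --policy-name metering-s3-policy --policy-document file://metering-s3-policy.json

# Attach it to the IAM user whose access keys OpenShift Metering will use
aws iam attach-user-policy --user-name metering-user \
  --policy-arn arn:aws:iam::<YOUR AWS ACCOUNT ID>:policy/metering-s3-policy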

Configuration:

Install Metering Operator

  1. Log in to the OpenShift Container Platform web console as cluster-admin, click Administration → Namespaces → Create Namespace
  2. Enter openshift-metering
  3. Add openshift.io/cluster-monitoring=true as a label → click Create.
  4. Click Compute → Machine Sets
  5. If you are like me, the cluster uses the default worker configuration on AWS. In my test, I added one more worker per AZ.
  6. I noticed that one of the Metering pods requires more resources, and the standard worker size may not be big enough, so I created an m5.2xlarge machine set. Only one replica is needed for this machine set.
    1. Create a template machine-set YAML:
      oc project openshift-machine-api
      oc get machineset poc-p6czj-worker-us-west-2a -o yaml > m52xLms.yaml
    2. Modify the YAML file by updating the machine set name and instance type, and removing the status, timestamps, UID, selfLink, etc. Here is my example of a machine set for m5.2xlarge.
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: poc-p6czj
        name: poc-p6czj-xl-worker-us-west-2a
        namespace: openshift-machine-api
      spec:
        replicas: 1
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-cluster: poc-p6czj
            machine.openshift.io/cluster-api-machineset: poc-p6czj-xl-worker-us-west-2a
        template:
          metadata:
            creationTimestamp: null
            labels:
              machine.openshift.io/cluster-api-cluster: poc-p6czj
              machine.openshift.io/cluster-api-machine-role: worker
              machine.openshift.io/cluster-api-machine-type: worker
              machine.openshift.io/cluster-api-machineset: poc-p6czj-xl-worker-us-west-2a
          spec:
            metadata:
              creationTimestamp: null
            providerSpec:
              value:
                ami:
                  id: ami-0f0fac946d1d31e97
                apiVersion: awsproviderconfig.openshift.io/v1beta1
                blockDevices:
                - ebs:
                    iops: 0
                    volumeSize: 120
                    volumeType: gp2
                credentialsSecret:
                  name: aws-cloud-credentials
                deviceIndex: 0
                iamInstanceProfile:
                  id: poc-p6czj-worker-profile
                instanceType: m5.2xlarge
                kind: AWSMachineProviderConfig
                metadata:
                  creationTimestamp: null
                placement:
                  availabilityZone: us-west-2a
                  region: us-west-2
                publicIp: null
                securityGroups:
                - filters:
                  - name: tag:Name
                    values:
                    - poc-p6czj-worker-sg
                subnet:
                  filters:
                  - name: tag:Name
                    values:
                    - poc-p6czj-private-us-west-2a
                tags:
                - name: kubernetes.io/cluster/poc-p6czj
                  value: owned
                userDataSecret:
                  name: worker-user-data
    3. run:
      oc create -f m52xLms.yaml
      # wait for the new m5.2xlarge machine to be created
      oc get machineset
  7. Create a secret for accessing the AWS account. Make sure you are cluster-admin, then run the following commands:
    oc project openshift-metering
    oc create secret -n openshift-metering generic my-aws-secret --from-literal=aws-access-key-id=<YOUR AWS KEY> --from-literal=aws-secret-access-key=<YOUR AWS SECRET>
  8. Back in the console, click Operators → OperatorHub and type "metering" in the filter to find the Metering Operator.
  9. Click Metering (provided by Red Hat), review the package description, and then click Install.
  10. Under Installation Mode, select openshift-metering as the namespace. Specify your update channel and approval strategy, then click Subscribe to install Metering.
  11. Click Installed Operators in the left menu and wait until Succeeded is shown as the status next to the Metering Operator.
  12. Click Workloads → Pods → verify the metering-operator pod is in the Running state
  13. Go back to your terminal, run:
    oc project openshift-metering
  14. We are now ready to create the MeteringConfig object. Create a file `metering-config.yaml` as shown below. See the reference section for more details on the MeteringConfig object.
    apiVersion: metering.openshift.io/v1
    kind: MeteringConfig
    metadata:
      name: operator-metering
      namespace: openshift-metering
    spec:
      openshift-reporting:
        spec:
          awsBillingReportDataSource:
            enabled: true
            bucket: "logs4reports"
            prefix: "bubble/ocpreports/"
            region: "us-west-2"
      storage:
        type: hive
        hive:
          s3:
            bucket: shanna-meter/demo
            createBucket: true
            region: us-west-2
            secretName: my-aws-secret
          type: s3
      presto:
        spec:
          config:
            aws:
              secretName: my-aws-secret
      hive:
        spec:
          config:
            aws:
              secretName: my-aws-secret
      reporting-operator:
        spec:
          config:
            aws:
              secretName: my-aws-secret
          resources:
            limits:
              cpu: 1
              memory: 500Mi
            requests:
              cpu: 500m
              memory: 100Mi
  15. Create MeteringConfig:
    oc create -f metering-config.yaml
  16. To monitor the process:
    watch 'oc get pod'
  17. Wait until you see all pods are up and running:
    $ oc get pods
    NAME                              READY STATUS   RESTARTS AGE
    hive-metastore-0                   2/2   Running 0        2m35s
    hive-server-0                      3/3   Running 0        2m36s
    metering-operator-69b664dc57-knd86 2/2   Running 0        31m
    presto-coordinator-0               2/2   Running 0        2m8s
    reporting-operator-674cb5d7b-zxwf4 1/2   Running 0        96s
  18. Verify the AWS report data source:
    $ oc get reportdatasource |grep aws
    aws-billing                                                                                                                                                     3m41s
    aws-ec2-billing-data-raw
  19. Verify the AWS report queries:
    $ oc get reportquery |grep aws
    aws-ec2-billing-data                         5m19s
    aws-ec2-billing-data-raw                     5m19s
    aws-ec2-cluster-cost                         5m19s
    pod-cpu-request-aws                          5m19s
    pod-cpu-usage-aws                            5m19s
    pod-memory-request-aws                       5m18s
    pod-memory-usage-aws                         5m18s

    For more information about the ReportDataSource and the ReportQuery, please check out the GitHub link in the reference.

  20. Create reports to get AWS billing data from the following YAML, saved as aws-reports.yaml (a scheduled-report variant is also sketched after this list):
    apiVersion: metering.openshift.io/v1
    kind: Report
    metadata:
      name: pod-cpu-request-billing-run-once
    spec:
      query: "pod-cpu-request-aws"
      reportingStart: '2020-04-12T00:00:00Z'
      reportingEnd: '2020-04-30T00:00:00Z'
      runImmediately: true
    ---
    apiVersion: metering.openshift.io/v1
    kind: Report
    metadata:
      name: pod-memory-request-billing-run-once
    spec:
      query: "pod-memory-request-aws"
      reportingStart: '2020-04-12T00:00:00Z'
      reportingEnd: '2020-04-30T00:00:00Z'
      runImmediately: true
  21. Create reports (status as `RunImmediately`):
    $ oc create -f aws-reports.yaml
    $ oc get reports
    NAME                                  QUERY                    SCHEDULE   RUNNING          FAILED   LAST REPORT TIME   AGE
    pod-cpu-request-billing-run-once      pod-cpu-request-aws                 RunImmediately                               5s
    pod-memory-request-billing-run-once   pod-memory-request-aws              RunImmediately                               5s
  22. Wait until reports are completed (status as `Finished`):
    $ oc get reports
    NAME                                  QUERY                    SCHEDULE   RUNNING    FAILED   LAST REPORT TIME       AGE
    pod-cpu-request-billing-run-once      pod-cpu-request-aws                 Finished            2020-04-30T00:00:00Z   79s
    pod-memory-request-billing-run-once   pod-memory-request-aws              Finished            2020-04-30T00:00:00Z   79s
  23. I created a simple script (viewReport.sh), shown below, to view any report; it takes the report name from oc get reports as $1.
    reportName=$1
    reportFormat=csv
    token="$(oc whoami -t)"
    meteringRoute="$(oc get routes metering -o jsonpath='{.spec.host}')"
    curl --insecure -H "Authorization: Bearer ${token}" "https://${meteringRoute}/api/v1/reports/get?name=${reportName}&namespace=openshift-metering&format=$reportFormat"
  24. Before running the script, please make sure you have a valid token via oc whoami -t
  25. View a report by running the script from step 23:
    ./viewReport.sh pod-cpu-request-billing-run-once
    period_start,period_end,pod,namespace,node,pod_request_cpu_core_seconds,pod_cpu_usage_percent,pod_cost
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,alertmanager-main-0,openshift-monitoring,ip-10-0-174-47.us-west-2.compute.internal,792.000000,0.006587,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,alertmanager-main-1,openshift-monitoring,ip-10-0-138-24.us-west-2.compute.internal,792.000000,0.006587,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,alertmanager-main-2,openshift-monitoring,ip-10-0-148-172.us-west-2.compute.internal,792.000000,0.006587,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,apiserver-9dhcr,openshift-apiserver,ip-10-0-157-2.us-west-2.compute.internal,1080.000000,0.008982,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,apiserver-fr7w5,openshift-apiserver,ip-10-0-171-27.us-west-2.compute.internal,1080.000000,0.008982,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,apiserver-sdlsj,openshift-apiserver,ip-10-0-139-242.us-west-2.compute.internal,1080.000000,0.008982,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,apiservice-cabundle-injector-54ff756f6d-f4vl6,openshift-service-ca,ip-10-0-157-2.us-west-2.compute.internal,72.000000,0.000599,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,authentication-operator-6d865c4957-2jsql,openshift-authentication-operator,ip-10-0-171-27.us-west-2.compute.internal,72.000000,0.000599,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,catalog-operator-868fd6ddb5-rmfk7,openshift-operator-lifecycle-manager,ip-10-0-139-242.us-west-2.compute.internal,72.000000,0.000599,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,certified-operators-58874b4f86-rcbsl,openshift-marketplace,ip-10-0-148-172.us-west-2.compute.internal,20.400000,0.000170,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,certified-operators-5b86f97d6f-pcvqk,openshift-marketplace,ip-10-0-148-172.us-west-2.compute.internal,16.800000,0.000140,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,certified-operators-5fdf46bd6d-hhtqd,openshift-marketplace,ip-10-0-148-172.us-west-2.compute.internal,37.200000,0.000309,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,cloud-credential-operator-868c5f9f7f-tw5pn,openshift-cloud-credential-operator,ip-10-0-157-2.us-west-2.compute.internal,72.000000,0.000599,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,cluster-autoscaler-operator-74b5d8858b-bwtfc,openshift-machine-api,ip-10-0-139-242.us-west-2.compute.internal,144.000000,0.001198,
    2020-04-12 00:00:00 +0000 UTC,2020-04-30 00:00:00 +0000 UTC,cluster-image-registry-operator-9754995-cqm7v,openshift-image-registry,ip-10-0-139-242.us-west-2.compute.internal,144.000000,0.001198,
    ...
  26. The output from the previous step is not very readable, so I downloaded it into a file instead.
    ./viewReport.sh pod-cpu-request-billing-run-once > aws-pod-cpu-billing.txt
  27. Import the output file into a spreadsheet as shown below: Screen Shot 2020-04-30 at 11.02.25 PM.png
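
The run-once reports above are good for a quick check; for ongoing billing correlation, a scheduled report may be more practical. Below is a hedged sketch of a daily report using the same query; the schedule fields follow the Report schema documented for this Metering release, so double-check them against your version:

cat <<'EOF' | oc apply -n openshift-metering -f -
apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-aws-daily
spec:
  query: "pod-cpu-request-aws"
  reportingStart: '2020-04-12T00:00:00Z'
  schedule:
    period: daily
EOF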

Troubleshoot:

The most useful log for debugging report issues is the reporting-operator log.
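
For example, to follow it (this assumes the default deployment name reporting-operator; confirm the container name with oc describe if the pod has more than one container):

# Tail the reporting-operator logs in the metering namespace
oc -n openshift-metering logs -f deploy/reporting-operator -c reporting-operator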

Reference:

OpenShift metering documentation: https://docs.openshift.com/container-platform/4.3/metering/metering-about-metering.html

Configure AWS Billing Correlation: https://docs.openshift.com/container-platform/4.6/metering/configuring_metering/metering-configure-aws-billing-correlation.html

Additional information: https://github.com/kube-reporting/metering-operator/blob/master/Documentation/metering-architecture.md

OpenShift4.3: Retest Static IP configuration on vSphere

I learned a few lessons from the last test (https://shanna-chan.blog/2019/07/26/openshift4-vsphere-static-ip/), and I got questions asking for clarification on using static IPs. My apologies for any confusion from that test; it was done without any real documentation to follow. I want to record all my errors so I can help others troubleshoot.

Anyway, I decided to retest the installation of OCP 4.3 using static IPs. The goal is to clarify the installation instructions from my last blog for anyone trying to install OCP4 manually on a VMware environment using static IPs.

Environment:

Screen Shot 2020-03-16 at 2.22.46 PM.png

  • OCP 4.3.5
  • vSphere 6.7

 

List of VMs:

  • Bootstrap 192.168.1.110
  • Master0 192.168.1.111
  • Master1 192.168.1.112
  • Master2 192.168.1.113
  • Worker0 192.168.1.114
  • Worker1 192.168.1.115

Prerequisites:

The following components are already running in my test environment.

DNS Server

  1. Add a zone to /etc/named.conf. An example can be found here: https://github.com/christianh814/openshift-toolbox/blob/master/ocp4_upi/docs/0.prereqs.md#dns
  2. Configure the zone file for all the DNS entries. An example configuration is shown below; a quick resolution check with dig follows it.
    ; The api points to the IP of your load balancer
    api.ocp43	IN	A	192.168.1.72
    api-int.ocp43	IN	A	192.168.1.72
    ;
    ; The wildcard also points to the load balancer
    *.apps.ocp43	IN	A	192.168.1.72
    ;
    ; Create entry for the bootstrap host
    bootstrap0.ocp43	IN	A	192.168.1.110
    ;
    ; Create entries for the master hosts
    master01.ocp43	IN	A	192.168.1.111
    master02.ocp43	IN	A	192.168.1.112
    master03.ocp43	IN	A	192.168.1.113
    ;
    ; Create entries for the worker hosts
    worker01.ocp43	IN	A	192.168.1.114
    worker02.ocp43	IN	A	192.168.1.115
    ;
    ; The ETCd cluster lives on the masters...so point these to the IP of the masters
    etcd-0.ocp43	IN	A	192.168.1.111
    etcd-1.ocp43	IN	A	192.168.1.112
    etcd-2.ocp43	IN	A	192.168.1.113
    ;
    ; The SRV records are IMPORTANT....make sure you get these right...note the trailing dot at the end...
    _etcd-server-ssl._tcp.ocp43	IN	SRV	0 10 2380 etcd-0.ocp43.example.com.
    _etcd-server-ssl._tcp.ocp43	IN	SRV	0 10 2380 etcd-1.ocp43.example.com.
    _etcd-server-ssl._tcp.ocp43	IN	SRV	0 10 2380 etcd-2.ocp43.example.com.
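
Once the zone is loaded, it is worth checking that the records resolve before moving on. A quick sketch with dig, pointed at the DNS server used elsewhere in this post (192.168.1.188); the wildcard is tested with an arbitrary apps hostname:

dig +short api.ocp43.example.com @192.168.1.188
dig +short test.apps.ocp43.example.com @192.168.1.188
dig +short -t srv _etcd-server-ssl._tcp.ocp43.example.com @192.168.1.188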

Load balancer

  1. Update /etc/haproxy/haproxy.cfg with the cluster information. An example is shown below, followed by a quick syntax check.
    #---------------------------------------------------------------------
    
    listen stats
        bind *:9000
        mode http
        stats enable
        stats uri /
        monitor-uri /healthz
    
    #---------------------------------------------------------------------
    #Cluster ocp43 - static ip test
    frontend openshift-api-server
        bind *:6443
        default_backend openshift-api-server
        mode tcp
        option tcplog
    
    backend openshift-api-server
        balance source
        mode tcp
        #server bootstrap0.ocp43.example.com 192.168.1.110:6443 check
        server master01.ocp43.example.com 192.168.1.111:6443 check
        server master02.ocp43.example.com 192.168.1.112:6443 check
        server master03.ocp43.example.com 192.168.1.113:6443 check
    
    frontend machine-config-server
        bind *:22623
        default_backend machine-config-server
        mode tcp
        option tcplog
    
    backend machine-config-server
        balance source
        mode tcp
        # server bootstrap0.ocp43.example.com 192.168.1.110:22623 check
        server master01.ocp43.example.com 192.168.1.111:22623 check
        server master02.ocp43.example.com 192.168.1.112:22623 check
        server master03.ocp43.example.com 192.168.1.113:22623 check
    
    frontend ingress-http
        bind *:80
        default_backend ingress-http
        mode tcp
        option tcplog
    
    backend ingress-http
        balance source
        mode tcp
        server worker01.ocp43.example.com 192.168.1.114:80 check
        server worker02.ocp43.example.com 192.168.1.115:80 check
    
    frontend ingress-https
        bind *:443
        default_backend ingress-https
        mode tcp
        option tcplog
    
    backend ingress-https
        balance source
        mode tcp
        server worker01.ocp43.example.com 192.168.1.114:443 check
        server worker02.ocp43.example.com 192.168.1.115:443 check
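
After editing the config, a quick syntax check before reloading keeps a typo from taking the load balancer down (this assumes HAProxy is managed by systemd on the LB host):

# Validate the configuration, then reload without dropping connections
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy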

Web Server

  1. Configure a web server. In my example, I configure httpd on an RHEL VM (see the note about the listen port after these commands).
yum -y install httpd
systemctl enable --now httpd
firewall-cmd --add-port=8080/tcp --permanent
firewall-cmd --reload
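
Since the image and ignition URLs later in this post use port 8080, httpd also needs to listen on that port. A minimal sketch, assuming the default /etc/httpd/conf/httpd.conf on RHEL:

# Change the default listen port from 80 to 8080 to match the URLs used later
sed -i 's/^Listen 80$/Listen 8080/' /etc/httpd/conf/httpd.conf
systemctl restart httpd
# Quick sanity check
curl -I http://localhost:8080/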

Installation downloads

Installation Using Static IP address

Prepare installation

  1. Generate SSH key:
    $ ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/vsphere-ocp43
  2. Start ssh-agent:
    $ eval "$(ssh-agent -s)"
  3.  Add ssh private key to the ssh-agent:
    $ ssh-add ~/.ssh/vsphere-ocp43
    Identity added: /Users/shannachan/.ssh/vsphere-ocp43 (shannachan@MacBook-Pro)
  4. Download & extract OpenShift Installer:
    wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.5/openshift-install-mac-4.3.5.tar.gz
    tar zxvf openshift-install-mac-4.3.5.tar.gz
  5. Download & extract OpenShift CLI:
    wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.5/openshift-client-mac-4.3.5.tar.gz
    tar zxvf openshift-client-mac-4.3.5.tar.gz
  6. Copy or download the pull secret from cloud.redhat.com
    1. Go to cloud.redhat.com
    2. Log in with your credentials (create an account if you don’t have one)
    3. Click “Create Cluster”
    4. Click OpenShift Container Platform
    5. Scroll down and click “VMware vSphere”
    6. Click on “Download Pull Secret” to download the secret

Create Installation manifests and ignition files

  1. Create an installation directory:
    mkdir ocp43
  2. Create `install-config.yaml` as shown below.
    apiVersion: v1
    baseDomain: example.com
    compute:
    - name: worker
      replicas: 0
    controlPlane:
      hyperthreading: Enabled
      name: master
      replicas: 3
    metadata:
      name: ocp43
    platform:
      vsphere:
        vcenter: 192.168.1.200
        username: vsphereadmin
        password: xxxx
        datacenter: Datacenter
        defaultDatastore: datastore3T
    pullSecret: '<copy your pull secret here>'
    sshKey: '<copy your public key here>'
  3. Back up install-config.yaml and copy it into the installation directory
  4. Generate Kubernetes manifests for the cluster:
    $./openshift-install create manifests --dir=./ocp43
    INFO Consuming Install Config from target directory
    WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
  5. Modify <installation directory>/manifests/cluster-scheduler-02-config.yml
  6. Update mastersSchedulable to false (a one-liner sketch is shown after this list)
  7. Obtain Ignition files:
    $ ./openshift-install create ignition-configs --dir=./ocp43
    INFO Consuming Common Manifests from target directory
    INFO Consuming Worker Machines from target directory
    INFO Consuming Master Machines from target directory
    INFO Consuming OpenShift Install (Manifests) from target directory
    INFO Consuming Openshift Manifests from target directory
  8. Files that were created:
    $ tree ocp43
    ocp43
    ├── auth
    │   ├── kubeadmin-password
    │   └── kubeconfig
    ├── bootstrap.ign
    ├── master.ign
    ├── metadata.json
    └── worker.ign
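
For step 6, the mastersSchedulable change can also be made with a one-liner before generating the ignition configs. A sketch that assumes the default layout of the generated file (on macOS, use sed -i '' instead of sed -i):

sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' ./ocp43/manifests/cluster-scheduler-02-config.yml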

Upload files to the webserver

  1. Upload the rhcos-4.3.0-x86_64-metal.raw.gz to the web server location
  2. Upload all the ignition files to the web server location
  3. Update the file permissions on the *.ign files on the web server:
    chmod 644 *.ign

Note: check and make sure that you can download the ignition files and the .gz file from the web server.
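
For example, from any machine that can reach the web server (both requests should return an HTTP 200):

curl -I http://<webserver host:port>/bootstrap.ign
curl -I http://<webserver host:port>/rhcos-4.3.0-x86_64-metal.raw.gz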

Custom ISO

Create a custom ISO file with the parameters you need for each VM. This step can be skipped if you plan to type all the kernel parameters by hand when prompted.

  1. Download rhcos-4.3.0-x86_64-installer.iso and rhcos-4.3.0-x86_64-metal.raw.gz
  2. Extract ISO to a temporary location:
    sudo mount rhcos-4.3.0-x86_64-installer.iso /mnt/
    mkdir /tmp/rhcos 
    rsync -a /mnt/* /tmp/rhcos/ 
    cd /tmp/rhcos 
    vi isolinux/isolinux.cfg
  3. Modify the boot entry similar to this:
    label linux
      menu label ^Install RHEL CoreOS
      kernel /images/vmlinuz
      append initrd=/images/initramfs.img nomodeset rd.neednet=1 coreos.inst=yes ip=192.168.1.110::192.168.1.1:255.255.255.0:bootstrap0.ocp43.example.com:ens192:none nameserver=192.168.1.188 coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.1.230:8080/rhcos-4.3.0-x86_64-metal.raw.gz coreos.inst.ignition_url=http://192.168.1.230:8080/bootstrap.ign

    where:

    ip=<ip address of the VM>::<gateway>:<netmask>:<hostname of the VM>:<interface>:none

    nameserver=<DNS>

    coreos.inst.image_url=http://<webserver host:port>/rhcos-4.3.0-x86_64-metal.raw.gz

    coreos.inst.ignition_url=http://<webserver host:port>/<bootstrap, master or worker ignition>.ign

  4. Create new ISO as /tmp/rhcos_install.iso:
    sudo mkisofs -U -A "RHCOS-x86_64" -V "RHCOS-x86_64" -volset "RHCOS-x86_64" -J -joliet-long -r -v -T -x ./lost+found -o /tmp/rhcos_install.iso -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot .
  5.  Upload all the custom ISOs to the datastore for VM creation via vCenter
  6. You will repeat these steps for each VM with its specific IP and ignition file. You only need to create individual ISOs if you don’t want to type the kernel parameters at the prompt when installing; I would recommend it, since it actually takes less time than typing the kernel parameters each time. A scripted sketch for generating the per-VM ISOs follows this list.
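
To avoid editing isolinux.cfg by hand six times, the per-VM ISOs can be stamped out with a small script. This is only a sketch, not something from the original run: it assumes the installer ISO has already been extracted to /tmp/rhcos as above, and the gateway, netmask, DNS, and web server values are the ones from my lab, so adjust them for yours.

#!/usr/bin/env bash
# Build one custom RHCOS installer ISO per VM so no kernel parameters need
# to be typed at the boot prompt. Sketch only: the values below are taken
# from the examples in this post; adjust them for your environment.
set -euo pipefail

GATEWAY=192.168.1.1
NETMASK=255.255.255.0
DNS=192.168.1.188
WEB=http://192.168.1.230:8080
DOMAIN=ocp43.example.com

# hostname|IP|ignition file
NODES="
bootstrap0|192.168.1.110|bootstrap.ign
master01|192.168.1.111|master.ign
master02|192.168.1.112|master.ign
master03|192.168.1.113|master.ign
worker01|192.168.1.114|worker.ign
worker02|192.168.1.115|worker.ign
"

for entry in $NODES; do
  IFS='|' read -r host ip ign <<<"$entry"
  work=$(mktemp -d)
  rsync -a /tmp/rhcos/ "$work/"
  chmod -R u+w "$work"   # files copied from the ISO are read-only
  # Overwrite the boot menu with a single auto-booting entry for this node
  cat > "$work/isolinux/isolinux.cfg" <<EOF
default linux
timeout 10
label linux
  menu label ^Install RHEL CoreOS
  kernel /images/vmlinuz
  append initrd=/images/initramfs.img nomodeset rd.neednet=1 coreos.inst=yes ip=${ip}::${GATEWAY}:${NETMASK}:${host}.${DOMAIN}:ens192:none nameserver=${DNS} coreos.inst.install_dev=sda coreos.inst.image_url=${WEB}/rhcos-4.3.0-x86_64-metal.raw.gz coreos.inst.ignition_url=${WEB}/${ign}
EOF
  (cd "$work" && sudo mkisofs -U -A "RHCOS-x86_64" -V "RHCOS-x86_64" -volset "RHCOS-x86_64" \
      -J -joliet-long -r -v -T -x ./lost+found -o "/tmp/rhcos_install_${host}.iso" \
      -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 \
      -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot .)
  rm -rf "$work"
done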

Create VM using custom ISO

  1. Create a resource folder
    • Action -> New folder -> New VM or Template folder
    • I normally give the name as the cluster id
  2. Create a VM with 4 CPUs and 16 GB RAM
    • Action -> New Virtual Machine
    • Select Create New Virtual Machine -> click Next
    • Add name
    • Select the VM folder -> Next
    • Select datacenter -> Next
    • Select storage -> Next
    • Use ESXi 6.7 -> Next
    • Select Linux and RHEL 7 -> Next
    • Use these parameters:
      • CPU: 4
      • Memory: 16 GB (Reserve all guest memory)
      • 120 GB disk
      • Select the corresponding ISO from Datastore and check “connect”
      • VM Options -> Advanced -> Edit Configuration -> Add Configuration Params -> add “disk.EnableUUID” and set it to TRUE
      • Click OK
      • Click Next
      • Click Finish
  3. Power on the bootstrap, master, and worker VMs following the steps below
  4. Go to the VM console: Screen Shot 2020-03-04 at 12.27.44 PM.png
  5. Hit Enter
  6. You should see the login screen once the VM boots successfully: Screen Shot 2020-03-04 at 12.34.04 PM.png
  7. Repeat on all servers and make sure the ISO specific to the given VM is used.

Tip: you can clone an existing VM and just change the ISO file when creating the next VM.

Creating Cluster

  1. Monitor the cluster:
    ./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info
    INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp43.example.com:6443...
    INFO API v1.16.2 up
    INFO Waiting up to 30m0s for bootstrapping to complete...
    INFO It is now safe to remove the bootstrap resources
  2.  From the bootstrap VM, similar log messages are shown:
    $ ssh -i ~/.ssh/vsphere-ocp43 core@bootstrap-vm
    $ journalctl -b -f -u bootkube.service
    ...
    Mar 16 20:03:57 bootstrap0.ocp43.example.com bootkube.sh[2816]: Tearing down temporary bootstrap control plane...
    Mar 16 20:03:57 bootstrap0.ocp43.example.com podman[18629]: 2020-03-16 20:03:57.232567868 +0000 UTC m=+726.128069883 container died 695412d7eece5a9bd099aac5b6bc6a8d412c8037b14391ff54ee33132ebce0e1 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:222fbfd3323ec347babbda1a66929019221fcee82cfc324a173b39b218cf6c4b, name=zen_lamarr)
    Mar 16 20:03:57 bootstrap0.ocp43.example.com podman[18629]: 2020-03-16 20:03:57.379721836 +0000 UTC m=+726.275223886 container remove 695412d7eece5a9bd099aac5b6bc6a8d412c8037b14391ff54ee33132ebce0e1 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:222fbfd3323ec347babbda1a66929019221fcee82cfc324a173b39b218cf6c4b, name=zen_lamarr)
    Mar 16 20:03:57 bootstrap0.ocp43.example.com bootkube.sh[2816]: bootkube.service complete
  3. Check the load balancer status
  4. Remove the bootstrap entries from the load balancer. You can check the status of the LB from its stats page, shown below.

LB.png

 

Logging in to the Cluster

  1.  Export the kubeadmin credentials:
    export KUBECONFIG=./ocp43/auth/kubeconfig
  2.  Verify the cluster role via the oc CLI:
    $ oc whoami
    system:admin
  3. Approving the CSRs
    $ oc get nodes
    NAME                         STATUS   ROLES    AGE   VERSION
    master01.ocp43.example.com   Ready    master   60m   v1.16.2
    master02.ocp43.example.com   Ready    master   60m   v1.16.2
    master03.ocp43.example.com   Ready    master   60m   v1.16.2
    worker01.ocp43.example.com   Ready    worker   52m   v1.16.2
    worker02.ocp43.example.com   Ready    worker   51m   v1.16.2
    
    $ oc get csr
    NAME        AGE   REQUESTOR                                                                   CONDITION
    csr-66l6l   60m   system:node:master02.ocp43.example.com                                      Approved,Issued
    csr-8r2dc   52m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-hvt2d   51m   system:node:worker02.ocp43.example.com                                      Approved,Issued
    csr-k2ggg   60m   system:node:master03.ocp43.example.com                                      Approved,Issued
    csr-kg72s   52m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-qvbg2   60m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-rtncq   52m   system:node:worker01.ocp43.example.com                                      Approved,Issued
    csr-tsfxx   60m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-wn7rp   60m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
    csr-zl87q   60m   system:node:master01.ocp43.example.com                                      Approved,Issued
  4. If there are pending CSRs, approve them via the command below (or use the one-liner shown after this list).
    oc adm certificate approve <csr_name>
  5.  Validate that all cluster components are available:
    $ oc get co
    NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication                             4.3.5     True        False         False      41m
    cloud-credential                           4.3.5     True        False         False      63m
    cluster-autoscaler                         4.3.5     True        False         False      47m
    console                                    4.3.5     True        False         False      43m
    dns                                        4.3.5     True        False         False      54m
    image-registry                             4.3.5     True        False         False      49m
    ingress                                    4.3.5     True        False         False      48m
    insights                                   4.3.5     True        False         False      58m
    kube-apiserver                             4.3.5     True        False         False      53m
    kube-controller-manager                    4.3.5     True        False         False      54m
    kube-scheduler                             4.3.5     True        False         False      54m
    machine-api                                4.3.5     True        False         False      55m
    machine-config                             4.3.5     True        False         False      55m
    marketplace                                4.3.5     True        False         False      48m
    monitoring                                 4.3.5     True        False         False      42m
    network                                    4.3.5     True        False         False      59m
    node-tuning                                4.3.5     True        False         False      50m
    openshift-apiserver                        4.3.5     True        False         False      51m
    openshift-controller-manager               4.3.5     True        False         False      55m
    openshift-samples                          4.3.5     True        False         False      46m
    operator-lifecycle-manager                 4.3.5     True        False         False      55m
    operator-lifecycle-manager-catalog         4.3.5     True        False         False      55m
    operator-lifecycle-manager-packageserver   4.3.5     True        False         False      51m
    service-ca                                 4.3.5     True        False         False      58m
    service-catalog-apiserver                  4.3.5     True        False         False      50m
    service-catalog-controller-manager         4.3.5     True        False         False      50m
    storage                                    4.3.5     True        False         False      49m
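
If several CSRs are pending at once, they can be approved in one pass. A convenience sketch (review the CSR list first in anything other than a lab; approving already-approved CSRs is harmless):

oc get csr -o name | xargs oc adm certificate approve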

Configure the Image Registry to use ephemeral storage for now.

I will cover the image registry in another blog, since I want to focus on completing the installation here.

To set emptyDir for the image registry:

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

Completing the installation:

$ ./openshift-install --dir=./ocp43 wait-for install-complete
INFO Waiting up to 30m0s for the cluster at https://api.ocp43.example.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/Users/shannachan/projects/ocp4.3/ocp43/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp43.example.com
INFO Login to the console with user: kubeadmin, password: xxxxxxxxxxxxxx

Congratulations! The cluster is up!

Screen Shot 2020-03-16 at 6.22.41 PM.png

Troubleshooting tips:

Access any server via the command below:

ssh -i ~/.ssh/vsphere-ocp43 core@vm-server

Reference:

https://docs.openshift.com/container-platform/4.3/installing/installing_bare_metal/installing-bare-metal.html

https://docs.openshift.com/container-platform/4.3/installing/installing_vsphere/installing-vsphere.html

https://shanna-chan.blog/2019/07/26/openshift4-vsphere-static-ip/

OpenShift4: vSphere + Static IP

There are many ways to install OCP4. One of the most common asks is how to install OCP4 with static IP addresses in a vSphere environment. This is one of the use cases I wanted to test out, and I hope I can share my lessons learned.

Environment:

  • vSphere 6.7 Update2
  • Run install from macOS Mojave 10.14.5

Requirements:

  • No DHCP server
  • Need to use static IP addresses

Problems I had:

Error #1: Dracut: FATAL: Sorry, ‘ip=dhcp’ does not make sense for multiple interface configurations.

dracut.png

Cause:

This happened when I tried to override the IP address by setting the kernel parameter ip=<ip>::<gateway>:<netmask>:<FQDN>:<interface>:none on a VM cloned from the OVA.

Solution:

Set the IP kernel parameters when booting from the rhcos-install.iso (so they are applied before the initramfs brings networking up), instead of cloning from the OVA.

Here are the steps to create a custom ISO with the parameters baked in to simplify the process. You can use the downloaded ISO as-is, but that means a lot of typing, so the following steps are very useful when creating many VMs from the ISO.

sudo mount rhcos-410.8.20190425.1-installer.iso /mnt/
mkdir /tmp/rhcos
rsync -a /mnt/* /tmp/rhcos/
cd /tmp/rhcos
vi isolinux/isolinux.cfg
  • Modify the boot entry at the end of the file similar to this:
label linux
  menu label ^Install RHEL CoreOS
  kernel /images/vmlinuz
  append initrd=/images/initramfs.img nomodeset rd.neednet=1 coreos.inst=yes ip=192.168.1.124::192.168.1.1:255.255.255.0:bootstrap.ocp4.example.com:ens192:none nameserver=192.168.1.188 coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.1.231:8080/rhcos-4.1.0-x86_64-metal-bios.raw.gz coreos.inst.ignition_url=http://192.168.1.231:8080/static.ign

where:

ip=<ip address>::<gateway>:<netmask>:<hostname>:<interface>:none

nameserver=<DNS> 

coreos.inst.image_url=http://<webserver host:port>/rhcos-4.1.0-x86_64-metal-bios.raw.gz

coreos.inst.ignition_url=http://<webserver host:port>/<master or worker ignition>.ign 

  • Create new ISO as /tmp/rhcos_install.iso
sudo mkisofs -U -A "RHCOS-x86_64" -V "RHCOS-x86_64" -volset "RHCOS-x86_64" -J -joliet-long -r -v -T -x ./lost+found -o /tmp/rhcos_install.iso -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot .
  • Upload the custom ISO to the datastore for VM creation.

Error #2: No such host

no such host.png

Cause:

Most likely the network was not set up correctly when the master or worker started.

Solution:

In my case, this was an issue when creating masters/workers from an OVA and the network configuration was not applied when RHCOS booted.

Error #3: Getting EOF from LB

EOF.png

Cause:

Most likely DNS or web server configuration errors.

Solution:

Make sure all FQDNs resolve to the correct IPs and restart the related services.

Error #4: X509 cert error

x509error.png

Cause:

In my case, the clocks on the servers were not synced, and I also had to regenerate my SSH key.

Solution:

I set up NTP on the DNS and web servers and made sure the clocks were synced across all hosts. I also regenerated the SSH key and updated my install-config.yaml file. A quick way to verify the clocks is shown below.
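
To confirm the clocks really are in sync before regenerating keys or certificates, a quick check on each RHEL-based host (this assumes chrony is the NTP client, which is the RHEL default):

timedatectl status
chronyc tracking
chronyc sources -v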

Prerequisites:

The above components are required in my setup. I used link [3] in the Reference section to set up the DNS, load balancer, and web server. I configured NTP on my DNS server, web server, and load balancer, and made sure the time on my ESXi server was configured as well. The filetranspiler is an awesome tool for manipulating ignition files; I used it throughout the test here.

Preparing the infrastructure:

I started my installation with OCP 4 official documentation for vSphere (Reference [1] below).

  • SSH keygen

Captured my example steps here. Please use your own value.

ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/ocp4vsphere
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/ocp4vsphere
  • Download OpenShift 4 installer
    • extract it
    • chmod +x openshift-installer
    • mv to /usr/local/bin directory
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-mac-4.1.7.tar.gz
  • Create install-config.yaml similar to the example below.
apiVersion: v1
baseDomain: example.com 
compute:
- hyperthreading: Enabled   
  name: worker
  replicas: 0 
controlPlane:
  hyperthreading: Enabled   
  name: master
  replicas: 3 
metadata:
  name: ocp4
platform:
  vsphere:
    vcenter: <vCenter host>
    username: <administrator>
    password: <password>
    datacenter: dc
    defaultDatastore: datastore
pullSecret: '<your pull seceret>' 
sshKey: '<your public ssh key>'
  • Create ignition files
openshift-install create ignition-configs --dir=<installation_directory>
  • Prepare for creating the bootstrap node with its hostname and static IP
    • Download filetranspiler:
      • git clone https://github.com/ashcrow/filetranspiler
    • Copy <installation_directory>/bootstrap.ign to <filetranspile_directory>/
    • Create bootstrap hostname file:
      echo "bootstrap.ocp4.example.com" > hostname
    • move hostname file to <filetranspile_directory>/bootstrap/etc/
    • Create an ifcfg-ens192 file under <filetranspile_directory>/bootstrap/etc/sysconfig/network-scripts with the following content:

      NAME=ens192
      DEVICE=ens192
      TYPE=Ethernet
      BOOTPROTO=none
      ONBOOT=yes
      IPADDR=<bootstrap IP address>
      NETMASK=<netmask>
      GATEWAY=<gateway>
      DOMAIN=example.com
      DNS1=<dns>
      PREFIX=24
      DEFROUTE=yes
      IPV6INIT=no
    • Run this command to create the new bootstrap ignition file:
      cd <filetranspile_directory>
      ./filetranspile -i bootstrap.ign -f bootstrap -o bootstrap-static.ign
    • Upload bootstrap-static.ign to the webserver:
      scp bootstrap-static.ign user@<webserverip>:/var/www/html/bootstrap.ign
    • Create an append-bootstrap.ign. Example as shown below.
      {
        "ignition": {
          "config": {
            "append": [
              {
                "source": "http://<webserverip:port>/bootstrap.ign", 
                "verification": {}
              }
            ]
          },
          "timeouts": {},
          "version": "2.1.0"
        },
        "networkd": {},
        "passwd": {},
        "storage": {},
        "systemd": {}
      }
    • Encode the append-bootstrap.ign file.
      openssl base64 -A -in append-bootstrap.ign -out append-bootstrap.64
    • Repeat the hostname, ifcfg, and filetranspile steps above for each master and worker to produce their static ignition files (for example, master0-static.ign). Upload master0-static.ign to the webserver:
      scp master0-static.ign user@<webserverip>:/var/www/html/master0.ign
      • Note that master0.ign is used in the kernel parameter when installing from the ISO.
    • Create VM from the custom ISO
      • Create a VM with 4 CPUs and 16 GB RAM
      • Select the custom ISO
      • Add “disk.EnableUUID” set to TRUE under VM Options -> Edit Configuration.
      • Power on the VM
      • Go to the VM console:
      • Screen Shot 2019-07-26 at 1.37.09 PM.png
      • Hit <Tab>
      • Screen Shot 2019-07-26 at 1.37.22 PM.png
      • You can modify the parameters for each server here.
      • Hit <enter>
      • The server will reboot after installation.
  • Repeat for all masters and workers.

Installation:

  • When you have all the VMs created, run the following command.
$ openshift-install --dir=ocp4 wait-for bootstrap-complete --log-level debug

DEBUG OpenShift Installer v4.1.7-201907171753-dirty 
DEBUG Built from commit 5175a461235612ac64d576aae09939764ac1845d 
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.example.com:6443... 
INFO API v1.13.4+3a25c9b up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
DEBUG Bootstrap status: complete                  
INFO It is now safe to remove the bootstrap resources 

 

Verification

  • Log in:
$ export KUBECONFIG=ocp4/auth/kubeconfig
$ oc whoami

$ oc get nodes
NAME                       STATUS   ROLES    AGE     VERSION
master0.ocp4.example.com   Ready    master   35m     v1.13.4+205da2b4a
master1.ocp4.example.com   Ready    master   35m     v1.13.4+205da2b4a
master2.ocp4.example.com   Ready    master   35m     v1.13.4+205da2b4a
worker0.ocp4.example.com   Ready    worker   20m     v1.13.4+205da2b4a
worker1.ocp4.example.com   Ready    worker   11m     v1.13.4+205da2b4a
worker2.ocp4.example.com   Ready    worker   5m25s   v1.13.4+205da2b4a
  • Validate that all CSRs are approved
$ oc get csr

NAME        AGE     REQUESTOR                                                                   CONDITION
csr-6vqqn   35m     system:node:master1.ocp4.example.com                                        Approved,Issued
csr-7hlkk   20m     system:node:worker0.ocp4.example.com                                        Approved,Issued
csr-9p6sw   11m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-b4cst   35m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-gx4dz   5m33s   system:node:worker2.ocp4.example.com                                        Approved,Issued
csr-kqcfv   11m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-lh5zg   35m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-m2hvl   35m     system:node:master0.ocp4.example.com                                        Approved,Issued
csr-npb4l   35m     system:node:master2.ocp4.example.com                                        Approved,Issued
csr-rdpgm   20m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-s2d7z   11m     system:node:worker1.ocp4.example.com                                        Approved,Issued
csr-sx2r5   6m      system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-tvgbq   35m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-vvp2h   6m11s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
  • Patch the image registry for a non-production environment
$oc project openshift-image-registry
$oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
config.imageregistry.operator.openshift.io/cluster patched

Next step?

To improve the process, we need to automate this.

Reference:

[1] OpenShift 4 Official Installation Documentation for vSphere

[2] Using Static IP for OCP4 Installation Guide

[3] Setting Up Pre-requisites Guide

Knative on OCP 4.1 Test Run

Install OCP 4.1.2

This blog assumes that you went to try.openshift.com and created your OCP 4.1 IPI cluster. If you have not, you can go to try.openshift.com –> Get Started to set up an OCP 4.1 cluster.

Install Istio (Maistra 0.11)

Istio is required before installing Knative; however, the Knative operator will install the minimal Istio components if Istio is not already installed on the platform. For my test, I installed Service Mesh on OCP 4.1 using the community version. Here are my steps:

  • Install service mesh operator
oc new-project istio-operator
oc new-project istio-system
oc project istio-operator
oc apply -f https://raw.githubusercontent.com/Maistra/istio-operator/maistra-0.11/deploy/maistra-operator.yaml
  • Service Mesh operator is up and running
#to get the name of the operator pod
oc get pods
#view the logs of the pod
oc logs <name of the pod from above step>

#log shown as below
{"level":"info","ts":1562602857.4691303,"logger":"kubebuilder.controller","caller":"controller/controller.go:153","msg":"Starting workers","Controller":"servicemeshcontrolplane-controller","WorkerCount":1}
  • Create custom resource as cr.yaml using the below content.
apiVersion: maistra.io/v1
kind: ServiceMeshControlPlane
metadata:
  name: basic-install
spec:
  # NOTE, if you remove all children from an element, you should remove the
  # element too.  An empty element is interpreted as null and will override all
  # default values (i.e. no values will be specified for that element, not even
  # the defaults baked into the chart values.yaml).
  istio:
    global:
      proxy:
        # constrain resources for use in smaller environments
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 128Mi

    gateways:
      istio-egressgateway:
        # disable autoscaling for use in smaller environments
        autoscaleEnabled: false
      istio-ingressgateway:
        # disable autoscaling for use in smaller environments
        autoscaleEnabled: false
        # set to true to enable IOR
        ior_enabled: true

    mixer:
      policy:
        # disable autoscaling for use in smaller environments
        autoscaleEnabled: false

      telemetry:
        # disable autoscaling for use in smaller environments
        autoscaleEnabled: false
        # constrain resources for use in smaller environments
        resources:
          requests:
            cpu: 100m
            memory: 1G
          limits:
            cpu: 500m
            memory: 4G

    pilot:
      # disable autoscaling for use in smaller environments
      autoscaleEnabled: false
      # increase random sampling rate for development/testing
      traceSampling: 100.0

    kiali:
      # change to false to disable kiali
      enabled: true

      # to use oauth, remove the following 'dashboard' section (note, oauth is broken on OCP 4.0 with kiali 0.16.2)
      # create a secret for accessing kiali dashboard with the following credentials
      dashboard:
        user: admin
        passphrase: admin

    tracing:
      # change to false to disable tracing (i.e. jaeger)
      enabled: true
  • Install service mesh
oc project istio-system
oc create -f cr.yaml

#it will take a while to have all the pods up
watch 'oc get pods'
  • When service mesh is available
Every 2.0s: oc get pod -n istio-system                                                                                              Mon Jul  8 16:39:46 2019

NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-0                           1/1     Running   0          13m
grafana-86dc5978b8-k2dvl                  1/1     Running   0          9m15s
ior-6656b5cfdb-cjt7z                      1/1     Running   0          9m55s
istio-citadel-7678d4749b-bjqq8            1/1     Running   0          14m
istio-egressgateway-66d8b969b8-wmcfm	  1/1     Running   0          9m55s
istio-galley-7f57cd4c6c-6d2r8             1/1     Running   0          11m
istio-ingressgateway-7794d8d4fc-dd72g     1/1     Running   0          9m55s
istio-pilot-77d65868d4-68lzd              2/2     Running   0          10m
istio-policy-7486f4cb6c-fdw6q             2/2     Running   0          11m
istio-sidecar-injector-66d49c6865-clqzm   1/1     Running   0          9m39s
istio-telemetry-799557976b-9ljz4          2/2     Running   0          11m
jaeger-agent-b7bz8                        1/1     Running   0          13m
jaeger-agent-j4dnp                        1/1     Running   0          13m
jaeger-agent-xmwzz                        1/1     Running   0          13m
jaeger-collector-96756f879-n889z          1/1     Running   3          13m
jaeger-query-6f4456546c-mwjkk             1/1     Running   3          13m
kiali-c58c8476d-wzhj6                     1/1     Running   0          8m45s
prometheus-5cb5d7549b-lmjtk               1/1     Running   0          14m

Install Knative 0.6

  • Install Knative serving operator
    • Click Catalog -> OperatorHub -> search for “knative” keyword
    • Click “Knative Serving Operator”
    • Click “Install”

Screen Shot 2019-07-08 at 9.45.57 AM.png

  • Install Knative eventing operator
    • Click Catalog -> OperatorHub -> search for “knative” keyword
    • Click “Knative Eventing Operator”
    • Click “Install”

Screen Shot 2019-07-08 at 9.47.24 AM.png

  • I also manually scale up my nodes to prepare for the tutorial deployment.
  • Click Compute -> Machine Sets
  • Click “3 dots” at the end of each machine set-> click Edit Count -> enter 2

Screen Shot 2019-07-08 at 9.49.15 AM.png

  • validate (Installed Operator under openshift-operators project)

Screen Shot 2019-07-08 at 9.52.02 AM.png

Install Knative Client – kn

The Knative client CLI (kn) can list, create, delete, and update Knative services.

$ which kn
/usr/local/bin/kn
$ kn version
Version:      v20190625-13ff277
Build Date:   2019-06-25 09:52:20
Git Revision: 13ff277
Dependencies:
- serving:    

 

Let’s Have Some Fun

Knative serving via kn

I am using an image I already built and pushed to Docker Hub for this example. Here are the steps to create a simple Knative service.

oc new-project knative-demo
oc adm policy add-scc-to-user anyuid -z default -n knative-demo
oc adm policy add-scc-to-user privileged -z default -n knative-demo
kn service create mysvc --image docker.io/piggyvenus/greeter:0.0.1

List Knative service

$ kn service list
NAME    DOMAIN                                                        GENERATION   AGE   CONDITIONS   READY   REASON
mysvc   mysvc.knative-demo.apps.cluster-6c33.sandbox661.opentlc.com   1            29s   3 OK / 3     True    

Execute the service

$ curl mysvc.knative-demo.apps.cluster-6c33.sandbox661.opentlc.com
Hi  greeter => '0bd7a995d27e' : 1

Knative serving via a YAML file

Creating a Knative Service via a YAML file can also do the trick. An example is shown below.

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: greeter
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: docker.io/piggyvenus/greeter:0.0.1
            livenessProbe:
              httpGet:
                path: /healthz
            readinessProbe:
              httpGet:
                path: /healthz

 

Create Knative service

oc apply -f service.yaml

Check out Knative service resources

oc get deployment
oc get pods
oc get services.serving.knative.dev
oc get configuration.serving.knative.dev
oc get routes.serving.knative.dev

Invoke Knative service

oc get routes.serving.knative.dev
curl mysvc.knative-demo.apps.cluster-6c33.sandbox661.opentlc.com
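
To avoid copying the hostname by hand, the URL can be pulled straight out of the Knative Route object. A sketch that assumes the route status in this release exposes .status.url (fall back to the DOMAIN column from kn service list if it does not); greeter is the service from the YAML above:

URL=$(oc get routes.serving.knative.dev greeter -o jsonpath='{.status.url}')
curl "$URL"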

Please check out the Knative tutorial! It has more Knative examples. I hope you find this blog useful.

OpenShift Service Mesh on OpenShift v4.0 Test Run

These are just quick notes on what I recently tried on OpenShift v4.0 from try.openshift.com. OpenShift uses Maistra for installing OpenShift Service Mesh via a Kubernetes operator. OCP4 (cluster version 4.0.0-0.9) makes creating an OpenShift cluster so easy!

Here are the steps I used:

  1. Install OpenShift. Please follow the instruction on try.openshift.com
  2. Create an istio-operator project:
    • oc new-project istio-operator
    • oc process -f https://raw.githubusercontent.com/Maistra/openshift-ansible/maistra-0.9/istio/istio_community_operator_template.yaml | oc create -f -
  3. Create Custom Resource as the example below and save it as cr.yaml
    • apiVersion: "istio.openshift.com/v1alpha1"
      kind: "Installation"
      metadata:
        name: "istio-installation"
        namespace: istio-operator
      spec:
        istio:
          authentication: true
          prefix: docker.io/maistra
        jaeger:
          prefix: docker.io/jaegertracing
        kiali:
          username: admin
          password: admin
          prefix: docker.io/kiali
  4. Check if the operator is ready
    • oc logs <istio operator pod name> -n istio-operator
    • Output from logs:
      time="2019-04-05T06:52:41Z" level=info msg="Go Version: go1.11.5"
      time="2019-04-05T06:52:41Z" level=info msg="Go OS/Arch: linux/amd64"
      time="2019-04-05T06:52:41Z" level=info msg="operator-sdk Version: 0.0.5+git"
      time="2019-04-05T06:52:41Z" level=info msg="Metrics service istio-operator created"
      time="2019-04-05T06:52:41Z" level=info msg="Watching resource istio.openshift.com/v1alpha1, kind Installation, namespace istio-operator, resyncPeriod 0"
      time="2019-04-05T06:53:56Z" level=info msg="Installing istio for Installation istio-installation"
  5. Install OpenShift service mesh via operator using the cr.yaml that was created in step #3
    • oc create -f cr.yaml -n istio-operator
  6. Verify Service Mesh Installation
    • oc get pods -n istio-system
      NAME                                          READY     STATUS      RESTARTS   AGE
      elasticsearch-0                               1/1       Running     0          11h
      grafana-d5d978b4d-pj6wf                       1/1       Running     0          11h
      istio-citadel-6f7fdc6685-vbx6n                1/1       Running     0          11h
      istio-egressgateway-5458749989-5xswb          1/1       Running     0          11h
      istio-galley-58bd6d9546-6nlhv                 1/1       Running     0          11h
      istio-ingressgateway-77f9dc475b-6cx5x         1/1       Running     0          11h
      istio-pilot-6f5f59477c-7kzvr                  2/2       Running     0          11h
      istio-policy-6c574bccd8-55wwx                 2/2       Running     5          11h
      istio-sidecar-injector-7f866d4796-k6dqw       1/1       Running     0          11h
      istio-telemetry-54795dfc9c-d5cr4              2/2       Running     5          11h
      jaeger-agent-gsbk8                            1/1       Running     0          11h
      jaeger-agent-lqmx4                            1/1       Running     0          11h
      jaeger-agent-zsb9l                            1/1       Running     0          11h
      jaeger-collector-668488cff9-24p9b             1/1       Running     0          11h
      jaeger-query-57f78497d5-7ttbw                 1/1       Running     0          11h
      kiali-ddbf7d4d9-c54kf                         1/1       Running     0          11h
      openshift-ansible-istio-installer-job-vpdrl   0/1       Completed   0          11h
      prometheus-b5fb89775-x4bzd                    1/1       Running     0          11h
  7. Install the bookinfo application. A sketch of the steps that I used is shown after this list.
  8.  Access the bookinfo application
    • To get the host for ingressgateway route:
      • oc get route -n istio-system istio-ingressgateway -o jsonpath='{.spec.host}'
        
        output:
        istio-ingressgateway-istio-system.apps.test3-0404.sc.ocpdemo.online
    • Go to your browser and open the product page using your ingressgateway host, similar to the URL shown below.
      • http://istio-ingressgateway-istio-system.apps.test3-0404.sc.ocpdemo.online/productpage
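For reference, here is a minimal sketch of a bookinfo install, assuming the upstream Istio 1.1 sample manifests (the project name and release branch are my assumptions); with Maistra, each deployment also needs to opt in to sidecar injection via the sidecar.istio.io/inject: "true" annotation, so treat this as a starting point rather than the exact procedure:

oc new-project bookinfo
oc apply -n bookinfo -f https://raw.githubusercontent.com/istio/istio/release-1.1/samples/bookinfo/platform/kube/bookinfo.yaml
oc apply -n bookinfo -f https://raw.githubusercontent.com/istio/istio/release-1.1/samples/bookinfo/networking/bookinfo-gateway.yaml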

 Observability

  • Kiali Console
    • oc get route -n istio-system|grep kiali
    • use the host from the kiali route to access the Kiali console

Screen Shot 2019-04-05 at 12.15.57 PM.png

  • Jaeger:
    • jaeger-query-istio-system.apps.test3-0404.sc.ocpdemo.online

    • use the host from the jaeger route to access the Jaeger console

Screen Shot 2019-04-05 at 12.17.32 PM.png

OpenShift v3.11 – Configure vSphere Cloud Provider

I would like to share my configuration for testing vSphere volume on OpenShift here. I hope you will have a better experience with it after reading this blog.

vSphere Configuration

  1. Create a folder “RHEL” for all the VMs
  2. Create OPENSHIFT as the resource pool and assign all VMs under the same resource pool
  3. The name of the virtual machine must match the name of the node in the OpenShift cluster. In other words, the host name used in vCenter must match the node name used in the ansible inventory for installation. For example, the name of my VM is pocnode.sc.ocpdemo.online and I used the same FQDN in the OpenShift inventory file.
  4. To prepare the environment for the vSphere cloud provider, follow the steps shown below.
  • Set up the GOVC environment:
curl -LO https://github.com/vmware/govmomi/releases/download/v0.15.0/govc_linux_amd64.gz
gunzip govc_linux_amd64.gz
chmod +x govc_linux_amd64
cp govc_linux_amd64 /usr/bin/govc
export GOVC_URL='vCenter IP OR FQDN'
export GOVC_USERNAME='vCenter User'
export GOVC_PASSWORD='vCenter Password'
export GOVC_INSECURE=1
  • Find the host VM path:
govc ls /datacenter/vm/<vm-folder-name>
  • Set disk.EnableUUID to true for all VMs:
govc vm.change -e="disk.enableUUID=1" -vm='VM Path/vm-name'
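If you have more than a few VMs, a small shell loop can apply the setting to every VM under the folder. This is only a sketch; the folder path below is from my lab, and you should review the list returned by govc ls before running it:

# Sketch: set disk.enableUUID on every VM returned by govc ls (folder path is an assumption)
for vm in $(govc ls /datacenter/vm/RHEL); do
  govc vm.change -e="disk.enableUUID=1" -vm="$vm"
done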

OpenShift Configuration

vSphere.conf

To configure OpenShift to use vSphere volumes, you must configure the vSphere cloud provider. To do so, create the file /etc/origin/cloudprovider/vsphere.conf as shown below.

[Global] 
        user = "vcenter username" 
        password = "vcenter password" 
        port = "443" 
        insecure-flag = "1" 
        datacenters = "Datacenter" 
[VirtualCenter "1.2.3.4"] 

[Workspace] 
        server = "1.2.3.4" 
        datacenter = "Datacenter"
        folder = "/Datacenter/vm/RHEL" 
        default-datastore = "Shared-NFS" 
        resourcepool-path = "OPENSHIFT" 

[Disk]
        scsicontrollertype = pvscsi 
[Network]
        public-network = "VM Network 2"

Observation:

  • use govc ls to figure out the value for `folder` (see the example after this list)
  • use just the name of the resource pool, not the entire path
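For example (using my lab names), the folder value comes straight from the govc ls output:

govc ls /Datacenter/vm
# ...the output includes /Datacenter/vm/RHEL, which is the full path I used for the `folder` value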

Update master-config.yaml

Add the following to the /etc/origin/master/master-config.yaml

kubernetesMasterConfig:
  ...
  apiServerArguments:
    cloud-provider:
      - "vsphere"
    cloud-config:
      - "/etc/origin/cloudprovider/vsphere.conf"
  controllerArguments:
    cloud-provider:
      - "vsphere"
    cloud-config:
      - "/etc/origin/cloudprovider/vsphere.conf"

Update node-config.yaml

Add the following to the /etc/origin/node/node-config.yaml

kubeletArguments:
  cloud-provider:
    - "vsphere"
  cloud-config:
    - "/etc/origin/cloudprovider/vsphere.conf"

Restart services

From the master, restart services

master-restart api
master-restart controllers
systemctl restart atomic-openshift-node

From all the nodes, restart service

systemctl restart atomic-openshift-node

Remove node to add providerID

The following steps delete the nodes and restart the node services. The observation from this step is that .spec.providerID was added to the node YAML after deleting the node and restarting the service. Use the validation step below before and after deleting the node to review the node YAML details via `oc get node <name of node> -o json`.

  • Check and backup existing node labels:
oc describe node <node_name> | grep -Poz '(?s)Labels.*\n.*(?=Taints)'
  • Delete the nodes
oc delete node <node_name>
  • Restart all node services
systemctl restart atomic-openshift-node
  • Step to validate
#To make sure all nodes are Ready
oc get nodes
#To check that .spec.providerID is added for each node
oc get nodes -o json

Create vSphere storage-class

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: "vsphere-standard" 
provisioner: kubernetes.io/vsphere-volume 
parameters:
    diskformat: zeroedthick 
    datastore: "Shared-NFS" 
reclaimPolicy: Delete
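Save the storage class above to a file (the file name here is just an example) and create it:

oc create -f vsphere-storageclass.yaml
oc get storageclass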

Let's test it out

Create a PVC that uses the vSphere-volume storage-class

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: vsphere-test-storage
  annotations:
    volume.beta.kubernetes.io/storage-class: vsphere-standard
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
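Save the claim to a file (again, the file name is only an example), create it, and check that it binds:

oc create -f vsphere-test-pvc.yaml
oc get pvc vsphere-test-storage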

Result

NAME                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
vsphere-test-storage   Bound     pvc-b70a6916-2d21-11e9-affb-005056a0e841   1Gi        RWO            vsphere-standard   22h

Troubleshooting

If it still does not work, here are some suggestions to debug:

  • Check if your resource pool is correct

The following command returns the list of VMs that belong to the “OPENSHIFT” resource pool. It should list all the hosts in your OpenShift cluster.

govc pool.info -json /Datacenter/host/Cluster/Resources/OPENSHIFT | jq -r '.ResourcePools[].Vm[] | join(":")' | xargs govc ls -L
  • Check if externalID and providerID were added to the node YAML after deleting the node and restarting atomic-openshift-node.
kubectl get nodes -o json | jq '.items[]|[.metadata.name, .spec.externalID, .spec.providerID]'

If you don’t have jq, you can download it from https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64

Configure and Troubleshoot LDAP on OpenShift

One of the most frequently asked questions I get is how to configure LDAP on OpenShift. Instead of replying with my PDF every time I get a request about this, it may be good to share the info here so I can always refer back to it. Configuring LDAP on OpenShift is pretty straightforward if you have all the correct connection information. In this blog I will walk you through the configuration for both the pre- and post-installation options. I will also provide some troubleshooting steps on how to debug if you run into issues.

Problem: Can’t login with LDAP users.

Active Directory usually uses sAMAccountName as the uid for login, while LDAP usually uses uid for login.

Step 1

Use the following ldapsearch to validate the information that was given by the customer:

ldapsearch -x -D "CN=xxx,OU=Service-Accounts,OU=DCS,DC=homeoffice,DC=example,DC=com" \
  -W -H ldaps://ldaphost.example.com -b "ou=Users,dc=office,dc=example,DC=com" \
  -s sub 'sAMAccountName=user1'

If the ldapsearch does not return any user, the -D or -b value may not be correct. Retry with a different baseDN. If too many entries are returned, add a filter to your search, for example (objectclass=person).
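For example, combining the filter with the user lookup (same placeholder DNs as above):

ldapsearch -x -D "CN=xxx,OU=Service-Accounts,OU=DCS,DC=homeoffice,DC=example,DC=com" \
  -W -H ldaps://ldaphost.example.com -b "ou=Users,dc=office,dc=example,DC=com" \
  -s sub "(&(objectclass=person)(sAMAccountName=user1))"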

Step 2

Logging: set OPTIONS=--loglevel=5 in /etc/sysconfig/atomic-openshift-master

Step 3

The customer had an htpasswd provider set up before switching to Active Directory, and user identities had already been created for the same users. In journalctl -u atomic-openshift-master, a conflict with the existing user identity was logged when the user tried to log in.

Here were the steps:
oc get identity
oc delete identity <name_of_identity_for_user1>
oc get user
oc delete user user1
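As a hypothetical example, if user1 originally logged in through an htpasswd provider named htpasswd_auth, the cleanup would look like this (identity names follow the <provider_name>:<provider_user_name> pattern):

oc get identity | grep user1
oc delete identity htpasswd_auth:user1
oc delete user user1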
Final configuration in master-config.yaml was as shown below.

oauthConfig:
  assetPublicURL: https://master.example.com:8443/console/
  grantConfig:
    method: auto
  identityProviders:
  - name: "OfficeAD"
    challenge: true
    login: true
    provider:
      apiVersion: v1
      kind: LDAPPasswordIdentityProvider
      attributes:
        id:
        - dn
        email:
        - mail
        name:
        - cn
        preferredUsername:
        - sAMAccountName
      bindDN: "CN=LinuxSVC,OU=Service-Accounts,OU=DCS,DC=office,DC=example,DC=com"
      bindPassword: "password"
      ca: ad-ca.pem.crt
      insecure: false
      url: "ldaps://ad-server.example.com:636/CN=Users,DC=hoffice,DC=example,DC=com?sAMAccountName?sub"

Installing OCP 3.9 on Azure

This is an update from my previous blog about OpenShift 3.7 on Azure. OpenShift 3.9 is out and I tested the latest version on Azure.

I discovered that there are a few things that are different from version 3.7. I am going to share them in this blog and hope it will help someone out there.

The environment is using unmanaged disk for my VMs, and it is running RHEL 7.5. The version of OpenShift is 3.9.14.

Host Configuration

In this version, I still use the same rules for configuring the nodes in the inventory file. Here is my latest example for the [nodes] section, as shown below. It is important to have openshift_hostname the same as the Azure instance name shown in the Azure portal.

10.0.0.5 openshift_ip=10.0.0.5 openshift_hostname=ocpnode1 openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

NetworkManager

I did not need to touch NetworkManager in this test since I am using the default Azure domain here. If you are using custom DNS for your VMs, I would still make sure NetworkManager is working correctly before the installation. See my previous post for more information on this.

Ansible Inventory file example

Here is my sample inventory file (/etc/ansible/hosts) for installing OCP 3.9.14. In this test, my goal is to test the cloud provider plugin on Azure.

[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
ansible_ssh_user=root
deployment_type=openshift-enterprise
openshift_clock_enabled=true
openshift_disable_check=disk_availability,memory_availability,docker_image_availability,docker_storage
openshift_template_service_broker_namespaces=['openshift']
openshift_enable_service_catalog=true
template_service_broker_install=true
ansible_service_broker_local_registry_whitelist=['.*-apb$']
openshift_master_default_subdomain=apps.poc.openshift.online
openshift_hosted_router_selector='region=infra'
openshift_hosted_registry_selector='region=infra'
openshift_install_examples=true
openshift_docker_insecure_registries=172.30.0.0/16
openshift_hosted_registry_storage_nfs_directory=/exports
openshift_hosted_manage_router=true
openshift_hosted_manage_registry=true
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/exports
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=20Gi
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/openshift/openshift-passwd'}]
[masters]
10.0.0.4
[nfs]
10.0.0.4
[etcd]
10.0.0.4
[nodes]
10.0.0.4 openshift_ip=10.0.0.4 openshift_hostname=ocpmaster openshift_public_ip=1.2.3.4 openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_scheduleable=true openshift_public_hostname=ocpmaster.poc.openshift.online
10.0.0.5 openshift_ip=10.0.0.5 openshift_hostname=ocpnode1 openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

Install the Cluster

After creating the ansible inventory file, execute the following commands to get OCP installed.

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Service Principal

This is required for configuring Azure as the cloud provider. Here is the example command I used to create my service principal.

az ad sp create-for-rbac -n ocpsp --password XXXX --role contributor --scopes /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
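The command output contains the values needed for azure.conf in the next step. As far as I can tell, the fields map roughly like this (treat it as a sketch):

# az ad sp create-for-rbac output -> azure.conf
#   appId    -> aadClientID
#   password -> aadClientSecret
#   tenant   -> tenantID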

Create /etc/azure/azure.conf File

Out of all the steps, this is one of the steps that is different from the last version. Here is the azure.conf that I used in my test.

tenantID: xx1x11xx-x1x1-1x11-x1x1-x1x1111111x1
subscriptionID: 11xx111x-1x1x-1xxx-1x11-x1x1x1x11xx1
aadClientID: 1111111x-1111-1xxx-1111-x1111x111xxx
aadClientSecret: 0000000
resourceGroup: name-of-resource-group
cloud: AzurePublicCloud
location: westus
vnetName: virtual-network-name
securityGroupName: network-security-group-name

Update master-config.yaml and node-config.yaml

This configuration is the same as in 3.7. Here is the sample configuration that I have in my /etc/origin/master/master-config.yaml and /etc/origin/node/node-config.yaml. Only add the bold portion to your existing YAML files.

Configuration of master-config.yaml:

kubernetesMasterConfig:
  apiServerArguments:
    cloud-config:
    - /etc/azure/azure.conf
    cloud-provider:
    - azure
    runtime-config:
    - apis/settings.k8s.io/v1alpha1=true
    storage-backend:
    - etcd3
    storage-media-type:
    - application/vnd.kubernetes.protobuf
  controllerArguments:
    cloud-provider:
    - azure
    cloud-config:
    - /etc/azure/azure.conf 

Configuration of node-config.yaml:

kubeletArguments: 
  cloud-config:
  - /etc/azure/azure.conf
  cloud-provider:
  - azure
  node-labels:
  - region=infra
  - zone=default

Backup the Node labels

Just to be on the safe side, we should back up the node labels before restarting the services. To gather the labels for each node, run oc get nodes --show-labels and save the output. In case your nodes do not have all the labels after restarting the services, you can restore them from the backup.

Restart Services

The documentation indicates that removing the nodes is required for the Azure configuration to work. In my test, I did NOT delete my nodes; the process of restarting the services removed and added the nodes back to the cluster automatically. This is, from my observation, how I got the cloud provider to work in this version.

  • Restart all master services
systemctl restart atomic-openshift-master-api
systemctl restart atomic-openshift-master-controllers

To monitor the events during the restart:

journalctl -u atomic-openshift-master-api -f
journalctl -u atomic-openshift-master-controllers -f
  • Restart all nodes service
systemctl restart atomic-openshift-node

To monitor the events during the restart:

journalctl -u atomic-openshift-node -f
oc get node -w

Notes: journalctl shows the node getting removed from the cluster list. Run oc get nodes to monitor the list of nodes in the cluster. The node will eventually be added back to the list if everything works correctly.

Update Roles and Labels

A role is defined for each node in an OpenShift 3.9 cluster. It will look similar to the following from oc get nodes -o wide --show-labels.

The output looks similar to this:

NAME        STATUS    ROLES     AGE       VERSION             EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION          CONTAINER-RUNTIME   LABELS

ocpmaster   Ready     master    11h       v1.9.1+a0ce1bc657   <none>        Employee SKU   3.10.0-862.el7.x86_64   docker://1.13.1     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_E2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=westus,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=ocpmaster,node-role.kubernetes.io/master=true,region=infra,zone=default

ocpnode1    Ready     compute   11h       v1.9.1+a0ce1bc657   <none>        Employee SKU   3.10.0-862.el7.x86_64   docker://1.13.1     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_E2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=westus,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=ocpnode1,node-role.kubernetes.io/compute=true,region=primary,zone=west

After restarting the master and node services, you will need to add the roles back to the nodes once their status shows Ready. Here are the commands to add the roles back.

oc label node ocpmaster node-role.kubernetes.io/master=true
oc label node ocpnode1 node-role.kubernetes.io/compute=true

Setting up Azure Disk

In this test, I used unmanaged disks with my VMs. You will need to know what type of disk you have before creating the storageclass for the cluster.

Here is how you configure an unmanaged disk storageclass.

  • Create a storageclass.yaml file with the following information
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-storageclass
provisioner: kubernetes.io/azure-disk
parameters:
  storageAccount: pocadmin
  • Run the following command with the file that was created in the previous step
oc create -f storageclass.yaml
  • Make this storageclass the default
oc annotate storageclass azure-storageclass storageclass.beta.kubernetes.io/is-default-class="true"
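To double-check, list the storage classes; the default one should show a (default) marker next to its name. The output below is just an example:

oc get storageclass
# NAME                           PROVISIONER                AGE
# azure-storageclass (default)   kubernetes.io/azure-disk   1m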

 

Now, you will be able to use the Azure disk as your default storageclass from the above steps.

Installing OpenShift behind proxy

I have been wanting to write this blog to summarize the challenges that I had with proxies when installing OpenShift. The truth is that I don’t have a complete list of how to solve every problem in a proxy environment. I will try to list out what I did in the past to help you avoid or debug proxy-related issues as much as I can.

Environment Variable

  • Whitelisting hosts that the platform will be accessing, for example:

If you are using subscription manager on RHEL, you will need to whitelist the following hosts.

For RHSM/RHN (rpms):

For RH’s docker registry:

Example of other access that you may need:

    • index.docker.io
    • github.com
    • maven.org (Maven Central)
    • docker.io (dockerhub connection)
    • npmjs.org (node js build)
  • Setup /etc/profile.d/proxy.sh on all the nodes for your platform
#cat /etc/profile.d/proxy.sh
export http_proxy=http://host.name:port/
export https_proxy=http://host.name:port/
export no_proxy=.example.com,.svc

Note: “.svc” is needed if you want to install the Service Catalog

Add Proxy Information into Ansible Hosts File

Here is the list of parameters for proxy environment

openshift_http_proxy=http://IPADDR:PORT
openshift_https_proxy=https://IPADDR:PORT
openshift_no_proxy='.example.com,some-internal-hosts.com'

Update Dockerfile with Proxy Information

After installing, the internal docker registry service IP will need to be added to the NO_PROXY parameter of the docker configuration in /etc/sysconfig/docker.

Get the service IP of the docker-registry on OpenShift:

oc get svc docker-registry -n default

Append the service IP to the NO_PROXY list in the file.
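For example, if the docker-registry service IP came back as 172.30.45.67 (a made-up value), the entry in /etc/sysconfig/docker would look something like the following, and docker needs a restart for the change to take effect:

NO_PROXY=.example.com,.svc,172.30.45.67
systemctl restart docker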

Testing Build and Pushing Images to Registry

After the installation, it is always good to test a build and make sure images can be pushed to the internal registry.

Here is my checklist if the build fails or the image cannot be pushed.

  • Check if the hosts that you are trying to access are on the whitelist for your proxy
  • Check if gitNoProxy is configured correctly under the BuildDefaults plug-in in /etc/origin/master/master-config.yaml (see the example snippet below). For example, if you are accessing an internal git repo, please make sure the repository server is on the gitNoProxy list.
  • In 3.7, you will also need to add the kubernetes service IP to the NO_PROXY environment variable and redeploy the docker-registry. Otherwise, you will get an error when trying to push images to the internal docker registry. See this link for more details: https://bugzilla.redhat.com/show_bug.cgi?id=1511870.

Note: to get the service IP for kubernetes: oc get svc kubernetes -n default
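Here is a sketch of where gitNoProxy lives in master-config.yaml; the proxy host and domains are placeholders, so adjust them to your environment:

admissionConfig:
  pluginConfig:
    BuildDefaults:
      configuration:
        apiVersion: v1
        kind: BuildDefaultsConfig
        gitHTTPProxy: http://host.name:port/
        gitHTTPSProxy: http://host.name:port/
        gitNoProxy: .example.com,internal-git.example.com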

Hopefully, the checklist will help you to avoid any proxy related issue during installation.

Install OpenShift on Atomic Host on AWS

This blog is to share my experience installing OpenShift 3.7 on Atomic Host. Since I am using OpenShift Container Platform (supported by Red Hat), there are 2 options for installation: the RPM install, which runs on Red Hat Enterprise Linux (RHEL), and the containerized install, which uses Atomic Host (a container OS). In version 3.7, the installation can be done via a container on an Atomic Host, or from a Linux bastion host.

My test used a Linux host to install OpenShift on Atomic hosts on AWS, which is one of the ways to get an Atomic instance provisioned. Atomic hosts are provisioned via a private AMI image for your cloud provider account; the AMI image was ami-e9494989 for my test. You need to register to get access for importing the private AMI here: https://www.redhat.com/en/technologies/cloud-computing/cloud-access.

There are many ways to automate the steps for installation, but that is not what this blog is about. I wanted to test out how easy or hard it is to install OpenShift on a container OS, so I went through all the installation steps manually.

Setting Up on a Cloud Provider

There are a few things we need to set up on AWS. A wildcard DNS entry and a public master hostname are required prior to the installation. I used Route53 to add the A records for both of these requirements.

In my test, I also had to add a tag to all Atomic instances with the key KubernetesCluster; the value of the key can be anything. The value of the KubernetesCluster key will be used for the openshift_clusterid parameter in the ansible inventory file. Without this tag on the Atomic instances, I was not able to register the OpenShift nodes with the cloud provider.

Setting Up Bastion host

The bastion host is a Linux host (RHEL) used to run automation scripts to prepare and install OpenShift on all Atomic hosts. This is one of the options to install on Atomic Host. I like this option because I can reuse the same bastion host to install more than one cluster.

The steps to prepare the bastion host are straightforward.

subscription-manager register
subscription-manager attach --pool=
subscription-manager repos --disable="*"
subscription-manager repos \
    --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
    --enable="rhel-7-server-ose-3.7-rpms" \
    --enable="rhel-7-fast-datapath-rpms"
yum install atomic-openshift-utils -y

Preparation before installation.

Preparation steps are available at https://docs.openshift.com/container-platform/latest/install_config/install/host_preparation.html.

1. Generate SSH key on Bastion host via 'ssh-keygen' as root

2. Distribute the SSH key to all hosts (master and node) using the following command from the bastion host:
   ssh-copy-id -i ~/.ssh/id_rsa.pub <hostname>

3. Create a hosts.prepocp file which includes all the hostnames for the cluster. 
   Example is shown below.
   [nodes]
   ip-172-31-7-15.us-west-2.compute.internal
   ip-172-31-5-243.us-west-2.compute.internal

4. Create an ansible playbook (openshiftprep.yml) to automate the host preparation. 
   An example is here: https://github.com/piggyvenus/examples/blob/master/installAnsibleSample/v3.7/atomic/openshiftprep.yml

5. Execute the ansible playbook, which will register and update the Atomic hosts and configure docker on the added device (/dev/xvdb)
   ansible-playbook -i hosts.prepocp openshiftprep.yml

Create Ansible Hosts file for OpenShift Advance Installation

Since we will need to create an inventory file (often referred to as the ansible hosts file) for the OpenShift installation, here is an example for an OpenShift Advanced Installation on Atomic hosts on AWS: https://raw.githubusercontent.com/piggyvenus/examples/master/installAnsibleSample/v3.7/atomic/hosts.atomic.template

Download this file and update the corresponding information for installation. Then, save this file as /etc/ansible/hosts.

There are a few lessons learned here when creating the ansible hosts file. I added the following parameters to get a successful installation:

openshift_release=v3.7.23
openshift_image_tag=v3.7.23
openshift_pkg_version=-3.7.23
openshift_clusterid=<value of key KubernetesCluster from AWS Atomic instance>

Setting up Atomic Host For OCP Installation

I learned that there are a few extra steps I had to add to prepare the Atomic hosts before the OpenShift installation. These are the steps that I used in my test. I am not an Atomic expert, but this does what I wanted it to do.

Besides adding a disk for docker, I also needed extra disk space for the root partition. The OOTB Atomic instance from AWS has only a 3GB root partition, which is not enough for the OpenShift installer. I had to do the following to get my docker and root partitions configured the way I wanted. My goal was to extend the root partition to have extra disk space and to configure docker using the added volume that I attached to the instance.

The preparation script did configure docker to use /dev/xvdb. After running the previous ansible playbook, the following steps were used to extend my root partition.

ansible all -m shell -a "lvextend -L+50G /dev/mapper/atomicos-root"
ansible all -m shell -a "xfs_growfs /"
ansible all -m shell -a "df -h"

Next, reboot all hosts via the following command.

systemctl reboot

The following steps configure and start docker from the bastion host after all hosts have been rebooted.

1. Download this ansible playbook 
   https://raw.githubusercontent.com/piggyvenus/examples/master/installAnsibleSample/v3.7/atomic/openshiftprep2.yml
2. Run ansible playbook using the same hosts.prepocp file as shown below. 
   ansible-playbook -i hosts.prepocp openshiftprep2.yml

OpenShift Containerized Installation

Once the ansible host (/etc/ansible/hosts) is updated, installation can be started by executing the following command.

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

note: if the inventory file is /etc/ansible/hosts, there is no need to specify it with the "-i" option.

If there is no error after this step, you can access the OpenShift console via https://<your-public-master-hostname>:8443/ and log in with any username and password.

Setting up Persistence for Registry for Non-Production

There are many options to set up persistence for the registry on AWS. Since I only have 1 registry for the single-master cluster, I decided to use gp2, which is configured as the default storageclass after the installation. Here are the steps I used to set up storage for the OpenShift internal registry. I used ReadWriteOnce as the access mode because the AWSElasticBlockStore volume plugin only supports ReadWriteOnce (https://kubernetes.io/docs/concepts/storage/persistent-volumes/).

1. ssh to the Atomic master host
2. /usr/local/bin/oc login -u system:admin
3. oc project default
4. run the following:
oc create -f - <<EOF
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": {
    "name": "registry-volume-claim",
    "labels": {
      "deploymentconfig": "docker-registry"
    }
  },
  "spec": {
    "accessModes": [ "ReadWriteOnce" ],
    "resources": {
      "requests": {
        "storage": "20Gi"
      }
    }
  }
}
EOF

5. oc volume deploymentconfigs/docker-registry --add --name=registry-storage -t pvc --claim-name=registry-volume-claim --overwrite
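To verify (a quick sketch), check that the claim is Bound and that the registry redeployed with the volume attached:

oc get pvc registry-volume-claim -n default
oc rollout status dc/docker-registry -n default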

Setting up Metrics with Dynamic storage

Since dynamic provisioning is configured, I used the default gp2 storageclass to configure metrics as well. Here are the steps.

1. Add following in /etc/ansible/hosts file on bastion host
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.<your wildcard suffix>
openshift_metrics_cassandra_storage_type=dynamic
openshift_metrics_image_version=v3.7.23

2. Run the metrics playbook to setup metrics from bastion host
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml 

If the playbook fails, simply uninstall the metrics component by setting openshift_metrics_install_metrics=false and re-run the metrics playbook.
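The variable can also be overridden on the command line instead of editing the inventory file (a sketch):

ansible-playbook -e openshift_metrics_install_metrics=false /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml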

Setting up Logging with Dynamic Storage

Logging can be configured via an ansible playbook as well. I am using the default gp2 storageclass since it provides dynamic provisioning for the Persistent Volume. Here are the steps.

1. Add following in /etc/ansible/hosts file on bastion host
openshift_logging_install_logging=true
openshift_logging_image_version=v3.7.23
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=30Gi
openshift_logging_es_cluster_size=1
openshift_logging_es_memory_limit=1Gi

2. Run the logging playbook to set up logging from the bastion host
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

If the playbook fails, simply uninstall the logging component by setting openshift_logging_install_logging=false and re-run the logging playbook.

Installing from a Container on an Atomic Host Option

Instead of using a bastion host to execute the ansible playbook to install on an Atomic host, you can execute the following command to install OpenShift from a container. Here are the steps I tested.

1.  atomic install --system \
> --storage=ostree \
> --set INVENTORY_FILE=/root/hosts \
registry.access.redhat.com/openshift3/ose-ansible:v3.7
Getting image source signatures
Copying blob sha256:9cadd93b16ff2a0c51ac967ea2abfadfac50cfa3af8b5bf983d89b8f8647f3e4
 71.41 MB / ? [----------------------------------=-------------------------] 7s 
Copying blob sha256:4aa565ad8b7a87248163ce7dba1dd3894821aac97e846b932ff6b8ef9a8a508a
 1.21 KB / ? [=------------------------------------------------------------] 0s 
Copying blob sha256:7952714329657fa2bb63bbd6dddf27fcf717186a9613b7fab22aeb7f7831b08a
 146.93 MB / ? [---------------------------------------------=------------] 16s 
Copying config sha256:45abc081093b825a638ec53a19991af0612e96e099554bbdfa88b341cdfcd2e6
 4.23 KB / 4.23 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
Extracting to /var/lib/containers/atomic/ose-ansible-v3.7.0
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/ose-ansible-v3.7.conf
systemctl enable ose-ansible-v3.7

2. systemctl start ose-ansible-v3.7
3. journalctl -xfu ose-ansible-v3.7

Hope this will help someone to have a successful OpenShift containerized installation.