HIVE-29579: K8s operator #6452

Open
ayushtkn wants to merge 1 commit into apache:master from ayushtkn:k8sOperator

Conversation


@ayushtkn (Member) commented Apr 24, 2026

What changes were proposed in this pull request?

Add Hive Operators to deploy Hive

Why are the changes needed?

To deploy Hive on K8s

Does this PR introduce any user-facing change?

No

How was this patch tested?

Deployed on Docker Desktop: followed the README, built Hive master, built the Hive image, then ran Scenario 3 from the README.

After building the apache/hive 4.3.0-SNAPSHOT Docker image:

mvn clean install -DskipTests -Pdist
cd packaging/src/docker/
./build.sh -hadoop 3.4.1 -tez 0.10.5 -tez-snapshot 1.0.0-SNAPSHOT

Then, from the Hive root directory:

mvn clean package -pl packaging/src/kubernetes -Pkubernetes -DskipTests
cd packaging/src/kubernetes
kubectl apply -f src/gen/hiveclusters.hive.apache.org-v1.yml

kubectl create namespace hive-operator

kubectl apply -f config/rbac/service-account.yaml
kubectl apply -f config/rbac/cluster-role.yaml
kubectl apply -f config/rbac/cluster-role-binding.yaml

export HIVE_VERSION=4.3.0-SNAPSHOT
envsubst < config/operator/deployment.yaml | kubectl apply -f -

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.9
        ports:
        - containerPort: 2181
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
spec:
  selector:
    app: zookeeper
  ports:
  - port: 2181
EOF

helm repo add ozone https://apache.github.io/ozone-helm-charts/
helm install ozone ozone/ozone --version 0.2.0 --wait

kubectl exec statefulset/ozone-om -- ozone sh volume create /s3v
kubectl exec statefulset/ozone-om -- ozone sh bucket create /s3v/hive
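Before wiring Hive to the bucket, the Ozone shell can list it back to confirm both objects were actually created (same exec pattern as above):

```shell
# List the buckets under the s3v volume; "hive" should appear in the output.
kubectl exec statefulset/ozone-om -- ozone sh bucket list /s3v
```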

kubectl create secret generic hive-db-secret \
  --from-literal=password=hive123

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_DB
          value: metastore
        - name: POSTGRES_USER
          value: hive
        - name: POSTGRES_PASSWORD
          value: hive123
        ports:
        - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
EOF

envsubst < config/samples/hivecluster-full-ha.yaml | kubectl apply -f -

Running Pods:

(screenshot of the running pods)
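For a non-visual check of the rollout, waiting on the HiveServer2 Deployment works too. The deployment name below is taken from the beeline step in this writeup; adjust it if the HiveCluster name differs:

```shell
# Block until the HiveServer2 Deployment reports all replicas ready,
# then list the pods (equivalent to the screenshot above).
kubectl rollout status deployment/my-hive-hiveserver2 --timeout=300s
kubectl get pods
```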

Beeline

kubectl exec -it deployment/my-hive-hiveserver2 -- beeline -u "jdbc:hive2://my-hive-hiveserver2:10000/"
(screenshot of the beeline session)
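The interactive session can also be turned into a scripted smoke test; the query here is illustrative:

```shell
# Run a trivial query non-interactively and report success on a zero exit status.
kubectl exec deployment/my-hive-hiveserver2 -- \
  beeline -u "jdbc:hive2://my-hive-hiveserver2:10000/" \
  -e "SHOW DATABASES;" && echo "HS2 OK"
```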

HiveServer2 UI:

ayushsaxena@Q3NW54Y0C5 kubernetes % kubectl port-forward svc/my-hive-hiveserver2 10002:10002
Forwarding from 127.0.0.1:10002 -> 10002
Forwarding from [::1]:10002 -> 10002
(screenshot of the HiveServer2 web UI)

Testing with External S3 Bucket

All steps are the same as above, except the Ozone parts. Changes:

Created the S3 credentials secret:

kubectl create secret generic aws-s3-creds \
  --from-literal=accessKey="<My KEY ID>" \
  --from-literal=secretKey="<MY KEY>"

In hivecluster-full-ha.yaml, changed:

 warehouseDir: "s3a://ayush-k8s-bucket/warehouse"

And

  storage:
    endpoint: "https://s3.ap-south-1.amazonaws.com"
    bucket: "ayush-k8s-bucket"
    pathStyleAccess: false
    accessKeySecretRef:
      name: aws-s3-creds
      key: accessKey
    secretKeySecretRef:
      name: aws-s3-creds
      key: secretKey

Removed the Hadoop override that was only needed for Ozone:

 hadoop:
    coreSiteOverrides:
      fs.s3a.connection.ssl.enabled: "false"

Ran the same steps, created an Iceberg table, and checked the location of the ingested files.
(screenshot: Iceberg data files under the S3 warehouse path)

With Oracle DB + S3 Bucket

Same as above, with Oracle deployed as:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oracle
  template:
    metadata:
      labels:
        app: oracle
    spec:
      containers:
      - name: oracle
        image: gvenzl/oracle-free:latest
        env:
        - name: ORACLE_PASSWORD
          value: hive123
        - name: APP_USER
          value: hive
        - name: APP_USER_PASSWORD
          value: hive123
        ports:
        - containerPort: 1521
---
apiVersion: v1
kind: Service
metadata:
  name: oracle
spec:
  selector:
    app: oracle
  ports:
  - port: 1521
EOF

Updated the YAML as:

  metastore:
    database:
      type: oracle
      url: "jdbc:oracle:thin:@oracle:1521/FREEPDB1"
      driver: "oracle.jdbc.OracleDriver"
      username: hive
      passwordSecretRef:
        name: hive-db-secret
        key: password
      driverJarUrl: "https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/23.3.0.23.09/ojdbc11-23.3.0.23.09.jar"
    warehouseDir: "s3a://ayush-k8s-bucket/warehouse"
    configOverrides:
      metastore.catalog.servlet.port: "9001"
      metastore.catalog.servlet.auth: "none"

Created a table and verified it inside the Oracle DB.
(screenshot: table metadata verified in the Oracle DB)
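The in-database verification can be scripted as well. This assumes sqlplus inside the gvenzl/oracle-free container and the standard HMS schema (TBLS table) created under the hive user; connection details come from the Oracle manifest above:

```shell
# Query the HMS schema inside Oracle for the registered table names.
kubectl exec deployment/oracle -- bash -lc \
  'echo "SELECT TBL_NAME FROM TBLS;" | sqlplus -s hive/hive123@//localhost:1521/FREEPDB1'
```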

External HMS

Deployed HMS outside K8s with Docker, pointing to a local Ozone.

Built the Hive Standalone Metastore 4.3.0-SNAPSHOT Docker image, then used the following Docker Compose file:

version: '3.9'
services:
  postgres:
    image: postgres
    restart: unless-stopped
    container_name: postgres
    hostname: postgres
    environment:
      POSTGRES_DB: 'metastore_db'
      POSTGRES_USER: 'hive'
      POSTGRES_PASSWORD: 'password'
    ports:
      - '5432:5432'
    volumes:
      - hive-db:/var/lib/postgresql
    networks:
      - hive

  metastore:
    image: apache/hive:standalone-metastore-${HIVE_VERSION}
    depends_on:
      - postgres
    restart: unless-stopped
    container_name: metastore
    hostname: metastore
    environment:
      DEFAULT_FS: "s3a://hive"
      HIVE_WAREHOUSE_PATH: "/warehouse"
      HADOOP_CLASSPATH: "/opt/hadoop/share/hadoop/tools/lib/*:/tmp/ext-jars/*"
      DB_DRIVER: postgres
      SERVICE_NAME: 'metastore'
      SERVICE_OPTS: >
        -Xmx1G
        -Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver
        -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/metastore_db
        -Djavax.jdo.option.ConnectionUserName=hive
        -Djavax.jdo.option.ConnectionPassword=password

      S3_ENDPOINT_URL: "http://host.docker.internal:9878"
      AWS_ACCESS_KEY_ID: "ozone"
      AWS_SECRET_ACCESS_KEY: "ozone"
    ports:
        - '9001:9001'
        - '9083:9083'
    volumes:
        - warehouse:/opt/hive/data/warehouse
        - type: bind
          source: ${POSTGRES_LOCAL_PATH}
          target: /opt/hive/lib/postgres.jar
        # Mount local jars to a temporary staging area (Read-Only)
        - ./jars:/tmp/ext-jars:ro
    networks:
      - hive

volumes:
  hive-db:
  warehouse:

networks:
  hive:
    name: hive

Verified the HMS runs standalone in the Docker environment.
(screenshot: standalone HMS running in Docker)

Modified the metastore section in hivecluster-full-ha.yaml to:

  metastore:
    enabled: false
    externalUri: "thrift://host.docker.internal:9083"

The externalUri points to our external HMS.

Deployed hivecluster-full-ha.yaml; no HMS pod is created in this case.
(screenshot: pods without an HMS pod)
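Before deploying, it may be worth confirming that in-cluster pods can actually reach the external HMS. A throwaway pod with netcat is one way to probe; the image choice is arbitrary and the host/port come from this setup:

```shell
# One-off pod that probes the external HMS thrift port from inside the cluster.
kubectl run hms-check --rm -i --image=busybox --restart=Never -- \
  nc -z -w 5 host.docker.internal 9083 && echo "HMS reachable"
```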

@ayushtkn ayushtkn changed the title WIP: K8s operator WIP: HIVE-29579: K8s operator Apr 24, 2026
@ayushtkn ayushtkn marked this pull request as draft April 24, 2026 14:39
@zhangbutao (Contributor) commented:

I can't wait to try out this feature!!!
BTW, besides the LLAP mode, can we also execute normal Tez tasks?


@ayushtkn (Member, PR author) commented May 4, 2026

> BTW, besides the LLAP mode, can we also execute normal Tez tasks?

Thanks @zhangbutao for taking a look! Currently, the operator only supports LLAP mode (along with a Tez local mode, strictly for development and testing). Standard Tez mode isn't on my immediate radar right now; I can look into it in a follow-up, though LLAP generally makes more sense in cloud environments.

@zhangbutao (Contributor) commented:

> Standard Tez

@ayushtkn That sounds great. I think LLAP on cloud/k8s is definitely a milestone for Hive — a really nice feature.

However, as far as I know, in the real world there are still a large number of Hive jobs in standard Tez mode running on YARN. Many users want to migrate these Tez jobs to native Kubernetes (without relying on YARN's scheduling), just like Spark/Flink on K8s. I'm really looking forward to further updates on Tez on K8s as well.

Thanks!

@ayushtkn ayushtkn changed the title WIP: HIVE-29579: K8s operator HIVE-29579: K8s operator May 5, 2026

@ayushtkn (Member, PR author) commented May 5, 2026

Thanks @zhangbutao for sharing the use case. I’ll definitely look into it. Most likely, we’ll have support for both LLAP mode and Tez mode before the release — or at least I’ll try to cover as much as possible. I will create a ticket to track that.

Need to see from the Tez side; I think it would require changes there first (YARN -> Kubernetes).

In the end, it’s up to the users which mode they want to use. Whichever mode they choose, they’re still using Hive, which is a win-win for us :-)


sonarqubecloud Bot commented May 8, 2026

