How to Run GPT-OSS-20B and GPT-OSS-120B on Glows.ai

This tutorial shows how to run the latest gpt-oss-20b model on Glows.ai using a rented NVIDIA GeForce RTX 4090 GPU. The same steps also work for running the gpt-oss-120b model on an NVIDIA H100 SXM5.

The tutorial covers the following topics:

  • How to create an instance on Glows.ai
  • How to use gpt-oss-20b
  • Sharing access with others
  • Programmatically calling gpt-oss-20b via API
  • Using Auto Deploy for “start-on-demand” instances

GPT-OSS-20B is an open-weight large language model released by OpenAI in August 2025, with roughly 21 billion parameters. It uses a Mixture-of-Experts (MoE) architecture in which each token activates only about 3.6 billion parameters, reducing inference resource requirements. The model supports local deployment, includes performance optimizations for the MoE layers, and can run in as little as 16 GB of GPU memory. According to OpenAI's benchmarks, its performance is close to that of OpenAI o3-mini on several common tests.

Creating an Instance

On Glows.ai, create an on-demand instance following this guide. Make sure to select the official pre-configured GPT OSS 20B image (img-neqm8dp2).

On the Create New page, set Workload Type to Inference GPU - 4090 and select the GPT OSS 20B image. The image comes with the necessary runtime environment and two pre-started services: Ollama (listening on port 11434) and Open WebUI (listening on port 8080).

Datadrive is Glows.ai’s cloud storage service, allowing you to upload data, models, and code before creating an instance. During instance creation, you can click Mount to mount Datadrive, giving direct access from the instance.

This tutorial focuses on inference only, so mounting Datadrive is not required.

After completing the setup, click Complete Checkout in the bottom right to create the instance.

The GPT OSS 20B instance takes about 30–60 seconds to start. Once started, you can check its status and access links in My Instances:

  • SSH Port 22: SSH connection
  • HTTP Port 8888: JupyterLab
  • HTTP Port 11434: Ollama API
  • HTTP Port 8080: Open WebUI

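Before opening the UI, you can confirm that the Ollama service is reachable through its API. A minimal Python sketch, assuming the public link that My Instances maps to HTTP Port 11434 looks like the placeholder below (Ollama's /api/tags endpoint lists the models installed on the instance):

python
import requests

# hypothetical public link for HTTP Port 11434, copied from My Instances
OLLAMA_URL = "https://tw-05.access.glows.ai:25947"

# /api/tags lists the models available on the instance
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # expect gpt-oss:20b in the output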

Using GPT OSS 20B

Go to the My Instances page and click the Open button under HTTP Port 8080 to access the Open WebUI service. The first time you use it, you’ll need to create an administrator account.

Once the account is created, you’ll enter the chat interface and can interact directly with the gpt-oss-20b model.

Open WebUI also suggests follow-up questions based on the current conversation, making it easier to extend dialogues.

Sharing with Friends

Open WebUI has a built-in account management system that keeps each user's conversations separate. To open it, click the top-left menu → bottom-left avatar → Admin Panel.

Click Settings, set Default User Role to user, enable New Sign Ups, and click Save.

Share the HTTP Port 8080 link from My Instances with friends. They can open it in a browser and click Sign up to create a new account.

After users register, the admin can view their accounts and chat history in the Admin Panel.

Calling GPT OSS 20B via API

To call GPT-OSS-20B programmatically, you first need to enable API access.

In the Admin Panel → Settings, scroll down to Enable API Key, enable it, and click Save.

In your personal account, go to Settings → Account to view or create API keys. Keep your key safe; it will be used again in the Auto Deploy examples below.

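One simple way to keep the key out of your scripts is to read it from an environment variable. A minimal sketch; the variable name OPENWEBUI_API_KEY is our own choice, not something the platform defines:

python
import os

# read the key from the environment instead of hardcoding it
# (export OPENWEBUI_API_KEY=sk-... before running)
api_key = os.environ["OPENWEBUI_API_KEY"]

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}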

After obtaining an API key, you can send requests as follows. API endpoint format:

bash
API Endpoint = HTTP Port 8080 link + /api/chat/completions
Example: https://tw-05.access.glows.ai:25947/api/chat/completions

Example request:

bash
curl -X POST https://tw-05.access.glows.ai:25947/api/chat/completions \
  -H "Authorization: Bearer sk-f9xxxxxxxxxxxx0" \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {
            "role": "user",
            "content": "I just started college and want to learn Python artificial intelligence. Please help me plan my study content."
          }
        ],
        "model": "gpt-oss:20b",
        "temperature": 0.7
      }'

The model returns results after inference.

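The endpoint is OpenAI-compatible, so the reply text sits at choices[0].message.content. A minimal requests sketch of the same call, reusing the placeholder endpoint and key from the curl example above:

python
import requests

url = "https://tw-05.access.glows.ai:25947/api/chat/completions"
headers = {
    "Authorization": "Bearer sk-f9xxxxxxxxxxxx0",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-oss:20b",
    "temperature": 0.7,
    "messages": [
        {"role": "user", "content": "Please help me plan my Python AI studies."}
    ],
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
resp.raise_for_status()
# the response follows the OpenAI chat-completions schema
print(resp.json()["choices"][0]["message"]["content"])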

You can also use OpenAI’s Python SDK:

python
import openai
client = openai.Client(
    base_url="https://tw-06.access.glows.ai:25947/api",
    api_key="sk-f9xxxxxxxxxxxx0",
)

# Chat completion
response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "Tell me which is bigger, 9.11 or 9.8"},
    ],
    temperature=0.6
)
print(response)
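
If you would rather receive tokens as they are generated, the same client can stream the response. A short sketch, assuming the endpoint supports OpenAI-style streaming (stream=True):

python
import openai

client = openai.Client(
    base_url="https://tw-06.access.glows.ai:25947/api",
    api_key="sk-f9xxxxxxxxxxxx0",
)

# stream=True yields chunks as the model generates them
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Tell me which is bigger, 9.11 or 9.8"}],
    temperature=0.6,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)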

Advanced Usage: Auto Deploy

Traditional GPU deployment requires manually creating and releasing instances. For sporadic usage or providing API access to third parties, this is inconvenient.

Glows.ai Auto Deploy creates a fixed service link. When a request is sent to the link, Glows.ai automatically creates an instance to handle it. If no requests occur within 5 minutes, the system automatically releases the instance—achieving “start-on-demand.”
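
Because the first request sent to an idle Auto Deploy link has to wait for a fresh instance to spin up, clients should use a generous timeout and a simple retry. A minimal sketch, with a placeholder link and key:

python
import time
import requests

URL = "https://your-auto-deploy-link/api/chat/completions"  # placeholder fixed link
HEADERS = {
    "Authorization": "Bearer sk-f9xxxxxxxxxxxx0",
    "Content-Type": "application/json",
}
PAYLOAD = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Hello"}],
}

# the first request after an idle period waits for the instance to start,
# so allow a long timeout and retry a few times before giving up
for attempt in range(3):
    try:
        resp = requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=300)
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])
        break
    except requests.RequestException:
        time.sleep(10 * (attempt + 1))  # back off while the instance spins up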

Creating a Snapshot

To preserve API keys and environment configurations, take a snapshot via Take Snapshot on Glows.ai.

Enter a snapshot name and select automatically released so that the instance is released once the snapshot has been created.

If you haven't purchased storage, Glows.ai keeps snapshots under Snapshots → Restorable. The snapshot in this tutorial is small: a 5 GB storage plan costs $0.50/month under Storage Space, and allocating 1 GB to the snapshot is enough. After allocating space, move the snapshot to Available in Snapshots.

Setting Up Auto Deploy

Go to Auto Deploy → New Deploy to create a new configuration. Name it for easy identification.

Select the GPU and environment. You can use a snapshot or system image; here, we choose the previously created snapshot.

Set the service port and start command. For this example, Open WebUI runs on port 8080 automatically, so only the port needs configuration.

bash
Port: 8080

Click Confirm to complete the setup.

Viewing Configuration

After setup, you can see the fixed link and configuration details.

Replace the API endpoint with the Auto Deploy link and include your API key in Authorization:

bash
curl -X POST https://tw-05.sgw.glxxxxxx224w/api/chat/completions \
  -H "Authorization: Bearer sk-f9xxxxxxxxxxxx0" \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {
            "role": "user",
            "content": "I just started college and want to learn Python artificial intelligence. Please help me plan my study content."
          }
        ],
        "model": "gpt-oss:20b",
        "temperature": 0.7
      }'

If no new requests occur within 5 minutes, the system releases the instance automatically. Auto Deploy also shows total costs and Instance Status:

  • Standby: the configuration is active, but no instance is running.
  • Idle: a request has been received and an instance is being created, or a running instance is being released.
  • Running: the instance is up and processing requests; it is released automatically after 5 idle minutes.

For large-scale use, contact Glows.ai to pre-cache images or snapshots for faster instance startup.

Contact Us

If you have any questions or feedback while using Glows.ai, feel free to contact us: