NVIDIA

Using the API to Find Free Hosted Models on NVIDIA Builder

The NVIDIA Developer Program provides access to a wide catalog of AI models through NVIDIA Inference Microservices (NIM), which expose an OpenAI-compatible API. You can browse and discover available models at build.nvidia.com/explore/discover.

If you want to find models with free hosted endpoints in the browser, you can enable the “Free Endpoint” filter on the model catalog page. But what if you need that information programmatically – in a script, a CI pipeline, or as part of an automated workflow? The browser filter is not accessible through the API, and the /v1/models endpoint does not distinguish between free hosted models and everything else.
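To see the limitation concretely, here is a minimal sketch that lists the model IDs returned by /v1/models. It assumes the response follows the usual OpenAI list shape (a "data" array of objects with an "id" field). The base URL, the NIM_BASE_URL and NVIDIA_API_KEY environment variable names, and both function names are assumptions for illustration, not taken from the article.

```python
import json
import os
import urllib.request

# Assumption: base URL of the OpenAI-compatible API. Override it with the
# (hypothetical) NIM_BASE_URL environment variable if yours differs.
BASE_URL = os.environ.get("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1")


def extract_model_ids(payload):
    """Return the sorted model IDs from an OpenAI-style model list payload."""
    return sorted(item["id"] for item in payload.get("data", []))


def list_models(api_key=None):
    """Fetch /models and return the model IDs it reports.

    The API key is optional here; whether the endpoint requires one is an
    assumption you should verify against your own account.
    """
    request = urllib.request.Request(f"{BASE_URL}/models")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Calling list_models(os.environ.get("NVIDIA_API_KEY")) returns a flat list of IDs. Nothing in an entry of that shape marks a model as having a free hosted endpoint, which is the gap described above.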

Benchmarking GPUs: Measuring Throughput between CPU and GPU

This article was inspired by a LinkedIn post by Dennis Kennetz. The CPU-to-GPU bandwidth check, available on GitHub, uses a specific flow to assess data transfer rates. Like many in the industry, I focus on AI and ML workloads and on how we can improve efficiency and performance using DRAM, CXL, CPUs, GPUs, and software improvements.

In the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML), the ability to process vast amounts of data efficiently is paramount. As AI models grow in complexity and size, the demand for high-performance computing resources intensifies. At the heart of this demand lies the task of optimizing data transfers between the components of a computing system, particularly between the GPU and DRAM, the CPU, and emerging technologies such as CXL (Compute Express Link).