FasterTransformer
NVIDIA FasterTransformer accelerates transformer encoder and decoder inference (BERT-, GPT-, and T5-style models, including LLMs) with optimized CUDA kernels, tensor and pipeline parallelism for multi-GPU serving, and FP16/BF16/INT8 quantization support.
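FasterTransformer itself is a C++/CUDA library with framework-specific example ops; as a conceptual illustration only, the sketch below shows the column-parallel weight split that tensor parallelism relies on. It is plain NumPy, not FasterTransformer code, and the function name and shapes are illustrative.

```python
# Minimal sketch (not FasterTransformer's API): the tensor-parallel idea it
# builds on. A linear layer's weight is split column-wise across "devices";
# each shard computes a partial output and the results are concatenated,
# matching the unsharded computation.
import numpy as np

def column_parallel_linear(x, weight, num_shards):
    """Split `weight` (d_in x d_out) into `num_shards` column blocks,
    run each block's GEMM independently, then concatenate the outputs."""
    shards = np.split(weight, num_shards, axis=1)            # one block per device
    partial_outputs = [x @ w_shard for w_shard in shards]    # local GEMMs
    return np.concatenate(partial_outputs, axis=-1)          # gather along features

# Check: the sharded result equals the single-device result.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))    # (batch, d_in)
w = rng.standard_normal((16, 32))   # (d_in, d_out)
assert np.allclose(column_parallel_linear(x, w, num_shards=4), x @ w)
```

In FasterTransformer the same partitioning is distributed across GPUs, with NCCL collectives taking the place of the in-process concatenation shown here.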