FasterTransformer
NVIDIA FasterTransformer accelerates transformer encoder and decoder inference (BERT-, GPT-, and T5-style models, including LLMs) with optimized CUDA kernels, tensor and pipeline parallelism for multi-GPU serving, and FP16/BF16/INT8 quantization support.
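FasterTransformer itself is a C++/CUDA library with framework-specific example ops; as a conceptual illustration only, the sketch below shows the column-parallel weight split that tensor parallelism relies on. It is plain NumPy, not FasterTransformer code, and the function name and shapes are illustrative.

```python
# Minimal sketch (not FasterTransformer's API): the tensor-parallel idea it
# builds on. A linear layer's weight is split column-wise across "devices";
# each shard computes a partial output and the results are concatenated,
# matching the unsharded computation.
import numpy as np

def column_parallel_linear(x, weight, num_shards):
    """Split `weight` (d_in x d_out) into `num_shards` column blocks,
    run each block's GEMM independently, then concatenate the outputs."""
    shards = np.split(weight, num_shards, axis=1)            # one block per device
    partial_outputs = [x @ w_shard for w_shard in shards]    # local GEMMs
    return np.concatenate(partial_outputs, axis=-1)          # gather along features

# Check: the sharded result equals the single-device result.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))    # (batch, d_in)
w = rng.standard_normal((16, 32))   # (d_in, d_out)
assert np.allclose(column_parallel_linear(x, w, num_shards=4), x @ w)
```

In FasterTransformer the same partitioning is distributed across GPUs, with NCCL collectives taking the place of the in-process concatenation shown here.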