The process of using a trained AI model to make predictions or generate outputs. Training teaches the model; inference is when it applies what it has learned to new inputs. When you ask ChatGPT a question, the model producing its answer is inference. Inference speed (latency) and cost per request are key factors for production AI systems.
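The split between the two phases can be sketched with a toy linear model; the data, learning rate, and function names here are illustrative assumptions, not any specific production system:

```python
# Minimal sketch contrasting training and inference with a toy linear model.
# Training is the expensive, one-time phase; inference is the cheap, repeated phase.

def train(samples, labels, lr=0.1, epochs=100):
    """Training: adjust weights to fit the data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = (w * x + b) - y   # prediction error on this sample
            w -= lr * err * x       # gradient step on the weight
            b -= lr * err           # gradient step on the bias
    return w, b

def infer(w, b, x):
    """Inference: apply the learned weights to a new input."""
    return w * x + b

# Train once on data following y = 2x + 1...
w, b = train([0, 1, 2, 3], [1, 3, 5, 7])
# ...then run inference on unseen input (should be close to 21).
print(infer(w, b, 10))
```

In real systems the same division holds: training happens once on large hardware, while `infer` runs on every user request, which is why its speed and cost dominate production concerns.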