Measure AI output quality with the key metrics for evaluating prompt performance. Learn about the BLEU and ROUGE metrics and the LLM-as-a-judge evaluation approach.