GenAI 101 - How Generative AI Is Rewriting the Rules for ML Engineers (And How to Adapt and Get Started)
Some thoughts and approaches to get started in the GenAI space
It was around late 2022 when I first realized something fundamental was changing in machine learning. After years of fine-tuning BERT models for text classification and training CNNs for computer vision tasks, suddenly everyone was talking about ChatGPT and generative AI.
I remember thinking, "This is just another trend that will settle down." But man, was I wrong.
We're witnessing a paradigm shift unlike anything I've seen in my years as an ML practitioner. Having now implemented both paradigms in production environments, I can tell you this: understanding this shift isn't just good career advice – it's essential if you want to remain relevant in today's AI landscape. Moreover, a firm grasp of generative AI is crucial for acing ML Design interviews, where in-depth knowledge of cutting-edge techniques can truly set you apart.
In this post, I'll break down the key differences between discriminative and generative approaches, explain why the latter has exploded in popularity, and share practical advice on transitioning between these paradigms based on my real-world experience.
Discriminative vs Generative: A 2-Minute Crash Course
So, in my experience, every few years, we see new buzzwords in the ML community that gain popularity. Data science itself was once such a buzzword (and still is, for some). With generative AI, though, we're dealing with a fundamental shift in how we approach problems.
Let me explain the key differences between the discriminative models of the past and the generative models of today in the simplest way possible:
Discriminative Models: These learn to separate different classes by creating decision boundaries. They excel at classifying inputs into predefined categories by modeling P(Y|X) – the probability of a label Y given an input X.
Generative Models: These learn the underlying data distribution itself by modeling P(X, Y) or P(X). This allows them to not only classify but also generate new data samples that resemble the training distribution.
Let’s understand the difference with a practical example as well. We will look at a sentiment analysis problem implemented both ways:
# APPROACH 1: Discriminative (BERT)
from transformers import BertForSequenceClassification, BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
def predict_sentiment_discriminative(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs)
    # Compare the two class logits and return the higher-scoring label
    return "positive" if outputs.logits[0][1] > outputs.logits[0][0] else "negative"
Now let's see how a generative approach would handle the same task:
# APPROACH 2: Generative (LLaMA)
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")
def predict_sentiment_generative(text):
    prompt = f"Review: '{text}'\nQuestion: Is this review positive or negative?\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate a short continuation (do_sample=True so the low temperature actually takes effect)
    output = model.generate(
        inputs.input_ids,
        max_new_tokens=5,
        do_sample=True,
        temperature=0.1
    )
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    # Extract the prediction that follows the "Answer:" marker
    prediction = generated_text.split("Answer:")[1].strip().lower()
    return "positive" if "positive" in prediction else "negative"
Notice the key difference? The discriminative approach directly maps inputs to outputs through a classification boundary. The generative approach understands what positive and negative reviews look like and can generate appropriate text in response to a prompt. And this brings me to the —>
The 3 Superpowers of Generative Models
Now that we understand the difference between generative and discriminative models, let's talk about the superpowers that generative models currently have that discriminative models simply don't possess:
1. Zero/Few-Shot Learning: The Cold Start Problem Solved
I still remember the first time I realized what made generative models truly special. Imagine you're trying to classify a new type of customer complaint that emerged after a product update. Normally, this would mean collecting hundreds of examples, labeling them, training a classifier, and then deploying it – a process that could take weeks.
Instead, you can now write a prompt that describes the categories and give a couple of examples. The model starts classifying new complaints with reasonable accuracy immediately. No training. No fine-tuning. It just works.
Here's a simplified example of what this may look like in practice:
# Scenario: Classify customer feedback without labeled examples
categories = ["shipping issue", "product quality", "billing problem", "product availability"]
new_feedback = "I was charged twice for my order but only received one package."
# APPROACH 1: Traditional ML (Discriminative)
# Without labeled examples, we simply cannot train this model
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
# clf.fit(???, ???) # No way to train without labeled data!

# APPROACH 2: Generative AI
def classify_text(text, categories):
    prompt = f"""
    Classify the following text into exactly one of these categories: {', '.join(categories)}.
    Text: "{text}"
    The category is:
    """
    # API call to an LLM
    response = call_llm_api(prompt)
    return response
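For completeness, here's a minimal sketch of what that call_llm_api helper could look like, using OpenAI's chat completion endpoint (the helper itself, the model choice, and the token limit are my own assumptions, not a fixed recipe):

import openai  # assumes the classic pre-1.0 SDK interface; adjust for newer versions

def call_llm_api(prompt):
    # Hypothetical helper: send a prompt to a hosted LLM and return its reply as text
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,          # keep the output as deterministic as possible
        max_tokens=10,          # we only need the category name back
    )
    return response.choices[0].message.content.strip()

With that in place, classify_text(new_feedback, categories) should come back with something like "billing problem" without a single labeled training example.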
The implications of this capability are enormous. Think about it – no more cold start problems. No more waiting for enough labeled data. You can deploy solutions immediately and iterate as you collect more examples.
2. Cross-Domain Generalization: Transfer Learning on Steroids
Another transformative capability is how generative models handle domain shifts. I recall working on a project where we had trained a classifier on data from our US market, but we needed to deploy it in another English-speaking market. The language differences were subtle but significant enough that our model's performance dropped by almost 15%.
With generative models, this problem largely disappears. Since they're trained on diverse, internet-scale data spanning numerous domains, they already have a fundamental understanding of different regional contexts, industries, and vocabulary.
3. Creating Rather Than Just Classifying
The most obvious superpower is right in the name: generative models can create new content. This isn't just a novelty – it's transforming how we build ML applications.
With the same model, you can:
Draft personalized email campaigns
Generate code for API integrations
Create summaries of long documents
Translate content into multiple languages
Build entire content strategies
Draft technical specifications or documentation
And so much more
Old World: 10 tasks = 10 models.
New World: 1 model to rule them all.
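To make the "one model" point concrete, here's a tiny sketch where a single hypothetical helper covers several of these tasks just by swapping the instruction (it reuses the call_llm_api placeholder from earlier; the inputs are stand-ins):

def run_task(instruction, text):
    # One model, many tasks: only the instruction changes
    return call_llm_api(f"{instruction}\n\n{text}")

long_document = "..."   # any document you want condensed
meeting_notes = "..."   # raw notes to turn into an email

summary = run_task("Summarize the following document in three bullet points:", long_document)
email_draft = run_task("Draft a friendly follow-up email based on these notes:", meeting_notes)
translation = run_task("Translate the following text into French:", meeting_notes)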
And this brings us to the most important question of our times —>
Why Innovation Just Got Turbocharged?
The impact of generative models on the speed of innovation has been staggering. Here are some concrete ways I have seen generative AI not just change the game but completely rewrite the rules of engagement:
1. Prototype in Days, Not Months
Before generative AI, creating a new ML solution required collecting task-specific data, labeling examples, engineering features, training a model, and iterating based on evaluation. This process could often take months.
Now, many of these steps can be skipped entirely with prompt engineering. A specialized news article classifier that once would have taken weeks to build can be created in minutes by describing the categories and providing a few examples. Adding a new category, which once meant collecting yet more data, is now as simple as adding that category to the list of choices in the prompt.
This dramatic acceleration has completely changed how we approach new ML projects. We can test ideas quickly, get feedback faster, and pivot if something isn't working.
# Step 1: Write a prompt
# Step 2: Ship it 🚢
2. Democratized Access - Grandma Can Build ML Now
Perhaps the most fascinating change I've witnessed is how non-technical folks are now building ML solutions themselves. A doctor can create a medical information assistant, a teacher can build an educational tool, and an influencer can develop a content generator—all by simply describing what they want in natural language. For coders, it has removed language boundaries: you can now get the code and step-by-step instructions to build anything in any programming language. I remember how I was able to build and deploy a small JavaScript website on Vercel in a day without any prior knowledge of JavaScript or Vercel. It was simply amazing.
This democratization reminds me of how WordPress and similar CMS platforms made website creation accessible to non-developers in the early 2000s. We're seeing a similar revolution where domain experts can directly implement AI solutions without technical intermediaries.
3. API-Driven Dev (Goodbye, GPU Hell)
The way we build and deploy AI applications has completely changed. Before generative AI, developing a sophisticated ML system meant:
Setting up GPU infrastructure
Managing model weights and configurations
Building inference servers
Handling scaling and reliability
Now? It's just a few API calls.
I remember spending three weeks building an inference pipeline for a BERT-based classification system. The equivalent today is three lines of code calling OpenAI's API.
import openai
response = openai.ChatCompletion.create(...) # Profit 💸
This accessibility has unleashed an explosion of innovation:
Startups are launching faster with sophisticated AI capabilities.
Teams are iterating on concepts in days rather than months. Honestly, right now every person on my current team is running at least 2-3x the number of A/B tests they used to run. Most of them are around simple routing classifiers for which we never bothered to build models before and just used heuristics.
Specialized vertical applications are emerging across industries where LLMs are solving problems that were previously considered too ambiguous or complex for automation.
And the good thing is that existing software can incorporate AI capabilities without a rebuild, simply by letting AI take over a few of the system's modules.
I was and still am skeptical about LLMs (especially coming from a background of building custom models), but the productivity gains have been undeniable, and I can say they are here to stay. But there are some problems that need solving as well —>
Production Nightmares: Where Generative AI Bites Back
Despite the incredible advantages, deploying generative models in production comes with significant challenges that aren't immediately obvious to a first-time user, so I'd like to call them out:
1. The Cost Reality
Let me be blunt: for all its advantages, generative inference can be expensive.
Generative models, particularly large language models, are substantially more expensive to run than their discriminative counterparts. A simple sentiment analysis that takes milliseconds with a discriminative model might take 3-4x the time, and hence the compute, with a generative approach.
For applications making millions of predictions daily, this cost difference can be the difference between a profitable product and one that bleeds money.
2. “Confidently Wrong” Syndrome
Another challenge you'll face is the reliability of generative outputs. These models can sometimes behave like that overly confident person you might know: they can confidently produce incorrect information about a topic they don't actually know, a phenomenon commonly called "hallucination."
This tendency creates significant risks in domains where accuracy is critical. While a discriminative model may effectively say "I don't know" by classifying something as "other" or giving a very low confidence score, a generative model can produce a detailed but fabricated response.
This reliability gap means generative models require additional guardrails and verification systems in production environments.
3. Evaluation Chaos
How do you measure if a generated response is good? It's surprisingly difficult.
Traditional metrics like accuracy or F1 score don't work well for evaluating generative outputs. You can't simply compare the generated text to a "correct" answer because there might be many valid responses.
This makes monitoring model quality in production extremely challenging. In most cases, you'll need some form of human evaluation, which doesn't scale well.
And, this brings me to the strategies I have found helpful for deploying such systems —>
Practical Transition Strategies: My Playbook
Based on my experience implementing both paradigms in production, here are some practical tips for making the transition from discriminative to generative AI:
1. Go Hybrid: Best of Both Worlds
I've found that combining both paradigms often gives the best results. Here's a pattern that has worked well for me:
Use a fast, lightweight discriminative model as the first line of defense.
For particular cases, or when the discriminative model's confidence is low, fall back to a generative model; a simple rule can decide when to escalate.
Cache common generation patterns to reduce API calls and costs.
For example, a search system might use a discriminative classifier to route a search query through a specific path, but employ a generative model to handle unusual and longer requests which require more thought.
This hybrid approach gives you the speed and reliability of discriminative models with the flexibility of generative ones when needed.
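Here's a minimal sketch of that routing pattern. The confidence threshold, the pre-trained fast_clf/vectorizer pair, and the call_llm_api helper are all illustrative assumptions, not a prescription:

from functools import lru_cache

# Assume `vectorizer` and `fast_clf` were trained on your existing labeled data
CONFIDENCE_THRESHOLD = 0.8  # tune this on held-out traffic

@lru_cache(maxsize=10_000)
def generative_fallback(text):
    # Cached so repeated unusual queries only hit the LLM once
    return call_llm_api(f"Classify this request into one category: '{text}'\nCategory:")

def classify(text):
    # Fast path: a small discriminative model handles the bulk of traffic
    probs = fast_clf.predict_proba(vectorizer.transform([text]))[0]
    if probs.max() >= CONFIDENCE_THRESHOLD:
        return fast_clf.classes_[probs.argmax()]
    # Slow path: low confidence, escalate to the generative model
    return generative_fallback(text)

In practice the threshold sets your cost/quality trade-off: raise it and more traffic flows to the LLM, lower it and you stay on the cheap path.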
2. Retrieval-Augmented Generation (RAG): Your Hallucination Antidote
def rag_answer(question):
    # Step 1: Retrieve
    relevant_text = semantic_search_knowledge_base(question)
    # Step 2: Ground
    return generate_answer(f"Context: {relevant_text}\nQ: {question}")
The most effective technique for reducing hallucinations is Retrieval-Augmented Generation (RAG).
Instead of relying solely on the generative model's parametric knowledge, a RAG system first retrieves relevant information from a trusted knowledge base, then uses that information to guide the generation process.
This approach grounds generation in verified facts and is particularly valuable for domain-specific applications where factual accuracy is critical.
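For a slightly more concrete picture, here's a sketch of the retrieval step behind semantic_search_knowledge_base using sentence embeddings (the sentence-transformers model and the in-memory document list are assumptions; a real system would typically use a vector database):

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model
documents = ["..."]  # your trusted knowledge base, chunked into passages
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def semantic_search_knowledge_base(question, top_k=3):
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    # Concatenate the best-matching passages to use as grounding context
    return "\n".join(documents[hit["corpus_id"]] for hit in hits)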
3. Prompt Engineering = New Feature Engineering
In the discriminative world, feature engineering was king. Now, prompt engineering has taken that throne.
I've spent countless hours experimenting with different prompting techniques to squeeze better performance out of generative models. Some patterns I've found particularly effective:
Zero-shot prompts: Simply describing the task without examples
Few-shot prompts: Including 2-5 examples in the prompt to guide the model
Chain-of-thought prompts: Encouraging step-by-step reasoning by adding "Let's think about this step by step" to complex tasks
System prompts: Setting the overall behavior and constraints for the model
Although I sometimes make fun of Prompt Engineering as a profession on its own, the right prompting strategy can often improve model performance more than architectural changes or additional training data, making prompt engineering a critical skill for today's ML practitioners.
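To make these patterns concrete, here are illustrative templates for each (the wording is my own, not taken from any particular framework):

# Zero-shot: just describe the task
zero_shot = "Classify the sentiment of this review as positive or negative:\n{review}"

# Few-shot: a couple of worked examples pin down the format and the decision boundary
few_shot = (
    "Review: 'Arrived broken, support never replied.' -> negative\n"
    "Review: 'Exactly what I needed, great value.' -> positive\n"
    "Review: '{review}' ->"
)

# Chain-of-thought: ask for intermediate reasoning on harder tasks
chain_of_thought = (
    "A customer wrote: '{review}'\n"
    "Let's think about this step by step, then give a final sentiment label."
)

# System prompt: set overall behavior and constraints, sent alongside the user message
system_prompt = "You are a support triage assistant. Answer with exactly one category name."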
4. Distilling Generative Knowledge into Discriminative Models
You can get the best of both worlds by distilling knowledge from generative models into smaller, faster discriminative ones.
Use LLM to label data → Train a tiny BERT: 100x cheaper, 80% as good
This process typically involves using a generative model to create synthetic training data or labels, then training a smaller discriminative model on this data. The resulting model combines the accuracy and insights of the generative model with the speed and efficiency of discriminative approaches.
This technique is particularly valuable for high-volume production applications where inference cost and latency are of utmost concern.
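A stripped-down version of that workflow might look like the sketch below. The LLM labeler is the same hypothetical call_llm_api helper, and a TF-IDF plus logistic regression pipeline stands in for the "tiny BERT" student:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: let the LLM label a pool of unlabeled production texts ("silver" labels)
unlabeled_texts = ["I was charged twice for my order", "My package never arrived"]
silver_labels = [
    call_llm_api(f"Label this feedback as 'billing' or 'shipping':\n{t}")
    for t in unlabeled_texts
]

# Step 2: train a small, cheap discriminative student on the LLM-generated labels
student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
student.fit(unlabeled_texts, silver_labels)

# Step 3: serve the student in production; keep the LLM around for spot-checking
print(student.predict(["Why does my invoice show two charges?"]))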
5. Fine-Tuning: Bend Generative Models to Your Will
While LLMs are great at general chat, they might fail miserably at parsing semiconductor datasheets for your business.
When to Fine-Tune:
✅ Domain-specific jargon (medical, legal, engineering)
✅ Output format constraints (strict JSON/XML schemas)
✅ New data availability (though this can often be handled through RAG as well)
✅ When the non-fine-tuned model's responses simply don't work for you
The Code:
# Fine-tune with LoRA (Low-Rank Adaptation)
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b")

# Inject LoRA adapters (trains only 0.5% of params!)
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # "0.5% of 7B = 35M params"

# Train (assumes `dataset` is your tokenized training set)
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(
        output_dir="./lora-out",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,
        warmup_steps=100,
        max_steps=1000,
        learning_rate=3e-4,
        fp16=True
    )
)
trainer.train()
Some Tips:
Start with parameter-efficient methods (LoRA, adapters) rather than full fine-tuning of the whole model.
Use synthetic data generation to bootstrap training examples for PEFT-based fine-tuning:
def generate_examples(prompt_template, n=1000):
    return [llm.generate(prompt_template) for _ in range(n)]
6. Key Architecture Considerations
And finally, some considerations you should always think about before deploying these models:
Cost Control:
Use Quantization for Self-Hosted Models — Shrink LLMs to run on cheaper hardware without massive accuracy drops
Use Prompt Caching — Users probably ask the same questions 100x/day; just cache the model's responses in a long-lived (or never-expiring) cache. See the sketch after this list.
Version Control: Think of Prompts as Code
Store prompts in JSON/YAML with metadata
A/B test prompts. Use bandit algorithms to auto-promote winning variants
Latency Control:
Asynchronously build your caches —
Pre-generate common responses before they’re requested
Fail gracefully the first time a query comes in, and cache the response asynchronously
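As a sketch of the cost-control ideas above, here's 4-bit loading for a self-hosted model plus a dead-simple prompt cache (the model name, the hashing scheme, and the call_llm_api helper are illustrative assumptions):

import hashlib
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization: load the model in 4-bit so it fits on much cheaper hardware
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b", quantization_config=bnb_config, device_map="auto"
)

# Prompt caching: identical prompts hit the cache instead of the model
_cache = {}

def cached_llm_call(prompt):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm_api(prompt)  # only pay for each unique prompt once
    return _cache[key]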
And here is a learning path for those of you who are just starting out —>
Learning Path for Generative AI (From a Recovering Skeptic)
If you're looking to deepen your knowledge in generative AI, here are some excellent Coursera courses that I've personally found valuable:
DeepLearning.AI: Generative AI with Large Language Models - This course provides a comprehensive overview of how LLMs work and how to apply them effectively.
Natural Language Processing Specialization - Covers essential NLP concepts that form the foundation of text generation models.
Prompt Engineering for ChatGPT - Focused specifically on developing effective prompting skills.
My learning approach recommendation:
Start with fundamentals: Make sure you understand the basics of deep learning, particularly attention mechanisms and transformers
Focus on practical applications: Build simple applications using APIs from OpenAI or Anthropic to get comfortable with generative models
Learn prompting techniques: Experiment with different prompting techniques to understand how they affect model outputs
Learn about retrieval and RAG if you need to build factual, reliable systems
Experiment with fine-tuning: Try adapter-based fine-tuning on your domain data
Stay current: This field moves incredibly fast, so follow recent research, and don't forget to subscribe to my blog for the latest updates.
Conclusion
The shift from discriminative to generative AI doesn't mean you should abandon your old tools. It means you have powerful new options in your ML toolkit.
I've found the most success by:
Using generative models for flexibility, handling edge cases, and tasks requiring creative outputs
Sticking with discriminative models where speed, cost-efficiency, and deterministic outputs are critical
Building hybrid systems that leverage the strengths of both approaches
Ultimately, the best approach depends on your specific requirements, budget constraints, and accuracy needs. Don't chase the shiny new technology just because it's trendy – choose the right tool for your specific problem.
Don’t: “We need ChatGPT for this simple classifier.”
Do: “Let's prototype with an LLM, then distill to logistic regression.”
What's your experience with this paradigm shift? Have you found creative ways to combine these approaches in your work? Let me know in the comments!
Thanks for the read. I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog to be notified about them.