Is AI Going to Replace Data Engineering?
tl;dr No, at least not yet. Only about 1 in 16 data engineering (DE) job postings mention any AI tool.
In the previous post, I looked at what companies ask for when hiring data engineers. One thing I wanted to dig into: where does AI actually show up in these jobs?
Summary:
- Only about 1 in 16 DE job postings mention any AI tool - in the bottom half across all tech roles
- Most roles use AI for productivity; AI/ML engineers are the exception - they build with LLM integration frameworks
- AI isn't replacing data engineers. It's being added as a nice-to-have at companies that already expect a long list of skills.
AI tools by role
How often do job postings mention any AI tool? It varies a lot by role.
AI engineers lead at over 54%, followed by ML engineers and data scientists. Then there's a sharp drop: software engineers at ~12%, and a cluster of roles in the 4-11% range. Data engineers land in that lower cluster, at roughly 1 in 16 postings.
This makes sense. Data engineers spend their days in pipelines, orchestration DAGs, and infrastructure config - not the kind of work where AI tools show up in job requirements yet.
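For anyone who wants to reproduce this kind of per-role number on their own dataset, here is a minimal sketch. It assumes each posting has already been reduced to a dict with a `role` and a list of `ai_tools`; both field names are my own placeholders, not the schema behind this post.

```python
from collections import Counter


def ai_mention_share_by_role(postings):
    """Return, per role, the fraction of postings that list at least one AI tool.

    Assumes each posting is a dict with hypothetical keys:
      "role"     - the role family the posting was matched to
      "ai_tools" - list of AI tool names found in the posting (may be empty)
    """
    totals = Counter()
    with_ai = Counter()
    for posting in postings:
        totals[posting["role"]] += 1
        if posting["ai_tools"]:
            with_ai[posting["role"]] += 1
    return {role: with_ai[role] / totals[role] for role in totals}


# Toy input for illustration only - not the data behind this post.
toy_postings = [
    {"role": "Data Engineer", "ai_tools": []},
    {"role": "Data Engineer", "ai_tools": ["GitHub Copilot"]},
    {"role": "Data Engineer", "ai_tools": []},
    {"role": "Data Engineer", "ai_tools": []},
    {"role": "AI Engineer", "ai_tools": ["LangChain"]},
    {"role": "AI Engineer", "ai_tools": []},
]
shares = ai_mention_share_by_role(toy_postings)
```

The denominator is every posting for the role, not just the ones that mention AI, which is what makes the shares comparable across roles of very different sizes.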
Coding help or building with LLMs?
Among jobs that mention AI tools, there are two distinct use cases: productivity AI (Copilot, Cursor, OpenAI) that helps you work faster, and LLM integration (LangChain, LlamaIndex, Azure OpenAI) where you're building AI into the product. How does this split vary by role?
Note: this includes only the roles that mention any of the related tools or technologies.
AI and ML engineers are mostly building with LLM integration frameworks - that's the job. For most other roles, including data engineers, productivity AI dominates - they're using tools like Copilot and OpenAI to work faster, not building AI features.
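One way to compute a split like this is sketched below. It assumes each posting carries a `categories` set containing `"productivity"`, `"llm_integration"`, both, or neither. The field names and the choice to count a posting once per category are assumptions on my part, so the two shares for a role can add up to more than 100%.

```python
from collections import defaultdict


def category_split_by_role(postings):
    """Among postings that mention any AI tool, return per-role category shares.

    Assumes each posting is a dict with hypothetical keys:
      "role"       - the role family
      "categories" - set drawn from {"productivity", "llm_integration"}
    """
    counts = defaultdict(lambda: {"productivity": 0, "llm_integration": 0, "n": 0})
    for posting in postings:
        if not posting["categories"]:
            continue  # postings without any AI tool are left out of the split
        entry = counts[posting["role"]]
        entry["n"] += 1
        for category in posting["categories"]:
            entry[category] += 1
    return {
        role: {
            "productivity": entry["productivity"] / entry["n"],
            "llm_integration": entry["llm_integration"] / entry["n"],
        }
        for role, entry in counts.items()
    }


# Toy input for illustration only - not the data behind this post.
toy_split = category_split_by_role([
    {"role": "Data Engineer", "categories": {"productivity"}},
    {"role": "Data Engineer", "categories": {"productivity", "llm_integration"}},
    {"role": "Data Engineer", "categories": set()},
    {"role": "AI Engineer", "categories": {"llm_integration"}},
])
```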
Zooming out
Across all roles, the vast majority of tech job postings don't mention AI tools at all. My prediction is that most never will: I suspect the tools will become so prevalent so fast that they won't be listed as skill requirements, much like Jira or Calendar.
Which AI tools appear in DE jobs?
Of the DE postings that mention an AI tool, the tools fall into two groups:
- Productivity AI (red) - tools like GitHub Copilot, OpenAI, and Cursor that help you work faster.
- LLM integration (purple) - frameworks like LangChain and LlamaIndex where you're building AI into the product.
A note on classification: "OpenAI", "Anthropic", and "Gemini" in job ads are ambiguous - they could mean "use ChatGPT" (productivity) or "integrate the API" (building). I classify them as productivity because most job ads use these names to mean the chat product, not the API. Jobs that specifically need API integration tend to list frameworks (LangChain, LlamaIndex) or cloud wrappers (Azure OpenAI, Amazon Bedrock).
Who's actually listing AI tools? DE jobs that mention any AI tool list a median of 12 technologies in the posting, compared to 8 for those that don't. Two-thirds of AI-mentioning DE jobs list 10 or more technologies. These aren't focused AI roles - they're kitchen-sink postings that ask for everything. AI tools are being tacked onto already-long requirement lists, not driving new kinds of positions.
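The comparison above boils down to two statistics per group: the median number of technologies and the share of postings at or above a threshold. A minimal sketch, assuming each posting is a dict with a `mentions_ai` flag and a `technologies` list (hypothetical field names), could look like this:

```python
from statistics import median


def tech_count_summary(postings, threshold=10):
    """Summarize technology counts for postings with and without AI tools.

    Returns a dict keyed by True/False (mentions AI or not). Each value holds
    the median technology count and the share of postings listing at least
    `threshold` technologies, or None if the group is empty.
    """
    summary = {}
    for mentions_ai in (True, False):
        counts = [
            len(posting["technologies"])
            for posting in postings
            if posting["mentions_ai"] == mentions_ai
        ]
        if not counts:
            summary[mentions_ai] = None
            continue
        summary[mentions_ai] = {
            "median": median(counts),
            "share_at_threshold": sum(c >= threshold for c in counts) / len(counts),
        }
    return summary


# Toy input for illustration only - not the data behind this post.
toy_jobs = [
    {"mentions_ai": True, "technologies": ["t"] * 12},
    {"mentions_ai": True, "technologies": ["t"] * 10},
    {"mentions_ai": True, "technologies": ["t"] * 4},
    {"mentions_ai": False, "technologies": ["t"] * 8},
    {"mentions_ai": False, "technologies": ["t"] * 6},
]
toy_summary = tech_count_summary(toy_jobs)
```

The median is used rather than the mean because a handful of postings with very long technology lists would otherwise pull the average up.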
What this means
"Will AI take my job?" is a common question in data engineering communities right now.
The data says: not yet, and not the way you think.
- A small share is not a revolution. About 1 in 16 DE job postings mention any AI tool. The core DE stack - Airflow, Spark, dbt - each appears in far more postings. AI tools are a footnote.
- Most mentions are productivity tools. These are nice-to-haves, not core requirements - companies aren't restructuring hiring around them.
The real question isn't whether AI will replace data engineers. It's whether AI will make each data engineer more productive - and whether that means companies hire fewer of them. The job ads can't answer that yet.
Anthropic's own research on AI's labor market impact found that even among developers with access to AI tools, the effect on hiring is ambiguous - productivity gains don't straightforwardly translate to fewer jobs.
What the job ads can tell you: right now, employers aren't restructuring data engineering around AI. They're hiring the same roles, asking for the same tools, and occasionally mentioning Copilot in the nice-to-have section.
Methodology
Time window:
AI tool classification: "Productivity AI" = Cursor, Claude Code, GitHub Copilot, Windsurf, OpenAI Codex, Google Antigravity, OpenAI, Gemini, Anthropic, Mistral. "LLM integration" = LangChain, LlamaIndex, Azure OpenAI, Amazon Bedrock, CrewAI, Semantic Kernel, Haystack, Cohere, Ollama, vLLM. MLOps platforms (MLflow, SageMaker) and ML frameworks (PyTorch, TensorFlow) are excluded - they're infrastructure, not the "AI replacing engineers" signal this post investigates.
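As a rough illustration of the classification above, here is a minimal sketch. It assumes the tool names have already been extracted from each posting and are matched by exact, case-insensitive name. The extraction step itself is not shown, and the function name is my own.

```python
# Tool lists copied from the classification described above.
PRODUCTIVITY_AI = {
    "cursor", "claude code", "github copilot", "windsurf", "openai codex",
    "google antigravity", "openai", "gemini", "anthropic", "mistral",
}
LLM_INTEGRATION = {
    "langchain", "llamaindex", "azure openai", "amazon bedrock", "crewai",
    "semantic kernel", "haystack", "cohere", "ollama", "vllm",
}


def classify_tools(tools):
    """Return the set of AI categories mentioned in a posting's tool list.

    Names outside both lists (including MLflow, SageMaker, PyTorch, and
    TensorFlow) are ignored, so they never count as an AI tool mention.
    """
    categories = set()
    for tool in tools:
        name = tool.strip().lower()
        if name in PRODUCTIVITY_AI:
            categories.add("productivity")
        elif name in LLM_INTEGRATION:
            categories.add("llm_integration")
    return categories
```

Matching on whole tool names rather than substrings keeps "Azure OpenAI" in the LLM integration group instead of letting it fall through to "OpenAI".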
Role matching: 10 role families matched by title keywords against English-translated titles. Broader families group related titles: "Software Engineer" includes backend, frontend, full stack, and mobile; "DevOps / Platform" includes SRE, infrastructure, and cloud engineers; "Data / Business Analyst" includes BI, product, and marketing analysts.
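Title-keyword matching of this kind can be sketched as follows. The keyword lists and the matching order here are illustrative assumptions, not the exact rules used for this post, and only five of the ten families are shown.

```python
import re

# Hypothetical keyword lists. More specific families come first so that
# "Machine Learning Engineer" is not swallowed by a broader family.
ROLE_KEYWORDS = [
    ("AI Engineer", ["ai engineer"]),
    ("ML Engineer", ["ml engineer", "machine learning engineer"]),
    ("Data Engineer", ["data engineer"]),
    ("DevOps / Platform", ["devops", "sre", "site reliability",
                           "infrastructure engineer", "cloud engineer"]),
    ("Software Engineer", ["software engineer", "backend", "frontend",
                           "full stack", "mobile"]),
]


def match_role(title):
    """Return the first role family whose keyword appears in the title, else None.

    Keywords are matched on word boundaries so that short ones like "sre"
    or "ai engineer" do not fire inside longer words.
    """
    normalized = title.lower()
    for family, keywords in ROLE_KEYWORDS:
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", normalized):
                return family
    return None
```

For example, `match_role("Senior Data Engineer (m/f/d)")` returns `"Data Engineer"`, while a title with no matching keyword returns `None` and would be left out of the role comparison.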