My Honest Experience Using AI for Data Analytics and Coding
A data analyst shares real experiences using AI for data analytics, SQL optimization, and Python automation. Learn what works and where AI fails.
I once spent 14 hours staring at a messy CSV file, writing nested loops to extract poorly formatted dates. I thought I was being a rigorous analyst. I was just wasting time. Today, that same task takes me about 45 seconds using basic prompt techniques.

The shift from manual data manipulation to AI-assisted workflows completely changed how I approach data analytics. But it is not a magic wand. If you do not understand the underlying logic, these tools will confidently generate garbage.

I have spent the last year integrating these tools into my daily routine. Here is exactly how I use AI in my workflow, what actually works, and where it falls flat.
Prompt Engineering for Data Science involves structuring your requests with explicit schemas, constraints, and business logic to get usable code. It typically cuts script drafting time in half, though the results always require careful human review.
I used to think prompts were just asking questions. They are not. They are specifications. When you treat an AI like a search engine, you get generic, broken code. When you treat it like a junior developer who needs exact instructions, you get results. I learned this when transitioning from basic Excel sheets to more complex environments.
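To make that concrete, here is a minimal sketch of the kind of prompt-as-specification template I mean. The task, schema, and constraints below are invented for illustration, not a real production prompt:

```python
# A minimal sketch of a "prompt as specification" template.
# The table columns and rules are hypothetical examples.

def build_prompt(task: str, schema: dict, rules: list) -> str:
    """Assemble a prompt that reads like a spec, not a question."""
    schema_lines = "\n".join(f"- {col}: {dtype}" for col, dtype in schema.items())
    rule_lines = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        f"Task: {task}\n\n"
        f"Input schema:\n{schema_lines}\n\n"
        f"Constraints:\n{rule_lines}\n\n"
        "Return only runnable Python code."
    )

prompt = build_prompt(
    task="Parse the order_date column and aggregate revenue by month.",
    schema={"order_date": "string, mixed formats (MM/DD/YYYY and YYYY-MM-DD)",
            "revenue": "float, may contain nulls"},
    rules=["Do not drop rows with null revenue; impute with 0.0",
           "Output a table indexed by month"],
)
print(prompt)
```

The point is not the helper function; it is that the schema and the edge-case rules travel with every request instead of living in my head.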
Moving from Excel to Python automation requires shifting from visual cell manipulation to programmatic logic. AI bridges this gap by writing the initial boilerplate scripts, allowing analysts to focus on the logic rather than the syntax.
I had a weekly task of merging 6 different regional sales reports. Doing this manually took hours. I asked an AI to write a script. The first attempt failed because I did not specify the date formats. The second attempt, where I provided a sample row from each file, worked perfectly. It is all about the context you provide.
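A hedged sketch of what that second, working attempt looked like. The region names, columns, and date formats here are stand-ins, and the frames are built in memory instead of read from CSVs so the example is self-contained:

```python
# Sketch: merging regional sales reports whose date formats differ.
# In the real workflow these come from CSV files via pd.read_csv;
# two in-memory frames stand in for two regions here.
import pandas as pd

east = pd.DataFrame({"date": ["01/15/2024", "01/16/2024"], "revenue": [100.0, 150.0]})
west = pd.DataFrame({"date": ["2024-01-15", "2024-01-16"], "revenue": [200.0, 50.0]})

# The detail the first AI attempt missed: each file's exact date format.
east["date"] = pd.to_datetime(east["date"], format="%m/%d/%Y")
west["date"] = pd.to_datetime(west["date"], format="%Y-%m-%d")

combined = pd.concat(
    [east.assign(region="east"), west.assign(region="west")],
    ignore_index=True,
)
print(combined.shape)  # (4, 3)
```

Once the formats were explicit, the merge was trivial; the "hours of manual work" were always in reconciling dates, not in the concatenation itself.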
Automated data cleaning with AI relies on scripts that detect anomalies, handle missing values, and standardize formats. It works best when you explicitly define the rules for edge cases upfront.
You can ask an AI to clean your data. It will usually write a script using Pandas and NumPy libraries. But here is the catch. It might drop rows with null values when you actually needed to impute them. I learned this the hard way when I lost 18% of a client's dataset during a routine clean. Always enforce strict LLM-generated code verification before running anything on production data.
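Here is a minimal sketch of the rule I now spell out before any AI-generated cleaning script touches data. The column names are examples; the point is the difference between dropping and imputing:

```python
# The edge-case rule I now state upfront: impute numeric nulls,
# do not drop the rows. Column names are illustrative.
import pandas as pd

df = pd.DataFrame({"region": ["N", "S", "E", "W"],
                   "units": [10.0, None, 7.0, None]})

# What a naive AI script often does -- silently shrinks the dataset:
dropped = df.dropna(subset=["units"])

# What I actually wanted -- keep all rows, impute with the median:
imputed = df.assign(units=df["units"].fillna(df["units"].median()))

print(len(dropped), len(imputed))  # 2 4
```

Half the rows vanish in the first version. That is exactly how I lost 18% of that client's dataset.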
Converting Natural Language to SQL requires feeding the AI your exact table structures and relationship definitions. Without proper context, the generated queries often fail or return inaccurate metrics.
Writing SQL from scratch is tedious. Using Large Language Models (LLMs) to draft them is faster. But you have to provide the schema. I keep a text file with my most-used table schemas just to copy and paste into prompts.
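The schema file itself is nothing fancy. A sketch of what I paste in, with an invented two-table schema standing in for my real one:

```python
# Sketch of the schema snippet I keep on hand for NL-to-SQL prompts.
# Table and column names are illustrative, not a real production schema.
SCHEMA_CONTEXT = """Tables:
  orders(order_id INT PK, customer_id INT FK -> customers.customer_id,
         order_date DATE, total DECIMAL(10,2))
  customers(customer_id INT PK, region VARCHAR(20))
"""

question = "Total revenue by region for January 2024."
sql_prompt = f"{SCHEMA_CONTEXT}\nWrite a SQL query: {question}"
print(sql_prompt)
```

With the join path (`orders.customer_id -> customers.customer_id`) in the context, the model stops guessing at relationships and starts writing queries that actually run.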
Optimizing SQL queries with AI involves pasting your slow query and asking for execution plan improvements. The AI often suggests better indexing or rewriting subqueries as joins, which can significantly reduce runtimes.
I had a query taking 4 minutes to run. I fed it to an AI and asked for SQL query optimization. It rewrote my nested subqueries into temporary tables. The runtime dropped to 12 seconds. However, it sometimes suggests indexes that I do not have permissions to create, so you have to know your database constraints.
SQL window functions perform calculations across a set of table rows related to the current row without collapsing the result set. They are essential for running totals, moving averages, and ranking.
SQL window functions sound intimidating. They are not. Think of them as a way to peek at neighboring rows without collapsing your data. Run this query on the sample table below and watch what happens.
```sql
-- Sample data setup
CREATE TABLE sales_data (
    date DATE,
    sales_rep VARCHAR(50),
    revenue DECIMAL(10,2)
);

-- A few sample rows so the query below returns something
INSERT INTO sales_data (date, sales_rep, revenue) VALUES
    ('2024-01-01', 'Alice', 100.00),
    ('2024-01-02', 'Alice', 150.00),
    ('2024-01-01', 'Bob',   200.00),
    ('2024-01-02', 'Bob',   50.00);

-- Window function: a running total per rep that keeps every row
SELECT
    sales_rep,
    date,
    revenue,
    SUM(revenue) OVER (PARTITION BY sales_rep ORDER BY date) AS running_total
FROM sales_data;
```
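If you work in pandas rather than SQL, the same per-rep running total is a grouped cumulative sum. A sketch using the same column names as the sample table:

```python
# pandas equivalent of SUM(revenue) OVER (PARTITION BY sales_rep ORDER BY date)
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]),
    "sales_rep": ["Alice", "Alice", "Bob", "Bob"],
    "revenue": [100.0, 150.0, 200.0, 50.0],
})

# PARTITION BY -> groupby, ORDER BY -> sort, SUM ... OVER -> cumsum
sales = sales.sort_values(["sales_rep", "date"])
sales["running_total"] = sales.groupby("sales_rep")["revenue"].cumsum()
print(sales["running_total"].tolist())  # [100.0, 250.0, 200.0, 250.0]
```

Either way, every input row survives into the output, which is the whole trick.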
GitHub Copilot for VS Code acts as a real-time pair programmer that suggests code snippets based on your active file and comments. It is highly effective for boilerplate code but requires supervision for complex business logic.
I installed this tool a year ago. It reads my comments and writes the function. It is great for API integration tasks where the boilerplate is always the same. But it is not perfect.
| Tool Feature | Primary Use Case | Where It Fails |
|---|---|---|
| Inline Autocomplete | Repetitive formatting and loops | Complex statistical logic |
| Chat Interface | Explaining unfamiliar syntax | Understanding full project context |
| Test Generation | Creating basic unit tests | Testing edge cases in dirty data |
Using AI within Jupyter Notebooks allows analysts to generate visualization code and data transformations directly in their analysis environment. This integration keeps you in the flow without switching tabs.
When doing Exploratory Data Analysis (EDA), I live in notebooks. The inline suggestions speed up the boring parts, like setting up matplotlib parameters or importing standard libraries.
AI-assisted debugging involves pasting error tracebacks into a prompt to get plain-English explanations and suggested fixes. It is also useful for code refactoring to make scripts more readable and efficient.
When a script fails, I paste the traceback. The AI usually spots the typo. But when refactoring, it sometimes introduces new bugs. Always use version control with Git before letting AI rewrite your files. Otherwise, you are just accumulating technical debt.
AI accelerates visualization and modeling by generating the syntax for complex charts and machine learning pipelines. The analyst's job shifts from writing syntax to interpreting the outputs.
Making charts is easy. Making good charts is hard. AI helps with the former, but you are responsible for the latter.
Creating data visualizations with Seaborn from AI prompts lets you quickly generate complex statistical graphics. You simply describe the relationship you want to show, and the AI provides the styling code.
I hate memorizing plotting syntax. Now I just ask for a specific chart. "Create a violin plot showing sales distribution by region, using a muted color palette." It works perfectly 90% of the time.
AI helps structure Machine Learning workflows by drafting the steps for preprocessing, model selection, and evaluation. It provides a solid baseline that data scientists can then fine-tune.
For predictive modeling, AI gives me a starting point. It sets up the train-test split and the basic model. But I still have to tune the hyperparameters and validate the assumptions.
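As a sketch of that baseline scaffold, here is the logic of a reproducible train/test split plus a trivial mean-prediction baseline, using only the standard library. In practice I use scikit-learn's `train_test_split`; the toy data here is invented:

```python
# Hedged sketch of the baseline scaffold AI usually drafts for me:
# a seeded 80/20 split and a predict-the-mean baseline to beat.
import random

data = [(x, 2.0 * x + 1.0) for x in range(100)]  # toy (feature, target) pairs

rng = random.Random(42)          # fixed seed -> reproducible split
shuffled = data[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.8)
train, test = shuffled[:split], shuffled[split:]

# Baseline model: always predict the training-set mean target
baseline = sum(y for _, y in train) / len(train)
mae = sum(abs(y - baseline) for _, y in test) / len(test)
print(len(train), len(test))  # 80 20
```

The AI writes this scaffold in seconds. Deciding whether the split should be random at all (versus time-based, for sales data) is still my job.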
AI hallucinations in data occur when models confidently invent non-existent functions, fake statistics, or incorrect data relationships. Relying on these outputs without verification leads to fundamentally flawed analysis.
This is the biggest risk. AI will lie to you. It will make up a library function that does not exist just to satisfy your prompt. You must verify everything.
"Effective AI integration requires treating the model as a junior developer rather than an infallible expert." [1]
Synthetic data generation with AI can be useful for testing, but models often fail to capture the true statistical noise and edge cases of real-world data. Training models on this output can lead to poor real-world performance.
I tried using AI to create a dummy dataset to test a dashboard. The data looked perfect. Too perfect. It lacked the messy reality of actual user behavior, which meant my dashboard broke the moment real data hit it.
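One sanity check I now run on any synthetic dataset before trusting it: does it contain any outliers at all? A sketch with invented numbers, comparing a suspiciously tidy synthetic series against a messier one:

```python
# Sanity check: AI-generated synthetic data is often "too perfect" --
# no outliers, no junk values. Real user data always has some.
import statistics

synthetic = [100 + (i % 10) for i in range(100)]       # suspiciously tidy
real = synthetic[:95] + [0, 0, 900, -5, 100000]        # real data has junk

def outlier_rate(values, k=3.0):
    """Fraction of values more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mu) > k * sd for v in values) / len(values)

print(outlier_rate(synthetic), outlier_rate(real))
```

If the outlier rate of your test data is exactly zero, your dashboard has never met the inputs that will actually break it.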
Translating AI-generated analysis into actionable business insights requires deep domain knowledge and context that AI lacks. Analysts must connect the raw metrics to actual business decisions using clear data storytelling.
AI can crunch the numbers. It cannot tell the CEO what to do. You still need Business Intelligence (BI) tools to distribute the findings, and you still need domain judgment to know which metrics actually matter. Data storytelling is a human skill. The AI just builds the props for your presentation.
Here are some common questions I get from colleagues about integrating these tools.
Q: Do I still need to learn Python if AI can write it for me?
A: Yes. If you do not know Python, you cannot verify the code the AI generates. You need to be able to read and debug the output, otherwise you will introduce critical errors into your data pipelines.
Q: How much does GitHub Copilot typically cost?
A: It is reported to be around $10 per month for individual developers, though enterprise pricing varies. From my experience, the time saved on boilerplate code easily justifies the cost.
Q: Can I upload sensitive company data to ChatGPT?
A: No. You should never upload sensitive or Personally Identifiable Information (PII) to public LLMs. Always use anonymized data or rely on secure, enterprise-approved instances of these tools.
What has been your biggest frustration when using AI for data analysis? Test out the window function query above and see how it changes your reporting.
Michael Park
5-year data analyst with hands-on experience from Excel to Python and SQL.