My Honest Experience Using AI for Data Analytics and Coding

A data analyst shares real experiences using AI for data analytics, SQL optimization, and Python automation. Learn what works and where AI fails.

By Michael Park · 9 min read

I once spent 14 hours staring at a messy CSV file, writing nested loops to extract poorly formatted dates. I thought I was being a rigorous analyst. I was just wasting time. Today, that same task takes me about 45 seconds using basic prompt techniques. The shift from manual data manipulation to AI-assisted workflows completely changed how I approach data analytics. But it is not a magic wand. If you do not understand the underlying logic, these tools will confidently generate garbage. I have spent the last year integrating these tools into my daily routine. Here is exactly how I use AI in my workflow, what actually works, and where it falls flat.

The Reality of Prompt Engineering for Data Science

Prompt Engineering for Data Science involves structuring your requests with explicit schemas, constraints, and business logic to get usable code. It typically cuts script drafting time in half, though the results always require careful human review.

I used to think prompts were just asking questions. They are not. They are specifications. When you treat an AI like a search engine, you get generic, broken code. When you treat it like a junior developer who needs exact instructions, you get results. I learned this when transitioning from basic Excel sheets to more complex environments.

Moving from Excel to Python Automation

Moving from Excel to Python automation requires shifting from visual cell manipulation to programmatic logic. AI bridges this gap by writing the initial boilerplate scripts, allowing analysts to focus on the logic rather than the syntax.

I had a weekly task of merging 6 different regional sales reports. Doing this manually took hours. I asked an AI to write a script. The first attempt failed because I did not specify the date formats. The second attempt, where I provided a sample row from each file, worked perfectly. It is all about the context you provide.

  • Always provide sample data structures (headers and one row of data).
  • Specify the exact libraries you want to use.
  • State what should happen if a file is missing or corrupted.
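Under those constraints, the working script looked roughly like this. A minimal sketch: the file pattern, column names, and date format (`sales_*.csv`, `date`, `%Y-%m-%d`) are hypothetical stand-ins for my real regional reports, and the demo writes its own sample files to a temp directory.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_regional_reports(pattern):
    """Merge regional sales CSVs, skipping files that fail to load."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        try:
            df = pd.read_csv(path)
            # Explicit date parsing avoids the silent format mismatch
            # that broke the first AI-generated draft.
            df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
            frames.append(df)
        except (pd.errors.ParserError, ValueError) as exc:
            print(f"Skipping {path}: {exc}")
    if not frames:
        raise RuntimeError("No readable report files found")
    return pd.concat(frames, ignore_index=True)


# Demo with two hypothetical regional files written to a temp directory.
tmp = tempfile.mkdtemp()
for name, body in [
    ("sales_east.csv", "date,revenue\n2024-01-01,100\n"),
    ("sales_west.csv", "date,revenue\n2024-01-02,200\n"),
]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(body)

merged = merge_regional_reports(os.path.join(tmp, "sales_*.csv"))
```

Note how every bullet above maps to a line of code: the sample structure fixes the column names, the library is pinned to pandas, and a bad file gets skipped instead of crashing the run.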

Handling Automated Data Cleaning

Automated data cleaning with AI relies on scripts that detect anomalies, handle missing values, and standardize formats. It works best when you explicitly define the rules for edge cases upfront.

You can ask an AI to clean your data. It will usually write a script using the Pandas and NumPy libraries. But here is the catch. It might drop rows with null values when you actually needed to impute them. I learned this the hard way when I lost 18% of a client's dataset during a routine clean. Always verify LLM-generated code carefully before running anything on production data.
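Here is the kind of explicit rule set I now write out (or demand from the AI) before any clean runs. A sketch with hypothetical column names (`revenue`, `region`); the point is that imputing versus dropping is a decision you state up front, not one the model guesses:

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Apply explicit cleaning rules instead of letting the AI guess."""
    out = df.copy()
    # Rule 1: impute missing revenue with the column median -- do NOT drop rows.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # Rule 2: standardize region labels before any grouping.
    out["region"] = out["region"].str.strip().str.title()
    # Rule 3: only drop rows that are truly unusable (no region at all).
    out = out.dropna(subset=["region"])
    return out
```

Paste rules like these into the prompt verbatim and the generated script has far fewer surprises.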

Writing Queries: Natural Language to SQL

Converting Natural Language to SQL requires feeding the AI your exact table structures and relationship definitions. Without proper context, the generated queries often fail or return inaccurate metrics.

Writing SQL queries from scratch is tedious. Using Large Language Models (LLMs) to draft them is faster. But you have to provide the schema. I keep a text file with my most-used table schemas just to copy and paste into prompts.
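A minimal sketch of how I wrap a question with that schema file. The table definitions and the PostgreSQL dialect here are hypothetical examples, not a real production schema:

```python
# Hypothetical schema snippet -- in practice this is pasted from a saved text file.
SCHEMA = """
orders(order_id INT PK, customer_id INT FK->customers, order_date DATE, total DECIMAL)
customers(customer_id INT PK, region VARCHAR, signup_date DATE)
"""


def build_sql_prompt(question: str, schema: str = SCHEMA) -> str:
    """Wrap a plain-English question with the schema context the model needs."""
    return (
        "You are writing PostgreSQL. Use ONLY these tables and columns:\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "Return a single query, no explanation."
    )


prompt = build_sql_prompt("Monthly revenue by region for 2024")
```

Constraining the model to the listed tables is what stops it from inventing columns that do not exist.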

SQL Query Optimization in Practice

Optimizing SQL queries with AI involves pasting your slow query and asking for execution plan improvements. The AI often suggests better indexing or rewriting subqueries as joins, which can significantly reduce runtimes.

I had a query taking 4 minutes to run. I fed it to an AI and asked for SQL query optimization. It rewrote my nested subqueries into temporary tables. The runtime dropped to 12 seconds. However, it sometimes suggests indexes that I do not have permissions to create, so you have to know your database constraints.
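My production rewrite used temporary tables, but the underlying idea, aggregate once instead of once per row, is easy to demonstrate. A sketch using SQLite and invented order data, comparing the slow correlated-subquery pattern against the pre-aggregate-and-join rewrite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES (1,1,100),(2,1,50),(3,2,200),(4,2,25),(5,3,75);
""")

# Slow pattern: the correlated subquery is re-evaluated for every row.
slow = conn.execute("""
    SELECT id, total FROM orders o
    WHERE total > (SELECT AVG(total) FROM orders i
                   WHERE i.customer_id = o.customer_id)
""").fetchall()

# Faster pattern: aggregate once, then join against the result.
fast = conn.execute("""
    SELECT o.id, o.total
    FROM orders o
    JOIN (SELECT customer_id, AVG(total) AS avg_total
          FROM orders GROUP BY customer_id) a
      ON a.customer_id = o.customer_id
    WHERE o.total > a.avg_total
""").fetchall()
```

Both versions return identical rows; only the amount of work the engine does differs, which is exactly what the AI's rewrite exploited.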

The Truth About SQL Window Functions

SQL window functions perform calculations across a set of table rows related to the current row without collapsing the result set. They are essential for running totals, moving averages, and ranking.

SQL window functions sound intimidating. They are not. Think of them as a way to peek at neighboring rows without collapsing your data. Run this query on the sample table below and watch what happens.

-- Sample data setup
CREATE TABLE sales_data (
  date DATE,
  sales_rep VARCHAR(50),
  revenue DECIMAL(10,2)
);

INSERT INTO sales_data VALUES
  ('2024-01-01', 'Alice', 100.00),
  ('2024-01-02', 'Alice', 50.00),
  ('2024-01-01', 'Bob', 200.00);

-- Using a window function for a per-rep running total
SELECT
  sales_rep,
  revenue,
  SUM(revenue) OVER (PARTITION BY sales_rep ORDER BY date) AS running_total
FROM sales_data;
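If you want to try this without a database server, the same running-total query works in Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_data (date DATE, sales_rep VARCHAR(50), revenue DECIMAL(10,2));
    INSERT INTO sales_data VALUES
        ('2024-01-01','Alice',100),('2024-01-02','Alice',50),
        ('2024-01-01','Bob',200),('2024-01-03','Bob',25);
""")

# The running total accumulates within each rep's partition, ordered by date,
# without collapsing the rows the way GROUP BY would.
rows = conn.execute("""
    SELECT sales_rep, revenue,
           SUM(revenue) OVER (PARTITION BY sales_rep ORDER BY date) AS running_total
    FROM sales_data
    ORDER BY sales_rep, date
""").fetchall()

for row in rows:
    print(row)
```

Notice that every input row is still present in the output; the window function only adds a column.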

Integrating GitHub Copilot for VS Code

GitHub Copilot for VS Code acts as a real-time pair programmer that suggests code snippets based on your active file and comments. It is highly effective for boilerplate code but requires supervision for complex business logic.

I installed this tool a year ago. It reads my comments and writes the function. It is great for API integration tasks where the boilerplate is always the same. But it is not perfect.

Tool Feature        | Primary Use Case                | Where It Fails
--------------------|---------------------------------|------------------------------------
Inline Autocomplete | Repetitive formatting and loops | Complex statistical logic
Chat Interface      | Explaining unfamiliar syntax    | Understanding full project context
Test Generation     | Creating basic unit tests       | Testing edge cases in dirty data

Jupyter Notebooks Integration

Using AI within Jupyter Notebooks allows analysts to generate visualization code and data transformations directly in their analysis environment. This integration keeps you in the flow without switching tabs.

When doing Exploratory Data Analysis (EDA), I live in notebooks. The inline suggestions speed up the boring parts, like setting up matplotlib parameters or importing standard libraries.
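Those "boring parts" really are just a few lines. A sketch of the boilerplate these suggestions typically fill in; the Agg backend and the specific rcParams values are my own choices, not a standard:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt

# The kind of setup boilerplate inline suggestions complete for me:
plt.rcParams.update({
    "figure.figsize": (10, 5),
    "axes.grid": True,
    "axes.titlesize": 14,
})

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 30, 20])
ax.set_title("Quick EDA sketch")
```

Letting the tool type this out is low risk precisely because it is the same every time.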

AI-Assisted Debugging and Code Refactoring

AI-assisted debugging involves pasting error tracebacks into a prompt to get plain-English explanations and suggested fixes. It is also useful for code refactoring to make scripts more readable and efficient.

When a script fails, I paste the traceback. The AI usually spots the typo. But when refactoring, it sometimes introduces new bugs. Always use version control with Git before letting AI rewrite your files. Otherwise, you are just accumulating technical debt.

Visualization and Predictive Modeling

AI accelerates visualization and modeling by generating the syntax for complex charts and machine learning pipelines. The analyst's job shifts from writing syntax to interpreting the outputs.

Making charts is easy. Making good charts is hard. AI helps with the former, but you are responsible for the latter.

Data Visualization with Seaborn

Creating data visualizations with Seaborn via AI prompts lets you quickly generate complex statistical graphics. You simply describe the relationship you want to show, and the AI provides the plotting and styling code.

I hate memorizing plotting syntax. Now I just ask for a specific chart. "Create a violin plot showing sales distribution by region, using a muted color palette." It works perfectly 90% of the time.
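That prompt translates into only a few lines of Seaborn. A sketch with randomly generated stand-in data instead of my real sales table:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import numpy as np
import pandas as pd
import seaborn as sns

# Invented sales data standing in for the real regional table.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "region": rng.choice(["East", "West", "North"], size=300),
    "sales": rng.normal(1000, 250, size=300),
})

# The chart I would previously have described in plain English:
sns.set_theme(palette="muted")
ax = sns.violinplot(data=df, x="region", y="sales")
ax.set_title("Sales distribution by region")
```

The remaining 10% of the time is usually a palette or axis detail, which is quick to fix by hand.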

Building Machine Learning Workflows

AI helps structure Machine Learning workflows by drafting the steps for preprocessing, model selection, and evaluation. It provides a solid baseline that data scientists can then fine-tune.

For predictive modeling, AI gives me a starting point. It sets up the train-test split and the basic model. But I still have to tune the hyperparameters and validate the assumptions.
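The baseline scaffold usually looks something like this. A sketch using scikit-learn with synthetic data standing in for a real cleaned dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in practice this comes from your cleaned DataFrame.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# The split-fit-score scaffold AI drafts for me:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Everything after this point, hyperparameter tuning, validating assumptions, choosing a better metric than accuracy, is still on you.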

The Dangers: AI Hallucinations in Data

AI hallucinations in data occur when models confidently invent non-existent functions, fake statistics, or incorrect data relationships. Relying on these outputs without verification leads to fundamentally flawed analysis.

This is the biggest risk. AI will lie to you. It will make up a library function that does not exist just to satisfy your prompt. You must verify everything.
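One cheap verification habit: before trusting an AI-suggested function, check that it actually exists. A small helper I use for this; it is a sketch, not a complete safety net, since it cannot catch wrong arguments or wrong semantics:

```python
import importlib


def verify_callable(module_name: str, func_name: str) -> bool:
    """Check that an AI-suggested function actually exists before trusting it."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Hallucinated or uninstalled module.
        return False
    func = getattr(module, func_name, None)
    return callable(func)
```

A real function like `json.loads` passes; a plausible-sounding invention does not, and that one-line check is much cheaper than a failed production run.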

"Effective AI integration requires treating the model as a junior developer rather than an infallible expert." [1]

Why Synthetic Data Generation is Risky

Synthetic data generation with AI can be useful for testing, but models often fail to capture the true statistical noise and edge cases of real-world data. Training models on this data can lead to poor real-world performance.

I tried using AI to create a dummy dataset to test a dashboard. The data looked perfect. Too perfect. It lacked the messy reality of actual user behavior, which meant my dashboard broke the moment real data hit it.
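If you do generate synthetic data anyway, inject the mess you expect up front. A sketch using NumPy and pandas; the skewed spend distribution, dirty region labels, and 5% missing values are my assumptions about what "real" data looks like, not derived from any actual dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 1000

# Too-perfect synthetic data breaks dashboards the moment real data arrives,
# so deliberately add nulls, skew, and inconsistent labels.
df = pd.DataFrame({
    "user_id": np.arange(n),
    "spend": rng.lognormal(mean=3, sigma=1, size=n),     # skewed, like real spend
    "region": rng.choice(["east", "East ", "EAST", "west"], size=n),  # dirty labels
})
df.loc[rng.choice(n, size=50, replace=False), "spend"] = np.nan  # 5% missing
```

A dashboard that survives this dataset has a much better chance of surviving actual user behavior.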

Bridging the Gap to Actionable Business Insights

Translating AI-generated analysis into actionable business insights requires deep domain knowledge and context that AI lacks. Analysts must connect the raw metrics to actual business decisions using clear data storytelling.

AI can crunch the numbers. It cannot tell the CEO what to do. You still need Business Intelligence (BI) tools to distribute the findings. You still need business intelligence to know which metrics actually matter. Data storytelling is a human skill. The AI just builds the props for your presentation.

Frequently Asked Questions

Here are some common questions I get from colleagues about integrating these tools.

Q: Do I still need to learn Python if AI can write it for me?

A: Yes. If you do not know Python, you cannot verify the code the AI generates. You need to be able to read and debug the output, otherwise you will introduce critical errors into your data pipelines.

Q: How much does GitHub Copilot typically cost?

A: It is reported to be around $10 per month for individual developers, though enterprise pricing varies. From my experience, the time saved on boilerplate code easily justifies the cost.

Q: Can I upload sensitive company data to ChatGPT?

A: No. You should never upload sensitive or Personally Identifiable Information (PII) to public LLMs. Always use anonymized data or rely on secure, enterprise-approved instances of these tools.

What has been your biggest frustration when using AI for data analysis? Test out the window function query above and see how it changes your reporting.

Sources

  1. Master AI-Assisted Coding with ChatGPT - Udemy

data analytics · prompt engineering · python automation · sql optimization · github copilot · data science

Michael Park

5-year data analyst with hands-on experience from Excel to Python and SQL.
