Data science through natural language with ChatGPT's Code Interpreter - PubMed

Sangzin Ahn. Transl Clin Pharmacol. 2024 Jun.

Abstract

Large language models (LLMs) have emerged as a powerful tool for biomedical researchers, demonstrating remarkable capabilities in understanding and generating human-like text. ChatGPT with its Code Interpreter functionality, an LLM connected with the ability to write and execute code, streamlines data analysis workflows by enabling natural language interactions. Using materials from a previously published tutorial, similar analyses can be performed through conversational interactions with the chatbot, covering data loading and exploration, model development and comparison, permutation feature importance, partial dependence plots, and additional analyses and recommendations. The findings highlight the significant potential of LLMs in assisting researchers with data analysis tasks, allowing them to focus on higher-level aspects of their work. However, there are limitations and potential concerns associated with the use of LLMs, such as the importance of critical thinking, privacy, security, and equitable access to these tools. As LLMs continue to improve and integrate with available tools, data science may experience a transformation similar to the shift from manual to automatic transmission in driving. The advancements in LLMs call for considering the future directions of data science and its education, ensuring that the benefits of these powerful tools are utilized with proper human supervision and responsibility.

Keywords: Artificial Intelligence; Data Analysis; Data Science; Machine Learning; Natural Language Processing.

Copyright © 2024 Translational and Clinical Pharmacology.

Conflict of interest statement

Conflict of Interest:
- Authors: Nothing to declare
- Reviewers: Nothing to declare
- Editors: Nothing to declare

Figures

Figure 1. Demonstration of ChatGPT Code Interpreter functionality. (A) An example prompt and output for rolling a die 1,000 times and generating a bar plot. (B) The Data Analyst GPT, a specialized chatbot for data analysis tasks, found in the GPTs by ChatGPT section.
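
The kind of code such a prompt might lead to can be sketched as follows. This is a minimal illustration, not the code shown in the figure: the function names `roll_die` and `text_bar_plot`, the fixed seed, and the text-based bars (used here in place of a graphical bar plot so that only the Python standard library is needed) are all assumptions.

```python
import random
from collections import Counter


def roll_die(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and count each face."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    counts = Counter(rolls)
    # Report every face, including any that never came up.
    return {face: counts.get(face, 0) for face in range(1, 7)}


def text_bar_plot(counts, width=40):
    """Render the counts as horizontal text bars scaled to the largest count."""
    peak = max(counts.values())
    lines = []
    for face, count in counts.items():
        bar = "#" * round(width * count / peak)
        lines.append(f"{face} | {bar} {count}")
    return "\n".join(lines)


counts = roll_die(1000)
print(text_bar_plot(counts))
```

With a fair die, each face should appear roughly 1,000 / 6 ≈ 167 times, so the six bars should be of similar length.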

Figure 2. Utilization of Code Interpreter to explore a dataset and generate a research proposal. (A) Prompting ChatGPT to explore an uploaded dataset. ChatGPT uses Code Interpreter to load a dataset, display the first few rows, and understand its structure and contents. (B) A research proposal generated by ChatGPT, including the title, background, objectives, methodology, expected outcomes, and significance, based on the dataset and provided description.

Figure 3. Model development, comparison, and visualization using Code Interpreter. (A) ChatGPT was prompted to train and compare the performance of Linear Regression, Neural Network, and Random Forest models. Code Interpreter calculated root mean square error (RMSE) and R² scores as evaluation metrics. (B) Visualization of the three predictive models using scatterplots, displaying the predicted versus actual warfarin dose values. (C) Adjustments to the scatterplot settings and a request for a figure legend.
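
The two evaluation metrics named in this caption can be written out directly. The sketch below uses the standard definitions of RMSE and R²; the data values and the model names `model_a` and `model_b` are made up for illustration and do not come from the article or its dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and predictions from two hypothetical models.
actual = [2.0, 4.0, 6.0, 8.0]
predictions = {
    "model_a": [2.5, 3.5, 6.5, 7.5],  # close to the observed values
    "model_b": [5.0, 5.0, 5.0, 5.0],  # always predicts the mean
}

for name, predicted in predictions.items():
    print(name, round(rmse(actual, predicted), 3), round(r2_score(actual, predicted), 3))
```

A lower RMSE and a higher R² indicate a better fit. A model that always predicts the mean of the observed values scores an R² of 0, which gives a baseline for comparing the models.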

Figure 4. Permutation feature importance analysis using Code Interpreter. (A) User request for visualizing the relationship between features and the dependent variable, and identifying important features for each model. ChatGPT performs permutation feature importance and provides code for aggregating and visualizing the results. (B) User-specified adjustments to the code, requested in natural language.
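
The idea behind permutation feature importance can be sketched in a few lines: shuffle one feature column at a time and measure how much the model's error increases. The sketch below is a standard-library illustration under stated assumptions, not the code from the figure: the `toy_model` function, the synthetic data, and the choice of mean squared error as the score are all hypothetical.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Mean increase in error when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: depends on feature 0 only, ignores feature 1."""
    return 3.0 * row[0] + 0.0 * row[1]


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets)
print(importances)
```

Shuffling feature 0 degrades the predictions, so its importance is positive, while shuffling the ignored feature 1 leaves the error unchanged. Repeating the shuffle several times and averaging reduces the noise from any single permutation.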

Figure 5. Partial dependence plots generated using Code Interpreter. (A) User request for partial dependence plots for each model. ChatGPT generates plots for the top 3 features across each model, illustrating how the predicted warfarin dose varies with changes in a single feature while holding all other features constant at their average values. (B) User-specified adjustments to the code, requested in natural language.
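
The computation described in this caption can be sketched as follows, taking the description literally: one feature is swept over a grid while every other feature is fixed at its mean. The function name `partial_dependence_at_means`, the `toy_model`, and the data are hypothetical. Note that library implementations such as scikit-learn's typically average the predictions over the observed values of the other features rather than fixing them at their means; the two approaches agree for models that are linear in those features but can differ otherwise.

```python
def partial_dependence_at_means(model, rows, feature_index, grid):
    """Predictions as one feature varies over grid, with others held at their means."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    curve = []
    for value in grid:
        point = list(means)           # start from the average observation
        point[feature_index] = value  # override only the feature of interest
        curve.append(model(point))
    return curve


def toy_model(row):
    """Hypothetical fitted model, used only for illustration."""
    return 2.0 * row[0] + row[1] ** 2


# Hypothetical data: both features have a mean of 2.0.
rows = [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]
grid = [0.0, 1.0, 2.0]

curve = partial_dependence_at_means(toy_model, rows, 0, grid)
print(curve)
```

Plotting `curve` against `grid` gives the partial dependence line for feature 0. Here it rises by 2.0 per unit, matching the coefficient on that feature in the toy model.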

Figure 6. Additional analysis recommendations and creative analysis examples using Code Interpreter. (A) User request for recommended analyses and ChatGPT’s suggestions. (B) Visualizations produced in response to a user prompt asking for the most creative analysis the chatbot can perform.

References

    1. Barone L, Williams J, Micklos D. Unmet needs for analyzing biological big data: a survey of 704 NSF principal investigators. PLoS Comput Biol. 2017;13:e1005755.
    2. Kocoń J, Cichecki I, Kaszyca O, Kochanek M, Szydło D, Baran J, et al. ChatGPT: jack of all trades, master of none. Inf Fusion. 2023;99:101861.
    3. Nordling L. How ChatGPT is transforming the postdoc experience. Nature. 2023;622:655–657.
    4. Zheng T, Zhang G, Shen T, Liu X, Lin BY, Fu J, et al. OpenCodeInterpreter: integrating code generation with execution and refinement. arXiv. 2024.
    5. Ahn S. Building and analyzing machine learning-based warfarin dose prediction models using scikit-learn. Transl Clin Pharmacol. 2022;30:172–181.