Have students install Positron before class
Problems with LLMs
- There are a variety of meaningful ethical concerns about using LLMs
- The use a lot of energy to train and run and therefore put a lot of CO2 in the atmosphere
- They use millions of peoples work without credit or payment, arguably in violation of copyright and licenses
- Writing code with them when lacking the background to evaluate that code can also be dangerous
- They are good at giving you an answer, even if it’s the wrong one & code that is wrong but runs is the scariest kind of code in science
- Bad code can accidentally destroy your work
- Malicious hackers are finding ways to get models to generate code that lead to compromised computers
- Cost
- Start out with generous free plans
- Most companies already increasing prices (and still losing money)
- Expect costs to continue to rise
- Privacy/confidentiality
- Reduced learning making it difficult to go from beginner to expert
- But they can be useful tools and we’ll spend today exploring how
What are you doing now?
- First - How are you using LLMs for coding now?
Chat
- UF provides access to a number of different model’s chat systems for free
- UF Navigator
- Navigate to: https://chat.ai.it.ufl.edu/
-
Sign in
- So, many of you have already used chat interfaces
- “How do you drop NA’s from a table in R”
- Copy the resulting code into RStudio
- Give it a quick read for anything bad
- Run it
Improving output from LLMs
- Provide details (value of knowing things)
- Ask model to write code not return results
- Provide model with context
- Better prompts
- Attaching files
- Including relevant urls
- If the resulting code doesn’t run, tell model what happened and ask for fixes
- This set of practices is so common they are now integrated into many IDEs
Assistants
- “AI Coding Assistants” provide a combination of chat, autocomplete, and local code execution
Activating the assistant in Positron
- Click on ⚙️
- Choose
Settings - Type “Assistant”
- Scroll down and select
Enable - Restart Positron
Welcome->New Folder->R Project
Integrated Chat
- Click on the 🤖 (bottom of sidebar)
Add a Chat Provider->GitHub Copilot->Sign in- On
Authorize your devicepage paste code ->Continue Authorize GitHub Copilot Plugin->Close- Return to Positron
- Type “How do I remove rows with na using dplyr”
- Show at top of code block
- From the top of code blocks you can
- Run code in the console
- Move code to the editor either in the current file or a new one
- Copy it
- Can then keep or undo changes
- More useful if we give it context
library(dplyr)
library(readr)
surveys <- read_csv("https://ndownloader.figshare.com/files/2292172")
- Save file
- Show that file is in the context in chat
- “Add a dplyr pipeline that removes rows with NA in the weight column from the surveys table”
- We could copy this code over, or we can change the settings for the assistant to let it edit our files
- Switch from “Ask” to “Edit”
- Rerun query
- Read and accept the edits
Inline assistant
- In the text editor
Ctrl-i - “Only keep the year, month, day, and weight columns”
- Accept changes
Autocomplete
- Hit enter after the last line
- Look at the autocomplete
- Interpret output
- We typically don’t want an LLM guessing at what analysis we want
- Start it with a comment
# calculate the average weight and number of individuals in each year in each month
Debugging
- Let autocomplete do as much of this as possible
# Write a dplyr pipeline that returns only species starting with the letter "D" and
# where weights are greater than 46 then calculate the average weight of each species
- Run the pipeline and debug. If the model doesn’t error introduce an error
- Highlight code and select
Review - There will soon be Fix and Explain options that pop up in the error message itself to further simplify this process
Vibe coding & Agent-based approaches
- Who’s heard of vibe coding before?
- Instead of writing code yourself with assitance from an LLM just have the LLM write a full first draft
- Provide feedback to the LLM to make improvements to the code
-
Anyone have experience with this yet?
- Change to
Agentmodel - This will allow the model to actually run code for you
- By default it will write the code, show it to you, and ask you if you want to run it
-
Be careful, I relatively recently asked a model to perform a git operation I didn’t remember how to do and the resulting code would have deleted by work from the last hour
- “Using R create species distribution models for Great Egrets, White Ibis, and Roseatte Spoonbills. Use these models to make predictions for the distribution of those species in 26 years. Create a website displaying the current model predictions and the future predictions.”
- Explore the code, emphasize need to read it
- “Download the relevant data from gbif, update the code to use that data, and check that it works”