UPDATE: Collaborative Git Workflow

Data science
R
GitHub
Author

Kyle Grealis

Published

April 17, 2025

Using a collaborative git workflow, part 2

Welcome to my blog post about effective team collaboration for R package development! As someone who’s spent time developing R packages for personal use & in academic settings, I’ve learned that a clear Git workflow is essential for smooth collaboration. Having this understanding is just as advantageous for team development as it is for working by yourself.

This guide outlines how our team manages development for our R packages, with multiple contributors working in parallel while maintaining a stable codebase. We use a branching strategy that keeps our CRAN releases pristine while allowing active development to continue.

Our team follows this workflow with a main branch containing only stable CRAN-approved versions, a dev branch serving as the base for future improvements, and feature branches for individual enhancements. This approach has proven effective for small academic teams (3-4 people) collaborating on R packages for machine learning and statistics.

I will be referencing my preferred method of using bash commands used in the terminal, though you can incorporate this into your IDE-specific workflow too. I have also included some RStudio guidelines along the way.

This is an update to last year’s post: Quick git for beginners… yeah, that’s me!. 11 months later and I still find plenty of valuable tricks along the way.

Terms to understand:

local: The project version that is on your working machine.
origin: The project version on GitHub, for example.

Our Branch Structure

  • main: Contains only stable, CRAN-approved versions
  • dev: Our primary development branch where we integrate completed features
  • Feature branches: Individual branches created from dev for specific features/improvements

Why We Use Merge vs. Rebase

We recommend using merge rather than rebase for our academic team because:

  1. Safer for collaboration: Merge preserves the exact history of what happened and when
  2. More straightforward conflict handling: With merge, you resolve conflicts just once at the point of integration, while rebase might require resolving the same conflict multiple times as it replays each of your commits
  3. Easier to understand: The merge commit clearly shows when code from dev was integrated
  4. Less risky: Rebase rewrites commit history, which can cause issues when multiple people work on the same branch

Git Commands Explained

Command Plain Language Explanation
git fetch origin “Check what changes have been made on GitHub since I last synced. This updates my local copy of what exists on GitHub, but doesn’t actually change my working files yet.”
git merge origin/dev “Take all the new commits that exist in the dev branch and add them to my current branch, creating a new ‘merge commit’ that joins the two branches’ histories.”
git pull Shortcut that combines fetch + merge for your current branch. Equivalent to git pull origin <current-branch>
git pull origin dev “Fetch from GitHub and merge the remote dev branch into my current branch”

Understanding Fetch vs. Pull

  • git fetch only downloads new data from GitHub but doesn’t integrate it into your working files. It’s like checking what’s new without actually applying those changes.

    Benefits of fetch before merge:

    • You can examine what changed before deciding to merge
    • You can prepare for potential conflicts
    • You can run git diff origin/dev to see exactly what will change
    • It’s safer when you’re in the middle of complex work and want to know what’s coming
  • git pull immediately fetches AND merges in one step. It’s convenient but gives you less control.

For an academic team, the fetch then merge approach gives you time to prepare for integration and is less likely to introduce unexpected conflicts in the middle of your work.

Resolving Conflicts

Conflicts occur when the same part of a file has been modified differently in both branches being merged. Here are recommended ways to resolve them:

Using GitHub (Web Interface)

  1. When a pull request has conflicts, GitHub shows a “Resolve conflicts” button
  2. This opens a text editor where you can edit the conflicted files
  3. Remove the conflict markers (<<<<<<<, =======, >>>>>>>) and keep the code you want
  4. Click “Mark as resolved” for each file, then “Commit merge”

Using RStudio IDE

  1. When conflicts occur, files with conflicts will be marked in the Git pane
  2. Click on the file to open the diff view
  3. For each conflict section, choose “Use me” (your changes) or “Use them” (incoming changes), or edit the code manually
  4. Save the file, then stage it (check the box in the Git pane)
  5. Complete the merge by committing

Using Command Line

# After git merge shows conflicts, you'll see a message identifying which files have conflicts
# You MUST open these files in your preferred text editor to see and resolve the conflicts
# The conflicts are marked with <<<<<<<, =======, and >>>>>>> markers

# For example, open a conflicted file:
open conflicted_file.R

# After editing and resolving all conflicts in all files, mark them as resolved:
git add <resolved-file>

# Once all conflicts are resolved and added:
git commit -m "Merge dev into feature-branch, resolving conflicts"

Common Scenarios

Scenario: dev has changed significantly while you were working

git switch your-feature-branch
git fetch origin
git merge origin/dev
# Resolve conflicts if any
# Run tests to make sure everything still works

Scenario: Multiple team members working on the same feature

# Before starting work each day
git switch feature-branch
git pull origin feature-branch

# After finishing for the day
git push origin feature-branch

Scenario: Need to temporarily switch to another task

# Save your current work without committing
git stash

# Switch to another branch
git switch other-branch

# Do your work, commit, etc.
# ...

# Go back to your original branch
git switch feature-branch

# Restore your work
git stash pop

Scenario: Two people modified the same file in the same branch

This is a common situation, especially when team communication isn’t perfect:

  1. Joe and Maria are both working on the same feature branch
  2. Joe modifies the plot_results.R file, commits, and pushes to GitHub
  3. Meanwhile, Maria has also modified plot_results.R locally (different changes)
  4. When Maria tries to push her changes, she gets an error:
! [rejected] feature-branch -> feature-branch (fetch first)
error: failed to push some refs

What happened? GitHub rejected Maria’s push because her local branch has diverged from the remote branch. She needs to integrate Joe’s changes before she can push her own.

How to resolve this:

# First, get Joe's changes
git pull origin feature-branch

# Git will attempt to auto-merge the changes
# If there are conflicts, Git will tell you which files are affected

# Open each conflicted file in your editor
# You'll see conflict markers like:
# <<<<<<< HEAD (your changes)
# function(x) { return(x^2) }
# =======
# function(x) { return(x^2 + 1) }
# >>>>>>> (Joe's changes)

# Edit the file to resolve conflicts
# Remove the conflict markers and keep the correct code

# After editing all conflicts:
git add plot_results.R
git commit -m "Merge Joe's changes with mine"
git push origin feature-branch
Helpful tip:

This situation highlights why it’s crucial to communicate with teammates about who’s working on what. Consider using GitHub issues to track who’s assigned to each task, or at minimum, send a quick message to teammates when you start working on a file that others might also be editing.

Best Practices for R Package Development

  1. Always update roxygen documentation when modifying functions
  2. Run tests after merging in changes from dev
  3. Use meaningful commit messages that explain why a change was made
  4. Consider using GitHub Issues to track features and bugs
  5. Review each other’s pull requests before merging to dev
  6. When updating a function, also update the corresponding unit tests
  7. Document all changes in NEWS.md

Common Pitfalls to Avoid

  1. Merging without testing: Always test after merging to ensure your code still works
  2. Long-lived or stale branches: Try to keep feature branches short-lived (1-2 weeks max)
  3. Infrequent merges from dev: Update regularly to avoid massive conflicts later
  4. Pushing broken code: Always run devtools::check() before pushing to shared branches
  5. Ambiguous commit messages: Use clear messages that explain the “why” not just the “what”

Share your insights at kylegrealis@icloud.com. Together, we can make our R projects more robust, reproducible, and ready for collaboration!

Happy coding!

~Kyle


Additional Resources