Git commands I run before reading any code
Informative discussion showcasing modern tools like Jujutsu for analyzing repository history, while highlighting problems with commit message quality.
--- Comments ---
- pzmarzly: Jujutsu equivalents, if anyone is curious:<p>What Changes the Most<p><pre><code> jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
-T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
| sort | uniq -c | sort -nr | head -20
</code></pre>
Who Built This<p><pre><code> jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
-T 'self.author().name() ++ "\n"' \
| sort | uniq -c | sort -nr
</code></pre>
Where Do Bugs Cluster<p><pre><code> jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
-T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
| sort | uniq -c | sort -nr | head -20
</code></pre>
Is This Project Accelerating or Dying<p><pre><code> jj log --no-graph -r 'ancestors(trunk())' \
-T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
| sort | uniq -c
</code></pre>
How Often Is the Team Firefighting<p><pre><code> jj log --no-graph \
-r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
</code></pre>
Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
- bsuvc: I love how the author thinks developers write commit messages.<p>All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".<p>It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.<p>AI-generated commit messages help with this a lot, if developers would actually use them (I hope they will).
- joshstrange: I ran these commands on a number of codebases I work on and I have to say they paint a very different picture from the reality I know to be true.<p>> git shortlog -sn --no-merges<p>is the most egregious. In one codebase, the developer at the top of the list has almost 3x as many commits as number two. That developer no longer works at the company. Crisis? Nope, the opposite. The developer was a net negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.
- mattrighetti: I have a summary alias that kind of does similar things<p><pre><code> # summary: print a helpful summary of some typical metrics
summary = "!f() { \
printf \"Summary of this branch...\n\"; \
printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
printf \"%d commit count\n\" $(git rev-list --count HEAD); \
printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d tag count\n\" $(git tag | wc -l); \
printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
printf \"\nSummary of this directory...\n\"; \
printf \"%s\n\" $(pwd); \
printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
printf \"%d file count via find command\n\" $(find . | wc -l); \
printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
}; f"
</code></pre>
EDIT: props to <a href="https://github.com/GitAlias/gitalias" rel="nofollow">https://github.com/GitAlias/gitalias</a>
- ramon156: > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”<p>The most changed file is the one people are afraid of touching?
- JetSetIlly: Some nice ideas but the regexes should include word boundaries. For example:<p>git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20<p>I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
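A minimal sketch of the word-boundary point above, run against made-up commit subjects rather than a real repository. It uses `grep -w` (whole-word matching, supported by GNU and BSD grep) in place of `\b`; the sample subjects and variable names are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical commit subjects; two are real fixes, two only contain
# "bug" or "fix" inside a longer word.
subjects() {
  printf '%s\n' \
    'Fix crash on startup' \
    'debugger: add step-over command' \
    'Refactor prefix handling' \
    'bug: wrong offset in parser'
}

# Substring match: counts "debugger" and "prefix" as bug-fix commits.
naive_count=$(subjects | grep -ciE 'fix|bug|broken')

# Whole-word match: only counts lines where a keyword stands alone.
bounded_count=$(subjects | grep -ciwE 'fix|fixed|fixes|bug|broken')

echo "naive=$naive_count bounded=$bounded_count"
```

On these four subjects the substring pattern matches every line, while the whole-word pattern matches only the two that actually describe a fix.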
- blenderob: > Is This Project Accelerating or Dying
>
> git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c<p>If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?
- icedchai: I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one-line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.
- whstl: <i>> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.</i><p>In my experience, when the team doesn't squash, this will reflect the messiest members of the team.<p>The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.<p>Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squashing can be quite a bit more truthful.
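One way to check the "same few files over and over" observation is to compare how many file touches an author has against how many distinct files those touches cover. The sketch below is an assumption about how one might do that, not something from the thread: `touch_summary` is a hypothetical helper that reads one path per line, and "Some Author" is a placeholder.

```shell
#!/bin/sh
# Summarise a list of touched paths (one per line, repeats allowed).
# Prints "<total touches> <distinct files>"; blank lines are ignored.
touch_summary() {
  awk 'NF { total++; seen[$0] = 1 }
       END { n = 0; for (p in seen) n++; printf "%d %d\n", total, n }'
}

# Intended use against real history ("Some Author" is a placeholder):
#   git log --author="Some Author" --no-merges --name-only --format='' | touch_summary

# Self-contained demonstration with made-up paths:
summary=$(printf '%s\n' a.c a.c a.c b.c a.c b.c | touch_summary)
echo "$summary"
```

A high first number next to a low second number suggests many commits concentrated on very few files, which is the pattern described above.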
- croemer: Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
- keybored: Let me comment before reading the comments (they will likely spoil it for me).<p>That this is useful enough for them to do as the first thing means that commit histories are, often enough, good enough to carry some signal. Which is remarkable to me if this is in the closed-source/private/corporate sector; what I tend to hear, even from the biased perspective of people who post about this in their free time (which means they are interested enough to at least have an opinion), is that the version control software is very secondary to whatever issue tracker is in use. And that people who know what they are doing are kept from using useful tools by square-peg rules like always using a certain merge strategy. (Note: Letting people who know what they are doing <i>do it</i> is different from demanding that everyone should know what they are doing with regard to Git.)<p>My personal belief is that a well-curated commit history matters. My experience is that it isn’t appreciated, even as people bemoan that they can’t understand from the commit history itself (maybe not even the PR, or even the issue) why some code exists.
- Cthulhu_: For "what changes the most", in my project it's package.json / lock (because of automatic dependency updates) and translation / localization files; I'd argue that's pretty normal and healthy.<p>For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.
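If dependency manifests and translation files dominate the churn list, one option is to filter them out before ranking. This is a rough sketch under assumptions: `rank_churn` is a hypothetical helper, and the excluded patterns (`package.json`, lock files, a `locales/` directory) are guesses that would need adjusting per repository.

```shell
#!/bin/sh
# Drop blank lines and paths that churn for mechanical reasons,
# then rank the remaining paths by how often they appear.
# The exclusion patterns are assumptions, not a universal list.
rank_churn() {
  grep -vE '^$|(^|/)(package\.json|package-lock\.json|yarn\.lock)$|(^|/)locales/' \
    | sort | uniq -c | sort -nr | head -20
}

# Intended use with real history:
#   git log --since="1 year ago" --name-only --format='' | rank_churn

# Self-contained demonstration with made-up paths:
top=$(printf '%s\n' \
  package.json package.json package.json \
  src/locales/en.json src/locales/en.json \
  src/billing.js src/billing.js \
  src/auth.js \
  | rank_churn | head -1 | awk '{print $2}')
echo "top=$top"
```

In the demonstration, `package.json` has the most raw touches, but after filtering the top entry is `src/billing.js`.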
- pscanf: I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at its git history (plus some GitHub activity data).<p>The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.<p>For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.<p>(The author of course has also flagged this. But I just wanted to add my voice: yeah, be careful about generalizing.)<p>¹ Nothing is ever finished.
- yieldcrv: blog posts are just comments that would have been torn apart if only posted on a forum, now masquerading as important universal edicts