Follow-up to Journal article about politics and mindkilling.  That post showed that people can be convinced that a view is correct by being told that their political party endorses it, even if their party actually opposes it.  A similar but stranger effect is that people can be convinced that a view is correct because their favorite software implements it, even if they stated that the view was wrong just minutes ago.

Subversion is a popular version-control system used by software developers.  The "repository" is where Subversion stores the definitive copy of each file it is keeping track of.  A "diff" is when you ask Subversion to show you all differences between your new code and the last version of that code that it knows was in the repository.  A "tag" is when you associate a set of versions of different files together, so that they can be easily compared against or reverted to, without creating something called a "branch".

I've had variants of this conversation about Subversion three times so far:

Me:  [problem X] happened because Subversion doesn't let you diff against the repository.

Other:  What?

Me: Subversion doesn't let you diff against the repository.  It only diffs against its local copy of the repository, which it updates when you do a checkout, commit, or update.  So it won't show you any changes that someone else has made to the code since then.  You can never see what changes other people have made since your last commit, because to get the changes, you have to do an update; and the changes are added to your code without being shown to you, and a diff won't show them.  So you just cross your fingers and hope their changes are compatible with yours.

Other:  That's ridiculous!  Subversion doesn't do that.  That would defeat the whole purpose of version control.  That would be idiotic.

Me:  Really, it does that.  I've tried it.  Repeatedly.  I've wasted days of work because of it.

Other:  Nonsense.  Of COURSE Subversion diffs against the repository.

Me:  Try this:  Create a new file foo in your checkout in directory X.  Then svn add foo and svn commit foo.  Check out the same repository in directory Y.  Modify foo in directory X.  svn commit foo.  Then do an svn diff foo from directory Y.

[A few minutes later, after trying it:]

Other:  Well, of COURSE Subversion doesn't diff against the repository.  It's meant for large, distributed projects.  You wouldn't want to have to do diffs over the web.

Me:  Why?

Other:  It would be too slow.

Me:  You do checkouts and commits and updates over the web.  Are they too slow?

Other:  You want to diff against your previous version.  It would be too confusing to see the changes other people have made, too.

Me:  Three minutes ago you said it would be idiotic not to diff against the repository.

Other:  Look, Subversion is an industry standard!

Me:  Subversion doesn't even let you tag releases.

Other:  Of COURSE Subversion lets you tag releases.

[Conversation eventually ends with Other explaining why you don't need to tag releases anyway.]

 

[P.S. - There is a very long syntax for svn diff that lets you specify full paths to the repository and your checkout directory.  It can't mix paths and URLs, so you have to specify your checkout directory as a complete URL.  No one that I know uses this syntax.]


You can never see what changes other people have made since your last commit, because to get the changes, you have to do an update

svn diff -rBASE:HEAD to see the changes since your last update.
svn diff -rHEAD to diff your working tree against the repository.
Which does send the diffs over the web, and is inconveniently slow.

(I'm not a svn user. I just agreed with the initial reaction of "that's ridiculous", and followed it up with "I bet there really is a way to do that" and looked at the manpage.)

Wow, it is right there in svn help diff! I'm going to try this first thing Monday.

You might also enjoy svn status's --show-updates switch, which shows what files would be updated if you ran svn update.
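The commands suggested in these comments can be tried against the exact scenario from the post. A minimal sketch, assuming the svn and svnadmin command-line tools are installed; it uses a throwaway local repository (a file:// URL, so no server is needed), the working-copy names X and Y follow the post, and everything else is illustrative.

```sh
#!/bin/sh
# Sketch: reproduce the post's experiment, then try the suggested commands.
# Assumes the svn and svnadmin command-line tools are installed.
set -e
work=$(mktemp -d)
cd "$work"

svnadmin create repo                          # throwaway repository on local disk
svn checkout -q "file://$work/repo" X         # first working copy

echo one > X/foo                              # create foo in X, add and commit it (r1)
svn add -q X/foo
svn commit -q -m "add foo" X/foo

svn checkout -q "file://$work/repo" Y         # second working copy, sees foo at r1

echo two > X/foo                              # someone else changes foo in X (r2)
svn commit -q -m "change foo" X/foo

cd Y
plain=$(svn diff foo)                  # empty: compares against Y's pristine copy (BASE)
incoming=$(svn diff -r BASE:HEAD foo)  # what an update would bring in
vs_head=$(svn diff -r HEAD foo)        # repository HEAD vs. the working file
pending=$(svn status -u foo)           # '*' flags a newer version on the server

printf 'plain diff:\n%s\n\nBASE:HEAD:\n%s\n\nHEAD:\n%s\n\nstatus -u:\n%s\n' \
  "$plain" "$incoming" "$vs_head" "$pending"
```

The plain svn diff prints nothing, which is the post's complaint. The -r BASE:HEAD form shows the incoming change (-one, +two); -r HEAD shows the same change reversed, because it runs from the repository to your working file; and svn status -u marks foo with a *.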

Oh good, I thought I was going crazy :)

This is phenomenal! Thanks!

I remember doing something similar. As kids, a friend and I were trying to figure out something computer-related - how to use some MS-DOS file compression software, I think. My friend suggested using some specific command, which I thought obviously couldn't work. He typed it in anyway, and behold! It did work. I blinked, and then it felt like a floodgate had opened in my mind and an explanation of why it did work came pouring into my consciousness.

I've wondered if this might be a case of constraint propagation. Picture my mind as a network of beliefs, together with some algorithm trying to make sure that they are at least roughly consistent. A bunch of (incorrect) beliefs held with moderate confidence combine to suggest that the belief "this command would work" is incorrect with high confidence. But then I find out that the command does work, and the external evidence changes the value of that node. This forces an update to the beliefs that were connected to it, and the change propagates through the network and adjusts beliefs until I finally have high confidence in a theory that's completely different from what I believed in a minute ago.

After reading the first paragraph, I was going to comment on how this phenomenon is often useful, but your second paragraph implicitly addresses that.

[nitpick]

Exchanges are easier to follow if you bold the person speaking.

The happy ending is that nobody uses Subversion any more: git won, and it has none of these problems.

It's up to you how seriously you read my comment.

Hee. We still use subversion every day.

Version control systems nowadays suffer from the problem that all new version control systems are created by groups of hackers working on projects so big and complex that the existing systems aren't powerful enough for them. So you keep getting more and more powerful and complex systems. git is so complex that no one who isn't a software developer can use it correctly.

I was tasked with moving a complex natural-language processing program for the NIH from, I think, SVCS, to git. After three days studying git man pages and trying to explain them to a group of linguists, I gave up and put everything under QVCS, and it was smooth sailing after that.

git is so complex that no one who isn't a software developer can use it correctly.

Try mercurial. It's got basically the same features, but is more comprehensible to human beings. There's an excellent tutorial called hg init.

(And if you should happen to need to use other people's stuff that's in git, you can just use the git extension for mercurial.)

blinks

I was taught to use git within a few days of starting to become a professional programmer. I'm a dyed-in-the-wool fanboy. I probably have no perspective at all here. But whenever I've used Mercurial everything seems backwards. People start recommending that I do wacky-sounding things like making two clones of a repository just to do what I'd normally do with git branch/git checkout... Is there any way to track multiple heads without just making multiple checkouts all over your disk?

Also, I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads. If you run gitk --all whenever you get confused, you can actually see the thing, and then there's nothing to be confused about.
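For readers without a GUI, git log has a text-mode graph that serves the same purpose as gitk --all. The sketch below builds a throwaway repository with one branch and one merge so there is a small DAG to look at; the branch name feature, the commit messages and the identity values are placeholders.

```sh
#!/bin/sh
# Sketch: a tiny repository with a branch and a merge, drawn as text.
set -e
cd "$(mktemp -d)"
git init -q

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "root"
git checkout -q -b feature                    # branch off
git commit -q --allow-empty -m "feature work"
git checkout -q -                             # back to the original branch
git commit -q --allow-empty -m "mainline work"
git merge -q --no-edit -m "merge feature" feature

# Text-mode view of the whole DAG, all refs included (like gitk --all).
dag=$(git log --graph --oneline --all)
printf '%s\n' "$dag"
```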

...Though I suppose the above might just translate to "I'm a visual thinker, and everyone should be more like me."

Well, to me, git's DAG model is 100% obvious, and gitk --all is helpful in exactly the way you state, but at the beginning it was still confusing which command, used in which way, would produce the effect I wanted on the DAG (and working tree and index...). Similarly, the commands to configure and manipulate branches and remotes are not entirely obvious, especially if you've gotten off the beaten path and want to fix your config to be normal.

Is there any way to track multiple heads without just making multiple checkouts all over your disk?

Taboo "track" and "checkouts". I don't know what you mean by "track", and Mercurial doesn't have checkouts, as I understand the term. A clone isn't "checked out" of anything. (This was actually the hardest part for me to wrap my head around, coming from Subversion and the central-repository model, but I'm wondering whether you're talking about the same thing or not.)

If you simply mean you want more than one head or branch, you don't need multiple clones. You can switch your working copy between named branches or heads with "hg up", and list them with "hg heads".

It's true that people often suggest just using clones instead of named branches, but IMO this only makes sense for short-lived branches that are going to be folded into something else. Mercurial works just fine with named branches and multiple heads. You can also use bookmarks to give local names to specific heads -- a kind of locally-named branch whose name isn't propagated to other repositories.

I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads.

No, we just read the man pages and run screaming. It's not the model of a change-based system that's the problem, it's the UI design (or lack thereof). ;-)

From an outsider's perspective, git's UI is to mercurial's UI as Perl's is to Python. And since I've programmed almost exclusively in Python for about 13 years now, guess which one looks more attractive to me?

(Note: this doesn't have anything to do with Mercurial's innards being written in Python; other DVCS's have been written in Python and didn't have the same orthogonality of design.)

I'm told git massively improved its interface in the last few years. I started using it mainly in 2010 after switching from bzr, and had little trouble understanding the system (in fact I found hg's interface to be kind of weird). But there you go.

(Also, wrt

Taboo "track" and "checkouts". I don't know what you mean by "track", and Mercurial doesn't have checkouts, as I understand the term. A clone isn't "checked out" of anything.

In git-land "checkout" means a working directory; by "multiple checkouts all over your disk" I assume MBlume means multiple clones of the repository.)

git's UI is to mercurial's UI as Perl's is to Python

Harsh!

Git is new. It's already gotten easier to use (I'm already too much of a newb to have ever used the Git of Yore, which supposedly you needed a CS PhD to use effectively), and the folks at GitHub in particular seem to be working hard at sanding down its rough edges.

My experience with git was in 2006 or 2007.

This is quite ancient. git started as a solution to the technical problem of high-performance distributed version control. Its user interface only became reasonable later.

It's still not that great. The internal DAG model is quite clean and clear, but the actual commands do not always map clearly onto this model. One common failure is hiding the staging area or doing implicit magic to it. Another is that many commands are a mish-mash of "manipulate the DAG" and "common user operations", where doing only one or the other would be much clearer. I really doubt that the user interface will get much better, because to improve it they would need to throw out backward compatibility.

There are some problems with the DAG, too, because you are supposed to store information with little meta-information.

There are precedents for tools wrapping Git's command-line interface, so that part could possibly be fixed. I frankly do not know why nobody does it.

Of course, Subversion is still the "majority" VCS even for open-source projects. Maybe people need something other than Git to change that, or maybe SVK should become a more widespread way to use SVN.

And for the sake of speed and stability, Git doesn't store some data that every other open-source DVCS does store. I have heard some Git users say it is an acceptable tradeoff (which is true for them) and others say that nobody should care about this kind of data.

Of course, a better tool is never a solution to tool ideology. Evaluating multiple other tools isn't either: after doing it with DVCSes, I now hate Git and implicitly assume that every tradeoff there is unfit for the medium-sized projects I care about.

I would guess that git is already more popular than svn for new projects (see github), and in at least some circles like among Ruby programmers still using svn for new stuff would raise some eyebrows. It's definitely way past just early adopters, but I have no idea how to get reasonable vcs usage statistics.

I don't know what you mean by these tradeoffs; git tends to store more data, not less.

Well, Git stores the code itself; for everything else it stores less data than SVN or Bazaar (or Mercurial, Monotone, Veracity).

It doesn't track explicit directory renaming. It doesn't keep branch history - if it did, reflog (which is local and only kept for 30 days) wouldn't be needed. It only allows unique tags - so if you want to mark every revision where buildbot succeeded to make both update and rolling back easy - you are out of luck (there can be a way - it is not obvious at all).

It doesn't track explicit directory renaming.

It knows each directory by its content, so it knows when a directory was renamed, without needing to be explicitly told.
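A small check of the claim above, as a sketch: git stores a directory as a tree object named by the hash of its contents, so a directory renamed without edits keeps the same tree id, and rename detection (-M) reports its files as renames even though no rename was recorded. The directory and file names are placeholders.

```sh
#!/bin/sh
# Sketch: a renamed-but-unchanged directory keeps its tree object id.
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir olddir
echo hello > olddir/a.txt
echo world > olddir/b.txt
git add olddir
git commit -q -m "add olddir"

git mv olddir newdir                         # no rename is recorded anywhere
git commit -q -m "rename olddir to newdir"

before=$(git rev-parse HEAD~1:olddir)        # tree id of the directory before
after=$(git rev-parse HEAD:newdir)           # tree id after the rename
renames=$(git diff --name-status -M HEAD~1 HEAD)  # renames inferred from content

echo "before=$before after=$after"
printf '%s\n' "$renames"
```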

It doesn't keep branch history - if it did, reflog (which is local and only kept for 30 days) wouldn't be needed.

Reflog is an essentially local thing, it shows where a branch used to point in a particular repository instance. It has little to do with history of the project, and often includes temporary commits that shouldn't be distributed.

It only allows unique tags - so if you want to mark every revision where buildbot succeeded to make both update and rolling back easy - you are out of luck

You need some way to specify what you'd want to update or roll back to - what kind of use case are you thinking about? You could support a successful-build branch, for example, so that its tip points to the last successful build, and you could create merge commits with previous successful builds as first parents, for the purpose of linking together all successful builds in that branch.
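One way the successful-build branch described above might be built, as a sketch using git's plumbing commands: the first good build becomes the branch tip directly, and each later good build gets a merge commit whose first parent is the previous tip, whose second parent is the build, and whose tree is the build's tree. The helper mark_good is hypothetical; the branch name comes from the comment.

```sh
#!/bin/sh
# Sketch of a successful-build branch that remembers every good build.
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for n in 1 2 3; do                # three ordinary commits
  echo "$n" > file
  git add file
  git commit -q -m "commit $n"
done
c1=$(git rev-parse HEAD~2)
c3=$(git rev-parse HEAD)

# Hypothetical helper: record commit $1 as a successful build.
mark_good() {
  if prev=$(git rev-parse -q --verify refs/heads/successful-build); then
    tree=$(git rev-parse "$1^{tree}")
    # First parent: previous tip (links all good builds); second parent: the build.
    new=$(git commit-tree "$tree" -p "$prev" -p "$1" -m "build OK: $1")
    git update-ref refs/heads/successful-build "$new"
  else
    # First good build: the branch simply points at it.
    git update-ref refs/heads/successful-build "$1"
  fi
}

mark_good "$c1"    # pretend the build passed on commit 1
mark_good "$c3"    # and on commit 3 (commit 2 failed)

latest=$(git rev-parse 'successful-build^2')    # newest good build
previous=$(git rev-parse 'successful-build^1')  # tip before that (commit 1 here)
```

Afterwards successful-build^2 is the newest good build and successful-build^1 is the previous tip, so rolling back means following first parents, while checking out the branch itself gives the newest good tree.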

Tracking paths by their content is not always good... It couples content changes with intent changes. If I need to make a copy of a directory and then make the copies slowly diverge until they have nothing in common, I may want to mark which one carries the original intent and which is the spin-off.

Branch history is not an inherently local thing.

When I have feature branches for recurring tasks, I will probably always give them the same names. I will sometimes merge them with trunk and sometimes spawn them from the new trunk state. Later, I may wish to check whether some change was done in trunk or in the feature branch; that is quite likely to provide some information about the commit's intent. I can get this easily in every DVCS I know except Git; in Git I need to trace the DAG to get this information.

About the successful-build branch: for some projects I try to update to trunk head, and if it gives me too much trouble I look for the closest previous revision that I can expect to work. In Monotone I simply mark some development commits as tested enough, and there is a simple command to get all the branch.tested commits from the last month. This information says something about a commit, and to lose it I have to do something to the certificate that states it. In Git, rev-list behaviour depends on many things that happen later.

Linux kernel history is too big for any of the things I say to make sense for it. But in a medium-sized project, I want access to strange parts of history to find out what happened, how, and what we meant.

It knows each directory by its content, so it knows when a directory was renamed, without needing to be explicitly told.

Doesn't work so well if the content is 'nothing'.

Git doesn't notice these at all.

Which is my point exactly. It is one aspect of Vi's criticism of git (that it doesn't store some important data) that is clearly valid. It is a tradeoff that probably doesn't matter if you are Linus storing code for the Linux kernel, but in other cases it is a blatant flaw that needs to be worked around via compromises or kludges.

Git is the absolute worst version control system out there (except for all the others).

In what situations would you want to store an empty directory and pay attention to whether it is renamed?

Empty directories are sometimes necessary and it's a pain in the ass that git cannot store them at all. I had to put almost empty README.txt files in directories like log/ in many projects. It's more a minor annoyance than anything more.
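A sketch of the behaviour discussed in this thread: git tracks files, not directories, so an empty directory never shows up at all, and the usual workaround is the one described here, a near-empty placeholder file. The directory name log and the placeholder README.txt follow the comment; the file name has no special meaning to git.

```sh
#!/bin/sh
# Sketch: git ignores empty directories; a placeholder file works around it.
set -e
cd "$(mktemp -d)"
git init -q

mkdir log                                  # empty directory
empty_status=$(git status --porcelain)     # nothing reported: git tracks files only

echo "Placeholder so that git keeps this directory." > log/README.txt
git add log
tracked=$(git ls-files)                    # now log/README.txt is tracked

printf 'status with empty log/: [%s]\ntracked: %s\n' "$empty_status" "$tracked"
```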

I have a complex enough deployment helper living in Monotone repository for which it is simpler and more natural to keep a few empty directories in the repository than to check-and-create from each of ten shellscripts. It is checkout-and-use, no other setup makes sense, so "just creating them in Makefile" would be suboptimal.

A single line of:

mkdir -p one/directory two/directory three/more/directories

will deal with it. It's a nice idempotent action. I started using mkdir -p as workaround for git issues, but now it just seems to make far more sense than dicking around manually maintaining working directories.
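The idempotence claim is easy to check, as a small sketch: running the same mkdir -p line twice succeeds both times, while a plain mkdir of an existing directory fails. The directory names follow the line above.

```sh
#!/bin/sh
# Sketch: mkdir -p is idempotent, plain mkdir is not.
cd "$(mktemp -d)" || exit 1

mkdir -p one/directory two/directory three/more/directories
first=$?
mkdir -p one/directory two/directory three/more/directories   # same line again
second=$?
mkdir three/more/directories 2>/dev/null   # plain mkdir: fails, already exists
plain=$?

echo "mkdir -p: $first then $second; plain mkdir on existing dir: $plain"
```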

I know about "mkdir -p". My non-problem (I was not going to use Git for this project anyway) is that I have multiple places to put it, and if I miss one I will not notice for some time.

Saying that recreating something just in case right after checking out the new version makes more sense than simply storing along with all the rest seems to be exactly an example of tool imposing some workflow ideas on people.

Saying that recreating something just in case right after checking out the new version makes more sense than simply storing along with all the rest seems to be exactly an example of tool imposing some workflow ideas on people.

You have it backwards. Using version control to store working areas for programs rather than programs simply mkdir -ping working areas they need seems to be exactly an example of tool imposing some workflow ideas on people.

I'm mostly serious here.

I have two choices, you have one. My tool imposes fewer workflow ideas here. It's totally information-theoretical.

There are so many common cases where you absolutely need mkdir -p like dynamic working directory layout that it's mentally simpler to just use it always. It works on 100% of problems, is idempotent, and resistant to human errors.

Why would I ever bother with a VCS-based solution that only works in some simple cases like static working directory layouts, is based on non-idempotent operations, and fails often in case of human mistakes?

It just creates so much less mental overhead if you simply mkdir -p place where you want to create your files always, no exceptions.

I understand that people who use languages where mkdir -p is a non-trivial operation won't get it, but that's problem with their tools limiting their mindset.

I always try to do things so that they fail either often or never. Sometimes I don't care much which is the case - if they do fail often, it is easy to reproduce and I will fix it with less effort than an elusive bug.

"mkdir -p" is not resistant to human error of not calling it / not including the file that does it in one of ten scripts.

"mkdir -p" means that browsing the VCS history you do not see the real layout.

I bother with a VCS-based solution when it works better than "mkdir -p" (and my "better" includes the estimated time to catch trivial mistakes and the ease of looking up old history). Dynamic working directory layout is something I have yet to see often, so I do not discard better-for-me solutions for reasons inapplicable to the current project.

Human mistakes depend on workflow. Do you often accidentally remove your checkout? And human mistakes in initial setups should cause the failure as often as possible when they happen.

"mkdir -p" before each cd will make the scripts considerably longer for no sane reason. And I have a few places in this very deployment helper system where "mkdir -p" before cd will make the failure mode less convenient.

And yes, all that is written in shell script, and I do use "mkdir -p" in the code where it made sense for my task at hand.

"mkdir -p" before each cd

Why are you using cd at all? You use mkdir -p before creating temporary files, and never randomly cd.

Anyway, this thread isn't really getting anywhere.

Somehow, most build systems for complex packages run make with changing working directories. Even in a single script using subshells, it is trivial to save some effort by changing directories without complicating the code.

It is a deployment helper, not a single-package build system, so it has to do some work with numerous checkouts of dependencies. It is more natural to write some parts of code to run inside working directories of individual packages.

This is closer to trolling at Vi than it is to a deep insight.

I'm mostly serious here.

You're mostly wrong. Enough so that I reread your comment 4 times to be sure I was parsing correctly.

Is that something I need a justification for? My version control system throws away stuff that I am trying to store. I'd also prefer it not to throw away files starting with 'b'.

I've learned to make my programs pessimistic and recreate the file system if necessary. It surprised me a few times before I learned the quirks.

Is that something I need a justification for?

No, just curious. I have not encountered and could not imagine a use case.

Directories, in my mind, are meta-information about files, so it makes no sense to me to store an empty directory.

I may be missing context here, but I frequently create empty directories to guide future filing/sorting behavior.

The examples mentioned so far could be described as meta information about future intended files.

Fun fact: it's possible to make a fully distributed version control system that maintains complete history of every branch at all times, down to the individual keystrokes if necessary, on large projects, in realtime, and have it be fast. It can even be peer-to-peer, and operate over an unreliable mesh network, if you like. When people start arguing that version control systems can't do something with reasonable performance, they're usually dead wrong.

That sounds amazingly cool. What version control system(s) are you thinking of, there?

It looks like an estimation, not a VCS link.

Think of it this way: Vim undo history is a tree which you can walk visiting every branch (not that it's a thing you want to do). Now, writing all this data out has some cost in IO bandwidth - comparable to bandwidth of the keyboard, i.e. kBytes/minute. Vim users don't notice the cost of maintaining the tree in RAM.

Synchronising it at first opportunity is also not hard if you do it in the background and so latency can be tolerated most of the time.

The merges.. you can try to do them mostly on marked commits, and then they can be done just like they are done now.

But implementing all that is a great undertaking, to be sure.

Wait - vim undo is a tree? So I can get back the revisions that I lost by undoing the last 100 operations and then carelessly inserting a character? HOW?

Wow! It is? I had no idea!

From the looks of it this script might be a helpful way to use the feature.

Well, now that you know that it is there, you could just type :help undo-tree. Basically, it is about g+/g-.

And the next chapter of undo.txt tells you about saving undo history.

The limiting case of merge frequency is to do a branch and merge on every keystroke, and create something like Etherpad. This is completely practical.

Ooh, I hadn't thought about it that way - sure, it'd take thousands of those to clog a modern high-speed connection.

An unreleased, not production-ready VCS that I made so I could finish grad school. :-)

The basic approach is similar to how git works: store all your revision state as a tree, and have code for merging trees. If you choose a good representation for this tree, and take some care with how you implement the merging operations, and do the merge in such a way that you're guaranteed to achieve convergence regardless of merge order, then you can get all those snazzy properties I mentioned earlier.
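The commenter's system is unreleased and its tree representation isn't described, so this is only a swapped-in toy showing the property being claimed: if each replica's state is a set of entries and merging is set union (a grow-only set), merges are commutative, associative and idempotent, so every merge order converges to the same state. Here each state is a file with one entry per line, sort -u does the union, and the entry names and the gset_merge helper are hypothetical. A grow-only set cannot express deletions on its own.

```sh
#!/bin/sh
# Toy grow-only set: state = set of lines, merge = union (sort -u).
set -e
cd "$(mktemp -d)"
export LC_ALL=C                        # fixed sort order so outputs compare byte-for-byte

# Three replicas that have each seen a different subset of (hypothetical) edits.
printf '%s\n' e1 e2 > a
printf '%s\n' e2 e3 > b
printf '%s\n' e1 e4 > c

gset_merge() { sort -u "$@"; }         # union: commutative, associative, idempotent

gset_merge a b > ab;  gset_merge ab c > order1    # (a + b) + c
gset_merge c b > cb;  gset_merge cb a > order2    # (c + b) + a
gset_merge order1 order1 > again                  # merging a state with itself

cat order1
```

Both merge orders produce the same four entries, and merging a state with itself changes nothing.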

I've proven asymptotic time bounds and correctness for all the operations, and verified that it actually works the way it's supposed to in real code, but for now this is of mostly theoretical interest. For now.

I've fallen for it myself, but mostly for the reverse version of it. Not saying "this feature is obviously required" and then, on learning that my favorite tool (shell, text editor, language, web browser, version control, whatever) doesn't have it, saying "oh, but it's useless"; but the reverse: someone tells me "you should try tool X, it has that great feature", I answer "I don't see the point of that feature, your tool X is just bloatware, I'll stick to my tool Y", and when a later version of my tool Y implements that feature, I turn around and tell everyone "you should use tool Y, it has that great feature".

It's very similar (and since I became aware that I have a tendency toward this bias, I try to force myself not to fall for it again, though not totally successfully...), but it's somewhat more understandable: you very often see the point of a feature/tool/... once you start using it regularly in your own use cases.

I don't think I've done the direct version of it since I was a teenager. At least, I hope I haven't...

I've read that people only use small subsets of the available features in huge programs such as Microsoft Word, so it would seem like they should be able to get rid of "all those features nobody uses" and make a version without all that complicated bloat that confuses everyone. The problem, however, is that everyone uses a different subset of the feature set, so one person's useless bloat is someone else's essential, obvious feature that they don't know how people would get along without.

"Features seem useless until you get used to them, at which point they become essential" may be a fairly common pattern...

Hmm... but what do they think a few days later? That is: few people are cool-headed enough to bite a bullet and lose face in the middle of an argument. Did the Subversion fan continue to believe that it was dead-sure necessary that it do things the way it does, or did they think to themselves "Maybe it could be done a better way..."

Not to say that you aren't seeing a genuine failure of rationality; if they're willing to update on new evidence only a long while after encountering it during an argument, then (a) that means they'll react slower in situations where reacting faster would get them more utilons, and (b) they're probably less likely even later on to update, since the argument may still have an argh-that-jerk-said-that-nasty-thing-about-my-awesome-stuff vibe clinging to it.

Hmm... but what do they think a few days later?

I don't know.

That is: few people are cool-headed enough to bite a bullet and lose face in the middle of an argument.

Why would they lose more face by admitting the tool did the wrong thing, than by admitting they were wrong in saying that it's better to diff against the repository? It appears their loyalty to the tool is more important to them than their beliefs.

Why would they lose more face by admitting the tool did the wrong thing, than by admitting they were wrong in saying that it's better to diff against the repository?

Because by admitting the tool did the wrong thing after they said they're a fan of the tool, that's admitting that they screwed up with their tool choice. The harder they promoted it before the problem was revealed to them, the more face can be lost.

This is part of a larger phenomenon where for some reason preferences in software are stored as group loyalties, to the point where one can actually go "yes, that piece of software is [better in way X], but [my favourite software] is better [in general but not in this specific context], and it'd be treason!"

Software often has network effects — for instance in the exchange of good techniques (or, in an open-source world, patches!), compatibility of file formats, comfort sharing a work environment. And then there are the sunk costs of learning to use a specific piece of software and adapting your work habits to use it.

These suggest that users may have a good reason to deter their fellow users from switching away.

Perhaps explaining why Apple users are famously evangelical.

That's a good point... and also rather scary.

Huh, that's pretty weird. Live and learn! FWIW, both TortoiseSVN and Eclipse seem to diff against the repository (or at least they have an option to easily do so).

It's been a long time since I'd used the command-line svn tools, so I am not surprised (though saddened) that I didn't know how they worked in detail.