Panasonic Youth rob sanheim writes about software, business, ruby, music, stuff and things



Posted
30 July 2008 @ 8pm

Tagged
Git, Linux, Open Source


Git Clone vs cp -R –> WTF?

I knew git was fast, and I even knew it was faster than a lot of plain Linux local file operations. Still, this blew me away:

CODE:
  rsanheim@ares:~/src/personal/oss $ du -hd 0 insoshi/
   26M    insoshi/

  rsanheim@ares:~/src/personal/oss $ time git clone insoshi/ /tmp/insoshi
  Initialize /tmp/insoshi/.git
  Initialized empty Git repository in /private/tmp/insoshi/.git/
  Checking out files: 100% (2193/2193), done.

  real    0m3.826s
  user    0m0.251s
  sys     0m0.658s

  rsanheim@ares:~/src/personal/oss $ time cp -R insoshi/ /tmp/insoshi_cp

  real    0m9.065s
  user    0m0.114s
  sys     0m1.442s

OK, so a 26 MB repo takes more than twice as long (about 2.4 times) to copy with a recursive cp as it does with a local git clone. That's a fairly small repo, so let's try something bigger:

CODE:
  rsanheim@ares:~/src/relevance $ du -hd 0 rails
   75M    rails

  rsanheim@ares:~/src/relevance $ time git clone rails /tmp/rails2
  Initialize /tmp/rails2/.git
  Initialized empty Git repository in /private/tmp/rails2/.git/

  real    0m2.321s
  user    0m0.151s
  sys     0m0.465s

  rsanheim@ares:~/src/relevance $ time cp -R rails/ /tmp/rails

  real    0m7.133s
  user    0m0.067s
  sys     0m1.505s

For the rails repo at 75 MB, git clone is still about 3 times faster than cp -R.

Obviously, this is not scientific at all, but the point is pretty clear. Git is doing some magic that lets it move files around locally 2 to 3 times faster than a plain copy. From looking at the man page, I would guess it has something to do with git using hardlinks for things in .git/objects when cloning locally. My linux fu falls down a bit here -- what are the ramifications of using hard links versus doing a "real" copy?
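A rough way to see what a hard link implies, as a sketch rather than a description of what git does internally: the snippet below makes a hard link and a plain copy of the same file in a scratch directory, then compares inode numbers and appends to the original. The file names (foo, hardlink, copy) and the scratch directory are made up for illustration.

```shell
# Work in a throwaway directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo 'original' > foo
ln foo hardlink          # a second name for the same inode; no data is copied
cp foo copy              # an independent copy with its own inode and data

# ls -i prints the inode number in the first column.
inode_foo=$(ls -i foo | awk '{print $1}')
inode_link=$(ls -i hardlink | awk '{print $1}')
inode_copy=$(ls -i copy | awk '{print $1}')

# Append through one name, then count lines through the other names.
echo 'appended' >> foo
link_lines=$(wc -l < hardlink | tr -d ' ')
copy_lines=$(wc -l < copy | tr -d ' ')

echo "inodes: foo=$inode_foo hardlink=$inode_link copy=$inode_copy"
echo "lines:  hardlink=$link_lines copy=$copy_lines"
```

The hard link shares an inode with the original, so it costs no extra data and sees every change, while the copy does not. As far as I know, that sharing is safe for .git/objects because git does not rewrite object files once they are written. The git clone man page also lists a --no-hardlinks option that forces real copies, so re-running the timings above with that flag would be one way to check the guess.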

(This also makes me want to try out gitbak even more...)


5 Comments

Posted by
Clint
31 July 2008 @ 6am

‘gitbak’ link is broken. It should be: eigenclass.org/hiki/gibak-backup-system-introduction though that site seems to be down currently.

Thanks for the post though.


Posted by
bryanl
31 July 2008 @ 1pm

cp is not very fast. a faster way to copy is “tar cf - | (mkdir -p ; cd ; tar xf -)”


Posted by
Piers Cawley
31 July 2008 @ 4pm

Pretty straightforward really. A hard link isn’t a copy:

$ touch foo; ln foo bar; echo 'I am foo!' >> foo; cat bar
I am foo!

So the cp -R is slinging data around, git clone is building a bunch of directories and pointing them at the data.


Posted by
Dieter_be
10 September 2008 @ 4am

A “file” consists of data blocks on the filesystem and an inode that points to them. When you create a hard link, you just create a new directory entry that points to the same inode. So it looks like you have the same file twice. In a directory, ‘.’ and ‘..’ are also hard links.

Note that if you change one file, the other will be changed too. (That’s why you should be careful when messing with files in the .git directory ;-)


Posted by
j
19 September 2008 @ 2pm

i hope you aren't using cp to back up git repos. that makes no sense. cp isn't atomic.

what happens when you try to cp your git repo just as someone is committing? you get fucked.

