Git Submodules/Subtrees

Introduction

This page introduces two git features, submodules and subtrees. Both allow a remote repository to be included in a so-called host repository. The following sections briefly introduce and compare the two approaches and end with a conclusion.

Submodules

Links about how to work with submodules

https://github.com/blog/2104-working-with-submodules

https://git-scm.com/book/en/v2/Git-Tools-Submodules

Submodules allow you to include other git repositories in your project.


Cons:
  • Makes branching, cloning and forking cumbersome
  • Downloading the source code is not straightforward
  • Submodules are often said to be bad practice (Link1, Link2, Link3)
  • Changes need to be pulled in manually

Pros:
  • No duplication of source code
  • Can be used if Maven dependencies don't work

Workflow

Assume you have one repository that serves as the host and an external repository that is to be included in it.

Host name: HostRepo

External name: externalSubModule

> cd HostRepo

> git submodule add https://github.com/marpet/externalSubModule.git externalSubModule

This creates a directory named externalSubModule in the host project that holds the external project.

Depending on the git version, the created folder might be empty. If so, the following command needs to be executed.

> git submodule update --init --recursive

This initializes the externalSubModule folder. Newer versions of git do this automatically. You will also see a .gitmodules file in the parent; it stores the necessary information about the submodules.
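The steps above can be tried without network access. The following is a minimal sketch rather than the definitive procedure: it assumes a throwaway directory created with mktemp and a local stand-in repository in place of the GitHub URL. The file name lib.txt, the demo identity and the setting protocol.file.allow=always (needed by recent git versions only because the URL is a local path) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: add a submodule using throwaway local repositories.
set -eu

# Demo identity (assumption) so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local stand-in for the external repository (hypothetical content).
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

# The host repository.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master

# protocol.file.allow=always is only required because the URL is a local
# path in this demo; it is not needed for https URLs.
git -c protocol.file.allow=always submodule add "$work/externalSubModule" externalSubModule

# Harmless if the folder is already populated.
git -c protocol.file.allow=always submodule update --init --recursive

git commit -q -m "Add externalSubModule as a submodule"

cat .gitmodules        # path and url of the submodule
git submodule status   # commit the host currently points to
```

The printed .gitmodules file should contain one section for externalSubModule with a path and a url entry.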

> cd externalSubModule

> git add -A; git commit -m "Message"; git push

Changes to the submodule can be pushed from within the submodule using standard git commands. Adding and committing changes must be done within the submodule, but the push can be done from the parent:

> git push --recurse-submodules=on-demand

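Unfortunately, this does not seem to work: the changes are still not committed after the command. Note that this option only pushes submodule commits that already exist; it does not create commits.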

To pull in changes from the submodule, step into its directory and run git fetch/git pull as usual.

It is easier to use the following in the parent project:

> git submodule update --remote
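As an illustration of the update step, here is a hedged sketch that reuses a throwaway local setup. The repository paths, the file names lib.txt and feature.txt, the demo identity and protocol.file.allow=always (only needed for local-path URLs) are assumptions, not part of the original workflow.

```shell
#!/bin/sh
# Sketch: pull new submodule commits into the host with "update --remote".
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local stand-in for the external repository (hypothetical content).
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

# Host repository with the submodule already added.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
git -c protocol.file.allow=always submodule add "$work/externalSubModule" externalSubModule
git commit -q -m "Add externalSubModule as a submodule"

# A new commit appears in the external repository.
cd "$work/externalSubModule"
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Fetch inside the submodule and check out the new tip of its branch.
cd "$work/HostRepo"
git -c protocol.file.allow=always submodule update --remote

# The host only stores a commit pointer, so the move has to be committed.
git add externalSubModule
git commit -q -m "Update externalSubModule"
git submodule status
```

The last commit matters: without it, the host still records the old submodule commit and other clones will not see the update.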

 

After cloning the host repository for the first time, the submodules need to be initialized separately:

> git submodule init

> git submodule update

Alternatively, pass the option --recursive to the clone command:

> git clone --recursive https://github.com/marpet/hostrepo.git
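Both cloning variants can be compared side by side. The sketch below is an assumption-laden demo: it builds a local host repository with one submodule, then clones it twice. The clone names CloneA and CloneB, the file lib.txt, the demo identity and protocol.file.allow=always (only needed for local-path URLs) are hypothetical.

```shell
#!/bin/sh
# Sketch: clone a host repository that contains a submodule, two ways.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local stand-ins for the external and the host repository.
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
git -c protocol.file.allow=always submodule add "$work/externalSubModule" externalSubModule
git commit -q -m "Add externalSubModule as a submodule"

# Variant 1: plain clone, then init and update the submodule separately.
git clone -q "$work/HostRepo" "$work/CloneA"
cd "$work/CloneA"
ls -A externalSubModule        # still empty at this point
git submodule init
git -c protocol.file.allow=always submodule update

# Variant 2: clone and populate the submodules in one step.
git -c protocol.file.allow=always clone -q --recursive "$work/HostRepo" "$work/CloneB"

ls "$work/CloneA/externalSubModule" "$work/CloneB/externalSubModule"
```

After either variant the submodule folder holds the files of the commit recorded by the host.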

 

Subtrees

Some Links:

https://medium.com/@v/git-subtrees-a-tutorial-6ff568381844#.nznf1580u

https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.iqux14wg8

http://blog.nwcadence.com/git-subtrees/

https://makingsoftware.wordpress.com/2013/02/16/using-git-subtrees-for-repository-separation/


Cons:
  • Different merge strategy
  • Contributing back to the foreign repository is more complicated. With git subtree push, either push to a branch, create a pull request based on the new branch and merge it, or push to the master directly
  • Changes need to be pulled in manually from the remote

Pros:
  • No change to the default workflow, which makes life easier for developers; new commands are only needed when pushing to or pulling from the external repo
  • No special handling when cloning, branching or forking
  • Contents of the foreign repository can be changed locally
  • Merge conflicts are easier to handle
  • No additional .gitmodules file

Do I have to pull in changes manually from the foreign repo, or only when I've switched to a new branch?

→ Yes, in every case. Changes need to be pulled in explicitly.

Changes from the foreign repo should be merged with the option --squash. This condenses the foreign commits into a single commit.

Workflow

Assume you have one repository that serves as the host and an external repository that is to be included in it.

Host name: HostRepo

External name: externalSubModule

> cd HostRepo

> git remote add externalSubModuleRepo https://github.com/marpet/externalSubModule.git

From now on, externalSubModuleRepo is the name used to refer to this repository.

> git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

The option --prefix specifies the name of the new folder in the host repository. It is followed by the repository and the branch.

> git push

The project is now ready to be cloned or forked.
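The subtree setup can be sketched with local repositories as well. This is only an illustration under stated assumptions: the remote points to a local path instead of the GitHub URL, the files README and lib.txt and the demo identity are hypothetical, the final git push is omitted because the demo host has no remote of its own, and git subtree must be installed (it ships with most git distributions).

```shell
#!/bin/sh
# Sketch: include an external repository as a subtree.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local stand-in for the external repository (hypothetical content).
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

# Host repository; git subtree add needs at least one existing commit.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
echo "host project" > README
git add README
git commit -q -m "Initial host commit"

# Register the external repository under a short name, then add it.
git remote add externalSubModuleRepo "$work/externalSubModule"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

ls externalSubModule   # the external files are ordinary files of the host
git log --oneline      # squashed import plus the merge into the host
```

Unlike a submodule, the imported files are tracked directly by the host, so no .gitmodules file is created.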

> git subtree pull --prefix=externalSubModule externalSubModuleRepo master --squash

Or

> git pull -s subtree externalSubModuleRepo master --squash

The commands above fetch changes from the external repository into the host.
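A hedged sketch of the pull step, using the git subtree pull variant only. The local repository paths, the files README, lib.txt and feature.txt, the demo identity and the use of GIT_MERGE_AUTOEDIT=no (to keep the merge from opening an editor) are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: pull new external commits into an existing subtree.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # accept the generated merge message

work=$(mktemp -d)

# Local stand-in for the external repository (hypothetical content).
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

# Host repository with the subtree already added.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
echo "host project" > README
git add README
git commit -q -m "Initial host commit"
git remote add externalSubModuleRepo "$work/externalSubModule"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# A new commit appears in the external repository.
cd "$work/externalSubModule"
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Pull it into the host; --squash condenses the new external commits.
cd "$work/HostRepo"
git subtree pull --prefix=externalSubModule externalSubModuleRepo master --squash

ls externalSubModule
```

After the pull, the new external file is part of the host's working tree and history.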

After changing the external code in the host repository:

> git push

This pushes the changes only to the host repository.

> git subtree push --prefix=externalSubModule externalSubModuleRepo master

This pushes the changes to the external repository. Pushing to the remote can be tricky: it sometimes fails and leaves the user wondering why. It might therefore be better to make such changes only in the remote repository directly.
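The push direction can be sketched as follows. Note the deviations from the commands above, all of which are assumptions of this demo: the external repository is a local non-bare stand-in, so the sketch pushes to a separate branch (hypothetically named host-changes) instead of master, which git would refuse while master is checked out there. This matches the "push to a branch, then create a pull request" route. File names and the demo identity are hypothetical too.

```shell
#!/bin/sh
# Sketch: push changes made in the host back to the external repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Local stand-in for the external repository (hypothetical content).
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"

# Host repository with the subtree already added.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
echo "host project" > README
git add README
git commit -q -m "Initial host commit"
git remote add externalSubModuleRepo "$work/externalSubModule"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# Change the external code inside the host and commit as usual.
echo "fix made in the host" >> externalSubModule/lib.txt
git commit -q -am "Fix in external code"

# Split out the commits touching the prefix and push them to a new branch
# of the external repository (assumed branch name: host-changes).
git subtree push --prefix=externalSubModule externalSubModuleRepo host-changes

git -C "$work/externalSubModule" log --oneline host-changes
```

The external master stays untouched; the pushed branch can then be reviewed and merged there.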


IntelliJ supports neither subtrees nor submodules. Pulling from and pushing to the external repo needs to be done on the command line.


To switch between branches/tags, the directory first needs to be removed from the host and then added again with git subtree add:

> git rm -r externalSubModule

> git commit

> git subtree add --prefix=externalSubModule externalSubModuleRepo <branch>
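The switch can be tried end to end with the following sketch. It assumes a local stand-in for the external repository with a second branch, hypothetically named release; the file names and the demo identity are likewise assumptions.

```shell
#!/bin/sh
# Sketch: switch a subtree from master to another branch of the external repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# External stand-in with a master branch and a "release" branch.
git init -q "$work/externalSubModule"
cd "$work/externalSubModule"
git symbolic-ref HEAD refs/heads/master
echo "external code" > lib.txt
git add lib.txt
git commit -q -m "Initial external commit"
git checkout -q -b release
echo "release only" > release.txt
git add release.txt
git commit -q -m "Release branch commit"
git checkout -q master

# Host repository with the subtree taken from master.
git init -q "$work/HostRepo"
cd "$work/HostRepo"
git symbolic-ref HEAD refs/heads/master
echo "host project" > README
git add README
git commit -q -m "Initial host commit"
git remote add externalSubModuleRepo "$work/externalSubModule"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# Remove the directory (-r is required for directories) and commit.
git rm -r -q externalSubModule
git commit -q -m "Remove externalSubModule before switching branch"

# Add it again from the other branch.
git subtree add --prefix=externalSubModule externalSubModuleRepo release --squash

ls externalSubModule
```

Local modifications inside the subtree that were never pushed back are lost from the working tree by this procedure, although they remain in the host's history.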


To split an existing directory into a separate repository while keeping it in the host repository, see the section "Turning a directory into a subtree" at https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.jr3n7c1fq.

Conclusion

In conclusion, subtrees are the better choice. The daily workflow does not need to change; special commands are only necessary when updates from the external repository are to be included. An occasional user of the source code does not need to know anything about the subtree. In addition, in contrast to submodules, the external code can be adapted to the needs of the host project without changing the code in the external repository.

Links

Subtree Commands for Use-cases