Git Submodules/Subtrees
Introduction
This page introduces the git commands submodule and subtree. Both allow a remote repository to be included in a so-called host repository. Both commands are briefly introduced and compared below, followed by a conclusion.
Submodules
Links about how to work with submodules
https://github.com/blog/2104-working-with-submodules
https://git-scm.com/book/en/v2/Git-Tools-Submodules
Submodules allow you to include other git repositories in your project.
Cons | Pros |
---|---|
Makes branching, cloning and forking cumbersome | No duplication of source code |
Download of source code is not straightforward | Can be used if Maven dependencies don't work |
It's often said that submodules are not good practice | Changes need to be pulled in manually |
Workflow
Assume you have one repository that serves as the host and an external repository to include in the host repository.
Host name: HostRepo
External name: externalSubModule
> cd HostRepo
> git submodule add https://github.com/marpet/externalSubModule.git externalSubModule
This clones the external project into a directory named externalSubModule in the host project.
Depending on the git version, the created folder might be empty. If so, the following command needs to be executed.
> git submodule update --init --recursive
This initializes the externalSubModule folder. Newer versions of git do this automatically. You will also see a .gitmodules file in the parent, which stores the necessary information about the submodules.
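The steps so far can be tried without network access. The following is a minimal sketch, assuming local throwaway repositories in a temporary directory instead of the GitHub URL; the helper new_repo, the file names and the demo identity are made up for illustration. The setting protocol.file.allow=always is only there because newer git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt       # stand-in for the external repository
new_repo "$work/HostRepo" README.txt    # the host repository

cd "$work/HostRepo"
# With an https URL the plain "git submodule add <url> <dir>" is enough.
git -c protocol.file.allow=always submodule add "$work/upstream" externalSubModule
git commit -q -m "Add externalSubModule as a submodule"

cat .gitmodules        # path and URL of every submodule
git submodule status   # commit of the submodule that the host refers to
```

The host repository stores only the .gitmodules entry and a reference to one commit of the submodule, not the submodule's files.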
> cd externalSubModule
> git add -A; git commit -m "Message"; git push
Changes to the submodule can be pushed from within the submodule using standard git commands. Adding and committing must be done within the submodule, but the push can be done from the parent:
> git push --recurse-submodules=on-demand
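Unfortunately, this does not seem to work as hoped: changes are still not committed after the command. The option only pushes commits that already exist in the submodule; it never creates commits itself.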
To pull in changes from the submodule, it is necessary to step into its directory and run git fetch/git pull as usual.
It is easier to use the following in the parent project:
> git submodule update --remote
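As a rough illustration of the update step, the sketch below again uses local throwaway repositories (the helper new_repo, the file names and the demo identity are assumptions, not part of the original setup). It also shows a follow-up step: after the update, the host sees the submodule as modified until the new commit reference is committed in the host.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt
new_repo "$work/HostRepo" README.txt
cd "$work/HostRepo"
# protocol.file.allow=always is only needed for the local-path demo.
git -c protocol.file.allow=always submodule add "$work/upstream" externalSubModule
git commit -q -m "Add externalSubModule as a submodule"

# A new commit appears in the external repository.
echo "fix" > "$work/upstream/fix.txt"
git -C "$work/upstream" add fix.txt
git -C "$work/upstream" commit -q -m "Add fix.txt"

# Fetch it into the submodule from the parent project.
git -c protocol.file.allow=always submodule update --remote

# The host keeps referring to the old commit until the new one is committed.
git status --short
git add externalSubModule
git commit -q -m "Update externalSubModule to latest upstream"
```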
After cloning the host repository for the first time, the submodules need to be initialized separately.
> git submodule init
> git submodule update
Alternatively, pass the option --recursive to the clone command:
> git clone --recursive https://github.com/marpet/hostrepo.git
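Both ways of cloning can be compared side by side. This is a sketch under the same assumptions as before: local throwaway repositories, a made-up helper new_repo, a demo identity, and protocol.file.allow=always only because the submodule URL is a local path here.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt
new_repo "$work/HostRepo" README.txt
git -C "$work/HostRepo" -c protocol.file.allow=always \
    submodule add "$work/upstream" externalSubModule
git -C "$work/HostRepo" commit -q -m "Add externalSubModule as a submodule"

cd "$work"

# Plain clone: the submodule directory exists but is still empty.
git clone -q HostRepo plain
ls -A plain/externalSubModule
git -C plain submodule init
git -C plain -c protocol.file.allow=always submodule update

# Recursive clone: submodules are initialized and checked out in one step.
git -c protocol.file.allow=always clone -q --recursive HostRepo recursive
```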
Subtrees
Some Links:
https://medium.com/@v/git-subtrees-a-tutorial-6ff568381844#.nznf1580u
https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.iqux14wg8
http://blog.nwcadence.com/git-subtrees/
https://makingsoftware.wordpress.com/2013/02/16/using-git-subtrees-for-repository-separation/
Cons | Pros |
---|---|
Different merge strategy | No change to the default workflow; life is easier for developers |
Contributing back to the foreign repository is more complicated: first git subtree push to a branch, or push to the master directly | No special handling on cloning, branching, forking |
 | Contents of foreign repository can be changed locally |
 | Merge conflicts are easier to handle |
 | No additional .gitmodules file |
Changes need to be pulled in manually from remote | |
Do I have to pull in changes manually from foreign repo, or only when I've switched to a new branch?
→ YES. Changes need to be pulled in explicitly.
Merging changes from the foreign repo should be done with the option --squash. This compacts the foreign commits into a single commit in the host history.
Workflow
Assume you have one repository that serves as the host and an external repository to include in the host repository.
Host name: HostRepo
External name: externalSubModule
> cd HostRepo
> git remote add externalSubModuleRepo https://github.com/marpet/externalSubModule.git
From now on, externalSubModuleRepo is the name used to refer to this repository.
> git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash
The option --prefix specifies the name of the new folder in the host repository. It is followed by the repository and the branch.
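The two commands above can be tried locally. The sketch below assumes that git subtree is installed (it ships as a contrib command with most git packages) and uses local throwaway repositories; the helper new_repo, the file names and the demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt       # stand-in for the external repository
new_repo "$work/HostRepo" README.txt    # host needs at least one commit

cd "$work/HostRepo"
git remote add externalSubModuleRepo "$work/upstream"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

git log --oneline      # includes the squashed commit and the commit adding the folder
ls externalSubModule   # ordinary tracked files of the host repository
```

Unlike with submodules, the external files become regular content of the host repository, so a later clone needs no extra steps.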
> git push
The project is now ready to be cloned or forked.
> git subtree pull --prefix=externalSubModule externalSubModuleRepo master --squash
Or
> git pull -s subtree externalSubModuleRepo master --squash
The above commands bring changes from the external repository into the host.
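A possible end-to-end run of the pull step, again as a sketch with local throwaway repositories (new_repo, the file names and the demo identity are assumptions). GIT_MERGE_AUTOEDIT=no is set only so that the merge does not open an editor.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # keep the merge non-interactive

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt
new_repo "$work/HostRepo" README.txt
cd "$work/HostRepo"
git remote add externalSubModuleRepo "$work/upstream"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# A new commit appears in the external repository.
echo "fix" > "$work/upstream/fix.txt"
git -C "$work/upstream" add fix.txt
git -C "$work/upstream" commit -q -m "Add fix.txt"

# Bring it into the host; nothing happens until this is run explicitly.
git subtree pull --prefix=externalSubModule externalSubModuleRepo master --squash

ls externalSubModule   # now also contains fix.txt
```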
After changing the external code in the host repository:
> git push
This pushes the changes only to the host repository.
> git subtree push --prefix=externalSubModule externalSubModuleRepo master
This pushes the changes to the external repository. Pushing to the remote can be tricky: it sometimes fails and leaves the user wondering why. It might therefore be better to change the external code only in the external repository itself.
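One way to make contributing back less fragile is to push to a separate branch of the external repository rather than to master, as mentioned in the table above. The following sketch assumes local throwaway repositories and a hypothetical branch name host-changes; new_repo, the file names and the demo identity are made up as well.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

new_repo "$work/upstream" lib.txt
new_repo "$work/HostRepo" README.txt
cd "$work/HostRepo"
git remote add externalSubModuleRepo "$work/upstream"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# Change the external code inside the host and commit it there.
echo "patched in host" >> externalSubModule/lib.txt
git commit -q -am "Patch lib.txt from the host"

# Push only the subtree part of the history to a branch of the external
# repository (host-changes is a hypothetical name), to be merged there later.
git subtree push --prefix=externalSubModule externalSubModuleRepo host-changes

git -C "$work/upstream" log --oneline host-changes
```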
Switching between branches/tags requires removing the directory from the host first and then adding it again with git subtree add:
> git rm -r externalSubModule
> git commit
> git subtree add --prefix=externalSubModule externalSubModuleRepo <branch>
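The switch can be rehearsed as follows. This is only a sketch: the repositories are local throwaway ones, release is a hypothetical branch name, and new_repo, the file names and the demo identity are made up. It uses git rm -r because the subtree is a directory, and --squash in line with the recommendation above.

```shell
#!/bin/sh
set -eu

# Placeholder identity so the demo commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# new_repo DIR FILE: create a repository on branch master with one committed file.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "content of $2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "Add $2"
}

# External repository with a second branch next to master.
new_repo "$work/upstream" lib.txt
git -C "$work/upstream" checkout -q -b release
echo "release only" > "$work/upstream/release.txt"
git -C "$work/upstream" add release.txt
git -C "$work/upstream" commit -q -m "Add release.txt"
git -C "$work/upstream" checkout -q master

new_repo "$work/HostRepo" README.txt
cd "$work/HostRepo"
git remote add externalSubModuleRepo "$work/upstream"
git subtree add --prefix=externalSubModule externalSubModuleRepo master --squash

# Switch the subtree to the release branch: remove, commit, add again.
git rm -r -q externalSubModule
git commit -q -m "Remove externalSubModule before switching branch"
git subtree add --prefix=externalSubModule externalSubModuleRepo release --squash

ls externalSubModule   # lib.txt and release.txt
```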
To split an existing directory into a separate repository while keeping it in the master, follow this link: https://medium.com/@porteneuve/mastering-git-subtrees-943d29a798ec#.jr3n7c1fq. See the section "Turning a directory into a subtree".
Conclusion
In conclusion, subtrees are the better choice. The daily workflow does not need to change; special commands are only necessary when updates from the external repository are to be included. An occasional user of the source code does not need to know anything about the subtree. Additionally, in contrast to submodules, it is possible to adapt the external code to the needs of the host project without changing the code in the external repository.
Links
Subtree Commands for Use-cases