Retention Blog: August 2013

Having used SVN and Clearcase extensively before coming around to using Git, I feel that I have a good background to support my decision to choose Git over lessor VCS's. For the sake of comparison, I will directly compare Git to SVN, but in general, points made in this post can be made for any non-distributed vs distributed VCS's.

Git is awesome because:

It is distributed.

When you work in git, the first thing you typically do is copy someone else's existing git repository with "git clone." This is opposed to interacting with the location of the central repository over the lifetime of your development. This is one big bullet point that infers many concrete benefits:

Better collaboration: Whenever you do something in SVN, everybody knows about it. In Git, when you clone a repository, you are the project owner of your own local copy and its private until you want to make it public. It doesn't matter how many times people clone the central repository, you will never know the difference. This design has led to the great success of open-source projects. Communities of developers feel at ease in forking a repository and doing their own work without ever risking breaking the trunk code base.
No noise in central: An extension to the previous point, in which project owners in SVN know who their collaborators are. To create a branch in SVN, you create a new directory in which you copy all the code to, and this directory lives on the central repository. In Git, you create branches on your own repository, a repository in which you own and is private. No more problems with developers leaving the company or forgetting to clean up their branches, leaving behind a mess in the SVN repository.
Access control: Both repository systems allow you to control read and write permissions. But, developers who don't have write access in SVN, can't use version control at all. Developers using Git can keep their changes under version control in their own local repository for a later date when they have permission to publish.
Work offline: Because you have your own copy of the repo, you can work offline and as a result can work well and fast with slow/no network access.
Backups: Every time you clone the repo, you are creating a backup, in case the central repo fails for some reason.
Flexible work flow: Git is extremely flexible. In Git, there is no "central" repository. Every Git repository is the same, and every repository can interact with each other in the same ways. Because of this loose coupling, there is a lot of flexibility in the collaboration of teams. An example here:

In this example the developers request the integration manager to pull each of their requests. The manager handles the merging, before pushing into the "central" repository. The integration manager interacts with the developer's repos the same way it interacts with the "central" repository.The point of all these being that you can manage any workflow independent of a central repository.

2. Branching is Easier

I haven't really seen this problem when I used SVN, but I have been told by others- that people don't like making branches in SVN to the point that they just don't do it. This is bad. ~~Committing to trunk is bad.~~ After writing this I talked with a colleague who told me about continuous integration. Though I have heard this term before, I didn't associate that as a opposing force to "committing to the trunk is bad." So I'm retracting this statement, and I will do a future post about continuous integration. There is still the point that developers should be able to create branches easily for experimentation or whatnot. With Git, creating, switching to and deleting branches is easy, so developers won't think twice about creating an experimental branch, or one to start developing a new feature on.

People don't branch in SVN: There are few reasons why people don't like branching in SVN. One is you usually have to type in a long URL where the SVN repo is located... twice. Once for copying from and once for copying to. This operation takes a few seconds, as you are copying the entire branch over. When its all over, you have a directory on the SVN server that everybody can see with your name spelled wrong.
Branching in Git at two levels:

The first level of branching in Git people don't really realize, that is "git clone". Whenever you clone a repo, you are copying the code base from one place to another, effectively creating a branch off the original code base. In fact, the command is similar in speed of execution and annoying-ness as creating a branch in SVN. The difference is that you are required to clone, there is no other option. This forces the workflow of separating a developer's workspace from the trunk.
The second level of branching is with the "git branch" command. To create a branch in git you do. "git branch <branch_name>". This is very fast and lightweight command. More so, switching branches is just as easy, just do: "git checkout <branch_name>". This additional second level of branching not present in SVN give developers flexibility to separate their features, bugs and experiments at their discretion. The added complexity is managed by them, not the central server, and pushing changes to the server is selective to only the branches of their choosing.

3. Faster and Smaller

I won't go too much into detail about this, and you can look into the specific implementations of Git and SVN to understand why Git is faster and smaller. But, its a good thing.

The Lightweight Directory Access Protocol is a protocol that sits in the application layer¹. The protocol is used for accessing and maintaining internals of distributed directory services- which are covered by the x.500 standard series³.

LDAP derived from DAP, that used to run only on a deployed OSI² network. As the TCP/IP network stack took over the internet, LDAP rose up as the TCP/IP alternate to DAP. The L ("lightweight") comes from the significant less bandwidth required for transactions.

To start a LDAP session, a client connects to a LDAP server (a Directory System Agent (DSA) in x.500 terms). Client will make requests to server, and server will respond to requests - all asynchronously. All information is sent using Basic Encoding Rules (BER)⁴. After initial connection the user must send a request to BIND, that will authenticate the user.

Directory Structure:
An example of an entry stored in LDAP:

dn: cn=John Doe,dc=example,dc=com
cn: John Doe
givenName: John
sn: Doe
telephoneNumber: +1 888 555 6789
telephoneNumber: +1 888 555 1232
mail: john@example.com
manager: cn=Barbara Doe,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top

"dn" is the distinguish name that is a composite of cn "common name" and one or more dc's "domain component." cn translates to a file name in a file system, and the collection of dn's would be the file path. It works most specific first then up. So above would translate in linux to com/example/John Doe.

Every line above is an attribute, which has the following syntax:
<key>:<value>
A class in LDAP defines a set of attributes that an entry can define. Classes can inherit from other classes and so a subclass will inherit all of its parents attributes by definition. (Normal OO stuff). The objectClass attribute defines classes that this entry uses (that then define what attributes it can set). "top" is the abstract parent class of all other classes. (either directly or indirectly)

Operations:
The data for Operation requests sent by the client are in similar format to how entries are represented on the LDAP servers. For example, the following data is for an ADD operation:

dn: uid=user,ou=people,dc=example,dc=com
changetype: add
objectClass: top
objectClass: person
uid: user
sn: last-name
cn: common-name
userPassword: password

In the above example, uid=user,ou=people,dc=example,dc=com must not exist, and ou=people,dc=example,dc=com must exist.

The complete list for operations is as follows:


StartTLS — use the LDAPv3 Transport Layer Security (TLS) extension for a secure connection
Bind — authenticate and specify LDAP protocol version
Search — search for and/or retrieve directory entries
Compare — test if a named entry contains a given attribute value
Add a new entry
Delete an entry
Modify an entry
Modify Distinguished Name (DN) — move or rename an entry
Abandon — abort a previous request
Extended Operation — generic operation used to define other operations
Unbind — close the connection (not the inverse of Bind)

¹Application Layer -> The TCP/IP networking layer that is above transport.

²OSI -> Open System Interconnection. Internet uses TCP/IP. There isn't too many actual implementations of OSI, but it is still used as a model for learning and debugging.
³Series of computer networking standards for directory services created by ITU-T(ITU Telecommunications Standardization Sector)
⁴AKA x.690 ->Format for encoding ASN.1 data structures.

Retention Blog

Wednesday, August 21, 2013

Why Git is Awesome

Thursday, August 15, 2013

An Introduction to LDAP