The structure contained within a source control repository looks very much like a file system: it contains what is (at least conceptually) a hierarchy of directories and files. The repository extends this view by also recording history. Each item in the repository is given a version number when it is created, and every modification records a new version of that item. Versions are added rather than overwritten, so existing versions are immutable (barring special-purpose administrative tools). In addition to the item's content, each version carries metadata such as a timestamp and the user who submitted it.
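The additive, immutable history described above can be modelled in a few lines. The following is a minimal sketch, not how any particular system stores data: the `Item`, `Version` and `record` names are hypothetical, and it assumes simple per-item version numbers starting at 1. Freezing the `Version` dataclass stands in for the rule that a recorded version is never changed.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One recorded version of an item; frozen, so it cannot be changed once created."""
    number: int
    content: str
    author: str
    timestamp: datetime


@dataclass
class Item:
    """A hypothetical repository item (e.g. a file) whose history only ever grows."""
    path: str
    versions: list[Version] = field(default_factory=list)

    def record(self, content: str, author: str) -> Version:
        # A modification appends a new version; earlier versions are never touched.
        version = Version(
            number=len(self.versions) + 1,
            content=content,
            author=author,
            timestamp=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    def latest(self) -> Version:
        return self.versions[-1]


# Usage: creation gives version 1, and each modification adds another version.
readme = Item("docs/README.txt")
readme.record("first draft", author="alice")
readme.record("second draft", author="bob")
print([v.number for v in readme.versions])  # [1, 2]
print(readme.latest().author)               # bob

try:
    readme.versions[0].content = "rewritten history"
except FrozenInstanceError:
    print("existing versions are immutable")
```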
This version number is central to the interaction between the repository and a user's working copy. By comparing the version of an item in the working copy with the latest version in the repository, the source control system can determine which items have newer versions available. To support this, most source control systems store metadata alongside the working copy that records the version of each item it contains, along with other relevant data.
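The comparison itself is simple once the working-copy metadata is available. This sketch assumes that metadata can be reduced to a mapping from path to the version that was retrieved; the dictionaries and the `out_of_date` function are hypothetical illustrations, not any real tool's format or API.

```python
# Hypothetical snapshot of the latest version of each item in the repository.
repository_versions = {
    "src/main.c": 7,
    "src/util.c": 3,
    "docs/README.txt": 2,
}

# Hypothetical working-copy metadata: the version each local file was retrieved at.
working_copy_versions = {
    "src/main.c": 5,
    "src/util.c": 3,
    "docs/README.txt": 2,
}


def out_of_date(working: dict[str, int], repository: dict[str, int]) -> list[str]:
    """Return the paths whose repository version is newer than the working copy's."""
    return sorted(
        path
        for path, local_version in working.items()
        # An item missing from the repository is not treated as newer.
        if repository.get(path, local_version) > local_version
    )


print(out_of_date(working_copy_versions, repository_versions))  # ['src/main.c']
```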
In addition to versions of individual items, many source control systems also include the concept of a changeset. This is a grouping construct that contains a set of versions. A changeset is created when items are checked in and includes all the versions created by that checkin operation, which makes it useful for identifying related sets of changes. Each changeset has a changeset number, which is generally global across the repository. In some source control systems the changeset number is the same as the version number, which implies that the version numbers of any single item are not necessarily consecutive. In others the two are separate numbers, and the system records the relationship between changesets and item versions. This is an implementation detail and is not particularly significant to end users, provided they understand which model their source control system uses.
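To make the first model concrete, here is a minimal sketch of a repository in which the global changeset number doubles as the version number of every item it touches. The `Changeset` and `Repository` classes and the `check_in` method are hypothetical; the point is only to show why an item's own version numbers end up with gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    number: int                # global, repository-wide changeset number
    versions: dict[str, int]   # path -> version created by this checkin


class Repository:
    """Sketch of the model where the changeset number is also the version number."""

    def __init__(self) -> None:
        self.changesets: list[Changeset] = []
        self.history: dict[str, list[int]] = {}  # path -> that item's version numbers

    def check_in(self, paths: list[str]) -> Changeset:
        number = len(self.changesets) + 1
        # Every item in this checkin receives the changeset number as its new version.
        changeset = Changeset(number, {path: number for path in paths})
        for path in paths:
            self.history.setdefault(path, []).append(number)
        self.changesets.append(changeset)
        return changeset


repo = Repository()
repo.check_in(["a.txt", "b.txt"])   # changeset 1
repo.check_in(["b.txt"])            # changeset 2
repo.check_in(["a.txt"])            # changeset 3
print(repo.history["a.txt"])        # [1, 3]: increasing, but not consecutive
print(repo.changesets[0].versions)  # {'a.txt': 1, 'b.txt': 1}
```

In the alternative model each item would keep its own counter (1, 2, 3, ...), and the changeset would instead record which per-item version each path received.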
Changesets are generally related to the concept of atomic checkin. This is a desirable behaviour that enforces transaction-like semantics on the source control system: a checkin operation either succeeds completely or does not occur at all, regardless of the number of files involved. This ensures that the repository always remains in a consistent state and that a set of changes cannot be partially applied (assuming the user has selected all the relevant files for checkin).
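One common way to get all-or-nothing behaviour is to validate every change first and then publish the complete new state in one step. The sketch below is a single-process illustration under that assumption, with hypothetical names (`Repository`, `check_in`, `CheckinRejected`) and a hypothetical rejection rule (a file is rejected if the working copy was based on an older version). Real systems also need locking or database transactions to handle concurrent users.

```python
class CheckinRejected(Exception):
    """Raised when a checkin cannot be applied in full."""


class Repository:
    """Single-process sketch of atomic checkin: all changes are applied, or none are."""

    def __init__(self) -> None:
        # path -> (version, content); replaced as a whole on each successful checkin
        self._state: dict[str, tuple[int, str]] = {}

    def version(self, path: str) -> int:
        # Items not yet in the repository are treated as version 0.
        return self._state.get(path, (0, ""))[0]

    def check_in(self, changes: dict[str, tuple[int, str]]) -> None:
        """changes maps path -> (version the working copy was based on, new content)."""
        # Phase 1: validate every change; raising here leaves the repository untouched.
        for path, (base_version, _) in changes.items():
            current = self.version(path)
            if base_version != current:
                raise CheckinRejected(
                    f"{path}: working copy has version {base_version}, repository has {current}"
                )
        # Phase 2: build the complete new state on the side...
        new_state = dict(self._state)
        for path, (base_version, content) in changes.items():
            new_state[path] = (base_version + 1, content)
        # ...then publish it with a single assignment, so readers of self._state
        # see either the old state or the new one, never a mixture.
        self._state = new_state


repo = Repository()
repo.check_in({"a.txt": (0, "A1"), "b.txt": (0, "B1")})  # both become version 1

try:
    # b.txt is stale (based on version 0), so the whole checkin is rejected
    # and a.txt is not updated either.
    repo.check_in({"a.txt": (1, "A2"), "b.txt": (0, "B2")})
except CheckinRejected as err:
    print("rejected:", err)

print(repo.version("a.txt"), repo.version("b.txt"))  # 1 1
```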
Internally, most source control systems do not store the entire file for every version. Instead they may store only the changes (often called a delta or diff) between the new version and the previous one. This generally means significantly less data needs to be recorded, since in most cases the differences are much smaller than the whole file. The technique is not always applicable: binary (that is, non-text) files are harder to handle this way, and many source control systems simply store the entire file for every version of such files. This behaviour is an implementation detail of the source control system and is generally not of concern to end users. However, knowing how the repository handles binary files matters when working with large or numerous binary files, since storing each version in full can quickly consume the storage space available to the repository.
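The storage strategy can be sketched with Python's standard `difflib` module. This is an illustration under simplifying assumptions rather than any real system's format: text files are lists of lines, the first version of a text file is stored whole and each later one as a forward delta against its predecessor, and binary files are always stored whole. The delta encoding and the helper names (`make_delta`, `apply_delta`, `store_version`, `reconstruct`) are hypothetical. Real systems differ in these choices; for example, some keep the newest version whole and store deltas backwards so the latest version is cheapest to retrieve.

```python
import difflib

Delta = list[tuple]  # hypothetical encoding: ("copy", start, end) or ("insert", lines)


def make_delta(old: list[str], new: list[str]) -> Delta:
    """Describe `new` relative to `old`: reuse ranges of old lines, store only new lines."""
    delta: Delta = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # unchanged: refer to old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # changed or added lines stored verbatim
        # "delete": nothing to record; those old lines are simply not copied
    return delta


def apply_delta(old: list[str], delta: Delta) -> list[str]:
    new: list[str] = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


def store_version(history: list, content, is_binary: bool) -> None:
    """Text: first version in full, later ones as deltas. Binary: always in full."""
    if is_binary or not history:
        history.append(("full", content))
    else:
        previous = reconstruct(history, len(history) - 1)
        history.append(("delta", make_delta(previous, content)))


def reconstruct(history: list, index: int):
    """Rebuild version `index` (0-based) from the nearest full copy plus later deltas."""
    start = index
    while history[start][0] != "full":
        start -= 1
    content = history[start][1]
    for _, delta in history[start + 1 : index + 1]:
        content = apply_delta(content, delta)
    return content


# Usage: a text file stored as one full copy plus one small delta.
text_history: list = []
v1 = ["line one\n", "line two\n", "line three\n"]
v2 = ["line one\n", "line 2\n", "line three\n", "line four\n"]
store_version(text_history, v1, is_binary=False)
store_version(text_history, v2, is_binary=False)
print(text_history[1][0])                   # delta
print(reconstruct(text_history, 1) == v2)   # True

# A binary file: every version is kept whole.
binary_history: list = []
store_version(binary_history, b"\x89PNG...v1", is_binary=True)
store_version(binary_history, b"\x89PNG...v2", is_binary=True)
print([kind for kind, _ in binary_history])  # ['full', 'full']
```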