MongoDB stores objects in a binary format called BSON. BinData is the BSON data type for a binary byte array. However, a single MongoDB object is limited in size: 16MB in current releases, 4MB in early ones. To store anything larger, files are “chunked” into multiple objects that each fit under this limit. Chunking has the added advantage of letting us efficiently retrieve a specific range of a file without reading the whole file.
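To illustrate why chunking makes range reads cheap, here is a minimal sketch of the arithmetic involved. The function name `chunks_for_range` and the 256 KB chunk size are assumptions made for the example, not part of any MongoDB driver API.

```python
def chunks_for_range(start: int, end: int, chunk_size: int) -> range:
    """Return the indices of the chunks that cover bytes [start, end).

    Hypothetical helper for illustration: chunk i is assumed to hold
    bytes [i * chunk_size, (i + 1) * chunk_size) of the file.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    if start == end:
        return range(0)  # empty range, no chunks needed
    first = start // chunk_size
    last = (end - 1) // chunk_size  # index of the chunk holding the final byte
    return range(first, last + 1)


# Assumed chunk size of 256 KB for the example.
CHUNK_SIZE = 256 * 1024
needed = list(chunks_for_range(600_000, 700_000, CHUNK_SIZE))
```

Only the chunks in `needed` have to be fetched to serve that byte range; every other chunk of the file can stay untouched.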
While we could write our own chunking code, a standard format for this chunking already exists, called GridFS. GridFS support is included in many MongoDB drivers and also in the mongofiles command line utility.
GridFS is a storage specification for large objects in MongoDB. It works by splitting a large object into small chunks, usually 256KB in size (current drivers default to 255KB). Each chunk is stored as a separate document in a chunks collection. Metadata about the file, including the filename, content type, and any optional information needed by the developer, is stored as a document in a files collection.
So for any given file stored using GridFS, there will exist one document in the files collection and one or more documents in the chunks collection.
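The relationship between the two collections can be sketched in plain Python, without a database. This is only an illustration of the layout: the helper names `split_into_documents` and `reassemble` are hypothetical, a UUID string stands in for the ObjectId a real driver would generate, and the 256KB chunk size is an assumed default. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `uploadDate`, `filename`, `contentType`) follow the GridFS specification.

```python
import uuid
from datetime import datetime, timezone

CHUNK_SIZE = 256 * 1024  # assumed chunk size for this sketch


def split_into_documents(data: bytes, filename: str, content_type: str,
                         chunk_size: int = CHUNK_SIZE):
    """Build one 'files' document and a list of 'chunks' documents."""
    file_id = uuid.uuid4().hex  # stand-in for a driver-generated ObjectId
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "contentType": content_type,
        "length": len(data),        # total size of the file in bytes
        "chunkSize": chunk_size,    # size of every chunk except possibly the last
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        {
            "files_id": file_id,    # links the chunk back to its files document
            "n": n,                 # zero-based position of the chunk in the file
            "data": data[offset:offset + chunk_size],
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs) -> bytes:
    """Rebuild the original bytes by concatenating chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

For example, a 600,000-byte payload would produce one files document and three chunk documents. In a real deployment a driver's GridFS API performs this split and writes the documents to the files and chunks collections for you.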