The repository pattern has caused me a lot of grief and frustration over the years. I've seen so many terrible examples of this pattern passed off as gospel truth, and I feel like a quick Google search leads to a bunch of Stack Overflow answers that really just serve to confuse me further. I've written terrible things that I named repositories that are not repositories, and I feel bad for it. Consider this my recompense, an attempt to make the world a slightly better place.
When searching for a definition of the repository pattern I immediately went to Wikipedia. As crazy as it is, Wikipedia has no entry for it. I vaguely recall there being one a few months ago, and I feel like its disappearance is a testament to its controversial nature. Instead I decided to go to the source, Eric Evans's book Domain Driven Design. Turns out that he's got a long, complex definition that's summarized here: http://mikehadlow.blogspot.com/2009/01/eric-evans-on-repositories.html
OK, that's a huge bit of text that's hard to wrap your head around. I like Fowler's short definition better:
"Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects." http://martinfowler.com/eaaCatalog/repository.html
MSDN has another huge breakdown of the pattern if you really want to get lost in diagrams: http://msdn.microsoft.com/en-us/library/ff649690.aspx
I'll also throw my hat into the ring of definitions and summarize it in a way that most makes sense to me:
"A repository is a class that helps you work with a collection of domain objects." - Me.
I feel like those links and definitions should be required reading before anyone starts trying to implement a repository. But let's be honest, those are all dense technical tomes, and I don't know anyone who can learn a topic by looking at complex diagrams of boxes and arrows. Read on and hopefully all will be explained.
Repository is a contentious topic among popular architects who also seem to like to blog. This debate made my head spin like a top when I was first trying to figure out this pattern. Really, it's crazy how much debate this topic has spawned.
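To make the "collection-like" part of those definitions a bit more concrete, here's a rough Python sketch. The `Customer` class and `InMemoryCustomerRepository` are hypothetical names I made up for illustration, not anything from Evans or Fowler. The only point is that callers add, look up, remove, and iterate over domain objects as if they were working with a plain collection.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Customer:
    """Hypothetical domain object, invented for this sketch."""
    customer_id: int
    name: str


class InMemoryCustomerRepository:
    """Collection-like access to Customer objects.

    Storage here is just a dict (an assumption to keep the sketch short).
    A real repository would delegate to some persistence mechanism
    without changing this interface.
    """

    def __init__(self) -> None:
        self._items: Dict[int, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._items[customer.customer_id] = customer

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._items.get(customer_id)

    def remove(self, customer_id: int) -> None:
        self._items.pop(customer_id, None)

    def __iter__(self) -> Iterator[Customer]:
        # Iterate over a copy so callers can remove items while looping.
        return iter(list(self._items.values()))

    def __len__(self) -> int:
        return len(self._items)


repo = InMemoryCustomerRepository()
repo.add(Customer(1, "Ada"))
repo.add(Customer(2, "Grace"))
names = sorted(c.name for c in repo)
```

Notice that nothing in the calling code mentions tables, rows, or connections. That is the part of the definition I care about.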
Ayende Rahien (well known for his work on the NHibernate ORM) had some harsh words for the pattern: http://ayende.com/blog/3955/repository-is-the-new-singleton In this post he (correctly, imo) notes two problems that the repository pattern suffers from in common usage.
I actually find a small bit of irony here. In his first point Ayende makes a good case that a repository can be a useless abstraction, but he then goes on to describe a mature ORM as an implementation of the Repository pattern, when a mature ORM is, by his own definition, describing a DAO pattern.
As a side note, check some of the top Stack Overflow answers for how to implement the repository pattern.
It's sad that at the time of writing these were the top links on Google for implementing a repository. I consider them all wrong and terrible.
So let's take that fork in the road and travel down the path of what a DAO pattern is. Sometimes the best way to see something is by contrast, and I think that applies here.
"An object that provides an abstract interface to some type of database or other persistence mechanism" - http://en.wikipedia.org/wiki/Data_access_object
"A Data Access Object (DAO) is an object that helps you work with a pile of persisted data." - Me again.
Confused yet? If at first you think these two patterns could be describing the same thing, take a closer look. Repository is an abstraction that models domain objects. DAO is an abstraction that models persisted data (most often in a database, on disk, or behind an API). The devil is in the details there, and it's an important one to understand if you're going to make something and call it a repository. Eric Evans describes the term "aggregate roots" in the book Domain Driven Design, and if you know that term a light bulb probably just went off over your head.
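For contrast, here's a minimal sketch of what I mean by a DAO, using Python's built-in `sqlite3` module. The `customers` table, its columns, and the `CustomerRowDao` name are all assumptions made up for the example. Notice that it hands back rows shaped exactly like the table, with no domain behavior attached.

```python
import sqlite3
from typing import List, Optional, Tuple

# A row exactly as it is stored: (id, name, email).
Row = Tuple[int, str, str]


class CustomerRowDao:
    """Hypothetical DAO: knows about the customers table and nothing else."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def insert(self, name: str, email: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO customers (name, email) VALUES (?, ?)", (name, email)
        )
        return cur.lastrowid

    def find_by_id(self, row_id: int) -> Optional[Row]:
        cur = self._conn.execute(
            "SELECT id, name, email FROM customers WHERE id = ?", (row_id,)
        )
        return cur.fetchone()  # None when no row matches

    def find_all(self) -> List[Row]:
        cur = self._conn.execute(
            "SELECT id, name, email FROM customers ORDER BY id"
        )
        return cur.fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
dao = CustomerRowDao(conn)
new_id = dao.insert("Ada", "ada@example.com")
row = dao.find_by_id(new_id)
```

Everything this class talks about is persistence vocabulary: tables, ids, rows. That's the tell that you're looking at a DAO and not a repository.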
To visualize it, imagine you're building an application in a 3-layer architecture. You have a top layer for all your UI and display logic. This UI layer talks to the business logic (domain) layer, where all your core application logic goes. Then that layer talks to the data (persistence) layer, which is in charge of long-term storage in a database or whatever.
A DAO sits in this bottom layer and deals with a collection of data that has been saved using a persistence model. A Repository sits in the middle layer and uses one or many DAOs to query and construct the domain models. Take it one step higher, to the UI layer, and you might have an MVC Controller that queries one or many Repositories for the data it needs to construct a ViewModel. Going the opposite way, say that someone posts a request or a ViewModel back to the Controller. The Controller uses one or many Repositories to find the Domain Models that need to change and updates them. The Repositories use one or many DAOs to update the persistence models from the domain models.
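Here's a rough Python sketch of that middle layer: one repository composed of two DAOs, building a domain model on the way up and writing persistence records on the way down. Every name in it (`OrderDao`, `OrderLineDao`, `Order`, `OrderRepository`) is hypothetical, and the DAOs are faked with in-memory dicts and lists so the example stays self-contained. In real code they would wrap a database or an ORM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class OrderDao:
    """Hypothetical DAO: stores order header records keyed by id."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}

    def find(self, order_id: int) -> Optional[dict]:
        return self.rows.get(order_id)

    def save(self, row: dict) -> None:
        self.rows[row["id"]] = row


class OrderLineDao:
    """Hypothetical DAO: stores line records, many per order."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def find_by_order(self, order_id: int) -> List[dict]:
        return [r for r in self.rows if r["order_id"] == order_id]

    def replace_for_order(self, order_id: int, rows: List[dict]) -> None:
        kept = [r for r in self.rows if r["order_id"] != order_id]
        self.rows = kept + rows


@dataclass
class Order:
    """Domain model: a single aggregate that carries its own behavior."""
    order_id: int
    customer: str
    lines: Dict[str, int] = field(default_factory=dict)  # sku -> quantity

    def add_item(self, sku: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines[sku] = self.lines.get(sku, 0) + quantity


class OrderRepository:
    """Uses both DAOs to load and store whole Order objects."""

    def __init__(self, orders: OrderDao, lines: OrderLineDao) -> None:
        self._orders = orders
        self._lines = lines

    def get(self, order_id: int) -> Optional[Order]:
        header = self._orders.find(order_id)
        if header is None:
            return None
        rows = self._lines.find_by_order(order_id)
        return Order(header["id"], header["customer"],
                     {r["sku"]: r["qty"] for r in rows})

    def save(self, order: Order) -> None:
        self._orders.save({"id": order.order_id, "customer": order.customer})
        self._lines.replace_for_order(
            order.order_id,
            [{"order_id": order.order_id, "sku": sku, "qty": qty}
             for sku, qty in order.lines.items()],
        )


order_dao, line_dao = OrderDao(), OrderLineDao()
repo = OrderRepository(order_dao, line_dao)

order = Order(1, "Ada")
order.add_item("widget", 2)
repo.save(order)

loaded = repo.get(1)          # what a controller would do on a request
loaded.add_item("widget", 3)  # domain logic runs on the domain model
repo.save(loaded)             # repository maps it back to DAO records
```

The controller-side code at the bottom only ever sees `Order` objects. The split into a header record and line records is a persistence detail that stays behind the repository.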
To be fair, I believe Ayende is actually correct in his specific case. When your domain objects are persisted in their exact form, then you're using a repository. This actually happens in many scenarios, in particular with code-first approaches. If you have a simple domain where almost all of your app is CRUD operations, then the roles a DAO and a Repository would play would form two very anemic classes, to the point where having both could be a useless layer of abstraction. Also, with the rise of more free-form stores like MongoDB and Redis, it's getting pretty easy to store some pretty complex objects.
I think most of this comes down to the question, "How many layers of complexity does my application need?" In my experience throw-away scripts and quick prototypes deserve one. Simple data entry and other forms of pure CRUD apps qualify for two layers (no repository, just roll with a DAO). More complex applications are going to need three or more, depending on just how complicated they may become (if you think you need more than five you should have a really good business case and an even better plan for managing it, imo).
I'd like to stop and go over that last bit again just to be perfectly clear. Repository is a pattern for the persistence of domain objects with rich functionality, which are meant to be composed of many models and will often contain business logic. DAO is a pattern for objects that describe what data looks like at rest. You don't have to pick one or the other; they can stand back to back in your data access layer and do their thing. More commonly you will have a repository that is composed of a DAO or two. It is very common in the code that I've recently been working on for the DAO to be Entity Framework, and for the persistence model to be derived from my simpler domain models.
I would like to start this out by saying that every single instance of generic Repository
I've also seen IRepository
That's about all I've got for this one. I liked this topic enough to write a long rambling blog post about it, so feel free to contact me if you want to get into a friendly nerd fight about it :)