In every single system that I've built, data has been a critical issue; every system either required a new database, needed to access legacy data sources-or both. With the growing prevalence of agile development processes, it's critical that we find ways to apply agility to data-oriented activities. In my experience, the first step is to take an evolutionary (iterative and incremental) approach to data modeling. The second step is to do so in a highly collaborative manner.
Contrary to some claims, there's nothing special about data modeling. In this series, I'll work through an example of agile data modeling to show you that it's not only possible, but necessary for agile development.
This fictitious project will build a Karate School Management System (KSMS) for a single dojo. In this column, I'll explore what happens to the data model in the first three iterations. I'll keep the data models relatively simple, leaving out details such as columns used to record the creation date of a row or the date it was last updated, so I can focus on the approach itself.
Iteration One
[click for larger image] Initial Data Model The first version of KSMS supports the critical functionality required to run the dojo. |
For physical data modeling, I'm using the Unified Modeling Language, the notation for which is described at www.agiledata.org/essays/umlDataModelingProfile.html. I've chosen to keep my key strategy simple, using surrogate values called persistent object identifiers (POIDs), which are unique across all records within the database. I could have used natural keys for many tables, or even just surrogate values unique only within each table, but I've found that POIDs work incredibly well. When agile data modeling, you can take any key strategy and use any notation you like, but, to allay your concerns, I'll elucidate what I'm doing.
Iteration Two
For iteration two, I added new functionality that has no real overlap with the existing data model. The Belt
table and the Student.BeltPOID
column were added to support the ability to track each student's current belt status. I also added the Student.EmailAddress
column, so we can e-mail membership information to students; and the Student.GradingDate
column, to track the last/next time a student grades for a new belt.
[click for larger image] Second Iteration I've followed the Agile Modeling practice of Applying Modeling Standards, as well as good naming conventions and modeling style guidelines.
|
The second point? I'm tracking only the students' current belts, not their entire history (I'm not tracking when they earned their yellow belts, their orange belts and so on). Nor am I adding extra columns at this point. Although I might need them sometime in the future, I don't have a requirement to do this work right now, so I won't do it. Lesson number three: Don't build something until it's needed.
Iteration Three
[click for larger image] Third Iteration In this data model, I added the ability to put memberships on hold and to manage gradings. |
MembershipHold
table to track when the membership was on hold, allowing the system to account for the number of weeks used from a given membership (memberships are three, six or 12 months in length).
To manage gradings, I added two new tables: BeltAttempt
, to track the belt a student is attempting during a given grading; and Grading
, to track basic information about the grading. These tables were straight additions to the database schema.
Notice how I'm keeping the data schema normalized—a "normal data schema stores information in one place and one place only. Although there are many normalization rules, some are more critical than others. First, remove repeating groups into their own table. For example, I don't have 10 columns to track the last 10 payments that an individual made; instead, I introduced a Payment
table. Then ensure that your tables are cohesive by insisting that all attributes are fully dependent on the primary key. For example, the payment type description (for example, Visa) was captured in the PaymentType
table instead of in the Payment
table because that information describes the payment type and not an individual payment. Normalization is described in detail at www.agiledata.org/essays/dataModeling101.html#Normalize.
Deployment
At the end of the third iteration, the users decided to deploy the system into production for a real-world test. Up to this point, the software was demoed to the users in previous iterations, but they felt that it wasn't sufficient to support real-world business activities. Agile lesson number four: You deliver working software on a regular basis, but you don't always deploy it.
The primary point? You can grow a data model over time—you don't need to model it all up-front. Just as programmers can work in an evolutionary manner, so can data professionals. Next month, I'll change the requirements dramatically and describe how the data modeling effort is affected. In the third segment, I'll describe in detail the new skills that data professionals need to be effective members on agile software development teams. Stay tuned.
KSMS Requirements The following user stories describe the initial usage requirements for our fictitious system. These stories were identified during an initial modeling session and then separated on a priority basis into two-week iterations. |
|
Iteration | User Stories |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | —S.A. |
Scott Ambler is a senior consultant with Ronin International Inc. His latest book is Agile Database Techniques from Wiley Publishing.