Database Normalization

March 9, 2012

What is Normalization?

Normalization is the process of efficiently organizing data in your database. The normalization process has two goals: reducing redundant data (for example, the same data stored in more than one table) and ensuring that data dependencies make sense (all fields in a table can be uniquely determined from the primary key). Both goals are worthwhile because they reduce the amount of space a database consumes and ensure that data is stored logically.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you’ll often see 1NF, 2NF, and 3NF. Fourth and fifth normal forms are rarely seen in practice, so they aren’t discussed in this article.

Before we begin our discussion of the normal forms, it’s important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when you do deviate, it’s extremely important to evaluate the possible ramifications for your system and to account for any inconsistencies that could result. That said, let’s explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2. What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
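
The vendor example above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the column names and the InventoryVendors link table are assumptions made for the example (a link table is one possible way to implement the "link" described above), not a prescribed design.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (
        ItemNumber  INTEGER PRIMARY KEY,
        Description TEXT NOT NULL
    );
    CREATE TABLE Vendors (
        VendorCode INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- One row per (item, vendor) pair replaces VendorCode1, VendorCode2, ...
    CREATE TABLE InventoryVendors (
        ItemNumber INTEGER NOT NULL REFERENCES Inventory(ItemNumber),
        VendorCode INTEGER NOT NULL REFERENCES Vendors(VendorCode),
        PRIMARY KEY (ItemNumber, VendorCode)
    );
""")

conn.execute("INSERT INTO Inventory VALUES (1, 'Widget')")
conn.executemany("INSERT INTO Vendors VALUES (?, ?)",
                 [(10, "Acme"), (20, "Globex"), (30, "Initech")])
# Adding a third vendor is just another row, not a schema change.
conn.executemany("INSERT INTO InventoryVendors VALUES (?, ?)",
                 [(1, 10), (1, 20), (1, 30)])

vendors_for_item = [row[0] for row in conn.execute(
    """SELECT v.Name
       FROM InventoryVendors iv
       JOIN Vendors v ON v.VendorCode = iv.VendorCode
       WHERE iv.ItemNumber = ?
       ORDER BY v.Name""", (1,))]
print(vendors_for_item)  # ['Acme', 'Globex', 'Initech']
```

With this layout, an item can have any number of vendors without touching the table definitions or the program code that reads them.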

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing redundant data:
• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables. In formal terms, every non-key column must depend on the whole primary key, not just on part of a composite key.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
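
As a rough sketch of these steps, assume a hypothetical enrollment table keyed by (StudentID, CourseID) that used to repeat the course name on every row. The course name applies to many rows, so it moves to its own Course table and is linked back with a foreign key. The names below are invented for illustration, and SQLite (via Python's sqlite3 module) is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Data that applied to many enrollment rows now lives in one place.
    CREATE TABLE Course (
        CourseID   INTEGER PRIMARY KEY,
        CourseName TEXT NOT NULL
    );
    CREATE TABLE StudentCourse (
        StudentID INTEGER NOT NULL,
        CourseID  INTEGER NOT NULL REFERENCES Course(CourseID),
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseID)
    );
""")

conn.execute("INSERT INTO Course VALUES (101, 'Databases')")
conn.executemany("INSERT INTO StudentCourse VALUES (?, ?, ?)",
                 [(1, 101, "A"), (2, 101, "B")])

# Renaming the course is a single-row update, however many students enrolled.
conn.execute("UPDATE Course SET CourseName = 'Database Systems' WHERE CourseID = 101")
names = [row[0] for row in conn.execute(
    """SELECT c.CourseName
       FROM StudentCourse sc
       JOIN Course c ON c.CourseID = sc.CourseID
       ORDER BY sc.StudentID""")]

# The foreign key rejects an enrollment in a course that does not exist.
try:
    conn.execute("INSERT INTO StudentCourse VALUES (3, 999, 'C')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

After the update, every enrollment row sees the new course name through the join, and the attempt to enroll in course 999 fails with an integrity error.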

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:
• Meet all the requirements of the second normal form.
• Remove columns that are not directly dependent on the primary key, that is, columns whose values are determined by another non-key column.
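
To make this concrete, here is a small sketch using Python's sqlite3 module. It assumes a hypothetical StudentFlat table in which SchoolCity is determined by SchoolID rather than by the key StudentID; the tables, columns, and sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: SchoolCity depends on SchoolID, not on the key StudentID.
    CREATE TABLE StudentFlat (
        StudentID  INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        SchoolID   INTEGER NOT NULL,
        SchoolCity TEXT NOT NULL
    );
    INSERT INTO StudentFlat VALUES
        (1, 'Ann', 7, 'Boston'),
        (2, 'Bob', 7, 'Boston'),
        (3, 'Cy',  8, 'Austin');

    -- After: the dependent column moves to a table keyed by SchoolID.
    CREATE TABLE School (
        SchoolID INTEGER PRIMARY KEY,
        City     TEXT NOT NULL
    );
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        SchoolID  INTEGER NOT NULL REFERENCES School(SchoolID)
    );
    INSERT INTO School  SELECT DISTINCT SchoolID, SchoolCity FROM StudentFlat;
    INSERT INTO Student SELECT StudentID, Name, SchoolID FROM StudentFlat;
""")

school_rows = conn.execute(
    "SELECT SchoolID, City FROM School ORDER BY SchoolID").fetchall()
student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(school_rows)    # [(7, 'Boston'), (8, 'Austin')]
print(student_count)  # 3
```

Each city is now stored once per school instead of once per student, so a school that moves needs only a single update.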


Database Architecture Ground Rules

March 9, 2012

I will highlight some of the most important ground rules to follow when working with database architecture:

  1. Names of tables, columns, queries, stored procedures and any other objects in the database should contain only alphanumeric characters and/or underscores; no other characters are allowed under any circumstances.
  2. Unless tables from two different projects share the same database, table names shouldn’t be prefixed (use Student, not tblStudent), since this will save you time when dealing with the tables as objects in your application.
  3. Use well-defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID …).
  4. Use singular table names (e.g. use StudentCourse instead of StudentCourses). A table already represents a collection of entities, so there is no need for plural names.
  5. Don’t use spaces in table names. Otherwise you will have to wrap the names in delimiters such as square brackets, double quotes or backticks, depending on the database (e.g. to access a table named Student Course you’ll write “Student Course”). StudentCourse is much better.
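  6. Never store passwords in plain text. Keep them encrypted and decrypt them in the application only when required; for user login passwords, a salted one-way hash that is verified rather than decrypted is the safer choice.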
  7. Use integer id fields for all tables. Even if an id is not required for the time being, it may be required in the future (for association tables, indexing …).
  8. Prefer columns with the integer data type (or its variants) for indexing. Indexes on varchar columns are typically larger and slower to compare, which can cause performance problems.
  9. Use bit fields for Boolean values. Using integer or varchar for them wastes storage. Also start those column names with “Is”.
  10. Require authentication for database access. Don’t grant the admin role to every user.
  11. Avoid “select *” queries unless they are really needed. Use “select [required_columns_list]” for better performance.
  12. Use an ORM (object-relational mapping) framework (e.g. Hibernate, iBatis …) if the application code is big enough. Performance issues of ORM frameworks can usually be handled through detailed configuration parameters.
  13. Partition big tables, and move unused or rarely used tables or table parts to separate physical storage, for better query performance.
  14. For big, sensitive and mission-critical database systems, use disaster recovery and security services like failover clustering, automatic backups, replication etc.
  15. Use constraints (foreign key, check, not null …) for data integrity. Don’t leave integrity entirely to application code.
  16. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also comment your triggers, stored procedures and other scripts.
  17. Use indexes for frequently used queries on big tables. Analyzer tools can be used to determine where indexes should be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.
  18. The database server and the web server should be placed on different machines. This provides more security (an attacker who reaches the web server can’t access the data directly), and CPU and memory performance will be better because each machine handles fewer requests and processes.
  19. Image and blob data columns should not be defined in frequently queried tables because of performance issues. Place this data in separate tables and reference it from the queried tables by key.
  20. Normalize as far as required to optimize performance. Under-normalization causes excessive repetition of data; over-normalization causes excessive joins across too many tables. Both degrade performance.
  21. Spend as much time on database modeling and design as required. Otherwise maintenance and re-design will take much more time later.
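
A few of these rules (7, 9, 11, 15 and 17) can be tried out in a quick sketch with Python's sqlite3 module. The schema, the index name and the try_insert helper are hypothetical and exist only for this example. SQLite has no bit type, so an integer column restricted to 0 and 1 stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE School (
        SchoolID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,                -- rule 7: integer id
        Name      TEXT NOT NULL,                      -- rule 15: not null
        IsActive  INTEGER NOT NULL DEFAULT 1
                  CHECK (IsActive IN (0, 1)),         -- rule 9: Boolean flag, "Is" prefix
        SchoolID  INTEGER NOT NULL
                  REFERENCES School(SchoolID)         -- rule 15: foreign key
    );
    -- rule 17: index an integer column used by frequent lookups
    CREATE INDEX IX_Student_SchoolID ON Student(SchoolID);
""")
conn.execute("INSERT INTO School (SchoolID, Name) VALUES (1, 'Central High')")
conn.execute("INSERT INTO Student (Name, SchoolID) VALUES ('Ann', 1)")


def try_insert(sql, params):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# The database, not the application code, rejects both invalid rows.
bad_flag = try_insert(
    "INSERT INTO Student (Name, IsActive, SchoolID) VALUES (?, ?, ?)", ("Bob", 5, 1))
bad_school = try_insert(
    "INSERT INTO Student (Name, SchoolID) VALUES (?, ?)", ("Cy", 42))

# rule 11: name the columns you need instead of using select *
active = conn.execute(
    "SELECT StudentID, Name FROM Student WHERE IsActive = 1").fetchall()
print(bad_flag, bad_school, active)  # False False [(1, 'Ann')]
```

Because the constraints live in the schema, every application that writes to these tables gets the same integrity checks.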