Basic Principles of Designing Database Table Structures
Introduction to This Lesson
In this lesson, we will learn the basic principles of designing database table structures, including database normalization design, determining database requirements, designing data table structures, and optimizing data table design.
First, we will understand the concept and importance of database normalization, especially how to apply normalization design in practical projects, and learn about the three main normal forms of databases through documentation. We will analyze business requirements to determine the functions of the database and the relationships between data tables, including associations between different tables. In designing data table structures, we will learn about naming conventions for table names, field definitions, common data types, primary key settings, etc., to ensure data integrity. Additionally, we will gain a preliminary understanding of optimizing data table design: indexes, views, triggers, and transactions, and understand their concepts and roles.
Through this lesson, we will master the basic principles and optimization methods of database design, which are essential skills in application development.
1. Database Normalization Design
In simple data management scenarios, such as performing CRUD operations on just a few tables or querying views, database normalization design may not seem very important. However, in enterprise-level application development, whether creating dynamic websites or developing applications, database normalization design is crucial. A reasonable database design can enhance data quality, simplify the complex business logic of application development, thereby improving the performance of application projects, data management security, and maintainability.
The Concept and Importance of Normalization
Normalization is the process of organizing database data, which includes creating data tables and determining their relationships, with the goal of ensuring clarity and order in data storage.
Why is normalization important? Imagine if the chapters of a book were jumbled; it would be difficult to read. Similarly, if the data relationships in a database are disrupted, the data can become chaotic and unusable. Normalization is like organizing books, making data orderly and easy to manage and understand.
Normalization is an ongoing process that requires iteration. In application development, tables may initially be created simply based on business requirements, but as business needs change, it may be discovered that data settings are unreasonable or associations are chaotic, leading to issues such as data duplication and abnormal operations, necessitating reorganization. Although this may consume extra time and resources, it can prevent increased maintenance costs in the future.
In databases, normalization is about optimizing table structure design to ensure that data is organized systematically. Through normalization, a reasonable table structure is designed, which significantly impacts improving data quality, simplifying business logic, and enhancing application performance and data management security.
How to Apply Database Normalization Design in Practical Projects
Applying database normalization design in practical projects is a process of continuous improvement and optimization. This includes requirement analysis, data table structure design, primary and foreign key settings, eliminating duplicate data, and ongoing iteration.
In practical projects, database normalization design is like building with blocks, requiring step-by-step progress. First, we need to clarify the requirements, just as one must know what kind of building to construct before starting. For example, if we are developing a student grade management system to record students' exam scores, we need to store basic information about students, subject information, and grade records.
Once the requirements are clear, the next step is to design the table structure, akin to choosing the right way to assemble blocks of different shapes and sizes. We can create three tables: a student table (containing student ID, name, age, etc.), a subject table (containing subject ID, subject name, etc.), and a score table (containing student ID, subject ID, score, etc.), with each table storing the corresponding data.
-- Student Table
CREATE TABLE student (
    studentId INT PRIMARY KEY,
    studentName VARCHAR(50),
    age INT
);
-- Subject Table
CREATE TABLE course (
    courseId INT PRIMARY KEY,
    courseName VARCHAR(50)
);
-- Score Table
CREATE TABLE score (
    studentId INT,
    courseId INT,
    scoreValue DECIMAL(5,2),
    PRIMARY KEY (studentId, courseId),
    FOREIGN KEY (studentId) REFERENCES student(studentId),
    FOREIGN KEY (courseId) REFERENCES course(courseId)
);
In this design:
The student table (student) contains the student ID, name, and age information, with the student ID as the primary key.
The subject table (course) contains the subject ID and subject name, with the subject ID as the primary key.
The score table (score) contains student ID, subject ID, and score information, with the combination of student ID and subject ID as the primary key, while setting foreign key constraints to associate with the student and subject tables.
Why design it this way? First, let's understand two concepts: primary keys and foreign keys, as well as eliminating duplicate data.
A primary key is a field used to uniquely identify each row of data in a table, ensuring that each block has a unique identifier. For example, the student ID in the student table serves as the primary key, and the subject ID in the subject table also serves as the primary key. A foreign key is used to establish relationships between tables, such as the student ID and subject ID in the score table, which can serve as foreign keys to associate with the student and subject tables, ensuring data consistency and integrity.
Eliminating duplicate data is an important goal of normalization design, and duplicate data is also known as data redundancy. By designing table structures reasonably and establishing relationships, we can avoid data redundancy, thereby improving data quality and system stability.
For example, if we directly store student names in the score table, we would need to repeatedly input student name information every time a student takes an exam, leading to data redundancy. To avoid this, we store students' basic information in the student table and only store the student ID in the score table. This way, we eliminate duplicate data while ensuring data consistency and integrity. Such a design not only saves storage space but also makes data management clearer and more efficient.
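The pay-off of this separation shows up at query time: the student's name is stored exactly once, and a join reconstructs the full picture on demand. Below is a minimal sketch of the idea using SQLite via Python (the table and column names follow the example above; the sample rows are hypothetical):

```python
import sqlite3

# In-memory database for illustration; the schema mirrors the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT, age INTEGER);
CREATE TABLE course  (courseId  INTEGER PRIMARY KEY, courseName TEXT);
CREATE TABLE score   (studentId INTEGER, courseId INTEGER, scoreValue REAL,
                      PRIMARY KEY (studentId, courseId));
INSERT INTO student VALUES (1, 'Alice', 20);
INSERT INTO course  VALUES (10, 'Math');
INSERT INTO score   VALUES (1, 10, 92.5);
""")

# The score table stores only IDs; the join brings the names back.
row = conn.execute("""
    SELECT s.studentName, c.courseName, sc.scoreValue
    FROM score sc
    JOIN student s ON s.studentId = sc.studentId
    JOIN course  c ON c.courseId  = sc.courseId
""").fetchone()
print(row)  # ('Alice', 'Math', 92.5)
```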
Finally, continuous optimization is like constantly adjusting the structure of the blocks to ensure system stability. This is a simple process of applying database normalization design in practical projects.
Reading Documentation to Understand the Three Normal Forms of Databases
We have explained the concept and importance of normalization in simple and understandable language, and through a simple example of a student grade management system, demonstrated how to apply database normalization design in practical projects.
Next, you can continue learning about the three normal forms of databases by reading the documentation. A normal form is a set of principles that a database design should follow. When designing a relational database, meeting progressively stricter normalization requirements yields a more reasonable relational schema; these different levels of requirements are called normal forms. The normal forms form a hierarchy of standards: the higher the normal form a design satisfies, the less redundancy the database contains.
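Before diving into the documentation, here is a small taste of the problem the normal forms guard against. The sketch below (SQLite via Python, with hypothetical data) stores the student name redundantly in a flat score table; updating the name in only one row then leaves the data inconsistent, the classic update anomaly that Second and Third Normal Form decomposition avoids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: studentName depends only on studentId, not on the whole
# (studentId, courseId) key — a Second Normal Form violation.
conn.execute("""CREATE TABLE flat_score
                (studentId INT, studentName TEXT, courseId INT, scoreValue REAL)""")
conn.executemany("INSERT INTO flat_score VALUES (?, ?, ?, ?)",
                 [(1, 'Alice', 10, 92.5), (1, 'Alice', 11, 88.0)])

# Update the name in only one of the rows (easy to do by accident).
conn.execute("""UPDATE flat_score SET studentName = 'Alicia'
                WHERE studentId = 1 AND courseId = 10""")

names = {n for (n,) in conn.execute(
    "SELECT studentName FROM flat_score WHERE studentId = 1")}
print(names)  # {'Alice', 'Alicia'} — the same student now has two names
```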
References:
- First, Second, and Third Normal Forms of Databases
- Detailed Explanation of the Three Normal Forms of Database Design
2. Determining Database Requirements
Next, we will explain in detail how to determine database requirements. Database requirements refer to determining the functions and purposes of the database based on the business needs of enterprise application development and designing a reasonable data structure. Determining database requirements is a key step in the database design process, requiring comprehensive consideration of business needs, data management requirements, and future scalability to ensure the design of an efficient, secure, and reliable database system suitable for the business.
Analyzing Business Requirements
The first step in determining database requirements is to gain an in-depth understanding of the business needs of the enterprise, including business processes, data processing flows, business rules, etc. For example, if we are to develop a student enrollment system, we need to analyze aspects such as basic student information management, enrollment processes, class management, and fee management in detail.
So how do we analyze business requirements? The key is communication with users. First, we need to investigate and collect business information. By communicating with users, we can understand their expectations and needs for the application system, as well as various business-related rules, and gather historical business data.
Subsequently, based on the collected information, we first draw a simple business process diagram, including permission settings and data interactions for each link. We maintain continuous communication with users and adjust the business process diagram based on new user requirements or changes until the business process is clarified. This is an ongoing process that requires close contact with users to ensure that the design plan meets actual needs.
Finally, after confirming the requirements with users, we can begin designing the database. Before designing the database, we need to determine its functions and purposes to ensure that the database can completely and accurately reflect business needs while having good performance, especially in terms of security, maintainability, and scalability.
Determining the Functions and Purposes of the Database
Determining the functions and purposes of the database involves clarifying what the database needs to do based on business requirements and why it needs to do so. This includes:
What data to store: Determine what data needs to be stored in the database, i.e., what data tables there are and the relationships between the tables.
How to access data: Analyze user access to data, including how to view, update, and delete data, and who has permission to perform these operations.
Ensuring performance: Consider the performance requirements of the database, including data query speed, how many users will access it simultaneously, data backup and recovery, etc. Ensure that the database design can meet the performance requirements of the business.
Security: Ensure that the database design meets security requirements, including data encryption, access control, data backup and recovery strategies, etc., to prevent data leakage and damage.
Maintainability: Consider the maintainability of the database, including data cleaning, performance monitoring, system log recording, etc., to ensure that the database system runs stably and is easy to manage.
Future expansion: Consider future scalability needs of the database, including the addition of new business requirements and growth in data volume. Ensure that the database design can accommodate future changes.
Understanding Relationships Between Tables: Establishing One-to-One, One-to-Many, and Many-to-Many Relationships
When designing a database, it is very important to correctly understand and establish relationships between tables. The data association queries and view creation we learned earlier are essentially about building relationships between tables. In relational databases, the relationships between tables are mainly divided into three types: one-to-one, one-to-many, and many-to-many. Next, we will delve into these relationships through simple examples.
- One-to-One
A one-to-one relationship means that a record in one table corresponds to at most one record in another table, linked through a shared unique field. For example, the relationship between the student table (student) and a student detail table (say, student_detail, holding each student's extended profile) is one-to-one: each student has exactly one detail record, and each detail record belongs to exactly one student. (Note that student and score are not one-to-one: a student typically has many score records, which is a one-to-many relationship.) A one-to-one relationship helps ensure data accuracy and consistency.
If a data table stores many fields and appears complex, it can be split into several smaller tables based on one-to-one relationships. This design can make data cleaner, avoid storing duplicate information and complex associations, and improve data query speed, making it easier to manage.
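The splitting idea above can be sketched as follows (SQLite via Python; the student_detail table and its columns are hypothetical). The two tables share the same primary-key value, so a join on studentId reunites them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently used columns stay in the main table...
CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT);
-- ...while rarely used, bulky columns move to a 1:1 detail table
-- that shares the same primary key.
CREATE TABLE student_detail (
    studentId   INTEGER PRIMARY KEY REFERENCES student(studentId),
    homeAddress TEXT,
    biography   TEXT
);
INSERT INTO student        VALUES (1, 'Alice');
INSERT INTO student_detail VALUES (1, '123 Main St', 'Likes databases.');
""")

row = conn.execute("""
    SELECT s.studentName, d.homeAddress
    FROM student s JOIN student_detail d ON s.studentId = d.studentId
""").fetchone()
print(row)  # ('Alice', '123 Main St')
```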
- One-to-Many
A one-to-many relationship refers to a relationship between two tables where one table's record can correspond to multiple records in another table. For example, the relationship between the class table (class) and the student table (student) is one-to-many because one class can have multiple students, but one student can only belong to one class.
Through one-to-many relationships, it is easy to establish master-slave relationships between data, making complex data easier to organize and manage. For instance, in class and student management, we can query the class to obtain the corresponding student information and then perform other operations on the student information, allowing for more flexible filtering of the data we need and simplifying the query and management of student information.
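A one-to-many relationship is implemented by putting the foreign key on the "many" side. A minimal sketch of the class/student example (SQLite via Python, hypothetical sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (classId INTEGER PRIMARY KEY, className TEXT);
CREATE TABLE student (
    studentId   INTEGER PRIMARY KEY,
    studentName TEXT,
    classId     INTEGER REFERENCES class(classId)  -- the "many" side holds the key
);
INSERT INTO class   VALUES (1, 'Class A');
INSERT INTO student VALUES (1, 'Alice', 1);
INSERT INTO student VALUES (2, 'Bob', 1);
""")

# Query the class to obtain all of its students.
rows = conn.execute(
    "SELECT studentName FROM student WHERE classId = 1 ORDER BY studentId").fetchall()
print(rows)  # [('Alice',), ('Bob',)]
```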
- Many-to-Many
A many-to-many relationship means that there is a complex relationship between two tables, where one table's record can be associated with multiple records in another table, and vice versa. For example, the relationship between the student table (student) and the course table (course) is many-to-many because a student can choose multiple courses, and a course can be chosen by multiple students.
In many-to-many relationships, a third table, known as an association table or intermediate table, is usually introduced to connect the two main tables to resolve the many-to-many relationship. Intermediate tables can generally be divided into two types: one that is used solely to determine the association relationship without containing business-related fields, and another that not only serves as an association table but also contains business-related fields.
For example, we can create an intermediate table called student_course to connect students and courses, recording which courses students have chosen and which students have chosen a particular course.
Through many-to-many relationships, we can better describe complex relationships between tables, providing a flexible way to organize data while also facilitating data queries and management.
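The student_course intermediate table mentioned above can be sketched like this (SQLite via Python; sample data is hypothetical). Each row of the intermediate table records one enrollment, and its composite primary key prevents duplicate enrollments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT);
CREATE TABLE course  (courseId  INTEGER PRIMARY KEY, courseName TEXT);
-- Intermediate table: one row per (student, course) enrollment.
CREATE TABLE student_course (
    studentId INTEGER REFERENCES student(studentId),
    courseId  INTEGER REFERENCES course(courseId),
    PRIMARY KEY (studentId, courseId)
);
INSERT INTO student VALUES (1, 'Alice');
INSERT INTO student VALUES (2, 'Bob');
INSERT INTO course  VALUES (10, 'Math');
INSERT INTO course  VALUES (11, 'History');
INSERT INTO student_course VALUES (1, 10), (1, 11), (2, 10);
""")

# All students enrolled in Math (courseId 10):
rows = conn.execute("""
    SELECT s.studentName
    FROM student_course sc JOIN student s ON s.studentId = sc.studentId
    WHERE sc.courseId = 10 ORDER BY s.studentId
""").fetchall()
print(rows)  # [('Alice',), ('Bob',)]
```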
Establishing reasonable table relationships based on business needs can ensure data accuracy, consistency, and query efficiency. One-to-one relationships help simplify data storage and management, one-to-many relationships facilitate data organization, while many-to-many relationships better describe complex relationships and provide flexibility. By understanding these relationships, we can design reasonable data table structures, thereby improving data management efficiency and providing better support for various application projects.
3. Designing Data Table Structures Based on Requirements
In database design, designing data table structures based on requirements is a crucial step. First, we need to determine the data tables to be stored and the relationships between the tables based on business needs. Subsequently, to meet business functions, we name the data tables and define fields, setting appropriate data types for each field. During the design process, we consider data query efficiency, allowing for appropriate redundant fields; to ensure data uniqueness and simplify data insertion operations, we may consider using auto-incrementing IDs to generate primary keys. Finally, we conduct table structure testing to ensure that data can be correctly stored and retrieved, and optimize as needed. This is a basic process for designing table structures that can effectively meet business needs and improve data management efficiency.
Naming Conventions for Table Names
After determining the data tables to be stored and the relationships between them, we can name the business-related data tables. Naming conventions for table names are very important for ensuring the readability and maintainability of the database, as once a table name is determined, it cannot be changed arbitrarily. When naming tables, it is recommended to follow these conventions:
Clear and Concise: Table names should be clear and concise, accurately reflecting the data content or business function stored in the table. Avoid vague or overly abbreviated table names, ensuring that the table name clearly expresses its purpose.
Simple and Consistent: Table names should be simple, avoiding overly long or complex names, and maintaining consistency for easier understanding and maintenance by team members. Generally, they should not exceed three English words.
Use Singular Form: Table names should be in singular form rather than plural form, for example, using "student" instead of "students" for the student table.
Avoid Special Characters and Spaces: Table names should avoid special characters and spaces, preferably using only the 26 letters, the digits 0-9, and underscores (_), with all letters in lowercase.
Prohibit Using Database Keywords: Do not use database keywords such as name, time, datetime, password, etc., as table names, to avoid conflicts. Ensure that table names do not duplicate the database system's keywords.
When designing data tables for an organization and employee management system, we can name the tables as follows:
Organization Table (org): Using the abbreviation of organization clearly expresses the data content stored in the table while avoiding a long word that is prone to typos.
Department Table (dept): Using the abbreviation of department represents department information clearly and concisely.
Employee Table (employee): "employee" has no common abbreviation, so the full word is needed to clearly express the table's data content.
Management Permissions Table (org_employee_role): Concatenating words with underscores makes the table's purpose and content easy to understand.
By naming in this way, we clearly express the data content while avoiding the issue of long words that are prone to errors. Using abbreviations and underscores makes table names easier to understand and recognize. In subsequent application development, using these data tables will also be convenient.
Defining Fields: Field Names and Data Types
After determining the table names for the data tables, the next step is to define the fields of the data tables. Field definitions include determining field names and selecting data types. If field definitions are unreasonable, it may lead to low efficiency in data queries and usage. Therefore, ensuring clear definitions and standardization of fields is crucial for application development.
When naming fields in data tables, the following standards can be referenced:
Use only the 26 letters, the digits 0-9, and underscores; separate words with underscores or use camel case, such as student_id or studentId.
Prohibit using database keywords, such as name, time, password, etc.
Field names should be easy to understand and generally not exceed three English words.
Field names should not simply repeat a data type; for example, do not name a field datetime.
Once the field names are set, appropriate data types must be assigned to the fields. Data types play an important role in programming and databases, defining the types of values that can be stored and the operations that can be performed. Each data type has its range, size, and format.
Data types are crucial for programming. First, data types can ensure the correctness of input data through data validation, making the data conform to expected formats and ranges. Second, choosing the appropriate data type can achieve storage optimization, saving storage space and improving database performance. Finally, different data types support different operations, such as mathematical operations and string manipulations, allowing programs to correctly process and manipulate data.
For a deeper understanding of the use of data types, you can refer to related documentation: SQL Common Data Types
Common Data Types (int, decimal, varchar)
In enterprise application development, we often use several common data types: integers (int), decimals (decimal), and strings (varchar). These data types play important roles in storing different types of information.
- Integers (int):
Integers are a basic data type used to store numbers without decimal points. For example, we can use integers to represent age, quantity, etc. Integers are relatively simple to represent in computers and have fast computation speeds.
- Decimals (decimal):
Decimals are used to store numbers with decimal points, such as monetary amounts or scientific data. Unlike integers, decimals can represent more precise values, suitable for situations requiring precise calculations.
- Strings (varchar):
Strings are used to store text information, such as names, addresses, emails, etc. The length of strings can be variable, making them suitable for storing text data of various lengths. String types are very useful when handling user input and text information.
When we need to store dates in a database, we typically choose to use the date (date) data type. However, sometimes we can also choose to use the string (varchar) type to store dates, which has the following advantages:
Flexibility: Storing dates as strings allows us to save dates in different formats to meet specific needs, such as "2024-03-15" or "15/03/2024."
Easier Data Cleaning: Sometimes, dates obtained from external data sources may have inconsistent formats; using string types can make it easier to accept and process this data before converting it to standard date formats.
Compatibility with Different Databases: Almost all databases support string types, which increases compatibility between databases when storing dates.
However, it is important to ensure that all dates are stored in the same format to maintain data consistency and accuracy when storing dates as strings. Additionally, compared to date types, storing dates as strings may affect query performance because additional date format conversion operations are required in queries.
Overall, while using strings to store dates has some advantages, it is essential to weigh flexibility against performance when making decisions. If strict date handling and sorting are required, it is still recommended to use date types for storing date data. This example aims to help everyone understand the importance of choosing the correct data type.
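The "easier data cleaning" point above often boils down to normalizing mixed-format date strings before storage. A hypothetical cleanup helper (the format list and function name are illustrative, not from the original text):

```python
from datetime import datetime

# Accepted input formats, matching the two examples in the text:
# "2024-03-15" and "15/03/2024".
FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def to_iso(raw: str) -> str:
    """Normalize a date string to ISO 8601 (YYYY-MM-DD) before storing it."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")

print(to_iso("2024-03-15"))  # 2024-03-15
print(to_iso("15/03/2024"))  # 2024-03-15
```

Once every value is in one canonical format, the column can safely be converted to a real date type.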
Understanding Redundant Fields
Earlier, when we introduced database normalization design, we mentioned that eliminating duplicate data is an important goal of normalization, and that duplicate data is also known as data redundancy. Similarly, redundant fields are fields whose values duplicate data already stored elsewhere in the database. It is widely believed that redundant fields should be avoided in database design because they increase storage redundancy and lead to inconsistencies and complexity during data updates.
However, in actual database design, there exists a balance between normalization and performance. The purpose of normalization is to eliminate data redundancy, avoid abnormal data operations, and improve data consistency and integrity; while the goal of performance optimization is to enhance data access efficiency and response speed.
Therefore, sometimes to improve data access performance, it may be reasonable to introduce a certain degree of redundant data. The purpose of designing such redundant fields is to reduce table joins and improve query efficiency, trading space for time. The following situations justify the use of redundant fields:
Frequently Queried Fields: If a certain field needs to be frequently queried, consider redundantly storing it in the table that requires querying to improve query efficiency.
Small Data Volume Fields: For fields with a small data volume, consider redundantly storing them in the tables that need to use them instead of retrieving them through join queries.
Cached Data: Some data can be cached or pre-computed and redundantly stored in the required tables to reduce computation and improve access speed.
When dealing with redundant fields, it is essential to avoid low-level redundancy and support high-level redundancy. Low-level redundancy refers to redundancy introduced without a clear optimization purpose, merely to simplify queries, which can lead to data inconsistencies and maintenance difficulties and should be avoided. High-level redundancy refers to redundancy designed with reasonable consideration and optimization, aimed at improving data processing speed and access efficiency.
In actual database design, designers need to balance normalization and performance optimization based on specific business needs and performance requirements, avoiding excessive redundancy and unnecessary complexity. It is necessary to comprehensively consider data normalization, performance optimization, and maintenance costs to choose the most suitable database design solution.
Primary Keys and Auto-Incrementing IDs
Once we have defined the field names and data types for the data tables, we need to consider setting primary keys.
- The Role and Setting of Primary Keys
Primary keys (PRIMARY KEY) play a unique role in relational database tables, used to uniquely identify each record. The purpose of setting a primary key is to ensure the uniqueness and integrity of the data. By setting a primary key, we can guarantee that each record in the table has a unique identifier, avoiding data duplication or redundancy, and ensuring data integrity.
For example, in the student table (student), we can set the studentId field as the primary key:
CREATE TABLE student (
    studentId INT PRIMARY KEY,
    studentName VARCHAR(50)
);
This way, each student has a unique studentId as an identifier, ensuring the uniqueness and integrity of student data.
- The Importance of Auto-Incrementing IDs
Auto-incrementing IDs are a common method for generating primary keys, typically used to automatically generate unique identifiers for records in a table. The importance of auto-incrementing IDs lies in simplifying data management and operations.
With auto-incrementing IDs, the database system can automatically number each new record quickly, and the numbers grow incrementally, stored in order, which is very beneficial for retrieval. Additionally, auto-incrementing IDs are numeric, occupy little space, are easy to sort, and are convenient to pass in programs. When adding data, there is no need to worry about primary key duplication, as the system will handle it automatically. This method not only simplifies data insertion, updating, and deletion operations, reducing the possibility of human error but also improves the efficiency and accuracy of data management.
For example, in the student table (student), we can set the "studentId" field as an auto-incrementing ID:
CREATE TABLE student (
    studentId INT AUTO_INCREMENT PRIMARY KEY,
    studentName VARCHAR(50)
);
In this example, the studentId field is set as the primary key with the AUTO_INCREMENT attribute, meaning the database system will automatically assign a unique, incrementing ID to each newly inserted student, simplifying data management and operations.
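The effect of auto-incrementing IDs can be seen in a short script. Note that this sketch uses SQLite via Python, where an INTEGER PRIMARY KEY column provides the same automatic numbering that MySQL's AUTO_INCREMENT does; the sample names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite, INTEGER PRIMARY KEY auto-assigns IDs (MySQL: AUTO_INCREMENT).
conn.execute("CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT)")

# Insert rows without specifying studentId; the engine numbers them.
conn.execute("INSERT INTO student (studentName) VALUES ('Alice')")
conn.execute("INSERT INTO student (studentName) VALUES ('Bob')")

rows = conn.execute(
    "SELECT studentId, studentName FROM student ORDER BY studentId").fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')] — IDs assigned automatically
```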
4. Optimizing Data Table Design: Indexes, Views, Triggers, and Transactions
Through analyzing business requirements, we have successfully determined the functions and purposes of the database and designed the corresponding data table structures based on business needs, preparing the database. However, optimizing data table design is a crucial part of data normalization design, including operations such as indexes, views, triggers, and transactions. Next, we will understand their concepts and roles.
Using Indexes: Improving Query Performance
An index is a data structure used to quickly locate specific data in a database table. An index is like a catalog index in a library, helping us quickly find the books we need. Suppose we have a huge book catalog, and finding a specific book would require flipping through the entire catalog, which would take a lot of time. But if there is an index sorted by book title, we can quickly find the location of the desired book by looking at the index.
In databases, by creating indexes on columns, we can similarly improve data retrieval efficiency, quickly locating the needed data without having to scan the entire dataset each time. Suppose we have a table containing employee information with thousands of records. If we frequently query based on employee names, we can create an index on the name column. This way, when we search for a specific employee name, the database engine will use the index to quickly locate the matching records without having to scan the entire table row by row. By setting indexes, we can significantly enhance the speed of querying employee information by name, making queries more efficient.
In the student table, we can create an index on the studentName column:
CREATE INDEX idx_name ON student (studentName);
Through the above SQL statement, we created an index named idx_name on the studentName column. Thus, when we execute a query like the following:
SELECT * FROM student WHERE studentName = 'Xiaohua';
The database engine will use the idx_name index to accelerate the query. It first searches the index to find the locations of all records whose studentName is 'Xiaohua', and then retrieves those records from the table. Because the index is organized in a specific order, the database engine can locate matching records much faster, improving query performance.
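You can ask the engine directly whether an index will be used. The sketch below uses SQLite via Python and its EXPLAIN QUERY PLAN statement (MySQL's equivalent is EXPLAIN; the exact wording of the plan varies by engine and version, and the sample data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT)")
conn.executemany("INSERT INTO student (studentName) VALUES (?)",
                 [('Alice',), ('Bob',), ('Xiaohua',)])
conn.execute("CREATE INDEX idx_name ON student (studentName)")

# Ask the engine how it will execute the query: with the index in place,
# it reports an index search rather than a full-table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE studentName = ?",
    ('Xiaohua',)).fetchall()
print(plan[0][-1])  # e.g. "SEARCH student USING INDEX idx_name (studentName=?)"
```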
Creating Views: Simplifying the Complexity of Program Development
When designing data tables, appropriately using views can help simplify the complexity of program development. In practical applications, views can be designed according to specific needs, helping developers better handle complex data relationships, improving code readability and maintainability, thereby simplifying the complexity of program development.
For example, the student table contains student ID (studentId), student name (studentName), and class ID (classId); the class table contains class ID (classId) and class name (className). The application needs to display student information in a class, including student names and the names of the classes they belong to. Without using views, the application would need to write complex SQL queries to join these two tables to extract the required data.
To simplify the complexity of program development, we can create a view called view_student_class that joins student information and class information together so that the application can directly query the view without worrying about the complexity of the underlying tables.
CREATE VIEW view_student_class AS
SELECT s.studentName, c.className
FROM student s
JOIN class c ON s.classId = c.classId;
This way, we can obtain student information directly by querying the view view_student_class. The benefits of this approach include:
- Directly obtaining student and class information through the view without considering the relationship between the student table and the class table.
- Simplifying complex join queries, improving query efficiency and development speed.
- Only needing to focus on the structure of the view without delving into the relationships of the underlying tables, enhancing code readability and maintainability.
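The whole round trip — create the view once, then query it like an ordinary table — can be sketched as follows (SQLite via Python; the sample rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class   (classId INTEGER PRIMARY KEY, className TEXT);
CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT, classId INTEGER);
INSERT INTO class   VALUES (1, 'Class A');
INSERT INTO student VALUES (1, 'Alice', 1);

-- The view from the text: the join is defined once, here.
CREATE VIEW view_student_class AS
    SELECT s.studentName, c.className
    FROM student s JOIN class c ON s.classId = c.classId;
""")

# Application code no longer needs to know about the underlying join.
row = conn.execute("SELECT * FROM view_student_class").fetchone()
print(row)  # ('Alice', 'Class A')
```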
Triggers: Automatically Executing Specific Operations When Certain Events Occur
Triggers are powerful tools in database design, a special type of stored procedure associated with tables. Triggers act like preset automatic executors; when specific events occur in the database, triggers automatically execute pre-defined tasks, such as automatically updating related information when new data is added.
Therefore, in optimizing data table design, triggers can be used to implement complex business rules and constraints on data tables to ensure data integrity and consistency. Triggers can automatically trigger operations during data insertion, updating, or deletion, thereby relieving developers of some burdens and simplifying database operations.
For example, the student table contains student ID (studentId), student name (studentName), and class ID (classId); the class table contains class ID (classId), class name (className), and student count (student_count). We want to automatically update the student count in the class table when new data is inserted into the student table, which can be achieved using a trigger:
-- Create Trigger
CREATE TRIGGER class_student_count
AFTER INSERT ON student
FOR EACH ROW
BEGIN
-- Update the student count in the class table
UPDATE class
SET student_count = student_count + 1
WHERE classId = NEW.classId;
END;
In this example, we created a trigger named class_student_count. It fires every time a new row is inserted into the student table and automatically updates the student_count field of the corresponding class in the class table. NEW.classId refers to the class ID of the newly inserted record.
Through such a trigger, we can ensure that the insertion of student data will also update the student count in the corresponding class table, thus maintaining data consistency and integrity.
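The same behavior can be demonstrated end to end with a small sketch, again using sqlite3 rather than MySQL (SQLite's trigger syntax is close to the lesson's; the sample data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# class carries a student_count column that the trigger will maintain.
cur.execute("CREATE TABLE class (classId INTEGER PRIMARY KEY, className TEXT, student_count INTEGER DEFAULT 0)")
cur.execute("CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT, classId INTEGER)")
cur.execute("INSERT INTO class VALUES (302, 'Class 2', 0)")

# Trigger mirroring the lesson's class_student_count example:
# after each insert into student, bump the matching class's count.
cur.execute("""
    CREATE TRIGGER class_student_count
    AFTER INSERT ON student
    FOR EACH ROW
    BEGIN
        UPDATE class SET student_count = student_count + 1
        WHERE classId = NEW.classId;
    END
""")

cur.execute("INSERT INTO student VALUES (201, 'Xiao Ming', 302)")
count = cur.execute("SELECT student_count FROM class WHERE classId = 302").fetchone()[0]
print(count)  # 1
conn.close()
```

No application code touched the class table, yet its student_count stayed consistent with the insert.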
Transactions: Ensuring Consistency and Integrity of Database Operations
A transaction is a logical unit of work in database operations, ensuring that database operations either all succeed or completely roll back to the state before the operation, maintaining data consistency and integrity.
In simple terms, a transaction is like a single shopping checkout: either every item is purchased or none are. Likewise, the operations in a transaction either all take effect or are all canceled, preventing the database from being left in a half-updated state. Through transactions, we can keep data consistent across multiple operations and avoid data corruption or loss.
Building on the trigger example above, suppose we want the insertion of new student data and the update of the class's student count to happen within a single transaction, so that the two operations succeed or fail together and data consistency and integrity are preserved. This can be done as follows:
-- Start Transaction
START TRANSACTION;
-- Insert new data into the student table
INSERT INTO student (studentId, studentName, classId) VALUES (201, '小明', 302);
-- Update the student count in the class table
UPDATE class
SET student_count = student_count + 1
WHERE classId = 302;
-- Commit Transaction
COMMIT;
-- If an error occurs during the transaction, roll back instead of committing:
-- ROLLBACK;
In this example, we insert new student data into the student table and then update the corresponding student count in the class table within the same transaction, finishing with the COMMIT command so that both operations take effect together. Note that COMMIT and ROLLBACK are alternatives: a transaction ends with exactly one of them.
If an error or exception occurs during the transaction execution, such as the student data not meeting specifications, we can use the ROLLBACK command to roll back the transaction, undoing all operations to ensure data consistency and integrity.
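The commit-or-rollback pattern maps directly onto application code. Below is a minimal sketch using sqlite3, where the try/except block plays the role of the error check (table names and sample data follow the lesson's example and are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studentId INTEGER PRIMARY KEY, studentName TEXT, classId INTEGER)")
conn.execute("CREATE TABLE class (classId INTEGER PRIMARY KEY, className TEXT, student_count INTEGER)")
conn.execute("INSERT INTO class VALUES (302, 'Class 2', 0)")
conn.commit()

try:
    # Both statements run inside one implicit transaction.
    conn.execute("INSERT INTO student VALUES (201, 'Xiao Ming', 302)")
    conn.execute("UPDATE class SET student_count = student_count + 1 WHERE classId = 302")
    conn.commit()    # equivalent to COMMIT: both changes take effect together
except sqlite3.Error:
    conn.rollback()  # equivalent to ROLLBACK: both changes are undone together

count = conn.execute("SELECT student_count FROM class WHERE classId = 302").fetchone()[0]
print(count)  # 1
conn.close()
```

If either statement raised an error, the rollback branch would leave both tables exactly as they were before the transaction started.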
The above outlines the basic concepts of optimizing data table design: indexes, views, triggers, and transactions, where indexes improve query performance, views simplify the complexity of program development, triggers implement automated operations, and transactions ensure the consistency and integrity of database operations.
In subsequent courses, we will delve into the specific uses of indexes, views, triggers, and transactions, learning how to use them flexibly to help us better optimize database design and improve system performance.