Data Structure Design: Index and Query Performance Optimization


Introduction to This Lesson

In the introductory course, we briefly introduced the concept and function of indexes. In this lesson, we explain in detail how to use indexes to optimize table design. You will learn the principles and types of database indexes, understand their advantages, disadvantages, and design principles, and master query optimization methods, including how indexes improve query efficiency and how to use the EXPLAIN statement to analyze index usage.

1. Principles and Types of Indexes

Principles of Database Indexes

A database index is a data structure, similar to a book's table of contents, that helps the database system quickly locate and access specific data in a table. By creating indexes on columns, the database can find the required data faster without having to scan the entire table each time.

An index is essentially a data structure that sorts the values of one or more columns in a table and records where each value is stored. A book's table of contents works the same way: it lists keywords with page numbers, so you look up the keyword, find its page number, and go directly to that page instead of searching page by page. A database index likewise lets the system find where the data is stored and go straight to it, which improves query efficiency.

In MySQL, indexes are key to improving query efficiency, allowing for quick location of required data, thereby reducing query overhead. MySQL supports various types of indexes, each with its own characteristics and applicable scenarios.

Types of Indexes

Indexes can be classified based on their functions and characteristics. Common types of MySQL indexes include: regular indexes, unique indexes, primary key indexes, and full-text indexes.

  • Regular Index

A regular index is one of the most basic types of indexes in MySQL, used to speed up the query of data in a table. It is solely for improving query efficiency and does not require the values in the indexed column to be unique.

Here is an example of creating a regular index in a table named student:

CREATE TABLE student (
    studentId INT AUTO_INCREMENT PRIMARY KEY,
    studentName VARCHAR(50),
    gender VARCHAR(100)
);

ALTER TABLE student ADD INDEX idx_studentName(studentName);
ALTER TABLE student ADD INDEX idx_gender (gender);

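In this example, we created regular indexes named idx_studentName and idx_gender on the studentName and gender columns, respectively. Once the indexes exist, MySQL can use them to speed up queries that filter on those columns. (The index on gender is shown to illustrate the syntax; a column with only a few distinct values usually benefits little from an index.) For example: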

SELECT * FROM student WHERE studentName = '小花';

With such index optimization, the database system can locate the matching data more quickly, improving query efficiency.
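To confirm that the indexes were actually created, MySQL's SHOW INDEX statement lists every index on a table. A minimal check, assuming the student table and the two indexes defined above:

```sql
-- List all indexes on the student table.
-- The Key_name column should show PRIMARY, idx_studentName, and idx_gender;
-- Column_name shows which column each index covers, and
-- Non_unique is 1 for regular indexes and 0 for unique ones.
SHOW INDEX FROM student;
```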

  • Unique Index

A unique index is declared with the UNIQUE keyword. Once it exists, every non-NULL value in the indexed column must be unique across the entire table. NULL is still allowed, and MySQL permits multiple NULL values in a unique index. Attempting to insert a duplicate value raises an error, which helps ensure the uniqueness of data in the table.

Here is an example of creating a unique index in a table named student:

-- Create a unique index
CREATE TABLE student (
    studentId INT AUTO_INCREMENT PRIMARY KEY,
    studentName VARCHAR(50),
    UNIQUE INDEX idx_unique_studentName (studentName)  -- Create a unique index
);

In this example, we created a unique index idx_unique_studentName for the studentName column in the student table. Thus, when attempting to insert a duplicate student name, an error will be raised, ensuring the uniqueness of data in the table.
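The behavior can be checked with a few inserts. This is a small sketch that assumes the student table above was just created and is still empty:

```sql
-- The first insert succeeds; the second is rejected with a
-- duplicate-entry error (MySQL error code 1062).
INSERT INTO student (studentName) VALUES ('小明');
INSERT INTO student (studentName) VALUES ('小明');

-- NULL values are not treated as duplicates of each other,
-- so both of these inserts succeed.
INSERT INTO student (studentName) VALUES (NULL);
INSERT INTO student (studentName) VALUES (NULL);
```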

  • Primary Key Index

A primary key index is a special type of index in MySQL that uniquely identifies each row in a table. Every value in the primary key column must be unique and cannot be null, and a table can have only one primary key.

Here is an example of creating a primary key index in a table named student:

-- Create a student table with a primary key index
CREATE TABLE student (
    studentId INT AUTO_INCREMENT PRIMARY KEY, -- Primary key column
    studentName VARCHAR(50),
    gender VARCHAR(100)
);

In this example, declaring studentId as PRIMARY KEY creates the primary key index on that column automatically. The primary key index ensures that each student's ID is unique and cannot be null.

  • Full-Text Index

A full-text index is a special type of index in MySQL used for full-text searches on text columns (CHAR, VARCHAR, and TEXT). Full-text indexes speed up searches for words inside text data and support keyword matching in natural-language mode and boolean mode. Unlike a regular index, a full-text index is queried with MATCH ... AGAINST rather than with = or LIKE.

Here is an example of creating a full-text index in a table named student:

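CREATE FULLTEXT INDEX ft_studentName ON student (studentName);

In this example, we created a full-text index named ft_studentName on the studentName column to support full-text searches on student names. The name differs from idx_studentName because index names must be unique within a table, and idx_studentName was already used for the regular index above.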
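A full-text index is used through MATCH ... AGAINST. The queries below are a sketch only: the search terms 'Alice' and 'Smith' are hypothetical values, not data from this lesson.

```sql
-- Natural-language search (the default mode) on the indexed column.
SELECT studentId, studentName
FROM student
WHERE MATCH(studentName) AGAINST('Alice');

-- Boolean mode: the name must contain 'Alice' and must not contain 'Smith'.
SELECT studentId, studentName
FROM student
WHERE MATCH(studentName) AGAINST('+Alice -Smith' IN BOOLEAN MODE);
```

The default full-text parser splits text on word delimiters such as spaces, so it suits languages like English. For Chinese, Japanese, or Korean text, MySQL provides an ngram parser that can be selected by adding WITH PARSER ngram when the full-text index is created.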

2. Basic Principles of Index Design

Advantages and Disadvantages of Indexes

Using indexes is very helpful when you need to quickly find specific data or speed up query performance. However, there are also some disadvantages to using indexes, such as occupying additional storage space and potentially affecting the speed of data insertion, deletion, and modification.

First, let’s understand the advantages of indexes:

  1. Fast Data Retrieval: Indexes can significantly speed up the retrieval of specific data in a table, especially for frequently queried columns. For example, if a student information table is often queried by student ID, an index on the ID column accelerates those lookups.

  2. Accelerated Joins and Sorting: Indexes help speed up join operations (such as multi-table queries) and sorting operations, improving query efficiency. For example, when performing join queries across multiple tables, indexes can accelerate the data matching process, reducing query time.

  3. Uniqueness and Constraint Guarantee: Indexes can ensure the uniqueness of data or the satisfaction of constraints, such as indexes on primary keys or unique constraint columns. For example, in a user table, a unique index can be created on the username column to ensure that each username is unique.

Now let’s look at the disadvantages of indexes:

  1. Additional Storage Space Consumption: Indexes occupy additional storage space. For example, creating multiple indexes on a table with a large amount of data may consume a significant amount of disk space.

  2. Write Operation Performance Loss: Indexes increase the cost of inserts, updates, and deletes, because each write must also update every affected index structure. The more indexes a table has, the slower its writes tend to become.

  3. Maintenance Cost of Updating Indexes: Frequently updated columns may lead to increased maintenance costs for indexes, as each update requires adjusting the index structure. For example, creating an index on a frequently updated timestamp column in a log table may increase the database's burden.

  4. Not Suitable for Small Tables or Infrequent Queries: For small tables or tables that are rarely queried, creating indexes may waste resources, as the overhead of scanning the entire table is low, and indexes may not improve query performance. For example, creating an index on a configuration table with only a few rows may waste space.

Based on the advantages and disadvantages of indexes, suitable scenarios for using indexes include:

  • Situations where specific data needs to be frequently searched, especially for frequently queried columns.
  • Situations where join operations need to be accelerated, such as when performing join queries across multiple tables.
  • Situations where sorting operations need to be sped up, such as queries that are frequently sorted by a certain column.
  • Situations where data uniqueness or constraints need to be ensured, such as creating indexes for primary keys or unique constraint columns.

For example, if a table stores a large amount of student information and you frequently look up students by student ID, an index on the ID column lets the database find a given student's information faster.

Situations where indexes are not suitable or should be used with caution include:

  • Cases where the data volume is relatively small and query operations are infrequent, as the overhead of scanning the entire table is low.
  • Cases where a large number of write operations (inserts, updates, deletes) are performed frequently, as indexes will increase the cost of write operations.
  • Frequently updated columns, as update operations will increase the maintenance cost of indexes.
  • Queries whose conditions do not reference the indexed columns, since the index cannot help them.

For example, if a table has only a few rows and is rarely queried, creating an index on it may simply waste space, because scanning the entire table is already fast.

Index Design Principles

In general, the use of indexes should be considered based on specific situations. If you need to frequently look up certain data or perform join operations, indexes will help improve speed. However, if the data volume is small or queries are infrequent, indexes may not be necessary, as they will add additional overhead. Therefore, when designing indexes, you can refer to the following principles:

  • Choose Appropriate Columns: Select columns that are frequently used in queries to create indexes, such as columns often used for searching, sorting, or joining.

  • Avoid Over-Indexing: Do not create indexes for every column, as too many indexes will increase storage space and maintenance costs. Only create indexes for the columns most frequently used in queries.

  • Consider Selectivity: Selectivity is the ratio of the number of distinct values in the indexed column to the total number of rows, so its maximum is 1 (every value is unique). The higher the selectivity, the more effective the index. As a rule of thumb, a selectivity between 0.1 and 1 is ideal.

  • Pay Attention to Index Order: When creating composite (multi-column) indexes, consider the order of the indexed columns. MySQL generally uses a composite index starting from its leftmost column, so place first the columns that queries filter on most often.

  • Regularly Maintain Indexes: As data grows and changes, index statistics can become outdated and indexes can become fragmented, which makes them less effective. Regularly checking indexes, and refreshing statistics or rebuilding indexes where needed, helps keep database performance at a good level.

  • Understand the Query Optimizer: The database query optimizer selects the optimal execution plan based on the complexity of the query and the state of the indexes. Understanding how the optimizer works can help you better design indexes to improve query performance.

These principles can help you design more efficient and effective indexes, thereby enhancing the performance and responsiveness of the database.
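Several of these principles can be tried directly in SQL. The statements below are a sketch that assumes the student table from Section 1 (with the studentName and gender columns); the index name idx_name_gender is a hypothetical name chosen for illustration.

```sql
-- Selectivity: distinct values divided by total rows, per column.
-- A value close to 1 suggests a good index candidate; a value close
-- to 0 (typical for a column like gender) suggests a poor one.
SELECT COUNT(DISTINCT studentName) / COUNT(*) AS name_selectivity,
       COUNT(DISTINCT gender) / COUNT(*)      AS gender_selectivity
FROM student;

-- Composite index with the more selective, more frequently filtered
-- column first.
CREATE INDEX idx_name_gender ON student (studentName, gender);

-- These conditions start from the leftmost column, so they can use
-- idx_name_gender:
SELECT studentId FROM student WHERE studentName = '小花';
SELECT studentId FROM student WHERE studentName = '小花' AND gender = 'F';

-- This condition skips the leftmost column, so idx_name_gender is
-- generally not usable for a direct lookup:
SELECT studentId FROM student WHERE gender = 'F';

-- Maintenance: refresh the index statistics used by the optimizer.
ANALYZE TABLE student;
```

The gender value 'F' is an assumed encoding; the lesson does not specify how gender is stored.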

3. Methods for Query Optimization

In application development, to enhance performance, some query optimization methods need to be employed. Common methods include: creating appropriate indexes, optimizing query statements, avoiding full table scans, simplifying data models, selecting appropriate data types, regularly maintaining and optimizing the database, and using caching. These methods can help improve query efficiency, speed up data retrieval, and reduce system load. Through these simple adjustments and optimizations, the database can operate more efficiently and stably.

Understanding Basic Methods of Query Tuning

When optimizing database query performance, the following are commonly used basic methods for query optimization:

  • Create Appropriate Indexes: Create indexes for columns that are frequently used in queries to speed up data retrieval. Ensure that the selectivity of the indexes is good and avoid over-indexing. For example, if the student table is often queried by student ID, make sure the ID column is indexed (a column declared as PRIMARY KEY already is).

  • Optimize Query Statements: Writing efficient query statements is crucial. Avoid using SELECT *, and only select the necessary columns. Use WHERE clauses, JOIN statements, and GROUP BY statements judiciously to reduce unnecessary data retrieval and processing.

  • Avoid Full Table Scans: Try to avoid full table scans, especially on large tables. Limit the amount of data scanned through indexes, WHERE conditions, and appropriate joins.
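The difference between index-friendly and index-unfriendly statements can be illustrated with a few queries. This sketch assumes a student table with a regular index on studentName, as created in Section 1:

```sql
-- Select only the columns that are needed instead of SELECT *.
SELECT studentId, studentName
FROM student
WHERE studentName = '小花';

-- A pattern with a fixed prefix can use the index to narrow the search.
SELECT studentId, studentName
FROM student
WHERE studentName LIKE '小%';

-- A pattern with a leading wildcard cannot be looked up in the index;
-- every row (or every index entry) has to be examined.
SELECT studentId, studentName
FROM student
WHERE studentName LIKE '%花';
```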

The Role of Indexes in Query Tuning

Indexes can speed up data retrieval, lower the query cost of the database, and enhance system performance. For example, when the student ID column in the student table is indexed (as it is when it serves as the primary key), queries based on student ID can locate the matching row directly instead of scanning the table.

Using EXPLAIN to Analyze Index Usage

EXPLAIN is a statement that shows the execution plan of a query. It helps developers understand how a query will be executed, including which indexes are used and roughly how many rows are scanned. By analyzing the output of EXPLAIN, you can determine whether the query effectively utilizes indexes, and then further optimize the query statement and the index design.

Suppose we have a table named student with the fields studentId (student ID), studentName (student name), and classId (class ID). We want to optimize a query that retrieves student information by student name, and create the corresponding index.

First, we create the student table:

CREATE TABLE student (
    studentId INT PRIMARY KEY,
    studentName VARCHAR(50),
    classId INT
);

Then, we insert some sample data into the student table:

INSERT INTO student (studentId, studentName, classId) VALUES
(1, '小明', 101),
(2, '小红', 102),
(3, '小刚', 101),
(4, '小花', 103);

Now, we optimize the query statement for retrieving student information based on student name and create an index for the student name:

-- Create an index for student name
CREATE INDEX idx_studentName ON student (studentName);

-- Query student information
EXPLAIN SELECT * FROM student WHERE studentName = '小明';

Through the above example, we created an index named idx_studentName to optimize the query statement for retrieving student information based on student name. Then, we used EXPLAIN to analyze the query execution plan to confirm whether the index was effectively utilized.
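When reading the EXPLAIN output, the most useful columns are type (the access method), possible_keys (indexes that could be used), key (the index actually chosen), and rows (the estimated number of rows examined). The comparison below is a sketch that uses the table and index from this example; the exact output depends on the MySQL version and the data.

```sql
-- Filter on the indexed column: key would typically show idx_studentName,
-- with an index lookup access type such as ref.
EXPLAIN SELECT studentId, studentName
FROM student
WHERE studentName = '小明';

-- Filter on classId, which has no index: key is NULL and type is ALL,
-- meaning a full table scan.
EXPLAIN SELECT studentId, studentName
FROM student
WHERE classId = 101;
```

If a query that should use an index shows type ALL, check that the WHERE condition references the indexed column directly and that the index exists.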

4. Case Study: Setting Indexes to Improve Page Query Speed

In a membership management system, pages need to display a large amount of basic information about members. As the number of members increases, page loading speed may slow down. To address this issue, we can set indexes in the database table to enhance page query speed.

Suppose we have a table named member, which stores member information, including memberId as the unique identifier for members.

In this case, if we frequently need to query member information by memberId, indexing the memberId column can significantly improve query speed. An index is similar to a table of contents in a book; it helps the database quickly locate rows containing specific values without having to scan the entire table row by row. Note that if memberId is already declared as the table's PRIMARY KEY, MySQL has indexed it automatically and no additional index is needed; the statement below applies when the column has no index yet.

Here is an example SQL statement for creating an index on the memberId column:

-- Create index
CREATE INDEX idx_memberId ON member(memberId);

By executing the above SQL statement, we created an index named idx_memberId on the memberId column of the member table. This will enable the database to query member information based on memberId values more quickly, thereby improving page query speed.
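Before and after creating the index, it is worth checking which indexes the table has and whether the page query actually uses them. A minimal sketch, in which the memberId value 1001 is a hypothetical example:

```sql
-- Check which indexes already exist on the member table.
SHOW INDEX FROM member;

-- Verify that the lookup by memberId uses an index: the key column of
-- the output should name an index rather than show NULL.
EXPLAIN SELECT * FROM member WHERE memberId = 1001;
```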