Application of Solid State Disk in Substation Automation Equipment
Abstract: Substation automation equipment makes extensive use of solid-state drives (SSDs) as external storage. Once the number of full erase/write (P/E) cycles of an SSD approaches its limit, the normal operation of the automation equipment is affected. This paper analyzes a typical SSD failure in substation equipment and confirms that inappropriate software usage in automation equipment can greatly shorten SSD service life. To address this, an SSD status monitoring function is developed that monitors and issues early warnings on three aspects: reserved capacity, read/write speed, and estimated lifespan, providing technical support for the reliable operation of automation equipment that uses SSD storage.
Unlike traditional mechanical hard disks, which store data on magnetic platters and access it with a moving magnetic head, solid state drives (SSDs) are built from arrays of flash memory chips and have no mechanical moving parts.
Although mechanical hard disks dominate the storage market thanks to their large capacity, good cost performance, long service life, and easy data recovery, SSDs offer fast read and write speeds, small size, low power consumption, strong vibration resistance, and a wide operating temperature range. They are therefore widely used in aviation, electric power, and other industrial fields, as well as in consumer electronics.
Because the relevant standards explicitly prohibit rotating parts, substation automation equipment uses a large number of SSDs as external memory to store programs and data, for example in data communication gateways, intelligent telecontrol units, phasor measurement units (PMUs), and protection information management units.
Many published studies discuss the application of SSDs.
One study combines SSDs with traditional disks, exploiting the high performance of SSDs and the low cost and large capacity of traditional disks to provide large storage capacity and high system performance at reduced cost.
Another study carries out performance tests and a comparative analysis of SSDs and traditional hard disks. Based on measured computer response times, it designs a computer upgrade plan using SSDs that improves overall system performance and prolongs the service life of the computer.
A third study uses an SSD as the storage medium in a substation relay protection device and implements a large-capacity storage system based on the PCIe bus, meeting the safety and reliability requirements of the relay protection device for data processing.
A further study tests SSDs to determine how working conditions such as read/write ratio, data packet size, access mode, and supply voltage fluctuation affect performance characteristics such as read/write speed and current. The test results show that a stable input voltage should be ensured for the SSD.
1 Performance and Lifespan of SSD
Hard disk performance mainly refers to data read and write speed, which has long been far lower than that of the processor and memory.
A mechanical hard disk reads data by moving a magnetic head over a platter rotating at high speed. Limited by this mechanical motion, a disk rotating at 7200 r/min can achieve a sequential read speed of about 160 MB/s and a write speed of about 80 MB/s for large data.
An SSD reads and writes by electronic access to flash memory cells, which is faster. Even allowing for the operation delay of the main control chip, its read and write speed can still reach 500 MB/s, several times that of a mechanical hard disk.
Because mechanical disks are limited by head movement and platter rotation, their response time and throughput have lagged far behind the CPU and memory. The high-speed reads and writes of SSDs have effectively alleviated this long-standing disk I/O bottleneck in computer systems.
The lifespan of a solid-state hard disk is mainly related to the number of erases and writes of the storage unit. As the number of erases and writes rises to the life limit, the read and write performance of the hard disk will decrease, and the stored data will become unreliable.
Writing data to a storage cell in an SSD counts as one erase/write operation, and erasing and rewriting all of the storage cells once is called one full erase/write (P/E) cycle. Every SSD has a limit on the number of full P/E cycles, ranging from hundreds to thousands or tens of thousands depending on the flash cell process.
The flash memory cell used in an SSD is a field-effect transistor (MOS transistor) based on floating-gate technology. It represents different data values by the voltage that results from the number of electrons stored in the floating gate. A cell may store 1, 2, 3, or 4 bits, corresponding to four types of flash cell: single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), and quad-level cell (QLC). The increase in per-cell storage capacity is accompanied by a decrease in lifespan. It is generally believed that the lower limit of lifespan is 10,000 full P/E cycles for SLC, 3,000 for MLC, 1,000 for TLC, and 150 for QLC, with price decreasing in the same order.
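The relationship between bits per cell, the number of charge states a cell must distinguish, and the endurance figures quoted above can be tabulated in a short Python sketch. The names `CELL_TYPES` and `voltage_levels` are illustrative only, and the P/E limits are the lower bounds cited in the text rather than vendor specifications.

```python
# Bits stored per cell and the lower-bound full P/E limits quoted in the text.
CELL_TYPES = {
    "SLC": {"bits": 1, "pe_limit": 10_000},
    "MLC": {"bits": 2, "pe_limit": 3_000},
    "TLC": {"bits": 3, "pe_limit": 1_000},
    "QLC": {"bits": 4, "pe_limit": 150},
}


def voltage_levels(bits_per_cell: int) -> int:
    """An n-bit cell must distinguish 2**n charge states."""
    return 2 ** bits_per_cell


for name, info in CELL_TYPES.items():
    levels = voltage_levels(info["bits"])
    print(f"{name}: {info['bits']} bit(s), {levels} levels, "
          f">= {info['pe_limit']} P/E cycles")
```

Each extra bit doubles the number of states that must be told apart within the same voltage window, which is one way to see why endurance falls as per-cell capacity rises.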
An SSD generally consists of a main controller, storage media, and firmware, each with core technologies that determine drive performance and lifespan. The firmware organizes the storage media into blocks, manages them, and maintains the mapping between logical and physical addresses to improve read/write efficiency and balance write counts.
If a storage block is written beyond its limit, the flash cells age and the block becomes a bad block. Therefore, when data is written, the main controller directs the write to the storage block with the fewest erases so that the erase/write counts of all blocks stay close to one another. This process is called “wear leveling”.
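The block selection rule described above can be sketched in Python as follows. This is a minimal illustration, not a real controller algorithm: the `Block` class and function names are hypothetical, and each write is simplified to one erase of a whole block that is immediately reusable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    erase_count: int = 0


def pick_block_for_write(blocks: List[Block]) -> Block:
    """Wear leveling rule: choose the block with the fewest erases so far."""
    if not blocks:
        raise ValueError("no blocks available")
    return min(blocks, key=lambda b: b.erase_count)


def write_once(blocks: List[Block]) -> int:
    """Simulate one erase-and-program operation; return the block id used."""
    target = pick_block_for_write(blocks)
    target.erase_count += 1
    return target.block_id


blocks = [Block(i) for i in range(4)]
for _ in range(10):
    write_once(blocks)
print([b.erase_count for b in blocks])  # [3, 3, 2, 2]
```

After any number of writes the erase counts differ by at most one, which is the property the wear leveling process aims for.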
An optimized wear leveling algorithm also migrates long-unchanged data within the drive and writes new data to the less-worn blocks thus vacated, achieving static wear leveling. As a result, the amount of data written by the upper-layer application is smaller than the amount of data actually written to the storage cells. This phenomenon is called “write amplification”.
Because storage is managed by block, writing small pieces of data occupies more storage than the data itself, which also produces write amplification; small data therefore has to be migrated and garbage-collected sensibly. Improving wear leveling, optimizing the garbage collection algorithm, and reducing the write amplification factor are the core techniques of SSD optimization and can effectively extend SSD service life.
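The write amplification factor can be made concrete with a small Python sketch. It assumes the common definition (data physically written to flash divided by data written by the host), which the text does not state explicitly, and it approximates flash writes as the number of full P/E cycles multiplied by drive capacity. The numbers in the example are hypothetical and are not measurements from this paper.

```python
GB = 10 ** 9


def flash_bytes_from_pe(pe_cycles: float, capacity_bytes: float) -> float:
    """Approximate flash writes as full-drive P/E cycles times drive capacity."""
    return pe_cycles * capacity_bytes


def write_amplification(host_bytes: float, flash_bytes: float) -> float:
    """Ratio of data physically written to flash to data written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


# Hypothetical day: the host writes 30 GB and a 64 GB drive reports 1.5 P/E cycles.
flash = flash_bytes_from_pe(1.5, 64 * GB)
print(write_amplification(30 * GB, flash))  # 3.2
```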
One study proposes a whole-process optimized garbage collection method that considers, as comprehensively as possible, the impact of each step on SSD life: initial data placement, selection of the garbage collection target block, and migration of valid data. Compared with typical algorithms, it can reduce wear by nearly 30%.
Another study proposes a superblock reorganization algorithm that, during garbage collection, selects the physical block with the least valid data on each flash chip, reorganizes these blocks into a superblock, and uses it as the source superblock for garbage collection.
The experimental results show that, compared with the traditional garbage collection algorithm, this algorithm can reduce write amplification by 2/3 and increase system life nearly 3 times.
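The selection step of the superblock reorganization idea can be sketched in Python. This covers only the choice of one block per flash chip by least valid data, as described above; the data layout and the function names are assumptions for illustration and do not reproduce the cited algorithm.

```python
from typing import Dict


def build_source_superblock(valid_pages: Dict[int, Dict[int, int]]) -> Dict[int, int]:
    """For each flash chip, pick the block holding the least valid data.

    valid_pages maps chip id -> {block id -> number of valid pages}.
    Returns chip id -> chosen block id; together these form the source superblock.
    """
    return {
        chip: min(blocks, key=lambda block_id: blocks[block_id])
        for chip, blocks in valid_pages.items()
    }


def migration_cost(valid_pages: Dict[int, Dict[int, int]],
                   superblock: Dict[int, int]) -> int:
    """Valid pages that must be copied elsewhere before the superblock is erased."""
    return sum(valid_pages[chip][block] for chip, block in superblock.items())


state = {0: {10: 40, 11: 5}, 1: {20: 12, 21: 30}}
chosen = build_source_superblock(state)
print(chosen)                         # {0: 11, 1: 20}
print(migration_cost(state, chosen))  # 17
```

Fewer valid pages in the source superblock means fewer pages to copy during garbage collection, which is how this kind of selection lowers write amplification.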
A further study proposes improving overall SSD performance by adding a cache inside the controller, so that small random writes go only to the cache rather than to the flash medium. This can effectively prolong the service life of the drive, but a power-loss protection mechanism must be added to prevent cached data from being lost.
A review paper surveys methods for improving SSD durability, including improved wear leveling algorithms, external data buffers, a lower write amplification factor, more reserved space, and block wear feedback techniques.
2 Typical SSD failure cases of substation equipment
The telecontrol gateway in a 500 kV substation began to suffer frequent program exits and equipment crashes half a year after it was put into operation, recovering each time after a restart. Testing and analysis by the equipment manufacturer confirmed that the SSD (MLC flash) had reached an average of 2500 full P/E cycles, close to the 3000-cycle limit, and that some flash cells were heavily worn. Replacing the SSD solved the problem and the device returned to normal operation, but a quantitative analysis was still needed to determine the relationship between the amount of data written to the SSD, the number of P/E cycles, and the failure.
The amount of data written to the SSD can be monitored with the iotop command on Linux, while the number of P/E cycles has to be obtained with tools provided by the drive supplier. The on-site SSD uses MLC flash, has a capacity of 64 GB, and has no cache.
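Where iotop is not available on a device, a similar per-process figure can be read from the Linux `/proc/<pid>/io` file, whose `write_bytes` field is a cumulative count of bytes the process caused to be written to storage. The Python sketch below parses that file; the function names and the sample text are illustrative, and reading the file for another user's process usually requires elevated privileges.

```python
def parse_write_bytes(proc_io_text: str) -> int:
    """Extract the write_bytes counter from the text of /proc/<pid>/io."""
    for line in proc_io_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "write_bytes":
            return int(value.strip())
    raise ValueError("write_bytes field not found")


def read_write_bytes(pid: int) -> int:
    """Cumulative bytes written to storage by one process (Linux only)."""
    with open(f"/proc/{pid}/io", encoding="ascii") as f:
        return parse_write_bytes(f.read())


# Sample content in the format of /proc/<pid>/io (values are made up).
SAMPLE = """rchar: 4096
wchar: 8192
syscr: 10
syscw: 20
read_bytes: 0
write_bytes: 12288
cancelled_write_bytes: 0
"""
print(parse_write_bytes(SAMPLE))  # 12288
```

Sampling the counter for each service process at the same time on consecutive days and taking the difference gives a daily write volume per process.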
To simulate the substation data environment, three prototype devices and their SSDs were tested. The daily average amount of data written, as recorded in the operating system, was compared with the daily average number of P/E cycles reported by the SSD detection tool. The results are shown in Table 1.
The tests led to the following conclusions:
(1) An SSD with a caching mechanism can effectively extend service life and reduce the amplification factor;
(2) As the cumulative P/E count of an SSD increases, the amplification factor rises considerably;
(3) Assuming that the amplification factor is positively correlated with the current erase count, the estimated service life of this type of SSD is 0.6 to 2.3 years.
The on-site SSD failure occurred half a year after the equipment was put into operation. Considering that early-stage debugging of the equipment, including on-site debugging, took at least 4 months, the conclusion of the life comparison test is broadly consistent with the on-site situation. The theoretical minimum cumulative P/E count for MLC SSDs is 3000. In this failure, however, bad blocks were already being read and written frequently when the average P/E count reached 2500, which led to the equipment failure. It is reasonable to infer that the wear leveling algorithm of this type of SSD performs poorly, so that some flash cells degraded from excessive wear and caused the device programs or operating system to behave abnormally.
In addition, a daily average write volume of 30 GB is far more than would be expected of a substation telecontrol unit. To find out which programs accounted for the writes, the iotop command was run continuously to attribute the write volume to individual programs, and one service program was found to write 26 GB per day on average. Communication with the software supplier confirmed that the telecontrol unit had enabled a timed data snapshot save function. This function is not required. After it was disabled, the average daily write volume of the on-site SSD fell to 4 GB, and the expected lifespan can be extended by about 7 times.
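The lifespan gain follows from simple proportionality. The Python sketch below assumes that the write amplification factor and the P/E limit stay the same before and after the change, so that service life is inversely proportional to the daily write volume; the function name is illustrative.

```python
def life_extension_factor(old_daily_gb: float, new_daily_gb: float) -> float:
    """Lifetime multiplier when daily writes drop, assuming unchanged write amplification."""
    if old_daily_gb <= 0 or new_daily_gb <= 0:
        raise ValueError("daily write volumes must be positive")
    return old_daily_gb / new_daily_gb


# Daily write volume before and after disabling the unnecessary save function.
print(life_extension_factor(30, 4))  # 7.5
```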
Finally, the fundamental solution to this failure is as follows:
(1) Replace the SSDs in the substation automation equipment with another brand of SSD that has a caching mechanism;
(2) Turn off the unnecessary data saving function in the system software of the automation equipment.
3 SSD status monitoring for automation equipment
Substation equipment that uses SSDs requires online monitoring of SSD status, evaluation of performance changes and service life, and early warning before the flash cells age. The monitored items include:
3.1 SSD remaining space
The write amplification factor of an SSD is strongly correlated with the remaining space: the larger the remaining space, the smaller the write amplification factor. Research data show that when the remaining capacity is 50%, the write amplification factor is about 2 [12]; when the remaining capacity falls below 20%, the write amplification factor increases considerably. It is therefore necessary to monitor the remaining space and issue a warning: when the available space is less than 20%, the alarm indicator lights up.
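A remaining-space check of this kind can be sketched in Python with the standard library. The sketch uses free space on the mounted filesystem as a stand-in for the SSD's remaining capacity, which is an assumption; the function names are illustrative, and the 20% threshold is the one given above.

```python
import shutil


def free_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def space_alarm(fraction_free: float, threshold: float = 0.20) -> bool:
    """True when free space has fallen below the warning threshold."""
    return fraction_free < threshold


print(space_alarm(0.15))  # True
print(space_alarm(0.50))  # False
```

On a device, the monitoring module would call `space_alarm(free_fraction(mount_point))` periodically, where `mount_point` is the path at which the SSD is mounted, and drive the alarm indicator from the result.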
3.2 SSD read and write speed
It is preferable to use the tools provided by the supplier to monitor SSD status regularly. In addition, given the diversity of SSD brands, sequential read and write tests can be run on the SSD at regular intervals to detect whether drive performance has declined significantly. The alarm threshold is generally set to 100 MB/s; when the measured speed falls below this value, the device should raise an alarm.
The test writes themselves should not cause significant wear to the SSD. The recommended test writes 32 kB of data at a time, 1024 times in succession, once a day. The data written per day then does not exceed 40 MB, so the wear on the SSD is relatively small. To test write performance, Linux devices can generally use the dd command.
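A comparable measurement can be sketched in Python using the parameters above (32 kB blocks, 1024 writes, about 32 MB in total). This is an illustration rather than the command used on the equipment: the function name is hypothetical, the test file is created in a temporary directory, and `os.fsync` is used to push the data to the device before timing stops. Because the page cache is not bypassed, the result is only a rough indicator.

```python
import os
import tempfile
import time


def sequential_write_speed(directory: str, block_size: int = 32 * 1024,
                           count: int = 1024) -> float:
    """Write `count` blocks of `block_size` bytes and return throughput in MB/s."""
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        os.fsync(fd)  # force the data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return block_size * count / elapsed / 1e6


ALARM_MB_S = 100.0  # alarm threshold quoted in the text

speed = sequential_write_speed(tempfile.gettempdir())
print(f"{speed:.1f} MB/s", "ALARM" if speed < ALARM_MB_S else "ok")
```

In practice the `directory` argument would point to a path on the SSD under test rather than the system temporary directory.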
For a mechanical hard disk, the test generally reports about 100 MB/s; for an SSD, about 500 MB/s.
3.3 SSD life expectancy
The status monitoring software module of the automation equipment integrates the monitoring tools provided by the supplier and reads the SSD status once a day to obtain the current cumulative number of full P/E cycles s. The difference from the previous day's value is the daily P/E count T, and the remaining life in days is estimated with the formula (3000-s)/T.
For a new disk, the estimated lifespan should be greater than 8 years. Otherwise, one should check whether the write volume is too large, optimize the application, and consider increasing the SSD capacity. Taking the replacement cycle into account, it is recommended to raise an alert when the estimated remaining life is less than 90 days.
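The estimate and the alert rule can be written as a short Python sketch. The function names are illustrative; the 3000-cycle limit and the 90-day threshold are taken from the text, the daily P/E values in the example are hypothetical, and treating a day with no measurable wear as unlimited remaining life is an assumption made here to avoid division by zero.

```python
def remaining_days(total_pe: float, daily_pe: float, pe_limit: float = 3000) -> float:
    """Estimate remaining life in days as (pe_limit - s) / T."""
    if daily_pe <= 0:
        return float("inf")  # no measurable wear since the previous reading
    return max(pe_limit - total_pe, 0) / daily_pe


def life_alarm(days_left: float, threshold_days: float = 90) -> bool:
    """True when the estimated remaining life is below the alert threshold."""
    return days_left < threshold_days


print(remaining_days(2500, 5))               # 100.0
print(life_alarm(remaining_days(2500, 10)))  # True
```

For a new disk (s = 0), an estimated life of more than 8 years, about 2920 days, corresponds to a daily P/E count below roughly 3000/2920, or about 1.03 cycles per day.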
This paper introduces the principles and related technical concepts of SSDs, focusing on the core technologies that affect SSD performance and life. By analyzing a typical SSD life failure in substation automation equipment, it shows that a cache mechanism helps extend SSD life and that inappropriate software use accelerates SSD aging.
Finally, this paper proposes an online SSD monitoring scheme for automation equipment that monitors SSD state from three aspects: reserved capacity, read and write speed, and estimated life. The scheme provides technical support for the stable and reliable operation of automation equipment that uses SSD storage.
The research content of this paper can provide a useful reference for the rational use of SSD in substation automation equipment.