This paper introduces an aircraft wing simulation data set (AWSD) created by an automatic workflow that builds the models, meshes them, simulates the flow field around the wing in flight, and parameterizes the solution results. AWSD is a flexible, independent collection of wing simulations built for specific engineering requirements, and it is suited to computational geometry processing tasks. Compared with existing 3D model data sets, it has several advantages: the scale of the data set is not limited by the collection source; the data files are of high quality, free of defects, redundancy, and similar problems; and the models and simulations are all designed for a specific, real engineering demand. Moreover, AWSD offers rich information and models with a similar structure, which helps in constructing surrogate models. The data set is also suitable for advancing research on data mining in computational geometry and graphics. Because CFD flow field results are not intuitive, this paper uses a resampling method based on surface data to sample the results onto the model surface, segments the resampled 3D mesh surface, and compares the K-means, Mini-Batch K-means, and Spectral Clustering algorithms. AWSD provides 300 sets of models, meshes, CFD simulation results, and parametric results based on ARAP (As-Rigid-As-Possible) and harmonic mapping for advancing the construction of engineering surrogate models, 3D mesh segmentation, surface resampling, and related geometric processing tasks.

The combination of data sets, neural network methods, and machine learning is changing several fields of computer science, and data sets are the foundation of this development. In industry, surrogate models built with machine learning can replace a large share of expensive physical simulation calculations. Traditional surrogate models rely on simple regression models built by manually adjusting the training shapes. Modern surrogate models combine data sets with deep learning methods to learn directly from the shape of the model. High-quality data sets therefore have a great impact on the application of deep learning in engineering.

At the same time, 3D mesh segmentation based on machine learning has emerged in recent years. In the current state of research, there is no clear standard for systematically evaluating segmentation methods, and existing algorithms differ in their sensitivity to the models, so 3D mesh segmentation needs to be explored from multiple angles and fields. High-quality data sets can provide rich information from multiple perspectives to support a comprehensive treatment of 3D mesh segmentation, which is of great significance for advancing the application of machine learning in this area.

At present, 3D model data sets are collected from the following three sources:

Comprehensive generation of shape data. Users write programs manually, or engineers explore and enter accurate geometric parameters, to create the models. Although users can accurately control how the shape model is generated, this design-driven approach limits shape diversity and lacks authenticity.

Collecting shape data. This kind of shape data is collected from various publicly available hosted interfaces. A new large-scale model data set is formed by filtering the collected models through segmentation, classification, cleaning, shape exploration, and other operations. However, data sets of this kind, for example ShapeNet and ABC, lack practical engineering simulations, so they cannot be used to develop surrogate models.

Design competition data. This kind of shape data is collected from engineering design competitions. The contestants create models according to specific engineering requirements, so the data preserves the authenticity of the models and comes with engineering simulations. However, the data files submitted by the participants may have defects, redundancy, and other problems, so the competition entries must be cleaned up. In addition, the scale of the data set is limited by the number of entries.

Considering the characteristics of these three kinds of data collection, this paper proposes to create a simulated aircraft wing data set (see

The contributions are as follows:

Create data set: An automatic workflow is used to batch-create CAD models, automatically subdivide the surface mesh, compute the CFD flow field, and parameterize the model with the resampled flow field results.

Data resampling: Because the CFD flow field solution is not intuitive, the flow field solution is sampled onto the model surface.

3D mesh segmentation: The resampling results are used for 3D mesh segmentation based on iterative clustering and spectral clustering, and the two approaches are compared.

SimJEB is a complex, diverse, and open data set of mechanical brackets and related structural simulations. It is used to promote deep learning, engineering surrogate modeling, and other tasks, and it originates from the GE Jet Engine Bracket Challenge. After filtering, SimJEB contains 381 models. Each model comes with five types of data files: a) CAD models, b) finite element models, c) tetrahedral mesh data, d) triangular surface mesh data, and e) simulation results. A surrogate model built on the SimJEB models can simulate and solve problems that the original model cannot [

ShapeNet is a large and richly annotated 3D model repository. It contains many kinds of models, each provided with a large set of annotations. However, extending ShapeNet with new data is laborious: the annotations must be extended manually, and new models must be collected from new data sources. ShapeNet only includes 3D models of everyday objects and excludes CAD mechanical parts, molecular structures, and other domain-specific objects [

ABC-Dataset is a collection of more than one million independent, high-quality 3D models for geometric deep learning. Its models come from the publicly available interfaces hosted by Onshape. However, the models are designed and created by hand, so they may have imperfect boundaries, intersecting faces and edges, or duplicate vertices. To avoid low-quality defective models, the collectors need to filter them using geometric and topological criteria. In contrast, the CAD models in our data set are created in batches with the open-source software OpenCasCade, which avoids incomplete models during model loading and translation [

Compared to the existing data sets mentioned above (see

Datasets | #Models | Source | Applications | Manual marking | Time |
---|---|---|---|---|---|
ABC | 1,000,000+ | Onshape | Geometric deep learning | √ | -so far |
ShapeNet | 3,000,000+ | Download online | Computer graphics, computer vision, robotics and other related disciplines | √ | -so far |
SimJEB | 300 | Competition collection | Surrogate model | √ | 14 person-years |
AWSD | 300 | Automatic generation | Surrogate model, | - | 3–5 days |

Category | Comprehensive generation of shape data | Collecting shape data | Design competition data | AWSD |
---|---|---|---|---|
Size | Limited by manual and learning costs | Large collection of platform-based agents | Limited by the number of participants | User-defined |
How to collect | Manual input by the user or exploration of parameters by the engineer to generate the model | Original model data is collected from various sources and filtered to form a new large model data set | Engineering design competition | Automated workflows |
Applications | Machine learning, artificial networks, computer vision | Artificial networks, computer vision | Machine learning, developing surrogate models | Training machine learning models to replace time-consuming and costly engineering simulation calculations |
Variety of data | - | - | √ | √ |
Advantages | Model accuracy can be guaranteed | Large scale and variety of data | Realistic engineering simulations that help build surrogate models | Reduces the time cost of numerical simulation and the high cost of wind tunnel experiments |
Disadvantages | High time cost for collection, poor diversity of data, lack of realism in models | Lack of engineering simulations | Data quality is not high, screening required | Modeling skills are required to set up solutions for different engineering problems |

Data mining is a hot topic in artificial intelligence and database research. It is a decision-support process that draws on artificial intelligence, machine learning, pattern recognition, statistics, databases, visualization, and other fields. Its advantages are highly automated data analysis and sound inductive reasoning, which uncover potential patterns and help decision makers make correct decisions. Research on geometric modeling, data processing, and data mining theory grounded in algebra, geometry, and other core mathematical fields is one of the research hotspots of computational geometry and graphics. Applying this technology can raise the level of research in the basic theory of free-form curve and surface modeling, graphics and image processing, and finite element analysis, and it provides theoretical support for the secondary development and application of CAD, CAM, CAE, and other fields.

This section summarizes the research status of relevant data processing and data mining theories involved in this paper.

Point cloud resampling is an important tool for point cloud segmentation. One problem is that traditional contour detection takes a long time to compute in order to obtain the normals and classification model of the surface [

Although the K-means clustering algorithm is considered to be one of the most powerful and popular data mining algorithms in the research field [

The following section describes the automated workflow for model creation, mesh generation, wing flight flow field solution simulation, and parameterization of solution results. The complete workflow is depicted in

AWSD contains 300 CAD models of aircraft wings, generated in batch by a program. We used OpenCasCade as the development engine, approximated the geometric parameters of the wing end faces with 3D B-spline curves, and created the 3D wing model through a lofting operation (See

The principle of batch generation of the wing CAD models is to produce different models by changing the geometric parameters of the two end faces of the wing and the distance between them. The positions of four points on each end face determine the shape of the 3D B-spline curve that approximates the closed profile. The end-face distance is changed by changing the Y value of the four points on the smaller of the two end faces. To keep the batch-generated wing models reasonable, the geometric parameters are changed step by step. When fewer than 50 models are requested, each change in the distance and in the geometric parameters of the two end faces is 0.5% of the current parameter value. When more than 50 and fewer than 200 models are requested, each change is 0.1% of the current parameter value. When more than 200 and fewer than 500 models are requested, each change is 0.05% of the current parameter value.
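The step-size rule described above can be sketched in Python as follows. This is a minimal sketch, not the authors' generator: the function names `relative_step` and `parameter_series` are hypothetical, the handling of exactly 50 and exactly 200 models (which the text leaves open) is an assumption, and so is the choice to grow each parameter rather than shrink it.

```python
from typing import List


def relative_step(n_models: int) -> float:
    """Relative change applied to each geometric parameter per generated model."""
    if n_models < 50:
        return 0.005    # 0.5 % of the current value
    if n_models < 200:  # assumption: exactly 50 falls into this tier
        return 0.001    # 0.1 % of the current value
    if n_models < 500:  # assumption: exactly 200 falls into this tier
        return 0.0005   # 0.05 % of the current value
    raise ValueError("the text gives no step size for 500 or more models")


def parameter_series(initial: float, n_models: int) -> List[float]:
    """Values of one geometric parameter across a batch of n_models wings.

    Each value differs from the previous one by a fixed percentage of the
    current value, so the series is geometric rather than linear.
    """
    step = relative_step(n_models)
    values = [initial]
    for _ in range(n_models - 1):
        # assumption: the parameter grows; the text does not give the sign
        values.append(values[-1] * (1.0 + step))
    return values
```

For a batch of 300 wings, `parameter_series(1.0, 300)` would use the 0.05% tier, so every value is 1.0005 times the previous one.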

Before CFD flow field analysis, each CAD model must be meshed. Although in

We run the batch CFD flow field analysis programmatically. The input files required for CFD flow field analysis are: a) the CAD model data file, b) the Cartesian grid [

First, the symmetry plane is set. The symmetry plane of the CAD model is

Mesh param | Value | Description |
---|---|---|
Dimension | 3 | Dimension of model |
ModelName | *.stl | The file path to the model file |
IsSym | 2 | The integer that stands for symmetry |
SymmPos | 0.00001 | The position of the symmetry face |
BackBox_BackLayer | 5 | The minimum split layer of non-intersect cells within backbox |
BackBox_ModelLayer | 5 | The minimum split layer of intersecting cells within backbox |
ModelBox_BackLayer | 5 | The minimum split layer of non-intersect cells within the model box |
ModelBox_ModelLayer | 13 | The minimum split layer of intersecting cells within the model box |
BackBox_Ratio | 8 | The ratio of the backbox size to the max size of the model |
ModelBox_Ratio | 0.1 | The enlarged ratio of the model box size to the minimum size of the model box |
BufferLayer | 2 | The layer num to buffer cells |
PunctureIterNum | 0 | Const thermal conduct coefficient |
DefineMaxLayer | 14 | Prandtl number |

Solution param | Value | Description |
---|---|---|
SIMULATION_KIND | STEADY | Simulation strategy |
PHYSICAL_PROBLEM | EULER | Physical governing equations |
MACH_NUMBER | 0.95 | Mach number |
AOA | 0 | Angle of attack |
AOS | 0 | Angle of sideslip |
FREESTREAM_PRESSURE | 101325 | Pressure of freestream |
FREESTREAM_TEMPERATURE | 300 | Temperature of freestream |
INLET_PRESSURE | 101001 | Pressure of inlet |
INLET_TEMPERATURE | 271 | Temperature of inlet |
OUTLET_PRESSURE | 101002 | Pressure of outlet |
OUTLET_TEMPERATURE | 272 | Temperature of outlet |
GAMMA_VALUE | 1.4 | The ratio of specific heats |
GAS_CONSTANT | 287.87 | Specific gas constant |
LAMINAR_VISC_MODEL | SUTHERLAND | Laminar viscosity model |
SUTHERLAND_MU_REF | 1.716E-05 | Mu reference |
SUTHERLAND_T_REF | 273.15 | T reference |
SUTHERLAND_S_CONST | 110.555 | Sutherland const |
MU_CONST | 1.716E-05 | Const mu |
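To make the solver settings in the table more concrete, the following sketch derives a few freestream quantities from them using the ideal-gas relations and Sutherland's law. It is an illustration under stated assumptions, not part of the AWSD workflow: the table gives no units, so SI units (Pa, K, J/(kg·K), Pa·s) are assumed, and the function names are hypothetical.

```python
import math

# Values copied from the solution parameter table; SI units are an assumption.
GAMMA = 1.4            # GAMMA_VALUE
GAS_CONSTANT = 287.87  # GAS_CONSTANT, assumed J/(kg K)
MACH = 0.95            # MACH_NUMBER
P_INF = 101325.0       # FREESTREAM_PRESSURE, assumed Pa
T_INF = 300.0          # FREESTREAM_TEMPERATURE, assumed K
MU_REF = 1.716e-5      # SUTHERLAND_MU_REF, assumed Pa s
T_REF = 273.15         # SUTHERLAND_T_REF, assumed K
S_CONST = 110.555      # SUTHERLAND_S_CONST, assumed K


def speed_of_sound(temperature: float) -> float:
    """Ideal-gas speed of sound, a = sqrt(gamma * R * T)."""
    return math.sqrt(GAMMA * GAS_CONSTANT * temperature)


def density(pressure: float, temperature: float) -> float:
    """Ideal-gas density, rho = p / (R * T)."""
    return pressure / (GAS_CONSTANT * temperature)


def sutherland_viscosity(temperature: float) -> float:
    """Sutherland's law: mu = mu_ref * (T/T_ref)^1.5 * (T_ref + S) / (T + S)."""
    ratio = temperature / T_REF
    return MU_REF * ratio ** 1.5 * (T_REF + S_CONST) / (temperature + S_CONST)


def freestream_state() -> dict:
    """Collect the derived freestream quantities in one dictionary."""
    a = speed_of_sound(T_INF)
    return {
        "speed_of_sound": a,
        "velocity": MACH * a,
        "density": density(P_INF, T_INF),
        "viscosity": sutherland_viscosity(T_INF),
    }
```

With `PHYSICAL_PROBLEM` set to `EULER`, the viscosity entries would presumably not enter the inviscid solution; `sutherland_viscosity` is included only to show what the `SUTHERLAND_*` entries stand for.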

Finally, the Euler model is used for the solution. After a successful solution, we saved the results in CGNS format. The solution process is shown in

Because the batch-generated airfoil shapes are artificially defined, it is unfortunately not possible to validate every model in AWSD and its simulation result against experiments. However, the airfoils in AWSD are variants of the ONERA M6 wing. The validity of the wing model is therefore demonstrated by comparing the CFD simulation results produced by the automated workflow that generates AWSD with experimental values. The pressure on the z/l = 0.2 section of the ONERA M6 wing is selected, and the experimental values are compared with the computed simulation results (see

Parameterization of a 3D model is the process of mapping a 3D model to a 2D plane. In this section, we respectively used the ARAP (As-Rigid-As-Possible) algorithm [

First, the file is parsed. The resampling results include not only the topological relationships and geometric information of the model but also the attribute information. Second, because the object to be parameterized is a mesh model without attribute information, we extracted the topological relationships and geometric information of the model and parameterized the extracted result. Finally, we bound the attribute information to the parameterized mesh to complete the parameterization of the flow field results (see

From the creation of the CAD model and the meshing of the model surface to the solution of the CFD flow field, the whole process is automated batch processing, so researchers can quickly obtain data sets of different sizes according to their research needs and overcome the problem of insufficient data. Data analysts only need to understand the business problem; the modeling and solving steps that require repeated iteration do not need to be carried out manually, which greatly shortens the time and cost of creating the data set. Batch generation also greatly reduces the cost per model. In addition, batch-generated data with strong similarity helps promote the application of machine learning and neural network methods in geometric data processing. The size of the data set can be tailored to the actual geometric processing task, while the generation time depends on the number of iteration steps of the CFD numerical solution: the more iteration steps, the longer the data set takes to generate. Because data sets currently used in engineering that include models, meshes, and solution results contain around 300 items, AWSD is sized at 300 sets. The automatic workflow of batch data generation is shown in

To quantify the time taken to generate the data set, we take the data set provided by AWSD as an example. About ninety-nine percent of the generation time is spent on the simulation solution. Because the number of iteration steps of the simulation affects both the quality of the solution and the generation time of the data set, we provide the solution results for 100 to 500 iteration steps at equal intervals (see

Iteration steps | 100 | 200 | 300 | 400 | 500 |
---|---|---|---|---|---|
Results |

#Models | Creation | Meshing | Simulation | Parameterization |
---|---|---|---|---|
300 | 5 min 12 s | 12 min 23 s | about 41 h | 10 min 31 s |
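The share of each stage in the total generation time follows directly from the timings in the table above. The short sketch below does the arithmetic; the dictionary keys and the function name `stage_shares` are hypothetical, and the simulation time is taken as exactly 41 h although the table only says "about 41 h".

```python
# Stage durations for 300 models, converted to seconds from the table.
STAGE_SECONDS = {
    "creation": 5 * 60 + 12,           # 5 min 12 s
    "meshing": 12 * 60 + 23,           # 12 min 23 s
    "simulation": 41 * 3600,           # about 41 h (taken as exactly 41 h)
    "parameterization": 10 * 60 + 31,  # 10 min 31 s
}


def stage_shares(stage_seconds: dict) -> dict:
    """Fraction of the total generation time spent in each stage."""
    total = sum(stage_seconds.values())
    return {name: seconds / total for name, seconds in stage_seconds.items()}


shares = stage_shares(STAGE_SECONDS)
total_hours = sum(STAGE_SECONDS.values()) / 3600.0
simulation_seconds_per_model = STAGE_SECONDS["simulation"] / 300

for name, share in shares.items():
    print(f"{name:>16}: {share:6.2%}")
```

Under these numbers the simulation stage accounts for roughly 98.9% of the total time, so reducing the number of CFD iteration steps is the main lever for shortening data set generation.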

This section describes the analysis and applications of the data set obtained in

Use the resampling with data method to sample the solution results of the external flow field onto the surface of the wing model.

Cluster the pressure field, temperature field, Mach number, and enthalpy in the resampling results with the K-means, Mini-Batch K-means, and spectral clustering algorithms.

Evaluate the clustering results.
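The clustering step in the list above can be illustrated with a plain K-means (Lloyd's algorithm) on one scalar field sampled at the surface vertices. This is a minimal standard-library sketch, not the implementation used for AWSD: the function name `kmeans_1d`, the random initialization, and the sample pressure values are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int,
              max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Cluster scalar per-vertex values into k groups with Lloyd's algorithm.

    Returns one cluster label per value and the final cluster centers.
    Vertices that share a label form one segment of the surface mesh.
    """
    rng = random.Random(seed)
    # Assumption: initial centers are k values drawn at random from the data.
    centers = rng.sample(list(values), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                      for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical resampled pressure values at six surface vertices.
pressure = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
vertex_labels, cluster_centers = kmeans_1d(pressure, k=2)
```

Running the same function separately on the temperature, Mach number, and enthalpy values would give one segmentation per field. Mini-Batch K-means differs in that each update uses only a random subset of the values.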

Since the bounding box stored the results of the flow field solution data obtained in

Resampling steps are as follows:

In recent years, a variety of new 3D mesh segmentation methods have emerged. As an important research direction in computer graphics, 3D mesh segmentation promotes the development of digital modeling, mesh deformation, mesh compression, and other fields to a certain extent [

The steps of iterative clustering segmentation are as follows:

We ran a K-means analysis using the resampled pressure field, temperature field, Mach number, and enthalpy as samples, where

The experimental results show that the surface features of the resampled model can be roughly extracted by 3D mesh segmentation based on K-means. We compute statistics for the pressure, temperature, Mach number, and enthalpy respectively, as shown in

The silhouette coefficient is an evaluation index of clustering performance, so we quantitatively analyze the silhouette coefficients obtained by the three clustering algorithms. The experimental results show that the silhouette coefficients of the spectral clustering results for pressure, temperature, and enthalpy are low, while the silhouette coefficient computed for the Mach number is good. This is consistent with the conclusions of the manual observation above, although the value is still not as high as those obtained by Mini-Batch K-means and K-means. Since the silhouette coefficient can help find the optimal number of clusters K, we choose the K with the larger silhouette coefficient as the number of clusters. The cluster number should be 4 for pressure, 1 for temperature, 3 for Mach number, and 2 for enthalpy (see
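For reference, the silhouette coefficient of a labeled set of scalar values can be computed as in the sketch below, using the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of the same cluster and b(i) is the smallest mean distance to another cluster. The function names `silhouette_1d` and `pick_k` are hypothetical, and the quadratic-time loop is only meant for small samples. The standard definition needs at least two clusters, so this sketch only applies to K ≥ 2.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def silhouette_1d(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient of scalar values under the given labels."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for i, (v, lab) in enumerate(zip(values, labels)):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a single-member cluster
        # a: mean distance to the other members of the same cluster
        a = sum(abs(v - values[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(abs(v - values[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(values)


def pick_k(scores: Dict[int, float]) -> int:
    """Return the cluster count with the largest silhouette coefficient."""
    return max(scores, key=scores.get)


# Hypothetical example: a well-separated labeling scores higher than a mixed one.
sample = [0.0, 1.0, 10.0, 11.0]
good = silhouette_1d(sample, [0, 0, 1, 1])
bad = silhouette_1d(sample, [0, 1, 0, 1])
```

In practice one would compute the score for each candidate K from the labels returned by the clustering algorithm and pass the resulting dictionary to `pick_k`.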

The aircraft wing simulation data set (AWSD) is a new, custom-sized collection of engineering models and simulations based on specific engineering requirements. The mesh data provided by AWSD can be applied to digital modeling, mesh deformation, mesh compression, and other fields, and it aims to promote the development of data mining, 3D mesh segmentation, and data resampling. In addition, the simulation results in this data set share the same constraints and boundary conditions, so AWSD is also well suited to building engineering surrogate models. Because the data set is built automatically, from the creation of the model data and the meshing of the model surface to the CFD flow field calculation and its parameterization, users can define the scale of the data set. Compared with the comprehensive generation of shape data, the collection of shape data, and design competition data, our data provides a complete data set for a specific engineering application with no limit on its scale.

Moreover, because the CFD flow field solution cannot be displayed directly on the model surface, we used resampling with surface data to map the flow field solution onto the 3D model. On this basis, we explored 3D mesh segmentation: we applied K-means, Mini-Batch K-means, and spectral clustering to the attribute information stored in the resampled 3D model and segmented the 3D mesh based on the results. The experiments show that K-means and Mini-Batch K-means give the same segmentation results and that spectral clustering performs well when segmenting data with strong continuity.

Future work may include improving the diversity of the batch-generated models and of the boundary conditions of the flow field solutions, so that more surrogate models can be built for numerical simulation under different engineering requirements. The data set will also be used for surrogate-model-assisted evolutionary algorithms: a surrogate model of the aircraft wing shape will be constructed and combined with multi-objective optimization algorithms to optimize the wing shape design and achieve improved performance targets. Other future projects could include improving the accuracy of resampling to support the exploration of 3D mesh segmentation.

This research is based upon work supported by

For all data sets and all information, please visit

The authors declare that they have no conflicts of interest to report regarding the present study.