Data Preprocessing

In rough set data analysis, the attribute set can be reduced: redundant attributes that play no role in distinguishing one object from the others can be eliminated without any loss of information. The end product of rough set approaches is a set of production rules capable of predicting newly gathered data [15].
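To make the idea of a redundant attribute concrete, here is a minimal Python sketch on a hypothetical four-object decision table. It computes the rough set dependency degree, which is the share of objects whose decision is fully determined by a chosen attribute subset. An attribute is redundant if dropping it leaves that degree unchanged. The table and the function names `partition` and `dependency` are illustrative assumptions, not taken from the essay or from Rosetta.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices by their values on `attrs` (indiscernibility classes)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Share of objects whose decision is fully determined by `attrs`
    (size of the positive region divided by the number of objects)."""
    positive = 0
    for block in partition(rows, attrs):
        # A block is in the positive region if all its objects share one decision.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


# Hypothetical decision table: a, b, c are condition attributes, d is the decision.
table = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
]

print(dependency(table, ["a", "b", "c"], "d"))  # 1.0
print(dependency(table, ["a", "b"], "d"))       # 1.0 (c is redundant)
print(dependency(table, ["a", "c"], "d"))       # 0.5 (b is needed)
```

Dropping `c` keeps the dependency at 1.0, so `c` can be eliminated without information loss, while dropping `b` lowers it to 0.5.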

b. Completion
In the RST analysis, missing values are filled using the i/o completer (mean/mode fill). For large datasets with missing values, complicated methods are not suitable because of their high computational cost; simple methods that can perform as well as complicated ones are preferred. The results and experience obtained in the previous section suggested that the mean-and-mode method can be efficient and effective for large datasets, given some improvements. The basic idea of our method is cluster-based filling of missing values [4]: instead of applying mean-and-mode to the whole dataset, it is applied within subsets obtained by clustering. The algorithm can be applied to supervised data, where the attributes with missing values can be either categorical or numeric, and it produces as many clusters as there are values of the class attribute. With this method, each missing value is filled by comparison with the other inputs, and the filled data are reported to be 90% suitable for the corresponding column [10].
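Because the method forms one cluster per class value, the simplest reading is to group records by their class label and apply mean-and-mode inside each group. The Python sketch below does exactly that; it is an illustration under that assumption, not Rosetta's completer, and the function name, field names and records are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def class_conditioned_fill(rows, class_attr):
    """Fill None values with the mean (numeric attributes) or mode (categorical
    attributes) computed only over rows that share the same class value.
    Assumes supervised data: the class attribute itself is never missing."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_attr]].append(row)

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input records untouched
        peers = groups[row[class_attr]]
        for attr, value in row.items():
            if value is not None:
                continue
            observed = [p[attr] for p in peers if p[attr] is not None]
            if not observed:
                continue  # nothing observed in this cluster; leave the value missing
            if all(isinstance(v, (int, float)) for v in observed):
                new_row[attr] = mean(observed)
            else:
                new_row[attr] = Counter(observed).most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Hypothetical records with missing values (None).
records = [
    {"age": 30, "smoker": "no", "outcome": "success"},
    {"age": None, "smoker": "no", "outcome": "success"},
    {"age": 34, "smoker": None, "outcome": "success"},
    {"age": 40, "smoker": "yes", "outcome": "failure"},
    {"age": 42, "smoker": None, "outcome": "failure"},
]

filled = class_conditioned_fill(records, "outcome")
print(filled[1]["age"])     # 32 (mean of the "success" cluster)
print(filled[4]["smoker"])  # yes (mode of the "failure" cluster)
```

Filling within each class keeps, for example, a failure record from inheriting values typical of successful cases, which is the improvement over a single dataset-wide mean or mode.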

c. Reduction
After analyzing all the data, the Rosetta tool identifies the influential, important parameters that determine the result. It builds many attribute combinations related to the end result, and Johnson's reduction algorithm produces highly reliable reducts that retain the most influential parameters [11]. In the reduction step, the Johnson reducer (Johnson's algorithm) is therefore used to find the influential parameters for subsequent feature selection.

A related but distinct task is information extraction: automatically extracting structured information from unstructured and/or semi-structured machine-readable documents [14]. In most cases this concerns processing human-language texts by means of Natural Language Processing (NLP).

Johnson's algorithm serves feature selection, which aims to reduce the dataset's dimension by analyzing and understanding the impact of its features on a model [9]. Consider, for example, a predictive model $C_1A_1 + C_2A_2 + C_3A_3 = S$, where the $C_i$ are constants, the $A_i$ are features and $S$ is the predictor output. It is useful to understand how important the features $A_1$, $A_2$ and $A_3$ are, how relevant they are to the model, and how they correlate with $S$. Such analysis allows us to select a subset of the original features, reducing the dimension and complexity of subsequent steps in the data mining process [4].
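Johnson's reducer is commonly described as a greedy heuristic over the discernibility matrix. For each pair of objects with different decisions, it collects the attributes that tell them apart, then repeatedly picks the attribute that appears in the most remaining entries until every entry is covered. The Python sketch below follows that description on a hypothetical toy table. It is not Rosetta's implementation, and the function names are assumptions. Like any greedy method, it returns one reduct, which is not guaranteed to be the smallest.

```python
from itertools import combinations


def discernibility_entries(rows, conditions, decision):
    """For each pair of objects with different decisions, collect the set of
    condition attributes on which they differ (discernibility matrix entries)."""
    entries = []
    for x, y in combinations(rows, 2):
        if x[decision] != y[decision]:
            diff = frozenset(a for a in conditions if x[a] != y[a])
            if diff:  # an empty set means an inconsistent pair; it cannot be covered
                entries.append(diff)
    return entries


def johnson_reduct(rows, conditions, decision):
    """Greedy Johnson heuristic: pick the attribute occurring in the most
    remaining entries, drop every entry it covers, and repeat."""
    entries = discernibility_entries(rows, conditions, decision)
    reduct = []
    while entries:
        # max() returns the first attribute in `conditions` order on ties.
        best = max(conditions, key=lambda a: sum(a in e for e in entries))
        reduct.append(best)
        entries = [e for e in entries if best not in e]
    return reduct


# Hypothetical decision table: a, b, c are condition attributes, d is the decision.
table = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
]

print(johnson_reduct(table, ["a", "b", "c"], "d"))  # ['a', 'b']
```

The attributes left out of the reduct (here `c`) are the candidates to drop before the data are passed to the prediction stage.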

These influential parameters are then given as input to an ANN-based tool for prediction. NeuroSolutions is a widely used simulation tool for ANNs.

2) Prediction
In the prediction stage, the data are divided into training, testing and cross-validation sets [7].
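The essay does not state how the records are divided among the three sets. A minimal Python sketch, assuming a single seeded shuffle and a hypothetical 60/20/20 split, could look like this; the function name and proportions are assumptions.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle once, then cut the records into training, testing and
    cross-validation sets; whatever remains after the training and testing
    shares becomes the cross-validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    cross_val = shuffled[n_train + n_test:]
    return train, test, cross_val


train, test, cross_val = three_way_split(list(range(10)))
print(len(train), len(test), len(cross_val))  # 6 2 2
```

Shuffling before cutting avoids putting, for example, all records collected in one period into the same set.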

a. Training
In training, the network is trained on the influential parameters until it learns how they lead to success or non-success. The trained network is then checked in the testing stage.
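The essay trains its network in an ANN tool. As a minimal, hypothetical stand-in, the Python sketch below trains a single sigmoid neuron (equivalent to logistic regression) by batch gradient descent to separate success (1) from non-success (0). The toy inputs, learning rate and epoch count are assumptions. A real model would use the reduced IVF attributes and, typically, at least one hidden layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(inputs, targets, lr=0.5, epochs=2000):
    """Train one sigmoid neuron by batch gradient descent on cross-entropy loss.
    targets are 1 (success) or 0 (non-success)."""
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            # For sigmoid + cross-entropy, the gradient w.r.t. the pre-activation is (p - t).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - t
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(inputs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(inputs)
    return weights, bias


def predict(weights, bias, x):
    """Threshold the neuron's output at 0.5 to get success (1) or non-success (0)."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical normalized parameters: success only when both are high.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
weights, bias = train_neuron(X, y)
print([predict(weights, bias, x) for x in X])  # [0, 0, 0, 1]
```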

b. Testing
In testing, the trained network is checked on the test data using the supervised learning algorithm. If testing gives correct results, the network has been trained correctly; otherwise, it is trained again [11].
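The test-then-retrain decision can be sketched as comparing the network's outputs with the known test labels. In the Python example below, the accuracy threshold of 0.9 and the sample outputs are hypothetical; the essay does not say what counts as "correct".

```python
def accuracy(outputs, labels):
    """Fraction of test cases where the network output equals the known label."""
    if not labels or len(outputs) != len(labels):
        raise ValueError("outputs and labels must be non-empty and equally long")
    return sum(o == t for o, t in zip(outputs, labels)) / len(labels)


def needs_retraining(outputs, labels, threshold=0.9):
    """True when test accuracy falls below the (hypothetical) acceptance threshold."""
    return accuracy(outputs, labels) < threshold


# Hypothetical test-set results: known labels vs. thresholded network outputs.
test_labels = [1, 0, 1, 1, 0]
network_outputs = [1, 0, 0, 1, 0]
print(accuracy(network_outputs, test_labels))          # 0.8
print(needs_retraining(network_outputs, test_labels))  # True
```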

c. Cross validation

If training and testing are done correctly, the network is then validated on the cross-validation set [13].
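In the essay, cross validation uses a separate cross-validation set. A closely related, widely used technique is k-fold cross validation, in which every record is held out for validation exactly once. The Python sketch below shows that technique, swapped in for illustration. The helper names and the `fit`/`score` callback interface are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each of the k contiguous folds is held out once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most one.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average score over k folds. fit(train_xs, train_ys) must return a
    predict(x) function; score(predictions, labels) must return a number."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [predict(xs[i]) for i in val_idx]
        scores.append(score(predictions, [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

The folds here are contiguous, so the records should be shuffled beforehand if they are stored in any meaningful order.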

3) Error rate
The paper also reports the error rate between the actual and desired outputs. The system is considered to work correctly only if the error rate is low [13].
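The essay does not specify which error measure it uses. Two common choices are the misclassification rate on thresholded outputs and the mean squared error on raw outputs. A short Python sketch of both follows; the sample values and the 0.5 threshold are hypothetical.

```python
def error_rate(desired, actual):
    """Share of cases where the thresholded network output differs from the desired output."""
    if not desired or len(desired) != len(actual):
        raise ValueError("desired and actual must be non-empty and equally long")
    return sum(d != a for d, a in zip(desired, actual)) / len(desired)


def mean_squared_error(desired, raw_outputs):
    """Average squared gap between desired outputs and the network's raw outputs."""
    if not desired or len(desired) != len(raw_outputs):
        raise ValueError("desired and raw_outputs must be non-empty and equally long")
    return sum((d - r) ** 2 for d, r in zip(desired, raw_outputs)) / len(desired)


desired = [1, 0, 1, 1]
raw = [0.9, 0.2, 0.4, 0.7]                    # hypothetical network outputs
actual = [1 if r >= 0.5 else 0 for r in raw]  # [1, 0, 0, 1]
print(error_rate(desired, actual))                 # 0.25
print(round(mean_squared_error(desired, raw), 3))  # 0.125
```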

A properly trained neural network can generate predictions based on IVF data. To train an artificial neural network, suitable training, cross-validation and test sets are selected. The network is trained with the training data and checked with the test data. From the training set, the ANN learns to map inputs to the desired outputs, reducing the difference between desired and actual outputs [6].

Source: Essay UK, Data Preprocessing (an Information Technology essay submitted by a student) [22-02-19].