Tuesday, 10 October 2017

What is Data Classification? (ASSIGNMENT 6)

DATA CLASSIFICATION

Data classification is broadly defined as the process of organizing data into relevant categories so that it can be used and protected more efficiently. The classification process not only makes data easier to locate and retrieve; it is also of particular importance for risk management, compliance, and data security.
Data classification involves tagging data, which makes it easily searchable and trackable. It also helps eliminate duplicate copies of data, which can reduce storage and backup costs and speed up searches.
To be effective, a classification scheme should be simple enough that all employees can apply it properly. Here is an example of what a data classification scheme might look like:


Category 4: Highly sensitive corporate and customer data that, if disclosed, could put the organization at financial or legal risk.
Example: Employee social security numbers, customer credit card numbers

Category 3: Sensitive internal data that, if disclosed, could negatively affect operations.
Example: Contracts with third-party suppliers, employee reviews


Category 2: Internal data that is not meant for public disclosure.
Example: Sales contest rules, organizational charts

Category 1: Data that may be freely disclosed to the public.
Example: Contact information, price lists
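
As a rough illustration, the four categories above could be written down in Python like this. It is only a minimal sketch: the category names (PUBLIC, INTERNAL, and so on), the tag names, and the rule that unknown tags fall back to INTERNAL are my own assumptions, not part of any standard scheme.

```python
from enum import IntEnum


class Category(IntEnum):
    """The four example categories; a higher number means more sensitive."""
    PUBLIC = 1            # may be freely disclosed
    INTERNAL = 2          # not meant for public disclosure
    SENSITIVE = 3         # disclosure could negatively affect operations
    HIGHLY_SENSITIVE = 4  # disclosure could create financial or legal risk


# Hypothetical tag-to-category lookup built from the examples in the scheme.
TAG_CATEGORIES = {
    "contact_information": Category.PUBLIC,
    "price_list": Category.PUBLIC,
    "sales_contest_rules": Category.INTERNAL,
    "organizational_chart": Category.INTERNAL,
    "supplier_contract": Category.SENSITIVE,
    "employee_review": Category.SENSITIVE,
    "social_security_number": Category.HIGHLY_SENSITIVE,
    "credit_card_number": Category.HIGHLY_SENSITIVE,
}


def classify(tags):
    """Return the most sensitive category among a record's tags.

    Unknown tags default to INTERNAL (an assumption of this sketch) so that
    unlabeled data is not treated as public by accident.
    """
    levels = [TAG_CATEGORIES.get(tag, Category.INTERNAL) for tag in tags]
    return max(levels, default=Category.INTERNAL)


# A document holding a price list and an employee review takes the
# stricter of the two categories.
print(classify(["price_list", "employee_review"]).name)
```

Taking the maximum means a single sensitive item raises the category of the whole record, which keeps the rule simple enough for everyone to apply.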

Advantages of Data Classification
Consistent use of data classification facilitates more efficient business activities and lowers the cost of ensuring adequate information security. By classifying data, a company can prepare in advance to identify the risk and impact of an incident based on the type of data involved. Classification levels (for example public, internal, confidential) give a basis for determining the impact according to the level and type of access to the data. Together, data classification and level of access drive the business impact, which in turn determines the response, escalation, and notification of incidents.
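
To make the idea of "classification plus level of access drives business impact" more concrete, here is a small Python sketch. Everything in it is hypothetical: the weights, the two access types (read and modify), the thresholds, and the response texts are made up for illustration, and a real policy would define its own.

```python
# Hypothetical weights: a higher number means more sensitive data
# or a more damaging type of access.
CLASSIFICATION_WEIGHT = {"public": 1, "internal": 2, "confidential": 3}
ACCESS_WEIGHT = {"read": 1, "modify": 2}

# Hypothetical response for each impact level.
RESPONSE = {
    "low": "log the incident",
    "medium": "notify the data owner",
    "high": "escalate to the incident response team",
}


def business_impact(classification, access):
    """Combine data classification and access type into an impact level."""
    score = CLASSIFICATION_WEIGHT[classification] * ACCESS_WEIGHT[access]
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


for classification, access in [("public", "read"), ("confidential", "modify")]:
    impact = business_impact(classification, access)
    print(classification, access, "->", impact, "->", RESPONSE[impact])
```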

EREADER SCORING AND EREADER TRAINING ANALYSIS USING RAPIDMINER (ASSIGNMENT 5)

I built a Decision Tree for the eReader Scoring and eReader Training data sets using the RapidMiner software. This assignment follows Chapter 10 (pages 157-174) of Matthew North's book Data Mining for the Masses. These are the steps I took:

1. Import the eReader Scoring.csv and eReader Training.csv files into RapidMiner:



2. Drag the eReader Scoring and eReader Training data sets into the design window of the process. Then add two Set Role operators, one to the training stream and one to the scoring stream. In the Parameters area on the right-hand side of the screen, set the role of the User_ID attribute to id in both of them. Add another Set Role operator to the training stream and set the role of the eReader_Adoption attribute to label. After that, search the Operators tab for Decision Tree, select the basic Decision Tree operator, and add it to the training stream. Next, drag in an Apply Model operator and connect it to the Decision Tree operator and to the Set Role operator of the scoring stream. Finally, connect the Apply Model operator to the result port and click Run.
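
The same flow (set the id and label roles, train a decision tree, apply the model to the scoring data) can be sketched in plain Python. This is not what RapidMiner does internally: it is a simple entropy-based tree for categorical attributes, and the attribute names buys_ebooks and age_group, the label values, and all rows are made-up toy data. Only User_ID and eReader_Adoption come from the actual data sets.

```python
import math
from collections import Counter

ID_COL = "User_ID"               # role: id (ignored during training)
LABEL_COL = "eReader_Adoption"   # role: label (the value to predict)


def entropy(rows):
    """Shannon entropy of the label values in rows."""
    counts = Counter(row[LABEL_COL] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def majority(rows):
    """Most common label value in rows."""
    return Counter(row[LABEL_COL] for row in rows).most_common(1)[0][0]


def build_tree(rows, attributes):
    """Split on the attribute that leaves the lowest weighted entropy."""
    if len({row[LABEL_COL] for row in rows}) == 1 or not attributes:
        return majority(rows)  # leaf node: a single predicted label

    def split_entropy(attr):
        groups = Counter(row[attr] for row in rows)
        return sum(
            n / len(rows) * entropy([r for r in rows if r[attr] == value])
            for value, n in groups.items()
        )

    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], remaining)
        for value in {row[best] for row in rows}
    }
    return {"attribute": best, "branches": branches, "default": majority(rows)}


def apply_model(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Toy training stream (hypothetical attributes and values).
training = [
    {ID_COL: 1, "buys_ebooks": "yes", "age_group": "under_40", LABEL_COL: "early"},
    {ID_COL: 2, "buys_ebooks": "yes", "age_group": "over_40", LABEL_COL: "early"},
    {ID_COL: 3, "buys_ebooks": "no", "age_group": "under_40", LABEL_COL: "late"},
    {ID_COL: 4, "buys_ebooks": "no", "age_group": "over_40", LABEL_COL: "late"},
]
# Toy scoring stream: same attributes, but no label yet.
scoring = [
    {ID_COL: 101, "buys_ebooks": "no", "age_group": "under_40"},
    {ID_COL: 102, "buys_ebooks": "yes", "age_group": "over_40"},
]

# Every column that is neither id nor label is a regular attribute.
attributes = [k for k in training[0] if k not in (ID_COL, LABEL_COL)]
tree = build_tree(training, attributes)
predictions = {row[ID_COL]: apply_model(tree, row) for row in scoring}
print(tree["attribute"])
print(predictions)
```

In this toy data the buys_ebooks attribute separates the two labels perfectly, so the tree splits on it once and stops.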


3. Result with Decision Tree model


4. Result with Table Model


5. Result with Scatter Chart



6. Result with Statistics









Sunday, 08 October 2017

PREDICTION MODEL (ELECTION DATA) USING ORANGE (ASSIGNMENT 4)



1. Decision Tree



2. Naive Bayes



3. K-Nearest Neighbor