Types of data

What is data?

Data is a collection of data objects and their attributes.

[Figure: data table]

1) Attribute (a.k.a. variable, field, characteristic, dimension, or feature)

  • A property or characteristic of an object
  • Examples: eye color, temperature, ID, etc.
  • Attribute values are the numbers or symbols assigned to an attribute for a particular object
  • Attributes and attribute values are not the same thing:

① The same attribute can be mapped to different attribute values

  • Height can be measured in feet or meters (different units)

② Different attributes can be mapped to the same set of values

  • Attribute values for ID and age are both integers

③ The properties of an attribute can differ from the properties of the values used to represent it

  • Age has a minimum and a maximum, while an ID has no such limit, even though both are stored as integers
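
To make point ③ concrete, here is a minimal sketch with a few made-up records (the `people` list and its values are hypothetical, not from any dataset). Both attributes are stored as integers, but only one of them supports arithmetic in a meaningful way.

```python
from statistics import mean

# Hypothetical records: both attributes are stored as integers.
people = [
    {"id": 1001, "age": 23},
    {"id": 1002, "age": 35},
    {"id": 1003, "age": 41},
]

ages = [p["age"] for p in people]
ids = [p["id"] for p in people]

# Averaging ages is meaningful: age is a real quantity.
mean_age = mean(ages)

# Averaging IDs runs without error, but the result describes nothing.
mean_id = mean(ids)

# For an ID, the only meaningful question is whether two values differ.
ids_are_distinct = len(set(ids)) == len(ids)
```

The integer type allows both averages, so it is up to the analyst to know that only `mean_age` makes sense.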

Types of attributes

① Nominal

  • ‘Label’
  • ID numbers, eye color, sex, zip codes
  • =, ≠

② Ordinal

  • ‘Order’
  • Rankings, grades, height (as tall, medium, short)
  • <, >

③ Interval

  • ‘Unit of measurement’, but the origin (zero point) is arbitrary
  • Calendar dates, temperatures in Celsius or Fahrenheit
  • +, -

④ Ratio

  • ‘Unit of measurement’, but the origin is not arbitrary
  • Temperature in Kelvin, length, counts and elapsed time
  • *, /
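
The four types above can be sketched as a small lookup, assuming the usual reading that each type also supports the operations of the types before it. The names `LEVELS`, `NEW_OPERATIONS`, and `meaningful_operations` are made up for this illustration. The temperature lines show why a ratio on an interval scale is misleading.

```python
# Attribute types in increasing order of "strength".
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that each type adds on top of the previous ones.
NEW_OPERATIONS = {
    "nominal": {"==", "!="},
    "ordinal": {"<", ">"},
    "interval": {"+", "-"},
    "ratio": {"*", "/"},
}


def meaningful_operations(level):
    """Return the cumulative set of meaningful operations for a type."""
    allowed = set()
    for name in LEVELS:
        allowed |= NEW_OPERATIONS[name]
        if name == level:
            return allowed
    raise ValueError(f"unknown attribute type: {level}")


def celsius_to_kelvin(celsius):
    return celsius + 273.15


# Celsius is an interval scale: the ratio below is computable,
# but 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_celsius = 20.0 / 10.0

# Kelvin is a ratio scale: the same two temperatures differ by only a few percent.
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)
```

Comparing `ratio_celsius` with `ratio_kelvin` shows that the same two temperatures give very different ratios depending on where the scale puts its zero.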

Discrete and Continuous Attributes

① Discrete Attributes

  • Only has a finite or countably infinite set of values
  • Zip codes, counts, or set of words
  • Often represented as integer variables
  • Binary attributes are a special case of discrete attributes

② Continuous Attributes

  • Has real numbers as attribute values
  • Temperature, height, or weight
  • In practice, real values can only be measured and represented using a finite number of digits
  • Continuous attributes are typically represented as floating-point variables
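
As a rough illustration of how these attributes usually end up in code (the variable names and values are made up), the sketch below stores a count as an integer, a binary attribute as a boolean, and a height as a float. It also shows the effect of the finite number of digits mentioned above.

```python
import math

word_count = 3      # discrete: a count, stored as an integer
is_member = True    # binary: a discrete attribute with two possible values
height_m = 1.75     # continuous: stored as a floating-point number

# Floats hold only a finite number of digits, so arithmetic on
# continuous attributes is approximate.
total = 0.1 + 0.2
exactly_equal = total == 0.3               # False
close_enough = math.isclose(total, 0.3)    # True
```

For continuous attributes, comparing with a tolerance (as `math.isclose` does) is usually safer than testing for exact equality.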

Asymmetric Attributes

  • Only presence (a non-zero attribute value) is regarded as important
  • Example: dummy (0/1) variables, where a 1 marks presence
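
A minimal sketch of both bullets, using invented data (the eye colors, the `ITEMS` vocabulary, and the two baskets are hypothetical). It first turns one nominal value into dummy variables, then compares two objects described by presence/absence attributes.

```python
def to_dummies(value, categories):
    """Encode one nominal value as 0/1 dummy variables."""
    return {category: int(value == category) for category in categories}


eye_dummies = to_dummies("brown", ["blue", "brown", "green"])
# {'blue': 0, 'brown': 1, 'green': 0}

# Hypothetical item vocabulary and two shopping baskets.
ITEMS = ["bread", "milk", "eggs", "wine", "soap"]
basket_a = {"bread", "milk"}
basket_b = {"milk", "wine"}


def presence_vector(basket):
    """Return a 0/1 vector with a 1 for every item present in the basket."""
    return [int(item in basket) for item in ITEMS]


vec_a = presence_vector(basket_a)   # [1, 1, 0, 0, 0]
vec_b = presence_vector(basket_b)   # [0, 1, 0, 1, 0]

# Items both baskets contain: the informative part.
shared_presence = sum(a and b for a, b in zip(vec_a, vec_b))

# Items both baskets lack: with a large vocabulary this number grows
# for any pair of baskets, so it says little about their similarity.
shared_absence = sum((not a) and (not b) for a, b in zip(vec_a, vec_b))
```

Here the two baskets agree on more absences than presences, which is why shared zeros are usually ignored for asymmetric attributes.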

Key messages for Attribute Types

  • The types of operations chosen should be “MEANINGFUL” for the type of data
  • Four properties of data

① Distinctness

② Order

③ Meaningful intervals

④ Meaningful ratios

  • The data type you see (often numbers or strings) may NOT capture all the properties
  • The data type you see may suggest properties that are not present
  • Analysis may depend on these other properties of the data: many statistical analyses depend only on the distribution
  • Key: what counts as ‘meaningful’ is determined by the domain you are analyzing

2) Object

  • A collection of attributes
  • Also known as a record, point, case, sample, entity, or instance
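
One way to picture an object as a collection of attributes is a small record type. This is only a sketch: the `Person` class and its fields are hypothetical and chosen to match the attribute examples used earlier in this post.

```python
from dataclasses import asdict, dataclass


@dataclass
class Person:
    """One data object: a fixed collection of attributes."""
    person_id: int    # nominal
    eye_color: str    # nominal
    height_m: float   # ratio, continuous


# A data set is then a collection of such objects.
table = [
    Person(1, "brown", 1.75),
    Person(2, "blue", 1.62),
]

# One object viewed as attribute -> attribute value.
first_row = asdict(table[0])
attribute_names = list(first_row)

# One attribute viewed across all objects.
heights = [person.height_m for person in table]
```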

Warmest regards,

This is the first post in a series covering the large body of material in data mining. I hope you enjoyed it, and please stay tuned for the next uploads!