Types of dataset

1) Important Characteristics of Data

① Dimensionality (# of attributes)

  • High-dimensional data brings many challenges for analysis, often called the curse of dimensionality

② Sparsity

  • In sparse data, only presence counts: usually only the non-zero (present) values matter
  • Sparsity is the proportion of entries that are missing, NaN, or set to 0
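
As a rough illustration, sparsity can be measured as the fraction of entries that are 0 or missing. The helper name `sparsity` and the toy values below are made up for this post, so treat it as a sketch rather than a standard definition.

```python
import math

def sparsity(rows):
    """Return the fraction of entries that are 0, None, or NaN."""
    total = 0
    empty = 0
    for row in rows:
        for value in row:
            total += 1
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan or value == 0:
                empty += 1
    return empty / total if total else 0.0

# Toy data: 3 objects x 4 attributes (values invented for illustration)
data = [
    [0, 3, 0, 0],
    [0, 0, None, 1],
    [2, 0, float("nan"), 0],
]

print(sparsity(data))  # 9 of the 12 entries are empty, so this prints 0.75
```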

③ Resolution

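  • The patterns you can see depend on the scale (resolution) at which the data is measured
  • Resolution describes the level of detail in the data: too fine and patterns can be buried in noise, too coarse and they can disappear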

④ Size

  • The type of analysis that is feasible may depend on the size of the data
  • Overfitting and underfitting problems (discussed later)

2) Types

① Record Data

Data that consists of a collection of records, each of which consists of a fixed set of attributes

①-① Data Matrix

  • If all data objects have the same fixed set of numeric attributes, they can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute

  • Such a data set can be represented as an m x n matrix: m rows, one for each object, and n columns, one for each attribute
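
A minimal sketch of a data matrix in plain Python, assuming a toy table with made-up attribute names (`height_cm`, `weight_kg`) and values. Because each row is a point in n-dimensional space, a distance between two objects is well defined; Euclidean distance is used here only as one common choice.

```python
# Toy data matrix: m = 3 objects (rows), n = 2 numeric attributes (columns).
# Attribute names and values are invented for illustration.
attributes = ["height_cm", "weight_kg"]
matrix = [
    [170.0, 65.0],
    [160.0, 50.0],
    [180.0, 80.0],
]

m = len(matrix)      # number of data objects
n = len(matrix[0])   # number of attributes
print(f"{m} x {n} matrix")

def euclidean(p, q):
    """Distance between two objects viewed as points in n-dimensional space."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# Distance between the first and second object
print(euclidean(matrix[0], matrix[1]))
```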

①-② Document Data

  • Each document becomes a ‘term’ vector
  • Each term is a component (attribute) of the vector
  • The value of each component is the number of times the corresponding term occurs in the document
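
The term-vector idea above can be sketched as follows. The two example sentences are invented, and splitting on whitespace is a simplifying assumption (real text usually needs proper tokenization).

```python
from collections import Counter

# Two toy documents (invented for illustration)
docs = [
    "the team won the game",
    "the coach praised the team",
]

# Vocabulary: every distinct term across all documents, in sorted order
vocabulary = sorted({term for doc in docs for term in doc.split()})

def term_vector(doc, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocabulary]

vectors = [term_vector(doc, vocabulary) for doc in docs]

print(vocabulary)  # ['coach', 'game', 'praised', 'team', 'the', 'won']
for vector in vectors:
    print(vector)
# [0, 1, 0, 1, 2, 1]
# [1, 0, 1, 1, 2, 0]
```

Each document ends up as one row with one column per term, which is why document data fits under record data.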

①-③ Transaction Data

  • Each transaction (record) involves a set of items
  • A single attribute can therefore hold several items
  • Transaction data can also be represented as record data, for example with one attribute per item
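
One way to represent transaction data as record data is to give every item its own 0/1 attribute. This is a sketch under that assumption; the transaction IDs and item names are made up.

```python
# Toy transactions: transaction ID -> set of items (invented for illustration)
transactions = {
    1: {"bread", "coke", "milk"},
    2: {"beer", "bread"},
    3: {"beer", "coke", "diaper", "milk"},
}

# One attribute (column) per distinct item, in sorted order
items = sorted(set().union(*transactions.values()))

# Each transaction becomes a fixed-length record of 0/1 flags
records = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print(items)  # ['beer', 'bread', 'coke', 'diaper', 'milk']
for tid, row in records.items():
    print(tid, row)
# 1 [0, 1, 1, 0, 1]
# 2 [1, 1, 0, 0, 0]
# 3 [1, 0, 1, 1, 1]
```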

② Graph Data

  • World Wide Web

  • Molecular Structures
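
In both examples the data is a set of nodes connected by edges: pages and hyperlinks on the web, atoms and bonds in a molecule. A small sketch using an adjacency list, with hypothetical page names:

```python
# Directed graph: pages are nodes, hyperlinks are edges.
# Page names are hypothetical.
links = {
    "home.html": ["about.html", "blog.html"],
    "about.html": ["home.html"],
    "blog.html": ["home.html", "about.html"],
}

def in_degree(graph):
    """Count how many edges point at each node."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

print(in_degree(links))
# {'home.html': 2, 'about.html': 2, 'blog.html': 1}
```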

③ Ordered Data

  • Spatial Data

  • Temporal Data

  • Sequential Data: sequences of transactions

  • Genetic Sequence Data
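
For the sequential case (sequences of transactions), what matters is the order of the records. A sketch with invented customers, time steps, and items, grouping transactions into one time-ordered sequence per customer:

```python
from collections import defaultdict

# (customer, time step, items); all values are invented for illustration
events = [
    ("C1", 2, {"B"}),
    ("C2", 1, {"A", "C"}),
    ("C1", 1, {"A", "B"}),
    ("C1", 3, {"C"}),
]

# Sort by customer, then by time, and build one ordered sequence per customer
sequences = defaultdict(list)
for customer, t, items in sorted(events, key=lambda e: (e[0], e[1])):
    sequences[customer].append(sorted(items))

for customer, sequence in sequences.items():
    print(customer, sequence)
# C1 [['A', 'B'], ['B'], ['C']]
# C2 [['A', 'C']]
```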

