Types of dataset
1) Important Characteristics of Data
① Dimensionality (# of attributes)
- High-dimensional data is harder to analyze (the "curse of dimensionality")
② Sparsity
- In sparse data most values are 0, so only the non-zero (present) values count
- Measured as the proportion of entries that are missing, NaN, or set to 0
③ Resolution
- Patterns depend on the scale
- The level of detail (and hence quality) at which the data is recorded
④ Size
- Type of analysis may depend on the size of the data
- Overfitting and underfitting problems (discussed later)
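
To make the characteristics above concrete, here is a minimal sketch that computes size, dimensionality, and sparsity for a tiny made-up table. The data and the helper name `describe` are hypothetical, and counting `None`, NaN, and 0 alike as "empty" is an assumption based on the sparsity bullet above.

```python
import math

def describe(records):
    """Return (size, dimensionality, sparsity) for a list of equal-length rows."""
    size = len(records)               # number of data objects (rows)
    dimensionality = len(records[0])  # number of attributes (columns)

    def is_empty(v):
        # Assumption: missing (None), NaN, and 0 all count as "empty"
        return v is None or v == 0 or (isinstance(v, float) and math.isnan(v))

    empty = sum(is_empty(v) for row in records for v in row)
    sparsity = empty / (size * dimensionality)
    return size, dimensionality, sparsity

# Hypothetical toy data: 3 objects, 4 attributes
records = [
    [0, 2, 0, 0],
    [1, 0, None, 0],
    [0, 0, 0, float("nan")],
]
size, dims, sparsity = describe(records)
print(size, dims, round(sparsity, 3))
```

Here 10 of the 12 entries are empty, so the table is about 83% sparse.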
2) Types
① Record Data
①-① Data Matrix
- A collection of records, each of which consists of a fixed set of attributes
- Can be represented as an m x n matrix: m rows, one for each object, and n columns, one for each attribute
- When the data objects share a fixed set of numeric attributes, they can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute
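
A small sketch of the "rows as points" idea, using a made-up matrix. The values and the helper `euclidean` are hypothetical, and Euclidean distance is just one illustrative way to compare points, not something this post prescribes.

```python
import math

# Hypothetical data matrix: m = 3 objects (rows), n = 2 numeric attributes (columns)
data_matrix = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 2.5],
]
m = len(data_matrix)     # number of rows (objects)
n = len(data_matrix[0])  # number of columns (attributes)

def euclidean(p, q):
    """Distance between two rows treated as points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d01 = euclidean(data_matrix[0], data_matrix[1])
print(m, n, d01)
```

Because every row has the same numeric attributes, any two objects can be compared geometrically like this.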
①-② Document Data
- Each document becomes a ‘term’ vector
- Each term is a component (attribute) of the vector
- The value of each component is the number of times the corresponding term occurs in the document
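
The term-vector idea above can be sketched with the standard library. The two documents, the whitespace tokenization, and the helper name `term_vector` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy documents
docs = [
    "data mining finds patterns in data",
    "graph data links pages",
]

# Every distinct term becomes one attribute (component) of the vector
vocab = sorted({term for doc in docs for term in doc.split()})

def term_vector(doc):
    """Count how many times each vocabulary term occurs in the document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]

vectors = [term_vector(doc) for doc in docs]
print(vocab)
for vec in vectors:
    print(vec)
```

Note that most components are 0 for any one document, which is why document data is usually sparse.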
①-③ Transaction Data

- Each transaction involves a set of items
- A single attribute holds several items (e.g., everything bought in one purchase)
- Transaction data can be represented as record data, e.g., with one binary attribute per item
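
One possible way to turn transactions into record data is a binary table with one column per item. The transactions below are made up, and the 0/1 encoding is an assumption; other encodings (such as item counts) would work too.

```python
# Hypothetical transactions: transaction id -> set of items
transactions = {
    1: {"bread", "milk"},
    2: {"beer", "bread"},
    3: {"milk", "diaper", "beer"},
}

# One attribute (column) per distinct item, in sorted order
items = sorted(set().union(*transactions.values()))

# Each transaction becomes a fixed-length record of 0/1 flags
records = {
    tid: [int(item in basket) for item in items]
    for tid, basket in transactions.items()
}

print(items)
for tid, row in records.items():
    print(tid, row)
```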
② Graph Data
- World Wide Web (pages as nodes, hyperlinks as edges)
- Molecular structures (atoms as nodes, chemical bonds as edges)
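
As a rough sketch, graph data such as linked web pages can be stored as an adjacency list. The page names and links here are hypothetical, and an adjacency list is only one of several common representations.

```python
from collections import defaultdict

# Hypothetical hyperlinks: (source page, target page)
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]

# Adjacency list: each page maps to the pages it links to
graph = defaultdict(list)
for src, dst in links:
    graph[src].append(dst)

# In-degree: how many pages link to each page
in_degree = defaultdict(int)
for src, dst in links:
    in_degree[dst] += 1

print(dict(graph))
print(dict(in_degree))
```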

③ Ordered Data
- Spatial Data
- Temporal Data

- Sequential Data (e.g., sequences of transactions)
- Genetic Sequence Data
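
For sequential data, the order of the records carries meaning. Below is a minimal sketch that groups made-up purchase events into one time-ordered sequence per customer. The event tuples and the integer time steps are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical purchase events: (customer, time step, items bought)
events = [
    ("c1", 3, {"milk"}),
    ("c2", 1, {"beer"}),
    ("c1", 1, {"bread", "milk"}),
    ("c2", 2, {"diaper"}),
]

# Sort by customer, then by time, and build one sequence per customer
sequences = defaultdict(list)
for customer, t, items in sorted(events, key=lambda e: (e[0], e[1])):
    sequences[customer].append(sorted(items))

for customer, seq in sequences.items():
    print(customer, seq)
```

Shuffling the events within a customer would change the sequence, which is what separates ordered data from plain record data.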
