Database Design

From Colettapedia
Jump to navigation Jump to search

Vocab

Entity

  • Entity - Nouns. Something about which you want to store data, person, place thing, idea
    • N.B.: Really when you say entity, you mean entity type. Analogous to an abstract class in C++.
  • Weak Entity - an entity that cannot be uniquely identified by its attributes alone - it has no unique key attribute. (Customer #15 order #3)
  • Composite Entity
  • Attributes
    • A Key is an entity attribute.
    • Unique Key (Primary Key) - uniquely identifies each row in a table.

Relationship

  • Cardinality
  • Crow's foot notation
    • Zero = 'o'
    • One = '|'
    • One and only one = '||'
    • Many = '{' (crow's foot)
    • Zero or many 'o{' - Specifies an optional standalone relationship. Zero is minimum and many is maximum
    • One or many = '|{'
  • Other notation/verbiage
    • Full Participation Relationship - e.g., orders aren't possible without a customer. closely related to weak entity. Indicated on an ER diagram by a double line.
    • Weak Relationships

Keys

  • How to couple tables together
  • n-Tuple - a group of n elements, with the order carrying meaning.
    • 0-tuple {ex., void}
    • singleton: a set with exactly one element { "hi" } ... { {1,2,3} }, cardinality of one
    • pair: a collection of two distinguishable objects. An ordered pair.
  • Projection - projecting a relation onto an attribute list yieds a subset of the relation containing only the attribute.
  • Functional Dependency - like a function, X -> Y ... x is like a key.
  • Transitive Dependency - an indirect functional dependency, like X -> Y, Y -> Z.


Mermaid JS ER model syntax

  • The relationship label could be interpreted as a foreign key, so you might not want to additionally include it as an attribute.
  • Non-identifying relationship = both entities can have existence without the other. use `..`
  • Identifying relationship = use `--`
  • Relationship requires 3 components
    1. cardinality of 1st entity wrt 2nd
    2. whether relationship confers identity in 'child' identity
    3. cardinality of 2nd entity wrt 1st


Non-relational Database Models and allied technologies

Hierarchical

  • Organizes data into tree structure.
  • Parent and Child Data segments
  • 1:N mapping between record types
  • Child data node can have only one parent

Network

  • Based on mathematical set theory
  • More than one parent per child
  • The "set construct": An owner record type, a set name, and a member record type.
  • E.g., IDMS.

Object/Relational Model

  • Adds new object storage capabilities to relational: traditional fielded data, plus complex objects
  • SQL and Oracle are here

Object-Oriented Model

  • Manip DB using C++, Java, no need to translate calls into SQL
  • Less code, more natural data modeling.
  • one-to-one mapping of DB objects and OO programming instances
  • Entities and relationships map directly to object classes and named relationships

Semi-structured Model

  • Flexible, when you don't know the schema ahead of time.
  • The schema is in the data, "self-describing"
  • Modeled in terms of graphs which contain labels which give semantics to its underlying structure

Associative Model

  • Divides data into entities (items), and associations (links)
  • Source, verb and target


Relational Databases

RDBMS features

  • Defines data structures, storage and retrieval operations, integrity constraints
  • Flatten out complex data structure to fit into tables or joined together from those tables, which takes overhead
  • Robust transaction and performance management features

RDBMS table properties

  • Values are atomic
  • Each row is unique - id'ed by unique key
  • Column values are of the same kind
  • Sequence of rows and columns is insignificant, i.e., no default sorting
  • Each column has a unique name
  • table can have any number of rows
  • columns specified by schema

Keys

  • Designate certain fields as keys to optimize searching of that field, and maintain DB integrity
  • Primary key (PK) - one or more columns that uniquely identify row in a table
    • No NULLs in PK
  • Foreign Key (FK)
    • FK in one table is PK in another table

Indexes

  • Search optimization happens using indexes
  • RDBMS creates physical data structure using b-tree or similar
  • Obviates full table scan
  • Most RDBMS's automatically create indexes on PKs
  • Most RDBMSs do not automatically index foreign keys, so it's often recommended to create indexes manually on foreign key columns, especially if they are frequently used in joins or WHERE clauses.
  • While indexes significantly speed up search operations, they come with some costs:
    • Space: Indexes consume disk space.
    • Insert, Update, and Delete Overhead: When data is inserted, updated, or deleted, associated indexes also need to be updated. This can slow down write operations.
  • Therefore, it's essential to strike a balance.
  • Not every column should be indexed; only those that benefit the performance based on the queries the application predominantly uses.

Table Joins

  • join operation: take values from two different tables matched by common field

Views

  • A virtual table computed or collated dynamically
  • Provides different way to see data rendered form "the base table" or "the underlying table"
  • The result set of a stored query on the data.
  • A pre-established query command is kept in the database dictionary
  • Not part of the physical schema
  • Can represent a subset of data, limiting degree of exposure, permissions, etc
  • Can join and simplify multiple tables into a single virtual table, aggregate tables
  • Hide complexity of data



Data Modeling

  • elements, structure of elements, relationships between elements

Data modeling Steps

  1. Conceptual Data Model - technology independent. Identify entity names and relationships
  2. Logical Data Model - documents structures that can be implemented in databases. Attributes, primary keys, foreign keys
  3. Physical Data model - organizes data into tables, accounts for access, performance and storage details. Table names, Column names, column data types, The physical means to store the data. Partitions, CPUs tablespaces. IO data structures.
  • Data definition language - converts schema to database.

Data Modelling strategies

  1. Strategic - do it before
  2. Systems analysis - do it during

Database normalization

  • Minimize duplication of data, minimize the possibility of inconsistencies between the duplicates.
  • Isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated.
  • Database design advice: First create fully normalized design, then selectively denormalize for performance reasons.
  • High degrees of normalization involve more tables, and more SQL join operations joining records into temporary "joined table", which can reduce performance.

Nominal Forms

  • Update anomaly - If you keep the same oinfo in two records, you only update one
  • Insertion anomaly - When you want to insert a row, but you don't know all the field values, resulting in necessity of Null field value, which might be illegal in schema
  • deletion anomaly - when information is tied to other separate information which means the info is lost if you want to delete the other information, but all you can do is delete the whole row.
  • First Nominal Form (1NF)
    • No insertion, update and deletion anomalies
    • Reduce need for resturcturing of database as new types of data are introduced, resuling in increased lifespan of applications. Avoid bias towards any particular pattern of querying, unforeseen queries answered trivially.
  • 2NF - a table where all columns are a fact of the entire key and not a subset of the key.
  • 3NF - every non-key attribute must be non-transitively dependent on the primary key.
  • 4NF - if you can have two or more email addresses or phone numbers, then you should not have them in the same table.