Difference between revisions of "Database Design"

From Colettapedia
Jump to navigation Jump to search
(One intermediate revision by the same user not shown)
Line 11: Line 11:
 
* n-Tuple - a group of n elements, with the order carrying meaning.
 
* n-Tuple - a group of n elements, with the order carrying meaning.
 
** 0-tuple {ex., void}
 
** 0-tuple {ex., void}
** singleton: a set with exactly one element { "hi" } ... { {1,2,3} }, cardinlity of one
+
** singleton: a set with exactly one element { "hi" } ... { {1,2,3} }, cardinality of one
 
** pair: a collection of two distinguishable objects. An ordered pair.
 
** pair: a collection of two distinguishable objects. An ordered pair.
 
* Projection - projecting a relation onto an attribute list yieds a subset of the relation containing only the attribute.
 
* Projection - projecting a relation onto an attribute list yieds a subset of the relation containing only the attribute.
Line 33: Line 33:
 
** Column values are of the same kind
 
** Column values are of the same kind
 
** Sequence of rows and columns is insignificant, i.e., no default sorting
 
** Sequence of rows and columns is insignificant, i.e., no default sorting
** each column has a unique name
+
** Each column has a unique name
 
* Designate certain fields as '''keys''' to optimize searching of that field
 
* Designate certain fields as '''keys''' to optimize searching of that field
 
* '''join''' operation: take values from two different tables matched by common field
 
* '''join''' operation: take values from two different tables matched by common field
Line 47: Line 47:
 
*** Can join and simplify multiple tables into a single virtual table, aggregate tables
 
*** Can join and simplify multiple tables into a single virtual table, aggregate tables
 
*** Hide complexity of data
 
*** Hide complexity of data
 
 
 
===Object/Relational Model===
 
===Object/Relational Model===
 
* Adds new object storage capabilities to relational: traditional fielded data, plus complex objects
 
* Adds new object storage capabilities to relational: traditional fielded data, plus complex objects
Line 56: Line 54:
 
* Less code, more natural '''data modelling'''.
 
* Less code, more natural '''data modelling'''.
 
* one-to-one mapping of DB objects and OO programming instances
 
* one-to-one mapping of DB objects and OO programming instances
 +
* Entities and relationships map directly to object classes and named relationships
 
===Semi-structured Model===
 
===Semi-structured Model===
 
* Flexible, when you don't know the schema ahead of time.
 
* Flexible, when you don't know the schema ahead of time.
Line 66: Line 65:
 
* elements, structure of elements, relationships between elements
 
* elements, structure of elements, relationships between elements
 
===Data modelling Steps===
 
===Data modelling Steps===
# Conceptual Data Model - technology independent. Identify entities.
+
# Conceptual Data Model - technology independent. Identify entity names and relationships
# Logical Data Model - documents structures that can be implemented in databases. Descriptions of tables, columns, OO classes
+
# Logical Data Model - documents structures that can be implemented in databases. Attributes, primary keys, foreign keys
# Physical Data model - organizes data into tables, accounts for access, performance and storage details. Doubles, int. The physical means to store the data. Partitions, CPUs tablespaces. IO data structures.
+
# Physical Data model - organizes data into tables, accounts for access, performance and storage details. Table names, Column names, column data types, The physical means to store the data. Partitions, CPUs tablespaces. IO data structures.
 
* Data definition language - converts schema to database.
 
* Data definition language - converts schema to database.
 
===Data Modelling strategies===
 
===Data Modelling strategies===
Line 78: Line 77:
 
* '''Database design advice: First create fully normalized design, then selectively denormalize for performance reasons.'''
 
* '''Database design advice: First create fully normalized design, then selectively denormalize for performance reasons.'''
 
* High degrees of normalization involve more tables, and more SQL join operations joining records into temporary "joined table", which can reduce performance.
 
* High degrees of normalization involve more tables, and more SQL join operations joining records into temporary "joined table", which can reduce performance.
===Nominal Forms==
+
===Nominal Forms===
 
* Update anomaly - If you keep the same oinfo in two records, you only update one
 
* Update anomaly - If you keep the same oinfo in two records, you only update one
 
* Insertion anomaly - When you want to insert a row, but you don't know all the field values, resulting in necessity of Null field value, which might be illegal in schema
 
* Insertion anomaly - When you want to insert a row, but you don't know all the field values, resulting in necessity of Null field value, which might be illegal in schema

Revision as of 19:11, 17 March 2014

  • Database design - the logical design of the base data structures used to store data.

Vocab

  • Entity - Something about which you want to store data, person, place thing, idea
  • Weak Entity - an entity that cannot be uniquely identified by its attributes alone - it has no unique key attribute. (Customer #15 order #3)
  • Full Participation Relationship - e.g., orders aren't possible without a customer. closely related to weak entity. Indicated on an ER diagram by a double line.
  • Weak Relationships - represented by double line diamonds.
  • Composite Entity
  • A Key is an entity attribute.
  • Unique Key (Primary Key) - uniquely identifies each row in a table.
  • Coupled tables
  • n-Tuple - a group of n elements, with the order carrying meaning.
    • 0-tuple {ex., void}
    • singleton: a set with exactly one element { "hi" } ... { {1,2,3} }, cardinality of one
    • pair: a collection of two distinguishable objects. An ordered pair.
  • Projection - projecting a relation onto an attribute list yieds a subset of the relation containing only the attribute.
  • Functional Dependency - like a function, X -> Y ... x is like a key.
  • Transitive Dependency - an indirect functional dependency, like X -> Y, Y -> Z.

Database Models

Hierarchical

  • Organizes data into tree structure.
  • Parent and Child Data segments
  • 1:N mapping between record types
  • Child data node can have only one parent

Network

  • Based on mathematical set theory
  • More than one parent per child
  • The "set construct": An owner record type, a set name, and a member record type.

Relational

  • Defines data structures, storage and retrieval operations, integrity constraints
  • Organized into tables with following properties:
    • Values are atomic
    • Each row is unique
    • Column values are of the same kind
    • Sequence of rows and columns is insignificant, i.e., no default sorting
    • Each column has a unique name
  • Designate certain fields as keys to optimize searching of that field
  • join operation: take values from two different tables matched by common field
  • Robust transaction and performance management features
  • A complex data structure must be flattened out to fit into tables or joined together from those tables, which takes overhead
  • base structures
    • table - can have any number of rows, columns specified by schema, cells are intersection, row id'ed by unique key
    • views - a virtual table computed or collated dynamically
      • Provides different way to see data rendered form "the base table" or "the underlying table"
      • The result set of a stored query on the data. A pre-established query command is kept in the database dictionary
      • Not part of the physical schema
      • Can represent a subset of data, limiting degree of exposure, permissions, etc
      • Can join and simplify multiple tables into a single virtual table, aggregate tables
      • Hide complexity of data

Object/Relational Model

  • Adds new object storage capabilities to relational: traditional fielded data, plus complex objects
  • SQL and Oracle are here

Object-Oriented Model

  • Manip DB using C++, Java, no need to translate calls into SQL
  • Less code, more natural data modelling.
  • one-to-one mapping of DB objects and OO programming instances
  • Entities and relationships map directly to object classes and named relationships

Semi-structured Model

  • Flexible, when you don't know the schema ahead of time.
  • The schema is in the data, "self-describing"
  • Modelled in terms of graphs which contain lables which give semantics to its underlying structure

Associative Model

  • Divides data into entities (items), and associations (links)
  • Source, verb and target

Data Modelling

  • elements, structure of elements, relationships between elements

Data modelling Steps

  1. Conceptual Data Model - technology independent. Identify entity names and relationships
  2. Logical Data Model - documents structures that can be implemented in databases. Attributes, primary keys, foreign keys
  3. Physical Data model - organizes data into tables, accounts for access, performance and storage details. Table names, Column names, column data types, The physical means to store the data. Partitions, CPUs tablespaces. IO data structures.
  • Data definition language - converts schema to database.

Data Modelling strategies

  1. Strategic - do it before
  2. Systems analysis - do it during

Database normalization

  • Minimize duplication of data, minimize the possibility of inconsistencies between the duplicates.
  • Isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated.
  • Database design advice: First create fully normalized design, then selectively denormalize for performance reasons.
  • High degrees of normalization involve more tables, and more SQL join operations joining records into temporary "joined table", which can reduce performance.

Nominal Forms

  • Update anomaly - If you keep the same oinfo in two records, you only update one
  • Insertion anomaly - When you want to insert a row, but you don't know all the field values, resulting in necessity of Null field value, which might be illegal in schema
  • deletion anomaly - when information is tied to other separate information which means the info is lost if you want to delete the other information, but all you can do is delete the whole row.
  • First Nominal Form (1NF)
    • No insertion, update and deletion anomalies
    • Reduce need for resturcturing of database as new types of data are introduced, resuling in increased lifespan of applications. Avoid bias towards any particular pattern of querying, unforeseen queries answered trivially.
  • 2NF - a table where all columns are a fact of the entire key and not a subset of the key.
  • 3NF - every non-key attribute must be non-transitively dependent on the primary key.
  • 4NF - if you can have two or more email addresses or phone numbers, then you should not have them in the same table.