What are the properties of the relational data model. Characterization of the relational data model. Basic principles of the relational approach. Basic concepts and definitions. Data Server Model

Ministry of Education Russian Federation Ministry of Education of the Republic of Tajikistan

Russian-Tajik (Slavic) University

Department "I and IS"

Course work

by discipline: Database

on the topic: Relational data model

Dushanbe - 2008


Plan

Introduction

1 Data model

2 Basic concepts of the relational data model

3 General understanding of the data model

Conclusion

Bibliography


Introduction

Humanity is rapidly entering into a fundamentally new for it information age... All components of the way of life of people change significantly. IN modern society the level of informatization characterizes the level of development of the state. Experts call the beginning of the XXI century a century computer technology... Their revolutionary impact concerns state structures and institutions of civil society, economic and social spheres, science and education, culture and way of life of people. Many developed and developing countries have fully realized the tremendous benefits that come with the development and diffusion of information and communication technologies. No one doubts the fact that the movement towards information society- this is the path to the future of human civilization.

In accordance with the relational model, a database is represented as a set of tables, on which operations can be performed, formulated in terms of relational algebra and relational calculus. In the relational model, operations on database objects are set-theoretic in nature. The relational data model concepts are associated with the name of the famous specialist in the field of database systems E. Codd. This is why the relational data model is often referred to as the Codd model.

The core of any database is the data model. A data model is a set of data structures, integrity constraints, and data manipulation operations. With the help of a data model, objects of the subject area and the relationship between them can be represented.


1 Data model

Data model- a set of data structures and their processing operations.

Data models are defined:

a) how the data is organized.

b) limiting the value of the data.

c) data operations.

The DBMS is based on the use of a hierarchical, network or relational model, on a combination of these models, or on a subset of them.

Let's look at 3 main types of data models: hierarchical, network and relational.

Hierarchical data model

a) The hierarchical structure is a set of elements interconnected according to certain rules. Objects connected by hierarchical relationships form a directed graph (inverted tree), the view of which is shown in Figure 1.

A Level 1

В1 В2 В3 В4 В5 Level 2

C1 C2 C3 C4 C5 C6 C7 C8 Level 3

Rice. one

The basic concepts of the hierarchical structure include: level, element (node), link.

Knot Is a collection of data attributes describing some object. In the diagram of a hierarchical tree, nodes are represented by the vertices of a graph. Each node at a lower level is associated with only one node at a higher level. A hierarchical tree has only one vertex (the root of the tree), not subordinate to any other vertex and located at the highest (first) level. Dependent (subordinate) nodes are located on the second, third, etc. levels. The number of trees in databases is determined by the number of root records. Each of them is located on the second, third peak and located at the topmost level (9th first). evenly. Database records there is only 1 hierarchical path from the root record. For example, as seen in Figure 1, for record C4, the path goes through records A and B3.

The example presented in Figure 2 illustrates the use of a hierarchical database model. For the example under consideration, the hierarchical structure is legitimate, since each student studies in a certain (only one) group, which belongs to a certain (only one) institution.

b) Integrity constraint- Integrity of links between ancestor and descendant, taking into account the basic rule: no descendant can exist without an ancestor.

Examples of : 1) OKA 3) TOTAL

2) INES 4) IMS

c) Operations on data:

Find the specified tree.

Move from one tree to another.

Move from one record to another.

Move from one record to another in order of traversing the hierarchy.

Delete current record.


Institute (specialty, name, rector)


Rice. 3 Network structure of the database in the form of a graph

Student ( record book number, surname, group)



Work ( cipher ,

supervisor,

Rice. 4.

An example of a complex network structure is the structure of a database containing information about students participating in research work (R&D). Participation of one student in several research projects is possible, as well as participation of several students in the development of one research project. A graphical representation of the network structure described in the example, consisting of only two types of records, is shown in Figure 4. The only relationship is a complex relationship between records in both directions.

c) Operations on the data of the network data model:

Find a specific record in a set of records of the same type.

Jump from node top level to the first node of the lowest on some connection.

Go to the next node on some link.

Create a new entry.

Destroy the record.

Modify the entry.

Include 1 link.

Exclude from communication.

Rearrange to another link.

Feature of the network data model: the ability to navigate through data links, i.e. transition from viewing the details of an instance of one record type to viewing the details of an instance associated with the record type. The user is provided with the possibility of multi-criteria analysis of the database without direct formalization of his information needs through the formation of queries in the language built into the DBMS.

Other strong point network data model - the use of multiple data types to describe the information attributes of objects. This allows you to create information structures, which are a tabular form of data. Despite the development of the network data model, it was not possible to create linguistic software tools on their basis, which would make it possible to describe the data of the network organization in the same way in applied information systems.

Relational data model.

The concept of relational (English relation - relation) is associated with the development of the famous American specialist in the field of database systems E. Codd.

2 Basic concepts of the relational data model

A relational data model represents information as a collection of related tables called relationships or relations.

Data type- is equivalent to the notion of data type in algorithmic languages... Exist:

Integer types;

Real types;

String types;

Data types for monetary values;

Data types for temporary values;

Types of binary objects (has no analogues in programming languages, and are denoted by Blob)

The smallest unit of data in a relational model is a separate atomic (indecomposable) data value for a given model. Domain is a set of atomic values ​​of the same type. In other words, a domain is a valid potential set of values ​​of a given type. The concept of a domain is more specific to databases, although it has some analogies with range types and sets found in a number of programming languages. In the very general view a domain is defined by specifying some basic data type to which the elements of the domain belong, and an arbitrary boolean expression applied to the element of the data type. If this logical expression evaluates to true, then the data item is a domain item.

The semantic load of the domain concept should also be noted: data are considered comparable only if they refer to the same domain. If the values ​​of two attributes are taken from different domains, then their comparison is probably meaningless. The concept of a domain is not used in all DBMSs. Examples of relational databases using domains include Ogasle and InterBase.

Attributes, relationship schema, database schema

The columns of a relationship are called attributes, and they are named and then accessed.

A list of relationship attribute names with domain names (or types if domains are not supported) is called a relationship schema.

The degree of a relationship is the number of its attributes. The relation of degree one is called unary, degree two - binary, degree three - ternary, ..., and degree n - n-ary.

A database schema is a set of named relationship schemas.

Tuple

A tuple corresponding to a given relationship schema is a set of pairs (attribute name, value) that contains one occurrence of each attribute name belonging to the relationship schema. "Value" is a valid domain value for this attribute (or data type if the domain concept is not supported). Thus, the degree of a tuple, that is, the number of elements in it, coincides with the degree of the corresponding relation scheme. In other words, a tuple is a collection of named values ​​of a given type. The relationship schema is sometimes also called the relationship header, and the relationship, as a set of tuples, is the relationship body. The concept of a relationship schema is reminiscent of the concept of a structural data type in programming languages ​​(structure in C / C ++, notation in Pascal). However, in relational databases, the schema name of the relationship is always the same as the name of the corresponding instance relationship. In classic relational databases, after defining the database schema, only the instance relationships are changed. New tuples can appear in them and existing tuples can be deleted or modified. However, many implementations also allow changing the database schema: defining new and modifying existing relationship schemas. This is commonly referred to as the evolution of the database schema.

Relationship Keys

Since a relation from a mathematical point of view is a set, and sets, by definition, do not contain coinciding elements, then no two tuples of a relation can be duplicates of each other at any arbitrary given moment in time. Thus, a relation must always contain some attribute (or a set of attributes) that uniquely identifies each tuple of the relation and ensures the uniqueness of the table rows. Such an attribute (or set of attributes) is called the primary key of the relationship.

For each relation, at least the complete set of its attributes has the uniqueness property. However, it is also required to ensure the minimum condition. Therefore, as a rule, in a relationship there is always one attribute that has the uniqueness property and is the primary key.

Depending on the number of attributes included in the key, simple and complex (or composite) keys are distinguished.

A simple key is a key containing only one attribute. In general, join operations are faster when the shortest and simplest possible data type is used as the key. From this point of view, an integer type that has hardware support for performing logical operations on it is best suited.

Complex or composite key - a key consisting of several attributes. A set of attributes that is unique but not minimal is called a superkey. A superkey is a complex (composite) key with more columns than it needs to be a unique identifier. Such keys are often used in practice, since the redundancy can be useful to the user.

A distinction is made between artificial and natural keys, depending on whether the attribute that is the primary key contains any information.

An artificial or surrogate key is a key created by the DBMS itself or by a user using some procedure, which itself does not contain information. An artificial key is used to create unique row identifiers when an entity must be fully described in order to uniquely identify a specific item. An artificial key is often used in place of a meaningful complex key that is too cumbersome to be used in a real database. The system supports an artificial key, but it is never shown to the user.

A natural key is a key that includes meaningful attributes and thus contains information.

Each of the primary key types has advantages and disadvantages; a large number of publications are devoted to their discussion. We will not carry out a detailed comparison of them, but will note only the main pros and cons of each type of key.

The main advantages of natural keys are that they carry well-defined information and their use does not lead to the need to add attributes to tables, the values ​​of which do not make any sense and are used only for communication between relations. In other words, the use of natural keys allows you to get a more compact form of tables (in which there will be no redundant, uninformative data) and more natural relationships between them.

The main disadvantage of natural keys is that their use is very difficult in the case of variability of the subject area. Understand that the values ​​of the primary key attributes must not change. That is, once specified, the primary key value for a tuple cannot be changed later. This requirement is mainly made to maintain the integrity of the database. The relationship between relationships is usually established precisely by the primary key, and changing it will lead to a breakdown of these relationships or to the need to change records in several tables. Even in relatively simple databases, this can cause a number of intractable problems. Some relational DBMSs allow changing the primary key. This can be really useful sometimes. However, this should only be done when absolutely necessary.

A typical example of a volatile subject area, in which it is impossible to determine an unchanging natural key for an entity, is any area where a person acts as an entity. Indeed, it is impossible to define for a person a set of attributes that would be unique and unchanged throughout his life.

The second, rather significant drawback of natural keys is that, as a rule, unique natural keys are composite and contain string attributes. As noted above, the maximum speed of performing operations on data is provided when using simple integer keys. Thus, from the point of view of system performance, natural keys are often suboptimal.

Both disadvantages of natural keys can be overcome by defining surrogate keys in relations, which are some universal attribute, usually of an integer type, which does not depend either on the domain or, moreover, on the structure of the relation that it identifies. Thus, it is possible to ensure the uniqueness and immutability of the key (since it does not depend in any way on the subject area, there will never be a need to change it). However, this comes at the cost of redundancy in the data in the tables. It should be noted that in many practical implementations of relational DBMSs, a violation of the uniqueness property of tuples for intermediate relations generated implicitly when executing queries is allowed. Such relations are not sets, but multisets, which in some cases allows one to achieve certain advantages, but sometimes leads to serious problems.

Any of the tables may contain several sets of attributes that can be selected as a key. Such sets are called potential or alternative keys.

Often so-called secondary keys are defined in a relationship. A secondary key is a different combination of attributes than the combination that makes up the primary key. Moreover, secondary keys do not necessarily have the property of uniqueness. When defining them, the following restrictions can be set:

UNIQUE - uniqueness constraint, values ​​of secondary keys under this constraint cannot be duplicated;

NOTNULL — with this restriction, none of the attributes included in the secondary key can be NULL.

Overlapping keys are complex keys that have one or more columns in common.

Linked relationships

In a relational model, data is represented as a collection of interconnected tables. This relationship between tables is called a rilationship. Thus, another important concept of the relational model is the relationship between relationships.

When looking at related tables, the concept of a foreign key is important. Let's consider it in more detail.

Foreign keys of a relationship

Databases often use the same attribute names in different ways. A foreign key is an attribute (or set of attributes) of one relationship, which is the key of another (or the same) relationship.

Foreign keys are used to establish logical connections between relationships. The relationship between two tables is established by assigning the foreign key values ​​of one table to the key values ​​of the other.

Like any other keys, foreign keys can be simple or composite.

Relationships are often linked by a primary key, that is, the foreign key values ​​of one relationship are assigned the values ​​of the primary key of another. However, this is not required - in general, communication can also be established using secondary keys. In addition, when establishing relationships between tables, it is not necessary to require the uniqueness of the key by which the relationship is established. Foreign key attributes do not have to have the same names as the key attributes they correspond to. A foreign key can also refer to the same table to which it belongs. In this case, the foreign key is called recursive.

Data integrity conditions

In order for the information stored in the database to be unambiguous and consistent, the relational model imposes some constraints. Constraints are rules that define the possible values ​​of data. They provide a logical basis for maintaining correct data values ​​in the database. Integrity constraints help minimize errors that occur when updating and processing data.

· The most important data integrity constraints are: categorical integrity; referential integrity.

The limitation of categorical integrity is as follows. Relational tuples represent in the database elements of certain real-world objects or, in relational DBMS terminology, categories. The primary key of a table uniquely identifies each tuple and therefore each item in the category. Thus, in order to retrieve the data contained in a table row, or to manipulate that data, you need to know the key value for that row. Therefore, a row cannot be entered into the database until all the attributes of its primary key have been determined. This rule is called the categorical integrity rule and is summarized as follows: no primary key attribute of a row can be empty.

The second condition enforces data integrity constraints on foreign keys, called referential integrity.

If two tables are related to each other, then the foreign key of the table must contain only those values ​​that already exist among the values ​​of the key by which the relationship is carried out. If the correctness of foreign key values ​​is not controlled by the DBMS, then the referential integrity of the data may be violated. The constraints of categorical and referential integrity must be supported by the DBMS. To maintain the integrity of the entity, it is sufficient to ensure that no tuples with the same primary key value are present in any relation. When it comes to referential integrity, it's a little trickier to enforce integrity. When updating the referencing relationship (when inserting new tuples or modifying a foreign key value in existing tuples), you just need to be careful not to get invalid foreign key values. But when removing a tuple from a referenced relationship, it is possible to use one of three approaches, each of which maintains referential integrity:

· The first approach is that it is forbidden to delete a tuple to which there are references (that is, first you need to either delete the referenced tuples, or change the values ​​of their foreign key accordingly);

· In the second approach, when deleting a referenced tuple, the foreign key value in all referenced tuples automatically becomes undefined;

· The third approach (also called cascading deletion) is that when a tuple is deleted from a referenced relationship, all referenced tuples are automatically removed from the referenced relationship.

In advanced relational databases, you can usually choose how to maintain referential integrity for each individual foreign key definition situation. Of course, to make such a decision, it is necessary to analyze the requirements of a specific application area. Although most modern DBMSs provide referential integrity, remember that there are relational DBMSs that do not enforce referential integrity constraints.

Types of relationships between tables

When establishing a link between two tables, one of them will be the master (master), and the second - the subordinate (detail). The difference between them can be explained somewhat simplistically as follows. All records contained in it are always available in the main table. In the subordinate table, only those records are available for which the value of the foreign key attributes coincides with the value of the corresponding attributes of the current record of the main table. Moreover, changing the current record of the main table will change the set of available records of the subordinate table, and changing the current record in the subordinate table will not cause any changes in any of the tables. In practice, more than two tables are often linked. One and the same table can be master in relation to one table and subordinate in relation to another. Or one main table may have subordinate not one, but several tables. However, the subordinate table cannot be managed by two tables. Thus, a master table can have several subordinates, but a subordinate table can only have one master.

There are four types of relationships between relational database tables:

· One to one - each record of one table corresponds to only one record of another table;

· One to many - one record of the main table can correspond to several records of the subordinate table;

· Many to one - several records of the main table can correspond to the same record of the subordinate table;

· Many-to-many - one record of the master table is associated with several records of the subordinate table, and one record of the subordinate table is associated with several records of the master table.

The difference between one-to-many and many-to-one relationship types depends on which table is selected as master and which as subordinate.

Basic properties of relationships

Let's now look at some of the most important properties of relationships in the relational data model.

3 General understanding of the data model

You can characterize the concept of a data model in different ways. On the one hand, a data model is a way of structuring data, which is considered as some kind of abstraction in isolation from the subject area. On the other hand, a data model is a tool for representing the conceptual model of the domain and the dynamics of its change in the form of a database.

Considering both of the above sides, we will define the basic structures of data models used to represent the conceptual model of the domain (entities, attributes, relationships).

Data item(field) is the smallest named data unit. Used to represent the value of an attribute.

Recording Is a named collection of fields. Used to represent a collection of entity attributes (entity records).

Record instance- a record with specific field values.

Aggregate- a named collection of data items within a record that can be viewed as a single whole.

File- a named set of instances of records of the same type. Used to represent a homogeneous set of entities.

Set of files- a named collection of files processed in the system. Used to represent multiple sets of entities.

Let us introduce the concept of "group", which generalizes the concepts of "aggregate" and "record".

Group Is a named collection of data items or data items and other groups.

The most important concept of the conceptual model is the concept of relationships between entities (sets of entities). In data models, the corresponding concept is reflected in the concept of "group relation".

Group relation Is a named binary relation defined on two sets of instances of the groups under consideration. By the nature of binary bonds, group relations of the form 1: 1, 1: M, M: 1, M: N are distinguished. Pairs of numbers are called group ratio coefficients. In a group relationship, one member of the group is designated as the owner of the relationship and another member.

Database- a named collection of instances of groups and group relations.

Group relations are represented in two forms:

but) Graphova... Groups are represented by the vertices of the graph, links between groups are represented by arcs directed from the owner group to the member group, indicating the name of the relationship and the coefficient.

The types of graphs are distinguished:

􀂃 hierarchical model (graph without cycles - tree);

􀂃 network model (directed graph of general form).

b) Tabular... The relationship between groups is represented by a table, the columns of which represent the keys of the corresponding groups. For a formal description of the table, the mathematical (set-theoretic) concept of a relationship is used. The corresponding data model is called the relational model.

The data model is described as follows:

􀂃 types and characteristics of logical data structures are defined

(fields, records, files);

􀂃 describes the rules for composing structures of a more general type from structures of simpler types;

􀂃 describes possible actions on structures and their rules

executions including:

- basic elementary operations on data;

- generalized operations (procedures);

- controls regarding simple conditions correctness of data entry (restrictions);

- means of control of arbitrarily complex conditions for the correctness of certain actions (rules). As the main elementary operations, the following are usually considered: searching for a record with a given key value, reading the required record, adding a record, correcting, deleting. Data models also provide special operations for establishing group relationships.

Generalized operations or procedures - a sequence of operations that implements a specific data processing algorithm. Procedures can be initiated by the DBMS automatically, and can also be started by the user. Examples of procedures are procedures for copying a database, restoring a database, procedures that calculate the values ​​of certain attributes in the database from the values ​​of other attributes, etc.

Controls are used to enforce constraints on the integrity of the conceptual model. The simplest constraint controls are used to enforce both the external constraints of the conceptual model and the internal constraints of the data model. As the last restrictions, in particular, there are implemented restrictions on the input of data of an inappropriate type, inappropriate characteristics (by the number of bits, by the number of fields, by the number of records, etc.). More complex controls (rules) allow you to trigger the execution of a certain sequence of operations (no matter how complex) when you change or add data to the database and thereby implement integrity constraints described using special constructs.


Conclusion

When we talked about the basic concepts of relational databases in the previous sections, we did not rely on any particular implementation. This reasoning was equally applicable to any system that was built using a relational approach. In other words, we used the concepts of the so-called relational data model. The data model describes a certain set of generic concepts and characteristics that all specific DBMS and the databases they manage should have, if they are based on this model. Having a data model allows specific implementations to be compared using one common language. Although the concept of a data model is general, and we can talk about hierarchical, network, some semantic, etc. data models, it should be noted that this concept was introduced into use in relation to relational systems and is most effectively used in this context. Attempts to apply similar models straightforwardly to pre-relational organizations show that the relational model is too "large" for them, but for post-relational organizations it turns out to be "small".


Bibliography

1. Computers in the office and at home: Relational databases: 2004. 228 pp.

2. Michie D., Jonathon R. Relational DBMS. 2004 No. 8, p. 4

3. www.libbooks.ru (2006 to 2008. Section: Database).

4. www.bankreferatov.ru (2004 to 2008. Section: Database).

5. Jones E., Sutton D. Office 97./K .: Dialectic, 1999.

6. Petrov V.N. Information Systems: tutorial for students of higher educational institutions, 2003 2nd ed. p. 139

Relationships (tables) meet certain integrity conditions. RMD supports declarative integrity constraints at the domain (data type) level, relation level, and database level.

  • Aspect (component) of processing (manipulation) - RMD supports operators for manipulating relations (relational algebra, relational calculus).
  • In addition, the theory of normalization is included in the relational data model.

    The term "relational" means that the theory is based on the mathematical concept of a relation ( relation). The word table is often used as an informal synonym for the term "relation". It should be remembered that “table” is a loose and informal concept and often means not “relationship” as an abstract concept, but a visual representation of the relationship on paper or on a screen. Incorrect and lax use of the term "table" instead of the term "relation" often leads to misunderstandings. The most common mistake lies in thinking that the RMD deals with "flat" or "two-dimensional" tables, when such can only be visual representations of tables. Relationships are abstractions and cannot be "flat" or "non-flat".

    For a better understanding of the RDM, three important points should be noted:

    • the model is logical, that is, relationships are logical (abstract), not physical (stored) structures;
    • for relational databases, the informational principle is correct: all the data content of the database is represented in one and only one way, namely, by explicitly setting the values ​​of attributes in tuples of relations; in particular, there are no pointers (addresses) connecting one value to another;
    • the presence of relational algebra allows one to implement declarative programming and declarative description of integrity constraints, in addition to navigational (procedural) programming and procedural condition checking.

    The principles of the relational model were formulated in the 1970s by E. F. Codd. Codd's ideas were first publicly presented in the classic article "A Relational Model of Data for Large Shared Data Banks".

    A rigorous presentation of the theory of relational databases (relational data model) in the modern sense can be found in the book by C.J. Date. “C. J. Date. An Introduction to Database Systems. ”

    The best known alternatives to the relational model are the hierarchical model and the network model. Some systems using these older architectures are still in use today. In addition, we can mention the object-oriented model on which the so-called object-oriented DBMS are built, although there is no unambiguous and generally accepted definition of such a model.

    Notes (edit)

    Literature

    • Date K.J. Introduction to Database Systems. - 8th ed. - M .: "Williams", 2006. - 1328 p. - ISBN 0-321-19784-4
    • Thomas Connolly, Carolyn Begg Database. Design, implementation and maintenance. Theory and Practice = Database Systems: A Practical Approach to Design, Implementation, and Management Third Edition. - 3rd ed. - M .: "Williams", 2003. - S. 1436. - ISBN 0-201-70857-4
    • S. D. Kuznetsov Database Basics. - 2nd ed. - M .: Internet University Information technologies; BINOMIAL. Knowledge Laboratory, 2007. - 484 p. - ISBN 978-5-94774-736-2
    • Kogalovsky M.R. Encyclopedia of Database Technologies. - M .: Finance and Statistics, 2002 .-- P. 800 .-- ISBN 5-279-02276-4

    Wikimedia Foundation. 2010.

    See what the "Relational Data Model" is in other dictionaries:

      Developed by E. Codd in 1970. logical data model describing: data structures in the form of (time-varying) sets of relations; multiple-theoretic operations on data: union, intersection, difference and Cartesian product; ... Financial vocabulary

      relational data model- A data model based on representing data as a set of relations, each of which is a subset of the Cartesian product of certain sets, and manipulating them using a variety of relational algebra operations, or ... ...

      Relational data model- 61. Relational data model A data model based on the representation of data as a set of relations, each of which is a subset of the Cartesian product of certain sets, and manipulating them using a set ... ... Dictionary-reference book of terms of normative and technical documentation

      Relational database database based on the relational data model. The word "relational" comes from the English. relation (relation). To work with relational databases, relational DBMS are used. Using relational databases ... ... Wikipedia

      relational database- Database implemented in accordance with the relational data model. [GOST 20886 85] relational database A database logically organized as a set of relations between its components. A characteristic feature of a relational database is ... ... Technical translator's guide

      Data is a logical data model, a rigorous mathematical theory that describes the structural aspect, the integrity aspect, and the processing aspect of data in relational databases. Structural aspect (component) data in the database represent ... ... Wikipedia

      In classical database theory, a data model is a formal theory of data representation and processing in a database management system (DBMS), which includes at least three aspects: 1) the aspect of structure: methods of describing types and ... ... Wikipedia

      Hierarchical data model representation of a database in the form of a tree (hierarchical) structure, consisting of objects (data) different levels... There are connections between objects, each object can include several objects ... ... Wikipedia

      It is necessary to transfer the contents of the Network DBMS article to this article and put a redirect from there. You can help the project by merging articles (see merging instructions). If it is necessary to discuss the feasibility of unification, ... ... Wikipedia

      - (English Associative model of data) is proposed by Simon Williams: 2 data representation model, in which the database consists of two types of data structures of elements and links stored in a single homogeneous common ... ... Wikipedia

    It would be wrong to think that only relational databases are used in information systems. You can often find the implementation of databases based on hierarchical, network, relational and other models. Nevertheless, most information systems are based on relational databases, the foundation of which was laid by E. Codd in the late 1960s, defining the basic rules and operations that should be applied in the implementation of such databases. Many database models that can be found in information systems are, in one way or another, based on the principles of relational databases and use various additional tools to improve the work with certain types of data, for example, with geographic data, data in real mode time (streaming data), multidimensional data, etc.

    The basis in relational databases is a relational data model built on the basis of relational algebra, which forms the basic rules for working with data in the corresponding databases.

    Relational data model

    The construction of a relational data model is based on the understanding that any dataset can be represented in the form of a relationship, formatted, but in the form of a table (Fig. 1.12), where the data is presented

    attributes and values ​​at the intersection of the corresponding attribute with the record (tuple).


    The term "Relationship" means a set of data combined into a set of records (tuples) and described by a header containing a set of attributes.

    In the above example, the entire set of job values ​​and the header part with named attributes on which the values ​​are placed is called a relationship. In terms of formal logic, the relation in general terms can be represented as follows:

    R (A, T), i = (1..n) (11)

    In this representation, under A, we mean an attribute describing one characteristic of the data, and under T- the data type to which the data to be represented in the relation must match. The above example is an informal statement of the relationship. Its title does not specify the data types that describe the information presented in the body of the relationship.

    Usually, the title of a relationship, where the names of attributes and their types are indicated, is called a relationship schema, and a set of interrelated relationship schemas is called a data schema. A relation header contains standardized data types, or types derived from standardized types, and the set of values ​​associated with a particular attribute of a standard or derived data type is called a domain.

    Waters the term "Domain" in database theory means an admissible set of named values ​​of the same type that have a certain meaning

    From this definition it follows that the domain is characterized by the following properties:

    • the domain carries a certain semantic load, which is expressed in the understanding of the meaning of the described data, which usually coincides with the understanding of data in the subject area;
    • a domain is defined as a simple or derived from a simple data type, allowing the use of simple logical operations over data;
    • a domain can contain a boolean condition that identifies a specific subset of the data that is valid for that domain.

    The body of a relationship is built from a set of records, which in terms of relational algebra are called tuples and represent information that exists in the domain within the considered object or group of interrelated objects.

    Thus, according to the definition of a tuple, it contains all possible data that obeys the rules defined by individual domains. Moreover, each element of the tuple data corresponds to only one domain and obeys all the properties that are defined by this domain.

    The tuple description uses a number of important properties, some of which are presented below:

    each tuple contains only one value for each of the attributes that characterize the relationship;

    no ordering is assumed for the components of a tuple, similar to the elements of a domain;

    each subset of tuples is represented by a similar tuple.

    By combining domains and tuples, you can form a relationship that is generally defined as follows;

    R [<Заголовок>]{<Список кортежей >}. (1.2)

    The relationship header is represented by a comma-separated list of attributes. Also important is the fact that the second parameter of the relation, when correctly represented, is denoted by the term "Body", which contains many tuples. But to simplify informal communication and simplify presentation, the term "Body" is replaced by the term "Tuple", implying that all tuples form the body of the relationship. In addition to understanding the terms "Tuple", "Domain" and "Body" in relational database theory, there is a constraint that all tuples of the same relation belong to the same tuple type, and that tuple type must be exactly the same as defined in the title of the relationship. Thus, all the rules for data presentation defined in the header apply to all tuples of the relation.

    Given the definitions of relationship, tuple, and domain described above, the basic properties of a relationship can be formulated. For an example of relationship properties, consider the relationship of information about the employees of the organization, which includes the attributes of the tuples, but the name of the employee, his position and official salary. These attributes will constitute the title of the relationship, forming the domains for the relationship. Each header attribute contains not only the name of the attribute, but also its type (Fig. 1.13), which determines the possible types of stored data in terms of their presentation, processing and restrictions.

    Rice. 1.13. Example of an Employee relationship

    Any relationship in a relational database has the following properties.

    1. Each tuple contains only one value of the corresponding type for each attribute (the relation is normalized).

    Each attribute in the presented example, within each tuple, is assigned only one value, which can be seen at the intersection of the selected domain "employee's full name" and the tuple with the full name "Petrov Petr Petrovich". The ratio that corresponds to this property is normalized, i.e. is in this case in the first normal form, 1NF.

    2. Attributes are not ordered according to any rule.

    Previously, it was defined that the components of a tuple are not ordered, and since a tuple must match the header attributes unambiguously, these attributes are also not ordered. You need to understand that a person, when representing data structures, always applies certain rules for ordering attributes and tuples, but it is important to remember that such ordering is not important and is not taken into account when working with relational databases. Therefore, the concepts "First attribute" or "Second attribute" are inapplicable to objects of the relational model, and also the term "Next attribute" or "Previous attribute" cannot be discussed.

    This situation gives a certain rigidity in working with databases, improving the quality program code data processing, which is often not always obvious when programming with less rigidity.

    3. Tuples are not ordered according to any rule.

    This property follows from the fact that the body of the relation is represented

    a set that, according to mathematical rules, is not ordered. Since relational relations obey the rules for working with mathematical sets, the mathematical apparatus for working with sets is used.

    Of course, presenting a relationship on paper, a person will try to streamline it in some way so that it is easier to process. However, this reflection of the relationship is not a rule and is just a representation of the relationship. The relation itself remains unordered, and by presenting it in a different ordering of tuples, the relation itself cannot be changed and the same processing operators can be applied to it as for a relation with another ordered representation. It follows from this that for implementation in databases, an ordered representation of a relationship does not make any sense, which means that any relationship in databases is unordered.

    4. There are no duplicate tuples in the relation.

    This property of a relationship follows from the understanding that the body of a relationship is represented by a set, and any set, taking into account its mathematical representation, does not contain duplicates. It follows from this that, taking any tuple represented by all the attributes used in the relation, it will be impossible to find a single tuple with exactly the same attribute values.

    At the same time, this property illustrates the differences between a relation and a table. Understanding that the data table is the physical implementation of relations in the database, records with the same values ​​can be placed there, unless, of course, such a possibility is provided at the level of the program logic for building the database, and the relationship, by definition, never contains duplicate tuples.

    Often, when there is no need to reflect the values ​​described in the relation (relation body), the relational model is limited to only specifying the title of the relation with the prescription of the name of the relation itself or only the name of the relation. These relational data model views are filtered views that are used in specialized database design tools such as IBM InfoSphere Data Architect (Figure 1.14).


    The main information contained in the relational data model is the names of relationships (entities), attributes and types of attributes that describe the relationship. Additionally, the relational model reflects relationships between relationships (entities), which make it possible to display the interaction of elements of bodies of related relationships. Each of these components of the relational model has a number of auxiliary characteristics that refine.

    for the representation and processing of the elements of the relationship body. Although these characteristics are not explicitly visualized in the data model, they are taken into account when presenting the relationship in its entirety, taking into account the display of the relationship body.

    The data model presented in the example uses a display filter that takes into account the need to display the name of the relationship (entity), the attributive composition of each relationship and the relationship between the relationships. The absence of attribute types and other characteristics in the visualization of the model does not mean that they are not defined or not. It just means that the data model is presented for the need to consider only the specified parameters, and all other characteristics are fixed in the hidden components of the model. For example, another presentation may be the case shown in Fig. 1.15.


    In this view, in addition to the names of relations (entities) and attributes, the data types that characterize the body of the relationship are displayed, although the body itself is not represented in these models. It should be borne in mind that the data model (database model) in specialized modeling tools is focused on further representation in the form of a database structure and the indication of relationship bodies is impractical. Because of this, relationship bodies are usually not shown in database models. If the model is built specifically to reflect operations with relations, in the full sense of this term, then all relations should be represented with bodies illustrating possible values ​​that will subsequently be stored in database tables.

    Such division often introduces some confusion in the correctness of the use of a particular data model (database model), which requires a more accurate description of their use. So, the data model in the form of relations is used when it is necessary to illustrate possible operations on the data of relations and to understand the correct interpretation in the domain model represented by objects with their possible instances. The data model in the form of entities and relationships (EL-model) is used to form a logical (infological) database model without specifying specific data values ​​and is aimed at further presentation in the form of a database structure. The model in the form of tables and links is built at the physical level, reflecting the peculiarities of data presentation and processing at the DBMS level. The result is a representation of the relational data model in three main versions (Table 1.3).

    Table 13

    Representation options for relational data models

    Presentation type

    Used

    terminology

    Appointment

    Model reflecting the name of the relationship, attributive composition, connections

    The essence

    Data type

    Used to model the logical data structure for the subsequent transition to the physical layer

    Model with header and body reflection with possible data

    Attitude

    Heading

    Attribute / Domain

    Data type

    It is used for a presentation with an indication of possible data values ​​and application, if necessary, of analysis of possible operations on relations and data in relations.

    Model displaying the structures of the physical representation of data in the DBMS

    Attribute / 11ole / Column

    Data type

    It is used to display a variant of the representation of the structure, which will be implemented at the physical level in the DBMS


    Note. This section discusses the presentation forms of the database model, which is reflected in three options. It should be borne in mind that database modeling is based on the implementation of modeling levels, and, therefore, there is an interpretation of data models in accordance with these levels, which are represented by other types of relational models, about which it will be later. In this regard, one should not take the above list of model representations as exhaustive, realizing that there may be other representations and other types of models.

    • Boyko V. V. Savinkov V. M. Database design of information systems.
    • Data types will be discussed in subsequent chapters.
    • The term "Normalization" and normal forms will be discussed in Ch. 2.

    The relational model is based on the set-theoretic concept of a relationship. In mathematical disciplines there is a concept “ attitude »(Relation), the physical representation of which is table ... Hence the name of the model - relational .

    In relation to a database, the concepts "relational database" and "tabular database" are synonymous. Relational databases are the most widespread in the world. Almost all database products created since the late 70s are relational.

    In 1970, papers appeared in which the possibilities of using various tabular data models were discussed. The most significant of these was an article by an employee of the firm IBM dr E.F. Codd (A Relational Model of Data for Large Shared Data Banks. CACM 13: 6, June 1970), where he first applied term "relational data model" ... The System R project was developed in a research laboratory at IBM Corporation. This project was conceived to prove the practicality of the relational model. Relational DBMS refers to DBMS second generation.

    The goals of creating a relational data model:

    1. Providing more high degree data independence.

    2. Build a solid foundation to address data consistency and redundancy issues.

    3. Extension of data management languages ​​by including operations on sets.

    Commercial systems based on the relational data model began to appear in the late 70s and early 80s. Currently, there are several hundred types of different relational DBMSs.

    The relational model is a convenient and most common form of data representation in the form tables (relationship ). Every relationship has name and consists of named attributes (columns) of data. One of the main advantages of the relational model is its uniformity. All data is stored in tables in which each row has the same format. Each row in the table represents some real-world object or relationship between objects.

    The basic concepts by which the relational model is defined are as follows:

    - relational database- a set of normalized relations;

    - attitude- file, a flat table consisting of columns and rows; a table in which each field is atomic;

    - domain- aggregate acceptable values, from which the value of the corresponding attribute of a particular relationship is taken. From a programming point of view, domain is a data type;

    - universe- a set of values ​​of all fields or a set of domains;


    - cortege- record, table row;

    - cardinality - the number of rows in the table;

    - attributes- named fields, table columns;

    - degree of attitude- the number of fields (columns);

    - relationship diagram- an ordered list of attribute names;

    - relational database schema- a set of relationship schemes;

    - primary key- a unique identifier with non-repeating records - a column or some subset of columns that uniquely identify rows.

    A primary key that includes more than one column is called plural , or combined , or composite , or super key .

    Object Integrity Rule states that the primary key cannot be completely or partially empty.

    The relationship between these concepts is illustrated in Fig. 4.5.

    Full name Year of birth Position Department
    1. Ivanov I. I. Head department. 22
    2. S. S. Sidorov Prof. 22
    3. Andreeva G.G. Prof. 22
    4. Tsvetkova S. S. Assistant professor
    5. Kozlov K.K. Assistant professor 22
    6. Petrov P. P. Art. Rev. 22
    Attributes

    rice. 4.5. Basic concepts of the relational data model.

    Sometimes different columns can be selected as the primary key in a table. A dedicated key is a key that is explicitly listed with the relational schema. Otherwise, one speaks of an implicit key, or a possible key, or a candidate key.

    - external key is a column or subset of columns from one table that can serve as a primary key for another table. The foreign key of a table is a reference to the primary key of another table. Since the purpose of building a database is to store all data, if possible, in one instance, then if a certain attribute is present in several relations, then its presence usually reflects a certain relationship between the rows of these relations.

    Foreign keys implement relationships between database tables.

    A foreign key, like a primary key, can be a combination of columns. In practice, a foreign key will always be composite if it refers to a composite primary key of another table. The number of columns and their data types in the primary and foreign keys must match.

    If a table is related to multiple other tables, it can have multiple foreign keys.

    Each relational table possesses the following properties:

    Has a name that is different from the names of all other tables;

    Data in table cells must be structurally indivisible. It is unacceptable that a table cell contains more than one piece of information. For example, the number and series of a passport should be located in different columns of the table;

    All columns in the table are homogeneous, i.e. all elements in a column have the same type (numeric, character, etc.) and length;

    Each column has a unique name;

    There are no identical rows in the table;

    The order of rows and columns can be arbitrary, regardless of their reordering, the relationship will remain the same, and therefore have the same meaning.

    A data model is a collection of data structures and operations for their processing. Using the data model, you can visualize the structure of objects and the relationships established between them. The terminology of data models is characterized by the concepts of "data element" and "binding rules". A data item describes any set of data, and the binding rules determine the algorithms for the relationship of data items. To date, many different data models have been developed, but in practice, three main ones are used. Allocate hierarchical, network and relational data models. Accordingly, they talk about hierarchical, network and relational DBMS.

    О Hierarchical data model. Hierarchically organized data is very common in everyday life. For example, the structure of a higher education institution is a multilevel hierarchical structure. A hierarchical (tree-like) database consists of an ordered set of elements. In this model, the original elements give rise to other elements, and these elements in turn give rise to the next elements. Each child has only one parent.

    Organizational charts, lists of materials, table of contents in books, project plans, and many other collections of data can be presented in a hierarchical manner. Integrity of links between ancestors and descendants is automatically maintained. Basic rule: no descendant can exist without its parent.

    The main disadvantage of this model is the need to use the hierarchy that was laid in the basis of the database during the design. The need for constant data reorganization (and often the impossibility of this reorganization) led to the creation of a more general model - network.

    О Network data model. The networked approach to organizing data is an extension of the hierarchical approach. This model differs from hierachic in that each generated element can have more than one parent element. ■

    Since the network database can directly represent all types of relationships inherent in the data of the corresponding organization, this data can be navigated, explored and queried in all sorts of ways, that is, the network model is not connected by just one hierarchy. However, in order to compose a query to a network database, it is necessary to delve deeply into its structure (to have a diagram of this database at hand) and to develop a mechanism for navigating the database, which is a significant drawback of this database model.

    About the relational data model. The basic idea behind a relational data model is to represent any dataset as a two-dimensional table. In its simplest case, a relational model describes a single two-dimensional table, but most often this model describes the structure and relationships between several different tables.

    Relational data model

    So, the purpose of the information system is to process data about objects the real world, taking into account connections between objects. In DB theory, data is often called attributes, and objects - entities. Object, attribute and connection are fundamental concepts of I.S.

    An object(or entity) is something that exists and discernible, that is, an object can be called that "something" for which there is a name and a way to distinguish one similar object from another. For example, every school is an object. Objects are also a person, a class at a school, a firm, an alloy, a chemical compound, etc. Objects can be not only material objects, but also more abstract concepts that reflect the real world. For example, events, regions, works of art; books (not as printed products, but as works), theatrical performances, films; legal norms, philosophical theories, etc.

    Attribute(or given)- this is some indicator that characterizes a certain object and takes a certain numerical, textual or other value for a specific instance of the object. Information system operates with sets of objects designed in relation to a given subject area, using specific attribute values(data) of certain objects. For example, let's take classes in a school as a collection of objects. The number of students in a class is a given, which takes a numerical value (one class has 28, another has 32). The class name is a given that takes text value(one has 10A, the other has 9B, etc.).

    The development of relational databases began in the late 1960s, when the first papers appeared discussing; the possibility of using in the design of databases the usual and natural ways of presenting data - the so-called tabular datalogical models.

    The founder of the theory of relational databases is considered to be an employee of IBM, Dr. E. Codd, who published the article A Relational Model of Data for Large-Shared Data Banks(Relational data model for large collective data banks). This article was the first to use the term "relational data model." The theory of relational databases, developed in the 70s in the United States by Dr. E. Codd, has a powerful mathematical basis describing the rules for efficiently organizing data. The theoretical basis developed by E. Codd became the basis for the development of the theory of database design.

    E. Codd, being a mathematician by education, suggested using the apparatus of set theory (union, intersection, difference, Cartesian product) for data processing. He proved that any dataset can be represented as two-dimensional tables of a special kind, known in mathematics as "relations."

    Relational such a database is considered in which all data is presented to the user in the form of rectangular tables of data values, and all operations on the database are reduced to manipulating tables.

    The table consists of columns (fields) and lines (records); has a name that is unique within the database. table reflects Object type the real world (essence), and each of her string is a specific object. Each column in a table is a collection of values ​​for a specific attribute of an object. The values ​​are selected from the set of all possible values ​​of the object attribute, which is called domain.

    In its most general form, a domain is defined by specifying some basic data type, to which the elements of the domain belong, and an arbitrary logical expression applied to the data elements. If, when evaluating a logical condition on a data item, the result is "true", then this item belongs to the domain. In the simplest case, a domain is defined as a valid potential set of values ​​of the same type. For example, the set of dates of birth of all employees constitutes the "date of birth domain" and the names of all employees constitute the "employee name domain". The date of birth domain has a datatype to store information about points in time, and the employee name domain must be a character datatype.

    If two values ​​come from the same domain, you can compare the two values. For example, if two values ​​are from the date of birth domain, you can compare them to determine which employee is older. If the values ​​are taken from different domains, then their comparison is not allowed, since, in all likelihood, it does not make sense. For example, nothing definite will come out of comparing the name and date of birth of an employee.

    Each column (field) has a name, which is usually written at the top of the table. When designing tables within a specific DBMS, it is possible to select for each field its a type, that is, define a set of rules for displaying it, as well as define those operations that can be performed on the data stored in this field. The sets of types may differ for different DBMSs.

    The field name must be unique in the table, however different tables can have fields with the same name. Any table must have at least one field; the fields are arranged in the table in accordance with the order of their names when it was created. Unlike fields, strings do not have names; their order in the table is not defined, and the number is not logically limited.

    Since the rows in the table are not ordered, it is impossible to select a row by its position - there is no “first”, “second”, “last” among them. Any table has one or more columns, the values ​​in which uniquely identify each of its rows. Such a column (or combination of columns) is called primary key... An artificial field is often introduced to number records in a table. Such a field, for example, can be its ordinal, which can ensure the uniqueness of each record in the table. The key must have the following properties.

    Uniqueness. No two distinct tuples have a relationship at any moment in time. the same value for a combination of attributes included in the key. That is, there cannot be two lines in the table that have the same identification number or passport number.

    Minimality. None of the attributes included in the key can be excluded from the key without violating the uniqueness. This means that it is not necessary to create a key that includes both the passport number and the identification number. It is sufficient to use any of these attributes to uniquely identify the tuple. You should also not include a non-unique attribute in the key, that is, it is forbidden to use a combination of an identification number and an employee's name as a key. By excluding the employee name from the key, you can still uniquely identify each row.

    Each relation has at least one possible key, since the totality of all its attributes satisfies the uniqueness condition - this follows from the very definition of the relation.

    One of the possible keys is randomly selected in as the primary key. The remaining possible keys, if any, are taken as alternative keys. For example, if you select an identification number as the primary key, then the passport number will be an alternate key.

    The relationship of tables is an essential element of the relational data model. She is supported foreign keys.

    When describing a relational database model for the same concept, different terms are often used, depending on the level of description (theory or practice) and the system (Access, SQL Server, dBase). Table 2.3 summarizes the terms used.

    Table 2.3. Database terminology

    Database Theory ____________ Relational Databases _________ SQL Server __________

    Relation Table Table

    Tuple Record Row

    Attribute Field _______________ Column

    Relational databases

    Relational database is a set of relations containing all the information that must be stored in the database. That is, the database represents the set of tables required to store all of the data. Relational database tables are logically related to each other. Design requirements for a relational database can be summarized in a few rules.

    О Each table has a unique name in the database and consists of rows of the same type.

    О Each table consists of a fixed number of columns and values. More than one value cannot be stored in one column of a row. For example, if there is a table with information about the author, publication date, circulation, etc., then the column with the author's name cannot contain more than one surname. If the book is written by two or more authors, you will have to use additional tables.

    О At no point in time will there be two rows in the table that duplicate each other. Rows must differ by at least one value in order to be able to uniquely identify any row in the table.

    О Each column is assigned a name that is unique within the table; a specific data type is set for it so that homogeneous values ​​(dates, surnames, phone numbers, sums of money, etc.) are placed in this column.

    О The complete informational content of the database is represented in the form of explicit values ​​of the data itself, and this method of presentation is the only one. For example, the relationship between tables is based on the data stored in the corresponding columns, and not on the basis of any pointers that artificially define relationships.

    О When processing data, you can freely refer to any row or column of the table. The values ​​stored in the table do not impose any restrictions on the order in which the data is accessed. Column description,