Data Representation, Structure and Management
The use of codes to represent a character set
Computers store and process numeric and character data. A digital computer stores all data as binary digits - hence characters must also be represented as numbers.
The character set used will have a different number to represent each character.
The character set must include upper case letters, lower case letters, the number digits and all the punctuation and other characters found on the standard QWERTY keyboard.
ASCII (American Standard Code for Information Interchange)
The name suggests the purpose: a standard set of character codes which is used by all computers (especially for transferring data between computers).
ASCII uses a 7-bit code to represent each of the characters, and any programming textbook should include the full table of codes.
A selection of the codes is shown in Table 3.1.
Table 3.1
Character   Decimal   Character   Decimal
<Return>    13        ...         ...
<Space>     32        z           122
A           65        0           48
B           66        1           49
C           67        2           50
Z           90        9           57
a           97        $           36
b           98        %           37
c           99        &           38
The math tells us that 7 bits make possible 128 different codes (with binary codes 0000000, 0000001, ... , 1111111).
The computer stores all data as one or more bytes (8 bits) and so an extra digit is used in the most significant bit position (on the left) for a checking function called a parity check. This bit is called the parity bit.
Parity
A computer system uses either even parity or odd parity.
Even parity means that the total number of 1 bits (including the parity bit) must be an even number. Odd parity means that this total must be an odd number.
Consider these characters:
1. 'A' has ASCII code 65 denary.
7-bit binary for 65 is 1000001.
This has two 1 bits (an even number), so the parity bit to be added will be a 0 bit. The 8-bit ASCII code for character 'A' is 01000001.
2. '8' has ASCII code 56 denary.
7-bit binary for 56 is 0111000.
This has three 1 bits (an odd number), so the parity bit to be added will be a 1 bit to make the total number of 1 bits even.
The 8-bit ASCII code for character '8' is 10111000.
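The two worked examples above can be checked with a short sketch. It is written in Python rather than the Visual Basic.Net used later in the text, and the function name add_even_parity is a hypothetical one chosen for illustration.

```python
def add_even_parity(ch):
    """Return the 8-bit code (as a string of bits) for a 7-bit ASCII character,
    with an even-parity bit in the most significant position."""
    code = ord(ch)                      # the character's ASCII code in denary
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    seven_bits = format(code, "07b")    # 7-bit binary, padded with leading zeros
    ones = seven_bits.count("1")        # how many 1 bits the code already has
    parity_bit = "0" if ones % 2 == 0 else "1"
    return parity_bit + seven_bits


print(add_even_parity("A"))  # 01000001
print(add_even_parity("8"))  # 10111000
```

For odd parity the two parity bit values would simply be swapped.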
The representation of different data types
Data types
The computer system - or more precisely the programming language or applications software such as a spreadsheet - will distinguish between different types of data.
Integer
In math this is any positive or negative whole number.
Boolean
Some data only have values 'True' or 'False'.
• Is a customer allowed credit?
• Is a student aged over 18?
Date and Time
There are considerable variations across different programming languages in how a date value is stored and processed. Typical would be SQL (as used in MS Access), which uses the format YYYY/MM/DD and encloses the characters inside hash characters.
The representation for 13 April 2012 is #2012/04/13#.
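As an illustration of how one language handles the YYYY/MM/DD format, here is a minimal Python sketch using the standard datetime module. The helper names to_text and from_text are hypothetical, and Python does not use the hash delimiters.

```python
from datetime import date, datetime


def to_text(d):
    """Format a date value as a 10-character YYYY/MM/DD string."""
    return d.strftime("%Y/%m/%d")


def from_text(text):
    """Convert a YYYY/MM/DD string back into a date value."""
    return datetime.strptime(text, "%Y/%m/%d").date()


print(to_text(date(2012, 4, 13)))   # 2012/04/13
print(from_text("2012/04/13").day)  # 13
```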
Currency
The data type 'Currency' is available in Visual Basic.Net and applications such as MS Access and Excel. The data type is used for money values which have two digits after the decimal point.
Character
A character data item stores a single character from the character set. The ASCII code table was concerned with the number codes used for single characters.
Examples could include:
• gender - with possible values 'M' and 'F' only
• product type - 'E' used for electrical, 'C' for computer equipment etc.
Strings
Most program applications will need to store string data. A string is a sequence of characters from the character set. So:
"President Putin"
"14 The High Street"
"9876"
are all examples of valid string values.
Note
• The string may have no characters - called an 'empty string'
• The string may include digit or <space> characters - e.g. the address above
• The programming language used may have an upper limit on the maximum length of a string
• The programming language will have built-in functions for the manipulation of string data
Express positive integers in binary form
All data in the computer must be represented in binary form.
Consider a single byte used to represent a positive integer.
Just as in denary (base ten), each of the bit positions has a place value.
These are shown below:
• the most significant bit position has place value 128
• the least significant bit position has place value 1 (the 'units'), so it contributes either 0 or 1.
128 64 32 16 8 4 2 1
The following examples illustrate:
1. What positive integer is this?
01100111
1 + 2 + 4 + 32 + 64 = 103 denary.
2. Represent 93 as an 8-bit positive integer.
93 = 64 + 16 + 8 + 4 + 1
128 64 32 16 8 4 2 1
0 1 0 1 1 1 0 1
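Both worked examples follow the same place-value method, which can be sketched as below. This is an illustrative Python version, and the names PLACE_VALUES, bits_to_int and int_to_bits are hypothetical.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def bits_to_int(bits):
    """Add up the place values of every position that holds a 1 bit."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight 0/1 characters")
    return sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == "1")


def int_to_bits(number):
    """Work from the largest place value down, subtracting each one that fits."""
    if not 0 <= number <= 255:
        raise ValueError("a single byte holds 0 to 255 only")
    bits = ""
    for value in PLACE_VALUES:
        if number >= value:
            bits += "1"
            number -= value
        else:
            bits += "0"
    return bits


print(bits_to_int("01100111"))  # 103
print(int_to_bits(93))          # 01011101
```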
Understand the structure of arrays
When we store three surnames with the identifier names Customer1, Customer2 and Customer3, the variables will be declared in the program, and this is the trigger for the interpreter or compiler to reserve three storage locations in memory ready to store the values assigned in the program. Appreciate that the data items have no relation to each other; the identifier names Customer1, Customer2 etc. are as different as using the names NameA, NameB etc.
Arrays
An array is a collection of data items which are referred to by the same identifier name.
Example
A garage sells cars and stores data for the number of cars sold in each month of the year. The array will be called MonthlySales and we need to store twelve values. The array can be visualised as shown in Table 3.2.
Table 3.2
MonthlySales
1 13
2 15
3 5
4 11
...
12 6
The number alongside each value is called the index or subscript number of the array.
Typical values are:
MonthlySales[1]=13
MonthlySales[4]=11
This array is a one-dimensional array.
MonthlySales is storing integer values. We can have arrays that store values of any of the recognised data types, i.e. Char, Boolean etc.
Visual Basic.Net
• VB uses parentheses - ( and ) - to enclose the array subscript.
• Throughout the text we shall use square brackets.
Example
The garage has a site in three towns and records separately the monthly sales made on each site for the 12-month period.
This suggests we need to visualise the data as shown in Table 3.3. The columns represent each month and the rows represent the sites.
Table 3.3
SiteMonthlySales
Site / Month     1   2   3   4   5   6   7   8   9  10  11  12
1                3   5   1   4   9  11   6   8   6   3   9   0
2                4   5   1   7  12   6   7   3   5  11   6   4
3                6   5   3   4   8  12   9  12  10   8   8   2
All values are represented with a single array. We shall use identifier name SiteMonthlySales which now needs two subscript numbers. This is called a two-dimensional array.
Typical values are:
SiteMonthlySales[1, 5] = 9
SiteMonthlySales[3, 11] = 8
If this was implemented with program code:
♦ For array MonthlySales subscript 0 is never used
♦ For array SiteMonthlySales column zero and row 0 are not used.
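The two-dimensional array, with row 0 and column 0 left unused, might be sketched as follows. This uses Python lists rather than a Visual Basic.Net array, and the names SITES, MONTHS and site_monthly_sales are hypothetical. The values are those of Table 3.3.

```python
SITES = 3
MONTHS = 12

# One extra row and column are allocated so that subscripts 1..3 and 1..12
# match the table; row 0 and column 0 are never used.
site_monthly_sales = [[0] * (MONTHS + 1) for _ in range(SITES + 1)]

# Fill columns 1 to 12 of each site's row with the Table 3.3 values.
site_monthly_sales[1][1:] = [3, 5, 1, 4, 9, 11, 6, 8, 6, 3, 9, 0]
site_monthly_sales[2][1:] = [4, 5, 1, 7, 12, 6, 7, 3, 5, 11, 6, 4]
site_monthly_sales[3][1:] = [6, 5, 3, 4, 8, 12, 9, 12, 10, 8, 8, 2]

print(site_monthly_sales[1][5])   # 9
print(site_monthly_sales[3][11])  # 8
```

Note that Python writes the two subscripts in separate brackets, [site][month], where the text writes [site, month].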
Initialising an array
The variable name used for the array must be declared in the program code.
Visual Basic.Net assumes the first subscript is zero, and the number shown is the highest subscript.
Dim MonthlySales(12) As Integer
Dim SiteMonthlySales(3, 12) As Integer
The smallest and largest subscripts are called the lower bound and the upper bound of the array.
If the array is to be given an initial value for all the cells in the array this can be done using a loop.
All the monthly sales values are to be assigned an initial value of zero.
Dim MonthlySales(12) As Integer
Dim Index As Integer
For Index = 1 To 12
MonthlySales(Index) = 0
Next Index
Reading values into an array
This is what essentially has been done in the code above. Each element of the array was assigned the value zero.
We are to input the twelve monthly sales totals from the keyboard and store them in the array.
Note the Index variable is supplying both:
• the index number for the array
• the prompt for the data entry.
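Dim MonthlySales(12) As Integer
Dim Index As Integer
For Index = 1 To 12
    ' Index supplies both the prompt text and the array subscript
    Console.Write("Month: " & Index & " ... ")
    ' ReadLine returns a String, so convert it to an Integer before storing
    MonthlySales(Index) = CInt(Console.ReadLine())
Next Index
Console.WriteLine("Sales figures now stored in array ... ")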
Pseudocode
What follows is our first use of pseudocode to describe an algorithm.
You should be able to study this and then write the program code from it.
Serial search of an array
A serial (or linear) search means start with the first value and then consider each value in order.
Find the first month in which the total sales was below 10.
We shall write a pseudocode description for the search algorithm.
Index ← 1
Found ← False
REPEAT
    IF MonthlySales[Index] < 10
        THEN
            Found ← True
            OUTPUT "The month number was...", Index
        ELSE
            Index ← Index + 1
    ENDIF
UNTIL Found = True OR Index > 12
IF Found = False
    THEN
        OUTPUT "No month had sales below 10"
ENDIF
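One possible translation of the serial search into program code is sketched here in Python (the text's own examples use Visual Basic.Net). The function name first_month_below is hypothetical, and the sample list holds only the first four values shown in Table 3.2, with element 0 left unused.

```python
def first_month_below(sales, limit):
    """Serial search: examine each element in order, starting at subscript 1.

    Returns the first subscript whose value is below limit,
    or None if the end of the array is reached without finding one.
    """
    index = 1
    while index < len(sales):
        if sales[index] < limit:
            return index
        index += 1
    return None


# Subscript 0 is unused; months 1 to 4 are taken from Table 3.2.
sample_sales = [None, 13, 15, 5, 11]

print(first_month_below(sample_sales, 10))  # 3
print(first_month_below(sample_sales, 2))   # None
```

Returning None when nothing is found means the search cannot run past the end of the array.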
The LIFO and FIFO features of stacks and queues
Stack
A stack is a collection of data items - which can continually have new items join and items leave - which behaves in a certain way. A stack manages its data items as 'the last item to join the stack will be the first item to leave'. This can be abbreviated to 'Last In - First Out' (LIFO).
Queue
A queue behaves as follows. The first item to join the queue will be the first item to leave. That is 'First In - First Out' (FIFO). Again there are practical situations for a computer system where we would want data to be managed as a queue.
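The difference between the two behaviours can be seen in a small sketch. It is written in Python, using a list as the stack and collections.deque as the queue. The item names are arbitrary sample data.

```python
from collections import deque

# Stack: items join and leave at the same end (the top).
stack = []
for item in ["A", "B", "C"]:
    stack.append(item)          # push
last_out = stack.pop()          # "C" joined last, so it leaves first

# Queue: items join at the rear and leave from the front.
queue = deque()
for item in ["A", "B", "C"]:
    queue.append(item)          # join the rear
first_out = queue.popleft()     # "A" joined first, so it leaves first

print(last_out)   # C
print(first_out)  # A
```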
Storing data in files in the form of fixed length records comprising items in fields
Records
Data often consist of several data items which all relate to some entity. For example the collection of data for title, artist and release date all relate to the same 'recording' entity and we want to store the data for many recordings. The programmer would organise this data as a record. Using database terminology, each recording record would consist of three fields (title, artist and release date).
A collection of the data for many recordings (for example 150) would be a file of recordings.
Fixed-length records
If the programming language was to store every record with the same number of characters then the records are said to be fixed length records.
In practice for the example given:
• Title of the recording
> would be a string of characters with a stated maximum length. If this was 50 characters, then the title 'Abbey Road' could be stored either as:
* a string of 10 characters only, or
* a string with 'Abbey Road' followed by 40 padding characters (typically spaces), so ensuring all title data has a fixed length of 50 characters. This second option ensures that the records will be fixed-length records.
• Artist field
> Same alternatives as for the Title data.
• Release date
> The programming language will have a standard format it uses for dates. If it was YYYY/MM/DD, as suggested earlier, then all dates would be stored as 10 characters.
So the issue is that some data types will mean the number of characters is always fixed - but for the 'string of characters' data type the data value could be a variable or a fixed size.
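A fixed-length recording record might be built as sketched below. This is Python, and it makes several assumptions not stated in the text: the artist field is also 50 characters, padding uses space characters, and all the function names are hypothetical. The two recordings are sample data only.

```python
TITLE_LEN = 50
ARTIST_LEN = 50    # assumption: same maximum length as the title
DATE_LEN = 10      # YYYY/MM/DD
RECORD_LEN = TITLE_LEN + ARTIST_LEN + DATE_LEN   # every record is 110 characters


def pack_record(title, artist, release_date):
    """Pad each field with spaces so that every record has the same length."""
    if len(title) > TITLE_LEN or len(artist) > ARTIST_LEN or len(release_date) != DATE_LEN:
        raise ValueError("field does not fit the fixed-length layout")
    return title.ljust(TITLE_LEN) + artist.ljust(ARTIST_LEN) + release_date


def unpack_record(record):
    """Slice the record at the known field boundaries and strip the padding."""
    title = record[:TITLE_LEN].rstrip()
    artist = record[TITLE_LEN:TITLE_LEN + ARTIST_LEN].rstrip()
    release_date = record[TITLE_LEN + ARTIST_LEN:]
    return title, artist, release_date


def read_record(file_text, number):
    """Because every record is RECORD_LEN long, record n starts at n * RECORD_LEN."""
    start = number * RECORD_LEN
    return unpack_record(file_text[start:start + RECORD_LEN])


file_text = (pack_record("Abbey Road", "The Beatles", "1969/09/26")
             + pack_record("Revolver", "The Beatles", "1966/08/05"))

print(len(file_text))             # 220
print(read_record(file_text, 1))  # ('Revolver', 'The Beatles', '1966/08/05')
```

The read_record function shows one practical benefit of fixed-length records: the position of any record can be calculated without reading the records before it.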
Serial, sequential, indexed sequential and random access to data and implementing serial, sequential and random organisation of files using indexes and hashing as appropriate
Serial Access
Serial access to a set of data items is accessing them in the natural order in which they are stored. Access to the items held in an array would be serial if we started at the item with subscript 1, then subscript 2 etc.
Consider items held in a text file. Serial access would mean reading the line of text on line 1, followed by line 2 etc.
What serial access means in practice will depend upon how the items were originally stored or organised. If the file was a file of words stored in alphabetical order, then serial access would retrieve the data in alphabetical order. This is generally not so; serial access generally is retrieving the data items in the original order in which they were stored.
Advantage of serial access
• Easy to program and supported by all programming languages.
Disadvantage
• We do not have direct access to an individual record.
Sequential access
Sequential access assumes that the data items were stored with sequential organisation. One of the data items will be acting as a key field. For example a file of product records could have the product code as the key field. The data items are then read starting with the first and this retrieves the items in key order (alphabetical order for a text key), i.e. in sequential order.
For all applications where we are reading data from a file in this way, the file processing methods have the major disadvantage that the data items can only be read in sequence. There is no way that we can immediately read the 12th item in the file.
Advantage
• Easy to program and supported by all programming languages.
Disadvantages
• We do not have direct access to an individual record.
• The subsequent maintenance of the file (adding a new record, deleting a record) requires a lot of program code involving two files.
Random Access
This technique is designed to overcome the limitations of serial/sequential access.
A random access file uses a record key number allocated to each record. This number is used to calculate (or 'hash') the disc address where the data will be stored. The same record key can be used later to directly access this individual record from the file.
An array can be considered in the same way. The index number will be used when the data item is stored. The same array subscript can be used to directly access that individual array item.
There are issues to consider when deciding how each record key is generated. If we anticipate that there will be approximately 1000 records in the file, we need to use a hashing calculation which gives this range of key numbers. Also, we do not want two different records to generate the same record key. If this happens then potentially one of the records will be lost, or we need to anticipate this and have a strategy in place to deal with this situation of a duplicate record key number.
Advantage
• Able to directly access individual records (without referencing any of the other records).
Disadvantages
• Storage space may be wasted as a result of a poor choice of hashing function (i.e. record key)
• Possible that two different records could generate the same record key number.
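The text does not name a particular hashing calculation or a strategy for duplicates, so the following Python sketch picks two common ones purely as assumptions. The remainder after dividing by the file size gives the address, and a clash is resolved by trying the next free address (linear probing). A list stands in for the disc file, and all names and sample keys are hypothetical.

```python
FILE_SIZE = 1000   # number of record positions, matching the 1000-record example


def hash_address(record_key):
    """Assumed hashing calculation: remainder on division by the file size."""
    return record_key % FILE_SIZE


def store(slots, record_key, data):
    """Store at the hashed address, or the next free one if it is taken."""
    address = hash_address(record_key)
    for step in range(FILE_SIZE):
        probe = (address + step) % FILE_SIZE
        if slots[probe] is None or slots[probe][0] == record_key:
            slots[probe] = (record_key, data)
            return probe
    raise RuntimeError("file is full")


def fetch(slots, record_key):
    """Start at the hashed address and probe until the key or a gap is found."""
    address = hash_address(record_key)
    for step in range(FILE_SIZE):
        probe = (address + step) % FILE_SIZE
        if slots[probe] is None:
            return None
        if slots[probe][0] == record_key:
            return slots[probe][1]
    return None


slots = [None] * FILE_SIZE
first = store(slots, 4217, "Smith")    # hashes to address 217
second = store(slots, 9217, "Jones")   # also hashes to 217, so it goes to 218

print(first, second)        # 217 218
print(fetch(slots, 9217))   # Jones
```

Without the probing step the second record would overwrite the first, which is the lost record problem described above.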
Indexed sequential
Indexed sequential organisation and access is designed to offer the benefits of both sequential and random organisation.
To understand index sequential organisation, we must appreciate that the data records will be written onto the disc in blocks. A block is the smallest unit of data which can be read/written. A block is typically 512 bytes and so could contain several logical (for example customer) records.
An indexed sequential file works as follows. Each record has a record key (just like a random file). The file also has:
• An index which stores the highest record key in each block of records
• In practice this could be a multi-level index where:
> the top-level is a track index for each track on the disc
> each track then has its own block index.
The following 'track index' is the first place to look when the program has to write a new record (Table 3.4).
Table 3.4
Track Index
Track Highest key (on that track)
1 946
2 1693
3 2030
4 5166
Hence writing a new record with key 1926 would search the track index and establish this will be written somewhere on track 3. We then consult the block index for track 3 (Table 3.5).
Table 3.5
Track 3: Block Index
Block Highest key (in that block)
1 1701
2 1754
... ...
14 1896
15 1944
16 2030
This tells the software to store this new record in block 15 (i.e. its home storage area), because 1926 is above the highest key in block 14 (1896) but not above the highest key in block 15 (1944). The problem arises when the block index tells us to use block 15 but, when the software reads block 15, it finds that it is full (it already has five records stored). This is the situation when an overflow area must be used.
The home storage area stores the blocks of records. A disc which used blocks of size 512 bytes and logical (fixed sized) records of 95 bytes would be able to store five records in each block.
An overflow area stores any records which could not be stored in their home block. Records stored here will have a link from the home area block in order that these can be retrieved.
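The two-level lookup can be sketched as follows in Python. The helper name find_in_index is hypothetical, and the block index contains only the rows actually printed in Table 3.5. The rows elided there are not filled in, so the sketch is only reliable for keys in the ranges those rows cover. Each index is searched for the first entry whose highest key is not below the record key.

```python
from bisect import bisect_left

# (track number, highest key on that track), from Table 3.4
TRACK_INDEX = [(1, 946), (2, 1693), (3, 2030), (4, 5166)]

# (block number, highest key in that block), only the rows shown in Table 3.5
TRACK3_BLOCK_INDEX = [(1, 1701), (2, 1754), (14, 1896), (15, 1944), (16, 2030)]


def find_in_index(index, record_key):
    """Return the first entry number whose highest key is >= record_key."""
    highest_keys = [highest for _, highest in index]
    position = bisect_left(highest_keys, record_key)
    if position == len(index):
        return None                 # key is beyond the last entry
    return index[position][0]


track = find_in_index(TRACK_INDEX, 1926)
block = find_in_index(TRACK3_BLOCK_INDEX, 1926)

print(track, block)  # 3 15
```

Key 1926 lands in block 15 because it lies above 1896 (the highest key in block 14) and not above 1944.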
Advantages
• Able to directly access individual records (without referencing any of the other records)
• Able also to access the records in sequence.
Many computing applications could benefit from both direct access and sequential access. Printing customer statements could use sequential access assuming the records are stored in customer name order.
Searching for a single transaction (to answer an on-line customer query) would use direct access to the customer's record using the indexes.
Disadvantage
• Not supported by all programming languages.
The data type to be used for any particular item should be self-evident and this issue is a fundamental one when we consider an initial program specification design.
As a general rule data items which are a 'one off' will be coded with a single variable whereas a collection of items - such as all the student surnames - would be represented with an array. A major advantage of using arrays is that separate arrays can be used for different data items. For example an array for the pupil's surname, an array for the form, and an array for the year the student joined the school. This way, we can assume that Surname[56], Form[56] and YearJoined[56] all refer to the same student.
A typical task such as searching for a student would search the surname array - find the array index where the surname is found - for example index value X - and then directly access Form[X] and YearJoined[X] to display this student's data.
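The search across the three related arrays might look like this in Python. The student data is invented for illustration, and the function name find_student is hypothetical.

```python
# Three separate arrays; the same subscript refers to the same student.
surname = ["Ahmed", "Brown", "Chen"]
form = ["10A", "11C", "9B"]
year_joined = [2019, 2018, 2020]


def find_student(name):
    """Serial search of the surname array, then direct access to the others."""
    for x in range(len(surname)):
        if surname[x] == name:
            return surname[x], form[x], year_joined[x]
    return None   # surname not present


print(find_student("Brown"))  # ('Brown', '11C', 2018)
print(find_student("Davis"))  # None
```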
The previous section on the use of files is a key consideration when designing the data structures to be used for a particular problem. We often have to think in reverse order. The data processing requirements will determine the file organisation and access methods to be used. For example if we have to carry out frequent searches for individual records then direct access may be needed. An alternative approach could be to read all the data values into arrays at the start of any program session and save the data back to the file at the end of the session. This way the array indexes provide fast access to any data value.
Backing up data and archiving
Backing up data
The data that a company generates is one of its most important assets. All aspects of the business will generate data for order processing, research, manufacturing and general operational data. The loss of this data could prove disastrous for the company.
Safeguarding the company's data is a security issue. Backing up the data means taking a complete copy of the data and is based on the worst-case scenario that this data could be completely destroyed. This could be the result of a hardware malfunction, some human error, a natural disaster or a member of staff not following the correct operational procedures.
The issues around backing up of the data are given here.
• Do we need to backup all the data?
• How often should this backup be taken?
• Where should the backup data be stored?
• What are the recovery procedures should the backups be required?
There is probably little to be gained in backing up program files. If a program fails then we will have the original program discs to re-install the software. However there may be configuration files which have been generated in the course of using the software which should be backed up.
The frequency of backup will be determined by the nature of the application. Consider a payroll application which uses batch processing and is run on the 28th day of each month. The processing is that the latest payroll master file is generated each month from the current payroll master and a monthly file of employee transactions. This produces the updated master file. The backup files should therefore be the original version of the master file and this month's transaction file. If the updated master became corrupted, then we could recover using the old master and the current transaction file. Hence a new backup set of files is effectively generated only every month.
Compare this with an order processing application which is receiving (for example) 50 new orders through its website every hour. What happens if this order processing file fails?
There will be changes made to the data every minute and it will not be possible to take a backup of the data every minute! It may be reasonable to back up the data every hour (for example). If we log all the transactions in a separate file then it should be possible to recreate the master file(s) in the event of a failure.
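Recreating a master file from the last backup and the transaction log could be sketched as below. This is a simplified Python model with assumed details: the master file is held as a dictionary keyed by order number, each log entry is an (action, order number, details) tuple, and the actions 'add', 'update' and 'delete' are hypothetical names.

```python
def replay_log(backup, log):
    """Rebuild the master data by re-applying every logged transaction,
    in order, to a copy of the last backup."""
    restored = dict(backup)             # leave the backup copy untouched
    for action, order_number, details in log:
        if action in ("add", "update"):
            restored[order_number] = details
        elif action == "delete":
            restored.pop(order_number, None)
        else:
            raise ValueError("unknown transaction type: " + action)
    return restored


# Sample data: the state at the last hourly backup and the transactions since.
backup = {101: "2 x widget"}
log = [
    ("add", 102, "1 x gadget"),
    ("update", 101, "3 x widget"),
    ("delete", 102, None),
]

print(replay_log(backup, log))  # {101: '3 x widget'}
```

The same idea applies to the payroll example: the old master plus the month's transaction file is enough to regenerate the updated master.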
An alternative strategy is the mirroring of the master files so that we always have a second 'live' copy of the important data. In the event of a disk failure, we can simply switch to the mirrored files.
Archiving files
We shall frequently get queries from customers about an order that was recently placed, but is it likely we shall get queries about orders which are two years old? This suggests that some data on the computer system could be removed from on-line availability.
Archiving is the removal of files from online availability and moving (not copying) them to some form of off-line storage. In the unlikely event that the data will be needed, it can be accessed from the off-line storage.
Archiving is the process of freeing up files which are no longer in use but still having the files available if needed. Client email software typically asks the users if they want to 'archive' emails which are older than (for example) two months.
Top Tip : Don't confuse the terms backup and archive - they are not the same.