
## CSE 5330/7330 Homework #1

1. [Points 5/5] If Amazon.com kept all of its data (i.e. 42 TB) on Hollerith 80 column punched cards, how long would it take to read all of the data if you used an IBM 2540 card reader which could read 1000 cards per minute?

An 80-column Hollerith card holds 12 rows * 80 columns = 960 punch positions, i.e. 960 / 8 = 120 bytes per card.

Number of cards required to store 42 TB (42 * 2^40 bytes) = 46,179,488,366,592 / 120 ≈ 384,829,069,722 cards

Number of cards read by an IBM 2540 card reader in 1 minute = 1,000 cards/minute

Time required to read 384,829,069,722 cards = 384,829,069,722 / 1,000 ≈ 384,829,070 minutes ≈ 6,413,818 hours (roughly 732 years)
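The arithmetic above can be sketched in Python (assuming, as the card count in the answer implies, that 42 TB means 42 * 2^40 bytes and that a card stores 120 bytes):

```python
# Punched-card back-of-the-envelope from the answer above.
BYTES_PER_CARD = 12 * 80 // 8            # 960 punch positions -> 120 bytes
total_bytes = 42 * 2**40                 # 42 TB interpreted as 42 TiB
cards = -(-total_bytes // BYTES_PER_CARD)  # ceiling division
minutes = cards / 1000                   # IBM 2540: 1000 cards/minute
hours = minutes / 60
print(cards, round(hours))
```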

2. [Points 20/15] Consider a disk with a sector size of 512 bytes, 1000 tracks per surface, 400 sectors per track, 5 double-sided platters, average seek time of 6 milliseconds. Suppose that a file containing 100,000 records of 200 bytes each is to be stored on such a disk and that no record is allowed to span two sectors. Assume the disk rotates at 10,000 RPM and one track of data can be transferred per rotation.

a. How many records fit into one sector?

Number of records per sector = floor(sector size / record size) = floor(512 / 200) = 2 records/sector

b. How many sectors are required to store the entire file? If the file is arranged sequentially on disk, how many surfaces are needed?

Sectors required to store the entire file = total records per file / records per sector = 100,000/2 = 50,000 sectors

Number of surfaces required to store the file = ceil(sectors required for the file / sectors per surface)

Sectors per surface = 1000 tracks * 400 sectors/track = 400,000 sectors

Number of surfaces required = ceil(50,000 / 400,000) = ceil(0.125) = 1 surface

c. What is the maximum number of 200 byte records that can be stored using this disk?

Max number of records that can be stored on the disk = total sectors on the disk * records per sector

Total sectors on the disk = total surfaces * sectors per surface = (5 platters * 2 sides) * 1000 * 400 = 4,000,000 sectors

Max number of records = 4,000,000 sectors * 2 records/sector = 8,000,000 records

d. What is the time required to completely read the entire disk? Assume ideal circumstances.

Time to read the entire disk = initial seek time + rotation time

Disk speed = 10,000 RPM, and one track of data is transferred per rotation.

Total tracks on the disk = total surfaces * tracks per surface = 10 * 1000 = 10,000 tracks, so 10,000 rotations are required.

Rotation time = 10,000 rotations / 10,000 RPM = 1 minute = 60,000 ms

Time to read the entire disk = 6 ms + 60,000 ms = 60,006 ms ≈ 60 seconds
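All four parts can be checked with a short script built directly from the problem's parameters:

```python
# Disk geometry arithmetic for parts (a)-(d) of question 2.
SECTOR = 512            # bytes per sector
RECORD = 200            # bytes per record
TRACKS_PER_SURFACE = 1000
SECTORS_PER_TRACK = 400
SURFACES = 5 * 2        # 5 double-sided platters
RPM = 10_000
SEEK_MS = 6

records_per_sector = SECTOR // RECORD                      # (a) no spanning
sectors_for_file = 100_000 // records_per_sector           # (b)
sectors_per_surface = TRACKS_PER_SURFACE * SECTORS_PER_TRACK
surfaces_needed = -(-sectors_for_file // sectors_per_surface)  # ceiling
max_records = SURFACES * sectors_per_surface * records_per_sector  # (c)
total_tracks = SURFACES * TRACKS_PER_SURFACE
read_ms = SEEK_MS + total_tracks / RPM * 60_000            # (d) 1 track/rotation
print(records_per_sector, sectors_for_file, surfaces_needed, max_records, read_ms)
```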

3. [Points 10/10] PC Magazine has a list of 10 web hosting companies https://www.pcmag.com/article2/0,2817,2424725,00.asp Consider two companies on the list, and search their detailed information to determine what database systems are available to a web developer. Provide the names of the companies and database(s) they support.

| Company          | Database |
|------------------|----------|
| HostGator        | MySQL    |
| InMotion Hosting | MySQL    |

4. [Points 20/15] Suppose we have a sequential ordered file of 200,000 records, where each record is 200 bytes. Assume blocksize = 2048 bytes (10 records per block), seek time = 16 ms, average rotational delay = 8.3 ms, and block transfer time = 0.8 ms. Suppose we want to make X independent random record reads from the file. This could be done in two different approaches.

a. read the entire file once and look for the X records of interest

Total access time for one block = seek time + avg rotational latency + block transfer time

= 16 ms + 8.3 ms + 0.8 ms = 25.1 ms

Total number of blocks in the file = 200,000 / 10 = 20,000 blocks

Time to read the entire file = total access time per block * number of blocks

= 25.1 ms * 20,000 = 502,000 ms = 502 sec ≈ 8.37 minutes

Total time = time to read the entire file + time to find the X records in the buffer

= 502 sec + X * O(200,000) in-memory record comparisons (worst case), which is negligible next to the 502 sec of I/O
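Option (a)'s I/O cost can be sketched as follows, using the 25.1 ms per-block figure derived above:

```python
# Option (a): one sequential pass over every block of the file.
SEEK_MS, ROT_MS, XFER_MS = 16, 8.3, 0.8
block_ms = SEEK_MS + ROT_MS + XFER_MS   # 25.1 ms per block access
blocks = 200_000 // 10                  # 10 records per 2048-byte block
scan_s = block_ms * blocks / 1000       # full-file scan time in seconds
print(blocks, round(scan_s))
```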

b. use a binary search to find a particular record, and repeat this for all X records of interest.

D = avg time to read/write a disk page = 25.1 ms

B = number of blocks = 20,000

C = average time to process a record; assume C = 1 ms.

Binary search time to find one record = D log2 B + C log2 B

= 25.1 ms * log2 20,000 + 1 ms * log2 20,000 = 0.0251 s * 14.2877 + 0.001 s * 14.2877

= 0.3586 s + 0.0143 s = 0.3729 sec per record

Binary search time to find X records = X * 0.3729 secs.

*(Figure: linear vs. binary search total time as a function of X.)*

Binary search (option b) performs better than reading the entire file and scanning it (option a) up to about 1,346 records (502 sec / 0.3729 sec), after which option a is better.
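Both options and the break-even point can be checked with a short script, assuming D = 25.1 ms per page, C = 1 ms per record, and base-2 logarithms as above:

```python
import math

# Option (b): X independent binary searches, versus one full scan (option a).
D_MS, C_MS = 25.1, 1.0
B = 20_000                                     # blocks in the file
find_s = (D_MS + C_MS) * math.log2(B) / 1000   # one binary search, seconds
scan_s = D_MS * B / 1000                       # option (a) full-scan time
break_even = scan_s / find_s                   # X at which the options tie
print(round(find_s, 4), round(break_even))
```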

5. [Points 20/15] A file with Part# as the hash key includes records with the following Part# values: 2368, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981 and 9208. The file uses 4 buckets, numbered 0 to 3. Each bucket is one disk sector and holds four records and appropriate pointer values. Load these records into the file in the given order, using the hash function h(k) = k mod 4.

a. Draw the diagram showing the final loading of the data into buckets.

b. Calculate the average number of sector accesses for a random retrieval on Part#.

| Bucket | Key 1 | Key 2 | Key 3 | Key 4 | Overflow   |
|--------|-------|-------|-------|-------|------------|
| 0      | 2368  | 3760  | 4692  | 1620  | 2428, 9208 |
| 1      | 1821  | 4981  |       |       |            |
| 2      | 1074  | 4750  |       |       |            |
| 3      | 4871  | 5659  | 7115  | 3943  | 6975       |

3 of the 15 records are in overflow. The 12 records not in overflow require 1 sector access each, whereas the 3 overflowed records require 2 sector accesses each.

Average number of sector accesses = 1 * (12/15) + 2 * (3/15) = 0.8 + 0.4 = 1.2 sector accesses
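The loading and the average-access figure can be reproduced with a small simulation (4 buckets of capacity 4, overflow chained off the home bucket):

```python
# Load the Part# keys in the given order with h(k) = k mod 4.
keys = [2368, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]
buckets = {b: [] for b in range(4)}    # each holds at most 4 records
overflow = {b: [] for b in range(4)}   # overflow chain per bucket
for k in keys:
    b = k % 4
    (buckets[b] if len(buckets[b]) < 4 else overflow[b]).append(k)

n_over = sum(len(v) for v in overflow.values())
# Home-bucket hit = 1 access; overflowed record = 2 accesses.
avg_accesses = (1 * (len(keys) - n_over) + 2 * n_over) / len(keys)
print(overflow, avg_accesses)
```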

6. [Points 15/15] Assume any record can be read <= 10 milliseconds. Consider a database of 14,000,000 records containing information about Texas drivers http://www.statemaster.com/graph/trn_lic_dri_tot_num-transportation-licensed-drivers-total-number , with one record per disk sector. How long would it take to find a record using:

a. binary-search technique

Time required to find a record = D log2 B

D = avg time to read/write a disk page = 10 ms

B = number of records = 14,000,000

= 10 ms * log2(14,000,000) = 10 ms * 23.74 ≈ 237.4 ms

b. sequential search

Time required to find a record (on average) = ½ * B * (D + R*C)

B = number of records = 14,000,000

D = avg time to read/write a disk page = 10 ms

R = records per page = 1 (one record per sector)

C = avg time to process a record ≈ 0

= ½ * 14,000,000 * (10 ms + 1 * 0) = 70,000,000 ms = 70,000 secs (≈ 19.4 hours)

c. B+ tree index assuming 100 keys per node

With 100 keys per node, the B+ tree on 14,000,000 records needs ceil(log100 14,000,000) = 4 levels, hence the time required ≈ 4 * 10 ms = 40 ms.

7. [Points 10/10] Assume you have sufficient space and resources, how long would it take to copy a 1 TB file from an external drive connected to a USB port to your computer? State your assumptions.

This depends on both the disk access time and the USB link speed. Assuming an average disk access overhead of X ms and a sustained USB throughput of 15 Mbit/s:

Time required = X + (1 TB at 15 Mbit/s) = X + 8 * 10^12 bits / (15 * 10^6 bits/sec) ≈ X + 533,333 sec ≈ 148 hours
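Under the stated assumptions (1 TB = 10^12 bytes, 15 Mbit/s sustained), the transfer-time estimate is:

```python
# USB transfer-time estimate, ignoring the (small) disk access overhead X.
bits = 10**12 * 8          # 1 TB in bits
rate = 15 * 10**6          # assumed sustained throughput: 15 Mbit/s
hours = bits / rate / 3600
print(round(hours, 1))
```

A modern USB 3.x link would be orders of magnitude faster, so the assumed rate dominates this answer.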

All of the above and:

8. [Points 0/15] Consider a disk with block size = 512 bytes. A block pointer is 6 bytes and a record pointer is 7 bytes long. A file has 30,000 records of fixed length. Each record has the following fields and byte size: Name (30), Ssn (9), Dept (9), Address (40), Phone (10), Bdate (8), Sex (1), Job (4) and Salary (4). An additional byte is used as a deletion marker.

a. Calculate the record size R in bytes

b. Calculate the blocking factor and the number of file blocks needed to store the data, assuming an unspanned organization.

Assume the file is ordered by the key field Ssn. You need to create a B+ tree primary index.

c. Calculate the index blocking factor

d. Calculate the number of levels needed

e. Calculate the total number of blocks needed to store the index.
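These sub-questions are not worked above; the following is a sketch of the standard unspanned-file and multi-level index arithmetic, assuming each index entry is the 9-byte Ssn key plus a 6-byte block pointer:

```python
import math

# (a)-(e) for question 8, treated as an illustration of the usual method.
BLOCK = 512
fields = [30, 9, 9, 40, 10, 8, 1, 4, 4]   # Name .. Salary
R = sum(fields) + 1                        # (a) record size + deletion marker
bfr = BLOCK // R                           # (b) blocking factor, unspanned
file_blocks = math.ceil(30_000 / bfr)      # (b) data blocks needed
entry = 9 + 6                              # Ssn key + block pointer
index_bfr = BLOCK // entry                 # (c) index blocking factor
# (d)/(e): build index levels bottom-up until a single root block remains.
blocks, total, levels = file_blocks, 0, 0
while blocks > 1:
    blocks = math.ceil(blocks / index_bfr)
    total += blocks
    levels += 1
print(R, bfr, file_blocks, index_bfr, levels, total)
```

With these assumptions the sketch gives R = 116 bytes, bfr = 4 (7,500 data blocks), an index blocking factor of 34, 3 index levels, and 221 + 7 + 1 = 229 index blocks in total.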
