Relational database questions

relational database questions and answers pdf and relational database schema examples
HalfoedGibbs Profile Pic
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
Your Website URL(Optional)
12 relational databases Can you relate? There is just one of me, but so many of them… How do you structure really, really multivariate data? A spreadsheet has only two dimensions: rows and columns. And if you have a bunch of dimensions of data, the tabular format gets old really quickly. In this chapter, you’re about to see firsthand where spreadsheets make it really hard to manage multivariate data and learn how relational database management systems make it easy to store and retrieve countless permutations of multivariate data. this is a new chapter 359magazine performance analysis The Dataville Dispatch wants to analyze sales The Dataville Dispatch is a popular news magazine, read by most of Dataville’s residents. And the Dispatch has a very specific question for you: they want to tie the number of articles per issue to sales of their magazine and find an optimum number of articles to write. They want each issue be as cost effective as possible. If putting a hundred articles in each issue doesn’t get them any more sales than putting fifty articles in each issue, they don’t want to do it. On the other hand, if fifty article issues correlate to more sales than ten article issues, they’ll want to go with the fifty articles. They’ll give you free advertising for your analytics business for a year if you can give them a thorough analysis of these variables. 360 Chapter 12 TODAY The Perfect Equation? PMP: Software HF Stats FTW the Development Innovative book helps thousands new How a whiteboard could grapple with numbers save your next project craze DaTavi LLe DiSpa TCH The Secret Life of the World’s Databases Newly published tables reveal surprising new connections www.headfirstlabs.comLooks like they keep track of a lot of stuff. Sales relational databases Here’s the data they keep to track their operations The Dispatch has sent you the data they use to manage their operations as four separate spreadsheet files. The files all relate to each other in some way, and in order to analyze them, you’ll need to figure out how. ? ? Issues ? ? ? Articles What do you need to know in order to compare articles to sales? you are here 4 361 Authors How do these data tables relate to each other? f t , t t , t t t t , t f f , t . t t t l i ll ll i i i From: Dataville Dispatch To: Head First Subject: About our data Well, each issue of the magazine has a bunch o ar icles and each ar icle has an au hor so in our da a we ie he au hors to the articles. When we have an issue ready we call our lis o wholesalers. They place orders or each issue which we record in our sales able The “lo size” in he able you’re ook ng at counts the number of copies of that issue that we se —usua y n denom nat ons of 100, but sometimes we sell less. Does that help? — DD They have a lot of stuff to record, which is why they need all these spreadsheets. how does the data relate? You need to know how the data tables relate to each other The table or tables you create to get the answers that the Dispatch wants will tie article count to sales. So you need to know how all these tables relate to each other. What specific data fields tie them together? And beyond that, what is the meaning of the relationships? 362 Chapter 12 Here is what the Dispatch has to say about how they maintain their data.Sales Draw arrows between the tables and describe how each relates to the other. relational databases Draw arrows and use words to describe the relationship between the things being recorded in each spreadsheet. Issues Articles you are here 4 363 AuthorsSales relationships identified What relationships did you discover among the spreadsheets that the Dataville Dispatch keeps? Each sale refers to a bundle of copies (usually around 100) of one issue. Issues Articles 364 Chapter 12 Authors Each issue consists of a bunch of articles. Each author writes a bunch of articles.Out-of-the-box implementation There’s a ton of different kinds of database software out there. For organizations that collect the same type of data, out-of-the-box databases specifically manage that sort of data. relational databases A database is a collection of data with well-specified relations to each other A database is a table or collection of tables that manage data in a way that makes these relationships explicit to each other. Database software manages these tables, and you have a lot of different choices of database software. Database Database Database software Other times, people need something really specific to their needs, and they’ll make What’s really important is that you know the their own database with Oracle, MySQL, relationships within the software of the data or something else under the hood. you want to record. So how do you use this knowledge to calculate article count and sales total for each issue? you are here 4 365 Here’s the big question. Custom-made implementationIn the next exercise, you’ll calculate these values. Sales follow the bread crumbs Trace a path through the relations to make the comparison you need When you have a bunch of tables that are separate but linked through their data, and you have a question you want to answer that involves multiple tables, you need to trace the paths among the tables that are relevant. Issues Articles Create a spreadsheet that goes across that path Once you know which tables you need, then you can come up with a plan to tie the data Issue Article count Sales Total together with formulas. 1 5 1250 Here, you need a table that compares article 2 7 1800 count and sales for each issue. You’ll need to write formulas to calculate those values. 3 8 1500 4 6 1000 366 Chapter 12 Authors You’ll need formulas for these. These are the tables you need to pull together. This spreadsheet isn’t going to help you compare article count and sales.Save this file under a new name, so you don’t destroy the original data. relational databases Let’s create a spreadsheet like the one in the facing page and start by calculating the “Article count” for each issue of the Dispatch. Open the hfda_ch12_issues.csv file and save a copy for your work. Remember, you don’t want to mess 1 up an original file Call your new file “dispatch analysis.xls”. Load these hfda_ch12_issues.csv hfda_ch12_articles.csv hfda_ch12_issues.csv dispatch analysis.xls 2 Open hfda_ch12_articles.csv and right-click an the tab that list the file name at the bottom of the sheet. Tell your spreadsheet to move the file to your dispatch analysis.xls document. 3 Create a column for Article count on your issue sheet. Write a COUNTIF formula to count the number of articles for that issue, and copy and paste that formula for each issue. you are here 4 367 Put your COUNTIF formula here. Copy your articles sheet to your new document.The formula looks at the “articles” tab in your spreadsheet. article countin’ What sort of article count did you find each issue to have? 1 Open the hfda_ch12_issues.csv file and save a copy for your work. Remember, you don’t want to mess up an original file Call your new file “dispatch analysis.xls”. 2 Open hfda_ch12_articles.csv and right-click an the tab that list the file name at the bottom of the sheet. Tell your spreadsheet to move the file to your dispatch analysis.xls document. 3 Create a column for Article count on your issue sheet. Write a COUNTIF formula to count the number of articles for that issue, and copy and paste that formula for each issue. =COUNTIF(hfda_ch12_articles.csvB:B,hfda_ch12_issues.csvA2) 368 Chapter 12 It counts the number of times each issue shows up in the list of articles. This is the “articles” tab in your dispatch analysis spreadsheet.relational databases Cool When you add the sales figures to your spreadsheet, keep in mind that the numbers just refer to units of the magazine, not dollars. I really just need you to measure sales in terms of the number of magazines sold, not in dollar terms. Sounds good… let’s add sales to this list Add a field for sales totals to the spreadsheet you are creating. Load this hfda_ch12_sales.csv 1 Copy the hfda_ch12_sales.csv file as a new tab in your dispatch analysis.xls. Create a new column for Sales on the same sheet you used to count the articles. 2 Use the SUMIF formula to tally the sales figures for issueID 1, putting the formula in cell C2. Copy that formula and then paste it for each of the other issues. you are here 4 369 Here’s the Dispatch’s managing editor. Add this column and put your new formulas here.This formula shows that issue 1 sold 2,227 units. incorporate sales What formula did you use to add sales to your spreadsheet? The first argument of the SUMIF formula looks at the issues. =SUMIF(hfda_ch12_sales.csvB:B, hfda_ch12_issues.csvA2, hfda_ch12_sales.csvC:C) 370 Chapter 12 The second argument looks at the specific issue whose sales you want to count. The third argument points to the actual sales figures you want to sum.Save your spreadsheet data as a CSV in R’s working directory. relational databases Your summary ties article count and sales together This is exactly the spreadsheet you need to tell you whether there is a relationship between the number of articles that the Dataville Dispatch publishes every issue and their sales. This seems nice. But it’d be a little easier to understand if it were made into a scatterplot. Have you ever heard of scatterplots? Definitely Let’s let him have it… 1 Open R and type the getwd() command to figure out where R keeps its data files. Then, in your spreadsheet, go to File Save As... and save your data as a CSV into that directory. Execute this command to load your data into R: dispatch - read.csv("dispatch analysis.csv", header=TRUE) 2 Once you have your data loaded, execute this function. Do you see an optimal value? plot(Salesjitter(Article.count),data=dispatch) you are here 4 371 This function tells you R’s working directory, where it looks for files. Name your file dispatch analysis.csv. You’ll see how jitter works in a second…The head command shows you what you have just loaded... it’s always good to check. The jitter command adds a little bit of noise to your numbers, which separates them a little and makes them easier to see on the scatterplot. engage the optimimum Did you find an optimal value in the data you loaded? The optimum appears to be around 10 articles. 372 Chapter 12 Try running the same command without adding jitter; isn’t the result hard to read? Use this command to load your CSV into R. Here’s the command that creates your scatterplot. Make sure that the field names in your plot formula match the field names that head shows you are contained in the data frame.…until there are about 10 articles, at which point having more articles doesn’t associate with increased sales. relational databases 5 10 15 20 25 30 jitter(counts) you are here 4 373 It looks like there’s a pretty steady increase in sales as more articles are added… Writing more than 10 articles doesn’t seem to increase sales, while writing fewer than 10 seems to decrease sales, so it appears that the Dispatch should stick with around 10 articles. When there were five articles in the Dispatch, only 1,000 or fewer copies of the issue sold. salesTotal 1000 2000 3000 4000i i ll i l ’ i l i i li i l i l i ’ll l l i ’ i l i li i From: Dataville Dispatch Subject: Thank you Thank you Th s s a rea y b g he p for us. I d k nd of suspected that a re at onsh p ke th s was the case, and your ana ys s demonstrated it dramatically. And congratu at ons on a free year of ads It be our p easure to he p spread the word about your amazing skills. I th nk I m go ng to have a ot more quest ons for you ke th s one. Do you think you can handle the work? — DD happy clients Looks like your scatterplot is going over really well Q: Do people actually store data in linked spreadsheets like A: Well, you’re not always so lucky to recieve data from multiple that? tables that have neat little codes linking them together. Often, the data comes to you in a messy state, and in order to make the spreadsheets work together with formulas, you need to do some A: Definitely. Sometimes you’ll receive extracts from larger clean-up work on the data. You’ll learn more about how to do that in databases, and sometimes you’ll get data that people have manually the next chapter. kept linked together like that. Q: Is there some better software mechanism for tying data Q: Basically, as long as there are those codes that the from different tables together? formulas can read, linking everything with spreadsheets is tedious but not impossible. A: You’d think so, right? 374 Chapter 12 He’s downright effusive Sounds like more work for you... awesomerelational databases Copying and pasting all that data was a pain It would suck to go through that process every time someone wanted to query (that is, to ask a question of) their data. Besides, aren’t computers supposed to be able to do all that grunt work for you? Wouldn't it be dreamy if there were a way to maintain data relations in a way that’d make it easier to ask the database questions? But I know it's just a fantasy… you are here 4 375... and build one of these. meet rdbms Relational databases manage relations for you One of the most important and powerful ways of managing data is the RDBMS or relational database management system. Relational databases are a huge topic, and the more you understand them, the more use you’ll be able to squeeze out of any data you have stored in them. RDBMS_table RDBMS_table data data data data data data data data RDBMS_table data data data data data RDBMS_table data data data data What is important for you to know is that the relations that Keys are values that the database enforces among tables are quantitative. The identify records uniquely. database doesn’t care what an “issue” or an “author” is; it just knows that one issue has multiple authors. If the Dataville Dispatch had a Each row of the RDBMS has a unique key, which you’ll often RDBMS, it would be a lot easier see called IDs, and it it uses the keys to make sure that these quantitative relationships are never violated. Once you have a to come up with analyses like the RDBMS, watch out: well-formed relational data is a treasure one you just did. trove for data analysts. 376 Chapter 12 This field is a key. This diagram shows the relationships and data tables inside a relational database. Dataville Dispatch needs to get away from these...Draw the table you’d need to have here. This field is new. relational databases Now that we’ve found the optimum Dataville Dispatch built an RDBMS article count, we should figure out who our most popular authors are so that we can with your relationship diagram make sure they’re always in each issue. You could count the web hits and comments that It was about time that the Dispatch loaded all each article gets for each author. those spreadsheets into a real RDBMS. With the diagram you brainstormed, along with the managing editor’s explanation of their data, a database architect pulled together this relational database. Wholesalers Here is the schema for the Dataville WholesalerID Dispatch’s database. Circle the tables data field that you’d need to pull together into data field a single table in order to show which data field author has the articles with the most web hits and web comments. Issues Sales Then draw the table below that IssueID SaleID would show the fields you’d need in data field order create those scatterplots. IssueID EditorID WholesalerID data field data field data field Articles ArticleID Authors IssueID AuthorID AuthorName AuthorID data field web hits data field CommentID WebComments CommentID ArticleID data field data field you are here 4 377 This tables is new… it’s a listing of the comments each article gets in the online edition.You need a table that draws these three tables from the database together. Ann is the author of both articles 1 and 2 for this hypothetical table. Comment Article Author Web Hits Count 1 Ann 2016 20 2 Ann 2016 15 3 Cathy 2164 40 4 Jeff 749 5 5 George 749 14 find your tables What tables do you need to join together so that you can v evaluate each author’s popularity, by counting the web hits and comments that author receives? Wholesalers WholesalerID data field data field data field Issues Sales IssueID SaleID data field IssueID EditorID WholesalerID data field data field data field Articles ArticleID Authors IssueID AuthorID AuthorName AuthorID data field web hits data field CommentID Comments CommentID ArticleID data field data field 378 Chapter 12 In the last table you used, each row represented an issue, but now each row represents an article.