How to Think Like a Computer Scientist

Published Date:05-07-2017
How to Think Like a Computer Scientist: Learning with Python

Chapter 1
The way of the program

The goal of this book is to teach you to think like a computer scientist. This way of thinking combines some of the best features of mathematics, engineering, and natural science. Like mathematicians, computer scientists use formal languages to denote ideas (specifically computations). Like engineers, they design things, assembling components into systems and evaluating tradeoffs among alternatives. Like scientists, they observe the behavior of complex systems, form hypotheses, and test predictions.

The single most important skill for a computer scientist is problem solving. Problem solving means the ability to formulate problems, think creatively about solutions, and express a solution clearly and accurately. As it turns out, the process of learning to program is an excellent opportunity to practice problem-solving skills. That's why this chapter is called "The way of the program."

On one level, you will be learning to program, a useful skill by itself. On another level, you will use programming as a means to an end. As we go along, that end will become clearer.

1.1 The Python programming language

The programming language you will be learning is Python. Python is an example of a high-level language; other high-level languages you might have heard of are C, C++, Perl, and Java.

As you might infer from the name "high-level language," there are also low-level languages, sometimes referred to as "machine languages" or "assembly languages." Loosely speaking, computers can only execute programs written in low-level languages. Thus, programs written in a high-level language have to be processed before they can run. This extra processing takes some time, which is a small disadvantage of high-level languages.

But the advantages are enormous. First, it is much easier to program in a high-level language.
Programs written in a high-level language take less time to write, are shorter and easier to read, and are more likely to be correct. Second, high-level languages are portable, meaning that they can run on different kinds of computers with few or no modifications. Low-level programs can run on only one kind of computer and have to be rewritten to run on another.

Due to these advantages, almost all programs are written in high-level languages. Low-level languages are used only for a few specialized applications.

Two kinds of programs process high-level languages into low-level languages: interpreters and compilers. An interpreter reads a high-level program and executes it, meaning that it does what the program says. It processes the program a little at a time, alternately reading lines and performing computations.

    [Figure: SOURCE CODE -> INTERPRETER -> OUTPUT]

A compiler reads the program and translates it completely before the program starts running. In this case, the high-level program is called the source code, and the translated program is called the object code or the executable. Once a program is compiled, you can execute it repeatedly without further translation.

    [Figure: SOURCE CODE -> COMPILER -> OBJECT CODE -> EXECUTOR -> OUTPUT]

Python is considered an interpreted language because Python programs are executed by an interpreter. There are two ways to use the interpreter: command-line mode and script mode. In command-line mode, you type Python programs and the interpreter prints the result:

    $ python
    Python 2.4.1 (#1, Apr 29 2005, 00:28:56)
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print 1 + 1
    2

The first line of this example is the command that starts the Python interpreter. The next two lines are messages from the interpreter. The fourth line starts with >>>, which is the prompt the interpreter uses to indicate that it is ready. We typed print 1 + 1, and the interpreter replied 2.
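As a rough illustration of the "translate once, run many times" idea in the compiler description above, here is a minimal sketch. It assumes Python 3 rather than the Python 2 shown in the book, and it uses the built-in compile and eval functions as stand-ins for the COMPILER and EXECUTOR boxes; the names source and code are ours, and the analogy is loose rather than a description of how any particular compiler works.

```python
# Assumes Python 3. The text being translated is the book's 1 + 1 example.
source = "1 + 1"

# Translate the whole source text once. The resulting code object plays
# the role of "object code" in this loose analogy.
code = compile(source, "<example>", "eval")

# Run the translated form twice without translating it again.
first = eval(code)
second = eval(code)
print(first, second)
```

The point of the sketch is only that the translation step and the execution step can be separated, which is what lets a compiled program run repeatedly without further translation.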
Alternatively, you can write a program in a file and use the interpreter to execute the contents of the file. Such a file is called a script. For example, we used a text editor to create a file with the following contents:

    print 1 + 1

By convention, files that contain Python programs have names that end with .py.

To execute the program, we have to tell the interpreter the name of the script:

    $ python <script name>
    2

In other development environments, the details of executing programs may differ. Also, most programs are more interesting than this one.

Most of the examples in this book are executed on the command line. Working on the command line is convenient for program development and testing, because you can type programs and execute them immediately. Once you have a working program, you should store it in a script so you can execute or modify it in the future.

1.2 What is a program?

A program is a sequence of instructions that specifies how to perform a computation. The computation might be something mathematical, such as solving a system of equations or finding the roots of a polynomial, but it can also be a symbolic computation, such as searching and replacing text in a document or (strangely enough) compiling a program.

The details look different in different languages, but a few basic instructions appear in just about every language:

    input: Get data from the keyboard, a file, or some other device.
    output: Display data on the screen or send data to a file or other device.
    math: Perform basic mathematical operations like addition and multiplication.
    conditional execution: Check for certain conditions and execute the appropriate sequence of statements.
    repetition: Perform some action repeatedly, usually with some variation.

Believe it or not, that's pretty much all there is to it. Every program you've ever used, no matter how complicated, is made up of instructions that look more or less like these.
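To see the five basic instructions side by side, here is a small sketch. It assumes Python 3 and uses a function and a loop, which the book has not introduced at this point. The list readings is made-up data standing in for keyboard or file input, and the name sum_of_positives is ours.

```python
# Hypothetical data standing in for input from the keyboard or a file.
readings = [3, -1, 4, -5, 9]


def sum_of_positives(values):
    """Add up the positive numbers in values."""
    total = 0
    for value in values:               # repetition
        if value > 0:                  # conditional execution
            total = total + value      # math
    return total


result = sum_of_positives(readings)    # input: the list of readings
print(result)                          # output
```

Each line of the sketch maps onto one of the categories in the list above, which is the sense in which larger programs are built from the same few kinds of instruction.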
Thus, we can describe programming as the process of breaking a large, complex task into smaller and smaller subtasks until the subtasks are simple enough to be performed with one of these basic instructions.

That may be a little vague, but we will come back to this topic later when we talk about algorithms.

1.3 What is debugging?

Programming is a complex process, and because it is done by human beings, it often leads to errors. For whimsical reasons, programming errors are called bugs, and the process of tracking them down and correcting them is called debugging.

Three kinds of errors can occur in a program: syntax errors, runtime errors, and semantic errors. It is useful to distinguish between them in order to track them down more quickly.

1.3.1 Syntax errors

Python can only execute a program if the program is syntactically correct; otherwise, the process fails and returns an error message. Syntax refers to the structure of a program and the rules about that structure. For example, in English, a sentence must begin with a capital letter and end with a period.

    this sentence contains a syntax error.
    So does this one

For most readers, a few syntax errors are not a significant problem, which is why we can read the poetry of e. e. cummings without spewing error messages. Python is not so forgiving. If there is a single syntax error anywhere in your program, Python will print an error message and quit, and you will not be able to run your program. During the first few weeks of your programming career, you will probably spend a lot of time tracking down syntax errors. As you gain experience, though, you will make fewer errors and find them faster.

1.3.2 Runtime errors

The second type of error is a runtime error, so called because the error does not appear until you run the program. These errors are also called exceptions because they usually indicate that something exceptional (and bad) has happened.
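The difference between the first two kinds of error can be observed directly. The following is a minimal sketch, assuming Python 3; the helper name classify is hypothetical, and it relies on the built-in compile and exec functions to separate the moment a program is checked for syntax from the moment it starts running.

```python
def classify(source):
    """Report which kind of error, if any, the text in source produces."""
    try:
        # Syntax is checked here, before any of the program runs.
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "syntax error"
    try:
        # The program only starts executing here.
        exec(code, {})
    except Exception:
        return "runtime error"
    return "no error reported"


print(classify("x = (1 + 2"))   # unbalanced parenthesis
print(classify("x = 1 / 0"))    # well formed, but fails while running
print(classify("x = 1 + 1"))
```

Note that "no error reported" is not the same as "correct": a program that runs without complaint can still do the wrong thing.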
Runtime errors are rare in the simple programs you will see in the first few chapters, so it might be a while before you encounter one.

1.3.3 Semantic errors

The third type of error is the semantic error. If there is a semantic error in your program, it will run successfully, in the sense that the computer will not generate any error messages, but it will not do the right thing. It will do something else. Specifically, it will do what you told it to do.

The problem is that the program you wrote is not the program you wanted to write. The meaning of the program (its semantics) is wrong. Identifying semantic errors can be tricky because it requires you to work backward by looking at the output of the program and trying to figure out what it is doing.

1.3.4 Experimental debugging

One of the most important skills you will acquire is debugging. Although it can be frustrating, debugging is one of the most intellectually rich, challenging, and interesting parts of programming.

In some ways, debugging is like detective work. You are confronted with clues, and you have to infer the processes and events that led to the results you see.

Debugging is also like an experimental science. Once you have an idea what is going wrong, you modify your program and try again. If your hypothesis was correct, then you can predict the result of the modification, and you take a step closer to a working program. If your hypothesis was wrong, you have to come up with a new one. As Sherlock Holmes pointed out, "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." (A. Conan Doyle, The Sign of Four)

For some people, programming and debugging are the same thing. That is, programming is the process of gradually debugging a program until it does what you want.
The idea is that you should start with a program that does something and make small modifications, debugging them as you go, so that you always have a working program.

For example, Linux is an operating system that contains thousands of lines of code, but it started out as a simple program Linus Torvalds used to explore the Intel 80386 chip. According to Larry Greenfield, "One of Linus's earlier projects was a program that would switch between printing AAAA and BBBB. This later evolved to Linux." (The Linux Users' Guide Beta Version 1)

Later chapters will make more suggestions about debugging and other programming practices.

1.4 Formal and natural languages

Natural languages are the languages that people speak, such as English, Spanish, and French. They were not designed by people (although people try to impose some order on them); they evolved naturally.

Formal languages are languages that are designed by people for specific applications. For example, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols. Chemists use a formal language to represent the chemical structure of molecules. And most importantly:

    Programming languages are formal languages that have been designed to express computations.

Formal languages tend to have strict rules about syntax. For example, 3+3=6 is a syntactically correct mathematical statement, but 3=+6$ is not. H₂O is a syntactically correct chemical name, but ₂Zz is not.

Syntax rules come in two flavors, pertaining to tokens and structure. Tokens are the basic elements of the language, such as words, numbers, and chemical elements. One of the problems with 3=+6$ is that $ is not a legal token in mathematics (at least as far as we know). Similarly, ₂Zz is not legal because there is no element with the abbreviation Zz.

The second type of syntax error pertains to the structure of a statement, that is, the way the tokens are arranged.
The statement 3=+6$ is structurally illegal because you can't place a plus sign immediately after an equal sign. Similarly, molecular formulas have to have subscripts after the element name, not before.

    As an exercise, create what appears to be a well-structured English sentence with unrecognizable tokens in it. Then write another sentence with all valid tokens but with invalid structure.

When you read a sentence in English or a statement in a formal language, you have to figure out what the structure of the sentence is (although in a natural language you do this subconsciously). This process is called parsing.

For example, when you hear the sentence, "The other shoe fell," you understand that "the other shoe" is the subject and "fell" is the predicate. Once you have parsed a sentence, you can figure out what it means, or the semantics of the sentence. Assuming that you know what a shoe is and what it means to fall, you will understand the general implication of this sentence.

Although formal and natural languages have many features in common (tokens, structure, syntax, and semantics), there are many differences:

    ambiguity: Natural languages are full of ambiguity, which people deal with by using contextual clues and other information. Formal languages are designed to be nearly or completely unambiguous, which means that any statement has exactly one meaning, regardless of context.
    redundancy: In order to make up for ambiguity and reduce misunderstandings, natural languages employ lots of redundancy. As a result, they are often verbose. Formal languages are less redundant and more concise.
    literalness: Natural languages are full of idiom and metaphor. If I say, "The other shoe fell," there is probably no shoe and nothing falling. Formal languages mean exactly what they say.

People who grow up speaking a natural language (that is, everyone) often have a hard time adjusting to formal languages.
In some ways, the difference between formal and natural language is like the difference between poetry and prose, but more so:

    Poetry: Words are used for their sounds as well as for their meaning, and the whole poem together creates an effect or emotional response. Ambiguity is not only common but often deliberate.
    Prose: The literal meaning of words is more important, and the structure contributes more meaning. Prose is more amenable to analysis than poetry but still often ambiguous.
    Programs: The meaning of a computer program is unambiguous and literal, and can be understood entirely by analysis of the tokens and structure.

Here are some suggestions for reading programs (and other formal languages). First, remember that formal languages are much more dense than natural languages, so it takes longer to read them. Also, the structure is very important, so it is usually not a good idea to read from top to bottom, left to right. Instead, learn to parse the program in your head, identifying the tokens and interpreting the structure. Finally, the details matter. Little things like spelling errors and bad punctuation, which you can get away with in natural languages, can make a big difference in a formal language.

1.5 The first program

Traditionally, the first program written in a new language is called "Hello, World" because all it does is display the words "Hello, World". In Python, it looks like this:

    print "Hello, World"

This is an example of a print statement, which doesn't actually print anything on paper. It displays a value on the screen. In this case, the result is the words

    Hello, World

The quotation marks in the program mark the beginning and end of the value; they don't appear in the result.

Some people judge the quality of a programming language by the simplicity of the "Hello, World" program. By this standard, Python does about as well as possible.
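The advice about identifying tokens can be tried on the first program itself. The sketch below assumes Python 3, where print is a function and the program is therefore written with parentheses; it uses the standard tokenize module, and the helper name tokens_of is ours.

```python
import io
import tokenize

# Python 3 spelling of the first program (print is a function there).
source = 'print("Hello, World")\n'


def tokens_of(text):
    """Return (kind, text) pairs for the meaningful tokens in text."""
    # Skip the bookkeeping tokens that mark line ends and end of input.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(text).readline)
        if tok.type not in skip
    ]


for kind, text in tokens_of(source):
    print(kind, text)
```

The program breaks into four tokens: a name, an opening parenthesis, a string (quotation marks included), and a closing parenthesis. Working out how those tokens fit together is the parsing step described in section 1.4.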
1.6 Glossary

    problem solving: The process of formulating a problem, finding a solution, and expressing the solution.
    high-level language: A programming language like Python that is designed to be easy for humans to read and write.
    low-level language: A programming language that is designed to be easy for a computer to execute; also called "machine language" or "assembly language."
    portability: A property of a program that can run on more than one kind of computer.
    interpret: To execute a program in a high-level language by translating it one line at a time.
    compile: To translate a program written in a high-level language into a low-level language all at once, in preparation for later execution.
    source code: A program in a high-level language before being compiled.
    object code: The output of the compiler after it translates the program.
    executable: Another name for object code that is ready to be executed.
    script: A program stored in a file (usually one that will be interpreted).
    program: A set of instructions that specifies a computation.
    algorithm: A general process for solving a category of problems.
    bug: An error in a program.
    debugging: The process of finding and removing any of the three kinds of programming errors.
    syntax: The structure of a program.
    syntax error: An error in a program that makes it impossible to parse (and therefore impossible to interpret).
    runtime error: An error that does not occur until the program has started to execute but that prevents the program from continuing.
    exception: Another name for a runtime error.
    semantic error: An error in a program that makes it do something other than what the programmer intended.
    semantics: The meaning of a program.
    natural language: Any one of the languages that people speak that evolved naturally.
    formal language: Any one of the languages that people have designed for specific purposes, such as representing mathematical ideas or computer programs; all programming languages are formal languages.
    token: One of the basic elements of the syntactic structure of a program, analogous to a word in a natural language.
    parse: To examine a program and analyze the syntactic structure.
    print statement: An instruction that causes the Python interpreter to display a value on the screen.

Chapter 2
Variables, expressions and statements

2.1 Values and types

A value is one of the fundamental things (like a letter or a number) that a program manipulates. The values we have seen so far are 2 (the result when we added 1 + 1), and "Hello, World".

These values belong to different types: 2 is an integer, and "Hello, World" is a string, so called because it contains a "string" of letters. You (and the interpreter) can identify strings because they are enclosed in quotation marks.

The print statement also works for integers.

    >>> print 4
    4

If you are not sure what type a value has, the interpreter can tell you.

    >>> type("Hello, World")
    <type 'str'>
    >>> type(17)
    <type 'int'>

Not surprisingly, strings belong to the type str and integers belong to the type int. Less obviously, numbers with a decimal point belong to a type called float, because these numbers are represented in a format called floating-point.

    >>> type(3.2)
    <type 'float'>

What about values like "17" and "3.2"? They look like numbers, but they are in quotation marks like strings.

    >>> type("17")
    <type 'str'>
    >>> type("3.2")
    <type 'str'>

They're strings.

When you type a large integer, you might be tempted to use commas between groups of three digits, as in 1,000,000. This is not a legal integer in Python, but it is a legal expression:

    >>> print 1,000,000
    1 0 0

Well, that's not what we expected at all! Python interprets 1,000,000 as a comma-separated list of three integers, which it prints consecutively.
This is the first example we have seen of a semantic error: the code runs without producing an error message, but it doesn't do the "right" thing.

2.2 Variables

One of the most powerful features of a programming language is the ability to manipulate variables. A variable is a name that refers to a value.

The assignment statement creates new variables and gives them values:

    >>> message = "What's up, Doc?"
    >>> n = 17
    >>> pi = 3.14159

This example makes three assignments. The first assigns the string "What's up, Doc?" to a new variable named message. The second gives the integer 17 to n, and the third gives the floating-point number 3.14159 to pi.

A common way to represent variables on paper is to write the name with an arrow pointing to the variable's value. This kind of figure is called a state diagram because it shows what state each of the variables is in (think of it as the variable's state of mind). This diagram shows the result of the assignment statements:

    message -> "What's up, Doc?"
    n       -> 17
    pi      -> 3.14159

The print statement also works with variables.

    >>> print message
    What's up, Doc?
    >>> print n
    17
    >>> print pi
    3.14159

In each case the result is the value of the variable. Variables also have types; again, we can ask the interpreter what they are.

    >>> type(message)
    <type 'str'>
    >>> type(n)
    <type 'int'>
    >>> type(pi)
    <type 'float'>

The type of a variable is the type of the value it refers to.

2.3 Variable names and keywords

Programmers generally choose names for their variables that are meaningful: they document what the variable is used for.

Variable names can be arbitrarily long. They can contain both letters and numbers, but they have to begin with a letter. Although it is legal to use uppercase letters, by convention we don't. If you do, remember that case matters. Bruce and bruce are different variables.

The underscore character (_) can appear in a name. It is often used in names with multiple words, such as my_name or price_of_tea_in_china.
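The assignments and type checks from sections 2.1 and 2.2 can be reproduced as a script. This sketch assumes Python 3, where print is a function and the interpreter would display a type as <class 'str'> rather than <type 'str'>; to avoid depending on that display format, it prints each type's name instead. The name million is ours.

```python
message = "What's up, Doc?"
n = 17
pi = 3.14159

# A text version of the state diagram: each name, its value, and its type.
for name, value in [("message", message), ("n", n), ("pi", pi)]:
    print(name, "->", repr(value), type(value).__name__)

# Quoted digits are strings, not numbers.
print(type("17").__name__)

# Commas separate three integers; Python 3 groups them into a tuple.
million = 1,000,000
print(million)
```

The last two lines show the same surprise as the 1,000,000 example in section 2.1: the value stored is the three integers 1, 0, and 0, not one million.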
If you give a variable an illegal name, you get a syntax error:

    >>> 76trombones = "big parade"
    SyntaxError: invalid syntax
    >>> more$ = 1000000
    SyntaxError: invalid syntax
    >>> class = "Computer Science 101"
    SyntaxError: invalid syntax

76trombones is illegal because it does not begin with a letter. more$ is illegal because it contains an illegal character, the dollar sign. But what's wrong with class?

It turns out that class is one of the Python keywords. Keywords define the language's rules and structure, and they cannot be used as variable names.

Python has twenty-nine keywords:

    and       def      exec     if      not    return
    assert    del      finally  import  or     try
    break     elif     for      in      pass   while
    class     else     from     is      print  yield
    continue  except   global   lambda  raise

You might want to keep this list handy. If the interpreter complains about one of your variable names and you don't know why, see if it is on this list.

2.4 Statements

A statement is an instruction that the Python interpreter can execute. We have seen two kinds of statements: print and assignment.

When you type a statement on the command line, Python executes it and displays the result, if there is one. The result of a print statement is a value. Assignment statements don't produce a result.

A script usually contains a sequence of statements. If there is more than one statement, the results appear one at a time as the statements execute.

For example, the script

    print 1
    x = 2
    print x

produces the output

    1
    2

Again, the assignment statement produces no output.

2.5 Evaluating expressions

An expression is a combination of values, variables, and operators. If you type an expression on the command line, the interpreter evaluates it and displays the result:

    >>> 1 + 1
    2

Although expressions contain values, variables, and operators, not every expression contains all of these elements. A value all by itself is considered an expression, and so is a variable.
    >>> 17
    17
    >>> x
    2

Confusingly, evaluating an expression is not quite the same thing as printing a value.

    >>> message = "What's up, Doc?"
    >>> message
    "What's up, Doc?"
    >>> print message
    What's up, Doc?

When the Python interpreter displays the value of an expression, it uses the same format you would use to enter a value. In the case of strings, that means that it includes the quotation marks. But if you use a print statement, Python displays the contents of the string without the quotation marks.

In a script, an expression all by itself is a legal statement, but it doesn't do anything. The script

    17
    3.2
    "Hello, World"
    1 + 1

produces no output at all. How would you change the script to display the values of these four expressions?

2.6 Operators and operands

Operators are special symbols that represent computations like addition and multiplication. The values the operator uses are called operands.

The following are all legal Python expressions whose meaning is more or less clear:

    20+32   hour-1   hour*60+minute   minute/60   5**2   (5+9)*(15-7)

The symbols +, -, and /, and the use of parentheses for grouping, mean in Python what they mean in mathematics. The asterisk (*) is the symbol for multiplication, and ** is the symbol for exponentiation.

When a variable name appears in the place of an operand, it is replaced with its value before the operation is performed.

Addition, subtraction, multiplication, and exponentiation all do what you expect, but you might be surprised by division. The following operation has an unexpected result:

    >>> minute = 59
    >>> minute/60
    0

The value of minute is 59, and in conventional arithmetic 59 divided by 60 is 0.98333, not 0. The reason for the discrepancy is that Python is performing integer division.

When both of the operands are integers, the result must also be an integer, and by convention, integer division always rounds down, even in cases like this where the next integer is very close.
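The result above is specific to the Python 2 interpreter used in this book. As a hedged point of comparison, the sketch below assumes Python 3, where / between two integers performs floating-point division and the rounding-down behavior is requested explicitly with the // operator. The names floor_result and true_result are ours.

```python
# Assumes Python 3; in the Python 2 used by the book, minute / 60 gives 0.
minute = 59

floor_result = minute // 60   # integer division, rounds down
true_result = minute / 60     # floating-point division in Python 3

print(floor_result)
print(round(true_result, 5))

# "Rounds down" means toward negative infinity, not toward zero.
print(-7 // 2)
```

If you are following along with a Python 3 interpreter, writing minute // 60 reproduces the integer result discussed above.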
A possible solution to this problem is to calculate a percentage rather than a fraction:

    >>> minute*100/60
    98

Again the result is rounded down, but at least now the answer is approximately correct. Another alternative is to use floating-point division, which we get to in Chapter 3.

2.7 Order of operations

When more than one operator appears in an expression, the order of evaluation depends on the rules of precedence. Python follows the same precedence rules for its mathematical operators that mathematics does. The acronym PEMDAS is a useful way to remember the order of operations:

    - Parentheses have the highest precedence and can be used to force an expression to evaluate in the order you want. Since expressions in parentheses are evaluated first, 2 * (3-1) is 4, and (1+1)**(5-2) is 8. You can also use parentheses to make an expression easier to read, as in (minute * 100) / 60, even though it doesn't change the result.
    - Exponentiation has the next highest precedence, so 2**1+1 is 3 and not 4, and 3*1**3 is 3 and not 27.
    - Multiplication and Division have the same precedence, which is higher than Addition and Subtraction, which also have the same precedence. So 2*3-1 yields 5 rather than 4, and 2/3-1 is -1, not 1 (remember that in integer division, 2/3=0).
    - Operators with the same precedence are evaluated from left to right. So in the expression minute*100/60, the multiplication happens first, yielding 5900/60, which in turn yields 98. If the operations had been evaluated from right to left, the result would have been 59*1, which is 59, which is wrong.

2.8 Operations on strings

In general, you cannot perform mathematical operations on strings, even if the strings look like numbers. The following are illegal (assuming that message has type string):

    message-1   "Hello"/123   message*"Hello"   "15"+2

Interestingly, the + operator does work with strings, although it does not do exactly what you might expect.
For strings, the + operator represents concatenation, which means joining the two operands by linking them end-to-end. For example:

    fruit = "banana"
    bakedGood = " nut bread"
    print fruit + bakedGood

The output of this program is banana nut bread. The space before the word nut is part of the string, and is necessary to produce the space between the concatenated strings.

The * operator also works on strings; it performs repetition. For example, "Fun"*3 is "FunFunFun". One of the operands has to be a string; the other has to be an integer.

On one hand, this interpretation of + and * makes sense by analogy with addition and multiplication. Just as 4*3 is equivalent to 4+4+4, we expect "Fun"*3 to be the same as "Fun"+"Fun"+"Fun", and it is. On the other hand, there is a significant way in which string concatenation and repetition are different from integer addition and multiplication. Can you think of a property that addition and multiplication have that string concatenation and repetition do not?

2.9 Composition

So far, we have looked at the elements of a program (variables, expressions, and statements) in isolation, without talking about how to combine them.

One of the most useful features of programming languages is their ability to take small building blocks and compose them. For example, we know how to add numbers and we know how to print; it turns out we can do both at the same time:

    >>> print 17 + 3
    20

In reality, the addition has to happen before the printing, so the actions aren't actually happening at the same time. The point is that any expression involving numbers, strings, and variables can be used inside a print statement.
You've already seen an example of this:

    print "Number of minutes since midnight: ", hour*60+minute

You can also put arbitrary expressions on the right-hand side of an assignment statement:

    percentage = (minute * 100) / 60

This ability may not seem impressive now, but you will see other examples where composition makes it possible to express complex computations neatly and concisely.

Warning: There are limits on where you can use certain expressions. For example, the left-hand side of an assignment statement has to be a variable name, not an expression. So, the following is illegal: minute+1 = hour.

2.10 Comments

As programs get bigger and more complicated, they get more difficult to read. Formal languages are dense, and it is often difficult to look at a piece of code and figure out what it is doing, or why.

For this reason, it is a good idea to add notes to your programs to explain in natural language what the program is doing. These notes are called comments, and they are marked with the # symbol:

    # compute the percentage of the hour that has elapsed
    percentage = (minute * 100) / 60

In this case, the comment appears on a line by itself. You can also put comments at the end of a line:

    percentage = (minute * 100) / 60     # caution: integer division

Everything from the # to the end of the line is ignored; it has no effect on the program. The message is intended for the programmer or for future programmers who might use this code. In this case, it reminds the reader about the ever-surprising behavior of integer division.

This sort of comment is less necessary if you use the integer division operation, //. It has the same effect as the division operator [1], but it signals that the effect is deliberate.
    percentage = (minute * 100) // 60

The integer division operator is like a comment that says, "I know this is integer division, and I like it that way!"

2.11 Glossary

    value: A number or string (or other thing to be named later) that can be stored in a variable or computed in an expression.
    type: A set of values. The type of a value determines how it can be used in expressions. So far, the types you have seen are integers (type int), floating-point numbers (type float), and strings (type str).
    floating-point: A format for representing numbers with fractional parts.
    variable: A name that refers to a value.
    statement: A section of code that represents a command or action. So far, the statements you have seen are assignments and print statements.
    assignment: A statement that assigns a value to a variable.

[1] For now. The behavior of the division operator may change in future versions of Python.