Welcome back to part five of my project of going through MIT computer science courses to try and figure out what I missed by not taking a traditional computer science route. This course is an interesting one as it’s an introduction to MIT’s series of EECS classes, which span a variety of focuses. Compared to the straightforward focuses of the previous four courses, this course is a more surface-level introduction to a variety of topics. Starting off, it gives a basic introduction to Python, largely focused on building a state machine class and making use of it. Section 2 focuses on designing and analyzing physical systems using equations and block diagrams. Section 3 is an introduction to circuits, both electrical and as a tool for analyzing other systems. The last section covers probability and some basic search algorithms. The main theme drilled in through most of the course is the idea of building simple, modular components and using them to design more complex systems.
The actual course has a lab component where students build a robot that detects walls and determines its path accordingly. There are a lot of references to and analysis of this lab throughout, so there are definitely some takeaways and hands-on experience that I’m not getting, and I can’t say I have much experience with robots in general. Additionally, some of the circuits mentioned in the lab guide aren’t used in any of the analysis in the circuits section, including a diode, which is a non-linear component.
Nevertheless, this is an interesting course, and the lectures, recitations and readings provide a lot of useful information. The wide variety of topics keeps it interesting lecture to lecture, and the readings come from a custom book written for the course that honestly provides a better text version than my notes will, at the cost of being hundreds of pages longer; I can definitely see myself referencing some of the chapters later. Lastly, as a note, a lot of the code involves MIT-specific Python packages and wouldn’t work without those packages, though the concepts for state machines and search make sense and can be implemented in different ways.
6.01SC – Introduction to Electrical Engineering and Computer Science 1
Unit 1 – Software Engineering
Lecture 1: Object-Oriented Programming
Dennis Freeman introduces himself as the lecturer of an “unusual course”. 6.01 is about ways to think about engineering and gives a distinct perspective. How do you design, make and debug complicated systems that are reliable? Laptops, for example, are made up of microscopic components such as transistors that require special tools to visualize. How can you figure out how something will work before actually constructing it? This can be done using models to predict behavior. Physical systems can be augmented with computation, and systems should be robust to failure. EECS is an enormous topic that can’t possibly be covered in one course, but this one gives a good sampling, applying modes of reasoning in a variety of contexts, and is split into four units, each focusing on a few key concepts to go into depth on. Learning is done best as an iterative process: learning a little, trying it out, then learning more and implementing that.
The first module is software engineering, which narrowly focuses on abstraction and modularity. This is covered at the small scale, such as instructions on a computer, and at the large scale with state machines, which work in steps, get a new input every step, and use their memory to determine an output. Once you can compose state machines, you can join two together to make a more complicated one. A theme of this course is making complex behavior out of simpler pieces, such as controlling a robot via state machines. Systems are thought about with the acronym PCAP: breaking them into primitives, how they’re combined, how they’re abstracted, and important patterns.
The second module is on signals and systems, focusing solely on “discrete-time feedback”: making a system that knows what it’s done and can act with awareness of the past. Robotic steering is an example of this, comparing how a person drives to a sequence of steps a robot could use. Innocent-looking algorithms can have terrible performance, but behavior can be predicted by a model. The focus is on how to make a model to predict behavior and figure out a better algorithm. This lecture focuses on building and analyzing models.
The third module is focused on circuits, focusing on adding a sensory capability to a complicated system. The main example of this starts with a robot, and the goal is to design a “head,” such as adding a light sensor to a moving robot to track light and move its neck toward the light. This will require making a circuit that connects the sensor to the motor to control the head and then tell the robot to move accordingly.
The fourth module is on probability and planning: modelling uncertainty and making complicated plans to act accordingly. How can a robot figure out where it is, or act in a maze it’s unfamiliar with? The example has a robot on one side of a maze that knows of its target on the other side without knowing about the walls between them. The robot can use sonar sensors to detect walls to avoid and compute a new plan to get there. As it drives it builds a map of the maze to come up with a solution.
Software engineering is the first topic because it’s a convenient language for thinking about engineering and design issues throughout the class. The first two weeks of the course are about making people comfortable with programming in Python. Python is chosen as it can help illustrate important concepts in a simple context. Because it’s interpreted, after initializing, Python goes into an interpreter loop: it asks the user what they want done, reads what they type, interprets the input and executes the respective task, then repeats. The fact that Python is interpreter-based is useful as you can learn by doing. Error messages are Python’s way of telling you that it tried to act on your behalf and didn’t understand what you asked for.
Python operations can take in complicated expressions, such as a math equation, and store them as data structures. A system that does this is called compositional. For example, you can think of the sum of two numbers as a single integer instead of an equation, and describe multiple objects in a sentence with “and” and “are” instead of “is”. The goal is to use this idea as a starting point and make a more general compositional system, in this case using names for functions, allowing an easy way to call an arbitrary number of lines of code. An example of this is a square function which squares the input and returns it. A very small step, but it illustrates the point that Python provides compositional functionality. It also allows hierarchical construction of operators, such as adding squares or squaring squares. Adding a sumOfSquares function that returns square(x)+square(y) lets sumOfSquares trust that square knows how to calculate the square, so it just needs to add the results together. This is a two-step process and an example of building more complicated systems from simpler parts. Lists are another example of compositional functionality and can store elements of any type. Data structures can be assigned to variables for easier use, similar to calling a function by name.
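The square/sumOfSquares composition the lecture describes fits in a few lines (the function names match the lecture’s example; the rest is my own sketch):

```python
def square(x):
    # primitive building block: squares its input
    return x * x

def sumOfSquares(x, y):
    # trusts square to do its job and only adds the results
    return square(x) + square(y)

print(sumOfSquares(3, 4))  # 25
```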
Another example of taking smaller operations and combining them into larger operations is via classes. The first idea is composing operations and associating names with values and functions. A class is a higher-order construct that combines data and procedures as attributes and methods. After defining a class, you can use it to define an instance, which is a data structure that inherits elements from the class, potentially along with new attributes or methods unique to it. You can also use a class to define a subclass, which inherits the parent class’s attributes. We’ve seen ways of naming functions, variables and classes so far.
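As a minimal sketch of the class/instance/subclass vocabulary (the Point names here are my own, not from the lecture):

```python
class Point:
    def __init__(self, x, y):
        # attributes: data bound to the instance
        self.x = x
        self.y = y

    def dist_from_origin(self):
        # method: a procedure bundled with the data
        return (self.x ** 2 + self.y ** 2) ** 0.5

class LabeledPoint(Point):
    # subclass: inherits Point's attributes and methods
    def __init__(self, x, y, label):
        Point.__init__(self, x, y)
        self.label = label  # new attribute unique to the subclass

p = LabeledPoint(3, 4, "home")
print(p.label, p.dist_from_origin())  # home 5.0
```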
Python associates names and entities in a straightforward fashion in binding environments. Python builds an environment associating variable names with their values; when a variable is requested, Python checks the binding environment to see if it has a binding for the variable. In general there’s a global environment that Python modifies. First it evaluates the right-hand side and then binds the left-hand side to the value of the right side, explaining why b=b+1 increments b. In other words, it evaluates the right-hand side with respect to the current environment and changes the current environment to reflect the new binding. In the case of subroutines, a procedure name is bound to a procedure with an input such as x and a body, which is what the function will do. This creates a binding from the name to the function. The function is a data structure outside of the environment, and calling it makes a new binding environment binding the variable x with the value passed in. While that function is running, its internal binding environment is used instead for lookups. The function’s binding environment is destroyed once the function ends. If a function takes a parameter “a” and returns a+b, with b not defined in the function, the function will be able to look up a in its local environment and will then use a pointer to the parent environment to look up b. In theory, it could propagate up a chain of environments and find the nearest binding. Classes work in a similar way and are executed as an environment with attributes and procedures as the bindings for it. Python implements classes as environments and implements instances as their own environments, pointing to the parent environment of the class for looking up values not bound to the instance. Python uses dots as a way of navigating environments: X.y looks up y in the X environment.
For an instance using a class method with a parameter self, calling it on the instance with no arguments (instance.method()) is the same as calling class.method(instance), and will bind self to the instance in the class’s environment. This was meant to show the key details of modularity at the small scale.
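That equivalence is easy to check directly; a hypothetical Counter class makes it concrete:

```python
class Counter:
    def __init__(self):
        self.n = 0

    def bump(self):
        # self is bound to whichever instance the method is called on
        self.n += 1

c = Counter()
c.bump()          # the usual form
Counter.bump(c)   # the equivalent explicit form: self = c
print(c.n)  # 2
```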
Chapter 1 – Course Overview
In addition to the lecture there are a few chapters of readings written for the course. The first chapter states that the goals of the course are to use abstraction and modularity across electrical engineering and computer science and to make mathematical models of systems to help design and analyze them. This chapter goes further into these concepts, mentioning that lemmas are building blocks for theorems and circuits are built from components, or, as seen here, software is built from functions and classes. Building using smaller pieces helps to keep things manageable. Modularity is defined as building reusable components, and abstraction is making components easy to use without caring about their implementation. Not having to understand the implementation of every module, only caring about what it does, makes it easier to build complex systems. Modeling systems in different ways can show different dimensions of the system to help determine what to optimize for.
Given the example of the light-sensing robot from the lecture, the physics and inner workings of the robot’s movement and light sensors are abstractions, and we’ll only need to know what they do and how to connect them. The chapter advocates for choosing a “basis set” of components to build a system from, to be able to learn from others using the same tools and to be able to use existing techniques to solve a problem. Designing is easier when the number of possible options is limited, and primitive components, combination techniques, packaging and pattern abstractions can all be standardized.
Circuits are typically built from a basis set of primitive components including sources, resistors, capacitors, inductors and transistors, as electronic systems are described in terms of voltage, current, resistance and their rates of change. A resistor is characterized by its resistance and a capacitor by storing charge, which is a useful description even if each provides some of the other’s functionality. A common technique of connecting terminals with conductors (ex. wires) is known as wiring, and describing the constraints helps to abstract the process.
For the light robot example, if the resistance varies with light, a voltage divider can make the voltage vary as well. The relationship between voltage in and out is Vout = (RB/(RA+RB))Vin, where the R’s are resistances. Connecting a motor will change the voltages, so this hasn’t achieved modularity: the joint behavior would need to be reanalyzed, and pieces can’t be designed separately without understanding the whole. A digital abstraction of circuits considers voltages as low or high, a 0 or 1 binary, and digital circuits are made of gates (themselves made of simpler transistors and resistors). Digital abstraction limits building options but can be easy to design with, as gates are simpler and combinations can be analyzed with boolean logic. Putting these circuits together is used to build many electronics, and digital design is important in EECS, though not explored much further in this course.
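The voltage divider relationship is simple enough to check numerically; this helper function is my own, not from the reading:

```python
def divider_vout(v_in, r_a, r_b):
    # Vout = (RB / (RA + RB)) * Vin
    return (r_b / (r_a + r_b)) * v_in

# equal resistors split the input voltage in half
print(divider_vout(10.0, 1000.0, 1000.0))  # 5.0
```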
“Stored program” computers are one of the most important developments in digital circuit design. Computers are general purpose, and the same circuit can perform almost any transformation between inputs and outputs, controlling the transformation with switches and physical memory storage. It’s easier to manufacture software, and we can get more powerful modularity and abstraction from software than from digital circuits’ abstraction. Computers themselves are another abstraction, as we don’t need to know how they’re built to use them. Programming languages are another abstraction and are converted into machine language by compilers or interpreters. Languages are another PCAP system. A computer program has data and computation primitives. Generally, binary digits or bits are stored in groups of 32 or 64 called words, which are data primitives representing various other data.
Each PCAP system is accompanied with a modeling system for making mathematical models. A model is a simpler system than the one it’s modeling, but captures important aspects, such as a model plane keeping the same shape without needing seats. Models help discover things by working with the model instead of the system. Deciding which features to keep and which to ignore is one of the biggest challenges. Analysis is the most common use of a model, with the goal of knowing what a value or outcome will be. Proving a theorem about a mathematical model can be easier than testing all behaviors of a system. Software is often easier to model and test on its own, where physical systems sometimes need a modeled environment as well. An example of an analysis is a robot moving toward a lamp. If the robot’s velocity at time t, v(t), is proportional to the difference between the light level X(t) and a desired light level Xdesired (the expected level at the desired distance from the lamp), the control system can be modeled with v(t) = k(Xdesired − X(t)), with k being the constant of proportionality or gain of the controller. Equating light level to position, assuming units match, and taking position at time t to be the previous position plus the velocity at time t−1: X(t) = X(t−1) + k(Xdesired − X(t−1)). For a given k, behavior can be found by solving this equation for X. Treating the system as moving in discrete time steps is an approximation to make models easier, but requires understanding the effects of the approximation on results.
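The discrete-time controller model is easy to simulate for different gains; this is my own sketch of the reading’s equation, not course code:

```python
def simulate(k, x0, x_desired, steps):
    # X(t) = X(t-1) + k * (Xdesired - X(t-1))
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + k * (x_desired - xs[-1]))
    return xs

# with gain k = 0.5 the light level converges smoothly toward the target
print(simulate(0.5, 0.0, 10.0, 5))  # [0.0, 5.0, 7.5, 8.75, 9.375, 9.6875]
```

Trying a gain above 2 makes the same loop oscillate with growing amplitude, which is exactly the kind of behavior the model lets you predict before building anything.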
Automatic synthesis of systems from desired behavior is a goal of synthetic models, such as finding a circuit with desired input output behaviors. The search space is often too large for an automated process and humans can often perform better using informal models of systems. An example of a synthetic model is software documentation describing functionality and helping to build more complex systems using the software. The last model type mentioned is internal models, such as building an abstraction layer on top of a programming layer or a navigation system in a car giving a path from a location to a goal. The size of a table storing all possible routes is too big, so instead the problem is abstracted by using a graph of nodes connected by arcs and using algorithms to solve a shortest path problem, usable by many similar problems. Internal models can also be used to gain information about the outside world and update their calculations with it.
The course looks at strategies for building software systems that interact with external environments. Computers often need to receive information about the outside world, take actions based on it, and adapt to changes in the environment. Computers get information from sensors, perform computations and take actions accordingly. These can be done with different styles. Sequential is the most straightforward style, with the program giving a sequence of commands to the system, including actions and calls to read from the sensors. An example of moving a robot in a square sequentially has it repeat moving forward and turning 90 degrees four times, as long as there isn’t an obstacle in front at the start. There’s a problem: sensors need to be checked frequently, since an obstacle could move in front or there could be an error. Next, event-driven or interrupt-driven systems are often used for UI programs and have procedures run as handlers or callbacks when events occur, such as clicking or typing text. This can also be used for sensor data, such as a temperature getting too high, and an event loop will continuously run to see if any events have occurred and which procedure to run. A sequential example of a robot moving toward the light would have it taking incremental steps then checking, while an event-driven example could have it drive until an event triggers a stop, which can help if there are sudden changes. Added conditions that need reactions can slow this process down.
Another strategy is building a transducer with an internal state. This acts as a continuously running processing box, reading sensors at intervals, then computing, storing and outputting values as needed. This can lead to complex behavior and would let the robot adjust speed according to changing inputs, though it’s difficult to use for sequential tasks such as moving in a square. An abstraction layer can be built on the transducer model to make tasks like these easier.
Strategies for expressing computational processes inside systems exist as well. Imperative computation is where a sequential set of instructions is run by the computer; e.g., Java/C would have procedures set a variable, run steps on it, then return an output. Primitive computational elements are basic arithmetic and assignments, combined with sequences and control structures such as if/else and loops. By defining a procedure for these tasks, programmers only need to know what the procedure does. Functional computation is another style, where you start out with smaller functions such as addition and use them to construct more complex functions. Conditional evaluation and recursion help make these functions more useful, and the output of one function can be used as the input to another. Primitive elements are usually arithmetic and list operations.
Data structures are another tool for modularity and abstraction, organizing data in useful ways even though computers primitively use collections of bits. Data structures can present these as strings, booleans and numbers instead, or point to an address of a more complicated data structure. These are abstracted into complex data structures such as lists, dictionaries and custom representations, and functions can act on these abstracted data structures. Object-oriented programming combines computation with data structures, where an object is a data structure along with procedures for operating on it. An abstraction in object-oriented programming is “generic” programming: adding functions such as print to all objects and letting them print information based on their internal implementation, or having objects output their x, y coordinates based on internal calculations. Inheritance is commonly supported, allowing new objects to be made and customized based on existing ones. Most modern programming languages support these techniques.
Chapter 2 – Learning to Program in Python
Chapter 2 is a basic tutorial for the Python programming language and says to skip it if you’ve used Python before. I’ll give a quick summary of the topics covered and not go into as much detail. It shows writing if/else statements in Java and then Python, showing that brackets are important for Java, where indentation and line breaks are just good style, while Python uses indentation to indicate grouping, making it mandatory for knowing which line is part of which loop or function. You can use a backslash in Python to split a command between lines. Java programs are statically and strongly typed: all variable types must be known before the program is run, and types are enforced by a type checker. Python allows variables to hold any type and have new types assigned to them to replace the old ones. It’s quicker for a compiler to generate an executable knowing the type of data in a variable, and errors can be caught at compile time instead of run time. Python wouldn’t find a type error until the line was reached when running and tends to be slower since it evaluates types in the moment.
Modules can be saved in files and their code can be imported with the import command, referencing name.py as import name and using dot notation to refer to functions and objects in the module, such as name.functionName(). Debugging and testing functions and modules is important, and you can use test cases with desired inputs and outputs to check that they work. Print statements are used in this course instead of a more complex debugger, a common technique being to put a statement halfway through the code, knowing that if your expectation holds the code probably works to that point, then adding more prints to the problematic part to narrow down the problem in a logarithmic fashion.
Functions are objects in memory that can be called to run their procedures. You can also pass a function call into another function, such as g(g(4)), which will pass the result of g(4) to another call to g, and it’s important to use return statements. Print and return are compared, with print just outputting to the console and return passing an output back to the previous scope. Strings are introduced here and are shown to be indexable. Booleans, if, else, for and while are shown as control structures, and range can be used to generate a list of integers. There are some examples of iterating over lists as well as list comprehensions, which generate a list after evaluating an expression, such as [x for x in exampleList if x > 3], which would create a list of each value greater than 3 in a list exampleList. This is viewable as a special notation for a class of loops that iterate over a list to find items meeting criteria and append them to another list. Zip is a built-in function commonly used with list comprehensions to generate a list of tuples with the values at the same index of two passed-in lists. An example shows using this to combine a list of first names with a list of last names, creating a list of tuples with first name, last name pairs. Nested list comprehensions work similarly to nested for loops.
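The comprehension and zip examples from the chapter look roughly like this (in Python 3, zip returns an iterator, so I wrap it in list() where a list is wanted; the names are my own):

```python
exampleList = [1, 5, 2, 8, 3]
# keep only the values greater than 3
big = [x for x in exampleList if x > 3]
print(big)  # [5, 8]

firsts = ["Ada", "Grace"]
lasts = ["Lovelace", "Hopper"]
# pair up values at the same index of both lists
names = list(zip(firsts, lasts))
print(names)  # [('Ada', 'Lovelace'), ('Grace', 'Hopper')]
```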
Some common errors are gone over, and then the chapter moves on to style. Style tips include not recalculating the same value, using functions for common patterns, avoiding numeric indexing, and using variables instead of raw numbers to make things easier to change later. The rest of the chapter shows more function and loop examples with the information described.
Chapter 3 – Programs and data
Chapter 3 goes into more detail on concepts such as importing modules, passing functions as arguments and the object-first nature of Python, and includes an example of building an interpreter, which is a program that reads and executes language code. Typically interpreters are simple, encoding the rules of a language and evaluating them for legal expressions. Complexity comes from primitive elements with simple rules, and the interpreter evaluates the primitives, combining them in various ways. Interpreters are made up of a reader/tokenizer, which takes an input of a string of characters and divides it up into tokens such as numbers, words and special characters. These are passed to a parser, which creates constructs such as loops in the language, and the evaluator or interpreter determines the values and effects of the program. Lastly, the printer prints the value returned by the evaluator. Simulating the interpreter can help the user reason about how their programs are running.
Applying PCAP framework to computer programs, primitive data items such as ints and strings are in most languages and are combinable into more complex structures for thinking about data in different ways. Abstract data types help abstract away representational details. Primitive procedures include numeric and list operations and can be combined or abstracted into a new function. This is similar to creating a new primitive which can be used for higher order procedures. Objects also combine and abstract data elements and relevant procedures. Higher order procedures and generic functions are examples of patterns.
A basic functionality of a programming language is its ability to evaluate expressions, which consist of tokens representing the application of operators to data elements, and which have a computed value that is output. Numerals are primitive expressions with values that are constant. Floats are similar but are different from real numbers, as they need to be represented in bits, causing them to violate some laws of arithmetic. They only roughly represent real numbers, as .1 will be represented as 0.10000000000000001 and 1/3 as 0.33333333333333331.
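The float imprecision the reading mentions is easy to see directly:

```python
# 0.1 cannot be represented exactly in binary, so arithmetic laws
# that hold for real numbers can fail for floats
print(0.1 + 0.2 == 0.3)   # False
print("%.17f" % 0.1)      # 0.10000000000000001
```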
The Python shell is a program that asks the user for an expression, reads what they type in, converts it to tokens, parses them into a data structure representing syntax, and evaluates it with an interpreter, printing the output value. The interpreter applies operations in precedence order and evaluates sub-expressions as needed to get values to pass back to calling functions. Binding environments and variables are rehashed in this chapter; notably, Python has a __builtin__ environment containing definitions for basic symbols and built-in functions, and each module’s environment has __builtin__ as its parent. Procedure calls have the calling module’s environment as their parent. Binding variables to lists or dictionaries is done by associating a name with a pointer to the memory address of the data structure. Lists are mutable, meaning the value at an index can be changed and elements can be appended or deleted. An interesting behavior is that setting b=a, where a is a list, means that changing b also changes a, as both refer to the same object in memory. This is called aliasing. Using the list() function or a[:] can set a variable to a copy of a list instead, which can be changed on its own. You can also make a copy by adding two lists with the plus operator. Copies generally only copy the top-level structure. If a list contains nested lists, you can use copy.deepcopy to copy everything. Tuples and strings are introduced here, with tuples being like a list but immutable. A tuple with a single element can be made with (1,), with the comma required. Tuples are useful for keys that shouldn’t be changed. A string is a special type of tuple, consisting of characters with special encoding. A value at any index of a string is also a string, and strings can be concatenated.
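The aliasing and copying behavior is worth seeing directly (my own example):

```python
import copy

a = [1, [2, 3]]
b = a                  # aliasing: b and a name the same object
b[0] = 99
print(a[0])            # 99

c = a[:]               # shallow copy: new top-level list...
c[1].append(4)         # ...but nested lists are still shared
print(a[1])            # [2, 3, 4]

d = copy.deepcopy(a)   # deep copy shares nothing with a
d[1].append(5)
print(a[1])            # [2, 3, 4] (unchanged)
```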
Importing a module both evaluates the file and assigns it to the variable called with import. Using “from module import x” will just bind x to module.x in the environment, and the rest of the module would need to be imported separately. Looking back at functions getting variables from the parent scope, it’s important to note that running a=a+1 inside a function that has “a” defined outside won’t work, as the interpreter will treat the assignment as creating an unbound variable inside the function environment, preventing a reference to the parent and causing an error. “global a” can be run in the function to tell it not to make a new binding for “a” in the function, so the parent environment will be changed by the command. In the example of a nested function, there isn’t a way for an inner function to access a variable belonging to an outer function calling it without passing the variable in. Assignments can only be done for names in procedure-call or module environments and not intermediary ones. Functions in Python are first-class objects that can be stored as values of variables and passed to other functions as arguments or returned by them. Procedures can be used without being named if created with a lambda constructor, such as f=lambda x: x*x. Lambdas return the value of the expression by default and can be called with f(4), for example, or passed as a function object. A lambda or a function can also be returned by another function.
The section covers classes and inheritance similarly to the lecture, followed by an explanation of recursive problems, where you have a base case and a recursive case, with the recursive case breaking the procedure down with smaller and smaller arguments and the parent call processing the answer from the recursive call. It’s important to know that your function will terminate.
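Factorial is the classic shape of a base case plus a shrinking recursive case (my example, not the chapter’s):

```python
def factorial(n):
    if n <= 1:                       # base case: guarantees termination
        return 1
    return n * factorial(n - 1)      # recursive case: smaller argument

print(factorial(5))  # 120
```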
The next section shows an implementation of an interpreter for a language, Spy, based on Python and a language called Scheme. Scheme and Spy use prefix syntax, where the name of the function comes before its arguments, and are fully parenthesized, since there are parentheses around sub-expressions. So, 1 + num * 4 becomes (+ 1 (* num 4)). This is more easily parseable, and a tokenizer could represent it as (‘+’, 1, (‘*’, ‘num’, 4)). Spy has built-in integers, arithmetic symbols and “=” as an equality test returning a boolean. The set command assigns a value to a variable, and functions can be defined with the def command. If is implemented as (if a b c), returning b if a is true, otherwise c. Begin is a command that can take in other commands to run in sequence, chaining commands. A spyEval function is written which takes in the Spy code and interprets it. Using if/elif/else, it checks whether the value it receives is an integer and whether it’s “begin” starting a compound expression. Symbols are any tokens that aren’t another kind of expression. An environment object is passed to spyEval which contains a dictionary of values (bindings) and a parent environment. The dictionary can be added to or used for lookup, and if a name isn’t found it checks the parent environment, erroring if not found in either. Add and lookup are methods on the environment. In this case, the global environment is the main one evaluated. SpyEval is a recursive function that calls itself on the remainder of the code, interpreting one value at a time and dispatching to the appropriate form. The example goes on to implement classes and inheritance in Spy as their own environments, allowing lookups from their parent environment and generating a constructor function.
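A heavily simplified evaluator in the spirit of spyEval might look like this. To be clear, this is my own sketch, not the course’s Spy implementation, and it handles only integers, symbol lookup, and the + and * forms:

```python
def tiny_eval(expr, env):
    if isinstance(expr, int):
        return expr                           # integers are self-evaluating
    if isinstance(expr, str):
        return env[expr]                      # symbols: look up in the environment
    op, *args = expr                          # compound form: ('+', 1, ('*', 'num', 4))
    vals = [tiny_eval(a, env) for a in args]  # recursively evaluate sub-expressions
    if op == '+':
        return sum(vals)
    if op == '*':
        product = 1
        for v in vals:
            product *= v
        return product
    raise ValueError("unknown operator: %r" % (op,))

# 1 + num * 4, written in prefix form, with num bound to 5
print(tiny_eval(('+', 1, ('*', 'num', 4)), {'num': 5}))  # 21
```

The real spyEval additionally threads a chain of environment objects (each with a parent pointer) instead of a single dictionary, which is what makes nested scopes and class lookups work.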
Next the chapter gives a couple of examples of designing classes: one for a square object, and one for bank accounts with different account types as subclasses inheriting from an Account class. In the second case, the accounts maintain a current balance and each type of account has its own creditLimit function, meaning that processes working with accounts can ask for a credit limit without caring about the implementation. This is an example of a generic function: one operating on different objects with type-specific implementations of the same function name. Procedures operating on instances and classes without caring about their type are polymorphic, which is a helpful technique for using common patterns. In the next example, a time class has hours and minutes attributes and implements adding times to create a new time. This example shows defining a built-in __add__ function to describe how to add two of these objects, as well as __str__ and __repr__, which create the string and print representations of the class respectively. Lastly the chapter contains some practice problems defining classes and methods and asking what the value of an output will be after the methods are run.
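The time example with __add__, __str__ and __repr__ can be sketched like this (the attribute names are my guesses at the chapter’s, not its actual code):

```python
class Time:
    def __init__(self, hours, minutes):
        self.hours = hours
        self.minutes = minutes

    def __add__(self, other):
        # adding two Times produces a new, normalized Time
        total = (self.hours + other.hours) * 60 + self.minutes + other.minutes
        return Time(total // 60, total % 60)

    def __str__(self):
        # string representation used by print()
        return "%d:%02d" % (self.hours, self.minutes)

    __repr__ = __str__

print(Time(1, 50) + Time(2, 20))  # 4:10
```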
Recitation Notes:
Object oriented programming is a programming paradigm in the same category as functional or imperative programming. Everything is an object in OOP, and you interact with code the same way you interact with objects in the physical world. Classes are the basic unit of a code block, describing what a thing can do (methods) and what attributes it can have. You can then use that code to instantiate an object, which is the functional unit you interact with. The first recitation has this description and shows some more examples of creating classes and instances of them. self is the instance itself, implicitly passed as the first argument to every method call. The second recitation talks about inheritance, the idea that you can arrange objects in a hierarchical fashion. The example used is that breeds are subclasses of dogs: breeds have specific properties while inheriting the general properties of dogs.
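The dog/breed hierarchy might look like this minimal sketch (the class and attribute names here are my own, not the recitation's):

```python
class Dog:
    """Base class: properties shared by all dogs."""
    legs = 4

    def speak(self):
        return 'woof'

class Beagle(Dog):
    """Subclass: inherits Dog's attributes and adds breed-specific ones."""
    size = 'small'

# instantiating gives a functional unit that has both levels' properties
snoopy = Beagle()
```

snoopy gets legs and speak() from Dog while adding Beagle's own size attribute.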
Lecture 2 – Primitives, Combination, Abstraction, and Patterns
Last week focused on how to organize programs to manage complexity using PCAP at a small scale. This week looks at PCAP and managing complexity at a higher level. First, programming styles can be used to manage complexity, since the structure of a program impacts its modularity. Imperative (also known as procedural) programming is the common style of writing out instructions as you think through the steps. Functional programming mimics mathematical functions and focuses on inputs and outputs. Object-oriented programming focuses on building collections of data and methods and using them to organize programs as hierarchies.
The first example has two functions: increment, which adds 1 to a number, and square, which squares it. It asks for the minimum-length sequence of calls to get from 1 to 100, which is 5 (increment, increment, square, increment, square: 1 → 2 → 3 → 9 → 10 → 100). Trying to program a solution, a procedural approach tries every combination of calls at each length to see if it can reach 100, finding the answer among the options of length five. This is straightforward to write but leads to three levels of nested loops.
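The brute-force idea can be sketched with itertools.product generating every sequence at each length (the helper name min_calls is mine, not the lecture's):

```python
from itertools import product

def increment(x):
    return x + 1

def square(x):
    return x * x

def min_calls(start, goal, max_len=10):
    """Try every sequence of increment/square at each length,
    returning the shortest length that reaches the goal."""
    for length in range(1, max_len + 1):
        for seq in product([increment, square], repeat=length):
            value = start
            for f in seq:
                value = f(value)
            if value == goal:
                return length
    return None
```

min_calls(1, 100) returns 5, matching the hand-found sequence.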
The functional way of writing the same program divides the problem into functions, with an apply function that takes a list of functions (increments and squares) and an input and returns the result. Apply is a pure function, mapping inputs to outputs with no side effects, and the functions are used as first-class objects, added to lists for later use. The next procedure is addLevel (another pure function), which takes a list of lists of functions and generates a new list of lists, each with one additional element. This doubles the length of the list of lists, since each list can have either increment or square appended to its steps. Functional programming is useful for making you think in modules, reducing complexity and making debugging easier: it’s easy to tell whether a program works if each module does.
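A rough sketch of the functional approach, with my own names apply_all and add_level standing in for the lecture's apply and addLevel (note this search would loop forever if the goal were unreachable):

```python
def increment(x):
    return x + 1

def square(x):
    return x * x

def apply_all(fns, x):
    """Pure function: apply a list of functions to x in order."""
    for f in fns:
        x = f(x)
    return x

def add_level(list_of_lists):
    """Return a new list of lists, each extended by one more function.
    Doubles the number of candidate sequences."""
    return [fns + [f] for fns in list_of_lists for f in (increment, square)]

def search(start, goal):
    """Grow candidate sequences level by level until one reaches the goal."""
    candidates = [[]]
    while True:
        candidates = add_level(candidates)
        for fns in candidates:
            if apply_all(fns, start) == goal:
                return len(fns)
```

Both apply_all and add_level are pure: they build and return new values rather than mutating state, which is what makes each piece easy to test on its own.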
An object oriented approach to this problem represents each step as a node object, organizing possible sequences in a tree. The solution builds up a tree until the answer is found. Using a class, you can keep track of important information. Each node has a parent, action (increment or square) and an answer. The program will then generate new nodes and check if their answers are the goal.
An example of recursion is shown, reducing complicated problems to simpler ones until a base case is reached. In this case, an exponent function takes b and n and returns b*exponent(b, n-1) in the recursive case and 1 when n is 0. This calls the recursive function n times and is on the slower side. A faster version adds a case where if n is even, it returns exponent(b, n/2)**2. The Towers of Hanoi problem is reintroduced as well. Is this the 3rd or 4th course to show that problem? The solution is clear as recursion but much harder as separate loops. The function takes n (the number of disks) and A, B, and C, which are the peg to move from, the peg to move to, and the spare peg respectively. If n = 1, then moving from A to B is the solution. Otherwise, call the function again with Hanoi(n-1, A, C, B), Hanoi(1, A, B, C) and Hanoi(n-1, C, B, A).
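Both recursions are short in Python. This is my own transcription of the lecture's descriptions (I record Hanoi moves in a list for inspection; the lecture just prints them):

```python
def exponent(b, n):
    """Recursive exponentiation; halving even n gives O(log n) calls."""
    if n == 0:
        return 1
    if n % 2 == 0:
        return exponent(b, n // 2) ** 2
    return b * exponent(b, n - 1)

def hanoi(n, a, b, c, moves):
    """Move n disks from peg a to peg b using spare c, recording (from, to) moves."""
    if n == 1:
        moves.append((a, b))
    else:
        hanoi(n - 1, a, c, b, moves)   # move n-1 disks out of the way
        hanoi(1, a, b, c, moves)       # move the largest disk
        hanoi(n - 1, c, b, a, moves)   # move n-1 disks back on top
```

For n disks, Hanoi takes 2^n − 1 moves, so hanoi(3, ...) records 7.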
The key takeaway from the above examples is that the structure of the program affects its modularity. Thinking about modularity at a higher level, we’re looking at programs that control evolving processes, such as a bank account. The above problems have clear, single answers, but something like a bank account isn’t as simple as producing one answer; the following examples have evolving answers. There’s no final answer to a computer application such as Excel or other graphical user interfaces, and robotic steering and other controllers are similar ongoing processes. For controlling processes there will be other programming structures, and it’s important to keep thinking about PCAP and modularity.
This is the segue to state machines, a good structure for capturing the behavior of a process. State machines operate in steps: at each step, the machine takes a new input and uses it, along with the current state, to calculate an output and the next state. A state machine remembers exactly what it needs to know to keep the process running. A moving robot can be thought of as a state machine. Another example is a turnstile with two states, locked and unlocked. Possible inputs are putting a coin in, turning the turnstile or doing nothing. Based on the input and state, it can let you enter or ask you to pay. Graphically, states are shown as nodes and edges show the inputs leading to each next state. This is an evolving process that can be shown as a state diagram. Behaviors can be thought of as outputs as a function of time, which is a useful representation that’s removed from the looping structure, focusing on what you do at each instant in time. This is also a modular interpretation.
The moving robot example from earlier updates its state to make a map, figure out where walls are, and make a plan to move accordingly. This would be awful to structure as procedural programming, but with state machines, sensor input enters a map maker module which adds info to the map; then the input and map are used to construct a plan for movement. Using the plan and input, the mover module can create an action the robot takes to get to the next state. Each of these modules can be built and tested separately, combining simple things to get something more powerful than their linear combination.
Thinking about the representation of a state machine in Python, we’ll have a state machine class with start, step and transduce methods. Start initializes the instance’s state. Step calls a generic getNextValues function, passing the current state and an input to get the next state and output; getNextValues is a pure function, so step then assigns the returned state back to the instance. Transduce takes a sequence of inputs and runs step on each. The turnstile would start in a locked state, and getNextValues would check for the inputs coin and turn, and will unlock or tell the user to enter or pay based on its state. Another example is an accumulator which starts with 0 and adds each input to the current state.
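A minimal version of that class structure might look like this — a sketch in the spirit of the course's sm library, not its actual code:

```python
class SM:
    """Generic state machine: subclasses supply startState and getNextValues."""
    def start(self):
        self.state = self.startState

    def step(self, inp):
        # getNextValues is pure: it returns (next_state, output) without
        # mutating anything, so step does the actual state update
        s, o = self.getNextValues(self.state, inp)
        self.state = s
        return o

    def transduce(self, inputs):
        self.start()
        return [self.step(i) for i in inputs]

class Accumulator(SM):
    """Starts at 0 and adds each input to the current state."""
    startState = 0
    def getNextValues(self, state, inp):
        return (state + inp, state + inp)
```

Accumulator().transduce([7, 3, 4]) yields [7, 10, 14], the running sums.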
State machines will be put together modularly, and their combinations are a focus of the course. They can be put in a cascade, where the output of one goes into another, or run in parallel. For a cascading example with the accumulator, if c = Cascade(a, b) and c.transduce([7,3,4]) is called, then each input is passed to a, whose output is passed to b, whose output is the cascade’s output. 7 is put in: a outputs 7, which b receives and outputs as 7. Then 3 is passed in: a outputs its new state 10, causing b to output 17. Adding 4 causes a to output 14 and b to output 31.
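The cascade example can be checked with a small sketch (again my own minimal stand-ins for the course's sm.Cascade):

```python
class SM:
    """Generic state machine base class (minimal sketch)."""
    def start(self):
        self.state = self.startState

    def step(self, inp):
        s, o = self.getNextValues(self.state, inp)
        self.state = s
        return o

    def transduce(self, inputs):
        self.start()
        return [self.step(i) for i in inputs]

class Accumulator(SM):
    startState = 0
    def getNextValues(self, state, inp):
        return (state + inp, state + inp)

class Cascade(SM):
    """Feed m1's output into m2 on each step; m2's output is the result."""
    def __init__(self, m1, m2):
        self.m1, self.m2 = m1, m2

    def start(self):
        self.m1.start()
        self.m2.start()

    def step(self, inp):
        return self.m2.step(self.m1.step(inp))
```

Cascading two accumulators and transducing [7, 3, 4] gives [7, 17, 31], matching the walkthrough above.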
Chapter 4 – State Machines
State machines model systems whose outputs depend on their entire input history, as opposed to a single input like a functional system. State machines can be used for UIs, conversations (as a word’s meaning depends on context), vehicle state and DNA sequences. Models can use continuous time, with continuous ranges for the values of inputs and outputs and differential equations describing the system’s dynamics. Discrete time is more applicable to the robot system, since sensor readings arrive at discrete moments and inputs can be discontinuous. This course focuses more on discrete-time models, whose inputs and outputs are determined at specific time increments. The job of an embedded system is transducing a stream (infinite sequence) of inputs into a stream of outputs. States capture the essential properties needed to keep calculating outputs and next states, instead of having to go back through the whole history; choosing the right state values is an important step.
A few ways of using state machines are synthetically, analytically and predictively. Synthetically, a program can input sensor readings and output behaviors for a robot to perform in the world. Analytically, they can describe the behavior of a control system and the environment it controls, with the goal of analyzing properties of the system. Predictively, they can describe the environment and come up with plans based on it, such as deciding how to move.
A state machine’s transducer requires a set of states, inputs and outputs, next state and output function and an initial state to run. It’s possible to have a state machine that does the same sequence regardless of input. A pure function is the simplest kind of state machine as it doesn’t depend on state. Finite state machines have a finite set of possible states, such as an elevator which can only be on one of a fixed number of floors. LTI (linear, time-invariant) systems are a category of state machines discussed later. State transition diagrams can be used to model state machine behaviors for given inputs and require an arrow for each input from each state to be complete. Tables can also show evolution of state at different times with rows for time, input, state and output.
Multiple examples of state machine problems are shown in this chapter, and the Python class implementation is shown again, this time noting that getNextValues’ property of not updating the state can be used to preview consequences before taking action; implementation is left to subclasses, and step is used to actually change state. The problems shown before are detailed in Python code, and there are many examples of machines that add, average, check that an input is a prefix of a string, and more.
A parking garage gate controller is one example, where gatePosition is top, middle or bottom, carAtGate is true or false, and carJustExited is true if a car passed through the gate (and is reset to false on the next step). The gate can raise, lower or do nothing. For a car to enter, the gate needs to raise its arm and stay there until the car goes through, then lower it to the bottom position. The machine’s possible states are waiting (for a car), raising, raised and lowering, determined by the current arm state and car inputs.
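The gate controller's logic fits in a couple of plain functions. This is my own reading of the state and output rules described above, not the book's code:

```python
def generate_output(state):
    """The gate's action is determined entirely by the controller's state."""
    return {'raising': 'raise', 'lowering': 'lower'}.get(state, 'nop')

def get_next_state(state, inp):
    gate_position, car_at_gate, car_just_exited = inp
    if state == 'waiting' and car_at_gate:
        return 'raising'
    if state == 'raising' and gate_position == 'top':
        return 'raised'
    if state == 'raised' and car_just_exited:
        return 'lowering'
    if state == 'lowering' and gate_position == 'bottom':
        return 'waiting'
    return state   # otherwise, keep doing what we were doing

def get_next_values(state, inp):
    next_state = get_next_state(state, inp)
    return next_state, generate_output(next_state)
```

Stepping a car through — arrive, arm rises to the top, car exits, arm lowers — produces the expected raise/lower/nop sequence.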
Combinations are covered in further detail, showing a cascade of two state machines that each take an input and delay it a step before outputting it; by cascading them, each input is delayed by two steps. When combining state machines it’s necessary to make sure that outputs from one can be processed as inputs by the other. Parallel composition takes two machines running side by side, taking the same input and returning a pair of outputs from the individual machines; the input vocabularies of the two machines should be the same. In a second type of parallel composition, two different inputs each go into one of the two parallel state machines. Examples of these parallel systems are shown in Python, along with a state machine that runs the parallel machines and returns a single output that’s the sum of the two. Another way of combining is a feedback combinator, where a machine’s output is fed back into the same machine as the input for the next step; for this to work, the machine’s output can’t have a direct dependence on its input. In code, the getNextValues methods should be prepared to handle undefined inputs and outputs, and if an undefined input leads to an undefined output, there’s an immediate dependence of the output on the input. (I think immediate dependence means causing a loop, but I’m not 100% sure.) Another example shows a feedback addition composition creating a feedback loop that outputs the sum of all its inputs, and a subtraction one where the output of the second machine is subtracted from the input on each loop. A couple of other examples shown are state machines for generating factorial and Fibonacci numbers.
A common situation for combining machines is coupling a controller and a plant or factory. The output of the plant machine (usually sensory observations) is the input to the controller, which outputs actions back to the plant. A robot could be a plant and its “Soar brain” (?) the controller, for instance. The machines can be cascaded to connect them, with feedback used so the process flows back and forth between the two state machines. For example, consider a robot driving toward a wall with a sensor getting the distance to the wall at time t, d[t], and the goal of stopping at ddesired. The rule to set velocity at time t is v[t] = k(ddesired - d[t-1]). As a state machine, inputs are distances and outputs are velocities; the next step given a state and input is n(s, i) = k(ddesired - i), and s0 is dinit. The plant is the relationship between the robot and the world. δT is the length of time between velocity commands issued by the robot, and the world is described with d[t] = d[t-1] - δT·v[t-1], assuming a positive velocity is toward the wall. This world system has velocities as inputs and distances as outputs. Coupling the systems gives a single state machine with no inputs, and its internal d and v values can be observed to see how it’s behaving. In Python, a WallController class gets its next state by calling a safeMul function on k (-1.5 here) and safeAdd(dDesired, safeMul(-1, inp)). The k value will cause it to adjust course when it gets too close to or far from the wall. A WallWorld class models the world with a startState of five (meters from the wall), and it gets its next values as return (state - deltaT*input, state). Combining them with sm.Feedback(sm.Cascade(m1, m2)) (both implemented in the chapter) causes the robot to move closer to the wall and then adjust until it’s at the desired state, since the system repeatedly generates its own inputs and outputs, feeding them back into itself.
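The coupled controller/world loop is easy to simulate directly without the course's sm library. This sketch uses k = -1.5 and a 5 m start as above, but δT = 0.1 and dDesired = 0.7 are values I picked for illustration:

```python
def simulate(steps, d_init=5.0, d_desired=0.7, k=-1.5, delta_t=0.1):
    """Simulate the coupled controller/world feedback loop.
    The sensor reports the previous step's distance (one-step delay)."""
    d = d_init
    v = 0.0
    for _ in range(steps):
        d_sensed = d                      # sensor: ds[t] = d[t-1]
        d = d - delta_t * v               # world: d[t] = d[t-1] - δT·v[t-1]
        v = k * (d_desired - d_sensed)    # controller: v[t] = k·(dDesired - ds[t])
    return d
```

Run long enough, the distance settles at dDesired: the negative k makes the commanded velocity positive (toward the wall) while the robot is too far away, and the correction shrinks geometrically as the error does.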
You can use switches to determine which of multiple machines to send an input to at each step; in this case, the output has to make sense even though the other machine wasn’t updated. A multiplex variation updates both machines on every step but chooses only one machine’s output. If a conditional determines which machine to run, a condition is also needed for when the choosing machine is in its start state. In some cases, state machines are sequences of processes, such as a robot cleaning a room then moving to the next. This requires a way of terminating the current process, or a done function that takes a state and returns true if a condition is met. Some combinators run one machine until it’s done and then switch to another. Another method added to the Python class repeats a state machine until it has finished n times. Another common use of terminating state machines is running them sequentially, taking in a list and running each until done, then starting the next. RepeatUntil and Until methods are added to the TSM (terminating state machine) and repeating state machines respectively to allow them to run until a condition is met.
Using a state machine to control the robot is possible as well. An io module provides methods for interacting with the robot, and util contains useful computations for it. Inputs to the robot controller are io.SensorInput instances and outputs are io.Action instances. An example robot brain always emits io.Action(), the default, which sets all outputs to zero. Once set up, we can add behavior for converting inputs to actions. The step method for the robot reads sensors by calling io.SensorInput() to get sonar/odometry readings, feeds the result to the brain state machine by calling its step function, and calls the execute method of the action passed back. This is a “Soar” brain and it has access to a robot object. A RotateTSM state machine is added that rotates the robot to a desired heading, and ForwardTSM is another state machine that moves it forward as needed. XYDriver is added, which moves forward when the robot is headed toward a target point and turns if not. After that, a SpyroGyra state machine creates subgoal points and moves toward them, incrementing length and changing direction to move in a spiral, cascading with XYDriver to achieve this. Each of the examples above has a lot more detail in the 50-page chapter, along with Python code and diagrams. It’s hard to analyze the behavior of a state machine without implementing it; restricting the form of models allows for stronger claims about their behavior.
Recitation Notes:
Functional programming is a paradigm where, analogous to everything being an object in OOP, everything is a function; this makes code easier to test, avoids side effects, is important to the history of computer science, and is why lambdas and list comprehensions come up. You can pass a function and *args to another function and use that function to call the passed-in function with the additional arguments, potentially adding behavior. Lambda has roots in lambda calculus and functional programming. You can use lambdas to write and use anonymous functions without assigning a name or memory space, or when you want a quick function without taking up too much code. There is a lot of list manipulation in Python, which is a common use of lambdas. Map, filter and reduce apply a function to a list and return a new list, and a lambda is a convenient way of passing the function. For instance, map(lambda x: x*2, demolist) creates a copy of demolist where each element is doubled. List comprehensions were popularized by Haskell but go back to the 60’s, and are similar to set notation in math. They can be used in place of map, filter and reduce as well. Aliasing occurs when two variables are bound to the same object and point to the same space in memory (id(g) == id(h)), which is a problem for mutable objects like lists, since changes through one name are visible through the other. c = a[:] is the easiest way to create a copy of a list, and id(c) is not id(a). Functional and imperative programming can be used to build object-oriented programming. In math, state machines are represented as a set of inputs, a set of outputs, a set of states, transitions and a start state. The turnstile example is shown again as a directed graph of states, followed by a Python example.
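The map/lambda and aliasing points are quick to demonstrate (Python 3, where map returns an iterator, so list() is needed to see the results):

```python
demolist = [1, 2, 3, 4]

# map with a lambda returns a doubled copy of the list
doubled = list(map(lambda x: x * 2, demolist))

# the equivalent list comprehension
doubled2 = [x * 2 for x in demolist]

# aliasing: two names bound to one list share identity;
# a slice makes a genuinely separate copy
a = [1, 2, 3]
b = a          # alias: same object in memory
c = a[:]       # copy: new object
b.append(4)    # mutation is visible through a, but not through c
```

After the append, a and b both see [1, 2, 3, 4] while the copy c is still [1, 2, 3].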
Unit 2 – Signals and Systems
Lecture 3 – Signals and Systems
This lecture starts thinking about modeling and controlling physical systems and characterizing the systems we construct, focusing on their behavior. The method used for the robot avoiding walls was a proportional controller, where its velocity was proportional to the current distance minus the desired distance. There’s a small delay to register distance and for the computer to compute and move, which creates the potential to overshoot the target and is why the robot has to adjust a few times to reach the desired state. The signals and systems abstraction describes a system by the way it transforms an input signal into an output signal. An example given is a mass-and-spring system modeled as an input-output system rather than a free body diagram: the input is the position of the hand, the output is the position of the mass, and the system is a box converting x to y. Signals and systems maps physical examples to input-output systems, where the system is the transforming rule. Other examples are water poured into a pool and a cell phone system that transforms input sound to output sound. This is a very general technique that can be applied to a wide variety of systems in a modular way: if the underlying representations are the same, we can combine multiple modules into one, similar to programming.
The spring and mass moving up and down is a continuous time example, where the robots work in steps in discrete time. One tool for analyzing discrete-time systems is difference equations, the discrete-time analog of the differential equations used for continuous problems. For example, y[n] = x[n] - x[n-1]. Difference equations are useful for step-by-step analysis: go through values of n and think about the implications for the output. For example, let x[n] be the unit sample δ[n], where δ[n] = 1 if n = 0 and 0 otherwise. With the difference equation y[n] = x[n] - x[n-1]: y[-1] = x[-1] - x[-2] = 0 - 0 = 0; y[0] = x[0] - x[-1] = 1 - 0 = 1; y[1] = x[1] - x[0] = -1; and any later values are zero. Difference equations are the most compact representation for discrete-time systems.
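Stepping through the unit sample response is quick to verify in code (unit_sample and response are my own helper names):

```python
def unit_sample(n):
    """The unit sample δ[n]: 1 at n = 0, else 0."""
    return 1 if n == 0 else 0

def response(x, n_max):
    """Evaluate y[n] = x[n] - x[n-1] for n = 0..n_max."""
    return [x(n) - x(n - 1) for n in range(n_max + 1)]
```

Driving the first-difference system with the unit sample gives 1 at n = 0, -1 at n = 1 and zero afterward, as computed by hand above.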
Block diagrams are pictures representing the same system as a signal flow path, showing the operations performed on each sample to get from input to output. The system needs to be started in a state, usually “at rest,” meaning the delay’s output x[n-1] starts at 0. The system is drawn as the input going down two paths that are added together at the end. Blocks characterize the system here instead of math. Comparing difference equations to block diagrams, it’s a lot simpler to state an equation than to draw a picture, but block diagrams are appealing because they show a picture of how to build the system, with explicit inputs and outputs; the arrows in the diagram show clearly what the input and output are. The difference equation is declarative, stating a true relationship the system satisfies; the block diagram is imperative, telling you what to do.
A combination of the benefits of the two can be had by no longer thinking about samples and starting to think about signals. This lumping is the same kind of abstraction done in Python, putting all needed information into signals. X is the whole input signal (all of the x[n] values) and Y is the whole output. R represents the right-shift operator, or delay box, taking a signal X and shifting it to the right: Y = RX. So Y = (1 - R)X expresses subtracting the delayed signal from the original. Operators operate on signals: you take the input X and turn it into Y by operating on it with R. If Y = RX, then y[n+1] = x[n] for all n. A right-shift operator shifts all points on a graph to the right. This gives an algebraic equation that keeps information about the structure of the system. Applying this to a cascading system, Y1 = (1 - R)X, Y2 = (1 - R)Y1, and so Y2 = (1 - R)(1 - R)X. Operator representation is as compact as difference equations and can be manipulated like polynomials. Using polynomials and the R operator, you can understand systems through operator expressions. When the operator representations are the same, systems represent the same transformation, provided they start at rest.
Operators are a higher-level abstraction for composing signals. For operator algebra, if X is the unit sample signal at point 0, then -2RX subtracts a right-shifted X multiplied by 2 and +R^2X adds a twice right-shifted X. You can make strong statements about operators that map onto polynomial math. It’s easy to show that R(1 - R)X = (1 - R)RX, and that a delay before the additions at the end of a block diagram equals a delay after them. Operators obey commutativity, just like polynomials.
Systems that only propagate inputs forward to outputs are feedforward. When cycles are involved, the system has feedback. With a feedback loop, the operator expression isn’t a simple sum of scaled, delayed input samples. For example, if the output signal is the input plus the right-shifted output, then Y = RY + X, so (1 - R)Y = X and the operator acts on the output. The output signal of a system like this persists after the input goes away. With the example of the accumulator system, two systems are equivalent if they produce identical outputs when starting at rest: (1 - R)Y1 = X1 <=> Y2 = (1 + R + R^2 + R^3 + …)X2. Assuming X2 is X1, then Y2 = Y1, so the feedforward and feedback systems are the same when they start at rest. (1 - R)Y = X gives the operator Y/X = 1/(1 - R), and 1/(1 - R) = 1 + R + R^2 + R^3 + …, so Y = (1/(1 - R))X. This gives a way of thinking about operators with numerators and denominators. Any system built out of simpler parts can be represented by a difference equation or a more powerful operator equation carrying more information.
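The feedforward/feedback equivalence of the accumulator can be checked numerically (my own sketch):

```python
def feedback_accumulator(xs):
    """Y = RY + X: each output is the previous output plus the input."""
    y, out = 0, []
    for x in xs:
        y = y + x
        out.append(y)
    return out

def feedforward_sum(xs):
    """Truncated (1 + R + R^2 + ...)X: the sum of all delayed copies of X
    that have arrived by step n."""
    return [sum(xs[:n + 1]) for n in range(len(xs))]
```

Both produce the running sums of the input, which is exactly what the operator identity 1/(1 - R) = 1 + R + R^2 + … predicts for systems starting at rest.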
Chapter 5 – Signals and Systems
A signal is defined as a mathematical function with an independent variable, such as time, and a dependent variable based on it. A system transforms an input signal into an output signal. In the case of a system steering a car to stay in the middle of a lane, the input signal is the time sequence of steering wheel angles (assuming a constant speed) and the output is the time sequence of distances from the center of the lane. Realistic systems typically have more than one input and output, though this chapter only uses examples with one of each. Possible outputs for the car system are 3D position, angular position, wheel rotation speeds, tire temperatures and more; choose the most relevant outputs and abstract away the rest. Using the lateral position po(t), the distance from the center of the lane, as the output, we can get a lot of information from how the line goes over and under the x-axis. If the steering wheel angle is φ(t) and the position is the output of the car system, the difference between the desired position pi(t), which is 0, and the actual position po(t) is e(t) = pi(t) - po(t). A steering controller would take in the error signal e(t) and output φ(t), the input for the car system.
A continuous movement is observed by a controller at sampling intervals T and is a sequence of values x[n]=x(nT), known as the sampling relation. This converts a signal from a continuous domain to a discrete one. Images are similarly represented as arrays of pixels with colors represented as integers instead of continuous fields and spatial coordinates. The input and output would be pi[n] and po[n] respectively in the car system for discrete times.
So far, state machines have been shown to transduce discrete-time input signals to discrete-time output signals for any system whose output is computable from its input history. Executing the machines shows behavior for a given input over a finite amount of time, but isn’t useful for finding general properties or long-term behavior of a system; it’s hard to predict a computer program’s behavior without running it. The examples focused on are discrete-time linear time-invariant (LTI) systems, which allow for better analysis. In LTI systems, inputs and outputs are real numbers, the state is a fixed number of previous inputs and outputs, and the output is a fixed linear function of the current input and the state elements. Inputs and outputs are in general fixed-length vectors of numbers, but this chapter focuses on inputs and outputs that are single real numbers. LTI systems can be analyzed mathematically to derive properties of the output for any possible input, which is better than having to try a machine out. They are compositional as well: the cascade, parallel and feedback combinations of LTI systems shown last chapter are LTI systems too.
Feedforward systems are brought up again: they are LTI systems that perform a combination of scaling, delay and addition operations on the input. Systems can be represented with Python state machines as well as difference equations, block diagrams and operator equations, each representing a relationship between input and output. Feedforward systems are described using an operator equation of the form Y = phi·X, where phi is a polynomial in R.
Block diagrams are made of components with arrows connecting them representing signals. Lines connected to each other (not passing through rectangular, triangular or circular components) represent the same signal. The components represent systems, and there are three primitive components corresponding to operations. Delay components are rectangles labeled delay with two lines connected to them, one coming in and one going out; if X is the incoming signal, the outgoing signal is RX. Scale (gain) components are triangles labeled with a positive or negative number; if X is the incoming signal, the outgoing signal is cX. Adders are drawn as circles labeled with + and have two incoming signals and one outgoing; if X1 and X2 come in, then X1 + X2 goes out. Y = X - RX can also be represented as a state machine in Python, and is shown as a combination of the gain and delay state machines from the course’s code.
You can combine operator equations to cascade systems. If M1 has equation Y=phi1X and M2 is Z=phi2W, then they can be cascaded by setting Y=W to get Z=(phi2*phi1)X which will also be a polynomial in R. For an example of a parallel combination, if M1 has equation Y=phi1X and M2 is Z=phi2X, then a parallel system would give output W as (phi1+phi2)X. You can also create systems using both combination methods.
Feedback systems are shown again, where outputs can depend on finite numbers of previous inputs and outputs. The accumulator is the example used, as its output is the sum of the input on the current step and the output from the previous step. In a block diagram, X goes into the adder, which passes the value both to the output and to the delay, which feeds it back into the adder. Compared to the feedforward system’s finite sums of scaled and delayed versions of the input, with feedback systems a transient input can produce an infinite output. LTI systems can generally be defined as difference equations of the form y[n] = c0·y[n-1] + c1·y[n-2] + … + c(k-1)·y[n-k] + d0·x[n] + d1·x[n-1] + … + dj·x[n-j], using k previous outputs and j previous inputs. This is a linear combination of values, and k and j appear in the Python class implementation of the calculation as well. In operator equation form, Y/X = (d0 + d1·R + …)/(1 - c0·R - c1·R^2 - c2·R^3 - …), which has the form Y/X = N(R)/D(R), where both are polynomials in R. This is the system function, characterizing the operation of a system independent of particular input and output signals, often written in the form Y/X = (b0 + b1·R + b2·R^2 + …)/(a0 + a1·R + a2·R^2 + …) where ci = -a(i+1)/a0 and di = bi/a0; it’s always possible to rewrite so that a0 is 1. Since feedforward systems don’t use previous Y values, D(R) = 1 for them. Feedback systems’ persistent behavior is determined by D(R).
Primitive systems for LTI systems include a gain element k, where Y = kX and Y/X = k, and a delay R, where Y/X = R. Additionally, for the basic composition operations on system functions: a sum adds two systems (Y = HX where H = H1 + H2) and a cascade is the product of two system functions (Y = HX where H = H2·H1). For feedback systems, there are multiple possible combinations. An example is a block diagram where X feeds into the adder, which goes to system H1, passing to the output Y and back through H2 and a gain component of -1 into the adder. If X is the desired value and Y is the actual value, the adder’s output is an error signal. The system equation is Y = HX where H = H1/(1 + H1·H2). Lots of machines can be modeled with these combinations, often with multiple inputs and outputs and a matrix of system functions between them.
Recitation:
A lot of rehashing of the lecture and reading, but there are some things I hadn’t noted earlier. We want to move away from state machines when trying to predict the future, as it’s hard to get information from state machines without running them. R^2 means shift right twice, and the same for other exponents. Block diagrams are good for abstraction as a system can be a black box, often denoted H. If Y/X = H, then given X, Y = HX. You can draw a system with multiple subsystems by labeling each one H1, H2, etc., each outputting a respective Yn, with Y being the combination.
Lecture 5 – Characterizing System Performance
Feedback is pervasive in natural and artificial systems and is used in almost everything we do. Driving, we constantly compare where we are to where we want to be. Thermostats regulate temperature using feedback, and the body’s system of regulating glucose uses feedback as well, keeping it in the needed range by removing appropriate amounts of glucose from the blood. Going back to the example of the robot moving to a desired distance from a wall, this was modeled as a feedback system with a controller, plant and sensor, each with its own difference equation. The system has di[n] as an input going to the adder, which passes e[n] to the controller, which outputs v[n] to the plant, which outputs do[n] and also passes it to the sensor, which passes ds[n] back to the adder through a gain of -1. The controller is the brain, which sets velocity with v[n] = k·e[n] = k(di[n] - ds[n]). The plant is the robot’s movement and takes v[n] to get a new position, do[n] = do[n-1] - T·v[n-1]. Lastly, the sensor reports the output and adds a delay, ds[n] = do[n-1]. This means there are 3 equations for every value of n. The knowns are the parameters T and k, the initial conditions and the inputs; the unknowns are the velocities for n greater than zero, the outputs and the sensor outputs, so there are infinitely many unknowns and equations. Trying instead to solve using operators, the equations are V = k(Di - Ds), Do = RDo - RTV and Ds = RDo. The knowns are k, T and Di, the unknowns are V, Do and Ds, and there are 3 equations in total. Thinking in operators makes the system much easier to analyze and generates new insights.
The relationship between an input and an output is represented by the operator H, where Y=HX or H=Y/X. H can be thought of as a ratio of polynomials in R. For the robot, H=Do/Di = -kTR/(1-R-kTR^2). As an example, solve a simpler system where x[n] goes to an adder which outputs y[n], with the output fed back through a delay box R and a gain p0. Finding y[n] given x[n]=delta[n] (the unit sample): y[n]=x[n]+p0*y[n-1], so y[0]=1, y[1]=p0, y[2]=p0^2, etc. Thinking about the whole system, Y=X+p0*R*Y, so H=Y/X=1/(1-p0R), which can be expanded to 1+p0R+p0^2R^2+p0^3R^3+…. In the drawing, 1 is the path going straight to the output, p0R is the path that loops around once, and an additional term is added for each extra loop. Block diagrams help visualize how the answer comes from all possible paths leading from the input to the output. Feedback implies that there are cyclic signal flow paths and that transient inputs generate persistent outputs, since output at time n is fed back to influence the output at time n+1. Feedforward means the system is acyclic and input flows straight to the output. A cyclic system has at least 1 cycle and therefore feedback, so a transient input can lead to outputs that go on forever.
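A minimal sketch of the simpler system above (the function name is my own): simulating y[n]=x[n]+p0*y[n-1] with a unit-sample input reproduces the geometric sequence 1, p0, p0^2, … that the series expansion 1/(1-p0R) = 1+p0R+p0^2R^2+… predicts.

```python
# Unit-sample response of the feedback system y[n] = x[n] + p0*y[n-1].
# Each output sample should be p0 raised to the power n.
def unit_sample_response(p0, n_samples):
    y = []
    prev = 0.0  # system starts at rest
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit sample (delta) input
        prev = x + p0 * prev
        y.append(prev)
    return y

print(unit_sample_response(0.5, 5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```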
If traversing the cycle decreases or increases the signal’s magnitude, the output will decay or grow respectively. Eventually the signal could get small enough that a seemingly infinite loop effectively stops. Systems can be characterized with a number called the pole, which is the base of the geometric sequence of growth or decay. A negative pole means the unit sample response alternates its sign each step. If a pole is bigger than 1, the magnitude diverges, growing monotonically. Between 0 and 1, the magnitude converges monotonically toward 0. Between -1 and 0, the magnitude converges but alternates sign. Below -1, it diverges while alternating sign. p0 is the pole’s symbol in the examples. With these we can describe all of a first order system’s types of behavior. More complicated systems can’t be represented by a single pole: with 2 delay boxes and different gains, the response is more complicated and not geometric, both growing and decaying. Capitalizing on the idea of treating operators as algebra, if expressions are treated as polynomials, we can factor them. The factored form shows a cascade of 2 simpler systems, and the complicated form can be thought of as a sum of simpler parts. The complicated growing-and-decaying response is just the difference of two geometric sequences.
In detail, the example has X input to an adder which outputs to Y or goes to a delay box R, which either passes through a gain of 1.6 back to the adder or through a second delay box and a gain of -0.63 back to the adder. This is described by Y=1.6RY-0.63R^2Y+X with system function H=1/(1-1.6R+0.63R^2), which factors into H=H1H2=(1/(1-0.7R))(1/(1-0.9R)); an equivalent cascade diagram would have two subsystems with poles/gains of 0.7 and 0.9. Expressing it instead as a sum of systems, find A and B where 1/((1-0.7R)(1-0.9R)) = A/(1-0.9R) + B/(1-0.7R), which are 4.5 and -3.5 respectively in this case. With both poles less than one in magnitude, the system’s output decreases over time. In general, the long term unit sample response is dominated by the mode with the larger magnitude pole.
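A quick sketch checking the partial-fraction claim above: simulating the second-order difference equation directly should match the sum-of-modes form 4.5*(0.9^n) - 3.5*(0.7^n).

```python
# Simulate y[n] = 1.6*y[n-1] - 0.63*y[n-2] + x[n] with a unit-sample input
# and compare against the partial-fraction decomposition of the same system.
def second_order_response(n_samples):
    y = [0.0, 0.0]  # system at rest before n = 0
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        yn = 1.6 * y[-1] - 0.63 * y[-2] + x
        y.append(yn)
        out.append(yn)
    return out

sim = second_order_response(10)
modes = [4.5 * 0.9**n - 3.5 * 0.7**n for n in range(10)]
print(all(abs(a - b) < 1e-9 for a, b in zip(sim, modes)))  # True
```

The direct recurrence and the difference of two geometric sequences agree sample for sample, which is exactly what the factored form promises.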
If a system is constructed out of adders, gains, and delays, then it can be represented by a linear difference equation with constant coefficients and has an operator representation that is a ratio of polynomials in R, Y/X. Poles can then be found by factoring the polynomials and representing the ratio as partial fractions, so the system can always be expressed as a sum of geometric sequences. The lecture shows how to calculate a first order system, then an example of a second order system being broken down into a sum of 2 first order systems, and how higher order systems can be expressed in the same way. Additionally, by the factor theorem and the fundamental theorem of algebra, a polynomial of order n has n roots. Replace each R with 1/z, and the poles are the roots of the denominator polynomial in z.
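The R→1/z substitution can be sketched numerically. For the second-order example H=1/(1-1.6R+0.63R^2), substituting R=1/z and clearing fractions leaves denominator z^2-1.6z+0.63, whose roots are the poles (this uses NumPy as an assumed convenience, not anything course-specific).

```python
# Poles of H = 1/(1 - 1.6 R + 0.63 R^2): substitute R = 1/z, multiply through
# by z^2, and take the roots of the resulting denominator z^2 - 1.6 z + 0.63.
import numpy as np

poles = np.roots([1, -1.6, 0.63])  # coefficients of z^2 - 1.6 z + 0.63
print(np.sort(poles))  # ≈ [0.7 0.9], the two mode bases from the factoring
```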
What if a system has complex poles, since roots of polynomials can be complex? For instance, for Y/X=1/(1-R+R^2), substituting R=1/z gives 1/(1 - 1/z + 1/z^2) = z^2/(z^2-z+1). The poles are z = 1/2 ± (sqrt(3)/2)j = e^(±jpi/3). Partial fractions still work when the poles are complex, so Y/X = 1/((1-e^(jpi/3)R)(1-e^(-jpi/3)R)) = (1/(j*sqrt(3))) * (e^(jpi/3)/(1-e^(jpi/3)R) - e^(-jpi/3)/(1-e^(-jpi/3)R)). All of the numbers are complex but the math still works. The modes are complex sequences in this case. These complex modes look somewhat like a sine wave (sinusoidal behavior) and are easier to visualize in the complex plane as points going around a circle.
Difference equations for physical systems have real-valued coefficients, so why would we need complex modes? If a polynomial has real coefficients and a complex root p, then the complex conjugate p* is also a root: if D is a polynomial in z with constant real-valued coefficients and p is a root of D(z), then D(p)=0 and D(p*)=(D(p))*=0, so p* must be a root as well. The two conjugate roots then combine so that their imaginary parts cancel, leaving a real sinusoidal signal. The period of the real signal is the same as the period of the two complex signals in the example from the previous paragraph.
Systems can always be thought of in terms of poles and complex geometric sequences called modes. Applied to the Fibonacci sequence, the difference equation model is y[n]=x[n]+y[n-1]+y[n-2], where x supplies the initial condition. If the system starts at rest, x=delta[n], the unit sample. What are the poles of the Fibonacci system? Converting to an operator equation, Y/X = 1/(1-R-R^2) = z^2/(z^2-z-1). The roots are (1 ± sqrt(5))/2, roughly 1.618 and -0.618. Poles work for any system of this type and are a powerful abstraction, similar to PCAP for Python. Recitations for the week recap finding poles and show an example similar to the second order system from the lecture.
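A sketch of the Fibonacci system as a difference equation (function name mine): the unit-sample response is the Fibonacci sequence, and the ratio of successive terms approaches the dominant pole, (1+sqrt(5))/2.

```python
# Fibonacci as a feedback system: y[n] = x[n] + y[n-1] + y[n-2], at rest,
# driven by a unit sample. The dominant pole is the golden ratio.
import math

def fib_system(n_samples):
    y = [0.0, 0.0]  # rest: y[-2] = y[-1] = 0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y.append(x + y[-1] + y[-2])
    return y[2:]

seq = fib_system(10)
print(seq)  # [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0]

# Successive ratios converge to the larger root of z^2 - z - 1.
phi = (1 + math.sqrt(5)) / 2
print(abs(seq[-1] / seq[-2] - phi) < 0.01)  # True
```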
Chapter 5 continued
The chapter goes into further detail on the examples from this lecture, showing the 2nd and higher order examples again. Of note, it explains expressing complex poles in polar coordinates a bit more: a+bj can be written as r*e^(j*omega), where a = r*cos(omega) and b = r*sin(omega). The magnitude is r = |a+bj| = sqrt(a^2+b^2), with omega = tan^-1(b/a) (taking the quadrant into account). If a mode’s behavior is 1/(1-pR) = 1+pR+p^2R^2+…+p^nR^n+…, then a complex pole p=r*e^(j*omega) has p^n=r^n*e^(j*omega*n), so 1/(1-r*e^(j*omega)R) = 1+r*e^(j*omega)R+r^2*e^(j*2*omega)R^2+…. As n goes to infinity, the radius r^n shrinks or grows depending on r, while the angle omega*n rotates. The period of the output signal is 2pi/omega, the number of samples needed to move through 2pi radians. This creates a spiral of points when the sequence is plotted in the complex plane. A couple of other consequences: if a pole is complex with magnitude less than 1, the signal’s magnitude decreases monotonically, and if greater than 1 it increases monotonically to infinity; if complex with angle omega, the signal will be periodic with period 2pi/omega. r and omega also provide information on the bounds the signal stays within over time; for instance, for y[n]=r^n(cos(n*omega) + alpha*sin(n*omega)), the factor (cos(n*omega) + alpha*sin(n*omega)) can’t be less than -sqrt(1+alpha^2) or more than sqrt(1+alpha^2).
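A short sketch of the polar-form bookkeeping, using the complex pole from the earlier 1/(1-R+R^2) example (the standard library's `cmath.polar` does the a+bj → (r, omega) conversion):

```python
# Polar form of the pole p = 1/2 + (sqrt(3)/2)j from the complex-pole example.
import cmath
import math

p = 0.5 + (math.sqrt(3) / 2) * 1j
r, omega = cmath.polar(p)            # magnitude and angle of the pole
print(round(r, 6))                   # 1.0  (|p| = 1: neither grows nor decays)
print(round(omega / math.pi, 6))     # 0.333333  (omega = pi/3)
print(round(2 * math.pi / omega, 6)) # 6.0  (period: 6 samples per revolution)
```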
Poles are not the roots of the polynomial in R but the roots of the polynomial in the reciprocal of R. It’s also advised against cancelling matching factors from the numerator and denominator of a system function, as the result wouldn’t likely match a real system’s behavior. The rest of the chapter goes into more examples of these types of problems for various systems, such as cooling, more of the robot problem, and more examples of polar coordinates and analysis.
Lecture 6 – Designing Control Systems
This is the last lecture on signals and systems. Difference equations, operators, block diagrams, and system functions were the four ways of representing these systems so far. Feedback produces cyclic signal flow paths where transient inputs give persistent outputs, which can be characterized as modes using poles. Given these tools, how do you optimize the design of a controller without building it to determine its behavior? Black’s equation has two common forms, H=F/(1-FG) and H=F/(1+FG). In this setup, X goes through an adder and through F to the output Y, with Y fed back through a function G to the adder, multiplied by -1 in the case of the second form. The second form is useful in control applications trying to have Y converge to X. For the wall-finder robot, Di feeds an adder, then a gain k, then a gain -T into an accumulator (an adder feeding a delay R which loops back to the adder), outputting Do or feeding back to the original adder through a gain of -1. The accumulator can be replaced with R/(1-R), in which case Do/Di = (-kTR/(1-R))/(1 - kTR/(1-R)) = -kTR/(1-(1+kT)R). For systems of only adders, gains, and delays, this will always simplify to a single ratio.
This representation can be simpler than the block diagram, treating the combined equation as a system of its own. The system function has a pole at z=1+kT, and the numerator is a gain and a delay. The diagram can be redrawn in other forms using this pole information, and thinking in operators can find a cleaner representation. A unit sample is the simplest possible nonzero input, but real behaviors are often more complicated, and we often want to know about the step response: the output do[n] starts at 0 while the input is held constant at 1. The unit step response s[n] is the response of H to the unit step signal u[n], which is itself the accumulation of the unit sample signal delta[n]. That is, pass delta[n] through an accumulator (an adder with feedback through R) to get u[n], then pass u[n] to H to get s[n]. If H were instead driven with a unit sample, you’d get the unit sample response h[n], which you could run through an accumulator to get s[n], since the unit step response is the accumulation of the unit sample response. There are different kinds of performance metrics that could be useful, and you can learn something about all of them from the unit sample response, making it a useful measurement tool.
You can infer properties of the control system from properties of the pole, such as whether the response is monotonic or converging. To find the optimum gain, consider the k that moves the robot to the desired position in 1 step, where di[n] (desired distance) is 1 m and do[n] (actual distance) starts at 2 m. If kT=-1, then k=-1/T=-10 and v[n]=k(di[n]-do[n])=10 m/s, the speed needed to get there in one step. Since sensors introduce delay, you don’t see this ideal behavior. Adding the sensor delay, ds[n]=do[n-1], the robot ends up in the wrong place and performance suffers; in general, adding delays makes a system harder to stabilize, and the robot has a number of delays. If kT is small, the poles are near z=-kT and z=1+kT, that is, near 0 and 1. A pole near 1 gives a mode that barely converges, so an error can persist for a long time: one pole is good and one is bad, but the pole near 1 dominates the response. If kT drops below -1/4, the poles become complex, leading to oscillations, which were present in the experiment.
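A minimal sketch of the delay-free case (variable names mine, values from the example above): with k=-10 and T=0.1, so kT=-1, the proportional controller v[n]=k(di-do[n]) moves the robot from 2 m to the desired 1 m in a single step.

```python
# Wall-finder loop with an instantaneous (delay-free) sensor:
# plant do[n] = do[n-1] - T*v[n-1], controller v[n] = k*(d_desired - do[n]).
def simulate(k, T, d_initial, d_desired, steps):
    do = d_initial
    v = k * (d_desired - do)
    history = [do]
    for _ in range(steps):
        do = do - T * v           # plant: move for one time step
        v = k * (d_desired - do)  # controller sees the true position
        history.append(do)
    return history

print(simulate(k=-10, T=0.1, d_initial=2.0, d_desired=1.0, steps=3))
# [2.0, 1.0, 1.0, 1.0]  — converged in one step, then stays put
```

With the sensor delay ds[n]=do[n-1] added, this same gain would overshoot instead, which is the degradation the lecture describes.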
With a single component, there were the four behavior modes. For a second order system, it could become oscillatory. For a system where the input goes through 3 delays and an adder, there are 3 poles in the system with the same magnitude. First order systems can’t be simply mapped to higher orders without adding concepts, such as periods or dominant poles.
Unit 3 – Circuits
Lecture 7 – Circuits
Circuits are a more primitive physical layer: how do you actually make a device? Specifically, a lab for the in-person course would be building a robot’s head to detect light. Circuits represent systems as connections of elements, where currents flow through the elements and voltages develop across them. A flashlight is a very simple circuit: when the switch is closed, current flows from the battery to the light. The battery is a source and the light is a resistor. The flow of water can be represented as a circuit as well. There are through variables and across variables: here the flow of water is the through variable and the pressure is the across variable, in place of current and voltage respectively. Circuits are important for physical systems directly, as in power grids or electronics, as well as for modeling complex systems like the brain. Circuits are the basis of the semiconductor industry, where some chips have roughly a billion transistors. Neurons can be modeled using circuits as well, explaining the relationship between a neuron’s parts and its behavior; thinking in terms of circuits has been the only successful way of explaining it.
A circuit’s primitives are sources, capacitors, and resistors. Resistors obey Ohm’s law, v=iR; voltage sources maintain a constant voltage (v=v0); and current sources maintain a constant current, i. How do they connect? A voltage source determines the voltage across a resistor, v=1 V, so the current through the resistor is i=v/R=1/1=1 A (amp). A current source determines the current through a resistor, i=1 A, so the voltage is v=iR=1*1=1 V. Many transistors behave as current sources. Each element imposes only one relationship, such as fixing its current. Complex circuits can be analyzed by systematically applying Kirchhoff’s voltage law (KVL) and current law (KCL). KVL states that the sum of voltages around a closed path is zero. For example, a path goes up through a voltage source v1=V0 (drawn as (+ -)) and down through elements with voltages v2 and v4 back to the source, meaning v1=v2+v4. The same is true for any other path in the circuit that goes through each element at most once. Signs come from reference directions: every voltage has a declared positive and negative terminal that must be stuck to. A planar circuit has independent inner loops and is drawable without crossing wires. One KVL equation can be written for every closed path.
For KCL, the flow of current is similar to the flow of an incompressible fluid. If one current flows into a node and two flow out, then i1=i2+i3. In electrical circuits, wires allow flow but not build-up of electrons (in reality electrons can build up, but not in the wires of the ideal model). The nodes in a circuit are where more than one element meets, and the sum of currents going in equals the sum of currents going out. One KCL equation can be written for each node, but they’re not all independent: the number of independent KCL equations is one less than the number of nodes. For any closed boundary in a circuit, the sum of the currents crossing it is zero. Since you can always isolate one node from the others, writing KCL for every node gives a linearly dependent set, allowing you to remove one.
Circuits can be analyzed by combining linearly independent KVL and KCL equations (all nodes except one) and one constitutive equation for each element. Using the node method, you can reduce the number of unknowns. It’s easier to write circuit equations in terms of a voltage for every node, from which you can get the voltage across every element. To start, label one node as ground to be the reference voltage; its exact choice doesn’t matter, as its voltage will be treated as zero. From the rest of the nodes’ voltages, you can find all of the elements’ voltages. Then write the KCL equation for each unknown node voltage. Any simplification for voltages has an analogous simplification for currents: rather than thinking about element currents, think about loop currents. Each loop is assigned a current variable, and each element current is a sum or difference of loop currents. Write one KVL equation for each loop.
Fundamentally, we have element equations and combination rules. The three methods so far for analyzing a circuit are: one that analyzes currents and voltages for each element, one that works with node voltages, and one that works with loop currents. Each requires all constitutive equations plus the relevant KVL or KCL equations. With node voltages, element voltages are differences of node voltages; with loop currents, element currents are sums of loop currents. Circuits can also be abstracted in a way similar to system functions (H) from the previous unit. A series combination of two resistors is equivalent to a single resistor with the resistances added, and a parallel combination is equivalent to a single resistor whose conductance (1/resistance) is the sum of the original conductances. The series combination is larger than either original and the parallel combination is smaller. If the same current flows through two resistors in series, the current is I=V/(R1+R2), so V1 = R1*I = V*R1/(R1+R2) and V2 = R2*I = V*R2/(R1+R2): a proportional voltage drop across both resistors, with the bigger resistor taking the bigger drop. Resistors in parallel act as current dividers, with more of the current going through the smaller resistor.
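A quick sketch of those combination rules with made-up component values (the 6 V / 1 kΩ / 2 kΩ numbers are illustrative, not from the course):

```python
# Voltage divider and series/parallel resistor combinations.
def divider(V, R1, R2):
    I = V / (R1 + R2)          # same current through both series resistors
    return R1 * I, R2 * I      # voltage drops, proportional to resistance

def series(R1, R2):
    return R1 + R2             # resistances add in series

def parallel(R1, R2):
    return 1 / (1 / R1 + 1 / R2)  # conductances add in parallel

print(divider(6.0, 1000.0, 2000.0))  # (2.0, 4.0): bigger resistor, bigger drop
print(series(100.0, 100.0))          # 200.0, larger than either original
print(parallel(100.0, 100.0))        # 50.0, smaller than either original
```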
Chapter 6 – Circuits
A circuit is a path or route that starts at and returns to the same place; in electronics, it’s a closed path where currents can flow in a loop. Abstraction in circuits is different than in software or LTI systems, since connecting circuits together changes their voltages. A circuit does enforce a constraint between the voltages and currents entering and exiting it which holds no matter what’s connected. Circuits are made up of elements connected by nodes, and nodes can be thought of as the wires connecting components. Voltage is the difference in electrical potential between 2 points in a circuit; a ground point is chosen to have voltage 0, with other points defined with respect to ground. Any point can be chosen for ground and the results will be the same. Current is the flow of electrical charge through a path, with a positive current corresponding to negative charges moving in the opposite direction.
Starting off, the course looks at components with two terminals (connection points). Each component has rules for its relationship between voltage and current, and in this course, components will exert a linear constraint. Circuits can be modeled in terms of dynamics, how currents and voltages change over time, with difference equations, but the dynamic properties converge quickly and the converged state can be modeled instead.
Conservation laws are properties a circuit must hold; KCL and KVL are examples. Fluids are compressible or incompressible: air can be compressed, its volume decreased by applying pressure, while water is pretty much incompressible, and the laws are easier for incompressible fluids. The net flow of incompressible fluid into a region of fixed volume must be zero; an equal flow must go out. Similar to a heart’s pressure helping blood circulate or pressure moving water through pipes, voltage propels electrical currents. A voltage is associated with every node in a circuit, and it is the difference between the voltages at two nodes that causes currents to flow between them.
A circuit’s behavior isn’t determined by KCL and KVL alone. An example is a two-element circuit with element currents i1 and i2 and voltages v1 and v2, all unknown. Every element imposes a relation between the voltage and current interacting with it, known as a constitutive relation. Sources have the simplest relation: a voltage source’s voltage is a constant, v0, regardless of current, and it is drawn on a diagram as a circle with plus and minus symbols and a number for the voltage. A current source is an element with a constant current, i=I0, regardless of voltage, drawn as a circle with an arrow for its direction. Voltages are measured in volts and currents in amperes, or amps. A resistor is an element whose voltage is proportional to its current by a constant R, so vR=R*iR. Resistors are drawn as a zigzagging line, and resistance has dimensions of ohms, which are volts divided by amps.
One strategy for solving a circuit is to write down all KVL and KCL equations, add the constitutive equations for each element, and solve for the voltage and current at each component. Some of the loops will be redundant, however, and this is time consuming. Node voltages can eliminate the redundancy, using voltages associated with nodes instead of elements, since element voltages can still be determined from them. The NVCC or node-voltage and component-current method uses variables for the voltages of all nodes and the currents through all components; it is general but long. To use it, label all nodes n1 to nn and place voltages v1 to vn at each node. Make one of them, ng, the ground node with value 0. Add currents i1 to im for each component and label a direction for each current. Write n-1 KCL equations, one for each node except ground, stating that the sums of currents are 0. Write m constitutive equations, one for each component, describing the linear relationship between its current ik and the voltage difference across it. The voltage across the component is vk+ - vk-, the voltages of the positive and negative terminals, where the current runs from positive to negative. Solve these to determine the node voltages and component currents.
The node method is the streamlined solution, with the goal of writing down a single equation per node. When a voltage source connects two nodes, it constrains their voltages so one can be determined from the other. An n-node circuit requires n-1 KCL equations. If no voltage sources exist, then each KCL equation is a sum of element currents that can be determined from node voltages using each element’s constitutive equation, leaving n-1 node voltage variables. If there are voltage sources, there are a couple of wrinkles: each voltage source connects two nodes whose voltages can be represented by one variable, but the current through the source cannot be determined from its constitutive equation, since vs=V0 provides no information about the current. A couple more examples are shown of calculating circuits, along with the examples of parallel and series resistors from the lecture.
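A minimal sketch of the node method on a hypothetical circuit (values mine): a 10 V source fixes node 1, R1 = 1 kΩ runs to node 2, and R2 = 2 kΩ runs from node 2 to ground, leaving a single KCL equation for the one unknown node voltage.

```python
# Node method: the only unknown is v2.
# KCL at node 2: (v1 - v2)/R1 = v2/R2, rearranged into A*v2 = b.
import numpy as np

V0, R1, R2 = 10.0, 1000.0, 2000.0   # hypothetical values
A = np.array([[1 / R1 + 1 / R2]])   # coefficient of v2
b = np.array([V0 / R1])             # contribution of the known node voltage
v2 = np.linalg.solve(A, b)[0]
print(round(v2, 4))  # 6.6667 — matches the voltage-divider formula V0*R2/(R1+R2)
```

With more unknown nodes, the same pattern just grows A and b to one row per node, which is exactly why this method automates well.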
Pieces of circuits can’t be abstracted as input and output elements in the usual way, since everything connected can change the currents and voltages. A method for abstraction with circuits is finding a simpler circuit with the same constraints as the original, treating a complex circuit as a simpler one. One way of combining elements is Thevenin’s theorem: any combination of linear elements with two terminals can be replaced with a single voltage source Vth and a single series resistor Rth, where Vth is the open-circuit voltage at the terminals, Voc, and Rth is Vth/(-isc). Voc is the voltage across the terminals if nothing were connected to them. If the terminals were instead connected by a wire, the current through it would be isc, the short-circuit current. Using these combinations, more complex circuits can be viewed as nearly equivalent simpler ones: the constraints at the terminals are the same, though some internal properties differ in the actual circuit. Another strategy mentioned is finding an equivalent circuit consisting of a single resistor in parallel with a current source, called a Norton equivalent circuit.
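A sketch of computing a Thevenin equivalent from the two measurements named above, using the same hypothetical divider (10 V source, R1 = 1 kΩ, R2 = 2 kΩ, terminals across R2); here I take Rth as the magnitude Voc/isc, setting aside the reference-direction sign in Vth/(-isc).

```python
# Thevenin equivalent from open-circuit voltage and short-circuit current.
def thevenin(v_oc, i_sc):
    return v_oc, v_oc / i_sc   # (Vth, Rth)

V0, R1, R2 = 10.0, 1000.0, 2000.0   # hypothetical circuit values
v_oc = V0 * R2 / (R1 + R2)          # divider voltage with nothing attached
i_sc = V0 / R1                      # shorting the terminals bypasses R2
vth, rth = thevenin(v_oc, i_sc)
print(round(vth, 4), round(rth, 4))  # 6.6667 666.6667  (Rth = R1 parallel R2)
```

Any load attached to the terminals now sees this one-source, one-resistor circuit instead of the original network.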
In addition to resistors and voltage and current sources, another component is the operational amplifier or op-amp, which is useful for decoupling aspects of complex circuits so they can be designed independently and then connected. Op-amps also amplify small voltage differences into larger ones and can alter voltages. They typically have four terminals, two inputs and two outputs, with the output difference controlled by the difference between the inputs. Most of the examples from the chapter are the same as covered in the course, though this chapter is probably the best place to return to if I need a refresher on circuits or need to practice the calculations.
Lecture 8 – Op-Amps
How do you make modular circuits? Op-amps are one solution to this problem. Before, we tried to find the smallest number of elements required to get equivalent KVL and KCL equations using rules for combining elements. When writing a program to solve circuits, the easiest system to automate isn’t the same as the easiest for people to understand: something similar to NVCC is easier to automate than the node or loop simplifications from the last lecture. In a circuit, the presence of every element affects every other element; if you change one thing, you change everything. Closing a switch is equivalent to adding a component, for example, letting current take a different path, where a light bulb turning on would act as a resistor.
Interactions between elements can be reduced or eliminated using an op-amp as a buffer. An op-amp circuit can produce an output equal to its input without loading the other half of the circuit. Op-amps have multiple connections, unlike the two shown for every other element. A new type of element is the dependent source, which generates a voltage or current whose value depends on another voltage or current: for example, a current-controlled current source (drawn as a diamond instead of a circle) generates a current controlled by a current elsewhere, which can be in a separate loop. Such an element is called a two-port, and the loop with the dependent source needs the equation for the current it depends on, taking twice as many equations. One-ports (such as resistors, voltage sources, and current sources) can be characterized with a single equation; a two-port is different from two one-ports because the coupling between the ports couldn’t otherwise be modeled.
An op-amp is a voltage-controlled voltage source, drawn as a triangle; it is a two-port with plus and minus input terminals and an output. The total current flowing into the plus and minus leads is zero. The output voltage is determined by a gain K, which when large lets the op-amp act as an amplifier: Vo=K(V+ - V-). As K goes to infinity, the difference between V+ and V- goes to zero. This is called the “ideal op-amp” rule and makes solving problems easier, since the op-amp’s effect is to drive its inputs to the same value; treating V+ and V- as equal simplifies equations. The ideal op-amp assumption does have a problem, since it implies that a circuit where the input comes in the minus terminal and the feedback wraps to the plus is the same as if plus and minus were flipped. What’s going on is that we’ve made a mistake in thinking of the ideal op-amp as just a voltage-controlled voltage source: it’s a good simple model, but bad at describing how things actually work. In actuality, the op-amp moves charge around. Charge accumulates in a capacitor, and the way a flux of water generates height is similar to how a flux of charge generates voltage. The op-amp senses the voltages at its inputs and output and pumps charge in or out accordingly. This model looks nothing like what is physically inside an op-amp, but it explains the asymmetry: if the input goes into the wrong lead, the output moves the wrong way, convincing the op-amp things are worse and driving more current the wrong way. It needs to be hooked up with negative feedback to get a stable equilibrium. Additionally, the output voltage of an op-amp is constrained by its power supply, VEE <= Vo <= VCC, where EE and CC are the power pins. The ideal op-amp approximation only works with negative feedback.
Lecture 9 – Circuit Abstractions
Op-amps are a great solution for modularity in a lot of cases, but they’re complicated parts and can be expensive, with around 200 transistors in a typical op-amp. In the examples, the op-amp is a more complex device than the rest of the circuit, so this lecture looks at other ways of achieving modularity in circuit design. Plotting current against voltage for the left part of a circuit gives a straight line, and that line is a property of the left part alone: it applies regardless of what circuit is connected on the right. This linear relationship between voltage and current lets us think about the circuit on the left separately. The current-voltage relation summarizes all possible behaviors of a circuit regardless of what it’s connected to, so a whole circuit can be treated as a one-port, a single element, more complicated but similar to a resistor or source. Like the primitive parts, this abstraction has a rule relating voltage and current. How often will the relationship be linear? If the i-v curves of two one-ports are both straight lines, then the i-v curve of their parallel combination is a straight line as well, since the sum of two straight lines is a straight line; two linear curves in series also give a linear curve. Any combination of parts with linear i-v curves creates a linear i-v curve, so solving the system only requires linear equations, and this holds as long as you use “ideal” parts. Since more than one circuit can generate the same straight line, this can reduce a system to an easier one with the same plot: figure out the line, read off the intercept and slope, and use those to construct a simple circuit, the “Thevenin equivalent.” This gives an abstraction even without buffers.
Regardless of how complicated an arbitrary circuit is, if it is composed of only linear parts, you can fully characterize it with two points, since it can be represented as a straight line: set the voltage to 0 and set the current to 0 to get the two points. How big is the open-circuit voltage (zero current, since nothing connects the terminals), and how big is the short-circuit current (where the voltage is zero)? Thevenin equivalents depend on which terminals are used as well. Two reasons for thinking about equivalent circuits are to have a useful abstraction, and to see what would happen if you change a load on a circuit without recalculating everything. When you buy a physical part, they tell you how it works via its equivalent; an op-amp’s behavior is specified with a Thevenin equivalent since the maker doesn’t know exactly how the buyer will use it. Thinking of the switch-closing example in terms of Thevenin equivalents is easier, since you can replace the part without the switch and the part with the switch with equivalent circuits, and the part values don’t need to be found to see that closing the switch increases the current. Circuit designers don’t typically solve circuit equations, as they know how it will work.
Thevenin and Norton equivalents come from linear algebra. Lastly, if a system has multiple sources, each source contributes a constant term to the equations, and the response to the sources acting together is the sum of the responses to each acting alone. Superposition means solving the circuit with one source on at a time and summing the results: to turn off a current source, replace it with an open circuit and compute i; to set a voltage source to zero, replace it with a short circuit, getting a current divider. The result with both sources on is the sum of the two solutions. This lecture was all about applying linear algebra to circuits and ends with a message to take 18.06.
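A sketch of superposition on a hypothetical two-source circuit (all values mine): a voltage source Vs feeds through R1 into a node, a current source Is injects into the same node, and R2 runs from the node to ground.

```python
# Superposition: solve with one source at a time, then sum.
Vs, Is, R1, R2 = 10.0, 0.005, 1000.0, 2000.0  # hypothetical values

# Full solution from KCL at the node: (Vs - v)/R1 + Is = v/R2
v_full = (Vs / R1 + Is) / (1 / R1 + 1 / R2)

# Current source off -> open circuit: just a voltage divider.
v_from_Vs = Vs * R2 / (R1 + R2)
# Voltage source off -> short circuit: Is flows through R1 parallel R2.
v_from_Is = Is * 1 / (1 / R1 + 1 / R2)

print(abs(v_full - (v_from_Vs + v_from_Is)) < 1e-9)  # True
```

The one-source-at-a-time answers sum to the full solution, which is the linearity the lecture is leaning on.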
Unit 4 – Probability and Planning
Lecture 10 – Discrete Probability and State Estimation
How do you deal with uncertainty and things that are harder to plan? How does a robot make a map of an unknown maze, use it to figure out where it is, and plan a route to a goal? If the robot knows where it is and where it wants to go, making a plan is easy: head in a straight line until that fails due to walls. Probability theory is a framework for reasoning about uncertainty, making precise statements about uncertain situations and drawing reliable inferences from unreliable observations. Using this theory, we want to design robust systems that won’t be thrown off by a new factor.
For a game, there are four legos, each white or red, in a bag. If you pull a red brick at random, you get 20 bucks. How much are you willing to pay to play the game? What if you knew they only put white blocks in the bag? How the blocks are chosen changes things. What if you know there's exactly 1 red one in the bag?
Probabilities are assigned to events, which are possible outcomes of an experiment, such as all possible outcomes of 3 coin flips. Atomic events are mutually exclusive: only one can happen. The set of all possible atomic events is collectively exhaustive, covering all cases, and is called the sample space. For 3 coin flips, there are 8 atomic events (combinations of heads and tails). The three axioms of probability theory are that probabilities are real numbers that aren't negative, the probability of the sample space is 1, and if the intersection of A and B is empty, the probability of the union is the sum of the probabilities of the individual events. These rules are called non-negativity, normalization, and additivity. In general, the probability of the union of A and B is Pr(A)+Pr(B)-Pr(A∩B), the probabilities added minus the intersection. To find a probability, enumerate the sample space, assign probabilities, and sum over the target events.
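That enumerate-assign-sum recipe is short enough to run directly. A sketch for the 3-coin-flip sample space (my own example event, not one from the lecture):

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space of 3 fair coin flips: 8 atomic events,
# each assigned probability 1/8.
sample_space = list(product('HT', repeat=3))
pr = {outcome: Fraction(1, 8) for outcome in sample_space}

assert len(sample_space) == 8
assert sum(pr.values()) == 1          # normalization axiom

# Target event: "at least two heads" -> sum the atomic events it contains.
event = [o for o in sample_space if o.count('H') >= 2]
assert sum(pr[o] for o in event) == Fraction(1, 2)
```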
Bayes' rule specifies the probability that event A happened given that B happened: Pr(A|B)=Pr(A∩B)/Pr(B), the probability of the intersection over the probability that B occurred. In a Venn diagram of the two events, what's left of the sample space is inside the B circle. For instance, conditioning on a die roll being greater than 3 shrinks the sample space to {4,5,6}, so the probability that the roll is odd becomes ⅓. If we know B happened, the sample space shrinks, and depending on the overlap, A can become more or less likely.
A random variable is a probabilistic analog of a deterministic variable: it represents a distribution instead of a single number. For instance, if X is the result of a die roll, then there is an event for X equaling each possible value. After formalizing this, it's easy to talk about multidimensional random variables, making it easier to find situations that factor. If V is the first die roll and W is the second, then Pr(V,W) is the joint probability distribution and Pr(v,w) is Pr(V=v and W=w). This is a convenient notation for writing more complicated things concisely. Dimensionality in a joint probability distribution can be reduced by marginalizing, where you collapse dimensions by summing over all possible outcomes along those dimensions. For instance, the two rolls give a six-by-six sample space where each pair has probability 1/36; if you don't care about one of the rolls, summing over it gives 6/36 = ⅙ for each value of the other. Conditioning is collapsing by accounting for new information that restricts outcomes, using Bayes' rule. Another example considers the effectiveness of an AIDS test, dividing the population by whether subjects have AIDS and whether their test is positive or negative, giving four pieces. The probability that a test is positive given that a subject has AIDS is Pr(Test=positive and AIDS=true)/Pr(AIDS=true).
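Marginalizing and conditioning are both a few lines of Python over a dictionary-based joint distribution. A sketch using the two-dice example (function names are mine):

```python
from fractions import Fraction

# Joint distribution over two fair die rolls (V, W): each pair is 1/36.
joint = {(v, w): Fraction(1, 36) for v in range(1, 7) for w in range(1, 7)}

def marginalize(joint, dim):
    """Collapse dimension `dim` of a 2-D joint distribution by summing it out."""
    result = {}
    for (v, w), p in joint.items():
        key = w if dim == 0 else v
        result[key] = result.get(key, Fraction(0)) + p
    return result

def condition(dist, event):
    """Restrict a distribution to outcomes where event(x) holds, renormalizing."""
    total = sum(p for x, p in dist.items() if event(x))
    return {x: p / total for x, p in dist.items() if event(x)}

pr_w = marginalize(joint, 0)                 # don't care about the first roll
assert pr_w[3] == Fraction(1, 6)             # 6/36 for each value of W

pr_high = condition(pr_w, lambda w: w > 3)   # shrink the sample space to {4,5,6}
assert pr_high[5] == Fraction(1, 3)
```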
Probabilities can also be represented in Python with dictionaries to allow for variable names. In this case, the course builds a DDist class associating the name of an atomic event, represented with any Python data structure, with a probability. So you can associate strings or tuples with probabilities for bookkeeping. The prob method will look up the probability or return zero, to easily handle missing values. So toss = dist.DDist({'head': 0.5, 'tail': 0.5}), then toss.prob('head') gives 0.5 and the same for tail, while any other string gives 0. Tuples can be used to store probabilities of joint distributions. The professor assures us this is a reasonable notation.
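Since the course's dist module is an MIT-specific package, here is a minimal stand-in with the same behavior as described (a sketch, not the actual lib601 code):

```python
class DDist:
    """Minimal stand-in for the course's dist.DDist: a discrete distribution
    over hashable atomic events (strings, tuples, ...)."""
    def __init__(self, dictionary):
        self.d = dictionary

    def prob(self, elt):
        # Missing keys are simply events with probability zero.
        return self.d.get(elt, 0)

    def support(self):
        # Elements with nonzero probability.
        return [k for k, v in self.d.items() if v > 0]

toss = DDist({'head': 0.5, 'tail': 0.5})
assert toss.prob('head') == 0.5
assert toss.prob('edge') == 0        # any other key gives 0
```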
Applying probability to the wall-detecting robot, where you are based on velocity and where you are based on noisy sensors are two factors; how do you combine them? This leads to Hidden Markov Models, for a system with a state that changes over time. Time advances in discrete steps starting from 0, with random variables for the state at each time S0, S1,… and random variables for the observations noted Ot. The state at time t determines the probability distribution over the observation at time t and the state at t+1. The initial state distribution is Pr(S0=s), the state transition model is Pr(St+1=s|St=r), and the observation model is Pr(Ot=o|St=s). Given a sequence of observations, we want Pr(St+1=s|O0=o0,…,Ot=ot). Based on the velocity and sonars, figure out where you start; based on the velocity and where you think you are, figure out where your next location will be. A transition model describes where you start and finish in one step.
Going back to the lego bag problem: how much should you pay? How likely are the 5 possible bag compositions? Assuming each is ⅕ likely, in the 0-red case you expect to make $0 and in the 4-red case $20; with 1 red you expect $5, with 2 reds $10, and with 3 reds $15. Divide each by 5 and add them up to get an expected value of $10.
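The arithmetic is a one-liner:

```python
# Five equally likely bag compositions (0..4 red bricks out of 4),
# and a $20 payout for drawing red at random.
expected = sum((reds / 4) * 20 * (1 / 5) for reds in range(5))
assert expected == 10.0  # so $10 is the break-even price to play
```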
Chapter 7 – Probabilistic State Estimation
Using state machines to model systems and their change over time lets us predict behavior. A state space described by more than one variable, or by a combination of continuous and discrete variables, is a factored state space, and this chapter focuses on representing factored state spaces and manipulating their probability distributions. Probability theory lets us assign numerical assessments to possible events and do calculations with them. Probability statements are typically read as long-term frequencies, such as a coin coming up heads half the time; this is the "frequentist interpretation". They can also be interpreted as degrees of belief in a statement, the Bayesian interpretation. The calculus is the same either way. Much of the first half of the chapter covers the same content as Lecture 10. The Python dictionary example is further expanded with removeElt and incrDictEntry helpers to remove and increment values at elements, and functions for condition and marginalize are added as well for running those processes.
A PCAP system for constructing distributions is specified next, showing primitive standard distributions and a mixture distribution for combining them. A delta distribution has all the probability mass on a single element, so that element has probability 1. A uniform distribution assigns 1/n probability to each of n elements. A square distribution is uniform over a range lo to hi, where each element's probability is 1/(hi-lo). A triangle distribution takes a triangle shape on a graph, with peak and halfWidth parameters and values decreasing out to each half width. To combine distributions into a mixture distribution, specify distributions d1 and d2 and a mixing parameter p between 0 and 1 that determines the probability of using each distribution: Prmix(x) = p Pr(D1=x) + (1-p) Pr(D2=x).
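The mixture formula is easy to check in plain Python. A sketch using dicts instead of the course's distribution classes:

```python
def mix(d1, d2, p):
    """Mixture of two discrete distributions (dicts mapping element ->
    probability): with probability p the value is drawn from d1,
    otherwise from d2."""
    elements = set(d1) | set(d2)
    return {x: p * d1.get(x, 0) + (1 - p) * d2.get(x, 0) for x in elements}

delta = {1: 1.0}                                  # all mass on one element
uniform = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}    # 1/n on each of n elements
m = mix(delta, uniform, 0.5)
assert m[1] == 0.5 * 1.0 + 0.5 * 0.25             # 0.625
assert abs(sum(m.values()) - 1.0) < 1e-9          # still a valid distribution
```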
Going back to state machines: what can we infer about the current state from a history of noisy observations? For instance, a copy machine can be modeled as having a good or bad state, and we can only see the result of a copy, so we'll have to ask it to make one. To build a model of such a system, use a probabilistic generalization of state machines called stochastic state machines, where time is a discrete sequence of steps and there is a state, input and observation at each step. Create a probability distribution for the state at t+1 given the history of inputs and observations up to t. Using this model for the copying machine with a 90% chance of starting good, the initial state distribution is Pr(S0)={good: 0.9, bad: 0.1}; the state transition model Pr(St+1|St) is {good: 0.7, bad: 0.3} if St is good and {good: 0.1, bad: 0.9} if St is bad; and the observation model Pr(Ot|St) is {perfect: 0.8, smudged: 0.1, black: 0.1} if St is good and {perfect: 0.1, smudged: 0.7, black: 0.2} if St is bad. These models are coded in Python as functions returning the proper probabilities based on the old state, and a new class StochasticSM is created with SM as its parent to make use of them, instantiating a copying machine.
State estimation means taking a sequence of inputs and observations and determining the sequence of hidden states of the system as best as possible. It won't be exact, but it can produce useful probability distributions. The problem of filtering is to watch the inputs and observations go by and produce a state estimate Pr(St+1 | O0,…,Ot, I0,…,It). So for the first printing in a copy machine, if we get a perfect page, what's the probability Pr(S0 | O0=perfect)? The joint distribution has the form good: {perfect: 0.72, smudged: 0.09, black: 0.09} and bad: {perfect: 0.01, smudged: 0.07, black: 0.02}, so based on the observation, {0.72, 0.01}/0.73 gives Pr(S0 | O0=perfect) = {good: 0.986, bad: 0.014}. We can save this and use it in future calculations as B'0. What's the probability one step forward, Pr(S1 | O0=perfect), based on what we observed at step zero? Form the joint probability Pr(S0, S1 | O0=perfect) from B'0 and the transition model Pr(S1 | S0): S0 and S1 both good is 0.690, both bad is 0.012, S0 good and S1 bad is 0.296, and vice versa 0.001. Summing over S0, Pr(S1 | O0=perfect) = {good: 0.692, bad: 0.308}. Printing another page whose observation O1 is smudged, create a joint distribution from B1 and Pr(O1 | S1), condition on O1=smudged to get B'1, then construct a joint distribution from B'1 and Pr(S2 | S1) to get B2. This leads to only a 0.246 chance that the machine is good and a 0.754 chance it's bad.
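The two halves of a filtering step are small enough to write out directly. This sketch restates the copy-machine model as plain functions (the names `observe`, `transition_update`, `obs`, and `trans` are mine, not the course's), and reproduces the numbers above:

```python
def observe(belief, obs_model, o):
    """Observation update: condition the belief on seeing observation o."""
    unnorm = {s: p * obs_model(s)[o] for s, p in belief.items()}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

def transition_update(belief, trans_model):
    """Transition update: push the belief one time step forward."""
    new = {}
    for r, p in belief.items():
        for s, q in trans_model(r).items():
            new[s] = new.get(s, 0) + p * q
    return new

initial = {'good': 0.9, 'bad': 0.1}
obs = lambda s: {'good': {'perfect': 0.8, 'smudged': 0.1, 'black': 0.1},
                 'bad':  {'perfect': 0.1, 'smudged': 0.7, 'black': 0.2}}[s]
trans = lambda s: {'good': {'good': 0.7, 'bad': 0.3},
                   'bad':  {'good': 0.1, 'bad': 0.9}}[s]

b0_prime = observe(initial, obs, 'perfect')   # ~{good: 0.986, bad: 0.014}
b1 = transition_update(b0_prime, trans)       # ~{good: 0.692, bad: 0.308}
assert abs(b1['good'] - 0.692) < 0.001
```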
Generally, the state update procedure involves states, observations and an input at each time, an initial distribution, an observation distribution (since the system is time-invariant, the same for all t), and a transition distribution. Assuming we have a belief state at time t, the observation update given ot is Pr(St | O0..t=o0..t, I0..t-1=i0..t-1) = Pr(Ot=ot | St) Pr(St | O0..t-1=o0..t-1, I0..t-1=i0..t-1) / Pr(Ot=ot | O0..t-1=o0..t-1, I0..t-1=i0..t-1), and the transition update given it is Pr(St+1 | O0..t=o0..t, I0..t=i0..t) = the sum over r of Pr(St+1 | St=r, It=it) Pr(St=r | O0..t=o0..t, I0..t-1=i0..t-1). These definitions build a recursive state estimator: if the belief at time t is Bt = Pr(St | O0..t-1=o0..t-1, I0..t-1=i0..t-1), then the belief after the observation update is B't = Pr(St | O0..t=o0..t, I0..t-1=i0..t-1). The variables can be updated to reach the next step, or run in a loop.
In Python, the state estimation process can be run as a state machine whose state is the belief B, a probability distribution over the states of the system. Each input is an (o, i) pair specifying the observation from the machine and the input to it for that step, and the output is a probability distribution, as a dist.DDist, over the states of the system. Using the same probabilities, we can make a state estimator with the model shown earlier that works with the copy data.
Lecture 12 – Search Algorithms
Bayes' rule was about updating beliefs based on new information. In order to figure out the best move in a situation, search through all the possible options and choose the best. A puzzle is shown again where the numbers 1 through 8 are placed on a 3-by-3 grid with one empty space, and the goal is to get them in order by moving adjacent pieces into the empty space. There are 9 factorial, or 362,880, possible arrangements, so random guessing won't be a good option. What are different ways of figuring out how to solve this problem? As a simpler problem, if a 3-by-3 grid has the letters A through I, what's the minimum distance between 2 points? Organize all possible paths with a tree: going from A to I, the first level would be A right to B or down to D; then draw out the second layer and so on. Since the tree could grow without bound, a better algorithm constructs the tree and searches for an answer at the same time. Represent possible locations as states 'A' through 'I'. Transitions are represented by a "successor" function which takes the current state and an action and returns the next state. An initial state is required, as well as a goal, checked by a goalTest function. In Python, the tree is represented as SearchNode objects which know the action that got to that node, the current state and the parent node. A method on the node reports the path that led to it.
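A SearchNode along the lines described might look like this (a sketch; the inPath helper anticipates the pruning rules discussed below and isn't required yet):

```python
class SearchNode:
    """A node in the search tree: the action that got here, the state
    reached, and a pointer back to the parent node."""
    def __init__(self, action, state, parent):
        self.action = action
        self.state = state
        self.parent = parent

    def path(self):
        # Walk back to the root, reporting (action, state) pairs in order.
        if self.parent is None:
            return [(self.action, self.state)]
        return self.parent.path() + [(self.action, self.state)]

    def inPath(self, state):
        # Has this state already been visited on the path to this node?
        if state == self.state:
            return True
        return self.parent is not None and self.parent.inPath(state)

root = SearchNode(None, 'A', None)
child = SearchNode('right', 'B', root)
assert child.path() == [(None, 'A'), ('right', 'B')]
```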
With the objects and functions set up, initialize the agenda, the list of nodes being considered, to contain the starting node. Then run a loop of removing one node and adding its children to the agenda until the goal is found or the agenda is empty, returning the resulting path. This both constructs the search tree and checks whether a newly constructed child node is the answer.
The search procedure is
def search(initialState, goalTest, actions, successor):
    if goalTest(initialState):
        return [(None, initialState)]
    agenda = [SearchNode(None, initialState, None)]
    while not empty(agenda):
        parent = getElement(agenda)
        for a in actions:
            newS = successor(parent.state, a)
            newN = SearchNode(a, newS, parent)
            if goalTest(newS):
                return newN.path()
            else:
                add(newN, agenda)
    return None
This is a depth-first search algorithm: it walks along the left edge of the tree, going as deep as it can before checking other nodes at the same depth (a depth-first search could equally go right to left). If, after pulling a node out, its children are put at the beginning of the agenda, the search will be depth-first. If placed at the end of the agenda, it will go through the tree layer by layer and be a breadth-first search. The general tools are the same, but the order determines how the search is conducted.
A depth-first search can be implemented with a stack, which is a LIFO data structure, and breadth-first with a queue, which is FIFO. A stack is similar to a stack of dishes: the last item put on is the first taken off. Stack.push() adds an item and pop() takes one off. A queue is similar to a line: push adds to the end of the line and pop takes from its front. To customize the search function, the agenda can be implemented as a stack to make the search depth-first or a queue to make it breadth-first. With Stack and Queue classes implemented, agenda=Stack() or agenda=Queue() are the only changes needed. Stacks and queues typically also implement an empty check.
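Minimal list-backed versions are enough to make the point (a sketch; the method names `isEmpty` etc. are assumptions, not necessarily the course's):

```python
class Stack:
    """LIFO agenda: push and pop from the same end -> depth-first search."""
    def __init__(self):
        self.data = []
    def push(self, item):
        self.data.append(item)
    def pop(self):
        return self.data.pop()
    def isEmpty(self):
        return len(self.data) == 0

class Queue(Stack):
    """FIFO agenda: pop from the front instead -> breadth-first search."""
    def pop(self):
        return self.data.pop(0)

agenda = Queue()
agenda.push(1)
agenda.push(2)
assert agenda.pop() == 1   # FIFO: first in, first out
```

Swapping which class the agenda is instantiated with is the only change needed to flip between DFS and BFS.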
In the case of going from A to I, it's extra work to revisit earlier steps, so a good amount of the tree can be ignored: if an action would return to a previous state, stop looking down that path. This can delete half of the tree in this example and is implemented by checking whether a state is already in the path before appending to the agenda, checking the current node's state as well as its parents', recursively. This is a pruning rule, which simplifies searching the tree. A second pruning rule is that if multiple actions lead to the same state in the same number of steps, you only need to look at one of them. This is implemented by keeping a list of child states and not appending a new state that's already in that list.
Depth-first search won't work for every problem, as it can get stuck running forever in an infinite loop, and even when it finds a path, it doesn't always find the shortest one. In comparison, breadth-first search always returns the shortest path, but it requires more space and can also run forever if no answer exists. For A to I it looked at 16 nodes even though there are only 9 states, which doesn't sound good.
The dynamic programming principle says the shortest path from X to Z that goes through Y is made of the shortest path from X to Y followed by the shortest path from Y to Z. This means we only need the shortest path from the start state to each other state, and the first path that BFS finds to a state is the best path to get there. Keeping track of all states visited prunes the search: create a visited dictionary, put each state in it as it's added, then ask whether a child is already in the visited list and don't continue on that path if so. If there's a depth-3 way to get from A to D and a depth-2 way, you don't need to consider the depth-3 way. BFS with dynamic programming will never visit more than the number of states and just requires maintaining two lists instead of one.
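A sketch of BFS with the visited dictionary, storing paths as plain lists of states for brevity (the function and helper names here are mine, and the tiny grid of letters is a made-up stand-in for the A-to-I example):

```python
def bfs_dp(initialState, goalTest, actions, successor):
    """Breadth-first search with dynamic programming: remember every state
    ever added to the agenda and never add it a second time, since the
    first path BFS finds to a state is a shortest one."""
    if goalTest(initialState):
        return [initialState]
    agenda = [[initialState]]          # paths, stored as lists of states
    visited = {initialState: True}
    while agenda:
        path = agenda.pop(0)           # FIFO -> breadth-first
        for a in actions:
            newS = successor(path[-1], a)
            if newS not in visited:
                visited[newS] = True
                if goalTest(newS):
                    return path + [newS]
                agenda.append(path + [newS])
    return None

# Tiny letter-grid in the spirit of the A-to-I problem: each state lists
# its neighbors, and an action is an index into that neighbor string.
moves = {'A': 'BD', 'B': 'AC', 'C': 'B', 'D': 'A'}
succ = lambda s, a: moves[s][a] if a < len(moves[s]) else s
assert bfs_dp('A', lambda s: s == 'C', [0, 1], succ) == ['A', 'B', 'C']
```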
In summary, this lecture covers depth-first and breadth-first search along with the pruning rules: don't consider any path that visits the same state twice, and only consider one action among several leading to the same state in the same number of steps. Dynamic programming acts as a third pruning rule that only considers the first path found to a given state. The recitation associated with this week talks about using search over state transitions, searching over a legal action list and its associated states, showing another example of DFS and BFS and going through the same pruning rules.
Lecture 13 – Optimizing a search
This lecture generalizes the search framework, leading to uniform-cost search and focusing on using ordering to improve search efficiency. So far, the cost of each step in a search problem has been the same. Sometimes different actions have different costs, and it may be better to take a path through more nodes if its edges have lower total weight. BFS with dynamic programming was the best algorithm from last time, assuming each hop between nodes had a weight of 1; what if the edges between nodes were labeled with a number of miles instead? Associate costs with actions, then enumerate paths in order of path cost. Modify the queue to be a priority queue, where each push has a value and an associated cost, and pop returns the element pushed with the smallest associated cost. For a priority queue class, an example of a pop method is
def pop(self):
    (index, cost) = util.argmaxIndex(self.data, lambda (c, x): -c)
    return self.data.pop(index)[1]
This isn't an efficient implementation, just one way to do it. The cost of each action is now tracked in the node instantiation, with each node keeping the total path cost so far, and the agenda is now a priority queue. Another necessary change is to the goal test, which used to be able to return the first matching child it found, since in BFS all paths of the same length had the same weight. Now the goal test must be deferred until a node is popped off the agenda, since a later child might have a smaller cost; to find the shortest path, all of a node's children have to be expanded, and a path must wait until it has been compared against all of its siblings. This means keeping track of states after they've been expanded instead of after they've been visited, using an "expanded" list instead of "visited." On each pass, the children with the shortest paths are looked at first, and even if the correct path ends up being longer at some step, the queue always extracts the item with the minimum cost. To illustrate this, an example gives the paths from B and F to C a weight of 5 and all others 2, and shows the queue prioritizing other paths. Depending on how exactly you set up the search, you can do a lot less or a lot more work, and if you can do less work, you can tackle harder problems.
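A heap-backed priority queue (a sketch using Python's standard heapq module, rather than the linear-scan pop shown in the course code) does the same job more efficiently:

```python
import heapq

class PriorityQueue:
    """Agenda for uniform-cost search: pop always returns the element that
    was pushed with the smallest cost."""
    def __init__(self):
        self.heap = []
        self.count = 0           # tie-breaker so items themselves never compare
    def push(self, item, cost):
        heapq.heappush(self.heap, (cost, self.count, item))
        self.count += 1
    def pop(self):
        cost, _, item = heapq.heappop(self.heap)
        return item
    def isEmpty(self):
        return len(self.heap) == 0

pq = PriorityQueue()
pq.push('long way', 5)
pq.push('short way', 2)
assert pq.pop() == 'short way'   # smallest cost comes out first
```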
A search problem has been represented as a set of states forming the nodes of a tree. So far, every search has started at a starting state and widened out from there, but that won't always be sensible. For instance, if you're trying to go from one place to another on a map, you shouldn't care about a state just because it was the first node added to the agenda: going from Massachusetts to New York, you don't care about Wyoming.
All examples so far chose the next step based only on the cost of the path from the starting point so far. The idea of a heuristic is to add an estimate of the remaining distance from the current state to the goal. The problem is that finding that second part of the path exactly is as hard as the original problem, and making the search better informed by making it more complicated isn't always a good trade-off. As an approximation, you can use the Manhattan distance, the x distance plus the y distance on a grid, to estimate the number of remaining steps. This isn't exact for an actual map, but it's a decent approximation on a grid. Where you would have used cost alone before, add the heuristic function. Be careful not to make the next step look too expensive, or it won't be looked at: too big a heuristic overestimates and can miss the best path, while too small a heuristic underestimates the penalty of heading in the wrong direction. Ideally the heuristic equals the actual remaining cost or is slightly less. The grid search is much faster with a good heuristic, and different heuristics can give different solutions of the same length. With a heuristic you can search a much smaller number of states to get the same quality answer, showing the importance of ordering. An A* search combines heuristics and costs and is hinted at in the recitation and lecture. The recitation shows another example of dynamic programming and of using heuristics.
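The Manhattan-distance heuristic is one line (a sketch assuming states are (x, y) grid coordinates):

```python
def manhattan(state, goal):
    """Manhattan distance on a grid: |dx| + |dy|. With one-cell moves this
    never overestimates the true number of remaining steps, so it is an
    admissible heuristic for grid search."""
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)

assert manhattan((0, 0), (3, 4)) == 7
```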
Chapter 8 – Long-Term Decision Making and Search
The chapter gives detailed examples with code and diagrams for DFS and BFS along with applying dynamic programming, pruning and heuristics. Additionally, search is connected to state machines by showing that a state machine poses the search problem of finding the inputs that will take it from its current state to a finished state. Adding a legalInputs attribute gives a list of options to search through at each step to find the right combination, the startState makes an obvious starting point, and the finished state makes a goal. getNextValues can serve as the successor function for the search problem. A search function using a state machine is shown, which calls the existing search function passing in values from the state machine.
A problem shown here involves a numeric state machine which starts at an integer and can apply a variety of operations to the number, with the goal of reaching ten. Depth-first search is terrible here: if it passes 10, it can go on forever without ever reaching it. Breadth-first search is much better and even quicker with dynamic programming, becoming relatively more efficient the larger the target number. With depth-first search, adding bounds can help, though it will still take longer than breadth-first.
b is the branching factor (the number of successors a node can have), d is the tree's maximum depth, l is the solution depth (the shortest path from start to goal) and n is the state space size (the number of states). With these variables, DFS can search the entire tree, which has d levels each with b times as many nodes as the previous (b^d on level d), so it may visit roughly b^(d+1) states while requiring only about b*d space. Breadth-first search only needs to visit about b^(l+1) nodes, but the storage required can be b^l. This can still be quite a lot, but dynamic programming reduces the algorithm to visiting at most n states, which can be much smaller, and is generally useful when there are potentially multiple paths to the same state.
Uniform-cost search and priority queues are explained in detail again with helpful diagrams, and the chapter wraps up with heuristics. A heuristic function takes a state as an argument and returns a numeric estimate of the total cost it will take to reach the goal from there, which allows biasing the search toward states closer to the goal. Modifying uniform-cost search to use a heuristic function gives the A* algorithm, whose main difference is that when a node is inserted into the priority queue its cost is newN.cost + heuristic(newS), the sum of the actual cost of the path from the start state to the current state and the estimated cost from the current state to the goal. This halves the number of nodes visited and expanded compared to UCS in the given example, but effectiveness depends on the chosen heuristic. If we knew the exact answer we wouldn't have to search at all, which is unrealistic, but the goal is an estimate as close to the true shortest-path cost as possible while remaining efficient to compute. A heuristic function is admissible if it's guaranteed to be an underestimate of the actual cost of the optimal path to the goal, which guarantees finding the shortest path in the state space. Otherwise, for instance, if the goal can be reached in 10 steps but the heuristic reports 100, the search can commit to a different, longer path instead.
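To make the "UCS plus heuristic" relationship concrete, here's a compact A* sketch (not the book's code; `successors(s)` is assumed to return (nextState, stepCost) pairs, and the 1-D walk at the bottom is a made-up example):

```python
import heapq

def a_star(start, goalTest, successors, heuristic):
    """A*: uniform-cost search where a node's priority in the queue is
    path-cost-so-far + heuristic(state)."""
    agenda = [(heuristic(start), 0, start, [start])]
    expanded = set()
    while agenda:
        _, cost, state, path = heapq.heappop(agenda)
        if goalTest(state):
            return path, cost
        if state in expanded:
            continue
        expanded.add(state)
        for nextS, step in successors(state):
            if nextS not in expanded:
                heapq.heappush(agenda,
                               (cost + step + heuristic(nextS),  # the A* twist
                                cost + step, nextS, path + [nextS]))
    return None

# Walk along the integers from 0 to 3, each step costing 1; the heuristic
# is the (admissible) remaining distance.
succ = lambda s: [(s - 1, 1), (s + 1, 1)]
path, cost = a_star(0, lambda s: s == 3, succ, lambda s: abs(3 - s))
assert cost == 3
```

With the heuristic set to zero everywhere, this degenerates back to plain uniform-cost search, which is exactly the relationship the chapter describes.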
Conclusion
Overall, this course covers a variety of topics related to math, programming and engineering. While there is some hands-on learning with the robot in the course itself, the OpenCourseWare version still provides a great amount of information through the lectures, readings and recitations. Still, for the topics I am familiar with, programming, probability and searching algorithms, there wasn't much new here. In comparison, for circuits and robotics, what I saw was interesting and left me feeling like I need to learn more to understand those fields better. All in all, this makes sense, as the course is meant to be an introduction to various topics in a much wider field. Next up is Introduction to Algorithms, a course that I'd started and abandoned a while back, and going forward most courses will be directly related to computer science.