# CompSci 367 notes

#### Logic

• Entail - KB |= a if and only if a is true in all worlds where KB is true
• Infer - KB |-_i a means sentence a can be derived from KB by algorithm i
• i is sound if whenever i infers something from the KB, that sentence is also entailed by the KB
• i is complete if whenever something is entailed by the KB, i infers it
• Conjunctive normal form (CNF): a conjunction of disjunctions of literals, e.g. (A or B) and (B or C)
• Negation in Prolog (negation as failure): when something cannot be proved true, it is treated as false
• Complete - always finds a solution if one exists
• Optimal - finds the best solution
• b - branching factor
• d - depth of least cost solution
• m - maximum depth of state space
• f(n) is the desirability of node n
• g(n) is the cost from initialState to n
• h(n) is the estimate of the distance from n to goal
• Admissible if the estimate is less than or equal to the actual cost (i.e. never overestimates)
• Consistent if it obeys the triangle inequality (i.e. h(n) <= cost(n, n') + h(n'))
• a heuristic is likely to be better when its average h-value (r) is higher
• h1 dominates another heuristic h2 when h1(n) >= h2(n) for ALL n
• uninformed tree search expands roughly b^d nodes, informed roughly b^(d-r)
• heuristic can be formed from a relaxed problem (reduce the restrictions)
| Algorithm | Complete? | Time | Space | Optimal? | f(n) | Notes |
|---|---|---|---|---|---|---|
| BFS | Y (if b finite) | Exponential | Exponential | Y (if step costs = 1) | | Uses a tonne of space (graph search) |
| DFS | N (loops / infinite m) | Exponential | Linear (only keeps the current path in memory) | N | | |
| ID-DFS | Y | Exponential | Linear (discards nodes after each iteration) | Y (if step costs = 1) | | Combines DFS + BFS advantages |
| Greedy | N | Exponential | Exponential | N | h(n) | Doesn't care about g(n) |
| A* | Y | Exponential | Exponential | Y (if h is admissible + consistent) | g(n) + h(n) | |
| IDA* | Y | Exponential | Linear | Y | g(n) + h(n) | Iterates on the f-limit, starting with h(init) |
| Weighted A* | Y | Exponential | Exponential | No worse than w times as costly as optimal | g(n) + w·h(n) | |
| Bidirectional A* (front-to-back) | Y | Exponential | Exponentially smaller (b^(d/2)) | Y | g(n) + h(n) | h(n) estimates the distance to the opposite terminal (init or goal); keeps searching for the optimum |
| Bidirectional A* (front-to-front) | Y | Exponential | Exponentially smaller (b^(d/2)) | Y | g(n) + h(n) | h(n) estimates the distance to the opposite frontier; optimal on first collision! Cost of computing h grows exponentially |
| Minimax | Y | Exponential | Linear | Y (against an optimal opponent) | minimax value | |
| Minimax + α-β pruning | Y | Exponentially smaller (b^(m/2)) | Linear | Y (against an optimal opponent) | minimax value | Doubles the depth searchable in the same time |
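The A* entry above can be sketched in a few lines of Python. The toy graph and heuristic values below are made up for illustration (h is admissible and consistent here):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Expand nodes in order of f(n) = g(n) + h(n); returns (path, cost)."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}                          # cheapest g found so far per node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for succ, step_cost in neighbors(node):
            g2 = g + step_cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

# hypothetical toy graph and admissible heuristic values
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h_vals = {"S": 4, "A": 4, "B": 1, "G": 0}
path, cost = a_star("S", "G", graph.get, h_vals.get)
```

Here the cheaper route S→B→G (cost 5) beats the greedy-looking S→A→G (cost 6), which is exactly what the admissible heuristic guarantees.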
• Monte Carlo Tree Search plays out the rest of a game randomly, thousands of times, and picks the move with the highest win rate (random playouts can sample anywhere in the problem space)
• Hill climbing algorithm
• always moves toward a maxima - can get stuck on a local maxima, missing the global maximum
• add random-restarts when you get stuck - trivially complete (depends on the shape of the state-space landscape)
• Simulated annealing search - allows some ‘bad’ moves to escape local maxima, but gradually decreases their size and frequency
• Local beam search
• better than random restarts as information is shared – but can get concentrated in a small region
• Stochastic beam search - choose k successors randomly, biased towards good ones (natural selection)
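The hill-climbing and random-restart bullets above can be sketched as follows; the 1-D landscape (a local maximum at x=2, the global one at x=8) is a made-up example:

```python
def hill_climb(start, value, neighbors):
    """Greedy ascent: move to the best neighbour until none improves."""
    current = start
    while True:
        best = max(neighbors(current), key=value)
        if value(best) <= value(current):
            return current                 # a local (possibly not global) maximum
        current = best

def random_restart_hill_climb(starts, value, neighbors):
    """Climb from several start points and keep the best local maximum found."""
    return max((hill_climb(s, value, neighbors) for s in starts), key=value)

# hypothetical landscape: local maximum at x=2 (value 4), global at x=8 (value 9)
def value(x):
    return 9 - (x - 8) ** 2 if x > 5 else 4 - (x - 2) ** 2

neighbors = lambda x: [x - 1, x + 1]
stuck = hill_climb(0, value, neighbors)                      # -> 2 (stuck on local max)
best = random_restart_hill_climb([0, 10], value, neighbors)  # -> 8 (global max)
```

Climbing from x=0 alone gets stuck at the local maximum; adding a second restart point reaches the global one, which is the whole point of random restarts.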

#### Genetic algorithms

• requires: reproduction, population, variety, difference in ability to survive
• reproduce with a probability proportional to their fitness
• overcrowding can happen if one individual is too dominant
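These ingredients (population, fitness-proportional reproduction, variety via crossover and mutation) can be sketched on the OneMax toy problem, where fitness is simply the number of 1 bits; all parameters below are arbitrary:

```python
import random

def evolve(pop_size=20, length=12, generations=60, p_mut=0.02):
    """Tiny genetic algorithm on OneMax (fitness = number of 1 bits):
    fitness-proportional selection, single-point crossover, bit-flip mutation."""
    fitness = lambda ind: sum(ind)
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in pop]   # +1 avoids zero selection weight
        new_pop = []
        for _ in range(pop_size):
            # reproduce with probability proportional to fitness (roulette wheel)
            mum, dad = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, length)          # single-point crossover
            child = mum[:cut] + dad[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

random.seed(1)
best = evolve()
```

A very fit individual quickly dominates selection here, which is exactly the overcrowding risk the note above mentions.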

#### Constraint satisfaction problems (CSP)

• state is defined by variables Xi with values from domain Di
• goal test is a set of constraints specifying allowable combinations of values for subsets of variables
• Unary constraints involve a single variable (X != 5)
• Binary constraints involve variable pairs (X != Y)
• Higher-order 3 or more variables
• Incremental search - assign values to variables that do not conflict with current assignment
• Backtracking search (DFS)
• choose the most constrained variable i.e. with the fewest legal values (minimum remaining values (MRV) heuristic)
• Forward checking - keep track of legal values for all variables, abort when a variable has none remaining
• Arc consistency - also propagates the effect that forward checking has on neighbours (detects failure earlier)
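Backtracking with the MRV heuristic and forward checking can be sketched on the classic map-colouring CSP (the Australia instance below is the standard textbook example; the binary constraint is that adjacent regions differ in colour):

```python
def backtrack(assignment, domains, neighbors):
    """Backtracking search with the MRV heuristic and forward checking."""
    if len(assignment) == len(domains):
        return assignment
    # MRV: choose the unassigned variable with the fewest legal values
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        # forward checking: remove this value from unassigned neighbours' domains
        pruned = [n for n in neighbors[var]
                  if n not in assignment and value in domains[n]]
        for n in pruned:
            domains[n].discard(value)
        # abort this value if some neighbour has no legal values remaining
        if all(domains[n] for n in neighbors[var] if n not in assignment):
            result = backtrack({**assignment, var: value}, domains, neighbors)
            if result is not None:
                return result
        for n in pruned:                      # undo pruning on backtrack
            domains[n].add(value)
    return None

# map-colouring CSP: adjacent regions of Australia must get different colours
neighbors = {"WA": {"NT", "SA"}, "NT": {"WA", "SA", "Q"},
             "SA": {"WA", "NT", "Q", "NSW", "V"}, "Q": {"NT", "SA", "NSW"},
             "NSW": {"Q", "SA", "V"}, "V": {"SA", "NSW"}, "T": set()}
domains = {v: {"red", "green", "blue"} for v in neighbors}
solution = backtrack({}, domains, neighbors)
```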

## Week 6 - 12

#### Decision making & choice

• Given: some goals/objectives; candidates relevant to these aims; descriptions for each
• Find: one or more candidates to select/carry out
• Decision theory (prescriptive)
• Given: K alternatives; N attribute values; N weights (importance)
• Compute: the utility value of each alternative, U_J = sum over attributes of (weight × value)
• Heuristic choice (descriptive)
• humans do not make optimal choices but instead satisfice (i.e., use heuristics to select an acceptable alternative based on an aspiration level with minimal effort)
• humans retrieve chunks of relevant memory (by pattern matching) to produce associated responses very rapidly
• AI defines two forms of heuristics:
• Symbolic heuristics – rules or relational patterns (as in human cognition, chunks – used in heuristic choice theory)
• Numeric heuristics – arithmetic combinations of attributes (similar to utility functions – used to find optimal choices and in search)
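The prescriptive decision-theory computation above is just a weighted sum; the car-choice attributes, weights, and scores below are a hypothetical example:

```python
def utility(weights, values):
    """Weighted additive utility: U_J = sum_i (weight_i * value_i)."""
    return sum(w * v for w, v in zip(weights, values))

# hypothetical choice among K=3 cars over N=3 attributes (price, safety, comfort),
# values already scored 0-10, weights expressing importance and summing to 1
weights = [0.5, 0.3, 0.2]
alternatives = {"hatch": [7, 6, 5], "suv": [5, 9, 8], "coupe": [6, 5, 9]}
best = max(alternatives, key=lambda a: utility(weights, alternatives[a]))
```

The optimal choice here is the alternative with the highest U_J (the suv, at 0.5·5 + 0.3·9 + 0.2·8 = 6.8); a satisficing human might instead stop at the first car that clears an aspiration level.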

#### Reactive control for routine behaviour

• sequential decision making by invoking the basic process repeatedly –> reactive control task
• Given: description of environment; related goals [+ beliefs, or _mental states_]
• Select: an appropriate action to execute
• routine control is conditional, which gives it great flexibility. this can be done by direct mappings from perception to action – known as stimulus-response (does not take advantage of mental states)
• in reality, humans draw inferences about current situation, have memories of past decisions, knowledge of order of activities, breakdown activities to subactivities, take goals into account – extensions can attempt to address these issues, but the stimulus-response paradigm has inherent limitations

#### HTNs

• hierarchical structures: +each component is simple, +modular, +reusable, +dynamically composable
• comprise a set of methods, each with: a task predicate; conditions under which the method applies; subtasks that make up the method; [effects the method produces]
• imposes a hierarchical, sequential, conditional structure on activities
• this produces an AND/OR tree which is traversed over time – this is reactive, but continues along the current path when possible
• executing HTNs given some task T, on each cycle:
1. infer a set of beliefs from stimuli (using conceptual rules)
2. find (through the hierarchy) a path of applicable methods
3. execute
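The cycle above can be sketched as a recursive walk of the AND/OR tree; the 'get to work' methods below are a hypothetical toy domain:

```python
def execute(task, methods, beliefs, act):
    """Traverse the AND/OR tree: try each method for a task in turn (OR choice),
    and expand an applicable method's subtasks in order (AND).
    Tasks with no methods are treated as primitive actions."""
    if task not in methods:
        act(task)                                # primitive: execute directly
        return True
    for conditions, subtasks in methods[task]:   # OR: alternative methods
        if conditions <= beliefs:                # method applies in this situation
            return all(execute(t, methods, beliefs, act) for t in subtasks)  # AND
    return False

# hypothetical domain: two alternative methods for the same task predicate
methods = {"get_to_work": [({"raining"}, ["drive"]),
                           (set(), ["walk_to_stop", "ride_bus"])]}
trace = []
execute("get_to_work", methods, beliefs={"raining"}, act=trace.append)  # trace: ["drive"]
```

With the belief `raining` the first method applies and only `drive` is executed; with empty beliefs the fallback method expands into its two subtasks in sequence.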

#### Production systems

• can model human cognition – support mental activities; combine parallel retrieval with sequential decisions; balance stimulus-driven and goal-driven behaviour; modular representation supports structural learning
• made up of 2 main components:
• Production memory containing generalized rules that specify domain knowledge as conditional responses – condition-action form
• Working memory that contains specific elements
• internal stimuli – encoding the system's beliefs and goals
• changes as the program runs
• runs recognize-act cycles
1. match each rule's conditions with working memory
2. select one (or more) of the matched rules to execute
• conflict resolution can be: match more recent elements, match more specific sets, more conditions, rules added earlier
3. execute
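A minimal sketch of the recognize-act cycle, using specificity (more conditions wins) as the conflict-resolution strategy; the two rules are hypothetical:

```python
def recognize_act(rules, wm, max_cycles=10):
    """Recognize-act cycle: match rule conditions against working memory,
    pick one matched rule by conflict resolution, fire it, repeat."""
    for _ in range(max_cycles):
        # match: a rule applies if its conditions hold and its action adds something new
        matched = [r for r in rules if r["if"] <= wm and not r["then"] <= wm]
        if not matched:
            break
        # conflict resolution: prefer the more specific rule (more conditions)
        rule = max(matched, key=lambda r: len(r["if"]))
        wm = wm | rule["then"]          # act: add new elements to working memory
    return wm

# hypothetical condition-action rules over working-memory elements
rules = [{"if": {"hungry"}, "then": {"find_food"}},
         {"if": {"hungry", "has_money"}, "then": {"buy_food"}}]
wm = recognize_act(rules, {"hungry", "has_money"})
```

The more specific rule (`buy_food`) fires first under this resolution strategy; excluding rules whose action is already in memory stops the same rule firing forever.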

#### Causal models + Qualitative reasoning

• Given: entities & attributes; relations/rules that relate them; external influences
• Find: the resulting effects on attributes of interest
• this makes a mental model – analogous to a physical model
• definition: X causally influences Y if a change in X‘s value results in a change in Y‘s value (provided everything else constant)
• this does not infer causality
• e.g. `oil_production --gas_price --traffic ++pollution ++lung_disease`
• use a qualitative but dynamic causal model to generate Qualitative Envisionments
• include: initial qualitative state, set of possible qualitative successor states, set of transitions
• envisionment encodes the set of possible trajectories
• nondeterministic because they abstract details
• -multiple influences can cause weak predictions, -difficult to visualize and interpret
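Sign propagation along the single causal chain from the example above can be sketched directly (this ignores the multiple-influence case that causes weak predictions):

```python
def propagate(change, chain):
    """Push a qualitative change (+1 or -1) along a chain of signed causal
    influences: a '+' link preserves the direction of change, a '-' link flips it."""
    effects = {}
    for sign, variable in chain:
        change = change if sign == "+" else -change
        effects[variable] = change
    return effects

# the chain from the notes, starting from an increase in oil_production
chain = [("-", "gas_price"), ("-", "traffic"),
         ("+", "pollution"), ("+", "lung_disease")]
effects = propagate(+1, chain)
```

So more oil production qualitatively predicts lower gas prices, hence more traffic, more pollution, and more lung disease, everything else held constant.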

#### Explanations

• Given: set of general knowledge elements; set of observed facts
• Find: one or more explanations that relate the observations (connected to others through general knowledge)
• understand often means explained
• we can use causal relations to create a deductive causal explanation (e.g. oil production leads to +lung disease)

#### Abductive reasoning

• Given: general rules/knowledge; specific observed facts
• Find: one or more explanations that relate observations and plausible assumptions
• observed facts + specific default assumptions + rule instances that link facts and assumptions
• key difference: the creation of default assumptions that complete the account
• take the form of proof trees, but the assumptions are not deductively valid
• inference to the best explanation? too concerned with optimality – humans often arrive at incomplete/incorrect explanations through abduction
• e.g. abductive explanation
```prolog
%% background knowledge
good_mood(X) :- did_well_on(X,Y).
did_well_on(X,Y) :- exam(Y), study(X,Y), take(X,Y).

%% suppose we observe
good_mood(john).    exam(e).    take(john,e).

%% suppose we want to explain why john is in a good mood
%% if we make one assumption:
study(john,e).

%% we could derive good_mood(john) from this assumption and the
%% observations (proof tree, shown schematically):
good_mood(john) :-            %% observation to explain
    did_well_on(john,e) :-
        exam(e),
        study(john,e),        %% assumption
        take(john,e).
```
• prefer: +simpler explanations; +more probable accounts; +minimum weight assumptions; +coherent explanations (those which relate more observations)

#### Case-based and Analogical reasoning

• humans are able to take advantage of background knowledge from specific experiences stored in memory (when general rules are unknown/unavailable)
• operates using 4 main stages:
1. input some query or problem;
2. retrieve relevant cases from memory;
• typically based on similarity between the case and query/problem
3. select which case(s) to utilize; and
4. use the case(s) to perform the task
• In some situations the system must also adapt the case to the target context – because cases are typically larger scale than rules
• analogical reasoning
• analogical reasoning also operates over specific instances rather than general rules – however analogies:
• find mappings between relational structures and use these mappings to draw relational inferences
• whereas case-based uses attribute-value schemes to do classification
• Given: a base description of a situation (stated as a set of objects and relations); a target description of a situation
• Find: One or more mappings between the objects and relations in the two descriptions; extensions to the target using elements from its base
• vs pattern matching
• analogies find mappings between specific descriptions rather than matching variables against constants
• analogies also allow partial matches – more costly, especially over large structures
• both analogy and abduction differ from logical reasoning by deduction:
• both adopt forms of partial pattern matching (rather than all-or-none matching)
• both use plausible inference (rather than deductive proofs) – broader coverage

#### Domain modelling and planning

• problem solving can be described as states, operators (describe search space), and search control knowledge (describe search strategy)
• planning is computationally expensive (even in simplified models) – caching is usually better
• the 7 classical assumptions:
1. finite propositional domains
2. omniscience (must be able to determine the truth/falsity of all propositions for a given situation)
3. actions are always completely deterministic
4. no exogenous events
5. all actions take unit time to occur
6. only qualitative results
7. all actions occur sequentially
• a problem is described as:
• a complete description of the initial situation (need to do domain ontology engineering)
• a description of desired goals
• state language
• using a closed world assumption allows us to use a default (false) value to reduce description
• predicates whose value can change are fluents
• predicates whose value can’t change are static – can be associated with the problem rather than with each state
• storing these two separately can save space and time, however we must keep a record of which are static and which are fluent
• derived predicates' values can be derived from the values of other predicates (primitive predicates cannot) – allows further space saving by not storing the values of derived predicates
• goal language
• we must explicitly state values we want to be false as `not(foo(X))`
• things we don’t care about are not mentioned in the goal description
• update language
• parameters indicate what objects are relevant to executing the action
• preconditions describe when the action can be executed
• expressed in the goal language – explicitly state what must be true and false
• the effects describe what happens when the action is executed
• require a complete description of exactly what changes and does not change – frame problem
• can fail if:
• our preconditions don't capture all the relevant conditions – qualification problem
• our effects don’t capture all the relevant results – ramification problem
• object level predicates are based on state whereas meta level predicates describe domain/state independent functions (e.g. eq, neq, odd)
• the plan is a sequence of steps or states

#### Plan-Space planning

• backward search from the goal
• each node of the search is a partial plan
• a set of partially-instantiated actions
• a set of constraints
• precedence constraint: a must precede b
• binding constraint: equality/inequality
• causal link: use action a to establish precondition p needed by action b
• make more and more refinements, until we have a solution
• resolve flaws (the PSP procedure)
• open goal: an action has an unmet precondition
• threat: an action is capable of deleting a precondition
• -we don’t yet know how to generate good heuristics
• +smaller search space than state-space planner, +when goal node is found solution can be directly extracted

#### Graph planning

• simplify the problem so that search is quicker

#### Machine Learning

• Machine learning is useful for
• Data mining (e.g. credit worthiness)
• Poorly understood domains (e.g. face recognition)
• Programs which must dynamically adapt to changing conditions (e.g. Internet)
• Learning problem needs a well-specified task, performance metric, and source of training experience
• Involves a number of design choices:
• type of training experience,
• target function,
• representation of target function,
• an algorithm for learning the target function from the training data
• learning involves searching the space of possible hypotheses
• different learning methods search different hypothesis spaces (numerical functions, neural networks, decision trees, symbolic rules)
• there are some theoretical results which characterize conditions under which these search methods converge towards an optimal hypothesis

#### Concept learning

• the inductive hypothesis is that any hypothesis found to approximate the target function well over a sufficiently large set of training data will also approximate it well over unobserved examples
• the inductive bias of an algorithm is a set of assumptions that an algorithm uses to predict outputs for data it has not seen.
• the assumptions that must be added to the observed data to transform the algorithm’s outputs into logical deductions.
• concept learning can be seen as search
• General-to-Specific partial ordering of hypotheses can be used to organize search
• Find-S and Candidate-Elimination algorithms
• Inductive learning algorithms are able to classify unseen examples only because of their implicit inductive bias for selecting one consistent hypothesis over another
• an unbiased learner cannot make inductive leaps to classify unseen examples
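The Find-S algorithm mentioned above is short enough to sketch; the EnjoySport-style attributes and examples below are hypothetical toy data:

```python
def find_s(examples):
    """Find-S: start with the most specific hypothesis and minimally generalize
    it to cover each positive example; negative examples are ignored.
    '?' means 'any value accepted' for that attribute."""
    h = None
    for attrs, positive in examples:
        if not positive:
            continue
        # first positive example becomes the hypothesis; later ones generalize it
        h = list(attrs) if h is None else [a if a == b else "?"
                                           for a, b in zip(h, attrs)]
    return h

# hypothetical EnjoySport-style data: attributes are (sky, temp, wind)
examples = [(("sunny", "warm", "strong"), True),
            (("sunny", "warm", "light"), True),
            (("rainy", "cold", "strong"), False)]
hypothesis = find_s(examples)   # -> ["sunny", "warm", "?"]
```

Ignoring negatives and always preferring the maximally specific consistent hypothesis is exactly Find-S's inductive bias.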

#### Decision tree learning

• Similar to compression
• use the information gain to select the best attribute at each step
• prefer a shallower tree with high information gain attributes at the root
• can use gain ratio to offset effect of bias towards attributes with many values
• overfitting of h is when the hypothesis h has a smaller error than h’ over the training data, but a larger error over the entire distribution of examples
• caused by: noise, small sample size
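The entropy/information-gain computation used to pick attributes can be sketched as follows; the toy weather examples are made up so that one attribute splits the labels perfectly:

```python
import math

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(examples, labels, attribute):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [lbl for e, lbl in zip(examples, labels) if e[attribute] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# hypothetical toy data: 'windy' splits the labels perfectly, 'humid' does not
examples = [{"windy": 1, "humid": 1}, {"windy": 1, "humid": 0},
            {"windy": 0, "humid": 1}, {"windy": 0, "humid": 0}]
labels = ["yes", "yes", "no", "no"]
```

Here `windy` has gain 1.0 (a perfect split of one bit of entropy) while `humid` has gain 0, so a decision tree learner would put `windy` at the root.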

#### Swarm Intelligence

• collective behaviors that result from the local interactions of individuals with each other and/or their environment
• many agents follow very simple local rules
• no centralized control structure
• local interactions lead to the emergence of complex global behavior
• the power is in interactive collaboration

#### Computational scientific discovery

• a long history of work on this, including methods for constructing:
• descriptive laws stated as numeric equations
• explanatory models of structures and processes
• work in this paradigm discovers knowledge stated in formalisms and concepts that are familiar to scientists
• the challenge is not with ‘big data’, but with complex models and large search spaces

#### Language + Dialogue

• language processing
• given: a sentence, knowledge of grammar and words
• find: a parse tree that breaks the sentence into constituents
• this is only concerned with grammar (not meaning)
• is the sentence valid
• dialogue processing
• given: a conversation topic, generic rules of dialogue, communicative actions and effects
• create: interpretations and appropriate responses
• challenges:
1. parsing utterances (language processing)
2. integrating dialogue and performance
3. recognizing or understanding the speaker's intent
4. mixed-initiative control of the conversation
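The language-processing task (grammar only, no meaning) can be sketched with a tiny recursive-descent parser; the grammar and lexicon below are hypothetical toy fragments:

```python
# toy grammar: S -> NP VP, NP -> Det N, VP -> V NP
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["Det", "N"]], "VP": [["V", "NP"]]}
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "chased": "V"}

def parse(symbol, words, i):
    """Recursive descent: return (constituent_tree, next_index), or None if
    no expansion of `symbol` matches the words starting at position i."""
    if symbol not in GRAMMAR:                    # lexical category (terminal)
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return (symbol, words[i]), i + 1
        return None
    for rule in GRAMMAR[symbol]:                 # try each expansion in turn
        children, j = [], i
        for part in rule:
            result = parse(part, words, j)
            if result is None:
                break
            child, j = result
            children.append(child)
        else:                                    # every part matched
            return (symbol, children), j
    return None

words = "the dog chased a cat".split()
result = parse("S", words, 0)
valid = result is not None and result[1] == len(words)   # grammatically valid?
```

The returned tree breaks the sentence into constituents; a word salad like "dog the chased" simply fails to parse, which answers the validity question without touching meaning.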

#### Cognition, Emotions, and Personality

• Affect: the positive or negative aspect of some experience
• Mood: a global variant of affect for the entire cognitive system
• Emotion: a mental structure related to goals + beliefs about an event, agent, or object
• Feeling: an affective or hormonal response that is associated with an emotion
• we can state conditions for eliciting emotions as abstract rules (e.g. an agent is disappointed if it wanted x, but did not get it)
• emotional metacognition – emotions play a meta-cognitive role in mental processing (i.e. they inspect basic cognition and alter its course in response)
• personality metacognition – a stronger effect