Entail - KB |= a if and only if a is true in all worlds where KB is true
Infer - KB |-_i a : sentence a can be derived from KB by algorithm i
i is sound if whenever we infer something from i it is also entailed by the KB
i is complete if something is entailed by the KB, then i infers it
Conjunctive normal form (CNF): a conjunction of disjunctions of literals, e.g. (A or B) and (B or C)
Negation as failure in Prolog: when something cannot be proved true, it is assumed false
Search
Complete - always finds a solution if one exists
Optimal - finds the best solution
b - branching factor
d - depth of least cost solution
m - maximum depth of state space
f(n) is the desirability of node n
g(n) is the cost from initialState to n
h(n) is the estimate of the distance from n to goal
Admissible if the estimate is no greater than the actual cost (i.e. never overestimates)
Consistent if it obeys the triangle inequality (i.e. h(n) <= cost(n, n') + h(n'))
a heuristic is likely to be better when its average h-value (r) is higher
h1 dominates another heuristic h2 when h1(n) >= h2(n) for ALL n
uninformed tree = b^d, informed = b^(d-r)
heuristic can be formed from a relaxed problem (reduce the restrictions)
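As a hypothetical illustration of the relaxed-problem idea: in grid pathfinding, dropping the walls restriction yields Manhattan distance, which is admissible because the relaxed solution can never cost more than the real one.

```python
def manhattan(n, goal):
    """h(n): exact cost of the relaxed problem (grid moves with walls removed),
    so it never overestimates the true cost and is admissible."""
    return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

# the relaxed-problem cost is a lower bound on the real path cost
print(manhattan((0, 0), (3, 4)))  # 7 moves at minimum
```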
| Algorithm | Complete? | Time | Space | Optimal? | f(n) | Notes |
|---|---|---|---|---|---|---|
| BFS | Y (if b finite) | Exponential | Exponential | Y (if step costs = 1) | | Uses a lot of space (graph search) |
| DFS | N (loops/infinite m) | Exponential | Linear (keeps only the current path in memory) | N | | |
| ID DFS | Y | Exponential | Linear (discards after each iteration) | Y (if step costs = 1) | | DFS + BFS advantages |
| Greedy | N | Exponential | Exponential | N | h(n) | Ignores g(n) |
| A* | Y | Exponential | Exponential | Y (if h is admissible + consistent) | g(n) + h(n) | |
| IDA* | Y | Exponential | Linear | Y | g(n) + h(n) | Iterates on the f-limit, starting with h(init) |
| Weighted A* | Y | Exponential | Exponential | N (solution will be no worse than w times as costly as optimal) | g(n) + w * h(n) | |
| Bidirectional A* (front-to-back) | Y | Exponential | Exponentially smaller (b^(d/2)) | Y | g(n) + h(n) | h(n) estimates the distance to the opposite terminal (init or goal); keeps searching past the first collision to guarantee optimality |
| Bidirectional A* (front-to-front) | Y | Exponential | Exponentially smaller (b^(d/2)) | Y | g(n) + h(n) | h(n) estimates the distance to the opposite frontier; optimal on first collision, but the cost of computing h grows exponentially |
| Minimax | Y | Exponential | Linear | Y (against an optimal opponent) | minimax value | |
| Minimax + alpha-beta pruning | Y | Exponentially smaller (b^(m/2)) | Linear | Y (same value as minimax) | minimax value | Doubles the depth that can be searched in the same time |
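A minimal A* sketch in Python, expanding nodes in order of f(n) = g(n) + h(n); the toy weighted graph and heuristic table at the bottom are assumed for illustration.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A* sketch. `neighbors(n)` yields (successor, step_cost)
    pairs; `h` must be admissible for the returned cost to be optimal."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for succ, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):   # better route to succ
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None

# toy graph and (admissible) heuristic, assumed for the example
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)], "D": []}
h = {"A": 2, "B": 2, "C": 1, "D": 0}.__getitem__
print(a_star("A", "D", lambda n: graph[n], h))  # (3, ['A', 'B', 'C', 'D'])
```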
Monte Carlo Tree Search plays out the rest of the game randomly, thousands of times, and picks the move with the highest win rate (given enough playouts it samples the whole problem space)
Local Search
Hill climbing algorithm
always moves toward a maximum - can get stuck on a local maximum, missing the global maximum
add random-restarts when you get stuck - trivially complete (depends on the shape of the state-space landscape)
Simulated annealing search - allows some ‘bad’ moves to escape local maxima, but gradually decreases their size and frequency
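A sketch of simulated annealing on an assumed 1-D landscape: the acceptance rule exp(delta / T) is what permits the occasional 'bad' move, and cooling T makes bad moves rarer and smaller over time.

```python
import math
import random

def simulated_annealing(state, value, neighbor, t0=1.0, cooling=0.95, steps=200):
    t = t0
    best = state
    for _ in range(steps):
        succ = neighbor(state)
        delta = value(succ) - value(state)
        # always take uphill moves; take downhill moves with prob exp(delta/T)
        if delta > 0 or random.random() < math.exp(delta / t):
            state = succ
        if value(state) > value(best):
            best = state
        t *= cooling  # cooling schedule: bad moves become rarer over time
    return best

random.seed(0)                        # deterministic for the example
f = lambda x: -(x - 3) ** 2           # assumed landscape: global maximum at x = 3
step = lambda x: x + random.uniform(-1, 1)
result = simulated_annealing(0.0, f, step)
print(round(result, 2))
```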
Local beam search
start with k randomly generated states
better than random restarts as information is shared – but can get concentrated in a small region
Stochastic beam search - choose k successors randomly, biased towards good ones (natural selection)
Genetic algorithms
requires: reproduction, population, variety, difference in ability to survive
reproduce with a probability proportional to their fitness
overcrowding can happen if one individual is too dominant
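A toy genetic-algorithm sketch on the standard OneMax problem (maximise the number of 1 bits), showing the required ingredients: a population, fitness-proportional reproduction, crossover, and mutation for variety. Population size, crossover point, and mutation rate are illustrative choices.

```python
import random

def genetic_onemax(n_bits=20, pop_size=30, generations=60, p_mut=0.05):
    random.seed(1)  # deterministic for the example
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fitness = sum   # OneMax: fitness = number of 1 bits
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in pop]  # +1 so all can reproduce
        new_pop = []
        for _ in range(pop_size):
            # reproduce with probability proportional to fitness
            a, b = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, n_bits)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation maintains variety in the population
            child = [bit ^ (random.random() < p_mut) for bit in child]
            new_pop.append(child)
        pop = new_pop
    return max(fitness(ind) for ind in pop)

print(genetic_onemax())
```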
Constraint satisfaction problems (CSP)
state is defined by variables Xi with values from domains Di
goal test is a set of constraints specifying allowable combinations of values for subsets of variables
Unary constraints involve a single variable (X != 5)
Incremental search - assign values to variables that do not conflict with current assignment
Backtracking search (DFS)
choose the most constrained variable i.e. with the fewest legal values (minimum remaining values (MRV) heuristic)
Forward checking - keep track of legal values for all variables, abort when a variable has none remaining
Arc consistency - also tracks the effect that forward checking has on neighbours (detects failure earlier)
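The ideas above (backtracking search + MRV + forward checking) can be sketched together; the tiny map-colouring style domain at the bottom is assumed for illustration.

```python
def backtrack(domains, constraints, assignment=None):
    """Backtracking CSP sketch with MRV and forward checking.
    `domains` maps variable -> set of legal values; `constraints` maps a
    pair (x, y) -> predicate over their values (binary constraints only)."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    # MRV: choose the unassigned variable with the fewest legal values
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        pruned = {}   # forward checking: prune neighbours' domains
        ok = True
        for other in domains:
            if other in assignment or other == var:
                continue
            test = constraints.get((var, other))
            if not test:
                continue
            bad = {w for w in domains[other] if not test(value, w)}
            if bad:
                pruned[other] = bad
                domains[other] -= bad
                if not domains[other]:   # a domain emptied: abort this value
                    ok = False
        if ok:
            result = backtrack(domains, constraints, {**assignment, var: value})
            if result:
                return result
        for other, bad in pruned.items():   # undo pruning on backtrack
            domains[other] |= bad
    return None

# assumed toy problem: three regions, adjacent regions must differ
neq = lambda a, b: a != b
doms = {"A": {"r", "g"}, "B": {"r", "g"}, "C": {"r"}}
cons = {p: neq for p in [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]}
sol = backtrack(doms, cons)
print(sol)  # {'A': 'r', 'B': 'g', 'C': 'r'} (some key order)
```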
Week 6 - 12
Decision making & choice
The choice task:
Given: some goals/objectives; candidates relevant to these aims; descriptions for each
Find: one or more candidates to select/carry out
Decision theory (prescriptive)
Given: K alternatives; N attribute values; N weights (importance)
Compute: the utility value of each alternative: U_j = sum over attributes i of (w_i x v_ij)
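A minimal sketch of the weighted-additive utility computation; the two alternatives, their attribute scores, and the weights are hypothetical.

```python
def utility(weights, values):
    """U_j = sum_i w_i * v_ij : weighted additive utility of one alternative."""
    return sum(w * v for w, v in zip(weights, values))

# hypothetical choice on (price, speed, battery), scored 0-10,
# with importance weights summing to 1
weights = [0.5, 0.3, 0.2]
alternatives = {"laptop_a": [6, 9, 4], "laptop_b": [8, 5, 7]}
best = max(alternatives, key=lambda k: utility(weights, alternatives[k]))
print(best, {k: utility(weights, v) for k, v in alternatives.items()})
```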
Heuristic choice (descriptive)
humans do not make optimal choices but instead satisfice (i.e., use heuristics to select an acceptable alternative based on an aspiration level with minimal effort)
humans retrieve chunks of relevant memory (by pattern matching) to produce associated responses very rapidly
AI defines two forms of heuristics:
Symbolic heuristics – rules or relational patterns (as in human cognition, chunks – used in heuristic choice theory)
Numeric heuristics – arithmetic combinations of attributes (similar to utility functions – used to find optimal choices and in search)
Reactive control for routine behaviour
sequential decision making by invoking the basic process repeatedly –> reactive control task
Given: description of environment; related goals [+ beliefs, or _mental states_]
Select: an appropriate action to execute
routine control is conditional, which gives it great flexibility. this can be done by direct mappings from perception to action – known as stimulus-response (does not take advantage of mental states)
in reality, humans draw inferences about current situation, have memories of past decisions, knowledge of order of activities, breakdown activities to subactivities, take goals into account – extensions can attempt to address these issues, but the stimulus-response paradigm has inherent limitations
HTNs
hierarchical structures: +each component is simple, +modular +reusable +composable dynamically
comprises a set of methods, each with: a task predicate; conditions under which the method applies; subtasks that make up the method; [effects the method produces]
imposes a hierarchical, sequential, conditional structure on activities
this produces an AND/OR tree which is traversed over time – this is reactive, but continues along the current path when possible
executing HTNs given some task T, on each cycle:
infer a set of beliefs from stimuli (using conceptual rules)
find (through the hierarchy) a path of applicable methods
execute
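The cycle above can be sketched as a recursive walk over an AND/OR method tree; the task names and the single method below are hypothetical.

```python
# OR over applicable methods, AND over a chosen method's subtasks;
# primitive tasks (no method) execute directly.
methods = {
    "make_tea": [
        {"cond": lambda beliefs: beliefs["have_teabag"],   # applicability condition
         "subtasks": ["boil_water", "brew", "pour"]},
    ],
}

def execute(task, beliefs, trace):
    if task not in methods:            # primitive task: just execute it
        trace.append(task)
        return True
    for m in methods[task]:            # OR: first applicable method
        if m["cond"](beliefs):
            return all(execute(t, beliefs, trace)   # AND: every subtask
                       for t in m["subtasks"])
    return False

trace = []
execute("make_tea", {"have_teabag": True}, trace)
print(trace)  # ['boil_water', 'brew', 'pour']
```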
Production systems
can model human cognition – support mental activities; combine parallel retrieval with sequential decisions; balance stimulus-driven and goal-driven behaviour; modular representation supports structural learning
made up of 2 main components:
Production memory containing generalized rules that specify domain knowledge as conditional responses – condition-action form
Working memory that contains specific elements
internal stimuli – encoding the system's beliefs and goals
changes as the program runs
runs recognize-act cycles
match each rule's conditions with working memory
select one (or more) of the matched rules to execute
conflict resolution can be: match more recent elements, match more specific sets, more conditions, rules added earlier
execute
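A toy recognize-act loop with two assumed rules; conflict resolution here is simply "earliest rule wins", one of the strategies listed above, and a rule is skipped once its effects already hold.

```python
# production memory: condition-action rules over sets of working-memory elements
rules = [
    {"name": "greet", "cond": {"person_visible"}, "add": {"said_hello"}},
    {"name": "wave",  "cond": {"person_visible"}, "add": {"waved"}},
]

def recognize_act(wm, cycles=5):
    for _ in range(cycles):
        # match: rules whose conditions hold and whose effects are not yet in wm
        matched = [r for r in rules
                   if r["cond"] <= wm and not (r["add"] <= wm)]
        if not matched:
            break
        winner = matched[0]            # conflict resolution: earliest rule wins
        wm |= winner["add"]            # act: change working memory
    return wm

wm = recognize_act({"person_visible"})
print(wm)
```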
Causal models + Qualitative reasoning
Causal prediction/simulation task:
Given: entities & attributes; relations/rules that relate them; external influences
Find: the resulting effects on attributes of interest
this makes a mental model – analogous to a physical model
definition: X causally influences Y if a change in X's value results in a change in Y's value (all else held constant)
this does not by itself establish causality
e.g. oil_production -- gas_price -- traffic ++ pollution ++ lung_disease (--/++ mark negative/positive influences along the chain)
use a qualitative but dynamic causal model to generate Qualitative Envisionments
include: initial qualitative state, set of possible qualitative successor states, set of transitions
envisionment encodes the set of possible trajectories
nondeterministic because they abstract details
-multiple influences can cause weak predictions, -difficult to visualize and interpret
Explanations
explanation generation task:
Given: set of general knowledge elements; set of observed facts
Find: one (or more) explanation that relate the observations (connected to others through general knowledge)
'understood' often means 'explained'
we can use causal relations to create a deductive causal explanation (e.g. oil production leads to +lung disease)
Abductive reasoning
abductive inference task:
Given: general rules/knowledge; specific observed facts
Find: one (or more) explanation that relate observations and plausible assumptions
observed facts + specific default assumptions + rule instances that link facts and assumptions
key difference: the creation of default assumptions that complete the account
take the form of proof trees, but the assumptions are not deductively valid
inference to the best explanation? too concerned with optimality – humans often arrive at incomplete/incorrect explanations through abduction
e.g. abductive explanation
%% background knowledge
good_mood(X) :- did_well_on(X,Y).
did_well_on(X,Y) :- exam(Y), study(X,Y), take(X,Y).
%% suppose we observe
good_mood(john). exam(e). take(john,e).
%% suppose we want to explain why john is in a good mood
%% if we make one assumption:
study(john,e).
%% we could derive good_mood(john) from this assumption and the observations
%% (shown as a proof tree):
good_mood(john) :-           %% observation to explain
    did_well_on(john,e) :-
        exam(e),
        study(john,e),       %% assumption
        take(john,e).
prefer: +simpler explanations; +more probable accounts; +minimum weight assumptions; +coherent explanations (those which relate more observations)
Case-based and Analogical reasoning
humans are able to take advantage of background knowledge from specific experiences stored in memory (when general rules are unknown/unavailable)
operates using 4 main stages:
input some query or problem;
retrieve relevant cases from memory;
typically based on similarity between the case and query/problem
select which case(s) to utilize; and
use the case(s) to perform the task
In some situations the system must also adapt the case to the target context – because cases are typically larger scale than rules
analogical reasoning
analogical reasoning also operates over specific instances rather than general rules – however analogies:
find mappings between relational structures and use these mappings to draw relational inferences
whereas case-based uses attribute-value schemes to do classification
analogical mapping task:
Given: a base description of a situation (stated as a set of objects and relations); a target description of a situation
Find: One or more mappings between the objects and relations in the two descriptions; extensions to the target using elements from its base
vs pattern matching
analogies find mappings between specific descriptions rather than matching variables against constants
analogies also allow partial matches – more costly, especially over large structures
both analogy and abduction differ from logical reasoning by deduction:
both adopt forms of partial pattern matching (rather than all-or-none matching)
both use plausible inference (rather than deductive proofs) – broader coverage
Domain modelling and planning
problem solving can be described as states, operators (describe search space), and search control knowledge (describe search strategy)
planning is computationally expensive (even in simplified models) – caching and reusing previous plans is usually cheaper than planning from scratch
the 7 classical assumptions:
finite propositional domains
omniscience (must be able to determine the truth/falsity of all propositions for a given situation)
actions are always completely deterministic
no exogenous events
all actions take unit time to occur
only qualitative results
all actions occur sequentially
a problem is described as:
a complete description of the initial situation (need to do domain ontology engineering)
a description of desired goals
state language
using a closed world assumption allows us to use a default (false) value to reduce description
predicates whose value can change are fluents
predicates whose value can’t change are static – can be associated with the problem rather than with each state
storing these two separately can save space and time, however we must keep a record of which are static and which are fluent
derived predicates' values can be derived from the values of other predicates (primitive predicates cannot) – allows further space saving by not storing the values of derived predicates
goal language
we must explicitly state values we want to be false as not(foo(X))
things we don’t care about are not mentioned in the goal description
update language
parameters indicate what objects are relevant to executing the action
preconditions describe when the action can be executed
expressed in the goal language – explicitly state what must be true and false
the effects describe what happens when the action is executed
require a complete description of exactly what changes and does not change – frame problem
can fail if:
our preconditions don’t capture all the relevant tasks – qualification problem
our effects don’t capture all the relevant results – ramification problem
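A minimal STRIPS-style sketch of preconditions and effects as add/delete lists (the move action and fluent names are assumed); the frame problem is sidestepped by the convention that any fluent not in the delete list persists unchanged.

```python
# a state is a set of true fluents (closed world: anything absent is false)
action = {
    "name": "move(a, b)",
    "pre":  {"at(a)", "path(a, b)"},   # must hold for the action to execute
    "add":  {"at(b)"},                 # fluents made true
    "del":  {"at(a)"},                 # fluents made false
}

def applicable(state, action):
    return action["pre"] <= state

def apply_action(state, action):
    assert applicable(state, action)
    # everything not explicitly deleted persists (frame convention)
    return (state - action["del"]) | action["add"]

s0 = {"at(a)", "path(a, b)"}
s1 = apply_action(s0, action)
print(sorted(s1))  # ['at(b)', 'path(a, b)']
```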
object level predicates are based on state whereas meta level predicates describe domain/state independent functions (e.g. eq, neq, odd)
the plan is a sequence of steps or states
Plan-Space planning
backward search from the goal
each node of the search is a partial plan
a set of partially-instantiated actions
a set of constraints
precedence constraint: a must precede b
binding constraint: equality/inequality
causal link: use action a to establish precondition p needed by action b
make more and more refinements, until we have a solution
resolve flaws (PSP procedure)
open goal: an action has an unmet precondition
threat: an action is capable of deleting a precondition
-we don't yet know how to generate good heuristics
+smaller search space than state-space planner, +when goal node is found solution can be directly extracted
Graph planning
simplify the problem so that search is quicker
Machine Learning
Machine learning is useful for
Datamining (e.g. credit worthiness)
Poorly understood domains (e.g. face recognition)
Programs which must dynamically adapt to changing conditions (e.g. Internet)
Learning problem needs a well-specified task, performance metric, and source of training experience
Involves a number of design choices:
type of training experience,
target function,
representation of target function,
an algorithm for learning the target function from the training data
learning involves searching the space of possible hypotheses
different learning methods search different hypothesis spaces (numerical functions, neural networks, decision trees, symbolic rules)
there are some theoretical results which characterize conditions under which these search methods converge towards an optimal hypothesis
Concept learning
the inductive hypothesis is that any hypothesis found to approximate the target function well over a sufficiently large set of training data will also approximate it well over unobserved examples
the inductive bias of an algorithm is a set of assumptions that an algorithm uses to predict outputs for data it has not seen.
the assumptions that must be added to the observed data to transform the algorithm’s outputs into logical deductions.
concept learning can be seen as search
General-to-Specific partial ordering of hypotheses can be used to organize search
Find-S and Candidate-Elimination algorithms
Inductive learning algorithms are able to classify unseen examples only because of their implicit inductive bias for selecting one consistent hypothesis over another
an unbiased learner cannot make inductive leaps to classify unseen examples
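A sketch of Find-S under the usual conjunctive-hypothesis representation ('?' = any value, None = no value yet); the EnjoySport-style data at the bottom is assumed.

```python
def find_s(examples):
    """Find-S sketch: start from the most specific hypothesis and minimally
    generalise it on each positive example; negative examples are ignored."""
    h = None
    for attrs, positive in examples:
        if not positive:
            continue
        if h is None:
            h = list(attrs)                            # first positive example
        else:
            # generalise: keep agreeing values, wildcard the rest
            h = [a if a == b else "?" for a, b in zip(h, attrs)]
    return h

# assumed toy data: (attribute tuple, is_positive)
data = [
    (("sunny", "warm", "strong"), True),
    (("sunny", "warm", "light"),  True),
    (("rainy", "cold", "strong"), False),
]
print(find_s(data))  # ['sunny', 'warm', '?']
```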
Decision tree learning
Similar to compression
use the information gain to select the best attribute at each step
prefer a tree with smaller depth and with high information gain attributes at the root
can use gain ratio to offset effect of bias towards attributes with many values
overfitting: hypothesis h overfits when it has a smaller error than some alternative h' over the training data, but a larger error over the entire distribution of examples
caused by: noise, small sample size
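Information gain can be sketched directly from its definition, Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v); the four-row dataset is contrived so that attribute 0 is perfectly informative and attribute 1 carries no information.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_c p_c log2 p_c over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected entropy reduction from splitting on column `attr`."""
    total = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        total -= len(subset) / len(labels) * entropy(subset)
    return total

# contrived data: attribute 0 predicts the label exactly, attribute 1 is noise
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(info_gain(rows, labels, 0), info_gain(rows, labels, 1))  # 1.0 0.0
```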
Swarm Intelligence
collective behaviors that result from the local interactions of individuals with each other and/or their environment
many agents follow very simple local rules
no centralized control structure
local interactions lead to the emergence of complex global behavior
the power is in interactive collaboration
Computational scientific discovery
a long history of work on this, including methods for constructing:
descriptive laws stated as numeric equations
explanatory models of structures and processes
work in this paradigm discovers knowledge stated in formalisms and concepts that are familiar to scientists
the challenge is not with ‘big data’, but with complex models and large search spaces
Language + Dialogue
language processing
given: a sentence, knowledge of grammar and words
find: a parse tree that breaks the sentence into constituents
this is only concerned with grammar (not meaning)
is the sentence valid
dialogue processing
given: a conversation topic, generic rules of dialogue, communicative actions and effects
create: interpretations and appropriate responses
challenges:
parsing utterances (language processing)
integrating dialogue and performance
recognizing or understanding the speaker's intent
mixed-initiative control of the conversation
Cognition, Emotions, and Personality
Affect: the positive or negative aspect of some experience
Mood: a global variant of affect for the entire cognitive system
Emotion: a mental structure related to goals + beliefs about an event, agent, or object
Feeling: an affective or hormonal response that is associated with an emotion
we can state conditions for eliciting emotions as abstract rules (e.g. an agent is disappointed if it wanted x, but did not get it)
emotional metacognition – emotions play a meta-cognitive role in mental processing (i.e. they inspect basic cognition and alter its course in response)