\documentclass[thmsa,10pt]{article}
\usepackage{chicago}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{tabularx}
\usepackage{hhline}
\usepackage{graphicx}
\usepackage{float}
\newtheorem{thm}{Theorem}
\newtheorem{impl}{Implication}
\newtheorem{assump}{Assumption}
\newtheorem{lemma}{Lemma}
\oddsidemargin = 3pt \textwidth = 450pt \topmargin = 0pt \headsep =
1pt \textheight = 604pt
\renewcommand{\baselinestretch}{1.5}
\begin{document}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\begin{center}
\begin{tabular}{c}
ECONOMIC ANALYSIS GROUP \\
DISCUSSION PAPER \\
\end{tabular}
\end{center}
\addvspace{0.8in}
\begin{flushright}
\begin{tabular}{c}
Consumer Learning, Switching Costs, and \\
Heterogeneity: A Structural Examination \\
by \\
Matthew Osborne\footnotemark \\
\begin{tabular}{c c}
EAG 07-10 & September 2007 \\
\end{tabular}
\end{tabular}
\end{flushright}
\addvspace{0.8in}
EAG Discussion Papers are the primary vehicle used to disseminate
research from economists in the Economic Analysis Group (EAG) of the
Antitrust Division. These papers are intended to inform interested
individuals and institutions of EAG's research program and to
stimulate comment and criticism on economic issues related to
antitrust policy and regulation. The analysis and conclusions
expressed herein are solely those of the authors and do not
represent the views of the United States Department of Justice.
Information on the EAG research program and discussion paper series
may be obtained from Russell Pittman, Director of Economic Research,
Economic Analysis Group, Antitrust Division, U.S. Department of
Justice, BICN 10-000, Washington, DC 20530, or by e-mail at
russell.pittman@usdoj.gov. Comments on specific papers may be
addressed directly to the authors at the same mailing address or at
their e-mail address.
Recent EAG Discussion Paper titles are listed at the end of this
paper. To obtain a complete list of titles or to request single
copies of individual papers, please write to Janet Ficco at the
above mailing address or at janet.ficco@usdoj.gov or call (202)
307-3779. Beginning with papers issued in 1999, copies of
individual papers are also available from the Social Science
Research Network at www.ssrn.com.
\footnotetext{Research Economist, Antitrust Division, U.S.
Department of Justice, Washington, DC 20530. Email:
matthew.osborne@usdoj.gov. The views expressed are not purported to
reflect those of the United States Department of Justice. I am
indebted to my advisors, Susan Athey, Timothy Bresnahan and Wesley
Hartmann for their support and comments. I would also like to thank
Liran Einav, Chuck Romeo, and Dan Quint for helpful comments. I
would like to thank the Stanford Institute for Economic Policy
Research for financial support when I was in graduate school, and
the James M. Kilts Center, GSB, University of Chicago, for provision
of the data set used in this report.}
\renewcommand{\thefootnote}{\arabic{footnote}}
\thispagestyle{empty}
\newpage
\abstract{I formulate an econometric model of consumer learning and
experimentation about new products in markets for packaged goods
that nests alternative sources of dynamics. The model is estimated
on household level scanner data of laundry detergent purchases, and
the results suggest that consumers have very similar expectations of
their match value with new products before consumption experience
with the good, but once consumers have learned their true match
values they are very heterogeneous. I demonstrate that resolving
consumer uncertainty about the new products increases market shares
by 24 to 58\%. The estimation results also suggest significant
switching costs: removing switching costs increases new product
market shares by 12 to 23\%. Using counterfactual computations
derived from the estimates of the structural demand model, I
demonstrate that the presence of switching costs with learning
changes the implications of the standard empirical learning model:
the intermediate run impact of an introductory price cut on a new
product's market share is significantly greater when the only source
of dynamics is switching costs as opposed to when both learning and
switching costs are present, which suggests that firms should
combine price cuts with introductory advertising or free samples to
increase their impact.
Because my model includes two different types of dynamics, I am able
to assess the impact of ignoring learning or switching costs on the
model's imputed long run price elasticities by re-estimating the
model assuming that one of these dynamics is not present. I find
that ignoring learning will i) underestimate the own-price
elasticities of new products by 30\%, ii) underestimate the
cross-price elasticities between new and established products by up
to 90\%, and iii) overestimate the cross-price elasticities of
established products by up to 15\%. Ignoring switching costs will
lead to underestimates of own price elasticities of up to 60\%, and
underestimates of cross-price elasticities of up to 90\%.}
\thispagestyle{empty}
\newpage
\thispagestyle{empty}
\setcounter{page}{1}
\section{Introduction}
An experience good is a product that must be consumed before an
individual learns how much she likes it. This makes purchasing the
product a dynamic decision, since the consumer's decision to
experiment with a new product is an investment that will pay off if
the consumer likes the product and purchases it again in the future.
Consumer learning in experience goods markets has been an important
subject of theoretical research in industrial organization and
marketing since the 1970s. Learning can be an especially important
factor in the demand for new products, and there is a small
empirical literature that quantifies learning in household panel
data using structural demand models with forward-looking consumers
(for example, \citeN{erdkeane96}, \citeN{crshum05}). In these papers
it is assumed that the only type of dynamics in demand come from
learning, and alternative types of dynamics, such as switching costs
or consumer taste for variety, are not modeled. Similarly, papers
that estimate other forms of dynamics (see \citeN{chkyrperk01} for
an example) usually only allow for one type of dynamics in demand.
In this paper, I estimate a structural model of learning and
experimentation that nests alternative sources of dynamics in
demand, such as the cost of switching between different products. In
my model, consumers are forward-looking and take into account the
effect of learning and other sources of dynamics on their future
utility. I also allow a rich distribution of heterogeneity in
consumer tastes, price sensitivities, expectations of new product
match values, and alternative dynamics. To my knowledge, this paper
is the first to estimate a demand model with such a rich dynamic
structure.
The model is estimated on household-level longitudinal data on
laundry detergent purchases. During the time the data was collected,
three new product introductions were observed. Learning can be
empirically separated from switching costs through differences in
the effect of having made a first purchase of a new product on a
consumer's current purchase relative to the effect of having used a
product in the previous purchase event. Allowing for switching costs
in addition to learning changes the implications of the standard
empirical learning model: for example, the presence of switching
costs makes consumers less likely to experiment with new products.
Additionally, I find that the intermediate run impact of an
introductory price cut increases when both learning and switching
costs are present, compared to the case where only learning is
present, since consumers who purchase the product and find they have
a low match value for the product (alternatively, a low permanent
taste for the product) may find it too costly to switch away from
it.
Another contribution of this paper to the industrial organization
literature is to examine the impact of misspecifying dynamics on
estimates of long term price elasticities. This issue is
economically relevant because cross-price elasticities are used to
assess the impact of mergers, and it has been examined in some
previous empirical research: for example, \citeN{hendelnevo05} show
that there are large biases from ignoring dynamics. In addition to
estimating a model of learning and switching costs, I estimate two
restricted structural demand models: one with no switching costs,
and one with no learning. I find that in both cases, own and cross
price elasticities are underestimated by leaving out
dynamics, often by as much as 90\%. An exception to this is that
cross-price elasticities between established products are slightly
overestimated in the model with no learning. A final contribution of
this paper relative to the existing literature is that I use a
recently developed technique allowing Bayesian estimation of a
dynamic discrete choice model to include a richer heterogeneity
structure than has been included in most papers.
Although my empirical results come from a data set on laundry
detergents, they should be applicable to other product categories
where similar sorts of behavior have been observed. In consumer
packaged goods markets, consumer switching costs or taste for
variety has been found to play an important role in many product
categories: some examples are nondiet soft drinks and liquid laundry
detergents \cite{chintagunta99}, ketchup \cite{roychhaldar96},
margarine, yogurt and peanut butter \cite{erdem96}, breakfast
cereals \cite{shum04}, and orange juice \cite{dubehitschrossi06}.
Evidence of learning has been found in liquid laundry detergents
\cite{erdkeane96} and yogurts \cite{Ack-03}. Since new product
introductions are very frequent in packaged goods markets, the
issues raised above are likely to be important in these markets.
These types of dynamics are also present in markets other than
packaged goods. For example, learning has been found to play an
important role in pharmaceuticals \cite{crshum05}, automobile
insurance \cite{israel05}, and personal computers
\cite{erdemkeaneoncustrebel05}. Markets where switching costs play
an important role are discussed in \citeN{farrellklemperer06}; some
examples are cellular phones, credit cards, cigarettes, air travel,
online brokerage services, automobile insurance, and electricity
suppliers. It is certainly conceivable that consumer learning could
also play a role in these markets. For example, in cellular phone
markets consumers will likely learn about aspects of their
provider's service after signing a contract with them. Cellular
phone contracts often penalize consumers for switching providers,
which creates switching costs and makes the decision to invest in a
plan a forward-looking one.
The underlying model of consumer learning that is used in this paper
is similar to many of the papers that I have mentioned in the
previous paragraphs. In particular, I assume that learning occurs
during consumption of the product: laundry detergent is assumed to
be an experience good. I assume that consumers have an
individual-level match value for each product that does not change
over time. For the three new products, I assume that consumers may
not know their true match values beforehand, and will have
expectations about their match values that are correct on average. If
the consumers' true match values are equal to their expected match
values, then there is no learning. This might be the case if
consumers learn their match value beforehand through other means,
such as experience with established products or by examining the new
product's package. I also assume that consumers are forward-looking,
which means that they will recognize that there is value to learning
about new products. If there is indeed learning, and there are no
other sources of dynamics in demand, one should expect to see
consumers purchasing the new products very soon after they are
introduced. Also, after experimenting with the new product,
consumers who have a high match value for the new product will
continue to purchase it after experimenting, and consumers who have
a low match value will switch back to the established product. These
behaviors will identify the magnitude of the learning.
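The belief updating that drives this kind of learning model can be sketched with a conjugate normal-normal update: the consumer starts from a prior over her match value and shrinks her belief toward the signals she receives from consumption. Everything below (the normality assumptions, the prior, the signal noise, and the true match value) is an illustrative assumption, not the paper's specification or its estimates.

```python
import random

def update_belief(prior_mean, prior_var, signal, signal_var):
    """One conjugate normal-normal update of a consumer's belief
    about her match value after observing a consumption signal."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / signal_var)
    post_mean = post_var * (prior_mean / prior_var + signal / signal_var)
    return post_mean, post_var

# Illustrative values (assumptions, not estimates from the paper):
true_match = 1.2      # the consumer's true match value with the new product
mean, var = 0.0, 1.0  # common prior: consumers share similar expectations
signal_var = 0.5      # noise in each consumption experience

random.seed(0)
for _ in range(50):
    signal = random.gauss(true_match, signal_var ** 0.5)
    mean, var = update_belief(mean, var, signal, signal_var)
# With repeated purchases the belief concentrates near the true match
# value, which is why early purchases are informative about learning.
```

Under this updating rule the posterior variance falls deterministically with each signal, so the speed of learning is governed by the ratio of signal noise to prior variance.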
One problem with this identification approach is the presence of
dynamics in demand that are not learning. For example, some
consumers may be variety-seeking: holding fixed their intrinsic
match values, a previous purchase of the new product will decrease
their current marginal utility for the product. These consumers will
tend to purchase the new product very soon after its introduction
and will switch away from it afterwards. To the researcher, it may
look like these consumers experimented with the product and found
their match value was low. Alternatively, there could be switching
costs for some consumers: holding fixed their intrinsic match
values, their marginal utility for products other than the new
product will decrease after purchasing the new product. When a
consumer with switching costs makes a first purchase of the new
product, she will be likely to keep on purchasing it. Switching
costs could arise in packaged goods markets for psychological
reasons: for example, it may be costly for consumers to reoptimize
their utility across purchase events, which could bias them to
purchase the same product as in their previous purchase, or
consumers may become more brand loyal in response to a previous
purchase (\citeN{klemperer95} notes that brand loyalty can create
switching costs).\footnote{In the empirical marketing literature,
these types of switching costs are equivalently referred to as habit
persistence or inertia. As discussed above, they have been found to
play an important role in many packaged goods markets.} To a
researcher it may look like these consumers have a high match value
for the new product. Thus, it will be important to take these
other types of dynamics into account in order to properly isolate
learning. I will argue that these types of dynamics can be
identified in the long run, during periods where there have been no
new product introductions for a long time. Learning can then be
identified using the arguments specified above in the first few
periods after the new product introductions.
Previous papers that have estimated structural models of consumer
learning on household level panel data have not included these other
types of dynamics; three well-known papers in this category which I
previously discussed are \citeN{erdkeane96}, \citeN{Ack-03}, and
\citeN{crshum05}, which estimate structural models of Bayesian
learning in laundry detergents, yogurts, and ulcer medications,
respectively. A paper that raised the problem of leaving out
alternative dynamics is \citeN{israel05}, which looks for learning
in the time-series behavior of departure probabilities from an
automobile insurance firm. The paper's model allows consumers to
learn about the firm's quality, and also controls for consumer
lock-in by allowing the number of time periods spent with the firm
to enter utility directly. Although this paper was an important
first step in examining the importance of more than one type of
dynamics, there are two important aspects of demand which need to be
addressed in a unified structural model of dynamics. First, the
paper does not distinguish between consumer lock-in and unobserved
heterogeneity in preferences. A researcher may observe a consumer
staying with the firm for a long time because she has a strong
preference for the firm, or it may be because she becomes locked in
to it. This issue may not be critical in automobile insurance
markets, but in packaged goods markets it is important not to
confuse these two behaviors, because the long run effect of a
temporary price cut on a product's future share will be different
under switching costs as opposed to taste heterogeneity. Under
switching costs, a temporary price cut will increase a product's
future market share; under heterogeneity, this will not be the case.
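The distinction can be illustrated with a stylized two-period, two-product logit simulation. The logit choice rule, the additive switching cost, and all magnitudes below are illustrative assumptions unrelated to the paper's estimates; the point is only that a period-1 price cut propagates into period-2 shares under switching costs but not under pure taste heterogeneity.

```python
import math
import random

def choice_prob(u_a, u_b):
    """Logit probability of choosing product A over product B."""
    return math.exp(u_a) / (math.exp(u_a) + math.exp(u_b))

def period2_share(switch_cost, taste_sd, price_cut, n=20000, seed=1):
    """Simulated period-2 market share of product A after a period-1
    price cut on A. Consumers make logit choices each period; buying a
    product other than last period's choice incurs a switching cost."""
    rng = random.Random(seed)
    bought_a = 0
    for _ in range(n):
        taste = rng.gauss(0.0, taste_sd)  # persistent taste for A
        # Period 1: A's effective price is reduced by `price_cut`
        chose_a = rng.random() < choice_prob(taste + price_cut, 0.0)
        # Period 2: prices are equal again; the switching cost
        # penalizes whichever product was not bought in period 1
        u_a = taste - (0.0 if chose_a else switch_cost)
        u_b = 0.0 - (switch_cost if chose_a else 0.0)
        bought_a += rng.random() < choice_prob(u_a, u_b)
    return bought_a / n

base = period2_share(switch_cost=0.0, taste_sd=0.0, price_cut=0.0)
# Switching costs: the period-1 cut raises A's period-2 share
with_sc = period2_share(switch_cost=1.0, taste_sd=0.0, price_cut=1.0)
# Pure heterogeneity: A's period-2 share is unaffected by the cut
with_het = period2_share(switch_cost=0.0, taste_sd=1.0, price_cut=1.0)
```

In the switching-cost case the period-2 share of the discounted product rises above one half, while in the heterogeneity-only case it stays at one half: persistent tastes alone carry no causal link from past prices to current choices.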
Second, the paper does not directly model the forward-looking
behavior of consumers by solving for their value function, but
instead includes a term in the utility function which is interpreted
as the value function. The estimated parameters of this paper are
potentially subject to the Lucas critique: they will be functions of
policy variables, such as the distribution of future prices.
To assess the impact of ignoring switching costs in a model of
consumer learning, I compare the estimates from my model of learning
and switching costs to estimates produced by a restricted model,
where I assume that there are no switching costs. In the model with
no switching costs, the variances in consumer taste distributions
are estimated to be much larger. Consistent with the intuition
above, the impact of learning is estimated to be smaller: consumers
are much less similar in their initial predictions of how much they
will like the new products. This difference is economically
significant: in the full model, learning increases new product
market shares by 24\% to 58\%; in the model with no switching costs,
the impact of learning on market shares is smaller, ranging from
9.9\% to 34\%.\footnote{To quantify the importance of learning, I
simulate the new product market shares using my model estimates, and
then re-simulate them assuming that consumers have full
information.} Furthermore, when switching costs are ignored, the
impact of introductory price cuts on future market shares is
underestimated, and long term price elasticities are significantly
underestimated.
An alternative way of misspecifying the model would be to ignore
learning, and assume that the only source of dynamics is switching
costs or consumer taste for variety. I also estimate a restricted
version of my model, where I assume that switching costs and taste
for variety are the only source of dynamics. I find that if
learning is ignored, the distributions of price sensitivities,
tastes, and switching costs look similar to those implied by
the full model. However, the model does a significantly worse job at
predicting the initial market shares of both new and established
products. The model overpredicts the impact of introductory price
cuts on the subsequent market shares of new products, and
significantly underestimates long term price elasticities.
As with the literature on learning, there are many papers that
estimate models of switching costs. In economics, perhaps the
best-known work on forward-looking switching costs is the rational
addiction literature of \citeN{becker88} and \citeN{becker94}. In
marketing, there are many papers which estimate structural models of
switching costs (alternatively, inertia or habit-formation) or
variety-seeking in the presence of unobserved taste heterogeneity
(for an example see \citeN{seetharaman04}). Although these papers
account for rich sources of dynamics in demand, they typically do
not model consumers as forward-looking. Two examples of papers that
do allow for forward-looking behavior in this type of model are
\citeN{chkyrperk01} and \citeN{hartmann06}. \citeN{chkyrperk01}
estimates a model of consumer switching costs in panel data on
yogurt purchases. \citeN{hartmann06} examines intertemporal
consumption effects in consumer decisions to play golf. In that
paper, consumers are forward-looking, and dynamics arise because a
consumer's decision to play golf affects her future marginal utility
for golf.
A third type of misspecification arises from failing to control for
unobserved heterogeneity. As I mentioned above, this can bias the
predictions of the impact of price cuts on future market shares.
Additionally, it can make the identification of learning more
difficult. To see why, suppose that when a new product is
introduced, its price is initially low and then it is raised, a
practice that is common and is observed in the data set used in this
paper. Suppose further that there is a group of consumers who are
very responsive to price cuts. These consumers will purchase the new
product right after its introduction, when it is inexpensive, and
will switch away from it as it gets more expensive. To a researcher
who does not take into account that they are price sensitive, it may
look like they experimented with the product and disliked it.
For computational reasons, most papers that estimate structural
models of consumer learning or switching costs have had to be
parsimonious in how they specify consumer heterogeneity, if it is
modeled at all. For example, \citeN{erdkeane96} do not allow for
consumer heterogeneity,\footnote{In \citeN{erdkeane96}, consumers
are learning about one unobserved attribute of each brand of laundry
detergent, which is interpreted as the detergent's cleaning power.
Each time an individual purchases a product she receives a signal of
the product's quality, which is her perceived product quality. Under
full information, consumer tastes for each product are this
attribute level plus an idiosyncratic error term that is i.i.d.
across time and consumers.} while \citeN{crshum05} allows for
individual level heterogeneity in two dimensions: how serious the
patient's sickness is, and how good a match a particular ulcer
medication is for the patient. The paper assumes that the
distribution of unobserved heterogeneity is discrete: in each of the
2 dimensions, consumers fall into a small number of types.
\citeN{Ack-03} allows 2 dimensions of individual-level
heterogeneity: the intercept of each consumer's utility for a new
yogurt, which is assumed to be known and observed by the consumer,
and the consumer's intrinsic match value which is being learned.
Unlike \citeN{crshum05}, who assume that the population distribution
of unobserved heterogeneity is discrete, \citeN{Ack-03} assumes the
heterogeneity is normally distributed across the population.
Although allowing for continuously distributed heterogeneity
increases computational burden, the model is kept computationally
tractable since consumer choice is binary: there is only one new
product introduction, and consumers either purchase the new product
or they do not. This approach would be less tractable in markets
where there are multiple new product introductions.
These papers estimate their econometric models using classical
methods such as the maximum-likelihood estimator. In models where
consumers are forward-looking, it is necessary to solve their
Bellman equation whenever the parameters of the model are changed,
such as when a derivative is evaluated. This makes the model
estimation computationally difficult. Allowing for unobserved
heterogeneity substantially increases the computational difficulty
of the estimation due to the fact that the unobserved heterogeneity
must be integrated out by simulation. Because of these issues,
researchers who have estimated these types of models have had to be
parsimonious in their specification of unobserved heterogeneity. As
I have already discussed using my example with consumer price
sensitivities, failing to account for unobserved heterogeneity can
result in biases. One way to reduce the computational burden
induced by the heterogeneity is an importance sampling method
developed by \citeN{Ack-01}.
\citeN{hartmann06} uses this to allow for a richer distribution of
heterogeneity than in the learning papers I have previously
discussed.
I overcome the problems associated with unobserved heterogeneity by
estimating my model using the Bayesian method of Markov Chain Monte
Carlo, which is better suited than classical techniques to dealing
with high-dimensional unobserved heterogeneity. To reduce the
computational burden that is created by solving the consumers'
Bellman equations, I apply a new technique by \citeN{ijc05}. In
contrast to most classical techniques, which require the Bellman
equation to be calculated many times, this new technique only
requires one full solution of the Bellman equation. The basic idea
behind this method is to update the value function once in each step
of the Markov Chain Monte Carlo algorithm using information from
previous steps, so that by the time the estimation is completed an
accurate approximation of the value function is obtained. This paper
is the first to apply this new technique to field data. It would
also likely be possible to estimate this model using the method of
\citeN{Ack-01}. An interesting topic for future research would be to
compare the computational speed and accuracy of these two estimation
techniques.
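The flavor of this approach can be conveyed with a minimal sketch. In the stylized problem below, the state is simply last period's choice and a switching cost penalizes changing products; rather than fully solving the Bellman equation, the value function is carried across iterations and updated once per step. This is only the skeleton of the idea in \citeN{ijc05}: the actual algorithm applies the update at each new parameter draw of the MCMC chain and reuses past iterates, whereas here the parameter is held fixed for clarity.

```python
import math

def bellman_update(V, switch_cost, beta):
    """One application of the Bellman operator for a stylized problem:
    two products, the state is last period's choice, and switching
    products incurs a cost. The continuation value of each state is
    the logit inclusive value (log-sum-exp) over the two choices."""
    new_V = {}
    for state in ("A", "B"):
        vals = []
        for choice in ("A", "B"):
            flow = 0.0 if choice == state else -switch_cost
            vals.append(flow + beta * V[choice])
        m = max(vals)
        new_V[state] = m + math.log(sum(math.exp(v - m) for v in vals))
    return new_V

# One Bellman update per iteration, carrying the value function along,
# instead of solving the fixed point from scratch at every evaluation.
# With the parameter held fixed, the iterates converge at rate beta.
V = {"A": 0.0, "B": 0.0}
for _ in range(200):
    V = bellman_update(V, switch_cost=1.0, beta=0.9)
```

In the full algorithm the stored value function is only an approximation at any given step, but it becomes accurate as the chain runs, which is what makes a single update per iteration sufficient.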
\section{Data Set}
\subsection{Discussion of the Scanner Data}
The data set I am using is A.C. Nielsen supermarket scanner data on
detergent purchases in the city of Sioux Falls, South Dakota between
December 29, 1985 and August 20, 1988. This data is particularly
useful for identifying consumer learning for two reasons: first,
since this data is a panel of household purchases, it allows one to
track individual household behavior over time. Second, during the
period that this data was collected, three new brands of liquid
laundry detergents were introduced to the market: Cheer in May 1986,
Surf in September 1986 and Dash in May 1987. Households that
participated in this study were given magnetic swipe cards, and each
time the household shopped at a major grocery or drugstore in the
city, the swipe card was presented at the checkout counter.
Additionally, households that participated in the study filled out a
survey containing basic demographic information. The distributions
of household demographics are shown in Table \ref{table:hhdemog}.
Although a visit to the grocery store will reveal many different
brands of laundry detergent, the market is dominated by three large
companies: Procter and Gamble (Dash, Cheer, Era, Tide), Unilever
(Wisk, Surf) and Colgate-Palmolive (Fab, Ajax). During this period,
laundry detergents were available in two forms: liquids and powders.
Table \ref{table:hhpurchstuff} shows the distribution of sizes for
liquid and powder products. For liquids, the most popular size was
the 64 ounce size. Table \ref{table:mktshares1} shows the market
share for the 7 most popular brands of laundry detergents (the other
category covers purchases of smaller brands), in their liquid and
powdered forms. As can be seen from the last column of the table,
the market share of liquids is about 52\%. Well known brands, such
as Wisk and Tide, have high market shares.
Table \ref{table:mktshares2} shows the market shares of different
brands of liquids over different periods of time. It is notable that
for all three new products, their market share tends to be
significantly higher in the first 12 weeks after introduction than
it is for the remainder of the sample period. This fact is
consistent with learning, since the option value of learning induces
consumers to purchase new products early. However, it is also
consistent with consumer response to introductory pricing. The
average prices of different brands at different periods of time are
shown in the same table, underneath the shares. There are two
noteworthy facts in this table. First, prices of the new brands
Cheer and Surf tend to be lower in the first 12 weeks after
introduction than they are later on in the data. This fact suggests
that we should be aware of possible biases due to consumer
heterogeneity: for example, price sensitive consumers could purchase
the new products initially when they are cheap, and switch away from
them as they get more expensive, which could be mistaken for
learning. Second, when Cheer is introduced to the market by Procter
and Gamble, the price of Wisk, a popular product of Unilever, goes
down. Similarly, when Unilever's Surf is new, Procter and Gamble's
Tide drops in price. Cheer and Surf have been successful products
since their introductions, but Dash was discontinued in the United
States in 1992. One possible reason for this is that Dash was more
of a niche product: it was intended for front-loading washers, which
constituted about 5\% of the market at the time.
\subsection{An Overview of the Laundry Detergent Market Prior to 1988}
The fact that the three new products were liquid detergents was not
a coincidence; to see why, it is useful to briefly discuss the
evolution of this industry. The first powdered laundry detergent for
general usage to be introduced to the United States was Tide, which
was introduced in 1946. Liquid laundry detergents were introduced
later: the popular brand Wisk was introduced by Unilever in 1956.
The market share of liquid laundry detergents was much lower than
powders until the early 1980s. The very successful introduction of
liquid Tide in 1984 changed this trend, and detergent companies
began to introduce more liquid detergents. Product entry in this
industry is costly: an industry executive quoted the cost of a new
product introduction at 200 million dollars \shortcite{chemweek87}.
Industry literature suggests a number of reasons for the
popularization of liquids during this time: first, low oil and
natural gas prices, which made higher concentrations of
surfactants\footnote{The most important chemical ingredients in
laundry detergents are two-part molecules called synthetic
surfactants, which loosen and remove soil. Surfactants are
manufactured from petrochemicals and/or oleochemicals (which are
derived from fats and oils).} more economical; second, a trend
towards lower washing temperatures; third, increases in synthetic
fabrics; fourth, on the demand side, an increased desire for
convenience. In the third and fourth points, liquids had an
advantage over powders since they dissolved better in cold water,
and did not tend to cake or leave powder on clothes after a wash was
done.
The fact that new liquids were being introduced at this time
suggests that learning could be an important component of consumer
behavior. Many consumers may not have been familiar with the way
liquids differed from powders, and they might learn more about
liquids from experimenting with the new products. Further, there may
be learning across the different brands of liquids. For example,
using liquid Tide might not give consumers enough information to
know exactly how liquid Cheer or Surf will clean their clothes.
Learning how well these products will work could be important to
consumers for a number of reasons.
First, laundry detergents are fairly expensive and the household
will use the product for a long period of time, so the cost of
making a mistake is not trivial. Second, consumers may have
idiosyncratic needs which require different types of detergents. As
an example, a consumer whose wardrobe consists of bright colors will
likely prefer to wash in cold water, where liquids are more
effective.
\section{Econometric Model}
\subsection{Specification of Consumer Flow Utility}\label{sec:modelspec}
In my structural econometric model an observation is an individual
consumer's purchase event of a liquid laundry detergent. In the
following discussion, I index each consumer with the subscript $i$,
and number the purchase events for consumer $i$ with the subscript
$t$. The dependent variable in this model is the consumer's choice
of a given size of one of the 13 different laundry detergents listed
in Table \ref{table:mktshares1}. I index each product with the
variable $j$, and each size with of the product with $s$. In a
particular purchase event $t$ for consumer $i$, not all of the 49
choices may be available. I denote the set of product-size choices
available to consumer $i$ in purchase $t$ as $J_{it}$. I assume that a consumer's
period utility is linear, as in traditional discrete choice models.
The period, or flow, utility for consumer $i$ for product-size $(j,s)
\in J_{it}$ on purchase event $t$ is assumed to be
\begin{equation}\label{eq:pdutbig}
\begin{split}
& u_{ijst}(S_{it-1},\alpha_i,p_{ijst},c_{ijt},\beta_i,x_{ijt},\eta_i,
y_{ijt-1},\varepsilon_{ijst}) \\
& = \Gamma_{ij}(S_{ijt-1},y_{ijt-1}) + \xi_{is} + \alpha_i
(p_{ijst}-\alpha_{ic} c_{ijt}) + \beta_i x_{ijt} + \eta_i y_{ijt-1}
+ \varepsilon_{ijst},
\\
\end{split}
\end{equation} where $\Gamma_{ij}(S_{ijt-1},y_{ijt-1})$ is consumer $i$'s match value,
or taste, for product $j$. A consumer's match value with a product
is a function of the two ``state variables'' $S_{ijt-1}$ and
$y_{ijt-1}$. The variable $y_{ijt}$ is a dummy variable that is 1 if
consumer $i$ chooses product $j$ in purchase event $t$, so
$y_{ijt-1}$ keeps track of whether consumer $i$ chose product $j$ in
her previous purchase event. The state variable $S_{ijt}$ keeps
track of whether consumer $i$ has ever purchased product $j$ prior
to purchase event $t$, and it evolves as follows:
\begin{equation}\label{eq:learnstate}
S_{ijt}= S_{ijt-1} + 1\{ S_{ijt-1} = 0 \mbox{ and } y_{ijt-1} = 1
\}.
\end{equation}
For the 10 established products, I assume that consumer match values
do not change over time, so $\Gamma_{ij}(S_{ijt-1},y_{ijt-1}) =
\gamma_{ij}$. For identification purposes, I normalize every
consumer's match for other liquid (product 1) to 0. For the three
new products, I assume that the evolution of the consumer's
permanent taste is as follows:
\begin{equation}\label{eq:tastessdbig}
\begin{split}
\Gamma_{ij}(S_{ijt-1},y_{ijt-1}) = \gamma_{ij}^0 & \mbox{ if } S_{ijt-1} = 0, \mbox{ and } y_{ijt-1} = 0 \\
\Gamma_{ij}(S_{ijt-1},y_{ijt-1}) = \gamma_{ij} & \mbox{ if }
S_{ijt-1} = 1, \mbox{ or } y_{ijt-1} = 1. \\
\end{split}
\end{equation}
The consumer's match value for the new product is $\gamma_{ij}^0$ if
the consumer has never purchased the product before, and it is
$\gamma_{ij}$ once she has. For the three new products,
$\gamma_{ij}^0$ is consumer $i$'s prediction of how much she will
like product $j$ before she has made her first purchase of it.
$\gamma_{ij}$ is her ``true'' match with the product.
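The state evolution and match-value switch just described can be sketched in a few lines of Python. This is a minimal illustration; the function and variable names are mine, not part of any estimation code:

```python
def update_state(s_prev, y_prev):
    # State evolution: S_ijt = S_ijt-1 + 1{S_ijt-1 = 0 and y_ijt-1 = 1},
    # so the state switches permanently from 0 to 1 after the first purchase.
    return s_prev + (1 if (s_prev == 0 and y_prev == 1) else 0)

def match_value(s_prev, y_prev, gamma0, gamma_true):
    # Match value for a new product: the consumer's prior expectation
    # gamma0 before any purchase, her true taste gamma_true afterwards.
    if s_prev == 0 and y_prev == 0:
        return gamma0
    return gamma_true
```

For example, a consumer with $S_{ijt-1}=0$ who purchased product $j$ in event $t-1$ ($y_{ijt-1}=1$) moves to $S_{ijt}=1$, and her match value switches from $\gamma_{ij}^0$ to $\gamma_{ij}$.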
I assume that
\begin{equation}\label{eq:ltaste}
\gamma_{ij} \sim N(\gamma_{ij}^0, \sigma_{ij}^2),
\end{equation} where $\sigma_{ij}^2$ is consumer $i$'s uncertainty about her true
taste for product $j$. I allow $\sigma_{ij}^2$ to vary with
household $i$'s income and size as follows:
\begin{equation}\label{eq:scoeff}
\sigma_{ij}^2 = \sigma_{max} \frac{\exp( \sigma_{0ij} + \sigma_{1j}
INC_i + \sigma_{2j} SIZE_i)}{1+\exp( \sigma_{0ij} + \sigma_{1j}
INC_i + \sigma_{2j} SIZE_i)}.
\end{equation}
Note that $\sigma_{ij}^2$ exhibits both observed and unobserved
heterogeneity: the intercept $\sigma_{0ij}$ varies across
individuals and captures the unobserved component. $INC_i$ is a
variable that varies from 1 to 4, where the four possible categories
correspond to the four income groups in Table \ref{table:hhdemog}.
Household size, the variable $SIZE_i$, also varies from 1 to 4 and
is defined similarly. Note that $\sigma_{ij}^2$ is always positive
and bounded above by $\sigma_{max}$, which I assume is equal to
5.\footnote{The choice of the number 5 is somewhat ad hoc; what
matters is that the upper bound is high enough not to be binding:
there should not be consumers with values of $\sigma_{ij}^2$ greater
than five. In the model estimates section I examine the distribution
of $\sigma_{ij}^2$ across the population; it does not appear to
approach the upper bound. Furthermore, in my thesis research I
estimate a version of this demand model with no learning and no
forward-looking behavior. The estimated taste variances for all
three new products are significantly smaller than 5.}
The parameter $\alpha_i$ is consumer $i$'s price sensitivity. I also
allow this parameter to vary with household income and size as
follows,
\begin{equation}\label{eq:pcoeff}
\alpha_i = \alpha_{max} \frac{\exp( \alpha_{0i} + \alpha_{1} INC_i +
\alpha_{2} SIZE_i)}{1+\exp( \alpha_{0i} + \alpha_{1} INC_i +
\alpha_{2} SIZE_i)},
\end{equation} where $\alpha_{max}$ is set to $-100$. $\alpha_i$ is thus always
negative and, like $\sigma_{ij}^2$, it is bounded. $p_{ijst}$ is
the price in dollars per ounce of size $s$ of product $j$ in the
store during purchase event $t$, and the variable $c_{ijt}$ is the
value of a manufacturer coupon for product $j$ that consumer $i$ has
on hand in purchase event $t$, also measured in dollars per ounce.
The parameter $\alpha_{ic}$ is consumer $i$'s sensitivity to
coupons. I assume that $\alpha_{ic}$ lies between 0 and 1, and that
\begin{equation}\label{eq:ccoeff}
\alpha_{ic} = \frac{\exp(\alpha_{0ic})}{1+\exp(\alpha_{0ic})},
\end{equation} where $\alpha_{0ic}$ lies on the real line.
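The three bounded transformations above can be sketched numerically as follows. The parameter values in the defaults are illustrative, not estimates:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigma_sq(s0, s1, s2, inc, size, sigma_max=5.0):
    # Taste uncertainty: a logistic transform bounded in (0, sigma_max).
    return sigma_max * logistic(s0 + s1 * inc + s2 * size)

def price_coef(a0, a1, a2, inc, size, alpha_max=-100.0):
    # Price sensitivity: bounded in (alpha_max, 0), so always negative.
    return alpha_max * logistic(a0 + a1 * inc + a2 * size)

def coupon_coef(a0c):
    # Coupon sensitivity: bounded in (0, 1).
    return logistic(a0c)
```

When the index $\sigma_{0ij} + \sigma_{1j} INC_i + \sigma_{2j} SIZE_i$ is zero the transform returns $\sigma_{max}/2$; only as the index grows large does the bound matter, which is why the estimated $\sigma_{ij}^2$'s should stay well below 5.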
In Equation \eqref{eq:pdutbig}, $\beta_i$ is a vector that measures
consumer $i$'s sensitivity to other variables, $x_{ijt}$. The first
and second elements of the $x_{ijt}$ vector are dummy variables
which are equal to 1 if product $j$ is on feature or display,
respectively. The third element is a dummy variable that is 1 if
purchase event $t$ occurs in the first week after the introduction
of Cheer, and $j$ is Cheer. The fourth is the corresponding dummy
for the second week after the Cheer introduction, the fifth for the
third, and so on, up to the fourteenth week. The next element is a
dummy variable that is 1 if purchase event $t$ occurs in the third
week after the introduction of Surf, and $j$ is Surf (the first and
second week dummy variables were dropped because they were not
identified in simple logit estimations of the demand model). The
next 11 elements are the corresponding dummies for weeks 4 to 14
after the Surf introduction. The next 14 elements of the vector are
the same time-product dummy variables for the Dash introduction. These time
dummy variables are included to capture the effect of unobserved
introductory advertising for the new products.
The consumer's utility in purchase event $t$ is increased by
$\eta_i$ if she purchases the same product that she did in purchase
$t-1$.\footnote{Although some purchases were dropped due to data
problems as described in Section \ref{sec:sampleselection},
$y_{ijt-1}$ is defined to be the product chosen in the consumer's
previous purchase event, even if that purchase event was dropped.
When two product choices are made in the same week, I set
$y_{ijt-1}$ to be the second (or third, if there were three)
recorded purchase. This avoids measurement error in $y_{ijt-1}$.
Also, the fact that I include the consumer's previous purchase event
in the model creates an initial conditions problem: I do not observe
the consumer's purchase event before the data collection starts. To
get around this problem, I estimate the model using data from the
period after the introduction of Cheer Liquid, so that the
consumer's initial $y_{ijt-1}$ is her latest purchase before the
introduction of Cheer. For a very small number of households the
first observed purchase event occurs after the introduction of
Cheer. For these households I impute the first purchase as the most
frequently purchased established product.} Note that the parameter
$\eta_i$ and the function $\Gamma_{ij}(S_{ijt-1},y_{ijt-1})$ allow for two
different sources of dynamics in consumer behavior: a consumer's
previous product choices can affect her current utility. One way in
which a consumer's past product choices affect her current product
choice is through the $\Gamma_{ij}(S_{ijt-1},y_{ijt-1})$ function: this
is {\it learning}. If she has never purchased the new product $j$
prior to purchase event $t$, her taste for this product is her
expected taste, $\gamma_{ij}^0$, whereas if she has purchased it at
some point in the past I assume that she knows her true taste for
the product, $\gamma_{ij}$. The term $\eta_i$ accounts for the
dynamic behaviors of {\it switching costs} or {\it variety-seeking}.
If $\eta_i>0$, consumer $i$'s utility is greater if she consumes the
same product twice in a row. Thus, a positive $\eta_i$ induces a
switching cost (\citeN{pollack70}, \citeN{spinnewyn81}). An
alternative way to model switching costs would be to subtract a
positive $\eta_i$ from all products except the one that was
previously chosen; since utility functions are ordinal and there is
no outside good in this model, these two formulations are
equivalent. As discussed in the introduction, switching costs have
been found to be an important part of demand for consumer packaged
goods, and could arise if there are costs of recalculating utility
if a consumer decides to switch products. If $\eta_i<0$, the
consumer will prefer to consume something different than her
previous product choice: I label this as variety-seeking
\cite{mcpess82}. Variety-seeking is not likely an important behavior
in laundry detergent markets, but I allow it in the model for the
sake of generality. As with the price coefficient and consumer
uncertainty, I allow both observed and unobserved heterogeneity in
$\eta_i$:
\begin{equation}\label{eq:etacoeff}
\eta_i = \eta_{i0} + \eta_{1} INC_i + \eta_{2} SIZE_i.
\end{equation}
Last, $\varepsilon_{ijst}$ is an idiosyncratic taste component that
is i.i.d. across $i$, $j$, $s$ and $t$, and has a type I extreme
value distribution, as in the multinomial logit model. I assume this
error is observed by the consumer but not the econometrician and is
independent of the model's explanatory variables and the
individual's utility parameters such as $\alpha_i$ and $\beta_i$.
I allow unobserved heterogeneity in most of the individual-level
parameters for every consumer: the $\gamma_{ij}$'s for all products
except for the Powder Other and Powder Tide products, the
$\gamma_{ij}^0$'s, the $\alpha_{0i}$'s, the $\xi_{is}$'s, the
$\alpha_{0ic}$'s and the $\sigma_{0ij}$'s for the three new products,
the intercept of the switching costs parameter $\eta_{i0}$, and the
$\beta_i$ vector. Denote the vector of population-varying individual
level parameters for consumer $i$ listed previously as $\theta_i$,
and the vector of individual level parameters with the
$\gamma_{ij}$'s for the three new products removed as
$\tilde{\theta}_i$. I assume that $\tilde{\theta}_i \sim N(b,W)$
across the population, where $W$ is diagonal.\footnote{A possible
worry may be that the assumption of normally distributed
heterogeneity is too restrictive. In my thesis research I present
the results of the estimation of an extended model where some of the
heterogeneity is assumed to be a two point mixture of normals.
Identification of some parameters becomes more difficult in this
case, but generally the results do not change, which suggests that
the assumption of normality is a good fit.} This assumption means
that the household's uncertainties about tastes for the new
products, $\sigma_{ij}^2$'s, and the price sensitivities
$\alpha_i$'s will be transformations of normals as shown in
Equations \eqref{eq:scoeff} and \eqref{eq:pcoeff}. Their
distribution is Johnson's $S_B$ distribution, which is discussed in
\citeN{johnsonkotz70}, page 23. The parameters which do not vary
across the population are the $\gamma_{ij}$'s for Other Powder and
Tide Powder, the coefficients on household demographics for the
learning parameters, the price sensitivities and the switching
costs, which are $\sigma_{1j}$ and $\sigma_{2j}$, $\alpha_{1}$ and
$\alpha_{2}$, and $\eta_1$ and $\eta_2$ respectively, and a group of
parameters which capture consumer expectations of future coupons
$c_{ijt}$. These latter parameters will be discussed further in the
next section. I denote the vector of population-fixed parameters as
$\theta$.
\subsection{Consumer Dynamic Optimization Problem}\label{sec:dynopt}
I assume consumers are forward-looking\footnote{In my thesis
research \cite{osborne06}, evidence is provided that consumers are
forward-looking in this data set.} and in each purchase event they
maximize the expected discounted sum of utility from the current
purchase into the future. The consumer's expected discounted utility
in purchase event $t$ is
\begin{equation}\label{eq:maxfututil}
V(\Sigma_{it};\theta_i,\theta) = \max_{\Pi_i} E \left[
\sum_{\tau=t}^\infty \delta^{\tau-t}
u_{ijs\tau}(S_{i\tau-1},p_{ij\tau},c_{ij\tau},x_{ij\tau},
y_{ij\tau-1},\varepsilon_{ij\tau},\theta_i) | \Sigma_{it}, \Pi_i;
\theta_i, \theta \right],
\end{equation} where $\Pi_i$ is a set of decision rules that map the state in
purchase $t$, $\Sigma_{it}$, into actions, which are the $y_{ijt}$'s
in purchase event $t$. The parameter $\delta$ is a discount factor,
which is assumed to equal 0.95.\footnote{The discount factor is
usually difficult to identify in forward-looking structural models,
so it is common practice to assign it a value. Since the timing
between purchase events varies across consumers, it is possible that
the discount factors may also vary across consumers. As I will
discuss in a few paragraphs, I assume that all consumers have the
same expectations about when their next purchase will occur, which
removes this problem. Also, the estimation method I will use
requires that the discount factor is not an estimated parameter.}
The function $V(\Sigma_{it};\theta_i,\theta)$ is a value function,
and is a solution to the Bellman equation
\begin{equation}\label{eq:bellman}
V(\Sigma_{it};\theta_i,\theta) = E_{\varepsilon_{ijt}} [ \max_{(j,s)
\in J_{it}} \{ u_{ijst}(S_{it-1},p_{ijt},c_{ijt},x_{ijt},
y_{ijt-1},\varepsilon_{ijt},\theta_i) + \delta E
V(\Sigma_{it+1};\theta_i,\theta) \}].
\end{equation}
The state vector in purchase event $t$, $\Sigma_{it}$, has the
following elements: the $S_{ijt-1}$'s for the new products, the
$y_{ijt-1}$'s for all 13 products, the prices of all products,
$p_{ijt}$, the set of available products, $J_{it}$, and a new state
variable $n_t$, which will be discussed later.
The expectation in front of the term
$V(\Sigma_{it+1};\theta_i,\theta)$ in Equation \eqref{eq:bellman}
will be taken over the distributions of future variables, which are
\begin{enumerate}
\item[i)] the true tastes for new products the consumer has never
purchased, as in Equation \eqref{eq:ltaste},
\item[ii)] future prices,
\item[iii)] future coupons, and
\item[iv)] future product availabilities.
\end{enumerate}
For reasons of computational tractability that will be discussed in
the next section, I assume that consumers have naive expectations
about future $x_{ijt}$'s, which are the feature, display, and time
dummies. By this I mean that consumers expect all these variables to
have future levels of zero. A result of this assumption is that
these variables do not have to be included in the state
space.\footnote{Assuming that consumers do not expect future
advertising is probably not that unrealistic in the laundry
detergent market. For this product category, it is likely that
consumers will care more about future prices and how well the
product they purchase will function. Future advertising is likely to
be more important with ``prestige'' products, such as shoes or
clothing.}
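To fix ideas, the Bellman equation \eqref{eq:bellman} can be solved by value function iteration. The sketch below is deliberately stripped down: two products, a two-state Markov price process, no learning or purchase-history states, and i.i.d. type I extreme value errors, so that $E_{\varepsilon} \max_j (v_j + \varepsilon_j)$ has the closed form $\gamma_E + \ln \sum_j \exp(v_j)$, where $\gamma_E$ is Euler's constant. All numbers are illustrative:

```python
import numpy as np

EULER = 0.5772156649015329  # Euler-Mascheroni constant

def solve_value(flow, trans, delta=0.95, tol=1e-10):
    """Value function iteration for a toy Bellman equation.
    flow[s, j]: flow utility (net of the logit error) of choice j in price state s.
    trans[s, s2]: Markov transition probabilities over price states."""
    V = np.zeros(flow.shape[0])
    while True:
        # choice-specific value: flow utility plus discounted expected continuation
        v = flow + delta * (trans @ V)[:, None]
        # expectation over extreme value errors: log-sum-exp plus Euler's constant
        V_new = EULER + np.log(np.exp(v).sum(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# two persistent price states; in state 0 product 0 is cheaper, in state 1 product 1 is
flow = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
V = solve_value(flow, trans)
```

The full model differs in that the state space also includes the purchase-history variables, coupons, availability, and $n_t$, but the contraction-mapping logic is the same.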
I account for consumer expectations about future prices $p_{ijst}$
and product availability $J_{it}$ in the following way. I estimate a
Markov transition process for prices and availability from the data
on a store-by-store basis, using a method similar to
\citeN{erdemimaikeane03} which I will briefly summarize. A detailed
description of the estimation of this process can be found in
Appendix \ref{append:priceprocess}. I assume that consumers' actual
expectations about these variables are equal to this estimated
process. In my data, prices tend to be clustered at specific values,
so the transition process for prices is modeled as
discrete/continuous. The probability of a price change for a product
conditional on its price in the previous week, last week's prices
for other products, and whether a new product was recently
introduced is modeled as a binary logit. Conditional on a price
change, the probability of a particular value of the new price is
assumed to be lognormal given the previous week's prices in the same
store and whether a new product introduction recently occurred. Note
that there are 49 possible brand-size combinations, which makes the
state space of prices very large. To reduce the size of the state
space, the Markov process for prices is only estimated on the most
popular sizes of liquids and powders. The prices of other sizes are
assumed to be a function of the prices of the popular sizes.
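One step of this discrete/continuous price process can be sketched as follows. The coefficients are placeholders for illustration only, not the estimates from Appendix \ref{append:priceprocess}:

```python
import numpy as np

rng = np.random.default_rng(0)

def price_step(p_prev, intro, logit_b=(-1.0, 0.2, 0.5),
               reg_g=(0.0, 0.95, -0.1), sigma=0.05):
    """One week of the price process: a binary logit for whether the price
    changes, then, conditional on a change, a lognormal draw for the new
    price given last week's log price and an introductory-period dummy.
    All coefficients are illustrative placeholders."""
    b0, b1, b2 = logit_b
    z = b0 + b1 * np.log(p_prev) + b2 * intro
    if rng.random() >= 1.0 / (1.0 + np.exp(-z)):
        return p_prev  # no price change this week
    g0, g1, g2 = reg_g
    mu = g0 + g1 * np.log(p_prev) + g2 * intro
    return float(np.exp(mu + sigma * rng.standard_normal()))

# simulate a year of weekly prices starting at 5 cents/oz,
# with introductory pricing in the first 12 weeks
path = [0.05]
for week in range(52):
    path.append(price_step(path[-1], intro=1 if week < 12 else 0))
```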
An important part of the price process is that we observe
introductory pricing for the new products. I assume consumers
understand that the prices of new products will rise after their
introduction, so I include a dummy variable in both the price
transition logit and regression which is 1 for the first 12 weeks
after the introduction of Cheer, a separate dummy variable which is
1 for the first 12 weeks after the introduction of Surf, and one for
the first 12 weeks after Dash's introduction. Allowing for
introductory pricing in this way will complicate the state space. To
see why, consider a consumer who purchases a laundry detergent on
the week of Cheer's introduction. Suppose further that this person
purchases detergent every 10 weeks, and she knows exactly when she
will make her future purchases. This person's next purchase will
occur in 10 weeks, when the price of Cheer is still low. Her next
purchase after that will occur in 20 weeks, when the price process
is in its long run state. The number of purchase events before the
consumer enters the long run price state will be a state variable,
which I denote as $n_t$.
A complication this variable $n_t$ creates is that consumers
probably do not know exactly when their next purchases of laundry
detergents will be. Because the econometrician does not observe
consumer expectations, the best we can do is to make an assumption
about this. I assume that all households expect to make their next
purchase of laundry detergent in exactly 8 weeks. In the sample of
households I use to estimate the model, household interpurchase
times are clustered between 6 and 8 weeks, with a median
interpurchase time of 8 weeks. This means that $n_t$ will take on two
values: 1 if the consumer's purchase occurs within the first 4 weeks
after the new product introduction, and zero anytime afterwards.
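Under the assumed 8-week interpurchase time and 12-week introductory-pricing window, the resulting $n_t$ logic can be sketched as:

```python
def n_state(weeks_since_intro, intro_weeks=12, interpurchase=8):
    # n_t flags whether the consumer's *expected next* purchase
    # (interpurchase weeks from now) still falls inside the
    # introductory-pricing window. With a 12-week window and an
    # 8-week interpurchase time, n_t = 1 only for purchases made
    # in the first 4 weeks after the introduction.
    return 1 if weeks_since_intro + interpurchase < intro_weeks else 0
```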
For the state variable $J_{it}$, I estimate the probability of each
detergent being available in a given calendar week for a given store
separately using a binary logit. As was the case with prices, the
process for availability is only estimated for the most popular
sizes of each product, and so the only part of $J_{it}$ that is a
true state variable is the availability of these products. This
means I estimate 13 logits, one for each product, where one of the
regressors is whether the product was available in the previous
week. The availabilities of less popular sizes are assumed to be a
function of the availability of the popular sizes. I assume that the
introductions of new products are a surprise to consumers, so this
aspect of the state space is not taken into account by my
availability estimation. A result of this assumption is that
consumers will recalculate their value functions after each new
product introduction: there will be a value function for after the
Cheer introduction, a new one after the Surf introduction, and
another one after the Dash introduction. Hence, there will be three
times where it will be possible for $n_t$ to be equal to 1, right
after the introduction of each new product.
I treat consumer expectations about future coupons, which are the
$c_{ijt}$'s, differently than future prices. As I will discuss
further in Section \ref{sec:coupons}, I specify a process for
the distribution of coupons and estimate the parameters of this
process along with the other model parameters. I assume that the
future $c_{ijt}$'s are composed of two random variables: a binary
random variable $\overline{c}_{ijt}$ which is 1 if consumer $i$
receives a coupon for product $j$ in purchase $t$, and a random
variable $v_{ijt}$ which is the value of the coupon received. Denote
the probability of a consumer having a coupon on hand and available
for use for product $j$ when $n_t=0$ as $p_{cj}^0$. Because consumers
may expect more coupons to be available when products are newly
introduced, I allow the probability of receiving a coupon for a given
product $j$ to be different when $n_t=1$. In particular, for the new
products $j=$ Cheer, Surf and Dash I assume the probability of
having a coupon is $p_{cj}^0 + p_{cj}^1$. For established products,
I assume the probability of receiving a coupon when $n_t=1$ after
the Cheer introduction to be $p_{cj}^0 + p_{c}^{Cheer,1}$, after the
Surf introduction to be $p_{cj}^0 + p_{c}^{Surf,1}$, and after the
Dash introduction to be $p_{cj}^0 + p_{c}^{Dash,1}$. Note that the
parameters $p_{c}^{Cheer,1}$, $p_{c}^{Surf,1}$ and $p_{c}^{Dash,1}$
do not vary by product. If a consumer receives a coupon for product
$j$, the value of that coupon, which I denote as $v_{ijt}$, is
multinomial and drawn from the empirical density of coupon values.
Coupon values are clustered at certain numbers (such as 50 cents, 60
cents, or 1 dollar), so I calculate the probability of getting a
particular coupon value for a particular brand in a
period\footnote{There are six periods in all - when $n_t=1$ after
Cheer's introduction, when $n_t=0$ after Cheer's introduction, when
$n_t=1$ and $n_t=0$ after Surf's introduction, and when $n_t=1$ and
$n_t=0$ after Dash's introduction.} by tabulating the number of
redeemed coupons of that value for that brand in that period, and
dividing by the total number of redeemed coupons for that product in
that period.
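The coupon process just described can be sketched as follows; the arrival probabilities and the empirical value density below are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_coupon(p0, p1, n_t, values, probs):
    """Draw one coupon for one product in one purchase event: a Bernoulli
    arrival whose probability shifts up by p1 in the introductory period
    (n_t = 1), then a coupon value drawn from the empirical (multinomial)
    density of redeemed coupon values."""
    p_arrival = p0 + (p1 if n_t == 1 else 0.0)
    if rng.random() >= p_arrival:
        return 0.0  # no coupon on hand
    return float(rng.choice(values, p=probs))

# illustrative: values clustered at 50 cents, 60 cents, and 1 dollar
values = [0.50, 0.60, 1.00]
probs = [0.4, 0.4, 0.2]
draws = [draw_coupon(0.2, 0.1, 1, values, probs) for _ in range(1000)]
```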
The last part of the state space is the process on the state
variables summarizing purchase history, $S_{ijt-1}$ and $y_{ijt-1}$.
Because these state variables are influenced by consumer choices, it
is instructive to examine how we compute the value functions as
these parts of the state space change. Suppose first that
$S_{ijt-1}=0$ for some product $j$. If the consumer decides to
purchase product $j$ for the first time, then $S_{ijt}$ will be zero
and $y_{ijt}$ will be 1. When we construct the next period value
function we will integrate out the consumer's true taste for product
$j$, conditional on $\gamma_{ij}^0$ and $\sigma_{ij}^2$. Let
$\gamma$ be a random variable with the distribution of true tastes
for product $j$, where $f(\gamma|\gamma_{ij}^0,\sigma_{ij}^2)$ is
$N(\gamma_{ij}^0,\sigma_{ij}^2)$, and denote $\theta_i(\gamma)$ as
the vector of individual level parameters for consumer $i$ with her
true taste draw for product $j$ replaced by $\gamma$. Denote
$v_{ikst+1}(\gamma)$ as consumer $i$'s utility for size $s$ of
product $k$ in purchase event $t+1$ as a function of $\gamma$, minus
the logit error $\varepsilon_{ijst+1}$:
\begin{equation}\label{eq:utfirstpurch}
\begin{array}{l l l}
\mbox{Product } k = j & : v_{ikst+1}(\gamma) = & \gamma + \xi_{is} + \alpha_i (p_{ikst+1}-\alpha_{ic} c_{ikt+1}) + \eta_i y_{ikt} + \delta EV(\Sigma_{it+2};\theta_i(\gamma),\theta) \\
\mbox{Product } k \neq j & : v_{ikst+1}(\gamma) = &
\Gamma_{ik}(S_{ikt},y_{ikt}) + \xi_{is} + \alpha_i (p_{ikst+1}-\alpha_{ic} c_{ikt+1}) + \eta_i y_{ikt} \\
& \quad & + \delta EV(\Sigma_{it+2};\theta_i(\gamma),\theta). \\
\end{array}
\end{equation}
Consumer $i$'s expected value function in purchase event $t+1$, at
her first purchase of product $j$ ($S_{ijt}=0$ and $y_{ijt}=1$) will
be
\begin{equation}\label{eq:vffirstpurch}
\begin{split}
& E V(\Sigma_{it+1};\theta_i,\theta) \\
& = E_{c_{it+1}} E_{p_{it+1}|p_{it}} E_{J_{it+1}|J_{it}} \left[
\int_{\gamma_{ij}} \ln \left( \sum_{(k,s) \in J_{it+1}}
\exp(v_{ikst+1}(\gamma_{ij})) \right)
f(\gamma_{ij}|\gamma_{ij}^0,\sigma_{ij}^2)\,d \gamma_{ij} \right].
\\
\end{split}
\end{equation}
When the consumer has purchased product $j$ in the past, such as at
state space points $S_{ijt}=1$ and $y_{ijt}=1$ or $S_{ijt}=1$ and
$y_{ijt}=0$, the value function will be defined similarly, but will
be simpler: the consumer's utility for all products given in
Equation \eqref{eq:utfirstpurch} will be a function of the true
taste $\gamma_{ij}$ rather than $\gamma$ and the value function in
\eqref{eq:vffirstpurch} will not include the integral over $\gamma$.
Note that even if consumer $i$ knows her true taste for all three new
products ($S_{ijt}=1$ for all these products), there will still be
dynamics in demand arising from $\eta_i$. The consumer will take
into account the fact that her purchase today will change $y_{ijt}$,
and affect her utility in period $t+1$.
\subsection{Model Identification}\label{sec:ident}
I will explain the identification of the model in two steps. For
simplicity, assume that we are examining a market with one new
product introduction, similar to the market analyzed with the simple
model in Section 3. Assume further that we see each consumer for a
long period of time. Although the estimation procedure I am using is
likelihood-based, for brevity I will discuss it in the context of
method of moments estimation. Thus, I will consider which moments in
the data will be necessary to solve for the model's parameters. This
is sufficient to show that the likelihood-based estimates are
identified since the likelihood-based estimator, if it is correctly
specified, is consistent and will converge to the same value as the
method of moments estimator.
First, consider the period after most or all of the learning has
occurred. In the long run, there will be no learning: since the
distribution of the idiosyncratic error, $\varepsilon_{ijst}$, has
infinite support, at some point in time everyone in the market will
purchase the new product at least once. After every consumer has
experimented with the new product, the only dynamics left in demand
will be the switching costs or variety-seeking captured by the
$\eta_i$'s. At this point we are left with separately identifying
the distribution of $\eta_i$'s and the distribution of the
``non-dynamic'' coefficients in the consumer's flow utility: consumer
tastes for established products, consumer price sensitivities, and
the distribution of the coefficients for the $x_{ijt}$'s, the
$\beta_i$'s.
Consider first the task of identifying $\eta_i$ for an individual
consumer. The $\eta_i$ causes state dependence in her demand: a
consumer's choice in purchase event $t-1$ will affect her choice
today. \citeN{chamberlain85} has argued that state dependence can be
identified through the effect of previous exogenous variables on
today's purchase probabilities. As an example, consider the effect
of a price cut for Tide in purchase event $t-1$ on the probability
of consumer $i$ purchasing Tide in purchase event $t$. If the price
cut has no effect on this probability, then $\eta_i=0$. If the price
cut increases the probability that the consumer purchases Tide in
purchase event $t$, then $\eta_i>0$ and the consumer has a switching
cost. If the price cut decreases the probability of the consumer
purchasing Tide in purchase event $t$, then $\eta_i<0$ and the
consumer is a variety-seeker. If we observe consumer $i$ for a long
period of time, and there is variation in the time series path of
prices the consumer observes, then it should be possible to infer
the size of the consumer's $\eta_i$. In the data, many consumers are
not observed long enough to accurately estimate an individual
$\eta_i$; identification is made easier by the fact that $\eta_i$ is
assumed to be a function only of household demographics.
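This lagged-price logic can be illustrated with a toy two-product simulation, in which all coefficient values are mine and purely for illustration: with $\eta > 0$, a price cut in event $t-1$ raises the probability of repurchase in event $t$, while with $\eta = 0$ it has no effect.

```python
import numpy as np

rng = np.random.default_rng(2)

def repurchase_prob(eta, lag_price_cut, n=200_000):
    """Share choosing product 0 in event t, in a toy two-product logit
    where a price cut for product 0 may occur in event t-1 and eta
    rewards repeating the previous choice. Illustrative numbers only."""
    alpha = -30.0                          # price sensitivity
    p_lag = 0.05 - (0.01 if lag_price_cut else 0.0)
    # event t-1: choose product 0 vs product 1 at lagged prices
    eps = rng.gumbel(size=(n, 2))
    chose0 = alpha * p_lag + eps[:, 0] > alpha * 0.05 + eps[:, 1]
    # event t: equal prices; only the state-dependence term differs
    eps = rng.gumbel(size=(n, 2))
    v0 = alpha * 0.05 + eta * chose0 + eps[:, 0]
    v1 = alpha * 0.05 + eta * (~chose0) + eps[:, 1]
    return float(np.mean(v0 > v1))

with_cut = repurchase_prob(1.0, True)
no_cut = repurchase_prob(1.0, False)
```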
% given that I assume that
%$\eta_i$ is normally distributed across the population, we simply
%need to identify its mean and variance. Since in the data I observe
%consumers for at least 5 purchase events, and since there is
%substantial time-series price variation, there should be enough
%variation in previous prices to identify the mean and variance of
%the $\eta_i$'s.
Once the $\eta_i$ distribution has been identified, we are left with
identifying the heterogeneity of the non-dynamic coefficients in the
consumer's flow utility. Identification of this part of consumer
heterogeneity is straightforward and will come through the effect of
variation in purchase event $t$ exogenous variables on purchase
event $t$ purchase probabilities.
Now consider the periods right after the new product introduction,
when we will need to identify $\sigma_{ij}^2$ and $\gamma_{ij}^0$
for the new product $j$. In my model I allow these parameters to
vary across the population, but to get a feel for identification it
is easier to start with the case where there is no heterogeneity.
Hence, for the next few paragraphs I will drop the $i$ subscript.
First, consider the identification of $\sigma_j^2$. In my thesis
research \cite{osborne06}, I solve a simplified version of the
econometric model and simulate it numerically under different values
of $\sigma_j^2$. In the simplified model there is one new product
introduction and one established product, and I assume that
$\eta_i$ is fixed across the population. Prices for each product are
constant over time and there is no advertising, so product switching
will be induced by variation in the error term. I simulate the model
when $\sigma_j^2=0$, when it is positive, and for different values
of $\eta_i$. Doing this allows me to find the model's testable
implications, where the null hypothesis is $\sigma_j^2=0$ and the
alternative is $\sigma_j^2>0$.
There are two testable implications of this model, which are examined
in \citeN{osborne06}; that work finds support for them in the same laundry
detergent scanner data used in this paper. The test
statistics associated with them are shares of consumers who take
actions at certain times, controlling for any time-series variation
in prices. The first implication is that, under the maintained
hypothesis that $\delta$ is high and $\eta_i = 0$ $\forall i$, in
the first two periods after the new product's introduction, the
share of consumers who purchase the new product and then do not is
greater than the share who do not and then do. This is due to the
fact that when there is learning and $\eta_i=0$, there will be a
positive option value of learning which induces consumers to
purchase the new product sooner rather than later.\footnote{Since
evidence in favor of this implication is found in the data set I use
\cite{osborne06}, it is reasonable to conclude that for some new
products the option value of learning is positive, and that
consumers are forward-looking.} When there is no learning, the test
statistic will be zero since the order of purchase does not matter.
This share difference is an increasing function of $\sigma_j^2$,
because the option value of learning induces consumers to purchase
the new product sooner rather than later, and the option value of
learning is increasing in $\sigma_j^2$. If this share difference is
greater in the data than the model would predict at $\sigma_j^2=0$,
then $\sigma_j^2$ will pick up that difference.
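As an illustration, the first test statistic can be computed directly from households' first two post-introduction purchases. The following Python sketch uses purely hypothetical purchase sequences (1 = the new product, 0 = anything else); it is not the paper's code:

```python
# Sketch of the share-difference test statistic (illustrative only). Each
# household contributes its first two purchases after the new product's
# introduction; 1 = new product, 0 = any other product.
def share_difference(first_two_purchases):
    """Share buying the new product then not, minus share not then buying."""
    n = len(first_two_purchases)
    buy_then_not = sum(1 for a, b in first_two_purchases if a == 1 and b == 0)
    not_then_buy = sum(1 for a, b in first_two_purchases if a == 0 and b == 1)
    return (buy_then_not - not_then_buy) / n

# Hypothetical data: under learning, (1,0) sequences outnumber (0,1) ones.
seqs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 20 + [(0, 0)] * 40
print(share_difference(seqs))  # 0.2
```

Under the null of no learning the (1,0) and (0,1) shares coincide and the statistic is zero.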
The second testable implication is that for any value of the
discount factor and for any value of $\eta_i$, among consumers whose
previous purchase was the new product, the share of consumers who
repurchase the product increases over time if $\sigma_j^2>0$. This is
because initially the consumers whose previous purchase was the new
product consist mostly of consumers who are experimenting; later it
consists mostly of consumers who like the new product. The share of
consumers who repurchase the new product is an increasing function
of the population variance in tastes for the new product.
Immediately following the new product introduction, this share will
reflect the population variance in expected tastes, the
$\gamma_{ij}^0$'s (which for the moment we have assumed to have zero
variance). As consumers learn, the population variance in tastes
will be increased by $\sigma_j^2$. Since consumers' taste draws will
be taken from more extreme ends of the taste distribution, those who
purchase the new product will tend to have higher taste draws after
the learning has occurred and will be more likely to repurchase it.
An increase in $\sigma_j^2$ will increase the share of consumers who
repurchase the new product in periods after all learning has
occurred. Hence, $\sigma_j^2$ can also be identified from the
difference between the share of consumers who repurchase the new
product immediately following the new product introduction and the
share of consumers who repurchase the new product after all learning
has occurred: the greater this difference, the greater is
$\sigma_j^2$.
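The selection logic behind this second statistic can be illustrated with a small Monte Carlo that is not the paper's model: suppose post-learning tastes are $\gamma^0 + \sigma Z$ with $Z$ standard normal, and a consumer buys the new product with logit probability $p(\gamma)$. Among last period's buyers, the repurchase share is $E[p^2]/E[p]$, which rises with $\sigma$ because previous buyers are selected toward high taste draws:

```python
import math
import random

# Illustrative Monte Carlo (not the paper's model): after learning, a
# consumer's taste is gamma0 + sigma * Z, and she buys the new product with
# logit probability p(gamma). Among last period's buyers, the repurchase
# share is E[p^2] / E[p], which grows with sigma via selection on tastes.
def repurchase_share(sigma, gamma0=0.0, n=200_000, seed=0):
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        gamma = gamma0 + sigma * rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-gamma))  # prob. of buying the new product
        num += p * p                        # bought last period and again now
        den += p                            # bought last period
    return num / den

# With sigma = 0 all consumers are identical, so the repurchase share equals
# the unconditional purchase probability; with sigma > 0 it is higher.
print(repurchase_share(0.0), repurchase_share(2.0))
```

The monotonicity in $\sigma$ is what lets the change in the repurchase share identify $\sigma_j^2$.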
%The third testable implication of the model is that consumers will
%purchase smaller sizes of the new product on their first purchase
%relative to later purchases. To derive this implication, I expanded
%the model of learning and switching costs to allow for inventory
%behavior. The structural model that is used in this paper does not
%include inventory behavior, so this implication does not help with
%the identification. However, I find support for the implication in
%the data, which supports the importance of learning.
The identification of the mean of $\gamma_j^0$ and its variance is
straightforward when $\sigma_j^2$ is constant across the population.
First, note that in the period after the learning occurs, we can
identify the distribution of true tastes for the new product. The
mean of the population distribution of tastes for the new product
will be the same as the mean of the $\gamma_i^0$ distribution. The
variance in the distribution of true tastes for the new product is
the variance of $\gamma_j^0$ plus $\sigma_j^2$. We can identify
$\sigma_j^2$ using the share difference moment or from the change in
the share of consumers who repurchase the new product. The variance
of $\gamma_j^0$ will simply be the variance in the population
distribution of true tastes minus $\sigma_j^2$.
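In terms of moments, this recovery is a simple subtraction. The numbers below are hypothetical and serve only to make the arithmetic concrete:

```python
# Hypothetical numbers illustrating the moment recovery described above.
# After learning, a consumer's true taste is gamma0_i + nu_ij with
# Var(nu_ij) = sigma_j^2, so the post-learning taste distribution has the
# same mean as gamma0 and variance Var(gamma0) + sigma_j^2.
mean_true_tastes = 1.2  # mean of post-learning tastes (hypothetical)
var_true_tastes = 0.9   # variance of post-learning tastes (hypothetical)
sigma2 = 0.4            # identified from the two test-statistic moments

mean_gamma0 = mean_true_tastes           # the means coincide
var_gamma0 = var_true_tastes - sigma2    # subtract the learning variance
print(mean_gamma0, var_gamma0)
```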
Last, I will relax the assumption that $\sigma_j^2$ is constant
across the population. Separating out heterogeneity in
$\sigma_{ij}^2$ and $\gamma_{i0}$ is a harder task, because we only
observe consumers making one first purchase of each new product. To
see why this could be a problem, let us reconsider the simplified
model which was just discussed, and assume that there are only 3
periods in it, the distribution of the error term for product 2 is
standard normal, $\eta_i$ is 0 for all consumers, and consumers
discount the future with a factor of 0.95. Figure \ref{fig:prexper1}
shows the probability of experimenting with the new product in
period 1 for different values of $\gamma_{i0}$ and $\sigma^2$. I
assume that the price of product 2 is zero in the first two periods
and 1 in the third period, that consumer price sensitivities are equal
to 1, and that consumers know the path of future prices. The lines
on the bottom of Figure \ref{fig:prexper1} are the level curves of
the probability of experimenting. The set of consumers who are on a
level curve are the set of consumers who have the same probability
of experimenting for certain values of $\gamma_i^0$ and $\sigma^2$.
We can see that if we take a consumer with some value of
$\gamma_{i0}$ and a low $\sigma^2$, a consumer with a higher
$\sigma^2$ and a somewhat lower $\gamma_{i0}$ will have the same
probability of experimenting. There needs to be some way to tell
apart these consumers in order to separately identify the population
distributions of $\gamma_i^0$ and $\sigma_{ij}^2$.
The intuition behind how this is done is that $\sigma_{ij}^2$ will
affect how consumers respond to changes in future prices. In the
case where the $\eta_i=0$, consumers with values of $\sigma_{ij}^2$
close to zero will be very unresponsive to future price changes
compared with consumers who have higher values of $\sigma_{ij}^2$.
In Figure \ref{fig:prexperdiff}, I show the probability of
experimenting when the price path is (0, 0, 1) minus the
probability of experimenting when the price path is (0, 1, 1).
These probabilities are very close when $\sigma_{ij}^2$ is close to
zero, but the difference is large when $\sigma_{ij}^2$ is
greater. The direction of the
response depends on the value of $\gamma_{i0}$: it is positive for
consumers with low $\gamma_{i0}$'s and negative for those with high
$\gamma_{i0}$'s.\footnote{In my thesis work, I demonstrate that the
option value of learning can be increasing or decreasing in future
prices.}
Figure \ref{fig:prexperlevel} shows an overlay of the level curves
from Figures \ref{fig:prexper1} and \ref{fig:prexperdiff}. The level
curves from Figure \ref{fig:prexper1} are shown in solid lines, and
the probability associated with the given level is labeled. The
level curves from Figure \ref{fig:prexperdiff} are shown in the
dotted lines. They are not labeled, but the important thing to
notice about them is that the level curve for 0 is the curve
that starts in between $\gamma_i^0=0.5$ and $\gamma_i^0=1$ on the
$\gamma_i^0$ axis. In the area below that line, the level curves
increase as $\sigma^2$ increases, and above that they decrease. In
general, it appears that for a given value of the probability of
experimentation, the response to future price changes increases as
we increase $\sigma^2$. As an example, consider the set of consumers
for which the probability of experimenting with the new product is
0.4, which is shown by the line 0.4 in Figure
\ref{fig:prexperlevel}. We can see that as $\sigma^2$ increases, the
response of these consumers to future price changes also increases.
Thus, even though all these consumers will have the same probability
of experimenting, we can tell apart consumers with high $\sigma^2$'s
and low $\sigma^2$'s by their response to future price changes.
Unfortunately, there are some consumers we cannot tell apart by
their response to future prices. For example, take the set of
consumers whose probability of experimenting with the new product is
0.8. The level curve for which the response to future prices is 0
crosses this line twice, and we cannot tell apart the consumers at
the points where it crosses. In my thesis research \cite{osborne06}
I argue that the set of consumers that we cannot tell apart by this
method is of zero measure.
In the data set we observe variation in future prices that is
qualitatively similar to this example; the reason for this is that
there is introductory pricing for the three new products. Some
consumers will arrive in the market right after the new product
introduction, when future prices will be low, and some will arrive
near the end of the introductory pricing period, when future prices
will be high. Increasing the variance in $\sigma_{ij}^2$ will make
some consumers more responsive to future price changes, and some
consumers less responsive to future price changes. The population
response to a change in price will be a function of the moments of
the distributions of $\gamma_{i0}$ and $\sigma_{ij}^2$.
Unfortunately, I have not been able to find a population moment that
is a monotonic function of the variance of $\sigma_{ij}^2$. However,
I believe that the arguments presented so far, which demonstrate
that households with different $\gamma_{i0}$'s and $\sigma_{ij}^2$'s
behave differently, suggest that identification of these parameters
is possible. Moreover, as I will discuss in Section
\ref{sec:exlearning}, my model estimation procedure produces an
empirical density for the variance of $\sigma_{ij}^2$ for the new
products, conditional on the data. I show in that section that the
density has a small variance and thin tails, which suggests that
there is enough information in the data to identify the parameters
in question.
\section{Estimation Procedure}\label{sec:est}
\subsection{Selection of Household
Sample}\label{sec:sampleselection}
Although there are 1693 households in the total sample, I remove
roughly two thirds of them from the sample before estimation,
leaving a final sample of 519 households. There are two main reasons
for reducing the sample so much: first, and most importantly,
computational burden, and second, data issues. The full structural
model takes more than a week to estimate, so many of the households
I removed were dropped from the sample at random. Some
households were dropped due to data issues, such as lack of
observations. More than 80 percent of the dropped households were
removed for these reasons. The other households who were removed
from the sample were taken out for a third reason: they were
households who were thought to be prone to inventory behavior. There
are two reasons for removing these types of households: first,
modeling inventory behavior structurally is computationally
difficult (see \citeN{erdemimaikeane03} for an example), and adding
this element to my model of learning and switching costs would make
the model computationally intractable. Second, stockpiling behavior
can potentially lead to a source of bias that is similar to that
caused by ignoring heterogeneity in price sensitivities. Since new
products are introduced at low initial prices, some consumers may be
induced to purchase them simply in order to stockpile. These
consumers will likely purchase something else when the new products
are more expensive and they need to buy detergent again.
Household sample selection is performed as follows. The full sample
includes 1693 households. First, I remove households who are likely
to be extremely price sensitive, and who will stockpile the
products. \citeN{hendelnevo05} examine stockpiling behavior in
laundry detergents, and find that suburban families, non-white
families and households with 5 or more members are more price
sensitive.\footnote{The data set used in that paper is different
from the one I use.} In my data set, it was not possible to
determine whether a household was suburban or not. Data was
collected on the type of residence where the household lived, and
about 85\% of households live in a single-family home, and 94\% of
these households owned the home rather than rented it. If all these
families were suburban, it would not be feasible to cut them out.
Household racial characteristics were collected in the data set. I
constructed a household race variable which was defined to be the
race of the male head of household if there was a male head of
household, and the female head of household otherwise. I found that
over 99\% of households were white. For household size, there were
211 households with 5 or more people in them. I remove all these
households from the sample.
In my thesis research \cite{osborne06}, I examine whether 5 person
households were indeed more likely to stockpile than other
households. First, I examine time series patterns in the quantities
purchased by households. Households that do not stockpile will
likely be less responsive to sales in stores, and will be more
likely to repeatedly purchase the quantity that suits the
household's needs. To examine this, for each household in the sample
I calculate the standard deviation of the quantity purchased by the
household over the household's purchase history. I find that the
standard deviation of quantity purchased by 5 person households is
larger than for smaller households, and this difference is
statistically significant.
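The dispersion statistic just described is simple to compute; the sketch below uses made-up purchase histories (in ounces), not the scanner data:

```python
import statistics

# Sketch of the first stockpiling diagnostic, on made-up purchase histories:
# for each household, the standard deviation of quantity purchased (ounces)
# over its purchase history. Sale-driven stockpilers should show more
# dispersion than households that repeatedly buy the size that suits them.
purchase_histories = {
    "steady_hh": [64, 64, 64, 64],      # repeat buyer: low dispersion
    "sale_hh": [32, 128, 64, 256, 32],  # sale-driven: high dispersion
}

quantity_sd = {hh: statistics.stdev(q) for hh, q in purchase_histories.items()}
print(quantity_sd["steady_hh"], round(quantity_sd["sale_hh"], 1))
```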
Another way to tell if these households are stockpiling is to look
at how much they buy in response to sales in the detergent product
category. We would expect households who stockpile to purchase
larger quantities in response to store price drops. To examine this,
I regress the quantity purchased by a household during a given week
on the minimum store price in the laundry detergent category that
week, the price interacted with a dummy variable for whether the
household has 5 or more persons in it, the next week's price,
feature, display and household inventory. Household fixed effects
were also included in the regression. The coefficient on the
interaction between price and household size of 5 is negative and
significant.
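A minimal sketch of this kind of fixed-effects regression, on synthetic data and with a reduced regressor list (the paper's specification also includes next week's price, feature, display, and inventory): household fixed effects are absorbed by demeaning each variable within household before running OLS.

```python
import numpy as np

# Within (fixed-effects) regression sketch on synthetic data: quantity
# purchased regressed on the weekly minimum category price and on price
# interacted with a large-household (5+) dummy, demeaning within household
# to absorb household fixed effects. Illustrative only.
rng = np.random.default_rng(0)
n_households, n_weeks = 50, 40
hh = np.repeat(np.arange(n_households), n_weeks)
large = (np.arange(n_households) < 25).astype(float)[hh]  # first 25 are 5+
price = rng.uniform(2.0, 5.0, size=hh.size)
alpha = rng.normal(0.0, 1.0, size=n_households)[hh]       # household effects
# Assumed data-generating process: large households respond more to price.
quantity = 10.0 + alpha - 1.0 * price - 0.5 * large * price \
    + rng.normal(0.0, 0.5, size=hh.size)

def within(v):
    """Demean a variable within household (absorbs the fixed effects)."""
    means = np.bincount(hh, weights=v) / np.bincount(hh)
    return v - means[hh]

X = np.column_stack([within(price), within(large * price)])
y = within(quantity)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)  # roughly [-1.0, -0.5]; the interaction term is negative
```

A negative interaction coefficient, as in the paper's result, indicates that large households buy disproportionately more when prices drop.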
A third way to tell if 5 person households are stockpiling is that they
should be more likely to make a purchase in the laundry detergent
category in response to a store price drop. In the data set,
household shopping trips were recorded, so it was possible to tell
if a household visited a store in a given week but did not buy
anything. To estimate a household's sensitivity to store price
drops, for each household in my data set I estimate a binary logit
model where the dependent variable is 1 if the household makes a
purchase in a given week, and the independent variables are the
minimum price observed in the store during the current week, the
minimum price in the next week, a measure of the household's
inventory, and two dummy variables for whether any products were on
feature and display during the current week. I find that households
with 5 or more people in them have lower price coefficients, but the
difference is not large or statistically significant.
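A stripped-down version of such a household-level purchase-incidence logit, with a single regressor (the weekly minimum category price) and fit by gradient ascent on the log-likelihood, can be sketched as follows. The data and coefficients are synthetic; the paper's specification also includes next week's price, inventory, and feature/display dummies.

```python
import numpy as np

# Illustrative per-household logit: Pr(purchase) = logistic(X @ beta), fit by
# gradient ascent on the log-likelihood. Synthetic data, reduced regressors.
def fit_logit(X, y, steps=20_000, lr=0.2):
    """Return beta maximizing the likelihood of Pr(y=1) = logistic(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)   # average score step
    return beta

rng = np.random.default_rng(1)
price = rng.uniform(2.0, 5.0, size=500)
X = np.column_stack([np.ones_like(price), price])
true_beta = np.array([4.0, -1.5])             # purchases fall as price rises
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=500) < p).astype(float)
print(fit_logit(X, y))  # intercept positive, price coefficient negative
```

Comparing the estimated price coefficients across household-size groups is the basis of the test described above.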
I also examine a number of different statistics to verify that the
households I removed actually had the potential to impart the
positive bias described above. I examine this using the two testable
implications which were described in Section \ref{sec:ident}. Recall
that the first testable implication was that the share of consumers
who purchase the new product and then do not minus the share who do
not and then do is increasing in the amount of learning. If 5
person households are imparting positive bias to the sample, then
including them should make this test statistic larger in magnitude.
I computed the test statistic for both the full sample and the
sample of households with fewer than 5 members, and I find that this
is the case. The second testable implication can also be used to examine
this. The share of consumers who repurchase the new product should
increase more if there is more learning. The results of this test
are not sensitive to the choice of sample.
A potential problem with this procedure is that there may still be
significant stockpiling in the sample that is left. All households
likely stockpile to some degree, so it may be difficult to remove
this source of bias entirely. To examine whether this bias still
exists in the sample I have left, I examine whether there is less
learning among households who are less likely to stockpile. In
particular, I compute the share difference test statistic for
households that have only 1 person, or are in the highest income
bracket. I find that the share difference test statistic for smaller
and higher income households does not vary systematically across
products. Another problem with selecting out 5 person households is
that this demographic variable may be correlated with consumer
tastes, leading to sample selection bias. In my estimates of reduced
form demand models, I find that taste coefficients, coefficients on
the previous product purchase, and coefficients on prices do not
change very much when I remove 5 person households from the sample.
Thus, the impact of removing these households on these quantities is
likely to be small.
Removing 5 person households might not be the only method of
selecting a sample of households that are not prone to stockpiling.
Another way is to remove households who are sensitive to price drops
in the laundry detergent category. Recall that I estimated binary
logit models for each household in the sample to examine whether 5
person households were more sensitive to category price drops than
smaller households. An alternative sample selection criterion I
examine is to remove households who have estimated price
coefficients less than 1 standard deviation below the population
mean price coefficient. There are 148 such households. I find that
the share difference test statistic decreases when these households
are removed, which suggests they may also be subject to the problem
of bias. The tenure test is not impacted by changing the sample. A
disadvantage of this selection criterion is that it is less
transparent than removing 5 person households. The estimates of
reduced form demand models produced using this selection criterion
look similar to those produced by removing 5 person households, so
it will probably not matter which one is used.
Another possible selection criterion is to examine household
interpurchase timing behavior. Households who do not stockpile
likely wait until they run out of detergent to purchase more of it.
To examine this, for each household I compute a consumption rate,
which is the total number of ounces the household purchased in the
three year period, divided by the number of weeks. Then, at each
purchase event I predict the week when the household will use up the
amount they purchased on that occasion, assuming a constant
consumption rate. I compute a measure of the error of this
prediction on a household by household basis by taking the average
of the absolute difference between the predicted week of the next
purchase and the actual week of the next purchase. I remove
households whose error is larger than the population mean of the
error, which was 7.43. There are 425 such households. I find that
when I use this sample selection criterion to estimate the share
difference test, the share difference increases. This is the
opposite of what should happen if these households are indeed
stockpilers. The tenure test is not affected by this sample. I also
find that households with more predictable purchase behavior are
more sensitive to category price drops, and that they tend to be
larger households. This suggests that error in predicted
interpurchase timing is not a good proxy for how likely households
are to stockpile.
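The interpurchase-timing diagnostic can be sketched for a single hypothetical household: a constant consumption rate predicts the week each purchase runs out, and the diagnostic is the mean absolute gap between predicted and actual next-purchase weeks.

```python
# Sketch of the interpurchase-timing diagnostic (hypothetical household):
# a constant consumption rate (total ounces / total weeks) predicts when
# each purchase runs out; the error measure is the mean absolute difference
# between the predicted and actual week of the next purchase.
def timing_error(weeks, ounces, total_weeks):
    rate = sum(ounces) / total_weeks              # ounces consumed per week
    errors = []
    for i in range(len(weeks) - 1):
        predicted_next = weeks[i] + ounces[i] / rate
        errors.append(abs(predicted_next - weeks[i + 1]))
    return sum(errors) / len(errors)

# A household that buys 64 oz every 8 weeks over 32 weeks is perfectly
# predictable, so its error is zero.
print(timing_error([0, 8, 16, 24], [64, 64, 64, 64], 32))  # 0.0
```

Households with a large value of this error were the candidates for removal under the criterion discussed above.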
After removing 5 person households, I remove households with
apparent data problems. Households that made more than 50
purchases in the three year period were removed. Also, any
households who made less than 5 purchases were removed from the
sample. It is probable that these households were purchasing at
stores which were not included in the survey, and I felt it was best
to remove them since we might not be observing many of their
purchases. 188 households were removed for these reasons.
In addition to removing households from the sample, some individual
purchase events needed to be removed from the sample. Recall that in
my model, an observation is a household purchase of one of the 49
possible brand-size combinations. Any purchase events of products
outside these categories were removed from the sample. Purchase
events where the household purchased different products in the same
week were also removed. Purchase events where the only product
available in the store was the product purchased were also removed
from the sample. Any households whose first purchase of one of the
new products was removed for these reasons were dropped from the
sample. Also, for the share differences test to work (and for the
identification arguments laid out later in the paper to apply), we
need to observe a household's first and second purchase events after
the new product introduction. Any households for whom we do not see
at least 2 purchase events in either the window of time after
Cheer's introduction and before Surf, or the time after Surf and
before Dash, or the time after Dash, are removed from the sample.
238 households were dropped for these reasons. Of the households who
were kept, 12.6\% of purchases were removed, about half of those
being multiple brand purchases. Of the 1066 households remaining,
roughly half were dropped at random to ease the computational
burden, leaving a final sample of 519 households.
\subsection{Coupon Parameters}\label{sec:coupons}
Before discussing the estimation procedure in detail, I wish to
address an issue that arises in estimation due to the inclusion of
coupons. In my model, I assume that the price of a product $j$ to a
consumer is the shelf price, $p_{ijst}$, minus the value of a coupon
$c_{ijt}$. Coupons present an estimation difficulty: in my data set,
I only observe whether a consumer has a coupon for the particular
product that she purchases in a given purchase event. We do not
observe whether the consumer has a coupon for any other products at
that time. I overcome this problem by treating any coupons for
products that the consumer did not choose as unobservables.
I assume that for each purchase event every coupon $c_{ijt}$ for a
non-purchased product (one for which $y_{ijt}=0$) received by the
consumer is drawn from the same distribution as consumer
expectations about future coupons that is described in Section
\ref{sec:dynopt}; hence, consumer expectations about future coupons
are rational. To summarize the notation developed in that section,
recall that the $c_{ijt}$ for a non-purchased product is composed of
two random variables, the binary random variable
$\overline{c}_{ijt}$ which is 1 if the consumer receives a coupon
for product $j$, and $v_{ijt}$, which is the value of the coupon
received. Then the variable $c_{ijt}$ is equal to
$\overline{c}_{ijt}v_{ijt}$, and the vector of population-fixed
parameters, $\theta$, contains the parameters $p_{cj}^0$, $p_{cj}^1$
for Cheer, Surf and Dash, and $p_{c}^{Cheer,1}$, $p_{c}^{Surf,1}$,
and $p_{c}^{Dash,1}$.
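The coupon process just summarized, $c_{ijt}=\overline{c}_{ijt}v_{ijt}$, can be sketched as a two-stage draw. The receipt probability and the coupon-value distribution below are hypothetical, not estimates from the data:

```python
import random

# Sketch of the coupon process described above: for a non-purchased product,
# the coupon value c_ijt equals cbar_ijt * v_ijt, where cbar_ijt is Bernoulli
# (a coupon is received or not) and v_ijt is a draw from a discrete
# distribution of coupon values. All probabilities/values are hypothetical.
def draw_coupon(rng, p_receive, values, probs):
    if rng.random() >= p_receive:                  # cbar_ijt = 0: no coupon
        return 0.0
    return rng.choices(values, weights=probs)[0]   # v_ijt: coupon value

rng = random.Random(0)
draws = [draw_coupon(rng, 0.3, [0.50, 0.60, 1.00], [0.5, 0.3, 0.2])
         for _ in range(100_000)]
share_with_coupon = sum(d > 0 for d in draws) / len(draws)
print(round(share_with_coupon, 2))  # close to 0.30
```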
%I assume that for each purchase event every coupon $c_{ijt}$ for a
%non-purchased product (one for which $y_{ijt}=0$) received by the
%consumer is drawn from a discrete distribution. Each of these
%$c_{ijt}$'s is composed from two Bernoulli random variables and a
%multinomial random variable. The first Bernoulli random variable,
%which I denote $z_{it}$, is 1 if the consumer gets any coupons at
%all in purchase event $t$ (this is automatically 1 if the consumer
%uses a coupon in her purchase), which occurs with probability $p_z$.
%The probability of getting a coupon for a nonpurchased product in
%purchase $t$ is also assumed to be Bernoulli, with probability
%$p_{cj}$\footnote{Note that these probabilities do not vary across
%the population. I have experimented with relaxing this assumption in
%static logit models by allowing the probabilities to be functions of
%normal random variables, but have had difficulty estimating the
%variances of the underlying hyperparameters.}. I denote this
%variable as $\overline{c}_{ijt}$. If a consumer receives a coupon
%for product $j$, the value of that coupon, which I denote as
%$v_{ijt}$, is multinomial and drawn from the empirical density of
%coupon values. Coupon values are clustered at certain numbers (such
%as 50 cents, 60 cents, or 1 dollar), so I calculate the probability
%of getting a particular coupon value for a particular brand in a
%given calendar month by tabulating the number of redeemed coupons of
%that value for that brand in that month, and dividing by the total
%number of redeemed coupons for that product in that month. In
%summary, for consumer $i$, purchase event $t$, and product $j$ where
%$y_{ijt}=0$, the value of the coupon $c_{ijt}$ received by the
%consumer for product $j$ is
%$c_{ijt}=z_{it}\overline{c}_{ijt}v_{ijt}$.
This specification is a first approximation to solving the problem
of unobserved coupons and represents a step forward from most papers
that estimate discrete choice dynamic programming problems. The
procedure I use is similar to \citeN{erdemkeanesun99}, who also
propose a discrete distribution for the probability a consumer has a
coupon on hand for a non-purchased product, and estimate the
parameters of the distribution. Note that there is more than one
explanation for why a consumer might have or not have a coupon on
hand for a non-purchased product. It could be that no coupon was
available for the product, or it could be that a coupon was
available but the consumer found it too costly to search for it and
cut it out. The scanner data does not contain information on coupon
availability and how likely a consumer was to search for coupons, so
there is no way to separate these explanations. There is also a
subtle endogeneity issue that could arise with coupon use: consumers
could be more likely to search for coupons for products for which
they have high tastes. I do not take this source of endogeneity into
account, and to my knowledge this problem has not been addressed in
scanner data research.
A more subtle issue with estimating the coupon parameters is that
it may be difficult to separately identify $p_{cj}^1$, the amount by
which the probability of getting a coupon for a new product differs
during its introductory period, from the learning parameters.
To see why, recall that introductory pricing can cause patterns in
purchase behavior which look like learning. Introductory couponing
may also have the same effect: if a lot of coupons for one of the
new products are available right after its introduction, consumers
will be induced to purchase the new product sooner rather than
later, which will look like learning. Obviously, if we observe the
entire distribution of coupon availability then there will be no
identification problem - we can treat coupons just like prices.
Since we are estimating the probability a consumer gets a coupon for
a new product, it may be difficult to tell whether or not consumers
are likely to make an initial purchase of the new product because
the option value of learning is high, or because the likelihood they
have a coupon for it is high.\footnote{Further, if $n_t=1$, raising
the probability a consumer gets future coupons will raise the value
of purchasing the new product when there is no learning and only
switching costs.}
There are three things that help the identification. First, for some
consumers the first purchase events after the new product
introduction will occur when $n_t=0$. Given that the coupon
probabilities when $n_t=0$ can be estimated from the period when
most consumers have learned, if the probability of making a first
purchase of the new product when $n_t=0$ is higher than it should
be, then that difference will pin down $\sigma_{ij}^2$. Second, some
consumers will experiment with the new product when $n_t=1$, and
will make a second purchase when $n_t=1$. For these consumers, their
purchases will be pinned down by parameters we have already
estimated - the state dependence and taste parameters. Hence, if the
likelihood of them purchasing the new product is higher than it
should be, this will raise the probability that they got coupons for
the new product. Third, since we observe coupon use for a product
when consumers purchase it, the probability of receiving a coupon
for the product will be bounded. As an example, suppose that during
Cheer's introductory period 10 percent of all purchases involve a
Cheer coupon, and 50 percent of Cheer purchases involve a coupon.
The probability of receiving a coupon for Cheer is not likely to be
lower than 10 percent, and not likely to be higher than 50 percent,
since 50 percent of the consumers who purchased Cheer did not have
(or use) a coupon for it.
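The bounding arithmetic in this example is straightforward; the purchase counts below are made up to match the 10 percent and 50 percent shares in the text:

```python
# Bounding the coupon-receipt probability, with made-up counts matching the
# example in the text: 10% of all purchases involve a Cheer coupon, and 50%
# of Cheer purchases involve one.
all_purchases = 1000
cheer_purchases = 200
purchases_with_cheer_coupon = 100   # every Cheer-coupon use is a Cheer purchase

lower = purchases_with_cheer_coupon / all_purchases      # 0.1
upper = purchases_with_cheer_coupon / cheer_purchases    # 0.5
print(lower, upper)  # 0.1 0.5
```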
\subsection{The Markov Chain Monte Carlo Estimator}
I estimate the structural model described in the previous section
using Markov Chain Monte Carlo, which is abbreviated as MCMC. MCMC
methods are Bayesian methods, which differ from classical methods in
that they do not involve maximizing or minimizing a function. In
models with high dimensional unobserved heterogeneity, like the one
I have specified, maximization of a likelihood function can be
numerically difficult. Bayesian procedures proceed differently: the
researcher must specify a prior on the model parameters and then
repeatedly draw new parameters from their posterior distribution
conditional on the observed data.
Drawing from the posterior is made easier using an MCMC procedure
called Gibbs sampling, which involves breaking the model's parameter
vector into different blocks, where each block's posterior
distribution, conditional on the other blocks and the observed data,
has a form that is convenient to draw from. Gibbs sampling proceeds
by successively drawing from each parameter block's conditional
posterior. This procedure results in a sequence of draws which
converge to draws from the joint distribution of all the model
parameters. The initial draws in the sequence are discarded, and
remaining draws from the converged distribution are used to
calculate statistics of model parameters, such as mean or
variance.\footnote{Determining when the sequence of draws produced
by the Gibbs sampler has converged to draws from the joint posterior
distribution is difficult, which is a tradeoff of Bayesian methods
relative to classical methods. The simplest approach is for the
researcher to observe the sequence and to see the draws trending
towards the posterior. After convergence the draws will traverse the
posterior. A more formal method of testing for convergence is
suggested in \citeN{gelmanrubin92}, who propose running the Gibbs
sampler from several different starting points and testing whether
the posterior means calculated from the converged sequences are
equal across runs.} My underlying demand model is the random
coefficients logit model, with two differences: the coupon
parameters and the value function solution. Thus, the setup for my
Gibbs sampler is very similar to that used to estimate the random
coefficients logit model. This estimator is well understood and is
described in \citeN{train03}, pgs xxx - xxx.
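The mechanics of Gibbs sampling can be illustrated on a textbook example that is unrelated to the paper's model: drawing from a bivariate normal with correlation $\rho$ by alternating between the two conditional distributions, and discarding early draws as burn-in.

```python
import random

# Generic illustration of Gibbs sampling (not the paper's sampler): draw
# from a bivariate standard normal with correlation rho by alternating the
# conditionals x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2),
# discarding the initial draws as burn-in.
def gibbs_bivariate_normal(rho, n_draws=20_000, burn_in=2_000, seed=0):
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for i in range(n_draws):
        x = rng.gauss(rho * y, sd)   # draw block 1 | block 2
        y = rng.gauss(rho * x, sd)   # draw block 2 | block 1
        if i >= burn_in:             # keep only post-convergence draws
            draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(0.8)
n = len(draws)
mx = sum(x for x, _ in draws) / n
my = sum(y for _, y in draws) / n
cov = sum((x - mx) * (y - my) for x, y in draws) / n
vx = sum((x - mx) ** 2 for x, _ in draws) / n
vy = sum((y - my) ** 2 for _, y in draws) / n
print(round(cov / (vx * vy) ** 0.5, 1))  # sample correlation near 0.8
```

In the estimator used here, the two "blocks" are replaced by the five parameter blocks described below, each drawn from its conditional posterior.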
%\footnote{The model was estimated
%using code written in Fortran 90. Computational time for 20,000
%iterations of the model was roughly 4 days.}
%\subsection{Markov Chain Monte Carlo Blocks: A Short Description}
To form the conditional posterior distributions for the blocks of
parameters it is necessary to impose a prior distribution on some of
the model parameters. I assume flat priors on $\theta$, a normal
prior on $b$ which I denote $k(b)$, and inverse gamma priors on the
elements of the diagonal matrix $W$, which I denote as $IG(W)$. The
posterior distribution of the model parameters will depend on the
parameters' prior distribution and the probability of the data given
the parameters. The priors on the $p_{cj}^0$'s are uniform on [0,1],
the priors on $p_{cj}^1$ are uniform on [-$p_{cj}^0$,1-$p_{cj}^0$]
for Cheer, Surf and Dash, and the priors on $p_c^{Cheer,1}$,
$p_c^{Surf,1}$ and $p_c^{Dash,1}$ are uniform on
[-$\min_j\{p_{cj}^0\}$,1-$\max_j\{p_{cj}^0\}$]. The priors on $b$
and $W$ are assumed to be non-informative, so that $k(b)$ has zero
mean and infinite variance. The prior on $W$ is also chosen to be
non-informative, so that the scale is set to 1 and the degrees of
freedom approaches 1.
The probability a consumer chooses a particular product in purchase
event $t$, given her preferences and the values of observables, can
be expressed using a simple logit formula. Denote $d_{ijst}$ as the
variable that is 1 if consumer $i$ chooses size $s$ of product $j$
in purchase event $t$. Denote $d_{it}$ as the vector of observed
$d_{ijst}$'s, $c_{it}$ as the vector of $c_{ijt}$'s, $x_{it}$ as the
vector of $x_{ijt}$'s and $v_{ijst}$ as the consumer's flow utility
minus the logit error. The probability of the consumer's choice in
purchase event $t$ will be
\begin{equation}\label{eq:choiceprob}
Pr(d_{it}|\theta_i,\theta,\Sigma_{it},c_{it},x_{it}) = \sum_{(j,s)
\in J_{it}} d_{ijst} \frac{\exp(v_{ijst} + \delta
EV(\Sigma_{it+1};\theta_i,\theta))}{\sum_{(k,l) \in
J_{it}}\exp(v_{iklt} + \delta EV(\Sigma_{it+1};\theta_i,\theta))}.
\end{equation}
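The choice probability above can be sketched numerically. The function, the discount factor $\delta = 0.95$, and all utility numbers below are illustrative assumptions, not values from the paper:

```python
import math

# A minimal sketch of the choice probability above: the probability of the
# chosen (j, s) alternative is a logit over flow utility plus the discounted
# expected value of next period's state. All numbers are made up.

def choice_prob(chosen, v, ev, delta=0.95):
    """Logit probability of the chosen alternative.

    v  -- flow utilities net of the logit error, one per alternative
    ev -- expected value EV(Sigma_{t+1}) for each alternative
    """
    u = [vi + delta * evi for vi, evi in zip(v, ev)]
    m = max(u)  # subtract the max before exponentiating, for stability
    expu = [math.exp(ui - m) for ui in u]
    return expu[chosen] / sum(expu)

# Three hypothetical alternatives in the choice set J_it.
p = choice_prob(chosen=0, v=[1.0, 0.2, -0.5], ev=[10.0, 10.3, 9.8])
```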
Denote $g(\theta_i|b,W)$ as the density of an individual level
$\theta_i$ and $Pr(c_{it}|\theta)$ as the probability of a
particular $c_{it}$. Then the posterior density of the parameters is
proportional to
\begin{equation}\label{eq:bigpost}
\begin{split}
\Lambda(\theta_i \forall i, b, W, c_{it} \forall i \mbox{ and } t,
\theta) \propto & \prod_{i=1}^I \left[ \prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta,\Sigma_{it},c_{it},x_{it})
Pr(c_{it}|\theta)
\right\} g(\theta_i|b,W) \right] \\
& \cdot k(b)\,IG(W).
\end{split}
\end{equation}
I draw from this posterior in five blocks, each of which has a
functional form that is convenient to draw from. I describe these
draws briefly in the next few paragraphs; more details on the
specifics of the Gibbs steps are given in the Appendix.
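The structure of the posterior can be made concrete with a schematic log-posterior evaluation. The function names and all input probabilities below are illustrative, not the paper's estimates:

```python
import math

# Schematic evaluation of the log of the posterior density above, for one
# household, with made-up probabilities. With flat priors on theta and
# non-informative k(b) and IG(W), the log posterior is (up to a constant)
# the sum of log choice probabilities, log coupon probabilities, and the
# log population density g(theta_i | b, W).

def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_posterior(choice_probs, coupon_probs, theta_i, b, W):
    lp = sum(math.log(p) for p in choice_probs)          # Pr(d_it | ...)
    lp += sum(math.log(p) for p in coupon_probs)         # Pr(c_it | theta)
    lp += sum(log_normal_pdf(t, b, W) for t in theta_i)  # g(theta_i | b, W)
    return lp

lp = log_posterior([0.4, 0.7], [0.25, 0.9], theta_i=[0.1, -0.2], b=0.0, W=1.0)
```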

The first block draws $\theta_i$ for each household conditional on
the $d_{it}$'s, the $c_{it}$'s, $b$ and $W$. Because of the
assumption that the error term is logit, the conditional posterior
density of a particular vector $\theta_i$ is proportional to
$\prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta,\Sigma_{it},c_{it},x_{it}) \right\}
g(\theta_i|b,W)$. This distribution is not conjugate, which means
that the Metropolis-Hastings algorithm (see Appendix
\ref{append:structmcmc} for the steps I use to implement this) must
be used in this step.\footnote{Note that when we perform this step,
we will need to evaluate the consumer's expected value function in
Equation \eqref{eq:choiceprob}, $EV(\Sigma_{it+1};\theta_i,\theta)$.
The procedure I use to do this is described in Section
\ref{sec:vfsolve}.}
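A minimal random-walk Metropolis-Hastings step of the kind used for this block can be sketched as follows. The target here is a toy standard-normal log density, not the paper's posterior; the step size and seed are arbitrary choices:

```python
import math
import random

# Random-walk Metropolis-Hastings on a toy target (standard normal).

def mh_step(theta, log_post, step_size=1.0, rng=random):
    proposal = theta + rng.gauss(0.0, step_size)       # random-walk proposal
    log_accept = log_post(proposal) - log_post(theta)  # symmetric proposal cancels
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal                                # accept
    return theta                                       # reject: keep current draw

toy_log_post = lambda t: -0.5 * t * t   # N(0, 1) up to a constant

random.seed(1)
theta, draws = 3.0, []
for _ in range(20000):
    theta = mh_step(theta, toy_log_post)
    draws.append(theta)
```

After discarding a burn-in, the draws should have mean near 0 and variance near 1, matching the toy target.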
In the second step, I draw a new vector of fixed parameters,
$\theta$. The posterior distribution of $\theta$ conditional on
$\theta_i$, the $\overline{c}_{ijt}$'s, $v_{it}$ and the $d_{it}$'s
is
\begin{equation}\label{eq:fixedpost}
\prod_{i=1}^I \prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta,\Sigma_{it},c_{it},x_{it})
Pr(c_{it}|\theta) \right\}.
\end{equation} This distribution is also not conjugate and the
Metropolis-Hastings algorithm must be used to draw from it.

The third step draws a new $b$ vector conditional on
$\tilde{\theta}_i$ for $i=1,\ldots,I$ and $W$. The conditional
posterior distribution for $b$ is normal, so this step is
straightforward. Similarly, the conditional posterior of each
element of $W$ given $\tilde{\theta}_i$ for $i=1,\ldots,I$ and $b$
is inverse gamma, which is straightforward to draw from. For
unobserved coupons, each $\overline{c}_{ijt}$ is drawn separately
across households, products and purchase events, and has a Bernoulli
posterior distribution conditional on $v_{it}$, $\theta_i$, $\theta$
and $d_{it}$.
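The conjugate draws in these blocks can be sketched for a scalar $\theta_i$ with a flat prior on $b$ and a unit-scale inverse gamma prior on $W$. These are textbook conditionals under those simplifying assumptions, not the paper's exact hyperparameters:

```python
import math
import random

# Sketches of the conjugate draws: normal for b, inverse gamma for W,
# Bernoulli for an unobserved coupon. Simplified to scalar parameters.

def draw_b(theta_i, W, rng=random):
    """Normal conditional for b: centered at the mean of the theta_i."""
    n = len(theta_i)
    return rng.gauss(sum(theta_i) / n, math.sqrt(W / n))

def draw_W(theta_i, b, rng=random):
    """Inverse gamma conditional for W: draw a gamma variate and invert."""
    n = len(theta_i)
    sse = sum((t - b) ** 2 for t in theta_i)
    # Gamma(shape=(n + 1)/2, rate=(sse + 1)/2); gammavariate takes a scale
    return 1.0 / rng.gammavariate((n + 1) / 2.0, 2.0 / (sse + 1.0))

def draw_coupon(p_coupon, lik_with, lik_without, rng=random):
    """Bernoulli conditional for an unobserved coupon c_ijt."""
    num = p_coupon * lik_with
    prob = num / (num + (1.0 - p_coupon) * lik_without)
    return 1 if rng.random() < prob else 0

random.seed(2)
theta_i = [0.3, -0.1, 0.5, 0.2]
b = draw_b(theta_i, W=1.0)
W = draw_W(theta_i, b)
c = draw_coupon(0.25, lik_with=0.6, lik_without=0.2)
```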
%Last, the posterior distributions of the $p_z$ and the $p_{cj}$'s
%conditional on the $z_{it}$'s and $\overline{c}_{ijt}$'s (both
%unobserved and {\it observed} coupons) are Beta, and are simple to
%draw from. Note that this last step is simplified by the assumption
%that consumers do not take future coupons into account. If consumers
%did take future coupons into account, then the value function in
%\eqref{eq:choiceprob} would depend on $p_z$ and $p_c$. The posterior
%distribution of the $p_z$ and the $p_{cj}$'s would depend not only
%on the $z_{it}$'s and the $\overline{c}_{ijt}$'s, but also the
%consumer's choices $y_{it}$. Because the posterior would include
%choice probabilities, it would not be conjugate and it would be
%necessary to use the Metropolis-Hastings algorithm in this step,
%which would significantly increase the model's computational time.
%Although my assumption that consumers do not expect future coupons
%is perhaps unrealistic, I will note that the fact that I account for
%coupon use at all is an improvement over most previous empirical
%work in dynamic discrete choice modeling, which usually ignores
%coupon use entirely.
\subsection{Value Function Solution}\label{sec:vfsolve}
The method of \citeN{ijc05} works in conjunction with the Gibbs
sampler to obtain a solution of the value function. In this section
I will broadly describe how I solve for the value function in
Equation \eqref{eq:choiceprob} and how the method works. The
innovation of the method is that the discrete choice dynamic
programming problem is solved only once, jointly with the estimation
of the model parameters.
Recall that in the Gibbs sampling algorithm described in the
previous section, we draw a sequence of model parameters that
converges to draws from the parameters' joint distribution. The
basic idea of the value function solution method can then be broken
up into two steps. First, at a particular point $g$ in the sequence,
draw a small number of values of the unobservables and calculate
expected utility at all state space points.
the current parameter value are then retained for use in later
iterations of the MCMC sequence. In order to calculate expected
utility at some point $g$ in the sequence, it is necessary to have
an approximation of the value function at the current parameter
value. In the second step, the value function is calculated as a
weighted average of previously retained expected utilities, where
the weights are kernel densities of the difference between the
current parameter and the previous saved parameters. In actual
implementation these steps are performed in reverse order: first the
value function is interpolated at the current parameter draw, and
then the expected utilities are calculated. However, I believe it is
easier to understand the algorithm by looking at the steps in the
order I have laid them out, rather than the order in which they are
executed. In the following paragraphs I will describe these two
steps in greater detail.
Consider the first step, which is to draw some values of the model's
unobservables and calculate expected utility. This calculation is
done at points in the state space, $\Sigma = (s,p,J,y,n)$, and the
expected utilities and current parameter value are retained. Two
sets of variables are unobserved to the consumer at the time she
makes her purchase decision and must be integrated out when the
value function is formed: the $\varepsilon_{ijst}$'s, and the
consumer's future tastes for products she has not yet purchased, the
$\gamma_{ij}$'s. Integrating
out the $\varepsilon_{ijst}$'s does not require numerical
approximation: because of the assumption that they are logit errors,
the consumer's expected utility has a closed form solution,
conditional on $\theta_i$, $\theta$, and future coupons. This is not
true when we integrate out the future $\gamma_{ij}$'s and
$c_{ijt}$'s, so these must be approximated numerically. As an
example, let us consider constructing an analogue to the consumer's
expected value function in Equation \eqref{eq:vffirstpurch}, which
is the value at state space point $s_j=0$, $y_j=1$ for some new
product $j$. First I take $L=10$ draws from the true taste
distribution for product $j$, which is
$N(\gamma_{ij}^0,\sigma_{ij}^2)$, and from the coupon distribution
implied by $\theta$. To calculate the expected utility, we first
need the consumer's exact utility (ignoring the logit error) for
each product at each simulation draw $l$. Denote the $l$th taste
draw as $\gamma_{ij}^l$ and the $l$th coupon draw as $c_{ij}^l$, and
denote $\theta_i^l$ as the vector $\theta_i$ with the consumer's
true taste for product $j$ ($\gamma_{ij}$) taken out and replaced
with the simulated taste ($\gamma_{ij}^l$). Assume that we have an
approximation of the expected value function at point $n$ of the
sequence for next period's state space point,
$\Sigma'=(s',p',J',y',n')$, which I will denote as
$E_{(p',J')|(p,J)}V_n(s',p',J',y',n';\theta_i^l,\theta)$.\footnote{Since
the state space is quite large, and computer memory is limited, I
only evaluate the value function at a subset of the state space
points, and interpolate it everywhere else. The details of this
procedure, as well as other computational details associated with
the value function solution, are described in the Appendix.} Then
the consumer's utility for size $s$ of product $k$ at simulation
$l$, $v_{iks}^l$, will be
\begin{equation}\label{eq:utpm}
\begin{array}{l l l}
\mbox{Product } k = j & : v_{iks}^l = & \gamma_{ik}^l + \xi_{is} - \alpha_i (p_{ks}-c_{ik}^l) + \eta_i y_k \\
& & + \delta E_{(p',J')|(p,J)}V_n(s',p',J',y',n';\theta_i^l,\theta) \\
\mbox{Product } k \neq j & : v_{iks}^l = & \gamma_{ik}(s_k)
+\xi_{is} - \alpha_i
(p_{ks} - c_{ik}^l) + \eta_i y_k \\
& & + \delta E_{(p',J')|(p,J)}V_n(s',p',J',y',n';\theta_i^l,\theta), \\
\end{array}
\end{equation} which corresponds to Equation \eqref{eq:utfirstpurch}.
Her expected utility for purchasing product $j$ for the first time
(state space point $y_j=1$, $s_j=0$), evaluated at individual $i$'s
$\theta_i$, is then calculated as
\begin{equation}\label{eq:vfpm2}
\hat{EV}_g(s,p,J,y,n;\theta_i,\theta) = \frac{1}{L} \sum_{l=1}^L \ln
\left( \sum_{(k,s) \in J } \exp(v_{iks}^l) \right).
\end{equation}
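This simulation step can be sketched as follows. The utilities, prices, and coupon process here are illustrative assumptions, and the continuation values are folded into the flow utilities for brevity:

```python
import math
import random

# Average the logit inclusive value over L draws of the unknown taste
# gamma_ij and the coupon, as in the simulation step above.

def expected_utility(v_fixed, gamma0, sigma2, price, alpha, L=10, rng=random):
    """Simulated E[ln sum_k exp(v_k)] when one product's taste is unknown."""
    total = 0.0
    for _ in range(L):
        gamma_l = rng.gauss(gamma0, math.sqrt(sigma2))   # taste draw
        coupon_l = 0.25 if rng.random() < 0.2 else 0.0   # coupon draw
        v_new = gamma_l - alpha * (price - coupon_l)     # new product's utility
        u = [v_new] + v_fixed                            # plus known products
        total += math.log(sum(math.exp(x) for x in u))   # inclusive value
    return total / L

random.seed(3)
ev = expected_utility(v_fixed=[0.5, -0.2], gamma0=-0.5, sigma2=2.0,
                      price=1.0, alpha=1.0, L=200)
```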
The second step of the algorithm is to calculate the approximation
of the value function at the parameter draw for the current point in
the sequence, $g$. Denote consumer $i$'s individual level parameters
at this iteration as $\theta_{i,g}$, the population-fixed parameters
as $\theta_g$, and the vector of $\theta_{i,g}$ stacked on
$\theta_g$ as $\overline{\theta}_{i,g}$. Recall that at each point
in the sequence, the expected utilities calculated in the first step
are retained along with the parameter draws. Assume that at
iteration $g$ we have retained $N(g)$ previous parameter draws and
expected utilities, and we want to calculate the expected value
function at $\theta_{i,g}$. This is then calculated as
\begin{equation}\label{eq:vfupd}
E_{(p',J')|(p,J)} V_g(s,p,J,y,n;\theta_{i,g},\theta_g) = \frac{
\sum_{r=1}^{N(g)} \hat{EV}_r(s,p,J,y,n;\theta_{i,r},\theta_r) \,
k( (\overline{\theta}_{i,g} - \overline{\theta}_{i,r})/h_k )
}{ \sum_{r=1}^{N(g)} k( (\overline{\theta}_{i,g} -
\overline{\theta}_{i,r})/h_k ) },
\end{equation} where $k(\cdot)$ is a kernel density function, $h_k$ is a
bandwidth parameter, and $\hat{EV}_r(s,p,J,y,n;\theta_{i,r},\theta_r)$
is the $r$th retained expected utility. The approximated value
function is used to calculate the utilities in Equation
\eqref{eq:utpm}.
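The kernel-weighted average above can be sketched for a scalar parameter. The saved draws, retained values, and bandwidth $h$ are illustrative:

```python
import math

# The value function at the current draw theta_now is a weighted mean of
# previously retained expected utilities, with Gaussian kernel weights in
# parameter space. Retained draws nearer to theta_now get more weight.

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def interpolate_value(theta_now, saved_thetas, saved_evs, h=0.5):
    weights = [gaussian_kernel((theta_now - t) / h) for t in saved_thetas]
    return sum(w * ev for w, ev in zip(weights, saved_evs)) / sum(weights)

saved_thetas = [0.0, 0.5, 2.0]   # retained parameter draws
saved_evs = [10.0, 11.0, 20.0]   # retained expected utilities
v = interpolate_value(0.4, saved_thetas, saved_evs)
```

Because 0.4 is close to the first two saved draws, the interpolated value lies near their retained utilities rather than the distant third one.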
\section{Estimation Results}
The main estimation results are shown in Table
\ref{table:estresstr1}. Recall that in my model, the coefficients of
consumer $i$'s flow utility are broken up into two groups: those
that vary across the population, denoted $\theta_i$, and those that
are fixed across the population, denoted $\theta$. The
population-varying coefficients are normally distributed across the
population with mean $b$ and diagonal variance matrix $W$. The
Markov Chain Monte Carlo estimator produces a simulated posterior
distribution of $b$, $W$, and the fixed parameters, $\theta$. The
two columns of estimates under the headings ``Mean'' show the means
and standard deviations (shown in parentheses) of the simulated
posterior density for each element of $b$; similarly, the two
columns of estimates under the headings ``Variance'' show the mean
and standard deviation of the simulated posterior for $W$. Estimates
of parameters that are fixed across the population, $\theta$, are
shown under the ``Mean'' heading; the corresponding entries under
the ``Variance'' heading are dashed for these parameters. Although
the numbers in the table are means and standard deviations of
parameter posterior densities, they can be interpreted in the same
way as the estimated coefficients and standard errors that are
produced by classical methods.
Consider the first block of estimates, labeled ``Taste parameters''.
The first 9 rows show the estimated tastes for each established
product. The liquid Other product is normalized to 0, and the Other
Powder, Tide Powder and parameters associated with switching costs
are fixed across the population. The first element of the first row
shows the population average of consumer tastes for liquid Era,
which is -0.908. It may look like people like Era less than the
Other product, but this is not the whole story. The fourth column
shows the variance in tastes for Era across the population, which is
3.258. This variance is large, which indicates that consumers are
very heterogeneous in their taste for Era: some consumers like it a
lot, and some do not like it very much at all. The results are very
similar for almost all the established products: the mean tastes are
negative, and most of the variances are high, so there is a lot of
heterogeneity in tastes. Consumer heterogeneity in tastes is very
important in this market, which is consistent with these products
being experience goods. It is also consistent with important
heterogeneity in factors such as the types of fabrics in a
household's wardrobe, the types of soils and stains that need to be
cleaned, the water temperature used, the household's washing machine
quality, and the types of scents the household prefers. The next six
rows show the taste parameters for the different size categories.
Skipping the last three rows of the taste parameters section, which
will be discussed later, consider the second block of estimates in
the table, under the heading ``Learning Parameters". The first row
of this section shows the estimated population mean and variance of
consumers' expected tastes for Cheer, $\gamma_{ij}^0$. The
population average expected taste for Cheer is -0.518, and this
estimate is statistically different from zero. The population
variance of expected tastes is statistically significant but small
relative to the mean, at 0.356. This means that there is not a lot
of heterogeneity in how much consumers expect to like Cheer: most of
them do not expect to like it very much, and most do not have a very
good idea in advance of how much they will like the product.
Consider the next three parameters, which correspond to the
consumer's uncertainty about her true taste for Cheer. The mean of
the intercept parameter, $\sigma_{0ij}^2$, is precisely estimated at
-0.58; the parameter on household size is positive and the parameter
on household income is negative, and both are statistically
significant. These coefficients suggest that the variance in true
tastes is higher among larger households and lower among higher
income households. Recall that the
actual consumer uncertainty in tastes is a transformation of these
parameters (as specified in Equation \eqref{eq:scoeff}). As an
example, for a household of income 3 and size 3 that has the
population average value of $\sigma_{0ij}^2$, the variance in her
true taste for Cheer is $5 \frac{\exp(-0.58 + 0.148 \cdot 3 - 0.068
\cdot 3)}{1+\exp(-0.58 + 0.148 \cdot 3 - 0.068 \cdot 3)}$, which is
about 2.08. If the consumer's expected taste for Cheer is $-0.518$,
the population average, then her true taste will be drawn from a
$N(-0.518,2.08)$. Her true taste distribution looks very similar to
the taste distributions for the established products. The results
for Surf and Dash follow a similar pattern to those of Cheer.
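The worked example above can be reproduced directly. The assignment of 0.148 to household size and -0.068 to income follows the text's example (both covariates equal 3 there, so the ordering does not affect the result):

```python
import math

# Uncertainty as a scaled logistic transformation of the parameters in
# Equation (scoeff), evaluated at the text's example household.

def taste_variance(intercept, b_size, b_income, size, income, scale=5.0):
    x = intercept + b_size * size + b_income * income
    return scale * math.exp(x) / (1.0 + math.exp(x))

var_cheer = taste_variance(-0.58, 0.148, -0.068, size=3, income=3)
# var_cheer comes out at about 2.08, the value reported in the text
```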
In summary, there are two important facts about the learning
parameters: first, the variance across consumers in $\gamma_i^0$ is
low. Before consumers make their first purchases of the new product,
their expectations are similar. Second, the variance in their true
tastes is large, which indicates that after consumers make their
first purchases of the new products, they are very different in how
much they like it. These facts are consistent with these products
being experience goods: consumers need to purchase and consume the
product in order to find out how much they like it.

Let us return to the last three rows of the first block of parameter
estimates. These rows show the estimates for the coefficient on
$y_{ijt-1}$, which is $\eta_i$. The intercept for $\eta_i$,
$\eta_{i0}$, is allowed to vary across the population. Its mean is
1.311, and its variance is large at 2.002. The distribution of
$\eta_i$ across the population will depend on two things: the
distribution of unobserved heterogeneity, which is normal, and the
distribution of demographics. Taking both of these into account, the
expected value of $\eta_i$ in the population is 1.32, and its
variance is 1.98. This means that most households have switching
costs, but a small portion of them are variety-seeking (about 84\%
of them have values of $\eta_i$ that are positive). The coefficient
on household size, $\eta_1$, is negative and the one on household
income, $\eta_2$, is positive. The fact that the switching cost is
increasing in income is consistent with the idea that the switching
cost is generated by a cost of recalculating utility: for high
income consumers, time is likely more valuable and the cost of
recalculating utility may be higher. There are a few explanations
for the negative coefficient on household size. One is that larger
households may switch brands more easily since detergent is a
smaller part of household consumption. A second explanation for the
negative coefficient on household size could be due to within
household heterogeneity in tastes. Different members of a household
may have different tastes, and since the data does not record
consumption by different members, brand switching among households
with more members may be due to different members purchasing
different products rather than variety-seeking. Similar findings are
documented in \citeN{chesudhirseetaraman06}.
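The share of households with positive $\eta_i$ can be checked with a back-of-the-envelope calculation, treating $\eta_i$ as approximately normal with the reported population mean and variance; this ignores the exact demographic component, so it is only an approximation:

```python
import math

# With eta_i ~ N(1.32, 1.98) approximately, the share of households with
# positive switching costs is Phi(1.32 / sqrt(1.98)).

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

share_positive = normal_cdf(1.32 / math.sqrt(1.98))
# about 0.83, close to the "about 84%" figure reported in the text
```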
The fact that most consumers have switching costs has interesting
implications for pricing policy. As an example, suppose that it has
been a long time since the introduction of Dash, so that most
consumers have experimented with all the three new products. Suppose
that Unilever decides to temporarily drop the price of Wisk. Procter
and Gamble might worry that this price drop could decrease the
market share of Tide in the intermediate run. The price drop will
draw consumers away from Tide, and those consumers will face a cost
of switching back to Tide. It would be optimal for Procter and
Gamble to respond with a subsequent price drop of its own in order
to win them back.
The last block of parameters shows consumer responses to the
exogenous variables prices, features and displays. The parameter for
consumer price sensitivities is constructed in the same way as for
the learning parameters (Equation \eqref{eq:pcoeff}). The population
average value of the price coefficient is about -24.9. The parameter
on household income is positive while that on household size is
negative, which means that higher income households are more price
sensitive and larger households are less so. The fact that the
coefficient on income is positive is surprising, but the magnitude
of the coefficient is very small. The variance in the intercept of
the price coefficient is
quite large, which indicates that there is substantial heterogeneity
in price sensitivity. The population distribution of the price
coefficient is shown in Figure \ref{pricecoeffstr}. The distribution
of price coefficients is right skewed, and somewhat less than half
of all households have price coefficients that are less than 5 in
magnitude (about one quarter have price coefficients that are less
than 1 in magnitude).
The estimates of the coupon sensitivity parameter, $\alpha_{0ic}$,
show that its mean is -0.673 and its variance is 0.229. Recall that
the coupon sensitivity coefficient that enters the consumer flow
utility, $\alpha_{ic}$, is a transformation of $\alpha_{0ic}$,
$\frac{\exp(\alpha_{0ic})}{1+\exp(\alpha_{0ic})}$ (Equation
\eqref{eq:ccoeff}). The population mean of $\alpha_{ic}$ is 0.35,
and its variance is 0.01, so there is very little heterogeneity in
consumers' sensitivities to coupons. The feature and display
variables are both positive on average in the population, which is
to be expected.
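The coupon-sensitivity transformation can be evaluated at the reported posterior mean. Note that this is the transform of the mean, which need not equal the reported population mean of $\alpha_{ic}$ (0.35) exactly, since the transformation is nonlinear:

```python
import math

# Logistic transformation of Equation (ccoeff) at the reported mean of
# alpha_0ic = -0.673.

alpha_0 = -0.673
alpha_c = math.exp(alpha_0) / (1.0 + math.exp(alpha_0))
# alpha_c is about 0.34, in line with the 0.35 reported in the text
```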
Table \ref{table:estresstr2} shows the parameters of the coupon
distribution. The first column of the table shows the mean of the
posterior draws of the $p_{cj}^0$'s, the $p_{cj}^1$'s, and the
$p_{c}^{Cheer,1}$, the $p_{c}^{Surf,1}$, $p_{c}^{Dash,1}$; the
second column shows their standard deviation. Almost all the mean
parameters are precisely estimated. To see how to interpret the
parameters, recall that the $p_{cj}^0$'s are the probability that a
consumer receives a coupon for product $j$ after the ``introductory
pricing'' period. So the probability a consumer gets a coupon for
Tide Liquid is 0.263. The parameters under $n_t=1$ are added to the
$n_t=0$ parameters during introductory pricing periods. So the
probability of a consumer getting a coupon during the introductory
period for Surf Liquid is $p_{cj}^0 + p_{cj}^1 = 0.237 - 0.078 =
0.159$. The probability a consumer gets a coupon for Tide Liquid
during the introductory period for Surf Liquid is $p_{cj}^0 +
p_{cj}^{Surf,1} = 0.263 - 0.038 = 0.225$.
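The introductory-period arithmetic quoted above can be written out directly. The dictionary keys are labels of convenience, not notation from the paper:

```python
# The n_t = 1 adjustment is added to the baseline probability p_cj^0
# during a new product's introductory pricing period.

p0 = {"surf_liquid": 0.237, "tide_liquid": 0.263}
adjustment = {"surf_liquid": -0.078,   # p_cj^1 for Surf itself
              "tide_liquid": -0.038}   # p_c^{Surf,1} for other products

p_surf_intro = p0["surf_liquid"] + adjustment["surf_liquid"]   # 0.159
p_tide_intro = p0["tide_liquid"] + adjustment["tide_liquid"]   # 0.225
```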
\subsection{An Examination of Consumer Uncertainty About the New
Products}\label{sec:exlearning}
In this section I will examine two aspects of consumers' uncertainty
about their true tastes for the three new products. First, I will
examine how consumer uncertainty varies across the population.
Recall from the previous discussion that consumer $i$'s uncertainty
about her true taste for a new product $j$, $\sigma_{ij}^2$, is a
transformation of the three parameters in the second block of Table
\ref{table:estresstr1}, $\sigma_{0ij}^2$, $\sigma_{1j}^2$ and
$\sigma_{2j}^2$, and the consumer's household income and size.
Heterogeneity in consumer uncertainty about product $j$ will come
from two sources: unobserved heterogeneity in the random coefficient
$\sigma_{0ij}^2$, and observed heterogeneity in household
demographics. I will demonstrate that across the population as a
whole, there is not a lot of variance in the $\sigma_{ij}^2$'s. I
will also show there is not a consistent pattern in the
$\sigma_{ij}^2$'s across demographics. Second, I will examine the
effect of removing consumer uncertainty on the market shares for new
products. I will demonstrate that removing consumer uncertainty
substantially increases the overall market share for a new product,
and the impact is largest for niche products.
The first column of Table \ref{table:unc1} shows the average value
of $\sigma_{ij}^2$ in the population for each of the three new
products, and the second shows the standard deviation across the
population.\footnote{When I compute the population distribution of
$\sigma_{ij}^2$, I use the estimated individual level parameters,
the $\theta_i$'s, rather than the estimated $b$ and $W$, which are
respectively the population mean and variance of the $\theta_i$'s.
Recall that in a given step $g$ of the Gibbs sampler, I draw the
population-varying coefficients $\theta_i$ for each consumer $i$,
and the population-fixed coefficients $\theta$. In step $g$
(assuming step $g$ is retained), I calculate each consumer's
uncertainty, $\sigma_{ij,g}^2$, using $\theta_{i,g}$, $\theta_g$,
and demographics for $i$ (Equation \eqref{eq:scoeff}). I then
calculate the population mean and variance of $\sigma_{ij,g}^2$. The
numbers in the table are the average over draws of the mean and
variance calculated in each step $g$.} There are two important
patterns to notice. First, we can see from the table that the
average amount of uncertainty is greater for Cheer than for Surf,
the first two liquid introductions observed in my data set. This may
be due to the fact that these products are liquid detergents, and
consumers' experience with Cheer helped them resolve some
uncertainty about liquids as a product category. The amount of
learning about Dash, the last liquid introduction in this data set,
is a little lower than for Surf, which may also be a result of
consumers learning from Surf's introduction. Second, we can see that
the variance of the learning parameters is small, which indicates
that the amount of learning does not vary a lot across the
population. Recall that in the previous section I showed that
consumers' expected tastes for the new products also did not vary
significantly across the population. Together, these two facts
indicate that consumers' expectations about their true tastes for
the new products were quite similar across the population.
Table \ref{table:unc2} shows the average consumer uncertainty broken
down by household income and size. For Surf and Dash, the amount of
learning is decreasing in income and household size. For Cheer, the
amount of learning increases in household size until it reaches 4,
where it drops. The amount of learning for Cheer is increasing in
household income.
To examine the effect of learning on the market shares of the new
products, I conduct the following simulation experiment. First,
using the retained draws on $\theta_i$ and $\theta$ in each step $g$
of the Gibbs sampler I simulate each consumer's product choice in
each purchase event. The error terms and unobserved coupons observed
by the consumer in each purchase event are drawn from their
underlying distribution. I then calculate the overall market share
for each product from the simulated choices, averaged over the $g$
draws. The first column of Table \ref{table:nolearn} shows the
average of this simulated market share over all the weeks that the
product was available, and over the first 12 weeks that each new
product was available.
Then I run the same simulation setting $s_{ijt-1}=1$ for all three
new products: in this case consumer tastes for the new products are
assumed to always be $\gamma_{ij}$. These simulated market shares
are shown in the second column of Table \ref{table:nolearn}, and are
substantially larger than the shares in the first column: the market
share of Cheer rises by 34\%, Surf by 24\%, and Dash by 58\%. In the
period right after the new products are introduced, the impact of
removing learning is smaller: for Cheer, the market share rises by
only 9\%, 10\% for Surf, and 33\% for Dash. Why does this happen?
The answer to this question is twofold. First, consider the short
run (the first 3 months after the introduction), and assume that
$\delta=0$.
I refer the reader to Figure \ref{fig:grpic}, which shows the
estimated population distribution of tastes for Cheer before and
after all learning has occurred. The thinner distribution is the
population distribution of predicted means for Cheer (the
$\gamma_i^0$'s), or the tastes for consumers who have not yet
learned about Cheer. This distribution is normal with mean of -0.518
and variance of 0.356 (Table \ref{table:estresstr1}). The flatter
one is the population distribution of true tastes for Cheer, tastes
after learning has occurred. This distribution is normal, and has
mean of -0.518, and a variance of 2.59. The number 2.59 is the
variance in $\gamma_{ij}^0$, 0.36, plus the average of
$\sigma_{ij}^2$ across the population, which is 2.03.
A myopic consumer will experiment with Cheer when her prior draw is
greater than her maximum utility for other products. In the figure,
the line labeled $\delta=0$ shows the cutoff for a consumer with
average values of tastes for all products, assuming that there is no
state dependence, prices for all products are the same, and the
error terms are set to zero. The share of consumers who will
experiment will be those whose prior is to the right of this line.
We can see that the share will increase when consumers know their
true tastes, since the area under the posterior curve is larger than
under the prior.
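The comparison of tail masses can be illustrated numerically. The cutoff value of 1.0 is hypothetical; the paper's actual cutoffs are read off the figure, not reported numerically:

```python
import math

# For any cutoff above the common mean, the mass to its right is larger
# under the wider true-taste distribution N(-0.518, 2.59) than under the
# expected-taste distribution N(-0.518, 0.356).

def share_above(cutoff, mean, var):
    z = (cutoff - mean) / math.sqrt(var)
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

cutoff = 1.0                                       # hypothetical cutoff
share_prior = share_above(cutoff, -0.518, 0.356)   # before learning
share_true = share_above(cutoff, -0.518, 2.59)     # after learning
```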
Since I assume consumers are forward-looking, there will be an
option value of learning, which will shift the cutoff to the left
and result in more experimentation. I compute this option value of
learning at the given parameter values (assuming the consumer has
average tastes, and that there is no switching costs), assuming
consumers expect prices to stay the same over time. This new cutoff
is shown by the line $\delta=0.95$; when the option value of
learning is taken into account, significantly more consumers
experiment. Although it is difficult to see from the picture, the
shaded area to the right of $\delta=0.95$ line on the expected
tastes distribution is a little bit smaller than that to the right
of the $\delta=0$ line on the true taste distribution. This means
that informing consumers of their true match values will cause an
increase in the product's short run market share, even when
consumers are forward-looking. During the first three months after
Cheer's introduction, removing learning raises Cheer's market share
from 9.1\% to 10.0\%, about a 9\% increase. In the intermediate run,
the effect of giving consumers
their true taste draws will be even greater. The consumers who will
be affected by this will be those who have not yet experimented. The
consumers who have experimented will tend to be those who have a
high option value of learning, so the consumers who will be left
will have a low option value of learning. Their behavior will be
closer to consumers who are myopic. This explains why the increase
in market shares over the intermediate run is 34\%, which is much
larger than the increase in the short run, which was only 9\%.
Note that it is possible to do a similar exercise in which I set the
switching cost parameter, $\eta_i$, to 0 for all consumers to see
the impact of switching costs on the new products' market shares.
results of this exercise show that removing switching costs
increases Cheer's market share by 23\%, Surf's market share by 13\%,
and Dash's market share by 23\%. This result is intuitive: when
there are switching costs, consumers will find it costly to switch
away from one of the established products at the time the new
products are introduced, which will make them less likely to
experiment with them.
One final issue to discuss is the identification of the population
variances of the learning parameters. This was discussed to some
degree in Section \ref{sec:ident}, where I argue that individuals
with different values of $\sigma_{ij}^2$ can theoretically be
distinguished. This does not imply that they can easily be
distinguished in the population, which might mean that the variance
of $\sigma_{0ij}^2$ is not identified. One method of assessing this
is to examine the posterior densities of the variances of
$\sigma_{0ij}^2$. If they look very similar to the prior densities,
then we have not obtained identification. Figures
\ref{fig:vcheerdens}, \ref{fig:vsurfdens}, and \ref{fig:vdashdens}
show plots of the kernel density of the saved draws on the variances
of $\sigma_{0ij}^2$ for each of the three new products. Recall that
the priors on these parameters were assumed to be noninformative; in
contrast, the posterior densities have easily visible global maxima
and thin tails. The posterior densities for Cheer and, to a lesser
extent, Dash have second modes; these modes carry less probability
mass than the density's highest peak. This suggests that overall,
there is enough information in the data to identify the variance of
$\sigma_{0ij}^2$; an inverse gamma prior might be too restrictive,
though, since it is unimodal.
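The diagnostic described above amounts to a kernel density of the saved posterior draws. The draws below are synthetic, built to mimic a dominant mode with a smaller second mode like the one noted for Cheer:

```python
import math

# Gaussian kernel density estimate of saved posterior draws.

def kde(x, draws, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    n = len(draws)
    s = sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in draws)
    return s / (n * h * math.sqrt(2.0 * math.pi))

# Synthetic draws: 30 near 0.5 and a smaller cluster of 10 near 2.0.
draws = [0.40 + 0.01 * i for i in range(30)] + [1.95 + 0.01 * i for i in range(10)]
density_main = kde(0.55, draws, h=0.1)     # height at the dominant mode
density_second = kde(2.00, draws, h=0.1)   # height at the second mode
density_valley = kde(1.20, draws, h=0.1)   # height between the modes
```

A posterior whose estimated density shows clear peaks and thin tails, as here, is informative relative to a flat prior.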
\subsection{Estimates from the No Switching Costs and No Learning Models}
One of the important contributions of this research is that I
estimate a model with two types of dynamics in it: learning and
switching costs. In this section I will discuss how the model
estimates differ when one of the two types of dynamics is left out.
I will also discuss how the model fit changes when different
dynamics are ignored.
First, I re-estimated the model with the parameter on alternative
dynamics, $\eta_i$, restricted to be zero for all consumers, so that
the only dynamics in demand were learning.\footnote{The tables of
estimation results produced by this restricted model and the next
one are not shown, but are available upon request from the author.}
The taste variances produced by the model were larger, which is to
be expected. The price coefficient estimate is similar to that of
the full model; the average of the price coefficient across the
population is -26.9, which is slightly larger in magnitude than the estimate
produced by the full model. Consumers are estimated to be less
sensitive to coupons than in the full model, and the feature and
display variables look the same. The most interesting change is in
the estimates of the learning parameters. The means of the
$\gamma_i^0$'s drop, but the pattern is the same: on average,
consumers expect to like Surf more than Cheer, and Cheer more than
Dash. The variances in the $\gamma_i^0$'s are significantly higher,
which indicates that consumers are heterogeneous in how much they
expect to like the new products. The estimates of the $\sigma^2$'s,
the amount of uncertainty consumers have about their true tastes for
the new products, also change. On average, the $\sigma^2$ for Cheer
is 0.909, for Surf it is 2.239, and for Dash it is 1.473. Thus,
relative to the full model, consumers are estimated to be
significantly more certain about their true tastes for Cheer and
Dash, and less certain for Surf (although they are more
heterogeneous {\it ex-ante}). The estimates therefore give weaker
support to the hypothesis that these products are experience goods.
This result is intuitive when we
consider the results associated with the share difference
implication, which is what will identify the amount of learning
(Section \ref{sec:ident}). Recall that the estimates of the full
model suggest that switching costs play an important role, and that
the share difference test statistic is negatively biased in the
presence of switching costs. Thus, it is not surprising that the
estimates of this model suggest a less important role for learning.
The importance of learning in this restricted model can also be
assessed by simulating the market shares for the new products when
there is learning and when there is no learning, as was done for the
full model. The results of this simulation are shown in Table
\ref{table:nolearnnohabits}. As expected, the impact of learning on
the new products' market shares is smaller in the no switching costs
model than in the full model. Hence, researchers who ignore
switching costs will underestimate the importance of learning.
The second restricted model I estimate has the learning parameters,
the $\sigma^2$'s, restricted to zero for all consumers. Compared to
the estimates from the full model, the no learning model generally
tends to underestimate the amount of variance in consumer tastes.
The population average of the price coefficient is -24.8, which is a
little smaller in magnitude than in the full model; there is less estimated
variance in the heterogeneity parameter for price sensitivities as
well. Consumers are slightly more sensitive to coupons, and slightly
less sensitive to features and displays. The estimates of the
switching cost parameters also look very similar to those produced
by the full model. The population mean of $\eta_i$ is 1.37, and its
variance is 2.04; both of these numbers are slightly higher than
those produced by the full model. Overall, the results for this
model look similar to the full model results, which is not entirely
surprising. To see why, recall my model identification argument
(Section \ref{sec:ident}). I argued that consumer taste
distributions and switching cost coefficients will be identified
from longer run behavior. In the no learning model, the first few
periods after the new product introduction will have some impact on
the estimates, but most of the model's identification will come from
longer run behavior since we observe consumers for a long period of
time after the product introductions.
In order to assess how each type of misspecification affects the
model's predictive power, I compute the simulated market shares for
each of the three models. Table \ref{table:predsharecomp} shows the
actual market shares compared to the simulated market shares
produced by each model. The final row of the table shows the average
of the absolute difference between the predicted market share and
the actual market share, where the average is taken across the 13
products. Overall, the full model does the best job at predicting
market shares: on average, it mispredicts market shares by 0.26
percent. The model with no switching costs is almost as good as the
full model, with an error of 0.33 percent, while the model with no
learning is significantly worse, with an error of 1.8 percent. Much
of this difference is due to the no learning model poorly predicting
consumer response to new product introductions. To see this,
consider the final four lines of the table, which show the absolute
average prediction error for market shares in the periods directly
following the new product introductions, and for the final 63 weeks
of data. During the periods after the new product introduction, the
full model and the no switching costs model have similar prediction
errors. There is almost twice as much error in the no learning model
during these periods. During the final 63 weeks of the sample, the
prediction error is very similar across all three models.
In summary, it appears that ignoring learning is a more serious
misspecification error than ignoring switching costs, if the aim of
the researcher is to simply fit market shares. Learning has a large
impact on the market shares of products right after the introduction
of a new product, so leaving it out will significantly reduce the
model's predictive power.
\subsection{Counterfactuals}
In this section I will examine two important counterfactuals that I
have computed: the effect of an introductory price cut for a new
product on its intermediate run market share, and the effect of
informative advertising on the new product's market share. There are
two different ways I will compute each counterfactual. First, using
the estimated results from the full model, I will compute the impact
of the introductory price cut and introductory advertising when
there is learning and switching costs, no switching costs, and no
learning. This exercise allows me to explore the impact of dynamics
on the impact of introductory pricing and introductory informative
advertising. I will also compute these counterfactuals for the
restricted models I discussed earlier. This will show how model
misspecification affects the counterfactuals. These results will be
of interest to brand managers who wish to better understand the
effects of their pricing and advertising policies. They will also be
of interest to government agencies that are interested in the impact
of information provision about new products, and to industrial
organization economists who wish to better understand the role for
introductory pricing.
First let us consider the effect of an introductory price cut for
each of the new products. I compute this counterfactual as follows.
First, I set $x_{ijt}=0$ and $c_{ijt}=0$ for all $i$, $j$ and $t$.
For each product $j$, I set $p_{ijt}$ to its average across all
purchase events where the product is available. Thus, aside from the
introductory price cuts, prices are the same across time and
consumers. If there are any new product introductions after the new
product for which I am calculating the price cut, I do not introduce
them. I also assume that all other products are always available, so
$J_{it}$ does not vary across $i$ and $t$, except for the
introduction of the new product I am interested in. When I compute
the counterfactuals, all 519 consumers make exactly one purchase in
each period. Since the modal interpurchase time is 8 weeks, each
period can be thought of as occurring every two months. I compute the
counterfactuals for ten periods, so the total length of time covered
by these counterfactuals can be thought of as occurring over a
twenty month period.
I then solve for every consumer's value function, assuming that they
know the path of future prices, and simulate each person's choice
in each period.\footnote{The fact that there is learning about one
of the new products means that I need to integrate over the
distribution of future tastes for that product. I compute this
integral using Gauss-Hermite quadrature. Also, it is necessary to
pick a value for $y_{it-1}$ in period 1. I assume that the period 0
choice is whichever brand gives the highest static utility.} This
requires that I draw 49 new $\varepsilon_{ijt}$'s for each period.
To reduce simulation error, I simulate each consumer's sequence of
choices ten times and take the average of these choices. I simulate
choices for each retained draw on $\theta_i$ and $\theta$ from the
Gibbs sampler (a total of 750 times) under three different
assumptions on the type of dynamics in demand: when there are both
switching costs and learning, which is at the estimated parameters;
when there is no learning, which means every consumer knows
$\gamma_{ij}$ from the beginning; and when there are no switching
costs, which means $\eta_i=0$ for all $i$. I also assume that there
is no learning for any product other than the one for which I am
examining the effect of the price cut; for example, if the price cut
is for Surf, then I assume consumers know their true taste for
Cheer.
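The quadrature step mentioned in the footnote above can be sketched as follows. The integrand here is a placeholder for the continuation value, and the check uses a moment with a known closed form; this is an illustration of the technique, not the paper's code.

```python
import numpy as np

def expected_value_gh(func, mean, var, n_nodes=9):
    """Approximate E[func(gamma)] for gamma ~ N(mean, var) by
    Gauss-Hermite quadrature, the rule used to integrate over the
    distribution of a consumer's future tastes for a new product."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables: gamma = mean + sqrt(2 * var) * node
    gammas = mean + np.sqrt(2.0 * var) * nodes
    return np.sum(weights * func(gammas)) / np.sqrt(np.pi)

# Sanity check on a known moment: E[gamma^2] = var + mean^2 = 1.5
# for gamma ~ N(1, 0.5).
approx = expected_value_gh(lambda g: g ** 2, mean=1.0, var=0.5)
```

In the estimation, `func` would be the (interpolated) continuation value evaluated at each quadrature node for the uncertain taste.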
I repeat this exercise for an introductory price cut for each new
product, which means I cut the price of the new product by 10
percent in period 1, period 3, and period 5. The ten percent price
cut is applied to all 4 sizes of the new product. This is a partial
equilibrium counterfactual: all other prices are held fixed. I
assume that consumers correctly view the price cut as temporary and
expect the new product's price to rise to its average in period 2.
The first column of Table \ref{table:fullpd} shows the percentage
impact on period $t$ market shares of the period 1 price cut. For
all three products, a ten percent price cut results in a greater
than 10 percent increase in period 1 market shares, so demand is
elastic with respect to own price. The intertemporal price elasticities are
small. For instance, a ten percent price cut for Cheer only leads to
a 0.87\% increase in Cheer's period 2 market share. The impact on
Cheer's market share drops as time increases. The second column shows
the response to a price cut that occurs in period 3, and the third
in period 5. As time increases, the own price elasticity of each new
product drops: in other words, when consumers learn about the new
products their demand becomes more inelastic. This provides an
alternative explanation for why we observe introductory pricing of
the new products, beyond the idea that firms recognize dynamics and
are pricing with them in mind. It may be that firms are myopic and
simply observe that short run demand elasticity is falling, and
raise their prices in response.
The fourth, fifth and sixth columns show the impact of the price cut
under the model parameters estimated for the full model when
consumers are assigned their true tastes for the product. The own
price elasticities are smaller when there is no learning. Generally,
the intertemporal price response is larger when the learning is
removed. The only place where this does not hold is for Dash in
periods after the second, where the intertemporal price elasticity
is slightly smaller. The intuition behind this result is that when
there is no learning, consumers who respond to the price cut will
continue to purchase the new product due to the cost of switching
products; when learning is added to the price cut, some consumers
who experiment with the product will receive a low match value for
it and will switch away to something else. This can also be seen in
the first three columns, where the intertemporal elasticity rises as
time increases.
This result suggests that firms who wish to make their price cuts
more effective at building future market share should combine them
with informative advertising. This does not take into account the
impact of advertising itself, which could increase or decrease
revenues. The increase in revenues for the 10 simulated periods from
the price cut alone was 13.14 dollars for Cheer, 18.41 dollars for
Surf, and 9.64 for Dash. The increase in revenues from removing
learning was 338.45 for Cheer, 122.68 for Surf, and 305.07 for Dash.
Finally, the increase in revenues from the price cut combined with
removing learning was 349.32 for Cheer, 139.16 for Surf, and 314.69
for Dash. Since removing learning increases overall revenues, price
cuts combined with advertising increase revenues more than price
cuts alone (this does not take costs of advertising into account).
When switching costs are removed from the model, price cuts become
much less effective at building future market shares. This is
because when there are both learning and switching costs, some of the
consumers who experiment with the new product and find they dislike
it will continue to purchase it due to the cost of switching
products. For introductory price cuts in period 1, the own price
elasticity of demand drops when switching costs are removed; it rises
for periods 3 and 5.
The tenth to twelfth columns show the impact of these price cuts
under the results produced by the no learning model. The impact of a
period $t$ price cut on the new product's period $t$ market share
can be either higher or lower than in the full model. For Cheer,
these elasticities drop, but for Surf and Dash, all but one of them
rise. The intertemporal price elasticities are overestimated,
especially in the earlier periods. The last three columns show the
impact of these price cuts produced by the model with no switching
costs. The own price, current period elasticities appear to be
underestimated, except for Dash in period 5. This result seems
somewhat counterintuitive: one would expect that, all else equal,
increasing switching costs should reduce a product's own price
elasticity. However, recall that in the no switching costs model,
estimated taste variances were much larger, which will tend to raise
price elasticities. As expected, the intertemporal elasticities are
much lower in the no switching costs model. Thus, the impact of
misspecifying dynamics on the model's implied intertemporal price
elasticities is significant.
The second counterfactual, shown in Table \ref{table:cf2},
demonstrates the effect of informative advertising on the short run
and intermediate run market shares for the new products. The market
shares are simulated in the same way as the price cut
counterfactuals. The informative advertising is modeled as follows:
when the new product is introduced, I assume that every consumer
receives a signal $a_{ij}$ about their true match value for the new
product which is normally distributed with mean $\gamma_{ij}$ and
variance $\sigma_{aj}^2$. I assume that consumers update their
expected true taste, $\gamma_{uij}^0$, and the variance of their
true taste distribution, $\sigma_{uij}^2$, using a Bayesian updating
rule (see \citeN{degroot70}, pg. 166-167):
\begin{equation}\label{eq:bupdating}
\begin{split}
\gamma_{uij}^0 & = \frac{\frac{\gamma_{ij}^0}{\sigma_{ij}^2} +
\frac{a_{ij}}{\sigma_{aj}^2}}{\frac{1}{\sigma_{ij}^2} +
\frac{1}{\sigma_{aj}^2}} \\
\sigma_{uij}^2 & =
\frac{1}{\frac{1}{\sigma_{ij}^2}+\frac{1}{\sigma_{aj}^2}} \\
\end{split}
\end{equation} For each product, I assume that the signal variance $\sigma_{aj}^2$
is one half of the population variance in Table \ref{table:unc1}, so
that for the Cheer counterfactual $\sigma_{aj}^2$ is 1.015, for Surf
it is 0.925, and for Dash it is 0.905. This counterfactual is
simulated using the results of the full model, the full model with
switching costs removed, and the no switching costs model.
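The updating rule in equation (\ref{eq:bupdating}) is a standard normal-normal Bayesian update and can be sketched directly; the numbers in the example call are illustrative, not estimates from the model.

```python
def update_taste(gamma0, sigma2, a, sigma_a2):
    """Normal-normal Bayesian update: the posterior mean is the
    precision-weighted average of the prior mean gamma0 (precision
    1/sigma2) and the advertising signal a (precision 1/sigma_a2),
    and the posterior variance is the inverse of the total precision."""
    precision = 1.0 / sigma2 + 1.0 / sigma_a2
    gamma_u = (gamma0 / sigma2 + a / sigma_a2) / precision
    sigma_u2 = 1.0 / precision
    return gamma_u, sigma_u2

# Illustrative call: prior mean 0 with variance 2, signal of 1 with
# variance 1. The update pulls the mean two-thirds of the way toward
# the signal and shrinks the variance below both inputs.
gamma_u, sigma_u2 = update_taste(gamma0=0.0, sigma2=2.0, a=1.0, sigma_a2=1.0)
```

A more precise signal (smaller `sigma_a2`, i.e. more informative advertising) pulls the posterior mean closer to the true match value and leaves the consumer with less residual uncertainty.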
The simulated market shares are shown in Table \ref{table:cf2}. For
Cheer, informative advertising increases the product's market share
in all periods. For Surf, informative advertising reduces the market
share in the short run, but increases it in the longer run. One
explanation for this is that when consumers have a better signal of
how much they will like the new product, their option value of
learning is reduced. Also, the informative advertising informs
consumers who have low match values with the product that they
should not switch into it, which would be especially costly when there is a
switching cost; when there is no advertising, consumers expect to
like Surf (recall that the distribution of expected tastes for Surf
has the highest mean out of all three new products) and those who
have lower match values for it will continue to purchase it due to
the switching cost.\footnote{The fact that advertising reduced
Surf's market share in the presence of switching costs may seem
counterintuitive in light of the discussion in Section
\ref{sec:exlearning}. Recall that in the counterfactual experiment
discussed there, removing learning increased Surf's market share in
the presence of switching costs. A reason for this is that the
simulation experiment performed in Section \ref{sec:exlearning} was
done at the actual data, where there is significant price variation,
whereas these counterfactuals are computed at constant prices. Price
variation will reduce the impact of the switching costs, making the
results look more like the no switching costs case.} For Dash,
advertising has the largest impact of all three products. The
reason for this is that consumers' expected taste for Dash is lower
than Cheer or Surf, which means that the option value of learning
about Dash will be lower than for Cheer or Surf and the fact that
advertising reduces it will not matter that much. Instead, the
advertising gives consumers a better idea of their true
match value for Dash. Since the population variance of true match
values for Dash is high, those who have high match values will
become more likely to experiment. This makes the advertising have a
stronger effect on the market share for Dash than for Cheer or Surf.
This result is interesting, since it suggests that informative
advertising is more effective for niche products. The second column
shows the predicted revenue increase resulting from the advertising.
The third and fourth columns show the predicted market shares and
revenues under the full model with the switching costs parameter,
$\eta_i$, set to 0 for all agents. For all three new products,
informative advertising increases the new product's market share
more when there are no switching costs than when they are present, for
the first one or two periods. For Cheer and Dash, the impact of the
advertising dies out more quickly when there are no switching costs,
which is not surprising. For Surf, the advertising increases the
product's market share rather than decreasing it. This suggests that
the decrease in Surf's market share in the full model was driven by
the switching costs. The results of the no switching costs model are
shown in the third and fourth columns. The impact of the advertising
is smaller for Cheer, but larger for Surf and Dash. Part of this
result may be due to the fact that the learning parameter is
estimated to be smaller for Dash and larger for Surf. Although
Dash's learning parameter is only slightly smaller, the lack of
switching costs will tend to increase the impact of the advertising.
\section{Empirical Price Elasticities}
In Table \ref{table:elastslt} I compute the empirical price
elasticities implied by the full model, the no learning model, and
the no switching costs model. The price elasticities are computed in
a method similar to how I computed the impacts of price cuts in the
counterfactual exercises from the previous section. First, I compute
the market shares for ten periods, holding the prices of all
products fixed over time at their averages. Then, I cut the price of
all sizes of one of the products by ten percent for all ten periods,
and compute the percentage change in the product's market share,
divided by ten. Note that this type of price elasticity is not
simply the impact of a 1 period price cut on the product's market
share. When I compute the market shares at the lower price,
consumers expect the lower price to last forever, and adjust their
value functions accordingly.
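The calculation just described can be sketched as below. The `logit_shares` function is a deliberately simple static stand-in for the model's share simulator (which would re-solve value functions at the new price path), so the resulting number is only illustrative.

```python
import numpy as np

def long_run_elasticity(simulate_shares, prices, product, cut=0.10):
    """Cut one product's price by `cut` in every period, recompute
    market shares, and divide the percentage share change by the
    percentage price change (here -10%), as described in the text."""
    base = simulate_shares(prices)
    cut_prices = prices.copy()
    cut_prices[product] *= 1.0 - cut
    new = simulate_shares(cut_prices)
    return (new[product] - base[product]) / base[product] / (-cut)

def logit_shares(prices, alpha=-2.0):
    """Toy static logit share simulator, used only for illustration."""
    u = alpha * prices
    e = np.exp(u - u.max())
    return e / e.sum()

prices = np.array([1.0, 1.2, 0.9])
elast = long_run_elasticity(logit_shares, prices, product=0)
```

In the paper's exercise, `simulate_shares` would embed the dynamic model: consumers expect the lower price to last forever and adjust their value functions before choices are simulated.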
The first column of the table shows the impact of a long term price
cut for Cheer Liquid on Cheer Liquid, Wisk Liquid, Tide Liquid, and
Tide Powder. I show the simulated elasticity for three of the ten
periods I simulate, periods 1, 3 and 5. Thus, the own price
elasticity of Cheer for a permanent price cut in Cheer is -1.85 in
period 1, -1.73 in period 3, and -1.64 in period 5. The cross-price
elasticity of a price cut on Cheer for Wisk's quantity in period 1
is 0.11. Note that when I computed the elasticities in the first
column, I assumed that Surf and Dash were not available, and that
there was learning in Cheer. The second, third, and fourth columns
show the impacts of long term price cuts in Surf Liquid, Dash
Liquid, and Tide Liquid respectively. For the Tide Liquid price cut,
I assume that all products are available, and that there is learning
in Dash. The own price elasticities of all products are estimated to
be significantly larger in magnitude than in the case of temporary price cuts:
they range from -1.45 to -1.87. This is likely due to the fact that
the price cuts are affecting consumers' future utility through the
switching costs and the learning, in addition to their current
utility. The cross-elasticities of demand are roughly between 0.05
and 0.30, with the highest cross elasticity between Dash and Tide
Liquid.
One reason that computing these elasticities is important is that
own and cross-price elasticities are used by the Federal Trade
Commission and the US Department of Justice to evaluate the
competitive impact of mergers. Cross-price elasticities are an input
into ``unilateral effects'' analysis, where agency personnel examine
whether a producer of several products could profitably raise the
price of one (or some) of them post-merger. As an example, suppose
that all the laundry detergents analyzed in this paper were produced
by different firms, and a merger between Cheer and Wisk was
announced. A unilateral effects analysis would ask whether the
producer of both products would find it profitable to raise the
price of Cheer, for example. If Cheer and Wisk were very similar
products, then many consumers of Cheer would switch to Wisk, which
would mitigate the impact of the price increase on the merged firm's
profits. Clearly, the higher the cross-price elasticity between the
two products is, the larger will be the unilateral effect.
Own price elasticities are used in merger analysis to define
``antitrust markets.'' An antitrust market is a group of products
such that a hypothetical monopolist who produced those products
would find it profit maximizing to raise price above current levels
by at least some percentage (in practice this percentage is often
taken to be 5\%), assuming that the prices of other products did not
change. Antitrust markets are used to address the question of market
definition in merger analysis: when computing the market shares of
merging firms, it is necessary to determine which products should be
included in the denominator of the share calculation. A convention
that is used by federal agencies is to define the relevant market to
be the smallest antitrust market. For example, again suppose that
all the laundry detergents analyzed in this paper were produced by
different firms. If one was evaluating a merger between Cheer and
Wisk, one might want to know whether these two products were in a
separate antitrust market from other detergents. One would then ask
whether a hypothetical monopolist who produced Cheer and Wisk would
find it profit-maximizing to raise the prices of these products by
at least 5\%. An important input into this exercise will be the own
price elasticities of demand for these products (the own price
elasticities will themselves be functions of the cross-price
elasticities with other laundry detergents). To complete the
hypothetical monopolist exercise, an understanding of the production
costs of the products is also necessary.
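Under a constant-elasticity approximation, the hypothetical monopolist exercise sketched above can be written as a simple profit check. All figures below (margins, quantities, and the elasticity matrix) are invented for illustration and are not estimates from this paper.

```python
import numpy as np

def ssnip_profitable(prices, quantities, costs, elasticity, group, bump=0.05):
    """Raise the candidate group's prices by `bump` (the 5% increase
    used in practice), approximate the quantity response with a
    constant elasticity matrix, and check whether the hypothetical
    monopolist's profit on the group rises."""
    pct_price = np.zeros(len(prices))
    pct_price[group] = bump
    # First-order approximation: %dq_i = sum_j elasticity[i, j] * %dp_j
    pct_quantity = elasticity @ pct_price
    new_prices = prices * (1.0 + pct_price)
    new_quantities = quantities * (1.0 + pct_quantity)
    profit_before = np.sum((prices[group] - costs[group]) * quantities[group])
    profit_after = np.sum(
        (new_prices[group] - costs[group]) * new_quantities[group]
    )
    return bool(profit_after > profit_before)

# Invented two-product example (e.g. a candidate Cheer/Wisk group):
# own price elasticity -1.8, cross elasticity 0.3, 50% margins.
prices = np.array([1.0, 1.0])
quantities = np.array([100.0, 100.0])
costs = np.array([0.5, 0.5])
elasticity = np.array([[-1.8, 0.3],
                       [0.3, -1.8]])
profitable = ssnip_profitable(prices, quantities, costs, elasticity, [0, 1])
```

With a sufficiently elastic own-price response (or thin enough margins), the 5\% increase becomes unprofitable and the candidate market must be widened, which is why the own and cross-price elasticities are key inputs.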
In order to carry out these exercises, it is necessary to have
correct estimates of own and cross-price elasticities. If certain
types of dynamics are ignored, such as learning or switching costs,
then our estimated elasticities will be wrong. Because I have
estimated the model without learning and without switching costs, it
is possible to assess the magnitude of the error from ignoring these types of
dynamics. The results for the no learning model are shown in the
fifth through eighth columns. The own price elasticities for the new
products drop significantly. One reason for this is that the
learning makes consumers more likely to switch into the new
products; the option value of learning makes them more attractive,
and a permanent price drop will generally raise this option value.
It is notable that the own price elasticity for the established
product, Tide Liquid, is not affected much by ignoring learning. The
cross price elasticities between new products and established
products drop significantly when learning is removed, for the same
reason. In contrast, the cross price elasticity between established
products rises. Thus, by ignoring learning we would tend to
underestimate the unilateral impact of mergers between new and
established products, and overestimate the unilateral impact of
mergers between established products. Cross price elasticities can
differ across models by as much as 90\% (for example,
Surf Liquid and Wisk Liquid).
\section{Conclusions and Extensions}
In this paper I propose a structural model of learning and
experimentation that nests alternative sources of dynamics in
demand, such as switching costs or consumer taste for variety. In
this model, consumers are forward-looking, and I allow a rich
distribution of heterogeneity in consumer tastes, price
sensitivities, consumer expectations of true match values, and the
type of alternative dynamics.
I estimate the model on laundry detergent scanner data and find
evidence for switching costs and significant learning. The model is
estimated using a Markov chain Monte Carlo algorithm, and I employ a new method
for solving for consumers' value functions that substantially
reduces the estimation procedure's computational burden. The results
show strong support for learning and suggest that new products are
experience goods. Before consumers make their first purchases of the
new product, they have very similar expectations of what their true
tastes will be. Those who make first purchases end up being very
heterogeneous in their true tastes. The results also suggest most
consumers have switching costs in addition to learning. Both
learning and switching costs have a significant impact on the market
shares of new products.
I re-estimated the model twice to examine the impact of ignoring
dynamics on the model predictions: once without learning, and once
without switching costs. I find that if learning is ignored, model
parameter estimates for other parameters look similar, but the model
does a much worse job at predicting market shares during the periods
after new product introductions. If switching costs are left out,
the model underpredicts the importance of learning. I also examine
the impact of misspecification on price elasticities. If learning is left out, the model
underpredicts own price elasticities for new products, and
underpredicts cross price elasticities between new and established
products. If switching costs are left out, the model underpredicts
own and cross price elasticities for all products.
I also examine the effect of two ``what-if'' experiments. In the
first experiment I drop the price of the new products and simulate
the products' intermediate run market share in a partial equilibrium
setting, under different assumptions about dynamic demand. The
results of this counterfactual exercise suggest that the impact of
the price cut is greater when consumers both learn and have costs of
switching products, as opposed to when there are no switching costs
and they only learn. The impact of the price cut is also greater
when there are only switching costs than when consumers learn and
have switching costs, which suggests that price cuts may be more
effective when they are combined with informative advertising or
free samples. In my second ``what-if'' experiment, I give consumers
informative advertisements which reduce their uncertainty about
their true match value for the new products in the same partial
equilibrium setting. The results suggest that for the two mainstream
new products, informative advertising reduces the product's market
share in the presence of switching costs. For a niche product,
informative advertising is beneficial.
There are a number of extensions for this research that would be
useful. First, this paper abstracts from the supply side: for
example, the counterfactuals I compute are partial equilibrium
counterfactuals and do not account for competitor responses. It
would be interesting to examine the model's supply side
implications: for example, we might be interested in knowing the
impact of learning on the ease of new product entry, or on
equilibrium pricing behavior. One way to perform this kind of
exercise would be to take the demand system as given, solve for the
market equilibrium, and compute comparative statics. This sort of
exercise has been performed in markets with switching costs in
\citeN{dubehitschrossi06} and \citeN{chesudhirseetaraman06}. In
these papers, the computation of the market equilibrium is tractable
because switching costs are the only source of dynamics, and
consumers are not forward-looking (\citeN{dubehitschrossi06} argue
that the problem with forward-looking consumers is similar to the
problem with myopic ones, when firms' prices follow a Markov
process). Solving the model with forward-looking consumers who
learn their match values for new products is a more difficult task.
Second, the modeling technique I have used in this paper could be
used to examine other problems where consumers are forward-looking
and are heterogeneous in dimensions such as price sensitivities. One
empirical question would be to examine the patterns of price
promotions in supermarkets. For many products, supermarkets have
periodic price promotions and quantity purchased spikes sharply
during those promotions. \citeN{hendelnevo05_2} find that consumer
behavior is consistent with stockpiling behavior;
\citeN{pesendorfer02} also finds that sales are consistent with
demand accumulation. \citeN{villasboasvillasboas06} provide an
alternative explanation for these promotions: they could be due to
learning and forgetting. It may be possible to disentangle these
stories with a structural demand model that is similar to the one
that was estimated in this paper. It would be more complicated to
estimate than the one provided in this paper, since there is an
additional source of dynamics, which is consumer stockpiling
behavior.
Another potential area of application would be consumer demand for
durable goods. Two examples of papers that examine the relationship
between price declines and consumer dynamics for durables are
\citeN{gowrisankaranrysman06} and \citeN{nair04}. These papers have
assumed that the only dynamic decision on the consumer side is the
decision of when to make a purchase in the presence of declining
prices. There may be additional sources of dynamics which will
impact price declines, such as the existence of a used market.
Adding another source of dynamics to these models will make them
more computationally burdensome, but the new technique I have used
could ameliorate this problem.
\appendix
\section{Appendices}
\subsection{Markov Chain Monte Carlo Algorithm}\label{append:structmcmc}
Essentially, there are two levels to the MCMC algorithm: a level in
which population-varying individual parameters on unobserved
heterogeneity are drawn, and a level in which the population-fixed
parameters are drawn (which includes the parameters that generate
unobserved coupons and govern consumer expectations about future
unobserved coupons).
\begin{enumerate}
\item Update value function at chosen state space points
(see Section \ref{sec:vfsolve} and Appendix \ref{append:dynopt} for
more details on this process).
\item For each household, draw a new $\theta_i$. The posterior of
$\theta_i$ is proportional to
\begin{displaymath}
\left(\prod_{t=1}^{T^i}
Pr(d_{it}|\theta_i,\theta,c_{it},p_{it},x_{it})\right)\phi(\theta_i|b,W)k(b,W),
\end{displaymath} where $\phi(\theta_i|b,W)$ is the joint normal density and $k(b,W)$
is the prior on $b$ and $W$. It is difficult to draw from this
posterior directly since $Pr(d_{it}|\theta_i,c_{it},p_{it},x_{it})$
is multinomial logit. Hence, I use the Metropolis-Hastings
algorithm. This means that for each household $i$ I draw a trial
$\theta_i^1$, where $\theta_i^1 \sim N(\theta_i^0,\rho \tilde{W})$,
and $\theta_i^0$ is the previous iteration's $\theta_i$.
$\tilde{W}$ is the variance matrix $W$ with three extra variances
added in to correspond to the posterior draws. In my program, I draw
the difference between $\gamma_{ij}$ and $\gamma_{ij}^0$. For a
particular person, this difference has variance $\sigma_{ij}^2$. We
might be tempted to use this value in $W$, but it would violate the
reversibility condition for the proposal distribution. Hence, I put
in the average population mean of the $\sigma_{ij}^2$'s.
I accept the new draw $\theta_i^1$ with probability
\begin{displaymath}
\frac{\left(\prod_{t=1}^{T^i}
Pr(d_{it}|\theta_i^1,\theta,c_{it},p_{it},x_{it})\right)\phi(\theta_i^1|\tilde{b},\tilde{W})}{\left(\prod_{t=1}^{T^i}
Pr(d_{it}|\theta_i^0,\theta,c_{it},p_{it},x_{it})\right)\phi(\theta_i^0|\tilde{b},\tilde{W})}.
\end{displaymath}
The scalar $\rho$ is automatically selected so the acceptance rate
is about 0.3.
\item Then I draw $b$ conditional on $\tilde{\theta}_i,W$ and $W$ conditional
on $\tilde{\theta}_i,b$. The formulas for the posteriors of these
parameters are the usual ones. Note that in the posterior
distributions for $b$ and $W$, the individual level posterior draws
will drop out since they only directly depend on $\sigma_{ij}^2$.
\item Population-fixed parameter layer: at the beginning of this layer, I draw a
new set of unobserved coupons, which means drawing the
$\overline{c}_{ijt}$'s and the $v_{ijt}$'s. As described in the body
of the paper, the $v_{ijt}$'s are drawn from the empirical
distribution of coupon values in the data. Denote $p_{cjt}$ as the
probability a consumer gets a coupon for product $j$ in period $t$.
This probability will be a function of parameters in $\theta$, as
described in Section \ref{sec:dynopt}. The $\overline{c}_{ijt}$'s
are binary, and their distribution is:
\begin{displaymath}
\begin{split}
& Pr(\overline{c}_{ijt}=1) \\
& =
\frac{Pr(d_{it}|c_{it},\overline{c}_{ijt}=1,v_{it},\theta_{i},\theta)p_{cjt}}{Pr(d_{it}|c_{it},\overline{c}_{ijt}=1,v_{it},\theta_{i},\theta)p_{cjt}
+Pr(d_{it}|c_{it},\overline{c}_{ijt}=0,v_{it},\theta_{i},\theta)(1-p_{cjt})}.
\\
\end{split}
\end{displaymath} The more difficult task is drawing the $\theta$, which is performed
next. The posterior distribution of $\theta$ is proportional to
\begin{displaymath}
\prod_{i=1}^I \prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta,\Sigma_{it},c_{it},x_{it})
Pr(c_{it}|\theta) \right\}.
\end{displaymath} As with the $\theta_i$, the Metropolis-Hastings
algorithm is also used here. I draw a trial $\theta^1$ from a
$N(\theta^0,\rho_2)$ distribution. Any trial draw where a coupon
probability, like $p_{cj}^0$ or $p_{cj}^0 + p_{cj}^1$, falls outside
the $[0,1]$ interval is automatically rejected. For draws inside
this interval, the new draw is accepted with probability
\begin{displaymath}
\frac{\prod_{i=1}^I \prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta^1,\Sigma_{it},c_{it},x_{it})
Pr(c_{it}|\theta^1) \right\}}{\prod_{i=1}^I \prod_{t=1}^{T_i} \left\{
Pr(d_{it}|\theta_i,\theta^0,\Sigma_{it},c_{it},x_{it})
Pr(c_{it}|\theta^0) \right\}}.
\end{displaymath} This procedure for drawing fixed coefficients is
similar to the one suggested by \citeN{train03}, pp. 311--313, for
drawing fixed coefficients in static mixed logit models. I adjust the parameter
$\rho_2$ so that the acceptance rate is about 0.3.
\end{enumerate}
These steps are iterated 15,000 times, with the first 7,500
parameter draws discarded for burn-in.
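The Metropolis-Hastings layers in steps 2 and 4 can be sketched in a few lines. The following is a minimal Python illustration, not the estimation code itself: \texttt{log\_post} is a hypothetical stand-in for the log of the unnormalized posteriors above, and the proposal scale $\rho$ is nudged toward the 0.3 target acceptance rate.

```python
import numpy as np

def adaptive_rw_metropolis(log_post, theta0, cov, n_iter=5000,
                           target_rate=0.3, adapt_step=0.05, seed=0):
    """Random-walk Metropolis-Hastings with the proposal scale rho
    adapted toward a target acceptance rate (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    rho, n_accept, draws = 1.0, 0, []
    for _ in range(n_iter):
        # Trial draw theta^1 ~ N(theta^0, rho^2 * cov).
        trial = theta + rho * (chol @ rng.standard_normal(theta.size))
        lp_trial = log_post(trial)
        # Accept with probability min(1, posterior ratio).
        accept = np.log(rng.uniform()) < lp_trial - lp
        if accept:
            theta, lp = trial, lp_trial
            n_accept += 1
        # Robbins-Monro style adjustment of the scale toward the
        # target acceptance rate (roughly 0.3, as in the text).
        rho *= np.exp(adapt_step * (float(accept) - target_rate))
        draws.append(theta.copy())
    return np.array(draws), n_accept / n_iter
```

With a standard normal log posterior this settles near the target acceptance rate; in the actual algorithm \texttt{log\_post} would be the product of choice probabilities and the normal density over a household's purchase history.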
\subsection{Estimation of the Price
Process}\label{append:priceprocess}
When I construct consumer price expectations, I estimate a price and
product availability process for a number of brand-size combinations
in the market. In my data set, prices are only recorded when a
consumer makes a purchase of a product. Before we can construct a
process for prices, we will need a set of prices and availabilities
for all products in all the stores in the data. The data set
includes a set of ``price files'' which contain prices imputed from
the household purchase data by A.C. Nielsen; one possibility would
be to use this file. A drawback to using these files is that some
brand-size combinations were not included. In order to calculate the
average price per ounce of every brand in my estimation, I would
like to keep track of the prices of the most popular brand-sizes. I
therefore use a simple algorithm that is somewhat similar to
Nielsen's to impute prices and availability of products in a store
during a given calendar week.\footnote{A possible problem is that
the imputed prices may not be exactly the same as the actual prices.
\citeN{pesendorfer02} examines retail pricing of ketchups, another
product included in this data set, and notes no problems with the
constructed price series. It would also be possible to estimate a
price distribution along with the model parameters, treating prices
for nonpurchased brands as latent unobservables, as I did for
coupons.}
In the data set, when a consumer makes a purchase I observe an
identification number for the store where the purchase was made. The
actual identity of the store was not recorded. When constructing the
price series, first I run through all household purchases and store
the price of the product purchased in that purchase event. If two
different prices are observed at the same store during the same
week, I assume that the weekly price is the price that is observed
earlier. A ``product'' denotes a brand-size combination, as in the
estimation: the brand is one of the 13 brands and types
described in Table \ref{table:mktshares1}, and the size is in one of
the categories shown in Table \ref{table:hhpurchstuff}. If no
consumer purchases a particular product from a given store for an
interval greater than 4 weeks, I assume that product is unavailable
in that store for that period. No imputed prices are filled in for
these periods. Some stores had very few observed purchases, and
these stores were not included in the estimation. In all, 15 stores
were used for the estimation.
% The percentage of observed prices varies
%widely by product. For the Other Liquid 64 oz bottle, we observe
%64\% of prices, whereas for 128 oz Dash Liquid we only observe about
%5\% of prices. If we sum across all 48 brand-size combinations,
%roughly 20\% of prices are observed.
Most of the stores in the sample are identified by Nielsen as
belonging to one of five different chains. When Nielsen constructed their price
file, they assumed that for two of the chains, pricing patterns were
the same across stores in the same chain. I maintain the same
assumption. Thus, if in a given store in one of these chains, no
price is observed for a certain week and the product is assumed to
be available during the week, then the price of the product is
imputed to be the modal observed price at all the other stores in
the same chain (or the lowest price if there are multiple modes). If
there is no observed price in the other stores, then the price of
the product is imputed to be the same as the previous week's price.
For other stores, the price is calculated on a store-by-store basis.
That is, if the price is not observed in a given week, it is imputed
to be the same as the previous week's price. Also, during the first
few weeks of the data we may not observe prices for some products;
these prices are imputed backwards from the first observed price. Periodically
products are marked below their shelf price, which is recorded by a
variable in the model. I assume that these discounts only last
during the week they are recorded.
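The imputation rule above can be sketched as follows. This is an illustrative Python sketch under simplified inputs (the function name, input structure, and the omission of the 4-week availability rule are my own simplifications, not the actual imputation code):

```python
from collections import Counter

def impute_weekly_prices(observed, stores, n_weeks, chain_of=None):
    """Fill a store-by-week price array: chain stores take the modal
    same-chain price (lowest price breaks ties), otherwise the
    previous week's price is carried forward, and the first weeks
    are backfilled from the first observed price."""
    prices = {s: [None] * n_weeks for s in stores}
    for s in stores:
        for w, p in observed.get(s, {}).items():
            prices[s][w] = p
    for w in range(n_weeks):
        for s in stores:
            if prices[s][w] is not None:
                continue
            # Chain stores: modal price at other same-chain stores.
            if chain_of and chain_of.get(s) is not None:
                peers = [prices[o][w] for o in stores
                         if o != s and chain_of.get(o) == chain_of[s]
                         and prices[o][w] is not None]
                if peers:
                    counts = Counter(peers)
                    top = max(counts.values())
                    prices[s][w] = min(p for p, c in counts.items()
                                       if c == top)
                    continue
            # Otherwise carry last week's price forward.
            if w > 0 and prices[s][w - 1] is not None:
                prices[s][w] = prices[s][w - 1]
    # Backfill initial weeks from the first observed price.
    for s in stores:
        first = next((p for p in prices[s] if p is not None), None)
        for w in range(n_weeks):
            if prices[s][w] is None:
                prices[s][w] = first
            else:
                break
    return prices
```
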
Once I have constructed an array of prices and availability for each
product, I estimate a discrete/continuous Markov process on prices
and availability, for all products, similar to
\citeN{erdemimaikeane03}. As noted earlier, there are 48 possible
brand-size combinations. Even though the space of prices is
discrete, it would be difficult to accurately interpolate the value
function over a 48 dimensional state space. In order to reduce the
size of the state space, I estimate the Markov price process for the
most popular sizes of liquids and powders, which are the 64 oz
bottle for liquids, and the second category for powders. I will
denote these sizes as the reference sizes for these products. For
the other sizes, I assume that the price and availability is a
function of the price and availability of the reference size's price
and availability.
An observation in the price process estimation is the
price/availability of a product in a given store during a given
week. The price process is estimated only for weeks after the
introduction of Cheer. First, I will describe the estimation of the
Markov process for the reference sizes. If a particular product was
available in the store, I assume the probability that product $j$'s
price stays the same between weeks $t-1$ and $t$ is
\begin{displaymath}
\frac{\exp(\kappa_{j}'X_{jt})}{1+\exp(\kappa_{j}'X_{jt})},
\end{displaymath} where the $X_{jt}$'s include a constant term, three
dummy variables that are 1 for the first three months after each new
product introduction, a dummy variable that is 1 during the period
after the introduction of Cheer and before the introduction of Surf,
which I will denote as $D_1$, and a dummy variable that is 1 during
the period after the introduction of Surf and before the introduction
of Dash, which I will denote as $D_2$.
To allow previous prices to affect the probability of a price
change, I also include the difference between the price of product
$j$ in week $t-1$ and the average of all other brands available in
week $t-1$ (that is, $p_{jt-1} - \overline{p}_{jt-1}$, where
$\overline{p}_{jt-1} = (1/J)\sum_{k=1}^J p_{kt-1}$), and this
difference interacted with $D_1$ and $D_2$. The probability of a
price change varies with the difference between the brand's previous
price and the average price of other products in order to allow for
competitive responses. The price difference regressors are
interacted with $D_1$ and $D_2$ to allow the price process to be
different when different sets of products are available on the
market. If the price changes in period $t$, I assume the new price
is drawn according to
\begin{displaymath}
\ln(p_{jt}) = \lambda_j'X_{2jt} + \varepsilon_{itj},
\end{displaymath} where $X_{2jt}$ includes a constant, three dummy
variables for the first three months after each new product
introduction, the dummy variables $D_1$ and $D_2$, the log of the
price of product $j$ in week $t-1$, the average of the logarithm of
the prices of all other products in week $t-1$ (that is, $(1/J)
\sum_{k=1}^J \ln(p_{kt-1})$), and these previous two variables
interacted with $D_1$ and $D_2$. I assume $\varepsilon_{itj} \sim
N(0,\sigma_j^2)$. If a product is not available in week $t-1$ but is
available in week $t$ then I estimate a similar regression to the
one above but I leave out the previous price of product $j$. Last, I
estimate a logit to model product stockouts from week to week.
Letting $a_{jt-1}$ be a dummy variable that is 1 if product $j$ is
not available in period $t-1$, I assume the probability of a store
stockout in week $t$ is
\begin{displaymath}
\frac{\exp(\zeta_{j}'X_{3jt}) }{1+\exp(\zeta_{j}'X_{3jt})},
\end{displaymath} where $X_{3jt}$ includes a constant, the dummy
variables $D_1$ and $D_2$, $a_{jt-1}$, $D_1$ and $D_2$ interacted
with $a_{jt-1}$, $(1-a_{jt-1})(p_{jt-1}-\overline{p}_{t-1})$, the
interaction of the previous term with $D_1$ and $D_2$,
$a_{jt-1}\overline{p}_{t-1}$, and this term interacted with $D_1$
and $D_2$. I run these estimations in Stata and store the results in
data files that my Fortran programs can access. Parameter estimates are
not shown in this paper, but are available upon request from the
author.
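One week-ahead draw from this discrete/continuous process can be sketched as follows. This is an illustrative Python sketch, not the estimated process: the coefficient vectors \texttt{kappa} and \texttt{lam} and the extra regressors are placeholders for the ones described above.

```python
import numpy as np

def simulate_price_step(p_prev, p_bar_prev, kappa, lam, sigma,
                        x_extra, rng):
    """One week of the price process: a logit decides whether the
    price stays put; if it moves, the new log price is a regression
    draw with normal errors."""
    # Stay-probability logit: constant, extra dummies, and the
    # previous relative price p_{jt-1} - pbar_{t-1}.
    x_stay = np.concatenate(([1.0], x_extra, [p_prev - p_bar_prev]))
    stay_prob = 1.0 / (1.0 + np.exp(-(kappa @ x_stay)))
    if rng.uniform() < stay_prob:
        return p_prev
    # Price change: ln(p_t) = lam'X_2 + eps, eps ~ N(0, sigma^2).
    x_change = np.concatenate(([1.0], x_extra,
                               [np.log(p_prev), np.log(p_bar_prev)]))
    return float(np.exp(lam @ x_change + rng.normal(0.0, sigma)))
```
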
For the non reference sizes, I assume that the price of the product
is
\begin{displaymath}
\begin{split}
\ln(p_{jst}) = & \psi_{0js} + \psi_{1js}D_1 + \psi_{2js}D_2 +
\psi_{3js} D_1(1-a_{jt})\ln(p_{jt}) + \psi_{4js}
D_2(1-a_{jt})\ln(p_{jt}) \\
& + \psi_{5js} (1-a_{jt})\ln(p_{jt}) + \psi_{6js} D_1a_{jt} +
\psi_{7js} D_2a_{jt} + \psi_{8js} a_{jt} + \varepsilon_{jst},
\end{split}
\end{displaymath} where the $\varepsilon_{jst}$ is normally
distributed, and the probability of the product stocking out is
\begin{displaymath}
\frac{\exp(\varphi_{0js} + \varphi_{1js}D_1 + \varphi_{2js}D_2 +
\varphi_{3js}a_{jt})}{1+\exp(\varphi_{0js} + \varphi_{1js}D_1 +
\varphi_{2js}D_2 + \varphi_{3js}a_{jt})}.
\end{displaymath} As before, $p_{jt}$ and $a_{jt}$ are the price and
availability of the reference size of product $j$ in week $t$.
Tables of parameter estimates are available upon request from the
author.
For some products, it was difficult to identify parameters of the
price process due to lack of observations. For example, there were
only 2 observations for the log price regression for Other Liquid
when it was not available in the previous period, and only 7 for the
Other Powder product. This is not surprising since these products
were popular and almost always available. For these products, I
tabulate the empirical distribution of prices when the product is
not available and draw from it when I form consumers' price
expectations rather than using the regression results. For Dash
Powder, there were also only 7 observations for the price change
regression when the product was available. I also tabulate these 7
observations and draw from their empirical distribution to form
consumers' price expectations.
As described in the paper, I solve the value function on a grid of
$M=100$ prices. Each time a household makes a purchase, it is
necessary to calculate the probability of each price point $p^m$
conditional on the observed price vector at the time of purchase. A
complication is that the price process is weekly, but households do
not make purchases every week. As I describe in the paper, I assume
that every household expects their next purchase to take place in 8
weeks, the median interpurchase time.\footnote{A less restrictive
assumption would be to allow the household's expected next purchase
time to be the average interpurchase time for that particular
household. Doing this will mean calculating a separate value
function for each household, increasing memory requirements
substantially.} When I calculate the probability of a particular
grid point $p^m$ given today's price, I simulate the transition
probability 100 times in the 7 intervening weeks.
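This simulation can be sketched as follows; \texttt{step\_fn} is a hypothetical one-week price simulator standing in for the estimated Markov process, and terminal prices are assigned to the nearest grid point (an illustrative sketch, not the paper's code).

```python
import numpy as np

def grid_transition_probs(p0, grid, step_fn, n_weeks=8, n_sims=100,
                          seed=0):
    """Monte Carlo approximation of the probability of landing at
    each price grid point n_weeks ahead of today's price p0."""
    rng = np.random.default_rng(seed)
    grid = np.asarray(grid, dtype=float)
    counts = np.zeros(grid.size)
    for _ in range(n_sims):
        p = p0
        for _ in range(n_weeks):
            p = step_fn(p, rng)          # one simulated week
        counts[np.argmin(np.abs(grid - p))] += 1.0
    return counts / n_sims
```
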
\subsection{Details of the Value Function
Solution}\label{append:dynopt}
%Before describing the second step, which is the calculation of the
%approximation to the expected value function
%$E_{(p',J')|(p,J)}V_n(s',p',J',y',n';\theta_i^l)$, I would like to
%make a note about the size of the state space.
In this section I describe some details of the value function
computation that were left out of Section \ref{sec:vfsolve}. The
first concerns the large size of the state space, which is the
vector $(S,p,J,y,n)$. One
important part of the state space is the vector of prices $p_{ijst}$
and the set of available products, $J_{it}$, in a given purchase
event. Recall that I only include the prices of the most popular
size of each product in the price state space. Even with this
simplification, because there are 13 products, this portion of the
state space is still high-dimensional. Recall that the expected
utility which is calculated in \eqref{eq:vfpm2} must be retained for
future use. During the estimation, these expected utilities must be
stored in computer memory, which is limited in size. Because of
this, I do not evaluate the value function at all possible
price/availability states, but I instead do it only on a grid of $M$
points, following \citeN{rust87}. Although the estimated price
process treats prices as a continuous variable, prices in the data
are clustered at certain points. I choose the grid points as
follows: for each product, I find the five most frequently occurring
prices, and randomly choose each product's price from these points.
This ensures that the approximated value function will be more
accurate at frequently visited state space points. At any other
point, I interpolate the value function as follows. Suppose that the
estimated transition density of a price/availability grid point
$(p^m,J^m)$, where $m=1,...,M$, given a price/availability vector
$(p,J)$, is $f(p^m,J^m|p,J)$ (details of the estimation of this
density are described in Appendix \ref{append:priceprocess}). Assume
that at the current point in the MCMC sequence we have an
approximation to the value function for individual $i$, who is
represented by the parameter vector $\theta_i$, at all the
price/availability grid points, $(p^m,J^m)$, the learning state $S$,
the previous product purchase $y$ and the time state $n$, which I
denote $\hat{EV}_i(S,p^m,J^m,y,n;\theta_i)$. Then the expected value
function for some other price/availability vector $(p,J)$ at
$\theta_i$ is approximated as
\begin{equation}\label{eq:vfpm}
E_{(p',J')|(p,J)}V_i(S,p,J,y,n;\theta_i,\theta) \approx
\frac{\sum_{m=1}^M
\hat{EV}_i(S,p^m,J^m,y,n;\theta_i,\theta)f(p^m,J^m|p,J)}{\sum_{m=1}^M
f(p^m,J^m|p,J)}.
\end{equation}
This equation is plugged into Equation \eqref{eq:vfupd} in the
second step of the value function calculation, so the version of
Equation \eqref{eq:vfupd} that is used in practice is
\begin{equation}\label{eq:vfupd2}
\begin{split}
& E_{(p',J')|(p,J)} V_g(S,p,J,y,n,\theta_{i,g},\theta_g) \\
& = \frac{ \sum_{r=1}^{N(g)} \left[ \frac{\sum_{m=1}^M
\hat{EV}_r(S,p^m,J^m,y,n;\theta_{i,r},\theta_r)f(p^m,J^m|p,J)}{\sum_{m=1}^M
f(p^m,J^m|p,J)} \right] k( (\overline{\theta}_{i,g} -
\overline{\theta}_{i,r})/h_k ) }{ \sum_{r=1}^{N(g)} k(
(\overline{\theta}_{i,g} - \overline{\theta}_{i,r})/h_k ) }. \\
\end{split}
\end{equation}
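A minimal numerical sketch of this interpolation follows, assuming the saved value functions, transition densities, and saved parameter draws have already been collected into arrays (names and shapes are illustrative):

```python
import numpy as np

def interp_expected_value(ev_saved, f_trans, theta_saved, theta_g,
                          h_k):
    """Average N saved value functions, each first interpolated over
    the M price grid points with transition-density weights, using
    Epanechnikov kernel weights on parameter distances.
    Shapes: ev_saved (N, M), f_trans (M,), theta_saved (N, K),
    theta_g (K,)."""
    ev_saved = np.asarray(ev_saved, dtype=float)
    f = np.asarray(f_trans, dtype=float)
    # Inner interpolation over the price grid, one value per draw.
    ev_at_state = ev_saved @ f / f.sum()                      # (N,)
    # Epanechnikov kernel on scaled parameter distances.
    u = np.linalg.norm((np.asarray(theta_saved, dtype=float)
                        - np.asarray(theta_g, dtype=float)) / h_k,
                       axis=1)
    w = np.where(u < 1.0, 0.75 * (1.0 - u ** 2), 0.0)         # (N,)
    if w.sum() == 0.0:
        w = np.ones_like(w)  # fall back to a plain average
    return float(ev_at_state @ w / w.sum())
```
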
I choose to save $N(g) = 800$ previous value functions. When I
estimate the model, I make a simplification to steps 1 and 2. Saving
800 previous value functions at all the state space points for all
519 households will still require a large amount of computer memory,
and will be computationally intensive. I overcome this problem by
recognizing that the value function only depends on the $\theta_i$'s
and $\theta$, and not any individual specific characteristics.
Demographics enter utility in linear combinations with the
$\theta_i$'s, so in practice I store $\alpha_{0i} + \alpha_{1} INC_i
+ \alpha_{2} SIZE_i$ rather than storing $\alpha_{0i}$, $\alpha_{1}$
and $\alpha_{2}$ separately and treating demographics as state space
variables. The same is done for the learning parameters. At the end
of step 1 I randomly select a household whose parameter draw is
accepted in the first Metropolis-Hastings step (the one for the
population-varying coefficients) and I store only that $\theta_i$.
The $\theta_{i,r}$ that is used in \eqref{eq:vfupd} will in practice
not depend on $i$.
For the kernel function $k(\cdot)$, I use the Epanechnikov kernel
for computational efficiency. Optimal bandwidth selection requires
that the bandwidth parameter of the kernel, $h_k$, is a function of
$N$ and that as $N \rightarrow \infty$, $h_k(N) \rightarrow 0$ and
$Nh_k(N)^{2k} \rightarrow \infty$, where $k$ is the dimension of the
vector in the kernel function. In my model, there are 46 parameters
that enter the kernel function. Twenty of these parameters are
population-fixed parameters (the 18 coupon parameters and the 2
fixed taste coefficients), and the rest are population-varying
coefficients which include tastes, the underlying learning
parameters, price coefficient, switching costs and coupon
sensitivity parameter. For the full model, $h_k(N)$ is set to be
$2/N^{(1/124)}$. There are fewer parameters in the kernel function
for the restricted models, so the bandwidth is slightly different in
them. For the no switching costs model, it is $2/N^{(1/124)}$, and
for the no learning model, it is $2/N^{(1/112)}$.
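The bandwidth rule can be checked numerically (an illustrative sketch with hypothetical function names): $h_k(N)$ shrinks as $N$ grows, while $N h_k(N)^{2k}$ still diverges for $k = 46$ and exponent denominator 124.

```python
def bandwidth(N, denom=124.0):
    """The paper's bandwidth rule, h_k(N) = 2 / N**(1/denom)."""
    return 2.0 / N ** (1.0 / denom)

def diverging_term(N, k=46, denom=124.0):
    """N * h_k(N)**(2k), which must grow without bound as N grows
    for the kernel approximation to be consistent."""
    return N * bandwidth(N, denom) ** (2 * k)
```
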
\bibliographystyle{chicago}
\bibliography{thesis_bib}
%An intuitive explanation would be that under state dependence only
%the price cut draws in consumers who will purchase the new product
%and will keep on purchasing it in the future. Under learning, the
%introductory price cut may not have as much effect, since some of
%the new consumers will decide that they don't like the product and
%will switch away from it.
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% %
% %
% TABLES AND FIGURES %
% %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\renewcommand{\baselinestretch}{1.2}
\begin{table}[h]\caption{Distributions of Household Demographics}\label{table:hhdemog}
\begin{center}
\begin{tabular}{ c c c c c }
\hline
Income Bracket: & Less than 20,000 & 20,000 - 40,000 & 40,000 - 60,000 & 60,000+ \\
Percent: & 11.5 & 21.9 & 29.1 & 37.6 \\
\hline
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{ c c c c c }
\hline
Household Size: & 1 & 2 & 3 & 4+ \\
Percent: & 16.9 & 33.7 & 17.1 & 32.4 \\
\hline
\end{tabular}
\begin{center}
\parbox{0.9\linewidth}{\footnotesize{Income and size distributions are calculated as the fraction of
households observed of a particular income/size in the Sioux Falls,
SD sample. Household demographics were collected in a survey that
was given to all households who participated in the study.}}
\end{center}
\end{center}
\end{table}
\begin{table}[h]\caption{Shares of Sizes}\label{table:hhpurchstuff}
\begin{center}
\begin{center}
Size Distribution of Liquids:
\end{center}
\begin{tabular}{c c c c c}
\hline
32 oz & 64 oz & 96 oz & 128 oz & Other\\
15.5 & 52.9 & 11.4 & 17.5 & 2.7 \\
\hline
& & & & \\
\end{tabular}
\begin{center}
Size Distribution of Powders:
\end{center}
\begin{tabular}{c c c c c}
\hline
17 to 20 oz & 34 to 49 oz & 65 to 84 oz & 144 to 157 oz & Other\\
8.8 & 33.5 & 32.3 & 20.5 & 4.9 \\
\hline
& & & & \\
\end{tabular}
\parbox{\linewidth}{\footnotesize{Size distributions were calculated by taking the
number of observed purchases of liquids (powders) in a certain size
category and dividing them by the number of liquid (powder)
purchases observed in the entire sample.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Market Shares for all Products}\label{table:mktshares1}
\begin{center}
\begin{tabular}{ c c c c c c c c c c }
\hline
Type & Other & Era & Wisk & Tide & Solo & Cheer & Surf & Dash & Total \\
\hline
Liquid & 0.14 & 0.06 & 0.10 & 0.09 & 0.03 & 0.03 & 0.06 & 0.02 & 0.53 \\
Powder & 0.21 & - & - & 0.16 & - & 0.07 & 0.03 & 0.01 & 0.47 \\
\hline
\end{tabular}
\end{center}\parbox{\linewidth}{\footnotesize{ Market share is calculated as the total number of observed
purchases of a specific brand divided by the total number of
observed purchases. The sample is all observed purchases in Sioux
Falls over the sample time period, which starts on December 29, 1985
and ends on August 20, 1988.}}
\end{table}
\begin{table}[h]\caption{Market Shares, Average Prices: Liquids Only at Different Periods}\label{table:mktshares2}
\begin{center}
\begin{small}
\begin{tabular}{ c c c c c c c c c c }
\hline
Period & Actual Time & Other & Era & Wisk & Tide & Solo & Cheer & Surf & Dash \\
& YYYY/MM & & & & & & & & \\
\hline
Entire & 1985/12 - & 0.26 & 0.12 & 0.19 & 0.17 & 0.06 & 0.06 & 0.11 & 0.03 \\
Sample & 1988/08 & 2.80 & 4.21 & 2.90 & 3.97 & 4.12 & 3.57 & 2.67 & 3.12 \\
& & & & & & & & & \\
Before Any & 1985/12 - & 0.41 & 0.14 & 0.19 & 0.16 & 0.10 & 0.00 & 0.00 & 0.00 \\
Product Intro & 1986/05 & 2.56 & 4.12 & 3.03 & 4.41 & 3.26 & . & . & . \\
& & & & & & & & & \\
First Quarter & 1986/05 - & 0.24 & 0.11 & 0.27 & 0.11 & 0.07 & 0.20 & 0.00 & 0.00 \\
After Cheer & 1986/08 & 2.69 & 3.55 & 2.79 & 3.98 & 4.10 & 3.13 & . & . \\
& & & & & & & & & \\
First Quarter & 1986/09 - & 0.24 & 0.13 & 0.15 & 0.17 & 0.06 & 0.05 & 0.19 & 0.00 \\
After Surf & 1986/11 & 2.91 & 3.87 & 3.05 & 3.10 & 3.85 & 3.76 & 2.01 & . \\
& & & & & & & & & \\
First Quarter & 1987/03 - & 0.24 & 0.10 & 0.18 & 0.10 & 0.05 & 0.07 & 0.15 & 0.12 \\
After Dash & 1987/06 & 2.80 & 4.15 & 2.88 & 3.96 & 4.42 & 2.90 & 2.70 & 3.15 \\
& & & & & & & & & \\
Remaining & 1987/06 - & 0.24 & 0.11 & 0.18 & 0.21 & 0.04 & 0.05 & 0.12 & 0.04 \\
Time & 1988/08 & 2.91 & 4.42 & 2.88 & 4.01 & 4.83 & 4.07 & 2.95 & 3.11 \\
\hline
& & & & & & & & & \\
\end{tabular}
\end{small}
\parbox{\linewidth}{\footnotesize{ Market share is calculated as the total number of observed
purchases of a specific brand divided by the total number of
observed purchases in a given time period. Brand introduction is
defined as the first time a purchase is observed of a new brand. The
actual introduction dates were verified by telephone conversation
with representatives of the companies; these dates coincide closely
with my definition of the introduction date. According to my
definition, Cheer was introduced in the last week of May, 1986, Surf
in the first week of September, 1986, and Dash in the third week of
March, 1987. Average prices in dollars are shown under the market
share. Prices are calculated using observed purchase data. If there
are $I$ purchases in a given period, the average price for a
specific brand in the particular period is calculated as
$(1/I)\sum_{i=1}^I (p_i - c_i)$, where $p_i$ is the shelf price at
the time of purchase, and $c_i$ is the total value of coupons used
at the time of purchase.}}
\end{center}
\end{table}
\clearpage
\begin{table}[h]\caption{Parameter Estimates of $b$, $\theta$, and $W$ (Utility Function)}\label{table:estresstr1}
\begin{center}
\begin{small}
\begin{tabular}{ l r l r l l r l r l}
\hline
Coefficient & \multicolumn{2}{c}{Mean, $b$, $\theta$} & \multicolumn{2}{c}{Variance, $W$} & Coefficient & \multicolumn{2}{c}{Mean, $b$, $\theta$} & \multicolumn{2}{c}{Variance, $W$} \\
\hline
Taste & & & & & Learning \\
Parameters & & & & & Parameters \\
\hline
Era L & -0.908 & (0.161) & 3.258 & (0.462) & Cheer, $\gamma_i^0$ & -0.518 & (0.089) & 0.356 & (0.167) \\
Wisk L & -0.456 & (0.104) & 2.249 & (0.326) & Cheer, $\sigma_{i0}^2$ & -0.58 & (0.072) & 0.368 & (0.157) \\
Tide L & -0.049 & (0.106) & 1.511 & (0.28) & Cheer, Size ($\sigma_{j1}^2$) & 0.148 & (0.005) & - & \\
Solo L & -2.208 & (0.677) & 5.632 & (2.061) & Cheer, Inc ($\sigma_{j2}^2$) & -0.068 & (0.004) & - & \\
Other P & 0.218 & (0.004) & - & & Surf, $\gamma_i^0$ & -0.161 & (0.07) & 0.432 & (0.123) \\
Tide P & 0.031 & (0.005) & - & & Surf, $\sigma_{i0}^2$ & -0.324 & (0.08) & 0.425 & (0.072) \\
Cheer P & -1.417 & (0.254) & 3.6 & (0.849) & Surf Size ($\sigma_{j1}^2$) & -0.127 & (0.004) & - & \\
Surf P & -0.461 & (0.184) & 1.139 & (0.448) & Surf Inc ($\sigma_{j2}^2$) & 0.019 & (0.003) & - & \\
Dash P & -1.664 & (0.214) & 2.603 & (0.388) & Dash, $\gamma_i^0$ & -0.943 & (0.109) & 0.408 & (0.09) \\
64 oz & 0.839 & (0.072) & 1.308 & (0.194) & Dash, $\sigma_{i0}^2$ & -0.528 & (0.079) & 0.287 & (0.079) \\
96 oz & 0.045 & (0.105) & 2.056 & (0.324) & Dash, Size ($\sigma_{j1}^2$) & -0.025 & (0.008) & - & \\
128 oz & -0.491 & (0.165) & 2.974 & (0.595) & Dash, Inc ($\sigma_{j2}^2$) & -0.003 & (0.003) & - & \\
\hhline{~~~~~-----}
34-49 oz & 0.402 & (0.093) & 2.251 & (0.271) & Exogenous \\
65-84 oz & 0.316 & (0.119) & 2.897 & (0.419) & Variables \\
\hhline{~~~~~-----}
144-157 oz & -0.551 & (0.172) & 5.851 & (0.768) & Price ($\alpha_{i0}$) & -2.382 & (0.471) & 11.915 & (5.502) \\
S.C. ($\eta_{i0}$) & 1.311 & (0.081) & 2.002 & (0.253) & Price, Size ($\alpha_1$) & -0.094 & (0.006) & - & \\
S.C. Size ($\eta_1$) & -0.079 & (0.008) & - & & Price, Inc ($\alpha_2$) & 0.014 & (0.006) & - & \\
S.C. Inc ($\eta_2$) & 0.068 & (0.01) & - & & Coupon ($\alpha_{i0c}$) & -0.673 & (0.17) & 0.229 & (0.102) \\
& & & & & Feature & 0.888 & (0.055) & 0.483 & (0.104) \\
& & & & & Display & 0.944 & (0.041) & 0.242 & (0.053) \\
\hline
& & & & \\
\end{tabular}
\end{small}
\parbox{\linewidth}{\footnotesize{This table shows the estimated parameters of the consumer
flow utility (Section \ref{sec:modelspec}). In most parameters I
allow normally-distributed heterogeneity across the population, and
so I have estimated the population mean of the coefficient ($b$) and
the variance ($W$). Because my model estimation procedure is
Bayesian, the numbers in this table show statistics from the
simulated posterior distribution of each parameter. The first two
columns of numbers in the table, under the heading ``Mean'', show
the posterior means of the mean parameters of the taste
distributions, and the standard deviations of these mean parameters
(in parentheses). The third and fourth columns show the mean and
standard deviation of the variance parameters of the taste
distributions. The last four columns of numbers show these
quantities for parameters other than the taste distribution
parameters. The posterior means and standard deviations in this
table may be interpreted in the same way as estimated parameters and
estimated standard errors that are produced by classical procedures.
Some utility coefficients, such as the price coefficient and the
consumer uncertainty (see Equations \eqref{eq:pcoeff} and
\eqref{eq:scoeff}), are transformations of the parameters in the
table. For some utility coefficients, such as the Other Powder
taste, the population variance was restricted to be 0. These
parameters are shown with dashes.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Parameter Estimates: Coupon Probabilities}\label{table:estresstr2}
\begin{center}
\begin{tabular}{ c c c}
\hline
Coefficient & Mean & Standard Err. \\
\hline
Non-Introductory Periods ($p_{cj}^0$) & & \\
\hline
Other L & 0.268 & 0.018 \\
Era L & 0.259 & 0.023 \\
Wisk L & 0.314 & 0.014 \\
Tide L & 0.263 & 0.017 \\
Solo L & 0.349 & 0.019 \\
Cheer L & 0.16 & 0.016 \\
Surf L & 0.237 & 0.018 \\
Dash L & 0.001 & 0.001 \\
Other P & 0.191 & 0.024 \\
Tide P & 0.221 & 0.023 \\
Cheer P & 0.214 & 0.013 \\
Surf P & 0.039 & 0.012 \\
Dash P & 0.038 & 0.012 \\
\hline
Introductory Adjustment & & \\
\hline
Cheer ($p_{cj}^1$) & -0.001 & 0.007 \\
Surf ($p_{cj}^1$) & -0.078 & 0.007 \\
Dash ($p_{cj}^1$) & 0.13 & 0.004 \\
Est., After Cheer ($p_{c}^{Cheer,1}$) & -0.038 & 0.012 \\
Est., After Surf ($p_{c}^{Surf,1}$) & -0.038 & 0.012 \\
Est., After Dash ($p_{c}^{Dash,1}$) & 0.015 & 0.009 \\
\hline
& & \\
\end{tabular}
\parbox{\linewidth}{\footnotesize{This table shows the estimates
of the coupon distribution described in Section \ref{sec:dynopt}.
The numbers under the heading ``Non-Introductory Periods'' are the
probabilities that a consumer receives a coupon for a given product
outside any new product's ``introductory'' period, i.e., after the
first 3 months following a new product's introduction. The numbers
under the heading ``Introductory Adjustment'' are added to these
probabilities during a given product's introductory period (the
first 3 months after its introduction). For example, the probability
of receiving a Surf coupon during Surf's introductory period is
0.237 - 0.078 = 0.159, and the probability of receiving a Liquid
Tide coupon during Surf's introductory period is 0.263 - 0.038 =
0.225.}}
\end{center}
\end{table}
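The adjustment arithmetic in the table's note can be sketched in a few lines; the probabilities below are the point estimates quoted in the note, while the dictionary and function names are purely illustrative.

```python
# Sketch of the coupon-probability arithmetic described in the note.
# Values are point estimates from the table; names are illustrative.
BASELINE = {"Surf L": 0.237, "Tide L": 0.263}    # p_cj^0
INTRO_ADJ = {"Surf L": -0.078,                   # p_cj^1 for Surf itself
             "Tide L": -0.038}                   # adjustment for established
                                                 # products during Surf's intro

def coupon_prob(product, during_surf_intro=False):
    """Coupon probability for `product`, optionally during Surf's
    introductory period (the first 3 months after its introduction)."""
    p = BASELINE[product]
    if during_surf_intro:
        p += INTRO_ADJ[product]
    return round(p, 3)
```

With these inputs, `coupon_prob("Surf L", True)` reproduces the 0.159 in the note and `coupon_prob("Tide L", True)` the 0.225.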
\begin{table}[h]\caption{Average Values of Consumer Uncertainty for New Products}\label{table:unc1}
\begin{center}
\begin{tabular}{ c c c }
\hline
Product & Mean of $\sigma^2$ & Population Variance \\
\hline
Cheer & 2.03 & 0.47 \\
Surf & 1.85 & 0.50\\
Dash & 1.81 & 0.34 \\
\hline
& & \\
\end{tabular}
\parbox{\linewidth}{\footnotesize{I computed the uncertainties in
the table using the individual-level draws denoted as $\theta_i$ in
the body of the paper: for each consumer I save her individual-level
parameter draws in each step of the MCMC algorithm, along with her
individual-level $\sigma^2$ for each product, which is computed
according to Equation \eqref{eq:scoeff}. In a given step I compute
the population mean of $\sigma^2$ and its population variance across
consumers; these values are then averaged across steps.}}
\end{center}
\end{table}
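The averaging procedure in the note amounts to computing within-step population moments and then averaging them over MCMC steps. A schematic version is sketched below; the array of draws is fake stand-in data, purely for illustration.

```python
import numpy as np

# Schematic version of the averaging in the note: rows are MCMC steps,
# columns are consumers, entries are each consumer's sigma^2 draw.
# The draws here are fake stand-in data.
rng = np.random.default_rng(0)
draws = rng.uniform(1.0, 3.0, size=(500, 200))

pop_mean_by_step = draws.mean(axis=1)  # population mean of sigma^2, each step
pop_var_by_step = draws.var(axis=1)    # population variance, each step

posterior_mean = pop_mean_by_step.mean()  # averaged across steps
posterior_var = pop_var_by_step.mean()
```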
\begin{table}[h]\caption{Average Consumer Uncertainty, Across Demographics}\label{table:unc2}
\begin{center}
Cheer\linebreak
\begin{small}
\begin{tabular}{ c | c c c c | c }
\hline
Size/Income & $<$ 20,000 & 20,000 - 40,000 & 40,000 - 60,000 & 60,000+ & Averages \\
\hline
1 & 1.969 & 2.059 & 2.194 & 2.393 & 2.021 \\
2 & 1.837 & 2.021 & 2.253 & 2.399 & 2.042 \\
3 & 1.789 & 1.927 & 2.128 & 2.304 & 2.073 \\
4+ & 1.686 & 1.865 & 2.025 & 2.189 & 1.988 \\
\hline
Averages & 1.864 & 1.936 & 2.096 & 2.265 & 2.029 \\
\hline
\end{tabular}\linebreak\parbox{\linewidth}{ }\linebreak
\end{small}
Surf\linebreak
\begin{small}
\begin{tabular}{ c | c c c c | c }
\hline
Size/Income & $<$ 20,000 & 20,000 - 40,000 & 40,000 - 60,000 & 60,000+ & Averages \\
\hline
1 & 2.006 & 1.907 & 1.727 & 1.497 & 1.944 \\
2 & 2.046 & 1.89 & 1.799 & 1.603 & 1.889 \\
3 & 2.032 & 1.902 & 1.782 & 1.682 & 1.821 \\
4+ & 1.997 & 1.933 & 1.801 & 1.679 & 1.831 \\
\hline
Averages & 2.024 & 1.911 & 1.792 & 1.669 & 1.852 \\
\hline
\end{tabular}\linebreak\parbox{\linewidth}{ }\linebreak
\end{small}
Dash\linebreak
\begin{small}
\begin{tabular}{ c | c c c c | c }
\hline
Size/Income & $<$ 20,000 & 20,000 - 40,000 & 40,000 - 60,000 & 60,000+ & Averages \\
\hline
1 & 1.868 & 1.827 & 1.786 & 1.914 & 1.851 \\
2 & 1.863 & 1.83 & 1.813 & 1.764 & 1.829 \\
3 & 1.927 & 1.813 & 1.798 & 1.759 & 1.802 \\
4+ & 1.893 & 1.826 & 1.792 & 1.77 & 1.805 \\
\hline
Averages & 1.878 & 1.823 & 1.796 & 1.766 & 1.814 \\
\hline
\end{tabular}
\end{small}
\linebreak\parbox{\linewidth}{ }\linebreak
\parbox{\linewidth}{\footnotesize{This table shows the average
uncertainty in the population for each new product, which
corresponds to the variable $\sigma^2$ from Section
\ref{sec:modelspec}, broken down by family size (rows) and income
(columns). The numbers are computed in the same way as those in the
previous table.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Effect of Removing Learning On New Product Market Share (Full Model)}\label{table:nolearn}
\begin{center}
\begin{small}
\begin{tabular}{ c c c c}
\hline
& Predicted Market Share, & Predicted Market Share, & \\
Product & Learning & No Learning & \% Change \\
\hline
Entire Period & & & \\
\hline
Cheer & 3.0 & 4.0 & 34 \\
Surf & 6.9 & 8.5 & 24 \\
Dash & 1.4 & 2.2 & 58 \\
\hline
1st 12 Weeks After Intro & & & \\
\hline
Cheer & 8.8 & 9.6 & 9.0 \\
Surf & 12.4 & 13.7 & 10 \\
Dash & 6.9 & 9.0 & 30 \\
\hline
& & & \\
\end{tabular}
\end{small}
\parbox{\linewidth}{\footnotesize{The first column of the table shows the
simulated market share at the parameter estimates (the average of
the market shares predicted at each step of the MCMC algorithm). The
second column shows the market share when every consumer knows her
true taste draws for all three products. The market shares are
predicted at the data, so prices, features, etc. are not changed.
The first three rows show market shares aggregated over the entire
sample period, and the last three show the market shares for each
new product during the first 12 weeks after its introduction.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Effect of Removing Learning On New Product Market Share
(No switching costs Model)}\label{table:nolearnnohabits}
\begin{center}
\begin{tabular}{ c c c c }
\hline
& Predicted Market Share, & Predicted Market Share, & \\
Product & Learning & No Learning & \% Change \\
\hline
Entire Period & & & \\
\hline
Cheer & 3.5 & 3.8 & 9.9 \\
Surf & 6.8 & 7.8 & 15 \\
Dash & 1.8 & 2.4 & 34 \\
\hline
1st 12 Weeks After Intro & & & \\
\hline
Cheer & 9.3 & 10.2 & 9.5 \\
Surf & 11.9 & 13.6 & 14 \\
Dash & 8.1 & 9.9 & 22 \\
\hline
& & & \\
\end{tabular}
\end{center}
\end{table}
\begin{table}[h]\caption{Actual and Predicted Market Shares}\label{table:predsharecomp}
\begin{center}
\begin{tabular}{ c c c c c }
\hline
& Actual & Full Model & No switching costs & No Learning \\
\hline
Other L & 12.0 & 11.5 & 11.3 & 9.1 \\
Era & 6.8 & 6.8 & 6.8 & 6.4 \\
Wisk & 10.2 & 10.3 & 10.1 & 8.6 \\
Tide & 11.0 & 10.7 & 11.1 & 10.5 \\
Solo & 3.3 & 3.2 & 3.3 & 3.0 \\
Cheer & 3.2 & 3.0 & 3.5 & 1.4 \\
Surf & 6.6 & 6.9 & 6.8 & 5.5 \\
Dash & 1.6 & 1.4 & 1.8 & 1.8 \\
Other P & 18.6 & 19.1 & 17.1 & 15.1 \\
Tide P & 16.1 & 15.4 & 17.0 & 18.9 \\
Cheer P & 7.4 & 7.4 & 7.3 & 9.8 \\
Surf P & 3.1 & 3.4 & 3.2 & 8.7 \\
Dash P & 0.75 & 0.90 & 0.86 & 1.13 \\
\hline
Avg Abs Prediction Error \\
\hline
Full Period (12/85 - 08/88) & & 0.26 & 0.33 & 1.8 \\
Cheer Intro (05/86 - 08/86) & & 1.3 & 1.7 & 3.2 \\
Surf Intro (09/86 - 11/86) & & 0.9 & 0.9 & 2.0 \\
Dash Intro (03/87 - 06/87) & & 0.7 & 0.7 & 1.8 \\
Last 63 Weeks (06/87 - 08/88) & & 0.6 & 0.5 & 0.6 \\
\hline
& & & \\
\end{tabular}
\parbox{\linewidth}{\footnotesize{The first thirteen rows of the first column of the table show
the actual market share of each product during the time it was
available. The first thirteen rows of the second, third, and fourth
columns show the simulated market shares at the parameter estimates
from each model. The last five rows show the absolute difference
between the predicted and actual shares, averaged over products in
different periods. The periods Cheer Intro, Surf Intro, and Dash
Intro refer to the first 12 weeks after the introduction of each new
product.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Counterfactual: Impact of Price Cuts in Periods 1,3 and 5}\label{table:fullpd}
\begin{center}
\begin{small}
\begin{tabular}{ c c c c c c c c c c c c c c c c}
\hline
Model & \multicolumn{9}{c}{Full Model} & \multicolumn{3}{c}{No Learning Model} & \multicolumn{3}{c}{No S.C. Model} \\
Dynamics & \multicolumn{3}{c}{All Dynamics} & \multicolumn{3}{c}{No Learning} & \multicolumn{3}{c}{No S.C.} \\
Period & 1 & 3 & 5 & 1 & 3 & 5 & 1 & 3 & 5 & 1 & 3 & 5 & 1 & 3 & 5
\\
\hline
Cheer & & & & & \\
\hline
1 & 12.3 & & & 9.65 & & & 11.5 & & & 11.9 & & & 9.95 & & \\
2 & 0.87 & & & 1.17 & & & 0.16 & & & 2.18 & & & 0.08 & & \\
3 & 0.41 & 10.6 & & 0.60 & 7.85 & & 0.15 & 10.8 & & 1.54 & 9.09 & & 0.06 & 9.43 & \\
4 & 0.22 & 0.99 & & 0.31 & 1.14 & & 0.14 & 0.13 & & 1.29 & 1.22 & & 0.05 & 0.05 & \\
5 & 0.17 & 0.46 & 9.72 & 0.20 & 0.60 & 7.37 & 0.14 & 0.12 & 10.3 & 1.20 & 0.55 & 8.61 & 0.05 & 0.04 & 9.18 \\
6 & 0.14 & 0.26 & 1.00 & 0.14 & 0.32 & 1.11 & 0.12 & 0.11 & 0.11 & 1.15 & 0.29 & 1.17 & 0.05 & 0.04 & 0.05 \\
7 & 0.12 & 0.18 & 0.48 & 0.10 & 0.21 & 0.58 & 0.12 & 0.11 & 0.11 & 1.12 & 0.19 & 0.54 & 0.04 & 0.04 & 0.04 \\
8 & 0.11 & 0.14 & 0.26 & 0.08 & 0.14 & 0.30 & 0.11 & 0.10 & 0.10 & 1.10 & 0.13 & 0.29 & 0.04 & 0.04 & 0.04 \\
9 & 0.10 & 0.12 & 0.20 & 0.06 & 0.11 & 0.20 & 0.10 & 0.09 & 0.09 & 1.08 & 0.10 & 0.19 & 0.04 & 0.03 & 0.03 \\
10 & 0.09 & 0.11 & 0.15 & 0.05 & 0.09 & 0.14 & 0.10 & 0.09 & 0.08 & 1.07 & 0.07 & 0.13 & 0.03 & 0.03 & 0.03 \\
\hline
Surf & & & & & \\
\hline
1 & 11.9 & & & 10.2 & & & 11.3 & & & 13.7 & & & 10.2 & & \\
2 & 0.80 & & & 1.34 & & & 0.00 & & & 3.22 & & & 0.08 & & \\
3 & 0.33 & 10.3 & & 0.69 & 8.33 & & 0.03 & 10.8 & & 2.60 & 10.1 & & 0.13 & 9.84 & \\
4 & 0.13 & 1.10 & & 0.35 & 1.27 & & 0.05 & 0.04 & & 2.34 & 1.39 & & 0.15 & 0.15 & \\
5 & 0.09 & 0.50 & 9.53 & 0.24 & 0.69 & 7.80 & 0.06 & 0.05 & 10.5 & 2.25 & 0.62 & 9.66 & 0.16 & 0.14 & 9.48 \\
6 & 0.07 & 0.22 & 1.18 & 0.17 & 0.36 & 1.25 & 0.07 & 0.06 & 0.05 & 2.20 & 0.31 & 1.38 & 0.15 & 0.14 & 0.15 \\
7 & 0.06 & 0.14 & 0.58 & 0.13 & 0.24 & 0.67 & 0.07 & 0.06 & 0.06 & 2.18 & 0.19 & 0.65 & 0.14 & 0.13 & 0.14 \\
8 & 0.06 & 0.09 & 0.28 & 0.10 & 0.17 & 0.34 & 0.07 & 0.06 & 0.06 & 2.16 & 0.12 & 0.33 & 0.13 & 0.12 & 0.13 \\
9 & 0.06 & 0.07 & 0.19 & 0.08 & 0.12 & 0.23 & 0.06 & 0.06 & 0.05 & 2.14 & 0.09 & 0.21 & 0.12 & 0.12 & 0.12 \\
10 & 0.06 & 0.06 & 0.14 & 0.07 & 0.10 & 0.16 & 0.06 & 0.06 & 0.05 & 2.14 & 0.07 & 0.14 & 0.12 & 0.11 & 0.11 \\
\hline
Dash & & & & & \\
\hline
1 & 14.2 & & & 11.8 & & & 13.9 & & & 15.0 & & & 13.3 & & \\
2 & 1.34 & & & 1.45 & & & 0.39 & & & 2.25 & & & 0.61 & & \\
3 & 0.78 & 12.5 & & 0.71 & 9.64 & & 0.35 & 13.2 & & 1.52 & 12.8 & & 0.49 & 12.4 & \\
4 & 0.52 & 1.43 & & 0.38 & 1.43 & & 0.31 & 0.31 & & 1.25 & 1.53 & & 0.40 & 0.40 & \\
5 & 0.42 & 0.77 & 11.8 & 0.24 & 0.69 & 9.10 & 0.28 & 0.27 & 12.7 & 1.15 & 0.63 & 12.4 & 0.34 & 0.34 & 11.9 \\
6 & 0.35 & 0.50 & 1.41 & 0.17 & 0.36 & 1.39 & 0.25 & 0.24 & 0.24 & 1.09 & 0.31 & 1.62 & 0.29 & 0.29 & 0.29 \\
7 & 0.31 & 0.41 & 0.77 & 0.12 & 0.24 & 0.71 & 0.23 & 0.22 & 0.22 & 1.07 & 0.19 & 0.67 & 0.25 & 0.25 & 0.25 \\
8 & 0.28 & 0.33 & 0.48 & 0.10 & 0.16 & 0.39 & 0.21 & 0.20 & 0.20 & 1.06 & 0.13 & 0.35 & 0.22 & 0.22 & 0.22 \\
9 & 0.25 & 0.28 & 0.37 & 0.08 & 0.12 & 0.25 & 0.19 & 0.18 & 0.18 & 1.04 & 0.10 & 0.23 & 0.20 & 0.20 & 0.20 \\
10 & 0.22 & 0.24 & 0.30 & 0.06 & 0.10 & 0.16 & 0.17 & 0.17 & 0.16 & 1.02 & 0.07 & 0.16 & 0.18 & 0.18 & 0.18 \\
\hline
& \\
\end{tabular}
\end{small}
\parbox{\linewidth}{\footnotesize{This table shows the simulated percentage change in market share due to a ten percent, one-period
price cut in Cheer, Surf, or Dash. The row labeled ``Period''
denotes the period in which the price cut takes place. The first
three columns of numbers show the impact of the price cut under the
estimates of the full model. The next three show the impact under
the full model estimates when consumers know their tastes, and the
next three under the full model estimates when the switching cost
parameters are set to zero. The final six columns show the impact of
the price cut using the estimates of the restricted models.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Counterfactual: Effect of Informative Advertising}\label{table:cf2}
\begin{center}
\begin{tabular}{ c c c c c c c }
\hline Product & \multicolumn{4}{c}{Full Model} &
\multicolumn{2}{c}{No S.C. Model} \\
Dynamics & \multicolumn{2}{c}{All Dynamics} & \multicolumn{2}{c}{No
S.C.}
\\
\hline
& Share & Revenue & Share & Revenue & Share & Revenue \\
\hline
Cheer \\
\hline
1 & 15.06 & 17.51 & 17.74 & 28.92 & 10.96 & 14.87 \\
2 & 17.58 & 22.99 & 16.65 & 27.64 & 8.82 & 12.00 \\
3 & 18.97 & 26.47 & 15.70 & 26.39 & 7.52 & 10.23 \\
4 & 19.22 & 27.72 & 14.78 & 25.13 & 6.60 & 8.97 \\
5 & 19.14 & 28.29 & 13.93 & 23.90 & 5.92 & 8.04 \\
\vdots \\
10 & 16.90 & 26.84 & 10.38 & 18.45 & 3.96 & 5.38 \\
\hline
Total & & 261.36 & & 234.53 & & 85.34 \\
\hline
Surf \\
\hline
1 & -6.04 & -9.77 & 5.01 & 10.39 & 10.88 & 18.30 \\
2 & -2.61 & -4.25 & 5.38 & 11.01 & 13.17 & 21.40 \\
3 & 0.18 & 0.96 & 5.69 & 11.60 & 13.95 & 22.56 \\
4 & 1.80 & 4.18 & 5.86 & 11.95 & 14.02 & 22.76 \\
5 & 2.89 & 6.45 & 5.88 & 12.02 & 13.79 & 22.54 \\
\vdots \\
10 & 5.04 & 11.14 & 5.06 & 10.55 & 11.39 & 19.40 \\
\hline
Total & & 46.86 & & 113.33 & & 211.34 \\
\hline
Dash \\
\hline
1 & 24.49 & 12.75 & 32.09 & 26.18 & 33.09 & 21.42 \\
2 & 28.26 & 16.40 & 29.77 & 24.93 & 30.37 & 20.43 \\
3 & 29.81 & 18.62 & 27.96 & 23.95 & 28.01 & 19.46 \\
4 & 29.97 & 19.49 & 26.33 & 23.04 & 25.97 & 18.54 \\
5 & 29.66 & 19.98 & 24.94 & 22.26 & 24.15 & 17.70 \\
\vdots \\
10 & 26.58 & 19.88 & 19.34 & 18.63 & 18.08 & 14.46 \\
\hline
Total & & 188.12 & & 220.56 & & 175.66 \\
\hline
\\
\end{tabular}
\parbox{\linewidth}{\footnotesize{This table shows the impact of
informative advertising in the first period on product market
shares, in percentage terms, and on product revenues, which are in
dollars. Consumers are given a signal on the product's quality which
has half the variance of their uncertainty about their match value
with the product. The first two columns show the simulated shares
and revenues using the results of the full model, while the third
and fourth use the results of the full model with the switching cost
parameters restricted to 0. The final two columns show the results
for the model with no switching costs.}}
%\parbox{\linewidth}{\footnotesize{For Dash, the effect of informative advertising is calculated for
%two ``intermediate run'' periods. The first intermediate run period
%is the 6 months after the introductory period. The second is the
%time after the introductory period until the end of the sample
%period, a length of 62 weeks. Results from the longer intermediate
%run period for Cheer and Surf are very similar to those shown for
%the 6 month period and are omitted from the table.}}
\end{center}
\end{table}
\begin{table}[h]\caption{Counterfactual: Long Term Empirical Price Elasticities}\label{table:elastslt}
\begin{center}
\begin{small}
\begin{tabular}{ c c c c c c c c c c c c c }
\hline
&
\multicolumn{4}{c}{Full Model} & \multicolumn{4}{c}{No
Learning Model} & \multicolumn{4}{c}{No S.C. Model} \\
& Cheer & Surf & Dash & Tide L & Cheer & Surf & Dash & Tide L & Cheer & Surf & Dash & Tide L \\
\hline
Cheer & & & & & & & & & & & & \\
\hline
1 & -1.85 & 0.14 & 0.04 & 0.12 & -1.19 & 0.06 & 0.03 & 0.14 & -1.10 & 0.10 & 0.05 & 0.09 \\
3 & -1.73 & 0.15 & 0.05 & 0.12 & -1.21 & 0.08 & 0.04 & 0.14 & -1.02 & 0.09 & 0.05 & 0.09 \\
5 & -1.64 & 0.15 & 0.05 & 0.11 & -1.22 & 0.09 & 0.04 & 0.14 & -0.99 & 0.09 & 0.05 & 0.09 \\
\hline
Surf & & & & & & & & & & & & \\
\hline
1 & - & -1.80 & 0.05 & 0.18 & - & -1.37 & 0.04 & 0.20 & - & -1.14 & 0.05 & 0.11 \\
3 & - & -1.67 & 0.07 & 0.16 & - & -1.44 & 0.05 & 0.19 & - & -1.08 & 0.05 & 0.11 \\
5 & - & -1.60 & 0.08 & 0.16 & - & -1.47 & 0.06 & 0.19 & - & -1.06 & 0.05 & 0.11 \\
\hline
Dash & & & & & & & & & & & & \\
\hline
1 & - & - & -1.84 & 0.27 & - & - & -1.50 & 0.20 & - & - & -1.42 & 0.16 \\
3 & - & - & -1.87 & 0.27 & - & - & -1.62 & 0.22 & - & - & -1.42 & 0.17 \\
5 & - & - & -1.87 & 0.27 & - & - & -1.66 & 0.23 & - & - & -1.39 & 0.17 \\
\hline
Wisk L & & & & & & & & & & & & \\
\hline
1 & 0.11 & 0.19 & 0.06 & 0.20 & 0.05 & 0.10 & 0.04 & 0.23 & 0.08 & 0.12 & 0.06 & 0.13 \\
3 & 0.12 & 0.21 & 0.07 & 0.19 & 0.07 & 0.14 & 0.06 & 0.21 & 0.08 & 0.11 & 0.06 & 0.13 \\
5 & 0.12 & 0.20 & 0.08 & 0.18 & 0.08 & 0.15 & 0.06 & 0.21 & 0.08 & 0.12 & 0.06 & 0.13 \\
\hline
Tide L & & & & & & & & & & & & \\
\hline
1 & 0.10 & 0.17 & 0.05 & -1.55 & 0.04 & 0.07 & 0.03 & -1.56 & 0.06 & 0.09 & 0.04 & -0.82 \\
3 & 0.12 & 0.19 & 0.07 & -1.48 & 0.06 & 0.10 & 0.04 & -1.48 & 0.06 & 0.09 & 0.04 & -0.82 \\
5 & 0.12 & 0.19 & 0.07 & -1.45 & 0.06 & 0.11 & 0.05 & -1.47 & 0.06 & 0.09 & 0.05 & -0.82 \\
\hline
Tide P & & & & & & & & & & & & \\
\hline
1 & 0.10 & 0.19 & 0.05 & 0.12 & 0.05 & 0.07 & 0.03 & 0.13 & 0.05 & 0.08 & 0.03 & 0.07 \\
3 & 0.09 & 0.17 & 0.05 & 0.11 & 0.05 & 0.08 & 0.03 & 0.12 & 0.05 & 0.07 & 0.03 & 0.07 \\
5 & 0.09 & 0.16 & 0.05 & 0.11 & 0.05 & 0.08 & 0.03 & 0.12 & 0.05 & 0.07 & 0.04 & 0.07 \\
\hline
\\
\end{tabular}
\end{small}
\parbox{\linewidth}{\footnotesize{This table shows the empirical
price elasticities implied by my model estimates, $\frac{\partial
Q_{it}}{\partial P_{j}}\frac{P_{j}}{Q_{it}}$. The row labels show
the product $i$ that is being affected, and the period of interest.
The column labels show the product $j$ whose price is being changed.
The price change is assumed to occur from period 1 onwards, and
consumers are assumed to understand this, which is why this is a
long term elasticity. Furthermore, when this elasticity is computed,
the prices of all products are set to be constant over time. Thus
the number in the first row and second column, 0.14, shows the
impact of a permanent price cut in Cheer from period 1 onwards on
Surf's period 1 market share. The first four columns show the
elasticities implied by the full model, the next four show the
elasticities for the no learning model, and the next four for the no
switching costs model.}}
\end{center}
\end{table}
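The elasticity formula in the note is simply the ratio of the percentage change in a product's simulated share to the percentage change in price. A minimal sketch follows; the numbers are made up and unrelated to the table.

```python
# Minimal sketch of the empirical elasticity in the note:
# (dQ_i / Q_i) / (dP_j / P_j), evaluated from simulated market shares
# before and after a permanent price change.  All numbers are made up.
def elasticity(q_base, q_new, p_base, p_new):
    return ((q_new - q_base) / q_base) / ((p_new - p_base) / p_base)

# A 10% price cut (2.00 -> 1.80) that raises the share by 10%
# (10.0 -> 11.0) gives an own-price elasticity of -1.0.
own = elasticity(10.0, 11.0, 2.00, 1.80)
```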
%\FloatBarrier
\clearpage
%dvi figures
%\figure 1
%\Probability of Experimenting
%\Graph shows an upward S shaped surface
%\figure 2
%\Response of Probability of Experiment to a Future Price Increase
%\Shows a sine-wave type surface distributed across gamma sub i
%\figure 3
%\Overlay of level curves from Previous figures
%\shows circular curves from figures one and two with horizontal levels from .1 to .9
%\figure 4
%\Histogram of estimated alpha i. Shows 45% of hits at price level five, 10%
%\of hits at level ten, and declining numbers at higher price levels.
%\figure 5
%\Posterior Density of sigma ^2 sub i0 for Cheer
%\Shows curve with two peaks at .2 and .5
%\figure 6
%\Posterior density of Sigma^2 sub i0 for Surf
%\Shows single peak curve around .4
%\figure 7
%\Posterior Density for sigma^2 sub i0 for Dash
%\shows single peak around .2 with bulk of density.
%\figure 8
%\Estimated Taste Distributions for Cheer
%\Shows two density functions, with the actual density of tastes for Cheer being much broader and flatter
%\than the expected taste for Cheer.
\end{document}
\begin{table}[h]\caption{Identification of Variance of $\sigma_{ij}^2$}\label{table:identvarsig}
\begin{center}
\begin{tabular}{ c c c c }
\hline
$\sigma_\sigma^2$ & $\eta_i = 0$ & $\eta_i = 0.5$ & $\eta_i =
-0.5$
\\
\hline
0.1 & 0.0153 & 0.0609 & -0.0736 \\
1.0 & 0.0148 & 0.0606 & -0.0763 \\
2.0 & 0.0143 & 0.0603 & -0.0783 \\
3.0 & 0.0139 & 0.0601 & -0.0797 \\
4.0 & 0.0136 & 0.0600 & -0.0808 \\
\hline
\\
\end{tabular}
\parbox{0.9\linewidth}{\footnotesize{This table shows simulations from the three-period
version of the theoretical model presented in Section
\ref{sec:theory}. The quantity simulated is the probability of a
first purchase of product 2 when product 2's price path is (0, 0,
1), minus that probability when the price path is (0, 1, 1). I found
that for $\eta_i=-0.5$ the purchase probabilities were not affected
by the future price change, so for that column the two price paths
were (0, 0, 2) and (0, 2, 2), respectively.}}
\end{center}
\end{table}
I assume that these distributions are normal, so there are four
moments to identify. I have already discussed how the mean of
$\sigma_{ij}^2$ and the first two moments of $\gamma_{i0}$ can be
identified when the variance of $\sigma_{ij}^2$ is zero. To identify
the variance of $\sigma_{ij}^2$, it will be necessary to include one
more moment from the data, which will be the sample analog of the
probability difference shown in Figure \ref{fig:prexperdiff}.
To see how changing the variance in $\sigma_{ij}^2$ affects this
moment, I have computed the population probability difference in
Figure \ref{fig:prexperdiff} under different distributional
assumptions on $\sigma_{ij}^2$ in Table \ref{table:identvarsig}. I
assume that $\gamma_{i0}$ is distributed standard normal, and that
$\sigma_{ij}^2$ follows an $S_B$ distribution, so that
\begin{displaymath}
\sigma_{ij}^2 = 5.0\frac{\exp(x)}{1+\exp(x)};\quad x \sim
N(\sigma_\mu,\sigma_\sigma^2).
\end{displaymath} I assume that the mean of $x$, $\sigma_\mu$, is 0, and that its
variance, $\sigma_\sigma^2$, is varied from 0.1 to 4. From the table,
it can be seen that overall consumer response to future price
changes decreases as the variance parameter of $\sigma_{ij}^2$ is
increased. Thus, if this population moment is smaller than the model
would predict when there is no heterogeneity in $\sigma_{ij}^2$,
that difference will be assigned to the variance of $\sigma_{ij}^2$.
This argument was made for $\eta_i=0$, but similar arguments can be
made for consumers who have positive and negative values of
$\eta_i$. In my thesis research I demonstrate that it is possible to
separate out consumers with different $\gamma_{i0}$'s and
$\sigma_{ij}^2$'s in each case. The population response to future
price changes follows a similar pattern for both positive and
negative values of $\eta_i$, as shown in the second and third
columns of Table \ref{table:identvarsig}.
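The $S_B$ specification above is straightforward to simulate. The following sketch (illustrative, not the author's code) shows that raising $\sigma_\sigma^2$ spreads $\sigma_{ij}^2$ out over its $(0,5)$ support, which is the heterogeneity the additional moment picks up; the bound of 5.0 and $\sigma_\mu=0$ follow the text, while the sample sizes are arbitrary.

```python
import numpy as np

# Simulate the S_B specification from the text: x ~ N(sigma_mu,
# sigma_sigma^2) and sigma_ij^2 = 5 exp(x) / (1 + exp(x)), with
# sigma_mu = 0 and an upper bound of 5.0.  Sample sizes are arbitrary.
def draw_sigma2(sigma_sigma2, n, rng):
    x = rng.normal(0.0, np.sqrt(sigma_sigma2), size=n)
    return 5.0 * np.exp(x) / (1.0 + np.exp(x))

rng = np.random.default_rng(0)
spread = {s: draw_sigma2(s, 100_000, rng).std() for s in (0.1, 1.0, 4.0)}
# The draws always lie in (0, 5), and the standard deviation of
# sigma_ij^2 rises with the variance parameter sigma_sigma^2, while
# the median stays at 5 exp(0)/(1 + exp(0)) = 2.5.
```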
\section{Theoretical Example}\label{sec:theory}
In this section I will present a simple theoretical model of
consumer learning and experimentation that nests alternative sources
of dynamics in demand by allowing individual consumers to have
switching costs or a taste for variety, and briefly discuss its
testable implications. I will also briefly discuss some of my
previous research that finds support for these implications in the
same data set I am using. The structural model I estimate nests the
model that I will present here: since this model is simpler, it is
easier to examine the model's working parts and explain the
intuition behind some of its implications. In my model, learning
happens when a consumer purchases a new product and finds out her
taste for it. If consumers are forward-looking, they will recognize
that if they purchase the new product and like it they will be
better off in the future. This means that there will be an option
value of learning, which will lead to experimentation: consumers
will purchase the new product sooner than if they were myopic.
There are two reasons I wish to discuss this simple model and
examine its implications. First, as I discussed in the introduction,
one of the tasks I wish to perform is to examine the impact of an
introductory price cut for a new product on its intermediate run
market share (the product's market share in periods after the price
is raised) under three different sets of assumptions about the
dynamics in demand:
\begin{enumerate}
\item[i)] consumers only learn and have no switching costs;
\item[ii)] consumers only have switching costs, and know their true match
values;
\item[iii)] consumers learn and have switching costs at the same time.
\end{enumerate} The impact of the price cut could be larger in case i) or ii)
compared to iii), or it could be smaller. By solving for the option
value of learning in these cases, we can get a better idea of when
the impact will be larger or smaller. Second, by solving for the
model's testable implications we will better understand what type of
variation in the data isolates learning from other forces. These
implications will still hold in the more complicated structural
model, and I will refer to them again in Section \ref{sec:ident},
where I discuss its identification. Further, the fact that support
has been found for these implications in previous research in the
data set I use suggests that the variation in the data is of the
right kind to isolate learning.
Let us consider a market with 2 products. The first, which I denote
product 1, is an established product which everyone knows their
taste for. The second, which I denote product 2, is a new product
which consumers may have to purchase and consume in order to find
out how much they like it. The new product in this market is an
experience good; other methods of learning, such as learning by
search or social learning, are not considered. I assume that the set
of consumers in the market stays constant over time, and that
consumers purchase one unit of each product every period.\footnote{In
my Ph.D. thesis \cite{osborne06}, this last assumption is relaxed;
the two implications I described in the introduction still hold, and
a third implication is derived: consumers will purchase smaller
sizes of the new product on their first purchase. Since I do not
model size choice in my econometric model, I will not discuss it in
the theoretical model either.}
Consumer tastes for each product consist of three parts, as shown in
Equation \eqref{eq:tastessd}: a permanent part which takes learning
into account, a part that accounts for switching costs or
variety-seeking, and an idiosyncratic component of tastes that is
i.i.d. across consumers, products and time.\footnote{The function
$\textbf{1}\{\cdot\}$ returns 1 when its argument is true, and 0
when it is false.}
\begin{equation}\label{eq:tastessd}
\begin{split}
\mbox{Product 1 } & : 0 + \eta_i \textbf{1}\{y_{t-1}=1\} \\
\mbox{Product 2, expected } & : \gamma_{i}^0 + \eta_i \textbf{1}\{y_{t-1}=2\} + \varepsilon_{it} \\
\mbox{Product 2, taste known } & : \gamma_{i} + \eta_i \textbf{1}\{y_{t-1}=2\} + \varepsilon_{it} \\
\end{split}
\end{equation}
The permanent part of tastes for product 1 is normalized to 0. For
product 2, before consumer $i$ has purchased it for the first time,
she does not know how much she likes it, but she has a prediction of
how much she expects to like it, $\gamma_i^0$. I assume that
consumers are rational, so that this prediction is correct on
average. The consumer's true taste or intrinsic match value for
product 2, $\gamma_{i}$, becomes known to her when she makes her
first purchase of the new product. I assume that at time 0 each
consumer is assigned a value of $\gamma_i^0$ from $N(\mu^0,
(\sigma^0)^2)$, and that when the consumer first purchases and
consumes product 2 she learns $\gamma_i$, which is drawn
from a normal distribution with mean $\gamma_i^0$ and variance
$\sigma^2$. The parameter $\sigma^2$ accounts for the consumer's
uncertainty about her true taste draw for product 2. If
$\sigma^2=0$, then the expected and true taste draws will be the
same and there is no learning. I interpret $\gamma_i$ as a
consumer's match value with product 2. If the products are
detergents, then the match value could be how well the product
cleans the consumer's clothes. This could be individual-specific
since wardrobes may vary across individuals, and different
detergents may do better jobs on different types of fabrics.
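A minimal simulation of this taste process (with hypothetical parameter values) makes the rational-expectations structure concrete: predictions $\gamma_i^0$ are unbiased for the true $\gamma_i$, and the unconditional distribution of true tastes is $N(\mu^0, (\sigma^0)^2 + \sigma^2)$.

```python
import numpy as np

# Sketch of the taste process: gamma_i^0 ~ N(mu0, sigma0^2) is the
# consumer's prediction; on her first purchase she learns her true
# match value gamma_i ~ N(gamma_i^0, sigma^2).  The parameter values
# below are hypothetical.
MU0, SIGMA0_SQ, SIGMA_SQ = 0.5, 1.0, 2.0

rng = np.random.default_rng(1)
n = 200_000
gamma0 = rng.normal(MU0, np.sqrt(SIGMA0_SQ), size=n)  # expected tastes
gamma = rng.normal(gamma0, np.sqrt(SIGMA_SQ))         # true match values

bias = (gamma - gamma0).mean()  # ~ 0: predictions correct on average
var_true = gamma.var()          # ~ sigma0^2 + sigma^2 = 3.0
```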
%The learning process I have specified is a departure from previous
%empirical literature on structural modeling of learning, where it
%takes more than one consumption event to learn the match value. In
%these papers, consumer learning is modeled as a Bayesian process. A
%relevant example is Crawford and Shum (2000); in this paper,
%consumer match values vary across the population, as in mine.
%Consumers receive an imperfect signal of their true match values in
%each consumption event. Consumers are also assumed to be rational,
%so their prior tastes are drawn from the actual distribution of
%tastes. In my work, I will also assume that consumers are rational -
%they know their distribution of true tastes conditional on their
%predicted taste $\gamma_i^0$.
The term $\eta_i$ allows dynamics in demand even if $\sigma^2=0$. A
consumer's utility is increased by $\eta_i$ if she purchases the
same product in period $t$ as she did in period $t-1$. I interpret a
positive value of $\eta_i$ as a switching cost (\citeN{pollack70},
\citeN{spinnewyn81}). An alternative way to model switching costs
would be to subtract a positive $\eta_i$ from all products except
the one that was previously chosen; since utility functions are
ordinal and there is no outside good in this model, these two
formulations are equivalent. As discussed in the introduction,
switching costs have been found to be an important part of demand
for consumer packaged goods, and could arise if there are costs of
recalculating utility if a consumer decides to switch products. I
interpret a negative value of $\eta_i$ as variety-seeking
\cite{mcpess82}. Variety-seeking is not likely an important behavior
in laundry detergent markets, but I allow it in the model for the
sake of generality.
I assume that consumers are forward-looking and discount the future
at a rate $\delta \geq 0$. This means that when a consumer
decides whether to make a first purchase of the new product, she will look
at the future benefits of consuming it: she might like it better
than product 1 and continue to purchase it. This means there will be
an option value of experimentation, which will be positive when
there are no alternative dynamics in demand. If there are switching
costs it will be possible for it to be negative, since if the
consumer ends up not liking the new product she will lose utility
from having to switch brands. The option value of experimentation is
also always increasing in $\sigma^2$, which will lead consumers to
purchase the new product sooner than they would have if $\delta=0$.
I refer to this behavior as experimentation.
As I mentioned in the introduction, the option value of
experimentation will affect consumer responses to an introductory
price cut, which could in turn affect intermediate run market
shares. As an example, if consumers are only learners ($\eta_i=0$
$\forall i$ and $\sigma^2 > 0$), a price cut will draw in new
consumers, some of whom will find they have a high intrinsic match
value (a high $\gamma_i$) for the product and repurchase it. If
consumers are learners and have switching costs ($\eta_i>0$ $\forall
i$ and $\sigma^2 > 0$), it is possible for the price cut to be less
effective: consumers who dislike switching brands recognize that if
their true match value for the product turns out to be low, they
will be worse off in the future from having to switch again. It is
also possible for the price cut to be more effective under switching
costs and learning than under learning only, if the switching cost
is particularly large. There are two reasons this could happen. First,
if the switching cost is large, then consumers who respond to the
price cut and learn that they have a low intrinsic match value may
continue to purchase the product anyway due to the cost of switching
brands. Second, if consumers expect to like the new product, the
switching costs could actually increase the option value of
learning: consumers will want to become locked in to a product that
will end up being better than their current favorite.
In summary, when there are switching costs one of two things can
happen to the option value of experimentation:
\begin{enumerate}
\item If consumers expect to have a low match value for the product (i.e. $\gamma_i^0$ is
low), then increasing $\eta_i$ can decrease the option value of
experimentation.
\item If consumers expect to have a high match value for the product (i.e. $\gamma_i^0$ is
high), then increasing $\eta_i$ can increase the option value of
experimentation.
\end{enumerate}
To see these two cases, I have solved the model above numerically
and graphed the option value of learning in Figure \ref{fig:opvalue}
for $\eta_i>0$ and $\eta_i=0$ for a number of values of
$\gamma_i^0$. When consumers expect to have low match values for the
new product, the option value for $\eta_i>0$ lies below that for
$\eta_i=0$.
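Both cases can also be reproduced in a two-period simplification of the model (my own illustrative sketch, not the three-period model solved numerically in this section; all parameter values are hypothetical). The option value of experimenting is measured here as the difference in discounted expected second-period utility between trying product 2 in period 1 and waiting.

```python
import numpy as np

# Two-period, two-product sketch (a hypothetical simplification).
# Product 1's permanent taste is 0; product 2's true taste is
# gamma ~ N(gamma0, sigma^2); utility rises by eta on a repeat purchase.
rng = np.random.default_rng(2)

def option_value(gamma0, eta, sigma2=1.0, delta=0.95, n=400_000):
    gamma = rng.normal(gamma0, np.sqrt(sigma2), size=n)
    # Experiment in period 1: the consumer learns gamma, so in period 2
    # she takes the better of product 2 (gamma + eta) and product 1 (0).
    try_it = delta * np.maximum(gamma + eta, 0.0).mean()
    # Wait: product 1 pays the switching bonus eta, while product 2
    # pays gamma0 in expectation (tastes stay unknown in the last period).
    wait = delta * max(eta, gamma0)
    return try_it - wait

# Case 1: low expected match value -- the switching cost lowers the
# option value of experimenting.
low_sc, low_no_sc = option_value(-2.0, 1.0), option_value(-2.0, 0.0)
# Case 2: high expected match value -- the switching cost raises it.
high_sc, high_no_sc = option_value(2.0, 1.0), option_value(2.0, 0.0)
```

This is only a qualitative illustration of the two cases; the magnitudes depend entirely on the hypothetical parameters chosen.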
These numerical findings could be interesting to researchers who are
interested in targeted coupons for newly introduced experience
goods. For example, suppose that through previous market research,
such as observing individual household purchases through the use of
magnetic swipe cards, the researcher is able to infer each
consumer's $\eta_i$. If the researcher knows that an experience good
will be introduced to the market, then she will want to target the
coupons at consumers who will be more likely to keep purchasing the
product in the long run. If consumers on average expect to have low
match values for the product, then she should target low $\eta_i$
consumers; otherwise she should target high $\eta_i$ consumers.
It is also useful to examine the relative impact of an introductory
price cut on a new product's intermediate run market share when
there are switching costs only versus switching costs and learning.
When I discuss switching costs only, I am referring to the case
where consumers know their true taste draws for the new product, and
the distribution of true tastes is $N(\mu^0,
(\sigma^0)^2+\sigma^2)$. A firm could potentially neutralize the
impact of learning in a market with informative advertising, or by
distributing free samples of the new product.
A price cut could be more effective under switching costs only
($\eta_i > 0$ $\forall i$ and $\sigma^2=0$) as opposed to switching
costs and learning ($\eta_i > 0$ $\forall i$ and $\sigma^2>0$)
for the following reason: when there are switching costs only, the
price cut draws in consumers who will become habituated to the
product and continue to purchase it. When there are switching costs
and learning, some of these consumers will find they have a low
intrinsic match value for the product and will switch away from it.
In this case the firm may want to combine its price cut with
advertising in order to remove the learning\footnote{This argument
does not take into account that advertising alone could increase the
market share of a new product - if most consumers have low expected
tastes, then many of them may not experiment with the product even
though their actual match value for the product was high.
Advertising could inform these consumers of their high match values
and increase the product's intermediate run market share.}. As with
the case of learning only versus learning and switching costs, it is
also possible for the price cut to be more effective when there are
switching costs and learning as opposed to switching costs only.
Again, this could occur if the switching costs are particularly
large. When the only source of dynamics is switching costs,
consumers who know they have a low intrinsic match value for the new
product will be less likely to respond to the price cut. If there
are switching costs and learning, these consumers will not know
their true match value until they have purchased the new product.
They will be more responsive to the price cut and once they find
their true match value, the habituation will induce them to keep
purchasing the new product.
A task that may be of interest to researchers is to test for the
importance of learning; the null hypothesis for this test is that
$\sigma^2=0$, while the alternative is that $\sigma^2>0$. There are
two ways to do this; one is to use simple models to estimate demand
and to construct the test statistics associated with the two
testable implications I mentioned in the introduction, and will
describe again in a moment; the other is to estimate the structural
model and to directly test if $\sigma^2=0$, which is the approach
taken in this paper. Although the second approach is more difficult
to implement and requires more restrictive modeling assumptions, it
has the advantage that we can take the model away from the data and
perform ``what-if'' experiments. It also allows us to quantify the
importance of learning.
The two testable implications of this model are examined in
\citeN{osborne06}, who finds support for them in the same laundry
detergent scanner data which is used in this paper. The test
statistics associated with them are shares of consumers who take
actions at certain times, controlling for any time-series variation
in prices. The first implication is that, under the maintained
hypothesis that $\delta$ is high and $\eta_i = 0$ $\forall i$, in
the first two periods after the new product's introduction, the
share of consumers who purchase the new product and then do not is
greater than the share who do not and then do. This is because the
option value of experimentation induces consumers to purchase the
new product sooner rather than later\footnote{Since evidence in
favor of this implication is found in the data set I use
\cite{osborne06}, it is reasonable to conclude that for some new
products the option value of learning is positive, and that
consumers are forward-looking.}. When there is no learning, the test
statistic will be zero since the order of purchase does not matter.
The test may also be used when consumers have switching costs
($\eta_i > 0$ for all $i$), but it may be less powerful. The reason
for this is that the test statistic tends to be negative when there
is no learning and positive $\eta_i$; since the test statistic is a
continuous function of $\sigma^2$, it will still be negative for
some values of $\sigma^2$ close enough to zero. This turns out to be
an issue in \citeN{osborne06}, who finds that the test statistic is
in fact negative for one of the new products. Estimating the structural
model allows us to shed light on this issue: estimating the
structural model allows the researcher to recover the population
distribution of switching costs and variety-seeking, the $\eta_i$'s,
and the learning parameter, $\sigma^2$, directly.
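As an illustration, the first test statistic can be computed from two-period purchase indicators in a few lines; this is a hypothetical helper of my own, not the implementation in \citeN{osborne06}:

```python
import numpy as np

def learning_test_stat(first, second):
    """First implication: among households observed on the first two
    post-introduction purchase occasions, the share who buy the new
    product and then do not, minus the share who do not and then do.
    Positive values are consistent with forward-looking experimentation."""
    first = np.asarray(first, dtype=bool)
    second = np.asarray(second, dtype=bool)
    buy_then_not = np.mean(first & ~second)
    not_then_buy = np.mean(~first & second)
    return buy_then_not - not_then_buy
```

A positive value of the statistic is evidence of experimentation; under no learning and $\eta_i = 0$ it should be close to zero, and under switching costs without learning it tends to be negative.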
The second testable implication is that for any value of the
discount factor and for any value of $\eta_i$, among consumers whose
previous purchase was the new product, the share of consumers who
repurchase the product increases over time if $\sigma^2>0$. This is
because initially the consumers whose previous purchase was the new
product consist mostly of consumers who are experimenting; later it
consists mostly of consumers who like the new product. This testable
implication is more robust than the first one, because it is true
for all values of the discount factor and any type of state
dependence in demand. However, the fact that it is true for all
values of the discount factor means that it does not tell the
researcher about the option value of experimentation. Support for
this implication is found for all new products in \citeN{osborne06}.
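The second statistic is equally simple to compute from panel data; the sketch below (again a hypothetical helper, with households in rows and purchase occasions in columns) returns the repurchase share at each occasion:

```python
import numpy as np

def repurchase_shares(panel):
    """Second implication: among consumers whose previous purchase was the
    new product, the repurchase share should rise over time if sigma^2 > 0.
    `panel` is a households-by-occasions 0/1 array of new-product purchases;
    entry t of the result is the repurchase share on occasion t+1."""
    panel = np.asarray(panel, dtype=bool)
    prev, curr = panel[:, :-1], panel[:, 1:]
    with np.errstate(divide="ignore", invalid="ignore"):
        return (prev & curr).sum(axis=0) / prev.sum(axis=0)
```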
In \citeN{osborne06}, this model is expanded to allow consumers to
choose between different sizes of the product and to hold it in
inventory. Another implication of the model is that consumers will
choose smaller sizes of the new product on their initial purchase of
it relative to later purchases. This implication is also supported
by the data.
\begin{table}[h]\caption{Pooled (Fixed Effects) Purchase Decision Logits}\label{table:logits1}
\begin{center}
\begin{tabular}{ c c c }
\hline
Coefficient & Mean & Standard Err. \\
\hline
Current Week Price & -0.17 & 0.01 \\
Next Week Price & 0.06 & 0.01 \\
Inventory & -0.28 & 0.01 \\
Display & 0.15 & 0.02 \\
Feature & 0.18 & 0.02 \\
\hline
\end{tabular}
\end{center}
\end{table}
\begin{table}[h]\caption{Household By Household Purchase Decision Logits}\label{table:logits2}
\begin{center}
\begin{tabular}{ c c c }
\hline
Coefficient & Mean & Standard Dev. \\
\hline
Current Week Price & 1.11 & 62.5 \\
Next Week Price & -0.061 & 65.2 \\
Inventory & -28.6 & 370 \\
Display & 0.97 & 208 \\
Feature & -11.3 & 207 \\
\hline
\end{tabular}
\end{center}
\end{table}
In the third section of this paper I will present a simple
theoretical model of consumer learning and experimentation and
discuss some of its implications. The theoretical model is a
stripped-down version of the structural model and provides a simple
way to analyze its most important working parts. In this
theoretical model, a new product is introduced to a market where
consumers make repeated purchases from the same set of goods. Every
consumer has an individual-specific, non-time varying taste for the
new product which is not fully known to her until she has purchased
and consumed the product. Prior to first purchasing the new product,
each consumer knows the mean of her true taste distribution (how
much she expects to like or dislike the product) and the variance
(how uncertain she is about her expected taste for the product).
When there is no learning, this uncertainty is zero and the
consumer's expected taste equals her actual taste. The model also
explicitly nests alternative types of dynamic behavior, such as
switching costs or consumer taste for variety. Consumers are assumed
to be forward-looking, which means that they will take into account
the future benefit of experimentation when they first purchase the
new product: that they might like the new product and keep on
purchasing it in the future. I briefly describe two of the model's
testable implications. First, given that consumers are
forward-looking and that there are no alternative dynamics in
demand, the share of consumers who purchase the new product and then
do not will be greater than the share who do not and then do. This
is due to the fact that the future benefits of learning induce
consumers to purchase the new product sooner rather than later. The
test may also be used when consumers form habits, but it may be less
powerful: when consumers form habits, they do not like switching
products and may be less likely to purchase the new product in case
they do not like it and have to switch brands in the future. Second,
for any discount factor and for any alternative dynamics, among
consumers whose previous purchase was the new product, the share of
consumers who repurchase the new product will rise over time. This
is due to the fact that initially, consumers whose previous purchase
was the new product will be consumers who are experimenting; later
on it will be consumers who like the product and will be likely to
repurchase it. Support for these implications has been found in this
data set by Osborne (2005), which provides evidence that the
structural model will be identified.
In the fourth section of the paper I describe the data set on which
I estimate the structural model. The data is household-level scanner
data from Sioux Falls, SD on laundry detergent purchases. During the
time period in which the data was collected there were 3 new product
introductions. This helps with the identification of learning: we
can observe a particular consumer's first purchase of a new product,
and observe the consumer's response to it: repurchasing it if she
likes it, or never purchasing it again if she does not.
The fifth section of the paper contains a description of the
structural model. This model contains the most important features of
the theoretical model, which are learning and alternative dynamics.
This is an improvement over most previous empirical work in consumer
learning such as Erdem and Keane (1996) or Ackerberg (2003), where
all dynamic behavior in demand is attributed entirely to learning.
Israel (2005) is closer to my work in this regard; the paper
addresses this issue by directly controlling for positive tenure
dependence in demand. However, the identification argument in that
paper breaks down in the presence of negative tenure dependence,
which may be an important factor in packaged goods markets. More
importantly, Israel (2005) also does not directly model consumers as
being forward-looking; hence, that paper cannot examine the decision
of consumers to experiment. The paper also does not separate
consumer heterogeneity from tenure dependence. Consumer
heterogeneity could be very important in markets for packaged goods;
furthermore, leaving it out could cause the model to pick up
learning when there is no learning at all. As an example, consider a
laundry detergent market with two types of consumers: non-price
sensitive consumers who learn about new products, and price
sensitive consumers who do not care about detergents, but just buy
the cheapest product. Suppose that a new product is introduced to
the market. The learners will purchase it very soon after it comes
out, since there is value to experimenting with it. Some of them
will like the product and keep repurchasing it, while some will
dislike it and not buy it again after their first purchase. Suppose
further that the new product is priced low on introduction. The
price sensitive consumers would then purchase the new product when
its price was low, and switch away when its price rose. To an
econometrician, however, it might look as if these consumers
experimented with the new product and did not like it.
The previous papers that I cited used the maximum likelihood
estimator, which is computationally difficult and has forced
researchers to be parsimonious in their modeling assumptions for
tractability. One of the reasons for this computational difficulty
is that every time a parameter is changed, such as when a derivative
is calculated, a complicated discrete choice dynamic programming
problem must be solved. I estimate my structural model using a new
method which makes it easier to account for unobserved heterogeneity
and to have a richer state space. This method uses Markov Chain
Monte Carlo, rather than classical methods, and requires only one
solution of the discrete choice dynamic programming problem.
The next section discusses the estimates of my structural model,
which suggest that both learning and switching costs are significant
in demand. I allow the parameters for learning to vary with
household income and size, and the results suggest that there is
more learning among higher-income and smaller households.
The seventh section contains counterfactual calculations. First, to
gauge the importance of learning I simulate the market shares for
one of the new products when there is learning, and when there is no
learning and consumers have perfect information about their tastes.
The results suggest that learning decreases the market shares of new
products. The reason behind this is that under learning, there will
be some consumers who do not expect to like the new product and
don't try it; if they were to try it, though, they would like it.
Another counterfactual that I calculate is the effect of dropping
the price for a new product on its long run market share. The
results suggest that the return to introductory pricing is greater
under learning only than under switching costs only, and when both
behaviors are present they cancel each other out to some degree.
In each case, I cut the price of the product by half for 3 months,
and simulate the product's market share. We can see that when there
is both learning and state dependence in demand, a price cut for
Cheer increases its long run market share by 20\%. When there is
learning and no state dependence, the increase is 24\%. This is due
to the fact that when there are switching costs, consumers are less
likely to try the new brand, since they realize that they could lose
future utility from having to switch brands later. When there is only
state dependence in demand and no learning, the increase is 22\%. As
I discussed in the introduction, a reason that the price cut is more
effective under switching costs only than under both learning and
switching costs is that some consumers purchase the product and
switch away from it in the latter case. This suggests that firms may
want to combine their price cuts with advertising or free samples in
order to remove the learning.
The result is reversed for Surf - the price cut has more effect when
there are both learning and switching costs than otherwise. An
explanation for this can be seen by recalling that in section 3, I
argued that when consumers expect to like a product, a price cut will
have more effect when there are both learning and switching costs.
The population mean of consumers' predicted tastes for
Surf is the third highest of all the liquids, which means consumers
are more likely to expect to like it. Surf is also a
relatively inexpensive product (see Table \ref{table:prices}), which
will raise the value of being habituated to it. The result for Dash
is somewhat puzzling - it is an expensive product, and the average
population predicted mean for it is low compared to the other
products. We would expect the price cut to have the same effect as
it did for Cheer, but the pattern looks similar to Surf.
For all of these parameters, except the posterior draws for the new
products, there will be underlying hyperparameters, the population
means, $b$, and the variances, $W$. Since the posteriors,
$\gamma_{ij}$ are distributed normally
$N(\gamma_{ij}^0,\sigma_{ij}^2)$, they do not need associated
hyperparameters. I denote the vector of $\theta_i$ minus these three
parameters as $\tilde{\theta}_i$. I assume that $\tilde{\theta}_i
\sim N(b,W)$, where $W$ is diagonal.
In my structural econometric model the dependent variable is the
consumer's choice of one of the 13 products listed in Table
\ref{table:mktshares}. I assume that a consumer's period utility is
linear, as in traditional discrete choice models. The period, or
flow utility for consumer $i$ for product $j=1,...,13$ on purchase
$t$ is assumed to be:
\begin{equation}\label{eq:pdutbig}
u_{ijt}(s_{it-1},\alpha_i,p_{ijt},c_{ijt},\beta_i,x_{ijt},\eta_i,
y_{ijt-1},\varepsilon_{ijt}) = \gamma_{ij}(s_{it-1},y_{it-1}) +
\alpha_i (p_{ijt}-c_{ijt}) + \beta_i x_{ijt} + \eta_i y_{ijt-1} +
\varepsilon_{ijt}
\end{equation}
where $\gamma_{ij}(s_{it-1},y_{it-1})$ is consumer $i$'s ``permanent''
taste for product $j$ at learning state $(s_{it-1},y_{it-1})$,
$p_{ijt}$ is the price per ounce of product $j$ in the store during
purchase event $t$\footnote{Since I do not model quantity choice in
this paper, I calculate the price for each product as the average
price per ounce of the product in the store at which the consumer
shopped. The average is taken over all sizes of the product
available in the store at that time. An issue in this data is that
of missing prices: A.C. Nielsen did not collect data on prices for
products not purchased by the consumer at time $t$. I construct a
price process from the data; the details on this process are
explained in the Appendix.} and $c_{ijt}$ is the value of a
manufacturer coupon for product $j$, if one is available (there is
very little use of store coupons in this data set). I will note that
the $c_{ijt}$'s are only observed in purchase event $t$ for a
product which the consumer purchases. This creates a problem for
structural estimation. We might be tempted to adjust the price for
{\it only} the purchased product by the value of the coupon;
however, by doing this only for the purchased product we are in
essence including an endogenous variable on the right hand side. On
the other hand, we could simply ignore coupons altogether (this
approach is usually taken in other structural papers). The problem
with doing this is that it may result in misestimation of consumer
price sensitivities (see Erdem, Keane and Sun (1999)), which could
be serious in my data set since a coupon is used in over half of all
purchase events. I overcome this problem by specifying a
distribution for coupon availability, and treating coupons for
non-purchased brands as latent variables that need to be integrated
out during estimation. The parameters of the coupon-generating
distribution are estimated along with the other model parameters.
The details of this estimation procedure are described in Section
\ref{sec:est}.
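As an illustration of what integrating out the latent coupons involves, the sketch below averages logit choice probabilities over simulated coupon draws. The Bernoulli availability and exponential coupon values used here are placeholder distributions of my own, not the coupon-generating process estimated in Section \ref{sec:est}:

```python
import numpy as np

def avg_choice_probs(gammas, alpha, prices, coupon_prob, coupon_mean,
                     n_sims=2000, seed=0):
    """Average logit choice probabilities over simulated latent coupons.
    Coupons arrive independently across brands with probability
    coupon_prob and have exponential values with mean coupon_mean --
    placeholder distributions for illustration only."""
    rng = np.random.default_rng(seed)
    J = len(prices)
    probs = np.zeros(J)
    for _ in range(n_sims):
        # Latent coupon vector: availability indicator times coupon value.
        coupons = rng.binomial(1, coupon_prob, J) * rng.exponential(coupon_mean, J)
        v = gammas + alpha * (prices - coupons)  # flow utility, as above
        ev = np.exp(v - v.max())                 # numerically stable logit
        probs += ev / ev.sum()
    return probs / n_sims
```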
The $x_{ijt}$ vector includes other variables such as feature,
display, and time dummy variables to account for unobserved
advertising. The first and second elements of the $x_{ijt}$ vector
are dummy variables which are 1 if product $j$ is on feature or
display, respectively. The rest of the elements are a series of
weekly dummy variables, interacted with product dummy variables for
the new products: first, there are dummy variables for the first 12
weeks after Cheer's introduction, interacted with the Cheer purchase
dummy; then, the next 12 are the same thing for Surf, and the next
are the same thing for Dash. The time dummy variables are included
to account for unobserved advertising for the three new
products\footnote{In this data set some television advertising
information was collected for some households - they were given
telemeters which monitored whether the television was on during an
advertisement for one of the products. Unfortunately this data was
only collected for the final year of the sample period, so any
introductory advertising for the new products was not collected.
Additionally, there were some problems with the data collection,
such as large viewing gaps for some households which suggests that
the telemeters did not always work properly.}. The reason for
including them is that introductory advertising for the products
could lead to the same identification problem as introductory
pricing: consumers may be induced to purchase the new products
initially if they are heavily advertised, but may switch away as the
advertising intensity drops. I am assuming that advertising only has
a direct effect on consumer utility, and does not have an
informational or prestige effect. If introductory advertising has
informational content, it will make consumers more sure about their
tastes for the new products, reducing the amount of learning and
reducing the $\sigma^2$'s. In this case the results should be
interpreted as lower bounds on the amount of learning.
The variable $y_{ijt-1}$ is a dummy variable that is 1 if consumer
$i$'s purchase in period $t$ was of product $j$, and as in Section
3, $\eta_i$ captures alternative sources of state dependence in
demand. I allow $\eta_i$ to vary across the population, which means
that some consumers can be habit-formers, while some may be
variety-seekers. The $\varepsilon_{ijt}$ is an iid logit error that
is observed to the consumer but not the econometrician, and I
interpret it as a shock to the cost of obtaining or using product
$j$ at time $t$. It is assumed to be independent of the model's
explanatory variables.
The state variable $s_{it}$ keeps track of which products consumer
$i$ has learned about prior to purchase $t$, and it evolves as
follows:
\begin{equation}\label{eq:learnstate}
s_{ijt}= s_{ijt-1} + 1\{ s_{ijt-1} = 0 \mbox { and } y_{ijt-1} = 1
\}
\end{equation}
The learning process is specified to be a simple, one-shot learning
process - the consumer learns her true value for product $j$ after
her first purchase of it. For the established products, I assume
that there is no learning - so $\gamma_{ij}(s_{it-1},y_{it-1}) =
\gamma_{ij}$. For the three new products, I assume that the consumer
is uncertain about her taste for the product as in the earlier
theoretical model. For the 3 new products we have:
\begin{equation}\label{eq:tastessdbig}
\begin{split}
\mbox{New Product, } s_{ijt-1} = 0, \mbox{ and } y_{it-1} = 0 & : \gamma_{ij}^0 \\
\mbox{New Product, } s_{ijt-1} = 1, \mbox{ or } y_{it-1} = 1 & : \gamma_{ij} \\
\end{split}
\end{equation}
where, as before, the consumer knows her true taste distribution,
which is $N(\gamma_{ij}^0, \sigma_{ij}^2)$. It may seem strange to
the reader that I have specified the state space in terms of
$s_{ijt-1}$ rather than $s_{ijt}$. I will defer discussion of this
for the next few paragraphs, until I examine the value function.
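The learning state and the taste entering flow utility amount to a simple lookup; the pair of functions below (hypothetical names) is a direct transcription of the two displayed equations above:

```python
def update_learning_state(s_prev, y_prev):
    """One-shot learning state: s flips from 0 to 1 the first time the
    product is purchased, then stays at 1."""
    return s_prev + (1 if (s_prev == 0 and y_prev == 1) else 0)

def gamma_state(s_prev, y_prev, gamma0, gamma_true):
    """Taste entering flow utility for a new product: the expected taste
    gamma0 before the first purchase, the true (learned) taste after."""
    return gamma_true if (s_prev == 1 or y_prev == 1) else gamma0
```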
As I discussed in the Introduction and Literature Review, unobserved
heterogeneity in population parameters such as tastes or price
sensitivities could be very important, especially in the presence of
introductory pricing for new products. I allow unobserved
heterogeneity in individual-level parameters, such as the
$\gamma_{ij}$'s, the $\gamma_{ij}^0$'s, and the $\alpha_i$'s by
specifying a population distribution for these parameters. All the
individual level parameters except for the price coefficient
$\alpha_i$ and the learning parameters, the $\sigma_{ij}^2$'s, are
assumed to be normally distributed across the population. The price
coefficient is distributed according to Johnson's $S_B$
distribution, where
\begin{equation}\label{eq:pcoeff}
\alpha_i = p_{max} \frac{\exp( \alpha_{0i} + \alpha_{1i} INC_i +
\alpha_{2i} SIZE_i)}{1+\exp( \alpha_{0i} + \alpha_{1i} INC_i +
\alpha_{2i} SIZE_i)},
\end{equation}
where $p_{max}$ is set to be 10, and $\alpha_{0i}$, $\alpha_{1i}$
and $\alpha_{2i}$ are normally distributed. Similarly, the
$\sigma_{ij}^2$'s are also $S_B$ where
\begin{equation}\label{eq:scoeff}
\sigma_{ij}^2 = \sigma_{max} \frac{\exp( \sigma_{0ji} + \sigma_{1ji}
INC_i + \sigma_{2ji} SIZE_i)}{1+\exp( \sigma_{0ji} + \sigma_{1ji}
INC_i + \sigma_{2ji} SIZE_i)},
\end{equation}
where $\sigma_{max}$ is set to be 5, and $\sigma_{0ji}$,
$\sigma_{1ji}$ and $\sigma_{2ji}$ are normally distributed. This
specification allows the price coefficient to have the same sign
across the population, and ensures that the learning parameters are
never negative.
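The $S_B$ transform in equations (\ref{eq:pcoeff}) and (\ref{eq:scoeff}) can be written as a single helper; the function name is my own, and `upper` stands in for $p_{max}$ or $\sigma_{max}$:

```python
import math

def johnson_sb(draws, income, size, upper):
    """Johnson S_B transform: a normally distributed index is squashed
    into (0, upper), so the transformed parameter keeps one sign and is
    bounded. `draws` = (a0, a1, a2) are the individual's normal
    coefficients; upper is p_max = 10 for the price coefficient and
    sigma_max = 5 for the learning variances."""
    a0, a1, a2 = draws
    z = a0 + a1 * income + a2 * size
    return upper * math.exp(z) / (1.0 + math.exp(z))
```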
I assume consumers are forward-looking, which means that they will
solve a discrete choice dynamic programming problem every time they
make a purchase. This means that they will take into account the
effects of today's product purchase on their future utility, which
will arise from learning and any alternative state dependence (that
captured by the $\eta_i$). Consumers will also have expectations
about future exogenous variables, such as prices. It is easy to see
that the state space in this model is considerably more complicated
than the state space of the theoretical model in section 2, so I
will make some simplifying assumptions.
First, I will assume that consumers are naive in their expectations
about future $x_{ijt}$'s, so they do not have to be considered a
part of the consumer's state space. By this I mean that consumers
expect future display, feature, and television advertising to have
levels of zero, and any advertising today is a surprise. Including
the weekly dummy variables in the state space would increase it
greatly and increase the program's memory requirements
significantly. Second, I make the same assumption about future
coupons. As I discussed earlier, in my modeling I will treat coupons
for nonpurchased brands as latent unobservables and I will specify a
distribution for coupon availability and estimate its parameters
along with the model parameters. A detailed explanation of why this
is a problem is reserved for Section \ref{sec:est} and the Appendix,
but I will briefly summarize it here. The issue is that in addition
to increasing memory requirements, it increases the model's
computational time: if consumers take future coupons into account,
their value function will depend on the parameters of the coupon
distribution, in addition to individual-specific parameters such as
their tastes. As I will describe in more detail in Section
\ref{sec:est}, this model is estimated using two
Markov Chain Monte Carlo loops, one for individual level parameters
and one for the parameters of the coupon distribution. The value
function needs to be updated whenever parameters that enter it are
changed; if both sets of parameters enter the value function, we
will need to update it twice. In spite of this assumption, the fact
that I have accounted for coupons at all is an improvement over
previous dynamic structural papers: most papers in this literature
simply ignore coupons entirely (an exception is Ackerberg (2003),
which includes store coupons, but not manufacturer coupons; this
worked in that paper since store coupons are observed, and there was
little use of manufacturer coupons).
After these simplifying assumptions, the main state space variables
in the model will be those associated with prices and availabilities
of each product, $p_{ijt}$, the learning state $s_{ijt}$, and the
consumer's previous product choice $y_{ijt-1}$. I estimate a Markov
transition process for prices from the data, using a method similar
to Erdem, Imai and Keane (2002) which I will briefly summarize - a
detailed description of this process can be found in the Appendix.
In my data, prices tend to be clustered around specific values. The
transition process for prices is modeled as discrete/continuous; the
probability of a price change for a product is modeled as a binary
logit, and conditional on a price change, the probability of a
particular value of the new price is assumed to be lognormal. I make
two additions to their model. First, not all products may be
available in a particular store at a given time. I infer product
availability by examining the pattern of purchases in a particular
store. If a product is not purchased by anybody for more than a
month, I assume it is not available during that period. Thus, I
jointly estimate product availability with the price process.
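The discrete/continuous price transition can be sketched as a short simulation; the logit index, the lognormal parameters, and the way the introductory dummy enters the index are all illustrative placeholders of my own, not the estimated process from the Appendix:

```python
import numpy as np

def simulate_price_path(p0, change_index, mu_ln, sigma_ln, weeks,
                        intro_weeks=0, intro_shift=0.0, seed=0):
    """Discrete/continuous price process: each week the price changes with
    a binary-logit probability; conditional on a change, the new price is
    a lognormal draw. intro_shift plays the role of the introductory-period
    dummy in the logit index (all values illustrative)."""
    rng = np.random.default_rng(seed)
    p, path = p0, []
    for t in range(weeks):
        index = change_index + (intro_shift if t < intro_weeks else 0.0)
        prob_change = 1.0 / (1.0 + np.exp(-index))
        if rng.random() < prob_change:
            p = rng.lognormal(mu_ln, sigma_ln)  # draw a new price level
        path.append(p)
    return path
```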
Another factor that may be important is the fact that we observe
introductory pricing. Consumers may understand that the prices
of new products will rise in the future, so I include a
dummy variable in the price transition estimation to take this into
account. I assume that the ``introductory'' period for prices lasts
for 12 weeks after the product's introduction. This will increase
the state space, since for a consumer who makes a purchase during
the introductory period it will matter how many purchases she
expects to make before the period ends. Knowing how long it takes
before a consumer makes her next purchase will require some
assumption on purchase timing. Modeling purchase timing directly is
beyond the scope of this paper. For this reason, and for others that
will be discussed shortly, I select a sample of households that
appears to purchase at fairly regular intervals. For these
households the interpurchase times are clustered between 6 and 8
weeks. I will assume that all households expect to make their next
purchases in the median interpurchase time, which is 8 weeks. To
summarize, the time-varying state variables for the model are:
\begin{enumerate}
\item The prices of all products at time $t$, $p_{ijt}$.
\item The availability of all products at time $t$, $a_{ijt}$.
\item The learning state for the new products, $s_{ijt}$.
\item The consumer's product choice today, $y_{ijt}$.
\item The number of purchases before leaving the introductory state,
$n_t$.
\end{enumerate}
Even with the simplifying assumptions about consumer expectations,
this state space is quite large. Taking into account that consumers
are forward-looking, the consumer's utility for a particular product
will then be
\begin{equation}\label{eq:valfun}
u_{ijt}(s_{it-1},\alpha_i,p_{ijt},c_{ijt},\beta_i,x_{ijt},\eta_i,
y_{ijt-1},\varepsilon_{ijt}) + \delta E
V(s_{it},p_{ijt+1},a_{ijt+1},y_{ijt},n_{t+1},\gamma_i(s_{it}),\alpha_i)
\end{equation}
where the expectation is taken over the error terms, the price and
availability distributions, and the true taste distributions for
untried products. The parameter $\delta$ is the consumer's discount
factor, which measures how much the consumer values the future. In
structural estimation it is difficult to identify this parameter, so
in my estimation I will assume that it is equal to 0.95. I will now
explain why consumer utility is a function of $s_{it-1}$ rather than
$s_{it}$. If I were to assume that the utility was a function of
$s_{it}$, then the model's state space would be incomplete. To see why,
let us briefly return to the 2 product model of section 3. Suppose
that we were to solve for the value functions of this model. We
would need to solve for 4 value functions: the value of
experimenting with product 2, the value of not experimenting with
product 2, the value of purchasing product 2 after learning
about it, and the value of purchasing product 1 after having learned
about product 2. Thus, the $s$'s and the $y$'s need to keep track of
4 possible states. To see what happens if we mistakenly made the
consumer utility a function of $s_{it}$, consider the following
sequence of purchases:
\begin{center}
\begin{tabular}{c c c c c}
\hline
Purchase Event & 1 & 2 & 3 & 4 \\
\hline
$y_{it}$ & 1 & 2 & 2 & 2 \\
$s_{it}$ & 0 & 0 & 1 & 1 \\
$y_{it-1}$ & . & 1 & 2 & 2 \\
$s_{it-1}$ & . & 0 & 0 & 1 \\
\hline
\end{tabular}
\end{center}
Consider the consumer's value function from choosing product 2 in
period 2. If we assume that utility is a function of $s_{it}$, the
value function will be a function of $s_3$ and $y_2$. In
this period, $s_3=1$ and $y_2=2$. However, in period 3, $s_4=1$ and
$y_3=2$, which would imply that the value of choosing product 2 in
period 3 would be the same as in period 2. Obviously this is wrong, since the consumer has
learned about the new product in this period. The value function
clearly needs to depend on whether today's purchase is a first
purchase or not. By substituting in $s_{it-1}$ instead of $s_{it}$
this problem is alleviated (in period 2, the next period state for
the value function is $s_2=0$ and $y_2=2$, while in period 3 it is
$s_3=1$ and $y_3=2$).
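This timing argument can be checked mechanically. The following Python sketch transcribes the table above (the helper functions are made up for illustration) and shows that the lagged learning state distinguishes purchase events 2 and 3 while the contemporaneous state does not:

```python
# Purchase events 1..4, transcribed from the table in the text.
y = {1: 1, 2: 2, 3: 2, 4: 2}   # y_it: product chosen at event t
s = {1: 0, 2: 0, 3: 1, 4: 1}   # s_it: learning state at event t

# Mistaken timing: the state entering the value function after event t
# would be (s_{t+1}, y_t).
wrong_state = lambda t: (s[t + 1], y[t])
# Correct timing: the state is (s_t, y_t), i.e. the lagged learning state.
right_state = lambda t: (s[t], y[t])

# Under the mistaken timing, events 2 and 3 map to the same state even
# though the consumer learns about product 2 in between:
print(wrong_state(2), wrong_state(3))   # (1, 2) (1, 2) -- indistinguishable
# The lagged state keeps them apart:
print(right_state(2), right_state(3))   # (0, 2) (1, 2)
```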
Most papers that estimate discrete choice models use classical
methods, such as the maximum likelihood estimator, while in my
research I will employ Bayesian methods. One reason for this is that
in the presence of significant unobserved heterogeneity, the maximum
likelihood estimator can be computationally cumbersome. To see why,
consider an individual household's entire sequence of product
choices. The likelihood of these choices is a function of the
consumer's draws on the individual level parameters. Because these
individual level parameters are unobserved, when constructing the
likelihood of a purchase sequence for each consumer, it is necessary
to integrate them out using simulation. As the dimension of the
integration increases, more simulation draws are required and the
computational time of the algorithm will rise. This problem has been
overcome recently by the application of Bayesian methods such as
Markov Chain Monte Carlo, abbreviated MCMC or MC$^2$. These
methods have been used to estimate static multinomial probit models
on household scanner data, such as in Rossi, McCulloch, and Allenby
(1996), and in random coefficients logit models, such as in Train
and Sonnier (2003). In addition to the computational advantages of
Bayesian procedures over classical procedures, Bayesian procedures
return the posterior distribution of the model parameters given the
data, making it easy to calculate parameter means and variances.
With classical methods, parameter variances are calculated from the
inverse Hessian of the likelihood function, which will be
asymptotically equal to the actual parameter variance matrix. In a
model with a large number of parameters, these asymptotic results
are less likely to hold, so the classical result may be less
accurate. The results from the Bayesian estimation can be
interpreted in a way similar to the results of the classical
estimation if the econometrician so desires: according to the
Bernstein-von Mises theorem, the posterior mean and variance matrix
of the model parameters is asymptotically equivalent to that of the
classical estimator (see Train (2003) for an overview).
Discrete choice models become an order of magnitude more complicated
when consumers are assumed to be forward-looking. Structural
discrete choice dynamic programming problems have been estimated on
household panel data using classical methods, but the model solution
is computationally burdensome and is typically only tractable for
researchers who have access to powerful computer clusters. The
reason for this is that every time a model parameter changes, such
as when a derivative is taken, it is necessary to resolve the
discrete choice dynamic programming problem at all state space
points\footnote{Some of this computational burden can be overcome
with the use of techniques such as importance sampling (Ackerberg
(2003)), but this still requires a finite number of solutions of the
discrete choice dynamic programming problem.}. Because of this
problem, researchers who have estimated these models have had to make
restrictive assumptions, such as limiting the amount of consumer
heterogeneity. This problem has the potential to be exacerbated by
the use of Bayesian methods: in each iteration of the Markov Chain,
we would have to re-solve the discrete choice dynamic programming
problem. Since MCMC algorithms are usually run tens of thousands of
times, the model's computational time would be excessively long.
This problem has been overcome by a novel method developed by Imai,
Jain and Ching (2005). In this method, the discrete choice dynamic
programming problem is solved in full only once, along with the
estimation of the model parameters. Each time a new vector of
parameters is drawn in the Markov Chain, one update is made to the
value function. The computational time for this algorithm is
therefore on the same order as that of a static model.
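A minimal sketch of this idea, on a toy two-state dynamic problem rather than the model estimated here, is the following; the random-walk parameter draw is a stand-in for a Metropolis-Hastings step, and all names and values are illustrative:

```python
import math
import random

random.seed(0)

# Toy setting: two states and two actions, where taking action a moves the
# consumer to state a next period; delta is the discount factor.
delta = 0.95
states, actions = [0, 1], [0, 1]

def flow_u(theta, s, a):
    """Illustrative flow utility depending on a scalar parameter theta."""
    return theta * (1.0 if a == s else 0.5)

def bellman_update(EV, theta):
    """One sweep of the logit Bellman operator (extreme value errors give
    the closed-form log-sum-exp inclusive value)."""
    new = {}
    for s in states:
        vals = [flow_u(theta, s, a) + delta * EV[a] for a in actions]
        new[s] = math.log(sum(math.exp(v) for v in vals))
    return new

# IJC-style loop: rather than re-solving the dynamic program to convergence
# at every parameter draw, make ONE Bellman update per draw, so the value
# function and the parameter chain converge together.
EV = {s: 0.0 for s in states}
theta = 1.0
for it in range(5000):
    theta = max(0.1, theta + random.gauss(0.0, 0.01))  # stand-in for an MH draw
    EV = bellman_update(EV, theta)                     # single update, not a full solve
```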
In my model, the parameter space can be divided into two parts:
individual-level parameters, and parameters of the coupon-generating
distribution. The vector of individual-level parameters will include
the $\gamma_{ij}$'s, the $\gamma_{ij}^0$'s, the $\alpha_{0i}$,
$\alpha_{1i}$ and $\alpha_{2i}$'s, the $\sigma_{0ji}$,
$\sigma_{1ji}$ and $\sigma_{2ji}$'s, and the $\beta_i$'s. For
simplicity, denote the vector of individual level parameters as
$\theta_i$. For all of these parameters, except the posterior draws
for the new products, there will be underlying hyperparameters, the
population means, $b$, and the variances, $W$. Since the posteriors
$\gamma_{ij}$ are distributed normally,
$N(\gamma_{ij}^0,\sigma_{ij}^2)$, they do not need associated
hyperparameters. I denote the vector of $\theta_i$ minus these three
parameters as $\tilde{\theta}_i$. I assume that $\tilde{\theta}_i
\sim N(b,W)$, where $W$ is diagonal. I assume a normal prior on $b$,
and an inverse Gamma prior on the elements of $W$.
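Assuming conjugate updates of the usual hierarchical-normal form, the hyperparameter draws could be sketched as follows for a single coordinate. This is only an illustration with made-up prior values, not the exact updates used in the estimation:

```python
import random

random.seed(1)

# One coordinate of the hierarchy: theta_i ~ N(b, w), with a normal prior on
# b and an inverse-gamma prior on w (all prior values below are made up).
thetas = [random.gauss(2.0, 0.5) for _ in range(200)]  # stand-in for theta_i draws
b0, s0sq = 0.0, 100.0   # diffuse normal prior on b
a0, r0 = 3.0, 1.0       # inverse-gamma prior on w

def draw_b(w):
    """Conjugate normal update for the population mean b."""
    n = len(thetas)
    var = 1.0 / (n / w + 1.0 / s0sq)
    mean = var * (sum(thetas) / w + b0 / s0sq)
    return random.gauss(mean, var ** 0.5)

def draw_w(b):
    """Conjugate inverse-gamma update for the population variance w
    (a draw from InvGamma(shape, rate) is rate over a Gamma(shape, 1) draw)."""
    n = len(thetas)
    shape = a0 + n / 2.0
    rate = r0 + 0.5 * sum((t - b) ** 2 for t in thetas)
    return rate / random.gammavariate(shape, 1.0)

b, w = 0.0, 1.0
for _ in range(500):
    b = draw_b(w)
    w = draw_w(b)
# b should settle near 2.0 and w near 0.25 for this simulated population.
```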
I now turn to the coupon distribution.
The model is estimated in two Markov Chain Monte Carlo layers. One
layer draws the $\theta_i$'s, $b$'s, and $W$'s, and one draws the
probabilities associated with the coupon parameters. The posterior
likelihood of a particular vector of $\theta_i$'s is not conjugate,
which means that the Metropolis-Hastings algorithm must be used in
this loop. The details of this computation are in the Appendix.
Also, during the first layer it is necessary to update the value
function, since the value function depends on the $\theta_i$'s. In
the coupon layer, I first draw the unobserved coupons, then the
$p_z$'s, and then the $p_{cj}$'s. All of these parameters have simple
posterior distributions, so for this loop a Gibbs sampler can be
used. As I mentioned in the previous section, since I assume
consumers do not take future coupons into account, the parameters of
the coupon-generating distribution do not enter into the value
function. If they did, it would be necessary to update the value
function in the coupon layer as well, doubling the algorithm's
computational time. Also, since consumer choices would depend
directly on underlying coupon parameters, the posterior likelihood
of the $p_z$'s and the $p_{cj}$'s would be complicated, and drawing
from this posterior would have to be done using the
Metropolis-Hastings algorithm\footnote{This problem is the same
issue that arises when estimating static discrete choice models
using MCMC under the assumption that some parameters vary across the
population, and some are fixed. This issue is discussed in Train
(2003), pg. 311-313.}.
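A generic random-walk Metropolis-Hastings update of the kind used in the individual-parameter layer can be sketched as follows; the target posterior here is a one-dimensional stand-in, not the model's actual posterior:

```python
import math
import random

random.seed(2)

def log_post(theta):
    """Stand-in log posterior for one household's parameter (toy: N(1, 0.5^2))."""
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def mh_step(theta, step=0.3):
    """One random-walk Metropolis-Hastings update: propose, then accept with
    probability min(1, posterior ratio)."""
    prop = theta + random.gauss(0.0, step)
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        return prop
    return theta

draws, theta = [], 0.0
for it in range(20000):
    theta = mh_step(theta)
    if it >= 5000:          # discard burn-in
        draws.append(theta)

post_mean = sum(draws) / len(draws)   # should be close to 1.0
```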
Figures \ref{fig:predshch} to \ref{fig:predshda} show graphs of the
predicted market shares for Cheer, Surf and Dash against the actual
market shares for these products. The predicted results follow the
actual market share fairly closely, which suggests that the model
fits the data well.
I now turn to the parameters for the new products. Consumers' expected
tastes for these products are low: they do not expect to like them as much
as the most popular established brands. Combined with the fact that
most consumers will have a favorite product and will be
habit-formers, it seems that consumers will not have a strong
incentive to experiment. The learning parameters, which are the
$\sigma_{0ji}^2$'s, the $\sigma_{1ji}^2$'s and the
$\sigma_{2ji}^2$'s, appear to be large and statistically
significant.
In my model, since the error term is logit, in most points of the
state space I can calculate a closed-form solution to the value
function (conditional on unobservable $\theta_i$'s and the previous
iteration's value function). For state space points where the
consumer has not yet made a first purchase of a particular product
(for instance, the expected value next period of choosing Cheer for
the first time, which would be $y_{Cheer}=1$, $s_{Cheer}=0$), it is
necessary to integrate out over the distribution of future tastes
for that product as in Equation \eqref{eq:vffirstpurch}. I treat
these future tastes as unobservables, draw ten of them in each
iteration, and store the expected utility averaged over the draws.
When calculating the expected utility, it is necessary to have an
estimate of the next period's utility at the given future taste
draw; this is also interpolated using a kernel. Formally, suppose
that the consumer has not yet purchased product $j$ and consider
some state space point defined by $(s,p^m,J^m,y,n)$. I draw $L=10$
draws from the true taste distribution for product $j$, which is
$N(\gamma_{ij}^0,\sigma_{ij}^2)$. Denote each draw $l$ as
$\gamma_{ij}^l$. To calculate the expected utility, we first need to
calculate each consumer's exact utility (ignoring the logit
error) for each product at simulation $l$. Denote $\theta_n^l$ as the
vector of $\theta_n$ with the consumer's true taste for product $j$
($\gamma_{ij}$) taken out and replaced with the simulated tastes
($\gamma_{ij}^l$). Then the consumer's utility for product $j$ at
simulation $l$, $v_{ij}^l$, will be
\begin{displaymath}
\begin{split}
\mbox{Product } k = j & : v_{ik}^l = \gamma_{ik}^l - \alpha_i p_k^m + \eta_i y_k + \delta EV_n(s',p^m,J^m,y',n',\theta_n^l) \\
\mbox{Product } k \neq j & : v_{ik}^l = \gamma_{ik}(s_k) - \alpha_i
p_k^m + \eta_i y_k + \delta EV_n(s',p^m,J^m,y',n',\theta_n^l), \\
\end{split}
\end{displaymath}
where $s'$, $y'$, and $n'$ denote next period's states, and $EV$ is
calculated from formula \eqref{eq:vfupd}.
Her expected utility for purchasing product $j$ for the first time
(state space point $y_{j}=1, s_{j}=0$) at the current iteration's
parameter $\theta_n$ is then calculated as
\begin{equation}\label{eq:vfpm2}
\hat{EV}_n(s,p^m,J^m,y,n,\theta_n) = \frac{1}{L} \sum_{l=1}^L \ln \left(
\sum_{k=1}^J \exp(v_{ik}^l) \right).
\end{equation}
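As a sketch of this simulation step, the following Python code averages the closed-form logit inclusive value over $L=10$ simulated taste draws. For brevity the continuation values are folded into the flow utilities, and all numbers are illustrative rather than estimates from the model:

```python
import math
import random

random.seed(3)

def inclusive_value(v):
    """Closed-form logit expectation E[max_k (v_k + eps_k)] = ln(sum_k exp(v_k)),
    computed with the max subtracted off for numerical stability."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))

def ev_first_purchase(v_other, gamma0, sigma, L=10):
    """Average the inclusive value over L simulated draws from the true-taste
    distribution N(gamma0, sigma^2) for the untried product. Continuation
    values are folded into the utilities here for brevity."""
    total = 0.0
    for _ in range(L):
        gamma_l = random.gauss(gamma0, sigma)   # simulated true taste draw
        total += inclusive_value(v_other + [gamma_l])
    return total / L

# Two known products plus one untried product with uncertain taste.
ev = ev_first_purchase(v_other=[1.0, 0.4], gamma0=0.8, sigma=1.0)
```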
When calculating expected utility it is necessary to calculate the
value function at values of the state space $\Sigma_{it}$, which are
described above in Section \ref{sec:dynopt}. One important part of
the state space is the vector of prices $p_{ijt}$ and the set of
available products, $J_{it}$, in a given purchase event. Because
there are 13 products, this portion of the state space is
high-dimensional. Due to memory limitations I do not evaluate the
value function at all possible price/availability states, but I
instead do it only on a random grid of points. At any other point, I
interpolate the value function as follows. Suppose that the
estimated transition density of a price/availability grid point
$(p^m,J^m)$, where $m=1,...,M$, given a price/availability vector
$(p,J)$, is $f(p^m,J^m|p,J)$ (details of the estimation of this
density are described in the Appendix). Assume that at the current
point in the MCMC sequence we have an approximation to the value
function for individual $i$, who is represented by the parameter
vector $\theta_i$, at all the price/availability grid points,
$(p^m,J^m)$, the learning state $s$, the previous product purchase
$y$ and the time state $n$\footnote{There is an abuse of notation
in using $n$ to denote the time state and the current point in the
MCMC sequence.}, which I denote $\hat{EV}_i(s,p^m,J^m,y,n;\theta_i)$
(here I am writing out the state vector $\Sigma=(s,p^m,J^m,y,n)$ in
full). Then the expected value function for some other
price/availability vector $(p,J)$ at $\theta_i$ is approximated as
\begin{equation}\label{eq:vfpm}
EV_i(s,p,J,y,n,\theta_i) = \frac{\sum_{m=1}^M
\hat{EV}_i(s,p^m,J^m,y,n,\theta_i)f(p^m,J^m|p,J)}{\sum_{m=1}^M
f(p^m,J^m|p,J)}.
\end{equation}
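Equation \eqref{eq:vfpm} is a normalized density-weighted average over the grid points, which can be sketched in a few lines of Python; the weights below stand in for the transition densities $f(p^m,J^m|p,J)$, and the numbers are made up:

```python
def interpolate_ev(ev_grid, weights):
    """Normalized density-weighted average of the value function over the
    M grid points (a Nadaraya-Watson style interpolation)."""
    num = sum(ev * w for ev, w in zip(ev_grid, weights))
    return num / sum(weights)

# Three grid points; the weights stand in for the estimated transition
# densities f(p^m, J^m | p, J) evaluated at the off-grid point (made up).
ev_grid = [10.0, 12.0, 20.0]
weights = [0.5, 0.3, 0.2]
ev = interpolate_ev(ev_grid, weights)   # = 0.5*10 + 0.3*12 + 0.2*20 = 12.6
```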
The impact of this price cut will depend on whether consumers learn,
form habits, or are both learning and forming habits at the same
time. If there is only learning, then the price cut will draw in new
consumers, some of whom will like the product and continue to
purchase it in the future. If consumers learn and form habits at the
same time, the impact of the price cut could be reduced or
increased. It could be reduced since consumers who form habits will
realize that if they dislike the new product, they will lose future
utility from having to switch brands. It could be increased, though,
because consumers who experiment with the product and dislike it
will become habituated to it, and will be less likely to switch
away. Because the model I have estimated is structural, I can take
it away from the data, compute the intermediate run market share for
the new products after a price cut when there is learning only
versus learning and switching costs, and see which effect dominates.
Similarly, we can compare the effect of a price cut on a new
product's intermediate run market share when there is habit
formation only, versus switching costs and learning. The effect of
the price cut could also either increase or decrease when
consumers both learn and form habits as opposed to only forming
habits. It could decrease because when there is learning, some of
the consumers who purchase the new product will dislike it and
switch away from it. On the other hand, it could increase because
with only switching costs, consumers who know they dislike the new
product won't respond to the price cut. If there is learning and
switching costs then the price cut will draw in some consumers who
will find they dislike the new product. If the switching costs are
strong enough, some of these consumers will keep purchasing the
product even though they don't like it. From a managerial
perspective, it is useful to know which effect dominates. If having
both learning and switching costs decreases intermediate run market
share vis-a-vis switching costs only, then this suggests firms may
wish to combine their price cuts with introductory advertising or
free samples to make them more effective.
The results of my structural estimation can be used to compute
interesting counterfactuals, such as the effect of changing the
price process for a product on its long run market share. Table
\ref{table:cf1} shows the change in market share from cutting the
introductory price process for each of the three new products on the
product's intermediate run share. These simulations are performed at
the data, so that features, displays, and other product prices are
kept at their original values. To construct an alternate introductory
price process for each
new product, I first find the price/availability vector for the new
product where its price is lowest. I then replace the observed
price/availability vectors for the first three months after the new
product's introduction with this vector. I note that I keep consumer
expectations constant: they have the same expectations I estimated
from the data. While this may seem unrealistic, since we might expect
consumers to change their expectations if the price of Cheer stays
the same for 3 months, on average a consumer will only make 1 or 2
purchases during that period, so she may not observe the price often
enough for this to be a problem.
At the estimated parameters, the price drop has no effect on the
intermediate market share for Cheer, but increases it for Surf and
Dash. For both Cheer and Surf, when there is no state dependence the
effect of the price cut is greater. This is due to the fact that
when there are switching costs, consumers are less likely to switch
brands, since they realize that they could lose future utility from
having to switch back. For Cheer, the price cut also has a greater
effect on its intermediate run market share when there is only state
dependence, as opposed to state dependence and learning. As I
discussed in the introduction, a reason that the price cut is more
effective under switching costs only than under both learning and
switching costs is that some consumers purchase the product and
switch away from it in the latter case. This suggests that firms may
want to combine their price cuts with advertising or free samples in
order to remove the learning. The result is reversed for Surf: the
price cut has more effect when there is both learning and habit
formation than otherwise. An explanation for this can be seen by
recalling that in section 3, I argued that a price cut will have
more effect when there is both learning and switching costs when
consumers expect to like a product. The population mean of the
consumer's predicted taste for Surf is the fourth highest of all the
liquids, higher than Cheer or Dash, which means consumers are more
likely to expect to like it.
The model is estimated using maximum likelihood techniques, which
requires the repeated solution of each individual's dynamic
programming problem at the model's state space points. Because the
state space in this model is continuous, an interpolation method due
to Keane and Wolpin (1994) is used to approximate the value
function. This method of estimating the learning model is extremely
computationally demanding, so the paper makes restrictive
assumptions about the underlying behavioral model. For example, the
paper assumes that individual price coefficients are the same across
the population, and that the distribution of prices does not change
over time. The effects of such assumptions may not be innocuous. For
example, suppose for the sake of argument that the true data
generating process in this market is random coefficients logit, with
no consumer learning. If prices and other exogenous variables are
constant over time, we would expect the model to estimate large
prior variances and low signal noise variances. In this data, there
are three new product introductions, and prices for new products are
initially low and then rise over time. Because under the strawman
data generating process consumers have differing price
sensitivities, we will observe more brand switching right
after the new product introductions because price-sensitive
consumers will be purchasing the new products early in their price
cycle, and then switching away as the prices rise. Hence, the
structural model applied to this data would infer that there was
learning, even though there is none in the underlying data
generating process. Another drawback to this computational method is
that it limits the size of the state space. This makes it difficult
to account for alternative sources of state dependence in demand.
***Old Introduction***
An experience good is a product that must be consumed before an
individual learns how much she likes it. This makes the action of
purchasing the product a dynamic decision, since the consumer's
decision to experiment with a new product is an investment that will
pay off if the consumer likes the product and purchases it again in
the future. Consumer learning in experience goods markets has been
an important subject of theoretical research in industrial
organization and marketing since the 1970s. This research has
examined some of the implications of consumer learning and
experimentation from the perspective of welfare and business
strategy. A small empirical literature subsequently developed which
attempts to quantify the importance of consumer learning and
experimentation using structural econometric methods applied to
panel data. Some examples are Erdem and Keane (1996), which analyzes
learning in household panel data on liquid laundry detergent
purchases, Crawford and Shum (2000), which analyzes the importance
of learning in ulcer medications, and Israel (2005), which examines
the importance of learning in consumer departures from an automobile
insurance firm.
In this paper I intend to make three contributions to this empirical
literature. First, there are a number of open questions concerning
when and how consumer experimentation and learning may be
distinguished from alternative dynamic theories of demand, such as
forward-looking switching costs or consumer taste for variety, which
may have different implications for business strategy and policy.
Learning and experimentation is one way of interpreting state
dependence in demand. By state dependence, I mean the effect of a
consumer's previous product purchases on her current product
purchase. To see how learning causes state dependence, consider the
following story. Suppose we observe a market for some packaged good
with 2 products, an established product which we know everyone knows
how much they like, and a new product which is introduced sometime
during the sampling period. Suppose further that we observe the
purchases of individual consumers over time. If there is learning,
we would expect to see whether or not a consumer likes or dislikes a
new product after her first purchase of it: if she likes it, she
will continue to purchase it in the future. If she does not, then
she will stop purchasing after her first purchase. Furthermore, we
would expect to see consumers purchase the new product fairly soon
after it is introduced, since as described in the previous paragraph
there is value to learning about the new product. A problem for
marketing researchers who are interested in learning is that this
may not be the only explanation for state dependence in demand. For
example, some consumers might seek variety in the product category
being examined. These consumers might also purchase the new product
very soon after its introduction, and then would switch away from
the product when they get tired of it. To the marketing researcher,
it might look like these consumers experimented with the new product
and disliked it. On the other hand some consumers might be
habit-formers who tend to purchase the same product as their
previous purchase. Habit-formers who purchase the new product would
continue to purchase it in the future. To the marketing researcher
it might look like these consumers all expected to like the product,
so she would infer that there was no learning.
A related problem for the marketing researcher is that consumers
could be heterogeneous in different ways, and not accounting for
this heterogeneity could cause the researcher to infer that learning
is important when there are no dynamics in demand. For example,
suppose that there is a group of consumers in our small market who
don't care about which product they buy, but are sensitive to
prices. Suppose further that the new product is introduced at a low
price initially, and this price is raised after a short period of
time. These consumers will purchase the new product initially when
it is cheap, and will switch away when it gets more expensive. To
the marketing researcher, it may look like they experimented with
the new product and did not like it.
One way to disentangle learning from alternative sources of dynamics in
demand and other sources of heterogeneity would be to estimate a
model of learning that nests these alternatives and controls for
consumer heterogeneity. In this paper I will estimate such a model
on household panel data on laundry detergent purchases, where I
observe three new product introductions during the sample period.
The results suggest the presence of learning, as well as
forward-looking switching costs. I examine the economic significance
of learning by simulating the market share for the new products when
consumers have perfect information about their tastes for all
products. In the absence of learning the market share for all the
new products rises. I also allow learning to vary with household
income and size. The results of my estimation suggest that there is
more learning among smaller and higher income households.
One reason that we care about whether consumers learn, form habits,
or even learn and form habits at the same time is that it will
affect firm pricing policy for new products. Returning to my 2
product example, assume that the firm that has introduced the new
product is a competitor to the firm that produces the established
product. Suppose that the new product has been available for a
period of time, and that its market share has been high at some
point in the past. Consider the effect of a temporary price drop on
the established product: the new entrant may be worried that this
price drop could have long run effects. If consumers form habits
only, then the price drop will draw in habituated consumers, and the
entrant may wish to respond with a subsequent price drop in order to
draw them back. On the other hand, if consumers are only learners
then the price drop will draw in consumers who have similar tastes
for the two products, and when the price of the established product
rises the consumers who switched brands will go back to purchasing
the new product. In this case the entrant may not wish to drop the
new product's price.
A case where this may be more important is when there is
introductory pricing for new products. Suppose that the entrant is
thinking about dropping the price of the new product temporarily on
introduction in order to increase its intermediate run market share.
The impact of this price drop will depend on whether the market is
populated with consumers who learn, form habits, or learn and form
habits at the same time. If consumers learn only, then the effect of
the price drop will be to draw in new consumers, some of whom will
like the product and keep purchasing it in the future. On the other
hand, if consumers learn and form habits at the same time, the price
drop may have less impact: if consumers form habits, they will
dislike switching brands. Consumers will realize that if they
do not like the new product, they will lose future utility from
switching back to the established product. The impact of the price
drop may also be greater if consumers form habits only as opposed to
forming habits and learning. The reason for this is that if
consumers only form habits, almost everyone who is drawn in by the
switching costs will repurchase the new product; if there is also
learning, some consumers will dislike the product and switch away
from it. If this is the case, then the entrant may wish to combine
its price drop with introductory advertising or free samples in
order to get rid of the learning and make the price cut more
effective.
This story about price cuts brings me to my second objective, which
is to quantify the effect of changing the introductory price process
on the market share of a new product under learning, habit
formation, and learning and switching costs. In order to accomplish
this, the demand model I estimate is structural: the parameters of
the model are policy invariant. This means that I can take the model
away from the data and perform ``what-if'' experiments. For example, I
can cut the price of the new product and simulate its long run
market share under this alternative price process. I can take the
learning out of the model and perform this exercise, or take away
the switching costs.
As I discussed above, performing these two exercises requires
estimating a structural model of demand where consumers are
forward-looking. Since consumers are forward-looking in my model,
they will take into account any effects of learning, switching costs
or variety-seeking on their future utility. This means that they
will solve a discrete choice dynamic programming problem every time
a purchase is made. Previous papers on structural estimation of
discrete choice dynamic programming problems have used the maximum
likelihood estimator, which is computationally difficult. The reason
for this is that every time a parameter vector is changed, such as
when a derivative is taken, the discrete choice dynamic programming
problem must be re-solved. In the presence of unobserved
heterogeneity, this technique becomes even more computationally
demanding due to the fact that the unobserved heterogeneity must be
integrated out by simulation. Because of this issue, researchers
have had to be very parsimonious in their specification of
unobserved heterogeneity.
The third contribution of my paper is in how I overcome this
problem. I estimate my model using Markov Chain Monte Carlo, which
is better suited to dealing with high-dimensional unobserved
heterogeneity than classical techniques. I solve the discrete choice
dynamic programming problem by applying a new technique by Imai,
Jain, and Ching (2005) that only requires solving the discrete
choice dynamic programming problem once, along with the estimation
of the model parameters. The basic idea behind this algorithm is
that in every loop of the Markov Chain, the value function is
updated once, so by the time the algorithm is finished, an accurate
estimate of the value function has been obtained.
To begin the argument, recall that consumers' first purchase events
after the new product introduction occur at different times: some
consumers will need to buy detergent right after the new product is
introduced, while some will not need to until later. Given that I
have chosen a sample of households who do not appear to enter the
detergent market in response to store price drops, I believe it is
not unreasonable to assume that household purchase timing is
exogenous. Also, assume that the price of the new product is
initially low, and that it rises over time (as the prices of new
products in my data set do). Recall that in my state space, there is
a variable $n_t$ which keeps track of how many purchases the
consumer has left until the price path of the new product rises to its
long run state, and that $n_t$ can be 1 or 0. Some consumers will
enter the market when $n_t$ is 1 and prices will be low for some
time, and some when $n_t$ is zero and prices will be high in the
future. The fact that $n_t$ varies across consumers means that in
the data we will observe two share difference moments: one for
$n_t=0$ and one for $n_t=1$. This will give us the two moments
needed to identify the two parameters, which are the mean of
$\sigma_{ij}^2$ and its variance.
In previous research (Osborne (2005)) I solved a simple version of
my structural model with heterogeneity in $\sigma_{ij}^2$ and
simulated the share difference for low and high price paths. I
observed the following numerical result: when the population
variance of $\sigma_{ij}^2$ increases, the share difference at the
low point in the price path ($n_t=1$) changes very little, but the
share difference at the high point in the price path ($n_t=0$)
increases significantly. In particular, when the variance of
$\sigma_{ij}^2$ increases, the share of consumers who purchase the
new product and then do not falls by a small amount at both low and
high points of the price path, while the share of consumers who do
not purchase the new product and then do falls off much more sharply
at high price paths than at low ones. This provides the key to
identifying the mean and variance of $\sigma_{ij}^2$: first, the
share difference at $n_t=1$ pins down the mean, since this moment
barely moves as the variance changes. Second, the amount by which
the share difference at $n_t=0$ exceeds the value it would take if
the variance of $\sigma_{ij}^2$ were zero pins down the variance.
The intuition behind this result is as follows. When we increase the
variance in $\sigma_{ij}^2$, we are going to be assigning consumers
values of $\sigma_{ij}^2$ from the more extreme ends of the
distribution. Consider the set of consumers who purchase the new
product and then do not. These will be consumers with a high option
value of learning who find they dislike the product. Raising the
variance in $\sigma_{ij}^2$ means these people will be getting
$\sigma_{ij}^2$'s from a higher end of the distribution, raising
their option value of learning. This will make them less sensitive
to price changes in the first period. Raising the price path will
generally tend to lower their option value of learning, but it
apparently does not have a large effect. Since the $\sigma_{ij}^2$'s
are higher for these consumers, the ones who don't like the product
will like it even less, making them less price sensitive in the
second purchase event.
Now consider the share of consumers who do not purchase the new
product and then do. These will tend to be consumers who have a low
option value of learning, but who receive a high epsilon draw in the
second purchase event. Raising the variance in $\sigma_{ij}^2$
assigns these consumers $\sigma_{ij}^2$'s from the lower end of the
distribution, which lowers their option value of learning. Since the
option value of learning is bounded below (it is lowest when
$\sigma_{ij}^2=0$), raising the price path has a greater effect on
the option value for these consumers than for those with very high
$\sigma_{ij}^2$'s, making them less likely to purchase the new
product right away. So the only thing that can induce these
consumers to purchase in the second purchase event is the error
term, and as the price path rises, that becomes less and less
likely.
\begin{enumerate}
\item Loop through all purchase events and draw new unobserved
coupons. There are three underlying variables here: $z_{it}$, which
is 1 if household $i$ gets a coupon in purchase $t$, and 0
otherwise. The second latent variable is $c_{it}$, a $J-1$ vector
whose element $j$ is 1 if consumer $i$ receives a coupon for product
$j$ at time $t$, and 0 otherwise. If no coupons are used by any
consumer in a given calendar month for product $j$, I assume that no
coupon was available for brand $j$ in that period. Last, $v_{it}$ is
a $J-1$ vector representing the values of the coupons received in
period $t$. I do not parametrize the distribution of coupon values;
instead, I calculate the empirical probability of receiving a coupon
of a particular value for a particular brand and draw $v_{it}$ from
that empirical distribution. In the data, coupon values cluster
around 8 distinct values. I make the following distributional
assumptions:
\begin{equation}\label{eq:zcoup}
\begin{split}
& z_{it} \sim Bernoulli(p_z) \\
& \mbox{Prior on } p_z \mbox{ : } Beta(\alpha_z,\beta_z) \\
& \mbox{Posterior on } z_{it} \mbox{ : }
K(z_{it}=1|c_{it},v_{it},y_{it},\theta_i) \\
&=
\frac{Pr(y_{it}|z_{it}=1,c_{it},v_{it},\theta_{i})p_z}{Pr(y_{it}|z_{it}=1,c_{it},v_{it},\theta_{i})p_z
+Pr(y_{it}|z_{it}=0,c_{it},v_{it},\theta_{i})(1-p_z)} \\
\end{split}
\end{equation}
\begin{equation}\label{eq:ccoup}
\begin{split}
& c_{it}(j) \sim Bernoulli(p_{cj}) \\
& \mbox{Prior on } p_{cj} \mbox{ : } Beta(\alpha_j,\beta_j) \\
& \mbox{Posterior on } c_{it}(j) \mbox{ : }
K(c_{it}(j)=1|z_{it},c_{it}(-j),v_{it},y_{it},\theta_{i}) \\
\end{split}
\end{equation}
\begin{displaymath}
= \left\{
\begin{array}{l l}
\frac{Pr(y_{it}|z_{it},c_{it}(j)=1,c_{it}(-j),v_{it},\theta_{i})p_{cj}}{Pr(y_{it}|z_{it},c_{it}(j)=1,c_{it}(-j),v_{it},\theta_{i})p_{cj}
+Pr(y_{it}|z_{it},c_{it}(j)=0,c_{it}(-j),v_{it},\theta_{i})(1-p_{cj})} & \mbox{if } z_{it}=1 \mbox{ or used a coupon} \\
p_{cj} & \mbox{ if }
z_{it}=0 \mbox{ and no coupon used} \\
\end{array}\right.
\end{displaymath}
\item The posterior for $p_z$ is Beta, with parameters $\alpha_z + \sum_{i=1}^I
\sum_{t=1}^{T_i}( z_{it} + 1\{$ Used a coupon $\})$ and $\beta_z +
\sum_{i=1}^I T_i - \sum_{i=1}^I \sum_{t=1}^{T_i}( z_{it} + 1\{$ Used
a coupon $\})$. The posteriors for $p_{cj}$ are similar.
\end{enumerate}
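The coupon-availability draw in step 1 and the conjugate Beta update in step 2 can be sketched as follows in Python (all variable names, likelihood values, and the toy data are hypothetical illustrations, not the paper's code):

```python
import random

# Hypothetical sketch of the coupon augmentation step: draw the latent
# availability indicator z_it from its Bernoulli posterior, then update
# p_z from its conjugate Beta posterior given the imputed indicators.

def draw_z(p_z, lik_if_coupon, lik_if_none, used_coupon, rng):
    """Posterior draw of z_it. If a coupon was actually used, availability
    is certain (an assumed shortcut); otherwise weight the prior p_z by
    the choice likelihood under each value of z_it."""
    if used_coupon:
        return 1
    num = lik_if_coupon * p_z
    den = num + lik_if_none * (1.0 - p_z)
    return 1 if rng.random() < num / den else 0

def update_p_z(z_draws, alpha_z, beta_z, rng):
    """Conjugate update: Beta(alpha_z + #ones, beta_z + #zeros)."""
    ones = sum(z_draws)
    return rng.betavariate(alpha_z + ones, beta_z + len(z_draws) - ones)

rng = random.Random(1)
# Toy purchase events: (likelihood if coupon available, if not, coupon used?)
events = [(0.8, 0.2, False), (0.5, 0.5, True), (0.1, 0.6, False)]
p_z = 0.3
z = [draw_z(p_z, a, b, u, rng) for a, b, u in events]
p_z = update_p_z(z, alpha_z=1.0, beta_z=1.0, rng=rng)
```

The draws for $c_{it}(j)$ and $p_{cj}$ would follow the same Bernoulli/Beta pattern, event by event and brand by brand.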
\begin{thebibliography}{widest-label}
\bibitem{Ackerberg01} Ackerberg, D. (2001), "A New Use of Importance Sampling to Reduce
Computational Burden in Simulation Estimation," Working Paper.
\bibitem{Ackerberg03} Ackerberg, D. (2003), "Advertising, Learning,
and Consumer Choice in Experience Goods Markets: A Structural
Empirical Examination", {\it International Economic Review,} 44 (3),
1007-1040.
\bibitem{becker88} Becker, G., Murphy, K. (1988), "A Theory of Rational
Addiction," {\it The Journal of Political Economy}, 96 (4), 675-700.
\bibitem{becker94} Becker, G., Grossman, M., Murphy, K. (1994),
"An Empirical Analysis of Cigarette Addiction," {\it The American
Economic Review}, 84 (3), 396-418.
\bibitem{bv96} Bergemann, D., Valimaki, J. (1997), "Market Diffusion with Two-Sided Learning,"
{\it The RAND Journal of Economics,} 28 (4), 773-795.
\bibitem{casellageorge92} Casella, G., George, E. (1992) "Explaining
the Gibbs Sampler," {\it The American Statistician,} 46 (3),
167-174.
\bibitem{chamberlain85} Chamberlain, G. (1985), "Heterogeneity,
Omitted Variable Bias, and Duration Dependence," in {\it
Longitudinal Analysis of Labor Market Data,} ed. J.J. Heckman and B.
Singer, no. 10 in Econometric Society Monograph series, Cambridge,
New York and Sidney: Cambridge University Press, 3-38.
\bibitem{chesudhirseeth05} Che, H., Sudhir, K., Seetharaman, P. (2005) "Pricing
Behavior in Markets with State Dependence in Demand," Working Paper.
\bibitem{chibgreenberg95} Chib, S., Greenberg, E. (1995),
"Understanding the Metropolis-Hastings Algorithm," {\it The American
Statistician,} 49(4), 327-335.
\bibitem{ching02} Ching, A. (2002), "Consumer Learning and
Heterogeneity: Dynamics of Demand for Prescription Drugs After
Patent Expiration," Working Paper.
\bibitem{chpkyr} Chintagunta, P., Kyriazidou, E., Perktold, J. (1999),
"Panel Data Analysis of Household Brand Choice," Working Paper.
\bibitem{cs00} Crawford, G., Shum, M. (2000), "Uncertainty and
Learning in Pharmaceutical Demand," Working Paper.
\bibitem{cyertdegroot} Cyert, R., DeGroot, M. (1987), {\it Bayesian Analysis and
Uncertainty in Economic Theory. } Rowman \& Littlefield.
\bibitem{degroot70} DeGroot, M. (1970), {\it Optimal Statistical
Decisions. } McGraw-Hill, Inc.
\bibitem{erdkeane96} Erdem, T., Keane, M. (1996), "Decision-making Under
Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent
Consumer Goods Markets," {\it Marketing Science,} 15 (1), 1-20.
\bibitem{erdkeanesun99} Erdem, T., Keane, M., Sun, B. (1999),
"Missing price and coupon availability data in scanner panels:
Correcting for the self-selection bias in choice model parameters.",
{\it Journal of Econometrics,} 89, 177-196.
\bibitem{erdimaikeane02} Erdem, T., Imai, S., Keane, M. (2002), "A
Model of Consumer Brand and Quantity Choice Dynamics Under
Uncertainty.", Working Paper.
\bibitem{thisse92} Gabszewicz, J., Pepall, L., and Thisse, J. (1992),
"Sequential Entry with Brand Loyalty Caused by Consumer
Learning-by-Using," {\it The Journal of Industrial Economics,} 12
(4), 397-416.
\bibitem{gelmanrubin92} Gelman, A., Rubin, D. (1992), "Inference
from Iterative Simulation Using Multiple Sequences," {\it
Statistical Science,} 7, 457-472.
\bibitem{gonulsrinivasan96} Gonul, F., Srinivasan, K., (1996),
"Estimating the Impact of Consumer Expectations of Coupons on
Purchase Behavior: A Dynamic Structural Model," {\it Marketing
Science,} 15 (3), 262-279.
\bibitem{hartmann05} Hartmann, W. (2005),
"Intertemporal Effects of Consumption and Their Implications for
Demand Elasticity Estimates," Working Paper.
\bibitem{imjaching05} Imai, S., Jain, N., Ching, A. (2005),
"Bayesian Estimation of Dynamic Discrete Choice Models", Working
Paper.
\bibitem{israel05} Israel, M. (Feb. 2005), "Services as Experience Goods:
An Empirical Examination of Consumer Learning in Automobile
Insurance," Working Paper.
\bibitem{johnsonkotz70} Johnson, N., Kotz, S. (1970), {\it
Continuous Multivariate Distributions I,} John Wiley, New York.
\bibitem{mcpess82} McAlister, L., Pessemier, E., (1982),
"Variety-Seeking Behavior: An Interdisciplinary Review," {\it The
Journal of Consumer Research,} 9 (3), 311-322.
\bibitem{nelson70} Nelson, P. (1970), "Information and Consumer Behavior," {\it The
Journal of Political Economy,} 78 (2), 311-329.
\bibitem{osborne05} Osborne, M. (2005), "A Test of Consumer
Experimentation and Learning in Packaged Goods Markets," Unpublished
Manuscript.
\bibitem{pollack70} Pollak, R. (1970), "Habit Formation and Dynamic Demand Functions," {\it The
Journal of Political Economy,} 78 (4), 745-763.
\bibitem{rust87} Rust, J. (1987), "Optimal Replacement of GMC Bus
Engines: An Empirical Model of Harold Zurcher," {\it Econometrica,}
55, 993-1033.
\bibitem{spinnewyn81} Spinnewyn, F. (1981), "Rational Habit Formation,"
{\it European Economic Review,} 15, 91-109.
\bibitem{stiglitz89} Stiglitz, J. (1989), "Imperfect Information in the Product
Market," {\it Handbook of Industrial Organization: Volume 1},
Richard Schmalensee and Robert Willig, eds. Amsterdam:
North-Holland.
\bibitem{train2003} Train, K. (2003), {\it Discrete Choice Methods
with Simulation,} Cambridge University Press, New York.
\bibitem{vb04} Villas-Boas, M. (2004), "Dynamic Competition with Experience Goods,"
Forthcoming in {\it Journal of Economics and Management Strategy}.
\end{thebibliography}