## FREQUENTLY ASKED QUESTIONS (FAQ)

This text addresses some frequently asked questions (FAQ), divided into some broad categories. Many other questions by individual users have been answered in the online forum (access via the forum page), and readers are invited to check existing posts there before raising a new question.

### 1 General

**Q:** What is *Apollo*?

*Apollo* is a software package for the R programming language. It is a set of tools to aid with the estimation and application of choice models in R. Users are able to write their own model functions or use a mix of already available ones. Random heterogeneity, both continuous and discrete and at the level of individuals and observations, can be incorporated for all models. There is support for both standalone models and hybrid model structures. Both classical and Bayesian estimation are available, and multiple discrete continuous models are covered in addition to discrete choice. Multi-threaded processing is supported for estimation. A large number of pre- and post-estimation routines, including for computing posterior (individual-level) distributions, are available. For examples, a manual, and a support forum, visit www.ApolloChoiceModelling.com. For more information on choice models, see Train (2009), and see Hess and Daly (2014) for an overview of the field.

### 2 Installation and updating of *Apollo*

**Q:** How do I install *Apollo*?

Type `install.packages("apollo")` in the R console and press Enter.

**Q:** How do I update *Apollo*?

You update *Apollo* by re-installing it. Type `install.packages("apollo")` in the R console and press Enter.

**Q:** Why do I get an error during installation that one of the packages is not available for the R version I am running?

You are likely running an old version of R. Update R to the latest version and re-install *Apollo*. You can get the latest version of R from https://cran.r-project.org/

**Q:** Why does installation work on my home laptop/desktop computer but fail on my company laptop/desktop computer?

Computers in large organisations often install R packages in shared libraries (that is, in folders on the private company network). R does not work well with libraries in shared folders. As a general recommendation, always install packages in local libraries, i.e. in a folder on the local hard drive. You can list your active libraries by typing `.libPaths()`. If the local library is, for example, the second one in the list, you should keep only that one by typing `.libPaths(.libPaths()[2])`. Then try installing *Apollo* again.
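
As a rough illustration of the library check described above, the following sketch lists the active libraries and flags which of them are writable by the current user. The assumption that the local library is the second entry simply mirrors the example in the answer; on your machine the position may differ, so the last two calls are left commented out.

```r
# List the libraries R currently searches, in order of priority.
libs <- .libPaths()
print(libs)

# Flag which libraries the current user can write to (mode 2 = write access).
writable <- file.access(libs, mode = 2) == 0
print(data.frame(library = libs, writable = writable))

# Assumption: the local library is the second entry, as in the example above.
# Uncomment to keep only that library and then re-install Apollo.
# .libPaths(.libPaths()[2])
# install.packages("apollo")
```

A library on a network drive may still show up as writable, so the path itself is usually the more informative part of the output.
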

### 3 Data

**Q:** Why do I get the error message `Error in file(file, "rt") : cannot open the connection` when trying to open the data?

This is most often caused by the user forgetting to set the working directory, meaning that R cannot find the data file. Alternatively, there could be a typo in the name of the data file.

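
A quick way to diagnose this is to check where R is looking and whether the file is actually there. The sketch below is only an illustration; the file name `my_data.csv` is a hypothetical placeholder for your own data file.

```r
# Hypothetical file name; replace with the name of your own data file.
data_file <- "my_data.csv"

# Show where R is currently looking for files.
cat("Working directory:", getwd(), "\n")

if (file.exists(data_file)) {
  database <- read.csv(data_file, header = TRUE)
} else {
  # List the csv files R can actually see, to help spot a typo or a wrong folder.
  csv_found <- list.files(pattern = "\\.csv$")
  message("Cannot find '", data_file, "'. csv files in this folder: ",
          paste(csv_found, collapse = ", "))
}
```

If the file is in a different folder, `setwd()` can be used to point R at that folder before reading the data.
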
**Q:** Can I use “long” formatted data in *Apollo*?

No. *Apollo* requires data to be in a “wide” format, meaning that all the information needed to calculate the likelihood of a single observation should be contained in a single row of the data. In more practical terms, for an MNL model, this means that the attributes of all alternatives should be contained in each row. This is the more common format in choice modelling, uses less space, and is also more general in allowing for a mixture of different dependent variables in the same data.

**Q:** Can I use a list of dummy variables to represent the choice? For example, for a choice between three alternatives with the third chosen, can I use variables alt1=0, alt2=0, alt3=1?

No. *Apollo* requires the user to encode the choice in a single variable. In this case, it would be a variable (for example called “choice”) that could take only three values (for example 1, 2 or 3). This is easily created in R on the basis of separate dummy variables.

**Q:** Can *Apollo* estimate models with aggregate share data?

Current versions of *Apollo* require one alternative to be chosen in each row of the data, rather than using data where, in each row, each alternative has a share of the choices, with these shares summing to 1 across alternatives. Share data can be accommodated either by users coding their own model probability function inside *Apollo* or by replicating each row a number of times. For example, in a binary case with shares of 65-35, the user could replicate the row 100 times, with 65 rows choosing alternative A and 35 rows choosing alternative B.

**Q:** Can I model “dual response” survey data with *Apollo*?

Yes, this is possible. We recommend beginning by modelling both questions separately and, if there is evidence of the parameters being similar, then estimating them jointly using a scale parameter between them. See apollo_example_22 to learn how to conduct joint estimation in *Apollo*.
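
The two data preparation steps mentioned above (building a single choice variable from dummies, and replicating rows for share data) can be sketched in base R as follows. All column names and values here are hypothetical and only serve the illustration.

```r
# (1) Single choice variable from dummy variables.
# Hypothetical data: three dummy columns marking the chosen alternative.
database <- data.frame(alt1 = c(0, 1, 0),
                       alt2 = c(0, 0, 1),
                       alt3 = c(1, 0, 0))
# Exactly one dummy equals 1 per row, so this yields 1, 2 or 3.
database$choice <- with(database, 1 * alt1 + 2 * alt2 + 3 * alt3)
print(database$choice)  # 3 1 2

# (2) Replicating a share-data row, following the 65-35 example above.
shares <- data.frame(x_A = 10, x_B = 12, share_A = 0.65, share_B = 0.35)
n_rep  <- 100
# Number of replicated rows choosing each alternative (rounded to whole rows).
n_each <- round(n_rep * c(shares$share_A, shares$share_B))
stopifnot(sum(n_each) == n_rep)
expanded <- shares[rep(1, n_rep), c("x_A", "x_B")]
expanded$choice <- rep(c(1, 2), times = n_each)  # 1 = alternative A, 2 = alternative B
print(table(expanded$choice))
```

With several share rows, the same replication would be applied row by row and the results stacked.
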

### 4 Model specification

**Q:** What distributions are possible for random coefficients in *Apollo*?

There are no limits imposed on distributional assumptions. The user can specify whatever distributions they want to use. Distributions can be coded as transformations of either Normal or Uniform draws. While transformations of Normal draws can be used for Lognormal, Censored or Truncated Normal and Johnson SB distributions, Uniform draws open up an even broader scope, as an inverse cumulative distribution function can be applied to the Uniform draws for a huge set of possible distributions.

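
To make the idea of transforming draws concrete, here is a minimal base R sketch, written outside of *Apollo*. The parameter values and variable names are arbitrary assumptions; inside a model, the same transformations would be applied to the draws defined by the user.

```r
set.seed(1)
n  <- 1000
xi <- rnorm(n)   # standard Normal draws
u  <- runif(n)   # standard Uniform draws

# Transformations of Normal draws
b_neg_lognormal <- -exp(-3 + 0.5 * xi)         # negative Lognormal
b_censored      <- pmax(0, 1 + 2 * xi)         # Normal censored from below at zero
b_johnson_sb    <- 0 + (2 - 0) * plogis(xi)    # Johnson SB bounded between 0 and 2

# Transformations of Uniform draws via an inverse cumulative distribution function
b_exponential <- qexp(u, rate = 2)             # Exponential, using R's quantile function
# Symmetrical Triangular on [-1, 1], using its inverse CDF written out by hand
b_triangular  <- ifelse(u < 0.5, sqrt(2 * u) - 1, 1 - sqrt(2 * (1 - u)))

print(summary(b_triangular))
```

Any distribution with a quantile function available in R (`qgamma`, `qbeta`, `qweibull` and so on) can be obtained from Uniform draws in the same way.
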
**Q:** How can I capture the panel structure of my data in *Apollo*?

The treatment of panel data depends completely on the model being used. Whenever the data contains multiple choices per individual, the analyst needs to use the function `apollo_panelProd` to group them together in estimation, unless the setting `apollo_control$panelData=FALSE` is used, in which case the data will be treated as if all observations came from separate individuals. In models without any random heterogeneity, such as MNL, there is no explicit modelling of the correlation across choices for the same individual. All that happens by using `apollo_panelProd` is that the calculation of the robust standard errors recognises that the choices come from the same individual. In models with random heterogeneity, such as Mixed Logit, the analyst can account for the panel structure more explicitly, for example by specifying that the heterogeneity in any random taste coefficients is across individuals, not within individuals, and/or by including an explicit pseudo-panel effect error component. These issues are discussed in detail in the manual.

**Q:** How can I avoid writing each utility function separately if I have tens or hundreds of alternatives?

If the utilities all use the same structure but with different attributes for each alternative, then the utility functions (and availabilities) can be written iteratively. For example, imagine we have 100 alternatives, with attributes `x1_j` and `x2_j` for alternative `j`; then we can use:

    V = list()
    for(j in 1:100){
      # utility of alternative j, built from its attributes x1_j and x2_j
      V[[paste0("alt",j)]] = b1*get(paste0("x1_",j)) + b2*get(paste0("x2_",j))
    }

**Q:** What starting values should I use for the thresholds in my Ordered Logit/Ordered Probit model?

The thresholds need to be different from each other, and monotonically increasing. If the thresholds are too wide, extreme ratings will obtain very low or zero probabilities. If the thresholds are too narrow, extreme ratings will obtain very large probabilities. Either of these can lead to estimation failures. Some trial and error is often required, but a good starting point is to have thresholds symmetrical around zero, going to extreme values of +/-3.

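
As an illustration of the threshold advice, the sketch below builds symmetrical starting values for a hypothetical 5-point rating scale and shows the Ordered Logit probabilities they imply when the latent variable is zero. The names `tau_1` to `tau_4` and the number of levels are assumptions made for the example.

```r
# Hypothetical rating scale with 5 levels, hence 4 thresholds.
n_levels  <- 5
tau_start <- seq(-3, 3, length.out = n_levels - 1)
names(tau_start) <- paste0("tau_", seq_along(tau_start))
print(tau_start)  # -3 -1 1 3

# Thresholds must be distinct and monotonically increasing.
stopifnot(all(diff(tau_start) > 0))

# Implied Ordered Logit probabilities of each rating when the latent variable is zero.
p_start <- diff(c(0, plogis(tau_start), 1))
names(p_start) <- paste0("rating_", seq_len(n_levels))
print(round(p_start, 3))
```

Comparing these implied probabilities with the observed shares of each rating is one way to judge whether the starting thresholds are too wide or too narrow.
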
**Q:** How many draws should I use to estimate my models with random components?

There is no correct answer to this question. More draws are always better. The likelihood of the model is given by an integral without a closed form solution, and the simulation-based approach only offers an approximation to this integral. Using a low number of draws means that the approximation to this integral is poor. In simple words, it means that the model we *are* estimating is not the one we *think* we are estimating. The parameter estimates will then be biased for the model we actually specified.

**Q:** But the log-likelihood of my model is better with fewer draws, so isn’t that good?

The fact that the log-likelihood is better with fewer draws does not justify the use of fewer draws. It simply reflects the fact that fewer draws offer a poor approximation to the real model. Once the number of draws is increased to a sufficient number, the model fit will stay much more stable for further increases.

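
The quality of the approximation can be illustrated outside of *Apollo* with a toy example. The sketch below simulates a single binary Mixed Logit probability with different numbers of draws and reports how much the result varies across independent sets of draws. The parameter values and the use of plain pseudo-random draws are assumptions made for the illustration.

```r
set.seed(42)

# Hypothetical binary Mixed Logit probability: P = E[ plogis(beta * x) ],
# with beta ~ Normal(mu, sigma^2).
mu <- -1; sigma <- 1.5; x <- 2

# Simulated probability using R draws.
sim_prob <- function(R) mean(plogis((mu + sigma * rnorm(R)) * x))

# Standard deviation of the simulated probability across 200 independent sets of draws.
n_draws <- c(10, 100, 1000, 10000)
spread  <- sapply(n_draws, function(R) sd(replicate(200, sim_prob(R))))
print(data.frame(n_draws = n_draws, spread = spread))
```

The spread shrinks as the number of draws grows, which is the sense in which a low number of draws gives a poor approximation to the integral.
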
**Q:** Why does my model converge with a low number of draws, but fail with a high number of draws?

The fact that the model does not converge with a high number of draws shows that there is a problem with the model. It is known that, with a low number of draws, a model that is overspecified can still converge and give every impression of being identified (?).

**Q:** Can I at least use fewer draws if I use quasi-Monte Carlo draws?

In theory, yes, but again, more is better. Care is also required in deciding which type of draws to use. Halton draws are an excellent option for models with a low number of random parameters, but the collinearity issues with Halton draws mean they should not be used with more than, say, 5 random components.

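
The collinearity issue can be seen in a small experiment. The sketch below uses a hand-written radical-inverse function (not an *Apollo* function) to generate Halton sequences and compares the correlation between two low-prime dimensions with that between two high-prime dimensions. The choice of 50 draws and of the primes is an assumption made for the illustration.

```r
# Hand-written Halton sequence for a given prime base (radical inverse of 1..n).
halton <- function(n, base) {
  sapply(seq_len(n), function(i) {
    f <- 1
    r <- 0
    while (i > 0) {
      f <- f / base
      r <- r + f * (i %% base)
      i <- i %/% base
    }
    r
  })
}

n <- 50
# Low primes, as used for the first random components.
cor_low  <- cor(halton(n, 2), halton(n, 3))
# Higher primes, as would be needed for the 10th and 11th random components.
cor_high <- cor(halton(n, 29), halton(n, 31))

print(c(cor_low = cor_low, cor_high = cor_high))
```

The sequences built on neighbouring high primes move almost in parallel over short runs, which is why Halton draws become problematic as the number of random components grows.
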
**Q:** To keep estimation cost under control, can I use a low number of draws in my specification search and then re-estimate the final model with a large number?

This is unfortunately a rather common practice, but it does not solve the issues that arise when using low numbers of draws. Using low numbers during the specification search means that the approximation to the integral is poor at that stage. This in turn means that the decisions leading to the final model specification may themselves be biased. While the final specification is then estimated robustly with a large number of draws, it may in fact be inferior to a specification that would have been obtained by using a high number throughout the specification search.

**Q:** Does *Apollo* allow me to separate scale heterogeneity from preference heterogeneity?

The notion that it is possible to separate out scale heterogeneity from other heterogeneity is a myth, as discussed at length by ?. Many models can allow for scale heterogeneity, but no model can separate it from preference heterogeneity, and there is no need to do so.

**Q:** So can *Apollo* estimate the GMNL model?

The GMNL model is in fact not a new model or a more general model. It is simply a Mixed Logit model with a very particular set of constraints applied to it. It is not more general than Mixed Logit, which is the most general RUM model (cf. ?). Given that *Apollo* allows full flexibility, users can of course specify the heterogeneity in a Mixed Logit model using the GMNL style constraints, but should be mindful when it comes to interpretation of the results given the above points.

**Q:** So how about scale adjusted Latent Class (SALC)?

A SALC model is affected by the same issues as discussed by ? for GMNL. It is not possible to separate out scale heterogeneity from other heterogeneity. Users of *Apollo* can produce a SALC specification, which is simply a two layer Latent Class model, by using $S_1 \times S_2$ classes, allowing for $S_1$ sets of $\beta$ parameters and $S_2$ sets of $\mu$ (scale) parameters, with an appropriate normalisation, and with the $S_1 \times S_2$ classes using all combinations of $\beta$ and $\mu$. This model will be more general than a Latent Class model with $S_1$ classes with different $\beta$, but less general than a model with $S_1 \times S_2$ classes with different $\beta$.

### 5 Errors and failures during estimation

**Q:** Why does *Apollo* complain that some function arguments are missing or incorrect?

There are different reasons for this, but the most likely cause is that the analyst has used the wrong order of arguments. For the predefined functions, the arguments passed to the function should be kept in the order specified for the function. For example, if a function is defined to take two inputs, namely `dependent` and `explanatory`, e.g. `model_prob(dependent,explanatory)`, and the user wants to use `choice` and `utility` as the inputs, then the function can be called as `model_prob(choice,utility)` but not as `model_prob(utility,choice)`. The latter change in order is only possible if the arguments are named explicitly, as in `model_prob(explanatory=utility,dependent=choice)`, which is the same as `model_prob(dependent=choice,explanatory=utility)`.

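
The difference between positional and named arguments can be checked with a toy version of the function used in the example. Note that `model_prob` is the hypothetical function from the text above, not an *Apollo* function, and its body here is invented purely to show which input ends up where.

```r
# Hypothetical function mirroring the model_prob(dependent, explanatory) example.
model_prob <- function(dependent, explanatory) {
  paste("dependent =", dependent, "| explanatory =", explanatory)
}

# Stand-ins for the user's inputs.
choice  <- "choice"
utility <- "utility"

positional_ok  <- model_prob(choice, utility)                          # correct order
positional_bad <- model_prob(utility, choice)                          # inputs silently swapped
named_any      <- model_prob(explanatory = utility, dependent = choice) # order irrelevant when named

print(positional_ok)
print(positional_bad)
print(named_any)
```

Naming the arguments explicitly is the safer habit, as a swapped positional call does not always trigger an error straight away.
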
**Q:** Why is my estimation failing with the message “Log-likelihood calculation fails at starting values”?

This happens when, at the starting values, the likelihood of the model is zero or cannot be calculated for at least some people in the data. *Apollo* will report the IDs of these individuals. Three common reasons exist for this problem:

**Q:** Are the starting values feasible/appropriate for the model?

The most common reason for the initial likelihood calculation to fail is a problem with the values used in `apollo_beta`. The starting values of some parameters may be invalid. For example, the model may be dividing by a parameter with an initial value equal to zero. Different models also have different requirements: the structural parameters in Nested and Cross-nested Logit models should be different from zero; the alpha parameters in Cross-nested Logit models should be between zero and one; the gamma and sigma parameters in MDCEV should be greater than zero; the thresholds in Ordered Logit and Probit models should be different and monotonically increasing; the variance of linear regressions should be positive; etc. Another potential cause (less common than the previous one) is that the initial values are too poor, leading to an initial likelihood too close to zero, which in turn leads to an infinitely negative log-likelihood. To avoid this, the user should look for better starting values, either by estimating a simpler model and using those estimates as starting values, or by using the `apollo_searchStart` function.

**Q:** Does the data contain many observations for each person and/or does the model use several components (i.e. hybrid choice)?

When multiplying together the probabilities of many individual observations at the person level (using `apollo_panelProd`), it is possible that the product becomes too close to zero for R to store it as a non-zero number. The same can happen when combining many individual model components using `apollo_combineModels`. The risk of this is greater in models with low probabilities, such as those with large choice sets. A solution to this problem is to set `workInLogs=TRUE` in `apollo_control`. This ensures that all calculations are made with the logarithms of probabilities, avoiding the issue of multiplying many small numbers. The use of this setting is however only recommended when necessary, as it will slow down estimation.

**Q:** Does the model use lognormal distributions for random coefficients?

The value of a lognormally distributed coefficient is given by `beta=exp(mu+sigma*xi)`, or `beta=-exp(mu+sigma*xi)` in the case of a negative lognormal distribution. The estimated parameters thus relate to the mean and standard deviation of the logarithm of beta (or the logarithm of -beta). A common mistake is to start mu at zero, just as in the case of a normally distributed beta. With the exponential, this will lead to a large starting value for beta, which can result in numerical problems. In the case of lognormally distributed coefficients, it is thus advisable to use a large negative value for the starting value of the mean of the logarithm of the coefficient, e.g. starting mu at something like -3 or lower, as this would imply starting the median of beta close to zero.
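
Two of the causes above are easy to reproduce in base R. The sketch below first shows how a product of many small probabilities underflows to zero while the sum of their logarithms stays finite, and then compares the median of a lognormal coefficient under starting values of 0 and -3 for mu. The number of observations, the probability of 0.001 and the starting value of sigma are hypothetical.

```r
# (1) Underflow when multiplying many small probabilities for one person.
p_obs      <- rep(0.001, 120)   # hypothetical: 120 observations, each with probability 0.001
L_person   <- prod(p_obs)       # 1e-360 cannot be stored as a double, so this is 0
logL_naive <- log(L_person)     # -Inf: the log-likelihood calculation fails
logL_logs  <- sum(log(p_obs))   # finite: working with logs avoids the problem
print(c(L_person = L_person, logL_naive = logL_naive, logL_logs = logL_logs))

# (2) Starting values for a lognormal coefficient beta = exp(mu + sigma * xi).
set.seed(123)
xi          <- rnorm(10000)
sigma_start <- 0.5              # hypothetical starting value for sigma
beta_mu_0   <- exp( 0 + sigma_start * xi)   # mu started at zero
beta_mu_m3  <- exp(-3 + sigma_start * xi)   # mu started at -3
print(c(median_mu_0 = median(beta_mu_0), median_mu_minus3 = median(beta_mu_m3)))
```

The first part mirrors what `workInLogs=TRUE` is described as doing; the second shows that the median of beta is governed by exp(mu), so a start of zero places it near one rather than near zero.
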

**Q:** Why do I get an error saying that one of my parameters does not influence the likelihood, even though I am using it in a utility function?

There may be several reasons for this. A common mistake arises when writing utilities (or any other code statement) across multiple lines: the link between the lines is missing and only the first line is considered. To split a statement across multiple lines, the incomplete lines should finish with an operator. For example:

    U[["A"]] = b0 + b1*x1
               + b2*x2

will ignore the effect of `b2*x2`. Instead, it should be:

    U[["A"]] = b0 + b1*x1 +
               b2*x2
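
The behaviour of the two versions can be verified directly in R. The values assigned to the parameters and attributes below are arbitrary and only serve to make the difference visible.

```r
# Arbitrary illustrative values.
b0 <- 1; b1 <- 2; b2 <- 3
x1 <- 10; x2 <- 100
U <- list()

# Statement ends after the first line; the second line is evaluated separately and discarded.
U[["A"]] = b0 + b1*x1
           + b2*x2
wrong <- U[["A"]]

# Trailing operator tells R that the statement continues on the next line.
U[["A"]] = b0 + b1*x1 +
           b2*x2
right <- U[["A"]]

print(c(wrong = wrong, right = right))  # wrong = 21, right = 321
```

R raises no error in the first case because `+ b2*x2` is a valid expression on its own.
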

It could also be that the attribute associated with the parameter does not vary across the utility functions, or that the same constant is included in all utility functions. Much less commonly, it could be that the starting probabilities in your model are so small that, due to rounding errors, they are equal to zero, and changes in parameter values also lead to small probabilities not different from zero. This is more likely to happen in complex models with many observations per individual. In this case, we recommend (1) beginning by estimating a simple model (e.g. constants only) to obtain better starting values, and (2) setting the option `apollo_control$workInLogs=TRUE`. This last option will increase numerical precision at the expense of estimation speed.

**Q:** Estimation of my model failed after a long time. Have I lost all the information?

If estimation was run using BFGS, *Apollo* will produce a *csv* file with the parameter values at each iteration in the working directory, using the name given to the model inside `apollo_control$modelName`.

**Q:** Why does my estimation fail, saying the maximum number of iterations has been reached?

The default number of iterations for estimation is set to 200. This can be increased in `estimate_settings$maxIterations`. In general however, if a model has not converged after 200 iterations, this could be a sign of problems with the model. Inspecting the iterations file produced during estimation can help diagnose whether there is a problem or whether more iterations are required.
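
A minimal sketch of the two steps mentioned above follows. The model name `my_model` and the limit of 500 iterations are hypothetical, and the call to `apollo_estimate` is left commented out because it requires a fully specified model to be loaded. The file search deliberately matches any csv file containing the model name, as the exact name of the iterations file is not assumed here.

```r
# Raise the iteration limit from the default of 200 (500 is an arbitrary illustrative value).
estimate_settings <- list(maxIterations = 500)

# Hypothetical model name, standing in for apollo_control$modelName.
model_name <- "my_model"

# Look for csv files in the working directory whose name contains the model name.
iteration_files <- list.files(pattern = paste0(model_name, ".*\\.csv$"))
print(iteration_files)

# The settings would then be passed on to estimation (requires the Apollo objects to exist):
# model <- apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities,
#                          apollo_inputs, estimate_settings = estimate_settings)
```

Once the file is located, `read.csv` can be used to plot the parameter values against the iteration number and judge whether they were still moving when estimation stopped.
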

### 6 Model results

**Q:** Why am I getting `Inf` or `NaN` for standard errors?

There are several main reasons why this could happen:

**Q:** The model could have theoretical identification issues

Diagnosing these issues is not easy, as the requirements change depending on the particular structure of the model. For example, in random utility models, only differences in utility matter, so the constant of at least one alternative must be fixed to zero. Similarly, a normalisation is required for categorical variables. In hybrid choice models, the variance of each structural equation (or the slope of one measurement equation per latent variable) should be fixed to one.

**Q:** The model could be too complex for the data, leading to empirical identification issues

Many users fall into the trap of believing that choice models are easy tools and that the most complex model should always be used. Instead, analysts should always begin by estimating the simplest possible model and move progressively towards more complex formulations. This will help in troubleshooting any potential identification problems. In Mixed Logit for example, analysts should always start by introducing only a few random coefficients, leaving the rest as fixed, and progressively make the model more general.

**Q:** Differences in scale between parameters can complicate the calculation of the standard errors

Calculating standard errors requires inverting the Hessian matrix at the estimated value of the parameters. This Hessian is based on numerical derivatives, looking at small changes to either side of the estimates. If the parameters have different scales (e.g. some are close to 0.1 while others are closer to 100), this could lead to problems with this process. Similarly, inverting the Hessian itself can be challenging due to numerical precision issues in this case. This can be diagnosed by looking at `model$hessian`. If it has very big and very small values, inverting it could be problematic. In these cases, using the `estimate_settings$scaling` option can help. This is an optional setting that can be given to the `apollo_estimate` function to scale parameters and therefore avoid numerical issues.

**Q:** The calculation of the numerical derivatives could lead to some zero probabilities

Especially with complex models, the calculation of the numerical derivatives can be affected by a small number of calculations leading to zero probabilities. Greater stability can in this case be obtained by using bootstrapping for estimating the standard errors (i.e. setting `estimate_settings$bootstrapSE=TRUE`), obviously at the cost of increased estimation time.
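
The effect of parameter scale on the Hessian can be illustrated with a small numerical example. The matrices below are invented for the illustration and do not come from an estimated model; the commented line shows how a scaling vector might be supplied, under the assumption that `scaling` takes a named vector with hypothetical parameter names `b_small` and `b_large` (check the manual for the exact format).

```r
# Hypothetical, well-conditioned Hessian expressed in scaled parameters.
H_scaled <- matrix(c(2, 0.5,
                     0.5, 2), nrow = 2, byrow = TRUE)

# Magnitudes of the two parameters, as in the 0.1 versus 100 example above.
s <- c(0.1, 100)

# Hessian with respect to the unscaled parameters (theta = s * theta_scaled).
H <- diag(1 / s) %*% H_scaled %*% diag(1 / s)
print(H)

# Condition numbers: large values indicate that inversion is numerically fragile.
cond_unscaled <- kappa(H, exact = TRUE)
cond_scaled   <- kappa(H_scaled, exact = TRUE)
print(c(unscaled = cond_unscaled, scaled = cond_scaled))

# Assumed usage, with hypothetical parameter names:
# estimate_settings <- list(scaling = c(b_small = 0.1, b_large = 100))
```

The entries of the unscaled Hessian differ by several orders of magnitude, which is the pattern to look for when inspecting `model$hessian`.
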

**Q:** Why is my estimate of the standard deviation negative?

This happens if a random coefficient is coded as `randCoeff[["beta"]]=mu+sigma*draws`, where `draws` is a random variate that is symmetrical around zero.

**Q:** So should I constrain the standard deviation to be positive?

There is no reason for doing so. The results will be the same if the random variate `draws` is symmetrical around zero. Imposing constraints will also make estimation harder. And of course, if a user wants to allow for correlation between individual coefficients, then the parameters multiplying the draws need to be able to be positive or negative.
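
The equivalence of a positive and a negative sigma can be checked numerically. The values of mu and sigma below are arbitrary, and standard Normal draws are used as one example of a variate that is symmetrical around zero.

```r
set.seed(1)
draws <- rnorm(100000)   # symmetrical around zero

mu    <- 1     # arbitrary illustrative values
sigma <- 0.5

beta_pos <- mu + sigma * draws      # positive sigma
beta_neg <- mu + (-sigma) * draws   # negative sigma

# Same mean (up to simulation noise) and same standard deviation.
print(c(mean_pos = mean(beta_pos), mean_neg = mean(beta_neg)))
print(c(sd_pos = sd(beta_pos), sd_neg = sd(beta_neg)))

# The implied standard deviation is the absolute value of the estimate.
implied_sd <- abs(-sigma)
print(implied_sd)
```

When reporting results, the absolute value of the estimate can therefore be used as the standard deviation.
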

**Q:** Why are my structural/nesting parameters greater than one or smaller than zero in Nested or Cross-nested Logit?

*Apollo* does not constrain the structural parameters in Nested (NL) and Cross-nested (CNL) Logit models to be between zero and one. If, after estimation, the structural parameters are outside of this interval, this could be evidence of the nesting structure not being supported by the data. The user could then try a different nesting structure.

**Q:** So should I constrain them to be between 0 and 1?

In general, while this is possible using the settings in `maxLik`, we do not recommend imposing constraints. If unconstrained estimation yields an estimate outside the interval of acceptable values, then it is highly likely that the use of constraints will lead to an estimate that goes to one of the bounds of that interval. The model fit will be inferior too, and the real problem will simply be masked by the constraints.

**Q:** So how about constraints on other parameters, such as standard deviations or gamma parameters in MDCEV?

It is common practice to use exponential transforms for parameters that are only allowed to be positive, e.g. using $\gamma = \exp(\gamma_0)$, with $\gamma_0$ being estimated. We have found that this often slows down estimation and does not necessarily lead to the same solution as unconstrained estimation, even if the latter finds an acceptable solution. The reason for the problem is that small changes to $\gamma_0$ will lead to large changes in $\gamma$, making estimation difficult, especially with numerical derivatives.

**Q:** How do I calculate hit rates for my model in *Apollo*?

We made the decision not to include hit rates in *Apollo* outputs. They really offer a very distorted view of the results. Models give probabilities in prediction. If, for each task, an analyst just looks at which alternative has the highest probability, then they’re ignoring the error term in the model. To put it succinctly, imagine a case with 2 alternatives and 2 models. Model A gives a probability of 51% for the chosen alternative in 70% of cases, but a probability of only 10% in the remaining 30% of cases. Model B gives a probability of 49% for the chosen alternative in 70% of cases, but a probability of 90% in the remaining 30% of cases. Using a hit rate would give model A a figure of 70%, and model B a figure of 30%. But clearly model B is far superior. That’s why, outside marketing, choice modellers work with probabilities of correct prediction instead if they want a percentage measure like this. And that would give 0.387 for model A but 0.613 for model B.
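
The figures in this example can be reproduced with a few lines of R. The sketch assumes 100 hypothetical observations split 70/30, as in the text, and defines a hit as the chosen alternative having a probability above 50%, which is the relevant rule with two alternatives.

```r
# Probability assigned to the chosen alternative in each of 100 hypothetical observations.
p_A <- c(rep(0.51, 70), rep(0.10, 30))   # model A
p_B <- c(rep(0.49, 70), rep(0.90, 30))   # model B

# Hit rate: share of observations where the chosen alternative has the highest probability.
hit_rate <- function(p) mean(p > 0.5)
print(c(A = hit_rate(p_A), B = hit_rate(p_B)))   # 0.7 and 0.3

# Average probability of correct prediction.
avg_prob <- c(A = mean(p_A), B = mean(p_B))
print(avg_prob)                                   # 0.387 and 0.613
```

The two measures rank the models in opposite order, which is the distortion described above.
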

### References

Hess, S., Daly, A., 2014. Handbook of Choice Modelling. Edward Elgar Publishing, Cheltenham.

Train, K., 2009. Discrete Choice Methods with Simulation, second ed. Cambridge University Press, Cambridge, MA.