Overview: PickStock (copyright 2003, Bell Software and Services, Inc.) is a computer program designed to use principal components analysis to search out potentially underpriced stocks. The program is believed to be an implementation of what is known in the financial market modeling community as "arbitrage pricing theory" (see for example S. Ross, "The arbitrage theory of capital asset pricing", J. Economic Theory, vol. 13, pp. 341-360, 1976). The theory postulates that a given vector of prices (for example, today's prices for a set of stocks) can be explained by a linear combination of underlying economic components. While there is apparently substantial debate in the economic community as to the viability of the arbitrage pricing theory, this application provides a tool to test it out on real data.
Who can best make use of this application:
It is hoped that any user who can download stock price data into MetaStock text format will find PickStock sufficiently easy to use. Facility in handling text files and importing them into spreadsheets is also a plus. Economists and stock analysts may be able to use the program alongside a more traditional analysis of economic fundamentals. Scientists, engineers, and students may even be able to tap into the powerful principal components analysis engine that PickStock incorporates by replacing closing prices with arbitrary data (i.e., each "date" can be associated with one experiment, each "ticker" with one of that experiment's outputs, and the output value placed in the "close" field).
Input file (User Supplied):
Required user input is a text file (obtained through other sources) containing split-corrected stock price history data: all the stocks, trading dates, and closing prices to incorporate into the model. Prices must have been corrected for splits (e.g., when a stock executes a "2 for 1" split, all prices prior to that day must be multiplied by 0.5); many stock price downloading services perform this correction automatically or just prior to user export into a text file. Either type the full path of the input file or use the browse button to select it. An example input file is supplied, named "stockdata.txt". There is no limit to the size of the input file; it may contain decades of price data (once the program is registered). The input file must be in MetaStock (Equis International) ASCII text file format: a single text file of comma-separated values with one line per ticker per day of trading. The top line of the file is a header line and must look something like this (it must contain at least the ticker, date, and close fields):
<ticker>,<per>,<date>,<open>,<high>,<low>,<close>,<vol>
Either all letters in the header line should be the same case, or only the first letter of each field should be capitalized. If the date field label is <date> (as above), a date format of mm/dd/yyyy is indicated; if it is <dtyyyymmdd>, an eight-digit date without slashes (yyyymmdd) is indicated. The ticker (stock identifier) field must be the first field.
Each line of data must correspond to this header, and the file must be sorted first by ticker and then chronologically by date. The fields corresponding to the above header line are, respectively: ticker symbol for the company, period of the data, date of the prices, opening price for the day, high price for the day, low price for the day, closing price for the day, and volume traded. Until the program is registered, no more than ten stocks may be used from the input file. At this time the program operates only on the covariance matrix of transformations of daily closing prices.
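For users who prefer to check their input programmatically, the short Python sketch below reads a file in this layout into per-ticker closing price series. It is only an illustration (not part of PickStock) and assumes the example header above with the <date> label; a <dtyyyymmdd> header would need the change noted in the comment.

    import csv
    from collections import defaultdict

    # Minimal reader for the MetaStock ASCII layout described above.
    closes = defaultdict(dict)          # closes[ticker][date] = closing price

    with open("stockdata.txt", newline="") as f:
        rows = csv.reader(f)
        header = [name.strip().lower() for name in next(rows)]
        i_ticker = header.index("<ticker>")
        i_date = header.index("<date>")     # use "<dtyyyymmdd>" for slashless dates
        i_close = header.index("<close>")
        for row in rows:
            closes[row[i_ticker]][row[i_date]] = float(row[i_close])

    for ticker, series in sorted(closes.items()):
        print(ticker, len(series), "trading days")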
It should also be noted that this program can readily be used in principal components analysis applications unrelated to stock analysis. For example, in spectroscopy applications, wavenumber (wavelength) would correspond to ticker and the experiment identifier would correspond to date. Plot the values in EIG.TXT to see the number of components, then use this number of eigenvectors (EIGV????.TXT) to reconstruct component spectra (see E. Malinowski, "Factor Analysis in Chemistry").
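As a sketch of that mapping (the spectra, labels, file name, and synthetic dates below are invented for illustration; the dates used must also appear in the program's date database, as described under "First and Last Trading Dates Used"):

    import numpy as np

    # Hypothetical data: 3 "experiments" (rows) x 4 "wavenumbers" (columns).
    wavenumbers = ["W1000", "W1050", "W1100", "W1150"]   # play the role of tickers
    spectra = np.array([[0.12, 0.34, 0.56, 0.41],
                        [0.15, 0.30, 0.60, 0.44],
                        [0.11, 0.36, 0.52, 0.39]])

    with open("spectra_as_stockdata.txt", "w") as out:
        out.write("<ticker>,<per>,<dtyyyymmdd>,<open>,<high>,<low>,<close>,<vol>\n")
        for j, wn in enumerate(wavenumbers):             # sorted by "ticker" first ...
            for i in range(spectra.shape[0]):            # ... then by "date"
                date = 20030101 + i                      # synthetic consecutive dates
                v = spectra[i, j]
                # Only ticker, date, and close matter here; the rest are fillers.
                out.write(f"{wn},D,{date},{v},{v},{v},{v},0\n")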
Output:
Output is primarily a file called "results.txt" (other output files enable import of these results into the stock analysis computer program StockWiz, TM of i-Soft, Inc.). It lists the stocks from the user-supplied data in order of relative undervalue (most undervalued at the top of the list) and provides a statistical estimate of the magnitude of undervalue; for example, an indication of -0.15 for a stock with a current price of $100 per share suggests it is currently worth about $115 (statistically speaking, ignoring all fundamentals except price and price covariances). Obviously, there are many times when a company's business performance makes an anomalously low price reasonable. But when this program indicates a stock's price is below PickStock's estimated value (i.e., the stock appears toward the top of "results.txt") and there is no adverse news on the recent horizon, the stock might be truly undervalued. There are of course no guarantees as to the predictive ability of this program; for investment advice consult an investment professional. The program makes no investigation of any stock fundamentals (earnings, P/E ratio, growth, capital spending, etc.), so it is designed only as one of several filters an investor might consider in assessing the true relative value of a stock. Ancillary output files are also generated for the list of tickers used (TICKER.TXT), and for the eigenvalues (EIG.TXT) and eigenvectors (EIGV????.TXT) of the user-selected model; each eigenvector has elements that correspond, in the same sequence, to the stocks in TICKER.TXT. These generated eigenvectors may be re-used in subsequent PickStock runs for a short, approximate run (select "Yes" in the first radio button).
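As a small worked illustration of the undervalue figure (the exact column layout of "results.txt" is not described here, so the numbers below are hypothetical), the indication can be read like this:

    # Hypothetical reading of an undervalue indication, consistent with the example above.
    indication = -0.15            # value reported next to a stock in results.txt
    current_price = 100.0         # dollars per share
    implied_value = current_price * (1 - indication)   # -0.15 on $100 -> about $115
    print(f"implied value: ${implied_value:.2f} per share")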
Long version or short version:
Select "Yes" here for an approximate calculation of the long version results when the same set of stocks is being used as was recently used with a long version run of PickStock. It assumes that the few days of trading after the long version run will not substantially affect price averages and covariances. This results in calculation times sometimes on the order of a few seconds rather than hours sometimes required for the long version. Also, the various working files (eg. pickstock.bin, eigv????.txt, ticker.txt) must not have been altered since the long version was run. Select "No" if this is a new set of stocks or if the statistics need freshening to incorporate recent data.
The algorithm:
The program utilizes a principal components analysis algorithm involving only closing prices so at best it will identify anomalously low- (or high-) priced stocks. (For those of you interested in algorithms, the primary eigensolver used in this application is ARPACK which was written by Rich Lehoucq, Kristi Maschhoff, Danny Sorensen, and Chao Yang; see their User's Guide at http://www.caam.rice.edu/software/ARPACK/).
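As an illustration of this kind of calculation (a minimal sketch only, not PickStock's actual code), the covariance matrix of transformed closing prices can be built and its leading eigenvectors extracted with an ARPACK-based solver; scipy.sparse.linalg.eigsh is a Python interface to ARPACK's symmetric eigensolver.

    import numpy as np
    from scipy.sparse.linalg import eigsh     # ARPACK-based symmetric eigensolver

    def leading_components(transformed_prices, k):
        """transformed_prices: days x stocks array of (transformed) closing prices."""
        cov = np.cov(transformed_prices, rowvar=False)   # stocks x stocks covariance
        eigvals, eigvecs = eigsh(cov, k=k, which="LM")   # k largest-magnitude eigenpairs
        order = np.argsort(eigvals)[::-1]                # sort descending
        return eigvals[order], eigvecs[:, order]

    # Hypothetical use: 250 trading days, 40 stocks, keep the 5 leading components.
    rng = np.random.default_rng(0)
    prices = rng.lognormal(mean=3.0, sigma=0.1, size=(250, 40))
    vals, vecs = leading_components(np.log(prices), k=5)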
Troubleshooting:
When only one eigenvalue explains most of the variance, this may point to input data errors, which can sometimes be traced by plotting EIGV0001.TXT and TICKER.TXT alongside one another in a spreadsheet. Where there are anomalous spikes, the corresponding ticker should be examined for major discontinuities (data errors in the input file). The user may also want to study the eigenvectors by importing the TICKER.TXT and EIGV????.TXT files into a spreadsheet, or use them in other neural network or econometric types of analysis.
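The same spike hunt can be done in a few lines of Python. The sketch below assumes TICKER.TXT holds one ticker per line and EIGV0001.TXT one number per line, in the same order (an assumption about the file layout); it simply flags unusually large first-eigenvector elements as candidates for input data errors.

    import statistics

    with open("TICKER.TXT") as f:
        tickers = [line.strip() for line in f if line.strip()]
    with open("EIGV0001.TXT") as f:
        elements = [float(line) for line in f if line.strip()]

    mean = statistics.mean(elements)
    sd = statistics.stdev(elements)
    for ticker, x in zip(tickers, elements):
        if abs(x - mean) > 3 * sd:          # crude spike detector
            print(f"check the input data for {ticker}: eigenvector element {x:+.4f}")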
First and Last Trading Dates Used: Indicate the first and last trading dates defining the range of data to be used in calculations (they must be in the same format as in the MetaStock ASCII text file provided). If these dates are outside the range of dates in the two supplied date databases (dates.txt and dates_noslashes.txt), then you must expand those two files so that they cover all the dates used in the stock price file. Note that specifying the data to be used in this manner means a stock price database larger than necessary may be used (facilitating what-if or historical modeling).
Specify goodness of fit - by fraction of variance: This is one of several ways of specifying the degree to which you would like the entire body of data fit to a model. Enter a fraction less than 1.0 but probably greater than 0.9 (about 0.995 is recommended). This is analogous to what statisticians refer to as the coefficient of determination (R squared). The system utilizes a principal components analysis approach operating on the covariance matrix of closing prices and will use enough of the principal components to account for at least this fraction (specified in this field) of the variance. Click the radio button to the left if you would like to use this criterion for selecting the number of components, or click on another radio button to select a different criterion.
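The relationship between this fraction and the resulting number of components can be checked from the eigenvalues. A small sketch, assuming EIG.TXT holds one eigenvalue per line (sorted here in descending order to be safe):

    # Count how many leading eigenvalues are needed to reach a target variance fraction.
    target = 0.995

    with open("EIG.TXT") as f:
        eigenvalues = sorted((float(line) for line in f if line.strip()), reverse=True)

    total = sum(eigenvalues)
    running = 0.0
    for n, value in enumerate(eigenvalues, start=1):
        running += value
        if running / total >= target:
            print(f"{n} components explain {running / total:.4f} of the variance")
            break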
Specify goodness of fit - by number of principal components: This is one of several ways of specifying the degree to which you would like the entire body of data fit to a model. This option allows the user to specify a set number of principal components to be used in the model. The number must be less than or equal to the number of stocks actually used in the calculation after stocks have been removed by the program for excessive missed days of trading or excessive daily volatility (see user-specified fields below). Usually one wants to choose as few components as possible to reasonably model the data (recommend about 21). Click the radio button to the left if you would like to specify this criterion for selecting the number of components used or click on another radio button to select a different criterion.
Maximum consecutive number of missed trading days allowed: Indicate the maximum consecutive number of missed trading days allowed. The thinking is that long stretches of missed trading days indicate some type of problem with either the data or with the company itself. Either way, inclusion of companies with extended absenteeism is user-controlled by varying this number.
Maximum fractional daily change allowed: Indicate the maximum fractional daily change allowed for a stock. If a stock's price drops or rises by this fraction or more between any two consecutive trading days, the stock is not calculated into the model.
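The sketch below illustrates both screening rules described above (missed trading days and daily change). It is not PickStock's actual code, and the limit values shown are placeholders for the two user-specified fields.

    def passes_filters(closes, missed_runs, max_missed=5, max_frac_change=0.5):
        """closes: closing prices on the days the stock traded, in date order.
        missed_runs: length of each run of consecutive missed trading days."""
        if any(run > max_missed for run in missed_runs):
            return False                              # extended absenteeism
        for prev, cur in zip(closes, closes[1:]):
            if abs(cur - prev) / prev > max_frac_change:
                return False                          # excessive one-day move
        return True

    # Hypothetical example: a 60% one-day jump exceeds the 0.5 limit, so the stock is dropped.
    print(passes_filters([10.0, 10.5, 16.8], missed_runs=[2]))   # False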
Price Transformation Options:
Choose one of the following four transformation options (a short code sketch of these transformations appears after the list):
1. No transformation: Closing price data are not transformed (that is to say, no nonlinear operation such as a logarithm is performed on the data) before being processed into a covariance matrix. Note that the average value of each stock's price is not subtracted before the data are processed into the covariance matrix. This option might actually be used in certain scientific applications (e.g., principal components analysis of spectrophotometric titrations) but might require data file preprocessing.
2. Log transformation with average of logarithms subtracted: Closing price data are transformed by taking their natural logarithm (log base e) then subtracting the average of the natural logarithm of each stock price over all days of trading before being processed into a covariance matrix. This statistical transformation may have the effect of making a stock's variability closer to a normal distribution (bell curve), so it may result in a better model than with no transformation.
3. Log transformation with logarithm of price on earliest common day of trading subtracted: Closing price data are transformed by taking their natural logarithm (log base e) and then subtracting each stock's natural logarithm of the closing price on the earliest common day of trading, before being processed into a covariance matrix. This transformation recognizes that the most important perspective for an investor is how stocks did relative to where they each were at a given point in time. The log transformation again has the effect of bringing the data closer to a normal distribution (bell curve), thereby hopefully improving the model, and subtracting the log of the price on a given day enables a fair comparison of all stocks on a "time now" basis.
4. Exactly the same as option 3 above except that data are then weighted in proportion to the natural logarithm of the product of stock volume and closing price on the last day of trading.
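The sketch below summarizes the four options in code. It assumes a days x stocks array of split-corrected closing prices with the earliest common trading day in the first row, and is an illustration only; in particular, the column-weighting used for option 4 is an assumption about how the weighting is applied.

    import numpy as np

    def transform(prices, option, last_day_volume=None, last_day_close=None):
        """prices: days x stocks array of split-corrected closing prices, rows in
        date order with row 0 the earliest common trading day. Options follow the
        numbered list above."""
        if option == 1:
            return prices                          # no transformation
        logp = np.log(prices)
        if option == 2:
            return logp - logp.mean(axis=0)        # subtract average log price
        rel = logp - logp[0]                       # subtract log price on earliest common day
        if option == 3:
            return rel
        # Option 4: weight each stock in proportion to log(volume * close) on the
        # last day of trading (exact weighting scheme assumed for illustration).
        weights = np.log(last_day_volume * last_day_close)
        return rel * weights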