User's Manual for TMY2s



Procedures for Developing TMY2s

The TMY2s were created based on the procedures that were developed by Sandia National Laboratories (Hall et al. 1978) to create the original TMYs from the 1952-1975 SOLMET/ERSATZ data. Modifications to the Sandia method were made to better optimize the weighting of the indices, to provide preferential selection for months with measured solar radiation data, and to account for missing data. This appendix begins by summarizing the Sandia method, and then it discusses departures from the Sandia method that were used to create the TMY2 data sets.

Sandia Method

The Sandia method is an empirical approach that selects individual months from different years of the period of record. For example, in the case of the NSRDB that contains 30 years of data, all 30 Januarys are examined and the one judged most typical is selected to be included in the TMY. The other months of the year are treated in a like manner, and then the 12 selected typical months are concatenated to form a complete year. Because adjacent months in the TMY may be selected from different years, discontinuities at the month interfaces are smoothed for 6 hours on each side.

The Sandia method selects a typical month based on nine daily indices consisting of the maximum, minimum, and mean dry bulb and dew point temperatures; the maximum and mean wind velocity; and the total global horizontal solar radiation. Final selection of a month includes consideration of the monthly mean and median and the persistence of weather patterns. The process may be considered a series of steps.

Step 1 - For each month of the calendar year, five candidate months with cumulative distribution functions (CDFs) for the daily indices that are closest to the long-term (30 years for the NSRDB) CDFs are selected. The CDF gives the proportion of values that are less than or equal to a specified value of an index.

Candidate monthly CDFs are compared to the long-term CDFs by using the following Finkelstein-Schafer (FS) statistic (Finkelstein and Schafer 1971) for each index.

$$FS = \frac{1}{n} \sum_{i=1}^{n} \delta_i$$

where

      $\delta_i$  =   absolute difference between the long-term CDF and the candidate month CDF at $x_i$
      $n$     =   the number of daily readings in a month.
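The FS statistic can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to build the TMY2s: the function names are assumptions, and the CDFs are plain empirical CDFs evaluated at each daily value of the candidate month.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return a function giving the proportion of `sample` values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def fs_statistic(candidate_daily, long_term_daily):
    """Mean absolute CDF difference at each daily value x_i of the candidate month."""
    cdf_month = empirical_cdf(candidate_daily)
    cdf_long = empirical_cdf(long_term_daily)
    deltas = [abs(cdf_long(x) - cdf_month(x)) for x in candidate_daily]
    return sum(deltas) / len(deltas)
```

A candidate month whose daily values have the same distribution as the long-term record gives an FS statistic of zero; larger values indicate a poorer match.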

Four CDFs for global horizontal solar radiation for the month of June are shown in Figure A-1. Compared to the long-term CDF by using FS statistics, the CDF for June of 1981 compared the best and the CDF for June of 1989 compared the worst. Even though it was not the best month with respect to the long-term CDF, June of 1962 was selected for the TMY2. This was a consequence of additional selection steps described in the following paragraphs.

Because some of the indices are judged more important than others, a weighted sum (WS) of the FS statistics is used to select the five candidate months that have the lowest weighted sums.

$$WS = \sum_{i} w_i \, FS_i$$

where

      $w_i$      =   weighting for index
      $FS_i$     =   FS statistic for index.
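As a minimal sketch of the weighted-sum selection, the Python below scores each candidate year and keeps those with the lowest weighted sums. The weights, index names, and FS values are hypothetical and cover only two indices to keep the example short.

```python
def weighted_sum(fs_by_index, weights):
    """WS = sum of w_i * FS_i over the indices."""
    return sum(weights[name] * fs for name, fs in fs_by_index.items())

def pick_candidates(fs_by_year, weights, count=5):
    """Return the `count` years with the lowest weighted sums, best first."""
    scored = {year: weighted_sum(fs, weights) for year, fs in fs_by_year.items()}
    return sorted(scored, key=scored.get)[:count]

# Hypothetical weights and FS values, for illustration only.
weights = {"global": 0.5, "dry_bulb_mean": 0.5}
fs_by_year = {
    1961: {"global": 0.10, "dry_bulb_mean": 0.02},
    1962: {"global": 0.04, "dry_bulb_mean": 0.04},
    1963: {"global": 0.08, "dry_bulb_mean": 0.10},
}
best = pick_candidates(fs_by_year, weights, count=2)
```

With these made-up numbers, 1962 has the lowest weighted sum and is ranked first.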

Step 2 - The 5 candidate months are ranked with respect to closeness of the month to the long-term mean and median.

Step 3 - The persistence of mean dry bulb temperature and daily global horizontal radiation are evaluated by determining the frequency and run length above and below fixed long-term percentiles. For mean daily dry bulb temperature, the frequency and run length above the 67th percentile (consecutive warm days) and below the 33rd percentile (consecutive cool days) were determined. For global horizontal radiation, the frequency and run length below the 33rd percentile (consecutive low radiation days) were determined.

The persistence data are used to select from the five candidate months the month to be used in the TMY. The highest ranked candidate month from step 2 that meets the persistence criteria is used in the TMY. The persistence criteria exclude the month with the longest run, the month with the most runs, and the month with zero runs.
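The run statistics used in the persistence check can be sketched as follows. The helper name, the sample temperatures, and the threshold value are assumptions for illustration; in practice the threshold would be the long-term 67th or 33rd percentile for the index.

```python
from itertools import groupby

def run_lengths(daily_values, threshold, above=True):
    """Lengths of consecutive-day runs above (or below) `threshold`."""
    flags = [(v > threshold) if above else (v < threshold) for v in daily_values]
    return [len(list(group)) for is_run, group in groupby(flags) if is_run]

# Hypothetical daily mean temperatures and a hypothetical 67th-percentile value.
temps = [10, 15, 16, 9, 17, 18, 19, 8]
warm_runs = run_lengths(temps, threshold=14)
```

The number of runs is `len(warm_runs)` and the longest run is `max(warm_runs)`; comparing these two numbers across the five candidates identifies the months that the criteria exclude.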

Step 4 - The 12 selected months were concatenated to make a complete year, and discontinuities at the month interfaces were smoothed for 6 hours on each side using curve-fitting techniques.

Weighting and Index Modifications

The weighting for each index plays a role in the selection of the typical months. Ideally, one would select a month that had FS statistics for each index that were better than all the other months. In practice, this is unlikely because the months might be typical with respect to some of the indices, but not others. By weighting the FS statistics, the relative importance and sensitivity of the indices may be taken into account. The Sandia weighting values and the weighting values used for the TMY2s are compared in Table A-1.

For the TMY2s, an index for direct normal radiation was added. This improves the comparison between annual direct normal radiation for the TMY2s and the 30-year annual average by about a factor of 2 (based on 20 geographically representative NSRDB stations). When only global horizontal radiation was used for the solar index, the TMY annual direct radiation values for the 20 stations were within 4% (95% confidence level) of the 30-year annual average. Using both global horizontal and direct radiation indices reduced the differences to 2%, with no adverse effect on global horizontal radiation comparisons.

Weightings were changed slightly to give more emphasis to dry bulb and dew point temperatures and less to wind velocity, which is of less importance for solar energy conversion systems and buildings. Neither of the TMY weightings is appropriate for wind energy conversion systems.

The relative weights between solar and the other elements were not found to be particularly sensitive. As an indicator, annual heating and cooling degree days (base 18.3°C) were compared for the TMY2s and the 30-year period for the 20 stations. With the selected solar weighting of 50% (global and direct), annual heating degree days for the TMY2s were within 5% (95% confidence level) of the 30-year annual average. As an extreme, reducing the solar weighting to zero only reduced the differences to within 2.5%. Differences between the TMY2 annual averages and the 30-year averages for cooling degree days were within 9%, for both 0% and 50% solar weightings.
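The manual does not spell out how degree days were computed. The sketch below assumes the common definition based on daily mean dry bulb temperature, with the 18.3°C base stated above; the function name is illustrative.

```python
BASE_C = 18.3  # degree-day base temperature in deg C, as stated in the text

def degree_days(daily_mean_temps_c, base=BASE_C):
    """Return (heating, cooling) degree days from daily mean temperatures."""
    hdd = sum(max(base - t, 0.0) for t in daily_mean_temps_c)
    cdd = sum(max(t - base, 0.0) for t in daily_mean_temps_c)
    return hdd, cdd
```

Summing over the 365 days of a TMY2 and over each year of the 30-year record gives the annual totals that such a comparison would use.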

As a consequence of adding the index for direct normal radiation, the persistence check in Step 3 was modified to determine the frequency and run length below the 33rd percentile (consecutive low radiation days) for daily values of direct normal radiation. This information, along with that for the other persistence indices, was then used to select the month satisfying the persistence criteria.

El Chichon Years

The volcanic eruption of El Chichon in Mexico in March 1982 spewed large amounts of aerosols into the stratosphere. The aerosols spread northward and circulated around the earth. This phenomenon noticeably decreased the amount of solar radiation reaching the United States from May 1982 until December 1984, when the effects of the aerosols had diminished. Consequently, these months were not used in any of the TMY2 procedures because they were considered not typical.
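A simple filter for the excluded period might look like the sketch below. Treating both May 1982 and December 1984 as inside the excluded period is an assumption about the endpoints, and the function name is illustrative.

```python
def is_el_chichon(year, month):
    """True for months from May 1982 through December 1984 (endpoints assumed inclusive)."""
    return (1982, 5) <= (year, month) <= (1984, 12)

# Keep only the months that may be used in the TMY2 procedures.
months = [(1982, 4), (1982, 5), (1983, 7), (1984, 12), (1985, 1)]
usable = [m for m in months if not is_el_chichon(*m)]
```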

Leap Years

TMY2 files do not include data for February 29. Consequently, data for February 29 were not used in leap year Februarys to determine their candidate month CDFs. However, to maximize the use of available data, data for February 29 were included for determining the long-term CDFs.

Preference for Months with Measured Solar Radiation Data

For a station, the NSRDB may contain both measured and modeled solar radiation data. Because of additional uncertainties associated with modeled data, preference in the selection of candidate months was given to months that contained either measured global horizontal or direct normal solar radiation data. This was accomplished between Steps 2 and 3 by switching the ranking of the first and second ranked candidate months if the second ranked month contained measured solar radiation data, but the first ranked month did not.
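The rank switch can be sketched as below. The function name and the dictionary that records which candidate months contain measured solar data are assumptions for illustration.

```python
def prefer_measured(ranked_months, has_measured_solar):
    """Swap ranks 1 and 2 if only the second-ranked month has measured solar data."""
    ranked = list(ranked_months)
    if len(ranked) >= 2:
        first, second = ranked[0], ranked[1]
        if has_measured_solar[second] and not has_measured_solar[first]:
            ranked[0], ranked[1] = second, first
    return ranked
```

Only the first two ranks are ever exchanged; lower-ranked candidates keep their positions even if they contain measured data.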

Month Interface Smoothing

Curve-fitting techniques were used to remove discontinuities created by concatenating months from different years to form the TMY2s. These techniques were applied for 6 hours each side of the month interfaces for dry bulb temperature, dew point temperature, wind speed, wind direction, atmospheric pressure, and precipitable water. Relative humidities for 6 hours on each side of the month interfaces were calculated using psychrometric relationships (ASHRAE 1993) and curve-fitted values of dry bulb temperature and dew point temperature.

Allowance for Missing Data

The NSRDB has no missing solar radiation data, but meteorological data are missing for some stations and months. Consequently, when creating the TMY2s, procedures were adopted to account for missing meteorological data. From these procedures, two classes of TMY2 stations evolved: Class A and B.

Class A stations are those stations whose 30-year meteorological data records were the most complete and that had an adequate number (15) of candidate months after eliminating any months with data missing for more than 2 consecutive hours. The minimum of 15 candidate months permitted completion of 90% of the stations without extensive data filling. As indicated in Figure A-2, as few as 15 candidate months yielded typical months that were within the range of differences established by 25 or more candidate months when comparing monthly values of direct normal for TMY2 months with monthly averages of direct normal for the 1961-1990 period. This relationship was also found to be true for global horizontal radiation and heating and cooling degree days.

Class B stations had more missing data than Class A stations, and the data were filled for the index elements used to select the TMY2s. Other elements in Class B TMY2s were not filled and may be missing. Table 1-1 shows elements that may have missing data values in TMY2 files for Class A and B stations.

Class A Stations. There are 216 Class A stations. Missing data for these stations were accounted for in the following fashion:

  1. Long-term CDFs in Step 1, based on the 30-year period (excluding the El Chichon period), were determined using only measured data or data modeled (such as solar radiation) from measured or observed data.
  2. Months were eligible to be candidate months if they had no missing or filled data for periods greater than 2 hours. This accommodated data from 1965 to 1981 that were digitized by NOAA only every third hour. For the elements used for the indices, the missing data for the 2-hour sequences were replaced with interpolated or modeled values.
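The eligibility test in item 2 amounts to finding the longest run of consecutive missing hours in a month. A minimal sketch, assuming missing hourly values are represented as `None` (an assumption about the data layout, not the NSRDB file format):

```python
def longest_gap(hourly_values):
    """Length in hours of the longest run of missing (None) values."""
    longest = current = 0
    for value in hourly_values:
        current = current + 1 if value is None else 0
        longest = max(longest, current)
    return longest

def is_eligible(hourly_values, max_gap_hours=2):
    """True if no run of missing hours is longer than `max_gap_hours`."""
    return longest_gap(hourly_values) <= max_gap_hours
```

The same check with `max_gap_hours=47` corresponds to the relaxed criterion described for Class B stations.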

Class B Stations. The NSRDB data from which the 23 Class B stations were derived have substantially more missing data than the NSRDB data from which the Class A stations were derived. This situation required filling missing data to have sufficient candidate months from which to select typical months. The additional missing data for the Class B stations resulted from such things as equipment problems and the fact that some stations did not operate at night for some or all of the 30-year period. Criteria were relaxed for Class B stations to permit filled data for periods of up to 47 hours to be used in determining the long-term CDFs, and months were eligible to be candidate months if they had no missing or filled data for periods greater than 47 hours. For Colorado Springs, Colorado, the criteria were further relaxed to permit missing data for snow depth and days since last snowfall.

Data-Filling Methods

The TMY2 data sets required filling some missing data that were not filled during the development of the NSRDB. The NSRDB was made complete with respect to solar radiation elements (NSRDB - Vol. 1 1992). This required NSRDB filling of missing data, at least for daylight hours, for elements used to model solar radiation, such as total and opaque sky cover, dry bulb temperature, relative humidity, and atmospheric pressure.

For other meteorological elements, data were not filled in the NSRDB. Consequently, to develop the TMY2s, missing data for dry bulb temperature (nighttime), dew point temperature, and wind speed required data filling to complete the selection of typical months. These elements, along with global horizontal and direct normal radiation, were used to generate statistics to determine the appropriate selection of typical months.

To maximize the usefulness of the TMY2s, other missing meteorological data were also filled, with the exception of horizontal visibility, ceiling height, and present weather. The discontinuous nature of these three elements did not readily lend itself to interpolation or other data-filling methods.

Data filling for TMY2 Class B stations was more extensive than for the Class A stations. TMY2s for Class A stations were restricted to the selection of typical months that had no more than 2 consecutive hours of data missing, whereas Class B stations could have up to 47 consecutive hours of data missing.

Two-hour gaps in data records for Class A and Class B stations were filled by linear interpolation, except for relative humidity, which was calculated based on psychrometric relationships (ASHRAE 1993) using measured or filled dry bulb temperature and dew point temperature. For Class B stations, longer gaps from 3 to 47 hours were filled using filled data from the NSRDB if available; otherwise TMY2 data-filling methods were used.
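Linear interpolation across short gaps can be sketched as follows. Missing values are assumed to be `None`, and gaps at the very start or end of the series are left unfilled because they lack a valid value on one side; both are assumptions of this example.

```python
def fill_short_gaps(values, max_gap=2):
    """Linearly interpolate interior runs of None no longer than `max_gap`."""
    filled = list(values)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i
            while i < len(filled) and filled[i] is None:
                i += 1
            gap = i - start
            # Fill only gaps bounded by valid values on both sides.
            if 0 < start and i < len(filled) and gap <= max_gap:
                left, right = filled[start - 1], filled[i]
                step = (right - left) / (gap + 1)
                for k in range(gap):
                    filled[start + k] = left + step * (k + 1)
        else:
            i += 1
    return filled
```

For example, `fill_short_gaps([10, None, None, 16])` fills the two missing hours with 12.0 and 14.0, while a 3-hour gap is left untouched at the default setting.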

The NSRDB contains filled data for total and opaque sky cover, dry bulb temperature, relative humidity, and atmospheric pressure. NSRDB data gaps up to 5 hours were filled by linear interpolation. Gaps from 6 to 47 hours were filled for the above elements by using data from adjacent days for identical hours and then by adjusting the data so that there were no abrupt changes in data values between the filled and measured data. Many Class B stations did not operate for parts of the night and/or early morning and late afternoon. For these stations, NSRDB data were filled from sunrise to sunset to allow model estimates of solar radiation. However, nighttime data were not necessarily filled.

The TMY2 data sets used procedures to fill nighttime data and other data not filled in the NSRDB. These procedures were used for total and opaque sky cover, atmospheric pressure, dry bulb temperatures, dew point temperatures, relative humidity, wind speed, precipitable water, broadband aerosol optical depth, snow depth, and days since last snowfall. Data elements not filled are horizontal visibility, ceiling height, and present weather.

The TMY2 data-filling procedures are described in the following paragraphs.

Total and opaque sky cover and atmospheric pressure were linearly interpolated over any missing nighttime periods.

Nighttime dry bulb temperatures were linearly interpolated, and then the filled values were adjusted to preserve nonlinearities, such as more rapid changes in temperature near sunrise and sunset. These adjustments were based on average diurnal profiles determined for each calendar month and appropriately scaled to match the endpoints of the interpolation interval.
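The manual does not give the exact form of the diurnal adjustment, so the sketch below is one plausible reading rather than the TMY2 procedure. It assumes that the departure of the average diurnal profile from its own straight line is added to the straight line between the measured endpoints, which reproduces the endpoints exactly. The function name and argument layout are illustrative.

```python
def fill_with_profile(left, right, profile):
    """Fill the hours between two valid endpoint values.

    `profile` holds average diurnal values for the same hours, including
    both endpoint hours. The profile's departure from its own straight
    line is added to the straight line between `left` and `right`
    (an assumed adjustment scheme).
    """
    n = len(profile) - 1
    filled = []
    for k in range(1, n):
        frac = k / n
        line = left + (right - left) * frac
        profile_line = profile[0] + (profile[-1] - profile[0]) * frac
        filled.append(line + (profile[k] - profile_line))
    return filled
```

With a profile that is itself a straight line, the result reduces to plain linear interpolation; a curved profile bends the filled values in the same direction as the average diurnal cycle.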

Missing daytime dew point temperatures were filled using psychrometric relationships (ASHRAE 1993) and measured or NSRDB filled values of dry bulb temperature and relative humidity. The same procedure was also used to fill missing nighttime dew point temperatures if measured or NSRDB filled values of dry bulb temperature and relative humidity were available. Otherwise, missing nighttime dew point temperatures were filled by the procedure used to fill nighttime missing dry bulb temperatures: linear interpolation and then adjustment of filled values based on average diurnal profiles determined for each calendar month.

Missing nighttime relative humidity values were filled using psychrometric relationships and dry bulb and dew point temperatures. Dry bulb temperatures used were measured or NSRDB filled or TMY2 filled, and dew point temperatures used were measured or TMY2 filled.
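To illustrate how dew point, dry bulb temperature, and relative humidity are tied together, the sketch below uses the Magnus approximation for saturation vapor pressure over water. This is a substitute chosen for brevity, not the ASHRAE (1993) relationships cited above, and the coefficient values and function names are assumptions of this example.

```python
import math

A, B = 17.62, 243.12  # Magnus coefficients over water; temperatures in deg C

def dew_point_c(dry_bulb_c, rh_percent):
    """Dew point from dry bulb temperature and relative humidity (Magnus form)."""
    gamma = math.log(rh_percent / 100.0) + A * dry_bulb_c / (B + dry_bulb_c)
    return B * gamma / (A - gamma)

def relative_humidity(dry_bulb_c, dew_pt_c):
    """Relative humidity (%) from dry bulb and dew point temperatures."""
    return 100.0 * math.exp(A * dew_pt_c / (B + dew_pt_c)
                            - A * dry_bulb_c / (B + dry_bulb_c))
```

The two functions are inverses of each other, so a dew point derived from a given relative humidity returns that same relative humidity when passed back.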

Missing wind speed data, for gaps of up to 47 hours, were filled by the procedure used to fill nighttime missing dry bulb temperatures: linear interpolation and then adjustment of filled values based on average diurnal profiles determined for each calendar month.

Missing wind direction and precipitable water, for gaps of up to 47 hours, were linearly interpolated. For calm winds, wind direction was set to zero (north).

Broadband aerosol optical depth values in the TMY2s are daily values provided by seasonal functions derived during the development of the NSRDB. The seasonal functions are sinusoidal with respect to the day of the year and have peak values occurring in the summer.
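The general shape of such a seasonal function can be sketched as below. The mean, amplitude, and peak day are hypothetical placeholders; the actual station-specific functions were derived during NSRDB development and are not reproduced here.

```python
import math

def aerosol_optical_depth(day_of_year, mean=0.10, amplitude=0.04, peak_day=196):
    """Sinusoidal seasonal value; all three parameters are hypothetical."""
    angle = 2 * math.pi * (day_of_year - peak_day) / 365.0
    return mean + amplitude * math.cos(angle)
```

With these placeholder values, the function peaks in mid-July and reaches its minimum about half a year later.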

Snow depth and days since last snowfall data were available from the NSRDB for all but Colorado Springs and a few stations at southern latitudes, such as Guam and Puerto Rico. So much data were missing for Colorado Springs that no attempt was made to fill the data, and missing data for the elements snow depth and days since last snowfall were flagged as missing. For the southern latitude sites that do not receive snow, snow depth was set to zero and days since last snowfall was set to 88, meaning 88 or more days.

Quality Control

Data were checked before and after processing to ensure that data were reasonable. NCDC provided information identifying some erroneous dew point temperature data in Version 1.1 of the NSRDB, where dew point temperatures exceeded dry bulb temperatures. During processing of the NSRDB data to generate the TMY2s, dew point temperatures were checked to make sure they did not exceed dry bulb temperatures. If they did, the dew point temperature was calculated using relative humidity and dry bulb temperature, if available; otherwise, the data were considered missing.
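The dew point check described above can be sketched as follows. The function name is illustrative, `None` stands for a value treated as missing, and `from_rh` is a caller-supplied conversion function (hypothetical here) that returns a dew point from dry bulb temperature and relative humidity.

```python
def qc_dew_point(dry_bulb, dew_point, rel_humidity=None, from_rh=None):
    """Return a dew point that does not exceed the dry bulb temperature.

    `from_rh` is a caller-supplied function (dry_bulb, rel_humidity) -> dew
    point. None is returned when the value must be treated as missing.
    """
    if dew_point <= dry_bulb:
        return dew_point
    if rel_humidity is not None and from_rh is not None:
        return from_rh(dry_bulb, rel_humidity)
    return None
```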

NCDC also identified three stations (Chattanooga, Tennessee; Huntsville, Alabama; and Louisville, Kentucky) that had erroneous total sky cover data for the period 1970-1974. The cloud cover data had been set to 10 for non-3-hourly values (correct values were present every 3 hours). Consequently, modeled solar radiation for these stations and times would be erroneous. For the TMY2s, data for these stations and time periods were excluded.

Post-processing checks revealed that some of the selected TMY2 months had solar radiation values with obvious errors (diffuse radiation values were zero even though global horizontal and direct normal radiation were a few hundred watt hours). Consequently, these stations were reprocessed with the affected data being excluded. The stations with months excluded during the reprocessing because of erroneous solar data are: Boulder, Colorado (2/88, 3/85, 5/85, and 10/85); Lake Charles, Louisiana (2/80); Caribou, Maine (4/78, 7/85, and 7/72); Great Falls, Montana (10/89); Omaha, Nebraska (5/85, 5/89, and 11/81); Ely, Nevada (6/89 and 9/88); Guam, Pacific Islands (1/88, 9/79, and 9/88); El Paso, Texas (12/88); Midland, Texas (5/80 and 12/79); Salt Lake City, Utah (5/88, 8/80, and 10/89); Lander, Wyoming (3/88 and 8/80).

Calculation of Illuminance Data

To facilitate lighting and energy analysis of buildings, hourly values for global horizontal illuminance, direct normal illuminance, diffuse horizontal illuminance, and zenith luminance were added to the TMY2 data sets. These elements were calculated using luminous efficacy models developed by Perez et al. (1990). Inputs to the models are global horizontal radiation, direct normal radiation, diffuse horizontal radiation, and dew point temperature. The luminous efficacy in terms of lumens per watt is determined as a function of sky clearness, sky brightness, and zenith angle.

Assignment of Source and Uncertainty Flags

With the exception of extraterrestrial horizontal and extraterrestrial direct radiation, each data value was assigned source and uncertainty flags. The source flag indicates whether the data were measured, modeled, or missing, and the uncertainty flag provides an estimate of the uncertainty of the data. Source and uncertainty flags for extraterrestrial horizontal and extraterrestrial direct radiation are not provided because these elements were calculated using equations considered to give exact values.

Usually, the source and uncertainty flags in the TMY2 data files are the same as the ones in the NSRDB, from which the TMY2 files were derived. However, differences do exist for data that were flagged missing in the NSRDB, but then filled while developing the TMY2 data sets. Differences are also present for illuminance and luminance data values that were not included in the NSRDB. Uncertainty values apply to the data with respect to the time stamp of the data, and not as to how "typical" a particular hour is for a future month and day. The uncertainty values represent the plus or minus interval about the data value that contains the true value 95% of the time.

The uncertainty assigned to modeled solar radiation data includes only the bias error in the model and not the random error component, which could be several times larger for partly cloudy skies. For partly cloudy skies, an hour can be composed of large or small amounts of sunshine, depending on whether the sun is mostly free of the clouds or occluded by the clouds. Consequently, modeled hourly values may depart significantly from true values for partly cloudy skies. The uncertainty assigned to modeled solar radiation data represents the average uncertainty for a large number of model estimates (such as for a month). When averaging large data sets, random errors tend to cancel, leaving only the bias error.

Uncertainties for values of illuminance and luminance were determined by taking the root-sum-square of the two main sources of error: (1) uncertainty of the solar radiation element (global horizontal, direct normal, or diffuse horizontal radiation) from which the illuminance or luminance element is derived, and (2) uncertainty of the model estimate.

The uncertainty of the model estimates is based on the evaluation presented by Perez et al. (1990) for six test stations. To be conservative, the following model mean bias errors for the stations with the largest errors were used:

The uncertainty of the illuminance data value was then determined as the root-sum-square of the model uncertainty and solar radiation element uncertainty.
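The root-sum-square combination is a one-line calculation. In the sketch below the function name and the input percentages are hypothetical and are not values from the manual.

```python
import math

def root_sum_square(*uncertainties_percent):
    """Combine independent uncertainty components by root-sum-square."""
    return math.sqrt(sum(u * u for u in uncertainties_percent))

# Hypothetical components: 3% model uncertainty, 4% solar radiation uncertainty.
combined = root_sum_square(3.0, 4.0)
```

The combined value is always at least as large as the largest single component and smaller than the simple sum of the components.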

The use of the bias error, instead of bias and random error, is consistent with the approach in the above paragraph concerning the assignment of uncertainty values to modeled solar radiation elements. Consequently, it also has the same implications. The assigned uncertainty is representative of the average uncertainty for a large number of model estimates (such as for a month), but the actual uncertainty of the individual modeled illuminance and luminance values is greater than indicated.

For meteorological elements, relative uncertainties from the NSRDB were used. These uncertainties do not portray a quantitative evaluation of the uncertainty of the meteorological elements, but rather give relative uncertainties based on the data and the manner in which they were derived (NSRDB-Vol. 1 1992).

The source and uncertainty flags for the solar radiation, illuminance, and meteorological elements are presented in Tables 3-3 through 3-6.


References

ASHRAE (1993). 1993 ASHRAE Handbook: Fundamentals. Atlanta, GA: American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc.

Finkelstein, J.M.; Schafer, R.E. (1971). "Improved Goodness-of-Fit Tests." Biometrika, 58(3), pp. 641-645.

Hall, I.; Prairie, R.; Anderson, H.; Boes, E. (1978). Generation of Typical Meteorological Years for 26 SOLMET Stations. SAND78-1601. Albuquerque, NM: Sandia National Laboratories.

NSRDB - Vol. 1 (1992). User's Manual - National Solar Radiation Data Base (1961-1990). Version 1.0. Golden, CO: National Renewable Energy Laboratory and Asheville, NC: National Climatic Data Center.

Perez, R.; Ineichen, P.; Seals, R.; Michalsky, J.; Stewart, R. (1990). "Modeling Daylight Availability and Irradiance Components from Direct and Global Irradiance." Solar Energy, 44(5), pp. 271-289.
