I am attaching 3 files here: the input CSV file which we discussed today, the R code, and the PPT containing the variable distributions in the original population. It would be great if the script could be set up such that the user inputs the following:
(1) Range of acceptable values for each variable in each level (in our example, there are 8 levels)
(2) # of samples we want to select from each level (in this case, 13)
(3) The CSV file
For this part, see: http://www.statmethods.net/input/importingdata.html
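As a rough illustration of the three inputs listed above, here is a minimal sketch in R. Only `Sitesubj` and `AGE` come from the original files; the `LEVEL` column, the file path, the synthetic data, and the shape of the range list are assumptions made so the example runs on its own.

```r
# Write a small synthetic CSV so the sketch runs end to end.
# In practice csv_file would simply point at the real input file.
set.seed(1)
csv_file <- tempfile(fileext = ".csv")
demo <- data.frame(
  Sitesubj = sprintf("S%03d", 1:400),
  LEVEL    = rep(1:8, each = 50),        # hypothetical level column
  AGE      = round(runif(400, 10, 77))
)
write.csv(demo, csv_file, row.names = FALSE)

# Input (2): number of samples to select from each level.
n_per_level <- 13

# Input (1): acceptable range per level, here only for the median of AGE.
# The numbers are placeholders, not values from the original study.
age_median_range <- lapply(1:8, function(level) c(35, 55))
names(age_median_range) <- as.character(1:8)

# Input (3): the CSV file.
pop <- read.csv(csv_file, stringsAsFactors = FALSE)
str(pop)
```

A named list keyed by level keeps the ranges easy to edit by hand; a small second CSV of ranges would work just as well.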
And the script will produce as outputs:
(1) The seed that produced the final set of 13 samples for each level
(2) The list of selected sample IDs (in this case, 13) for each level. In this case, the sample IDs are the ‘Sitesubj’ variable in the CSV file.
(3) Plots and tables showing the final variable values (all scaled in %) for each level
For this part, see: http://www.statmethods.net/management/index.html
http://www.statmethods.net/graphs/index.html
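One way a script could produce outputs (1) and (2) is to try seeds one after another until the drawn sample falls inside the acceptable range. The sketch below is only a guess at that approach, not the attached R code: `find_sample`, the `LEVEL` column, the synthetic data, and the range `c(40, 47)` are all hypothetical, and only the median of `AGE` is checked.

```r
# Synthetic stand-in for the population; only Sitesubj and AGE are real names.
set.seed(1)
pop <- data.frame(
  Sitesubj = sprintf("S%03d", 1:400),
  LEVEL    = rep(1:8, each = 50),        # hypothetical level column
  AGE      = round(runif(400, 10, 77)),
  stringsAsFactors = FALSE
)

# Try seeds 1, 2, 3, ... and return the first whose sample median of AGE
# lies inside median_range. Returns NULL if no seed up to max_seed works.
find_sample <- function(d, n, median_range, max_seed = 10000) {
  for (seed in seq_len(max_seed)) {
    set.seed(seed)
    idx <- sample(nrow(d), n)
    m <- median(d$AGE[idx])
    if (m >= median_range[1] && m <= median_range[2]) {
      return(list(seed = seed, ids = d$Sitesubj[idx], median = m))
    }
  }
  NULL
}

# One search per level; split() keeps the row order within each level.
results <- lapply(split(pop, pop$LEVEL), find_sample,
                  n = 13, median_range = c(40, 47))

# Output (1): the seed per level. Output (2): the selected sample IDs.
seeds <- sapply(results, function(r) r$seed)
print(seeds)
print(results[["1"]]$ids)
```

Because the seed is stored, calling `set.seed()` with it and repeating the same `sample()` call on the same level reproduces the selection exactly.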
Something else I forgot to mention, which might be of help, is that you can set the y-axis range for the boxplots using the command:
boxplot(s1$AGE, ylim=c(10,77))
This helps when comparing the original boxplot to the sample boxplot, since you can set their y-axes to be the same.
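For example, the two boxplots can be drawn side by side with a shared y-axis. This is a small sketch on made-up data: `s1` here is a synthetic stand-in for the real data frame, and `samp` is a hypothetical random sample of 13 rows.

```r
# Synthetic stand-in for s1; the real s1 would come from the CSV file.
set.seed(1)
s1 <- data.frame(AGE = round(runif(200, 10, 77)))
samp <- s1[sample(nrow(s1), 13), , drop = FALSE]

# Use the range of the original data for both plots.
ylim <- range(s1$AGE)

op <- par(mfrow = c(1, 2))               # two panels in one row
boxplot(s1$AGE,   ylim = ylim, main = "Original")
boxplot(samp$AGE, ylim = ylim, main = "Sample")
par(op)                                  # restore the previous layout
```

Taking `ylim` from the original population guarantees that the sample fits inside it, since the sample is a subset.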
Also, you had asked about the continuous variables (what to use for range). I should have asked if you were familiar with boxplots, but here is a brief reference:
http://www.childrensmercy.org/stats/definitions/boxplot.htm
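A boxplot is built from 5 numbers (minimum, first quartile, median, third quartile, maximum), so you could pick some of them to use for setting your ranges. I would suggest trying the median, the mean (which is not one of the 5 but is still worth matching), and the box length (the distance between the first and third quartiles), since these are the most important components.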
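These quantities are easy to get in R. The sketch below uses made-up ages, and the way the acceptable range is built (median plus or minus a quarter of the box length) is just an arbitrary illustration, not a recommendation from the original discussion.

```r
# Made-up ages standing in for one continuous variable.
set.seed(1)
age <- round(runif(200, 10, 77))

# fivenum() returns Tukey's five numbers, the same hinges boxplot() draws.
five <- fivenum(age)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

box_length <- unname(five["upper_hinge"] - five["lower_hinge"])
avg <- mean(age)

# Hypothetical tolerance: accept sample medians within a quarter of the
# box length on either side of the population median.
median_range <- five[["median"]] + c(-1, 1) * 0.25 * box_length

print(five)
print(c(mean = avg, box_length = box_length))
print(median_range)
```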
-- by member 阿尔瓦雷斯 (2012/5/12 6:29:00)