To import a data file, use the 'Import Data' option in the File menu. The data
file must be in a tab-delimited text format (for example, exported from an Excel
spreadsheet).
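Before importing, it can help to confirm that the file really is tab-delimited and that every row has the same number of columns. The following is a minimal sketch in Python and is not part of HPlus; the function name `check_tab_delimited` and the sample column titles are hypothetical and used only for illustration.

```python
import csv
import io

def check_tab_delimited(text):
    """Return the column count if every non-empty row has the same
    number of tab-separated fields; raise ValueError otherwise."""
    # csv.reader yields an empty list for a blank line, so those are skipped.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")
    return widths.pop()

# Hypothetical sample: one title row and two samples with two SNP columns.
sample = "ID\tStatus\tSNP1\tSNP2\n1\t0\t0\t2\n2\t1\t1\t0\n"
print(check_tab_delimited(sample))
```

For a file on disk, the same check could be applied to the text read from that file. A row that ends in empty fields still counts its trailing tabs, so empty data points do not cause a mismatch.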
You are first asked to select the file you wish to import. You are then
presented with the import assistant, which guides you through specifying where the various
data items are located within the file so that it can be imported correctly.
To work through the import assistant, enter the relevant details in the lower
frame and press the 'Forward' button to move on to the next section. You will need
to enter the following pieces of information:
- File Layout
- This specifies whether the markers are in rows or columns. For example, choosing 'Column' here
means that a single marker has its values coded into one or two columns. This also means that
each sample is on a separate row. Choosing 'Row' here would specify that each marker is on
a different row, and the samples are the columns.
- Range of Genetic Markers
- Here you specify what range of rows and columns contain the actual values for your
markers. This should exclude any information on covariates since they are defined later.
- Genetic Marker Type
- Here you specify whether your markers are biallelic (for example, SNP data) or multiallelic
(for example, microsatellite data).
- Marker Coding Format
- If you are using biallelic data, you are asked to supply the coding method used to signify alleles.
There are five possible values:
- Adjacent columns of 0 or 1 - the SNP is coded as two values in consecutive columns.
Each value is a present/absent or wild-type/variant indicator for that SNP. For example, 0 1 indicates a heterozygous genotype.
- Single column of 0, 1 or 2 - the SNP is coded as one digit that takes one of three values,
representing homozygous wild-type, homozygous variant, or heterozygous.
- Adjacent rows of 0 or 1 - the SNP is coded as two values in consecutive rows.
Each value is a present/absent or wild-type/variant indicator for that SNP.
- Adjacent columns of letters - the SNPs are represented by their actual base-pair
values in two adjacent columns separated by a tab. For example, C in one column and G in the next indicates a heterozygous genotype.
- Single column of letters - the SNPs are coded with their two base-pair values
combined side-by-side in the same column. For example: CG or C/G.
- Coding Details
- This is where you specify the coding used in your data file. For files with a single digit
for each SNP, you can simply enter the value used for the Homozygous Wild-type, the
Homozygous Variant and the Heterozygous cases. If you specified that the file used
adjacent rows or columns for the SNPs, you will need to enter two digits for each of
these, and also both possible arrangements of the Heterozygous case, since this can
occur in two ways - 0 1 or 1 0 for example.
- Missing Data
- If you are using multiallelic data, you will be asked to specify your coding for missing
data points. HPlus will treat data points that have this coding, as well as any empty data
points, as missing data for the purpose of analysis.
- Subject ID and Phenotype
- This section asks for the column (or row, depending on your data orientation) in which
your sample ID numbers are listed, and for the one holding your case/control status. If neither
is relevant to your data set, you can leave both blank. Note that if you do not
enter case/control information, analysis of your data set will produce only the
haplotype frequencies. If you have no sample IDs, HPlus will assign consecutive ID
numbers to the samples as they are loaded, so that they can be referred to later.
- Covariate Columns/Rows
- If your data set has no covariates, leave these boxes blank. Otherwise fill in the
starting column (or row, depending on the orientation of your data) of your covariates.
Also enter the row (or column) in which the titles of the covariates are stored, so
that they can be identified for you in the interface.
- Marker Grouping
- As with the covariates, you can enter the rows (or columns) in which the marker location
information is held. This will be information such as the gene or chromosome on which the
marker resides, and can be used in HPlus to segment the markers before analysis.
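The biallelic coding formats described above can be illustrated with a short sketch in Python. This is not HPlus code and makes no claim about how HPlus reads a file internally; the function names and both coding tables are hypothetical. In particular, which digit stands for which genotype is an assumption here, since the real codes are whatever you enter under the coding details step. Empty values are returned as missing, in line with the description of empty data points above.

```python
# Hypothetical coding tables: the real codes are whatever you enter
# in the import assistant, not fixed values.
SINGLE_DIGIT = {
    "0": "homozygous wild-type",
    "1": "heterozygous",
    "2": "homozygous variant",
}
TWO_DIGIT = {
    ("0", "0"): "homozygous wild-type",
    ("0", "1"): "heterozygous",   # the heterozygous case can occur
    ("1", "0"): "heterozygous",   # in either arrangement
    ("1", "1"): "homozygous variant",
}

def classify_single(value, coding=SINGLE_DIGIT):
    """Classify a SNP coded as one digit in a single column."""
    value = value.strip()
    if value == "":
        return None  # empty data point, treated as missing
    return coding[value]

def classify_pair(first, second, coding=TWO_DIGIT):
    """Classify a SNP coded as two values in adjacent columns or rows."""
    first, second = first.strip(), second.strip()
    if first == "" or second == "":
        return None  # empty data point, treated as missing
    return coding[(first, second)]

def split_letters(genotype):
    """Split a single-column letter genotype such as 'CG' or 'C/G'."""
    alleles = genotype.strip().replace("/", "")
    if len(alleles) != 2:
        return None  # not a two-allele genotype
    return alleles[0], alleles[1]

print(classify_single("1"))
print(classify_pair("1", "0"))
print(split_letters("C/G"))
```

Passing a different dictionary as `coding` mimics entering different codes in the assistant, for example a file in which 2 rather than 1 marks the heterozygous case.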
Once all the necessary pieces of information have been provided, you will be able to press
the 'OK' button at the bottom of the window. HPlus will then display a progress bar as it
reads various elements of the data from the file. Once complete, the main window will display
a summary of the markers that were read, along with lists of the covariates and location variables
that can be used in subsequent analysis.
© 2003 Fred Hutchinson Cancer Research Center
Quantitative Genetic Epidemiology