HPlus Help


2.1. Importing a Data File

To import a data file, use the 'Import Data' option in the File menu. The data file must be in a tab-delimited text format (for example, exported from an Excel spreadsheet).
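Before importing, it can help to confirm that the file really is tab-delimited and that every row has the same number of fields. The following Python sketch is not part of HPlus; the function names and the sample content are assumptions made purely for illustration.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of rows (each a list of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader]


def column_counts(rows):
    """Return the distinct row lengths; a single value means a rectangular grid."""
    return {len(row) for row in rows}


# Hypothetical file content: a header row followed by two samples.
sample = "ID\tStatus\tSNP1\tSNP2\n1\t1\t0\t2\n2\t0\t1\t1\n"
rows = read_tab_delimited(sample)
counts = column_counts(rows)
```

If `counts` contains more than one value, some rows are shorter or longer than others, which usually points to a stray tab or a missing field in the export.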

You are first asked to select the file you wish to import. Once that is done, you are presented with the import assistant, which guides you through specifying where the various data items are located within the file so that it can be imported correctly.

To work through the import assistant, enter the relevant details in the lower frame and press the 'Forward' button to move on to the next section. You will need to enter the following pieces of information:

File Layout
This specifies whether the markers are in rows or columns. For example, choosing 'Column' here means that a single marker has its values coded into one or two columns, and that each sample is on a separate row. Choosing 'Row' here specifies that each marker is on a different row, and that the samples are the columns.
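The two layouts hold the same data, just transposed. As a rough illustration (not HPlus code; the function name and the sample grid are assumptions), a 'Row' layout can be turned into a 'Column' layout like this:

```python
def to_column_layout(rows):
    """Transpose a grid so that markers stored in rows become columns."""
    return [list(column) for column in zip(*rows)]


# Hypothetical 'Row' layout: each row is one marker, each later field a sample.
row_layout = [
    ["SNP1", "0", "1", "2"],
    ["SNP2", "2", "1", "0"],
]

# After transposing, each row is one sample and each column one marker.
column_layout = to_column_layout(row_layout)
```

Note that `zip` stops at the shortest row, so this sketch assumes a rectangular grid.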
Range of Genetic Markers
Here you specify which range of rows and columns contains the actual values for your markers. This range should exclude any covariate information, since covariates are defined later.
Genetic Marker Type
Here you specify whether your markers are biallelic (for example, SNP data) or multiallelic (for example, microsatellite data).
Marker Coding Format
If you are using biallelic data, you are asked to supply the coding method used to signify alleles. There are five possible values:
  1. Adjacent Columns of 0 or 1 - the SNP is coded as two values in consecutive columns. Each value is a present/absent or wild-type/variant indicator for that SNP. For example, 01 indicates heterozygous.
  2. Single column of 0, 1 or 2 - the SNP is coded as one digit that can take three values, representing homozygous wild-type, homozygous variant, or heterozygous.
  3. Adjacent rows of 0 or 1 - the SNP is coded as two values in consecutive rows. Each value is a present/absent or wild-type/variant indicator for that SNP.
  4. Adjacent columns of letters - the SNPs are represented by their actual base-pair values in two adjacent columns separated by a tab. For example, CG indicates a heterozygous genotype.
  5. Single column of letters - the SNPs are coded with their two base-pair values combined side-by-side in the same column. For example, CG or C/G.
Coding details
This is where you specify the coding used in your data file. For files with a single digit for each SNP, you can simply enter the value used for the Homozygous Wild-type, the Homozygous Variant and the Heterozygous cases. If you specified that the file uses adjacent rows or columns for the SNPs, you will need to enter two digits for each of these, and also both possible arrangements of the Heterozygous case, since this can occur in two ways (0 1 or 1 0, for example).
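To make the two-value coding concrete, the sketch below builds a lookup from the four pairs you would enter (wild-type, variant, and both heterozygous arrangements) and uses it to decode a pair of adjacent values. It is only an illustration of the idea; `make_decoder` and the example codes are hypothetical and do not describe how HPlus is implemented.

```python
def make_decoder(wild_type, variant, het_first, het_second):
    """Build a decoder from the four value pairs entered as coding details."""
    table = {
        wild_type: "homozygous wild-type",
        variant: "homozygous variant",
        het_first: "heterozygous",
        het_second: "heterozygous",
    }

    def decode(first, second):
        # Pairs not listed in the coding are returned as None (unrecognised).
        return table.get((first, second))

    return decode


# Example coding: 0 0 = wild-type, 1 1 = variant, 0 1 or 1 0 = heterozygous.
decode = make_decoder(("0", "0"), ("1", "1"), ("0", "1"), ("1", "0"))
genotypes = [decode("0", "0"), decode("0", "1"), decode("1", "0"), decode("1", "1")]
```

Both heterozygous arrangements map to the same genotype, which is why the assistant asks for each of them.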
Missing Data
If you are using multiallelic data, you will be asked to specify your coding for missing data points. HPlus will treat data points that have this coding, as well as any empty data points, as missing data for the purpose of analysis.
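The rule described above (a value is missing if it matches the missing-data code or is empty) can be sketched as follows. The code `-9` and the function name are assumptions chosen for the example, as is the choice to treat whitespace-only values as empty.

```python
def is_missing(value, missing_code):
    """Return True if a data point is empty or equals the missing-data code."""
    stripped = value.strip()  # assumption: whitespace-only counts as empty
    return stripped == "" or stripped == missing_code


# Hypothetical microsatellite allele sizes with "-9" as the missing-data code.
alleles = ["152", "-9", "", "148"]
flags = [is_missing(allele, "-9") for allele in alleles]
```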
Subject ID and Phenotype
This section asks for the column (or row, depending on your data orientation) in which your sample ID numbers are listed, as well as the one holding your case/control status. If neither of these is relevant to your data set, you can leave them blank. Note that if you don't enter case/control information, analysis of your data set will produce only the haplotype frequencies. If you have no sample IDs, HPlus will assign a consecutive ID number to each sample as it is loaded, so that the sample can be referred to later.
Covariate Columns/Rows
If your data set has no covariates, leave these boxes blank. Otherwise, fill in the starting column (or row, depending on the orientation of your data) of your covariates. Also enter the row (or column) in which the titles of the covariates are stored, so that they can be identified in the interface.
Marker Grouping
As with the covariates, you can enter the rows (or columns) in which the marker location information is held. This is information such as the gene or chromosome in which the marker resides, and it can be used in HPlus to segment the markers before analysis.

Once all the necessary pieces of information have been provided, you will be able to press the 'OK' button at the bottom of the window. HPlus will then display a progress bar as it reads various elements of the data from the file. Once complete, the main window will display a summary of the markers that were read, along with lists of the covariates and location variables that can be used in subsequent analysis.

© 2003 Fred Hutchinson Cancer Research Center
Quantitative Genetic Epidemiology