SEER*Stat Rate Exercise 5: Incidence Rates by County Attributes

County attribute data from the US Census such as median income or educational attainment are linked to SEER incidence, U.S. mortality, and population data by state-county FIPS codes.

Create a table showing incidence rates for the SEER 18 registries by high school education quintiles. Assign quintiles of counties so that 20% of the counties are in each group. To create the quintiles, use the results from Case Listing Exercise 2. The quintiles should be based on data from all U.S. counties. Results should be based on data for female malignant cervical cancer cases for the years 2007-2011 from the SEER 18 registries.

Key Points and Reminders

  • This exercise illustrates the use of SEER*Stat to view incidence rates by county attribute variables.
  • Using results from Case Listing Exercise 2, see how to determine the quintile cut points for counties by a county attribute. In Rate Exercise 6 you will produce U.S. mortality rates by the same quintiles. For consistency, we will use the same quintile cut points for both analyses.
  • Create a user-defined variable based on a county attribute variable available in the selected incidence database.

Step 1:  Create a Rate Session

  • Start SEER*Stat.
  • From the File menu select New > Rate Session or use the Rate button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • It is extremely important that you select the database as the first step in order to see the correct list of variables. In this problem, we need to select an incidence database with county attribute data.
  • On the Data Tab select "Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2013 Sub (2000-2011) <Katrina/Rita Population Adjustment>".
  • Make sure the Age Variable is set to "Age recode with <1 year olds".

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • In the Statistics box, select Rates (Age-Adjusted).
  • In the Parameters box:
    • Make sure that the Standard Population is set to "2000 US Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with <1 year olds".
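
As a rough illustration of what an age-adjusted rate is, the sketch below applies direct standardization: each age-specific rate is weighted by that age group's share of the standard population. The three age groups and all counts are hypothetical values chosen for easy arithmetic, not SEER data, and the function name is an assumption. The actual session uses the 19 age groups of the 2000 US standard population, and SEER*Stat performs this calculation for you.

```python
def age_adjusted_rate(cases, populations, standard_pop, per=100_000):
    """Directly standardized rate per `per` persons.

    cases, populations, standard_pop are parallel lists, one entry per age group.
    """
    total_std = sum(standard_pop)
    rate = 0.0
    for c, p, s in zip(cases, populations, standard_pop):
        weight = s / total_std        # share of the standard population
        rate += weight * (c / p)      # weighted age-specific rate
    return rate * per


# Hypothetical toy inputs (three age groups only, NOT real SEER data).
cases = [2, 10, 30]
populations = [100_000, 200_000, 100_000]
standard_pop = [500, 300, 200]

toy_rate = age_adjusted_rate(cases, populations, standard_pop)
print(round(toy_rate, 2))
```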

Step 4:  Define the Analysis Cohort (Selection Tab)

Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create your selection statement.

Make sure that the Malignant Behavior and the Cases in Research Database options are checked in the Select Only box. The Known Age option is always checked and disabled in rate sessions because all records must have values that are included in the US Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are excluded from the analysis.

For this problem you should create selection statements based on year of diagnosis, sex, and cancer site.

Make the following selections in the "Race, Sex, Year Dx, Registry, County (Pop, Case Files)" box:

{Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2007','2008','2009','2010','2011'
AND {Race, Sex, Year Dx, Registry, County.Sex} = ' Female'

Make the following selection in the "Other (Case Files)" box:

{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Cervix Uteri'
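
To make the logic of these statements concrete, the sketch below expresses the same three conditions as a filter over case records. The record layout and field names (year_dx, sex, site) are hypothetical and the sample records are invented for illustration; SEER*Stat applies the selection internally, and the Select Only checkboxes (such as Malignant Behavior) are not modeled here.

```python
# Years of diagnosis chosen in the selection statement.
YEARS = {"2007", "2008", "2009", "2010", "2011"}


def in_cohort(record):
    """Return True when a record satisfies all three selection conditions."""
    return (
        record["year_dx"] in YEARS
        # SEER*Stat value labels may carry leading spaces, so strip first.
        and record["sex"].strip() == "Female"
        and record["site"].strip() == "Cervix Uteri"
    )


# Invented sample records (hypothetical field names, not SEER data).
records = [
    {"year_dx": "2009", "sex": " Female", "site": " Cervix Uteri"},
    {"year_dx": "2005", "sex": " Female", "site": " Cervix Uteri"},
    {"year_dx": "2010", "sex": " Female", "site": " Breast"},
]

selected = [r for r in records if in_cohort(r)]
print(len(selected))
```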

Step 5:  Calculate Quintiles of Counties

Use the results from Case Listing Exercise 2 as a guide to calculate quintiles based on all US counties in 2011. In that exercise, we created a table showing the percentage of the population with less than a high school education by county. View the results in this SEER*Stat matrix file: key.case2.slm.

Since there are 3143 valid counties for 2011 (shown as the 3143 rows in the case listing matrix), to create 20% groupings we will assign 629 counties to three quintiles and 628 to two (3143/5 = 628 with a remainder of 3, so three quintiles each receive one extra county).

  • The 1st quintile will include counties 1 - 629 (includes extra county)
  • The 2nd quintile will include counties 630 - 1257
  • The 3rd quintile will include counties 1258 - 1886 (includes extra county)
  • The 4th quintile will include counties 1887 - 2514
  • The 5th quintile will include counties 2515 - 3143 (includes extra county)
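
The row ranges above can be reproduced with a few lines of arithmetic. The sketch below places each quintile's last row at i × 3143 / 5 rounded to the nearest whole number. That rounding rule is an assumption: the exercise does not say how the three extra counties were placed, but this rule happens to yield the same ranges. The function name is hypothetical.

```python
def quantile_rank_ranges(n, k=5):
    """Split ranks 1..n into k nearly equal groups.

    The last rank of group i is round(i * n / k), computed with integer
    arithmetic (round half up) to avoid floating-point surprises.
    """
    ends = [(2 * i * n + k) // (2 * k) for i in range(1, k + 1)]
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))


ranges = quantile_rank_ranges(3143)
for number, (start, end) in enumerate(ranges, start=1):
    size = end - start + 1
    print(f"Quintile {number}: counties {start}-{end} ({size} counties)")
```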

To determine quintile cut points for a user-defined variable based on the percentage of the county population with less than a high school education, use the case listing results. You will see that this matrix is sorted by percent with less than high school education. The rows are numbered on the left side to use as a guide.

  • The 1st quintile begins at (1) CO: Douglas County - 00304 (3.04%)
    and ends at (629) IL: Marshall County - 01504 (15.04%)
  • The 2nd quintile begins at 01504+1=01505 (15.05%)
    and ends at (1257) CO: Montezuma County (08083) - 01888 (18.88%)
  • The 3rd quintile begins at 01888+1=01889 (18.89%)
    and ends at (1886) AL: Houston County (01069) - 02347 (23.47%)
  • The 4th quintile begins at 02347+1=02348 (23.48%)
    and ends at (2514) MD: Somerset County (24039) - 03046 (30.46%)
  • The 5th quintile begins at 03046+1=03047 (30.47%)
    and ends at (3143) TX: Starr County - 06530 (65.30%)
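
The cut points follow a simple pattern: each quintile starts one code above the previous quintile's last value, and a five-digit code divided by 100 gives the percentage. The sketch below rebuilds the value ranges from the boundary codes listed above. The function and variable names are hypothetical; only the codes come from the case listing.

```python
# Codes read from the case listing: the first row, then the last row of
# each quintile (rows 629, 1257, 1886, 2514, and 3143).
FIRST_CODE = 304
LAST_CODES = [1504, 1888, 2347, 3046, 6530]


def code_ranges(first_code, last_codes):
    """Return (start, end) code pairs; each start is the previous end + 1."""
    ranges = []
    start = first_code
    for end in last_codes:
        ranges.append((start, end))
        start = end + 1
    return ranges


def as_percent(code):
    """Convert a code such as 1504 (shown as 01504) to 15.04 percent."""
    return code / 100


value_ranges = [f"{s:05d}-{e:05d}" for s, e in code_ranges(FIRST_CODE, LAST_CODES)]
for (s, e), text in zip(code_ranges(FIRST_CODE, LAST_CODES), value_ranges):
    print(f"{text}  ({as_percent(s):.2f}%-{as_percent(e):.2f}%)")
```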

Step 6:  Create a User-Defined Variable

Now use the information from Step 5 to create the user-defined variable:

  • Return to the rate session you created in step 1.
  • Open the Data Dictionary by clicking the Dictionary button on the toolbar.
  • Expand the "County Attributes 2000s" folder and highlight % < high school education 2000.
  • Click the Create... button.
  • In the Name field, edit the variable name to read, "Quintiles - % < hs education 2000".
  • Delete the existing groupings in the Groupings box on the left by selecting each grouping and clicking the Delete button.
  • In the box marked Unlabeled Values, enter each quintile's grouping in the Selected textbox as follows:
    • For the 1st quintile, type "00304-01504" in the textbox, and then click Add.
    • The grouping you entered will be added to the groupings box. Change its name to "First Quintile (3.04%-15.04%)".
    • Repeat these instructions for each Quintile with the following information:
      "Second Quintile (15.05%-18.88%)": values 01505-01888
      "Third Quintile (18.89%-23.47%)": values 01889-02347
      "Fourth Quintile (23.48%-30.46%)": values 02348-03046
      "Fifth Quintile (30.47%-65.30%)": values 03047-06530
    • Click OK to save the user-defined variable.
    • Click Close to exit the data dictionary.
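
Conceptually, the user-defined variable maps each county's coded value to one of the five groupings. The sketch below mimics that mapping with a sorted list of upper bounds and a binary search. It is only an illustration of the grouping logic, not a description of how SEER*Stat implements it; the function name is hypothetical, and values outside 00304-06530 are treated as ungrouped.

```python
from bisect import bisect_left

LOWEST_CODE = 304
# Upper bound (inclusive) of each quintile, in ascending order.
UPPER_BOUNDS = [1504, 1888, 2347, 3046, 6530]
LABELS = [
    "First Quintile (3.04%-15.04%)",
    "Second Quintile (15.05%-18.88%)",
    "Third Quintile (18.89%-23.47%)",
    "Fourth Quintile (23.48%-30.46%)",
    "Fifth Quintile (30.47%-65.30%)",
]


def quintile_for(code):
    """Return the grouping label for a coded value, or None if ungrouped."""
    if code < LOWEST_CODE or code > UPPER_BOUNDS[-1]:
        return None
    # bisect_left finds the first upper bound that is >= code.
    return LABELS[bisect_left(UPPER_BOUNDS, code)]


for code in (304, 1504, 1505, 2347, 6530):
    print(f"{code:05d} -> {quintile_for(code)}")
```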

Step 7:  Set Table Variables (Table Tab)

Use the Table Tab to choose variables to include in the output matrix. For this exercise, you want to show the incidence rate for malignant cervical cancers by county quintiles for the percentage of the population with less than a high school diploma.

  • On the Table Tab, the variables are listed in categories in the Available Variables box at the bottom of the screen.
  • Use the "+" to expand the "User-Defined" category.
  • Select the new variable, "Quintiles - % < hs education 2000".
  • Click Row on the right-hand side of the screen to add this variable to the row dimension in the list of Display Variables at the top of the window.

Step 8:  Specify a Title (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
    Age-Adjusted Incidence Rates for Female Cervical Cancer, SEER 18 Registries, 2007-2011
    By Quintiles (Based on Total U.S. Counties) of Percentage of Population (Ages 25+) without a High School Degree or Equivalent
    Rate Exercise 5

Step 9:  Execute SEER*Stat and Save the Matrix

  • At this point, you have made all the necessary selections on the session tabs. Use the Execute button or select Execute from the Session menu to execute the session.
  • A new window will be opened containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. You can print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
  • Use the Save As command on the File menu to save the matrix. Enter "Rate Exercise 5" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
  • Compare your results to this SEER*Stat matrix file: Rate Exercise 5 Matrix Results.