How much of a problem is school bullying in NYC? The answer depends on who you ask. Data Clinic volunteers compared local surveys (where many students say bullying is happening) with federal data (where a majority of schools report zero incidents), to analyze these disparities for the 2013-14 school year. To present this work, the Data Clinic hosted an event as part of NYC’s Open Data Week, featuring a presentation of the analysis and a panel discussion with researchers, advocates, and journalists to better understand this important student safety issue.
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
The State of Open Data on School Bullying
1. www.twosigma.com
The state of open data on school
bullying and harassment in NYC
November 28, 2018
Two Sigma Data Clinic
NYC Open Data Week
March 6th, 2018
4. Open data on school bullying and harassment in NYC
Navigating the landscape
4
CRDC
CCD
ED
ATS
OCR
DASA
NYC
DOE
VADIR
NCES
DBN
NYS
ED
BEDS
School
Survey
November 28, 2018
5. Open data on school bullying and harassment in NYC
Federal
5
CRDC
ED
OCR
• U.S. Department of Education
(ED)
• Office for Civil Rights (OCR)
• Civil Rights Data
Collection (CRDC)
95,507 schools nationwide
November 28, 2018
6. Open data on school bullying and harassment in NYC
Federal
6
Civil Rights Data Collection (CRDC)
2013-14 school year
Allegations of Harassment or Bullying
Students Disciplined for Harassment or Bullying
Students Reported as Harassed or Bullied
Race-, Sex-,
Disability-
based
November 28, 2018
7. Open data on school bullying and harassment in NYC
Local
7
NYC
DOE
• New York City
Department of Education
(NYCDOE)
• NYC School Survey
1,784 schools citywide
School
Survey
November 28, 2018
8. Open data on school bullying and harassment in NYC
Local
November 28, 2018
At my school students
harass or bully other
students.
At my school students
harass or bully each other
based on differences …
Students
None of
the time
All of
the time
Parents
At my child's school students
harass or bully other
students.
At my child's school students
harass or bully each other
based on differences …
Teachers
At my school, students are
often harassed or bullied in
school.
At my school, there are
conflicts based on
differences …
Strongly
Disagree
Strongly
Agree
Strongly
Disagree
Strongly
Agree+ Don’t
Know
NYC School Survey
2013-14 school year
8
10. Bringing federal and local together
November 28, 2018
DBN
NYC
DOE
School
Survey
Schools in the NYC School
Survey are identified by a
District Borough Number
(DBN) …
10
11. Bringing federal and local together
November 28, 2018
CRDC
ED
OCR
11
… while schools in the Office for Civil
Rights’ Civil Rights Data Collection are
identified by a 12-digit numeric ID
called the “combokey”
12. Bringing federal and local together: what we didn’t do
November 28, 2018
NYC School Survey
Office for Civil Rights
12
13. Bringing federal and local together: what we did
November 28, 2018
NYC
DOE
DBN BEDS
NYC School Survey
Office for Civil Rights
NYC Open Data Portal: School Locations
ATS
✔
✘
13
15. Bringing federal and local together: what we did
November 28, 2018
Local (DBN) State (BEDS) Federal (NCESSCH) Federal (COMBOKEY)
one school, many “unique” school IDs:
15
17. Open data on school bullying and harassment in NYC
Federal + Local
17
CRDC
CCD
ED
ATS
OCR
NYC
DOE
NCES
DBN
School
Survey
DASA
VADIRNYS
ED
BEDS
November 28, 2018
18. Open data on school bullying and harassment in NYC
Local
18
CRDC
CCD
ED
OCR
NCES
DASA
VADIRNYS
ED
ATS
NYC
DOE
DBN
School
Survey
BEDS
November 28, 2018
19. based on differences
NYC School Survey
2013-14 school year
November 28, 2018
At my school students
harass or bully other
students.
At my school students
harass or bully each other
based on differences …
Students Parents
At my child's school students
harass or bully other
students.
At my child's school students
harass or bully each other
based on differences …
Teachers
At my school, students are
often harassed or bullied in
school.
At my school, there are
conflicts based on
differences …
19
bullying/harassment
20. NYC School Survey
2013-14 school year
November 28, 2018
Students Parents Teachers
bullying/harassment
bullying/harassment
based on differences
None of
the time
All of
the time
Strongly
Disagree
Strongly
Agree
Strongly
Disagree
Strongly
Agree+ Don’t
Know
20
bullying/harassment
bullying/harassment
based on differences
bullying/harassment
bullying/harassment
based on differences
21. 21November 28, 2018
Strongly Disagree/
None of the time
Strongly Agree/
All of the time
35
Students bully/harass
each other
Students bully/harass
each other based on
differences
45 15 5
35 43 12 10
33 30 21 16
51 35 10 3
46 34 10 9
36 30 19 16
Parents’ “don’t know” responses not shown
23. NYC School Survey
2013-14 school year
November 28, 2018
Students Parents Teachers
bullying/harassment
bullying/harassment
based on differences
None of
the time
All of
the time
Strongly
Disagree
Strongly
Agree
Strongly
Disagree
Strongly
Agree+ Don’t
Know
23
bullying/harassment
bullying/harassment
based on differences
bullying/harassment
bullying/harassment
based on differences
✘1 4 1 4 1 4
24. NYC School Survey
Creating a common scale for comparing schools to each other
November 28, 2018 24
Average school “score” =
1.73
1 4321 432
43 46 6 5
25. NYC School Survey
Creating a common scale for comparing schools to the citywide average
November 28, 2018 25
1 2
Given the average school “score”
for each school…
Calculate the citywide school
“score”
Compare each school’s “score” to
the citywide average
We do this for both
bullying/harassment questions
- Parents
- Students
- Teachers
1 432
average school
“score”
1.73
average NYC
“score”
1.96
school “z-
score”
-0.72
29. Teachers and students appear to disagree
in the largest and smallest schools
29November 28, 2018
School enrollmentLower Higher
Teachers feel
more strongly
Students feel
more strongly
Feelings about the prevalence
of bullying/harassment
no
difference
30. Open data on school bullying and harassment in NYC
Federal
30
CRDC
ED
CCD
NCES
OCR
ATS
NYC
DOE
DBN
School
Survey
DASA
VADIRNYS
ED
BEDS
November 28, 2018
31. Open data on school bullying and harassment in NYC
Federal
31
Civil Rights Data Collection (CRDC)
2013-14 school year
Allegations of Harassment or Bullying
Students Disciplined for Harassment or Bullying
Students Reported as Harassed or Bullied
Race-, Sex-,
Disability-
based
November 28, 2018
32. Civil Rights Data Collection
Office for Civil Rights
2013-14 school year
November 28, 2018
• Sex-based bullying/harassment is the most
commonly alleged category, disability-based
is the least
• 75% of NYC schools report zero allegations
(of any type) of bullying/harassment to the
federal OCR. Nationally, this number is
80%.
32
one
zero
multiple
# of schools reporting ________
allegations of bulling/harassment
1314
268
158
0
at least 1
33. Characteristics of schools with …
33November 28, 2018
0 allegations
Total Enrollment 576
?
1+ allegations
Students with
disabilities (%)
English Language
Learners (%)
713
18 19
14 12
Additional characteristics not shown as there were no significant differences. *z-scores are shown for the
bullying/harassment due to differences question (results for responses the general question were similar)
Parent responses
z-scores*
-0.12 0.37
Student responses
z-scores*
-0.15 0.27
Teacher responses
z-scores*
-0.15 0.45
1314 schools (75%)
reporting 0 allegations
426 schools reporting
1+ allegations
34. each line is a school
How much do the two surveys (dis)agree?
34November 28, 2018
?
% of students who say bullying/
harassment happens all of the time in …
• Schools that reported at least 1 allegation
• Schools that reported 0 allegations
35. agreement
disagreement
Zones of (dis)agreement
35November 28, 2018
?
high perception of bullying/
harassment based on differences
low perception
at least one
allegation
zero
allegations
axis represents square-root of allegations
36. Which factors are associated with survey disagreement?
36November 28, 2018
?High Schools
vs. Elementary Schools
Parents Students Teachers
High Schools
vs. Middle Schools
High Schools
vs. Mixed-Grade Schools
General Education Schools
vs. Career/Technical/etc. Schools
Regular Schools
vs. Charter Schools
Total Enrollment
Female (%)
Black (%)
Hispanic (%)
Students with Disabilities (%)
English Language Learners (%)
Free Lunch (%)
High Schools
vs. Elementary Schools
High Schools
vs. Middle Schools
High Schools
vs. Mixed-Grade Schools
General Education Schools
vs. Career/Technical/etc. Schools
Regular Schools
vs. Charter Schools
Total Enrollment
Female (%)
Black (%)
Hispanic (%)
Students with Disabilities (%)
English Language Learners (%)
Free Lunch (%)
High Schools
vs. Elementary Schools
High Schools
vs. Middle Schools
High Schools
vs. Mixed-Grade Schools
General Education Schools
vs. Career/Technical/etc. Schools
Regular Schools
vs. Charter Schools
Total Enrollment
Female (%)
Black (%)
Hispanic (%)
Students with Disabilities (%)
English Language Learners (%)
Free Lunch (%)
not significantly associated
with OCR-NYC School
Survey (dis)agreement
associated with OCR-NYC
School Survey disagreement
37. We learned that …
November 28, 2018 37
• There are several types of school IDs, at the local-, state-,
and federal- levels:
→ DBN = “District Borough Number”
→ ATS = “Automate the Schools”
→ BEDS = “Basic Education Data System”
→ NCESSCH = “National Center of Education Statistics School ID”
• Teachers’ responses are less correlated with students’ and
parents’. This varies by school enrollment
• In aggregate, perceptions of school bullying/harassment
are related to the federal reporting of bullying/harassment
• But perceptions can vary dramatically, even in schools that
report zero allegations—which is the vast majority of
schools
• Bigger schools tend to have more disagreement between
local and federal school surveys
38. 38November 28, 2018
DataClinicInfo@twosigma.com
Rachael Weiss Riley & Christine Zhang
Dave Santin
Tiffany Chang
Chris Mulligan
special thanks to:
Ben Wellington (Two Sigma), Erin Stein (Overdeck Family Foundation),
Matt Chingos & Erica Blom (Urban Institute), Adrienne Schmoeker (NYC
MODA), Nick Chmura & Two Sigma Q4 2017 Hack Day participants
39. 39November 28, 2018
Julia Bloom-Weltman
Director of Research
Applied Engineering Management
Meghan McCormick
Research Associate
MDRC
Johanna Miller
Advocacy Director
New York Civil Liberties Union
Jasmine Soltani
Data Manager
Research Alliance for NYC Schools
Amy Zimmer
Journalist, Content Strategist
Localize.city
Notes de l'éditeur
placeholder
The Two Sigma Data Clinic harnesses the power of data and technology to help nonprofits have greater impact on the communities they serve. Our team of data scientists and engineers donate time and analytic expertise to help nonprofits understand and use their own data more effectively.
Section I: intro to the open data landscape, challenges we faced in navigating it, etc.
These may seem like alphabet soup/blobs of text/unconnected nodes, but by the end of this talk you’ll see what they mean
https://surveys.nces.ed.gov/crdc/UserAccount/Login?ReturnUrl=%2Fcrdc%2F
The purpose of the U.S. Department of Education (ED) Civil Rights Data Collection (CRDC) is to obtain data related to the nation's public school districts and elementary and secondary schools’ obligation to provide equal educational opportunity. To fulfill this goal, the CRDC collects a variety of information, including student enrollment and educational programs and services data that are disaggregated by race/ethnicity, sex, limited English proficiency, and disability. The CRDC is a longstanding and important aspect of ED’s Office for Civil Rights overall strategy for administering and enforcing the civil rights statutes for which it is responsible. This information is also used by other ED offices as well as policymakers and researchers outside of ED.
The CRDC is a mandatory data collection, authorized under the statutes and regulations implementing Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and under the Department of Education Organization Act (20 U.S.C. § 3413). The regulations implementing these provisions can be found at 34 CFR 100.6(b); 34 CFR 106.71; and 34 CFR 104.61.
Students in grades 6-12 answer the survey
“differences (such as race, color, ethnicity, national origin, citizenship/immigration status, religion, gender, gender identity, gender expression, sexual orientation, disability or weight).”
Code modified from https://rosettacode.org/wiki/Levenshtein_distance#Python
Other “PS 290” schools in NYC: PS 290 Manhattan New School, PS 290 Juan Morel Campus
Our focus in this presentation.
Begin Section II: some descriptive stats, methodology
Students in grades 6-12 answer the survey
“differences (such as race, color, ethnicity, national origin, citizenship/immigration status, religion, gender, gender identity, gender expression, sexual orientation, disability or weight).”
Students in grades 6-12 answer the survey
“differences (such as race, color, ethnicity, national origin, citizenship/immigration status, religion, gender, gender identity, gender expression, sexual orientation, disability or weight).”
In general there is a fairly high, positive correlation between all the questions
(darker = more)
A variable is correlated with itself
Students more correlated with students, etc. (blue boxes)
Within the surveyed population the two questions are most correlated – e.g., students (dark blue box)
And between students and parents we see some pretty strong correlations (box to left of dark blue box)
But correlations are lower between students and teachers as well as between parents and teachers (bottom boxes)
Students in grades 6-12 answer the survey
“differences (such as race, color, ethnicity, national origin, citizenship/immigration status, religion, gender, gender identity, gender expression, sexual orientation, disability or weight).”
Example is Bronx High School of Science.
We do this for students, teachers, parents for each question
Example is Bronx High School of Science.
Standardized comparison
We do this for students, teachers, parents for each question
If there is perfect agreement, it would be a horizontal line
Scatter notebook
https://surveys.nces.ed.gov/crdc/UserAccount/Login?ReturnUrl=%2Fcrdc%2F
The purpose of the U.S. Department of Education (ED) Civil Rights Data Collection (CRDC) is to obtain data related to the nation's public school districts and elementary and secondary schools’ obligation to provide equal educational opportunity. To fulfill this goal, the CRDC collects a variety of information, including student enrollment and educational programs and services data that are disaggregated by race/ethnicity, sex, limited English proficiency, and disability. The CRDC is a longstanding and important aspect of ED’s Office for Civil Rights overall strategy for administering and enforcing the civil rights statutes for which it is responsible. This information is also used by other ED offices as well as policymakers and researchers outside of ED.
The CRDC is a mandatory data collection, authorized under the statutes and regulations implementing Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and under the Department of Education Organization Act (20 U.S.C. § 3413). The regulations implementing these provisions can be found at 34 CFR 100.6(b); 34 CFR 106.71; and 34 CFR 104.61.
The top 10 schools form ~30% of the total sex-based allegations
Binary classification
Ocr descriptives notebook
Also descriptive stuff notebook
Scatter notebook
Pull out the 2 examples of outliers
For example
- 75% of schools have 0 allegations; 25% have 1 or more allegations
- So, we take a look at the (bottom) 75th & (top) 25th percentile of scores for parents/students/teachers
- For instance, students on the harass/bully based on differences question
What is the threshold/cutoff z-score of the bottom 75% of schools? 0.70614: 75% of schools are below this #, and 25% are above.
Then, we look at each school’s z-score. If it’s lower than 0.7061 we say it’s 0 (“low” bullying/harassment) and if it’s higher than 0.7061 we say it’s 1 (“high” bullying/harassment)
When allegations are 0 and the school is 0 (low-low) then this is agreement
Same for when allegations are 1+ and the school is 1 (high-high)
Otherwise, it’s disagreement
Scatter notebook
First show a list of all variables
Then show everything (parents, etc.)
Then show grey color change + arrows
Highlight enrollment
Among all three logistic models … enrollment matters