
Data Science: The Discipline, Methods, and Preparation
Rachel Levy
41:46
Please feel free to make comments and ask questions here.
Rachel Levy
45:18
Thanks! It is great to hear more from the data scientists here. Please keep sharing.
Rachel Levy
45:47
And of course everyone is welcome to share and ask questions.
Rachel Levy
45:59
Yes, you are muted.
Rachel Levy
47:22
Can you say more about what you mean by "extracted"?
Kerry Luse
47:42
Hi Rachel—I’m only seeing your messages. Are there comments coming in on the chat from others?
Rachel Levy
48:02
To people typing: please toggle the chat setting to share with "All panelists and attendees".
Kerry Luse
48:12
Thank you!
Rachel Levy
48:34
Some people are only sharing with panelists, which is the default. Thank you, Kevin H(Y) Shu, for pointing this out.
Rachel Levy
49:31
From Stephen Meskin to All Panelists (04:23 PM): Actuaries have been doing this stuff for over 100 years. It looks like new bottles for old wine.
Rachel Levy
49:46
From Dominic Klyve to All Panelists (04:24 PM): FWIW, TF-IDF looks for words (strings) that are used disproportionately often in a document.
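To make Dominic's description concrete, here is a minimal sketch of one common TF-IDF weighting, assuming whitespace tokenization, non-empty documents, tf = term count / document length, and idf = log(N / number of documents containing the term). Libraries such as scikit-learn use smoothed variants, so their numbers will differ; the function name `tf_idf` is purely illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (assumes non-empty docs)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # High score = frequent in this document but rare across the collection.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


scores = tf_idf(["the cat sat", "the dog sat", "the cat purred"])
print(scores[2])
```

In the third document, "the" appears in every document and so scores 0, while "purred" (unique to that document) outscores "cat" (shared with one other document). That is the "disproportionately often" idea.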
Rachel Levy
50:03
From Mark Otto to All Panelists (04:27 PM): Many of these learning methods make the model more like a black box. How important is it to know how the results were extracted?
Mark Otto
51:17
With random forests or other learning methods, you may find that certain patterns matter, but many times it seems impossible to describe what the pattern is, just that it is important: certain nodes relate to certain inputs, but there is nothing as clear as a regression.
J.D. Walker
53:13
Okay, this is all about what data scientists do. Is there any part of this that touches on what kind of database support they need from database engineers like me?
Rachel Levy
54:14
Hi J.D. Walker, this talk is very much an introduction to DS. You raise an interesting point. Want to share some thoughts here in the chat about database support?
Rachel Levy
55:55
Here's a question for you: Has the role of database engineers changed in the last 10-15 years? And if so, how?
J.D. Walker
57:16
The role of a database engineer has definitely changed.
J.D. Walker
57:40
It used to be that there were DBAs and database developers, but that line has blurred, as has the line between database developers and application developers.
J.D. Walker
58:15
Database support has also changed with the rise of NoSQL-type databases.
Rachel Levy
58:16
DBA = database administrator?
J.D. Walker
58:18
Yes.
J.D. Walker
58:48
Many times, the DBA running a NoSQL database doesn't have database administration as their primary focus, and they're also helping with application development.
Jason Douma
59:27
That's consistent with what we've been hearing from many of our graduates. CS majors go out as "application developers" but find themselves spending a lot of time working with and developing databases.
Rachel Levy
59:45
Thanks - I appreciate you sharing a bit about this significant set of relevant roles.
Rachel Levy
01:01:44
Jason, that's interesting. Do you know what additional expertise students might want or need because they will end up working with databases or with people who work with databases?
J.D. Walker
01:02:17
Yes, and unfortunately, application developers who turn to developing relational databases wind up not being good at it, and database administrators/developers aren't as good at application development. Both need to understand each other's roles, and a little bit about each other's jobs as well.
J.D. Walker
01:02:48
Based on the lecture so far, what I see is an application (or usage) of the data. That, to me, is what data science is...
Rachel Levy
01:03:14
This information is SO IMPORTANT for people considering what data science courses to offer.
Jason Douma
01:03:31
I agree!
J.D. Walker
01:04:53
I agree too. This is vitally important.
KEVIN H(Y) SHU
01:05:52
@J.D. Walker, do you know much about the experiences of those who have moved from DA/DS positions up the hierarchy of needs into engineering?
Rachel Levy
01:06:46
Anuradha, let's ask this at the end. Thank you for the question.
J.D. Walker
01:07:07
I do not. My firm is training a lot of data scientists right now, but not many people are being trained in data infrastructure.
Rachel Levy
01:07:33
Kaggle competition.
Rachel Levy
01:08:08
(Also a good way to get experience and do a project to talk about in an interview)
Jason Douma
01:08:15
As a non-expert, I am both fascinated and utterly mystified by neural networks. I can see how important they have become for data science and machine learning, but my efforts to learn about them always seem to end with "and then something magical happens in the middle." Is there a good "Neural Networks for Dummies" tutorial I could consult to better understand the basic mechanisms?
Super
01:11:15
@Jason Douma, search for 3Blue1Brown's YouTube playlist on neural networks.
Jason Douma
01:11:27
Thanks!
bugs bunny
01:11:36
I recommend this book: Neural Network Design (2nd ed.) by Hagan, Demuth, Beale, and De Jesus.
Michael Collins
01:11:41
3Blue1Brown +1
J.D. Walker
01:12:16
The magical thing that happens in the middle is infrastructure and support from both application developers and DBAs...
Bill McClung
01:13:00
For undergrad CS degrees, what fraction of courses should be CS, Math, Statistics? Also, what are the expectations of employers with regard to this division?
Bill McClung
01:13:11
Undergrad DS degrees.
J.D. Walker
01:13:15
And it isn't just relational databases, because the data that data scientists work with isn't always relational.
Aashish Nair
01:13:31
Are there any sources where you can practice data collection?
J.D. Walker
01:14:27
It's about structured vs unstructured data.
Michael Collins
01:14:47
As a practicing data scientist, I spend 80% of a data science project in data prep. Efficient querying of databases is key.
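One common reading of "efficient querying" is pushing filtering and aggregation into the database instead of pulling every row into the analysis code. The sketch below is only an illustration of that idea, assuming a hypothetical in-memory SQLite table `sales(region, amount)` with made-up rows; both functions return the same totals, but the SQL version lets the database do the grouping and sends back one row per region.

```python
import sqlite3


def build_demo_db():
    """Create a small hypothetical sales table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5), ("north", 1.0)],
    )
    return conn


def totals_in_python(conn):
    """Pull every row to the client and aggregate there (transfers all rows)."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_sql(conn):
    """Let the database group and sum (transfers one row per region)."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)


conn = build_demo_db()
print(totals_in_sql(conn))
```

On five rows the difference is invisible; the design choice only pays off when the table is large and the network or memory is the bottleneck.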
Bill McClung
01:14:54
Also, for the CS component of undergrad DS degrees, should it be tool-specific, e.g., R or Python's pandas library?
J.D. Walker
01:14:58
It was called Big Data because people thought a NoSQL structure was necessary to handle a large volume of data. It isn't, but that's what was thought.
KEVIN H(Y) SHU
01:15:21
@J.D. Walker, given the multitude of NoSQL databases, how does one (A) determine which one(s) to focus on and (B) find use cases for the databases selected?
J.D. Walker
01:15:28
As a software and database engineer, I would caution against concentrating deeply on any one programming language; they come and go.
J.D. Walker
01:15:39
I like wine!
David Ruth
01:16:21
Can you comment on the pros and cons of offering a DS major at the undergraduate level (possibly lots of breadth with less depth) vs. DS tracks in Stats, OR, CS, or other?
David Ruth
01:17:11
My institution is working through this choice currently and would benefit from insight from Prof. Morrell and others.
Jason Douma
01:17:12
I second @David Ruth's question. We are having the same debate at my institution (and possibly in the CUPM).
J.D. Walker
01:17:37
@Kevin, it depends on the structure of the data. Different NoSQL-type databases store data using different architectures.
J.D. Walker
01:18:09
I'd say that if there's an undergrad course, it should take three or four of the most popular NoSQL databases and study the data architectures they use rather than the database systems themselves.
Mark Otto
01:21:18
It seems like data science starts with the data and figures out what patterns you can find, and there is a lot of data processing to get there. Statistics starts with the problem and figures out what data are needed and how to collect them, so in statistics we know a lot more about what generated the data and how useful it is to generalize beyond the data. Is this a good distinction?
Emily Cercone
01:22:01
Hi, Rachel said a link was shared. Was it shared with everyone? Can it be sent again? Thanks!
J.D. Walker
01:22:21
So basically, I'm hearing that you need to know how to analyze the data (statistics), how to access the data (database management system architecture), and how to store the data (data architecture).
Aira Domingo
01:23:04
Is there a resource that allows data scientists to find the latest tools used or news in the field?
KEVIN H(Y) SHU
01:24:55
@Aira Domingo, datasciencecentral.com regularly publishes articles that are both overviews and practical how-to guides
Renzo Castagnino
01:25:55
https://www.datacamp.com/
Renzo Castagnino
01:26:06
This is a great resource for learning data science online.
Xi Zhang
01:26:06
Can clustering be used to find the correlation between variables so that it can be applied to feature reduction?
Cristina Giraldo
01:27:39
Could PCA be used with both supervised and unsupervised algorithms?
J.D. Walker
01:27:56
It depends on whether you want a job in the commercial and/or government sector, or you want to work in academia
KEVIN H(Y) SHU
01:28:57
@XZ No, as I understand it, clustering is intended to identify groups; there are methods to visualize the correlation matrix of the variables used in an analysis.
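Following Kevin's suggestion to look at the correlation matrix rather than clustering, here is a minimal sketch that computes pairwise Pearson correlations and flags highly correlated pairs, a common first step before dropping one variable from each redundant pair. The column names, the data, and the 0.9 threshold are hypothetical, and the code assumes equal-length columns with nonzero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither column is constant


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] = correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def highly_correlated_pairs(matrix, threshold=0.9):
    """Pairs whose absolute correlation meets the (arbitrary) threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]


height_cm = [150, 160, 170, 180]
columns = {
    "height_cm": height_cm,
    "height_in": [h / 2.54 for h in height_cm],  # same information, different units
    "noise": [3, 1, 4, 1],
}
matrix = correlation_matrix(columns)
print(highly_correlated_pairs(matrix))
```

Here `height_cm` and `height_in` carry the same information, so they are flagged as a redundant pair, while `noise` (correlation about -0.26 with height) is not.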
Xi Zhang
01:30:21
Thanks Kevin.
Jason Douma
01:34:01
Thank you, Chris and Rachel! This was a good presentation and an engaging discussion.
Aira Domingo
01:37:40
Thank you so much!
Renzo Castagnino
01:37:49
thanks for the presentation
Cristina Giraldo
01:37:52
Many thanks!
David Ruth
01:37:54
Thanks Chris and Rachel!
Xi Zhang
01:38:02
Thank you!
J.D. Walker
01:38:12
Thank you, Chris and Rachel! Also, thank you for ending before 6.
Gaofeng Huang
01:38:13
Thank you