Data Science: The Discipline, Methods, and Preparation
Please feel free to make comments and ask questions here.
Thanks! It is great to hear more from the data scientists here. Please keep sharing.
And of course everyone is welcome to share and ask questions.
Yes, you are muted.
Can you say more about what you mean by extracted?
Hi Rachel—I’m only seeing your messages. Are there comments coming in on the chat from others?
To people typing - please toggle to share to "All panelists and attendees"
Some people are only sharing with panelists, which is the default. Thank you Kevin H(y) Shu for pointing this out.
From Stephen Meskin to All Panelists: 04:23 PM Actuaries have been doing this stuff for over 100 years. It looks like new bottles for old wine.
From Dominic Klyve to All Panelists: 04:24 PM FWIW, TF-IDF looks for words (strings) that are used disproportionately often in a document.
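To make the TF-IDF comment concrete, here is a minimal sketch in plain Python. It assumes the common textbook form (term frequency times the log of inverse document frequency); the function name `tf_idf` and the toy corpus are made up for illustration, and real libraries typically add smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / number of words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, invented for this example.
docs = [
    "the actuary priced the policy",
    "the forest model fit the data",
    "the data pipeline loads the data",
]
scores = tf_idf(docs)
```

A word such as "the" appears in every document, so its idf is log(1) = 0 and it scores zero everywhere, while "actuary" appears in only one document and scores above zero there. That is the "disproportionately often" idea.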
From Mark Otto to All Panelists: 04:27 PM Many of these learning methods mean it is more like a black box. How important is it to know how the results were extracted?
With random forests or other learning methods you may find that certain patterns exist, but many times it seems impossible to describe what the pattern is, just that it is important: certain nodes relate to certain inputs, but nothing is as clear as in regression.
Okay, this is all about what data scientists do. Is there any part of this that touches on what kind of database support they need from database engineers like myself?
Hi JD Walker -- This talk is very much an introduction to DS. You raise an interesting point. Want to share some thoughts here in the chat about database support?
Here's a question for you: has the role of database engineers changed in the last 10-15 years? and if so, how?
The role of a database engineer has definitely changed.
it used to be that there were dbas and database developers, but that line has blurred, as well as the line between database developers and application developers
Database support has also changed with the rise of NoSQL type databases....
dba = database administrator?
Many times, the dba running a NoSQL database doesn't have database administration as their primary focus... and they're also helping with application development
That's consistent with what we've been hearing from many of our graduates. CS majors go out as "application developers" but find themselves spending a lot of time working with and developing databases.
Thanks - I appreciate you sharing a bit about this significant set of relevant roles.
Jason - that's interesting. Do you know what additional expertise students might want/need because they will end up working with databases or people working with databases?
Yes, and unfortunately, application developers who turn to developing relational databases wind up not being good at it. Database administrators/developers aren't as good at application development. Both need to understand each other's role, and a little bit about the job as well.
Based on the lecture so far, what I see is an application (or usage) of the data. That to me is what data science is......
This information is SO IMPORTANT for people considering what data science courses to offer.
I too agree. This is vitally important....
KEVIN H(Y) SHU
@J.D. Walker, do you have much knowledge of the experiences of those who have moved from DA / DS positions up the hierarchy of needs into Engineering?
Anuradha - let's ask this at the end. Thank you for the question.
I do not. My firm is training a lot of data scientists right now. Not a lot of data infrastructure being trained
(Also a good way to get experience and do a project to talk about in an interview)
As a non-expert, I am both fascinated and utterly mystified by neural networks. I can see how important they have become for data science and machine learning, but my efforts to learn about them always seem to end with "and then something magical happens in the middle." Is there a good "Neural Networks for Dummies" tutorial I could consult to better understand the basic mechanisms?
@"Jason Douma" Search up 3Blue1Brown's YouTube playlist on it
I recommend this book: Neural Network Design (2nd ed.) by Hagan, Demuth, Beale, and De Jesus
The magical thing that happens in the middle is infrastructure and support from both application developers and dbas...
For undergrad CS degrees, what fraction of courses should be CS, Math, Statistics? Also, what are the expectations of employers with regard to this division?
Undergrad DS degrees.
and it isn't just relational databases, because data that data scientists work with isn't always relational
Are there any sources where you can practice data collection?
It's about structured vs unstructured data.
As a practicing data scientist, I spend 80% of a data science project in data prep. Efficient querying of databases is key.
Also, for the CS component of undergrad DS degrees, should it be "specific": e.g. R or Python's pandas module?
it was called Big Data because it was thought it was necessary to have a NoSQL structure to handle a large volume of data.. it isn't, but that's what was thought
KEVIN H(Y) SHU
@J.D. Walker given the multitude of NoSQL databases, how does one A) determine which one(s) to focus on B) find use cases for the databases selected?
As a software and database engineer, I would caution against concentrating deeply on any one programming language....they come and go
I like wine!
Can you comment on the pros and cons of offering a DS major at the undergraduate level (possibly lots of breadth with less depth) vs. DS tracks in Stats, OR, CS, or other?
My institution is working through this choice currently and would benefit from insight from Prof. Morrell and others.
I second @David Ruth's question. We are having the same debate at my institution (and possibly in the CUPM).
@Kevin, it depends on the structure of the data. Different NoSQL type databases store the data in different architectures
I'd say if there's an undergrad course, it should take three or four of the most popular NoSQL databases, and study the data architecture they use rather than the database systems themselves
Seems like data science starts with the data and figuring out what patterns you can find, and there is a lot of data processing to get to that. Statistics starts with the problem and figures out what data is needed and how to collect it. We know a lot more about what generated the data and how useful it is to generalize outside the data. Is this a good distinction?
Hi, Rachel said there was a link shared? Was this shared with all? Can it be sent again? Thanks!
So basically, I'm hearing that you need to know how to analyze data (statistics), how to access the data (database management system architecture) and how to store the data (data architecture)
Is there a resource that allows data scientists to find the latest tools used or news in the field?
KEVIN H(Y) SHU
@Aira Domingo, datasciencecentral.com regularly publishes articles that are both overviews and practical how-to guides
this is a great resource to learn data science online
Can clustering be used to find the correlation between variables, so that it can be applied to feature reduction?
Could PCA be used for supervised and unsupervised algorithms?
It depends on whether you want a job in the commercial and/or government sector, or you want to work in academia
KEVIN H(Y) SHU
@XZ no, by my understanding, clustering is intended to identify groups; there are methods to visualize the correlation matrix between the variables used in an analysis
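For anyone who wants to see the correlation-matrix route in code, here is a rough sketch in plain Python. It is not clustering: it computes Pearson correlations between columns and flags pairs that are nearly redundant, which is one simple starting point for feature reduction. The helper names, the 0.95 threshold, and the toy data are all assumptions made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


def redundant_pairs(columns, threshold=0.95):
    """Column pairs whose absolute correlation meets the (assumed) threshold."""
    matrix = correlation_matrix(columns)
    names = list(columns)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(matrix[a][b]) >= threshold]


# Hypothetical toy data: the same heights in two units, plus an unrelated column.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [9.0, 7.0, 10.0, 8.0, 9.5],
}
pairs = redundant_pairs(data)
```

Here the two height columns carry almost the same information, so one of them could be dropped; which one to keep, and what threshold to use, is a judgment call for the analyst.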
Thank you, Chris and Rachel! This was a good presentation and an engaging discussion.
Thank you so much!
thanks for the presentation
Thanks Chris and Rachel!
Thank you, Chris and Rachel! Also, thank you for ending before 6