
Old news, new research: observations from the field

Presented by Debora Cheney, Larry and Ellen Foster Communications Librarian and Head, News and Microforms, Penn State University Library

Transcript

0:15
itinerary on plastic
0:18
well, I work closely with the journalism,
0:21
media studies,
0:22
advertising
0:23
matrix
0:25
Penn State
0:26
I guess over part of the recent years I have continued to head the News and Microforms Library
0:31
newspaper my heart
0:34
University Libraries
0:35
has learned that a lot of
0:37
liaison librarian
0:40
and i'm robert
0:41
i might be done today
0:43
I notice a lot of
0:44
differences in the kinds of reference questions I get about it
0:49
what kinds of
0:52
things researchers in the back of the money
0:55
uh... their research projects
0:57
as i've been slow the economy since that was my not trying to learn
1:01
more about
1:02
how users unused used cars
1:07
and also i'm thinking more about how when our role is in our best seller i
1:11
guess that's how i might come to be here
1:14
on twenty two days of the things i want a
1:18
child places caters please contact
1:21
it's not hard to say when a friend that when i say in his comments
1:24
we we no longer
1:26
explain a bit more here and you know this already weakened i want to stop by
1:29
these papers
1:31
we were talking about going digital
1:34
and i want to respond to those issues
1:36
friendlier than today's research that kind of research that's being done
1:41
and then I will talk about some text mining
1:44
I want to show you several examples, very simple examples, of the new research
1:49
methods, text mining specifically, and how they're being used to present these
1:53
constant estimate my focus
1:55
environmental and provide some perspective on twilight grazing
1:59
policy to continue to provide access to yesterday's and today's news
2:04
content for
2:06
so i am
2:10
just generally and i a m i ask you to suspend any beliefs or understand
2:16
hanging out of how or whether students are getting a jury streamlined process
2:22
realizes that worked in this for this time
2:25
discussing
2:27
ladder people reading newspapers or news are watching and waiting
2:32
is this is a very much again for a problem
2:34
and question
2:35
from where they were going to be
2:37
trees are being screenings research
2:39
unless you are related i can say that they're not seventy different kinds of
2:44
problems in terms of vibrational
2:46
so i'd like to do it is time to talk about news
2:51
skaters in these times
2:53
and the times and research now
2:54
in which you wouldn't expect for the use of its content
2:57
regardless of other people in in the news in a
3:02
so i don't
3:05
this is a draft that night
3:07
uses because
3:09
it shows um...
3:10
there's a best-selling information consumption and i know that
3:14
enjoyable
3:15
uh... day-to-day basis i would still like to use this coming at us from every
3:19
direction that kind of drowning in newsweek had
3:22
and it's hard to say that they had to tell us about selling are sometimes it's
3:25
coming
3:26
on our cell phones and it's coming on our computers, it's coming in our reading
3:31
homer and a lot of responsibilities everywhere
3:34
I think it seems like it's everywhere, but I think if you look at it more
3:39
carefully you will see that a lot of this is the same news over and over
3:43
and then we need to think carefully about what constitutes news content
3:47
so
3:48
what i want to talk about is the changes that are
3:51
treaties because of changes that are internet users
3:54
uh... are now going to ultimately affect what kinds of research methods
3:59
reusable economies expresses ass
4:02
so i'll have to do this has always been tried and decided to modify an unusual
4:07
is that i think it's you have different ideas too
4:10
this kind of idea that weather impacting research lab
4:14
so first of all
4:16
this is a very amazing opportunity
4:18
that newspapers are
4:19
it sounds papers in march these content
4:22
and not only is it he's content that is not big news is now
4:26
can be defined quite differently and we are going to department needs it
4:31
region definable augustine's on if you have a journalist in congress has since
4:36
the siege
4:37
the students as we did we do regularly they almost always talking about how
4:41
they're also being asked, and required as part of their jobs as journalists, to blog
4:46
and head and what that means to them as a professional work well
4:49
digestion l_a_ there's the bra said us citizens talent you are not an art
4:54
aristide's people in general response to those plots or or blouses hanging out
4:59
the people at home in time
5:01
so these are the coming aboard and forms of millions honestly think about how
5:06
news is created, so with Facebook,
5:09
uh... Twitter, all coming from all different sources
5:13
and all the citizen journalists; journalism these days is becoming more and more
5:18
uh... apartments content as they see it
5:22
so the element
5:24
the other thing we had the same thing to realize is that most of this time
5:27
ten eleven pm proform
5:29
and muscling provides your students and faculty researchers
5:33
it's its news content that every citizen needs for n analysis
5:38
tends to be in the database aggregators like LexisNexis, but most of the
5:42
web content that we have just described isn't in that format
5:46
and there isn't any natural
5:48
pathway
5:49
which makes it available to our students and researchers
5:53
and in addition
5:54
more and more of its content has much more even more so than in the past
5:58
visual content
6:00
so it has many more photos
6:01
video and uh... the advertising industry with all this is just on how wealth
6:08
visual content that doesn't come in as text
6:11
elements not impacted us not texting their printed marxism
6:17
well i was going to be more website
6:20
news aggregators
6:21
and the reason that's important is when we come back to the later part of the talk
6:25
about the Huffington Post as an aggregator
6:29
we talk about sources for where information can be found
6:33
and what that means is that a lot of the sources are not the original creators
6:37
of the news; we have to think about credibility quite differently because
6:41
where do you get my visa where researchers maintain his
6:44
is not necessarily going to be from the original creator
6:51
in addition
6:51
another impact is in effect all this is the paper i wasn't going to go out there
6:55
you go higher and higher passing through
6:58
so destination
7:00
not only our delegate
7:02
to provide natives
7:04
and text you are
7:05
researchers and students but it's also going to affect their ability to do that
7:10
the perennial question again is do you have any access, like, to the
7:14
New York Times website
7:17
as you know, we can't do that
7:19
and that affects how much we can text mine
7:25
the arabic well i think that i am we need to begin to realize that this is
7:30
very high rate of licensing restrictions are also going to impact what web
7:35
crawlers and robots can go through
7:37
pamela mentioned that they're asking
7:39
uh... uh...
7:41
kanamycin is a sizzling
7:44
combo i think probably website take images if you ask a number of news
7:48
organizations that need to find out so those are going to also face
7:52
that information
7:54
and I mean we need to realize that online databases and aggregators are
7:59
increasingly serious azhar archive
8:02
from ABC rehearsal ely is like President Obama's
8:06
reducing investigative pieces that subtraction
8:13
i do love then just talk a little bit about the newspaper and to reinforce the
8:17
concept of the newspaper ads and i don't
8:20
something to study and research
8:22
i wanted to add you basically
8:25
common elements that will go thru rumor actually talking about going digital
8:28
content
8:30
so this is plain: the newspaper has always fascinated people
8:34
people older than past
8:35
newspaper what it does what its role is
8:39
researchers have studied this whole israel independence will see here
8:44
on how it changed
8:45
com people's understanding of political culture of its they're interested in
8:49
typeface they're interested in page layout and how that has changed
8:53
when USA Today came out, people were asking about the difference in
8:57
the look and feel of the newspaper, things like that
9:00
so the whole is catered
9:02
whole if he has done a study
9:05
in many different ways to be having purchased questions
9:09
newspaper people are also fascinated by it
9:12
what the impact of the newspapers political classes yet economics on stock
9:17
market prices
9:18
so there's also that
9:19
instead
9:20
there's also interest in parts of the newspaper, like editorials; I don't
9:25
know if you realize that, in general, there's been a Pulitzer for
9:29
editorial writing
9:30
why is that at that'll be a study not sure at some point
9:34
but uh... used to be in a way that were involved in crimes editorial writing
9:40
obituaries are
9:41
obviously in other countries newspapers you might have wedding announcements or
9:46
our wedding invitations are things like that it would be is common here
9:52
divided about content analysis in terms of what's going on at George Mason
9:57
and content analysis and for any of his story is a very much uh... an ongoing
10:02
interest to you
10:03
researchers
10:04
have a story spun one way versus another, how does that then influence
10:08
public opinion and so on
10:11
research is in all the disciplines; I did some quick searches in a number of
10:15
the disciplinary databases and I found research in engineering,
10:20
the arts and humanities and social science disciplines,
10:24
uh... in medicine
10:25
remains newspapers and all those dissidents of the the newspapers study
10:29
by a lot of different sources
10:31
i haven't mentioned also that
10:33
unconditional itself though
10:38
so let's not talk about what to do sometimes
10:42
uh...
10:44
we need to realize that some of it
10:46
coaches are questions and that is that what we're trying to do our journey
10:50
born-digital content
10:53
we do need to realize that the wealth
10:57
of text
10:58
data
10:59
on the internet, born-digital content, is enormous; some researchers
11:04
have found estimated estimated i guess you'd have to say that
11:08
uh... we announce generating
11:11
continent in one day in and content
11:15
that wasn't even beginning civilizations to test three
11:20
gymnasts
11:21
uh... content
11:22
information on the web
11:25
of which means content is the piece is that
11:28
and he said that we think that is spends all of the com
11:33
social media
11:34
a lot of these are things that we're creating ourselves, not just a freelancer
11:38
sees these organizations that we would not believe him
11:43
text mining news content seems like an obvious thing; what else
11:46
could we do
11:47
and it's not surprising that
11:49
business week on them
11:51
excuse me time magazines businessweek section this is the cost
11:56
it's nineteen minutes
11:58
so let's see if we can look at some of these things and receive everything
12:03
projects
12:04
that i have abu
12:07
so you're still ahead on this one
12:10
vishal here for you
12:11
uh... how do i emphasize it exciting
12:14
is false
12:15
technical as
12:17
mathematical formulas online statements articles on mathematical formulas
12:22
on this is an interesting project so that they are not the issue interesting
12:28
uh... but there are many other uses of text mining
12:31
products
12:36
was one she created at the University of California San Diego; she's a
12:41
professor, has a number of uh... classes, and she's looking at data
12:46
and its impact on culture and society
12:49
and he has statements on food
12:52
take a minute here
12:54
but she was interested in taking all the front pages now this comes from
12:58
over six thousand front pages of the newspaper
13:02
that are on the Library of Congress Chronicling America website
13:05
this is an area where visualizing the front pages
13:09
hearkens back to that idea of layout: before, you had to
13:13
look at it on microfilm
13:15
you can actually see very quickly
13:17
how
13:18
in that time period from eighteen ninety three to nineteen twelve this changed
13:22
over time
13:23
it becomes more visual, but notice that the layout, headlines and so
13:28
on, the banner
13:29
masthead are staying in the same place
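Visualizing thousands of front pages side by side makes layout changes visible at a glance. One way to put a number on the same comparison is to average how much "ink" falls in each region of the page, year by year. This is a minimal sketch, assuming each page image has already been converted to a grid of grayscale values (0 = black, 255 = white); the functions `grid_darkness` and `average_by_year` and the tiny synthetic pages are hypothetical, not the method behind the visualization shown in the talk.

```python
# Minimal sketch: summarize page layout as mean "ink" per grid cell, averaged by year.
# Assumption: each page is a list of pixel rows with grayscale values 0-255 (255 = white).


def grid_darkness(page, rows=2, cols=2):
    """Return a rows x cols grid of mean darkness (0.0 = white, 1.0 = black)."""
    height, width = len(page), len(page[0])
    grid = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row_values = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [page[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row_values.append(1 - sum(cell) / (255 * len(cell)))
        grid.append(row_values)
    return grid


def average_by_year(pages_by_year, rows=2, cols=2):
    """Average the darkness grids of all pages published in the same year."""
    summary = {}
    for year, pages in sorted(pages_by_year.items()):
        grids = [grid_darkness(page, rows, cols) for page in pages]
        summary[year] = [
            [sum(g[r][c] for g in grids) / len(grids) for c in range(cols)]
            for r in range(rows)
        ]
    return summary


# Synthetic 4x4 "pages" (not real scans): a dark band across the top, and a blank page.
masthead_page = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
blank_page = [[255] * 4 for _ in range(4)]

if __name__ == "__main__":
    print(average_by_year({1893: [masthead_page, blank_page], 1912: [masthead_page]}))
```

A region that stays dark across years, like the top band in the sample, is what a masthead holding its position would look like; regions whose values shift would point to layout changes.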
13:34
came here haha ready seen how many of you know that needs to have
14:11
uh... hello
14:12
you know that mister
14:13
unquestionable
14:15
well let's not get stopped taking hambazaza
14:19
actually I use this a lot in classes because I think it's a great visualization
14:25
now where are you
14:26
this one, instead of taking the front pages that were digitized by the Library of
14:30
Congress, takes a totally different approach
14:33
and they're taking their data probabilities
14:36
athletic abilities i don't know
14:38
located in the release very often but is arranged by the top stories at the top
14:42
in several of these shows little chaotic very chaotic
14:45
here sometime range physicians writes that the category sports center
14:50
sober victimization of those problems
14:53
uh...
14:54
categories; they've applied an algorithm to it that, you
14:59
know, is based on how many stories from all the sources that are in
15:03
Google News it evaluates, and so that then allows you to see what's the
15:09
biggest story so
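The story map described here sizes each story by how much coverage it gets across the sources in a news aggregator, grouped into categories such as sports. Below is a minimal sketch of that ranking step, assuming articles have already been grouped into stories. The record fields (`category`, `story_id`, `source`), the sample data, and `rank_stories` are hypothetical, and this is not the actual algorithm behind the site shown in the talk.

```python
from collections import defaultdict

# Hypothetical input: one record per article found in a news aggregator,
# already assigned to a story cluster and a category.
ARTICLES = [
    {"category": "World", "story_id": "markets", "source": "Paper A"},
    {"category": "World", "story_id": "markets", "source": "Paper B"},
    {"category": "World", "story_id": "markets", "source": "Paper C"},
    {"category": "World", "story_id": "summit", "source": "Paper A"},
    {"category": "Sports", "story_id": "final", "source": "Paper B"},
    {"category": "Sports", "story_id": "final", "source": "Paper D"},
]


def rank_stories(records):
    """Return {category: [(story_id, n_sources), ...]}, most-covered story first.

    A story's weight is the number of distinct sources covering it, the value a
    treemap-style display could use to size each tile.
    """
    sources_per_story = defaultdict(set)
    category_of = {}
    for record in records:
        sources_per_story[record["story_id"]].add(record["source"])
        category_of[record["story_id"]] = record["category"]

    ranked = defaultdict(list)
    for story, sources in sources_per_story.items():
        ranked[category_of[story]].append((story, len(sources)))
    for stories in ranked.values():
        # Largest coverage first; ties broken alphabetically by story id.
        stories.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked)


if __name__ == "__main__":
    for category, stories in rank_stories(ARTICLES).items():
        print(category, stories)
```

Counting distinct sources rather than raw articles keeps one outlet that files many updates from inflating a story's size.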
15:11
that example in my slideshows yesterday about this time and today ever since
15:16
he's not as a non-story about their problem
15:19
and today's about global markets is not saying you can actually if you want to
15:23
see
15:24
this is taking the whole of the world as a whole
15:27
this is taking a safe you want to look at you can almost always find a uh...
15:32
but also on thursday was the top story and you can
15:35
so you can't allow students to you hang yourself the house stories
15:40
the way n coverage stories that's quite different angle of the stories because
15:44
they created
15:46
mindspace top stories
15:48
that they pull out the governments of the local news at the news aggregators
15:52
being here
15:53
but this uh... time story this upcoming on
15:57
testimony wanted about
16:00
masud
16:21
engine
16:37
that's this one which i'd like a rematch is also maybe some of you see this as
16:43
well
16:44
on instead of very interesting
16:47
yet another student project is
16:49
Stanford University
16:52
and this center, but make sure you know it's correct because I think it's important
16:56
in so i i said yeah that is focused on the American West and what they're
17:00
trying to do is create visual stories
17:04
uh... that about maidstone
17:06
visually on that which is that going on
17:11
and maybe leave it there using visual approaches in order to try and get more
17:15
attention to
17:16
the poverty issues related to the West
17:19
this particular project
17:21
was focused on uh... was that joint project with a journalism professor
17:26
visiting scholar
17:28
and at night he lifted his students tom
17:31
developed this time
17:33
do we know that if you're interested
17:37
it's based on the Library of Congress Chronicling America website, if you know
17:41
that
17:42
uh... but they don't use the actual newspaper pages that are on
17:46
that website; they use another piece of that website, which is the directory,
17:50
the directory of newspapers
17:52
all published in the
17:53
United States, which is great
17:55
for us in helping students and faculty understand newspapers
18:00
and so they can take the directory
18:04
and
18:05
cut it is so that you can actually watch the development of the first
18:10
newspaper in seventeen oh six
18:13
she knew that if you jump on here
18:15
how westward
18:17
out of the press
18:19
eighteen twenty-eight
18:22
eighteen eighty six you begin to see a very different picture
18:27
and this is just
18:28
the size of the circle depicts
18:31
just how many newspapers
18:35
and if you take this further you actually see newspapers begin
18:39
it is very quickly resistivity
18:42
to grow
18:43
and recede
18:44
now that allows students and researchers to get a bigger idea of how the media
18:51
com affected orange
18:53
lanston is bigger
18:55
uh... probably interesting isn't it a you know there are two kinds of highways
18:59
customers
19:01
consistently races
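The newspaper-directory map works by turning directory entries (title, place, years of publication) into circles whose size reflects how many newspapers a place had at a given moment. Below is a rough sketch of that counting step, assuming each record carries a place plus first and last publication years. `DIRECTORY` holds invented sample records, and `active_counts` is a hypothetical helper, not the Stanford project's code or the Library of Congress directory's data format.

```python
from collections import Counter

# Invented sample records: (title, place, first_year, last_year).
# last_year is None when the title is still being published.
DIRECTORY = [
    ("Title A", "Boston", 1704, 1776),
    ("Title B", "Boston", 1795, None),
    ("Title C", "New York", 1835, 1924),
    ("Title D", "St. Louis", 1850, 1880),
    ("Title E", "San Francisco", 1865, None),
]


def active_counts(records, year):
    """Count how many titles were in print in each place during `year`.

    On a map, each count would set the size of that place's circle.
    """
    counts = Counter()
    for _title, place, first_year, last_year in records:
        if first_year <= year and (last_year is None or year <= last_year):
            counts[place] += 1
    return counts


if __name__ == "__main__":
    # Snapshots at a few years, like stepping through the timeline.
    for year in (1760, 1828, 1886):
        print(year, dict(active_counts(DIRECTORY, year)))
```

Because titles both start and stop, a place's circle can shrink as well as grow between snapshots, which is how the map can show newspapers spreading and then thinning out.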
19:06
so arm
19:10
this is a mess
19:11
hi different
19:14
instead of being in special events princess
19:17
it's more like some of the ones we decided on the air
19:20
and this is taking a protocols for dumb asking that he has a plan is for
19:25
mandarin
19:27
but this time
19:28
researchers it is egypt this a_t_c_ which is a magazine
19:32
ms magazine in staying
19:35
an article that was about political corruption,
19:38
political parties
19:40
I think you'll find it very interesting; he took all the words of that
19:44
particular story and ran them through a program
19:51
notice, I mean, there are several different clusters here
19:54
the one I want to talk a little bit about is this cluster here, which is related to
19:59
friends
20:00
and relationships, and those words seemed to have
20:03
intersections with political corruption,
20:06
political parties
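The word-cluster picture comes from taking every word in one article and grouping words that tend to appear together. One simple way to do this is sketched below: count which words share a sentence, keep pairs that co-occur at least `min_count` times, and treat connected groups of words as clusters. This is a stand-in technique under stated assumptions (sentence-level co-occurrence, a small hypothetical stopword list, an invented sample text, a hypothetical `cooccurrence_clusters` function); the researchers in the talk may well have used a different method.

```python
import re
from collections import Counter
from itertools import combinations

# Small hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "was", "with"}


def cooccurrence_clusters(text, min_count=2):
    """Group words that share a sentence at least `min_count` times.

    Returns a list of clusters, each a sorted list of words.
    """
    pair_counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted({w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS})
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1

    # Undirected graph: an edge joins two words that co-occur often enough.
    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    # Each connected component of the graph is one cluster (iterative depth-first search).
    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Invented sample text, not the article discussed in the talk.
SAMPLE_TEXT = (
    "Party leaders rewarded friends with contracts. "
    "Friends of party leaders won contracts. "
    "Voters blamed corruption. Voters protested corruption."
)

if __name__ == "__main__":
    print(cooccurrence_clusters(SAMPLE_TEXT))
```

Raising `min_count` keeps only the strongest associations and splits the text into smaller, tighter clusters.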
20:08
so out
20:09
this argument yes inspected
20:11
of newspapers themselves in this environment nowadays we take house warm
20:17
take the setup
20:18
probes to their stories they would be able to map related blogs
20:22
words being used in these blogs
20:25
and allow people to see what CNN's blog is all about
20:28
house stories really is in its relationships to the following scenario
20:32
informal networks and this would be a more useful way for his papers places us
20:37
at the end
20:37
that he's media
20:40
that's what you might also see
20:42
uh... this is a great story, this is uh... Mubarak's
20:46
shutting down of the internet
20:50
that seems to get in a row
20:51
uh... testing in our country
20:54
on cost at down to see clearly closed it down
20:58
but each line is an individual, and what this researcher was able to see is
21:03
that there are some people who were actually still able to tweet, excuse me, I
21:07
think I said blogs
21:08
he took a specific aspect of the tweets
21:15
and sees how
21:16
you can see where it shut down and when it opens up very clearly
21:19
you can see how tweeting by individuals in other countries
21:24
increases as a result of this; this shows you that
21:27
the internet didn't really actually shut down
21:32
so it's a way of taking the tweets
21:34
and and and we know in some cases that tweets
21:37
are some of our first,
21:40
our first news and it's about time
21:43
about the world, and so they're just as much something to be studied
21:47
by researchers
21:50
as the traditional forms
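The Twitter analysis described above comes down to counting posts over time and looking for the hours where traffic disappears and then returns. Below is a minimal sketch, assuming posts are already available as Python `datetime` timestamps; `hourly_volume`, `quiet_hours`, and the sample timestamps are hypothetical and synthetic, not the researcher's data or code.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_volume(timestamps):
    """Count posts per hour by truncating each timestamp to the start of its hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def quiet_hours(volume, start, end, threshold=0):
    """Return the hour buckets from start to end (inclusive) with at most `threshold` posts."""
    hours = []
    current = start
    while current <= end:
        if volume.get(current, 0) <= threshold:
            hours.append(current)
        current += timedelta(hours=1)
    return hours


# Synthetic timestamps (not real data): traffic, two silent hours, then recovery.
SAMPLE = [
    datetime(2000, 1, 1, 10, 5),
    datetime(2000, 1, 1, 10, 40),
    datetime(2000, 1, 1, 11, 15),
    datetime(2000, 1, 1, 14, 20),
]

if __name__ == "__main__":
    volume = hourly_volume(SAMPLE)
    print(quiet_hours(volume, datetime(2000, 1, 1, 10), datetime(2000, 1, 1, 14)))
```

Setting `threshold` above zero flags hours that drop sharply without going fully silent, which would correspond to the handful of people who were still able to tweet.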
21:52
now this one is a project I became aware of
21:58
working on its own rob had said it had to say
22:02
and the political science at her
22:05
miss uh... abdul set has been
22:08
researchers muslims of the site
22:10
skyline located teams around the world
22:13
have been working for about twenty, almost twenty years
22:17
to try and develop a dataset that documents uh... every
22:22
militarized dispute that has taken place, going back to the beginning of
22:27
time
22:28
now on
22:29
that it's more and more complicated, and so they use a lot of sources to try to
22:34
document where a militarized dispute started, when it ended, and so
22:39
on
22:40
so they had this huge database
22:42
and uh... you know, they use LexisNexis to keep the database
22:47
up to date, constantly
22:49
searching LexisNexis on the keywords that they've come up with
22:54
that everybody designing your descendants recently because the results
22:58
of those uh... scan searches the david jones contributors campaign had been at
23:04
the lapd levels that they don't find the kind of content that they want
23:09
are said this
23:10
map is based on a piece of the mysterious companies lot
23:15
and that's not a salary was terrorist incidents and there is anything but you
23:20
know
23:20
geographic coordinates for each one of them so that you can actually chart
23:24
as you can see
23:26
over time where the most militarized incidents had taken place
23:31
all over time
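The dispute project, as described, keeps its database current with keyword searches of news articles and then attaches geographic coordinates to each incident so incidents can be mapped over time. The two steps are sketched below on plain-text records. This does not call LexisNexis or any real search API; the keyword set, the sample articles and incidents, and both function names are hypothetical, and binning by decade and whole degrees is an arbitrary choice for a coarse map.

```python
import re
from collections import Counter

# Hypothetical search terms; the project's real keyword list is not given in the talk.
KEYWORDS = {"troops", "border", "clash", "shelling"}


def candidate_articles(articles, keywords=KEYWORDS):
    """Keep articles whose text contains at least one keyword (case-insensitive).

    Keyword hits are only candidates; a person still has to read and code them.
    """
    hits = []
    for article in articles:
        words = set(re.findall(r"[a-z]+", article["text"].lower()))
        if words & keywords:
            hits.append(article)
    return hits


def incidents_by_decade(incidents):
    """Count coded incidents per (decade, rounded latitude/longitude) cell for mapping."""
    counts = Counter()
    for incident in incidents:
        decade = incident["year"] // 10 * 10
        cell = (round(incident["lat"]), round(incident["lon"]))
        counts[(decade, cell)] += 1
    return counts


# Synthetic examples, not real dispute data.
SAMPLE_ARTICLES = [
    {"id": 1, "text": "Troops massed at the border on Monday."},
    {"id": 2, "text": "Election results were announced."},
    {"id": 3, "text": "Reports of shelling near the river."},
]
INCIDENTS = [
    {"year": 1994, "lat": 33.2, "lon": 35.6},
    {"year": 1998, "lat": 33.4, "lon": 36.2},
    {"year": 2003, "lat": 33.1, "lon": 35.9},
]

if __name__ == "__main__":
    print([a["id"] for a in candidate_articles(SAMPLE_ARTICLES)])
    print(incidents_by_decade(INCIDENTS))
```

The keyword step trades precision for recall: broad terms catch more disputes but also more irrelevant articles, which is why the candidates still need human review.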
23:32
so arm
23:34
there's two points for this one is that you can take geographical data use
23:38
newspapers to find the beginning reporting
23:41
beginning in the hands begins analysis is a triumph politician is glad to see
23:46
these letters
23:48
the difficulty
23:48
dependence on databases like LexisNexis; in fact, LexisNexis
23:52
is changing in its own way
23:54
and also change but they are able to fly
23:58
so those are examples of text mining news content to try to take some interviews
24:04
tweets,
24:06
commercial databases of writing different sources
24:10
and as I prepared this presentation I came across this report that just came out
24:14
in la jolla a couple of weeks ago
24:17
from the UK, and it's really talking about the value and
24:21
benefits of text mining, and even though it comes out of the UK, I
24:25
thought it made a nice summary of what the issues are relating to text mining in
24:30
general, and of course we can apply those qualities
24:33
tijuana
24:35
using text mining with news content
24:37
the issues are going to be made in these kinds of
24:40
will be openly available
24:41
and one of the reasons I emphasize the paywalls, and also the copyright
24:45
restrictions are going up
24:47
is that more and more news content except
24:50
treats possibly and watch
24:53
dozens of things
24:55
is a warning to remember
24:57
news organizations are increasingly not going to be open access
25:02
they do make the point is to protect
25:04
how im
25:06
text mining does fit into the scholarly communications model
25:09
it is something that our researchers will be doing
25:12
it's going to be a scholarly methodology, and
25:16
universities and i think i praise him
25:19
by the same kinds of support
25:21
repositories are going to be uh... necessary to store and retain large
25:26
masses of data
25:27
we heard that just makes me think that we must
25:30
of content
25:31
but where's that going to be stored
25:33
with other references to another people other organizations news organizations
25:38
see preservation in the same way that we do
25:40
and maybe we don't want to, but somebody needs to be preserving
25:45
without talks back copyright licensing issues situation challenges
25:50
beginning to plan; in this report, text mining is very broad, cross-disciplinary
25:54
nearly every discipline has some interesting use for text mining
26:00
i would say that the role of the aid they also make the point that is the
26:03
role of the academy in furthering this research
26:08
so some final thoughts that i have
26:10
in the media attention
26:13
um...
26:21
farm subsidies
26:22
enhances the summit spacers in laos
26:25
most researchers would prefer to go to digital content
26:30
online sisters on mon
26:32
i do have some faculty
26:35
very much want to look at it on microfilm that they're doing in
26:37
particular
26:38
kind of research so that's why they're interested
26:41
or the concert
26:42
that they're looking at assad enterprise
26:45
we are safe
26:46
tape uh... create new forms of research based on these
26:50
content with digital content
26:53
but we also see that we see
26:55
continuing patterns of looking at page four looking at content analysis of some
26:59
of these themes
27:01
will continue types of research will continue the minority digital content
27:06
and we see in the event read this again the digital content and text mining at
27:11
whatever will attract cross-disciplinary
27:14
on religious grounds will be interested in this kind of of opportunity
27:19
haha
27:20
i have it
27:20
media history as well, as I said, but I don't see text mining; I see more and
27:24
more references to international comparisons, of international
27:29
or other countries' media, so the ABC
27:32
amenities magazine or
27:34
paris is about to see her in the U.S. embassy; I think that digital content
27:39
makes it easier for people to do
27:42
comparison
27:42
zorich study obvious from other countries
27:47
why make the point that
27:49
we really should not
27:52
about this is just on record that we do have to be aware that there is still a
27:56
few slots
27:58
wisconsin
28:00
uh... that not all titles are available in this type of digital form
28:05
in the United States alone,
28:07
a hundred forty thousand newspapers were published; not all have been published
28:12
in an aggregator
28:14
or digitized, uh... on the Chronicling America
28:17
website
28:18
so that we really have just a fraction of the total
28:22
historical content
28:25
from nineteen twenty three to nineteen eighty-five that's my mother's i cringe
28:29
when I get a question about that
28:30
time period because i have to say i'm sorry that's you know that's that
28:35
the barnum there's nothing rarely instead
28:38
you know unusual situations electrical if they're open
28:43
for the gap; at eighty-five, that's about when major newspapers begin
28:47
to be in online databases and
28:50
begins there
28:51
some libraries will have some microfilm in-house, or commercial online
28:55
database access
28:57
and then drop back down again to bring about called two thousand about the
29:01
beginning of this digital
29:03
relatives of these kinds of
29:06
in which some of that is online, but most of it is not
29:09
in any digital form
29:12
so we have to think about black people have researchers and five or six years
29:17
after them
29:19
so argue i just yet
29:22
the U.S.
29:23
writing researchers need anything about digital content
29:27
thingy access to a variety of sources stratum uh...
29:32
you'd never guess what next researchers have which newspaper resource and asking
29:38
for
29:38
so we need a dot blots while
29:41
content from a wide range of countries
29:44
available in digital format
29:46
ongoing development of an old form so we need there are still people studying
29:52
the newspaper
29:53
uh... hastily
29:54
newspaper
29:55
but there are increasingly students who want to study blogs and want to
29:59
study websites; they want to study
30:02
uh... those kinds of things that we always forms
30:06
buck classified functionality in terms of finding and searching and looking at
30:10
the content
30:11
so one researcher with the very same newspaper wants to turn the pages
30:16
looking carefully
30:17
and study the content what they're looking for cancer is for
30:22
the next researcher doesn't care anything about that page format, wants to
30:26
search
30:27
earlier researchers need to be a comment
30:29
whatever
30:30
create we create
30:33
megan is is that the reliability and quality of information
30:38
so that you researchers need to know
30:41
whether their search in a LexisNexis database or on a website or a
30:45
search engine is the same content from year to
30:49
year
30:50
with uh... ms content changing cements lately
30:54
that is becoming increasingly in question
30:56
and they need to know that the quality
30:59
above the reproduction d_l_c_ are even all that that is
31:03
part of what they're searching
31:05
ease of access you all know that researchers
31:08
need to be able to get to things easily and quickly
31:11
compared to other students had to do it
31:13
and they need good search engines that allow for a very
31:17
sophisticated style relationships of words
31:21
but also the ability to search specific sections
31:24
of the content: I want to search advertisements, or just photographs,
31:29
the wire services survey
31:31
i want to search classified ads those are very important functionality
31:36
must restrictions
31:38
so ob
31:40
and I just want to say I hope that as we look at these contexts
31:44
message's content
31:46
we will simply say we're moving forward but we also have to look
31:50
at a wide range of content and resources
31:54
and continue to know that our researchers
31:57
will need all of this content
31:59
today to our institute

About this File

Length: 32:06