ScienceDaily: Latest Science News
Breaking science news and articles on global warming, extrasolar planets, stem cells, bird flu, autism, nanotechnology, dinosaurs, evolution -- the latest discoveries in astronomy, anthropology, biology, chemistry, climate and environment, computers, engineering, health and medicine, math, physics, psychology, technology, and more -- from the world's leading universities and research organizations.
Forget the needle; consider the haystack
http://feeds.sciencedaily.com/~r/sciencedaily/~3/PG7lj_nDTyI/131202121546.htm
Dec 2nd 2013, 17:15
Dec. 2, 2013 — Advances in computer storage have created collections of data so huge that researchers often have trouble uncovering critical patterns in connections among individual items, making it difficult for them to realize fully the power of computing as a research tool.
Now, computer scientists at Princeton University have developed a method that offers a solution to this data overload. Using a mathematical method that calculates the likelihood of a pattern repeating throughout a subset of data, the researchers have dramatically cut the time needed to find patterns in large collections of information such as social networks. The tool allows researchers to quickly identify the connections between seemingly disparate groups, such as theoretical physicists who study intermolecular forces and astrophysicists researching black holes.
"The data we are interested in are graphs of networks like friends on Facebook or lists of academic citations," said David Blei, an associate professor of computer science and co-author on the research, which was published Sept. 3 in the Proceedings of the National Academy of Sciences. "These are vast data sets and we want to apply sophisticated statistical models to them in order to understand various patterns."
Finding patterns in the connections among points of data can be critical for many applications. For example, checking citations to scientific papers can provide insights into the development of new fields of study or show overlap between different academic disciplines. Links between patents can map out groups that indicate new technological developments. And analysis of social networks can provide information about communities and allow predictions of future interests.
"The goal is to detect overlapping communities," Blei said. "The problem is that these data collections have gotten so big that the algorithms cannot solve the problem in a reasonable amount of time."
Currently, Blei said, many algorithms uncover hidden patterns by analyzing potential interactions between every pair of nodes (either connected or unconnected) in the entire data set; that becomes impractical for large amounts of data such as the collected citations of the U.S. Patent Office. Many are also limited to sorting data into single groups.
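To see why checking every pair of nodes becomes impractical, it helps to count the pairs. The short Python sketch below uses only the standard formula n(n - 1)/2 for unordered pairs; the function name `pair_count` is illustrative and not taken from the research. At 3.7 million nodes, the size of the patent collection cited in this article, the count comes to roughly 6.8 trillion pairs.

```python
def pair_count(num_nodes):
    """Number of distinct unordered node pairs: n * (n - 1) / 2."""
    return num_nodes * (num_nodes - 1) // 2


# Print the pair count for a few network sizes to show the quadratic growth.
for n in (1_000, 1_000_000, 3_700_000):
    print(f"{n:>9,} nodes -> {pair_count(n):,} pairs")
```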
"In most cases, nodes belong to multiple groups," said Prem Gopalan, a doctoral student in Blei's research group and lead author of the paper. "We want to be able to reflect that."
The research was supported by the Office of Naval Research, the National Science Foundation and the Alfred P. Sloan Foundation.
In very basic terms, the researchers approached the problem by dividing the analysis into two broad tasks. In one, they created an algorithm that quickly analyzes a subset of a large database. The algorithm calculates the likelihood that nodes belong to various groups in the database. In the second broad task, the researchers created an adjustable matrix that accepts the analysis of the subset and assigns "weights" to each data point reflecting the likelihood that it belongs to different groups.
Blei and Gopalan designed the sampling algorithm to refine its accuracy as it samples more subsets. At the same time, the continual input from the sampling to the weighted matrix refines the accuracy of the overall analysis.
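The article does not give the researchers' actual algorithm, so the Python sketch below is only a toy illustration of the general shape described above: analyze a small random subset, then fold the result into a table of per-node weights that sharpens as more subsets arrive. It makes a strong simplifying assumption that is not part of the research: every neighbor already carries a known group label (the hypothetical `group_of` mapping). All names, including `refine_weights`, are invented for this example.

```python
import random


def refine_weights(edges, group_of, num_groups,
                   num_rounds=500, batch_size=4, seed=0):
    """Toy sketch (not the authors' method): for each node, estimate the share
    of its links that lead into each group, looking only at a small random
    batch of edges per round and keeping a running average."""
    rng = random.Random(seed)
    weights = {}   # node -> list of group weights (one row of the weight table)
    updates = {}   # node -> how many times that row has been updated
    for _ in range(num_rounds):
        batch = rng.sample(edges, min(batch_size, len(edges)))
        for u, v in batch:
            for node, neighbor in ((u, v), (v, u)):
                row = weights.setdefault(node, [1.0 / num_groups] * num_groups)
                updates[node] = updates.get(node, 0) + 1
                step = 1.0 / updates[node]     # shrinking step = running average
                g = group_of[neighbor]         # ASSUMED known label of the neighbor
                for k in range(num_groups):
                    target = 1.0 if k == g else 0.0
                    row[k] += step * (target - row[k])
    return weights


# Hypothetical network: three physicists (group 0), three astrophysicists
# (group 1), and one "bridge" researcher linked to both camps.
edges = [("b", "p1"), ("b", "p2"), ("b", "a1"), ("b", "a2"),
         ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
         ("a1", "a2"), ("a2", "a3"), ("a1", "a3")]
group_of = {"p1": 0, "p2": 0, "p3": 0, "a1": 1, "a2": 1, "a3": 1, "b": 0}

weights = refine_weights(edges, group_of, num_groups=2)
print("bridge node weights:", [round(w, 2) for w in weights["b"]])
print("p3 weights:", weights["p3"])
```

In this toy setup the bridge node ends up with weights split roughly evenly between the two groups, while a node linked only to physicists keeps all of its weight on group 0. That mirrors the idea of a node belonging to more than one community, though a real model would have to infer the groups rather than read them from labels.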
The math behind the work is complex. Essentially, the researchers used a technique called stochastic optimization, which is a method to determine a central pattern from a group of data that seem chaotic or, as mathematicians call it, "noisy." Blei likens it to finding your way from New York to Los Angeles by stopping random people and asking for directions -- if you ask enough people, you will eventually find your way. The key is to know what question to ask and how to interpret the answers.
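A minimal Python sketch can make the idea of stochastic optimization concrete. It is not the procedure from the PNAS paper; it is the textbook case of stochastic gradient descent with a step size of 1/t, used here to recover the center of a noisy data set by looking at one randomly chosen point per step. The data, the function name `noisy_average_descent` and the parameter values are all assumptions made for illustration.

```python
import random


def noisy_average_descent(data, num_steps=20000, seed=0):
    """Find the center of `data` by stochastic gradient descent on the
    squared distance, using one randomly drawn point per step."""
    rng = random.Random(seed)
    estimate = 0.0
    for t in range(1, num_steps + 1):
        sample = rng.choice(data)            # ask one "random person"
        noisy_gradient = estimate - sample   # gradient of 0.5 * (estimate - sample)**2
        estimate -= noisy_gradient / t       # step size shrinks as 1 / t
    return estimate


# Hypothetical noisy measurements scattered around a central value of 5.0.
rng = random.Random(1)
data = [5.0 + rng.gauss(0.0, 2.0) for _ in range(1000)]

true_mean = sum(data) / len(data)
estimate = noisy_average_descent(data)
print("full-data mean:", round(true_mean, 3))
print("stochastic estimate:", round(estimate, 3))
```

Each single step is unreliable, because one sample can point the wrong way. The estimate still settles near the full-data mean because the noisy steps are correct on average and the step size shrinks over time.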
"With noisy measurements, you can still make good progress by doing it many times as long as the average gives you the correct result," he said.
In their PNAS article, the researchers describe how they used their method to discover patterns in the connections between patents. Using public data from the U.S. National Bureau of Economic Research, Gopalan and Blei analyzed connections to the 1976 patent "Process for producing porous products."
The patent, filed by Robert W. Gore (who several years earlier discovered the process that led to the creation of the waterproof fabric Gore-Tex), described a method for producing porous material from tetrafluoroethylene polymers. The researchers analyzed a data collection of 3.7 million nodes and found that connections between Gore's 1976 filing and other patents formed 39 distinct communities in the database.
The patent "has influenced the design of many everyday materials such as waterproof laminate, adhesives, printed circuit boards, insulated conductors, dental floss and strings of musical instruments," the researchers wrote.
In the past, researchers struggled to find nuggets of critical information in data. The new challenge is not finding the needle in the data haystack, but finding the hidden patterns in the hay.
"Take the data from the world, from what you observe, and then untangle it," Blei said. "What generated it? What are the hidden structures?"