Singularity | Artificial Intelligence (ai), Technology & Futurology

96 readers
1 user here now

About:

This sublemmy is a place for sharing news and discussion about artificial intelligence, core developments in humanity's technology, and the societal changes that come with them. Essentially, it is a futurology sublemmy centered around AI, but not limited to AI alone.

Rules:
  1. Posts that break the rules, and whose posters don't bring them into compliance after the violation is pointed out, will be deleted regardless of how much engagement they got, and then reposted by me in a way that follows the rules. I will wait a maximum of 2 days for the poster to comply before doing this.
  2. No Low-quality/Wildly Speculative Posts.
  3. Keep posts on topic.
  4. Don't make posts with link/s to paywalled articles as their main focus.
  5. No posts linking to reddit posts.
  6. Memes are fine as long as they are high quality and/or can lead to serious on-topic discussions. If we end up with too many memes, we will create a meme-specific singularity sublemmy.
  7. Titles must include information on how old the source is in this format dd.mm.yyyy (ex. 24.06.2023).
  8. Please be respectful to each other.
  9. No summaries made by LLMs. I would like to keep quality of comments as high as possible.
  10. (Rule implemented 30.06.2023) Don't make posts with link/s to tweets as their main focus. Melon decided that the content on the platform is going to be locked behind login requirement and I'm not going to force everyone to make a twitter account just so they can see some news.
  11. No AI-generated images/videos unless their role is to showcase new advancements in generative technology that are no older than 1 month.
  12. If the title of the post isn't an original title of the article or paper then the first thing in the body of the post should be an original title written in this format "Original title: {title here}".

Related sublemmies:

[email protected] (Our community focuses on programming-oriented, hype-free discussion of Artificial Intelligence (AI) topics. We aim to curate content that truly contributes to the understanding and practical application of AI, making it, as the name suggests, “actually useful” for developers and enthusiasts alike.)

Note:

My posts on this sublemmy are currently VERY reliant on getting info from r/singularity and other subreddits. I'm planning to at some point make a list of sites that write/aggregate the kind of news this sublemmy is about, so we can get news faster and not rely on reddit as much. If you know any good sites, please DM me.

founded 1 year ago
MODERATORS
1
 
 

A few days ago I added rule 10, which reads:

(Rule implemented 30.06.2023) Don’t make posts with link/s to tweets as their main focus. Melon decided that the content on the platform is going to be locked behind login requirement and I’m not going to force everyone to make a twitter account just so they can see some news.

From what I see you can view tweets now but you can't see comments for them etc.

Should I revert the ban, or should I do the same as I did with reddit in rule 5?:

No posts linking to reddit posts.

Personally I want to keep the ban, because the platform doesn't deserve traffic and is very unstable right now; we can't trust that they won't block us from viewing tweets again. What do you think? I want to hear your opinions on the topic. I will make my final decision tomorrow.

2
 
 

If you are a lurker, please help us grow the community by commenting and posting interesting, on-topic, quality info and discussions, so we can attract more people and make this community a more interesting place to spend time. ✌️

Previous milestone: https://lemmy.fmhy.ml/post/578477

3
 
 

Note: I still have some posts to go through to decide whether to feature them, but the process is taking longer than expected, so I'm going to take a short break for now. I planned to pin this post once I'm done, but IMO there are already enough posts on the list to pin it on the sublemmy now.

Note2: Not only did I start working on the list 2 days late, I also won't finish it the same day I started. Moderating is exhausting.

Ai related

Ai safety/ethical concerns:
  • Post - The study of morality for self-driving cars, using the trolley problem. (article from 24.10.2018)
  • Post - DOD Committed to Ethical Use of Artificial Intelligence (article from 15.06.2023)
  • Post - If AI is plagiarising art and design, then so is every human artist and designer in the world (19.06.2023)
Ai funding:
  • Post - Amazon’s generative AI playground is open (article from 23.06.2023)
  • Post - France makes high-profile push to be the A.I. hub of Europe setting up challenge to U.S., China (article from 18.06.2023)
  • Post - Anthropic Raises $450 Million in Series C Funding to Scale Reliable AI Products (article from 23.05.2023)
Ai research:
  • Post - Meet TRACE: A New AI Approach for Accurate 3D Human Pose and Shape Estimation with Global Coordinate Tracking (paper from 5.06.2023)
  • Post - AudioPaLM: A Large Language Model That Can Speak and Listen (paper from 22.06.2023)
  • Post - Deepmind Researchers Open-Source TAPIR: A New AI Model for Tracking Any Point (TAP) that Effectively Tracks a Query Point in a Video Sequence (paper from 14.06.2023)
  • Post - Fast Segment Anything (40ms/image) (paper from 21.06.2023)
  • Post - AI Will Eat Itself? This AI Paper Introduces A Phenomenon Called Model Collapse That Refers To A Degenerative Learning Process Where Models Start Forgetting Improbable Events Over Time. (paper from 27.05.2023)
  • Post - This AI Paper Proposes A Latent Diffusion Model For 3D (LDM3D) That Generates Both Image And Depth Map Data From A Given Text Prompt (paper from 18.05.2023)
  • Post - Stanford Researchers Introduce Sophia: A Scalable Second-Order Optimizer For Language Model Pre-Training (paper from 23.05.2023)
  • Post - Say Goodbye to Costly Auto-GPT and LangChain Runs: Meet ReWOO – The Game-Changing Modular Paradigm that Cuts Token Consumption by Detaching Reasoning from External Observations (paper from 23.05.2023)
  • Post - A New AI Research Introduces Recognize Anything Model (RAM): A Robust Base Model For Image Tagging (paper from 6.06.2023)
  • Post - Researchers from Harvard Introduce Inference-Time Intervention (ITI): An AI Technique that Improves the Truthfulness of Language Models from 32.5% to 65.1% (paper from 6.06.2023)
  • Post - 3D Pose and Tracking for Human Action Recognition (paper from 3.04.2023)
  • Post - Deepmind’s new AI agent learns 26 games in two hours (article from 19.06.2023)
  • Post - META’s SAM (Segment anything model) is improving (image from 6.06.2023)
  • Post - Textbooks Are All You Need. 1.3B LLM trained on 51B tokens hits 51% on HumanEval. (paper from 20.06.2023)
  • Post - We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. (paper from 15.06.2023)
  • Post - Revolutionizing AI Efficiency: UC Berkeley’s SqueezeLLM Debuts Dense-and-Sparse Quantization, Marrying Quality and Speed in Large Language Model Serving (paper from 13.06.2023)
  • Post - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI (blogpost from 13.06.2023)
  • Post - Tree of Thoughts: Deliberate Problem Solving with Large Language Models (17.05.2023)
  • Post - Full title: Voyager: An Open-Ended Embodied Agent with Large Language Models - “the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention” (25.05.2023 article)
  • Post - Apple Researchers Introduce ByteFormer: An AI Model That Consumes Only Bytes And Does Not Explicitly Model The Input Modality (9.06.2023 article)
  • Post - Revolutionizing AI Efficiency: Meta AI’s New Approach, READ, Cuts Memory Consumption by 56% and GPU Use by 84% (24.05.2023 paper)
On ai's intelligence:
  • Post - Sparks of AGI: early experiments with GPT-4 (youtube video from 6.04.2023)
  • Post - Geoffrey Hinton on if LLM “understands” what they’re saying (5.06.2023 yt vid)
Generative videos:
  • Post - zeroscope_v2_XL: a new open source 1024x576 video model designed to take on Gen-2 (youtube video from 24.06.2023)
Generative images:
  • Post - Midjourney V5.2 released (article from 23.06.2023)
  • Post - Microsoft: Bing Image Creator Will See Big Improvements In A Month (article from 21.06.2023)
  • Post - Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (18.05.2023 announcement)
Generative sound/speech:
  • Post - Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance (blogpost from 16.06.2023)
  • Post - Meta releases new SOTA text to music model MusicGen. Demonstrated samples are better than existing models including Google’s MusicLM (8.06.2023 paper)
Ai regulation:
  • Post - Stanford grades Leading LLMs’ Compliance with the Draft EU AI Act (reddit repost from 24.06.2023)
  • Post - OpenAI has been publicly asking for AI regulation, but in the meantime lobbied its way to loosening it for itself (article from 20.06.2023)
  • Post - Biden to meet with A.I. experts in San Francisco to discuss how to regulate the field (article from 20.06.2023)
Ai products/tools:
  • Post - We made a comprehensive list of popular AI agents out there (reddit repost from 23.06.2023)
  • Post - Creating what I always wanted from the singularity: Alpha version of a AI Librarian/Analyst that finds the best voices and most relevant articles, podcasts, and videos for a topic and gives me a synthesis (reddit repost from 23.06.2023)
  • Post - Bing Chat Tests Innovative Image Recognition Feature: Currently Available for “5% of Searches” (article from 15.06.2023)
Large Language Models:
  • Post - MosaicML open sources new 8k context length MPT-30B language model under Apache 2.0 license (blogpost from 22.06.2023)
  • Post - Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding (reddit repost from 19.06.2023)
Ai experiments and potential use cases:
  • Post - ilumine AI turn 2D image into 3D scene (tweet from 24.06.2023)
  • Post - VTM: Bloodlines AI remaster ( TemporalKit v1.3 ) (reddit repost from 28.04.2023)
  • Post - Spiritual voice chat with a ChatGPT-driven monk in VR (2-month-old reddit repost (I couldn't find the reddit post to get the exact date))
  • Post - ChatGPT in Skyrim VR with lip synced voice generation (reddit repost from 26.04.2023)
  • Post - Created an AI Basketball Referee. How will AI change sports? (reddit repost from 1.06.2023)
  • Post - A new version of the self coding voice assistant i showed yesterday, this time with a more complex command (reddit repost from 24.05.2023)
Ai use cases:
  • Post - Trying out the new generative fill feature in Photoshop Beta (reddit repost from 23.05.2023)
  • Post - Using midjourney, Photoshop (beta) and After Effects to create chill pixelart animation (with link to breakdown) (article from 24.05.2023)
Adapting ai into society:
  • Post - An artificial intelligence system based on ChatGPT technology will manage the handling of 112 calls at peak times. The use of this technology is planned to start in 2025. (article from 20.06.2023)
  • Post - Marvel used AI to create opening intro for their new series: Secret Invasion (article from 22.06.2023)
  • Post - A robot🤖 took my order at checkers today. They’re coming for our Wendy’s 👩‍🦰👩‍🦰 jobs next (reddit repost from 16.05.2023)
Ai companies:
  • Post - Six more companies competing with OpenAI (article from 22.06.2023)
Ai and science:
  • Post - Stanford Institute for Human-Centered AI- Demis Hassabis: Using AI To Accelerate Scientific Discovery (event/talk from 28.04.2023)
  • Post - To accelerate search for an Alzheimer’s cure, scientists use artificial intelligence to identify likely drug targets (article from 15.05.2023)
  • Post - “We report a model that can go from natural language instructions, to robot actions, to synthesized molecule with an LLM. We synthesized catalysts, a novel dye, and insect repell… (paper from 11.04.2023)
  • Post - Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (paper from 19.06.2023)
  • Post - Illumina Launches Genomic Sequencing AI (4.06.2023 reddit repost)
Ai and robotics:
  • Post - Introducing VRB: Use large-scale human videos to train a general-purpose affordance model to jumpstart any robotics paradigm! (tweet from 13.06.2023)
  • Post - Finished my PhD researching “self-aware AI 3D printers” at Cambridge! (reddit repost from 21.06.2023)
  • Post - RoboCat: A self-improving robotic agent [Google Deepmind] (20.06.2023)
  • Post - Berkeley researcher deploys robots and AI to increase pace of research by 100 times (article from 23.04.2023)
  • Post - Figure raises $70M to build its humanoid robots (23.05.2023 article)
  • Post - Tesla Bot has got our competitive juices flowing, says Boston Dynamics CEO (22.05.2023 article)
Misc:
  • Post - Neural Networks Need Data to Learn. Even If It’s Fake. (article from 16.06.2023)
  • Post - Bing vs ChatGPT vs Bard vs C.ai vs PH (Worldwide Engagement) (image from 22.05.2023)
  • Post - A bot on the side: is it adultery if you cheat with an AI companion? (article from 15.06.2023)
  • Post - Star Trek The Next Generation s02e09 on sentience (aired feb 11, 1989)
Discussions on lemmy:
  • Post - Minute of optimism: chatbots help me working on my communication skills and boost the quality of my social life. (reddit repost from 2.06.2023)
  • Post - A personal AI assistant that knows you and your needs better everyday (reddit repost from 19.05.2023)

Technology related

Quantum computers:
  • Post - An IBM Quantum Computer Beat a Supercomputer in a Benchmark Test (article from 20.06.2023)
Misc:
  • Post - SpaceX successfully launches world’s first “space factory” (article from 18.06.2023)
  • Post - Scientists Successfully Transmit Space-Based Solar Power to Earth for the First Time (2.06.2023 article)
  • Post - Scientists create synthetic human embryos using stem cells in major breakthrough (15.06.2023 article)

Universal Basic Income

  • Post - Are guaranteed-income programs working? (article from 16.06.2023)
  • Post - Five proven benefits of Universal Basic Income (article from 11.06.2023)
  • Post - How universal basic income’s impact on people’s finances could transform the nation’s health (article from 12.06.2023)
Ai related
  • Post - 'Labour leader Keir Starmer says he is "not attracted to the idea of universal basic income" in response to advances in AI, and the focus should be on skills and retraining' (13.06.2023 tweet)
4
 
 

In human conversations, individuals can indicate relevant regions within a scene while addressing others. In turn, the other person can then respond by referring to specific regions if necessary. This natural referential ability in dialogue remains absent in current Multimodal Large Language Models (MLLMs). To fill this gap, this paper proposes an MLLM called Shikra, which can handle spatial coordinate inputs and outputs in natural language. Its architecture consists of a vision encoder, an alignment layer, and an LLM. It is designed to be straightforward and simple, without the need for extra vocabularies, position encoders, pre-/post-detection modules, or external plug-in models. All inputs and outputs are in natural language form. Referential dialogue is a superset of various vision-language (VL) tasks. Shikra can naturally handle location-related tasks like REC and PointQA, as well as conventional VL tasks such as Image Captioning and VQA. Experimental results showcase Shikra's promising performance. Furthermore, it enables numerous exciting applications, like providing mentioned objects' coordinates in chains of thought and comparing the similarities of user-pointed regions. Our code, model and dataset are available at this https URL.
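To make the "coordinates in natural language" idea concrete, here is a minimal sketch of pulling box coordinates out of a model's textual answer. The bracketed [x1,y1,x2,y2] serialization below is illustrative, not necessarily Shikra's exact format:

```python
import re

# Hypothetical serialization: normalized box corners inside square brackets.
BOX = re.compile(r"\[([\d.]+),([\d.]+),([\d.]+),([\d.]+)\]")

def parse_boxes(text):
    """Extract normalized [x1,y1,x2,y2] boxes embedded in a
    natural-language answer."""
    return [tuple(float(v) for v in m) for m in BOX.findall(text)]

answer = "The dog [0.32,0.18,0.55,0.71] is chasing the ball [0.61,0.66,0.68,0.74]."
print(parse_boxes(answer))
```

Because the coordinates are plain text, no extra vocabulary or detection head is needed; the same channel carries both the answer and the grounding.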

5
 
 

Amazon CEO Andy Jassy called generative A.I. “one of the biggest technical transformations of our lifetimes” in an interview with CNBC on Thursday. He also called many of today’s A.I. chatbots and other generative A.I. tools part of the “hype cycle,” declaring that Amazon was focused on the “substance cycle.”

Amazon’s bona fides in the space are well established, having been a player in artificial intelligence and machine learning long before the ChatGPTs and Bards of the world were publicly released. Former Fortune editor Brian Dumaine wrote a book in 2020 about how Amazon founder Jeff Bezos realized early on that imbuing machine learning into every facet of the company would allow it to gather data to constantly improve itself.

Much as it did with Amazon Web Services, which practically birthed the cloud computing industry that now powers the internet’s biggest companies, including its competitors, Amazon’s A.I. strategy is focused on cementing its position as a major player across the entirety of the A.I. supply chain.

“Every single business unit inside of Amazon is working intensely and very broadly on generative A.I.,” Jassy says.

Jassy shed some light on Amazon’s A.I. game plan, outlining three macro layers: the computing capabilities, the underlying models, and what Jassy refers to as the “application layer,” for example, ChatGPT or Bard.

6
 
 

For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at a large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention mechanism to aggregate information across views and re-rendering of the camera input from virtual views around the robot workspace. In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36X faster than PerAct to reach the same performance and achieves 2.3X the inference speed of PerAct. Further, RVT can perform a variety of manipulation tasks in the real world with just a few (∼10) demonstrations per task. Visual results, code, and trained model are provided at this https URL.
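The cross-view aggregation can be pictured as ordinary attention pooling over per-view features. The sketch below is a generic illustration under that assumption, not RVT's exact mechanism:

```python
import numpy as np

def aggregate_views(view_feats, query):
    """Fuse per-view features with scaled dot-product attention.
    view_feats: (V, D), one feature vector per (virtual) view;
    query: (D,), e.g. a task embedding."""
    d = query.shape[0]
    scores = view_feats @ query / np.sqrt(d)  # (V,) one score per view
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # softmax over views
    return w @ view_feats                     # (D,) fused feature

rng = np.random.default_rng(0)
views = rng.normal(size=(5, 16))  # e.g. 5 re-rendered virtual views
query = rng.normal(size=16)
fused = aggregate_views(views, query)
print(fused.shape)  # (16,)
```

The appeal of operating on a handful of (re-rendered) 2D views rather than a voxel grid is exactly this: the fusion step scales with the number of views, not with the cube of the workspace resolution.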

7
 
 

Covid-19 is said to cause long-term side effects in up to 67% of patients, and these health consequences can include chronic fatigue, loss of taste and smell and brain fog. Increasingly common too is Covid-related hair loss. Known as telogen effluvium, this phenomenon manifests as clumps of hair falling out after brushing or washing your hair.

It’s normal to shed hair daily – we lose about 100-150 hairs each day as hair drops from follicles to make way for new hair growth. This growth cycle occurs because 90% of the hair on our heads is in a growth phase (called anagen), while the remaining 10% is in a resting phase (called telogen). Anagen lasts for about three years before transitioning into the shorter telogen phase, following which hair is shed.

A stressful event like childbirth, certain medications, intense psychological stress and Covid-19 can trigger our bodies to shift a greater-than-normal proportion of growing anagen hairs into a resting telogen state, according to the University of Utah.

“Covid-related hair loss can affect up to 33% of symptomatic patients and 10% of asymptomatic patients,” says a plastic surgeon who deals with hair loss patients. “And this kind of hair loss seems to be different from that induced by stress or disease as cytokines (substances secreted by the body’s immune system) appear to cause direct damage to hair follicles,” she adds.

Covid-induced hair loss has also been reported to start earlier after the stressful event – in two months instead of the usual three.

8
 
 

Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, very few have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free-optimization algorithm, to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends outperforming the standard pretrain-finetune approach.
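The interpolation step itself is just a convex combination of parameter tensors. A minimal sketch (the gradient-free search for good weights, which DFWE performs separately, is omitted):

```python
import numpy as np

def interpolate_weights(experts, alphas):
    """Merge several fine-tuned expert models by linearly interpolating
    their parameter dicts: theta = sum_i alpha_i * theta_i."""
    assert abs(sum(alphas) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return {name: sum(a * e[name] for a, e in zip(alphas, experts))
            for name in experts[0]}

# Three toy 'experts' sharing one parameter tensor.
e1 = {"w": np.array([1.0, 2.0])}
e2 = {"w": np.array([3.0, 4.0])}
e3 = {"w": np.array([5.0, 6.0])}
merged = interpolate_weights([e1, e2, e3], [0.5, 0.3, 0.2])
print(merged["w"])  # weighted average of the three experts
```

Because the merge is linear, evaluating a candidate weighting only costs one forward pass over a validation set, which is what makes derivative-free search over the alphas practical.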

9
 
 

The idea is simple - Specify what you want to research, and the AI will autonomously research it for you in minutes!

▸ One prompt generates an unbiased, factual and in-depth research report

▸ Generate research, outlines, resource and lessons reports

▸ Aggregates over 20 web sources per research task

▸ Includes an easy-to-use web interface

▸ Open source: https://github.com/assafelovic/gpt-researcher

▸ Scrapes web sources with javascript support

▸ Keeps track and context of visited and used web sources

10
 
 

Abstract:

Since the first laser was invented, the pursuit of high-energy lasers (HELs) has never lost its enthusiasm. The first revolution in HELs was driven by the fusion of laser and aerospace technology in the 1960s, when chemical rocket engines gave fresh impetus to the birth of gas-flow and chemical lasers, which finally turned megawatt lasers from dream into reality. Nowadays, the development of HELs has entered the age of electricity, as have rocket engines. The properties of current electric rocket engines are highly consistent with HELs' goals, including electrical driving, effective heat dissipation, little medium consumption, and extremely light weight and small size, which inspired a second fusion of laser and aerospace and motivated the exploration of potential HELs. As an exploratory attempt, a new configuration of diode-pumped metastable rare-gas laser was demonstrated, with the gain generator resembling an electric rocket engine for improved power-scaling ability.

11
12
 
 

Original title: Focused Transformer: Contrastive Training for Context Scaling

Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of 3B and 7B OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a 256k context length for passkey retrieval.
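The distraction issue is easy to reproduce in miniature: with softmax attention over an external memory, the weight on a single relevant key shrinks as irrelevant keys accumulate. A toy illustration of the problem (not the FoT method itself):

```python
import numpy as np

def relevant_mass(n_irrelevant, d=32, seed=0):
    """Softmax attention over one well-aligned key plus n random keys;
    returns the attention weight landing on the relevant key."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=d)
    relevant = q / np.linalg.norm(q)                  # aligned with the query
    noise = rng.normal(size=(n_irrelevant, d)) / np.sqrt(d)
    keys = np.vstack([relevant, noise])
    scores = keys @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w[0]

for n in (4, 64, 1024):
    print(n, round(relevant_mass(n), 3))  # the relevant key's share shrinks
```

FoT's contrastive training counters exactly this: by pushing keys with different semantic values apart, the relevant key's score margin grows and survives much larger memories.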

13
14
 
 

The digital plane we traverse is volatile and ever-changing, evolving at a pace that leaves the old world gasping for breath. One of the newest entrants in the tech theatre is the use of AI in content creation, specifically television. Fable, a San Francisco-based start-up, is pioneering this bold frontier. Their brainchild, aptly named the Showrunner AI technology, or ‘SHOW-1’, aims to generate TV shows with viewers in starring roles.

15
 
 

I just copy/pasted what's in the link so formatting may be broken:

GPT-4's details are leaked.

It is over.

Everything is here: twitter.com/i/web/status/1…

Parameters count:

GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers.

Mixture of Experts - confirmed.

OpenAI was able to keep costs reasonable by using a mixture-of-experts (MoE) model. They use 16 experts within their model, each about ~111B parameters for the MLP. 2 of these experts are routed to per forward pass.

MoE Routing:

While the literature talks a lot about advanced routing algorithms for choosing which experts to route each token to, OpenAI's is allegedly quite simple for the current GPT-4 model.
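Taking the thread's own figures at face value (16 experts of ~111B MLP parameters each, 2 experts routed per token, plus the ~55B of shared attention parameters it cites), the headline totals can be reproduced with a few lines of arithmetic:

```python
# Sanity-check of the leaked parameter arithmetic. All input figures are
# the thread's claims, not confirmed numbers.
n_experts        = 16
expert_mlp       = 111e9  # parameters per expert (MLP blocks)
shared_attention = 55e9   # parameters shared across experts
experts_per_pass = 2      # top-2 routing

total  = n_experts * expert_mlp + shared_attention         # all parameters
active = experts_per_pass * expert_mlp + shared_attention  # used per token

print(f"total  ~ {total / 1e12:.2f}T parameters")  # ~1.83T ("~1.8 trillion")
print(f"active ~ {active / 1e9:.0f}B parameters")  # ~277B ("~280B")
```

So the "1.8T parameters but ~280B per forward pass" claims are at least internally consistent.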

There are roughly ~55B shared parameters for attention.

Inference:

Each forward-pass inference (generation of 1 token) only utilizes ~280B parameters and ~560 TFLOPs. This contrasts with the ~1.8 trillion parameters and ~3,700 TFLOPs that would be required per forward pass of a purely dense model.

Dataset:

GPT-4 is trained on ~13T tokens.

These are not unique tokens; repeated epochs are counted as additional tokens.

Epoch number: 2 epochs for text-based data and 4 for code-based data.

There are millions of rows of instruction fine-tuning data from ScaleAI and from internal sources.

GPT-4 32K:

There was an 8k context length (seqlen) for the pre-training phase. The 32k-seqlen version of GPT-4 is based on fine-tuning of the 8k version after pre-training.

Batch Size:

The batch size was gradually ramped up over a number of days on the cluster, but by the end, OpenAI was using a batch size of 60 million! This, of course, is "only" a batch size of 7.5 million tokens per expert, since not every expert sees all tokens. For the real batch size, divide this number by the seqlen. [Just stop with these misleading numbers already.]

Parallelism Strategies:

To parallelize across all their A100 GPUs, they utilized 8-way tensor parallelism, as that is the limit for NVLink.

Beyond that, they are using 15-way pipeline parallelism.

(They likely used ZeRO Stage 1; it is possible they used block-level FSDP.)

Training Cost:

OpenAI’s training compute for GPT-4 was ~2.15e25 FLOPs, on ~25,000 A100s for 90 to 100 days at about 32% to 36% MFU.

Part of this extremely low utilization is due to an absurd number of failures that required restarting from checkpoints. If their cost in the cloud was about $1 per A100-hour, the training cost for this run alone would be about $63 million.
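These compute and cost claims hang together, which is easy to check. The only figure added below is the A100's peak dense BF16 throughput (312 TFLOPS); GPU count, duration, MFU and hourly price are the thread's own numbers:

```python
# Cross-check of the thread's training-compute and cost claims.
a100_peak_flops = 312e12               # A100 peak dense BF16 throughput
n_gpus, days, mfu = 25_000, 95, 0.34   # midpoints of the claimed ranges

flops = n_gpus * a100_peak_flops * mfu * days * 86_400
print(f"training compute ~ {flops:.2e} FLOPs")   # ~2.18e25 vs claimed ~2.15e25

cost = n_gpus * days * 24 * 1.00                 # $1 per A100-hour
print(f"training cost    ~ ${cost / 1e6:.0f}M")  # ~$57M vs claimed ~$63M
```

At the top of the claimed ranges (100 days, slightly above $1/hour) the cost lands on the quoted ~$63M, so the numbers are mutually consistent even if none of them are confirmed.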

(Today, the pre-training could be done with ~8,192 H100s in ~55 days for $21.5 million at $2 per H100-hour.)

Mixture-of-Experts Tradeoffs:

Multiple MoE tradeoffs were made: for example, MoE is incredibly difficult to deal with at inference because not every part of the model is used on every token generation. This means parts may sit dormant while other parts are being used. When serving users, this really hurts utilization rates.

Researchers have shown that using 64 to 128 experts achieves better loss than 16 experts, but that's purely research. There are multiple reasons to go with fewer experts. One reason OpenAI chose 16 experts is that more experts are difficult to generalize across many tasks. More experts can also make convergence harder to achieve. With such a large training run, OpenAI instead chose to be more conservative with the number of experts.

GPT-4 Inference Cost:

GPT-4 costs 3x as much as the 175B-parameter Davinci. This is largely due to the larger clusters required for GPT-4 and the much lower utilization achieved. One estimate of its cost is $0.0049 per 1k tokens on 128 A100s for GPT-4 at 8k seqlen, and $0.0021 per 1k tokens on 128 H100s at 8k seqlen. It should be noted that this assumes decently high utilization and high batch sizes.

Multi-Query Attention:

OpenAI is using MQA just like everybody else. Because of it, only 1 head is needed and memory capacity can be significantly reduced for the KV cache. Even then, the 32k-seqlen GPT-4 definitely cannot run on 40GB A100s, and the 8k version is capped on maximum batch size.

Continuous Batching:

OpenAI implements both variable batch sizes and continuous batching. This is done to allow some level of maximum latency as well as to optimize inference costs.

Vision Multi-Modal:

It is a separate vision encoder from the text encoder, with cross-attention. The architecture is similar to Flamingo. This adds more parameters on top of GPT-4's 1.8T. It is fine-tuned with another ~2 trillion tokens after the text-only pre-training. OpenAI wanted to train the vision model from scratch, but it wasn't mature enough, so they de-risked it by starting with text. One of the primary purposes of this vision capability is autonomous agents that can read web pages and transcribe what's in images and video. Some of the data they train on is joint data (rendered LaTeX/text), screenshots of web pages, and YouTube videos (sampling frames, and running Whisper on them to get transcripts).

[Don't want to say "I told you so", but…]

Speculative Decoding:

OpenAI might be using speculative decoding for GPT-4's inference (not 100% sure).
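For readers unfamiliar with the technique, here is a toy greedy sketch of the draft-and-verify loop (the two "models" are stand-ins, and production systems accept or reject draft tokens probabilistically rather than by exact match):

```python
def speculative_decode(draft, oracle, prompt, k=4, max_len=12):
    """Draft-and-verify loop: the cheap `draft` model proposes k tokens,
    the expensive `oracle` model checks them in one batch and keeps the
    longest agreeing prefix plus one token of its own."""
    seq = list(prompt)
    while len(seq) < max_len:
        proposal, ctx = [], list(seq)
        for _ in range(k):              # k cheap draft steps
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        kept = []
        for t in proposal:              # one oracle verification pass
            guess = oracle(seq + kept)
            if guess == t:
                kept.append(t)          # draft token accepted
            else:
                kept.append(guess)      # rejected: take the oracle's token
                break
        else:
            kept.append(oracle(seq + kept))  # bonus token when all accepted
        seq += kept
    return seq[:max_len]

# Toy 'models': the oracle counts upward; the draft errs on multiples of 3.
oracle = lambda ctx: ctx[-1] + 1
draft  = lambda ctx: ctx[-1] + 1 if (ctx[-1] + 1) % 3 else ctx[-1] + 2
print(speculative_decode(draft, oracle, [0]))  # tokens 0..11 despite draft errors
```

The output is always what the oracle alone would have produced; the draft only changes how many oracle calls are needed, which is where the speedup comes from.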

The idea is to use a smaller, faster model to decode several tokens in advance, and then feed them into a large oracle model as a single batch. If the small model was right about its predictions, the larger model agrees and we can decode several tokens in a single batch. But if the larger model rejects the tokens predicted by the draft model, the rest of the batch is discarded, and we continue with the larger model. The conspiracy theory that the new GPT-4's quality has deteriorated might simply be because they are letting the oracle model accept lower-probability sequences from the speculative decoding model.

Inference Architecture:

The inference runs on a cluster of 128 GPUs.

There are multiple of these clusters in multiple datacenters in different locations.

It is done in 8-way tensor parallelism and 16-way pipeline parallelism.

Each node of 8 GPUs has only ~130B parameters, or… twitter.com/i/web/status/1… The model has 120 layers, so it fits in 15 different nodes. [Possibly there are fewer layers on the first node, since it also needs to compute the embeddings.]

According to these numbers, OpenAI should have trained on 2x the tokens if they were trying to match Chinchilla-optimal.

[let alone surpass it like we do]

This goes to show that they are struggling to get high-quality data.

Why no FSDP?

A possible reason for this could be that some of the hardware infra they secured is of an older generation.

This is pretty common at local compute clusters, as organisations usually upgrade the infra in several "waves" to avoid a complete pause of operation.… twitter.com/i/web/status/1…

Dataset Mixture:

They trained on 13T tokens. CommonCrawl & RefinedWeb are both 5T.

Remove the duplication of tokens from multiple epochs and we get to a much more reasonable number of "unaccounted for" tokens: the "secret" data, which by this point is rumored to include parts of Twitter, Reddit & YouTube.

[Rumors that start to become lawsuits]

Some speculations are:

  • LibGen (4M+ books)
  • Sci-Hub (80M+ papers)
  • All of GitHub

My own opinion:

The missing dataset is a custom dataset of college textbooks, collected by hand for as many courses as possible.

This is very easy to convert to a txt file and then, with self-instruct, into instruction form. This creates the "illusion" that GPT-4 "is smart" no matter who uses it.
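
For context, "self-instruct" here means prompting a model to generate instruction/answer pairs from raw text. A toy sketch of what wrapping a textbook passage might look like; the template and wording are invented for illustration, not a known OpenAI prompt:

```python
def make_self_instruct_prompt(passage: str) -> str:
    # Illustrative template only: turn raw textbook text into a prompt that
    # asks a model to produce an instruction-style (question, answer) pair.
    return (
        "Below is a passage from a college textbook.\n"
        "Write one question a student might ask about it, "
        "then answer that question using only the passage.\n\n"
        f"Passage:\n{passage}\n\nQuestion:"
    )

passage = (
    "In computational complexity theory, P is the class of decision "
    "problems solvable in polynomial time by a deterministic Turing machine."
)
prompt = make_self_instruct_prompt(passage)
print(prompt)
```

Run at scale over every chapter of every textbook you can digitize, and the completions become instruction-tuning data that covers each field's standard curriculum.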

Computer scientist? Sure! It can help you with your questions about P != NP. Philosophy major? It can totally talk to you about epistemology.

Don't you see? It was trained on the textbooks. It is so obvious. There are also papers that try to forcibly extract memorized parts of books from GPT-4 to understand what it was trained on.

There are some books it knows so well that it has seen them for sure.

Moreover, if I remember correctly, it even knows the unique IDs of Project Euler exercises.

Abstract:

We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
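
To make the PCFG part concrete, here is a toy probabilistic context-free grammar of the kind used to procedurally generate token sequences. The grammar itself is invented for illustration; the paper's grammars differ:

```python
import random

random.seed(1)

# A tiny PCFG: each non-terminal maps to (expansion, probability) rules.
# "S" yields one or more "a" tokens, optionally followed by a single "b".
PCFG = {
    "S": [(["A", "B"], 0.7), (["A"], 0.3)],
    "A": [(["a", "A"], 0.5), (["a"], 0.5)],
    "B": [(["b"], 1.0)],
}

def expand(symbol):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in PCFG:   # terminal token
        return [symbol]
    rules = [r for r, _ in PCFG[symbol]]
    weights = [w for _, w in PCFG[symbol]]
    chosen = random.choices(rules, weights=weights)[0]
    out = []
    for s in chosen:
        out.extend(expand(s))
    return out

sequence = expand("S")
print(sequence)
```

Sequences like these (or the same structure re-expressed with tokens randomly sampled from the vocabulary) are what the paper feeds to a frozen LLM, testing whether in-context learning alone can pick up and continue the pattern.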


cross-posted from: https://lemmy.world/post/1750098

Introducing Llama 2 - Meta's Next Generation Free Open-Source Artificially Intelligent Large Language Model

Llama 2

It's incredible it's already here! This is great news for everyone in free open-source artificial intelligence.

Llama 2 unleashes Meta's (previously) closed model (Llama) to become free open-source AI, accelerating access and development for large language models (LLMs).

This marks a significant step in machine learning and deep learning technologies. With this move, a widely supported LLM can become a viable choice for businesses, developers, and entrepreneurs to innovate our future using a model that the community has been eagerly awaiting since its initial leak earlier this year.

Here are some highlights from the official Meta AI announcement:

Llama 2

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases.

Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1. Its fine-tuned models have been trained on over 1 million human annotations.

Inside the Model

With each model download you'll receive:

  • Model code
  • Model Weights
  • README (User Guide)
  • Responsible Use Guide
  • License
  • Acceptable Use Policy
  • Model Card

Benchmarks

Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. It was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations.

RLHF & Training

Llama-2-chat uses reinforcement learning from human feedback to ensure safety and helpfulness. Training Llama-2-chat: Llama 2 is pretrained using publicly available online data. An initial version of Llama-2-chat is then created through the use of supervised fine-tuning. Next, Llama-2-chat is iteratively refined using Reinforcement Learning from Human Feedback (RLHF), which includes rejection sampling and proximal policy optimization (PPO).
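
In pseudocode, the training recipe described above looks roughly like this. All function names are placeholders sketching the stages Meta describes, not actual Meta code:

```
train_llama2_chat(pretrained, sft_data, preference_data):
    model = supervised_finetune(pretrained, sft_data)       # initial Llama-2-chat
    repeat for several rounds:
        rm = train_reward_model(preference_data)            # from human preference comparisons
        winners = rejection_sample(model, rm, k)            # keep highest-reward of k candidates
        model = supervised_finetune(model, winners)
        model = ppo_optimize(model, rm)                     # PPO against the reward model
    return model
```

The key design choice is the iteration: each round of rejection sampling and PPO produces a better model, whose outputs are then re-ranked by humans to refresh the preference data for the next round.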

The License

Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements.

Partnerships

We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2, cloud providers that will include the model as part of their offering to customers, researchers committed to doing research with the model, and people across tech, academia, and policy who see the benefits of Llama and an open platform as we do.

The/CUT

With the release of Llama 2, Meta has opened up new possibilities for the development and application of large language models. This free open-source AI not only accelerates access but also allows for greater innovation in the field.

Take Three:

  • Video Game Analogy: Just like getting a powerful, rare (or previously banned) item drop in a game, Llama 2's release gives developers a powerful tool they can use and customize for their unique quests in the world of AI.
  • Cooking Analogy: Imagine if a world-class chef decided to share their secret recipe with everyone. That's Llama 2, a secret recipe now open for all to use, adapt, and improve upon in the kitchen of AI development.
  • Construction Analogy: Llama 2 is like a top-grade construction tool now available to all builders. It opens up new possibilities for constructing advanced AI structures that were previously hard to achieve.

Links

Here are the key resources discussed in this post:

Want to get started with free open-source artificial intelligence, but don't know where to begin?

Try starting here:

If you found anything else about this post interesting - consider subscribing to [email protected] where I do my best to keep you in the know about the most important updates in free open-source artificial intelligence.

This particular announcement is exciting to me because it may popularize open-source principles and practices for other enterprises and corporations to follow.

We should see some interesting models emerge out of Llama 2. I for one am looking forward to seeing where this will take us next. Get ready for another wave of innovation! This one is going to be big.


Significance

Adaptive agents must continually satisfy a range of distinct and possibly conflicting needs. In most models of learning, a monolithic agent tries to maximize one value that measures how well it balances its needs. However, this task is difficult when the world is changing and needs are many. Here, we considered an agent as a collection of modules, each dedicated to a particular need and competing for control of action. Compared to the standard monolithic approach, modular agents were much better at maintaining homeostasis of a set of internal variables in simulated environments, both static and changing. These results suggest that having “multiple selves” may represent an evolved solution to the universal problem of balancing multiple needs in changing environments.

Abstract

Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion as a collection of subagents, each dedicated to a separate need, powerfully enhanced the agent’s capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments, and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and increasing numbers of needs were due to intrinsic exploration and efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of “multiple selves.”
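
A toy sketch of the modular idea: one subagent per homeostatic need, each proposing an action, with a simple urgency-based arbitration picking the winner. The hand-coded rules below stand in for the paper's deep RL machinery, and the needs themselves are invented for illustration:

```python
# Each module looks at the internal state and proposes (action, urgency),
# where urgency is how far its own variable is from the homeostatic setpoint 0.
def hunger_module(state):
    return "eat", abs(state["hunger"])

def thirst_module(state):
    return "drink", abs(state["thirst"])

def modular_action(state, modules):
    """Greedy arbitration: the module with the most urgent need controls action."""
    proposals = {}
    for module in modules:
        action, urgency = module(state)
        if action not in proposals or urgency > proposals[action]:
            proposals[action] = urgency
    return max(proposals, key=proposals.get)

state = {"hunger": 0.8, "thirst": 0.3}
action = modular_action(state, [hunger_module, thirst_module])
print(action)  # "eat" -- hunger is currently the more urgent need
```

The paper's contrast is with a monolithic agent that would collapse both needs into one scalar reward; the modular version keeps each deficit visible, which is what drives the emergent exploration and graceful scaling the abstract describes.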


PIKA LABS site: https://www.pika.art/demo
